Friday Aug 10, 2007

Ten weeks go by pretty quickly. I wish I had more time: to finish projects I'm already working on, to start new ones, to keep on having fun here, and to post an appropriate and compelling retrospective. Alas, it's not happening, as I've been told my blog gets shut down as soon as I walk out today, so: thanks everyone.

My email is dan.kuebrich at gmail.com.

Thursday Aug 09, 2007

If you ran the following code, what would you expect it to do?
	int sh = 33;
	printf("5 >> 33 = %d\n", 5 >> sh);
Looking only at the low byte of the integer 5, we see that 00000101 >> 33 = 00000000. Right? Actually, it turns out, the answer is 2 (00000010). This is a consequence of how the shift operation is implemented in hardware: the shift count passed to the shift instruction (on both Intel and SPARC) is taken modulo the width of the data being operated on. For a 32-bit operand only the low five bits of the count are used, so a shift by 33 becomes a shift by 1, and if you shift a 32-bit int by 32, you'll get no change, etc. This differs from what you would expect in C. After all, (int)(5 / pow(2,32)) = 0, not 5. However, it turns out this would be pretty expensive for the compiler to correct for: virtually every shift by a variable amount would also have to be augmented by a conditional. And thus, for what I can only assume to be that reason, the C99 standard actually states that this behavior is undefined:

"The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined."

Tuesday Jul 24, 2007

These blog posts are always somewhat delayed (as though I live in some sort of time-warped zone), and I'm currently mostly helping with a prototype of a new packaging system being put together by Stephen, Danek, and Bart. However, also in the pipeline is the .zfs/props extension I wrote about below. It has brought me a great deal of VFS-implementation flashbacks, and, more importantly, an enhanced appreciation of MDB as a debugging tool. I remember not too long ago bemoaning its assembly-level nature to my carpool-mate Colin; an ironic foreshadowing of the shift in opinion which would precipitate over the coming weeks.

The story begins with the dereferencing of a NULL pointer. Yet unlike other NULL pointers, this was a special NULL pointer--one with an address much greater than that of any NULL pointer which came before it. While the panic stated it was NULL, it also printed out its value, which happened to be larger than zero (by at least a factor of infinity), though obviously not in valid kernel space. A little investigation brought me to this segment of trap.c:
uts/i86pc/os/trap.c:die():192
192	if (type == T_PGFLT && addr < (caddr_t)KERNELBASE) {
193		panic("BAD TRAP: type=%x (#%s %s) rp=%p addr=%p "
194		    "occurred in module \"%s\" due to %s",
195		    type, trap_mnemonic, trap_name, (void *)rp, (void *)addr,
196		    mod_containing_pc((caddr_t)rp->r_pc),
197		    addr < (caddr_t)PAGESIZE ?
198		    "a NULL pointer dereference" :
199		    "an illegal access to a user address");
200	} else
201		panic("BAD TRAP: type=%x (#%s %s) rp=%p addr=%p",
202		    type, trap_mnemonic, trap_name, (void *)rp, (void *)addr);
Pointers dereferenced in the first page are treated as NULL. OK, this makes sense: they could be indexes of a NULL array, or fields in a NULL struct, etc. However, a comparison of the pagesize to the address in question revealed that it was in fact not in the first page. The code is pretty clear, so what could be going on? There was only one option: to walk through the code in MDB and see what was happening.
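
To make the intended behavior concrete, here is a small userland sketch of the same decision. Everything in it is an assumption for illustration: classify_fault is a made-up name, and EX_PAGESIZE and EX_KERNELBASE are placeholder values, not the real platform constants used by trap.c.

```c
#include <stdint.h>

/* Placeholder values for illustration only; the real ones are platform-defined. */
#define	EX_PAGESIZE	((uint64_t)4096)
#define	EX_KERNELBASE	((uint64_t)0xffff800000000000ULL)

/*
 * Hypothetical helper mirroring the logic of the panic message:
 * faults below the (assumed) kernel base are either in the first page,
 * and so reported as NULL, or somewhere else in user space.
 */
static const char *
classify_fault(uint64_t addr)
{
	if (addr >= EX_KERNELBASE)
		return ("a kernel address");

	return (addr < EX_PAGESIZE ?
	    "a NULL pointer dereference" :
	    "an illegal access to a user address");
}
```

Under these assumptions, a fault at 0x18 (say, a field at offset 0x18 in a NULL struct) is classified as a NULL pointer dereference, while a fault at 0x10000 should come back as an illegal access to a user address.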

I'd really like to draw out this investigation with suspense and mystery, but it's really hard to stuff all that in between firing up MDB and looking at the disassembled code (estimated time elapsed: < 5 minutes). For the gory details, see CR 6578504. SPOILERS: It turns out that the Sun Studio 11 compiler, switched to between builds 23 and 24 of ONNV, cleverly "optimizes" out the ternary operator on lines 197-199, leaving only the string "a NULL pointer dereference". This only occurs on x64 compilation, and means that you will never get a report of illegally accessing a user address in kernel mode. Cool.

This blog copyright 2007 by dank