Infrequent Update Number Six
These blog posts are always somewhat delayed (as though I live in some sort of time-warped zone), and I'm currently mostly helping with a prototype of a new packaging system being put together by Stephen, Danek, and Bart. Also in the pipeline, however, is the .zfs/props extension I wrote about below. It has brought me plenty of VFS-implementation flashbacks and, more importantly, an enhanced appreciation of MDB as a debugging tool. Not too long ago I was bemoaning its assembly-level nature to my carpool-mate Colin, an ironic foreshadowing of the change of opinion that would take hold over the coming weeks.
The story begins with a NULL pointer dereference. Yet unlike other NULL pointers, this was a special one: its address was much greater than that of any NULL pointer that came before it. While the panic message called it NULL, it also printed the pointer's value, which happened to be larger than zero (by at least a factor of infinity), though obviously not in valid kernel space. A little investigation brought me to this segment of trap.c:
uts/i86pc/os/trap.c:die():192
192     if (type == T_PGFLT && addr < (caddr_t)KERNELBASE) {
193             panic("BAD TRAP: type=%x (#%s %s) rp=%p addr=%p "
194                 "occurred in module \"%s\" due to %s",
195                 type, trap_mnemonic, trap_name, (void *)rp, (void *)addr,
196                 mod_containing_pc((caddr_t)rp->r_pc),
197                 addr < (caddr_t)PAGESIZE ?
198                 "a NULL pointer dereference" :
199                 "an illegal access to a user address");
200     } else
201             panic("BAD TRAP: type=%x (#%s %s) rp=%p addr=%p",
202                 type, trap_mnemonic, trap_name, (void *)rp, (void *)addr);
Faulting addresses within the first page are treated as NULL dereferences. OK, this makes sense: they could be indexes into a NULL array, fields in a NULL struct, and so on. However, comparing the address in question against the page size revealed that it was not, in fact, in the first page. The code is pretty clear, so what could be going on? There was only one option: walk through the code in MDB and see what was happening.
I'd really like to draw out this investigation with suspense and mystery, but it's hard to stuff much of either between firing up MDB and looking at the disassembled code (estimated time elapsed: < 5 minutes). For the gory details, see CR 6578504. SPOILERS: it turns out that the SunStudio 11 compiler, which ONNV switched to between builds 23 and 24, cleverly "optimizes" out the ternary operator on lines 197-199, leaving only the string "a NULL pointer dereference". This only happens with x64 compilation, and it means you will never get a report of an illegal access to a user address in kernel mode. Cool.