Alan Hargreaves' Weblog
The ramblings of an Australian SaND TSC* Principal Field Technologist
* Solaris and Network Domain Technology Support Centre - The group I work forTags
(update 1) acoustic bind birthday blues bugs cec cec2007 cec2008 china cmt contention cringley debugging dogs dtrace earthquake encumbered-binaries extra flash funny google guitar halloween huron install kids linux liveupgrade locking mdb music mysql newyear niagra openjava opensolaris oracle patches patents percussion performance redhat secondlife security solaris sru sun support sxcr t2 t2000 timeslider ufs upgrade virtualbox windows youtube zfs
Friday Aug 14, 2009
Why I hate macros that make pointer dereferences look like structure elements
I have a colleague who generated an IDR patch for tcp in Solaris 10 for me to give relief to a customer for a bug while the formal fix was in progress.
As a part of the fix we had this code fragment
18984 /*
18985 * If the SACK option is set, delete the entire list of
18986 * notsack'ed blocks.
18987 */
18988 if (tcp->tcp_sack_info != NULL) {
18989 if (tcp->tcp_notsack_list != NULL)
18990 TCP_NOTSACK_REMOVE_ALL(tcp->tcp_notsack_list, tcp);
18991 }
replaced with this code fragment (the fix actually has a lot more to it than this, but this is what was relevent.)
18936 /* 18937 * If the SACK option is set, delete the entire list of 18938 * notsack'ed blocks. 18939 */ 18940 18941 if (tcp->tcp_notsack_list != NULL) 18942 TCP_NOTSACK_REMOVE_ALL(tcp->tcp_notsack_list, tcp);
Now, the assembly around here reads
ip:tcp_process_shrunk_swnd+0x38: ldx [%i0 + 0xf8], %g1 ip:tcp_process_shrunk_swnd+0x3c: add %g3, %i1, %g2 ip:tcp_process_shrunk_swnd+0x40: stw %g2, [%i0 + 0x80] ip:tcp_process_shrunk_swnd+0x44: ldx [%g1 + 0x48], %i5
and the register dump
pc: 0x7bed2918 ip:tcp_process_shrunk_swnd+0x44: ldx [%g1 + 0x48], %i5
npc: 0x7bed291c ip:tcp_process_shrunk_swnd+0x48: subcc %i5, 0x0, %g0 ( cmp %i5, 0x0 )
global: %g1 0
%g2 0x761c %g3 0x68ec
%g4 0x600216f6e6c %g5 0
%g6 0x1c %g7 0x2a101f89ca0
out: %o0 0x600210d8640 %o1 0x1e0
%o2 0x5a8 %o3 0x3c8
%o4 0x600216f6e6c %o5 0x600216f68c4
%sp 0x2a101f88c61 %o7 0x7bed2900
loc: %l0 0xc7341c85 %l1 0x2000
%l2 0x60010972000 %l3 0x600210d8640
%l4 0x1000 %l5 0x1000
%l6 0x1000 %l7 0x5
in: %i0 0x600210d8640 %i1 0xd30
%i2 0 %i3 0xc73439b5
%i4 0 %i5 0x4
%fp 0x2a101f88d11 %i7 0x7becbed4
The last instruction is where we paniced (yes the customer paniced [twice] as a result of this). As we can see from the register dump, %g1 is NULL, so we definitely have a NULL pointer dereference going on.
So where did this come from? It looks like a dereference of 0xf8 from %i0. %i0 is a (tcp_t *) making %g1 a (tcp_sack_info *), namely arg0->tcp_sack_info if we look at the structure; but hang on, the code says tcp->tcp_notsack_list, not tcp->tcp_sack_info. Indeed that element name does not exist within a tcp_t.
A light dawns when we see that:
299 #define tcp_notsack_list tcp_sack_info->tcp_notsack_list
So in reality line 18941 is doing:
18941 if (tcp->tcp_sack_info->tcp_notsack_list != NULL)
Without checking whether or not tcp->tcp_sack_info is non-NULL. The correct line should perhaps read
18941 if (tcp->tcp_sack_info != NULL && tcp->tcp_notsack_list != NULL)
Now this would probably not have made it as far as in IDR patch delivered to a customer, if we didn't have that macro definition because alarm bells would have rung that we were doing another dereference!
Posted at 10:40AM Aug 14, 2009 by Alan Hargreaves in Solaris |

