Alan Hargreaves' Weblog

The ramblings of an Australian SaND TSC* Principal Field Technologist

* Solaris and Network Domain Technology Support Centre - The group I work for

Tags

(update 1) acoustic bind birthday blues bugs cec cec2007 cec2008 china cmt contention cringley debugging dogs dtrace earthquake encumbered-binaries extra flash funny google guitar halloween huron install kids linux liveupgrade locking mdb music mysql newyear niagra openjava opensolaris oracle patches patents percussion performance redhat secondlife security solaris sru sun support sxcr t2 t2000 timeslider ufs upgrade virtualbox windows youtube zfs
pageicon Friday Aug 14, 2009

Why I hate macros that make pointer dereferences look like structure elements

I have a colleague who generated an IDR patch for tcp in Solaris 10 for me to give relief to a customer for a bug while the formal fix was in progress.

As a part of the fix we had this code fragment

 18984          /*
 18985           * If the SACK option is set, delete the entire list of
 18986           * notsack'ed blocks.
 18987           */
 18988          if (tcp->tcp_sack_info != NULL) {
 18989                  if (tcp->tcp_notsack_list != NULL)
 18990                          TCP_NOTSACK_REMOVE_ALL(tcp->tcp_notsack_list, tcp);
 18991          }

replaced with this code fragment (the fix actually has a lot more to it than this, but this is what was relevent.)

 18936          /*
 18937           * If the SACK option is set, delete the entire list of
 18938           * notsack'ed blocks.
 18939           */
 18940
 18941          if (tcp->tcp_notsack_list != NULL)
 18942                  TCP_NOTSACK_REMOVE_ALL(tcp->tcp_notsack_list, tcp);

Now, the assembly around here reads

ip:tcp_process_shrunk_swnd+0x38:       ldx      [%i0 + 0xf8], %g1
ip:tcp_process_shrunk_swnd+0x3c:       add      %g3, %i1, %g2
ip:tcp_process_shrunk_swnd+0x40:       stw      %g2, [%i0 + 0x80]
ip:tcp_process_shrunk_swnd+0x44:       ldx      [%g1 + 0x48], %i5

and the register dump

pc:  0x7bed2918 ip:tcp_process_shrunk_swnd+0x44:   ldx  [%g1 + 0x48], %i5
npc: 0x7bed291c ip:tcp_process_shrunk_swnd+0x48:   subcc  %i5, 0x0, %g0   ( cmp   %i5, 0x0 )
  global:                       %g1                  0
        %g2             0x761c  %g3             0x68ec
        %g4      0x600216f6e6c  %g5                  0
        %g6               0x1c  %g7      0x2a101f89ca0
  out:  %o0      0x600210d8640  %o1              0x1e0
        %o2              0x5a8  %o3              0x3c8
        %o4      0x600216f6e6c  %o5      0x600216f68c4
        %sp      0x2a101f88c61  %o7         0x7bed2900
  loc:  %l0         0xc7341c85  %l1             0x2000  
        %l2      0x60010972000  %l3      0x600210d8640  
        %l4             0x1000  %l5             0x1000  
        %l6             0x1000  %l7                0x5  
  in:   %i0      0x600210d8640  %i1              0xd30  
        %i2                  0  %i3         0xc73439b5  
        %i4                  0  %i5                0x4  
        %fp      0x2a101f88d11  %i7         0x7becbed4  

The last instruction is where we paniced (yes the customer paniced [twice] as a result of this). As we can see from the register dump, %g1 is NULL, so we definitely have a NULL pointer dereference going on.

So where did this come from? It looks like a dereference of 0xf8 from %i0. %i0 is a (tcp_t *) making %g1 a (tcp_sack_info *), namely arg0->tcp_sack_info if we look at the structure; but hang on, the code says tcp->tcp_notsack_list, not tcp->tcp_sack_info. Indeed that element name does not exist within a tcp_t.

A light dawns when we see that:

   299  #define tcp_notsack_list	        tcp_sack_info->tcp_notsack_list

So in reality line 18941 is doing:

 18941          if (tcp->tcp_sack_info->tcp_notsack_list != NULL)

Without checking whether or not tcp->tcp_sack_info is non-NULL. The correct line should perhaps read

 18941          if (tcp->tcp_sack_info != NULL && tcp->tcp_notsack_list != NULL)

Now this would probably not have made it as far as in IDR patch delivered to a customer, if we didn't have that macro definition because alarm bells would have rung that we were doing another dereference!

Comments:

Post a Comment:
Comments are closed for this entry.