Monday July 31, 2006

Diagnosing an ICMP checksum error in the kernel with DTrace

This post shows how to locate a checksum error in the kernel.



The problem is interesting: with jumbo frames enabled at 16 KB, the test
pair could not ping each other with a specific ping payload size (2006
bytes in this case) on x64 platforms.



Checking the netstat information on the remote side shows "icmpInCksumErrs"
increasing with each incoming ICMP echo request. The checksum error
therefore comes from either the sending or the receiving side.



The sending side calculates the ICMP checksum in the application. In ping.c, lines 1940 - 1946:

   1940         if (family == AF_INET) {
   1941                 if (!use_udp)
   1942                         icp->icmp_cksum = in_cksum((ushort_t *)icp, cc);
   1943
   1944                 i = sendto(send_sock, (char *)out_pkt, cc, 0, whereto,
   1945                     sizeof (struct sockaddr_in));
   1946         }
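
To make the sender's step concrete, here is a minimal sketch of the 16-bit one's-complement Internet checksum that a ping-style sender stores in the ICMP header. It is not the in_cksum() from ping.c: inet_cksum and fill_echo_request are hypothetical names, and the payload pattern is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the in_cksum() from ping.c): one's-complement
 * Internet checksum over len bytes, read as big-endian 16-bit words.
 */
static uint16_t inet_cksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte, zero padded */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: build an ICMP echo request of len bytes (8-byte
 * header plus payload) and store its checksum in bytes 2 and 3.
 */
static void fill_echo_request(uint8_t *pkt, size_t len, uint16_t id,
    uint16_t seq)
{
    uint16_t c;
    size_t i;

    assert(pkt != NULL && len >= 8);
    pkt[0] = 8;                         /* type: echo request */
    pkt[1] = 0;                         /* code */
    pkt[2] = pkt[3] = 0;                /* checksum field is zero while summing */
    pkt[4] = (uint8_t)(id >> 8);
    pkt[5] = (uint8_t)(id & 0xFF);
    pkt[6] = (uint8_t)(seq >> 8);
    pkt[7] = (uint8_t)(seq & 0xFF);
    for (i = 8; i < len; i++)           /* arbitrary payload pattern */
        pkt[i] = (uint8_t)i;

    c = inet_cksum(pkt, len);
    pkt[2] = (uint8_t)(c >> 8);
    pkt[3] = (uint8_t)(c & 0xFF);
}
```

Once the checksum is stored, running inet_cksum over the whole packet again yields 0. That is the property the receiver relies on when it validates the packet.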

An Ethereal capture shows that the ICMP checksum sent by the application
is correct, so the problem must lie in the kernel's ICMP checksum
function.



How can we prove this? We can start by checking the contents of incoming IP packets with the D script below:



ip_input:entry
{
        /*
         * extern void ip_input(ill_t *, ill_rx_ring_t *, mblk_t *, size_t);
         * arg2 is the mblk of the incoming packet.
         */
        self->ip_mp = (mblk_t *)(arg2);

        /* Make self->iph point to the start of the data, here the IP header. */
        self->iph = (ipha_t *)(self->ip_mp->b_rptr);

        /* We are only interested in ICMP packets, so record the protocol. */
        self->protocol = self->iph->ipha_protocol;
}

ip_cksum:entry
/self->protocol == 1/ /* ICMP only */
{
        stack(); /* just curious, not strictly needed */
}

ip_cksum:return
{
        printf("0x%x\n", arg1); /* arg1 is the ip_cksum return value */
}





Note that icmp_inbound (in ip.c) calculates the ICMP checksum in two places:



The first is at lines 1741 - 1747, which validate the incoming ICMP checksum. For a valid packet, IP_CSUM should evaluate to 0.



   1741         /* ICMP header checksum, including checksum field, should be zero. */
   1742         if (sum_valid ? (sum != 0 && sum != 0xFFFF) :
   1743             IP_CSUM(mp, iph_hdr_length, 0)) {
   1744                 BUMP_MIB(&icmp_mib, icmpInCksumErrs);
   1745                 freemsg(first_mp);
   1746                 return;
   1747         }



The second is at lines 2013 - 2017, which compute the checksum for the outgoing ICMP packet; the result should be the correct ICMP checksum.

   2013         /* Send out an ICMP packet */
   2014         icmph->icmph_checksum = 0;
   2015         icmph->icmph_checksum = IP_CSUM(mp, iph_hdr_length, 0);
   2016         if (icmph->icmph_checksum == 0)
   2017                 icmph->icmph_checksum = 0xFFFF;
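
The two code paths above can be mimicked in a few lines of user-level C. This is only a sketch under assumptions: raw_sum16 stands in for the folded, uncomplemented sum, csum16 for a macro that complements it, and icmp_cksum_bad and icmp_set_cksum are hypothetical names. None of this is the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit one's-complement sum, NOT complemented (hypothetical). */
static uint16_t raw_sum16(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(p != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)p[i] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Complement of the folded sum (assumed behaviour of a checksum macro). */
static uint16_t csum16(const uint8_t *p, size_t len)
{
    return (uint16_t)(~raw_sum16(p, len) & 0xFFFF);
}

/* Receive side: a nonzero result over header plus data means an error. */
static int icmp_cksum_bad(const uint8_t *icmp, size_t len)
{
    return csum16(icmp, len) != 0;
}

/* Send side: zero the field, compute, and never transmit a zero checksum. */
static void icmp_set_cksum(uint8_t *icmp, size_t len)
{
    uint16_t c;

    assert(icmp != NULL && len >= 4);
    icmp[2] = icmp[3] = 0;
    c = csum16(icmp, len);
    if (c == 0)
        c = 0xFFFF;
    icmp[2] = (uint8_t)(c >> 8);
    icmp[3] = (uint8_t)(c & 0xFF);
}
```

With this model, a valid packet gives a raw sum of 0xffff, which complements to 0. Any other raw sum marks the packet as corrupt.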



Let's run the above D script on a normal system with background ping traffic. The output is below:



# dtrace -s icmp.d
dtrace: script 'icmp.d' matched 5 probes
CPU     ID                    FUNCTION:NAME
  0  25208                   ip_cksum:entry
              ip`icmp_inbound+0x120
              ip`ip_proto_input+0xa62
              ip`ip_input+0x619
              dls`i_dls_link_rx_promisc+0x213
              mac`mac_rx+0x53
              e1000g`e1000g_intr+0xc4
              unix`intr_thread+0x136

  0  25209                  ip_cksum:return return: 0xffff   <--- Incoming ICMP, ~(0xffff) == 0, correct

  0  25208                   ip_cksum:entry
              ip`icmp_inbound+0x5e3
              ip`ip_proto_input+0xa62
              ip`ip_input+0x619
              dls`i_dls_link_rx_promisc+0x213
              mac`mac_rx+0x53
              e1000g`e1000g_intr+0xc4
              unix`intr_thread+0x136

  0  25209                  ip_cksum:return return: 0xce97   <--- Outgoing ICMP, not zero



Compare this with what we see on the buggy system:

bash-3.00# dtrace -s icmp.d
dtrace: script 'icmp.d' matched 5 probes
CPU     ID                    FUNCTION:NAME
  1  23017                   ip_cksum:entry
              ip`icmp_inbound+0x24a
              ip`ip_proto_input+0x479
              ip`ip_input+0x4ab
              dls`i_dls_link_rx+0x18c
              mac`mac_rx+0x45
              e1000g`e1000g_intr_work+0x176
              e1000g`e1000g_intr+0x3c
              unix`av_dispatch_autovect+0x78
              unix`intr_thread+0x50

  1  23018                  ip_cksum:return return: 0x417d   <--- should be 0xffff

...

  1  23017                   ip_cksum:entry
              ip`icmp_inbound+0x24a
              ip`ip_proto_input+0x479
              ip`ip_input+0x4ab
              dls`i_dls_link_rx+0x18c
              mac`mac_rx+0x45
              e1000g`e1000g_intr_work+0x176
              e1000g`e1000g_intr+0x3c
              unix`av_dispatch_autovect+0x78
              unix`intr_thread+0x50

  1  23018                  ip_cksum:return return: 0x60e9   <--- should be 0xffff



ip_cksum computes the wrong checksum when validating incoming ICMP
packets. Since ip_cksum calls ip_ocsum when checking the IP protocol
checksum, we can take a closer look at ip_ocsum to learn why this
happens.



ip_ocsum in Solaris is platform dependent. For example, SPARC has ip_ocsum.s, while x86 has i86_subr.s.



This accounts for why we see different symptoms on different platforms. On SPARC, the ping problem happens with another payload size.
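
Because the failure depends on payload size, one way to narrow it down is to compare an optimized checksum routine against a simple reference over a range of lengths. The sketch below does that in user-level C. All names are hypothetical: sum16_fast and sum16_buggy are illustrative stand-ins, not the Solaris ip_ocsum code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*sum_fn)(const uint8_t *, size_t);

/* Fold a wide accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference: add big-endian 16-bit words one at a time. */
static uint16_t sum16_ref(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)p[i] << 8;
    return fold16(sum);
}

/* "Optimized" stand-in: add 32 bits per step, reference code for the tail. */
static uint16_t sum16_fast(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 4 <= len; i += 4)
        sum += ((uint32_t)p[i] << 24) | ((uint32_t)p[i + 1] << 16) |
            ((uint32_t)p[i + 2] << 8) | p[i + 3];
    return fold16(sum + sum16_ref(p + i, len - i));
}

/* Deliberately broken stand-in: silently drops an odd trailing byte. */
static uint16_t sum16_buggy(const uint8_t *p, size_t len)
{
    return sum16_ref(p, len & ~(size_t)1);
}

/* Return the first length in [0, max_len] where a and b differ, or -1. */
static long first_mismatch_len(sum_fn a, sum_fn b, const uint8_t *buf,
    size_t max_len)
{
    size_t len;

    assert(buf != NULL);
    for (len = 0; len <= max_len; len++)
        if (a(buf, len) != b(buf, len))
            return (long)len;
    return -1;
}
```

Scanning every length up to the jumbo frame size with a nonzero test pattern shows which payload sizes, if any, trigger a mismatch. Those sizes point to the code path worth inspecting.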



See the following articles on the checksum code.