Monday, July 31, 2006
Diagnosing an ICMP checksum error in the kernel with DTrace
This post walks through how to locate a checksum error in the kernel.
The problem is an interesting one: with jumbo frames enabled at 16 KB, the
test pair could not ping each other at one particular ping payload size
(2006 bytes in this case) on x64 platforms.
The netstat statistics on the remote side show "icmpInCksumErrs"
increasing with each incoming ICMP echo request. The checksum error must
therefore come from either the sending side or the receiving side.
The sending side calculates the ICMP checksum in the application. In ping.c, lines 1940 - 1946:
1940     if (family == AF_INET) {
1941         if (!use_udp)
1942             icp->icmp_cksum = in_cksum((ushort_t *)icp, cc);
1943
1944         i = sendto(send_sock, (char *)out_pkt, cc, 0, whereto,
1945             sizeof (struct sockaddr_in));
1946     }
An Ethereal capture shows that the ICMP checksum sent by the application
is correct. So the problem must lie in the kernel's ICMP checksum function
on the receiving side.
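For readers who have not seen it, the checksum that ping fills in through in_cksum() is the standard 16-bit one's-complement Internet checksum. The following is a minimal sketch in the style of RFC 1071, not the actual ping.c code. The names inet_cksum and icmp_set_cksum are hypothetical, and the only layout assumption is that the ICMP checksum field sits at bytes 2 and 3 of the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 16-bit one's-complement Internet checksum (hypothetical helper).
 * Bytes are combined in network (big-endian) order, so the result does
 * not depend on host byte order. A 32-bit accumulator is enough for any
 * buffer up to 64 KB.
 */
static uint16_t
inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);

    /* Add up the data as 16-bit big-endian words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (i < len)
        sum += (uint32_t)buf[i] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Zero the ICMP checksum field, compute the checksum, and store it. */
static void
icmp_set_cksum(uint8_t *icmp, size_t len)
{
    uint16_t ck;

    assert(len >= 4);
    icmp[2] = 0;
    icmp[3] = 0;
    ck = inet_cksum(icmp, len);
    icmp[2] = (uint8_t)(ck >> 8);
    icmp[3] = (uint8_t)(ck & 0xFF);
}
```

Once the checksum is stored this way, running inet_cksum over the whole message again, checksum field included, yields zero. That is the property the receiver relies on.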
How can we prove this? We can start by examining the contents of incoming IP packets with the following D script:
ip_input:entry
{
        /*
         * extern void ip_input(ill_t *, ill_rx_ring_t *, mblk_t *, size_t);
         * arg2 is the mblk of the incoming packet.
         */
        self->ip_mp = (mblk_t *)(arg2);
        /* make self->iph point to the start of the data, the IP header here */
        self->iph = (ipha_t *)(self->ip_mp->b_rptr);
        /* we are only interested in ICMP packets, so record the protocol */
        self->protocol = self->iph->ipha_protocol;
}

ip_cksum:entry
/self->protocol == 1/           /* ICMP only */
{
        stack();                /* just curious, not strictly needed */
}

ip_cksum:return
/self->protocol == 1/           /* ICMP only */
{
        printf("return: 0x%x\n", arg1); /* the ip_cksum return value */
}
Note that icmp_inbound (ip.c) calculates the ICMP checksum in two places.
The first is lines 1741 - 1747, which validate the incoming ICMP checksum. The result should be 0.
1741     /* ICMP header checksum, including checksum field, should be zero. */
1742     if (sum_valid ? (sum != 0 && sum != 0xFFFF) :
1743         IP_CSUM(mp, iph_hdr_length, 0)) {
1744         BUMP_MIB(&icmp_mib, icmpInCksumErrs);
1745         freemsg(first_mp);
1746         return;
1747     }
The second is lines 2013 - 2017, which calculate the checksum of the outgoing ICMP packet; the result should be the correct ICMP checksum.
2013     /* Send out an ICMP packet */
2014     icmph->icmph_checksum = 0;
2015     icmph->icmph_checksum = IP_CSUM(mp, iph_hdr_length, 0);
2016     if (icmph->icmph_checksum == 0)
2017         icmph->icmph_checksum = 0xFFFF;
Let's run the D script above on a healthy system with background ping traffic. The output is:
# dtrace -s icmp.d
dtrace: script 'icmp.d' matched 5 probes
CPU     ID                    FUNCTION:NAME
  0  25208                   ip_cksum:entry
              ip`icmp_inbound+0x120
              ip`ip_proto_input+0xa62
              ip`ip_input+0x619
              dls`i_dls_link_rx_promisc+0x213
              mac`mac_rx+0x53
              e1000g`e1000g_intr+0xc4
              unix`intr_thread+0x136

  0  25209                  ip_cksum:return return: 0xffff   <--- Incoming ICMP: ~(0xffff) == 0, correct
  0  25208                   ip_cksum:entry
              ip`icmp_inbound+0x5e3
              ip`ip_proto_input+0xa62
              ip`ip_input+0x619
              dls`i_dls_link_rx_promisc+0x213
              mac`mac_rx+0x53
              e1000g`e1000g_intr+0xc4
              unix`intr_thread+0x136

  0  25209                  ip_cksum:return return: 0xce97   <--- Outgoing ICMP, not zero
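The two return values in the trace above can be read with a small model of the two icmp_inbound code paths. This is only a sketch under an assumption: that IP_CSUM complements the raw value returned by ip_cksum, which is what the "~(0xffff) == 0" annotation suggests. The names raw_sum16, csum_complement, icmp_cksum_bad and icmp_reply_cksum are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded, uncomplemented one's-complement sum; stands in for the raw
 * value that ip_cksum returns in the trace (hypothetical helper). */
static uint16_t
raw_sum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                        /* odd trailing byte */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                   /* fold carries */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Complement step, assumed to mirror what IP_CSUM does with the raw sum. */
static uint16_t
csum_complement(uint16_t raw)
{
    return (uint16_t)(~raw & 0xFFFF);
}

/* Receive path: a nonzero result would bump icmpInCksumErrs. */
static int
icmp_cksum_bad(const uint8_t *icmp, size_t len)
{
    return csum_complement(raw_sum16(icmp, len)) != 0;
}

/* Reply path: zero the field, recompute, and store 0xFFFF in place of 0. */
static uint16_t
icmp_reply_cksum(uint8_t *icmp, size_t len)
{
    uint16_t ck;

    assert(len >= 4);
    icmp[2] = 0;
    icmp[3] = 0;
    ck = csum_complement(raw_sum16(icmp, len));
    if (ck == 0)
        ck = 0xFFFF;
    icmp[2] = (uint8_t)(ck >> 8);
    icmp[3] = (uint8_t)(ck & 0xFF);
    return ck;
}
```

Under that assumption, a raw sum of 0xffff complements to 0 and passes validation, while the outgoing raw sum of 0xce97 complements to 0x3168, which would be the value written into the reply. Any other raw sum on the receive path, for example 0x417d, complements to a nonzero value and is counted as a checksum error.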
Compare that with what we see on the buggy system:
bash-3.00# dtrace -s icmp.d
dtrace: script 'icmp.d' matched 5 probes
CPU     ID                    FUNCTION:NAME
  1  23017                   ip_cksum:entry
              ip`icmp_inbound+0x24a
              ip`ip_proto_input+0x479
              ip`ip_input+0x4ab
              dls`i_dls_link_rx+0x18c
              mac`mac_rx+0x45
              e1000g`e1000g_intr_work+0x176
              e1000g`e1000g_intr+0x3c
              unix`av_dispatch_autovect+0x78
              unix`intr_thread+0x50

  1  23018                  ip_cksum:return return: 0x417d   <--- should be 0xffff
...
  1  23017                   ip_cksum:entry
              ip`icmp_inbound+0x24a
              ip`ip_proto_input+0x479
              ip`ip_input+0x4ab
              dls`i_dls_link_rx+0x18c
              mac`mac_rx+0x45
              e1000g`e1000g_intr_work+0x176
              e1000g`e1000g_intr+0x3c
              unix`av_dispatch_autovect+0x78
              unix`intr_thread+0x50

  1  23018                  ip_cksum:return return: 0x60e9   <--- should be 0xffff
ip_cksum computes the wrong checksum when validating incoming ICMP
packets. Since ip_cksum calls ip_ocsum when checking IP protocol
checksums, we can take a closer look at ip_ocsum to learn why this
happens.
ip_ocsum in Solaris is platform dependent. For example, SPARC has ip_ocsum.s while x86 has i86_subr.s.
This accounts for why we see different symptoms on different platforms. On SPARC, the ping problem happens with a different payload size.
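Because the failure depends on payload size, one way to hunt for this kind of bug outside the kernel is to compare an optimized checksum routine against a simple reference at every length. The sketch below only illustrates that approach. It does not reproduce ip_ocsum or its bug: cksum_unrolled is a hypothetical stand-in for a hand-tuned routine, and all names and the test pattern are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PKT 16384   /* matches the 16 KB jumbo frame size */

/* Reference: one big-endian 16-bit word per step, carries folded last. */
static uint16_t
cksum_ref(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                        /* odd trailing byte */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Candidate: four words per pass with a 64-bit accumulator, then a tail
 * loop. Stand-in for an optimized, platform-specific routine. */
static uint16_t
cksum_unrolled(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
        sum += ((uint32_t)buf[i + 2] << 8) | buf[i + 3];
        sum += ((uint32_t)buf[i + 4] << 8) | buf[i + 5];
        sum += ((uint32_t)buf[i + 6] << 8) | buf[i + 7];
    }
    for (; i + 2 <= len; i += 2)        /* leftover whole words */
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                        /* odd trailing byte */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Return the first length in [0, max_len] where the two routines
 * disagree, or -1 if they agree at every length. */
static long
first_mismatch(size_t max_len)
{
    static uint8_t buf[MAX_PKT];
    size_t i, len;

    assert(max_len <= MAX_PKT);
    for (i = 0; i < MAX_PKT; i++)
        buf[i] = (uint8_t)(i * 31 + 7); /* deterministic test pattern */
    for (len = 0; len <= max_len; len++)
        if (cksum_ref(buf, len) != cksum_unrolled(buf, len))
            return (long)len;
    return -1;
}
```

Calling first_mismatch(2100) covers the 2006-byte payload plus headers. A routine that mishandles one particular length or tail case would show up as a non-negative return value.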
See the following articles for more on the checksum code.
Posted at 04:14 PM, July 31, 2006 by raymond in Sun