Kacheong Poon's Weblog

Wednesday, June 15, 2005

Solaris TCP Window Update
When people check out the TCP source code in OpenSolaris, they may find that some pieces of the code do not exactly follow what the various RFCs specify.  Here is one example, along with the reason why Solaris deviates from the RFCs.

On page 72 of RFC 793, the criterion for updating the TCP send window is specified as follows.

    If SND.UNA < SEG.ACK =< SND.NXT, the send window should be
    updated. If (SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ and
    SND.WL2 =< SEG.ACK)), set SND.WND <- SEG.WND, set
    SND.WL1 <- SEG.SEQ, and set SND.WL2 <- SEG.ACK.

    Note that SND.WND is an offset from SND.UNA, that SND.WL1
    records the sequence number of the last segment used to update
    SND.WND, and that SND.WL2 records the acknowledgment number of
    the last segment used to update SND.WND. The check here
    prevents using old segments to update the window.

And on page 94 of RFC 1122, the first condition above is corrected to

    Similarly, the window should be updated if: SND.UNA =<
    SEG.ACK =< SND.NXT.
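
To make the RFC rule concrete, here is a minimal sketch of it in C, with the RFC 1122 correction applied.  The struct, the function names and the numbers in the example are made up for illustration and are not taken from any real stack; only the conditions themselves come from the RFC text quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparisons of 32-bit TCP sequence numbers. */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* Hypothetical sender state, named after the RFC 793 variables. */
struct snd_state {
    uint32_t una;   /* SND.UNA */
    uint32_t nxt;   /* SND.NXT */
    uint32_t wnd;   /* SND.WND */
    uint32_t wl1;   /* SND.WL1: SEG.SEQ of the last window update */
    uint32_t wl2;   /* SND.WL2: SEG.ACK of the last window update */
};

/* Apply the RFC window update rule; returns true if SND.WND was updated. */
static bool rfc_window_update(struct snd_state *s, uint32_t seg_seq,
                              uint32_t seg_ack, uint32_t seg_wnd)
{
    /* RFC 1122: SND.UNA =< SEG.ACK =< SND.NXT */
    if (!(SEQ_LEQ(s->una, seg_ack) && SEQ_LEQ(seg_ack, s->nxt)))
        return false;

    /* RFC 793: only a segment that is not "old" may update the window. */
    if (SEQ_LT(s->wl1, seg_seq) ||
        (s->wl1 == seg_seq && SEQ_LEQ(s->wl2, seg_ack))) {
        s->wnd = seg_wnd;
        s->wl1 = seg_seq;
        s->wl2 = seg_ack;
        return true;
    }
    return false;
}

/* Illustrative numbers only. */
static void rfc_example(void)
{
    struct snd_state s = { .una = 1000, .nxt = 3000, .wnd = 4096,
                           .wl1 = 500, .wl2 = 1000 };

    /* Newer segment (SEG.SEQ 600 > SND.WL1 500): its window is taken. */
    assert(rfc_window_update(&s, 600, 2000, 0));
    assert(s.wnd == 0);

    /* Older segment (SEG.SEQ 400 < SND.WL1 600): its window is ignored. */
    assert(!rfc_window_update(&s, 400, 2500, 8192));
    assert(s.wnd == 0);
}
```

Note that the second call in rfc_example() ignores the window even though the segment carries a newer ACK (2500).  That is exactly the case the rest of this entry is about.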

In Solaris, we use a different check.  See the following piece of code in usr/src/uts/common/inet/tcp/tcp.c:

swnd_update:
    /*
     * The following check is different from most other implementations.
     * For bi-directional transfer, when segments are dropped, the
     * "normal" check will not accept a window update in those
     * retransmitted segments. Failing to do that, TCP may send out
     * segments which are outside receiver's window. As TCP accepts
     * the ack in those retransmitted segments, if the window update in
     * the same segment is not accepted, TCP will incorrectly calculate
     * that it can send more segments. This can create a deadlock
     * with the receiver if its window becomes zero.
     */
    if (SEQ_LT(tcp->tcp_swl2, seg_ack) ||
        SEQ_LT(tcp->tcp_swl1, seg_seq) ||
        (tcp->tcp_swl1 == seg_seq && new_swnd > tcp->tcp_swnd)) {
        /*
         * The criteria for update are:
         *
         * 1. the segment acknowledges some data. Or
         * 2. the segment is new, i.e. it has a higher seq num. Or
         * 3. the segment is not old and the advertised window is
         * larger than the previous advertised window.
         */

The check

SND.WL1 = SEG.SEQ and SND.WL2 =< SEG.ACK

is modified to be

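SND.WL2 < SEG.ACK

This condition now stands on its own: a segment that acknowledges new data updates the window whatever its sequence number is.  A segment with SND.WL1 = SEG.SEQ that acknowledges nothing new is used only if it advertises a larger window.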
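
The difference between the two checks can be seen by writing both as plain predicates and feeding them a retransmitted segment.  This is only a sketch: the struct is a hypothetical stand-in for the few tcp_t fields involved, the SEQ_LT/SEQ_LEQ macros are defined locally and assumed to behave like the kernel's, and the sequence numbers are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local wraparound-safe comparisons (assumed to match the kernel macros). */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* Hypothetical stand-in for the tcp_t fields used by the check. */
struct swnd_state {
    uint32_t swl1;  /* seq of the last segment used to update the window */
    uint32_t swl2;  /* ack of the last segment used to update the window */
    uint32_t swnd;  /* current send window */
};

/* RFC 793 test: is the window in this segment used? */
static bool rfc_accepts(const struct swnd_state *t,
                        uint32_t seg_seq, uint32_t seg_ack)
{
    return SEQ_LT(t->swl1, seg_seq) ||
        (t->swl1 == seg_seq && SEQ_LEQ(t->swl2, seg_ack));
}

/* Solaris test, mirroring the three criteria in the excerpt above. */
static bool solaris_accepts(const struct swnd_state *t, uint32_t seg_seq,
                            uint32_t seg_ack, uint32_t new_swnd)
{
    return SEQ_LT(t->swl2, seg_ack) ||          /* 1. acks some data */
        SEQ_LT(t->swl1, seg_seq) ||             /* 2. newer segment */
        (t->swl1 == seg_seq && new_swnd > t->swnd); /* 3. bigger window */
}

/* Illustrative numbers only. */
static void retransmit_example(void)
{
    /* The last window update came from a segment with seq 5000, ack 1000. */
    struct swnd_state t = { .swl1 = 5000, .swl2 = 1000, .swnd = 8192 };

    /*
     * A retransmitted segment arrives with an older seq (4000), but it
     * acknowledges data up to 3000 and advertises a zero window.
     */
    assert(!rfc_accepts(&t, 4000, 3000));       /* window 0 is ignored */
    assert(solaris_accepts(&t, 4000, 3000, 0)); /* window 0 is used */
}
```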

Without this change, a combination of a zero window and dropped segments can deadlock a TCP connection.  According to the RFCs, TCP does not use the window update in out-of-order segments (segments retransmitted after a drop arrive out of order), yet it still processes the ACK field in those segments.  This can cause a sender A to send beyond the receive window of the other side, B.  The ACK field moves the left edge of the send window forward, but because the window update (being 0) in the same segment is not used, A continues to use the old, larger send window.  Thus, from A's perspective, the whole send window moves forward.  The segments A then sends are outside B's window, so B drops them.  And once A has sent beyond B's receive window, all ACKs from A to B are also dropped by B, because they carry A's latest sequence number, which is out of window.  In a bi-directional transfer, this means that B keeps retransmitting its data, since those ACKs from A are not acceptable, and the connection hangs.  Note that this is not a problem in a uni-directional transfer.
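
The arithmetic behind this scenario can be walked through with a small sketch.  Everything here is hypothetical: the struct, the helper functions and the numbers are invented for illustration, and the only piece taken from a specification is the RFC 793 acceptance test for a zero-length segment (with a zero receive window, it is acceptable only if SEG.SEQ = RCV.NXT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of sender A's state. */
struct sender {
    uint32_t una;   /* oldest unacknowledged sequence number */
    uint32_t nxt;   /* next sequence number to send */
    uint32_t wnd;   /* send window, an offset from una */
};

/*
 * Process a segment from B.  The ACK field is always used (it is assumed
 * to be valid here); use_window says whether the window check passed.
 */
static void process_ack(struct sender *a, uint32_t seg_ack,
                        uint32_t seg_wnd, bool use_window)
{
    a->una = seg_ack;
    if (use_window)
        a->wnd = seg_wnd;
}

/* Send as much as A believes the window allows, advancing nxt. */
static void send_all_allowed(struct sender *a)
{
    uint32_t right_edge = a->una + a->wnd;

    if ((int32_t)(right_edge - a->nxt) > 0)
        a->nxt = right_edge;
}

/* RFC 793 acceptance test at B for a zero-length segment (a pure ACK). */
static bool pure_ack_acceptable(uint32_t rcv_nxt, uint32_t rcv_wnd,
                                uint32_t seg_seq)
{
    if (rcv_wnd == 0)
        return seg_seq == rcv_nxt;
    return (int32_t)(seg_seq - rcv_nxt) >= 0 &&
        (int32_t)(seg_seq - (rcv_nxt + rcv_wnd)) < 0;
}

/* Illustrative numbers only. */
static void deadlock_example(void)
{
    /* B has received everything up to 3000 and its buffer is full. */
    const uint32_t b_rcv_nxt = 3000, b_rcv_wnd = 0;

    struct sender rfc = { .una = 1000, .nxt = 3000, .wnd = 2000 };
    struct sender sol = rfc;

    /* B's retransmitted segment carries ACK 3000 and window 0. */
    process_ack(&rfc, 3000, 0, false);  /* window update rejected */
    process_ack(&sol, 3000, 0, true);   /* window update accepted */
    send_all_allowed(&rfc);
    send_all_allowed(&sol);

    /* Old window kept: A sends up to 5000, and its ACKs are dropped by B. */
    assert(rfc.nxt == 5000);
    assert(!pure_ack_acceptable(b_rcv_nxt, b_rcv_wnd, rfc.nxt));

    /* Zero window used: A stays at 3000, and its ACKs are acceptable. */
    assert(sol.nxt == 3000);
    assert(pure_ack_acceptable(b_rcv_nxt, b_rcv_wnd, sol.nxt));
}
```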

If a segment (even an out-of-order one) passes the normal TCP acceptance test and its ACK field acknowledges new data, then the window update in that segment must also be used.  The window update and the ACK field are really tied together.  One cannot use the ACK field without also using the window update.  This issue was discussed on the now closed tcp-impl mailing list several years ago.  But AFAIK, there is no write-up on this issue.  So there may still be implementations which have this problem when handling bi-directional transfer.



(June 15, 2005, 08:35:16 AM HKT)

