
Wednesday, June 15, 2005
Solaris TCP Window Update
When people check out the TCP source code in OpenSolaris, they may find that
some pieces of the code do not follow the various RFCs exactly as
specified. Here is one example, along with the reason why Solaris
deviates from the RFCs.
On page 72 of RFC 793,
the criteria for updating the TCP send window are specified as
follows.
If SND.UNA < SEG.ACK =< SND.NXT, the send window should be updated. If (SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ and SND.WL2 =< SEG.ACK)), set SND.WND <- SEG.WND, set SND.WL1 <- SEG.SEQ, and set SND.WL2 <- SEG.ACK.
Note that SND.WND is an offset from SND.UNA, that SND.WL1 records the sequence number of the last segment used to update SND.WND, and that SND.WL2 records the acknowledgment number of the last segment used to update SND.WND. The check here prevents using old segments to update the window.
On page 94 of RFC 1122,
the first condition above is corrected to the following.
Similarly, the window should be updated if: SND.UNA =< SEG.ACK =< SND.NXT.
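As a rough illustration, the RFC rule above (with the RFC 1122 correction) can be sketched in C as follows. The struct and function names are hypothetical, the SEQ_LT/SEQ_LEQ macros are assumed here to be wraparound-safe 32-bit comparisons, and the sketch covers only the window update, not the rest of ACK processing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed helpers: sequence-number comparison modulo 2^32. */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* Hypothetical container for the RFC 793 send-side variables. */
struct snd_state {
    uint32_t una;   /* SND.UNA: oldest unacknowledged sequence number */
    uint32_t nxt;   /* SND.NXT: next sequence number to send */
    uint32_t wnd;   /* SND.WND: send window, an offset from SND.UNA */
    uint32_t wl1;   /* SND.WL1: SEG.SEQ of the last window update */
    uint32_t wl2;   /* SND.WL2: SEG.ACK of the last window update */
};

/*
 * Apply the RFC window update rule to one incoming segment.
 * Returns true if SND.WND, SND.WL1 and SND.WL2 were updated.
 */
bool rfc_window_update(struct snd_state *s, uint32_t seg_seq,
                       uint32_t seg_ack, uint32_t seg_wnd)
{
    /* RFC 1122 form of the first condition: SND.UNA =< SEG.ACK =< SND.NXT */
    if (!(SEQ_LEQ(s->una, seg_ack) && SEQ_LEQ(seg_ack, s->nxt)))
        return false;

    /* Reject segments older than the one last used to update the window. */
    if (SEQ_LT(s->wl1, seg_seq) ||
        (s->wl1 == seg_seq && SEQ_LEQ(s->wl2, seg_ack))) {
        s->wnd = seg_wnd;
        s->wl1 = seg_seq;
        s->wl2 = seg_ack;
        return true;
    }
    return false;
}
```

Note that a segment whose SEG.SEQ is below SND.WL1 never updates the window under this rule, whatever its SEG.ACK says.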
In Solaris, we use a different check. See the following piece of
code in usr/src/uts/common/inet/tcp/tcp.c:
    swnd_update:
        /*
         * The following check is different from most other implementations.
         * For bi-directional transfer, when segments are dropped, the
         * "normal" check will not accept a window update in those
         * retransmitted segemnts. Failing to do that, TCP may send out
         * segments which are outside receiver's window. As TCP accepts
         * the ack in those retransmitted segments, if the window update in
         * the same segment is not accepted, TCP will incorrectly calculates
         * that it can send more segments. This can create a deadlock
         * with the receiver if its window becomes zero.
         */
        if (SEQ_LT(tcp->tcp_swl2, seg_ack) ||
            SEQ_LT(tcp->tcp_swl1, seg_seq) ||
            (tcp->tcp_swl1 == seg_seq && new_swnd > tcp->tcp_swnd)) {
            /*
             * The criteria for update is:
             *
             * 1. the segment acknowledges some data. Or
             * 2. the segment is new, i.e. it has a higher seq num. Or
             * 3. the segment is not old and the advertised window is
             * larger than the previous advertised window.
             */
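In effect, the RFC clause
SND.WL1 = SEG.SEQ and SND.WL2 =< SEG.ACK
is replaced by the standalone test
SND.WL2 < SEG.ACK
so a segment that acknowledges new data updates the window regardless of
its sequence number. A segment with the same sequence number that does
not acknowledge new data updates the window only if it advertises a
larger one.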
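To make the difference concrete, here is a minimal sketch that places the two tests side by side as pure predicates. The function names are hypothetical and the SEQ_LT/SEQ_LEQ macros are assumed wraparound-safe comparisons. The Solaris-style predicate simply mirrors the condition in the code quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed helpers: sequence-number comparison modulo 2^32. */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* RFC 793 test: only a segment that is not older than SND.WL1 qualifies. */
bool rfc_accepts_update(uint32_t swl1, uint32_t swl2,
                        uint32_t seg_seq, uint32_t seg_ack)
{
    return SEQ_LT(swl1, seg_seq) ||
           (swl1 == seg_seq && SEQ_LEQ(swl2, seg_ack));
}

/* Solaris-style test, mirroring the three criteria in the quoted code. */
bool solaris_accepts_update(uint32_t swl1, uint32_t swl2, uint32_t swnd,
                            uint32_t seg_seq, uint32_t seg_ack,
                            uint32_t new_swnd)
{
    return SEQ_LT(swl2, seg_ack) ||                 /* acknowledges new data */
           SEQ_LT(swl1, seg_seq) ||                 /* newer segment */
           (swl1 == seg_seq && new_swnd > swnd);    /* same seq, larger window */
}
```

For example, with SND.WL1 = 5000 and SND.WL2 = 1000, a retransmitted segment carrying SEG.SEQ = 3000, SEG.ACK = 2000 and a zero window fails the RFC test but passes the Solaris-style one, because it acknowledges new data.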
Without this change, a combination of a zero window and dropped
segments can deadlock TCP. According to the RFCs, TCP does not use
the window update in out-of-order segments (segments retransmitted
after a drop arrive out of order), yet it still processes the ACK
field in those segments. This can cause a sender A to send more than
the receive window of the other side, B. The ACK field moves the left
edge of the window forward, but because the window update (zero) in
the same segment is not used, TCP continues to use the old, larger
send window. From A's perspective, the whole send window moves
forward, and the segments A then sends outside B's window are dropped
by B.
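The arithmetic behind this can be sketched with made-up numbers. Assume A holds SND.UNA = 1000 and SND.WND = 4000, and a retransmitted segment from B carries SEG.ACK = 5000 with a zero window. The struct, the function name and the values below are all illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal view of sender A's window state. */
struct sender {
    uint32_t una;   /* SND.UNA: left edge of the send window */
    uint32_t wnd;   /* SND.WND: window size, an offset from SND.UNA */
};

/*
 * Process the ACK field of a segment and, optionally, its window field.
 * Returns the right edge of the send window (SND.UNA + SND.WND) as A
 * sees it afterwards.
 */
uint32_t right_edge_after_ack(struct sender *a, uint32_t seg_ack,
                              uint32_t seg_wnd, bool use_window)
{
    a->una = seg_ack;           /* the ACK always moves the left edge */
    if (use_window)
        a->wnd = seg_wnd;       /* the window is used only if accepted */
    return a->una + a->wnd;     /* unsigned arithmetic wraps modulo 2^32 */
}
```

With these numbers, ignoring the window field leaves A with a right edge of 9000, so A believes it may send sequence numbers 5000 through 8999. Using the window field puts the right edge at 5000, which matches what B can actually accept.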
Once A sends beyond B's receive window, all ACKs from A to B are also
dropped by B because they are out of window (TCP puts its latest send
sequence number in ACK segments). In a bi-directional transfer, this
means that B keeps retransmitting its data because those ACKs from A
are not acceptable, and the connection hangs. Note that this is not a
problem in uni-directional transfer.
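The reason B drops those ACKs can be seen from the segment acceptability test in RFC 793. Below is a minimal sketch of that test for zero-length segments only, which is the case of a pure ACK. The function name and the numbers in the note after it are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * RFC 793 acceptability test for a zero-length segment:
 *   RCV.WND == 0:  acceptable only if SEG.SEQ == RCV.NXT
 *   RCV.WND  > 0:  acceptable if RCV.NXT =< SEG.SEQ < RCV.NXT + RCV.WND
 */
bool ack_segment_acceptable(uint32_t rcv_nxt, uint32_t rcv_wnd,
                            uint32_t seg_seq)
{
    if (rcv_wnd == 0)
        return seg_seq == rcv_nxt;

    /* The unsigned difference keeps the range check correct across wraparound. */
    return (uint32_t)(seg_seq - rcv_nxt) < rcv_wnd;
}
```

If B has RCV.NXT = 5000 and a zero window while A has already sent up to 9000, A's pure ACKs carry SEG.SEQ = 9000 and fail this test, so B never processes the acknowledgment they contain.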
If a segment (even an out-of-order one) passes the normal TCP
acceptance test and its ACK field acknowledges new data, the window
update in that segment must also be used. The window update and the
ACK field are really tied together: one cannot use the ACK field
without also using the window update. This issue was discussed on the
now closed tcp-impl mailing list several years ago, but as far as I
know there is no write-up on it. So there may still be implementations
which have this problem when handling bi-directional transfer.
(June 15, 2005, 08:35:16 AM HKT)