
Tuesday, June 14, 2005
Magic ndd(1M) tunables
We often get requests from customers asking what some of the IP/TCP/UDP/... ndd(1M) parameters mean and when to change them. One common reason is that some of those parameters are thought to be secret magic knobs for improving network performance. But in reality, nearly all of those parameters are not supposed to be changed at all. They are there just in case of abnormal situations. Now that OpenSolaris is available, the truth about those parameters is finally revealed! People can just look at the code and comments!
To illustrate this, I'll describe one TCP ndd(1M) parameter, tcp_use_smss_as_mss_opt, which was added in Solaris 10. You can find the following piece of code in the usr/src/uts/common/inet/tcp/tcp.c file (you need to scroll down a little to find it).
    case TCPS_SYN_RCVD:
        flags |= TH_SYN;

        /*
         * Reset the MSS option value to be SMSS
         * We should probably add back the bytes
         * for timestamp option and IPsec.  We
         * don't do that as this is a workaround
         * for broken middle boxes/end hosts, it
         * is better for us to be more cautious.
         * They may not take these things into
         * account in their SMSS calculation.  Thus
         * the peer's calculated SMSS may be smaller
         * than what it can be.  This should be OK.
         */
        if (tcp_use_smss_as_mss_opt) {
            u1 = tcp->tcp_mss;
            U16_TO_BE16(u1, wptr);
        }
The code above is executed when a TCP end point is in the SYN-RECEIVED state (defined as TCPS_SYN_RCVD in the code) and is composing the SYN/ACK segment in response to an incoming SYN segment. The variable wptr in the above code points to where the TCP MSS (Maximum Segment Size) option value sits in the SYN/ACK. So if the ndd parameter tcp_use_smss_as_mss_opt is set to a non-zero value, the TCP MSS option value will be set to tcp->tcp_mss. The field tcp_mss in the tcp_t structure is the actual sending MSS, which is calculated from the TCP MSS option value in the received SYN segment, the length of additional TCP options in each segment, the outgoing network interface's MTU and possibly IPsec header overhead. The default value of tcp_use_smss_as_mss_opt is 0. So why would anyone want to use the sending MSS as the advertised TCP MSS option value? The MSS option value is supposed to mean the maximum segment size the local TCP end point can receive, not send.
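To make the effect of the tunable on the wire concrete, here is a minimal sketch of writing the MSS option into a SYN/ACK. It is not the code from tcp.c: put_be16() and write_synack_mss_opt() are hypothetical helpers, put_be16() only mimics what the name U16_TO_BE16 suggests (store a 16-bit value in big-endian order), and the option layout (kind 2, length 4, 16-bit value) is the standard TCP MSS option format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard TCP MSS option: kind = 2, length = 4, then a 16-bit value. */
#define TCPOPT_MAXSEG		2
#define TCPOPT_MAXSEG_LEN	4

/*
 * Hypothetical helper (not from tcp.c): store v at p in network
 * (big-endian) byte order.
 */
static void
put_be16(uint16_t v, uint8_t *p)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Hypothetical helper: fill opt[0..3] with the MSS option for a SYN/ACK.
 * recv_mss is the value derived from the local interface MTU and
 * send_mss stands in for tcp->tcp_mss.  Returns the advertised value.
 */
static uint16_t
write_synack_mss_opt(uint8_t *opt, uint16_t recv_mss, uint16_t send_mss,
    int use_smss_as_mss_opt)
{
	uint16_t adv = use_smss_as_mss_opt ? send_mss : recv_mss;

	assert(opt != NULL);
	opt[0] = TCPOPT_MAXSEG;
	opt[1] = TCPOPT_MAXSEG_LEN;
	put_be16(adv, &opt[2]);
	return (adv);
}
```

With the tunable at 0 the option carries the receive-side value no matter what the peer advertised. With a non-zero tunable it carries the sending MSS instead, which is never larger than what the peer's SYN announced.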
The reason is briefly described in the comments above the code.
We introduced this parameter to get around some broken middle
boxes. Without this parameter, the local TCP end point uses the
outgoing network interface's MTU size to calculate the TCP MSS option
value since how big a segment the local TCP end point can receive is
determined by the MTU size of the network interface. If the
network interface is a normal Ethernet card, the MSS option value is
1460. Note that this value is independent of the MSS option value
in the SYN segment
advertised by the other side of a connection.
The other side should use the same method to calculate its own TCP MSS option value. Each side of the connection then derives its sending MSS from the option value it receives from its peer. Everything should work as expected...
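The arithmetic behind the 1460 figure can be sketched as follows. This is a simplified illustration, assuming IPv4 with 20-byte IP and TCP headers and no TCP options or IPsec overhead. The function names are made up for this example and do not come from tcp.c.

```c
#include <assert.h>
#include <stdint.h>

#define IPV4_HDR_LEN	20	/* IPv4 header without options */
#define TCP_HDR_LEN	20	/* TCP header without options */

/*
 * Hypothetical helper: the MSS option value a host advertises, based
 * only on the MTU of its own outgoing interface.
 */
static uint16_t
mss_opt_from_mtu(uint16_t mtu)
{
	assert(mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
	return ((uint16_t)(mtu - IPV4_HDR_LEN - TCP_HDR_LEN));
}

/*
 * Hypothetical helper: the sending MSS is bounded both by what the
 * peer said it can receive and by what the local interface can carry.
 */
static uint16_t
send_mss(uint16_t local_mtu, uint16_t peer_mss_opt)
{
	uint16_t local_limit = mss_opt_from_mtu(local_mtu);

	return (peer_mss_opt < local_limit ? peer_mss_opt : local_limit);
}
```

For a 1500-byte Ethernet MTU, mss_opt_from_mtu() gives 1500 - 20 - 20 = 1460, and two such hosts talking to each other both end up with a sending MSS of 1460.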
The problem comes when there is a broken middle box, such as a DSL modem/router using PPPoE. Suppose machine A has a normal Ethernet interface but is connected to the Internet using DSL with PPPoE. A's TCP stack may not know about the reduced MTU caused by PPPoE. This is usually not a problem, as path MTU discovery can handle it. But if the DSL modem/router is broken and does funny things, we get into trouble. One funny thing is that it may modify the MSS option value A sends out. The value A sends out should be 1460, but the modem/router can reset it to a lower value based on the PPPoE overhead. By doing this, the modem/router thinks that it helps solve the path MTU issue. It then forgets about the need for path MTU discovery and does not send the ICMP messages required for path MTU discovery to work.
Suppose A is trying to talk to machine B, which also uses an Ethernet interface. The modem/router changes A's TCP MSS option value to X, but it does not change the MSS option value (which should be 1460) in B's SYN/ACK. While B will not send a segment larger than X bytes to A, A can send a full 1460-byte segment to B. These full 1460-byte segments to B will be dropped by the modem/router. And since the modem/router does not participate in path MTU discovery, A's TCP stack will never know about the problem and the connection will just hang.
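The asymmetry can be modelled in a few lines. This is only a sketch under stated assumptions: the PPPoE link MTU is taken as 1492 (Ethernet's 1500 minus 8 bytes of PPPoE/PPP overhead), so X works out to 1452, and the middle box is assumed to clamp the option only in segments leaving A. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IP_TCP_HDRS	40	/* IPv4 + TCP headers, no options */
#define ETHER_MTU	1500
#define PPPOE_MTU	1492	/* assumed: 1500 minus 8 bytes of PPPoE/PPP */

static uint16_t
min_u16(uint16_t a, uint16_t b)
{
	return (a < b ? a : b);
}

/* MSS option a host derives from its own interface MTU. */
static uint16_t
mss_opt_from_mtu(uint16_t mtu)
{
	assert(mtu > IP_TCP_HDRS);
	return ((uint16_t)(mtu - IP_TCP_HDRS));
}

/* Assumed box behaviour: lower the MSS option in SYNs leaving A. */
static uint16_t
box_clamp_mss(uint16_t mss_opt)
{
	return (min_u16(mss_opt, mss_opt_from_mtu(PPPOE_MTU)));
}

/* The box forwards only what fits the PPPoE link; no ICMP otherwise. */
static int
box_forwards(uint16_t tcp_payload)
{
	return (tcp_payload + IP_TCP_HDRS <= PPPOE_MTU);
}

/* B sees A's option only after the box rewrote it to X. */
static uint16_t
b_send_mss(void)
{
	uint16_t x = box_clamp_mss(mss_opt_from_mtu(ETHER_MTU));

	return (min_u16(x, mss_opt_from_mtu(ETHER_MTU)));
}

/* A sees B's option unmodified, so it believes 1460 is safe. */
static uint16_t
a_send_mss(void)
{
	return (min_u16(mss_opt_from_mtu(ETHER_MTU),
	    mss_opt_from_mtu(ETHER_MTU)));
}
```

In this model B limits itself to 1452-byte segments, which the box forwards, while A settles on 1460 bytes, and box_forwards() rejects every full-size segment A sends.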
We have some customers experiencing exactly this problem with their clients who are behind such broken middle boxes. While the clients can connect to our customers' servers, they cannot complete any transactions because the data they send to those servers is dropped by the middle boxes. One workaround is to lower the MTU of our customers' servers, so that the calculated TCP MSS option value is also smaller. This is not optimal, as not all of their clients are behind such broken middle boxes. For clients that are not, this workaround reduces network performance to our customers' servers because they cannot send full-size segments.
We introduced the tcp_use_smss_as_mss_opt parameter to work around this problem. In the above case, if the tunable is set to 1, Solaris TCP (as B) will use X as the TCP MSS option value in its SYN/ACK. Then A will only send segments as large as X to B. Clients that are not behind such broken middle boxes still advertise their full MSS in the SYN, so B echoes the full value and they can still send full-size segments.
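A small model of the server side shows why the tunable only affects the clients that need it. This is a hedged sketch, not the Solaris implementation: it ignores TCP options and IPsec overhead, assumes an Ethernet server, and uses 1452 as an example value for X. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IP_TCP_HDRS	40	/* IPv4 + TCP headers, no options */
#define ETHER_MTU	1500

static uint16_t
min_u16(uint16_t a, uint16_t b)
{
	return (a < b ? a : b);
}

/*
 * Hypothetical model of server B: pick the MSS option for the SYN/ACK
 * given the MSS option seen in the client's SYN.
 */
static uint16_t
synack_mss_opt(uint16_t syn_mss_opt, int use_smss_as_mss_opt)
{
	uint16_t recv_mss = ETHER_MTU - IP_TCP_HDRS;	/* what B can receive */
	uint16_t smss;

	assert(syn_mss_opt > 0);
	smss = min_u16(syn_mss_opt, recv_mss);	/* stands in for tcp_mss */
	return (use_smss_as_mss_opt ? smss : recv_mss);
}

/* Largest segment the client sends after seeing B's SYN/ACK. */
static uint16_t
client_send_mss(uint16_t client_mss_limit, uint16_t synack_opt)
{
	return (min_u16(client_mss_limit, synack_opt));
}
```

A client behind the broken box arrives with an option of 1452 and gets 1452 back when the tunable is set, so it never sends a segment the box would drop. A client with an untouched option of 1460 gets 1460 back and keeps sending full-size segments.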
If there were no such broken middle boxes, there would really be no need for the tcp_use_smss_as_mss_opt parameter... There are other ndd(1M) parameters which were introduced for similarly unusual circumstances. They are not secret magic knobs.
( Jun 14 2005, 11:19:40 AM HKT )