Now that we have GB/sec-class network-attached OpenStorage and highly
threaded CMT servers to attach to it, you would figure that just
connecting the two would be enough to open the pipes for immediate
performance. Well ... almost.
Our OpenStorage systems can deliver great performance, but we often
find the limitation on the client side. Now that NAS servers can deliver
so much power, their NAS clients can themselves be powerful servers
trying to deliver GB/sec-class services to the internet.
CMT servers are great throughput engines for that; however, they only
deliver the goods when the whole stack is threaded. So in a recent
engagement, my colleague David Lutz found that we needed one tuning at
each of 4 levels in Solaris: IP, TCP, RPC and NFS.
| Service | Tunable |
| IP | ip_soft_rings_cnt |
| TCP | tcp_recv_hiwat |
| RPC | clnt_max_conns |
| NFS | nfs3_max_threads |
| NFS | nfs4_max_threads |
ip_soft_rings_cnt requires tuning up to Solaris 10 update 7.
The default value of 2 is not enough to sustain the high throughput in
a CMT environment. A value of 16 proved beneficial.
In /etc/system :
* To drive 10Gbe in CMT in Solaris 10 update 7 : see blogs.sun.com/roch
set ip_soft_rings_cnt=16
The receive socket buffer size is critical to TCP connection
performance. The buffer is not preallocated, and memory is only
used if and when the application is not reading the data
it has requested. The default of 48K dates from the age of 10MB/s network
cards and 1GB/sec systems. A larger value lets the peer avoid
throttling its flow while waiting for the returning TCP ACK. This is
especially critical in high-latency environments, urban area networks or
other long fat networks, but it's also critical in the datacenter to reach
a reasonable portion of the 10GbE available in today's NICs. It turns out
that NFS connections inherit the TCP default for the system, so it's
interesting to run with a value between 400K and 1MB :
ndd -set /dev/tcp tcp_recv_hiwat 400000
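
To see why the receive buffer matters, here is a rough back-of-the-envelope sketch in Python. It relies on the general rule that a single TCP connection cannot move more than one window of data per round trip. The 0.5 ms round-trip time is a hypothetical figure chosen for illustration, not a measurement from the engagement described here, and the function name is made up for this example.

```python
def window_limited_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound, in bytes/sec, for one TCP connection: one window per round trip."""
    return window_bytes / rtt_seconds


# Hypothetical datacenter round-trip time; measure your own network.
RTT_SECONDS = 0.0005

default_ceiling = window_limited_throughput(48 * 1024, RTT_SECONDS)  # 48K default
tuned_ceiling = window_limited_throughput(400_000, RTT_SECONDS)      # tuned value above

print(f"48K window:      {default_ceiling / 1e6:.0f} MB/s at most")
print(f"400000 B window: {tuned_ceiling / 1e6:.0f} MB/s at most")
```

Under that assumed round-trip time, the default window caps a connection at roughly a tenth of what 10GbE can carry, which is consistent with the larger buffer being worthwhile even inside the datacenter. Real numbers depend on your own latency.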
But even with this, a single TCP connection is not enough to extract
the most out of 10GbE on CMT. And by default the Solaris RPC client will
establish a single connection to each server it connects to.
The code underneath is highly threaded but did suffer from a few bugs
when trying to tune that number of connections, notably
6696163 and 6817942, both of which are fixed in S10 update 8.
With that release, it becomes interesting to tune the number of RPC
connections, for instance to 8.
In /etc/system :
* To drive 10Gbe in CMT in Solaris 10 update 8 : see blogs.sun.com/roch
set clnt_max_conns=8
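
As a rough way to reason about how many connections are worth trying, the sketch below divides the link rate by whatever a single connection sustains on your client. The 200 MB/s per-connection figure is purely hypothetical; substitute a rate you have measured. The helper name is invented for this example.

```python
import math


def connections_needed(link_bytes_per_sec: float, per_conn_bytes_per_sec: float) -> int:
    """Smallest number of connections whose combined rate reaches the link rate."""
    return math.ceil(link_bytes_per_sec / per_conn_bytes_per_sec)


TEN_GBE_BYTES_PER_SEC = 10e9 / 8   # raw 10GbE line rate, ignoring protocol overhead
PER_CONN_BYTES_PER_SEC = 200e6     # hypothetical measured rate of one connection

print(connections_needed(TEN_GBE_BYTES_PER_SEC, PER_CONN_BYTES_PER_SEC))
```

This assumes connections scale linearly, which is optimistic, so treat the result as a starting point for experiments rather than a target.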
And finally, above the RPC layer, NFS implements a pool of threads
per mount point to service asynchronous requests. These will mostly be
used in streaming workloads (readahead and writebehind), while other
synchronous requests will be issued within the context of the
application thread. The default size of this pool is
likely to limit performance in some streaming scenarios. So
I would experiment with the following.
In /etc/system :
* To drive 10Gbe in CMT in Solaris 10 update 7 : see blogs.sun.com/roch
set nfs3_max_threads=32
set nfs4_max_threads=32
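
The effect of the thread pool size can be estimated with a simple concurrency argument: if each asynchronous thread keeps one request in flight, streaming throughput is bounded by threads times transfer size divided by per-request latency. In the sketch below the 32 KB transfer size, the 1 ms latency and the thread count of 8 used for comparison are all hypothetical values, not defaults or measurements taken from this setup.

```python
def streaming_ceiling(threads: int, io_bytes: int, latency_seconds: float) -> float:
    """Upper bound, in bytes/sec, when each thread keeps one request in flight."""
    return threads * io_bytes / latency_seconds


IO_BYTES = 32 * 1024      # hypothetical NFS transfer size
LATENCY_SECONDS = 0.001   # hypothetical per-request service time

for threads in (8, 32):   # 8 is only a comparison point; 32 is the value tried above
    ceiling = streaming_ceiling(threads, IO_BYTES, LATENCY_SECONDS)
    print(f"{threads:2d} threads: {ceiling / 1e6:.0f} MB/s at most")
```

The bound grows linearly with the number of threads, which is why a small pool can hold back readahead and writebehind well before the network is saturated.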
As usual, YMMV: use these settings with the usual circumspection.
Remember that tuning is evil, but it's better to know about these
factors than to be in the dark and stuck with lower than expected
performance.