Soft rings is a feature that I worked on recently and putback into
Solaris 10 update 2. It improves the performance of incoming network
traffic by using a worker-thread model of packet processing: incoming
traffic lands on a soft ring, and a worker thread picks up each packet
and delivers it to IP.
Let's step back for a minute and look at the problem we are trying to solve:
The FireEngine architecture introduced a per-CPU
synchronization mechanism called the vertical perimeter inside the TCP/IP
module. These vertical perimeters are implemented using a serialization
queue abstraction called squeue. A connection is bound to an instance of
squeue when the connection is initialized. Afterwards all packets for
the connection are always processed on the same squeue. In the case of
new incoming connections, they get bound to the squeue of the CPU that
took the interrupt. This helps achieve better cache locality and
increased network performance.
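
The binding described above can be sketched in a few lines of C. This is
a simplified user-space illustration, not the Solaris source: the types
squeue_t and conn_t and the functions squeues_init, conn_bind and
conn_input are hypothetical names chosen for this sketch. It shows only
the idea that a connection is bound once, to the squeue of the CPU that
took the interrupt, and that every later packet goes through that same
squeue.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 4  /* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for a per-CPU serialization queue. */
typedef struct squeue {
    int           cpu_id;     /* CPU that owns this squeue */
    unsigned long processed;  /* packets handled through this squeue */
} squeue_t;

/* Hypothetical connection state: remembers the squeue it is bound to. */
typedef struct conn {
    squeue_t *sq;             /* NULL until the connection is bound */
} conn_t;

/* Set up one squeue per CPU. */
static void
squeues_init(squeue_t *sqs, int ncpu)
{
    int i;

    for (i = 0; i < ncpu; i++) {
        sqs[i].cpu_id = i;
        sqs[i].processed = 0;
    }
}

/* Bind a new connection to the squeue of the interrupted CPU, once. */
static void
conn_bind(conn_t *c, squeue_t *sqs, int intr_cpu)
{
    assert(intr_cpu >= 0);
    if (c->sq == NULL)
        c->sq = &sqs[intr_cpu];
}

/* Every later packet for the connection uses the bound squeue. */
static squeue_t *
conn_input(conn_t *c)
{
    assert(c->sq != NULL);
    c->sq->processed++;
    return (c->sq);
}
```

Calling conn_bind a second time with a different CPU leaves the binding
unchanged, which is the property that gives the cache locality.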
Now, on systems with slow CPUs (clock speed below 1 GHz), a
single CPU cannot handle an incoming load of 1 Gbps. And
even faster CPUs cannot handle the loads generated
by 10 Gbps NICs. The solution is to fan out the load so that it is handled
by multiple CPUs.
The existing way of enabling fanout, setting ip_squeue_fanout to 1,
is suboptimal (one could even say it is broken). With
ip_squeue_fanout set to 1, a new incoming TCP connection is assigned a
random squeue that could belong to any one of the CPUs in the system,
but the packet can still get processed in the same context that received it.
This is bad because what you want here is for the other CPU to do
the processing of the packets belonging to its squeue.
This problem is addressed by soft rings. A soft ring is an abstraction
that simulates hardware Rx ring functionality in software. Multiple soft
rings can be configured on a system (tunable: ip_soft_rings_cnt). By
default, 2 soft rings are configured. Incoming traffic is made to land on
one of the soft rings. Each soft ring holds a pointer to the
squeue to which its packets have to be delivered. A worker thread is
created for each soft ring; this worker thread picks up packets
from the soft ring and delivers them to IP. The worker thread
has affinity to the CPU to which the squeue belongs. All this helps in
efficient processing of the packets.
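
To make the mechanism concrete, here is a minimal user-space sketch of
the two halves of a soft ring: the interrupt side picks a ring and
queues the packet, and the worker side dequeues it for delivery. This is
not the Solaris implementation. The types pkt_t and soft_ring_t, the
functions soft_ring_select, soft_ring_enqueue and soft_ring_dequeue, the
queue depth and the hash are all assumptions made for illustration;
locking and the worker thread itself are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SOFT_RING_CAPACITY 64   /* hypothetical per-ring queue depth */

/* Hypothetical packet descriptor: only the fields used for fanout. */
typedef struct pkt {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
} pkt_t;

/* Hypothetical soft ring: a bounded FIFO plus the CPU of its squeue. */
typedef struct soft_ring {
    pkt_t  slots[SOFT_RING_CAPACITY];
    size_t head;    /* index of the oldest queued packet */
    size_t count;   /* number of packets currently queued */
    int    cpu_id;  /* CPU whose squeue this ring feeds */
} soft_ring_t;

/* Map a connection to a ring; the same 4-tuple always gets the same ring. */
static unsigned
soft_ring_select(const pkt_t *p, unsigned ring_cnt)
{
    uint32_t h;

    assert(ring_cnt > 0);
    h = p->src_ip ^ p->dst_ip ^
        (((uint32_t)p->src_port << 16) | p->dst_port);
    h ^= h >> 16;   /* fold the high bits into the low bits */
    h ^= h >> 8;
    return (h % ring_cnt);
}

/* Interrupt side: queue a packet. Returns 0, or -1 if the ring is full. */
static int
soft_ring_enqueue(soft_ring_t *r, const pkt_t *p)
{
    if (r->count == SOFT_RING_CAPACITY)
        return (-1);
    r->slots[(r->head + r->count) % SOFT_RING_CAPACITY] = *p;
    r->count++;
    return (0);
}

/* Worker side: take the oldest packet. Returns 0, or -1 if empty. */
static int
soft_ring_dequeue(soft_ring_t *r, pkt_t *out)
{
    if (r->count == 0)
        return (-1);
    *out = r->slots[r->head];
    r->head = (r->head + 1) % SOFT_RING_CAPACITY;
    r->count--;
    return (0);
}
```

In this sketch the interrupt handler would call soft_ring_select and
soft_ring_enqueue, and a worker thread bound to cpu_id would loop on
soft_ring_dequeue and hand each packet to IP.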
Other considerations:
Fanout based on the hardware/platform:
Consider Niagara processors. A Niagara processor contains multiple cores on
a single chip, and each core can run 4 hardware threads. When doing
software fanout, care is taken to have the incoming data
handled by threads (these threads are counted as CPUs) in the same
core that took the interrupt. This helps preserve interrupt-to-CPU/core
affinity.
The same applies to AMD dual-core processors. It would be optimal if
the load can be fanned out to CPUs on the same core to capitalize on the
shared L2 cache.
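
One way to picture the core-local fanout is the small C sketch below. It
assumes, purely for illustration, that CPU ids are numbered contiguously
within a core (so with 4 threads per core, CPUs 0-3 are core 0, CPUs 4-7
are core 1, and so on). The functions core_of and fanout_cpu are
hypothetical and are not taken from the Solaris source.

```c
#include <assert.h>

/* Core index of a CPU id, assuming ids are contiguous within a core. */
static int
core_of(int cpu_id, int threads_per_core)
{
    assert(cpu_id >= 0 && threads_per_core > 0);
    return (cpu_id / threads_per_core);
}

/*
 * Pick the CPU that should service soft ring ring_idx, restricted to
 * the hardware threads of the core that took the interrupt.
 */
static int
fanout_cpu(int intr_cpu, int ring_idx, int threads_per_core)
{
    int first;

    assert(ring_idx >= 0);
    /* First CPU id belonging to the interrupted core. */
    first = core_of(intr_cpu, threads_per_core) * threads_per_core;
    /* Wrap around so every ring lands on a sibling thread. */
    return (first + ring_idx % threads_per_core);
}
```

With 4 threads per core and the interrupt taken on CPU 5, rings 0 to 3
map to CPUs 4, 5, 6 and 7 under these assumptions, so all fanout stays
within core 1.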
How to enable soft rings?
You need to have Solaris 10 update 2.
On Niagara platforms (T1000 and T2000), it is enabled by default.
On other platforms, it can be enabled by setting ip_squeue_fanout to 1.
ip_soft_rings_cnt has a default value of 2. A value of 2 or 3 has been
found to give the best performance with 1 Gbps NICs on the
Niagara platforms.