OpenSolaris beats Linux on Memcached !
Following on the heels of our memcached performance tests
on SunFire X2270 ( Sun's Nehalem-based server) running OpenSolaris, we
ran the same tests on the same server but this time on RHEL5. As
mentioned in the post presenting the first memcached results,
a 10GBE Intel Oplin card was used in order to achieve the high
throughput rates possible with these servers. It turned out that using
this card on linux involved a bit of work resulting in driver and kernel
re-builds.
- With the default ixgbe driver from the RedHat distribution (version 1.3.30-k2 on kernel 2.6.18)), the interface simply hung during the benchmark test.
- This led to downloading the driver from the Intel site (1.3.56.11-2-NAPI) and re-compiling it. This version does work and we got a maximum throughput of 232K operations/sec on the same linux kernel (2.6.18). However, this version of the kernel does not have support for multiple rings.
- The kernel version 2.6.29 includes support for multiple rings but still doesn't have the latest ixgbe driver which is 1.3.56-2-NAPI. So we downloaded, built and installed these versions of the kernel and driver. This worked well giving a maximum throughput of 280K with some tuning.
Results Comparison
The system running
OpenSolaris and memcached 1.3.2 gave us a maximum throughput of 350K
ops/sec as previously reported. The same system running RHEL5 (with
kernel 2.6.29) and the same version of memcached resulted in 280K
ops/sec. OpenSolaris outperforms Linux by 25% !
Linux Tuning
The following Linux tunables were changed to try and get the best performance:
net.ipv4.tcp_timestamps = 0 net.core.wmem_default = 67108864 net.core.wmem_max = 67108864 net.core.optmem_max = 67108864 net.ipv4.tcp_dsack = 0 net.ipv4.tcp_sack = 0 net.ipv4.tcp_window_scaling = 0 net.core.netdev_max_backlog = 300000 net.ipv4.tcp_max_syn_backlog = 200000Here are the ixgbe specific settings that were used (2 transmit, 2 receive rings):
RSS=2,2 InterruptThrottleRate =1600,1600
OpenSolaris Tuning
The following settings in /etc/system were used to set the number of MSIX:
set ddi_msix_alloc_limit=4 set pcplusmp:apic_intr_policy=1For the ixgbe interface, 4 transmit and 4 receive rings gave the best performance :
tx_queue_number=4, rx_queue_number=4
Finally, we bound the crossbow threads:
dladm set-linkprop -p cpus=12,13,14,15 ixgbe0

Once again another reason to choose OpenSolaris for your webservers!
Posted by Ramon van Belzen on May 22, 2009 at 01:36 AM PDT #
Why not implement niche features of Solaris in Linux?
Posted by New York on May 22, 2009 at 05:40 AM PDT #
Is this test possible with any of the BSD's? If so it would be a nice comparison too.
Posted by Andreas on May 22, 2009 at 06:12 AM PDT #