Tuesday Apr 14, 2009
Today Sun announced multiple servers based on Intel's Nehalem processor. I had early access to a Sun Fire X4270 server for a couple of weeks, and I used the opportunity to test some of the latest MySQL performance and scalability enhancements. For anyone unfamiliar with this system, it is a 2 socket 2U server with support for a maximum of 144GB of memory. With hyperthreading turned on, the operating system sees 16 CPUs.
Before I share my findings, let's get clear on the terminology. Socket refers to a physical socket on the motherboard. CPU refers to a processor as seen by the operating system. Core refers to a physical processing unit. A Nehalem socket has 4 cores. Thread refers to a hyperthreading thread. One Nehalem core has 2 threads. Using this terminology, the Sun Fire X4270 has 16 CPUs (2 sockets, 4 cores per socket, 2 threads per core).
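To make the terminology concrete, here is a minimal sketch of the arithmetic. The function name `logical_cpus` is my own and purely illustrative; it just multiplies the three quantities defined above.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of CPUs the operating system sees for a given topology."""
    return sockets * cores_per_socket * threads_per_core


# Sun Fire X4270: 2 sockets, 4 cores per socket.
print(logical_cpus(2, 4, 2))  # hyperthreading on: 16
print(logical_cpus(2, 4, 1))  # hyperthreading off: 8
```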
I used the ever popular Sysbench benchmark, with an internal version based on MySQL 5.1 running on OpenSolaris. Since the goal of this experiment was to showcase MySQL (and InnoDB) scalability, as well as the X4270 system, I used a cached workload. You should be able to see similar speedups for regular applications, provided there are no I/O bottlenecks and no known MySQL scalability issues are being exercised. The X4270 supports 16 2.5" disk drives (SATA, SAS or SSD), so I/O should not be a problem for most workloads. I used the tunings mentioned in my earlier blog post, "Maximizing Sysbench OLTP performance for MySQL".
Nehalem incorporates Hyperthreading technology. Hyperthreading allows a core to run an additional software thread along with the original thread. Since very few chip resources are dedicated to the second thread, you cannot expect to see a 2x boost in performance.
There are two ways you can disable hyperthreading on Solaris.
| Experiment | Sockets | CPUs seen by Solaris | Read Only TPS | Read Write TPS |
|---|---|---|---|---|
| Hyperthreading ON | 2 | 16 | 6310 | 4652 |
| Hyperthreading OFF | 2 | 8 | 4648 | 3584 |
| Improvement with hyperthreading | | | 35% | 29.7% |
There are two ways to study system scaling.
To study system scaling across sockets, we typically fully populate each core/socket before moving on to the next core/socket. For example, with the X4270, we use:
| Configuration | CPUs |
|---|---|
| 1 core using both threads | 2 |
| 2 cores using both threads in each core | 4 |
| 4 cores using all threads (1 full socket) | 8 |
| 2 full sockets | 16 |
By fully allocating CPUs per core one socket at a time, we are basically showing what would happen if you only had the number of CPUs shown. This approach shows the best scalability and is also the most realistic approach.
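The fill order described above can be sketched as follows. This is only an illustration: the `(socket, core, thread)` tuples are hypothetical labels, not the processor IDs Solaris actually assigns, and the function names are my own.

```python
from itertools import product


def fill_order(sockets=2, cores_per_socket=4, threads_per_core=2):
    """List (socket, core, thread) labels so that both threads of a core
    come first, then the next core, and a socket is filled completely
    before the next socket is touched."""
    return list(product(range(sockets),
                        range(cores_per_socket),
                        range(threads_per_core)))


def allocate(n_cpus, **topology):
    """Pick the first n_cpus entries of the fill order."""
    order = fill_order(**topology)
    if not 0 < n_cpus <= len(order):
        raise ValueError(f"n_cpus must be between 1 and {len(order)}")
    return order[:n_cpus]


for n in (2, 4, 8, 16):
    cpus = allocate(n)
    cores = {(s, c) for s, c, _ in cpus}
    sockets = {s for s, _, _ in cpus}
    print(f"{n:2d} CPUs -> {len(cores)} core(s) on {len(sockets)} socket(s)")
```

With the default X4270 topology, the 2, 4 and 8 CPU steps all stay within the first socket, and only the 16 CPU step uses the second one.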
| Sockets | CPUs seen by Solaris | Read Only TPS | Read Write TPS |
|---|---|---|---|
| 1 Socket | 8 | 3364 | 2616 |
| 2 Sockets | 16 | 6310 | 4652 |
| Speedup | | 1.87x | 1.77x |
As you can see from the table above, going from 1 socket to 2 sockets gives an 87% improvement in the Read Only test and a 77% improvement in Read Write performance.
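For anyone who wants to redo the arithmetic, here is a small sketch that derives the speedup from the TPS numbers in the table. The "fraction of ideal" figure is my own addition: it assumes the usual definition of scaling efficiency (speedup divided by the increase in resources, here 2x). The last digit of the speedup can differ from the table depending on how you round.

```python
def speedup(base_tps: float, scaled_tps: float) -> float:
    """Throughput ratio between the larger and the smaller configuration."""
    return scaled_tps / base_tps


# (1 socket TPS, 2 socket TPS), taken from the table above.
RESULTS = {
    "Read Only": (3364, 6310),
    "Read Write": (2616, 4652),
}

SOCKET_RATIO = 2  # going from 1 socket to 2 sockets

for name, (one_socket, two_sockets) in RESULTS.items():
    s = speedup(one_socket, two_sockets)
    # Efficiency: how much of the ideal 2x speedup was actually achieved.
    print(f"{name}: {s:.2f}x speedup, {s / SOCKET_RATIO:.0%} of ideal")
```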