Monday August 20, 2007
Sun SPARC Enterprise M9000 vs Sun Fire E25k - Datapoints
As you know, Sun has a new high-end server available: the Sun SPARC Enterprise M9000.
Detailed documentation can be found here.
But how do you compare the performance of an E25k versus an M9000?
Tough question... so here are some datapoints before you start developing your own Customer Benchmark.
Chips
Let's start with a side-by-side chip comparison:
| Chips | UltraSPARC-IV+ | SPARC64 VI |
| Manufacturing | 90 nm | 90 nm |
| Die size | 356 sq mm | 421 sq mm |
| Transistors | 295 million | 540 million |
| Cores | 2 | 2 |
| Threads/Core | 1 | 2 |
| Frequency | 1.8 GHz | 2.28 GHz |
| L1 I-cache | 64 KB | 128 KB/core |
| L1 D-cache | 64 KB | 128 KB/core |
| on-chip L2-cache | 2 MB | 6 MB |
| off-chip L3-cache | 32 MB | None |
Interesting, but not necessarily helpful. What is harder to determine is how the very different memory architectures will influence performance. The SPARC64 VI chip has 3 times more L2 cache (6 MB versus 2 MB) but roughly 5.7 times less total L2 + L3 cache (6 MB versus 2 MB + 32 MB = 34 MB). As we know, one popular workload was very much influenced by the addition of an L3 cache on the UltraSPARC-IV+: Online Transaction Processing, or OLTP.
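The cache ratios quoted above come only from the L2 and L3 rows of the chip table; L1 caches are left out. Here is a minimal Python sketch of that arithmetic. The dictionary layout and the `total_l2_l3_mb` helper are purely illustrative.

```python
# Per-chip cache sizes in MB, from the L2 and L3 rows of the chip table.
# L1 caches are deliberately ignored in this sketch.
chips = {
    "UltraSPARC-IV+": {"l2_mb": 2, "l3_mb": 32},
    "SPARC64 VI": {"l2_mb": 6, "l3_mb": 0},  # no L3 cache on this chip
}


def total_l2_l3_mb(chip):
    """Total L2 + L3 cache per chip, in MB."""
    return chip["l2_mb"] + chip["l3_mb"]


us4 = chips["UltraSPARC-IV+"]
s64 = chips["SPARC64 VI"]

# How much more L2 the SPARC64 VI has.
l2_ratio = s64["l2_mb"] / us4["l2_mb"]
# How much more L2 + L3 the UltraSPARC-IV+ has.
total_ratio = total_l2_l3_mb(us4) / total_l2_l3_mb(s64)

print(f"L2 only: SPARC64 VI has {l2_ratio:.1f}x the UltraSPARC-IV+")
print(f"L2 + L3: UltraSPARC-IV+ has {total_ratio:.2f}x the SPARC64 VI")
```

With these inputs it prints 3.0x for L2 alone and 5.67x for L2 + L3 (34 MB versus 6 MB).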
A note on multi-threading: the two threads of the SPARC64 VI processor are not designed to double the throughput of a single core. The goals are to minimize CPU core wait time and increase CPU core utilization. A critical piece of information is that the two threads share the two Translation Lookaside Buffers (TLBs). All of the results below were obtained with the second thread disabled on each core, because we obtained similar or better performance that way. More info can be found here.
Of course, this is only a chip comparison. Let's learn more with a side-by-side view of the M9000 and E25k servers:
Servers
| Systems | E25k | M9000 |
| Max processors | 72 | 64 |
| Max cores | 144 | 128 |
| Max HW threads | 144 | 256 |
| Max memory | 1152 GB | 2048 GB |
| Memory bandwidth | 173 GB/s | 737 GB/s |
| I/O bandwidth | 36 GB/s | 244 GB/s |
| Max internal disks | 0 | 64 |
| Max domains | 18 | 24 |
| OS support | Solaris 9 or 10 | Solaris 10 U4 |
| Media | None | DVD, DAT |
| Power type | 1 phase | 1 or 3 phase |
| Max Power | 30.6 kW | 42.6 kW |
With this table, we certainly have a better idea of the immense capacity of these servers, but it still does not help us estimate performance. Now, I did not have the luxury of testing two fully loaded servers... so here is what I tested. The big decision was to use the same number of CPU boards on each system (called CMUs on the M9000): four boards each, giving 16 sockets and 32 cores per domain.
So here is the tested hardware:
Hardware Stack
| Role | Model | Qty | System clock freq. | Sockets@Freq. (per domain) | RAM (per domain) |
| SPARC64 VI server | M9000-32 | 1 | 960 MHz | 16 @ 2280 MHz (32 cores) | 256 GB |
| UltraSPARC-IV+ server | E25k | 1 | 150 MHz | 16 @ 1800 MHz (32 cores) | 256 GB |
| Console | X4200 | 1 | 800 MHz | 2 @ 2600 MHz | 8 GB |
| Storage | SE6540 | 1 | 8x RAID1 | 56x 73 GB 15k drives | 8 GB |
Frequency
Regarding performance, the first metric we can look at is the basic CPU frequency ratio.
This value is a reasonable starting point for our expectations, even though we know that comparing frequencies across different chips has little meaning.
| Server | M9000 | E25k |
| Frequency | 2280 MHz | 1800 MHz |
| Ratio (E25k = 1) | 1.26 | 1 |
Can we conclude that we will observe a 1.26x speedup if we replace our current 4-board E25k with a 4-CMU M9000?
Java workloads
Not exactly. So let's try to be a little more specific, using four different 100% Java (1.6) workloads:
- iGenCPU v3 - Fractal simulation (50% integer / 50% floating point)
- iGenRAM v3 - Lotto simulation (memory allocation and search)
- iGenBATCH v2 - Oracle 10g batch using partitioning, triggers, stored procedures and sequences
- iGenOLTP v4 - Heavy-weight OLTP
Datapoints
The values shown here are peak results, obtained by building the complete scalability curve. Response times (RT) are averages at peak throughput, in milliseconds.
| Workload | E25k throughput | E25k RT (ms) | M9000 throughput | M9000 RT (ms) |
| iGenCPU v3 | 303 fractals/second | 105 | 728 fractals/second | 44 |
| iGenRAM v3 | 2865 lottos/ms | 55 | 4881 lottos/ms | 17 |
| iGenBATCH v2 | 35 TPS | 907 | 50 TPS | 626 |
| iGenOLTP v4 | 3938 TPM | 271 | 4500 TPM | 351 |
Since we are trying to compare against the M-value factor of 1.33, let's normalize these results, giving the E25k a factor of 1.
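Normalizing simply divides each M9000 result by the matching E25k result. Each workload uses the same unit on both systems (fractals/second, lottos/ms, TPS or TPM), so the ratios are unitless. Here is a minimal Python sketch using the peak values from the table above. The data layout and the `normalize` helper are illustrative and are not part of the iGen benchmarks.

```python
# Peak results from the table above: (throughput, average RT in ms).
# Units differ between workloads but are the same across systems.
results = {
    "iGenCPU v3": {"E25k": (303, 105), "M9000": (728, 44)},
    "iGenRAM v3": {"E25k": (2865, 55), "M9000": (4881, 17)},
    "iGenBATCH v2": {"E25k": (35, 907), "M9000": (50, 626)},
    "iGenOLTP v4": {"E25k": (3938, 271), "M9000": (4500, 351)},
}


def normalize(results, base="E25k", other="M9000"):
    """Return {workload: (throughput_ratio, rt_ratio)} with the base system = 1."""
    out = {}
    for workload, systems in results.items():
        base_tp, base_rt = systems[base]
        other_tp, other_rt = systems[other]
        out[workload] = (other_tp / base_tp, other_rt / base_rt)
    return out


for workload, (tp, rt) in normalize(results).items():
    print(f"{workload:<13} throughput x{tp:.3f}  response time x{rt:.3f}")
```

Recomputed from the rounded values in the table, two ratios differ slightly from the normalized tables that follow. iGenBATCH v2 throughput comes out at 1.429 rather than 1.450, and iGenRAM v3 response time comes out at 0.309 rather than 0.301. One possible explanation is that those tables were built from unrounded measurements.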
First, here is throughput:
| Throughput | E25k | M9000 |
| iGenCPU v3 | 1 | 2.403 |
| iGenRAM v3 | 1 | 1.704 |
| iGenBATCH v2 | 1 | 1.450 |
| iGenOLTP v4 | 1 | 1.143 |
| Frequency | 1 | 1.26 |
Which gives this chart:

Performance notes on throughput
- As you can see, for pure CPU calculations, the M9000 delivers 2.4 times the throughput of the E25k, way beyond the M-value.
- Memory allocation and access are much faster on the M9000, yielding a 1.7x increase in throughput.
- Only one workload falls below the M-value: OLTP. It seems that the large reduction in total cache per chip (all levels) has a big impact on this workload.
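The notes above measure each workload against a single scaling baseline. The post mentions both the 1.26 frequency ratio and a 1.33 M-value. A quick check shows the split is the same with either baseline: only OLTP falls short. Here is a small Python sketch. The `compare_to_baseline` helper is hypothetical.

```python
# Normalized peak throughput (E25k = 1), from the throughput table above.
throughput_ratio = {
    "iGenCPU v3": 2.403,
    "iGenRAM v3": 1.704,
    "iGenBATCH v2": 1.450,
    "iGenOLTP v4": 1.143,
}

# Naive expectation from clock frequency alone: 2280 MHz / 1800 MHz.
FREQ_RATIO = 2280 / 1800


def compare_to_baseline(ratios, baseline):
    """Split workloads into those at/above and those below a scaling baseline."""
    above = [w for w, r in ratios.items() if r >= baseline]
    below = [w for w, r in ratios.items() if r < baseline]
    return above, below


for name, baseline in [("frequency ratio", FREQ_RATIO), ("M-value", 1.33)]:
    above, below = compare_to_baseline(throughput_ratio, baseline)
    print(f"{name} ({baseline:.3f}): above={above} below={below}")
```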
And here is the average response time at peak throughput (still using a base of 1 for the E25k):
| RT | E25k | M9000 |
| iGenCPU v3 | 1 | 0.419 |
| iGenRAM v3 | 1 | 0.301 |
| iGenBATCH v2 | 1 | 0.690 |
| iGenOLTP v4 | 1 | 1.295 |
And the chart:

Performance notes on response time
- The CPU and RAM micro-benchmarks show very impressive improvements in response time: what takes 1 s on the E25k takes about 300 to 420 ms on the M9000 at peak throughput.
- Because of the richness of the batch benchmark and its inclusion of CPU-intensive Oracle stored procedures, we observe a nice factor of 0.69.
- Oracle OLTP is disappointing on the M9000, with response time at peak throughput about 30% higher than on the E25k. Upcoming releases of Solaris and Oracle 10g should improve this result.
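The "1 s on the E25k" reading of the response-time table is just each ratio multiplied by 1000 ms. Here is a short Python sketch. The `m9000_ms_for` helper is hypothetical. These ratios hold at peak throughput only and say nothing about response times under lighter load.

```python
# Normalized average response time at peak throughput (E25k = 1),
# from the response-time table above.
rt_ratio = {
    "iGenCPU v3": 0.419,
    "iGenRAM v3": 0.301,
    "iGenBATCH v2": 0.690,
    "iGenOLTP v4": 1.295,
}


def m9000_ms_for(e25k_ms, ratio):
    """Scale an E25k response time by the normalized ratio."""
    return e25k_ms * ratio


for workload, ratio in rt_ratio.items():
    ms = m9000_ms_for(1000, ratio)  # a request that takes 1 s on the E25k
    change_pct = (ratio - 1) * 100  # negative means faster on the M9000
    print(f"{workload:<13} {ms:5.0f} ms  ({change_pct:+.1f}%)")
```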
As you can see from the diversity of these factors, we should be really busy in the Sun Solution Center's Customer Benchmarking group. There is no magic number... and yes, only by testing your own application will you obtain the relevant numbers.
See you next time in the wonderful world of benchmarking...
Aug 20 2007, 05:07:47 PM PDT
I don't think you really mean those system max memory sizes in the System table, do you?
Posted by James Mansion on August 21, 2007 at 02:16 AM PDT #
The system max memory sizes are correct.
Posted by MrBenchmark on August 21, 2007 at 09:22 AM PDT #
Actually, the E25K (the official name is not E25000) can have 1152 GB of memory with high-density DIMMs, or 576 GB with medium-density DIMMs. There are 18 boards, not 16, which is why it is more than just 1/2 TB or 1 TB.
BM Seer,
Thanks for the detailed memory information. It is now corrected...
Posted by MrBenchmark on August 21, 2007 at 02:05 PM PDT #