Monday July 06, 2009
What processor will fuel your first private Cloud : INTEL Nehalem or AMD Istanbul ?
Where IT is going ...
You may have observed the big trend of the moment : take your old slide decks, banners and marketing brochures and try to plug in the word cloud as many times as possible. A Google search for the words Cloud Computing yields more than 31 million results today ! Even if you search only on Cloud (getting 175 million+ results), the first entry in the list (discounting the sponsored results) is this one. Amazing fashion of the moment !
As we recently described in this white paper, there are not one but many clouds. I had recent conversations on this topic with customers in our Menlo Park Executive Briefing Center. While they all say that they will not be able to host their entire IT department in a Public Cloud, they are interested in combining a Public Cloud service with multiple Private Clouds - this is the notion of Hybrid Cloud.

Private clouds
The Sun Solution Centers and Sun Professional Services are now starting to build the first private cloud architectures based on Sun Open Source products. The most common building block for those is the versatile Sun Blade 6000. Why ? Because this chassis can host many different types of CPUs (x86 & SPARC) and operating systems (Windows, Linux, OpenSolaris, Solaris or even VMware vSphere). At the same time, INTEL and AMD have released two exceptional chips : the INTEL Xeon 5500 (code name Nehalem) and the six-core AMD Opteron (code name Istanbul). I had the opportunity to test these chips recently and will give you a few data points here.
Cloud benchmarks
We may not have any standard Cloud benchmarks today. However, if I look at the different software components of a private cloud, it seems that computing capability (integer and floating point) and memory performance are the two key dimensions to explore. You may argue that your cloud needs a database component ... but improved caching mechanisms (memcached for example) and the commoditization of Solid State Disks (see this market analysis and also here) are moving database performance profiles toward memory or CPU intensive workloads. Additionally, the exceptional power of 10-Gbit based Hybrid storage appliances (like the Sun Storage 7410 Unified Storage System) makes us less concerned about I/O and network bound situations. It is good to know that these new storage appliances are a key element of our public cloud infrastructure.

Nehalem & Istanbul Executive summary
Both AMD & INTEL had customer investments in mind : their new chips use the same sockets as before, so they can be used in previously released chassis. After upgrading to the new processors, you will typically have to download the latest platform BIOS. It is also a good idea to check your OS level : the latest OS releases include upgraded libraries and drivers. Those are critical if performance is near the top of your shopping list. See here for example.
For other features, please refer to the key characteristics below :
| Feature | INTEL Xeon X5500 (Nehalem) | AMD Opteron 2435 (Istanbul) |
|---|---|---|
| Release date | March 29, 2009 | June 1st, 2009 |
| Manufacturing | 45 nm | 45 nm |
| Frequency (tested) | 2.8 GHz | 2.6 GHz |
| Cores | 4 | 6 |
| Strands/core | 2 [if NUMA on] | 1 |
| Total #strands | 8 | 6 |
| L1 cache | 256 KB [32 KB I. + 32 KB D. per core] | 768 KB [128 KB per core] |
| L2 cache | 1 MB [256 KB per core] | 3 MB [512 KB per core] |
| L3 cache | 2 MB shared | 6 MB shared |
| Memory type | DDR3 1333 MHz max. * | DDR2 800 MHz |
| Nom. power | 95 W | 75 W |
| Major innovations | Second level branch predictor & TLB | Power savings and HW virtualization |

* Note : For this test, we used DDR3 1066 MHz.
Now, here is our hardware list :
| Role | Model | Blade | Sockets@freq | RAM |
|---|---|---|---|---|
| AMD Opteron 'Istanbul' | SB6000 | X6260 | 2@2.6 GHz | 24 GB |
| INTEL XEON 'Nehalem' | SB6000 | X6270 | 2@2.8 GHz | 24 GB |
| Console | X4150 | N/A | 2@2.8 GHz | 16 GB |

Calculation performance : iGenCPU
iGenCPU is a calculation benchmark written in Java. It calculates Benoit Mandelbrot's fractals using a custom Imaginary Numbers library. The main benefit of this workload is that it naturally creates a mix of 50% floating point and 50% integer calculation. As the number of floating point operations produced by commercial software increases every year, this type of performance profile is getting closer and closer to what modern web servers (like Apache) and application servers (like Glassfish) produce.
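To give a feel for this kind of workload, here is a minimal sketch of a Mandelbrot calculation built on a small hand-written complex number class. This is not the iGenCPU source : the class names (`Complex`, `MandelbrotSketch`), the image size, the iteration limit and the "points inside the set" checksum are all assumptions made for illustration. It does show why the profile mixes floating point work (the complex arithmetic) with integer work (loop counters and iteration counts).

```java
// Hypothetical illustration only; not the actual iGenCPU code.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    Complex plus(Complex o) {
        return new Complex(re + o.re, im + o.im);
    }

    // Squared magnitude; avoids a square root in the escape test.
    double normSquared() {
        return re * re + im * im;
    }
}

final class MandelbrotSketch {
    // Number of iterations of z = z*z + c before |z| exceeds 2,
    // or maxIter if the point never escapes within the limit.
    static int escapeTime(Complex c, int maxIter) {
        Complex z = new Complex(0.0, 0.0);
        for (int i = 0; i < maxIter; i++) {
            if (z.normSquared() > 4.0) {
                return i;
            }
            z = z.times(z).plus(c);
        }
        return maxIter;
    }

    // Computes one width x height fractal over [-2, 1) x [-1.5, 1.5)
    // and returns how many sampled points did not escape.
    static int renderFractal(int width, int height, int maxIter) {
        int inside = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double re = -2.0 + 3.0 * x / width;
                double im = -1.5 + 3.0 * y / height;
                if (escapeTime(new Complex(re, im), maxIter) == maxIter) {
                    inside++;
                }
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        int fractals = 20;          // assumed batch size
        long checksum = 0;          // keeps the JIT from discarding the work
        long start = System.nanoTime();
        for (int i = 0; i < fractals; i++) {
            checksum += renderFractal(200, 200, 100);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.1f fractals/s (checksum %d)%n", fractals / seconds, checksum);
    }
}
```

A single-threaded run like this only gives one point of a scalability curve; the results discussed here come from running the real benchmark at increasing thread counts.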
Here are the results (AMD Istanbul in Blue, INTEL Nehalem in Red) :

Observations :
Very similar peak throughput (984 fractals/s on INTEL, 1008 fractals/s on AMD).
The AMD chip produces superior throughput at every level of concurrency. At 8 threads, which is a very common scalability limit for commercial virtualization products, it produces 28% more throughput than Nehalem.
This shows the superiority of the Opteron calculation co-processors, as we had already observed on the previous quad-core generation.
For this kind of calculation, larger L1/L2 caches matter more than faster L1/L2 caches. The Opteron micro-architecture is naturally a better fit for this workload.
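If you want to draw a similar throughput-versus-threads curve on your own hardware, a harness along the following lines may help. It is only a sketch under assumptions : `ThroughputSketch`, `measure`, `spin` and `percentMore` are hypothetical names, the `spin` method is a stand-in workload rather than iGenCPU, and no JVM warm-up is done, so absolute numbers should be treated with care. The `percentMore` helper is the arithmetic behind statements such as "X% more throughput"; applied to the two peak figures quoted above (1008 and 984 fractals/s) it gives a gap of roughly 2.4%.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness; names and workload are assumptions for illustration.
final class ThroughputSketch {
    static volatile double sink; // written by spin() so the loop is not optimized away

    // Stand-in workload mixing floating point and integer operations.
    static void spin() {
        double acc = 0.0;
        for (int i = 1; i <= 10_000; i++) {
            acc += Math.sqrt(i) * (i % 7);
        }
        sink = acc;
    }

    // Runs `work` opsPerThread times on each of `threads` threads and
    // returns the combined operations per second. Thread start-up is
    // included in the measured time.
    static double measure(int threads, int opsPerThread, Runnable work) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    for (int i = 0; i < opsPerThread; i++) {
                        work.run();
                    }
                    return null;
                });
            }
            long start = System.nanoTime();
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown by a task
            }
            long elapsed = Math.max(1L, System.nanoTime() - start);
            return (double) threads * opsPerThread / (elapsed / 1e9);
        } catch (InterruptedException | ExecutionException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // How much larger a is than b, in percent of b.
    static double percentMore(double a, double b) {
        return (a - b) / b * 100.0;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.printf("%d threads: %.0f ops/s%n",
                    threads, measure(threads, 200, ThroughputSketch::spin));
        }
        System.out.printf("Peak gap: %.1f%%%n", percentMore(1008.0, 984.0));
    }
}
```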

Memory performance : iGenRAM
It is a classic brain exercise when you cannot sleep : imagine what you would do with $94 million in your bank account. The iGenRAM benchmark was initially developed in C to produce an accurate simulation of the California Lotto winner determination. It is highly memory intensive, using 1 gigabyte of memory per thread. Memory allocation time and memory search performance are combined into a single throughput number, plotted below :

Observations :
The faster DDR3 memory and higher frequency of the INTEL chip make it a better fit for memory intensive workloads. At peak, the Nehalem-based system produces 23% more throughput than its competitor.
For a small number of threads (1 to 4), both systems produce very similar numbers.
The second level branch predictor most likely helps the Nehalem-based system on this repetitive workload, improving the slope of its scalability curve past four threads.
As noted, we used DDR3 1066 MHz for this Nehalem test. DDR3 1333 MHz is also available and would increase the INTEL chip's advantage on this workload.
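For readers curious about what an "allocate, then search" memory workload looks like, here is a rough sketch of the two phases described for iGenRAM. It is an assumption-laden illustration : the real benchmark is written in C and its data layout is not described here, so the class `LottoMemorySketch`, the flat `int` array of draws and the linear scan are hypothetical stand-ins. The sketch is written in Java to match the iGenCPU example, and the buffer in `main` is far smaller than the 1 gigabyte per thread used in the real test.

```java
import java.util.Random;

// Hypothetical illustration only; not the actual iGenRAM code.
final class LottoMemorySketch {
    // Allocates a buffer of `tickets` draws in [0, range), then scans it
    // and returns how many entries equal the winning number.
    static int countWinners(int tickets, int range, int winning, long seed) {
        // Allocation phase: one large contiguous array per call.
        int[] draws = new int[tickets];
        Random rng = new Random(seed);
        for (int i = 0; i < tickets; i++) {
            draws[i] = rng.nextInt(range);
        }
        // Search phase: a sequential scan, bound by memory bandwidth
        // once the array is larger than the caches.
        int winners = 0;
        for (int draw : draws) {
            if (draw == winning) {
                winners++;
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        int tickets = 16 * 1024 * 1024; // 64 MB of ints; assumed size for a quick run
        long start = System.nanoTime();
        int winners = countWinners(tickets, 1_000_000, 424_242, 2009L);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d winners, %.1f million tickets/s%n",
                winners, tickets / seconds / 1e6);
    }
}
```

Running one instance per thread with a larger buffer moves the workload from cache-bound to memory-bound, which is the regime where memory type and frequency start to matter.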

Conclusion
A complex question deserves a complex answer... As you have noted, these benchmarks show the AMD Istanbul to be better suited for calculation intensive workloads, but they also show better memory performance for the INTEL Nehalem. Therefore, the different layers within your private cloud will need to be profiled if you want to determine your best choice. And guess which Operating System comes equipped with the right set of tools (i.e. Dynamic Tracing) to make that determination : Solaris or OpenSolaris.
[Last minute note: I also performed Oracle 10g database benchmarks on these blades. Maybe for another article..]
See you next time in the wonderful world of benchmarking....
Xeon 5500/Nehalem has *8* (not *2* as you mention) MB L3 cache, completely reversing that comparison point in your chart (a couple of part numbers have 4MB, but all the Xeon 5500s Sun sells have 8MB).
Interesting that your iGenCPU results favor Istanbul, as SPEC2006CPU int and fp (rate and single threaded) results both *strongly* favor Nehalem. Have you tried with HyperThreading off and running up to 8 threads on Nehalem (some code performs better that way)?
Posted by Bruce on August 02, 2009 at 08:42 PM PDT #