BM Seer Unofficial thoughts from an anonymous Sun employee

Sun's Delivered Memory Bandwidth Leadership: Stream Benchmark

Wednesday Apr 18, 2007

Sun has faster delivered memory bandwidth than the best that IBM or HP can do. The Sun SPARC Enterprise M9000 beat the IBM p5 595 by 10% on the Stream TRIAD benchmark, and it beat the HP Integrity Superdome by 33% on the same benchmark. Running with 2.4GHz SPARC64 VI processors, the Sun SPARC Enterprise M9000 delivered a Stream TRIAD result of 227.1GB/s.

Don't let the core count confuse you: IBM cores cost over twice as much as Sun's cores. Look at the other benchmark results posted to see that IBM costs more, is slower, and has fewer cores, yet this is the best that IBM offers.

Be careful to compare measured (delivered) bandwidth; other vendors sometimes try to confuse the picture by quoting peak numbers.

Stream Performance Chart - GB/s (1 GB = 10^9 B, *not* 2^x B; bigger is better)

System                   GHz   Cores  COPY   SCALE  ADD    TRIAD
Sun SE M9000             2.4   128    224.4  223.1  224.2  227.1
IBM p5 595               2.3   64     186.1  179.6  200.4  206.2
HP Integrity SuperDome   1.6   128    154.5  153.0  169.5  170.8
HP Integrity SuperDome   1.6   64     116.1  114.6  127.9  128.7
Sun SE M9000             2.4   64     114.9  114.6  130.0  134.4
IBM p5-575               2.2   8      77.9   81.2   96.7   100.5
Sun SE M8000             2.4   32     60.3   60.2   69.3   69.6
Sun SE M5000             2.15  16     24.8   24.8   25.2   25.3
Sun SE M4000             2.15  8      12.6   12.5   12.7   12.7
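
The 10% and 33% leads quoted above can be checked against the TRIAD column of the chart. The following is a minimal sketch in Python; the function names (percent_faster, gb_to_gib) are made up for illustration, and the GiB conversion is included only to show what the "10^9, not 2^x" note in the chart title means in practice.

```python
# TRIAD results in GB/s, copied from the chart above (1 GB = 10^9 bytes).
TRIAD_GBS = {
    "Sun SE M9000 (128 cores)": 227.1,
    "IBM p5 595 (64 cores)": 206.2,
    "HP Integrity SuperDome (128 cores)": 170.8,
}


def percent_faster(a_gbs, b_gbs):
    """Return how much faster a is than b, as a percentage."""
    return (a_gbs / b_gbs - 1.0) * 100.0


def gb_to_gib(gbs):
    """Convert decimal GB/s (10^9 B) to binary GiB/s (2^30 B)."""
    return gbs * 1e9 / 2**30


sun = TRIAD_GBS["Sun SE M9000 (128 cores)"]
for name, gbs in TRIAD_GBS.items():
    if gbs != sun:
        print(f"M9000 vs {name}: {percent_faster(sun, gbs):.1f}% faster")
print(f"M9000 TRIAD in binary units: {gb_to_gib(sun):.1f} GiB/s")
```

The same result looks noticeably smaller when expressed in binary units, which is why the chart states its unit convention explicitly.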

Benchmark Description

The STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) for simple vector kernels. All memory accesses are sequential, so the results show how fast regularly laid-out data can be moved through the system. Properly run, the benchmark displays the characteristics of the machine's memory system and not the advantages of running from the system's memory caches.
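
For readers who have not seen them, the four vector kernels named in the chart (COPY, SCALE, ADD, TRIAD) can be sketched as below. This is an illustration of what each kernel computes, written in Python for readability; it is not the benchmark source, which is distributed in C and Fortran, and plain Python lists will not produce meaningful bandwidth numbers. The function names are chosen here for illustration.

```python
def copy(a):
    """COPY: c[i] = a[i] (one read, one write, no arithmetic)."""
    return [x for x in a]


def scale(c, q):
    """SCALE: b[i] = q * c[i] (one read, one write, one multiply)."""
    return [q * x for x in c]


def add(a, b):
    """ADD: c[i] = a[i] + b[i] (two reads, one write, one add)."""
    return [x + y for x, y in zip(a, b)]


def triad(b, c, q):
    """TRIAD: a[i] = b[i] + q * c[i] (two reads, one write, multiply and add)."""
    return [x + q * y for x, y in zip(b, c)]


# Tiny demonstration; real runs use arrays far larger than the caches,
# so that the memory system and not the cache is being measured.
a = [1.0, 2.0, 3.0]
q = 3.0
c = copy(a)
b = scale(c, q)
c = add(a, b)
a = triad(b, c, q)
print(a)
```

Every kernel walks its arrays from the first element to the last, which is the sequential access pattern described above.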

STREAM counts the bytes read plus the bytes written. For the simple "Copy" kernel, this is exactly twice the number obtained from the "bcopy" convention. STREAM does this because three of the four kernels do arithmetic, so it makes sense to count both the data read into the CPU and the data written back from the CPU. The "Copy" kernel does no arithmetic, but for consistency its bytes are counted the same way as for the other three.
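
The counting rule can be made concrete with a short sketch. It assumes 8-byte (double precision) array elements, which is what the standard STREAM code uses; the array length and run time in the usage lines are invented purely to show the arithmetic and are not measurements from any system in the chart.

```python
BYTES_PER_WORD = 8  # assumption: 64-bit floating-point elements

# (arrays read, arrays written) per loop iteration for each kernel
KERNELS = {
    "COPY": (1, 1),
    "SCALE": (1, 1),
    "ADD": (2, 1),
    "TRIAD": (2, 1),
}


def stream_bytes(kernel, n):
    """Bytes counted by the STREAM convention: reads plus writes."""
    reads, writes = KERNELS[kernel]
    return (reads + writes) * BYTES_PER_WORD * n


def bcopy_bytes(n):
    """Bytes counted by the bcopy convention: only the bytes copied."""
    return BYTES_PER_WORD * n


def bandwidth_gb_per_s(kernel, n, seconds):
    """Bandwidth in GB/s (1 GB = 10^9 B) for one pass over n elements."""
    return stream_bytes(kernel, n) / seconds / 1e9


n = 100_000_000          # hypothetical array length
print(stream_bytes("COPY", n) / bcopy_bytes(n))   # ratio of the two conventions
print(bandwidth_gb_per_s("TRIAD", n, 0.5))        # hypothetical 0.5 s run time
```

Under this convention COPY and SCALE move 16 bytes per element, while ADD and TRIAD move 24 bytes per element.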

The sequential nature of the memory references is the benchmark's biggest weakness. The benchmark does not expose limitations in a system's interconnect when data must move from anywhere in the system to anywhere else.

Disclosure Statement:

Stream is a publicly available benchmark and can be found at http://www.cs.virginia.edu/stream. Results as of 4/13/07.

System Configuration

Systems under test:

  • Sun SPARC Enterprise M9000
  • 64 x 2.4GHz SPARC64 VI processors
  • 1TB memory
  • Solaris 10
  • Sun Studio 12
