Top 10 Efficiency Thoughts
When the Top500 list came out last week, we were having an internal discussion about the "efficiency" of a system: how much of its theoretical peak performance does the Linpack benchmark actually achieve? It gets more complicated with the new Intel Nehalem-based systems, where the CPUs can go into Turbo Mode. To obtain the theoretical peak, we have been using the following formula: GHz x flops/cycle x number of cores/socket x number of sockets/blade x number of blades.
For example, a fully loaded Sun 6048 system with the Sun Blade X6275 would get:
2.93 GHz x 4 flops/cycle x 4 cores/socket x 4 sockets/blade x 48 blades = 9,001 GFlops, or about 9.0 TeraFlops. If we then ran HPL on this rack and obtained 8.0 TeraFlops, the efficiency would be 8.0/9.0 = 89 %.
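The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula as we have been using it. The function and variable names are my own, and the 8.0 TeraFlops HPL result is the hypothetical figure from the example, not a measured run.

```python
def peak_gflops(ghz, flops_per_cycle, cores_per_socket, sockets_per_blade, blades):
    """Theoretical peak in GFlops: clock rate times flops/cycle times total core count."""
    return ghz * flops_per_cycle * cores_per_socket * sockets_per_blade * blades


def efficiency(rmax, rpeak):
    """Fraction of theoretical peak achieved (both arguments in the same unit)."""
    return rmax / rpeak


# Fully loaded Sun 6048 rack with Sun Blade X6275 blades, at the nominal 2.93 GHz.
rack_peak_tflops = peak_gflops(2.93, 4, 4, 4, 48) / 1000.0

# Hypothetical HPL result of 8.0 TeraFlops, as in the example above.
rack_efficiency = efficiency(8.0, rack_peak_tflops)

print(f"Rpeak      = {rack_peak_tflops:.2f} TeraFlops")
print(f"Efficiency = {rack_efficiency:.1%}")
```

Keeping the peak calculation separate from the efficiency ratio makes it easy to plug in a different clock rate or blade count later.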
However, what do we use if the clock rate of the Intel 5500 Series processors goes up to 3.2 GHz for part of the run? What becomes the theoretical peak performance? Which should be used, the 2.93 GHz or the 3.2 GHz? Thoughts?
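To see how much the choice of clock rate matters, here is a rough Python sketch that computes the rack's theoretical peak at both frequencies and the efficiency each would imply. It assumes the same hypothetical 8.0 TeraFlops HPL result as before and treats the turbo clock as if it held for the whole run, which it does not. The names are mine, for illustration only.

```python
# flops/cycle x cores/socket x sockets/blade x blades for the rack in the example
FLOPS_PER_CYCLE_PER_RACK = 4 * 4 * 4 * 48

# Hypothetical HPL result in TeraFlops (same figure as the earlier example)
MEASURED_TFLOPS = 8.0


def rack_peak_tflops(ghz):
    """Theoretical peak of the whole rack in TeraFlops at a given clock rate."""
    return ghz * FLOPS_PER_CYCLE_PER_RACK / 1000.0


results = {}
for label, ghz in (("nominal", 2.93), ("turbo", 3.2)):
    rpeak = rack_peak_tflops(ghz)
    results[label] = (rpeak, MEASURED_TFLOPS / rpeak)
    print(f"{label:8s} {ghz:.2f} GHz  Rpeak = {rpeak:.2f} TF  "
          f"efficiency = {results[label][1]:.1%}")
```

With these inputs the same run looks several percentage points less efficient when measured against the turbo clock, which is why the choice of frequency is worth settling.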
Below, I have extracted a subset of the Top500 listing from June 2009: just the top 10 entries. I have also added a new column, "H", which simply divides Rmax by Rpeak. Notice how #10, the Sun/Bull system at Juelich, is the most efficient of the ten in this benchmark. That is down to the combination of the system and the interconnect. Although some lower-ranked systems might show a higher efficiency, it is really exciting that we got almost 90 % on such a large system, in the top 10.
