BM Seer Unofficial thoughts from an anonymous Sun employee

judging by the wrong things: IBM & TPC-C

Tuesday Feb 20, 2007

Is IBM 3.3x or 1.4x faster? I guess it depends on whether you use an over-optimised benchmark like TPC-C. As mentioned yesterday, IBM doesn't publish results for its high-end systems on a variety of standard benchmarks such as SPECint_rate2006 or SPECjbb2005, so we have to look at SPECint_rate2000, which is just about to be EOL'ed and completely replaced by SPECint_rate2006.

First let's compare an IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) to a HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread, single core/CPU) on SPECint_rate2000.

Constructing a SPECint_rate2000 ratio
1.4x = 1513/1108
we find that the IBM 595 is 1.4x faster, which makes sense because this isn't the latest HP dual-core Itanium2. Both the IBM and HP systems have results on both TPC-C and SPECint_rate2000.

OK, now using TPC-C, let's compare an IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) to a HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread, single core/CPU).

Constructing a TPC-C ratio
3.3x = 4033378/1231433
What?
Comparing the same systems, the IBM is 3.3x faster?!? It looks like TPC-C over-inflates what can be expected from IBM.
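
As a quick check of the arithmetic, here is a minimal Python sketch that uses only the published figures listed in the disclosure statements. The names are mine, and the "inflation" number is simply one ratio divided by the other, not any official metric.

```python
# Published results quoted in this post (see Disclosure Statements).
SPECINT_RATE2000 = {"IBM p5 595": 1513, "HP Superdome": 1108}
TPMC = {"IBM p5 595": 4_033_378, "HP Superdome": 1_231_433}


def advantage(results, a="IBM p5 595", b="HP Superdome"):
    """Return how many times higher system a scores than system b."""
    return results[a] / results[b]


spec_ratio = advantage(SPECINT_RATE2000)
tpcc_ratio = advantage(TPMC)
# How much larger the IBM advantage looks on TPC-C than on SPECint_rate2000.
inflation = tpcc_ratio / spec_ratio

print(f"SPECint_rate2000 ratio: {spec_ratio:.1f}x")  # 1.4x
print(f"TPC-C ratio: {tpcc_ratio:.1f}x")  # 3.3x
print(f"TPC-C advantage relative to SPECint_rate2000: {inflation:.1f}x")  # 2.4x
```

In other words, the IBM lead on TPC-C is roughly 2.4 times larger than its lead on SPECint_rate2000 for the same pair of systems.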

My guess is that IBM over-optimised and played lots of tuning tricks on TPC-C, correct? So is TPC-C relevant to customers if this is the case?

...maybe that's why seven years ago Sun, upon publishing a world record TPC-C result said:

"It's well-understood in the technical communities that TPC-C no longer represents current customer workloads since the transaction load that its models are made of are small, primitive and disconnected transactions. While this model was acceptable for the workloads of the late 1980s, it misses the mark..."
http://www.sun.com/smi/Press/sunflash/2000-08/sunflash.20000831.1.html

You'll also notice the Aug 2000 press release said, "Customer workloads nowadays require a more ad hoc workload than the TPC-C specifies."

Disclosure Statements

IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) 4,033,378 tpmC, 2.97 US $/tpmC, Avail 01/22/07, IBM DB2 9, IBM AIX 5L V5.3, Microsoft COM+. HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread), 1,231,433 tpmC, 4.82 US $/tpmC, Avail 06/05/06, Microsoft SQL Server 2005 Enterprise Edt SP1, Microsoft Windows Server 2003 Datacenter Ed.(64-bit)SP1. Results as of 2/15/07, see http://www.tpc.org.

IBM System p5 595 (Power5+ 2.3GHz 64p, 128thread), 64 cores, 32 chips, 2 cores/chip (SMT on), 1513 SPECint_rate2000. HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread, 16 cells), 64 cores, 64 chips, 1 core/chip, 1108 SPECint_rate2000. SPEC, SPECint and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Results from http://www.spec.org as of 2/15/07.

The world record TPC-C result referenced above was an overall performance world record as of August 31, 2000. Sun Enterprise 10000 server (Starfire) running Sybase Adaptive Server Enterprise (ASE), 156,873.03 tpmC, $48.81 price/tpmC, available February 28, 2001. A full disclosure report and executive summary are available through the TPC Web site located at http://www.tpc.org.

Comments:

Any benchmark ultimately benchmarks the slowest component, which is not always the processor. So what is TPC-C benchmarking?

TPC-C ultimately is a data access latency benchmark, rather than a data processing benchmark. How fast can you get data into the CPU to process it? At one time, the bottleneck for TPC-C was disk access. Then fibre-channel made it possible to connect thousands of disks to a server. Then, especially once 64-bit databases were available, the bottleneck was memory capacity. Now with multiple terabytes of RAM in a system, the latency being measured has come down to the processor cache.

It is interesting that the best-performing TPC-C processor is POWER5, with a large (36MB) on-module L3 cache. The second best-performing TPC-C processor is Intel Itanium Montecito, with a large (24MB) on-chip L3 cache. The third best-performing TPC-C processor is the Intel Xeon 7100, with a large (16MB) on-chip L3 cache.

These big, fast caches seem to benefit the TPC-C workload because all of the other bottlenecks have been eliminated for this one particular benchmark workload. But what happens when your workload is random enough that the big, fast cache does not help?

That is the problem with TPC-C today. When you put over 6,500 disk drives behind one server, and fill that server with 2TB of RAM, you are left benchmarking something (cache latency) which could have been benchmarked using lmbench for 1/10th of the cost.

I would guess someone could come up with a mathematical formula which could estimate TPC-C performance to within 10% just by taking SPECint data, lmbench data, and total memory capacity for a system.

Posted by Mark on February 20, 2007 at 12:13 PM PST #

Regarding the TPCC results above, the number of hard drives (6760 for IBM and about 1700 for HP) and the RAM size (2TB vs 1TB) are so different that it's almost certain the CPU was NOT the limiting factor for the Superdome (but likely was for the p595).

Posted by Igor on February 20, 2007 at 03:46 PM PST #
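
To put rough numbers on the point above, here is a small Python sketch that takes the drive counts and RAM sizes quoted in that comment at face value. The HP drive count is only "about 1700", so the per-drive figures are approximate, and the dictionary layout and names are purely illustrative.

```python
# Figures as quoted in the comment above; the HP drive count is approximate.
systems = {
    "IBM p5 595": {"tpmC": 4_033_378, "drives": 6760, "ram_tb": 2},
    "HP Superdome": {"tpmC": 1_231_433, "drives": 1700, "ram_tb": 1},
}

for name, s in systems.items():
    per_drive = s["tpmC"] / s["drives"]
    per_tb = s["tpmC"] / s["ram_tb"]
    print(f"{name}: {per_drive:.0f} tpmC per drive, {per_tb:,.0f} tpmC per TB of RAM")

ibm, hp = systems["IBM p5 595"], systems["HP Superdome"]
drive_ratio = ibm["drives"] / hp["drives"]
tpmc_ratio = ibm["tpmC"] / hp["tpmC"]
print(f"Drive ratio {drive_ratio:.1f}x vs tpmC ratio {tpmc_ratio:.1f}x")
```

On these figures the IBM configuration has roughly 4x the drives for 3.3x the throughput, so the difference in storage alone is at least as large as the difference in the headline result.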

There are a lot of factors of course, but take a look at TPC-C results over time and there are things which just look like major software optimisations. Remember, TPC-C is a simplistic old workload in terms of database #transactions and #tables and other things. I'm not buying it... Did IBM 'dhrystone' TPC-C (to create a new verb)?

Posted by BM Seer on February 20, 2007 at 10:24 PM PST #

Obviously IBM and Oracle and Microsoft (not so sure about Sybase ASE) do sw optimization work all the time (at least they should - I didn't work for any of them). However, if you analyze more or less recent TPCC results, I'm almost sure you'll see a high correlation between the # of hard drives (after multiplying the # of 10K rpm drives by some appropriate adjustment factor - like 0.8 or so - to convert into an equivalent number of 15K rpm drives) and TPCC. As to CPU differences, SPECint_rate2000 may be much less sensitive to cache size than TPCC; and the IBM p595 has a VERY large L3 cache (128MB per MCM if I'm not mistaken), which would explain its higher TPCC results relative to SPECint. If you look at published research (by Ailamaki etc), cache stalls seem to be a really important performance factor for TPCC-like workloads. I'm also willing to risk $10 (to put my money where my mouth is) that the correlation between the # of hard drives and TPCC will be way higher than the one between TPCC and SPECint_rate2000.

Posted by Igor on February 21, 2007 at 08:34 AM PST #

Your blogging sw repeatedly marked my detailed reply as spam. Such a nice way for a blog owner to have the last word in a technical discussion.

Posted by Igor on February 21, 2007 at 08:44 AM PST #

I don't know why you are registering as spam. I went in and allowed all of the non-resubmission comments, of course.

Posted by BM Seer on February 21, 2007 at 11:50 AM PST #

Repeatedly previewing a comment will result in the spamfilter kicking in. I think the spamfilter interprets this as a robot trying to guess the math question.

Posted by Mark on February 23, 2007 at 11:28 AM PST #
