BM Seer Unofficial thoughts from an anonymous Sun employee

IBM power6 chip already 3rd fastest

Friday Jun 15, 2007

IBM's statement is no longer true. Three weeks ago an IBM press release proudly announced, "IBM Unleashes World's Fastest Chip in Powerful New Computer". Now the 1.4 GHz UltraSPARC T1 chip is about 10% faster than the IBM POWER6 chip on SPECjbb2005.

The IBM p570 (POWER6, 4.7 GHz, 1 chip / 2 cores / 4 threads) scored 88,089 SPECjbb2005 bops. It is also very clear that you cannot compare system performance on a per-core basis. You'd have to compare by system price (go ahead and price out a 4RU p570 and a 2RU T2000) or by wattage. And as you can see, IBM's higher clock speed and fewer, very expensive cores end up delivering slower system performance.

SPECjbb2005 (ordered by perf, bops : SPECjbb2005 Business Operations per Second, bigger is better)

System              Date   Processors                GHz / Type    SPECjbb2005   JVMs   SPECjbb2005
                           (Chips, Cores, Threads)                 bops                 bops/JVM
Dell PowerEdge 860  1/07   (1, 4, 4)                 2.4 Xeon      112092        1      112092
Sun Blade T6300     6/07   (1, 8, 32)                1.4 US-T1     96523         4      24131
IBM p570            6/07   (1, 2, 4)                 4.7 POWER6    88089         1      88089
Fujitsu TX150       6/07   (1, 2, 2)                 2.66 Xeon     70324         1      70324
Dell PowerEdge 840  10/06  (1, 2, 2)                 2.66 Xeon     52002         1      52002
Fujitsu RX100       10/06  (1, 2, 2)                 2.4 Xeon      49892         1      49892

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).
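As a rough illustration of how the two metrics relate, the sketch below recomputes bops/JVM from the published totals for the top three results and works out the gap between the T6300 and the p570. The tuple layout and names are made up for this example, and rounding to the nearest whole number is an assumption; only the figures come from the results listed here.

```python
# Top three single-chip results from this post: (system, total bops, JVM count).
results = [
    ("Dell PowerEdge 860", 112092, 1),
    ("Sun Blade T6300", 96523, 4),
    ("IBM p570", 88089, 1),
]

def bops_per_jvm(bops, jvms):
    """SPECjbb2005 bops/JVM: total bops divided by the number of JVM instances."""
    # Rounding to the nearest integer is an assumption made for this sketch.
    return round(bops / jvms)

per_jvm = {name: bops_per_jvm(bops, jvms) for name, bops, jvms in results}

# Relative advantage of the T6300 over the p570 in total bops, as a percentage.
t6300_vs_p570_pct = (96523 / 88089 - 1) * 100

for name, value in per_jvm.items():
    print(f"{name}: {value} bops/JVM")
print(f"T6300 vs p570: {t6300_vs_p570_pct:.1f}% more bops")
```

The total is what ranks the systems in the table; bops/JVM only shows how that total was split across JVM instances.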

Disclosure Statement:

SPECjbb2005 Sun Blade T6300 (1 chip, 8 cores) 96523 SPECjbb2005 bops, 24131 SPECjbb2005 bops/JVM, IBM p570 (1 chip, 2 cores) 88089 SPECjbb2005 bops, 88089 SPECjbb2005 bops/JVM, Fujitsu TX150 (1 chip, 2 cores) 70324 SPECjbb2005 bops, 70324 SPECjbb2005 bops/JVM, Dell PowerEdge 860 (1 chip, 4 cores) 112092 SPECjbb2005 bops, 112092 SPECjbb2005 bops/JVM, Dell PowerEdge 840 (1 chip, 2 cores) 52052 SPECjbb2005 bops, 52052 SPECjbb2005 bops/JVM, Fujitsu RX100 (1 chip, 2 cores) 49892 SPECjbb2005 bops, 49892 SPECjbb2005 bops/JVM. IBM p570 Power6 (1 chip, 2 cores, 4 threads) 88089 SPECjbb2005 bops, 88089 SPECjbb2005 bops/JVM. SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 6/14/2007 on http://www.spec.org.
Results
Sun Blade T6300: 96523 SPECjbb2005 bops
24131 SPECjbb2005 bops/JVM
Reference Date: June 6, 2007
Systems: Sun Blade T6300, 32GB
Total Number Processors: 1
Processor/GHz of Server: US-T1 1.4 GHz
Operating System: Solaris 10 8/07
JVM: Java HotSpot(TM) 32-Bit Server, Version 1.6.0_02

See Also

Sun Press Release

[10] Comments
Comments:

Not good that the quad 2.4GHz Intel beats the 1.4GHz T1 :( as there's a quad 3.0GHz out now :(

Also, has Solaris 10 update 4 slipped back to August? Really wanted to start using iSCSI off our NAS boxes :(

Posted by kangcool on June 15, 2007 at 07:51 AM PDT #

In the table above, the Dell 860 is a 4-core, single-chip system, not 2-core. And it does not do 'hyperthreading', so the threads remain at 4.

All of that said, the Dell 860 starts at $1700 for the config tested vs $41,000 for the Sun. Yes, the dell has less memory, but for an app server, 32GB is overkill (you can't purchase the 1.4GHz chip with less than that).

The sun box also draws less power, but does it pull $38,000 worth of less power? They both take the same rack space (1U dell vs 10 in 10U for the sun blade)

Posted by John on June 15, 2007 at 08:20 AM PDT #

The number-of-JVMs column leaves one wondering what the single-JVM performance of the T6300 might be, what with the T6300 using four of them and all the other results using only one. Similarly, one is left wondering about the "bitness" (32 vs 64) of the competitive numbers, given that the T6300 was using 32-bit JVMs.

Posted by rick jones on June 15, 2007 at 03:41 PM PDT #

Rick, you are fishing for a red herring if you think single-JVM performance of the US-T1 is relevant. Java, and Java apps such as the SPECjbb2005 benchmark workload, scale according to thread count, not chip or socket count.

The US-T1 is already running 8 threads per JVM, twice as many as any other single-chip result.

A single JVM result would be like running a single JVM SPECjbb2005 on the 32-thread POWER6 IBM p570. IBM ran 8 JVMs on that particular configuration.

As for JVM bitness, that is available in the full results at spec.org. The Xeons are all running JRockit, which is 64-bit. However, given the relatively small maximum heap sizes configured, and the size of the maximum heap relative to total installed memory, it appears they are effectively operating like a 32-bit JVM, using parallel garbage collection.

In other words, another red herring.

John, I disagree the Dell 860 is a single-chip system. It is a single socket system, but the Intel Clovertown is two Woodcrest chips on a single module, similar to how IBM puts as many as four POWER4 or POWER5 chips on a single module in their large servers. I do not recall IBM announcing POWER4 in 2001 as an 8-core chip, nor did they do that with POWER5.

Personally, if a saw cuts two adjacent cores on a silicon platter apart first, putting them onto the same module later does not make them a chip again. Also, both chips connect separately to the FSB. There is no bus arbitration like on all other multi-core chips. Oh, and Oracle and IBM both charge twice as much for Clovertown software licenses as they do for Woodcrest or Opteron.

I have no doubt when AMD releases its 8-core MCM Montreal processor, Intel will be saying it is not really an 8-core chip. Until Intel releases its 8-core module version of Nehalem.

And speaking of app servers, Intel Woodcrest and Clovertown do not seem to have ANY entries on the SPECjAppServer2004 benchmark. I wonder why that is? Maybe disabling the prefetch in the BIOS (something done on all Core2-based SPECjbb results except for AMD's hostile submission) hurts the network performance required in the SPECjAppServer benchmark. Running on default BIOS settings with the prefetch enabled lowers Core2 SPECjbb performance by 20%. Oh, and regarding US-T1 prices, you can get 1.0 GHz T1000s starting at $3,995.

kangcool, the 3.0 GHz Clovertown is a desktop only product for now. We probably won't see Xeons exceed 3.0 GHz until Intel's 45nm Penryn Xeons ship.

Posted by Mark on June 15, 2007 at 06:14 PM PDT #

The intel game of using two chips on a single socket vs 4 cores on a single chip is an implementation detail. As the user, I couldn't care less how they did it, just as long as it performs at a reasonable cost. (At least you updated the core count for the dell.)

As for the main point, yeah, a T1000 with a cost of $3,995 would only be 2x as expensive as the dell. You neglected to factor in the massive drop in speed: an anticipated SPEC result of 68,945 (assuming performance scales perfectly with GHz). So great, sun is willing to sell me something that costs 2x as much and runs at little more than half the speed? Yeah, the T1 is a great chip for some applications, but you won't last long in my shop with math like that. Overall, the pricing needs to be brought in line with the rest of the market. Simply offering the faster chips in boxes with less than 32G of ram would help. The anniversary special was nice, I picked up 3 T2000's for the cost of 1 during that time. Suddenly it was more than cost competitive.
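The arithmetic in the paragraph above can be sketched roughly as follows. It takes the published 1.4 GHz result and the prices quoted in this thread, and assumes throughput scales linearly with clock speed, which is the commenter's simplification rather than a measured result. The function and constant names are made up for this example.

```python
# Published results and the prices quoted in this thread (USD).
T1_BOPS_AT_1_4_GHZ = 96523
DELL_860_BOPS = 112092
DELL_860_PRICE = 1700
T1000_PRICE = 3995

def scale_by_clock(bops, from_ghz, to_ghz):
    """Estimate throughput at another clock speed, assuming linear scaling."""
    return bops * to_ghz / from_ghz

# Hypothetical 1.0 GHz T1 result; no such measurement appears in the post.
t1000_estimate = scale_by_clock(T1_BOPS_AT_1_4_GHZ, 1.4, 1.0)

dell_bops_per_dollar = DELL_860_BOPS / DELL_860_PRICE
t1000_bops_per_dollar = t1000_estimate / T1000_PRICE

print(f"Estimated 1.0 GHz T1 result: {t1000_estimate:,.0f} bops")
print(f"Speed relative to the Dell 860: {t1000_estimate / DELL_860_BOPS:.2f}x")
print(f"Price relative to the Dell 860: {T1000_PRICE / DELL_860_PRICE:.2f}x")
print(f"bops per dollar: Dell {dell_bops_per_dollar:.1f}, T1000 {t1000_bops_per_dollar:.1f}")
```

This only compares list price against one benchmark number; memory size, power draw and software licensing are left out.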

As for the app server numbers... If network performance suffers as you say, since the dell is almost 2x as fast, even with the 20% network hit I'm ahead of the game thanks to the much slower T1 chip. I get more performance at half the cost. What's my motivation again? oh yeah, bang for the buck.

The argument would hold better if you pointed out the differences in JVMs (JRockit vs the sun jvm used by most app servers) and stated that, using the same jvm, the sun looks better. Of course, in order to do that I would need to see how the two jvms compare with each other on the same hardware. Those numbers seem to be lacking for some reason.

As for the oracle licenses, did you really want to open that can of worms? Yeah, they charge 2x as much for the quad intel because they charge per core. An 8 core T1 chip needs a 2 cpu license (0.25 factor). A 2 core x86-64 chip needs 1 license while a 4 core one needs 2 (0.5 factor), and a dual processor USIV+ (4 cores) needs 3 licenses (0.75 factor). (I love this new math.) So, as you can see, a 4 core Intel chip would cost the same to license as an 8 core Sun T1 based system, but only 2/3 that of a 4 core Sun IV+ or POWER6. Now the question is this: is the sun/ibm chip faster per core? Not from the benchmarks I've seen, where the dual proc / 4 core x4200 M2 amd box more than keeps up with a 4 proc / 8 core v490 (see the TPC-H 100GB benchmark, because that db is the size I usually use).
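The "new math" above can be written down as a short sketch. The core factors are the ones quoted in this comment, the dictionary keys and function name are made up for the example, and rounding fractional results up to a whole license is an assumption about how the count is finalized.

```python
import math

# Core factors as quoted in the comment above (2007-era figures).
CORE_FACTOR = {
    "UltraSPARC T1": 0.25,
    "x86-64": 0.5,
    "UltraSPARC IV+": 0.75,
}

def ee_licenses(cores, cpu_type):
    """Processor licenses needed: cores times the core factor, rounded up."""
    return math.ceil(cores * CORE_FACTOR[cpu_type])

examples = [
    ("8-core T1", 8, "UltraSPARC T1"),
    ("4-core x86-64", 4, "x86-64"),
    ("dual-processor USIV+ (4 cores)", 4, "UltraSPARC IV+"),
]
for label, cores, cpu_type in examples:
    print(f"{label}: {ee_licenses(cores, cpu_type)} licenses")
```

With these factors the quad-core x86 chip and the 8-core T1 land on the same license count, which is the comparison the comment draws.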

This is why paying attention to per core (not socket) performance can impact things.

Now, you want to talk about benchmarking pet peeves... here is mine: Ever notice how everybody (sun included) only counts a 3 year 'named user plus' oracle license in their TPC benchmarks? Yup, it is because it helps minimize the impact of all of those extra cores on the overall system cost.

In sun's TPC-H benchmark with the E25K, only about $1,000,00 is for the oracle software w/ 3 years of support. Wow, that is cheap, how does sun do it? Simple, they price out the benchmarking special: 3 year named user plus (along with the 3510 disk arrays that they love to benchmark but hate to sell since they are made by DotHill). With 144 cores and 108 oracle licenses needed, a normal business using perpetual processor licenses would pay almost $4,000,000 for that same system (w/ 3 years support and factoring in the standard volume discount).

OMG, that kinda screws with the numbers. Sun's TCO numbers suddenly double.... Yes, double. That 64 core system that posts better numbers _per core_ and costs more on paper (since the hardware is more expensive) is actually cheaper in reality due to the software license cost (which got sidestepped with the named user license).

Back to a previous post of mine from a few days ago, why do you think we are moving from SPARC to Opteron? Yup, we are trying to avoid having to purchase more oracle licenses. Per core performance matters, since that is how the software is being licensed.

Anyway, that is enough ranting about the fun with benchmarks. Lies, damn lies and benchmarks.

Posted by John on June 15, 2007 at 09:38 PM PDT #

> for the quad intel because they charge per core.

You have old info. Now only the EE edition charges per core; the Standard and Standard One editions charge per socket on x86 hardware. A 4 core x86-64 chip needs only one Standard One licence ($5K).

Posted by Triffids on June 16, 2007 at 04:45 AM PDT #

In addition, for all programs with Standard Edition or Standard Edition One in the program name, Oracle recognizes a socket as equivalent to a processor for the purposes of counting and licensing these programs. The number of cores on a chip in a socket does not matter when determining the number of processor licenses required for these programs.

http://www.oracle.com/corporate/pricing/multicore_faq.pdf

An 8 core T1 chip needs one SE licence, the same as a 4-core Xeon. According to SAP-SD benchmarks: 4780 SAPS vs 4400 SAPS.

Posted by Triffids on June 16, 2007 at 05:09 AM PDT #

Regarding cores, chips, modules etc., what did those who consider Clovertown to be a four core processor think of HP's mx2 daughtercard solution to fit two Itanium chips into a single socket? This was a daughtercard with an Itanium pinout on the bottom, and two Itanium sockets on top. It added an L4 cache and a bus arbitrator. Is this a chip? Is IBM's large MCM a chip?

I agree cores should count, but I don't think ISVs should penalize a design decision to implement 8 small cores vs. 2 big cores if the two chips produce equivalent application performance. Oracle's core multiplier and IBM's weighted system are a step in the right direction, but I feel Oracle's decision to base SE licensing on sockets is flawed. We may end up with Intel and AMD designing processor modules specifically around Oracle licensing dynamics. For that matter Sun could put two Niagara2 processors onto a module and cram four modules into a server to create a 64 core 512 thread Oracle SE system. But it is nuts to design a system for a particular application's license requirement, rather than many applications' performance requirements.

On my point about SPECjAppServer, I stand corrected. A Chinese OEM has released a SPECjAppServer2004 benchmark using BEA WebLogic on Woodcrest chips, and the performance is very good.

Posted by Mark on June 16, 2007 at 09:36 AM PDT #

Ok, just so it is all in one place, from http://store.oracle.com, oracle prices are as follows (all list):
Standard Edition One: $5k per socket (limit 2 sockets, no RAC)
Standard Edition: $15K per socket (limit 4 sockets, rac included. If using rac you may not exceed 4 sockets in the total cluster)
Enterprise Edition: $40K per processor (with multicore multiplier of .25, .5 or .75, depending on the type of cpu)

Limits are based on the 'maximum system' capacity. Thus if you have an x4600, you must run EE on that system, even if it only has 2 processor boards on it.
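To make the price list above concrete, here is a minimal sketch that turns it into two small functions. The list prices and socket limits are the ones quoted in this comment; the function names are invented for the example, and two details are assumptions: that the per-socket editions are billed on populated sockets, and that fractional Enterprise Edition counts round up to a whole license.

```python
import math

# List prices quoted in the comment above (USD).
EE_PRICE_PER_PROCESSOR = 40_000
PER_SOCKET = {
    # edition: (price per socket, maximum system socket capacity)
    "SE One": (5_000, 2),
    "SE": (15_000, 4),
}

def se_list_price(edition, sockets, max_sockets):
    """Per-socket editions; the limit applies to the system's maximum capacity."""
    price, limit = PER_SOCKET[edition]
    if max_sockets > limit:
        raise ValueError(f"{edition} not allowed: capacity exceeds {limit} sockets")
    # Assumption: billing is per populated socket.
    return price * sockets

def ee_list_price(cores, core_factor):
    """Enterprise Edition: cores times the multicore factor, rounded up."""
    return EE_PRICE_PER_PROCESSOR * math.ceil(cores * core_factor)

# One quad-core x86 socket: Standard Edition One vs Enterprise Edition (0.5 factor).
print(se_list_price("SE One", sockets=1, max_sockets=1))
print(ee_list_price(cores=4, core_factor=0.5))

# A 144-core system at a 0.75 factor, as in the E25K discussion earlier in the thread.
print(ee_list_price(cores=144, core_factor=0.75))

# Two populated sockets in an 8-socket chassis: SE is ruled out by capacity.
try:
    se_list_price("SE", sockets=2, max_sockets=8)
except ValueError as err:
    print(err)
```

These are list figures only; support contracts and volume discounts are not modelled.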

I'll skip over the named user costs since the way they count users makes it unusable for most shops.

I used the Oracle EE edition rather than SE as an example in the core vs socket vs performance of the 'system' vs performance per 'core' because that is what most large corporations run, and that is the software used for all of the TPC benchmarks. It also shows the issue of dealing with software that costs a lot more than the system. If oracle only ran $1k per core, nobody would care, since the cost of an e25k with its 144 cores would make the oracle cost a minor fraction of the total system cost. But at 40K per core, it makes the server look cheap. Thus, folks will actually pay more per core for a higher performing core in order to reduce the total number of cores that they need to license. For most folks, it's all about performance per oracle cpu license required.

As for SE: you can't run oracle SE on a system that uses more than 4 sockets (or a rac cluster that has more than 4 sockets total), so it will not scale very far. And some apps require the EE features, thus forcing you to the higher cost product. If SE or SE One works for your environment, then by all means save your money (larry does not need another yacht) and go for it. The licensing will now influence the server that you purchase since you can now go for a higher core count without bumping into the oracle tax.

In a way, I feel bad for oracle: how else do they charge for their product and not create all sorts of loopholes? Per core was about the only way to do it for the time being. They could have brought back the power unit, but that was killed by folks who wanted to upgrade their cpu's and not have to pay oracle for the extra MHz (especially since it was during the day when intel was moving from 500MHz to 2.5GHz in rapid order while at the same time showing that GHz != performance).

On the other hand, I don't feel bad for oracle since their support has gone downhill over the past 6 years (can I please get somebody who speaks English?), the product quality is slipping (some products will not even install due to installer bugs... how did that get past QA?), and the cost is still through the roof. Suddenly postgres is looking really good, if only my app vendors would support it.

There is no easy way to craft a license policy that maintains the high cost but does not result in folks playing games to try and avoid that high cost. Maybe we will see them come out with something similar to the old veritas 'tier' system (which had its own issues. It classified a T2000 as a '1c' along with a v490 that cost 3x as much). Maybe they will just charge 50-100% of the list price of the server. Smaller servers have a higher % (like 100), while bigger boxes get a "volume discount" by using a smaller % (like 50). Base the cutoffs on the total number of cores in the box (1-32 for 100%, 32-128 for 75% and 128+ for 50%)
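The percentage-of-server-price idea floated above could look something like this sketch. It is purely a rendering of the commenter's hypothetical scheme, not any vendor's policy. The tier boundaries overlap as written (32 and 128 appear in two tiers), so this version assumes a boundary value falls into the lower-core, higher-percentage tier. The function name is invented for the example.

```python
def proposed_license_cost(server_list_price, total_cores):
    """License cost as a percentage of server list price, tiered by core count."""
    if total_cores <= 32:
        pct = 100   # small servers pay the full list price of the box
    elif total_cores <= 128:
        pct = 75
    else:
        pct = 50    # the largest boxes get the "volume discount"
    return server_list_price * pct / 100

# The $41,000 figure is the T6300 price quoted earlier in the thread;
# the $1,000,000 server prices are hypothetical round numbers.
print(proposed_license_cost(41_000, total_cores=8))
print(proposed_license_cost(1_000_000, total_cores=64))
print(proposed_license_cost(1_000_000, total_cores=144))
```

Under a scheme like this the license fee tracks what the hardware costs rather than how many cores or sockets it has, which is the loophole the comment is trying to close.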

One thing is clear: the cost numbers tossed out via the TPC benchmarks are worthless, and you have to take the numbers, mold them to fit your environment, and then make the decision. Going with the 'world record price/performance' benchmark result will rarely provide you with the best bang for the buck.

Posted by john on June 16, 2007 at 12:20 PM PDT #

I don't eat herring :) Still, when this blog is linked-to as a "chip to chip" comparison, seems that it is reasonable to ask about the number of JVM's needed per "chip."

Posted by rick jones on June 20, 2007 at 08:54 AM PDT #
