BM Seer Unofficial thoughts from an anonymous Sun employee

UltraSPARC T2, and Old UltraSPARC T1 world records & new Xeon's

Thursday Aug 09, 2007

Postscript:

Be careful when comparing performance results. As an example, look at a comment on yesterday's "Can I use 64 threads in a chip?" posting. At least this comment pointed out that you can use 4-8 threads in 2-chip Intel-based systems, but it was really trying to be a stab at UltraSPARC performance. Here was the comment: One really needs to look at the complete data in those .pdf's to make a fair comparison (also in the disclosure statement below).

  • First: The T2000 SAP-SD result used a 1.2GHz UltraSPARC T1. Sun now ships a faster 1.4GHz UltraSPARC T1 and has announced the 1.4GHz UltraSPARC T2. The 1.4GHz T2 has double the threads of that 1.4GHz T1 (double the computational power).
  • Second: The T2000 SAP-SD result was submitted in Dec 2005; at that time it was near the performance of the expensive 4-way POWER5 IBM p550.
  • Third: The 2-chip Dual-core Xeon SAP-SD result above was submitted 18 months after the T2000 SAP-SD result.
  • Fourth: Different versions of the benchmark. The 2-chip Dual-core Xeon was run with ECC 6.0 (not ECC 5.0). The newer version of the benchmark takes more computational work to produce the same results. The Dual-core SAP-SD result was also run with Solaris 10 on Xeon, how cool is that!
  • Fifth: The 2-chip quad-core Xeon SAP-SD result above was submitted 19 months after the T2000 SAP-SD result.
  • Sixth: The Sun result used the open-source MySQL MaxDB database, how cool is that! The Xeon results used Oracle or Microsoft SQL Server.
      postscript:
      Sun later used Oracle. Others suggested the US T1 has some sort of silly database limitation - NOT TRUE!

    You'll see more results soon.

    Triffids, as a reminder: if you work for a partner company of SAP, you must include the following disclosures when you post results. If you do not, then you don't need to put this in, but as you can see, the data in it would have allowed you to make a better comparison of systems. Don't worry, I'm not asking you to identify yourself at all.

    Disclosure Statement:

    Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark Sun Fire T2000 (1-way, 1 proc, 8 cores, 32 threads) 1x 1.2 GHz UltraSPARC T1, 32 GB mem, 950 SD benchmark users, 1.91 sec avg response time, Cert#2005047, MaxDB 7.5 database, Solaris 10; Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark IBM System eServer p5 550 (4-way, 4 procs, 4 cores, 8 threads) 4x 1.9 GHz POWER5+, 32GB mem, 1,000 SD benchmark users, 1.97s avg resp time, Cert#2005040, IBM DB2 Universal Database 8.2.2, SuSE Linux Enterprise Server 9; Two-tier SAP ECC 6.0 Standard Sales and Distribution (SD) benchmark Fujitsu Siemens Computers PRIMERGY Model BFi20 S2 (2 procs, 4 cores, 4 threads) 2x Intel Xeon 5160, 3.0 GHz, 16GB mem, 1,020 SD benchmark users, 1.94s avg resp time, Cert#2007031, Oracle 10g, Solaris 10; Two-tier SAP ECC 6.0 Standard Sales and Distribution (SD) benchmark Fujitsu Siemens Computers PRIMERGY Model TX300 S3 (2 procs, 8 cores, 8 threads) 2x Quad-Core Intel Xeon Processor X5355 2.66 GHz, 32GB mem, 1865 SD benchmark users, 1.99s avg resp time, Cert#2007025, SQL Server 2005, Windows Server 2003 Enterprise Edition; SAP, R/3, mySAP reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark.

    I edited in:
    2 processors into Quad-Core Intel Xeon Processor X5355 2.66 GHz

    ...and..

    32 threads to the Sun Fire T2000, 1 processor / 8 cores ...in order to make the comparisons more consistent.
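    To make the "complete data" point concrete, here is a rough sketch that normalizes the SD benchmark user counts from the disclosure statement by chip, core, and thread. The class and function names are made up for illustration, the X5355 system is counted as 2 chips (following the "2 procs" figure), and the ECC version is carried along only as a warning flag: ECC 5.0 and ECC 6.0 users are not directly comparable, so treat the ratios as a reading aid, not a ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SdResult:
    """One SAP-SD two-tier result, as quoted in the disclosure statement."""
    system: str
    chips: int
    cores: int
    threads: int
    users: int
    ecc_version: str  # "5.0" and "6.0" runs are not directly comparable


# Figures copied from the disclosure statement in this post.
RESULTS = [
    SdResult("Sun Fire T2000, UltraSPARC T1 1.2 GHz", 1, 8, 32, 950, "5.0"),
    SdResult("IBM p5 550, POWER5+ 1.9 GHz", 4, 4, 8, 1000, "5.0"),
    SdResult("PRIMERGY, Xeon 5160 3.0 GHz", 2, 4, 4, 1020, "6.0"),
    SdResult("PRIMERGY TX300 S3, Xeon X5355 2.66 GHz", 2, 8, 8, 1865, "6.0"),
]


def users_per(result: SdResult, unit: str) -> float:
    """SD benchmark users divided by the chosen unit: chips, cores or threads."""
    if unit not in ("chips", "cores", "threads"):
        raise ValueError(f"unknown unit: {unit}")
    return result.users / getattr(result, unit)


if __name__ == "__main__":
    for r in RESULTS:
        print(
            f"{r.system:42s} ECC {r.ecc_version}  "
            f"per chip: {users_per(r, 'chips'):7.1f}  "
            f"per core: {users_per(r, 'cores'):6.1f}  "
            f"per thread: {users_per(r, 'threads'):6.1f}"
        )
```

    Which column looks best depends on the unit you pick, which is exactly why the full disclosure matters.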

  • Comments:

    >First
    Intel already ships X5355 2.66 GHz and has announced 80-core chip :)

    >Second
    Sun's minus: Intel and AMD gained 2x the performance in these 20 months, and Sun added only +200MHz to the T1 chip ...

    >Third+Fifth
    Has something changed in the T2000 architecture in these 18-19 months? I don't think +200MHz on the T1 chip or a newer I/O subsystem can compensate for a 2x performance penalty.

    >Fourth
    Agreed.

    >Sixth
    And why are they doing this? Why do they use Unicode and MaxDB in SAP benchmarks, Sybase IQ in TPC-H?
    My version - they can't produce a performance record on "traditional" software and try to compete on TCO.

    P.S. I'm not with a partner company of SAP, Oracle, IBM, HP, etc

    Posted by Triffids on August 09, 2007 at 01:16 PM PDT #

    It strikes me that benchmarks for computers are important but typically very misleading.

    The "hottest" issue in large scale computing is the effective use of electricity.

    Intel pushed the clock to greater and greater speeds and effectively forced their competitors to compete on a single identifiable metric: clock speed. Along the way many other chip fabs just switched off commodity computing chips to other chip markets.

    And Intel got Apple to re-consider PowerPC and sign up. Sun added AMD to its product suite and recently added Intel chips as well to its design portfolio.

    But some clever guys at Sun also considered a different design perspective: where's the bottleneck for work done per watt of power?

    If the throughput mismatch between CPU -> memory -> I/O (disk and network) is 1 to 10 to 100, then focus on concurrency where the greatest bottleneck exists: the CPU consumes instructions at 3+ GHz, and the memory sends more instructions while the CPU is idle... you pay the same energy bill for idle cycles that you do for work cycles.
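    A toy model of that idle-cycle argument, as a sketch only: assume each hardware thread computes for a fixed number of cycles and then stalls on memory for a fixed number of cycles, and that a core can switch to another ready thread at no cost. The function name and the 25/75 cycle split are invented for illustration; they are not measurements of any real chip. The 4 threads per core matches the T2000 figures quoted above (8 cores, 32 threads).

```python
def core_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles a core spends on useful work.

    Simplified interleaving model: each thread alternates between
    `compute_cycles` of work and `stall_cycles` of waiting on memory,
    and other threads fill the stall time until the core is saturated.
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, stall_cycles >= 0")
    busy_fraction_per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * busy_fraction_per_thread)


if __name__ == "__main__":
    # Hypothetical workload: 25 cycles of work, then a 75 cycle memory stall.
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s) per core -> {core_utilization(n, 25, 75):.0%} busy")
```

    In this model a single thread leaves the core idle three quarters of the time, while four threads keep it fully busy; real workloads will not be this tidy.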

    The T1 and T2 seek to integrate complete system designs (CPUs (cores), memory access (Level 2 crossbar), I/O (PCI-E), network (GigE) and context switching for processes (threads)) into a single piece of silicon.

    If the test is to get one person from LA to New York, the fastest running engine wins: the Porsche, for example.

    If the test is to move 500 people from LA to New York in one vehicle then the Greyhound bus will win.

    To read a benchmark carefully you need to determine if they win by using 500 Porsches or multiple trips with Greyhound buses. Tradeoffs are everything for applying benchmarks to your situation. Time, money and the laws of physics all apply here.

    The T2 is like a Greyhound in a single chip. Does it surprise you that Intel has pre-announced a chip with many cores? No. There's a market for 18 wheel trucks too... it's just not the same market the new Corvette is targeting. It's actually a larger market because the intent is not the thrill of speed... it's the efficiency of business operations.

    The T2 will open many eyes to ways to lower power costs and get more work done.

    I recently stood behind a rack of Intel dual core servers and the heat out the back was like a hair dryer... 24x7. But the owner wasn't drying hair. Just creating heat running an email complex.

    A rack of T2's blows air like a room fan... just a few degrees over the air coming in. But the effective work done by the two racks is comparable. The power and cooling bill is NOT.

    Benchmarks can indicate these benefits as well. But you need to look at what you are trying to measure.

    Work done per watt is like "ton of cargo moved per gallon". It's useful to consider if you run a shipping company. Just what exactly is the world of internet commerce shipping if not bits over the wire. Not bits moved in the truck.

    Just my thoughts on reaching the people that don't get why the T2 is important and why design still has multiple criteria beyond the performance of desktop games.

    I would like a Porsche for the weekends but not for getting boxes moved.

    Posted by McD on August 09, 2007 at 05:09 PM PDT #

    Triffids, glad you agree that these Xeon results were 18-19 months after UltraSPARC T1 and not a good comparison. I think you will be shocked how UltraSPARC T2 will beat X64 systems head-to-head, current-to-current. You may have to re-write your preconceptions. Though it was funny that it took TWO dual-core chips 18 months later to beat a ONE chip T1. OK, TWO quad-cores 19 months later is twice the performance of ONE chip T1.

    The UltraSPARC T2 has twice the computation threads of the UltraSPARC T1. Yes, it will be fun to see system comparisons and how UltraSPARC does in the marketplace, but given the recent announcement that Sun has the world's fastest chip and the great power/perf, and the fact that major customers (who can buy any chip) are buying racks of UltraSPARC T1, I think others {besides you of course :) } will be buying lots of UltraSPARC T2 to improve their datacenters.

    Just like the UltraSPARC T1 announcement, it will take a long time for competitors to try to respond. In fact, here is a blog entry from 6 months after the 1.2GHz US T1, before X64 or POWER6 could respond:
    http://blogs.sun.com/bmseer/entry/t2000_has_6_months_of

    I think at that point the industry couldn't believe that CMT really worked. Now many big customers are buying racks and racks of UltraSPARC T1 because of the perf/watt leadership over X64. Remember how everyone tried to hide the 500 watt draw of large memory Xeon systems?
    http://blogs.sun.com/bmseer/entry/obscursed_woodcrest_wattage
    Remember that quoting chip wattage or nominal watts doesn't help datacenter managers?
    http://blogs.sun.com/bmseer/entry/watts_a_matter_with_their

    As a final point, the UltraSPARC T1/T2 have no problem with any kind of software; results have been posted on a variety of benchmarks with Oracle, DB2, MySQL, Postgres, MySQL MaxDB, a variety of web servers, app servers... The UltraSPARC T1 can run any commercial software and get great performance (Web, Appl, and DB tiers). The UltraSPARC T2 adds industry-leading floating-point performance.

    Posted by BM Seer on August 10, 2007 at 08:53 AM PDT #

    >ok TWO quad-cores 19 months latter is twice the performance of ONE chip T1.

    NO :) Cheap x86 servers have already been twice as fast for 19 months ...
    Let's open the SAP-SD benchmarks:
    1 month later a 4 chip / 8 core Xeon got 5330 SAPS - it's already faster than the T2000 and much cheaper than the $32K for a T2000.
    2 months later 2 Opteron chips / 4 cores (PRIMERGY Model RX220) got 4370 SAPS. So on an x86 server with 16GB RAM which costs less than $12K I can get the same or nearly the same power as a T2000 for $32K.

    On the day the T2000 arrived in stores, customers were able to buy 3 SF v20z servers and build a 2.5-3 times faster system on the same $32K budget ...
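    The price argument in this comment is simple arithmetic, sketched below using only the figures the comment itself gives (4370 SAPS for under $12K, and $32K for a T2000). These are the commenter's numbers and are not verified here; the function names are made up for illustration. No SAPS figure for the T2000 appears in this thread, so the sketch only computes the break-even SAPS a $32K system would need.

```python
def dollars_per_saps(price_usd: float, saps: float) -> float:
    """Hardware price divided by benchmark throughput (lower is better)."""
    if saps <= 0:
        raise ValueError("saps must be positive")
    return price_usd / saps


def break_even_saps(price_usd: float, rival_price_usd: float, rival_saps: float) -> float:
    """SAPS a system must deliver to match a rival's dollars-per-SAPS."""
    return price_usd / dollars_per_saps(rival_price_usd, rival_saps)


if __name__ == "__main__":
    # Commenter's figures: 2-chip Opteron server, 4370 SAPS, under $12K.
    opteron = dollars_per_saps(12_000, 4370)
    print(f"Opteron server: at most ${opteron:.2f} per SAPS")

    # What a $32K system would need to score to match that ratio.
    needed = break_even_saps(32_000, 12_000, 4370)
    print(f"A $32K system needs about {needed:.0f} SAPS to break even")
```

    This looks at purchase price only; power, cooling, memory configuration and software licensing are left out.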

    >I think at that point the industry couldn't believe that CMT really worked.
    Of course the industry couldn't believe it; the industry sees prices, SAPS and many other benchmarks ;)
    http://www.anandtech.com/IT/showdoc.aspx?i=2727&p=7
    http://tweakers.net/reviews/649/8

    >You may have to re-write your preconceptions.
    I don't think that the T2 chip has any chance versus quad-core AMD Barcelona. Quad-core AMD servers will be available next month. T2 has to be more than 4 times faster than T1, it's impossible ..

    Posted by Triffids on August 10, 2007 at 12:03 PM PDT #

    We'll see...

    but customers do see the value of UltraSPARC T1 today! ...and they can buy anything, like you can, and they do carefully watch the bottom line. Remember, memory costs are really starting to dominate, so you have to look at equally sized systems:
    http://www.sun.com/customers/servers/betfair_eco.xml

    “Sun’s CoolThreads technology generates more CPU cycles while using less power,” says Devine. “Nothing like it has existed before. We replaced racks and racks of Dell servers with Sun Fire T1000 and T2000 servers. The result is more computing power in the same physical space and significantly less power—a double win for us.” Thanks largely to the savings of the Sun eco responsible servers, Betfair has achieved 200% usage growth - all within the same datacenter space.

    Posted by BM Seer on August 13, 2007 at 10:08 AM PDT #

    >reducing storage power and cooling costs by 60%
    OK, they saved $5 on power cost, but lose more than 400% on hardware cost, because "memory costs are really starting to dominate" ;)

    IMHO Sun must publish TPC-E/SAP-SD benchmarks with Oracle/DB2 to prove that T2 is suitable for RDBMS tasks ...

    Posted by Triffids on August 16, 2007 at 01:18 AM PDT #

    I'm still waiting for Oracle or DB2 to publish on TPC-E, where are those guys? Where are those Dell or HP results? Also where is that
    POWER6 result on TPC-E at 4 core, 8 core or 16 core? They seem to have no problem publishing the over-optimized TPC-C...

    If all of them have systems now they should be publishing.

    Posted by BM Seer on August 16, 2007 at 08:19 AM PDT #

    Uh-huh, and is there a Sun+Postgres result? :)
    I think the problem is in DB2. Check the first 2 results: they use isolation level Snapshot, not the isolation level Serializable that is native for MS SQL. I think DB2 doesn't have MVCC and can't show a great result ...

    Posted by Triffids on August 16, 2007 at 08:39 AM PDT #
