BM Seer Facts & Questions from an Anonymous Sun Source

SAS Extract, Transform, and Load Sun Fire E25K UltraSPARC IV+ 1.95 GHz

Wednesday Apr 18, 2007

The Sun Fire E25K running Solaris 10 11/06 and configured with Sun StorageTek 6140 arrays utilizing Sun StorageTek QFS 4.5 achieved multiple World Records on the SAS Extract, Transform, and Load (ETL) benchmarks. The EEC Enterprise Data Integration Test Suite is an application that performs large scale data integration operations for data warehousing.

  • A combination of a Sun Fire E25K (72 1.95 GHz US-IV+) and 20 ST6140 storage arrays achieved World Record throughput of 5.9 TB per hour for the Bulkload with Data Validation into Text.

    The Sun Fire E25K (32 1.95 GHz US-IV+) delivered a throughput of 3.02 TB/hr, which is 61% faster than recently published results by HP on a similar test using an Integrity Superdome server (64 1.6 GHz Intel Itanium2). When loading into a relational data store instead of text, the 32-way Sun Fire E25K was 74% faster than the HP Integrity Superdome on a similar test.

  • SAS also raised the data integration benchmark standard by increasing the workload complexity: it loaded the same data into a full star schema data model in a relational data store while performing data validation, integrity constraints, dimension table builds, dimension lookups, and index generation.
  • Primary data transformations used in this test were Lookup, SQL Join, File Reader, Loop, User Written Code and Table Loader.

    During this significantly more complex task, SAS sustained a data load rate of 1.93 TB/hour on the 72-processor Sun server configuration. This is also a new World Record for this level of complexity and data volume.

  • The Sun Fire E25K (72 1.95 GHz US-IV+) and 20 Sun StorageTek ST6140 arrays showed a performance improvement of 7% for the Bulk Load with Data Validation to Text over the previous Sun Fire E25K (72 1.8GHz US-IV+) and 20 Sun StorageTek ST6140 arrays.
  • The Sun Fire E25K (72 1.95 GHz US-IV+) and 20 Sun StorageTek ST6140 arrays showed a performance improvement of 5% for the Bulkload to Text over the previous Sun Fire E25K (72 1.8GHz US-IV+) and 20 Sun StorageTek ST6140 arrays.
  • The benchmark tests also highlighted new technology from SAS, including SAS Data Integration Studio 3.4, SAS Scalable Performance Data Server, SAS Grid Server, and SAS 9.1.3, as well as the effectiveness of SAS on the Sun QFS file system.
  • When performance and execution matter, SAS chooses Sun: These benchmark results represent significant engineering effort, collaboration, and coordination between SAS and Sun. The results also illustrate the commitment of the two companies to provide the best solutions for the most demanding data integration requirements.
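The percentages in the bullets above are simple ratios of the TB/hr figures in the tables below. A minimal Python sketch, assuming "X% faster" means (new / baseline - 1) x 100, reproduces them; the helper name `percent_faster` is hypothetical.

```python
# Hypothetical helper showing how the "X% faster" claims follow from TB/hr figures.
# Throughput values (TB/hr) are copied from the comparison tables in this post.

def percent_faster(new_tb_hr: float, old_tb_hr: float) -> float:
    """Return how much faster `new` is than `old`, as a percentage."""
    return (new_tb_hr / old_tb_hr - 1.0) * 100.0

claims = {
    # label: (new result, baseline result, percentage quoted in the post)
    "32-way E25K vs HP Superdome, Bulkload w/ Data Validation to Text": (3.02, 1.88, 61),
    "32-way E25K vs HP Superdome, Bulkload w/ Data Validation to Relational": (2.32, 1.33, 74),
    "72-way E25K 1.95 vs 1.8 GHz, Bulkload w/ Data Validation to Text": (5.90, 5.53, 7),
    "72-way E25K 1.95 vs 1.8 GHz, Bulkload to Text": (5.97, 5.68, 5),
}

for label, (new, old, quoted) in claims.items():
    pct = percent_faster(new, old)
    print(f"{label}: {pct:.1f}% (quoted: {quoted}%)")
```

Rounding each computed value to the nearest whole percent gives the figure quoted in the bullets.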

    Performance Comparison

    Load to Dataset

    1 Million Customer Table
    System | Processor Type & GHz | Chips, Cores | Bulkload | Full Star Schema Build with Data Validation | Bulkload with Data Validation
    Sun Fire E25K | US-IV+ 1.95 | 72, 144 | 5.97 TB/hr | - | 5.90 TB/hr
    Sun Fire E25K | US-IV+ 1.95 | 32, 64 | - | - | 3.02 TB/hr
    Sun Fire E25K | US-IV+ 1.8 | 72, 144 | 5.68 TB/hr | - | 5.53 TB/hr
    Sun Fire E25K | US-IV+ 1.8 | 32, 64 | - | 1.54 TB/hr | 2.80 TB/hr
    HP Integrity Superdome | Itanium2 1.6 | 64, 64 | - | - | 1.88 TB/hr
    Sun Fire E25K | US-IV+ 1.5 | 48, 96 | 3.9 TB/hr | 1.5 TB/hr | 3.0 TB/hr


    Load to Relational Data Store

    1 Million Customer Table
    System | Processor Type & GHz | Chips, Cores | Bulkload | Full Star Schema Build with Data Validation + | Bulkload with Data Validation | Full Star Schema Build with Data Validation ++
    Sun Fire E25K | US-IV+ 1.95 | 72, 144 | - | - | 4.44 TB/hr | 1.93 TB/hr
    Sun Fire E25K | US-IV+ 1.95 | 32, 64 | - | - | 2.32 TB/hr | -
    Sun Fire E25K | US-IV+ 1.8 | 72, 144 | 4.23 TB/hr | 2.64 TB/hr | 4.15 TB/hr | 1.86 TB/hr
    Sun Fire E25K | US-IV+ 1.8 | 32, 64 | - | 1.31 TB/hr | 2.14 TB/hr | -
    HP Integrity Superdome | Itanium2 1.6 | 64, 64 | - | - | 1.33 TB/hr | -

    100 Million Customer Table

    Sun Fire E25K | US-IV+ 1.95 | 24, 48 | - | - | - | 590 GB/hr

    +     PLUS integrity constraints, dimension table builds, and dimension lookups

    ++   PLUS integrity constraints, dimension table builds, dimension lookups, and index generation

    Benchmark Description

    The ETL (Extract, Transform and Load) benchmark reads in a multi-terabyte data set, performs data transformations, and loads the result into SAS Intelligent Storage (datasets or SPDS), simulating a typical large-scale data integration operation for data warehousing. Complexity can be increased by adding data validation, Star Schema Builds (with dimension table builds and lookups), and index creation.
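To make the "Star Schema Build" steps concrete, here is a toy, in-memory Python sketch of data validation, dimension table builds with surrogate-key lookups, fact loading, and index generation. The record layout (`RawRow`), the helper names, and the sample rows are all hypothetical; this illustrates the general technique, not how SAS Data Integration Studio or SPDS implements it.

```python
from dataclasses import dataclass

@dataclass
class RawRow:
    # Hypothetical input record; the benchmark's real schema is not described here.
    customer: str
    region: str
    amount: str  # arrives as text and must be validated

def validate(row: RawRow) -> bool:
    """Data validation: reject rows with missing keys or a non-numeric amount."""
    if not row.customer or not row.region:
        return False
    try:
        float(row.amount)
    except ValueError:
        return False
    return True

def build_star_schema(rows):
    """Validate -> build dimensions -> look up surrogate keys -> load facts -> index."""
    dims = {"customer": {}, "region": {}}  # natural key -> surrogate key
    facts = []  # (customer_key, region_key, amount)
    rejected = 0
    for row in rows:
        if not validate(row):
            rejected += 1
            continue
        # Dimension table build + lookup: a new natural key gets the next surrogate key.
        cust_key = dims["customer"].setdefault(row.customer, len(dims["customer"]) + 1)
        region_key = dims["region"].setdefault(row.region, len(dims["region"]) + 1)
        facts.append((cust_key, region_key, float(row.amount)))
    # Index generation: customer surrogate key -> positions of its fact rows.
    index = {}
    for pos, (cust_key, _, _) in enumerate(facts):
        index.setdefault(cust_key, []).append(pos)
    return dims, facts, index, rejected

rows = [
    RawRow("alice", "east", "10.5"),
    RawRow("bob", "west", "7"),
    RawRow("alice", "east", "3"),
    RawRow("", "east", "1"),         # rejected: missing customer
    RawRow("carol", "west", "n/a"),  # rejected: non-numeric amount
]
dims, facts, index, rejected = build_star_schema(rows)
print(dims, facts, index, rejected, sep="\n")
```

Each extra step (validation, dimension lookups, indexing) adds another pass or lookup per row, which is one intuition for why the Full Star Schema Build rates in the tables are well below the plain Bulkload rates.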

    Disclosure Statement:

    SAS ETL Sun Fire E25K 5.90 TB/hr, 72 1.95 GHz US-IV+, Sun StorageTek 6140 Array, Solaris 10 11/06, Sun StorageTek QFS 4.5, SAS Enterprise Data Integration Server 9.1.3 SP4. Results as of 04/09/2007. More info www.sas.com.

    Press Releases

  • SAS Press Release on sas.com
  • SAS Press Release on businesswire.com
  • Informatica/HP Press Release

    Results Summary

    Performance: 5.97 TB/hr Bulkload to datasets
      5.90 TB/hr Bulkload to datasets with Data Validation
      4.44 TB/hr Bulkload to Relational Data Store with Data Validation
      1.93 TB/hr Full Star Schema Build to Relational Data Store with Data Validation and Index Generation
      * The TB/hr metric is derived from the input data size, not the total IO bandwidth used, which is more than 2x the reported TB/hr result.
    Server: Sun Fire E25K, 72 1.95 GHz US-IV+, 288GB memory.
    Storage: 20 Sun StorageTek 6140s, 146GB 15K RPM, 16 Drives per array, 320 total spindles
    Operating system: Solaris 10 11/06
    SAS S/W: SAS Enterprise Data Integration Server 9.1.3 SP4
    Filesystem: Sun StorageTek QFS 4.5
    Processors: 72 UltraSPARC IV+ 1.95 GHz
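Because TB/hr is based on input size, it can help to translate it into per-second and per-spindle rates. A rough Python sketch, assuming decimal units (1 TB = 10^12 bytes), which the post does not state; the "over 2x" IO factor and the 320 spindles come from the summary above, and 2x is used as a lower bound.

```python
# Assumption: TB means decimal terabytes (10**12 bytes); the post does not say.
SECONDS_PER_HOUR = 3600
SPINDLES = 20 * 16  # 20 ST6140 arrays x 16 drives each, per the summary above

def tb_per_hr_to_gb_per_s(tb_per_hr: float) -> float:
    """Convert a TB/hr rate into GB/s (both decimal units)."""
    return tb_per_hr * 1e12 / SECONDS_PER_HOUR / 1e9

input_rate = tb_per_hr_to_gb_per_s(5.90)  # Bulkload with Data Validation
# The post says total IO is "over 2x" the input-derived rate, so 2x is a lower bound.
io_lower_bound = 2 * input_rate
per_spindle_mb_s = io_lower_bound * 1e3 / SPINDLES

print(f"input rate  ~ {input_rate:.2f} GB/s")
print(f"total IO    > {io_lower_bound:.2f} GB/s")
print(f"per spindle > {per_spindle_mb_s:.1f} MB/s")
```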

    Funny math to the core

    Monday Apr 09, 2007

    IBM thinks it is about the core count or performance per core. Get real. It is about the whole system. You can do the math based on the info in the TPC-H submissions below...
    Sun: $4,207,126 /144 core = ?
    IBM: $5,358,874 /64 core = ?

    It is clear that each IBM core costs more than 2.5 times as much as each Sun core. Before you get too confused by 'rotten-to-the-core math', just remember this: the IBM system costs more, and the IBM system is slower on the TPC-H benchmark. See http://blogs.sun.com/bmseer/entry/database_world_record_sun_us
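For readers who want to fill in the "= ?" above, a short Python sketch divides each TPC-H 3-year total system cost by its core count (144 cores for the 72-chip dual-core E25K, 64 for the p5 595, as stated above) and compares the two.

```python
# System costs and core counts as quoted above from the TPC-H disclosures.
systems = {
    "Sun Fire E25K": (4_207_126, 144),
    "IBM p5 595": (5_358_874, 64),
}

# Cost per core = 3-year total system cost / number of cores.
cost_per_core = {name: cost / cores for name, (cost, cores) in systems.items()}
for name, value in cost_per_core.items():
    print(f"{name}: ${value:,.2f} per core")

ratio = cost_per_core["IBM p5 595"] / cost_per_core["Sun Fire E25K"]
print(f"IBM/Sun cost-per-core ratio: {ratio:.2f}")
```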

    • The Sun Fire E25K 1.8GHz outperformed the IBM p5-595 (Power5+) by 14% and also had 31% better price/performance. It also beat the p5-595 by 26% on the multi-user test (Throughput).
    • The Sun Fire E25K beat the HP Integrity Superdome (Itanium2) by 60% on performance and 34% on price/performance. Sun also beat the Itanium2 Superdome by 72% for the multi-user test (Throughput).
    • Last week Sun announced Sun Fire E25K systems with 1.95GHz processors.
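The percentages above can be recomputed from the disclosed metrics. A Python sketch, assuming "better price/performance" means the Sun $/QphH is that many percent lower than the rival's (one reading that reproduces the quoted 31% and 34%); the helper names are hypothetical.

```python
# Metrics from the TPC-H @3000GB disclosures quoted in this post:
# (QphH composite, $/QphH, QthH throughput)
e25k = (114_713.7, 36.68, 96_194.3)
rivals = {
    "IBM p5 595": (100_512.3, 53.32, 76_190.5),
    "HP Integrity Superdome": (71_847.8, 55.79, 55_905.9),
}

def pct_higher(a, b):
    """Percent by which a bigger-is-better metric a exceeds b."""
    return (a / b - 1) * 100

def pct_cheaper(a, b):
    """Percent by which a smaller-is-better price a undercuts b."""
    return (1 - a / b) * 100

for name, (qphh, price, thru) in rivals.items():
    print(f"vs {name}: performance +{pct_higher(e25k[0], qphh):.0f}%, "
          f"price/performance {pct_cheaper(e25k[1], price):.0f}% better, "
          f"throughput +{pct_higher(e25k[2], thru):.0f}%")
```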

    TPC-H Disclosure Statement:

    Sun Fire E25K 114,713.7 QphH@3000GB, $36.68/QphH@3000GB, avail 04/09/07; HP BladeSystem ProLiant BL25p cluster 64p DC 110,576.5 QphH@3000GB, $37.80/QphH@3000GB, avail 06/08/06; Sun Fire E25K 105,430.9 QphH@3000GB, $54.87/QphH@3000GB, avail 01/23/06; IBM eServer p5 595 100,512.3 QphH@3000GB, $53.32/QphH@3000GB, avail 03/01/06; HP Integrity Superdome 71,847.8 QphH@3000GB, $55.79/QphH@3000GB, avail 01/18/06; Sun Fire E25K 59,435.7 QphH@3000GB, $100.66/QphH@3000GB, avail 07/27/05. TPC-H, QphH, and $/QphH are trademarks of the Transaction Processing Performance Council (TPC). More info www.tpc.org.


    Database World Record: Sun US-IV+ beats IBM POWER5+ again

    Monday Apr 09, 2007

    World Record Performance and World Record Single-System Price/Performance: The Sun Fire E25K (UltraSPARC IV+) with Sun StorEdge 6140 Arrays, running Solaris 10 and Oracle 10g, achieved World Record TPC-H performance of 114,713.7 QphH@3000GB and World Record price/performance of $36.68/QphH@3000GB for non-clustered systems. The Sun Fire E25K had the best price/performance of the top six performing systems.

    ...and remember, this was done with 1.8GHz US-IV+ processors. Last week Sun announced 1.95GHz and 2.1GHz parts; see previous blog postings for results on those processors. The future holds more interesting postings, so keep checking back...

    • The Sun Fire E25K outperformed the IBM p5-595 (Power5+) by 14% and also had 31% better price/performance. It also beat the p5-595 by 26% on the multi-user test (Throughput).
    • The Sun Fire E25K beat the HP Integrity Superdome (Itanium2) by 60% on performance and 34% on price/performance. Sun also beat the Itanic Superdome by 72% on the multi-user test (Throughput).
    • The Sun Fire E25K configured with Sun StorEdge 6140 arrays delivered huge IO performance of over 21 GB/sec, made possible by delivered memory bandwidth of 62 GB/sec.
    • The TPC-H result demonstrates that the Sun Fire E25K can handle the increasingly large databases required of DSS systems. The Sun Fire E25K delivered more than 18 GB/sec of real IO throughput with Oracle 10g.
    • This result demonstrates the effectiveness of Solaris 10 running Oracle 10g. Oracle has chosen Solaris 10 as its preferred Open Source 64-bit Development and Deployment environment. Hardly any OS tuning was needed; the /etc/system and /etc/project files have a basic set of parameters for a large system.

    TPC-H @3000GB Performance Chart (QphH = the Composite Metric, bigger is better)

    $/QphH = Price/Performance metric (smaller is better)
    QppH = Power Numerical Quantity
    QthH = Throughput Numerical Quantity

    System | Composite (QphH) | 3 Year Total System Cost | $/perf ($/QphH) | Power (QppH) | Thruput (QthH) | #Proc | Disk
    Sun Fire E25K | 114,713.7 | $4,207,126 | $36.68 | 136,798.4 | 96,194.3 | 72 | 63.3 TB
    HP ProLiant BL25p | 110,576.5 | $4,179,238 | $37.80 | 116,379.3 | 105,063.0 | 64 | 69.6 TB
    Sun Fire E25K | 105,430.9 | $5,784,902 | $54.87 | 121,805.8 | 91,257.4 | 72 | 94.8 TB
    IBM p5 595 | 100,512.3 | $5,358,874 | $53.32 | 132,598.2 | 76,190.5 | 64 | 37.7 TB
    HP Integrity Superdome | 71,847.8 | $4,008,065 | $55.79 | 92,335.6 | 55,905.9 | 64 | 40.6 TB
    Sun Fire E25K | 59,435.7 | $5,982,737 | $100.66 | 73,686.8 | 59,435.7 | 72 | 84.4 TB
    IBM xSeries 346 | 54,465.9 | $1,761,686 | $32.34 | 90,854.7 | 32,651.4 | 64 | 25.6 TB
    HP Integrity Superdome | 30,956.6 | $2,326,457 | $75.16 | 41,779.5 | 22,937.4 | 32 | 19.6 TB


    System | Procs | Cluster | Proc GHz | Proc Type | OS | Database | RDBMS+HW Avail
    Sun Fire E25K | 72 | N | 1.8 | UltraSPARC IV+ | Solaris 10 | Oracle 10g | 04/09/2007
    HP ProLiant BL25p | 64 | Y | 2.6 | AMD Opteron 285 | Red Hat Enterprise Linux 4 | Oracle 10g | 06/08/2006
    Sun Fire E25K | 72 | N | 1.5 | UltraSPARC IV+ | Solaris 10 | Oracle 10g | 01/27/2006
    IBM p5 595 | 64 | N | 1.9 | POWER5 | AIX 5L V5.3 | Oracle 10g | 03/01/2006
    HP Integrity Superdome | 64 | N | 1.6 | Itanium2 | HP-UX 11i v2 | Oracle 10g | 01/18/2006
    Sun Fire E25K | 72 | N | 1.2 | UltraSPARC IV | Solaris 10 | Oracle 10g | 07/27/2005
    IBM xSeries 346 | 64 | Y | 3.6 | Intel Xeon | SUSE Linux | DB2 UDB 8.2 | 08/15/2005
    HP Integrity Superdome | 32 | N | 1.6 | Itanium2 | Windows Server 2003 | Microsoft SQL Server | 05/05/2006

    Benchmark Description

    The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS) performance. TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. The benchmark's queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

    TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

    The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single- and multi-user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years of maintenance to QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.
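The three metrics described above can be checked against the Sun Fire E25K row of the results table. A Python sketch: the composite formula (geometric mean of the Power and Throughput metrics) comes from the TPC-H specification rather than this post, and treating 63.3 TB as 63,300 GB is an assumption.

```python
import math

def composite_qphh(power_qpph: float, throughput_qthh: float) -> float:
    # TPC-H composite metric: geometric mean of the Power and Throughput metrics.
    return math.sqrt(power_qpph * throughput_qthh)

def price_performance(total_cost_usd: float, qphh: float) -> float:
    # $/QphH: 3-year total system cost divided by the composite metric.
    return total_cost_usd / qphh

def storage_efficiency(disk_gb: float, scale_factor_gb: float) -> float:
    # Ratio of total configured disk space (GB) to the scale factor.
    return disk_gb / scale_factor_gb

# Sun Fire E25K figures from the results table (63.3 TB taken as 63,300 GB, an assumption).
qphh = composite_qphh(136_798.4, 96_194.3)
print(f"composite ~ {qphh:,.1f} QphH@3000GB")
print(f"price/perf ~ ${price_performance(4_207_126, 114_713.7):.2f}/QphH")
print(f"storage efficiency ~ {storage_efficiency(63_300, 3000):.1f}")
```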

    Disclosure Statement:

    Sun Fire E25K 114,713.7 QphH@3000GB, $36.68/QphH@3000GB, avail 04/09/07; HP BladeSystem ProLiant BL25p cluster 64p DC 110,576.5 QphH@3000GB, $37.80/QphH@3000GB, avail 06/08/06; Sun Fire E25K 105,430.9 QphH@3000GB, $54.87/QphH@3000GB, avail 01/23/06; IBM eServer p5 595 100,512.3 QphH@3000GB, $53.32/QphH@3000GB, avail 03/01/06; HP Integrity Superdome 71,847.8 QphH@3000GB, $55.79/QphH@3000GB, avail 01/18/06; Sun Fire E25K 59,435.7 QphH@3000GB, $100.66/QphH@3000GB, avail 07/27/05. TPC-H, QphH, and $/QphH are trademarks of the Transaction Processing Performance Council (TPC). More info www.tpc.org.

    See Also:

    Oracle Press Release (oracle.com)

    Oracle Press Release (yahoo.com)

    Ideas International Benchmark page

    Result details

    Audited Results

  • DB Size: 3000 GB (Scale Factor 3000)
  • Composite: 114,713.7 QphH@3000GB
  • $/perf: $36.68/QphH@3000GB
  • Available: April 9, 2007
  • System: One Sun Fire E25K
  • Processors: 72 UltraSPARC IV+ 1.8 GHz / 2MB L2 Cache, 32 MB L3 Cache
  • Storage: 63.3 Terabytes of disk
  • Database: Oracle Database 10g Enterprise Edition Release 2 with Partitioning & Automatic Storage Management
  • OS: Solaris 10 Update 3
  • Total 3 year Cost: $4,207,126

    Other Metrics

  • TPC-H Power: 136,798.4
  • Throughput: 96,194.3
  • DB Load Time: 4 hours 52 minutes


    Promises, promises & IBM

    Thursday Feb 15, 2007

    IBM POWER6 info in a CNET article.

    They say, "The first Power6 systems, lower-end models, are due to arrive midway through 2007."

    So in the meantime, will IBM start publishing the benchmarks they've avoided on the IBM p5 595 POWER5+? Or is it just too embarrassing to show SPECjbb2005, SPECint_rate2006, etc. results compared to Sun 1.8GHz US-IV+ systems?

    When do the high-end POWER6 systems start to show? ...late 2007 or 2008?
