BM Seer Unofficial thoughts from an anonymous Sun employee

SAS Extract, Transform, and Load Sun Fire E25K UltraSPARC IV+ 1.95 GHz

Wednesday Apr 18, 2007

The Sun Fire E25K, running Solaris 10 11/06 and configured with Sun StorageTek 6140 arrays using Sun StorageTek QFS 4.5, achieved multiple World Records on the SAS Extract, Transform, and Load (ETL) benchmarks. The EEC Enterprise Data Integration Test Suite is an application that performs large-scale data integration operations for data warehousing.

  • A combination of a Sun E25K (72 1.95 GHz US-IV+) and 20 ST6140 storage arrays achieved World Record throughput of 5.9 TB per hour for the Bulkload with Data Validation into Text.

    The Sun Fire E25K (32 1.95 GHz US-IV+) delivered a throughput of 3.02 TB/hr, which is 61% faster than recently published results by HP on a similar test using an Integrity Superdome server (64 1.6 GHz Intel Itanium2). When loading into a relational data store instead of text, the 32-way Sun Fire E25K was 74% faster than the HP Integrity Superdome on a similar test.

  • SAS also raised the data integration benchmark standard by increasing the workload complexity and loaded the same data into a full star schema data model in a relational data store while performing data validation, integrity constraints, dimension table builds, dimension lookups, and index generation.
  • Primary data transformations used in this test were Lookup, SQL Join, File Reader, Loop, User Written Code and Table Loader.

    During this significantly complex task, SAS sustained a data load rate of 1.93 TB/hr on the 72-processor Sun server configuration. This is also a new World Record for this level of complexity and data volume.

  • The Sun Fire E25K (72 1.95 GHz US-IV+) and 20 Sun StorageTek ST6140 arrays showed a performance improvement of 7% for the Bulkload with Data Validation to Text over the previous Sun Fire E25K (72 1.8 GHz US-IV+) and 20 Sun StorageTek ST6140 arrays.
  • The Sun Fire E25K (72 1.95 GHz US-IV+) and 20 Sun StorageTek ST6140 arrays showed a performance improvement of 5% for the Bulkload to Text over the previous Sun Fire E25K (72 1.8 GHz US-IV+) and 20 Sun StorageTek ST6140 arrays.
  • The benchmark tests also highlighted new technology from SAS, including SAS Data Integration Studio 3.4, SAS Scalable Performance Data Server, SAS Grid Server, and SAS 9.1.3, as well as the effectiveness of SAS on the Sun QFS file system.
  • When performance and execution matter - SAS chooses Sun: These benchmark results represent significant engineering effort, collaboration and coordination between SAS and Sun. The results also illustrate the commitment of the two companies to provide the best solutions for the most demanding data integration requirements.

    Performance Comparison

    Load to Dataset

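    1 Million Customer Table

    | System                 | Processor Type & GHz | Chips, Cores | Bulkload   | Full Star Schema Build with Data Validation | Bulkload with Data Validation |
    |------------------------|----------------------|--------------|------------|---------------------------------------------|-------------------------------|
    | Sun Fire E25K          | US-IV+ 1.95          | 72,144       | 5.97 TB/hr |                                             | 5.90 TB/hr                    |
    | Sun Fire E25K          | US-IV+ 1.95          | 32,64        |            |                                             | 3.02 TB/hr                    |
    | Sun Fire E25K          | US-IV+ 1.8           | 72,144       | 5.68 TB/hr |                                             | 5.53 TB/hr                    |
    | Sun Fire E25K          | US-IV+ 1.8           | 32,64        |            | 1.54 TB/hr                                  | 2.80 TB/hr                    |
    | HP Integrity Superdome | Itanium2 1.6         | 64,64        |            |                                             | 1.88 TB/hr                    |
    | Sun Fire E25K          | US-IV+ 1.5           | 48,96        | 3.9 TB/hr  | 1.5 TB/hr                                   | 3.0 TB/hr                     |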


    Load to Relational Data Store

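    1 Million Customer Table

    | System                 | Processor Type & GHz | Chips, Cores | Bulkload   | Full Star Schema Build with Data Validation + | Bulkload with Data Validation | Full Star Schema Build with Data Validation ++ |
    |------------------------|----------------------|--------------|------------|-----------------------------------------------|-------------------------------|------------------------------------------------|
    | Sun Fire E25K          | US-IV+ 1.95          | 72,144       |            |                                               | 4.44 TB/hr                    | 1.93 TB/hr                                     |
    | Sun Fire E25K          | US-IV+ 1.95          | 32,64        |            |                                               | 2.32 TB/hr                    |                                                |
    | Sun Fire E25K          | US-IV+ 1.8           | 72,144       | 4.23 TB/hr | 2.64 TB/hr                                    | 4.15 TB/hr                    | 1.86 TB/hr                                     |
    | Sun Fire E25K          | US-IV+ 1.8           | 32,64        |            | 1.31 TB/hr                                    | 2.14 TB/hr                    |                                                |
    | HP Integrity Superdome | Itanium2 1.6         | 64,64        |            |                                               | 1.33 TB/hr                    |                                                |

    100 Million Customer Table

    | System                 | Processor Type & GHz | Chips, Cores | Bulkload   | Full Star Schema Build with Data Validation + | Bulkload with Data Validation | Full Star Schema Build with Data Validation ++ |
    |------------------------|----------------------|--------------|------------|-----------------------------------------------|-------------------------------|------------------------------------------------|
    | Sun Fire E25K          | US-IV+ 1.95          | 24,48        |            |                                               |                               | 590 GB/hr                                      |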

    +     PLUS integrity constraints, dimension table builds, and dimension lookups

    ++   PLUS integrity constraints, dimension table builds, dimension lookups, and index generation
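
The percentage claims in the summary bullets can be cross-checked against the throughput figures in the tables above. The following is a minimal sketch of that arithmetic, not part of the benchmark itself; the function name `percent_faster` and the dictionary labels are made up for illustration, and the only inputs are TB/hr values copied from the tables.

```python
def percent_faster(new_tb_per_hr: float, old_tb_per_hr: float) -> float:
    """Return how much faster `new` is than `old`, as a percentage."""
    return (new_tb_per_hr / old_tb_per_hr - 1.0) * 100.0


# (new, old) throughput pairs in TB/hr, copied from the tables above.
# The labels are illustrative, not official benchmark names.
COMPARISONS = {
    "32-chip E25K vs HP Superdome, bulkload with validation (dataset)": (3.02, 1.88),
    "32-chip E25K vs HP Superdome, bulkload with validation (relational)": (2.32, 1.33),
    "72-chip E25K 1.95 GHz vs 1.8 GHz, bulkload with validation (dataset)": (5.90, 5.53),
    "72-chip E25K 1.95 GHz vs 1.8 GHz, bulkload (dataset)": (5.97, 5.68),
}


def summarize(pairs: dict) -> dict:
    """Round each comparison to a whole percentage, as the post does."""
    return {label: round(percent_faster(new, old)) for label, (new, old) in pairs.items()}


if __name__ == "__main__":
    for label, pct in summarize(COMPARISONS).items():
        print(f"{label}: {pct}% faster")
```

Rounded to whole percentages, these four ratios correspond to the 61%, 74%, 7%, and 5% figures quoted in the bullets.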

    Benchmark Description

    The ETL (Extract, Transform, and Load) benchmark reads in a multi-terabyte data set, performs data transformation, and loads it into SAS Intelligent Storage (datasets or SPDS), simulating an operation typical of large-scale data integration for data warehousing. Complexity can be increased by adding data validation, Star Schema Builds (with dimension table builds and lookups), and index creation.

    Disclosure Statement:

    SAS ETL Sun Fire E25K 5.90 TB/hr, 72 1.95 GHz US-IV+, Sun StorageTek 6140 Array, Solaris 10 11/06, Sun StorageTek QFS 4.5, SAS Enterprise Data Integration Server 9.1.3 SP4. Results as of 04/09/2007. More info www.sas.com.

    Press Releases

  • SAS Press Release on sas.com
  • SAS Press Release on businesswire.com
  • Informatica/HP Press Release

    Results Summary

    Performance: 5.97 TB/hr Bulkload to datasets
      5.90 TB/hr Bulkload to datasets with Data Validation
      4.44 TB/hr Bulkload to Relational Data Store with Data Validation
      1.93 TB/hr Full Star Schema Build to Relational Data Store with Data Validation and Index Generation
      * The TB/hr metric is derived from the input data size, not the total I/O bandwidth being used, which is over 2x larger than the reported TB/hr result.
    Server: Sun Fire E25K, 72 1.95 GHz US-IV+, 288GB memory.
    Storage: 20 Sun StorageTek 6140s, 146GB 15K RPM, 16 Drives per array, 320 total spindles
    Operating system: Solaris 10 11/06
    SAS S/W: SAS Enterprise Data Integration Server 9.1.3 SP4
    Filesystem: Sun StorageTek QFS 4.5
    Processors: 72 UltraSPARC IV+ 1.95 GHz
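
The note that total I/O bandwidth is over 2x the reported TB/hr figure can be turned into a rough lower bound on what the storage had to sustain. This is only a back-of-the-envelope sketch under stated assumptions: it treats "over 2x" as exactly 2x (so the result is a floor, not a measurement), uses decimal units (1 TB = 1,000,000 MB), and assumes the I/O is spread evenly across all arrays and drives. None of these per-array or per-drive numbers were published with the benchmark.

```python
REPORTED_TB_PER_HR = 5.97   # bulkload to datasets, from the results summary
IO_MULTIPLIER = 2.0         # "over 2x" in the summary, so this is a lower bound
ARRAYS = 20                 # Sun StorageTek 6140 arrays
DRIVES_PER_ARRAY = 16


def tb_per_hr_to_mb_per_s(tb_per_hr: float) -> float:
    """Convert TB/hr to MB/s, assuming decimal units (1 TB = 1,000,000 MB)."""
    return tb_per_hr * 1_000_000 / 3600


def io_lower_bound_tb_per_hr(reported: float, multiplier: float = IO_MULTIPLIER) -> float:
    """Lower bound on total I/O bandwidth implied by the reported metric."""
    return reported * multiplier


total_tb_per_hr = io_lower_bound_tb_per_hr(REPORTED_TB_PER_HR)
total_mb_per_s = tb_per_hr_to_mb_per_s(total_tb_per_hr)
spindles = ARRAYS * DRIVES_PER_ARRAY

# Even-distribution assumption: divide the aggregate equally.
per_array_mb_per_s = total_mb_per_s / ARRAYS
per_spindle_mb_per_s = total_mb_per_s / spindles

if __name__ == "__main__":
    print(f"Total I/O (lower bound): {total_tb_per_hr:.2f} TB/hr, {total_mb_per_s:.0f} MB/s")
    print(f"Per array: {per_array_mb_per_s:.1f} MB/s")
    print(f"Per spindle ({spindles} drives): {per_spindle_mb_per_s:.1f} MB/s")
```

Under these assumptions the aggregate works out to at least roughly 3.3 GB/s across the 320 spindles; the real distribution across arrays was not disclosed and may have been uneven.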
