Tuesday Nov 24, 2009

The Sun SPARC Enterprise M9000 server (64 processors, 256 cores, 512 threads) set a World Record on the SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.
  • The Sun SPARC Enterprise M9000 server with 2.88 GHz SPARC64 VII processors achieved 32,000 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.

  • The Sun SPARC Enterprise M9000 server result is 8.6x faster than the only IBM 5GHz POWER6 unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • IBM has not submitted any IBM 595 results on the current SAP enhancement package 4 for SAP ERP 6.0 (unicode) Standard Sales and Distribution (SD) Benchmark. This benchmark has been current for almost a year. IBM p595 systems have only 8x more cores than the IBM System 550.

  • HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • This new result is 1.84x greater than the previous record result delivered on the Sun SPARC Enterprise M9000 server which used 32 processors.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users than the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead of processing the larger character strings that Unicode encoding produces. See SAP Note 1139642 for more details.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.
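The size difference described above can be illustrated with a short Python sketch. The sample strings are made up for illustration and are not benchmark data; the point is only that UTF-16 spends 2 bytes on every ASCII character and 4 bytes on characters outside the Basic Multilingual Plane.

```python
# Compare character counts with UTF-16 byte counts.
# The sample strings are illustrative only, not taken from the benchmark.
samples = {
    "ascii_text": "SALES ORDER 4711",           # ASCII only
    "bmp_text": "Bestellung f\u00fcr K\u00f6ln",  # non-ASCII, 2 bytes each in UTF-16
    "supplementary": "\U0001D11E",              # one character that needs 4 bytes
}


def encoded_sizes(text):
    """Return (characters, UTF-16 bytes) for text, without a byte-order mark."""
    return len(text), len(text.encode("utf-16-le"))


for name, text in samples.items():
    chars, utf16_bytes = encoded_sizes(text)
    print(f"{name}: {chars} characters -> {utf16_bytes} bytes in UTF-16")

# The same ASCII text stored one byte per character, as in the older benchmark.
ascii_bytes = len(samples["ascii_text"].encode("ascii"))
print(f"ascii_text as ASCII: {ascii_bytes} bytes")
```

For pure ASCII content the UTF-16 form is exactly twice as large, which is the source of the extra bandwidth and storage mentioned above.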

Performance Landscape SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Results (in decreasing performance)

(ERP 6.0 EP4 is the current version of the benchmark as of January 2009)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | Date
Sun SPARC Enterprise M9000, 64 x SPARC64 VII @ 2.88GHz, 1152 GB | Solaris 10, Oracle10g | 32,000 | 2009 6.0 EP4 (Unicode) | 175,600 | 18-Nov-09
Sun SPARC Enterprise M9000, 32 x SPARC64 VII @ 2.88GHz, 1024 GB | Solaris 10, Oracle10g | 17,430 | 2009 6.0 EP4 (Unicode) | 95,480 | 12-Oct-09
IBM System 550, 4 x Power6 @ 5GHz, 64 GB | AIX 6.1, DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 16-Jun-09
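
The ratios quoted in the bullets can be reproduced from the SAPS column of the table. The following is a minimal sketch using only the table values; the dictionary keys are labels chosen here for readability.

```python
# SAPS values copied from the table above.
saps = {
    "M9000 64-processor": 175_600,
    "M9000 32-processor": 95_480,
    "IBM System 550": 20_520,
}


def ratio(a, b):
    """Return how many times larger the SAPS of result a is than that of result b."""
    return saps[a] / saps[b]


vs_ibm = ratio("M9000 64-processor", "IBM System 550")
vs_prev = ratio("M9000 64-processor", "M9000 32-processor")
print(f"vs IBM System 550: {vs_ibm:.1f}x")
print(f"vs 32-processor M9000: {vs_prev:.2f}x")
```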

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Results and Configuration Summary

Certified Result:

    Number of SAP SD benchmark users: 32,000
    Average dialog response time: 0.93 seconds
    Throughput:
      Fully processed order line items/hour: 3,512,000
      Dialog steps/hour: 10,536,000
      SAPS: 175,600
    SAP Certification: 2009046
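
The three throughput figures are related to one another. As a sketch, assuming SAP's published definition that 100 SAPS equals 2,000 fully processed order line items per hour, or 6,000 dialog steps per hour (this definition is not stated in the post), the SAPS value follows from the line-item rate:

```python
# Assumption (not stated in the post): 100 SAPS = 2,000 fully processed order
# line items per hour = 6,000 dialog steps per hour.
LINE_ITEMS_PER_HOUR_PER_SAPS = 2_000 / 100
DIALOG_STEPS_PER_LINE_ITEM = 3


def saps_from_line_items(line_items_per_hour):
    """Convert an order line item rate into SAPS under the assumed definition."""
    return line_items_per_hour / LINE_ITEMS_PER_HOUR_PER_SAPS


line_items = 3_512_000     # certified result above
dialog_steps = 10_536_000  # certified result above

saps = saps_from_line_items(line_items)
print(f"SAPS from line items: {saps:,.0f}")
print(f"dialog steps per line item: {dialog_steps / line_items:.0f}")
```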

Hardware Configuration:

    Sun SPARC Enterprise M9000
      64 x 2.88GHz SPARC64 VII, 1152 GB memory

Software Configuration:

    Solaris 10
    SAP enhancement package 4 for SAP ERP 6.0 (unicode)
    Oracle10g

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmarks as of 11/18/09: Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 32,000 SAP SD Users, 64 x 2.88 GHz SPARC64 VII, 1152 GB memory, Oracle10g, Solaris10, Cert# 2009046. Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) 17,430 SAP SD Users, 32 x 2.88 GHz SPARC64 VII, 1024 GB memory, Oracle10g, Solaris10, Cert# 2009038. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert#2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Thursday Nov 19, 2009

The Sun SPARC Enterprise T5240 server running the Sun Java Messaging server 7.2 achieved a World Record SPECmail2009 result using Sun Storage 7310 Unified Storage System and ZFS file system.  Sun's OpenStorage platforms enable another world record.

  • World record SPECmail2009 result: the Sun SPARC Enterprise T5240 server (two 1.6GHz UltraSPARC T2 Plus processors) running Sun Communications Suite 7 and Solaris 10 with the Sun Storage 7310 Unified Storage System achieved 14,500 SPECmail_Ent2009 users at 69,857 Sessions/Hour.

  • This SPECmail2009 benchmark result clearly demonstrates that the Sun Messaging Server 7.2, Solaris 10 and ZFS solution can support a large, enterprise level IMAP mail server environment as a low cost 'Sun on Sun' solution, delivering the best performance and maximizing data integrity and availability of Sun Open Storage and ZFS.

  • The Sun SPARC Enterprise T5240 server supported 2.4 times more users, with a 2.4 times better sessions/hour rate, than the Apple Xserv3,1 solution on the SPECmail2009 benchmark.

  • There are no IBM Power6 results on this benchmark.

  • The configuration using Sun OpenStorage outperformed all previous results that used traditional direct attached storage and a significantly higher number of disk devices.

SPECmail2009 Performance Landscape (ordered by performance)

System | Users | Sessions/hour | Disks | OS | Messaging Server
Sun SPARC Enterprise T5240, 2 x 1.6GHz UltraSPARC T2 Plus | 14,500 | 69,857 | 58 NAS | Solaris 10 | CommSuite 7.2, Sun JMS 7.2
Sun SPARC Enterprise T5240, 2 x 1.6GHz UltraSPARC T2 Plus | 12,000 | 57,758 | 80 DAS | Solaris 10 | CommSuite 5, Sun JMS 6.3
Sun Fire X4275, 2 x 2.93GHz Xeon X5570 | 8,000 | 38,348 | 44 NAS | Solaris 10 | Sun JMS 6.2
Apple Xserv3,1, 2 x 2.93GHz Xeon X5570 | 6,000 | 28,887 | 82 DAS | MacOS 10.6 | Dovecot 1.1.14 apple 0.5
Sun SPARC Enterprise T5220, 1 x 1.4GHz UltraSPARC T2 | 3,600 | 17,316 | 52 DAS | Solaris 10 | Sun JMS 6.2

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org

Users - SPECmail_Ent2009 Users
Sessions/hour - SPECmail2009 Sessions/hour
NAS - Network Attached Storage
DAS - Direct Attached Storage

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise T5240
      2 x 1.6 GHz UltraSPARC T2 Plus processors
      128 GB memory
      2 x 146GB, 10K RPM SAS disks, 4 x 32GB SSDs

External Storage:

    2 x Sun Storage 7310 Unified Storage System, each with
      32 GB of memory
      24 x 1 TB 7200 RPM SATA Drives

Software Configuration:

    Solaris 10
    ZFS
    Sun Java Communications Suite 7 Update 2
      Sun Java System Messaging Server 7.2
      Directory Server 6.3

Benchmark Description

The SPECmail2009 benchmark measures the ability of corporate e-mail systems to meet today's demanding e-mail users over fast corporate local area networks (LAN). The SPECmail2009 benchmark simulates corporate mail server workloads that range from 250 to 10,000 or more users, using industry standard SMTP and IMAP4 protocols. This e-mail server benchmark creates client workloads based on a 40,000 user corporation, and uses folder and message MIME structures that include both traditional office documents and a variety of rich media content. The benchmark also adds support for encrypted network connections using industry standard SSL v3.0 and TLS 1.0 technology. SPECmail2009 replaces all versions of SPECmail2008, first released in August 2008. The results from the two benchmarks are not comparable.

Software on one or more client machines generates a benchmark load for a System Under Test (SUT) and measures the SUT response times. A SUT can be a mail server running on a single system or a cluster of systems.

A SPECmail2009 'run' simulates a 100% load level associated with the specific number of users, as defined in the configuration file. The mail server must maintain a specific Quality of Service (QoS) at the 100% load level to produce a valid benchmark result. If the mail server does maintain the specified QoS at the 100% load level, the performance of the mail server is reported as SPECmail_Ent2009 SMTP and IMAP Users at SPECmail2009 Sessions per hour. The SPECmail_Ent2009 users at SPECmail2009 Sessions per Hour metric reflects the unique workload combination for a SPEC IMAP4 user.

Key Points and Best Practices

  • Each Sun Storage 7310 Unified Storage System was configured with one J4400 JBOD array whose 22 x 1 TB SATA drives formed a mirrored device, and 4 shared volumes were built on that mirrored device. In total, 8 mirrored volumes from the 2 x Sun Storage 7310 systems were mounted on the system under test (SUT) using the NFSv4 protocol to hold the messaging server's mail index and mail message file systems. Four SSDs were used as the SUT internal disks, each configured as a ZFS file system. These four ZFS file systems were used for the messaging server queue, store metadata, LDAP and queue. The SSDs substantially reduced the store metadata and queue latencies.

  • Each Sun Storage 7310 Unified Storage System was connected to the SUT via a dual 10-Gigabit Ethernet Fiber XFP card.

  • The Sun Storage 7310 Unified Storage System software version is 2009.08.11,1-0.

  • The clients used these Java options: java -d64 -Xms4096m -Xmx4096m -XX:+AggressiveHeap

  • Substantial performance and scalability improvements were observed with Sun Communications Suite 7 Update 2, Java Messaging Server 7.2 and Directory Server 6.2.

  • See the SPEC Report for all OS, network and messaging server tunings.

Disclosure Statement

SPEC, SPECmail reg tm of Standard Performance Evaluation Corporation. Results as of 10/22/09 on www.spec.org. SPECmail2009: Sun SPARC Enterprise T5240, SPECmail_Ent2009 14,500 users at 69,857 SPECmail2009 Sessions/hour. Apple Xserv3,1, SPECmail_Ent2009 6,000 users at 28,887 SPECmail2009 Sessions/hour.

Thursday Nov 05, 2009

TPC-C Sun SPARC Enterprise T5440 with Oracle RAC World Record Database Result

Sun and Oracle demonstrate the World's fastest database performance. Sun Microsystems using 12 Sun SPARC Enterprise T5440 servers, 60 Sun Storage F5100 Flash arrays and Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning delivered a world-record TPC-C benchmark result.

  • The 12-node Sun SPARC Enterprise T5440 server cluster delivered a world record TPC-C benchmark result of 7,646,486.7 tpmC at $2.36/tpmC (USD) using Oracle 11g R1 on a configuration available 12/14/09.

  • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the IBM Power 595 (5GHz) with IBM DB2 9.5 database by 26% and has 16% better price/performance on the TPC-C benchmark.

  • The complete Oracle/Sun solution delivered 10.7x better computational density than the IBM configuration (computational density = performance/rack).

  • The complete Oracle/Sun solution used 8 times fewer racks than the IBM configuration.

  • The complete Oracle/Sun solution has 5.9x better power/performance than the IBM configuration.

  • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the HP Superdome (1.6GHz Itanium2) by 87% and has 19% better price/performance on the TPC-C benchmark.

  • The Oracle/Sun solution utilized Sun FlashFire technology to deliver this result. The Sun Storage F5100 flash array was used for database storage.

  • Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning scales and effectively uses all of the nodes in this configuration to produce the world record performance.

  • This result showed Sun and Oracle's integrated hardware and software stacks provide industry-leading performance.

More information on this benchmark will be posted in the next several days.

Performance Landscape

TPC-C results (sorted by tpmC, bigger is better)


System | tpmC | Price/tpmC | Avail | Database | Cluster | Racks | w/KtpmC
12 x Sun SPARC Enterprise T5440 | 7,646,487 | 2.36 USD | 12/14/09 | Oracle 11g RAC | Y | 9 | 9.6
IBM Power 595 | 6,085,166 | 2.81 USD | 12/10/08 | IBM DB2 9.5 | N | 76 | 56.4
HP Integrity Superdome | 4,092,799 | 2.93 USD | 08/06/07 | Oracle 10g R2 | N | 46 | to be added

Avail - Availability date
w/KtpmC - Watts per 1000 tpmC
Racks - clients, servers, storage, infrastructure
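
Several of the comparisons in the bullets above follow directly from the table. The sketch below recomputes the performance, price/performance, rack and power ratios from the table values only; the dictionary layout is chosen here for illustration.

```python
# Figures copied from the TPC-C table above. The HP watts value is not listed
# in the table, so it is left out.
sun = {"tpmC": 7_646_487, "price": 2.36, "racks": 9, "watts_per_ktpmc": 9.6}
ibm = {"tpmC": 6_085_166, "price": 2.81, "racks": 76, "watts_per_ktpmc": 56.4}
hp = {"tpmC": 4_092_799, "price": 2.93, "racks": 46}


def perf_gain_pct(a, b):
    """Percentage by which the tpmC of a exceeds the tpmC of b."""
    return (a["tpmC"] / b["tpmC"] - 1) * 100


def price_gain_pct(a, b):
    """Percentage by which the price per tpmC of a is lower than that of b."""
    return (1 - a["price"] / b["price"]) * 100


rack_ratio = ibm["racks"] / sun["racks"]
power_ratio = ibm["watts_per_ktpmc"] / sun["watts_per_ktpmc"]

print(f"vs IBM: {perf_gain_pct(sun, ibm):.0f}% more tpmC, "
      f"{price_gain_pct(sun, ibm):.0f}% better price/tpmC")
print(f"vs HP:  {perf_gain_pct(sun, hp):.0f}% more tpmC, "
      f"{price_gain_pct(sun, hp):.0f}% better price/tpmC")
print(f"IBM racks per Sun rack: {rack_ratio:.1f}")
print(f"IBM watts/KtpmC relative to Sun: {power_ratio:.1f}x")
```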

Sun and IBM TPC-C Response times


System | tpmC | New Order Response Time, 90th% (seconds) | New Order Response Time, Average (seconds)
12 x Sun SPARC Enterprise T5440 | 7,646,487 | 0.170 | 0.168
IBM Power 595 | 6,085,166 | 1.69 | 1.22
Response Time Ratio - Sun Better | | 9.9x | 7.3x

Sun uses the 7x comparison (average New-Order response time) to highlight the difference in response times between Sun's solution and IBM's. Note, however, that Sun is about 10x faster at the 90th percentile of New-Order transactions.

It is also interesting to note that none of Sun's response times, average or 90th percentile, for any transaction is over 0.25 seconds, while IBM does not have even one interactive transaction, not even the menu, below 0.50 seconds. Graphs of Sun's and IBM's response times for New-Order can be found in the full disclosure reports on TPC's website TPC-C Official Result Page.

Results and Configuration Summary

Hardware Configuration:

    9 racks used to hold

    Servers:
      12 x Sun SPARC Enterprise T5440
      4 x 1.6 GHz UltraSPARC T2 Plus
      512 GB memory
      10 GbE network for cluster
    Storage:
      60 x Sun Storage F5100 Flash Array
      61 x Sun Fire X4275, Comstar SAS target emulation
      24 x Sun StorageTek 6140 (16 x 300 GB SAS 15K RPM)
      6 x Sun Storage J4400
      3 x 80-port Brocade FC switches
    Clients:
      24 x Sun Fire X4170, each with
      2 x 2.53 GHz X5540
      48 GB memory

Software Configuration:

    Solaris 10 10/09
    OpenSolaris 6/09 (COMSTAR) for Sun Fire X4275
    Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning
    Tuxedo CFS-R Tier 1
    Sun Web Server 7.0 Update 5

Benchmark Description

TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

Disclosure Statement

TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). 12-node Sun SPARC Enterprise T5440 Cluster (1.6GHz UltraSPARC T2 Plus, 4 processor) with Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, 7,646,486.7 tpmC, $2.36/tpmC. Available 12/14/09. IBM Power 595 (5GHz Power6, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5, 6,085,166 tpmC, $2.81/tpmC, available 12/10/08. HP Integrity Superdome (1.6GHz Itanium2, 64 processors, 128 cores, 256 threads) with Oracle 10g Enterprise Edition, 4,092,799 tpmC, $2.93/tpmC. Available 8/06/07. Source: www.tpc.org, results as of 11/5/09.

Friday Oct 23, 2009

A fantastic source of technical Best Practices is at
http://wikis.sun.com/display/Performance/Home

This wiki hosts the combined wisdom of many performance engineers from across Sun. It has information about hardware, software, ZFS, Oracle and various other performance topics. The wiki attempts to categorize and present information so that it is easy to find and use. It is just getting started, so please let us know if there are any topics which would be useful.

Thursday Oct 15, 2009

Overview and Significance of Results

Oracle and Sun's Flash Cache technology combines new features in Oracle with the Sun Storage F5100 to improve database performance. In Oracle databases, the System Global Area (SGA) is a group of shared memory areas that are dedicated to an Oracle “instance” (Oracle processes in execution sharing a database). All Oracle processes use the SGA to hold information. The SGA is used to store incoming data (data and index buffers) and internal control information that is needed by the database. The size of the SGA is limited by the size of the available physical memory.

This benchmark tested and measured the performance of a new Oracle Database 11g (Release 2) feature that allows the SGA size and caching to be extended beyond physical memory onto a large flash memory storage device such as the Sun Storage F5100 flash array.

One particular benchmark test demonstrated a dramatic performance improvement (almost 5x) using the Oracle Extended SGA feature on flash storage by reaching SGA sizes in the hundreds of GB range, at a more reasonable cost than equivalently sized RAM and with much faster access times than disk I/O.

The workload consisted of a high volume of SQL select transactions accessing a very large table in a typical business oriented OLTP database. To obtain a baseline, throughput and response times were measured by applying the workload against a traditional storage configuration constrained by disk I/O demand (DB working set of about 3x the size of the data cache in the SGA). The workload was then executed with an added Sun Storage F5100 Flash Array configured to contain an Extended SGA of increasing size.

The tests showed throughput scaling with increasing Flash Cache size.

Table of Results

F5100 Extended SGA Size (GB) | Query Txns / Min | Avg Response Time (Secs) | Speedup Ratio
No | 76338 | 0.118 | N/A
25 | 169396 | 0.053 | 2.2
50 | 224318 | 0.037 | 2.9
75 | 300568 | 0.031 | 3.9
100 | 357086 | 0.025 | 4.6
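
The Speedup Ratio column can be reproduced from the throughput column as the ratio to the baseline run. A minimal sketch follows; note that the table's values match truncation to one decimal place (plain rounding would give 4.7 for the last row), so the helper truncates.

```python
import math

# (extended SGA size in GB, query transactions per minute) from the table above.
# None marks the baseline run without the F5100.
results = [(None, 76338), (25, 169396), (50, 224318), (75, 300568), (100, 357086)]
baseline = results[0][1]


def speedup(txns_per_min):
    """Throughput relative to the baseline, truncated to one decimal place."""
    return math.floor(txns_per_min / baseline * 10) / 10


speedups = {size: speedup(tpm) for size, tpm in results[1:]}
for size, value in speedups.items():
    print(f"{size} GB extended SGA: {value}x baseline throughput")
```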




Configuration Summary

Server Configuration:

    Sun SPARC Enterprise M5000 Server
    8 x SPARC64 VII 2.4GHz Quad Core
    96 GB memory

Storage Configuration:

    8 x Sun Storage J4200 Arrays, 12x 146 GB 15K RPM disks each (96 disks total)
    1 x Sun Storage F5100 Flash Array

Software Configuration:

    Oracle 11gR2
    Solaris 10

Benchmark Description

The workload consisted of a high volume of SQL select transactions accessing a very large table in a typical business oriented OLTP database.

The database consisted of various tables: Products, Customers, Orders, Warehouse Inventory (Stock) data, etc. and the Stock table alone was 3x the size of the db cache size.

To obtain a baseline, throughput and response times were measured by applying the workload against a traditional storage configuration constrained by disk I/O demand. The workload was then executed with an added Sun Storage F5100 Flash Array configured to contain an Extended SGA of increasing size.

During all tests, the in-memory SGA data cache was limited to 25 GB.

The Extended SGA was allocated on a “raw” Solaris Volume created with the Solaris Volume Manager (SVM) on a set of devices (Flash Modules) residing on the Sun Storage F5100 flash array.

Key Points and Best Practices

In order to verify the performance improvement brought by extended SGA, the feature had to be tested with a large enough database size and with a workload requiring significant disk I/O activity to access the data. For that purpose, the size of the database needed to be a multiple of the physical memory size, avoiding the case in which the accessed data could be entirely or almost entirely cached in physical memory.

The above represents a typical “use case” in which the Flash Cache Extension is able to show remarkable performance advantages.

If the DB dataset is already entirely cached, the DB I/O demand is not significant, the application is already saturating the CPU with non-database processing, or large data caching is not productive (DSS-type queries), the Extended SGA may not improve performance.

It is also relevant to know that additional memory structures needed to manage the Extended SGA are allocated in the “in memory” SGA, therefore reducing its data caching capacity.

Increasing the Extended Cache beyond a specific threshold, dependent on various factors, may reduce the benefit of widening the Flash SGA and actually reduce the overall throughput.

This new cache is somewhat similar architecturally to the L2ARC on ZFS. Once written, flash cache buffers are read-only, and updates are only done into main memory SGA buffers. This feature is expected to primarily benefit read-only and read-mostly workloads.

A typical sizing of database flash cache is 2x to 10x the size of SGA memory buffers. Note that header information is stored in the SGA for each flash cache buffer (100 bytes per buffer in exclusive mode, 200 bytes per buffer in RAC mode), so the number of available SGA buffers is reduced as the flash cache size increases, and the SGA size should be increased accordingly.

Two new init.ora parameters have been introduced, illustrated below:

    db_flash_cache_file = /lfdata/lffile_raw
    db_flash_cache_size = 100G
The db_flash_cache_file parameter takes a single file name, which can be a file system file, a raw device, or an ASM volume. The db_flash_cache_size parameter specifies the size of the flash cache. Note that for raw devices, the partition being used should start at cylinder 1 rather than cylinder 0 (to avoid the disk's volume label).
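
The header overhead and sizing guidance above can be turned into a rough estimate. The sketch below assumes that each flash cache buffer holds one database block and that the block size is 8 KB; neither assumption comes from the post, and the function name is hypothetical. The per-buffer header sizes are the ones quoted above.

```python
GIB = 1024 ** 3


def flash_cache_header_bytes(flash_cache_bytes, block_size=8192, rac=False):
    """Estimate the SGA memory consumed by flash cache buffer headers.

    Uses the per-buffer figures quoted above: 100 bytes in exclusive mode,
    200 bytes in RAC mode. block_size is an assumed database block size.
    """
    buffers = flash_cache_bytes // block_size
    return buffers * (200 if rac else 100)


flash_cache = 100 * GIB  # db_flash_cache_size = 100G, read here as GiB
overhead = flash_cache_header_bytes(flash_cache)
overhead_rac = flash_cache_header_bytes(flash_cache, rac=True)
print(f"exclusive mode: {overhead / GIB:.2f} GiB of SGA for headers")
print(f"RAC mode:       {overhead_rac / GIB:.2f} GiB of SGA for headers")

# Sizing guidance quoted above: flash cache of 2x to 10x the SGA buffer cache.
sga_cache_gb = 25  # in-memory data cache used in these tests
print(f"typical flash cache range: {2 * sga_cache_gb} to {10 * sga_cache_gb} GB")
```

Under these assumptions a 100 GB flash cache costs a little over 1 GB of in-memory SGA in exclusive mode, which is why the SGA size should grow along with the flash cache.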

Disclosure Statement

Results as of October 10, 2009 from Sun Microsystems.

Tuesday Oct 13, 2009

The Sun SPARC Enterprise T5440 server with 1.6GHz UltraSPARC T2 Plus processors, Solaris Containers, Sun Flash Open Storage, and Sun Java System Web Server 7.0 Update 5 achieved a World Record SPECweb2005 result.
  • Sun has obtained a World Record SPECweb2005 performance result of 100,209 SPECweb2005 on the Sun SPARC Enterprise T5440, running Solaris 10 10/09, Sun Java System Web Server 7.0 Update 5, and Java Hotspot™ Server VM.

  • This result demonstrates performance leadership of the Sun SPARC Enterprise T5440 server and its scalability, by using Solaris Containers to consolidate multiple web serving environments, and Sun OpenStorage Flash technology to store large datasets for fast data retrieval.

  • The Sun SPARC Enterprise T5440 delivers 21% greater SPECweb2005 performance than the HP DL370 G6 with 3.2GHz Xeon W5580 processors.

  • The Sun SPARC Enterprise T5440 delivers 40% greater SPECweb2005 performance than the HP DL 585 G5 with four 3.114 GHz Opteron 8393 SE processors.

  • The Sun SPARC Enterprise T5440 delivers 2x the SPECweb2005 performance of the HP DL 580 G5 with four 2.66GHz Xeon X7460 processors.

  • There are no IBM Power6 results on the SPECweb2005 benchmark.

  • This benchmark result clearly demonstrates that the Sun SPARC Enterprise T5440 running Solaris 10 10/09 and Sun Java System Web Server 7.0 Update 5 can support thousands of concurrent web server sessions and is an industry leader in web serving with a Sun solution.

Performance Landscape

Server | Processor | SPECweb2005 | Banking* | Ecomm* | Support* | Webserver | OS
Sun T5440 | 4x 1.6 T2 Plus | 100,209 | 176,500 | 133,000 | 95,000 | Java WebServer | Solaris
HP DL370 G6 | 2x 3.2 W5580 | 83,073 | 117,120 | 142,080 | 76,352 | Rock | RedHat Linux
HP DL585 G5 | 4x 3.11 O8393 | 71,629 | 117,504 | 123,072 | 56,320 | Rock | RedHat Linux
HP DL580 G5 | 4x 2.66 X7460 | 50,013 | 97,632 | 69,600 | 40,800 | Rock | RedHat Linux

* Banking - SPECweb2005-Banking
   Ecomm - SPECweb2005-Ecommerce
   Support - SPECweb2005-Support
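
The percentage advantages quoted in the bullets above follow from the overall SPECweb2005 column. A minimal sketch using only the table values:

```python
# Overall SPECweb2005 scores copied from the table above.
scores = {
    "Sun T5440": 100_209,
    "HP DL370 G6": 83_073,
    "HP DL585 G5": 71_629,
    "HP DL580 G5": 50_013,
}

sun_score = scores["Sun T5440"]
# Ratio of the Sun score to each competing score.
advantage = {
    name: sun_score / score
    for name, score in scores.items()
    if name != "Sun T5440"
}

for name, r in advantage.items():
    print(f"{name}: {r:.2f}x ({(r - 1) * 100:.0f}% higher)")
```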

Results and Configuration Summary

Hardware Configuration:

  1 Sun SPARC Enterprise T5440 with

  • 4 x UltraSPARC T2 Plus processors, each with 8 cores, 64 threads, 1.6 GHz
  • 254 GB memory
  • 6 x 4Gb PCI Express 8-Port Host Adapter (SG-XPCIE8SAS-E-Z)
  • 1 x Sun Storage F5100 Flash Array (TA5100RASA4-80AA)
  • 1 x Sun Storage F5100 Flash Array (TA5100RASA4-40AA)

Server Software Configuration:

  • Solaris 10 10/09
  • JAVA System Web Server 7.0 Update 5
  • Java Hotspot™ Server VM

Network configuration:

  • 1 x Arista DCS-7124s 24-10GbE port  switch
  • 1 x Cisco 2970 series (WS-C2970G-24TS-E) switch for the three 1 GbE networks

Back-end Simulator:

  1 Sun Fire X4270 with

  • 2 x 2.93 GHz Intel X5570 Quad core
  • 48GB memory
  • Solaris 10 10/09
  • JSWS 7.0 Update 5
  • Java Hotspot™ Server VM

Clients:

  8 Sun Blade™ T6320

  • 1 x 1.417 GHz UltraSPARC-T2
  • 64 GB memory
  • Solaris 10 5/09
  • Java Hotspot™ Server VM

  8 Sun Blade™ 6270

  • 2 x 2.93 GHz Intel X5570 Quad core
  • 36 GB memory
  • Solaris 10 5/09
  • Java Hotspot™ Server VM

Benchmark Description

SPECweb2005, successor to SPECweb99 and SPECweb99_SSL, is an industry standard benchmark for evaluating Web Server performance developed by SPEC. The benchmark simulates multiple user sessions accessing a Web Server and generating static and dynamic HTTP requests. The major features of SPECweb2005 are:

  • Measures simultaneous user sessions
  • Dynamic content: currently PHP and JSP implementations
  • Page images requested using 2 parallel HTTP connections
  • Multiple, standardized workloads: Banking (HTTPS), E-commerce (HTTP and HTTPS), and Support (HTTP)
  • Simulates browser caching effects
  • File accesses more accurately simulate today's disk access patterns

Key Points and Best Practices

  • The server was divided into four Solaris Containers and a single web server instance was executed in each container.
  • Four processor sets were created (with varying numbers of threads depending on the workload) to run the web server in. This was done to reduce memory access latency using the physical memory closest to the processor.  All interrupts were run on the remaining threads.
  • Each web server is executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Two Sun Storage F5100 Flash Arrays (holding the target file set and logs) were shared by the four containers  for fast data retrieval.   
  • Use of Solaris Containers highlights the consolidation of multiple web serving environments on a single server.
  • Use of the Sun Ext I/O Expansion unit and Sun Storage F5100 Flash Arrays highlights the expandability of the server.

    Disclosure Statement

    Sun SPARC Enterprise T5440 (32 cores, 4 chips) 100,209 SPECweb2005, was submitted to SPEC for review on October 13, 2009. HP ProLiant DL370 G6 (8 cores, 2 chips) 83,073 SPECweb2005. HP ProLiant DL585 G5 (16 cores, 4 chips) 71,629 SPECweb2005. HP ProLiant DL580 G5 (24 cores, 4 chips) 50,013 SPECweb2005. SPEC, SPECweb reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of Oct 10, 2009.

    Tuesday Oct 13, 2009

    Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark Sun SPARC Enterprise M9000/32 SPARC64 VII

    World Record for 32-processor systems on the SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

    • The Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) set a World Record for 32-processor systems on the SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark, as of Oct. 12th, 2009.

    • The 32-way Sun SPARC Enterprise M9000 with 2.88 GHz SPARC64 VII processors achieved 17,430 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.

    • The Sun SPARC Enterprise M9000 result is 4.6x faster than the only IBM 5GHz Power6 unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

    • IBM has not submitted any p595 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

    • HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

    • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note for more details.

    • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.

    Performance Landscape

    SAP-SD 2-Tier Performance Table (in decreasing performance order).

    SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
    (New version of the benchmark as of January 2009)

    System | OS / Database | Users | SAP ERP/ECC Release | SAPS | Date
    Sun SPARC Enterprise M9000, 32 x SPARC64 VII @ 2.88GHz, 1024 GB | Solaris 10, Oracle10g | 17,430 | 2009 6.0 EP4 (Unicode) | 95,480 | 12-Oct-09
    IBM System 550, 4 x Power6 @ 5GHz, 64 GB | AIX 6.1, DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 16-Jun-09

    Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

    Results and Configuration Summary

    Certified Result:

      Number of SAP SD benchmark users: 17,430
      Average dialog response time: 0.95 seconds
      Throughput:
        Fully processed order line items/hour: 1,909,670
        Dialog steps/hour: 5,729,000
        SAPS: 95,480
      SAP Certification: 2009038

    Hardware Configuration:

      Sun SPARC Enterprise M9000
        32 x 2.88GHz SPARC64 VII, 1024 GB memory
        6 x 6140 storage arrays

    Software Configuration:

      Solaris 10
      SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
      Oracle10g

    Benchmark Description

    The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

    SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

    Disclosure Statement

    Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 10/12/09: Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) 17,430 SAP SD Users, 32 x 2.88 GHz SPARC64 VII, 1024 GB memory, Oracle10g, Solaris 10, Cert# 2009038. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024 GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert# 2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0.

    SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

    Tuesday Oct 13, 2009

    Significance of Results

    • Four Sun Blade X6270 (2 processors, 8 cores, 16 threads), running SAP ERP application Release 6.0 Enhancement Pack 4 (Unicode) with Oracle Database on top of Solaris 10 OS delivered the highest eight-processor result on the two-tier SAP SD-Parallel Standard Application Benchmark, as of Oct 12th, 2009.

    • Four Sun Blade X6270 servers with Intel Xeon X5570 processors achieved a 1.9x performance improvement over two Sun Blade X6270 servers with the same processors.

    • Two Sun Blade X6270 (2 processors, 8 cores, 16 threads), running SAP ERP application Release 6.0 Enhancement Pack 4 (Unicode) with Oracle Database on top of Solaris 10 OS delivered the highest four-processor result on the two-tier SAP SD-Parallel Standard Application Benchmark, as of Oct 12th, 2009.

    • Two Sun Blade X6270 servers with Intel Xeon X5570 processors achieved a 1.9x performance improvement over a single 2-processor Sun Blade X6270 system.

    • A one node Sun Blade X6270 server with Intel Xeon X5570 processors running Oracle RAC delivers the same result as a Sun Fire X4270 server with Intel Xeon X5570 processors running Oracle with no performance difference between Oracle 10g and Oracle 10g RAC.

    • This benchmark highlights the near-linear scaling of Oracle 10g Real Application Cluster runs on Sun Microsystems hardware in a SAP environment.

    • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note for more details.

    • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.

    Performance Landscape

    SAP SD-Parallel 2-Tier Performance Table (in decreasing performance order).

    SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
    (New version of the benchmark as of January 2009)

    System                                                        OS / Database                                     Users   SAP ERP/ECC Release     SAPS    Date
    Four Sun Blade X6270, 2 x Intel Xeon X5570 @2.93GHz, 48 GB  Solaris 10, Oracle 10g Real Application Clusters  13,718  2009 6.0 EP4 (Unicode)  75,762  12-Oct-09
    Two Sun Blade X6270, 2 x Intel Xeon X5570 @2.93GHz, 48 GB   Solaris 10, Oracle 10g Real Application Clusters  7,220   2009 6.0 EP4 (Unicode)  39,420  12-Oct-09
    One Sun Blade X6270, 2 x Intel Xeon X5570 @2.93GHz, 48 GB   Solaris 10, Oracle 10g Real Application Clusters  3,800   2009 6.0 EP4 (Unicode)  20,750  12-Oct-09
    Sun Fire X4270, 2 x Intel Xeon X5570 @2.93GHz, 48 GB        Solaris 10, Oracle 10g                            3,800   2009 6.0 EP4 (Unicode)  21,000  21-Aug-09

    Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.
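
    The near-linear scaling claim can be checked from the certified user counts in the table above. This is a minimal Python sketch; "efficiency" here simply means measured speedup divided by the ideal (linear) speedup, which is a definition chosen for illustration rather than an SAP metric.

```python
# Certified SD-Parallel user counts from the table above, keyed by node count.
users_by_nodes = {1: 3_800, 2: 7_220, 4: 13_718}

def speedup(nodes, baseline_nodes=1):
    """Ratio of benchmark users relative to a smaller configuration."""
    return users_by_nodes[nodes] / users_by_nodes[baseline_nodes]

def efficiency(nodes, baseline_nodes=1):
    """Measured speedup as a fraction of perfectly linear scaling."""
    ideal = nodes / baseline_nodes
    return speedup(nodes, baseline_nodes) / ideal

for nodes in (2, 4):
    print(f"{nodes} nodes: {speedup(nodes):.2f}x vs. one node, "
          f"{efficiency(nodes):.0%} of linear")
print(f"4 vs. 2 nodes: {speedup(4, 2):.2f}x")
```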

    Results and Configuration Summary

    Four Sun Blade X6270 Servers, each with two Intel Xeon X5570 2.93 GHz (2 processors, 8 cores, 16 threads)

      Number of SAP SD benchmark users:
      13,718
      Average dialog response time:
      0.86 seconds
      Throughput:

      Dialog steps/hour:
      4,545,729

      SAPS:
      75,762
      SAP Certification:
      2009041

    Two Sun Blade X6270 Servers, each with two Intel Xeon X5570 2.93 GHz (2 processors, 8 cores, 16 threads)

      Number of SAP SD benchmark users:
      7,220
      Average dialog response time:
      0.99 seconds
      Throughput:

      Dialog steps/hour:
      2,365,000

      SAPS:
      39,420
      SAP Certification:
      2009040

    One Sun Blade X6270 Server, with two Intel Xeon X5570 2.93 GHz (2 processors, 8 cores, 16 threads)

      Number of SAP SD benchmark users:
      3,800
      Average dialog response time:
      0.99 seconds
      Throughput:

      Dialog steps/hour:
      1,245,000

      SAPS:
      20,750
      SAP Certification:
      2009039

    Software:

      Oracle 10g Real Application Clusters
      Solaris 10 OS

    Benchmark Description

    The SAP Standard Application Sales and Distribution - Parallel (SD-Parallel) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

    SD Versus SD-Parallel

    The SD-Parallel Benchmark consists of the same transactions and user interaction steps as the SD Benchmark. This means that the SD-Parallel Benchmark runs the same business processes as the SD Benchmark. The difference between the benchmarks is the technical data distribution.

    An Additional Rule for Parallel and Distributed Databases

    The additional rule is: Equally distribute the benchmark users across all database nodes for the used benchmark clients (round-robin-method). Following this rule, all database nodes work on data of all clients. This avoids unrealistic configurations such as having only one client per database node.
    The SAP Benchmark Council agreed to give the parallel benchmark a different name so that the difference can be easily recognized by any interested parties - customers, prospects, and analysts. The naming convention is SD-Parallel for Sales & Distribution - Parallel.
    SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.
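
    The round-robin rule described above can be illustrated with a short Python sketch. It is not SAP's benchmark driver; the function name, the client labels, and the user counts are hypothetical and chosen only to show that every database node ends up serving users of every client.

```python
from collections import defaultdict

def assign_round_robin(users_per_client, num_nodes):
    """Distribute each client's users across all database nodes in turn.

    Returns {node_index: {client_name: user_count}}.
    """
    assignment = defaultdict(lambda: defaultdict(int))
    next_node = 0
    for client, user_count in users_per_client.items():
        for _ in range(user_count):
            assignment[next_node][client] += 1
            next_node = (next_node + 1) % num_nodes  # move to the next node
    return {node: dict(clients) for node, clients in assignment.items()}

# Hypothetical example: two benchmark clients, four database nodes.
layout = assign_round_robin({"client_A": 8, "client_B": 8}, num_nodes=4)
for node, clients in sorted(layout.items()):
    print(node, clients)
```

    Contrast this with pinning client_A to nodes 0 and 1 and client_B to nodes 2 and 3, which is the kind of partitioned layout the rule is meant to exclude.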

    Disclosure Statement

    SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of Oct 12th, 2009: Four Sun Blade X6270 (each 2 processors, 8 cores, 16 threads) 13,718 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, each 48 GB memory, running two-tier SAP Sales and Distribution Parallel (SD-Parallel) standard SAP SD benchmark with Oracle 10g Real Application Clusters and Solaris 10, Cert# 2009041. Two Sun Blade X6270 (each 2 processors, 8 cores, 16 threads) 7,220 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, each 48 GB memory, running two-tier SAP Sales and Distribution Parallel (SD-Parallel) standard SAP SD benchmark with Oracle 10g Real Application Clusters and Solaris 10, Cert# 2009040. Sun Blade X6270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, running two-tier SAP Sales and Distribution Parallel (SD-Parallel) standard SAP SD benchmark with Oracle 10g Real Application Clusters and Solaris 10, Cert# 2009039. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, running two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark with Oracle 10g and Solaris 10, Cert# 2009033.

    SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

    Tuesday Oct 13, 2009

    The SPEC CPU2006 benchmarks were run on the new 2.88 GHz and 2.53 GHz SPARC64 VII processors for the Sun SPARC Enterprise M-series servers. The new processors were tested in the Sun SPARC Enterprise M4000, M5000, M8000, and M9000 servers.


    • The Sun SPARC Enterprise M9000 server running the new 2.88 GHz SPARC64 VII processors beats the IBM Power 595 server running 5.0 GHz POWER6 processors by 20% on the SPECint_rate2006 benchmark.

    • The Sun SPARC Enterprise M9000 server running the new 2.88 GHz SPARC64 VII processors beats the IBM Power 595 server running 5.0 GHz POWER6 processors by 29% on the SPECint_rate_base2006 benchmark.

    • The Sun SPARC Enterprise M9000 server with 64 SPARC64 VII 2.88GHz processors delivered results of 2590 SPECint_rate2006 and 2100 SPECfp_rate2006.

    • The Sun SPARC Enterprise M9000 server with 64 SPARC64 VII processors at 2.88GHz improves performance vs. 2.52 GHz by 13% for SPECint_rate2006 and 5% for SPECfp_rate2006.

    • The Sun SPARC Enterprise M9000 server with 32 SPARC64 VII 2.88GHz processors delivered results of 1450 SPECint_rate2006 and 1250 SPECfp_rate2006.

    • The Sun SPARC Enterprise M9000 server with 32 SPARC64 VII processors at 2.88GHz improves performance vs. 2.52 GHz by 17% for SPECint_rate2006 and 13% for SPECfp_rate2006.

    • The Sun SPARC Enterprise M8000 server with 16 SPARC64 VII 2.88GHz processors delivered results of 753 SPECint_rate2006 and 666 SPECfp_rate2006.

    • The Sun SPARC Enterprise M8000 server with 16 SPARC64 VII processors at 2.88GHz improves performance vs. 2.52 GHz by 18% for SPECint_rate2006 and 14% for SPECfp_rate2006.

    • The Sun SPARC Enterprise M5000 server with 8 SPARC64 VII 2.53GHz processors delivered results of 296 SPECint_rate2006 and 234 SPECfp_rate2006.

    • The Sun SPARC Enterprise M5000 server with 8 SPARC64 VII processors at 2.53GHz improves performance vs. 2.40 GHz by 12% for SPECint_rate2006 and 5% for SPECfp_rate2006.

    • The Sun SPARC Enterprise M4000 server with 4 SPARC64 VII 2.53GHz processors delivered results of 152 SPECint_rate2006 and 116 SPECfp_rate2006.

    • The Sun SPARC Enterprise M4000 server with 4 SPARC64 VII processors at 2.53GHz improves performance vs. 2.40 GHz by 13% for SPECint_rate2006 and 4% for SPECfp_rate2006.
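
    The percentages quoted above follow from the published SPEC scores. A minimal Python sketch of the calculation, using scores that appear in the tables on this page (the helper name is illustrative):

```python
def percent_gain(new_score, old_score):
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score / old_score - 1.0) * 100.0

# M9000 2.88 GHz (64 chips) vs. IBM Power 595, SPECint_rate2006 (peak) and base
peak_vs_ibm = percent_gain(2590, 2155)
base_vs_ibm = percent_gain(2400, 1866)

# M9000 2.88 GHz vs. 2.52 GHz (64 chips), SPECint_rate2006 and SPECfp_rate2006
int_vs_252 = percent_gain(2590, 2288)
fp_vs_252 = percent_gain(2100, 2005)

print(f"{peak_vs_ibm:.1f} {base_vs_ibm:.1f} {int_vs_252:.1f} {fp_vs_252:.1f}")
```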

    Performance Landscape

    SPEC CPU2006 Performance Charts - bigger is better, selected results, please see www.spec.org for complete results. All results as of 10/07/09.

    In the tables below
    "Base" = SPECint_rate_base2006 or SPECfp_rate_base2006
    "Peak" = SPECint_rate2006 or SPECfp_rate2006

    SPECint_rate2006 results - large systems

    System  Cores/Chips  Type  GHz  Base Copies  Base  Peak  Comments
    SGI Altix 4700 Bandwidth 1024/512 Itanium 2 1.6 1020 9031 na
    Sun Blade X6440 Cluster 768/192 Opteron 8384 2.7 705 8845 na
    SGI Altix 4700 Density 256/128 Itanium 2 1.66 256 2893 3354
    vSMP Foundation 128/32 Xeon X5570 2.93 255 3147 na
    SGI Altix 4700 Bandwidth 256/128 Itanium 2 1.6 256 2715 2971
    SPARC Enterprise M9000 256/64 SPARC64 VII 2.88 511 2400 2590 New
    SPARC Enterprise M9000 256/64 SPARC64 VII 2.52 511 2088 2288
    IBM Power 595 64/32 POWER6 5.0 128 1866 2155
    HP Superdome 128/64 Itanium 2 1.6 128 1534 1648
    SPARC Enterprise M9000 128/32 SPARC64 VII 2.88 255 1370 1450 New
    SPARC Enterprise M9000 128/64 SPARC64 VI 2.4 255 1111 1294
    SPARC Enterprise M9000 128/32 SPARC64 VII 2.52 255 1141 1240
    Unisys ES7000 96/16 Xeon X7460 2.66 96 999 1049
    SGI Altix ICE 8200EX 32/8 Xeon X5570 2.93 64 931 999
    IBM Power 575 32/16 POWER6 4.7 64 812 934
    IBM Power 570 32/16 POWER6+ 4.2 64 661 832
    SPARC Enterprise M8000 64/16 SPARC64 VII 2.88 127 706 753 New
    SPARC Enterprise M9000 64/32 SPARC64 VI 2.4 127 553 650
    SPARC Enterprise M8000 64/16 SPARC64 VII 2.52 127 565 637

    SPECint_rate2006 results - small systems

    System  Cores/Chips  Type  GHz  Base Copies  Base  Peak  Comments
    Sun Fire X4440 24/4 Opteron 8435 SE 2.6 24 296 377
    SPARC Enterprise M5000 32/8 SPARC64 VII 2.53 64 267 296 New
    Sun Blade X6440 16/4 Opteron 8389 2.9 16 226 292
    HP ProLiant BL680c G5 24/4 Xeon E7458 2.4 24 247 268
    SPARC Enterprise M5000 32/8 SPARC64 VII 2.4 63 232 264
    IBM Power 550 8/4 POWER6+ 5.0 16 215 263
    Sun Fire X2270 8/2 Xeon X5570 2.93 16 223 260
    SPARC Enterprise T5240 16/2 UltraSPARC T2 Plus 1.6 127 171 183
    SPARC Enterprise M4000 16/4 SPARC64 VII 2.53 32 136 152 New
    SPARC Enterprise M4000 16/4 SPARC64 VII 2.4 32 118 135

    SPECfp_rate2006 results - large systems

    System  Cores/Chips  Type  GHz  Base Copies  Base  Peak  Comments
    SGI Altix 4700 Bandwidth 1024/512 Itanium 2 1.6 1020 10583 na
    SGI Altix 4700 Density 1024/512 Itanium 2 1.66 1020 10580 na
    Sun Blade X6440 Cluster 768/192 Opteron 8384 2.7 705 6502 na
    SGI Altix 4700 Bandwidth 256/128 Itanium 2 1.6 256 3419 3507
    ScaleMP vSMP Foundation 128/32 Xeon X5570 2.93 255 2553 na
    IBM Power 595 64/32 POWER6 5.0 128 1681 2184
    IBM Power 595 64/32 POWER6 5.0 128 1822 2108
    SPARC Enterprise M9000 256/64 SPARC64 VII 2.88 511 1930 2100 New
    SPARC Enterprise M9000 256/64 SPARC64 VII 2.52 511 1861 2005
    SGI Altix 4700 Bandwidth 128/64 Itanium 2 1.66 128 1832 1947
    HP Superdome 128/64 Itanium 2 1.6 128 1422 1479
    SPARC Enterprise M9000 128/32 SPARC64 VII 2.88 255 1190 1250 New
    SPARC Enterprise M9000 128/64 SPARC64 VI 2.4 255 1160 1225
    SPARC Enterprise M9000 128/32 SPARC64 VII 2.52 255 1059 1110
    IBM Power 575 32/16 POWER6 4.7 64 730 839
    SPARC Enterprise M8000 64/16 SPARC64 VII 2.88 127 616 666 New
    SPARC Enterprise M9000 64/32 SPARC64 VI 2.4 127 588 636
    IBM Power 570 32/16 POWER6+ 4.2 64 517 602
    SPARC Enterprise M8000 64/16 SPARC64 VII 2.52 127 538 582

    SPECfp_rate2006 results - small systems

    System  Cores/Chips  Type  GHz  Base Copies  Base  Peak  Comments
    Supermicro H8QM8-2 24/4 Opteron 8435 SE 2.8 24 261 287
    SPARC Enterprise T5440 32/4 UltraSPARC T2 Plus 1.6 255 254 270
    IBM Power 560 16/8 POWER6+ 3.6 32 226 263
    SPARC Enterprise M5000 32/8 SPARC64 VII 2.53 64 218 234 New
    SPARC Enterprise M5000 32/8 SPARC64 VII 2.4 63 208 223
    IBM Power 550 8/4 POWER6+ 5.0 16 188 222
    ASUS Z8PE-D18 8/2 Xeon X5570 2.93 16 197 203
    SPARC Enterprise T5240 16/2 UltraSPARC T2 Plus 1.6 127 124 133
    SPARC Enterprise M4000 16/4 SPARC64 VII 2.53 32 111 116 New
    SPARC Enterprise M4000 16/4 SPARC64 VII 2.4 32 107 112

    Results and Configuration Summary

    Test Configurations:

    Sun SPARC Enterprise M9000
    64 x 2.88 GHz SPARC64 VII
    1152 GB (448 x 2GB + 64 x 4GB)
    Solaris 10 5/09
    Sun Studio 12 Update 1

    Sun SPARC Enterprise M9000
    32 x 2.88 GHz SPARC64 VII
    704 GB (160 x 2GB + 96 x 4GB)
    Solaris 10 5/09
    Sun Studio 12 Update 1

    Sun SPARC Enterprise M8000
    16 x 2.88 GHz SPARC64 VII
    512 GB (128 x 4GB)
    Solaris 10 10/09
    Sun Studio 12 Update 1

    Sun SPARC Enterprise M5000
    8 x 2.53 GHz SPARC64 VII
    128 GB (64 x 2GB)
    Solaris 10 10/09
    Sun Studio 12 Update 1

    Sun SPARC Enterprise M4000
    4 x 2.53 GHz SPARC64 VII
    32 GB (32 x 1GB)
    Solaris 10 10/09
    Sun Studio 12 Update 1

    Results Summary:

    M9000 (64 chips)  M9000 (32 chips)  M8000  M5000  M4000
    SPECint_rate_base2006 2400 1370 706 267 136
    SPECint_rate2006 2590 1450 753 296 152
    SPECfp_rate_base2006 1930 1190 616 218 111
    SPECfp_rate2006 2100 1250 666 234 116
    SPECint_base2006 - - 12.4 - 12.1
    SPECint2006 - - 13.6 - 12.9
    SPECfp_base2006 - - 15.6 - 13.3
    SPECfp2006 - - 16.5 - 13.9
    SPECfp_base2006 - autopar - - 28.2 - -
    SPECfp2006 - autopar - - 33.9 - -

    Benchmark Description

    SPEC CPU2006 is SPEC's most popular benchmark, with over 8000 results published in the three years since it was introduced. It measures:

    • "Speed" - single copy performance of chip, memory, compiler
    • "Rate" - multiple copy (throughput)

    The rate metrics are used for the throughput-oriented systems described on this page. These metrics include:

    • SPECint_rate2006: throughput for 12 integer benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
    • SPECfp_rate2006: throughput for 17 floating point benchmarks derived from real applications, including chemistry, physics, genetics, and weather.

    There are "base" variants of both the above metrics that require more conservative compilation, such as using the same flags for all benchmarks.
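
    As a rough illustration of how a rate metric is formed: for each benchmark, SPEC computes a ratio of copies times reference time divided by measured run time, and the overall metric is the geometric mean of those ratios. The Python sketch below assumes that formulation; the timings are invented and do not correspond to any published result, and the selection of the median from multiple runs required by the SPEC run rules is omitted.

```python
import math

def rate_ratio(copies, reference_seconds, run_seconds):
    """Per-benchmark throughput ratio: copies * reference time / run time."""
    return copies * reference_seconds / run_seconds

def overall_rate(ratios):
    """Geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical timings for two benchmarks, each run with 4 concurrent copies.
ratios = [
    rate_ratio(4, reference_seconds=1000.0, run_seconds=500.0),
    rate_ratio(4, reference_seconds=2000.0, run_seconds=500.0),
]
print(f"{overall_rate(ratios):.2f}")
```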

    Key Points and Best Practices

    Results on this page for the Sun SPARC Enterprise M9000 server were measured on a Fujitsu SPARC Enterprise M9000. The Sun SPARC Enterprise M9000 and Fujitsu SPARC Enterprise M9000 are electronically equivalent. Results for the Sun SPARC Enterprise M8000, M4000 and M5000 were measured on those systems. The similarly named Fujitsu systems are electronically equivalent.

    Use the latest compiler. The Sun Studio group is always working to improve the compiler. Sun Studio 12 Update 1, which was used in these submissions, provides updated code generation for a wide variety of SPARC and x86 implementations.

    I/O still counts. Even in a CPU-intensive workload, some I/O remains. This point is explored in some detail at http://blogs.sun.com/jhenning/entry/losing_my_fear_of_zfs.

    Disclosure Statement

    SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Competitive results from www.spec.org as of 7 October 2009. Sun's new results quoted on this page have been submitted to SPEC. Sun SPARC Enterprise M9000 2400 SPECint_rate_base2006, 2590 SPECint_rate2006, 1930 SPECfp_rate_base2006, 2100 SPECfp_rate2006; Sun SPARC Enterprise M9000 (32 chips) 1370 SPECint_rate_base2006, 1450 SPECint_rate2006, 1190 SPECfp_rate_base2006, 1250 SPECfp_rate2006; Sun SPARC Enterprise M8000 706 SPECint_rate_base2006, 753 SPECint_rate2006, 616 SPECfp_rate_base2006, 666 SPECfp_rate2006; Sun SPARC Enterprise M5000 267 SPECint_rate_base2006, 296 SPECint_rate2006, 218 SPECfp_rate_base2006, 234 SPECfp_rate2006; Sun SPARC Enterprise M4000 136 SPECint_rate_base2006, 152 SPECint_rate2006, 111 SPECfp_rate_base2006, 116 SPECfp_rate2006; Sun SPARC Enterprise M9000 (2.52GHz) 2088 SPECint_rate_base2006, 2288 SPECint_rate2006, 1860 SPECfp_rate_base2006, 2010 SPECfp_rate2006; Sun SPARC Enterprise M9000 (32 chips 2.52GHz) 1140 SPECint_rate_base2006, 1240 SPECint_rate2006, 1060 SPECfp_rate_base2006, 1110 SPECfp_rate2006; Sun SPARC Enterprise M8000 (2.52GHz) 565 SPECint_rate_base2006, 637 SPECint_rate2006, 538 SPECfp_rate_base2006, 582 SPECfp_rate2006; Sun SPARC Enterprise M5000 (2.4GHz) 232 SPECint_rate_base2006, 264 SPECint_rate2006, 208 SPECfp_rate_base2006, 223 SPECfp_rate2006; Sun SPARC Enterprise M4000 (2.4GHz) 118 SPECint_rate_base2006, 135 SPECint_rate2006, 107 SPECfp_rate_base2006, 112 SPECfp_rate2006; IBM Power 595 1866 SPECint_rate_base2006, 2155 SPECint_rate2006,

    Sunday Oct 11, 2009

    TPC-C Sun SPARC Enterprise T5440 with Oracle RAC World Record Database Result

    Sun and Oracle demonstrate the World's fastest database performance. Sun Microsystems using 12 Sun SPARC Enterprise T5440 servers, 60 Sun Storage F5100 Flash arrays and Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning delivered a world-record TPC-C benchmark result.

    • The 12-node Sun SPARC Enterprise T5440 server cluster delivered a world record TPC-C benchmark result of 7,646,486.7 tpmC at $2.36/tpmC (USD) using Oracle 11g R1 on a configuration available 12/14/09.

    • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the IBM Power 595 (5GHz) with IBM DB2 9.5 database by 26% and has 16% better price/performance on the TPC-C benchmark.

    • The complete Oracle/Sun solution delivered 10.7x better computational density than the IBM configuration (computational density = performance/rack).

    • The complete Oracle/Sun solution used 8 times fewer racks than the IBM configuration.

    • The complete Oracle/Sun solution has 5.9x better power/performance than the IBM configuration.

    • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the HP Superdome (1.6GHz Itanium2) by 87% and has 19% better price/performance on the TPC-C benchmark.

    • The Oracle/Sun solution utilized Sun FlashFire technology to deliver this result. The Sun Storage F5100 flash array was used for database storage.

    • Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning scales and effectively uses all of the nodes in this configuration to produce the world record performance.

    • This result showed Sun and Oracle's integrated hardware and software stacks provide industry-leading performance.

    More information on this benchmark will be posted in the next several days.

    Performance Landscape

    TPC-C results (sorted by tpmC, bigger is better)


    System                           tpmC       Price/tpmC  Avail     Database        Cluster  Racks  w/KtpmC
    12 x Sun SPARC Enterprise T5440  7,646,487  2.36 USD    12/14/09  Oracle 11g RAC  Y        9      9.6
    IBM Power 595                    6,085,166  2.81 USD    12/10/08  IBM DB2 9.5     N        76     56.4
    Bull Escala PL6460R              6,085,166  2.81 USD    12/15/08  IBM DB2 9.5     N        71     56.4
    HP Integrity Superdome           4,092,799  2.93 USD    08/06/07  Oracle 10g R2   N        46     to be added

    Avail - Availability date
    w/KtpmC - Watts per 1000 tpmC
    Racks - clients, servers, storage, infrastructure
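
    The comparisons against the IBM Power 595 can be reproduced from the table above. The Python sketch below uses only the table values; the class and variable names are illustrative. Because the table values are rounded, the derived ratios can differ slightly from the quoted figures (for example, the density ratio computes to roughly 10.6 against the quoted 10.7x).

```python
from dataclasses import dataclass

@dataclass
class TpccResult:
    name: str
    tpmc: float             # transactions per minute (tpmC)
    price_per_tpmc: float   # USD per tpmC
    racks: int              # clients, servers, storage, infrastructure
    watts_per_ktpmc: float  # watts per 1000 tpmC

sun = TpccResult("12 x Sun SPARC Enterprise T5440", 7_646_487, 2.36, 9, 9.6)
ibm = TpccResult("IBM Power 595", 6_085_166, 2.81, 76, 56.4)

perf_gain_pct = (sun.tpmc / ibm.tpmc - 1) * 100                            # higher is better
price_gain_pct = (1 - sun.price_per_tpmc / ibm.price_per_tpmc) * 100       # lower price is better
rack_ratio = ibm.racks / sun.racks                                         # how many times fewer racks
density_ratio = (sun.tpmc / sun.racks) / (ibm.tpmc / ibm.racks)            # performance per rack
power_ratio = ibm.watts_per_ktpmc / sun.watts_per_ktpmc                    # power per unit of work

print(f"performance: +{perf_gain_pct:.0f}%")
print(f"price/performance: {price_gain_pct:.0f}% better")
print(f"racks: {rack_ratio:.1f}x fewer, density: {density_ratio:.1f}x")
print(f"power/performance: {power_ratio:.1f}x better")
```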

    Results and Configuration Summary

    Hardware Configuration:

      9 racks used to hold

      Servers:
        12 x Sun SPARC Enterprise T5440
        4 x 1.6 GHz UltraSPARC T2 Plus
        512 GB memory
        10 GbE network for cluster
      Storage:
        60 x Sun Storage F5100 Flash Array
        61 x Sun Fire X4275, Comstar SAS target emulation
        24 x Sun StorageTek 6140 (16 x 300 GB SAS 15K RPM)
        6 x Sun Storage J4400
        3 x 80-port Brocade FC switches
      Clients:
        24 x Sun Fire X4170, each with
        2 x 2.53 GHz X5540
        48 GB memory

    Software Configuration:

      Solaris 10 10/09
      OpenSolaris 6/09 (COMSTAR) for Sun Fire X4275
      Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning
      Tuxedo CFS-R Tier 1
      Sun Web Server 7.0 Update 5

    Benchmark Description

    TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

    POSTSCRIPT: Here are some comments on IBM's grasping-at-straws perf/core attacks on the TPC-C result:
    c0t0d0s0 blog: "IBM's Reaction to Sun&Oracle TPC-C"

    Disclosure Statement

    TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). 12-node Sun SPARC Enterprise T5440 Cluster (1.6GHz UltraSPARC T2 Plus, 4 processor) with Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, 7,646,486.7 tpmC, $2.36/tpmC. Available 12/14/09. IBM Power 595 (5GHz Power6, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5, 6,085,166 tpmC, $2.81/tpmC, available 12/10/08. HP Integrity Superdome (1.6GHz Itanium2, 64 processors, 128 cores, 256 threads) with Oracle 10g Enterprise Edition, 4,092,799 tpmC, $2.93/tpmC. Available 8/06/07. Source: www.tpc.org, results as of 10/11/09.

    Tuesday Sep 22, 2009

    Two-Processor Performance using 8 Virtual CPU Solaris 10 Container Configuration:
    • Sun achieved 36% better performance using Solaris and Solaris 10 containers than a similar configuration on SUSE Linux using VMware ESX Server 4.0 on the same benchmark both using 8 virtual cpus.
    • Solaris Containers are the best virtualization technology for SAP projects and have been supported for more than 4 years. Other virtualization technologies suffer various overheads that decrease performance.
    • The Sun Fire X4270 server with 48 GB memory and a Solaris 10 container configured with 8 virtual CPUs achieved 2,800 SAP SD Benchmark users and beat the Fujitsu PRIMERGY RX300 S5 server with 96 GB memory running SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0 by 36%. Both results used the same CPUs and ran the SAP ERP application release 6.0 enhancement pack 4 (Unicode) standard Sales and Distribution (SD) benchmark.
    • The Sun and Fujitsu results were run at 50% and 48% utilization, respectively. With these servers about half utilized, there is headroom for additional performance.
    • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
    • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note for more details. Note: username and password for SAP Service Marketplace required.
    • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details. Note: username and password for SAP Service Marketplace required.

    SAP-SD 2-Tier Performance Landscape (in decreasing performance order).


    SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results (New version of the benchmark as of January 2009)

    System                                                           OS / Database                                              Virtualized?   Users  SAP ERP/ECC Release     SAPS    SAPS/Proc  Date
    Sun Fire X4270, 2 x Intel Xeon X5570 @2.93GHz, 48 GB             Solaris 10, Oracle 10g                                     no             3,800  2009 6.0 EP4 (Unicode)  21,000  10,500     21-Aug-09
    IBM System 550, 4 x Power6 @5GHz, 64 GB                          AIX 6.1, DB2 9.5                                           no             3,752  2009 6.0 EP4 (Unicode)  20,520  5,130      16-Jun-09
    HP ProLiant DL380 G6, 2 x Intel Xeon X5570 @2.93GHz, 48 GB       SUSE Linux Ent Svr 10, MaxDB 7.8                           no             3,171  2009 6.0 EP4 (Unicode)  17,380  8,690      17-Apr-09
    Sun Fire X4270, 2 x Intel Xeon X5570 @2.93GHz, 48 GB             Solaris 10 container (8 virtual CPUs), Oracle 10g          YES, 50% util  2,800  2009 6.0 EP4 (Unicode)  15,320  7,660      10-Sep-09
    Fujitsu PRIMERGY RX300 S5, 2 x Intel Xeon X5570 @2.93GHz, 96 GB  SUSE Linux Ent Svr 10 on VMware ESX Server 4.0, MaxDB 7.8  YES, 48% util  2,056  2009 6.0 EP4 (Unicode)  11,230  5,615      04-Aug-09

    Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.
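
    The SAPS/Proc column and the 36% advantage for the virtualized results can be derived from the table above. A minimal Python sketch using the two virtualized rows (dictionary keys and helper names are illustrative):

```python
# The two virtualized results from the table above.
virtualized = {
    "Sun Fire X4270, Solaris 10 container": {"users": 2_800, "saps": 15_320, "procs": 2},
    "Fujitsu PRIMERGY RX300 S5, VMware ESX Server 4.0": {"users": 2_056, "saps": 11_230, "procs": 2},
}

def saps_per_proc(result):
    """SAPS divided by the number of processors (chips)."""
    return result["saps"] / result["procs"]

sun = virtualized["Sun Fire X4270, Solaris 10 container"]
fujitsu = virtualized["Fujitsu PRIMERGY RX300 S5, VMware ESX Server 4.0"]

user_advantage_pct = (sun["users"] / fujitsu["users"] - 1) * 100

print(saps_per_proc(sun), saps_per_proc(fujitsu))
print(f"{user_advantage_pct:.0f}% more users")
```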

    Results and Configuration Summary

    Hardware Configuration:

      One Sun Fire X4270
        2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
        48 GB memory
        Sun StorageTek CSM200 with 32 * 73GB 15KRPM 4Gb FC-AL and 32 * 146GB 15KRPM 4Gb FC-AL Drives

    Software Configuration:

      Solaris 10 container configured with 8 virtual CPUs
      SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
      Oracle 10g

    Sun has submitted the following result for the SAP-SD 2-Tier benchmark. It was approved and published by SAP.

      Number of benchmark users:
      2,800
      Average dialog response time:
      0.971 s

      Fully processed order line items/hour:
      306,330

      Dialog steps/hour:
      919,000

      SAPS:
      15,320
      SAP Certification:
      2009034

    Benchmark Description

    The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

    SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

    Key Points and Best Practices

    • Set up the storage (LSI-OEM) to deliver the needed raw devices directly out of the storage and do not use any software layer in between.

    • Solaris 10 Container best practices how-to guide

    Disclosure Statement

    Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 09/10/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) run in 8 virtual cpu container, 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009034. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10, Cert# 2009006. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10 container configured with 8 virtual CPUs, Cert# 2009034. Fujitsu PRIMERGY Model RX300 S5 (2 processors, 8 cores, 16 threads) 2,056 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 96 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0, Cert# 2009029.

    SAP, R/3, reg TM of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

    Tuesday Sep 01, 2009

    Significance of Results

    Sun SPARC Enterprise T5220, T5240 and T5440 servers ran benchmarks using the Aho-Corasick string searching algorithm. String searching or pattern matching are important to a variety of commercial, government and HPC applications. One of the core functions needed for text identification algorithms in data repositories is real-time string searching. For this benchmark, the IBM, HP and Sun systems used the Aho-Corasick algorithm for string searching.
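
    For readers unfamiliar with the algorithm, the following is a compact textbook-style Aho-Corasick sketch in Python. It is not the benchmark code that was run on these systems, and the function names and sample patterns are chosen for illustration. The algorithm builds a trie of all patterns, adds failure links so the search never rescans input, and then finds every occurrence of every pattern in a single pass over the text.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]     # goto[state][char] -> next state (state 0 is the root)
    fail = [0]      # failure link for each state
    output = [[]]   # patterns recognized on reaching each state

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)

    # Phase 2: breadth-first traversal to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def search(text, automaton):
    """Return (start_index, pattern) for every match, scanning text once."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
```

    The search cost grows with the length of the text plus the number of matches, independent of how many patterns are loaded, which is why the method suits high-rate scanning of large data repositories.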

    Sun SPARC Enterprise T5440

    • A 1.6 GHz Sun SPARC Enterprise T5440 server could search a book as tall as Mt. Everest (29,208 feet, 861 GB book) in 61 seconds, which corresponds to a string search rate of 14.2 GB/s.

    • A 1.6 GHz Sun SPARC Enterprise T5440 server can search at a rate of 14.2 GB/s, which corresponds to searching a book containing one terabyte of data (34,745 feet high) in only 70 seconds.

    • The 4-chip 1.6 GHz Sun SPARC Enterprise T5440 server performed string searching at a rate of 14.2 GB/s, which is 29.9 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

    • The 4-chip 1.6 GHz Sun SPARC Enterprise T5440 server performed string searching 3.7 times as fast as the 4-chip HP DL-580 (2.93 GHz Xeon QC) server that performed string searching at a rate of 3.87 GB/s. The 1.6 GHz Sun SPARC Enterprise T5440 server has a 1.7 times advantage in delivered power-performance over the HP DL-580 (using a power consumption rate of 830 watts for the HP system as measured on other tests).

    • The 1.6 GHz Sun SPARC Enterprise T5440 server demonstrated a 12% improvement over the 1.4 GHz Sun SPARC Enterprise T5440.

    • The 1.6 GHz Sun SPARC Enterprise T5440 server demonstrated a 2x speedup over the 1.6 GHz Sun SPARC Enterprise T5240 server which demonstrated a 2.3x speedup over the 1.4 GHz Sun SPARC Enterprise T5220 server.
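
    The search times quoted in the first two bullets follow from dividing the book size by the search rate. A minimal sketch of that arithmetic; the function name is illustrative, and treating 1 TB as 1000 GB is an assumption:

```python
def search_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Seconds needed to scan size_gb of text at a sustained search rate."""
    return size_gb / rate_gb_per_s


RATE_T5440 = 14.2  # GB/s, the measured rate quoted in the bullets

everest_book = search_seconds(861, RATE_T5440)    # the 861 GB "Mt. Everest" book
terabyte_book = search_seconds(1000, RATE_T5440)  # assumes 1 TB = 1000 GB

print(f"861 GB: {everest_book:.1f} s, 1 TB: {terabyte_book:.1f} s")
```

    Both values round to the 61 and 70 seconds given in the bullets.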

    Sun SPARC Enterprise T5240

    • The 2-chip 1.6 GHz Sun SPARC Enterprise T5240 server performed string searching at a rate of 7.22 GB/s, which is 15.2 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

    • The 2-chip 1.6 GHz Sun SPARC Enterprise T5240 server performed string searching 1.9 times as fast as the 4-chip HP DL-580 (2.93 GHz Xeon QC) server that performed string searching at a rate of 3.87 GB/s. The 1.6 GHz Sun SPARC Enterprise T5240 server has a 2.4 times advantage in delivered power-performance over the HP DL-580 (using a power consumption rate of 830 watts for the HP system as measured on other tests).

    • The 1.6 GHz Sun SPARC Enterprise T5240 server demonstrated a 14% speedup over the 1.4 GHz Sun SPARC Enterprise T5240 server.

    Sun SPARC Enterprise T5220

    • The 1-chip 1.4 GHz Sun SPARC Enterprise T5220 server performed string searching at a rate of 3.16 GB/s which is 6.7 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

    Performance Landscape

    System                                         Throughput (GB/sec)  Chips  Cores
    Sun SPARC Enterprise T5440 (1.6 GHz)           14.2                 4      32
    Sun SPARC Enterprise T5440 (1.4 GHz)           12.7                 4      32
    Sun SPARC Enterprise T5240 (1.6 GHz)           7.2                  2      16
    Sun SPARC Enterprise T5240 (1.4 GHz)           6.4                  2      16
    HP DL-580 (2.9 GHz)                            3.9                  4      16
    Sun SPARC Enterprise T5220 (1.4 GHz)           3.2                  1      8
    IBM Cell Broadband Engine DD3 Blade (3.2 GHz)  0.475                2      16
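
    Because the systems differ in chip and core counts, it can help to normalize the table. The sketch below divides each throughput by its chip and core count; the per-chip and per-core figures are derived here for illustration and are not part of the published results.

```python
# (system, throughput in GB/s, chips, cores), copied from the table above
RESULTS = [
    ("Sun SPARC Enterprise T5440 (1.6 GHz)", 14.2, 4, 32),
    ("Sun SPARC Enterprise T5440 (1.4 GHz)", 12.7, 4, 32),
    ("Sun SPARC Enterprise T5240 (1.6 GHz)", 7.2, 2, 16),
    ("Sun SPARC Enterprise T5240 (1.4 GHz)", 6.4, 2, 16),
    ("HP DL-580 (2.9 GHz)", 3.9, 4, 16),
    ("Sun SPARC Enterprise T5220 (1.4 GHz)", 3.2, 1, 8),
    ("IBM Cell Broadband Engine DD3 Blade (3.2 GHz)", 0.475, 2, 16),
]


def normalize(results):
    """Return (system, GB/s per chip, GB/s per core) for each table row."""
    return [(name, gbps / chips, gbps / cores)
            for name, gbps, chips, cores in results]


for name, per_chip, per_core in normalize(RESULTS):
    print(f"{name:<46} {per_chip:6.3f} GB/s per chip  {per_core:6.3f} GB/s per core")
```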

    Results and Configuration Summary

    Hardware Configuration:
      Sun SPARC Enterprise T5440 (1.6 GHz)
        4 x 1.6 GHz UltraSPARC T2 Plus processors
        256 GB
      Sun SPARC Enterprise T5440 (1.4 GHz)
        4 x 1.4 GHz UltraSPARC T2 Plus processors
        128 GB
      Sun SPARC Enterprise T5240 (1.6 GHz)
        2 x 1.6 GHz UltraSPARC T2 Plus processors
        64 GB
      Sun SPARC Enterprise T5240 (1.4 GHz)
        2 x 1.4 GHz UltraSPARC T2 Plus processors
        64 GB
      Sun SPARC Enterprise T5220 (1.4 GHz)
        1 x 1.4 GHz UltraSPARC T2 processor
        32 GB

    Software Configuration:

      Sun SPARC Enterprise T5440 (1.6 GHz)
        OpenSolaris 2009.06
        Sun Studio 12 (Sun C 5.9 2007.05)
      Sun SPARC Enterprise T5440 (1.4 GHz)
        Solaris 10 2008.07
        Sun Studio 12 (Sun C 5.9 2007.05)
      Sun SPARC Enterprise T5240 (1.6 GHz)
        OpenSolaris 2009.06
        Sun Studio 12 (Sun C 5.9 2007.05)
      Sun SPARC Enterprise T5240 (1.4 GHz)
        Solaris 10 2008.07
        Sun Studio 12 (Sun C 5.9 2007.05)
      Sun SPARC Enterprise T5220 (1.4 GHz)
        Solaris 10 2008.07
        Sun Studio 12 (Sun C 5.9 2007.05)

    Benchmark Description

    One of the core functions needed for text identification algorithms in data repositories is real-time string searching. This string searching benchmark demonstrates the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code creation and speed of code execution. In IEEE Computer, Volume 41, Number 4, pp. 42-50, April 2008, IBM describes a variant of the Aho-Corasick string searching algorithm that uses deterministic finite automata. The algorithm first constructs a graph that represents a dictionary, then walks that graph using successive input characters from a text file. Each "state" in the graph includes a state transition table (STT) that is accessed using the next input character from the text file to determine the address of the next state in the graph. IBM defines an automaton as a two-step loop that: (1) obtains the address of the next state from the STT, and (2) fetches the next state in the graph.
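
    The two-step loop described above can be sketched in a few lines. This is a textbook Aho-Corasick construction written in Python for illustration; it is neither IBM's optimized Cell code nor Sun's ANSI C implementation, and the names build_stt and search are made up for this sketch. Each state gets one row of a state transition table (STT), and the search loop simply looks up the next state for each input character.

```python
from collections import deque


def build_stt(words):
    """Build an Aho-Corasick DFA: one state transition table row per state."""
    goto = [{}]    # trie edges: goto[state][char] -> child state
    out = [set()]  # dictionary words recognized on entering each state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    alphabet = sorted({ch for word in words for ch in word})
    fail = [0] * len(goto)          # failure links (longest proper suffix state)
    stt = [{} for _ in goto]        # stt[state][char] -> next state
    queue = deque()
    for ch in alphabet:             # the root row goes to a child or back to the root
        target = goto[0].get(ch, 0)
        stt[0][ch] = target
        if target:
            queue.append(target)
    while queue:                    # breadth-first, so shallower rows are complete
        state = queue.popleft()
        out[state] |= out[fail[state]]
        for ch in alphabet:
            if ch in goto[state]:
                child = goto[state][ch]
                fail[child] = stt[fail[state]][ch]
                stt[state][ch] = child
                queue.append(child)
            else:
                stt[state][ch] = stt[fail[state]][ch]
    return stt, out


def search(text, stt, out):
    """Walk the automaton over text; return (start offset, word) for every hit."""
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # (1) look up the next state in the STT, (2) move to it.
        # Characters that occur in no dictionary word send the walk to the root.
        state = stt[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


stt, out = build_stt(["he", "she", "his", "hers"])
print(sorted(search("ushers", stt, out)))
```

    The rows here are Python dictionaries for readability; a performance-oriented implementation would more likely use a flat array indexed by character code.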

    IBM reports the performance of its Cell Broadband Engine (CBE) to execute this algorithm to search a 4.4 MB version of the King James Bible using a dictionary of the 20,000 most used words in the English language (average word length of 7.59 characters). Each of the 8 synergistic processing elements (SPEs) of each of the two CBEs executes 16 automata, for a total of 256 automata. All automata and hence all SPEs access a single, shared dictionary.

    IBM describes elaborate optimizations of the Aho-Corasick algorithm, including state shuffling, state replication, alphabet shuffling and state caching. These optimizations were required to: (1) overcome "memory congestion", i.e., contention amongst the SPEs for access to the shared dictionary, and (2) compensate for the limited local storage that is associated with each SPE. These optimizations were necessary to achieve the performance reported for the CBE DD3 Blade.

    IBM does not provide references that indicate where to obtain the dictionary and Bible. IBM reports the algorithmic performance in Gbits/s but does not indicate whether an 8-bit byte is extended to 10 bits as required for network transmission.

    In order to closely approximate the dictionary and Bible that were used by IBM, Sun used a dictionary of 25,143 English words (the OpenSolaris file cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/spell/list), for which the average word length is 7.2 characters, and a 4.6 MB version of the King James Bible (www.patriot.net/users/bmcgin/kjv12.zip). For reporting of results in Gbits/s, the length of a byte is assumed to be 8 bits.
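
    The choice of 8 or 10 bits per byte changes any Gbits/s figure by 25%, which is why the assumption matters. A small sketch of the conversion, with an illustrative function name and the 14.2 GB/s T5440 rate as the example input:

```python
def gbits_per_s(gbytes_per_s: float, bits_per_byte: int = 8) -> float:
    """Convert a GB/s search rate to Gbits/s for a given byte length in bits."""
    return gbytes_per_s * bits_per_byte


rate_8bit = gbits_per_s(14.2)       # 8-bit bytes, the assumption stated above
rate_10bit = gbits_per_s(14.2, 10)  # bytes extended to 10 bits for transmission

print(f"{rate_8bit:.1f} Gbits/s at 8 bits/byte, {rate_10bit:.1f} Gbits/s at 10 bits/byte")
```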

    Key Points and Best Practices

    • Power was measured during execution of the Aho-Corasick algorithm using a WattsUp power meter, and the average rate of power consumption is presented.

    • The Aho-Corasick algorithm as deployed on the IBM Cell Broadband Engine DD3 Blade required substantial optimization and tuning to achieve the reported performance, whereas on the Sun SPARC Enterprise T5220, T5240 or T5440 servers only a basic implementation of the algorithm and a simple compilation were needed.

    • In order to demonstrate the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code generation and speed of code execution, Sun implemented the Aho-Corasick algorithm using ANSI C. No optimizations of the algorithm were required to achieve the performance reported for the T5220, T5240 and T5440. The source code was compiled using the -m32 -xO3 and -xopenmp options. The dictionary is represented using a graph that comprises 82 MB. Each core of the T5220, T5240 or T5440 executes 8 automata using one OpenMP thread per automaton. Thus, the T5220 executes 64 total automata, the T5240 executes 128 total automata and the T5440 executes 256 total automata. All automata and hence all cores access a single, shared dictionary. Access to this dictionary is accelerated by the large, shared L2 caches of the Sun SPARC Enterprise T5220, T5240 and T5440.
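
    The automata counts above are simply cores times 8. The post does not say how the input text was divided among the automata, so the splitting scheme below is an assumption made for illustration: equal-sized slices, each extended by a small overlap so that a word straddling a boundary is still seen by one automaton.

```python
AUTOMATA_PER_CORE = 8  # one OpenMP thread per automaton, 8 per core

CORES = {"T5220": 8, "T5240": 16, "T5440": 32}


def total_automata(cores: int, per_core: int = AUTOMATA_PER_CORE) -> int:
    """Number of automata run by a system with the given core count."""
    return cores * per_core


def split_for_automata(text_len: int, n_automata: int, overlap: int):
    """Hypothetical partitioning: (start, end) slice of the text per automaton.

    overlap would be the longest dictionary word length minus 1. A real
    implementation would also have to avoid counting a match twice when it
    lies entirely inside an overlap region.
    """
    base = -(-text_len // n_automata)  # ceiling division
    spans = []
    for k in range(n_automata):
        start = k * base
        if start >= text_len:
            break
        spans.append((start, min(text_len, start + base + overlap)))
    return spans


for system, cores in CORES.items():
    print(system, total_automata(cores))
```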

    See Also

    Friday Aug 28, 2009

    Sun Fire X4270 Server World Record Two Processor performance result on Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

    • World Record 2-processor performance result on the two-tier SAP ERP 6.0 enhancement pack 4 (Unicode) standard sales and distribution (SD) benchmark on the Sun Fire X4270 server.

    • The Sun Fire X4270 server with two Intel Xeon X5570 processors (8 cores, 16 threads) achieved 3,800 SAP SD Benchmark users running the SAP ERP application release 6.0 enhancement pack 4 benchmark with Unicode software, using the Oracle 10g database and the Solaris 10 operating system.

    • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.

    • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the IBM System 550 server using 4 POWER6 processors, 64 GB memory and the AIX 6.1 operating system.
    • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the HP ProLiant BL460c G6 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Windows Server 2008 operating system.

    • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users than the previous Two-tier SAP ERP 6.0 (non-Unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead of processing the larger character strings that Unicode encoding produces. Refer to SAP Note 1139642 for more details. Note: username and password for SAP Service Marketplace required.

    • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, meaning each character was just 1 byte. The new version of the benchmark requires Unicode characters, and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details. Note: username and password for SAP Service Marketplace required.
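
    The size difference is easy to see by encoding a few strings. A minimal sketch using Python's built-in codecs; the sample strings are arbitrary, and "utf-16-le" is used so that no byte-order mark is counted:

```python
def encoded_bytes(text: str, encoding: str) -> int:
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


# Arbitrary samples: plain ASCII, an accented Latin letter, and U+1D11E
# (a character outside the Basic Multilingual Plane).
SAMPLES = ["SAP", "\u00e9", "\U0001D11E"]

print("ASCII  'SAP':", encoded_bytes("SAP", "ascii"), "bytes")
for sample in SAMPLES:
    print("UTF-16", repr(sample) + ":", encoded_bytes(sample, "utf-16-le"), "bytes")
```

    Every ASCII string doubles in size under UTF-16, which is the source of the extra bandwidth and storage mentioned above.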

    Performance Landscape

    SAP-SD 2-Tier Performance Table (in decreasing performance order).

    SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
    (New version of the benchmark as of January 2009)

    System                         Processors                   Memory  OS                                      Database         Users  SAP ERP/ECC Release     SAPS    SAPS/Proc  Date
    Sun Fire X4270                 2xIntel Xeon X5570 @2.93GHz  48 GB   Solaris 10                              Oracle 10g       3,800  2009 6.0 EP4 (Unicode)  21,000  10,500     21-Aug-09
    IBM System 550                 4xPower6 @5GHz               64 GB   AIX 6.1                                 DB2 9.5          3,752  2009 6.0 EP4 (Unicode)  20,520  5,130      16-Jun-09
    Sun Fire X4270                 2xIntel Xeon X5570 @2.93GHz  48 GB   Solaris 10                              Oracle 10g       3,700  2009 6.0 EP4 (Unicode)  20,300  10,150     30-Mar-09
    HP ProLiant BL460c G6          2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,415  2009 6.0 EP4 (Unicode)  18,670  9,335      04-Aug-09
    Fujitsu PRIMERGY TX/RX 300 S5  2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,328  2009 6.0 EP4 (Unicode)  18,170  9,085      13-May-09
    HP ProLiant BL460c G6          2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,310  2009 6.0 EP4 (Unicode)  18,070  9,035      27-Mar-09
    HP ProLiant DL380 G6           2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,300  2009 6.0 EP4 (Unicode)  18,030  9,015      27-Mar-09
    Fujitsu PRIMERGY BX920 S1      2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,260  2009 6.0 EP4 (Unicode)  17,800  8,900      18-Jun-09
    NEC Express5800                2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,250  2009 6.0 EP4 (Unicode)  17,750  8,875      28-Jul-09
    HP ProLiant DL380 G6           2xIntel Xeon X5570 @2.93GHz  48 GB   SuSE Linux Enterprise Server 10         MaxDB 7.8        3,171  2009 6.0 EP4 (Unicode)  17,380  8,690      17-Apr-09

    Complete benchmark results may be found at the SAP benchmark website: http://www.sap.com/benchmark.
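
    The SAPS/Proc column is the SAPS value divided by the number of processors, which is why the 4-processor IBM System 550 shows roughly half the per-processor figure of the 2-processor systems despite a similar user count. A short sketch using three rows from the table above (the function name is illustrative):

```python
# (system, processors, users, SAPS), taken from the table above
ROWS = [
    ("Sun Fire X4270", 2, 3800, 21000),
    ("IBM System 550", 4, 3752, 20520),
    ("HP ProLiant BL460c G6", 2, 3415, 18670),
]


def saps_per_proc(saps: int, processors: int) -> float:
    """SAPS delivered per processor."""
    return saps / processors


for system, processors, users, saps in ROWS:
    print(f"{system:<24} {users:>5} users  {saps_per_proc(saps, processors):>8.0f} SAPS/Proc")
```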

    Results and Configuration Summary

    Hardware Configuration:

      One, Sun Fire X4270
        2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
        48 GB memory
        Sun Storage 6780 with 48 x 73GB 15KRPM 4Gb FC-AL and 16 x 146GB 15KRPM 4Gb FC-AL Drives

    Software Configuration:

      Solaris 10
      SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
      Oracle 10g

    Certified Results:

              Performance: 3800 benchmark users
              SAP Certification: 2009033

    Benchmark Description

    The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

    SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

    Key Points and Best Practices

    • Set up the storage (LSI-OEM) to deliver the needed raw devices directly out of the storage and do not use any software layer in between.

    See Also

    Benchmark Tags

    World-Record, Performance, SAP-SD, Solaris, Oracle, Intel, X64, x86, HP, IBM, Application, Database

    Disclosure Statement

      Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 08/21/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,700 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009005. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,415 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009031. Fujitsu PRIMERGY TX/RX 300 S5 (2 processors, 8 cores, 16 threads) 3,328 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009014. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,310 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009003. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,300 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009004. Fujitsu PRIMERGY BX920 S1 (2 processors, 8 cores, 16 threads) 3,260 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009024. NEC Express5800 (2 processors, 8 cores, 16 threads) 3,250 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009027. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, MaxDB 7.8, SuSE Linux Enterprise Server 10, Cert# 2009006. 
      IBM System x3650 M2 (2 processors, 8 cores, 16 threads) 5,100 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071.

      SAP, R/3, reg TM of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

    Wednesday Aug 12, 2009

    Significance of Results

    The Sun SPARC Enterprise T5240 server running Sun Java System Messaging Server 6.3 achieved World Record SPECmail2009 results using ZFS.

    • A Sun SPARC Enterprise T5240 server powered by two 1.6 GHz UltraSPARC T2 Plus processors running the Sun Java Communications Suite 5 software along with the Solaris 10 Operating System and using six Sun StorageTek 2540 arrays achieved a new World Record 12000 SPECmail_Ent2009 IMAP4 users at 57,758 Sessions/hour for SPECmail2009.
    • The Sun SPARC Enterprise T5240 server achieved twice the number of users and twice the sessions/hour rate of the Apple Xserv3,1 solution equipped with Intel Nehalem processors.
    • The Sun result was obtained using ~10% fewer disk spindles with the Sun StorageTek 2540 RAID controller direct attach storage solution versus Apple's direct attached storage.
    • This benchmark result demonstrates that the Sun SPARC Enterprise T5240 server together with Sun Java Communication Suite 5 component Sun Java System Messaging Server 6.3, Solaris 10 and ZFS on Sun StorageTek 2540 arrays supports a large, enterprise level IMAP mail server environment. This solution is reliable, low cost, and low power, delivering the best performance and maximizing the data integrity with Sun's ZFS file systems.

    Performance Landscape

    SPECmail2009 (ordered by performance)

    System                      Processor Type      GHz   Ch, Co, Th  SPECmail_Ent2009 Users  SPECmail2009 Sessions/hour
    Sun SPARC Enterprise T5240  UltraSPARC T2 Plus  1.6   2, 16, 128  12,000                  57,758
    Sun Fire X4275              Xeon X5570          2.93  2, 8, 16    8,000                   38,348
    Apple Xserv3,1              Xeon X5570          2.93  2, 8, 16    6,000                   28,887
    Sun SPARC Enterprise T5220  UltraSPARC T2       1.4   1, 8, 64    3,600                   17,316

    Notes:

      Number of SPECmail_Ent2009 users (bigger is better)
      SPECmail2009 Sessions/hour (bigger is better)
      Ch, Co, Th: Chips, Cores, Threads

    Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org

    Results and Configuration Summary

    Hardware Configuration:

      Sun SPARC Enterprise T5240

        2 x 1.6 GHz UltraSPARC T2 Plus processors
        128 GB
        8 x 146GB, 10K RPM SAS disks

      6 x Sun StorageTek 2540 Arrays,

        4 arrays with 12 x 146GB 15K RPM SAS disks
        2 arrays with 12 x 73GB 15K RPM SAS disks

      2 x Sun Fire X4600 benchmark manager, load generator and mail sink

        8 x AMD Opteron 8356 2.7 GHz QC processors
        64 GB
        2 x 73GB 10K RPM SAS disks

      Sun Fire X4240 load generator

        2 x AMD Opteron 2384 2.7 GHz DC processors
        16 GB
        2 x 73GB 10K RPM SAS disks

    Software Configuration:

      Solaris 10
      ZFS
      Sun Java Communication Suite 5
      Sun Java System Messaging Server 6.3

    Benchmark Description

    The SPECmail2009 benchmark measures the ability of corporate e-mail systems to meet today's demanding e-mail users over fast corporate local area networks (LAN). The SPECmail2009 benchmark simulates corporate mail server workloads that range from 250 to 10,000 or more users, using industry standard SMTP and IMAP4 protocols. This e-mail server benchmark creates client workloads based on a 40,000 user corporation, and uses folder and message MIME structures that include both traditional office documents and a variety of rich media content. The benchmark also adds support for encrypted network connections using industry standard SSL v3.0 and TLS 1.0 technology. SPECmail2009 replaces all versions of SPECmail2008, first released in August 2008. The results from the two benchmarks are not comparable.

    Software on one or more client machines generates a benchmark load for a System Under Test (SUT) and measures the SUT response times. A SUT can be a mail server running on a single system or a cluster of systems.

    A SPECmail2009 'run' simulates a 100% load level associated with the specific number of users, as defined in the configuration file. The mail server must maintain a specific Quality of Service (QoS) at the 100% load level to produce a valid benchmark result. If the mail server does maintain the specified QoS at the 100% load level, the performance of the mail server is reported as SPECmail_Ent2009 SMTP and IMAP Users at SPECmail2009 Sessions per hour. The SPECmail_Ent2009 users at SPECmail2009 Sessions per Hour metric reflects the unique workload combination for a SPEC IMAP4 user.

    Key Points and Best Practices

    • Each Sun StorageTek 2540 array was configured with 6 hardware RAID1 volumes. A total of 36 RAID1 volumes were configured with 24 of size 146GB and 12 of size 73GB. Four ZPOOLs of (6x146GB RAID1 volumes) were mounted as the four primary message stores and ZFS file systems. Four ZPOOLs of (8x73GB RAID1 volumes) were mounted as the four primary message indexes. The hardware RAID1 volumes were created with 64K stripe size without read ahead turned on. The 7x146GB internal drives were used to create four ZPOOLs and ZFS file systems for the LDAP, store metadata, queue and the mailserver log.

    • The clients used these Java options: java -d64 -Xms4096m -Xmx4096m -XX:+AggressiveHeap

    • See the SPEC Report for all OS, network and messaging server tunings.

    See Also

    Disclosure Statement

    SPEC, SPECmail reg tm of Standard Performance Evaluation Corporation. Results as of 08/07/2009 on www.spec.org. SPECmail2009: Sun SPARC Enterprise T5240 (16 cores, 2 chips) SPECmail_Ent2009 12000 users at 57,758 SPECmail2009 Sessions/hour. Apple Xserv3,1 (8 cores, 2 chips) SPECmail_Ent2009 6000 users at 28,887 SPECmail2009 Sessions/hour.

    Wednesday Jul 22, 2009

    Sun has upgraded the UltraSPARC T2 and UltraSPARC T2 Plus processors to 1.6 GHz. As described in some detail in yesterday's post, new results show SPEC CPU2006 performance improvements vs. previous systems that often exceed the clock speed improvement.  The scaling can be attributed to both memory system improvements and software improvements, such as the Sun Studio 12 Update 1 compiler.

    A MHz improvement within a product line is often useful.  If yesterday's chip runs at speed n and today's at n*1.12 then, intuitively, sure, I'll take today's.

    Comparing MHz across product lines is often counter-intuitive.  Consider that Sun's new systems provide:

    • up to 68% more throughput than the 4.7 GHz POWER6+ [1], and
    • up to 3x the throughput of the Itanium 9150N [2].

    The comparisons are particularly striking when one takes into account the cache size advantage for both the POWER6+ and the Itanium 9150N, and the MHz advantage for the POWER6+:

    Processor                           GHz  Hw cache levels  Last cache size (per chip)  SPECint_rate_base2006
    UltraSPARC T2 / UltraSPARC T2 Plus  1.6  2                4 MB                        1 chip: 89; 2 chips: 171; 4 chips: 338
    POWER6+                             4.7  3                32 MB                       Best 2 chip result: 102. UltraSPARC T2 Plus delivers 68% more integer throughput [1]
    Itanium 9150N                       1.6  3                24 MB                       Best 4 chip result: 114. UltraSPARC T2 Plus delivers 3x the integer throughput. [2]

    These are per-chip results, not per-core or per-thread. Sun's CMT processors are designed for overall system throughput: how much work can the overall system get done.  
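
    The "68% more" and "3x" figures come straight from the SPECint_rate_base2006 numbers in the table, comparing results with equal chip counts. A quick check of the arithmetic (function names are illustrative):

```python
def percent_more(ours: float, theirs: float) -> float:
    """How much larger ours is than theirs, in percent."""
    return (ours / theirs - 1.0) * 100.0


def times_as_much(ours: float, theirs: float) -> float:
    """Ratio of ours to theirs."""
    return ours / theirs


# 2-chip UltraSPARC T2 Plus (171) vs best 2-chip POWER6+ (102)
print(f"{percent_more(171, 102):.1f}% more than POWER6+")
# 4-chip UltraSPARC T2 Plus (338) vs best 4-chip Itanium 9150N (114)
print(f"{times_as_much(338, 114):.2f}x the Itanium 9150N")
```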

    A mystery: With comparatively smaller caches and modest clock rates, why do the Sun CMT processors win?

    The performance hole: Memory latency. From the point of view of a CPU chip, the big performance problem is that memory latency is inordinately long compared to chip cycle times.

    A hardware designer can attempt to cover up that latency with very large caches, as in the POWER6+ and Itanium, and this works well when running a small number of modest-sized applications. Large caches become less helpful, though, as workloads become more complex.

    MHz isn't everything. In fact, MHz hardly counts at all when the problem is memory latency. Suppose the hot part of an application looks like this:

      loop:
           computational instruction
           computational instruction
           computational instruction
           memory access instruction
           branch to loop
    

    For an application that looks like this, the computational instructions may complete in only a few cycles, while the memory access instruction may easily require on the order of 100ns - which, for a 1 GHz chip, is on the order of 100 cycles. If the processor speed is increased by a factor of 4, but memory speed is not, then memory is still 100ns away, and when measured in cycles, it is now 400 cycles distant. The overall loop hardly speeds up at all.
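
    To put rough numbers on that, here is a toy model of one loop iteration. It assumes the three computational instructions and the branch take one cycle each (4 cycles total) and that every memory access misses all caches and costs 100ns; both are simplifications chosen for illustration, not measurements.

```python
def memory_latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Memory latency expressed in processor cycles."""
    return latency_ns * clock_ghz


def loop_time_ns(clock_ghz: float, compute_cycles: int = 4,
                 memory_ns: float = 100.0) -> float:
    """Time for one loop iteration: on-chip cycles plus one memory access."""
    return compute_cycles / clock_ghz + memory_ns


slow = loop_time_ns(1.0)   # 1 GHz chip
fast = loop_time_ns(4.0)   # 4 GHz chip, same memory
speedup = slow / fast

print(f"1 GHz: {slow:.0f} ns per iteration, memory is {memory_latency_cycles(100, 1.0):.0f} cycles away")
print(f"4 GHz: {fast:.0f} ns per iteration, memory is {memory_latency_cycles(100, 4.0):.0f} cycles away")
print(f"speedup from 4x the clock: {speedup:.2f}x")
```

    Under these assumptions, quadrupling the clock buys about a 3% improvement.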

    Lest the reader think I am making this up - consider page 8 of this IBM talk from April, 2008 regarding the POWER6:

    [Figure: "latencies" slide, page 8 of the IBM talk]

    The IBM POWER systems have some impressive performance characteristics - if your application is tiny enough to fit in its first or second level cache. But memory latency is not impressive. If your workload requires multiple concurrent threads accessing a large memory space, Sun's CMT approach just might be a better fit.

    Operating System Overhead: A context switch from one process to another is mediated by operating system services. The OS parks context from the process that is currently running - typically saving dozens of program registers and other context (such as virtual address space information); decides which process to run next (which may require access to several OS data structures); and loads the context for the new process (registers, virtual address context, etc.). If the system is running many processes, then caches are unlikely to be helpful during this context switch, and thousands of cycles may be spent on main memory accesses.

    Design for throughput: Sun's CMT approach handles the complexity of real-world applications by allowing up to 64 processes to be simultaneously on-chip. When a long-latency stall occurs, such as an access to main memory, the chip switches to executing instructions on behalf of other, non-stalled threads, thus improving overall system throughput. No operating system intervention is required as resources are shared among the processes on the chip.
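
    A back-of-the-envelope model shows why switching among threads helps. Suppose each hardware thread alternates between a short burst of computation and a long memory stall, and the core can always run some other ready thread at no cost. The 4ns and 100ns figures below are assumptions for illustration only, and the model ignores contention for caches and memory bandwidth.

```python
def core_utilization(threads: int, compute_ns: float, stall_ns: float) -> float:
    """Idealized fraction of time a core is busy with `threads` hardware threads.

    Each thread is busy for compute_ns out of every compute_ns + stall_ns;
    the core stays busy as long as at least one thread is ready to run.
    """
    busy_fraction_per_thread = compute_ns / (compute_ns + stall_ns)
    return min(1.0, threads * busy_fraction_per_thread)


for n in (1, 2, 4, 8):
    print(f"{n} thread(s): core busy {core_utilization(n, 4.0, 100.0):.1%} of the time")
```

    In this model the useful work done by a core grows almost linearly with the number of threads until the core saturates, which is the effect the CMT design is aiming for.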

    [1] http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090427-07263.html
    [2] http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090522-07485.html

    Competitive results retrieved from www.spec.org   20 July 2009.  Sun's CMT results have been submitted to SPEC.  SPEC, SPECfp, SPECint are registered trademarks of the Standard Performance Evaluation Corporation.

    This blog copyright 2009 by John Henning