Tuesday Nov 24, 2009

The Sun SPARC Enterprise M9000 server (64 processors, 256 cores, 512 threads) set a World Record on the SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.
  • The Sun SPARC Enterprise M9000 server with 2.88 GHz SPARC64 VII processors achieved 32,000 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.

  • The Sun SPARC Enterprise M9000 server result is 8.6x faster than the only IBM 5GHz POWER6 unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • IBM has not submitted any IBM p595 results on the current SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark. This benchmark has been current for almost a year. IBM p595 systems have only 8x more cores than the IBM System 550.

  • HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • This new result is 1.84x greater than the previous record result delivered on the Sun SPARC Enterprise M9000 server, which used 32 processors.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note 1139642 for more details.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.
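
The storage cost described above can be checked with a short Python sketch. It is an illustration only, not part of the benchmark kit: the sample strings and the helper name `utf16_bytes` are hypothetical, and the little-endian UTF-16 variant is chosen simply to avoid counting a byte-order mark.

```python
def utf16_bytes(text: str) -> int:
    """Size of text in bytes when encoded as UTF-16 (little-endian, no BOM)."""
    return len(text.encode("utf-16-le"))


# Hypothetical sample strings, chosen only to show the three size cases.
samples = {
    "ASCII only": "SALES ORDER 4711",
    "Latin with accents": "Bestellung für Müller",
    "outside the Basic Multilingual Plane": "\U00010348",
}

for label, text in samples.items():
    print(f"{label}: {len(text)} characters, {utf16_bytes(text)} bytes in UTF-16")
```

An ASCII-only string doubles in size (16 characters become 32 bytes), while a character outside the Basic Multilingual Plane needs a 4-byte surrogate pair.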

Performance Landscape

SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Results (in decreasing performance order)

(ERP 6.0 EP4 is the current version of the benchmark as of January 2009)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | Date
Sun SPARC Enterprise M9000, 64xSPARC64 VII @2.88GHz, 1152 GB | Solaris 10, Oracle10g | 32,000 | 2009 6.0 EP4 (Unicode) | 175,600 | 18-Nov-09
Sun SPARC Enterprise M9000, 32xSPARC64 VII @2.88GHz, 1024 GB | Solaris 10, Oracle10g | 17,430 | 2009 6.0 EP4 (Unicode) | 95,480 | 12-Oct-09
IBM System 550, 4xPower6 @5GHz, 64 GB | AIX 6.1, DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 16-Jun-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Results and Configuration Summary

Certified Result:

    Number of SAP SD benchmark users:
    32,000
    Average dialog response time:
    0.93 seconds
    Throughput:

    Fully processed order line items/hour:
    3,512,000

    Dialog steps/hour:
    10,536,000

    SAPS:
    175,600
    SAP Certification:
    2009046
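
The throughput figures and the SAPS value above are tied together by a fixed ratio. The sketch below assumes SAP's published definition of the unit (100 SAPS = 2,000 fully processed order line items per hour = 6,000 dialog steps per hour); that definition is not stated in this post, and the function names are hypothetical.

```python
# Assumed definition (from SAP's benchmark documentation, not from this post):
# 100 SAPS = 2,000 fully processed order line items/hour = 6,000 dialog steps/hour.
LINE_ITEMS_PER_100_SAPS = 2_000
DIALOG_STEPS_PER_100_SAPS = 6_000


def saps_from_line_items(line_items_per_hour: float) -> float:
    """Convert fully processed order line items per hour to SAPS."""
    return line_items_per_hour / LINE_ITEMS_PER_100_SAPS * 100


def saps_from_dialog_steps(dialog_steps_per_hour: float) -> float:
    """Convert dialog steps per hour to SAPS."""
    return dialog_steps_per_hour / DIALOG_STEPS_PER_100_SAPS * 100


print(saps_from_line_items(3_512_000))     # certified line items/hour
print(saps_from_dialog_steps(10_536_000))  # certified dialog steps/hour
```

Both conversions give 175,600, which matches the certified SAPS value.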

Hardware Configuration:

    Sun SPARC Enterprise M9000
      64 x 2.88GHz SPARC64 VII, 1152 GB memory

Software Configuration:

    Solaris 10
    SAP enhancement package 4 for SAP ERP 6.0 (unicode)
    Oracle10g

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmarks as of 11/18/09: Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 32,000 SAP SD Users, 64 x 2.88 GHz SPARC VII, 1152 GB memory, Oracle10g, Solaris10, Cert# 2009046. Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) 17,430 SAP SD Users, 32 x 2.88 GHz SPARC VII, 1024 GB memory, Oracle10g, Solaris10, Cert# 2009038. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert#2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Friday Nov 20, 2009

Significance of Results

A Sun Blade 6048 chassis with 48 Sun Blade X6275 server modules ran benchmarks using the NAMD molecular dynamics applications software. NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD is driven by major trends in computing and structural biology and received a 2002 Gordon Bell Award.

  • The cluster of 32 Sun Blade X6275 server modules was 9.2x faster than the 512 processor configuration of the IBM BlueGene/L.

  • The cluster of 48 Sun Blade X6275 server modules exhibited excellent scalability for NAMD molecular dynamics simulation, up to 37.8x speedup for 48 blades relative to 1 blade.

  • For the largest molecule considered, the cluster of 48 Sun Blade X6275 server modules achieved a throughput of 0.028 seconds per simulation step.

Molecular dynamics simulation is important to biological and materials science research. Molecular dynamics is used to determine the low energy conformations or shapes of a molecule. These conformations are presumed to be the biologically active conformations.

Performance Landscape

The NAMD Performance Benchmarks web page plots the performance of NAMD when the ApoA1 benchmark is executed on a variety of clusters. The performance is expressed in terms of the time in seconds required to execute one step of the molecular dynamics simulation, multiplied by the number of "processors" on which NAMD executes in parallel. The following table compares the performance of the Sun Blade X6275 cluster to several of the clusters for which performance is reported on the web page. In this table, the performance is expressed in terms of the time in seconds required to execute one step of the molecular dynamics simulation. A smaller number implies better performance.

Cluster Name and Interconnect | Throughput for 128 Cores (seconds per step) | Throughput for 256 Cores (seconds per step) | Throughput for 512 Cores (seconds per step)
Sun Blade X6275 InfiniBand | 0.014 | 0.0073 | 0.0048
Cambridge Xeon/3.0 InfiniPath | 0.016 | 0.0088 | 0.0056
NCSA Xeon/2.33 InfiniBand | 0.019 | 0.010 | 0.008
AMD Opteron/2.2 InfiniPath | 0.025 | 0.015 | 0.008
IBM HPCx PWR4/1.7 Federation | 0.039 | 0.021 | 0.013
SDSC IBM BlueGene/L MPI | 0.108 | 0.061 | 0.044

The following tables report results for NAMD molecular dynamics using a cluster of Sun Blade X6275 server modules. The performance of the cluster is expressed in terms of the time in seconds that is required to execute one step of the molecular dynamics simulation. A smaller number implies better performance.

Blades | Cores | STMV molecule (1) Thruput (secs/step) | STMV spdup | STMV effi'cy | f1 ATPase molecule (2) Thruput (secs/step) | f1 ATPase spdup | f1 ATPase effi'cy | ApoA1 molecule (3) Thruput (secs/step) | ApoA1 spdup | ApoA1 effi'cy
48 | 768 | 0.0277 | 37.8 | 79% | 0.0075 | 35.2 | 73% | 0.0039 | 22.2 | 46%
36 | 576 | 0.0324 | 32.3 | 90% | 0.0096 | 27.4 | 76% | 0.0045 | 19.3 | 54%
32 | 512 | 0.0368 | 28.4 | 89% | 0.0104 | 25.3 | 79% | 0.0048 | 18.1 | 57%
24 | 384 | 0.0481 | 21.8 | 91% | 0.0136 | 19.3 | 80% | 0.0066 | 13.2 | 55%
16 | 256 | 0.0715 | 14.6 | 91% | 0.0204 | 12.9 | 81% | 0.0073 | 11.9 | 74%
12 | 192 | 0.0875 | 12.0 | 100% | 0.0271 | 9.7 | 81% | 0.0096 | 9.1 | 76%
8 | 128 | 0.1292 | 8.1 | 101% | 0.0337 | 7.8 | 98% | 0.0139 | 6.3 | 79%
4 | 64 | 0.2726 | 3.8 | 95% | 0.0666 | 4.0 | 100% | 0.0224 | 3.9 | 98%
1 | 16 | 1.0466 | 1.0 | 100% | 0.2631 | 1.0 | 100% | 0.0872 | 1.0 | 100%

spdup - speedup versus 1 blade result
effi'cy - speedup efficiency versus 1 blade result

(1) Satellite Tobacco Mosaic Virus (STMV) molecule, 1,066,628 atoms, 12 Angstrom cutoff, Langevin dynamics, 500 time steps
(2) f1 ATPase molecule, 327,506 atoms, 11 Angstrom cutoff, particle mesh Ewald dynamics, 500 time steps
(3) ApoA1 molecule, 92,224 atoms, 12 Angstrom cutoff, particle mesh Ewald dynamics, 500 time steps
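
The speedup and efficiency columns can be recomputed from the seconds-per-step values. The sketch below is an assumption about how those columns were derived (speedup as the 1-blade time divided by the N-blade time, efficiency as speedup divided by the blade count); the function names are hypothetical. Because the published times are rounded, recomputed values can differ from the table in the last digit for some rows.

```python
def speedup(one_blade_secs: float, n_blade_secs: float) -> float:
    """Speedup of an N-blade run relative to the 1-blade run."""
    return one_blade_secs / n_blade_secs


def efficiency(one_blade_secs: float, n_blade_secs: float, blades: int) -> float:
    """Fraction of ideal linear scaling achieved on the given number of blades."""
    return speedup(one_blade_secs, n_blade_secs) / blades


# STMV molecule, 48 blades versus 1 blade (values taken from the table).
stmv_1_blade = 1.0466
stmv_48_blades = 0.0277

print(f"speedup:    {speedup(stmv_1_blade, stmv_48_blades):.1f}")
print(f"efficiency: {efficiency(stmv_1_blade, stmv_48_blades, 48):.0%}")
```

For the STMV molecule on 48 blades this reproduces the 37.8x speedup and 79% efficiency reported in the table.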

Results and Configuration Summary

Hardware Configuration

    48 x Sun Blade X6275, each with
      2 x (2 x 2.93 GHz Intel QC Xeon X5570 (Nehalem) processors)
      2 x (24 GB memory)
      Hyper-Threading (HT) off, Turbo Mode on

Software Configuration

    SUSE Linux Enterprise Server 10 SP2 kernel version 2.6.16.60-0.31_lustre.1.8.0.1-smp
    OpenMPI 1.3.2
    gcc 4.1.2 (1/15/2007), gfortran 4.1.2 (1/15/2007)

Benchmark Description

Molecular dynamics simulation is widely used in biological and materials science research. NAMD is a public-domain molecular dynamics software application for which a variety of molecular input directories are available. Three of these directories define:
  • the Satellite Tobacco Mosaic Virus (STMV) that comprises 1,066,628 atoms
  • the f1 ATPase enzyme that comprises 327,506 atoms
  • the ApoA1 molecule that comprises 92,224 atoms

Each input directory also specifies the type of molecular dynamics simulation to be performed, for example, Langevin dynamics with a 12 Angstrom cutoff for 500 time steps, or particle mesh Ewald dynamics with an 11 Angstrom cutoff for 500 time steps.

Key Points and Best Practices

Models with large numbers of atoms scale better than models with small numbers of atoms.

The Intel QC X5570 processors include a turbo boost feature coupled with a speed-step option in the CPU section of the Advanced BIOS settings. Under specific circumstances, this can provide CPU overclocking, which increases the processor frequency from 2.93GHz to 3.33GHz. This feature was enabled when generating the results reported here.

Disclosure Statement

NAMD, see http://www.ks.uiuc.edu/Research/namd/performance.html for more information, results as of 11/17/2009.

Tuesday Oct 13, 2009

Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark Sun SPARC Enterprise M9000/32 SPARC64 VII

World Record for 32-processor systems on the SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • The Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) set a World Record for 32-processor systems on the SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark, as of Oct. 12, 2009.

  • The 32-way Sun SPARC Enterprise M9000 with 2.88 GHz SPARC64 VII processors achieved 17,430 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.

  • The Sun SPARC Enterprise M9000 result is 4.6x faster than the only IBM 5GHz Power6 unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • IBM has not submitted any p595 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note for more details.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | Date
Sun SPARC Enterprise M9000, 32xSPARC64 VII @2.88GHz, 1024 GB | Solaris 10, Oracle10g | 17,430 | 2009 6.0 EP4 (Unicode) | 95,480 | 12-Oct-09
IBM System 550, 4xPower6 @5GHz, 64 GB | AIX 6.1, DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 16-Jun-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Certified Result:

    Number of SAP SD benchmark users:
    17,430
    Average dialog response time:
    0.95 seconds
    Throughput:

    Fully processed order line items/hour:
    1,909,670

    Dialog steps/hour:
    5,729,000

    SAPS:
    95,480
    SAP Certification:
    2009038

Hardware Configuration:

    Sun SPARC Enterprise M9000
      32 x 2.88GHz SPARC64 VII, 1024 GB memory
      6 x 6140 storage arrays

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle10g

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 10/12/09: Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) 17,430 SAP SD Users, 32 x 2.88 GHz SPARC VII, 1024 GB memory, Oracle10g, Solaris10, Cert# 2009038. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert#2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Monday Oct 12, 2009

Significance of Results

Results on the Sun Storage 6180 Array with 8Gb connectivity are presented for the SPC-2 benchmark using RAID 5 and RAID 6.
  • The Sun Storage 6180 Array outperforms the IBM DS5020 by 77% in price performance for SPC-2 benchmark using RAID 5 data protection.

  • The Sun Storage 6180 Array outperforms the IBM DS5020 by 91% in price performance for SPC-2 benchmark using RAID 6 data protection.

  • The Sun Storage 6180 Array is 50% faster than the previous-generation Sun Storage 6140 Array and the IBM DS4700 on the SPC-2 benchmark using RAID 5 data protection.

Performance Landscape

SPC-2 Performance Chart (in decreasing performance order)

Sponsor | System | SPC-2 MBPS | $/SPC-2 MBPS | ASU Capacity (GB) | TSC Price | Data Protection Level | Date | Results Identifier
Sun | SS6180 | 1,286.74 | $45.47 | 3,504.693 | $58,512 | RAID 6 | 10/08/09 | B00044
IBM | DS5020 | 1,286.74 | $87.04 | 3,504.693 | $112,002 | RAID 6 | 10/08/09 | B00042
Sun | SS6180 | 1,244.89 | $42.53 | 3,504.693 | $52,951 | RAID 5 | 10/08/09 | B00043
IBM | DS5020 | 1,244.89 | $75.30 | 3,504.693 | $93,742 | RAID 5 | 10/08/09 | B00041
Sun | J4400 | 887.44 | $25.63 | 23,965.918 | $22,742 | Unprotected | 08/15/08 | B00034
IBM | DS4700 | 823.62 | $106.73 | 1,748.874 | $87,903 | RAID 5 | 04/01/08 | B00028
Sun | ST6140 | 790.67 | $67.82 | 1,675.037 | $53,622 | RAID 5 | 02/13/07 | B00017
Sun | ST2540 | 735.62 | $37.32 | 2,177.548 | $27,451 | RAID 5 | 04/10/07 | B00021
IBM | DS3400 | 731.25 | $34.36 | 1,165.933 | $25,123 | RAID 5 | 02/27/08 | B00027
Sun | ST2530 | 672.05 | $26.15 | 1,451.699 | $17,572 | RAID 5 | 08/16/07 | B00026
Sun | J4200 | 548.80 | $22.92 | 11,995.295 | $12,580 | Unprotected | 07/10/08 | B00033

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result Metric
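
The price-performance metric and the percentage advantages quoted in this post follow from the TSC price and SPC-2 MBPS columns. The sketch below is an assumed reading of the arithmetic (price divided by throughput, then the ratio of the two price-performance values); the function names are hypothetical.

```python
def price_performance(tsc_price: float, spc2_mbps: float) -> float:
    """Dollars per SPC-2 MBPS; lower is better."""
    return tsc_price / spc2_mbps


def advantage_pct(ours: float, theirs: float) -> float:
    """How much higher (in percent) the competing $/MBPS is than ours."""
    return (theirs / ours - 1) * 100


# RAID 6 results from the table above.
sun_raid6 = price_performance(58_512, 1_286.74)
ibm_raid6 = price_performance(112_002, 1_286.74)

# RAID 5 results from the table above.
sun_raid5 = price_performance(52_951, 1_244.89)
ibm_raid5 = price_performance(93_742, 1_244.89)

print(f"RAID 6: ${sun_raid6:.2f} vs ${ibm_raid6:.2f}, advantage {advantage_pct(sun_raid6, ibm_raid6):.0f}%")
print(f"RAID 5: ${sun_raid5:.2f} vs ${ibm_raid5:.2f}, advantage {advantage_pct(sun_raid5, ibm_raid5):.0f}%")
```

Under this reading the RAID 6 and RAID 5 advantages come out at 91% and 77%, matching the figures in the Significance of Results list.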

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.

Results and Configuration Summary

Storage Configuration:

    30 x 146.8GB 15K RPM drives (for RAID 5)
    36 x 146.8GB 15K RPM drives (for RAID 6)
    4 Qlogic HBA

Server Configuration:

    IBM system x3850 M2

Software Configuration:

    MS Win 2003 Server SP2
    SPC-2 benchmark kit

Benchmark Description

The SPC Benchmark-2™ (SPC-2) is a series of related benchmark performance tests that simulate the sequential component of demands placed upon on-line, non-volatile storage in server class computer systems. SPC-2 provides measurements in support of real world environments characterized by:
  • Large numbers of concurrent sequential transfers.
  • Demanding data rate requirements, including requirements for real time processing.
  • Diverse application techniques for sequential processing.
  • Substantial storage capacity requirements.
  • Data persistence requirements to ensure preservation of data without corruption or loss.

Key Points and Best Practices

  • This benchmark was performed using RAID 5 and RAID 6 protection.
  • The controller stripe size was set to 512k.
  • No volume manager was used.

Disclosure Statement

SPC-2, SPC-2 MBPS, $/SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). More info www.storageperformance.org. Sun Storage 6180 Array 1,286.74 SPC-2 MBPS, $/SPC-2 MBPS $45.47, ASU Capacity 3,504.693 GB, Protect RAID 6, Cost $58,512.00, Ident. B00044. Sun Storage 6180 Array 1,244.89 SPC-2 MBPS, $/SPC-2 MBPS $42.53, ASU Capacity 3,504.693 GB, Protect RAID 5, Cost $52,951.00, Ident. B00043.

Monday Oct 12, 2009

Significance of Results

Results on the Sun Storage 6180 Array with 8Gb connectivity are presented for the SPC-1 benchmark.
  • The Sun Storage 6180 Array outperforms the IBM DS5020 by 72% in price performance on the SPC-1 benchmark.

  • The Sun Storage 6180 Array is 50% faster than the previous-generation Sun Storage 6140 Array and the IBM DS4700 on the SPC-1 benchmark.

  • The Sun Storage 6180 Array betters the HDS 2100 by 27% in price performance on the SPC-1 benchmark.

  • The Sun Storage 6180 Array has 16% better IOPS/Drive performance than the HDS 2100 on the SPC-1 benchmark.

Performance Landscape

SPC-1 Performance Chart (in decreasing performance order)

Sponsor | System | SPC-1 IOPS | $/SPC-1 IOPS | ASU Capacity (GB) | TSC Price | Data Protection Level | Date | Results Identifier
HDS | AMS 2300 | 42,502.61 | $6.96 | 7,955.000 | $295,740 | Mirroring | 3/24/09 | A00077
HDS | AMS 2100 | 31,498.58 | $5.85 | 3,967.500 | $187,321 | Mirroring | 3/24/09 | A00076
Sun | SS6180 (8Gb) | 26,090.03 | $4.70 | 5,145.060 | $122,623 | Mirroring | 10/09/09 | A00084
IBM | DS5020 (8Gb) | 26,090.03 | $8.08 | 5,145.060 | $210,782 | Mirroring | 8/25/09 | A00081
Fujitsu | DX80 | 19,492.86 | $3.45 | 5,355.400 | $67,296 | Mirroring | 9/14/09 | A00082
Sun | STK6140 (4Gb) | 17,395.53 | $4.93 | 1,963.269 | $85,823 | Mirroring | 10/16/06 | A00048
IBM | DS4700 (4Gb) | 17,195.84 | $11.67 | 1,963.270 | $200,666 | Mirroring | 8/21/06 | A00046

SPC-1 IOPS = the Performance Metric
$/SPC-1 IOPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result Metric

Complete SPC-1 benchmark results may be found at http://www.storageperformance.org.

Results and Configuration Summary

Storage Configuration:

    80 x 146.8GB 15K RPM drives
    8 Qlogic HBA

Server Configuration:

    IBM system x3850 M2

Software Configuration:

    MS Windows 2003 Server SP2
    SPC-1 benchmark kit

Benchmark Description

SPC Benchmark-1 (SPC-1) is the first industry standard storage benchmark and is the most comprehensive performance analysis environment ever constructed for storage subsystems. The I/O workload in SPC-1 is characterized by predominantly random I/O operations as typified by multi-user OLTP, database, and email server environments. SPC-1 uses a highly efficient multi-threaded workload generator to thoroughly analyze direct attach or network storage subsystems. The SPC-1 benchmark enables companies to rapidly produce valid performance and price/performance results using a variety of host platforms and storage network topologies.

SPC-1 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

Disclosure Statement

SPC-1, SPC-1 IOPS, $/SPC-1 IOPS reg tm of Storage Performance Council (SPC). More info www.storageperformance.org. Sun Storage 6180 Array 26,090.03 SPC-1 IOPS, ASU Capacity 5,145.060GB, $/SPC-1 IOPS $4.70, Data Protection Mirroring, Cost $122,623, Ident. A00084.


Tuesday Sep 22, 2009

Two-Processor Performance using 8 Virtual CPU Solaris 10 Container Configuration:
  • Sun achieved 36% better performance using Solaris 10 Containers than a similar configuration on SUSE Linux using VMware ESX Server 4.0 on the same benchmark, both using 8 virtual CPUs.
  • Solaris Containers are the best virtualization technology for SAP projects and have been supported for more than 4 years. Other virtualization technologies suffer various overheads that decrease performance.
  • The Sun Fire X4270 server with 48 GB memory and a Solaris 10 container configured with 8 virtual CPUs achieved 2,800 SAP SD Benchmark users and beat the Fujitsu PRIMERGY RX300 S5 server with 96 GB memory and SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0 by 36%. Both results used the same CPUs and were running the SAP ERP application release 6.0 enhancement pack 4 (Unicode) standard sales and distribution (SD) benchmark.
  • The Sun and Fujitsu results were run at 50% and 48% utilization, respectively. With these servers being half utilized, there is headroom for additional performance.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note for more details. Note: username and password for SAP Service Marketplace required.
  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details. Note: username and password for SAP Service Marketplace required.

SAP-SD 2-Tier Performance Landscape (in decreasing performance order).


SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results (New version of the benchmark as of January 2009)

System | OS / Database | Virtualized? | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date
Sun Fire X4270, 2xIntel Xeon X5570 @2.93GHz, 48 GB | Solaris 10, Oracle 10g | no | 3,800 | 2009 6.0 EP4 (Unicode) | 21,000 | 10,500 | 21-Aug-09
IBM System 550, 4xPower6 @5GHz, 64 GB | AIX 6.1, DB2 9.5 | no | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 5,130 | 16-Jun-09
HP ProLiant DL380 G6, 2xIntel Xeon X5570 @2.93GHz, 48 GB | SUSE Linux Ent Svr 10, MaxDB 7.8 | no | 3,171 | 2009 6.0 EP4 (Unicode) | 17,380 | 8,690 | 17-Apr-09
Sun Fire X4270, 2xIntel Xeon X5570 @2.93GHz, 48 GB | Solaris 10 container (8 virtual CPUs), Oracle 10g | YES, 50% util | 2,800 | 2009 6.0 EP4 (Unicode) | 15,320 | 7,660 | 10-Sep-09
Fujitsu PRIMERGY RX300 S5, 2xIntel Xeon X5570 @2.93GHz, 96 GB | SUSE Linux Ent Svr 10 on VMware ESX Server 4.0, MaxDB 7.8 | YES, 48% util | 2,056 | 2009 6.0 EP4 (Unicode) | 11,230 | 5,615 | 04-Aug-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    1 x Sun Fire X4270
      2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
      48 GB memory
      Sun StorageTek CSM200 with 32 * 73GB 15KRPM 4Gb FC-AL and 32 * 146GB 15KRPM 4Gb FC-AL Drives

Software Configuration:

    Solaris 10 container configured with 8 virtual CPUs
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle 10g

Sun has submitted the following result for the SAP-SD 2-Tier benchmark. It was approved and published by SAP.

    Number of SAP SD benchmark users:
    2,800
    Average dialog response time:
    0.971 seconds

    Fully processed order line items/hour:
    306,330

    Dialog steps/hour:
    919,000

    SAPS:
    15,320
    SAP Certification:
    2009034

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Key Points and Best Practices

  • Set up the storage (LSI-OEM) to deliver the needed raw devices directly out of the storage and do not use any software layer in between.

  • Solaris 10 Container best practices how-to guide

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 09/10/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) run in 8 virtual cpu container, 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009034. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10, Cert# 2009006. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10 container configured with 8 virtual CPUs, Cert# 2009034. Fujitsu PRIMERGY Model RX300 S5 (2 processors, 8 cores, 16 threads) 2,056 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 96 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0, Cert# 2009029.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Tuesday Sep 01, 2009

Significance of Results

Sun SPARC Enterprise T5220, T5240 and T5440 servers ran benchmarks using the Aho-Corasick string searching algorithm. String searching and pattern matching are important to a variety of commercial, government and HPC applications. One of the core functions needed for text identification algorithms in data repositories is real-time string searching. For this benchmark, the IBM, HP and Sun systems used the Aho-Corasick algorithm for string searching.

Sun SPARC Enterprise T5440

  • A 1.6 GHz Sun SPARC Enterprise T5440 server could search a book as tall as Mt. Everest (29,208 feet, 861 GB book) in 61 seconds, which corresponds to a string search rate of 14.2 GB/s.

  • A 1.6 GHz Sun SPARC Enterprise T5440 server can search at a rate of 14.2 GB/s, which corresponds to searching a book containing one terabyte of data (34,745 feet high) in only 70 seconds.

  • The 4-chip 1.6 GHz Sun SPARC Enterprise T5440 server performed string searching at a rate of 14.2 GB/s, which is 29.9 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

  • The 4-chip 1.6 GHz Sun SPARC Enterprise T5440 server performed string searching 3.7 times as fast as the 4-chip HP DL-580 (2.93 GHz Xeon QC) server that performed string searching at a rate of 3.87 GB/s. The 1.6 GHz Sun SPARC Enterprise T5440 server has a 1.7 times advantage in delivered power-performance over the HP DL-580 (using a power consumption rate of 830 watts for the HP system as measured on other tests).

  • The 1.6 GHz Sun SPARC Enterprise T5440 server demonstrated a 12% improvement over the 1.4 GHz Sun SPARC Enterprise T5440.

  • The 1.6 GHz Sun SPARC Enterprise T5440 server demonstrated a 2x speedup over the 1.6 GHz Sun SPARC Enterprise T5240 server which demonstrated a 2.3x speedup over the 1.4 GHz Sun SPARC Enterprise T5220 server.

Sun SPARC Enterprise T5240

  • The 2-chip 1.6 GHz Sun SPARC Enterprise T5240 server performed string searching at a rate of 7.22 GB/s, which is 15.2 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

  • The 2-chip 1.6 GHz Sun SPARC Enterprise T5240 server performed string searching 1.9 times as fast as the 4-chip HP DL-580 (2.93 GHz Xeon QC) server that performed string searching at a rate of 3.87 GB/s. The 1.6 GHz Sun SPARC Enterprise T5240 server has a 2.4 times advantage in delivered power-performance over the HP DL-580 (using a power consumption rate of 830 watts for the HP system as measured on other tests).

  • The 1.6 GHz Sun SPARC Enterprise T5240 server demonstrated a 14% speedup over the 1.4 GHz Sun SPARC Enterprise T5240 server.

Sun SPARC Enterprise T5220

  • The 1-chip 1.4 GHz Sun SPARC Enterprise T5220 server performed string searching at a rate of 3.16 GB/s which is 6.7 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.
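
The search times and relative rates quoted in the lists above are simple ratios of data size to throughput. The sketch below assumes decimal units (1 TB = 1,000 GB), which is not stated in the post; the function names are hypothetical.

```python
def search_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Time to scan size_gb of text at a sustained search rate."""
    return size_gb / rate_gb_per_s


def times_as_fast(rate_gb_per_s: float, baseline_gb_per_s: float) -> float:
    """Ratio of two string search rates."""
    return rate_gb_per_s / baseline_gb_per_s


T5440_RATE = 14.2   # GB/s, 1.6 GHz Sun SPARC Enterprise T5440
CELL_RATE = 0.475   # GB/s, IBM Cell Broadband Engine DD3 Blade

print(f"861 GB book: {search_seconds(861, T5440_RATE):.0f} s")
print(f"1 TB book:   {search_seconds(1000, T5440_RATE):.0f} s")  # assumes 1 TB = 1,000 GB
print(f"T5440 vs Cell blade: {times_as_fast(T5440_RATE, CELL_RATE):.1f}x")
```

With these assumptions the 861 GB book takes about 61 seconds and the one-terabyte book about 70 seconds, in line with the first two T5440 items.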

Performance Landscape

System                                          Throughput (GB/sec)  Chips  Cores
Sun SPARC Enterprise T5440 (1.6 GHz)            14.2                 4      32
Sun SPARC Enterprise T5440 (1.4 GHz)            12.7                 4      32
Sun SPARC Enterprise T5240 (1.6 GHz)            7.2                  2      16
Sun SPARC Enterprise T5240 (1.4 GHz)            6.4                  2      16
HP DL-580 (2.9 GHz)                             3.9                  4      16
Sun SPARC Enterprise T5220 (1.4 GHz)            3.2                  1      8
IBM Cell Broadband Engine DD3 Blade (3.2 GHz)   0.475                2      16

Results and Configuration Summary

Hardware Configuration:
    Sun SPARC Enterprise T5440 (1.6 GHz)
      4 x 1.6 GHz UltraSPARC T2 Plus processors
      256 GB
    Sun SPARC Enterprise T5440 (1.4 GHz)
      4 x 1.4 GHz UltraSPARC T2 Plus processors
      128 GB
    Sun SPARC Enterprise T5240 (1.6 GHz)
      2 x 1.6 GHz UltraSPARC T2 Plus processors
      64 GB
    Sun SPARC Enterprise T5240 (1.4 GHz)
      2 x 1.4 GHz UltraSPARC T2 Plus processors
      64 GB
    Sun SPARC Enterprise T5220 (1.4 GHz)
      1 x 1.4 GHz UltraSPARC T2 processor
      32 GB

Software Configuration:

    Sun SPARC Enterprise T5440 (1.6 GHz)
      OpenSolaris 2009.06
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5440 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5240 (1.6 GHz)
      OpenSolaris 2009.06
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5240 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5220 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)

Benchmark Description

One of the core functions needed for text identification algorithms in data repositories is real-time string searching. This string searching benchmark demonstrates the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code creation and speed of code execution. In IEEE Computer, Volume 41, Number 4, pp. 42-50, April 2008, IBM describes a variant of the Aho-Corasick string searching algorithm that uses deterministic finite automata. The algorithm first constructs a graph that represents a dictionary, then walks that graph using successive input characters from a text file. Each "state" in the graph includes a state transition table (STT) that is accessed using the next input character from the text file to determine the address of the next state in the graph. IBM defines an automaton as a two-step loop that: (1) obtains the address of the next state from the STT, and (2) fetches the next state in the graph.
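The graph construction and the two-step automaton loop described above can be sketched in a few lines of Python. This is only an illustrative sketch of a textbook Aho-Corasick automaton with a per-state transition table; it is not Sun's ANSI C implementation or IBM's optimized variant, and the names `build_automaton` and `search` are hypothetical.

```python
from collections import deque

def build_automaton(words):
    """Build one state transition table (STT) per state, plus match outputs."""
    goto = [{}]      # trie edges: goto[state][char] -> child state
    out = [set()]    # dictionary words recognised on entering each state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    alphabet = {ch for word in words for ch in word}
    fail = [0] * len(goto)
    stt = [dict() for _ in goto]
    queue = deque()
    # Root state: characters with no trie edge lead back to the root.
    for ch in alphabet:
        nxt = goto[0].get(ch, 0)
        stt[0][ch] = nxt
        if nxt:
            queue.append(nxt)
    # Breadth-first walk so shallower states are complete before deeper ones.
    while queue:
        state = queue.popleft()
        out[state] |= out[fail[state]]
        for ch in alphabet:
            if ch in goto[state]:
                child = goto[state][ch]
                fail[child] = stt[fail[state]][ch]
                stt[state][ch] = child
                queue.append(child)
            else:
                stt[state][ch] = stt[fail[state]][ch]
    return stt, out

def search(text, stt, out):
    """Walk the graph with successive input characters; return (start, word) pairs."""
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        state = stt[state].get(ch, 0)   # step 1: next state from the STT
        for word in out[state]:         # step 2: fetch the state, report matches
            matches.append((pos - len(word) + 1, word))
    return matches

stt, out = build_automaton(["he", "she", "his", "hers"])
print(sorted(search("ushers", stt, out)))
```

Because every state carries a complete transition table, the search loop does exactly one table lookup per input character, which is what makes the memory access pattern (rather than arithmetic) the limiting factor.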

IBM reports the performance of its Cell Broadband Engine (CBE) when executing this algorithm to search a 4.4 MB version of the King James Bible using a dictionary of the 20,000 most used words in the English language (average word length of 7.59 characters). Each of the 8 synergistic processing elements (SPEs) of each of the two CBEs executes 16 automata, for a total of 256 automata. All automata and hence all SPEs access a single, shared dictionary.

IBM describes elaborate optimizations of the Aho-Corasick algorithm, including state shuffling, state replication, alphabet shuffling and state caching. These optimizations were required to: (1) overcome "memory congestion", i.e., contention amongst the SPEs for access to the shared dictionary, and (2) compensate for the limited local storage that is associated with each SPE. These optimizations were necessary to achieve the performance reported for the CBE DD3 Blade.

IBM does not provide references that indicate where to obtain the dictionary and Bible. IBM reports the algorithmic performance in Gbits/s but does not indicate whether an 8-bit byte is extended to 10 bits as required for network transmission.

In order to closely approximate the dictionary and Bible that were used by IBM, Sun used a dictionary of 25,143 English words (the OpenSolaris file cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/spell/list) for which the average word length is 7.2 characters, and a 4.6 MB version of the King James Bible (www.patriot.net/users/bmcgin/kjv12.zip). For reporting of results in Gbits/s, the length of a byte is assumed to be 8 bits.
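The effect of the byte-length assumption can be shown with a small conversion sketch. The rates are the GB/s figures quoted in this post; the function name `gbytes_to_gbits` and the dictionary layout are illustrative only.

```python
def gbytes_to_gbits(gbytes_per_s, bits_per_byte=8):
    """Convert a rate in GB/s to Gbits/s for a given byte length."""
    return gbytes_per_s * bits_per_byte

# String-search rates in GB/s as quoted in the text.
rates_gb_s = {
    "Sun SPARC Enterprise T5440 (1.6 GHz)": 14.2,
    "Sun SPARC Enterprise T5240 (1.6 GHz)": 7.22,
    "Sun SPARC Enterprise T5220 (1.4 GHz)": 3.16,
    "IBM Cell Broadband Engine DD3 Blade": 0.475,
}

for system, rate in rates_gb_s.items():
    print(f"{system}: {gbytes_to_gbits(rate):.1f} Gbits/s (8-bit bytes)")
```

Passing `bits_per_byte=10` instead shows how much a 10-bit network encoding would inflate the same figure, which is why the assumption matters when comparing results reported in Gbits/s.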

Key Points and Best Practices

  • Power was measured during execution of the Aho-Corasick algorithm using a WattsUp power meter, and the average rate of power consumption is presented.

  • The Aho-Corasick algorithm as deployed on the IBM Cell Broadband Engine DD3 Blade required substantial optimization and tuning to achieve the reported performance, whereas on the Sun SPARC Enterprise T5220, T5240 or T5440 servers only a basic implementation of the algorithm and a simple compilation were needed.

  • In order to demonstrate the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code generation and speed of code execution, Sun implemented the Aho-Corasick algorithm using ANSI C. No optimizations of the algorithm were required to achieve the performance reported for the T5220, T5240 and T5440. The source code was compiled using the -m32, -xO3 and -xopenmp options. The dictionary is represented using a graph that comprises 82 MB. Each core of the T5220, T5240 or T5440 executes 8 automata using one OpenMP thread per automaton. Thus, the T5220 executes 64 total automata, the T5240 executes 128 total automata and the T5440 executes 256 total automata. All automata and hence all cores access a single, shared dictionary. Access to this dictionary is accelerated by the large, shared L2 caches of the Sun SPARC Enterprise T5220, T5240 and T5440.

Friday Aug 28, 2009

Sun Fire X4270 Server World Record Two Processor performance result on Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • World Record 2-processor performance result on the two-tier SAP ERP 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark on the Sun Fire X4270 server.

  • The Sun Fire X4270 server with two Intel Xeon X5570 processors (8 cores, 16 threads) achieved 3,800 SAP SD Benchmark users running SAP ERP application release 6.0 enhancement pack 4 benchmark with unicode software, using Oracle 10g database and Solaris 10 operating system.

  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.

  • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the IBM System 550 server using 4 POWER6 processors, 64 GB memory and the AIX 6.1 operating system.
  • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the HP ProLiant BL460c G6 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Windows Server 2008 operating system.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users than the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead of processing the larger character strings that result from Unicode encoding. Refer to SAP Note 1139642 for more details. Note: username and password for SAP Service Marketplace required.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, meaning each was just 1 byte. The new version of the benchmark requires Unicode characters, and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to SAP Note 1139642 for more details. Note: username and password for SAP Service Marketplace required.
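The storage cost of UTF-16 relative to ASCII can be checked directly with Python's built-in codecs. This is a small illustration only; the helper name `encoded_sizes` and the sample strings are not part of the benchmark.

```python
def encoded_sizes(text):
    """Return the byte length of text in ASCII (if possible) and UTF-16."""
    return {
        "ascii": len(text.encode("ascii")) if text.isascii() else None,
        # "utf-16-le" is used so that no byte-order mark is added to the count.
        "utf-16": len(text.encode("utf-16-le")),
    }

for sample in ["SAP", "\u20ac", "\U0001F600"]:
    print(repr(sample), encoded_sizes(sample))
```

The ASCII string doubles in size under UTF-16, the euro sign takes 2 bytes, and a character outside the Basic Multilingual Plane takes 4 bytes because it is stored as a surrogate pair.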

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System                         Processors                   Memory  OS                                      Database         Users  SAP ERP/ECC Release     SAPS    SAPS/Proc  Date
Sun Fire X4270                 2xIntel Xeon X5570 @2.93GHz  48 GB   Solaris 10                              Oracle 10g       3,800  2009 6.0 EP4 (Unicode)  21,000  10,500     21-Aug-09
IBM System 550                 4xPower6 @5GHz               64 GB   AIX 6.1                                 DB2 9.5          3,752  2009 6.0 EP4 (Unicode)  20,520  5,130      16-Jun-09
Sun Fire X4270                 2xIntel Xeon X5570 @2.93GHz  48 GB   Solaris 10                              Oracle 10g       3,700  2009 6.0 EP4 (Unicode)  20,300  10,150     30-Mar-09
HP ProLiant BL460c G6          2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,415  2009 6.0 EP4 (Unicode)  18,670  9,335      04-Aug-09
Fujitsu PRIMERGY TX/RX 300 S5  2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,328  2009 6.0 EP4 (Unicode)  18,170  9,085      13-May-09
HP ProLiant BL460c G6          2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,310  2009 6.0 EP4 (Unicode)  18,070  9,035      27-Mar-09
HP ProLiant DL380 G6           2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,300  2009 6.0 EP4 (Unicode)  18,030  9,015      27-Mar-09
Fujitsu PRIMERGY BX920 S1      2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,260  2009 6.0 EP4 (Unicode)  17,800  8,900      18-Jun-09
NEC Express5800                2xIntel Xeon X5570 @2.93GHz  48 GB   Windows Server 2008 Enterprise Edition  SQL Server 2008  3,250  2009 6.0 EP4 (Unicode)  17,750  8,875      28-Jul-09
HP ProLiant DL380 G6           2xIntel Xeon X5570 @2.93GHz  48 GB   SuSE Linux Enterprise Server 10         MaxDB 7.8        3,171  2009 6.0 EP4 (Unicode)  17,380  8,690      17-Apr-09

Complete benchmark results may be found at the SAP benchmark website: http://www.sap.com/benchmark.
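The SAPS/Proc column is simply the SAPS figure divided by the number of processors, which is why the 4-processor IBM System 550 shows roughly half the per-processor value of the 2-processor systems despite a similar user count. A minimal sketch using three rows of the table (the tuple layout and function name are illustrative):

```python
def saps_per_processor(saps, processors):
    """SAPS delivered per processor (chip)."""
    return saps / processors

# (system, processors, SAPS) taken from the table above.
results = [
    ("Sun Fire X4270 (21-Aug-09)", 2, 21000),
    ("IBM System 550 (16-Jun-09)", 4, 20520),
    ("HP ProLiant BL460c G6 (04-Aug-09)", 2, 18670),
]

for system, processors, saps in results:
    print(f"{system}: {saps_per_processor(saps, processors):,.0f} SAPS/Proc")
```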

Results and Configuration Summary

Hardware Configuration:

    One Sun Fire X4270
      2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
      48 GB memory
      Sun Storage 6780 with 48 x 73GB 15KRPM 4Gb FC-AL and 16 x 146GB 15KRPM 4Gb FC-AL Drives

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle 10g

Certified Results:

          Performance: 3800 benchmark users
          SAP Certification: 2009033

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Key Points and Best Practices

  • Configure the storage (LSI-OEM) to present the required raw devices directly, with no software layer in between.

Benchmark Tags

World-Record, Performance, SAP-SD, Solaris, Oracle, Intel, X64, x86, HP, IBM, Application, Database

Disclosure Statement

    Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 08/21/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,700 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009005. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,415 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009031. Fujitsu PRIMERGY TX/RX 300 S5 (2 processors, 8 cores, 16 threads) 3,328 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009014. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,310 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009003. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,300 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009004. Fujitsu PRIMERGY BX920 S1 (2 processors, 8 cores, 16 threads) 3,260 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009024. NEC Express5800 (2 processors, 8 cores, 16 threads) 3,250 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009027. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, MaxDB 7.8, SuSE Linux Enterprise Server 10, Cert# 2009006. 
IBM System x3650 M2 (2 processors, 8 cores, 16 threads) 5,100 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071.

    SAP, R/3, reg TM of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Thursday Aug 27, 2009

Significance of Results

A Sun SPARC Enterprise T5240 server equipped with two UltraSPARC T2 Plus processors at 1.6GHz delivered a result of 422782 SPECjbb2005 bops, 26424 SPECjbb2005 bops/JVM. The Sun SPARC Enterprise T5240 consumed an average of 875 Watts of power during the execution of the benchmark.

  • The Sun SPARC Enterprise T5240 server running two 1.6 GHz UltraSPARC T2 Plus processors delivered 5% better performance than an IBM Power 570 with four 4.7 GHz POWER6 processors as measured by the SPECjbb2005 benchmark.

  • The Sun SPARC Enterprise T5240 server equipped with two UltraSPARC T2 Plus processors at 1.6 GHz demonstrated 10% better performance than the Sun SPARC Enterprise T5240 server equipped with two UltraSPARC T2 Plus processors at 1.4 GHz.
  • One Sun SPARC Enterprise T5240 (two 1.6 GHz UltraSPARC T2 Plus chips, 2RU) has 2.3 times the power/performance of the IBM Power 570 (8RU) that used four 4.7 GHz POWER6 chips.
  • The Sun SPARC Enterprise T5240 used OpenSolaris 2009.06 and the Sun JDK 1.6.0_14 Performance Release to obtain this result.

Performance Landscape

SPECjbb2005 Performance Chart (ordered by performance), select results presented.

bops : SPECjbb2005 Business Operations per Second (bigger is better)

System                      Chips  Cores  Threads  GHz  Type                bops    bops/JVM
Sun SPARC Enterprise T5240  2      16     128      1.6  UltraSPARC T2 Plus  422782  26424
IBM Power 570               4      8      16       4.7  POWER6              402923  100731
Sun SPARC Enterprise T5240  2      16     128      1.4  UltraSPARC T2 Plus  384934  24058

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise T5240
      2 x 1.6 GHz UltraSPARC T2 Plus processors
      64 GB

Software Configuration:

    OpenSolaris 2009.06
    Java HotSpot(TM) 32-Bit Server, Version 1.6.0_14 Performance Release

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).
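Since both metrics are reported, the number of JVM instances behind a result can be recovered by dividing bops by bops/JVM, and the Sun power figure quoted in this post gives a bops-per-watt value. A short sketch using the published numbers (function names are illustrative; the IBM wattage is not given numerically here, so it is not computed):

```python
def jvm_instances(bops, bops_per_jvm):
    """Number of JVM instances implied by the two SPECjbb2005 metrics."""
    return round(bops / bops_per_jvm)

def bops_per_watt(bops, watts):
    """Throughput delivered per watt of average power."""
    return bops / watts

print(jvm_instances(422782, 26424))    # Sun SPARC Enterprise T5240, 1.6 GHz
print(jvm_instances(402923, 100731))   # IBM Power 570
print(f"{bops_per_watt(422782, 875):.1f} bops/W for the T5240 at 875 W")
```

The much lower bops/JVM value for the T5240 reflects a larger number of JVM instances spread across its 128 hardware threads, not lower system throughput.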

Key Points and Best Practices

  • Each JVM executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Each JVM was bound to a separate processor containing 1 core to reduce memory access latency using the physical memory closest to the processor.

Disclosure Statement

SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 8/25/2009 on http://www.spec.org.
Sun SPARC T5240 (2 chips, 16 cores) 422782 SPECjbb2005 bops, 26424 SPECjbb2005 bops/JVM; Sun SPARC T5240 (2 chips, 16 cores) 384934 SPECjbb2005 bops, 24058 SPECjbb2005 bops/JVM; IBM Power 570 (4 chips, 8 cores) 402923 SPECjbb2005 bops, 100731 SPECjbb2005 bops/JVM.

Sun watts were measured on the system during the test.

IBM p 570 4P (2 building blocks) power specifications calculated as 80% of maximum input power reported 7/8/09 in 'Facts and Features Report': ftp://ftp.software.ibm.com/common/ssi/pm/br/n/psb01628usen/PSB01628USEN.PDF

Wednesday Aug 26, 2009

Significance of Results

A Sun SPARC Enterprise T5220 server equipped with one UltraSPARC T2 processor at 1.6GHz delivered a World Record single-chip result of 231464 SPECjbb2005 bops, 28933 SPECjbb2005 bops/JVM. The Sun SPARC Enterprise T5220 consumed an average of 520 Watts of power during the execution of this benchmark.

  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 3% better performance than the Fujitsu TX100 result of 223691 SPECjbb2005 bops, which used one 3.16 GHz Xeon X3380 processor.
  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 8% better performance than the IBM x3200 result of 214578 SPECjbb2005 bops, which used one 3.16 GHz Xeon X3380 processor.
  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 10% better performance than the Fujitsu RX100 result of 211144 SPECjbb2005 bops, which used one 3.16 GHz Xeon X3380 processor.
  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 19% better performance than the IBM x3350 result of 194256 SPECjbb2005 bops, which used one 3 GHz Xeon X3370 processor.
  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 2.6x the performance of the IBM p570 result of 88089 SPECjbb2005 bops, which used one 4.7 GHz POWER6 processor.
  • One Sun SPARC Enterprise T5220 (one 1.6 GHz UltraSPARC T2 chip, 2RU) has 2.1 times the power/performance of the IBM Power 570 (4RU) that used two 4.7 GHz POWER6 chips.
  • The Sun SPARC Enterprise T5220 used OpenSolaris 2009.06 and the Sun JDK 1.6.0_14 Performance Release to obtain this result.

Performance Landscape

SPECjbb2005 Performance Chart (ordered by performance)

bops : SPECjbb2005 Business Operations per Second (bigger is better)

System                      Chips  Cores  Threads  GHz   Type           bops    bops/JVM
Sun SPARC Enterprise T5220  1      8      64       1.6   UltraSPARC T2  231464  28933
Sun Blade T6320             1      8      64       1.6   UltraSPARC T2  229576  28697
Fujitsu TX100               1      4      4        3.16  Intel Xeon     223691  111846
IBM x3200 M2                1      4      4        3.16  Intel Xeon     214578  107289
Fujitsu RX100               1      4      4        3.16  Intel Xeon     211144  105572
IBM Power 570               2      4      8        4.7   POWER6         205917  102959
IBM x3350                   1      4      4        3.0   Intel Xeon     194256  97128
Sun SPARC Enterprise T5220  1      8      64       1.4   UltraSPARC T2  192055  24007
IBM Power 570               1      2      4        4.7   POWER6         88089   88089

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise T5220
      1x 1.6 GHz UltraSPARC T2 processor
      64 GB

Software Configuration:

    OpenSolaris 2009.06
    Java HotSpot(TM) 32-Bit Server, Version 1.6.0_14 Performance Release

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).

Key Points and Best Practices

  • Each JVM executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Each JVM was bound to a separate processor containing 1 core to reduce memory access latency using the physical memory closest to the processor.

Disclosure Statement

SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 8/25/2009 on http://www.spec.org.
Sun SPARC T5220 231464 SPECjbb2005 bops, 28933 SPECjbb2005 bops/JVM Submitted to SPEC for review; IBM p 570 88089 SPECjbb2005 bops, 88089 SPECjbb2005 bops/JVM; Fujitsu TX100 223691 SPECjbb2005 bops, 111846 SPECjbb2005 bops/JVM; IBM x3350 194256 SPECjbb2005 bops, 97128 SPECjbb2005 bops/JVM; Sun SPARC Enterprise T5120 192055 SPECjbb2005 bops, 24007 SPECjbb2005 bops/JVM.

Sun watts were measured on the system during the test.

IBM p 570 2P (1 building block) power specifications calculated as 80% of maximum input power reported 7/8/09 in "Facts and Features Report": ftp://ftp.software.ibm.com/common/ssi/pm/br/n/psb01628usen/PSB01628USEN.PDF

Wednesday Jul 22, 2009

Sun has upgraded the UltraSPARC T2 and UltraSPARC T2 Plus processors to 1.6 GHz. As described in some detail in yesterday's post, new results show SPEC CPU2006 performance improvements vs. previous systems that often exceed the clock speed improvement.  The scaling can be attributed to both memory system improvements and software improvements, such as the Sun Studio 12 Update 1 compiler.

A MHz improvement within a product line is often useful.  If yesterday's chip runs at speed n and today's at n*1.12 then, intuitively, sure, I'll take today's.

Comparing MHz across product lines is often counter-intuitive.  Consider that Sun's new systems provide:

  • up to 68% more throughput than the 4.7 GHz POWER6+ [1], and
  • up to 3x the throughput of the Itanium 9150N [2].

The comparisons are particularly striking when one takes into account the cache size advantage for both the POWER6+ and the Itanium 9150N, and the MHz advantage for the POWER6+:

Processor                          GHz  Number of hw cache levels  Size of last cache (per chip)  SPECint_rate_base2006
UltraSPARC T2, UltraSPARC T2 Plus  1.6  2                          4 MB                           1 chip: 89; 2 chips: 171; 4 chips: 338
POWER6+                            4.7  3                          32 MB                          Best 2 chip result: 102. UltraSPARC T2 Plus delivers 68% more integer throughput [1]
Itanium 9150N                      1.6  3                          24 MB                          Best 4 chip result: 114. UltraSPARC T2 Plus delivers 3x the integer throughput [2]

These are per-chip results, not per-core or per-thread. Sun's CMT processors are designed for overall system throughput: how much work can the overall system get done.  

A mystery: With comparatively smaller caches and modest clock rates, why do the Sun CMT processors win?

The performance hole: Memory latency. From the point of view of a CPU chip, the big performance problem is that memory latency is inordinately long compared to chip cycle times.

A hardware designer can attempt to cover up that latency with very large caches, as in the POWER6+ and Itanium, and this works well when running a small number of modest-sized applications. Large caches become less helpful, though, as workloads become more complex.

MHz isn't everything. In fact, MHz hardly counts at all when the problem is memory latency. Suppose the hot part of an application looks like this:

  loop:
       computational instruction
       computational instruction
       computational instruction
       memory access instruction
       branch to loop

For an application that looks like this, the computational instructions may complete in only a few cycles, while the memory access instruction may easily require on the order of 100ns - which, for a 1 GHz chip, is on the order of 100 cycles. If the processor speed is increased by a factor of 4, but memory speed is not, then memory is still 100ns away, and when measured in cycles, it is now 400 cycles distant. The overall loop hardly speeds up at all.
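The arithmetic can be made concrete with a tiny model. It assumes, purely for illustration, that the three computational instructions and the branch each take one cycle and that the memory access always costs a fixed 100 ns regardless of clock rate; the function name is hypothetical.

```python
def loop_time_ns(clock_ghz, compute_cycles=4, memory_ns=100.0):
    """Time for one loop iteration: on-chip cycles plus a fixed memory latency."""
    return compute_cycles / clock_ghz + memory_ns

slow = loop_time_ns(1.0)   # 1 GHz chip
fast = loop_time_ns(4.0)   # same loop on a 4 GHz chip, same memory
speedup = slow / fast

print(f"1 GHz: {slow:.0f} ns per iteration")
print(f"4 GHz: {fast:.0f} ns per iteration")
print(f"speedup from 4x the clock rate: {speedup:.2f}x")
```

Under these assumptions, quadrupling the clock rate shortens the loop from 104 ns to 101 ns, a gain of about 3%.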

Lest the reader think I am making this up - consider page 8 of this IBM talk from April, 2008 regarding the POWER6:

[Image: POWER6 latencies, from page 8 of the IBM talk]

The IBM POWER systems have some impressive performance characteristics - if your application is tiny enough to fit in its first or second level cache. But memory latency is not impressive. If your workload requires multiple concurrent threads accessing a large memory space, Sun's CMT approach just might be a better fit.

Operating System Overhead: A context switch from one process to another is mediated by operating system services. The OS parks context from the process that is currently running - typically saving dozens of program registers and other context (such as virtual address space information); decides which process to run next (which may require access to several OS data structures); and loads the context for the new process (registers, virtual address context, etc.). If the system is running many processes, then caches are unlikely to be helpful during this context switch, and thousands of cycles may be spent on main memory accesses.

Design for throughput: Sun's CMT approach handles the complexity of real-world applications by allowing up to 64 processes to be simultaneously on-chip. When a long-latency stall occurs, such as an access to main memory, the chip switches to executing instructions on behalf of other, non-stalled threads, thus improving overall system throughput. No operating system intervention is required as resources are shared among the processes on the chip.
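A deliberately idealized model shows why keeping many threads on-chip helps. It assumes each thread alternates a short burst of computation with a long memory stall, that switching between hardware threads is free, and that memory can serve all outstanding requests at once; real chips have contention that this sketch ignores. The default values (4 ns compute, 100 ns stall) and the function name are illustrative assumptions.

```python
def pipeline_utilization(threads, compute_ns=4.0, stall_ns=100.0):
    """Fraction of time shared execution resources do useful work (idealized)."""
    per_thread = compute_ns / (compute_ns + stall_ns)
    # Stalls of one thread are hidden by other threads until the resource is full.
    return min(1.0, threads * per_thread)

for n in (1, 8, 64):
    print(f"{n:2d} threads: {pipeline_utilization(n):.1%} busy")
```

With one thread the execution resources sit idle more than 96% of the time in this model; adding threads raises utilization roughly linearly until the stalls are fully covered.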

[1] http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090427-07263.html
[2] http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090522-07485.html

Competitive results retrieved from www.spec.org   20 July 2009.  Sun's CMT results have been submitted to SPEC.  SPEC, SPECfp, SPECint are registered trademarks of the Standard Performance Evaluation Corporation.

Tuesday Jul 21, 2009

Oracle BI EE Sun SPARC Enterprise T5440 World Record Performance

The Sun SPARC Enterprise T5440 server running the new 1.6 GHz UltraSPARC T2 Plus processor delivered world record performance on Oracle Business Intelligence Enterprise Edition (BI EE) tests using Sun's ZFS.
  • The Sun SPARC Enterprise T5440 server with four 1.6 GHz UltraSPARC T2 Plus processors delivered the best single system performance of 28K concurrent users on the Oracle BI EE benchmark. This result used Solaris 10 with Solaris Containers and the Oracle 11g Database software.

  • The benchmark demonstrates the scalability of Oracle Business Intelligence Cluster with 4 nodes running in Solaris Containers within a single Sun SPARC Enterprise T5440 server.

  • The Sun SPARC Enterprise T5440 server with internal SSD and the ZFS file system showed significant I/O performance improvement over traditional disk for Business Intelligence Web Catalog activity.

Performance Landscape

System                          Chips  Cores  Threads  GHz  Type                Users
1 x Sun SPARC Enterprise T5440  4      32     256      1.6  UltraSPARC T2 Plus  28,000
5 x Sun Fire T2000              1      8      32       1.2  UltraSPARC T1       10,000

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise T5440
      4 x 1.6 GHz UltraSPARC T2 Plus processors
      256 GB
      STK2540 (6 x 146GB)

Software Configuration:

    Solaris 10 5/09
    Oracle BIEE 10.1.3.4 64-bit
    Oracle 11g R1 Database

Benchmark Description

The objective of this benchmark is to highlight how Oracle BI EE can support pervasive deployments in large enterprises, using minimal hardware, by simulating an organization that needs to support more than 25,000 active concurrent users, each operating in mixed mode: ad-hoc reporting, application development, and report viewing.

The user population was divided into a mix of administrative users and business users. A maximum of 28,000 concurrent users were actively interacting and working in the system during the steady-state period. The tests executed 580 transactions per second, with think times of 60 seconds per user, between requests. In the test scenario 95% of the workload consisted of business users viewing reports and navigating within dashboards. The remaining 5% of the concurrent users, categorized as administrative users, were doing application development.

The benchmark scenario used a typical business user sequence of dashboard navigation, report viewing, and drill down. For example, a Service Manager logs into the system and navigates to his own set of dashboards, viz. "Service Manager". The user then selects the "Service Effectiveness" dashboard, which shows him four distinct reports: "Service Request Trend", "First Time Fix Rate", "Activity Problem Areas", and "Cost Per completed Service Call - 2002 till 2005". The user then proceeds to view the "Customer Satisfaction" dashboard, which also contains a set of 4 related reports. He then proceeds to drill down on some of the reports to see the detail data. Then the user proceeds to more dashboards, for example "Customer Satisfaction" and "Service Request Overview". After navigating through these dashboards, he logs out of the application.

This benchmark did not use a synthetic database schema. The benchmark tests were run on a full production version of the Oracle Business Intelligence Applications with a fully populated underlying database schema. The business processes in the test scenario closely represent a true customer scenario.

Key Points and Best Practices

Since the server has 32 cores, we created four Solaris Containers with 8 cores dedicated to each. Four instances of the BI server plus Presentation server (collectively referred to as an 'instance' from here onwards) were installed, one instance per container. All four BI instances were clustered using the BI Cluster software components.

The ZFS file system was used to overcome the 'Too many links' error that occurs with ~28,000 concurrent users. In earlier runs the file system reached the UFS limit of 32767 sub-directories (LINK_MAX) with ~28K users online, and thousands of errors resulted from the inability to create new directories beyond that limit within a single directory. Web Catalog stores the user profile on disk by creating at least one dedicated directory for each user. If there are more than 25,000 concurrent users, clearly ZFS is the way to go.
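A quick way to see whether a planned directory layout might hit such a limit is to query LINK_MAX for the target file system and compare it with the expected number of sub-directories. The sketch below uses Python's standard `os.pathconf`; the helper names and the figure of two directories per user are hypothetical, and the "minus 2" adjustment reflects the traditional UNIX convention that a directory's own link count starts at 2.

```python
import os

def link_max(path="."):
    """Return LINK_MAX for the file system holding path, or None if unknown."""
    try:
        return os.pathconf(path, "PC_LINK_MAX")
    except (OSError, ValueError, AttributeError):
        return None

def subdirs_fit(planned_subdirs, limit):
    """True if planned_subdirs sub-directories fit under one parent directory."""
    if limit is None:
        return True   # limit unknown: nothing to check against
    # Each sub-directory adds one link to the parent, which starts with 2 links.
    return planned_subdirs + 2 <= limit

users = 28000
dirs_per_user = 2     # hypothetical; the text only says "at least one"
print("LINK_MAX here:", link_max())
print("fits under a 32767 limit:", subdirs_fit(users * dirs_per_user, 32767))
```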

See Also:

Oracle Business Intelligence Website,  BUSINESS INTELLIGENCE has other results

Disclosure Statement

Oracle Business Intelligence Enterprise Edition benchmark, see http://www.oracle.com/solutions/business_intelligence/resource-library-whitepapers.html for more. Results as of 7/20/09.

Tuesday Jul 21, 2009

Sun SPARC Enterprise T5440 Server World Record Four Processor performance result on Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • World Record performance result with four processors on the two-tier SAP ERP 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark as of July 21, 2009.
  • The Sun SPARC Enterprise T5440 Server with four 1.6 GHz UltraSPARC T2 Plus processors (32 cores, 256 threads) achieved 4,720 SAP SD Benchmark users running SAP ERP application release 6.0 enhancement pack 4 benchmark with unicode software, using Oracle 10g database and Solaris 10 OS.
  • The Sun SPARC Enterprise T5440 Server with four 1.6 GHz UltraSPARC T2 Plus processors, using Oracle 10g and Solaris 10, beats the IBM System 550 by 26% even though both use the same number of processors.
  • The Sun SPARC Enterprise T5440 Server with four 1.6 GHz UltraSPARC T2 Plus processors, using Oracle 10g and Solaris 10, beats the HP ProLiant DL585 G6 with the same number of processors.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun SPARC Enterprise servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users than the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead of processing the larger character strings that Unicode encoding produces. Refer to SAP Note 1139642 for more details (https://service.sap.com/sap/support/notes/1139642 Note: User and password for SAP Service Marketplace required).
  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, meaning each was just 1 byte. The new version of the benchmark requires Unicode characters, and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to SAP Note 1139642 (link above) for more details.
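The size difference described in the last bullet can be seen with a few lines of Python. This is only an illustrative sketch: the sample strings and the helper name `encoded_sizes` are made up here and have nothing to do with the benchmark's actual data.

```python
def encoded_sizes(text):
    """Return (ASCII byte count or None, UTF-16 byte count) for text."""
    try:
        ascii_bytes = len(text.encode("ascii"))
    except UnicodeEncodeError:
        ascii_bytes = None  # not representable in ASCII
    # "utf-16-le" omits the byte-order mark, so only character data is counted
    utf16_bytes = len(text.encode("utf-16-le"))
    return ascii_bytes, utf16_bytes


# Plain ASCII text, text with an accented letter, and one character
# outside the Basic Multilingual Plane (a musical G clef)
samples = ["SALES ORDER", "Zürich", "\U0001D11E"]
for sample in samples:
    print(repr(sample), encoded_sizes(sample))
```

ASCII-only strings double in size under UTF-16, and characters outside the Basic Multilingual Plane take a 4-byte surrogate pair.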

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

| System | OS / Database | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date |
|---|---|---|---|---|---|---|
| Sun SPARC Enterprise T5440 Server, 4xUltraSPARC T2 Plus@1.6GHz, 256 GB | Solaris 10 / Oracle10g | 4,720 | 2009 6.0 EP4 (Unicode) | 25,830 | 6,458 | 21-Jul-09 |
| HP ProLiant DL585 G6, 4xAMD Opteron 8439 SE@2.8GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 4,665 | 2009 6.0 EP4 (Unicode) | 25,530 | 6,383 | 10-Jul-09 |
| HP ProLiant BL685c G6, 4xAMD Opteron Processor 8435@2.6GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 4,422 | 2009 6.0 EP4 (Unicode) | 24,230 | 6,058 | 29-May-09 |
| IBM System 550, 4xPower6@5GHz, 64 GB | AIX 6.1 / DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 5,130 | 16-Jun-09 |
| HP ProLiant DL585 G5, 4xAMD Opteron Processor 8393 SE@3.1GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,430 | 2009 6.0 EP4 (Unicode) | 18,730 | 4,683 | 24-Apr-09 |
| HP ProLiant BL685 G6, 4xAMD Opteron Processor 8389@2.9GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,118 | 2009 6.0 EP4 (Unicode) | 17,050 | 4,263 | 24-Apr-09 |
| NEC Express5800, 4xIntel Xeon Processor X7460@2.66GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 2,957 | 2009 6.0 EP4 (Unicode) | 16,170 | 4,043 | 28-May-09 |
| Dell PowerEdge M905, 4xAMD Opteron Processor 8384@2.7GHz, 96 GB | Windows Server 2003 Enterprise Edition / SQL Server 2005 | 2,129 | 2009 6.0 EP4 (Unicode) | 11,770 | 2,943 | 18-May-09 |

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.
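The percentage comparisons and the SAPS/Proc column follow directly from the user and SAPS figures in the table. A minimal sketch of that arithmetic is shown below; the figures are copied from the table, while the dictionary layout and function names are assumptions made for this example.

```python
# (users, SAPS) taken from the table above; every listed system has 4 processors
results = {
    "Sun SPARC Enterprise T5440": (4720, 25830),
    "HP ProLiant DL585 G6": (4665, 25530),
    "IBM System 550": (3752, 20520),
}
PROCESSORS = 4


def percent_more_users(system, baseline):
    """How many percent more benchmark users system supports than baseline."""
    users = results[system][0]
    baseline_users = results[baseline][0]
    return (users / baseline_users - 1.0) * 100.0


def saps_per_processor(system):
    """SAPS divided by processor count, before rounding for the table."""
    return results[system][1] / PROCESSORS


if __name__ == "__main__":
    sun = "Sun SPARC Enterprise T5440"
    for other in ("IBM System 550", "HP ProLiant DL585 G6"):
        print(f"{sun} vs {other}: {percent_more_users(sun, other):.1f}% more users")
    print("SAPS/Proc:", saps_per_processor(sun))
```

The table rounds SAPS/Proc to whole numbers, so the unrounded values computed here can differ from the printed column by a fraction.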

Results and Configuration Summary

Hardware Configuration:

    One, Sun SPARC Enterprise T5440 Server
      4 x 1.6 GHz UltraSPARC T2 Plus processors (4 processors / 32 cores / 256 threads)
      256 GB memory
      3 x STK2540 each with 12 x 73GB/15KRPM disks

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle10g
SAE (Strategic Applications Engineering) and ISV-E (ISV Engineering) have submitted the following result for the SAP-SD 2-Tier benchmark. It was approved and published by SAP.

Certified Results

    Performance:
    4720 benchmark users
    SAP Certification:
    2009026

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Sun SPARC Enterprise T5440 Server Benchmark Details

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 07/21/09: Sun SPARC Enterprise T5440 Server (4 processors, 32 cores, 256 threads) 4,720 SAP SD Users, 4x 1.6 GHz UltraSPARC T2 Plus, 256 GB memory, Oracle10g, Solaris10, Cert# 2009026. HP ProLiant DL585 G6 (4 processors, 24 cores, 24 threads) 4,665 SAP SD Users, 4x 2.8 GHz AMD Opteron Processor 8439 SE, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009025. HP ProLiant BL685c G6 (4 processors, 24 cores, 24 threads) 4,422 SAP SD Users, 4x 2.6 GHz AMD Opteron Processor 8435, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009021. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL585 G5 (4 processors, 16 cores, 16 threads) 3,430 SAP SD Users, 4x 3.1 GHz AMD Opteron Processor 8393 SE, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009008. HP ProLiant BL685 G6 (4 processors, 16 cores, 16 threads) 3,118 SAP SD Users, 4x 2.9 GHz AMD Opteron Processor 8389, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009007. NEC Express5800 (4 processors, 24 cores, 24 threads) 2,957 SAP SD Users, 4x 2.66 GHz Intel Xeon Processor X7460, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009018. Dell PowerEdge M905 (4 processors, 16 cores, 16 threads) 2,129 SAP SD Users, 4x 2.7 GHz AMD Opteron Processor 8384, 96 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2009017. Sun Fire X4600M2 (8 processors, 32 cores, 32 threads) 7,825 SAP SD Users, 8x 2.7 GHz AMD Opteron 8384, 128 GB memory, MaxDB 7.6, Solaris 10, Cert# 2008070. IBM System x3650 M2 (2 processors, 8 cores, 16 threads) 5,100 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079.
HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071. SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Tuesday Jul 21, 2009

UltraSPARC T2 and T2 Plus Systems

Improved Performance Over 1.4 GHz

Reported 07/21/09

Significance of Results

Results are presented for the SPEC CPU2006 rate benchmarks run on systems based on the new 1.6 GHz Sun UltraSPARC T2 and Sun UltraSPARC T2 Plus processors. The new processors were tested in the Sun CMT family of systems, including the Sun SPARC Enterprise T5120, T5220, T5240, T5440 servers and the Sun Blade T6320 server module.

SPECint_rate2006

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered 57% and 37% better results than the best 4-chip IBM POWER6+ based systems on the SPEC CPU2006 integer throughput metrics.

  • The Sun SPARC Enterprise T5240 server equipped with two 1.6 GHz UltraSPARC T2 Plus processor chips, produced 68% and 48% better results than the best 2-chip IBM POWER6+ based systems on the SPEC CPU2006 integer throughput metrics.

  • The single-chip 1.6 GHz UltraSPARC T2 processor-based Sun CMT servers produced 59% to 68% better results than the best single-chip IBM POWER6 based systems on the SPEC CPU2006 integer throughput metrics.

  • On the four-chip Sun SPARC Enterprise T5440 server, the new 1.6 GHz UltraSPARC T2 Plus processor delivered performance improvements of 25% and 20% over the 1.4 GHz version of this server, as measured by the SPEC CPU2006 integer throughput metrics.

  • The new 1.6 GHz UltraSPARC T2 Plus processor, when put into the 2-chip Sun SPARC Enterprise T5240 server, delivered improvements of 20% and 17% when compared to the 1.4 GHz UltraSPARC T2 Plus processor based server, as measured by the SPEC CPU2006 integer throughput metrics.

  • On the single-chip Sun Blade T6320 server module, Sun SPARC Enterprise T5120 and T5220 servers, the new 1.6 GHz UltraSPARC T2 processor delivered performance improvements of 13% to 17% over the 1.4 GHz version of these servers, as measured by the SPEC CPU2006 integer throughput metrics.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered a SPECint_rate_base2006 score 3X the best 4-chip Itanium based system.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processors, delivered a SPECint_rate_base2006 score of 338, a World Record score for 4-chip systems running a single operating system instance (i.e. SMP, not clustered).

SPECfp_rate2006

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered 35% and 22% better results than the best 4-chip IBM POWER6+ based systems on the SPEC CPU2006 floating-point throughput metrics.

  • The Sun SPARC Enterprise T5240 server, equipped with two 1.6 GHz UltraSPARC T2 Plus processor chips, produced 40% and 27% better results than the best 2-chip IBM POWER6+ based systems on the SPEC CPU2006 floating-point throughput metrics.

  • The single-chip 1.6 GHz UltraSPARC T2 processor-based Sun CMT servers produced 18% to 24% better results than the best single-chip IBM POWER6 based systems on the SPEC CPU2006 floating-point throughput metrics.

  • On the four-chip Sun SPARC Enterprise T5440 server, the new 1.6 GHz UltraSPARC T2 Plus processor delivered performance improvements of 20% and 17% when compared to 1.4 GHz processors in the same system, as measured by the SPEC CPU2006 floating-point throughput metrics.

  • The new 1.6 GHz UltraSPARC T2 Plus processor, when put into a Sun SPARC Enterprise T5240 server, delivered an improvement of 12% when compared to the 1.4 GHz UltraSPARC T2 Plus processor based server as measured by the SPEC CPU2006 floating-point throughput metrics.

  • On the single-chip Sun Blade T6320 server module, Sun SPARC Enterprise T5120 and T5220 servers, the new 1.6 GHz UltraSPARC T2 processor delivered performance improvements of 10% to 11% over the 1.4 GHz version of these servers, as measured by the SPEC CPU2006 floating-point throughput metrics.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered a peak score 3X the best 4-chip Itanium based system, and base 2.9X, on the SPEC CPU2006 floating-point throughput metrics.

Performance Landscape

SPEC CPU2006 Performance Charts - bigger is better, selected results, please see www.spec.org for complete results. All results as of 7/17/09.

In the tables below
"Base" = SPECint_rate_base2006 or SPECfp_rate_base2006
"Peak" = SPECint_rate2006 or SPECfp_rate2006

SPECint_rate2006 results - 1 chip systems

| System | Cores/Chips | Processor Type | MHz | Base Copies | Base | Peak | Comments |
|---|---|---|---|---|---|---|---|
| Supermicro X8DAI | 4/1 | Xeon W3570 | 3200 | 8 | 127 | 136 | Best Nehalem result |
| HP ProLiant BL465c G6 | 6/1 | Opteron 2435 | 2600 | 6 | 82.1 | 104 | Best Istanbul result |
| Sun SPARC T5220 | 8/1 | UltraSPARC T2 | 1582 | 63 | 89.1 | 97.0 | New |
| Sun SPARC T5120 | 8/1 | UltraSPARC T2 | 1582 | 63 | 89.1 | 97.0 | New |
| Sun Blade T6320 | 8/1 | UltraSPARC T2 | 1582 | 63 | 89.2 | 96.7 | New |
| Sun Blade T6320 | 8/1 | UltraSPARC T2 | 1417 | 63 | 76.4 | 85.5 | |
| Sun SPARC T5120 | 8/1 | UltraSPARC T2 | 1417 | 63 | 76.2 | 83.9 | |
| IBM System p 570 | 2/1 | POWER6 | 4700 | 4 | 53.2 | 60.9 | Best POWER6 result |

SPECint_rate2006 - 2 chip systems

| System | Cores/Chips | Processor Type | MHz | Base Copies | Base | Peak | Comments |
|---|---|---|---|---|---|---|---|
| Fujitsu CELSIUS R670 | 8/2 | Xeon W5580 | 3200 | 16 | 249 | 267 | Best Nehalem result |
| Sun Blade X6270 | 8/2 | Xeon X5570 | 2933 | 16 | 223 | 260 | |
| A+ Server 1021M-UR+B | 12/2 | Opteron 2439 SE | 2800 | 12 | 168 | 215 | Best Istanbul result |
| Sun SPARC T5240 | 16/2 | UltraSPARC T2 Plus | 1582 | 127 | 171 | 183 | New |
| Sun SPARC T5240 | 16/2 | UltraSPARC T2 Plus | 1415 | 127 | 142 | 157 | |
| IBM Power 520 | 4/2 | POWER6+ | 4700 | 8 | 101 | 124 | Best POWER6+ peak |
| IBM Power 520 | 4/2 | POWER6+ | 4700 | 8 | 102 | 122 | Best POWER6+ base |
| HP Integrity rx2660 | 4/2 | Itanium 9140M | 1666 | 4 | 58.1 | 62.8 | Best Itanium peak |
| HP Integrity BL860c | 4/2 | Itanium 9140M | 1666 | 4 | 61.0 | na | Best Itanium base |

SPECint_rate2006 - 4 chip systems

| System | Cores/Chips | Processor Type | MHz | Base Copies | Base | Peak | Comments |
|---|---|---|---|---|---|---|---|
| SGI Altix ICE 8200EX | 16/4 | Xeon X5570 | 2933 | 32 | 466 | 499 | Best Nehalem result. Note: clustered, not SMP |
| Tyan Thunder n4250QE | 24/4 | Opteron 8439 SE | 2800 | 24 | 326 | 417 | Best Istanbul result |
| Sun SPARC T5440 | 32/4 | UltraSPARC T2 Plus | 1596 | 255 | 338 | 360 | New. World record for 4-chip SMP SPECint_rate_base2006 |
| Sun SPARC T5440 | 32/4 | UltraSPARC T2 Plus | 1414 | 255 | 270 | 301 | |
| IBM Power 550 | 8/4 | POWER6+ | 5000 | 16 | 215 | 263 | Best POWER6+ result |
| HP Integrity BL870c | 8/4 | Itanium 9150N | 1600 | 8 | 114 | na | Best Itanium result |

SPECfp_rate2006 - 1 chip systems

| System | Cores/Chips | Processor Type | MHz | Base Copies | Base | Peak | Comments |
|---|---|---|---|---|---|---|---|
| Supermicro X8DAI | 4/1 | Xeon W3570 | 3200 | 8 | 102 | 106 | Best Nehalem result |
| HP ProLiant BL465c G6 | 6/1 | Opteron 2435 | 2600 | 6 | 65.2 | 72.2 | Best Istanbul result |
| Sun SPARC T5220 | 8/1 | UltraSPARC T2 | 1582 | 63 | 64.1 | 68.5 | New |
| Sun SPARC T5120 | 8/1 | UltraSPARC T2 | 1582 | 63 | 64.1 | 68.5 | New |
| Sun Blade T6320 | 8/1 | UltraSPARC T2 | 1582 | 63 | 64.1 | 68.5 | New |
| Sun Blade T6320 | 8/1 | UltraSPARC T2 | 1417 | 63 | 58.1 | 62.3 | |
| Sun SPARC T5120 | 8/1 | UltraSPARC T2 | 1417 | 63 | 57.9 | 62.3 | |
| Sun SPARC T5220 | 8/1 | UltraSPARC T2 | 1417 | 63 | 57.9 | 62.3 | |
| IBM System p 570 | 2/1 | POWER6 | 4700 | 4 | 51.5 | 58.0 | Best POWER6 result |

SPECfp_rate2006 - 2 chip systems

| System | Cores/Chips | Processor Type | MHz | Base Copies | Base | Peak | Comments |
|---|---|---|---|---|---|---|---|
| ASUS TS700-E6 | 8/2 | Xeon W5580 | 3200 | 16 | 201 | 207 | Best Nehalem result |
| A+ Server 1021M-UR+B | 12/2 | Opteron 2439 SE | 2800 | 12 | 133 | 147 | Best Istanbul result |
| Sun SPARC T5240 | 16/2 | UltraSPARC T2 Plus | 1582 | 127 | 124 | 133 | New |
| Sun SPARC T5240 | 16/2 | UltraSPARC T2 Plus | 1415 | 127 | 111 | 119 | |
| IBM Power 520 | 4/2 | POWER6+ | 4700 | 8 | 88.7 | 105 | Best POWER6+ result |
| HP Integrity rx2660 | 4/2 | Itanium 9140M | 1666 | 4 | 54.5 | 55.8 | Best Itanium result |

SPECfp_rate2006 - 4 chip systems

| System | Cores/Chips | Processor Type | MHz | Base Copies | Base | Peak | Comments |
|---|---|---|---|---|---|---|---|
| SGI Altix ICE 8200EX | 16/4 | Xeon X5570 | 2933 | 32 | 361 | 372 | Best Nehalem result |
| Tyan Thunder n4250QE | 24/4 | Opteron 8439 SE | 2800 | 24 | 259 | 285 | Best Istanbul result |
| Sun SPARC T5440 | 32/4 | UltraSPARC T2 Plus | 1596 | 255 | 254 | 270 | New |
| Sun SPARC T5440 | 32/4 | UltraSPARC T2 Plus | 1414 | 255 | 212 | 230 | |
| IBM Power 550 | 8/4 | POWER6+ | 5000 | 16 | 188 | 222 | Best POWER6+ result |
| HP Integrity rx7640 | 8/4 | Itanium 2 9040 | 1600 | 8 | 87.4 | 90.8 | Best Itanium result |

Results and Configuration Summary

Test Configurations:


Sun Blade T6320
1.6 GHz UltraSPARC T2
64 GB (16 x 4GB)
Solaris 10 10/08
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5120/T5220
1.6 GHz UltraSPARC T2
64 GB (16 x 4GB)
Solaris 10 10/08
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5240
2 x 1.6 GHz UltraSPARC T2 Plus
128 GB (32 x 4GB)
Solaris 10 5/09
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5440
4 x 1.6 GHz UltraSPARC T2 Plus
256 GB (64 x 4GB)
Solaris 10 5/09
Sun Studio 12 Update 1, gccfss V4.2.1

Results Summary:



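| Metric | T6320 | T5120 | T5220 | T5240 | T5440 |
|---|---|---|---|---|---|
| SPECint_rate_base2006 | 89.2 | 89.1 | 89.1 | 171 | 338 |
| SPECint_rate2006 | 96.7 | 97.0 | 97.0 | 183 | 360 |
| SPECfp_rate_base2006 | 64.1 | 64.1 | 64.1 | 124 | 254 |
| SPECfp_rate2006 | 68.5 | 68.5 | 68.5 | 133 | 270 |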

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark, with over 7000 results published in the three years since it was introduced. It measures:

  • "Speed" - single copy performance of chip, memory, compiler
  • "Rate" - multiple copy (throughput)

The rate metrics are used for the throughput-oriented systems described on this page. These metrics include:

  • SPECint_rate2006: throughput for 12 integer benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
  • SPECfp_rate2006: throughput for 17 floating point benchmarks derived from real applications, including chemistry, physics, genetics, and weather.

There are "base" variants of both the above metrics that require more conservative compilation, such as using the same flags for all benchmarks.
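To give a feel for how a rate metric is assembled, here is a small sketch based on SPEC's published definition as I understand it: each benchmark's rate ratio is the number of copies times the reference time divided by the elapsed time, and the overall metric is the geometric mean of those ratios. The timings below are invented for illustration and are not taken from any submission.

```python
import math


def rate_ratio(copies, reference_seconds, elapsed_seconds):
    """Throughput ratio for one benchmark run with several copies."""
    return copies * reference_seconds / elapsed_seconds


def overall_rate(ratios):
    """Geometric mean of the per-benchmark rate ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical (copies, reference time, elapsed time) for three
    # benchmarks; the real suites contain 12 (integer) or 17 (floating point)
    runs = [(63, 9770, 7000), (63, 9650, 6500), (63, 8050, 6100)]
    ratios = [rate_ratio(*run) for run in runs]
    print(round(overall_rate(ratios), 1))
```

A geometric mean keeps one unusually fast or slow benchmark from dominating the overall score, which is one reason the copy count matters so much for throughput-oriented systems.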

See www.spec.org for additional information.

Key Points and Best Practices

Results on this page for the Sun SPARC Enterprise T5120 server were measured on a Sun SPARC Enterprise T5220. The Sun SPARC Enterprise T5120 and Sun SPARC Enterprise T5220 are electronically equivalent. A Sun SPARC Enterprise T5120 can hold up to 4 disks, and a T5220 can hold up to 8. This system was tested with 4 disks; therefore, results on this page apply to both the T5120 and the T5220.

Know when you need throughput vs. speed. The Sun CMT systems described on this page provide massive throughput, as demonstrated by the fact that up to 255 jobs are run on the 4-chip system, 127 on 2-chip, and 63 on 1-chip. Some of the competitive chips do have a speed advantage - e.g. Nehalem and Istanbul - but none of the competitive results undertake to run the large number of jobs tested on Sun's CMT systems.
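One way to see the throughput-versus-speed trade-off is to divide each base score by the number of copies that produced it. The sketch below does that for a few rows of the 4-chip SPECint_rate2006 table; the figures are copied from this page, and the per-copy number is only a rough indicator invented for this example, not a SPEC metric.

```python
# (base copies, SPECint_rate_base2006) from the 4-chip table on this page
four_chip = {
    "Sun SPARC T5440 (1596 MHz)": (255, 338),
    "Tyan Thunder n4250QE": (24, 326),
    "IBM Power 550": (16, 215),
}


def per_copy(system):
    """Base score divided by the number of concurrent copies."""
    copies, base = four_chip[system]
    return base / copies


if __name__ == "__main__":
    for name, (copies, base) in four_chip.items():
        print(f"{name}: {copies} copies, base {base}, "
              f"{per_copy(name):.2f} per copy")
```

The CMT system reaches its total by running many more jobs at once, each progressing more slowly, while the other systems run far fewer jobs that each finish faster.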

Use the latest compiler. The Sun Studio group is always working to improve the compiler. Sun Studio 12, and Sun Studio 12 Update 1, which are used in these submissions, provide updated code generation for a wide variety of SPARC and x86 implementations.

I/O still counts. Even in a CPU-intensive workload, some I/O remains. This point is explored in some detail at http://blogs.sun.com/jhenning/entry/losing_my_fear_of_zfs.

Disclosure Statement

SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Competitive results from www.spec.org as of 16 July 2009.  Sun's new results quoted on this page have been submitted to SPEC.
Sun Blade T6320 89.2 SPECint_rate_base2006, 96.7 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006;
Sun SPARC Enterprise T5220/T5120 89.1 SPECint_rate_base2006, 97.0 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006;
Sun SPARC Enterprise T5240 171 SPECint_rate_base2006, 183 SPECint_rate2006, 124 SPECfp_rate_base2006, 133 SPECfp_rate2006;
Sun SPARC Enterprise T5440 338 SPECint_rate_base2006, 360 SPECint_rate2006, 254 SPECfp_rate_base2006, 270 SPECfp_rate2006;
Sun Blade T6320 76.4 SPECint_rate_base2006, 85.5 SPECint_rate2006, 58.1 SPECfp_rate_base2006, 62.3 SPECfp_rate2006;
Sun SPARC Enterprise T5220/T5120 76.2 SPECint_rate_base2006, 83.9 SPECint_rate2006, 57.9 SPECfp_rate_base2006, 62.3 SPECfp_rate2006;
Sun SPARC Enterprise T5240 142 SPECint_rate_base2006, 157 SPECint_rate2006, 111 SPECfp_rate_base2006, 119 SPECfp_rate2006;
Sun SPARC Enterprise T5440 270 SPECint_rate_base2006, 301 SPECint_rate2006, 212 SPECfp_rate_base2006, 230 SPECfp_rate2006;
IBM p 570 53.2 SPECint_rate_base2006, 60.9 SPECint_rate2006, 51.5 SPECfp_rate_base2006, 58.0 SPECfp_rate2006;
IBM Power 520 102 SPECint_rate_base2006, 124 SPECint_rate2006, 88.7 SPECfp_rate_base2006, 105 SPECfp_rate2006;
IBM Power 550 215 SPECint_rate_base2006, 263 SPECint_rate2006, 188 SPECfp_rate_base2006, 222 SPECfp_rate2006;
HP Integrity BL870c 114 SPECint_rate_base2006;
HP Integrity rx7640 87.4 SPECfp_rate_base2006, 90.8 SPECfp_rate2006.

Tuesday Jul 21, 2009

Significance of Results

The Sun Blade T6320 server module equipped with one UltraSPARC T2 processor running at 1.6 GHz delivered a World Record single-chip result while running the SPECjbb2005 benchmark.

  • The Sun Blade T6320 server module powered by one 1.6 GHz UltraSPARC T2 processor delivered a result of 229576 SPECjbb2005 bops, 28697 SPECjbb2005 bops/JVM when running the SPECjbb2005 benchmark.
  • The Sun Blade T6320 server module (with one 1.6 GHz UltraSPARC T2 processor) demonstrated 2.6X better performance than the IBM System p 570 with one 4.7 GHz POWER6 processor.
  • The Sun Blade T6320 server module (with one 1.6 GHz UltraSPARC T2 processor) demonstrated 3% better performance than the Fujitsu TX100 result which used one 3.16 GHz Intel Xeon X3380 processor.
  • The Sun Blade T6320 server module (with one 1.6 GHz UltraSPARC T2 processor) demonstrated 7% better performance than the IBM x3200 result which used one 3.16 GHz Xeon X3380 processor.
  • The Sun Blade T6320 server module running the 1.6 GHz UltraSPARC T2 processor delivered 20% better performance than a Sun SPARC Enterprise T5120 with the 1.4 GHz UltraSPARC T2 processor.
  • The Sun Blade T6320 used the OpenSolaris 2009.06 operating system and the Java HotSpot(TM) 32-Bit Server, Version 1.6.0_14 Performance Release JVM to obtain this leading result.

Performance Landscape

SPECjbb2005 Performance Chart (ordered by performance)

bops: SPECjbb2005 Business Operations per Second (bigger is better)

| System | Chips | Cores | Threads | GHz | Type | SPECjbb2005 bops | SPECjbb2005 bops/JVM |
|---|---|---|---|---|---|---|---|
| Sun Blade T6320 | 1 | 8 | 64 | 1.6 | UltraSPARC T2 | 229576 | 28697 |
| Fujitsu TX100 | 1 | 4 | 4 | 3.16 | Intel Xeon | 223691 | 111846 |
| IBM x3200 M2 | 1 | 4 | 4 | 3.16 | Intel Xeon | 214578 | 107289 |
| Fujitsu RX100 | 1 | 4 | 4 | 3.16 | Intel Xeon | 211144 | 105572 |
| IBM x3350 | 1 | 4 | 4 | 3.0 | Intel Xeon | 194256 | 97128 |
| Sun SE T5120 | 1 | 8 | 64 | 1.4 | UltraSPARC T2 | 192055 | 24007 |
| IBM p 570 | 1 | 2 | 4 | 4.7 | POWER6 | 88089 | 88089 |

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.

Results and Configuration Summary

Hardware Configuration:

    Sun Blade T6320
      1 x 1.6 GHz UltraSPARC T2 processor
      64 GB

Software Configuration:

    OpenSolaris 2009.06
    Java HotSpot(TM) 32-Bit Server, Version 1.6.0_14 Performance Release

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).
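Because both metrics are reported, the number of JVM instances behind each result can be recovered by dividing one by the other. The following sketch does this for three entries of the performance chart; the figures are from this page, and the function name is an assumption made for the example.

```python
# (SPECjbb2005 bops, SPECjbb2005 bops/JVM) from the performance chart
chart = {
    "Sun Blade T6320": (229576, 28697),
    "Fujitsu TX100": (223691, 111846),
    "IBM p 570": (88089, 88089),
}


def jvm_instances(system):
    """Number of JVM instances implied by the two reported metrics."""
    bops, bops_per_jvm = chart[system]
    # bops/JVM is itself rounded in the reports, so round the quotient
    return round(bops / bops_per_jvm)


if __name__ == "__main__":
    for name in chart:
        print(name, jvm_instances(name))
```

A low bops/JVM figure therefore does not indicate a slow system by itself; it reflects how many JVM instances shared the total.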

Key Points and Best Practices

  • Enhancements to the JVM had a major impact on performance.
  • Each JVM executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Each JVM was bound to a separate processor containing 1 core, to reduce memory access latency by using the physical memory closest to the processor.
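The post does not say which commands were used for the scheduling class and binding steps. As a hedged sketch, one plausible approach on Solaris is to launch each JVM through `psrset -e` (run a command in an existing processor set) and `priocntl -e -c FX` (run a command in the FX class). The Python helper below only builds such a command line; the processor set IDs, the helper name and the JVM arguments are all hypothetical.

```python
def launch_command(jvm_index, jvm_args, first_set_id=1):
    """Build an argv list that would start one JVM in the FX class
    inside its own processor set.

    Assumption: processor sets with IDs first_set_id, first_set_id + 1, ...
    were created beforehand (for example with psrset -c), one per core.
    Real set IDs are assigned by the system and may differ.
    """
    set_id = first_set_id + jvm_index
    return (
        ["psrset", "-e", str(set_id)]        # run in that processor set
        + ["priocntl", "-e", "-c", "FX"]     # run in the FX scheduling class
        + ["java"] + list(jvm_args)          # hypothetical JVM invocation
    )


if __name__ == "__main__":
    # Print the command lines for 8 JVMs without executing anything
    for index in range(8):
        print(" ".join(launch_command(index, ["-version"])))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later handed to something like `subprocess.run`.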

Disclosure Statement

SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 7/17/2009 on http://www.spec.org. SPECjbb2005, Sun Blade T6320 229576 SPECjbb2005 bops, 28697 SPECjbb2005 bops/JVM; IBM p 570 88089 SPECjbb2005 bops, 88089 SPECjbb2005 bops/JVM; Fujitsu TX100 223691 SPECjbb2005 bops, 111846 SPECjbb2005 bops/JVM; IBM x3350 194256 SPECjbb2005 bops, 97128 SPECjbb2005 bops/JVM; Sun SPARC Enterprise T5120 192055 SPECjbb2005 bops, 24007 SPECjbb2005 bops/JVM.

This blog copyright 2009 by John Henning