Tuesday Oct 13, 2009

The Sun Storage F5100 Flash Array can substantially improve performance over internal hard disk drives as shown by the I/O intensive ABAQUS MCAE application Standard benchmark tests on a Sun Fire X4270 server.

The I/O intensive ABAQUS "Standard" benchmark test cases were run on a single Sun Fire X4270 server. Data is presented for runs at both 8 and 16 thread counts.

The ABAQUS "Standard" module is an MCAE application based on the finite element method (FEM) of analysis. This computer-based numerical method inherently involves a substantial I/O component. The purpose of these tests was to evaluate the performance of the Sun Storage F5100 Flash Array relative to high performance 15K RPM internal striped HDDs.

  • The Sun Storage F5100 Flash Array outperformed the high performance 15K RPM SAS drives on the "S4b" test case by 14%.

  • The Sun Fire X4270 server coupled with a Sun Storage F5100 Flash Array established the world record performance on a single node for the four test cases S2A, S4B, S4D and S6.

Performance Landscape

ABAQUS "Standard" Benchmark Test S4B: Advantage of Sun Storage F5100

Results are total elapsed run times in seconds

Threads   4x15K RPM                 Sun F5100           Sun F5100
          72 GB SAS HDD             r/w buff 4096       Performance
          striped HW RAID0          striped             Advantage
8         1504                      1318                14%
16        1811                      1649                10%
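
The "Performance Advantage" column can be reproduced from the elapsed times. A minimal sketch, assuming the advantage is the HDD time divided by the F5100 time, minus one; the function name is illustrative and not part of any benchmark tooling:

```python
def advantage_pct(hdd_seconds: float, flash_seconds: float) -> float:
    """Percent advantage of the flash run, computed as (hdd / flash - 1) * 100."""
    return (hdd_seconds / flash_seconds - 1.0) * 100.0


# Elapsed times in seconds for test case S4B: threads -> (HDD, F5100).
s4b_times = {8: (1504, 1318), 16: (1811, 1649)}

for threads, (hdd, flash) in s4b_times.items():
    print(f"{threads:2d} threads: {advantage_pct(hdd, flash):.0f}% advantage")
```

Under this definition the 8-thread and 16-thread runs come out at roughly 14% and 10%, matching the table.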

ABAQUS Standard Server Benchmark Subset: Single Node Record Performance

Results are total elapsed run times in seconds

Platform Cores S2a S4b S4d S6
X4270 w/F5100 8 302 1192 779 1237
HP BL460c G6 8 324 1309 843 1322
X4270 w/F5100 4 552 1970 1181 1706
HP BL460c G6 4 561 2062 1234 1812

Results and Configuration Summary

Hardware Configuration:
    Sun Fire X4270
      2 x 2.93 GHz QC Intel Xeon X5570 processors
      Hyperthreading enabled
      24 GB memory
      4 x 72 GB 15K RPM striped (HW RAID0) SAS disks
    Sun Storage F5100 Flash Array
      20 x 24 GB flash modules
      Intel controller

Software Configuration:

    O/S: 64-bit SUSE Linux Enterprise Server 10 SP 2
    Application: ABAQUS V6.9-1 Standard Module
    Benchmark: ABAQUS Standard Benchmark Test Suite

Benchmark Description

Abaqus/Standard Benchmark Problems

These problems provide an estimate of the performance that can be expected when running Abaqus/Standard or similar commercially available MCAE (FEA) codes like ANSYS and MSC/Nastran on different computers. The jobs are representative of those typically analyzed by Abaqus/Standard and other MCAE applications. These analyses include linear statics, nonlinear statics, and natural frequency extraction.

Please go here for a more complete description of the tests.

Key Points and Best Practices

  • The memory requirements for the test cases in the ABAQUS Standard benchmark test suite are substantial, with some test cases requiring slightly over 20 GB of memory. There are two memory limits: a minimum limit, beyond which out-of-core "memory" is used at the cost of additional CPU time, and a maximum limit that minimizes I/O operations. Both limits are reported in the ABAQUS output and can be established with a preliminary diagnostic-mode run before making a full execution.
  • Based on the maximum physical memory on a platform the user can stipulate the maximum portion of this memory that can be allocated to the ABAQUS job. This is done in the "abaqus_v6.env" file that either resides in the subdirectory from where the job was launched or in the abaqus "site" subdirectory under the home installation directory.
  • When running on multiple cores of a single node, it is sometimes preferable from a performance standpoint to run in "smp" shared memory mode. This is specified using the "THREADS" option on the "mpi_mode" line in the abaqus_v6.env file, as opposed to the "MPI" option on this line. The test case considered here illustrates this point.
  • The test cases for the ABAQUS Standard module all have a substantial I/O component where 15% to 25% of the total run times are associated with I/O activity (primarily scratch files). Performance will be enhanced by using the fastest available drives and striping together more than one of them, or by using a high performance disk storage system with high performance interconnects. On Linux, excess memory can be used to cache and accelerate I/O.
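
If 15% to 25% of the run time is I/O, faster storage can only remove that share. A rough Amdahl's-law style bound, assuming the I/O share does not overlap with computation and that nothing else changes (the function name is illustrative):

```python
def max_speedup(io_fraction: float) -> float:
    """Upper bound on speedup if the I/O share of run time dropped to zero."""
    if not 0.0 <= io_fraction < 1.0:
        raise ValueError("io_fraction must be in [0, 1)")
    return 1.0 / (1.0 - io_fraction)


for share in (0.15, 0.25):
    bound = max_speedup(share)
    print(f"I/O share {share:.0%}: at most {bound:.2f}x ({bound - 1:.0%} faster)")
```

With these shares the ceiling is roughly 18% to 33%, so the 10% to 14% advantage measured on S4B sits inside what the stated I/O fraction allows.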

Disclosure Statement

The following are trademarks or registered trademarks of Abaqus, Inc. or its subsidiaries in the United States and/or other countries: Abaqus, Abaqus/Standard, Abaqus/Explicit. All information on the ABAQUS website is Copyrighted 2004-2009 by Dassault Systemes. Results from http://www.simulia.com/support/v69/v69_performance.php as of October 12, 2009.

Monday Oct 12, 2009

Significance of Results

The Sun Storage F5100 Flash Array can greatly improve performance over internal hard disk drives as shown by the I/O intensive ANSYS MCAE application BMD benchmark tests on a Sun Fire X4270 server.

Select ANSYS 12 BMD benchmarks were run on a single Sun Fire X4270 server. These I/O intensive test cases were run to compare the performance of conventional high performance disk to Sun FlashFire technology.

The ANSYS 12.0 module is an MCAE application based on the finite element method (FEM) of analysis. This computer-based numerical method inherently involves a substantial I/O component. The purpose of these tests was to evaluate the performance of the Sun Storage F5100 Flash Array relative to high performance 15K RPM internal striped HDDs.

  • The Sun Storage F5100 Flash Array outperformed the high performance 15K RPM SAS drives on the "BMD-4" test case by 67% in the 8-core/8-thread server configuration.

  • The Sun Storage F5100 Flash Array outperformed the high performance 15K RPM SAS drives on the "BMD-7" test case by 18% in the 8-core/16-thread server configuration.

Performance Landscape

ANSYS 12 "BMD" Test Suite on Single X4270 (24GB mem.) - SMP Mode

Results are total elapsed run times in seconds

Test Case   SMP   4x15K RPM                 Sun F5100           Sun F5100
                  72 GB SAS HDD             r/w buff 4096       Performance
                  striped HW RAID0          striped             Advantage
bmd-4       8     523                       314                 67%
bmd-7       16    357                       303                 18%
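
Both percentages follow directly from the elapsed times. The sketch below, with illustrative function names, also reports the share of the HDD run time that disappears on flash. That share can be read as a lower bound on how much of the HDD run was spent waiting on I/O, assuming storage was the only thing that changed:

```python
def time_saved_fraction(hdd_seconds: float, flash_seconds: float) -> float:
    """Fraction of the HDD elapsed time that is no longer needed on flash."""
    return (hdd_seconds - flash_seconds) / hdd_seconds


# Elapsed times in seconds: test case -> (HDD, F5100).
runs = {"bmd-4": (523, 314), "bmd-7": (357, 303)}

for name, (hdd, flash) in runs.items():
    speedup = hdd / flash
    saved = time_saved_fraction(hdd, flash)
    print(f"{name}: {speedup:.2f}x, {saved:.0%} of HDD run time saved")
```

For bmd-4 this gives about 1.67x with roughly 40% of the HDD run time saved; for bmd-7 about 1.18x with roughly 15% saved.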

Results and Configuration Summary

Hardware Configuration:
    Sun Fire X4270
      2 x 2.93 GHz QC Intel Xeon X5570 processors
      Hyperthreading enabled
      24 GB memory
      4 x 72 GB 15K RPM striped (HW RAID0) SAS disks
    Sun Storage F5100 Flash Array
      20 x 24 GB flash modules
      Intel controller

Software Configuration:

    O/S: 64-bit SUSE Linux Enterprise Server 10 SP 2
    Application: ANSYS Multiphysics 12.0
    Benchmark: ANSYS 12 "BMD" Benchmark Test Suite

Benchmark Description

ANSYS is a general purpose engineering analysis MCAE application that is based on the Finite Element Method. It performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic and can be linear or nonlinear as far as material behavior or deformations are concerned. Ansys provides a number of benchmark tests which exercise the capabilities of the software.

Please go here for a more complete description of the tests.

Key Points and Best Practices

Performance Considerations

The performance of ANSYS, an I/O-intensive MCAE application, can be increased by reducing its I/O demands through additional server memory, or by using SSDs to increase bandwidth and reduce latency. The most I/O-intensive case in the ANSYS distributed "BMD" test suite is BMD-4, particularly at the (maximum) 8-core level for a single node.


  • Ansys now takes full advantage of inexpensive RAID0 disk arrays and delivers sustained I/O rates.

  • Large memory can cache file accesses but often the size of ANSYS files grows much larger than the available physical memory so that system file caching is not able to hide the I/O cost.
  • For fast ANSYS runs the recommended configuration is a RAID 0 setup using 4 or more disks and a fast RAID controller. These fast I/O configurations are inexpensive to put together for systems and can achieve I/O rates in excess of 200 MB/sec.
  • SSD drives have much lower seek times, use less power, and tend to be about 2X faster than the fastest rotating disks for sustained throughput. The observed speed of a RAID 0 configuration of SSD drives for ANSYS simulations has been nearly as fast as I/O that is cached by large memory systems. SSD drives then may be the most affordable way to extend the capacity of a system to jobs that are too large to run in-core without incurring the performance penalty usually associated with I/O demands.
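
A quick way to see why sustained I/O rate matters is to estimate how long a given amount of scratch traffic takes at each rate. The sketch below uses the "in excess of 200 MB/sec" RAID 0 figure from the list above and takes "about 2X faster" at face value for SSDs. The 100 GB scratch volume is a hypothetical number chosen for illustration, not a measured ANSYS value.

```python
def transfer_seconds(gigabytes: float, mb_per_sec: float) -> float:
    """Seconds needed to move `gigabytes` (decimal GB) at a sustained MB/sec rate."""
    return gigabytes * 1000.0 / mb_per_sec


SCRATCH_GB = 100.0               # hypothetical scratch traffic, for illustration only
RAID0_HDD_MB_S = 200.0           # RAID 0 rate quoted in the text
SSD_MB_S = 2.0 * RAID0_HDD_MB_S  # assumption: "about 2X faster" applied to that rate

for label, rate in (("RAID 0 HDD", RAID0_HDD_MB_S), ("RAID 0 SSD", SSD_MB_S)):
    print(f"{label}: {transfer_seconds(SCRATCH_GB, rate):.0f} s for {SCRATCH_GB:.0f} GB")
```

Real jobs mix random and sequential access and benefit from file caching, so this is only an order-of-magnitude guide.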

More About The ANSYS BMD "Distributed" Benchmarks

In the most recent release of the ANSYS benchmarks there are now two test suites: The SMP "BM" suite designed to run on a single node with multi processors and the DMP "BMD" suite intended to run on multi node clusters but which can also run on a single node in SMP mode as in this study.

  • The test cases from both ANSYS test suites all have a substantial I/O component where 15% to 20% of the total run times are associated with I/O activity (primarily scratch files). Performance will be enhanced by using the fastest available drives and striping together more than one of them, or by using a high performance disk storage system with high performance interconnects. When running with the SX64 build, employing a ZFS file system might be a good idea.
  • The ANSYS test cases don't scale very well (BMD better than BM); at best they scale up to 8 cores.
  • The memory requirements for the test cases in the ANSYS BMD suite are greater than for the standard benchmark test suite. The requirements for the standard suite are modest, at less than 3 GB.

See Also

MCAE, SSD, HPC, ANSYS, Linux, SuSE, Performance, X64, Intel

Disclosure Statement

The following are trademarks or registered trademarks of ANSYS, Inc., ANSYS Multiphysics TM. All information on the ANSYS website is Copyrighted by ANSYS, Inc. Results from http://www.ansys.com/services/ss-intel-bench120.htm as of October 12, 2009.

Thursday Jul 23, 2009

This week, Sun continues to highlight the record-breaking performance of its latest update to the chip multi-threaded (CMT) Sun SPARC Enterprise server family running Solaris. Some of these benchmarks leverage a variety of Sun's unique technologies including ZFS, SSD, various Storage Products and many more. These benchmarks were blogged about by various members of our team and the URLs are shown below.

Messages

  • Sun's CMT is the most powerful CPU regardless of architectural/implementation details (#transistors, #cores, threads, MHz, etc.)!
  • Performance tests show that Sun can outperform IBM Power6 by more than 2x on a variety of benchmarks.
  • Performance tests show Sun's new 1.6GHz CMT systems can be 20% faster than Sun's previous generation 1.4GHz processors, given Sun's continual advancements in both hardware and software.

Benchmark Results Recently Blogged

Sun T5440 Oracle BI EE World Record Performance
http://blogs.sun.com/BestPerf/entry/sun_t5440_oracle_bi_ee

Sun T5440 World Record SAP-SD 4-Processor Two-tier SAP ERP 6.0 EP 4 (Unicode), Beats IBM POWER6 (note1)
http://blogs.sun.com/BestPerf/entry/sun_t5440_world_record_sap

Zeus ZXTM Traffic Manager World Record on Sun T5240
http://blogs.sun.com/BestPerf/entry/top_performance_on_sun_sparc

Sun T5440 SPECjbb2005, Sun 1.6GHz T2 Plus chip is 2.3x IBM 4.7GHz POWER6 chip
http://blogs.sun.com/BestPerf/entry/sun_t5440_specjbb2005_beats_ibm

New SPECjAppServer2004 Performance on the Sun SPARC Enterprise T5440
http://blogs.sun.com/BestPerf/entry/new_specjappserver2004_performance_on_sun

1.6 GHz SPEC CPU2006: World Record 4-chip system, Rate Benchmarks, Beats IBM POWER6
http://blogs.sun.com/BestPerf/entry/1_6_ghz_spec_cpu2006

Sun Blade T6320 World Record 1-chip SPECjbb2005 performance, Sun 1.6GHz T2 Plus chip is 2.6x IBM 4.7GHz POWER6 chip
http://blogs.sun.com/BestPerf/entry/new_specjbb2005_performance_on_the

Comparison Table

Benchmark: Oracle BI EE
    Sun CMT: Sun T5440
    Tier: Appl, Database
    Software: Oracle 11g, Oracle BIEE, ZFS, Solaris
    Key Messages:
      • World Record: T5440
      • Achieved 28,000 users
      • Reference

Benchmark: SAP-SD 2-Tier
    Sun CMT: Sun T5440
    Tier: Appl, Database
    Software: SAP ECC 6.0 EP4, Oracle 10g, Solaris
    Key Messages:
      • World Record 4-socket: T5440
      • T5440 Beats 4-socket IBM 550 5GHz Power6 by 26% (note1)
      • T5440 Beats HP DL585 G6 4-socket Opteron (note1)
      • Unicode version

Benchmark: SPECjAppServer2004
    Sun CMT: Sun T5440
    Tier: Appl, Database
    Software: Oracle WebLogic, Oracle 11g, JDK 1.6.0_14, Solaris
    Key Messages:
      • World Record Single System (Appl Tier): T5440
      • T5440 is 6.4x faster than IBM Power 570 4.7GHz Power6
      • T5440 is 73% faster than HP DL 580 G5 Xeon 6C
      • Oracle Fusion Middleware

Benchmark: Sun T5440 SPECjbb2005
    Sun CMT: Sun T5440
    Tier: Appl
    Software: Java HotSpot, OpenSolaris
    Key Messages:
      • 1.6GHz US T2 Plus CPU is 2.3x faster than IBM 4.7GHz Power6 CPU
      • 1.6GHz US T2 Plus CPU is 21% faster than previous generation 1.4GHz US T2 Plus CPU
      • Sun T5440 has 2.3x better power/perf than the IBM 570 (8 4.7GHz Power6)

Benchmark: Sun Blade T6320 SPECjbb2005
    Sun CMT: Sun T6320
    Tier: Appl
    Software: Java HotSpot, OpenSolaris
    Key Messages:
      • World Record 1-socket: T6320
      • 1.6GHz US T2 Plus CPU is 2.6x faster than IBM 4.7GHz Power6 CPU
      • T6320 is 3% faster than Fujitsu 3.16GHz Xeon QC

Benchmark: SPEC CPU2006
    Sun CMT: Sun T5440, Sun T5240, Sun T5220, Sun T5120, Sun T6320
    Tier: all tiers
    Software: Sun Studio12, Solaris, ZFS
    Key Messages:
      • World Record 4-socket: T5440
      • 1.6GHz US T2 Plus CPU is 2.6x faster than IBM 4.7GHz Power6 CPU
      • T6320 is 3% faster than Fujitsu 3.16GHz Xeon QC

Benchmark: Zeus ZXTM Traffic Manager
    Sun CMT: Sun T5240
    Tier: Web
    Software: Zeus ZXTM v5.1r1, Solaris
    Key Messages:
      • World Record: T5240
      • T5240 Beats f5 BIG-IP VIPRION by 34%; 2.6x better $/perf
      • T5240 Beats f5 BIG-IP 8800 by 91%; 2.7x better $/perf
      • T5240 Beats Citrix 12000 by 2.2x; 3.3x better $/perf
      • No IBM result

Virtualization

Sun's announcement also included updated virtualization software (LDOMs 1.1). Downloads are available to existing SPARC Enterprise server customers at: http://www.sun.com/servers/coolthreads/ldoms/index.jsp. Also see the blog posting "LDoms for Dummies" at http://blogs.sun.com/PierreReynes/entry/ldoms_for_dummies

Try & Buy Program

Sun is also offering free 60-day trials on Sun CMT servers with a very popular Try and Buy program: http://www.sun.com/tryandbuy.

Benchmark Performance Disclosure Statements (the URLs listed above go into more detail on each of these benchmarks)

Note1: 4-processor world record on the 2-tier SAP SD Standard Application Benchmark with 4720 SD User, as of July 23, 2009, IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. T5440 beats HP new 4-socket Opteron Servers (HPDL585 G6 with 4665 SD User and HP BL685c G6 with 4422 SD User)

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 07/21/09: Sun SPARC Enterprise T5440 Server (4 processors, 32 cores, 256 threads) 4,720 SAP SD Users, 4x 1.6 GHz UltraSPARC T2 Plus, 256 GB memory, Oracle10g, Solaris10, Cert# 2009026. HP ProLiant DL585 G6 (4 processors, 24 cores, 24 threads) 4,665 SAP SD Users, 4x 2.8 GHz AMD Opteron Processor 8439 SE, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009025. HP ProLiant BL685c G6 (4 processors, 24 cores, 24 threads) 4,422 SAP SD Users, 4x 2.6 GHz AMD Opteron Processor 8435, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009021. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL585 G5 (4 processors, 16 cores, 16 threads) 3,430 SAP SD Users, 4x 3.1 GHz AMD Opteron Processor 8393 SE, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009008. HP ProLiant BL685 G6 (4 processors, 16 cores, 16 threads) 3,118 SAP SD Users, 4x 2.9 GHz AMD Opteron Processor 8389, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009007. NEC Express5800 (4 processors, 24 cores, 24 threads) 2,957 SAP SD Users, 4x 2.66 GHz Intel Xeon Processor X7460, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009018. Dell PowerEdge M905 (4 processors, 16 cores, 16 threads) 2,129 SAP SD Users, 4x 2.7 GHz AMD Opteron Processor 8384, 96 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2009017. Sun Fire X4600M2 (8 processors, 32 cores, 32 threads) 7,825 SAP SD Users, 8x 2.7 GHz AMD Opteron 8384, 128 GB memory, MaxDB 7.6, Solaris 10, Cert# 2008070. IBM System x3650 M2 (2 Processors, 8 Cores, 16 Threads) 5,100 SAP SD users,2x 2.93 Ghz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. 
HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071. SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark.

Oracle Business Intelligence Enterprise Edition benchmark, see http://www.oracle.com/solutions/business_intelligence/resource-library-whitepapers.html for more. Results as of 7/20/09.

Zeus is TM of Zeus Technology Limited. Results as of 7/21/2009 on http://www.zeus.com/news/press_articles/zeus-price-performance-press-release.html?gclid=CLn4jLuuk5cCFQsQagod7gTkJA.

SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Competitive results from www.spec.org as of 16 July 2009. Sun's new results quoted on this page have been submitted to SPEC. Sun Blade T6320 89.2 SPECint_rate_base2006, 96.7 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006; Sun SPARC Enterprise T5220/T5120 89.1 SPECint_rate_base2006, 97.0 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006; Sun SPARC Enterprise T5240 172 SPECint_rate_base2006, 183 SPECint_rate2006, 124 SPECfp_rate_base2006, 133 SPECfp_rate2006; Sun SPARC Enterprise T5440 338 SPECint_rate_base2006, 360 SPECint_rate2006, 254 SPECfp_rate_base2006, 270 SPECfp_rate2006; Sun Blade T6320 76.4 SPECint_rate_base2006, 85.5 SPECint_rate2006, 58.1 SPECfp_rate_base2006, 62.3 SPECfp_rate2006; Sun SPARC Enterprise T5220/T5120 76.2 SPECint_rate_base2006, 83.9 SPECint_rate2006, 57.9 SPECfp_rate_base2006, 62.3 SPECfp_rate2006; Sun SPARC Enterprise T5240 142 SPECint_rate_base2006, 157 SPECint_rate2006, 111 SPECfp_rate_base2006, 119 SPECfp_rate2006; Sun SPARC Enterprise T5440 270 SPECint_rate_base2006, 301 SPECint_rate2006, 212 SPECfp_rate_base2006, 230 SPECfp_rate2006; IBM p 570 53.2 SPECint_rate_base2006, 60.9 SPECint_rate2006, 51.5 SPECfp_rate_base2006, 58.0 SPECfp_rate2006; IBM Power 520 102 SPECint_rate_base2006, 124 SPECint_rate2006, 88.7 SPECfp_rate_base2006, 105 SPECfp_rate2006; IBM Power 550 215 SPECint_rate_base2006, 263 SPECint_rate2006, 188 SPECfp_rate_base2006, 222 SPECfp_rate2006; HP Integrity BL870c 114 SPECint_rate_base2006; HP Integrity rx7640 87.4 SPECfp_rate_base2006, 90.8 SPECfp_rate2006.

SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 7/17/2009 on http://www.spec.org. SPECjbb2005, Sun Blade T6320 229576 SPECjbb2005 bops, 28697 SPECjbb2005 bops/JVM; IBM p 570 88089 SPECjbb2005 bops, 88089 SPECjbb2005 bops/JVM; Fujitsu TX100 223691 SPECjbb2005 bops, 111846 SPECjbb2005 bops/JVM; IBM x3350 194256 SPECjbb2005 bops, 97128 SPECjbb2005 bops/JVM; Sun SPARC Enterprise T5120 192055 SPECjbb2005 bops, 24007 SPECjbb2005 bops/JVM.

SPECjAppServer2004, Sun SPARC Enterprise T5440 (4 chips, 32 cores) 7661.16 SPECjAppServer2004 JOPS@Standard; HP DL580 G5 (4 chips, 24 cores) 4410.07 SPECjAppServer2004 JOPS@Standard; HP DL580 G5 (4 chips, 16 cores) 3339.94 SPECjAppServer2004 JOPS@Standard; Two Dell PowerEdge 2950 (4 chips, 16 cores) 4794.33 SPECjAppServer2004 JOPS@Standard; Dell PowerEdge R610 (2 chips, 8 cores) 3975.13 SPECjAppServer2004 JOPS@Standard; Two Dell PowerEdge R610 (4 chips, 16 cores) 7311.50 SPECjAppServer2004 JOPS@Standard; IBM Power 570 (2 chips, 4 cores) 1197.51 SPECjAppServer2004 JOPS@Standard; SPEC, SPECjAppServer reg tm of Standard Performance Evaluation Corporation. Results from http://www.spec.org as of 7/20/09.

SPECjbb2005 Sun SPARC Enterprise T5440 (4 chips, 32 cores) 841380 SPECjbb2005 bops, 26293 SPECjbb2005 bops/JVM. Results submitted to SPEC. HP DL585 G5 (4 chips, 24 cores) 937207 SPECjbb2005 bops, 234302 SPECjbb2005 bops/JVM. IBM Power 570 (8 chips, 16 cores) 798752 SPECjbb2005 bops, 99844 SPECjbb2005 bops/JVM. Sun SPARC Enterprise T5440 (4 chips, 32 cores) 692736 SPECjbb2005 bops, 21648 SPECjbb2005 bops/JVM. SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 7/20/09.

IBM p 570 8P 4.7GHz (4 building blocks) power specifications calculated as 80% of maximum input power reported 7/8/09 in “Facts and Features Report”: ftp://ftp.software.ibm.com/common/ssi/pm/br/n/psb01628usen/PSB01628USEN.PDF

Thursday Jun 25, 2009

Significance of Results

The Sun SSD (32 GB SATA 2.5" SSD) is the world's first enterprise-quality, open-standard Flash design. Built to an industry-standard JEDEC form factor, the module is being made available to developers and the OpenSolaris Storage community to foster Flash innovation. The Sun SSD delivers unprecedented IO performance, saves on power, space, and cooling, and will enable new levels of server optimization and datacenter efficiencies.

  • The Sun SSD demonstrated performance of 98K 4K random read IOPS on a Sun Fire X4450 server running the Solaris operating system.

Performance Landscape

Solaris 10 Results

                            SSD Result
Test                        X4450         X4140/X4440   T5240
Random Read (4K)            98.4K IOPS    33.8K IOPS    71.5K IOPS
Random Write (4K)           31.8K IOPS    16.6K IOPS    14.4K IOPS
50-50 Read/Write (4K)       14.9K IOPS    13.3K IOPS    15.7K IOPS
Sequential Read (MB/sec)    764 MB/sec    463 MB/sec    1012 MB/sec
Sequential Write (MB/sec)   376 MB/sec    470 MB/sec    531 MB/sec
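
To compare the random-read IOPS figures with the sequential MB/sec figures, the IOPS can be converted to throughput. A minimal sketch, assuming "4K" means 4096-byte transfers and MB means 10^6 bytes; the function name is illustrative:

```python
def iops_to_mb_per_sec(iops: float, block_bytes: int = 4096) -> float:
    """Convert an IOPS figure to decimal MB/sec for a fixed transfer size."""
    return iops * block_bytes / 1_000_000


# Solaris 10 random read (4K) results from the table above.
random_read_iops = {"X4450": 98_400, "X4140/X4440": 33_800, "T5240": 71_500}

for server, iops in random_read_iops.items():
    print(f"{server}: {iops_to_mb_per_sec(iops):.1f} MB/sec at 4K random read")
```

On the X4450, for example, 98.4K IOPS corresponds to roughly 403 MB/sec, about half of the 764 MB/sec measured for sequential reads on the same server.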

Windows 2003 Server Results

                            SSD Result
Test                        X4140         X4440
Random Read (4K)            66.9K IOPS    63.1K IOPS
Random Write (4K)           17.6K IOPS    29.7K IOPS
50-50 Read/Write (4K)       14.9K IOPS    16.9K IOPS
Sequential Read (MB/sec)    832 MB/sec    844 MB/sec
Sequential Write (MB/sec)   121 MB/sec    408 MB/sec

Results and Configuration Summary

Storage:

    4 x Sun SSD
    32 GB SATA 2.5" SSD (24 GB usable)
    2.5in drive form factor

Servers:

    Sun SPARC Enterprise T5240 - 4 internal drive slots used (LSI driver)
    Sun Fire X4450 - 4 internal drive slots used (LSI driver)
    Sun Fire X4440 - 4 internal drive slots used (LSI driver)
    Sun Fire X4140 - 4 internal drive slots used (LSI driver)

Software:

    Windows 2003 Server
    OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
    Vdbench 5.0

Benchmark Description

Sun measured a wide variety of I/O performance metrics on the Sun SSD using Vdbench 5.0: 100% random read, 100% random write, 100% sequential read, 100% sequential write, and a 50-50 read/write mix. Together these demonstrate the maximum performance and throughput of the storage system.

Vdbench profile:

    wd=wm_80dr,sd=sd*,readpct=0,rhpct=0,seekpct=100
    wd=ws_80dr,sd=sd*,readpct=0,rhpct=0,seekpct=0
    wd=rm_80dr,sd=(sd1-sd80),readpct=100,rhpct=0,seekpct=100
    wd=rs_80dr,sd=(sd1-sd80),readpct=100,rhpct=0,seekpct=0
    wd=rwm_80dr,sd=sd*,readpct=50,rhpct=0,seekpct=100
    rd=default
    ###Random Read and writes tests varying transfer size
    rd=default,el=30m,in=6,forx=(4K),forth=(32),io=max,pause=20
    rd=run1_rm_80dr,wd=rm_80dr
    rd=run2_wm_80dr,wd=wm_80dr
    rd=run3_rwm_80dr,wd=rwm_80dr
    ###Sequential read and Write tests varying transfer size
    rd=default,el=30m,in=6,forx=(512k),forth=(32),io=max,pause=20
    rd=run4_rs_80dr,wd=rs_80dr
    rd=run5_ws_80dr,wd=ws_80dr
Vdbench is publicly available for download at: http://vdbench.org
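
The workload definitions (the wd= lines) in the profile above encode the five tests through two parameters: readpct is the percentage of reads, and seekpct is the percentage of random seeks, so 100 means fully random and 0 means sequential. The following sketch parses those lines and prints what each one describes. The helper functions are illustrative and are not part of Vdbench.

```python
def parse_definition(line: str) -> dict[str, str]:
    """Split one 'key=value,key=value' Vdbench line into a dictionary."""
    fields: dict[str, str] = {}
    for part in line.strip().split(","):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def describe(wd: dict[str, str]) -> str:
    """Summarize a workload definition from its readpct and seekpct values."""
    seekpct = int(wd["seekpct"])
    readpct = int(wd["readpct"])
    if seekpct == 100:
        access = "random"
    elif seekpct == 0:
        access = "sequential"
    else:
        access = f"{seekpct}% random"
    if readpct == 100:
        mix = "read"
    elif readpct == 0:
        mix = "write"
    else:
        mix = f"{readpct}-{100 - readpct} read/write"
    return f"{access} {mix}"


profile = """\
wd=wm_80dr,sd=sd*,readpct=0,rhpct=0,seekpct=100
wd=ws_80dr,sd=sd*,readpct=0,rhpct=0,seekpct=0
wd=rm_80dr,sd=(sd1-sd80),readpct=100,rhpct=0,seekpct=100
wd=rs_80dr,sd=(sd1-sd80),readpct=100,rhpct=0,seekpct=0
wd=rwm_80dr,sd=sd*,readpct=50,rhpct=0,seekpct=100
"""

for line in profile.splitlines():
    wd = parse_definition(line)
    print(f"{wd['wd']}: {describe(wd)}")
```

The five definitions map onto the five rows of the results tables: random write, sequential write, random read, sequential read, and the 50-50 random mix.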

Key Points and Best Practices

  • All measurements were done with the internal HBA and not the internal RAID.

Disclosure Statement

Sun SSD delivered 71.5K 4K read IOPS and 1012 MB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of June 17, 2009.

Friday Jun 19, 2009

The performance of High-Performance Computing (HPC) applications can be dramatically increased simply by using SSDs instead of traditional hard drives. To read about these findings, see the Sun BluePrint by Larry McIntosh and Michael Burke, "Solid State Drives in HPC: Reducing the I/O Bottleneck".

There was a BestPerf blog posting on the NASTRAN/SSD results at:
http://blogs.sun.com/BestPerf/entry/sun_fire_x2270_msc_nastran

Our BestPerf authors will blog about more of their recent benchmarks in the coming weeks.

Tuesday Jun 16, 2009

Significance of Results

The I/O intensive MSC/Nastran Vendor_2008 benchmark test suite was used to compare the performance on a Sun Fire X2270 server when using SSDs internally instead of HDDs.

The effect on performance from increasing memory to augment I/O caching was also examined. The Sun Fire X2270 server was equipped with Intel QC Xeon X5570 processors (Nehalem). The positive effect of adding memory to increase I/O caching is offset to some degree by the reduction in memory frequency with additional DIMMs in the bays of each memory channel on each cpu socket for these Nehalem processors.

  • SSDs can significantly improve NASTRAN performance especially on runs with larger core counts.
  • Additional memory in the server can also increase performance; however, in some systems additional memory lowers the memory frequency, which may offset the benefits of increased capacity.
  • If SSDs are not used, striped disks will often improve the performance of I/O-bound MCAE applications.
  • To obtain the highest performance, it is recommended that SSDs be used and that servers be configured with the largest memory possible without decreasing the memory frequency. One should always look at the workload characteristics and compare against this benchmark to correctly set expectations.

SSD vs. HDD Performance

The performance of two striped 30GB SSDs was compared to two striped 7200 rpm 500GB SATA drives on a Sun Fire X2270 server.

  • At the 8-core level (maximum cores for a single node) SSDs were 2.2x faster for the larger xxocmd2 and the smaller xlotdf1 cases.
  • For 1-core results SSDs are up to 3% faster.
  • On the smaller mdomdf1 test case there was no increase in performance on the 1-, 2-, and 4-cores configurations.

Performance Enhancement with I/O Memory Caching

Performance for Nastran can often be increased by additional memory to provide additional in-core space to cache I/O and thereby reduce the IO demands.

The main memory was doubled from 24GB to 48GB. At the 24GB level one 4GB DIMM was placed in the first bay of each of the 3 CPU memory channels on each of the two CPU sockets on the Sun Fire X2270 platform. This configuration allows a memory frequency of 1333MHz.

At the 48GB level a second 4GB DIMM was placed in the second bay of each of the 3 CPU memory channels on each socket. This reduces the memory frequency to 1066MHz.
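
The cost of the lower memory frequency can be estimated from theoretical peak bandwidth. A back-of-the-envelope sketch, assuming the quoted frequencies are DDR3 transfer rates in MT/s and that each of the 3 channels per socket is 64 bits (8 bytes) wide; these are peak figures, not measurements from this study:

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec: float,
                        channels: int = 3,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth per socket in decimal GB/sec."""
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


fast = peak_bandwidth_gb_s(1333)  # 24 GB configuration, one DIMM per channel
slow = peak_bandwidth_gb_s(1066)  # 48 GB configuration, two DIMMs per channel

print(f"1333MHz: {fast:.1f} GB/sec per socket")
print(f"1066MHz: {slow:.1f} GB/sec per socket")
print(f"Reduction: {1 - slow / fast:.0%}")
```

Under these assumptions, doubling the capacity costs about 20% of peak memory bandwidth, which is the trade-off the results in this post explore.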

Adding Memory With HDDs (SATA)

  • The additional server memory increased the performance when running with the slower SATA drives at the higher core levels (e.g. 4- & 8-cores on a single node)
  • The larger xxocmd2 case was 32% faster and the smaller xlotdf1 case was 42% faster at the maximum 8-core level on a single system (48GB/24GB elapsed-time ratios of 0.68 and 0.58 in the Performance Landscape table).
  • The special I/O intensive getrag case was 8% faster at the 1-core level.

Adding Memory With SSDs

  • At the maximum 8-core level (for a single node) the larger xxocmd2 case was 47% faster in overall run time.
  • The effects were much smaller at lower core counts and in the tests at the 1-core level most test cases ran from 5% to 14% slower with the slower CPU memory frequency dominating over the added in-core space available for I/O caching vs. direct transfer to SSD.
  • Only the special I/O intensive getrag case was an exception running 6% faster at the 1-core level.

Increasing performance with Two Striped (SATA) Drives

The performance of multiple striped drives was also compared to a single drive. The study compared two striped internal 7200 rpm 500GB SATA drives to a single internal SATA drive.

  • On a single node with 8 cores, the largest test xx0cmd2 was 40% faster, a smaller test case xl0tdf1 was 33% faster and even the smallest test case mdomdf1 was 12% faster.

  • On 1-core the added boost in performance with striped disks was from 4% to 13% on the various test cases.

  • On 1-core the special I/O-intensive test case getrag was 29% faster.

Performance Landscape

Times in table are elapsed time (sec).


MSC/Nastran Vendor_2008 Benchmark Test Suite

Sun Fire X2270, 2 x X5570 QC 2.93 GHz

                    2 x 7200 RPM SATA HDDs                          2 x SSDs
Test         Cores  48 GB    24 GB    24 GB    Ratio      Ratio:    48 GB    24 GB    Ratio:  Ratio
                    1067MHz  2 SATA   1 SATA   (2xSATA):  2xSATA/   1067MHz  1333MHz  48GB/   (24GB):
                             1333MHz  1333MHz  48GB/24GB  1xSATA                      24GB    2xSATA/2xSSD

vlosst1      1      133      127      134      1.05       0.95      133      126      1.05    1.01

xxocmd2      1      946      895      978      1.06       0.87      947      884      1.07    1.01
             2      622      614      703      1.01       0.87      600      583      1.03    1.05
             4      466      631      991      0.74       0.64      426      404      1.05    1.56
             8      1049     1554     2590     0.68       0.60      381      711      0.53    2.18

xlotdf1      1      2226     2000     2081     1.11       0.96      2214     1939     1.14    1.03
             2      1307     1240     1308     1.05       0.95      1315     1189     1.10    1.04
             4      858      833      1030     1.03       0.81      744      751      0.99    1.11
             8      912      1562     2336     0.58       0.67      674      712      0.95    2.19

xloimf1      1      1216     1151     1236     1.06       0.93      1228     1290     0.95    0.89

mdomdf1      1      987      913      983      1.08       0.93      987      911      1.08    1.00
             2      524      485      520      1.08       0.93      524      484      1.08    1.00
             4      270      237      269      1.14       0.88      270      250      1.08    0.95

Sol400_1     1      2555     2479     2674     1.03       0.93      2549     2402     1.06    1.03
(xl1fn40_1)

Sol400_S     1      2450     2302     2481     1.06       0.93      2449     2262     1.08    1.02
(xl1fn40_S)

getrag       1      778      843      1178     0.92       0.71      771      817      0.94    1.03
(xx0xst0)
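
The ratio columns in the table are quotients of elapsed times, so a value below 1.00 means the first configuration named in the column header was faster. A minimal sketch that recomputes the four ratios for one row; the class and field names are illustrative, and the numbers are the xlotdf1 8-core results from the table:

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Elapsed times in seconds for one test case at one core count."""
    hdd_48gb: int
    hdd_24gb_2sata: int
    hdd_24gb_1sata: int
    ssd_48gb: int
    ssd_24gb: int

    def ratios(self) -> tuple[float, float, float, float]:
        """Return the four ratio columns, rounded to two decimals."""
        return (
            round(self.hdd_48gb / self.hdd_24gb_2sata, 2),        # 48GB/24GB (2xSATA)
            round(self.hdd_24gb_2sata / self.hdd_24gb_1sata, 2),  # 2xSATA/1xSATA
            round(self.ssd_48gb / self.ssd_24gb, 2),              # 48GB/24GB (SSD)
            round(self.hdd_24gb_2sata / self.ssd_24gb, 2),        # 2xSATA/2xSSD (24GB)
        )


xlotdf1_8core = Row(hdd_48gb=912, hdd_24gb_2sata=1562, hdd_24gb_1sata=2336,
                    ssd_48gb=674, ssd_24gb=712)
print(xlotdf1_8core.ratios())  # (0.58, 0.67, 0.95, 2.19)
```

Read this way, the 0.58 in the 48GB/24GB column means the 48 GB run took 42% less time than the 24 GB run, and the 2.19 in the last column means two striped SSDs were about 2.2x faster than two striped SATA drives.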

Results and Configuration Summary

Hardware Configuration:
    Sun Fire X2270
      1 2-socket rack mounted server
      2 x 2.93 GHz QC Intel Xeon X5570 processors
      2 x internal striped SSDs
      2 x internal striped 7200 rpm 500GB SATA drives

Software Configuration:

    O/S: Linux 64-bit SUSE SLES 10 SP 2
    Application: MSC/NASTRAN MD 2008
    Benchmark: MSC/NASTRAN Vendor_2008 Benchmark Test Suite
    HP MPI: 02.03.00.00 [7585] Linux x86-64
    Voltaire OFED-5.1.3.1_5 GridStack for SLES 10

Benchmark Description

The benchmark tests are representative of typical MSC/Nastran applications including both SMP and DMP runs involving linear statics, nonlinear statics, and natural frequency extraction.

The MD (Multi Discipline) Nastran 2008 application performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic and can be linear or nonlinear as far as material behavior and/or deformations are concerned. The new release includes the MARC module for general purpose nonlinear analyses and the Dytran module that employs an explicit solver to analyze crash and high velocity impact conditions.

  • As of Summer '08 there is an official Solaris X64 version of the MD Nastran 2008 system that is certified and maintained.
  • The memory requirements for the test cases in the new MSC/Nastran Vendor 2008 benchmark test suite range from a few hundred megabytes to no more than 5 GB.

Please go here for a more complete description of the tests.

Key Points and Best Practices

For more on Best Practices of SSD on HPC applications also see the Sun Blueprint:
http://wikis.sun.com/display/BluePrints/Solid+State+Drives+in+HPC+-+Reducing+the+IO+Bottleneck

Additional information on the MSC/Nastran Vendor 2008 benchmark test suite.

  • Based on the maximum physical memory on a platform the user can stipulate the maximum portion of this memory that can be allocated to the Nastran job. This is done on the command line with the mem= option. On Linux based systems where the platform has a large amount of memory and where the model does not have large scratch I/O requirements the memory can be allocated to a tmpfs scratch space file system. On Solaris X64 systems advantage can be taken of ZFS for higher I/O performance.

  • The MSC/Nastran Vendor 2008 test cases don't scale very well: a few do not scale at all, and the rest scale up to 8 cores at best.

  • The test cases for the MSC/Nastran module all have a substantial I/O component where 15% to 25% of the total run times are associated with I/O activity (primarily scratch files). The required scratch file size ranges from less than 1 GB on up to about 140 GB. Performance will be enhanced by using the fastest available drives and striping together more than one of them or using a high performance disk storage system, further enhanced as indicated here by implementing the Lustre based I/O system. High performance interconnects such as Infiniband for inter node cluster message passing as well as I/O transfer from the storage system can also enhance performance substantially.

Disclosure Statement

MSC.Software is a registered trademark of MSC. All information on the MSC.Software website is copyrighted. MSC/Nastran Vendor 2008 results from http://www.mscsoftware.com and this report as of June 9, 2009.

This blog copyright 2009 by John Henning