Friday Jul 10, 2009

Significance of Results

Sun and Microsoft combined to deliver World Record price performance for Windows based results on the TPC-H benchmark at the 300GB scale factor. Using Microsoft's SQL Server 2008 Enterprise database along with Microsoft Windows Server 2008 operating system on the Sun Fire X4600 M2 server, the result of 2.80 $/QphH@300GB (USD) was delivered.

  • The Sun Fire X4600 M2 provides World Record price-performance of 2.80 $/QphH@300GB (USD) among Windows based TPC-H results at the 300GB scale factor. This result is 14% better price performance than the HP DL785 result.
  • The Sun Fire X4600 M2 trails HP's World Record single system performance (HP: 57,684 QphH@300GB, Sun: 55,158 QphH@300GB) by less than 5%.
  • The Sun/SQL Server solution used fewer disks for the database (168) than the other top performance leaders @300GB.
  • IBM required 79% more disks (300 total) than Sun to get a result of 46,034 QphH@300GB; Sun's QphH is 20% higher.
  • HP required 21% more disks (204 total) than Sun to achieve a result of 3.24 $/QphH@300GB (USD) which is 16% worse than Sun's price performance.
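
The percentages in the list above follow from the published figures. As a quick check, here is a minimal Python sketch that recomputes them from the numbers in the Performance Landscape table. The dictionary layout and the helper name pct_more are our own, chosen only for illustration.

```python
# Published TPC-H @300GB figures, copied from the Performance Landscape table.
results = {
    "Sun": {"qphh": 55158, "price_perf": 2.80, "disks": 168},
    "HP": {"qphh": 57684, "price_perf": 3.24, "disks": 204},
    "IBM": {"qphh": 46034, "price_perf": 5.40, "disks": 300},
}


def pct_more(a, b):
    """Percent by which a exceeds b (hypothetical helper, not part of any TPC tool)."""
    return (a - b) / b * 100.0


sun, hp, ibm = results["Sun"], results["HP"], results["IBM"]

comparisons = {
    "Sun price/performance better than HP": -pct_more(sun["price_perf"], hp["price_perf"]),
    "HP price/performance worse than Sun": pct_more(hp["price_perf"], sun["price_perf"]),
    "Sun QphH below HP": -pct_more(sun["qphh"], hp["qphh"]),
    "Sun QphH above IBM": pct_more(sun["qphh"], ibm["qphh"]),
    "IBM disks more than Sun": pct_more(ibm["disks"], sun["disks"]),
    "HP disks more than Sun": pct_more(hp["disks"], sun["disks"]),
}

for label, value in comparisons.items():
    print(f"{label}: {value:.1f}%")
```

Note that the direction of each comparison matters: Sun's QphH is about 20% higher than IBM's, while IBM's QphH measured against Sun's is about 17% lower.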

This is Sun's first published TPC-H SQL Server benchmark.

Performance Landscape

ch/co/th = chips, cores, threads
$/QphH = TPC-H Price/Performance metric (smaller is better)

System             ch/co/th  Processor              Database         QphH    $/QphH  Price     Disks  Available
Sun Fire X4600 M2  8/32/32   2.7 GHz Opteron 8384   SQL Server 2008  55,158  2.80    $154,284  168    07/06/09
HP DL785           8/32/32   2.7 GHz Opteron 8384   SQL Server 2008  57,684  3.24    $186,700  204    11/17/08
IBM x3950 M2       8/32/32   2.93 GHz Intel X7350   SQL Server 2005  46,034  5.40    $248,635  300    03/07/08

Complete benchmark results may be found at the TPC benchmark website http://www.tpc.org.

Results and Configuration Summary

Server:

    Sun Fire X4600 M2 with:
      8 x AMD Opteron 8384, 2.7 GHz QC processors
      256 GB memory
      3 x 73GB (15K RPM) internal SAS disks

Storage:

    14 x Sun Storage J4200 each consisting of 12 x 146GB 15,000 RPM SAS disks

Software:

    Operating System: Microsoft Windows Server 2008 Enterprise x64 Edition SP1
    Database Manager: SQL Server 2008 Enterprise x64 Edition SP1

Audited Results:

    Database Size: 300GB (Scale Factor)
    TPC-H Composite: 55,157.5 QphH@300GB
    Price/performance: $2.80 / QphH@300GB (USD)
    Available: July 6, 2009
    Total 3 Year Cost: $154,284.19 (USD)
    TPC-H Power: 67,095.6
    TPC-H Throughput: 45,343.5
    Database Load Time: 17 hours 29 minutes
    Storage Ratio: 76.82

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to measure Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. The benchmark's queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi user modes. The benchmark requires reporting of price/performance, which is the ratio of total HW/SW cost plus 3 years maintenance to QphH (so a smaller value is better). A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.
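
To make the relationship between the metrics concrete, here is a small Python sketch using the audited values listed under Results and Configuration Summary. It assumes, per our reading of the TPC-H specification, that the composite metric is the geometric mean of the Power (single user) and Throughput (multi user) metrics. The variable names are ours.

```python
import math

# Audited values from the Sun Fire X4600 M2 result at the 300GB scale factor.
power = 67095.6            # TPC-H Power (single-stream test)
throughput = 45343.5       # TPC-H Throughput (multi-stream test)
total_cost_usd = 154284.19 # total 3 year cost

# Assumption: composite QphH is the geometric mean of Power and Throughput.
qphh = math.sqrt(power * throughput)

# Price/performance is cost divided by QphH, so smaller is better.
price_perf = total_cost_usd / qphh

print(f"QphH@300GB: {qphh:,.1f}")
print(f"$/QphH@300GB: {price_perf:.2f}")
```

Under that assumption the sketch reproduces the published composite of 55,157.5 QphH@300GB and the price/performance of $2.80.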

Key Points and Best Practices

SQL Server 2008 is able to take advantage of the lower latency that local memory access provides on the Sun Fire X4600 M2 server. This was achieved by setting the NUMA initialization parameter to enable all NUMA optimizations.

Enabling the Windows large-page feature provided a significant performance improvement, because SQL Server 2008 manages its own memory buffer. Note that to use large pages, an application must be part of the large-page group of the OS (Windows).

The 64-bit Windows OS and 64-bit SQL Server software were able to utilize the 256GB of memory available on the Sun Fire X4600 M2 server.

Disclosure Statement

TPC-H@300GB: Sun Fire X4600 M2 55,158 QphH@300GB, $2.80/QphH@300GB, availability 7/6/09; HP DL785, 57,684 QphH@300GB, $3.24/QphH@300GB, availability 11/17/08; IBM x3950 M2, 46,034 QphH@300GB, $5.40/QphH@300GB, availability 03/07/08; TPC-H, QphH, $/QphH tm of Transaction Processing Performance Council (TPC). More info www.tpc.org.

Friday Jun 05, 2009

Sun recently entered the SPECpower fray with the publication of three results on the SPECpower_ssj2008 benchmark.  Strangely, the three publications documented results on the same hardware platform (Sun Netra X4250) running identical software stacks, but the results were markedly different.  What exactly were we trying to get at?

Benchmark Configurations

Sun produces robust industrial-grade servers with a range of redundancy features we believe benefit our customers.   These features increase reliability, at the cost of additional power consumption. For example, redundant power supplies and redundant fans allow servers to tolerate faults, and hot-swap capabilities further minimize downtime.

The benchmark run and reporting rules require the incorporation within the tested configuration of all components implied by the model name.  Within these limitations, the first publication was intended to be the best result (that is, the lowest power consumption per unit of performance) achievable on the Sun Netra X4250 platform, by minimizing the configured hardware to the greatest extent possible.

Common Components

All tested configurations had the following components in common:

  • System:  Sun Netra X4250
  • Processor: 2 x Intel L5408 QC @ 2.13GHz
  • 2 x 658 watt redundant AC power supplies
  • redundant fans
  • standard I/O expansion mezzanine
  • standard Telco dry contact alarm

And the same software stack:

  • OS: Windows Server 2003 R2 Enterprise X64 Edition SP2
  • Drivers: platform-specific drivers from Sun Netra X4250 Tools and Drivers DVD Version 2.1N
  • JVM: Java HotSpot 32-Bit Server VM on Windows, version 1.6.0_14

Tiny Configuration

In addition to the common hardware components, the tiny configuration was limited to:

  • 8 GB of Memory (4 x 2048 MB as PC2-5300F 2Rx8)
  • 1 x Sun 146 GB 10K RPM SAS internal drive

This is called the tiny configuration because it seems unlikely that most customers would configure an 8-core server with only one disk and only 1 GB available per core. Nevertheless, from a benchmark point of view, this configuration gave the best result.

Typical Configuration

The other two results were both produced on a configuration we considered much more typical of configurations that are actually ordered by customers.  In addition to the common hardware, these typical configurations included:

  • 32 GB of Memory (8 x 4096 MB as PC2-5300F)
  • 4 x Sun 146 GB 10K RPM SAS internal drives
  • 1 x Sun x8 PCIe Quad Gigabit Ethernet option card (X4447A-Z)

Nothing special was done with the additional components.  The added memory increased the performance component of the benchmark. The other components were installed and configured but allowed to sit idle, so they consumed less power than they would have under load.

One Other Thing: Tuning for Performance

So one thing we're getting at is the difference in power consumption between a small configuration optimized for a power-performance benchmark and a typical configuration optimized for customer workloads.  Hardware (power consumption) is only half of the benchmark--the other half being the performance achieved by the System Under Test (SUT).

Tuning Choices 

In all three publications the identical tunings were applied at the software level: identical Java command-line arguments and JVM-to-processor affinity.  We also applied, in the case of the better results, the common (but usually non-default) BIOS-level optimization of disabling the hardware prefetcher and adjacent cache line prefetch.  These optimizations are commonly applied to produce optimized SPECpower_ssj2008 results but it is unlikely that many production applications would benefit from these settings.  To demonstrate the effect of this tuning, the final result was generated with standard BIOS settings.

And just so we couldn't be accused of sand-bagging the results, the number of JVMs was increased in the typical configurations to take advantage of the additional memory populated over and above the tiny configuration.  Additional performance was achieved but sadly it doesn't compensate for the higher power consumption of all that memory.

So in summary we tuned:

  • Tiny Configuration: non-default BIOS settings
  • Typical Configuration 1: non-default BIOS settings; additional JVMs to utilize added memory
  • Typical Configuration 2: default BIOS settings; additional JVMs to utilize added memory

At the OS level, all tunings were identical.

Results 

The results are summarized in this table:

System (configuration)                    Processor  GHz   Overall       Peak performance  Peak power  Idle power
                                                           ssj_ops/watt  (ssj_ops)         (watts)     (watts)
Sun Netra X4250 (8GB non-default BIOS)    L5408      2.13  600           244832            226         174
Sun Netra X4250 (32GB non-default BIOS)   L5408      2.13  478           251555            294         226
Sun Netra X4250 (32GB default BIOS)       L5408      2.13  437           229828            296         225
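
The table only reports peak and idle figures, but the overall metric also depends on what happens in between. As we understand the benchmark, the overall ssj_ops/watt is the sum of ssj_ops over the measured load levels (100% down to 10% in steps of 10%) divided by the sum of average power over those levels plus active idle. The Python sketch below is a rough approximation, not the benchmark's own calculation: it assumes throughput scales exactly with target load and that power rises linearly from idle to peak. The function name is ours.

```python
def estimated_overall(peak_ops, peak_watts, idle_watts):
    """Approximate overall ssj_ops/watt under a linear load/power model.

    Assumptions (ours, not measured data): ssj_ops is proportional to the
    target load, and power grows linearly from idle power to peak power.
    """
    loads = [i / 10 for i in range(10, 0, -1)]  # 1.0, 0.9, ..., 0.1
    total_ops = sum(peak_ops * load for load in loads)
    # Power at each load level, plus one active-idle interval with zero ops.
    total_watts = sum(idle_watts + (peak_watts - idle_watts) * load for load in loads)
    total_watts += idle_watts
    return total_ops / total_watts


# (peak ssj_ops, peak watts, idle watts, published overall ssj_ops/watt)
published = {
    "8GB non-default BIOS": (244832, 226, 174, 600),
    "32GB non-default BIOS": (251555, 294, 226, 478),
    "32GB default BIOS": (229828, 296, 225, 437),
}

estimates = {}
for name, (ops, peak_w, idle_w, overall) in published.items():
    estimates[name] = estimated_overall(ops, peak_w, idle_w)
    print(f"{name}: estimated {estimates[name]:.0f}, published {overall}")
```

Even this crude model lands within a few percent of the published overall values. That suggests idle and partial-load power, which the extra memory raises at every load level, weighs heavily on the metric.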

Conclusions

  • The measurement and reporting methods of the benchmark encourage small memory configurations.  Comparing the first and second result, adding memory yielded minimal performance improvement (from 244832 to 251555 ssj_ops) but a large increase in power consumption, 68 watts at peak.

  • In our opinion, unrealistically small configurations yield the best results on this benchmark.  On the more typical system, the benchmark overall metric decreased from 600 overall ssj_ops per watt to 478 overall ssj_ops per watt, despite our best effort to utilize the additional configured memory.

  • On typical configurations, reverting to default BIOS settings resulted in a significant decrease in performance (from 251555 to 229828) with no corresponding decrease in power consumption (essentially identical for both results).

Configurations typical of customer systems (with adequate memory, internal disks, and option cards) consume more power than configurations which are commonly benchmarked, while providing no corresponding improvement in SPECpower_ssj2008 benchmark performance. The result is a lower overall power-performance metric on typical configurations and a lack of published benchmark results on robust systems with the capacities and redundancies that enterprise customers desire.
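
To put numbers on "minimal" and "significant", the following Python sketch computes the relative changes between the three published results. It uses only the figures from the results table; the helper pct_change is ours.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent (illustrative helper)."""
    return (new - old) / old * 100.0


# Published figures: overall ssj_ops/watt, peak ssj_ops, peak watts.
tiny = {"overall": 600, "peak_ops": 244832, "peak_watts": 226}
typical_tuned = {"overall": 478, "peak_ops": 251555, "peak_watts": 294}
typical_default = {"overall": 437, "peak_ops": 229828, "peak_watts": 296}

# Effect of moving from the tiny to the typical configuration (same BIOS tuning).
mem_perf_gain = pct_change(tiny["peak_ops"], typical_tuned["peak_ops"])
mem_power_gain = pct_change(tiny["peak_watts"], typical_tuned["peak_watts"])
metric_change = pct_change(tiny["overall"], typical_tuned["overall"])

# Effect of reverting the typical configuration to default BIOS settings.
bios_perf_change = pct_change(typical_tuned["peak_ops"], typical_default["peak_ops"])
bios_power_delta = typical_default["peak_watts"] - typical_tuned["peak_watts"]

print(f"Added memory: peak ssj_ops {mem_perf_gain:+.1f}%, peak power {mem_power_gain:+.1f}%")
print(f"Overall ssj_ops/watt: {metric_change:+.1f}%")
print(f"Default BIOS: peak ssj_ops {bios_perf_change:+.1f}%, peak power {bios_power_delta:+d} watts")
```

In other words, the larger configuration bought under 3% more peak throughput for roughly 30% more peak power, and default BIOS settings cost close to 9% of peak throughput with peak power essentially unchanged.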

Fair Use Disclosure

SPEC, SPECpower, and SPECpower_ssj are trademarks of the Standard Performance Evaluation Corporation.  All results from the SPEC website (www.spec.org) as of June 5, 2009.  For a complete set of accepted results refer to that site.

Wednesday Jun 03, 2009

A sample of the various Sun and partner technologies to be discussed:
OpenSolaris, Solaris, Linux, Windows, VMware, gcc, Java, GlassFish, MySQL, Sun Studio, ZFS, DTrace, perflib, Oracle, DB2, Sybase, OpenStorage, CMT, SPARC64, X64, X86, Intel, AMD

This blog copyright 2009 by John Henning