This is an occasionally-generated index of previous entries in the BestPerf blog. Skip to next entry
Colors used:
Benchmark
Best Practices
Other
The Sun SPARC Enterprise M9000 server with 2.88 GHz SPARC64 VII processors achieved 32,000 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.
The Sun SPARC Enterprise M9000 server result is 8.6x faster than the only IBM 5GHz POWER6 Unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.
IBM has not submitted any IBM 595 results on the current SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark, even though this benchmark has been current for almost a year. IBM p595 systems have only 8x more cores than the IBM System 550.
HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.
This new result is 1.84x the previous record result, which was delivered on a Sun SPARC Enterprise M9000 server with 32 processors.
In January 2009, a new version of the benchmark, the Two-tier SAP ERP 6.0 Enhancement Package 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users than the previous Two-tier SAP ERP 6.0 (non-Unicode) Standard Sales and Distribution (SD) Benchmark. Of this reduction, 10-30% is due to the extra overhead of processing the larger character strings produced by Unicode encoding. See SAP Note 1139642 for more details.
Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.
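To make the size difference concrete, here is a small Python sketch comparing the encoded size of the same strings in ASCII and in UTF-16. It only illustrates the encodings described above; it does not model how SAP stores strings internally. The `utf-16-le` codec is used so that Python does not prepend a 2-byte byte-order mark, and the sample strings are arbitrary.

```python
def encoded_sizes(text: str) -> dict:
    """Return the bytes needed to encode `text` in UTF-16 and, if possible, ASCII."""
    # utf-16-le avoids the 2-byte BOM that the plain "utf-16" codec adds
    sizes = {"utf-16": len(text.encode("utf-16-le"))}
    try:
        sizes["ascii"] = len(text.encode("ascii"))
    except UnicodeEncodeError:
        sizes["ascii"] = None  # text is not representable in ASCII at all
    return sizes


for sample in ["ORDER-4711", "Bestellung für Kunde", "\U0001D11E"]:
    print(repr(sample), encoded_sizes(sample))
```

Plain ASCII text doubles in size, and a character outside the Basic Multilingual Plane (the last sample) takes 4 bytes as a surrogate pair, which is the extra bandwidth and storage the paragraph above refers to.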
(ERP 6.0 EP4 is the current version of the benchmark as of January 2009)
| System | OS / Database | Users | SAP ERP/ECC Release | SAPS | Date |
|---|---|---|---|---|---|
| Sun SPARC Enterprise M9000, 64 x SPARC64 VII @ 2.88GHz, 1152 GB | Solaris 10 / Oracle10g | 32,000 | 2009 6.0 EP4 (Unicode) | 175,600 | 18-Nov-09 |
| Sun SPARC Enterprise M9000, 32 x SPARC64 VII @ 2.88GHz, 1024 GB | Solaris 10 / Oracle10g | 17,430 | 2009 6.0 EP4 (Unicode) | 95,480 | 12-Oct-09 |
| IBM System 550, 4 x POWER6 @ 5GHz, 64 GB | AIX 6.1 / DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 16-Jun-09 |
Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.
Certified Result:
| Metric | Value |
|---|---|
| Number of SAP SD benchmark users | 32,000 |
| Average dialog response time | 0.93 seconds |
| Throughput: fully processed order line items/hour | 3,512,000 |
| Throughput: dialog steps/hour | 10,536,000 |
| SAPS | 175,600 |
| SAP Certification | 2009046 |
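The certified figures are consistent with SAP's standard definition of the SAPS unit, under which 100 SAPS corresponds to 2,000 fully processed order line items per hour, or equivalently 6,000 dialog steps per hour. A minimal Python check of that arithmetic, using the numbers from the table (the helper names are our own):

```python
# SAP defines 100 SAPS as 2,000 fully processed order line items per hour,
# which corresponds to 6,000 dialog steps per hour.
LINE_ITEMS_PER_100_SAPS = 2_000
DIALOG_STEPS_PER_100_SAPS = 6_000


def saps_from_line_items(line_items_per_hour: int) -> float:
    return line_items_per_hour * 100 / LINE_ITEMS_PER_100_SAPS


def saps_from_dialog_steps(dialog_steps_per_hour: int) -> float:
    return dialog_steps_per_hour * 100 / DIALOG_STEPS_PER_100_SAPS


# Certified M9000 result: 3,512,000 line items/hour, 10,536,000 dialog steps/hour
print(saps_from_line_items(3_512_000))     # 175600.0
print(saps_from_dialog_steps(10_536_000))  # 175600.0

# Ratios quoted earlier in this entry, computed from SAPS
print(f"vs. 32-processor M9000: {175_600 / 95_480:.2f}x")
print(f"vs. IBM System 550:     {175_600 / 20_520:.1f}x")
```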
Hardware Configuration:
Software Configuration:
SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark
A Sun Blade 6048 Modular System with 16 Sun Blade X6275 Server Modules configured with QDR InfiniBand cluster interconnect delivered outstanding performance running the FLUENT benchmark test suite truck_111m case.
FLUENT 12 Benchmark Test Suite - truck_111m
Results are "Ratings" (bigger is better): Rating = number of sequential runs of the test case possible in 1 day = 86,400 s / (total elapsed run time in seconds).
| System (1) | Cores | truck_111m |
|---|---|---|
| Sun Blade X6275, 32 nodes | 256 | 240.0 |
| SGI Altix ICE 8200 IP95, 32 nodes | 256 | 238.9 |
| Intel Whitebox, 32 nodes | 256 | 219.8 |
| Sun Blade X6275, 16 nodes | 128 | 129.6 |
| SGI Altix ICE 8200 IP95, 16 nodes | 128 | 120.8 |
| Intel Whitebox, 16 nodes | 128 | 116.9 |
| Sun Blade X6275, 8 nodes | 64 | 64.6 |
| SGI Altix ICE 8200 IP95, 8 nodes | 64 | 59.8 |
| Intel Whitebox, 8 nodes | 64 | 57.4 |
(1) Sun Blade X6275, X5570 QC 2.93GHz, QDR
Intel Whitebox, X5560 QC 2.8GHz, DDR
SGI Altix ICE 8200, X5570 QC 2.93GHz, DDR
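Because a rating is just 86,400 divided by the elapsed time, it can be inverted to recover the approximate wall-clock time of a single run from the table. A small Python sketch (the helper names are our own):

```python
SECONDS_PER_DAY = 86_400


def rating(elapsed_seconds: float) -> float:
    """FLUENT rating: number of back-to-back runs that fit in one day."""
    return SECONDS_PER_DAY / elapsed_seconds


def elapsed_from_rating(r: float) -> float:
    """Invert the rating to recover the elapsed run time in seconds."""
    return SECONDS_PER_DAY / r


# truck_111m on the Sun Blade X6275 cluster, taken from the table above
for nodes, r in [(8, 64.6), (16, 129.6), (32, 240.0)]:
    print(f"{nodes:>2} nodes: rating {r:6.1f} -> about {elapsed_from_rating(r):7.1f} s per run")
```

For example, the 32-node rating of 240.0 corresponds to a run of about 360 seconds.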
Hardware Configuration:
Software Configuration:
The benchmark tests are representative of typical large user CFD models, intended for execution in distributed memory processor (DMP) mode over a cluster of multi-processor platforms.
The Intel X5570 processors include a Turbo Boost feature coupled with a SpeedStep option in the CPU section of the Advanced BIOS settings. Under specific circumstances, this can provide CPU up-clocking, temporarily increasing the processor frequency from 2.93GHz to 3.2GHz.
Memory placement is a very significant factor with Nehalem processors. Current Nehalem platforms have two sockets; each socket has three memory channels, and each channel has 3 bays for DIMMs. For example, with the X5570, if one DIMM is placed in the 1st bay of each of the 3 channels, the DIMMs run at 1333 MHz. Altering this to an unbalanced arrangement, say by adding just one more DIMM into the 2nd bay of one channel, drops the DIMM frequency from 1333 MHz to 1067 MHz.
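The DIMM example above can be expressed as a small check. This is a deliberately simplified, hypothetical helper that encodes only the two cases the text describes for one X5570 socket (one DIMM in every channel gives 1333 MHz; a second DIMM in a channel drops the speed to 1067 MHz). Other populations, such as an empty channel or three DIMMs per channel, are not covered by the example and return `None`.

```python
from typing import Optional, Sequence


def dimm_speed_mhz(dimms_per_channel: Sequence[int]) -> Optional[int]:
    """Hypothetical helper for one Nehalem socket with three memory channels.

    Only the cases described in the text are modeled:
      * one DIMM in the first bay of each channel -> 1333 MHz
      * a second DIMM added to a channel          -> 1067 MHz
    Anything else returns None (not covered by the example).
    """
    if len(dimms_per_channel) != 3:
        raise ValueError("expected DIMM counts for exactly 3 channels")
    if all(n == 1 for n in dimms_per_channel):
        return 1333
    if max(dimms_per_channel) == 2 and min(dimms_per_channel) >= 1:
        return 1067
    return None


print(dimm_speed_mhz([1, 1, 1]))  # balanced, one DIMM per channel
print(dimm_speed_mhz([2, 1, 1]))  # one extra DIMM in the 2nd bay of channel 0
```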
The FLUENT application performs computational fluid dynamics analysis on a variety of flow types and allows for chemically reacting species. The analysis can be transient (dynamic) and can be linear or nonlinear.
Current FLUENT 12 Benchmark:
http://www.fluent.com/software/fluent/fl6bench/fl6bench_6.4.x/
All information on the Fluent website is Copyrighted 1995-2009 by Fluent Inc. Results from http://www.fluent.com/software/fluent/fl6bench/ as of November 12, 2009 and this presentation.
| Cluster Name and Interconnect | Throughput for 128 Cores (seconds per step) | Throughput for 256 Cores (seconds per step) | Throughput for 512 Cores (seconds per step) |
|---|---|---|---|
| Sun Blade X6275 InfiniBand | 0.014 | 0.0073 | 0.0048 |
| Cambridge Xeon/3.0 InfiniPath | 0.016 | 0.0088 | 0.0056 |
| NCSA Xeon/2.33 InfiniBand | 0.019 | 0.010 | 0.008 |
| AMD Opteron/2.2 InfiniPath | 0.025 | 0.015 | 0.008 |
| IBM HPCx PWR4/1.7 Federation | 0.039 | 0.021 | 0.013 |
| SDSC IBM BlueGene/L MPI | 0.108 | 0.061 | 0.044 |
The following tables report results for NAMD molecular dynamics using a cluster of Sun Blade X6275 server modules. The performance of the cluster is expressed in terms of the time in seconds that is required to execute one step of the molecular dynamics simulation. A smaller number implies better performance.
| Blades | Cores | STMV (1) Thruput (secs/step) | STMV spdup | STMV effi'cy | f1 ATPase (2) Thruput (secs/step) | f1 ATPase spdup | f1 ATPase effi'cy | ApoA1 (3) Thruput (secs/step) | ApoA1 spdup | ApoA1 effi'cy |
|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 768 | 0.0277 | 37.8 | 79% | 0.0075 | 35.2 | 73% | 0.0039 | 22.2 | 46% |
| 36 | 576 | 0.0324 | 32.3 | 90% | 0.0096 | 27.4 | 76% | 0.0045 | 19.3 | 54% |
| 32 | 512 | 0.0368 | 28.4 | 89% | 0.0104 | 25.3 | 79% | 0.0048 | 18.1 | 57% |
| 24 | 384 | 0.0481 | 21.8 | 91% | 0.0136 | 19.3 | 80% | 0.0066 | 13.2 | 55% |
| 16 | 256 | 0.0715 | 14.6 | 91% | 0.0204 | 12.9 | 81% | 0.0073 | 11.9 | 74% |
| 12 | 192 | 0.0875 | 12.0 | 100% | 0.0271 | 9.7 | 81% | 0.0096 | 9.1 | 76% |
| 8 | 128 | 0.1292 | 8.1 | 101% | 0.0337 | 7.8 | 98% | 0.0139 | 6.3 | 79% |
| 4 | 64 | 0.2726 | 3.8 | 95% | 0.0666 | 4.0 | 100% | 0.0224 | 3.9 | 98% |
| 1 | 16 | 1.0466 | 1.0 | 100% | 0.2631 | 1.0 | 100% | 0.0872 | 1.0 | 100% |
spdup - speedup versus 1 blade result
effi'cy - speedup efficiency versus 1 blade result (speedup divided by the number of blades)
(1) Satellite Tobacco Mosaic Virus (STMV) molecule, 1,066,628 atoms, 12 Angstrom cutoff, Langevin dynamics, 500 time steps
(2) f1 ATPase molecule, 327,506 atoms, 11 Angstrom cutoff, particle mesh Ewald dynamics, 500 time steps
(3) ApoA1 molecule, 92,224 atoms, 12 Angstrom cutoff, particle mesh Ewald dynamics, 500 time steps
Models with large numbers of atoms scale better than models with small numbers of atoms.
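The spdup and effi'cy columns can be recomputed from the per-step times. A minimal Python sketch using the STMV column (the helper name is our own); recomputing other columns from the rounded per-step times can differ from the published values in the last digit.

```python
def speedup_and_efficiency(t_base: float, t_n: float, blades: int, base_blades: int = 1):
    """Speedup and parallel efficiency relative to a baseline run.

    t_base, t_n : seconds per step for the baseline run and the scaled run
    blades      : number of blades in the scaled run
    """
    speedup = t_base / t_n
    efficiency = speedup / (blades / base_blades)
    return speedup, efficiency


# STMV seconds per step from the table above; the 1-blade time is the baseline
stmv_1_blade = 1.0466
for blades, t in [(8, 0.1292), (36, 0.0324), (48, 0.0277)]:
    s, e = speedup_and_efficiency(stmv_1_blade, t, blades)
    print(f"STMV, {blades:>2} blades: speedup {s:.1f}, efficiency {e:.0%}")
```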
The Intel QC X5570 processors include a Turbo Boost feature coupled with a SpeedStep option in the CPU section of the Advanced BIOS settings. Under specific circumstances, this can provide CPU overclocking, which increases the processor frequency from 2.93GHz to 3.33GHz. This feature was enabled when generating the results reported here.
The Sun SPARC Enterprise T5240 server running Sun Java Messaging Server 7.2 achieved a world record SPECmail2009 result using the Sun Storage 7310 Unified Storage System and the ZFS file system. Sun's OpenStorage platforms enable another world record.
World record SPECmail2009 benchmark using the Sun SPARC Enterprise T5240 server (two 1.6GHz UltraSPARC T2 Plus), Sun Communications Suite 7, Solaris 10, and the Sun Storage 7310 Unified Storage System achieved 14,500 SPECmail_Ent2009 users at 69,857 Sessions/Hour.
This SPECmail2009 benchmark result clearly demonstrates that the Sun Messaging Server 7.2, Solaris 10 and ZFS solution can support a large, enterprise level IMAP mail server environment as a low cost 'Sun on Sun' solution, delivering the best performance and maximizing data integrity and availability of Sun Open Storage and ZFS.
The Sun SPARC Enterprise T5240 server supported 2.4 times more users, at a 2.4 times higher sessions/hour rate, than the Apple Xserv3,1 solution on the SPECmail2009 benchmark.
There are no IBM Power6 results on this benchmark.
The configuration using Sun OpenStorage outperformed all previous results, including those using traditional direct attached storage and a significantly higher number of disk devices.
| System | Users | Sessions/hour | Disks | OS | Messaging Server |
|---|---|---|---|---|---|
| Sun SPARC Enterprise T5240, 2 x 1.6GHz UltraSPARC T2 Plus | 14,500 | 69,857 | 58 NAS | Solaris 10 | CommSuite 7.2, Sun JMS 7.2 |
| Sun SPARC Enterprise T5240, 2 x 1.6GHz UltraSPARC T2 Plus | 12,000 | 57,758 | 80 DAS | Solaris 10 | CommSuite 5, Sun JMS 6.3 |
| Sun Fire X4275, 2 x 2.93GHz Xeon X5570 | 8,000 | 38,348 | 44 NAS | Solaris 10 | Sun JMS 6.2 |
| Apple Xserv3,1, 2 x 2.93GHz Xeon X5570 | 6,000 | 28,887 | 82 DAS | MacOS 10.6 | Dovecot 1.1.14 apple 0.5 |
| Sun SPARC Enterprise T5220, 1 x 1.4GHz UltraSPARC T2 | 3,600 | 17,316 | 52 DAS | Solaris 10 | Sun JMS 6.2 |
Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org
Users - SPECmail_Ent2009 Users
Sessions/hour - SPECmail2009 Sessions/hour
NAS - Network Attached Storage
DAS - Direct Attached Storage
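The 2.4x comparison with the Apple Xserv3,1 follows directly from the table. A short Python sketch that recomputes both ratios, along with sessions/hour per user, which comes out nearly the same for both systems, consistent with the fixed per-user workload mix the benchmark defines:

```python
results = {
    # system: (SPECmail_Ent2009 users, SPECmail2009 sessions/hour), from the table above
    "Sun SPARC Enterprise T5240 (7310 NAS)": (14_500, 69_857),
    "Apple Xserv3,1": (6_000, 28_887),
}

sun_users, sun_sessions = results["Sun SPARC Enterprise T5240 (7310 NAS)"]
apple_users, apple_sessions = results["Apple Xserv3,1"]

print(f"users ratio:    {sun_users / apple_users:.2f}x")
print(f"sessions ratio: {sun_sessions / apple_sessions:.2f}x")
for name, (users, sessions) in results.items():
    print(f"{name}: {sessions / users:.2f} sessions/hour per user")
```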
Hardware Configuration:
External Storage:
Software Configuration:
The SPECmail2009 benchmark measures the ability of corporate e-mail systems to meet the demands of today's e-mail users over fast corporate local area networks (LANs). The SPECmail2009 benchmark simulates corporate mail server workloads that range from 250 to 10,000 or more users, using industry standard SMTP and IMAP4 protocols. This e-mail server benchmark creates client workloads based on a 40,000 user corporation, and uses folder and message MIME structures that include both traditional office documents and a variety of rich media content. The benchmark also adds support for encrypted network connections using industry standard SSL v3.0 and TLS 1.0 technology. SPECmail2009 replaces all versions of SPECmail2008, first released in August 2008. The results from the two benchmarks are not comparable.
Software on one or more client machines generates a benchmark load for a System Under Test (SUT) and measures the SUT response times. A SUT can be a mail server running on a single system or a cluster of systems.
A SPECmail2009 'run' simulates a 100% load level associated with the specific number of users, as defined in the configuration file. The mail server must maintain a specific Quality of Service (QoS) at the 100% load level to produce a valid benchmark result. If the mail server does maintain the specified QoS at the 100% load level, the performance of the mail server is reported as SPECmail_Ent2009 SMTP and IMAP Users at SPECmail2009 Sessions per hour. The SPECmail_Ent2009 users at SPECmail2009 Sessions per Hour metric reflects the unique workload combination for a SPEC IMAP4 user.
Each Sun Storage 7310 Unified Storage System was configured with one J4400 JBOD array of 22 x 1TB SATA drives set up as a mirrored device, with 4 shared volumes built on that mirrored device. In total, 8 mirrored volumes from the 2 Sun Storage 7310 systems were mounted over the NFSv4 protocol on the system under test (SUT) for the messaging server's mail index and mail message file systems. Four SSDs were used as the SUT internal disks, each configured as a ZFS file system; these four ZFS directories were used for the messaging server queue, store metadata, LDAP and queue. The SSDs substantially reduced store metadata and queue latencies.
Each Sun Storage 7310 Unified Storage System was connected to the SUT via a dual 10-Gigabit Ethernet Fiber XFP card.
The Sun Storage 7310 Unified Storage System software version is 2009.08.11,1-0.
The clients used these Java options: java -d64 -Xms4096m -Xmx4096m -XX:+AggressiveHeap
Substantial performance improvement and scalability were observed with Sun Communications Suite 7 Update 2, Java Messaging Server 7.2 and Directory Server 6.2.
See the SPEC Report for all OS, network and messaging server tunings.
SPEC, SPECmail reg tm of Standard Performance Evaluation Corporation. Results as of 10/22/09 on www.spec.org. SPECmail2009: Sun SPARC Enterprise T5240, SPECmail_Ent2009 14,500 users at 69,857 SPECmail2009 Sessions/hour. Apple Xserv3,1, SPECmail_Ent2009 6,000 users at 28,887 SPECmail2009 Sessions/hour.
The Sun F20 card is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.
The Sun Flash Accelerator F20 PCIe Card (low-profile x8 size) has the IOPS performance of over 550 SAS drives or 1,100 SATA drives.
| Test | 4 DOMs | 2 DOMs | 1 DOM |
|---|---|---|---|
| Random 4K Read | 101K IOPS | 68K IOPS | 35K IOPS |
| Maximum Delivered Random 4K Write | 88K IOPS | 44K IOPS | 22K IOPS |
| Maximum Delivered 50-50 4K Read/Write | 54K IOPS | 27K IOPS | 13K IOPS |
| Sequential Read (1M) | 1.1 GB/sec | 547 MB/sec | 273 MB/sec |
| Maximum Delivered Sequential Write (1M) | 567 MB/sec | 243 MB/sec | 125 MB/sec |
| Sustained Random 4K Write* | 37K IOPS | 18K IOPS | 10K IOPS |
| Sustained 50/50 4K Read/Write* | 34K IOPS | 17K IOPS | 8.6K IOPS |
(*) Maximum Delivered values measured over a 1 minute period. Sustained write performance differs from maximum delivered performance. Over time, wear-leveling and erase operations are required and impact write performance levels.
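IOPS at a fixed transfer size converts directly to bandwidth, and the "550 SAS drives or 1,100 SATA drives" comparison implies a per-drive IOPS rate. A minimal Python sketch of both calculations using the 4-DOM random read figure; the text does not state which per-drive figures the comparison assumed, so the second calculation only shows what the stated drive counts imply.

```python
KIB = 1024


def iops_to_mb_per_sec(iops: float, io_size_bytes: int) -> float:
    """Convert an IOPS rate at a fixed transfer size to decimal MB/s."""
    return iops * io_size_bytes / 1_000_000


# 4 DOMs, random 4K reads, from the table above
read_iops = 101_000
print(f"{iops_to_mb_per_sec(read_iops, 4 * KIB):.1f} MB/s at 4 KB per IO")

# Per-drive IOPS implied by "over 550 SAS drives or 1,100 SATA drives"
print(f"{read_iops / 550:.0f} IOPS per SAS drive, {read_iops / 1_100:.0f} IOPS per SATA drive")
```

Random 4K reads at about 414 MB/s use well under half of the 1.1 GB/sec sequential read bandwidth, so small random IO on this card is limited by operation rate rather than by transfer bandwidth.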
The Sun Flash Accelerator F20 PCIe Card is tuned for IO sizes of 4 KB or larger; the write service time for IOs smaller than 4 KB can be 10 times longer than shown in the table below. Note also that the service times shown below include both the latency and the time to transfer the data; the transfer time becomes the dominant portion of the service time for IOs over 64 KB in size.
| Transfer Size | Read Service Time (ms) | Write Service Time (ms) |
|---|---|---|
| 4 KB | 0.32 | 0.22 |
| 8 KB | 0.34 | 0.24 |
| 16 KB | 0.37 | 0.27 |
| 32 KB | 0.43 | 0.33 |
| 64 KB | 0.54 | 0.46 |
| 128 KB | 0.49 | 1.30 |
| 256 KB | 1.31 | 2.15 |
| 512 KB | 2.25 | 2.25 |
- Latencies are measured application latencies, collected with the vdbench tool.
- Please note that the FlashFire F20 card is a 4KB sector device. Doing IOs of less than 4KB in size, or not aligned on 4KB boundaries, can result in significant performance degradation on write operations.
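A minimal Python sketch of the alignment rule in the note above. An application or volume layout could use a check like this to flag writes likely to hit the slow path; `is_4k_aligned` is a hypothetical helper, not part of any Sun tool.

```python
SECTOR = 4 * 1024  # the F20 presents 4KB sectors


def is_4k_aligned(offset: int, length: int, sector: int = SECTOR) -> bool:
    """True if an IO starts on a 4KB boundary and covers whole 4KB sectors."""
    return length > 0 and offset % sector == 0 and length % sector == 0


print(is_4k_aligned(0, 8192))     # True: aligned start, two whole sectors
print(is_4k_aligned(512, 4096))   # False: starts in the middle of a sector
print(is_4k_aligned(4096, 2048))  # False: smaller than one sector
```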
Storage:
Servers:
Software:
Sun measured a wide variety of IO performance metrics on the Sun Flash Accelerator F20 PCIe Card using Vdbench 5.0 measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 read/write. This demonstrates the maximum performance and throughput of the storage system.
The Vdbench profile f20-parmfile.txt was used for bandwidth and IOPS, and the Vdbench profile f20-latency.txt was used for latency.
Vdbench is publicly available for download at: http://vdbench.org
Sun Flash Accelerator F20 PCIe Card delivered 100K 4K read IOPS and 1.1 GB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 14, 2009.
Sun and Oracle demonstrate the World's fastest database performance. Sun Microsystems using 12 Sun SPARC Enterprise T5440 servers, 60 Sun Storage F5100 Flash arrays and Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning delivered a world-record TPC-C benchmark result.
The 12-node Sun SPARC Enterprise T5440 server cluster result delivered a world record TPC-C benchmark result of 7,646,486.7 tpmC and $2.36/tpmC (USD) using Oracle 11g R1 on a configuration available 12/14/09.
The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the IBM Power 595 (5GHz) with IBM DB2 9.5 database by 26% and has 16% better price/performance on the TPC-C benchmark.
The complete Oracle/Sun solution has 10.7x better computational density than the IBM configuration (computational density = performance/rack).
The complete Oracle/Sun solution used 8 times fewer racks than the IBM configuration.
The complete Oracle/Sun solution has 5.9x better power/performance than the IBM configuration.
The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the HP Superdome (1.6GHz Itanium2) by 87% and has 19% better price/performance on the TPC-C benchmark.
The Oracle/Sun solution utilized Sun FlashFire technology to deliver this result. The Sun Storage F5100 flash array was used for database storage.
Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning scales and effectively uses all of the nodes in this configuration to produce the world record performance.
This result showed Sun and Oracle's integrated hardware and software stacks provide industry-leading performance.
More information on this benchmark will be posted in the next several days.
| System | tpmC | Price/tpmC | Avail | Database | Cluster | Racks | w/KtpmC |
|---|---|---|---|---|---|---|---|
| 12 x Sun SPARC Enterprise T5440 | 7,646,487 | 2.36 USD | 12/14/09 | Oracle 11g RAC | Y | 9 | 9.6 |
| IBM Power 595 | 6,085,166 | 2.81 USD | 12/10/08 | IBM DB2 9.5 | N | 76 | 56.4 |
| HP Integrity Superdome | 4,092,799 | 2.93 USD | 08/06/07 | Oracle 10g R2 | N | 46 | to be added |
Avail - Availability date
w/KtpmC - Watts per 1000 tpmC
Racks - clients, servers, storage, infrastructure
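The percentage and ratio claims above can be recomputed from the table. A minimal Python sketch, where "better price/performance" is read as the percentage by which Sun's $/tpmC is lower. Using the table's rounded tpmC and rack counts, the density ratio comes out at roughly 10.6x rather than the quoted 10.7x; the difference may come from other rounding or inputs in the original calculation.

```python
sun = {"tpmC": 7_646_487, "price": 2.36, "racks": 9, "w_per_ktpmc": 9.6}
ibm = {"tpmC": 6_085_166, "price": 2.81, "racks": 76, "w_per_ktpmc": 56.4}
hp = {"tpmC": 4_092_799, "price": 2.93}


def pct_faster(a: dict, b: dict) -> float:
    """How much higher a's tpmC is than b's, in percent."""
    return (a["tpmC"] / b["tpmC"] - 1) * 100


def pct_cheaper(a: dict, b: dict) -> float:
    """Price/performance advantage: how much lower a's $/tpmC is than b's, in percent."""
    return (1 - a["price"] / b["price"]) * 100


def density(x: dict) -> float:
    """Computational density: tpmC per rack."""
    return x["tpmC"] / x["racks"]


print(f"vs IBM: {pct_faster(sun, ibm):.0f}% faster, {pct_cheaper(sun, ibm):.0f}% better $/tpmC")
print(f"vs HP:  {pct_faster(sun, hp):.0f}% faster, {pct_cheaper(sun, hp):.0f}% better $/tpmC")
print(f"racks:      {ibm['racks'] / sun['racks']:.1f}x fewer")
print(f"density:    {density(sun) / density(ibm):.1f}x")
print(f"power/perf: {ibm['w_per_ktpmc'] / sun['w_per_ktpmc']:.1f}x")
```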
Sun and IBM TPC-C Response times
| System | tpmC | Response Time New Order 90th% (seconds) | Response Time New Order Average (seconds) |
|---|---|---|---|
| 12 x Sun SPARC Enterprise T5440 | 7,646,487 | 0.170 | 0.168 |
| IBM Power 595 | 6,085,166 | 1.69 | 1.22 |
| Response Time Ratio - Sun Better | | 9.9x | 7.3x |
Sun uses the 7x comparison (average response time) to highlight the differences in response times between Sun's solution and IBM's. Note, however, that Sun is 10x faster on New Order transactions at the 90th percentile.
It is also interesting to note that none of Sun's response times, average or 90th percentile, is over 0.25 seconds for any transaction, while IBM does not have even one interactive transaction, not even the menu, below 0.50 seconds. Graphs of Sun's and IBM's response times for New-Order can be found in the full disclosure reports on the TPC website (TPC-C Official Result Page).
Hardware Configuration:
Software Configuration:
TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.
See Also
TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). 12-node Sun SPARC Enterprise T5440 Cluster (1.6GHz UltraSPARC T2 Plus, 4 processors) with Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, 7,646,486.7 tpmC, $2.36/tpmC. Available 12/14/09. IBM Power 595 (5GHz Power6, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5, 6,085,166 tpmC, $2.81/tpmC, available 12/10/08. HP Integrity Superdome (1.6GHz Itanium2, 64 processors, 128 cores, 256 threads) with Oracle 10g Enterprise Edition, 4,092,799 tpmC, $2.93/tpmC. Available 8/06/07. Source: www.tpc.org, results as of 11/5/09.
A Sun Blade 6048 Modular System with 8 Sun Blade X6275 Server Modules configured with a QDR InfiniBand cluster interconnect delivered outstanding performance running the FLUENT 12 benchmark test suite. Sun consistently delivered the best or near-best results per node for the 6 benchmark tests considered, up to the number of nodes available for these runs.
FLUENT 12 Benchmark Test Suite
Results are "Ratings" (bigger is better): Rating = number of sequential runs of the test case possible in 1 day = 86,400 s / (total elapsed run time in seconds).
| System | Nodes | Ranks | eddy 417k | turbo 500k | aircraft 2m | sedan 4m | truck 14m | truck_poly 14m |
|---|---|---|---|---|---|---|---|---|
| Sun Blade X6275 | 16 | 128 | 6496.2 | 19307.3 | 8408.8 | 6341.3 | 1060.1 | 984.1 |
| Best Intel | 16 | 128 | 5236.4 (3) | 15638.0 (7) | 7981.5 (1) | 6582.9 (1) | 1005.8 (1) | 933.0 (1) |
| Best SGI | 16 | 128 | 7578.9 (5) | 14706.4 (6) | 6789.8 (4) | 6249.5 (5) | 1044.7 (4) | 926.0 (4) |
| Sun Blade X6275 | 8 | 64 | 5308.8 | 26790.7 | 5574.2 | 5074.9 | 547.2 | 525.2 |
| Best Intel | 8 | 64 | 5016.0 (1) | 25226.3 (1) | 5220.5 (1) | 4614.2 (1) | 513.4 (1) | 490.9 (1) |
| Best SGI | 8 | 64 | 5142.9 (4) | 23834.5 (4) | 4614.2 (4) | 4352.6 (4) | 529.4 (4) | 479.2 (4) |
| Sun Blade X6275 | 4 | 32 | 3066.5 | 13768.9 | 3066.5 | 2602.4 | 289.0 | 270.3 |
| Best Intel | 4 | 32 | 2856.2 (1) | 13041.5 (1) | 2837.4 (1) | 2465.0 (1) | 266.4 (1) | 251.2 (1) |
| Best SGI | 4 | 32 | 3083.0 (4) | 13190.8 (4) | 2588.8 (5) | 2445.9 (5) | 266.6 (4) | 246.5 (4) |
| Sun Blade X6275 | 2 | 16 | 1714.3 | 7545.9 | 1519.1 | 1345.8 | 144.4 | 141.8 |
| Best Intel | 2 | 16 | 1585.3 (1) | 7125.8 (1) | 1428.1 (1) | 1278.6 (1) | 134.7 (1) | 132.5 (1) |
| Best SGI | 2 | 16 | 1708.4 (4) | 7384.6 (4) | 1507.9 (4) | 1264.1 (5) | 128.8 (4) | 133.5 (4) |
| Sun Blade X6275 | 1 | 8 | 931.8 | 4061.1 | 827.2 | 681.5 | 73.0 | 73.8 |
| Best Intel | 1 | 8 | 920.1 (2) | 3900.7 (2) | 784.9 (2) | 644.9 (1) | 70.2 (2) | 70.9 (2) |
| Best SGI | 1 | 8 | 953.1 (4) | 4032.7 (4) | 843.3 (4) | 651.0 (4) | 71.4 (4) | 72.0 (4) |
| Sun Blade X6275 | 1 | 4 | 550.4 | 2425.3 | 533.6 | 423.0 | 41.6 | 41.6 |
| Best Intel | 1 | 4 | 515.7 (1) | 2244.2 (1) | 490.8 (1) | 392.2 (1) | 37.8 (1) | 38.4 (1) |
| Best SGI | 1 | 4 | 561.6 (4) | 2416.8 (4) | 526.9 (4) | 412.6 (4) | 40.9 (4) | 40.8 (4) |
| Sun Blade X6275 | 1 | 2 | 299.6 | 1328.2 | 293.9 | 232.1 | 21.3 | 21.6 |
| Best Intel | 1 | 2 | 274.3 (1) | 1201.7 (1) | 266.1 (1) | 214.2 (1) | 18.9 (1) | 19.6 (1) |
| Best SGI | 1 | 2 | 294.2 (4) | 1302.7 (4) | 289.0 (4) | 226.4 (4) | 20.5 (4) | 21.2 (4) |
| Sun Blade X6275 | 1 | 1 | 154.7 | 682.6 | 149.1 | 114.8 | 9.7 | 10.1 |
| Best Intel | 1 | 1 | 143.5 (1) | 631.1 (1) | 137.4 (1) | 106.2 (1) | 8.8 (1) | 9.0 (1) |
| Best SGI | 1 | 1 | 153.3 (4) | 677.5 (4) | 147.3 (4) | 111.2 (4) | 10.3 (4) | 9.5 (4) |
| Sun Blade X6275 | 1 | serial | 155.6 | 676.6 | 156.9 | 110.0 | 9.4 | 10.3 |
| Best Intel | 1 | serial | 146.6 (2) | 650.0 (2) | 150.2 (2) | 105.6 (2) | 8.8 (2) | 9.7 (2) |
Sun Blade X6275, X5570 QC 2.93 GHz, QDR SMT on / Turbo mode on
(1) Intel Whitebox (X5560 QC 2.80 GHz, RHEL5, IB)
(2) Intel Whitebox (X5570 QC 2.93 GHz, RHEL5)
(3) Intel Whitebox (X5482 QC 3.20 GHz, RHEL5, IB)
(4) SGI Altix ICE_8200IP95 (X5570 2.93 GHz +turbo, SLES10, IB)
(5) SGI Altix_ICE_8200IP95 (X5570 2.93 GHz, SLES10, IB)
(6) SGI Altix_ICE_8200EX (Intel64 QC 3.00 GHz, Linux, IB)
(7) Qlogic Cluster (X5472 QC 3.00 GHz, RHEL5.2, IB Truescale)
Hardware Configuration:
Software Configuration:
The benchmark tests are representative of typical large user CFD models, intended for execution in distributed memory processor (DMP) mode over a cluster of multi-processor platforms.
These processors include a Turbo Boost feature coupled with a SpeedStep option in the CPU section of the Advanced BIOS settings. Under specific circumstances, this can provide CPU up-clocking, temporarily increasing the processor frequency from 2.93GHz to 3.2GHz.
Memory placement is a very significant factor with Nehalem processors. Current Nehalem platforms have two sockets; each socket has three memory channels, and each channel has 3 bays for DIMMs. For example, with the X5570, if one DIMM is placed in the 1st bay of each of the 3 channels, the DIMMs run at 1333 MHz. Altering this to an unbalanced arrangement, say by adding just one more DIMM into the 2nd bay of one channel, drops the DIMM frequency from 1333 MHz to 1067 MHz.
The FLUENT application performs computational fluid dynamics analysis on a variety of flow types and allows for chemically reacting species. The analysis can be transient (dynamic) and can be linear or nonlinear.
FLUENT 12.0 Benchmark:
http://www.fluent.com/software/fluent/fl6bench/fl6bench_6.4.x/
All information on the Fluent website is Copyrighted 1995-2009 by Fluent Inc. Results from http://www.fluent.com/software/fluent/fl6bench/ as of October 20, 2009 and this presentation.
A Sun Ultra 27 workstation configured with an nVidia FX5800 graphics card delivered outstanding performance running the SPECviewperf® 10 benchmark.
When compared with other workstations running a single graphics card (i.e. not running two or more cards in SLI mode), the Sun Ultra 27 workstation places first in 6 of 8 subtests and second in the remaining two subtests.
The calculated geometric mean shows that the Sun Ultra 27 workstation is 11% faster than competitors' workstations.
The optimum point for price/performance is the nVidia FX1800 graphics card.
Results have been published on the SPEC web site at http://www.spec.org/gwpg/gpc.data/vp10/summary.html.
Performance of the Sun Ultra 27 versus the competition. Bigger is better for each of the eight tests. The comparison is based upon the performance of the Sun Ultra 27 workstation. Performance is measured in frames per second.
| System | 3DSMAX Perf | 3DSMAX % | CATIA Perf | CATIA % | ENSIGHT Perf | ENSIGHT % | MAYA Perf | MAYA % |
|---|---|---|---|---|---|---|---|---|
| Sun Ultra 27 FX5800 | 59.34 | | 68.81 | | 58.07 | | 246.09 | |
| HP xw4600 ATI FireGL V7700 | 49.71 | 19 | 48.05 | 43 | 57.11 | 2 | 268.62 | -8 |
| HP xw4600 FX4800 | 52.26 | 14 | 63.26 | 12 | 53.79 | 8 | 226.82 | 7 |
| Fujitsu Celsius M470 FX3800 | 53.67 | 11 | 65.25 | 7 | 52.19 | 10 | 227.37 | 7 |

| System | PROENGINEER Perf | PROENGINEER % | SOLIDWORKS Perf | SOLIDWORKS % | TEAMCENTER Perf | TEAMCENTER % | UGS Perf | UGS % |
|---|---|---|---|---|---|---|---|---|
| Sun Ultra 27 FX5800 | 68.96 | | 152.01 | | 42.02 | | 36.04 | |
| HP xw4600 ATI FireGL V7700 | 47.25 | 32 | 109.71 | 28 | 40.18 | 4 | 56.65 | -57 |
| HP xw4600 FX4800 | 61.15 | 11 | 131.31 | 14 | 28.42 | 32 | 33.43 | 7 |
| Fujitsu Celsius M470 FX3800 | 64.39 | 7 | 139.2 | 8 | 29.02 | 31 | 33.27 | 8 |
Comparison of various frame buffers on the Sun Ultra 27 running SPECviewperf 10. Performance is reported for each test along with the difference in performance as compared to the FX5800 frame buffer. The runs in the table below were made with 3.2GHz W3570 processors.
| Frame Buffer | 3DSMAX Perf | % | CATIA Perf | % | ENSIGHT Perf | % | MAYA Perf | % | PROENGR Perf | % | SOLIDWRKS Perf | % | TEAMCNTR Perf | % | UGS Perf | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FX5800 | 57.07 | | 67.84 | | 58.63 | | 219.4 | | 68.05 | | 152.3 | | 40.85 | | 34.73 | |
| FX3800 | 57.17 | 0 | 66.57 | 2 | 54.91 | 7 | 206.4 | 6 | 66.48 | 2 | 146.3 | 4 | 38.48 | 6 | 33.12 | 5 |
| FX1800 | 56.73 | 1 | 64.33 | 6 | 52.05 | 13 | 189.3 | 16 | 64.67 | 5 | 135.2 | 13 | 34.18 | 20 | 30.46 | 14 |
| FX380 | 45.90 | 24 | 55.81 | 22 | 34.93 | 68 | 120.3 | 82 | 46.09 | 48 | 64.11 | 138 | 17.00 | 140 | 13.88 | 150 |
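A geometric mean weights each of the eight viewsets equally regardless of its absolute frame rate, which is why it is used to summarize SPECviewperf results. A minimal Python sketch of the method follows. The text does not say which competitor set underlies the 11% figure, so this only compares against one system (the HP xw4600 FX4800) and makes no claim to reproduce that number. The percent helper matches how the frame-buffer table's % column is computed (FX5800 fps divided by the other card's fps, minus 1).

```python
import math


def geometric_mean(values):
    """Geometric mean via the mean of logarithms (avoids overflow on long lists)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pct_advantage(reference, other):
    """Percent by which `reference` outperforms `other` (frames per second, bigger is better)."""
    return (reference / other - 1) * 100


# Viewset order: 3DSMAX, CATIA, ENSIGHT, MAYA, PROENGINEER, SOLIDWORKS, TEAMCENTER, UGS
sun_fx5800 = [59.34, 68.81, 58.07, 246.09, 68.96, 152.01, 42.02, 36.04]
hp_fx4800 = [52.26, 63.26, 53.79, 226.82, 61.15, 131.31, 28.42, 33.43]

ratio = geometric_mean(sun_fx5800) / geometric_mean(hp_fx4800)
print(f"geometric-mean advantage over HP xw4600 FX4800: {(ratio - 1) * 100:.1f}%")

# Frame-buffer table: FX5800 vs FX380 on MAYA (219.4 vs 120.3 fps)
print(f"{pct_advantage(219.4, 120.3):.0f}%")
```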
Hardware Configuration:
Software Configuration:
SPECviewperf measures 3D graphics rendering performance of systems running under OpenGL. SPECviewperf is a synthetic benchmark designed to be a predictor of application performance and a measure of graphics subsystem performance (primarily graphics bus, driver and graphics hardware) and its impact on the system without the full overhead of an application. SPECviewperf reports performance in frames per second.
Please go here for a more complete description of the tests.
The SPECopcSM project group's SPECviewperf 10 is totally new performance evaluation software. In addition to features found in previous versions, it now provides the ability to compare performance of systems running in higher-quality graphics modes that use full-scene anti-aliasing, and measures how effectively graphics subsystems scale when running multithreaded graphics content. Since the SPECviewperf source and binaries have been upgraded to support changes, no comparisons should be made between past results and current results for viewsets running under SPECviewperf 10.
SPECviewperf 10 requires OpenGL 1.5 and a minimum of 1GB of system memory. It currently supports Windows 32/64.
SPEC® and the benchmark name SPECviewperf® are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Oct 18, 2009. For the latest SPECviewperf benchmark results, visit www.spec.org/gwpg.
The Sun Storage 6780 array outperforms the IBM DS5300 by 51% in price performance for SPC-2 benchmark using RAID 5 data protection.
The Sun Storage 6780 array outperforms the IBM DS5300 by 51% in price performance for SPC-2 benchmark using RAID 6 data protection.
The Sun Storage 6780 Array has 62% better performance than the Fujitsu 800/1100 and delivers a price performance advantage of 5.6x as measured by the SPC-2 benchmark.
The Sun Storage 6780 array with 8Gb connectivity improved performance by 36% over the 4Gb-connected solution as measured by the SPC-2 benchmark.
SPC-2 Performance Chart (in increasing price-performance order)
| Sponsor | System | SPC-2 MBPS | $/SPC-2 MBPS | ASU Capacity (GB) | TSC Price | Data Protection Level | Date | Results Identifier |
|---|---|---|---|---|---|---|---|---|
| Sun | SS6780 (8Gb) | 5,634.17 | $44.88 | 16,383.186 | $252,873 | RAID 5 | 10/27/09 | B00047 |
| IBM | DS5300 (8Gb) | 5,634.17 | $67.75 | 16,383.186 | $381,720 | RAID 5 | 10/21/09 | B00045 |
| Sun | SS6780 (8Gb) | 5,543.88 | $45.61 | 14,042.731 | $252,873 | RAID 6 | 10/27/09 | B00048 |
| IBM | DS5300 (8Gb) | 5,543.88 | $68.85 | 14,042.731 | $381,720 | RAID 6 | 10/21/09 | B00046 |
| Sun | SS6780 (4Gb) | 4,818.43 | $53.61 | 16,383.186 | $258,329 | RAID 5 | 02/03/09 | B00039 |
| IBM | DS5300 (4Gb) | 4,818.43 | $93.80 | 16,383.186 | $451,986 | RAID 5 | 09/25/08 | B00037 |
| Sun | SS6780 (4Gb) | 4,675.50 | $55.25 | 14,042.731 | $258,329 | RAID 6 | 02/03/09 | B00040 |
| IBM | DS5300 (4Gb) | 4,675.50 | $96.67 | 14,042.731 | $451,986 | RAID 6 | 09/25/08 | B00038 |
| Fujitsu | 800/1100 | 3,480.68 | $238.93 | 4,569.845 | $831,649 | Mirroring | 03/08/07 | B00019 |
SPC-2 MBPS = the performance metric
$/SPC-2 MBPS = the price-performance metric
ASU Capacity = the capacity metric
Data Protection Level = the data protection used in the tested configuration
TSC Price = the total price of the Tested Storage Configuration (TSC)
Results Identifier = a unique identifier for the result
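The price-performance figures and the percentages quoted above follow directly from the table: $/SPC-2 MBPS is the TSC price divided by SPC-2 MBPS. A small sketch, using only numbers from the table (the helper name is our own, for illustration), reproduces the RAID 5 comparison with IBM and the performance comparison with Fujitsu:

```python
# Recompute SPC-2 price-performance and the comparisons quoted above.
# All inputs are copied from the SPC-2 table; the helper name is illustrative.

def price_performance(tsc_price: float, mbps: float) -> float:
    """$/SPC-2 MBPS = TSC Price divided by SPC-2 MBPS (lower is better)."""
    return tsc_price / mbps

sun_raid5 = price_performance(252_873, 5_634.17)
ibm_raid5 = price_performance(381_720, 5_634.17)
print(f"Sun 6780 RAID 5:   ${sun_raid5:.2f} per SPC-2 MBPS")
print(f"IBM DS5300 RAID 5: ${ibm_raid5:.2f} per SPC-2 MBPS")

# Lower $/MBPS is better, so the advantage is IBM's cost relative to Sun's.
print(f"Price-performance advantage: {(ibm_raid5 / sun_raid5 - 1) * 100:.0f}%")

# Higher MBPS is better: Sun 6780 (8Gb, RAID 5) vs Fujitsu 800/1100.
print(f"Performance advantage vs Fujitsu: {(5_634.17 / 3_480.68 - 1) * 100:.0f}%")
```

This prints $44.88 and $67.75 per SPC-2 MBPS, a 51% price-performance advantage, and a 62% performance advantage, matching the claims above.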
Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.
Storage Configuration:
Server Configuration:
Software Configuration:
SPC-2, SPC-2 MBPS, $/SPC-2 MBPS are regular trademarks of Storage Performance Council (SPC). More info www.storageperformance.org. Sun Storage 6780 Array 5,634.17 SPC-2 MBPS, $/SPC-2 MBPS $44.88, ASU Capacity 16,383.186 GB, Protect RAID 5, Cost $252,873.00, Ident. B00047. Sun Storage 6780 Array 5,543.88 SPC-2 MBPS, $/SPC-2 MBPS $45.61, ASU Capacity 14,042.731 GB, Protect RAID 6, Cost $252,873.00, Ident. B00048.
See here for publication rules.
A Sun Blade 6048 Modular System with 12 Sun Blade X6275 server modules, clustered together with QDR InfiniBand and using a Lustre File System over QDR InfiniBand, showed performance improvements over an NFS file system when reading in Velocity, Epsilon, and Delta Slices and imaging 800 samples of various grid sizes using Reverse Time Migration.
This first table presents the initialization time for different numbers of processors and different problem sizes. The results, in seconds, show the advantage of the Lustre file system running over QDR InfiniBand compared with a simple NFS file system: at 24 nodes, Lustre initialization is roughly 12x to 23x faster, depending on grid size.
Initialization Time Performance Comparison: Reverse Time Migration, SMP Threads and MPI Mode

| Nodes | Procs | 125 x 1151 x 1231, 800 Samples: Lustre Time (sec) | 125 x 1151 x 1231, 800 Samples: NFS Time (sec) | 1243 x 1151 x 1231, 800 Samples: Lustre Time (sec) | 1243 x 1151 x 1231, 800 Samples: NFS Time (sec) | 2486 x 1151 x 1231, 800 Samples: Lustre Time (sec) | 2486 x 1151 x 1231, 800 Samples: NFS Time (sec) |
|---|---|---|---|---|---|---|---|
| 24 | 48 | 1.59 | 18.90 | 8.90 | 181.78 | 15.63 | 362.48 |
| 20 | 40 | 1.60 | 18.90 | 8.93 | 181.49 | 16.91 | 358.81 |
| 16 | 32 | 1.58 | 18.59 | 8.97 | 181.58 | 17.39 | 353.72 |
| 12 | 24 | 1.54 | 18.61 | 9.35 | 182.31 | 22.50 | 364.25 |
| 8 | 16 | 1.40 | 18.60 | 10.02 | 183.79 | | |
| 4 | 8 | 1.57 | 18.80 | | | | |
| 2 | 4 | 2.54 | 19.31 | | | | |
| 1 | 2 | 4.54 | 20.34 | | | | |
This next table presents the total application run time for different numbers of processors and different problem sizes. It shows that for larger problems, the Lustre file system running over QDR InfiniBand provided a large performance advantage compared with a simple NFS file system: at 24 nodes, the total run time for the two larger grids is roughly halved.
Total Application Performance Comparison: Reverse Time Migration, SMP Threads and MPI Mode

| Nodes | Procs | 125 x 1151 x 1231, 800 Samples: Lustre Time (sec) | 125 x 1151 x 1231, 800 Samples: NFS Time (sec) | 1243 x 1151 x 1231, 800 Samples: Lustre Time (sec) | 1243 x 1151 x 1231, 800 Samples: NFS Time (sec) | 2486 x 1151 x 1231, 800 Samples: Lustre Time (sec) | 2486 x 1151 x 1231, 800 Samples: NFS Time (sec) |
|---|---|---|---|---|---|---|---|
| 24 | 48 | 251.48 | 273.79 | 553.75 | 1125.02 | 1107.66 | 2310.25 |
| 20 | 40 | 232.00 | 253.63 | 658.54 | 971.65 | 1143.47 | 2062.80 |
| 16 | 32 | 227.91 | 209.66 | 826.37 | 1003.81 | 1309.32 | 2348.60 |
| 12 | 24 | 217.77 | 234.61 | 884.27 | 1027.23 | 1579.95 | 3877.88 |
| 8 | 16 | 223.38 | 203.14 | 1200.71 | 1362.42 | | |
| 4 | 8 | 341.14 | 272.68 | | | | |
| 2 | 4 | 605.62 | 625.25 | | | | |
| 1 | 2 | 892.40 | 841.94 | | | | |
The following table presents the run time and speedup of just the computational kernel for different processor counts and the three problem sizes. The scaling results use the smallest number of nodes run as the baseline reference point; where that baseline is more than one node (marked *), its speedup is taken to equal its node count, i.e. perfect scaling up to the baseline is assumed.
Computational Kernel Performance and Scalability: Reverse Time Migration, SMP Threads and MPI Mode

| Nodes | Procs | 125 x 1151 x 1231, 800 Samples: X6275 Time (sec) | 125 x 1151 x 1231, 800 Samples: Speedup vs. 1 Node | 1243 x 1151 x 1231, 800 Samples: X6275 Time (sec) | 1243 x 1151 x 1231, 800 Samples: Speedup vs. 1 Node | 2486 x 1151 x 1231, 800 Samples: X6275 Time (sec) | 2486 x 1151 x 1231, 800 Samples: Speedup vs. 1 Node |
|---|---|---|---|---|---|---|---|
| 24 | 48 | 35.38 | 13.7 | 210.82 | 24.5 | 427.40 | 24.0 |
| 20 | 40 | 35.02 | 13.8 | 255.27 | 20.2 | 517.03 | 19.8 |
| 16 | 32 | 41.76 | 11.6 | 317.96 | 16.2 | 646.22 | 15.8 |
| 12 | 24 | 49.53 | 9.8 | 422.17 | 12.2 | 853.37 | 12.0* |
| 8 | 16 | 62.34 | 7.8 | 645.27 | 8.0* | | |
| 4 | 8 | 124.66 | 3.9 | | | | |
| 2 | 4 | 238.80 | 2.0 | | | | |
| 1 | 2 | 484.89 | 1.0 | | | | |
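The speedup columns can be reproduced from the run times. The sketch below is our own reconstruction of the calculation, not the benchmark team's script; it assumes the smallest node count run is the baseline and, when that baseline is more than one node, that it scaled perfectly (which is what the starred entries imply):

```python
def speedup(times_by_nodes: dict[int, float]) -> dict[int, float]:
    """Speedup vs. one node, rounded to one decimal place.

    The smallest node count run is the baseline. If it is more than one
    node, that run is assumed to scale perfectly (speedup = node count),
    which is how the starred table entries are obtained.
    """
    base_nodes = min(times_by_nodes)
    base_time = times_by_nodes[base_nodes]
    return {n: round(base_nodes * base_time / t, 1)
            for n, t in sorted(times_by_nodes.items())}

# Computational kernel times (sec) for the 1243 x 1151 x 1231 grid.
kernel_1243 = {8: 645.27, 12: 422.17, 16: 317.96, 20: 255.27, 24: 210.82}
print(speedup(kernel_1243))  # {8: 8.0, 12: 12.2, 16: 16.2, 20: 20.2, 24: 24.5}
```

The same function applied to the Lustre total run times gives the total-application scalability figures.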
The last table presents the speedup of the total application (based on the Lustre run times) for different processor counts and the three problem sizes. The scaling results use the smallest number of nodes run as the baseline reference point; where that baseline is more than one node (marked *), its speedup is taken to equal its node count.
Total Application Scalability Comparison: Reverse Time Migration, SMP Threads and MPI Mode

| Nodes | Procs | 125 x 1151 x 1231, 800 Samples: Lustre Speedup vs. 1 Node | 1243 x 1151 x 1231, 800 Samples: Lustre Speedup vs. 1 Node | 2486 x 1151 x 1231, 800 Samples: Lustre Speedup vs. 1 Node |
|---|---|---|---|---|
| 24 | 48 | 3.6 | 17.3 | 17.1 |
| 20 | 40 | 3.8 | 14.6 | 16.6 |
| 16 | 32 | 4.0 | 11.6 | 14.5 |
| 12 | 24 | 4.1 | 10.9 | 12.0* |
| 8 | 16 | 4.0 | 8.0* | |
| 4 | 8 | 2.6 | | |
| 2 | 4 | 1.5 | | |
| 1 | 2 | 1.0 | | |
Note: Hyper-Threading is enabled, running 16 threads per node.
Software Configuration:
The Reverse Time Migration (RTM) is currently the most popular seismic processing algorithm because of its ability to produce quality images of complex substructures. It can accurately image steep dips that cannot be imaged correctly with traditional Kirchhoff 3D or frequency-domain algorithms. The Wave Equation Migration (WEM) can image steep dips but does not produce the image quality that can be achieved by the RTM. However, the increased computational complexity of the RTM over the WEM introduces new areas for performance optimization. The current trend in seismic processing is to perform iterative migrations on wide azimuth marine data surveys using the Reverse Time Migration.
This Reverse Time Migration code reads in processing parameters that define the grid dimensions, number of threads, number of processors, imaging condition, and various other parameters. The master node calculates the memory requirements to determine whether there is sufficient memory to process the migration "in-core". The domain decomposition across the nodes is determined by dividing the first grid dimension by the number of nodes. Each node then reads in its section of the Velocity, Delta, and Epsilon Slices using MPI IO reads. Three state vectors (previous, current, and next) are created for each of the source and receiver wavefields. The processing steps through the input trace data, reading both the receiver and source data for each of the 800 time steps. The source wavefield is propagated forward in time, and the receiver wavefield is propagated backward in time and cross-correlated with it. The computational kernel applies a 13-point stencil to the subgrid held in each node's memory, using OpenMP parallelism. Afterwards, conditioning and absorption are applied, and boundary data is exchanged with neighboring nodes at each time step. The final image is written out using MPI IO.
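The master node's two setup steps, the in-core memory check and the split of the first grid dimension across nodes, can be sketched as follows. This is an illustration under stated assumptions, not the benchmark code: values are assumed to be 4-byte single-precision floats, leftover planes are assumed to go to the lowest-numbered nodes, stencil halo planes are ignored, and the array count and per-node memory in the example are made-up values.

```python
BYTES_PER_VALUE = 4  # assumption: single-precision floats

def slabs(nx: int, n_nodes: int) -> list[tuple[int, int]]:
    """Split planes 0..nx-1 into contiguous [start, stop) ranges, one per node.
    Leftover planes go to the lowest-numbered nodes (an assumption)."""
    base, extra = divmod(nx, n_nodes)
    ranges, start = [], 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def fits_in_core(dims, n_arrays, n_nodes, mem_per_node_bytes) -> bool:
    """True if the largest slab of n_arrays grid-sized arrays fits on one node.
    Halo planes needed by the stencil are ignored in this sketch."""
    nx, ny, nz = dims
    largest = max(stop - start for start, stop in slabs(nx, n_nodes))
    need = largest * ny * nz * BYTES_PER_VALUE * n_arrays
    return need <= mem_per_node_bytes

grid = (1243, 1151, 1231)
parts = slabs(grid[0], 24)
print(parts[0], parts[-1])  # (0, 52) (1192, 1243)
# Hypothetical inputs: 13 grid-sized arrays and 24 GiB per node.
print(fits_in_core(grid, n_arrays=13, n_nodes=24, mem_per_node_bytes=24 * 2**30))
```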
Total memory requirements for each grid size:
For this phase of benchmarking, the focus was on optimizing data initialization. In the next phase, trace data reading will be optimized so that each node reads in only its section of interest. In this benchmark, trace data reading skews the Total Application Performance as the number of nodes increases. This will be addressed in the next phase of benchmarking, along with further node-level optimization with OpenMP. The IO description for this benchmark phase on each grid size:
A prominent seismic processing algorithm, Reverse Time Migration with Optimal Checkpointing in SMP "THREADS" mode, was tested on a Sun Fire X4270 server configured with four high-performance 15K RPM SAS hard disk drives (HDDs) and a Sun Storage F5100 Flash Array. This benchmark compares I/O devices for checkpointing wave state information while processing a production seismic migration.
These results show that the new trend in seismic processing, running iterative Reverse Time Migrations and migration playback, is practical. This is made possible by Sun FlashFire technology, which provides good checkpointing speeds (up to 2.2x faster than the HDD configuration in the tables below) without additional disk cache memory. The application can use all of the memory within a node without reserving the checkpoint cache buffers that HDDs would need for good performance. Similarly, larger problem sizes can be solved without increasing the memory footprint of each computational node.
Reverse Time Migration Optimal Checkpointing, SMP Threads Mode: Grid Size 800 x 1151 x 1231, 800 Samples, 60 GB of Memory

| Number of Checkpoints | HDD Put Time (secs) | HDD Get Time (secs) | HDD Total Time (secs) | F5100 Put Time (secs) | F5100 Get Time (secs) | F5100 Total Time (secs) | F5100 Speedup |
|---|---|---|---|---|---|---|---|
| 80 | 660.8 | 25.8 | 686.6 | 277.4 | 40.2 | 317.6 | 2.2x |
| 400 | 1615.6 | 382.3 | 1997.9 | 989.5 | 269.7 | 1259.2 | 1.6x |
Reverse Time Migration Optimal Checkpointing, SMP Threads Mode: Grid Size 125 x 1151 x 1231, 800 Samples, 9 GB of Memory

| Number of Checkpoints | HDD Put Time (secs) | HDD Get Time (secs) | HDD Total Time (secs) | F5100 Put Time (secs) | F5100 Get Time (secs) | F5100 Total Time (secs) | F5100 Speedup |
|---|---|---|---|---|---|---|---|
| 80 | 10.2 | 0.2 | 10.4 | 8.0 | 0.2 | 8.2 | 1.3x |
| 400 | 52.3 | 0.4 | 52.7 | 45.2 | 0.3 | 45.5 | 1.2x |
| 800 | 102.6 | 0.7 | 103.3 | 91.8 | 0.6 | 92.4 | 1.1x |
Reverse Time Migration Optimal Checkpointing, Single-Threaded vs. Multithreaded I/O Performance: Grid Size 125 x 1151 x 1231, 800 Samples, 9 GB of Memory

| Number of Checkpoints | Single-Threaded F5100 Total Time (secs) | Multithreaded F5100 Total Time (secs) | Multithreaded Speedup |
|---|---|---|---|
| 80 | 105.3 | 8.2 | 12.8x |
| 400 | 482.9 | 45.5 | 10.6x |
| 800 | 963.5 | 92.4 | 10.4x |
Note: Hyper-Threading and Turbo Mode were enabled, running 16 threads per node.
Software Configuration:
The Reverse Time Migration (RTM) is currently the most popular seismic processing algorithm because of its ability to produce quality images of complex substructures. It can accurately image steep dips that cannot be imaged correctly with traditional Kirchhoff 3D or frequency-domain algorithms. The Wave Equation Migration (WEM) can image steep dips but does not produce the image quality that can be achieved by the RTM. However, the increased computational complexity of the RTM over the WEM introduces new areas for performance optimization. The current trend in seismic processing is to perform iterative migrations on wide azimuth marine data surveys using the Reverse Time Migration.
The Reverse Time Migration with Optimal Checkpointing was introduced so that large migrations could be performed within the limited memory of x86 cluster nodes. The idea is to keep only three wavestate vectors in memory for each of the source and receiver wavefields, instead of holding the entire wavefields in memory for the duration of processing. With the Sun Storage F5100 Flash Array, this can be done with little performance penalty to the full migration time. Checkpointing also makes it possible to play back migrations, which facilitates iterative migrations.
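The three wavestate vectors correspond to the previous, current, and next time levels of a second-order-in-time finite-difference update. The minimal pure-Python sketch below assumes an isotropic, constant-density acoustic wave equation and standard fourth-order central-difference weights, which give a 13-point star stencil (center plus two neighbors on each side in x, y, and z). The benchmark's actual kernel, which also reads Epsilon and Delta slices and applies absorbing boundaries, is not shown in this post, so treat this as an illustration only.

```python
# Assumed: standard 4th-order central-difference weights for offsets 0, +/-1, +/-2.
C = (-5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0)

def idx(i: int, j: int, k: int, ny: int, nz: int) -> int:
    """Flat index of grid point (i, j, k) in a row-major nx*ny*nz list."""
    return (i * ny + j) * nz + k

def laplacian(p, i, j, k, dims, h):
    """13-point Laplacian; valid only at points >= 2 cells from every edge."""
    _, ny, nz = dims
    total = 3.0 * C[0] * p[idx(i, j, k, ny, nz)]
    for r in (1, 2):
        total += C[r] * (p[idx(i + r, j, k, ny, nz)] + p[idx(i - r, j, k, ny, nz)]
                         + p[idx(i, j + r, k, ny, nz)] + p[idx(i, j - r, k, ny, nz)]
                         + p[idx(i, j, k + r, ny, nz)] + p[idx(i, j, k - r, ny, nz)])
    return total / (h * h)

def step(prev, curr, vel, dims, dt, h):
    """next = 2*curr - prev + (v*dt)^2 * laplacian(curr) on the interior.
    Edge planes stay zero; absorbing boundaries are omitted in this sketch."""
    nx, ny, nz = dims
    nxt = [0.0] * len(curr)
    for i in range(2, nx - 2):
        for j in range(2, ny - 2):
            for k in range(2, nz - 2):
                n = idx(i, j, k, ny, nz)
                nxt[n] = (2.0 * curr[n] - prev[n]
                          + (vel[n] * dt) ** 2 * laplacian(curr, i, j, k, dims, h))
    return nxt

# Tiny demo: a point impulse in a uniform 2000 m/s medium (illustrative values).
dims, h, dt = (9, 9, 9), 10.0, 1e-3
size = dims[0] * dims[1] * dims[2]
vel = [2000.0] * size
prev, curr = [0.0] * size, [0.0] * size
curr[idx(4, 4, 4, dims[1], dims[2])] = 1.0
for _ in range(3):
    # Rotate the three wavestate vectors: prev <- curr <- next.
    prev, curr = curr, step(prev, curr, vel, dims, dt, h)
```

Only three grid-sized arrays per wavefield are ever alive at once, which is what keeps the memory footprint small.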
The Reverse Time Migration with Optimal Checkpointing builds on the optimal checkpointing algorithm designed by Griewank (Griewank, 1992; Blanch et al., 1998; Griewank, 2000; Griewank and Walther, 2000; Akcelik et al., 2003).
For the purposes of this benchmark, this implementation of the Reverse Time Migration with Optimal Checkpointing does not fully implement the optimal memory buffer scheme proposed by Griewank. The intent is to compare various I/O alternatives for saving wave state data for each node in a compute cluster.
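The basic save, restore, and recompute idea behind checkpointing can be shown with a simpler periodic (uniform-interval) scheme. This is a swapped-in technique for illustration, not Griewank's optimal schedule and not the benchmark's implementation; the 1D toy propagator and every function name below are hypothetical.

```python
def toy_step(prev, curr, c2=0.25):
    """1D 3-point wave update with fixed zero ends; c2 = (v*dt/h)^2 (assumed)."""
    nxt = [0.0] * len(curr)
    for i in range(1, len(curr) - 1):
        nxt[i] = (2.0 * curr[i] - prev[i]
                  + c2 * (curr[i - 1] - 2.0 * curr[i] + curr[i + 1]))
    return nxt

def forward_with_checkpoints(prev, curr, n_steps, every):
    """Run n_steps forward, saving ("put") the (prev, curr) pair every
    `every` steps. The dict stands in for checkpoint storage (HDD or flash)."""
    store = {0: (prev[:], curr[:])}
    for t in range(1, n_steps + 1):
        prev, curr = curr, toy_step(prev, curr)
        if t % every == 0:
            store[t] = (prev[:], curr[:])
    return store

def state_at(store, t, every):
    """Recover the wavestate at step t: restore ("get") the nearest earlier
    checkpoint, then recompute forward the remaining t - t0 steps."""
    t0 = (t // every) * every
    prev, curr = store[t0]
    for _ in range(t - t0):
        prev, curr = curr, toy_step(prev, curr)
    return curr

n, n_steps, every = 21, 40, 8
source = [0.0] * n
source[n // 2] = 1.0  # initial impulse
store = forward_with_checkpoints([0.0] * n, source, n_steps, every)
# Reverse-time pass: recover source states from the last step back to step 1.
# A real RTM would cross-correlate each one with the receiver wavefield here.
recovered = [state_at(store, t, every) for t in range(n_steps, 0, -1)]
print(sorted(store))  # [0, 8, 16, 24, 32, 40]
```

Storing 6 checkpoint pairs instead of 41 full states trades memory for recomputation; an optimal schedule such as Griewank's chooses checkpoint positions to minimize that recomputation for a given number of slots.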
This benchmark measures the time to perform the wave state saves and restores while simultaneously processing the wave state data.
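One way to overlap wave state saves with processing is to hand snapshots to a background writer thread through a bounded queue while the compute loop continues. The sketch below is an assumption about how such overlap could be arranged, not the benchmark's implementation; the file naming, the float32 packing, the single writer thread, and the stand-in "update" are all hypothetical.

```python
import os
import queue
import struct
import tempfile
import threading
import time

def writer(q, directory, put_times):
    """Background thread: write each queued snapshot to its own file."""
    while True:
        item = q.get()
        if item is None:  # sentinel: no more checkpoints
            break
        step, values = item
        t0 = time.perf_counter()
        path = os.path.join(directory, f"ckpt_{step:05d}.bin")
        with open(path, "wb") as f:
            f.write(struct.pack(f"{len(values)}f", *values))
        put_times.append(time.perf_counter() - t0)

def get(directory, step, n):
    """Restore one checkpoint (a "get") as a list of floats."""
    with open(os.path.join(directory, f"ckpt_{step:05d}.bin"), "rb") as f:
        return list(struct.unpack(f"{n}f", f.read()))

with tempfile.TemporaryDirectory() as d:
    q, put_times = queue.Queue(maxsize=4), []  # bounded queue limits memory use
    th = threading.Thread(target=writer, args=(q, d, put_times))
    th.start()
    state = [0.0] * 1000
    for step in range(1, 81):
        state = [x + 1.0 for x in state]  # stand-in for the stencil update
        if step % 10 == 0:
            q.put((step, state[:]))  # snapshot; compute continues immediately
    q.put(None)
    th.join()
    restored = get(d, 80, len(state))

print(f"{len(put_times)} puts, mean {sum(put_times) / len(put_times) * 1e3:.3f} ms")
```

The bounded queue applies back-pressure: if the storage device cannot keep up, the compute loop blocks on `q.put` instead of buffering unbounded snapshots in memory.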
This wiki hosts the combined wisdom of many performance engineers from across Sun. It covers hardware, software, ZFS, Oracle, and various other performance topics, and it attempts to categorize and present that information so it is easy to find and use. It is just getting started, so please let us know of any topics that would be useful.
Link:
http://blogs.sun.com/glennf/entry/exadata_v2_oracle_grid_consolidation
This blog copyright 2009 by John Henning