Roch (rhymes with Spock) Bourbonnais : Kernel Performance Engineering
Bizarre ! Vous avez dit Bizarre ?
Friday, June 19, 2009
ZFS and OpenStorage things you might have missed
Here are a few things that caught my attention.

First off, a great post showing the scalability of a 7410 with SAS Grid Computing all the way to 900MB/sec+ of throughput through a single IP interface.

SAS Grid and OpenStorage 7410

You'd think a CPU benchmark would not be sped up by filesystem considerations, but think again as you read this detailed study of

ZFS accelerated SPEC CPU

Also keep an eye on the MySQL best practices from Neel and his cool MySQL/InnoDB tools: MySQL Inno DB best practices, Inniostat & MySQL Truss

It is quite nice to see that all the engineering effort is really coming together now. The ZFS we have today has made incredible strides in the last year.

posted by roch June 19 2009, 09:22:06 AM MEST

Thursday, June 11, 2009
Compared Performance of Sun 7000 Unified Storage Array Line
The Sun Storage 7410 Unified Storage Array provides high performance for NAS environments and can be used for a wide variety of applications. With a _single_ 10 GbE connection, the Sun Storage 7410 delivers the line speed of that 10 GbE link.



All those numbers characterise a single head of the clusterable 7410. The 7000 clustering technology stores all data in dual-attached disk trays, and no state is shared between cluster heads (see Sun 7000 Storage clusters). This means that an active-active cluster of two healthy 7410 heads will deliver 2X the performance posted here.

Also note that the performance posted here represents what is achieved under a very tightly constrained workload (see Designing 11 Storage metric); it does not represent the performance limits of the systems. This is testing 1 x 10 GbE port only; each product can have 2 or 4 10 GbE ports, and by running load across multiple ports the server can deliver even higher performance. Achieving maximum performance is a separate exercise, done extremely well by my friend Brendan.

Measurement Method

To measure our performance we used the open source Filebench tool, accessible from SourceForge (Filebench on solarisinternals.com). Measuring the performance of NAS storage is not an easy task. One has to deal with the client-side cache, which needs to be bypassed, with the synchronisation of multiple clients, and with client-side page flushing daemons, which can turn asynchronous workloads into synchronous ones. Because our Storage 7000 line can have such large caches (up to 128GB of RAM and more than 500GB of secondary caches) and we wanted to test disk responses, we needed to find backdoor ways to flush those caches on the servers. Read Amithaba's Filebench Kit entry on the topic, in which he posts a link to the toolkit used to produce the numbers.

We recently released our first major software update, 2009.Q2, and along with it a new lower-cost clusterable 96 TB storage system, the 7310.

We compare here the numbers of a 7310 running the latest software release with those previously obtained for the 7410, 7210 and 7110 systems, each attached to a pool of 18 to 20 clients over a single 10 GbE interface using regular Ethernet frames (1500 bytes). By the way, looking at Brendan's results, I encourage you to upgrade to Jumbo Frames for even more performance, and note that our servers can drive two 10 GbE ports at line speed.
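To put the 1500-byte frame size in perspective, here is a minimal sketch of the per-frame wire overhead. It assumes TCP over IPv4 with no TCP options and no VLAN tag, and a 9000-byte MTU for Jumbo Frames; these assumptions and the function names are mine, not part of the test setup. It only captures the framing cost, not the reduced per-packet CPU work, which is likely the larger benefit of Jumbo Frames.

```python
# On-wire bytes added around every Ethernet payload (standard Ethernet framing):
# preamble + start delimiter (8), MAC header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
# Assumption: plain IPv4 + TCP headers, no options.
IP_TCP_HEADERS = 20 + 20


def payload_efficiency(mtu: int) -> float:
    """Fraction of the wire time that carries application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def max_payload_mb_per_sec(link_gbps: float, mtu: int) -> float:
    """Upper bound on payload throughput in MB/sec (1 MB = 10**6 bytes)."""
    return link_gbps * 1e9 / 8 / 1e6 * payload_efficiency(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: efficiency {payload_efficiency(mtu):.4f}, "
              f"at most {max_payload_mb_per_sec(10, mtu):.0f} MB/sec on 10 GbE")
```

Under these assumptions a single 10 GbE port tops out a little below 1.2 GB/sec of payload with regular frames, which is the ceiling "line speed" should be read against.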

Tested Systems and Metrics

The tested setups are:
        Sun Storage 7410, 4 x quad core: 16 cores @ 2.3 GHz AMD.
        128 GB of host memory.
        1 dual port 10 GbE Network Atlas Card. NXGE driver. 1500 MTU
        Streaming Tests:
        2 x J4400 JBOD, 44 x 500 GB SATA drives 7.2K RPM, Mirrored pool,
        3 Write Optimized 18 GB SSD, 2 Read Optimized 100 GB SSD.
        IOPS tests:
        12 x J4400 JBOD, 280 x 500 GB SATA drives 7.2K RPM, Mirrored pool,
        272 Data drives + 8 spares.
        8 Mirrored Write Optimized 18 GB SSD, 6 Read Optimized 100 GB SSD.
        FW OS : ak/generic@2008.11.20,1-0

        Sun Storage 7310, 2 x quad core: 8 cores @ 2.3 GHz AMD.
        32 GB of host memory.
        1 dual port 10 GbE Network Atlas Card (1 port used). NXGE driver. 1500 MTU
        4 x J4400 JBOD for a total of 92 SATA drives 7.2K RPM
        43 mirrored pairs
        4 Write Optimized 18 GB SSD, 2 Read Optimized 100 GB SSD.
        FW OS : Q2 2009.04.10.2.0,1-1.15

        Sun Storage 7210, 2 x quad core: 8 cores @ 2.3 GHz AMD.
        32 GB of host memory.
        1 dual port 10 GbE Network Atlas Card (1 port used). NXGE driver. 1500 MTU
        44 x 500 GB SATA drives 7.2K RPM, Mirrored pool,
        2 Write Optimized 18 GB SSD.
        FW OS : ak/generic@2008.11.20,1-0

        Sun Storage 7110, 2 x quad core Opteron: 8 cores @ 2.3 GHz AMD.
        8 GB of host memory.
        1 dual port 10 GbE Network Atlas Card (1 port used). NXGE driver. 1500 MTU
        12 x 146 GB SAS drives, 10K RPM, in 3+1 RAID-Z pool.
        FW OS : ak/generic@2008.11.20,1-0


The newly released 7310 was tested with the most recent software revision, and that certainly gives the 7310 an edge over its peers. The 7410, on the other hand, was measured here managing a much larger contingent of storage, including mirrored Logzillas and 3 times as many JBODs, and that is expected to account for some of the performance delta being observed.



There are 6 read tests, 2 write tests and 1 synchronous write test, which overwrites its data files as a database would. A final filecreate test completes the metrics. Each test executes against a 20GB working set _per client_ times 18 to 20 clients. There are 4 sets used in total, running over independent shares, for a total of 80GB per client. So before the actual runs are taken, we create all working sets, or 1.6 TB of precreated data. Then before each run, we clear all caches on the clients and server.
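The sizing above is simple arithmetic; a small sketch makes it explicit. The constant and function names are illustrative, and I assume decimal units (1 TB = 1000 GB).

```python
GB_PER_WORKING_SET = 20   # per client, per test set
WORKING_SETS = 4          # independent shares used across the tests


def per_client_gb() -> int:
    """Data precreated for one client across all working sets."""
    return GB_PER_WORKING_SET * WORKING_SETS


def total_precreated_tb(clients: int) -> float:
    """Data precreated for the whole client pool, in TB (1 TB = 1000 GB)."""
    return clients * per_client_gb() / 1000


if __name__ == "__main__":
    for clients in (18, 20):
        print(f"{clients} clients: {per_client_gb()} GB each, "
              f"{total_precreated_tb(clients):.2f} TB in total")
```

With the full pool of 20 clients this gives the 1.6 TB quoted above; with 18 clients it comes out slightly lower.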

In each of the 3 groups of 2 read tests, the first run (cold) benefits from no caching at all, and the throughput delivered to the client over the network is observed to come from disk. That run lasts N seconds and primes data in the storage caches. A second run (non-cold) is then started after clearing the client-side caches. Those tests see 100% of the data delivered over the network link, but not all of it comes off the disks. The streaming tests race through the cached data and then finish by reading from disk. The random read test can also benefit from an increasing share of cached responses as the test progresses. The exact caching characteristics of the 7000 line depend on a large number of parameters, including your application's access pattern. The numbers here reflect the performance of a fully randomized test over 20GB per client x 20 clients, or a 400GB working set. Upcoming studies will include more data (showing even higher performance) for workloads with a higher cache hit ratio than those used here.

In a Storage 7000 server, disks are grouped together in one pool and then individual shares are created. Each share has access to all disk resources, subject to any quota (a maximum) and reservation (a minimum) that might be set. One important setup parameter associated with each share is the DB record size. It is generally better to use 8K records for IOPS tests and 128K records for streaming tests. The record size can be set dynamically based on expected usage.

The tests shown here were run with NFSv4, the default for Solaris clients (NFSv3 is expected to come out slightly better). The clients were running Solaris 10, with tcp_recv_hiwat tuned to 400K and dopageflush=0 to prevent buffered writes from being converted into synchronous writes.

Compared Results of the 7000 Storage Line







Analysis



The data shows that the entire Sun Storage 7000 line is made of throughput workhorses, delivering 10 Gbps-level NAS services per cluster head node, using a single network interface and a single IP address for easy integration into your existing network.

As with other storage technologies, streaming writes require more involvement from the storage controller, and this leads to about 50% less write throughput compared to read throughput.

The use of write-optimized SSDs in the 7410, 7310 and 7210 also gives these systems very high synchronous write capabilities. This is one of the most interesting results, as it maps to database performance. The ability to sustain 24000 O_DSYNC writes per second at 192MB/sec of synchronized user data using only 48 inexpensive SATA disks and 3 write-optimized SSDs is one of the many great performance characteristics of this novel storage system.
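Dividing 192MB/sec by 24000 writes per second gives roughly 8 KB per write, a database-like record size. As an illustration of what such a workload looks like from a client, here is a minimal sketch of an O_DSYNC overwrite loop. It is not the Filebench workload used for these results: the helper name, the 8192-byte record and the file layout are my assumptions, and it needs a Unix-like system for `os.pwrite`.

```python
import os
import random
import time

RECORD_SIZE = 8192  # assumed record size, matching ~8 KB per write


def sync_overwrite(path: str, n_writes: int, seed: int = 0) -> float:
    """Overwrite random records of an existing file with O_DSYNC.

    Each write returns only once the data is on stable storage.
    Returns the achieved writes per second.
    """
    records = os.path.getsize(path) // RECORD_SIZE
    if records == 0:
        raise ValueError("file must hold at least one record")
    # O_DSYNC is missing on some platforms; fall back to a plain open there.
    flags = os.O_WRONLY | getattr(os, "O_DSYNC", 0)
    rng = random.Random(seed)
    buf = b"x" * RECORD_SIZE
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            # Overwrite in place: the file never grows, as with a database.
            os.pwrite(fd, buf, rng.randrange(records) * RECORD_SIZE)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_writes / elapsed if elapsed > 0 else float("inf")
```

Pointing `path` at a precreated file on an NFS mount of a share would exercise the synchronous write path; on a local filesystem it only measures the local disk.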

Random read tests generally map directly to individual disk capabilities and are a measure of total disk rotations. The cold runs show that all our platforms are delivering data at the expected 100 IOPS per spindle for those SATA disks. Recall that our offering is based on economical, energy-efficient 7.2K RPM disk technology. For cold random reads, a mirrored pair of 2 x 7.2K RPM disks offers about the same total disk rotation (and IOPS) as an expensive and power-hungry 15K RPM disk, but in a much more economical package.
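The rotation argument can be checked with a few lines. This sketch uses the 100 IOPS per spindle and the 272 data drives of the 7410 IOPS configuration quoted in this post; the function names are illustrative, and the printed IOPS figure is the expectation from that rule of thumb, not a measured result.

```python
def rotations_per_sec(rpm: int, disks: int = 1) -> float:
    """Total platter rotations per second across a group of disks."""
    return rpm * disks / 60


def expected_cold_read_iops(data_drives: int, iops_per_spindle: int = 100) -> int:
    """Rule-of-thumb cold random read capability of a pool."""
    return data_drives * iops_per_spindle


if __name__ == "__main__":
    print("mirrored pair of 7.2K RPM:", rotations_per_sec(7200, disks=2), "rot/sec")
    print("single 15K RPM disk:     ", rotations_per_sec(15000), "rot/sec")
    print("272 data drives:         ", expected_cold_read_iops(272), "IOPS expected")
```

A mirrored pair can serve reads from either side, which is why both spindles count toward random read capability.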

Moreover, the difference between the warm and cold random read runs shows that the Hybrid Storage Pool (HSP) provides a 30% boost even on this workload, which randomly addresses a 400GB working set with 128GB of controller cache. The effective boost from the HSP can be much greater depending on the cacheability of the workload.
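For intuition about why a uniformly random workload limits the boost, here is a rough upper-bound model. It rests on assumptions that are mine, not measurements: accesses are uniform over the working set, all 128GB of memory is available and fully primed as cache, cache hits cost nothing, and the read-optimized SSDs are ignored.

```python
def uniform_hit_ratio(cache_gb: float, working_set_gb: float) -> float:
    """Best-case hit ratio when every block is equally likely to be read."""
    return min(1.0, cache_gb / working_set_gb)


def max_boost(hit_ratio: float) -> float:
    """Throughput multiplier if only misses consume disk time."""
    if hit_ratio >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - hit_ratio)


if __name__ == "__main__":
    h = uniform_hit_ratio(128, 400)
    print(f"best-case hit ratio: {h:.2f}")
    print(f"best-case boost:     {max_boost(h):.2f}x")
```

Under these idealized assumptions the bound sits somewhat above the observed 30%, which is plausible for a cache that is still warming during the run. A workload with a skewed access pattern would see a much higher hit ratio for the same cache size.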

If we consider an organisation in which the average mail message is 8K in size, our results show that we could consolidate 100000 employees on a single 7410, where each employee accesses new data every 3.6 seconds with a 70ms response time.
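The consolidation claim follows from dividing the user count by the access interval. The sketch below redoes that arithmetic; the concurrency estimate via Little's law is my addition and is not a figure from the test results.

```python
def required_iops(users: int, seconds_between_accesses: float) -> float:
    """Aggregate request rate if every user issues one read per interval."""
    return users / seconds_between_accesses


def outstanding_requests(iops: float, response_time_s: float) -> float:
    """Little's law: average requests in flight = arrival rate x latency."""
    return iops * response_time_s


if __name__ == "__main__":
    iops = required_iops(100_000, 3.6)
    print(f"required rate:      {iops:,.0f} reads/sec of 8K messages")
    print(f"requests in flight: {outstanding_requests(iops, 0.070):,.0f}")
```

The required rate lands close to what 272 data drives deliver at about 100 cold random reads per second each, which is where the 100000-employee figure comes from.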

Messaging systems are also big consumers of file creations, and I've shown in the past how efficient ZFS can be at creating small files (Need Inodes ?). For the NFS protocol, file creation is a straining workload, but the 7000 storage line comes out not too bad, with more than 5000 filecreates per second per storage controller.

Conclusion

Performance can never be summarised with a few numbers, and we have just begun to scratch the surface here. The numbers presented here, along with the disruptive pricing of the Hybrid Storage Pool, will, I hope, go a long way to show the incredible power of the Open Storage architecture being proposed. And keep in mind that this performance is achievable using less expensive, less power-hungry SATA drives, and that all the data services (NFS, CIFS, iSCSI, FTP, HTTP, etc.) offered by our Sun Storage 7000 servers are available at 0 additional software cost to you.

Disclosure Statement: Sun Microsystems generated results using Filebench. Results reported 11/10/08 and 26/05/2009. Analysis done on June 6 2009.

posted by roch June 11 2009, 09:44:22 PM MEST