Monday Nov 10, 2008

Last Wednesday, Sun released results on consolidating Web2.0 infrastructure on Sun CMT servers. The work used Sun's own Web2.0 benchmarking kit, which is based on the Olio toolkit. Those results show consolidated Web2.0 tiers running well and taking advantage of the Solaris/CMT combination. So what about consolidating storage within this framework?

With the Sun Storage 7410, ease of storage management, reliability, and performance make it an intriguing choice for consolidating Web2.0 storage. So we decided to take a look at whether this device might be a good fit. My environment consisted of a Sun Fire T5120 for the web tier consolidation, and the 7410 with 1x J4400 and 24x 700GB SATA drives.

If I'm going to choose a NAS device for consolidating storage, it has to be reliable and easy to use. For reliability, the 7410 uses ZFS; I won't rehash the features that make it a great choice. I was more interested in ease of use: how quickly can I get a reliable filesystem up and exported to my client?

The answer: pretty fast. Setting up a ZFS filesystem was already very easy, and the 7410's management front end means I don't even need to know that much.

  • Selected the disks to use, the RAID configuration, and spares; a zpool was created
  • Selected the default project
  • Set permissions and size quota
  • Set a mount point
  • Done!
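For readers who want to see what these steps correspond to on a general-purpose Solaris host, here is a minimal sketch that builds (but does not run) roughly equivalent `zpool`/`zfs` commands. It is not what the appliance executes internally. The pool name, RAID layout, disk names, dataset names, quota, mount point, and mode used below are all hypothetical, and the "default project" is approximated as a parent dataset.

```python
import shlex


def zfs_setup_commands(pool, layout, disks, spares, project, fs,
                       quota, mountpoint, mode):
    """Build (but do not run) ZFS commands roughly mirroring the appliance steps.

    All arguments are hypothetical illustration values; the 7410 does this
    through its management interface, not by exposing these commands.
    """
    dataset = f"{pool}/{project}/{fs}"

    # Step 1: disks + RAID layout (+ optional hot spares) -> a new zpool
    pool_cmd = ["zpool", "create", pool, layout, *disks]
    if spares:
        pool_cmd += ["spare", *spares]

    cmds = [
        pool_cmd,
        # Step 2: parent dataset standing in for the appliance's "default project"
        ["zfs", "create", "-p", dataset],
        # Step 3: size quota
        ["zfs", "set", f"quota={quota}", dataset],
        # Step 4: mount point, then export it over NFS (no dfstab editing)
        ["zfs", "set", f"mountpoint={mountpoint}", dataset],
        ["zfs", "set", "sharenfs=on", dataset],
        # Permissions on the mounted filesystem
        ["chmod", mode, mountpoint],
    ]
    # shlex.join quotes each argument safely for display in a shell
    return [shlex.join(cmd) for cmd in cmds]


if __name__ == "__main__":
    for line in zfs_setup_commands(
            "tank", "raidz2", ["c1t0d0", "c1t1d0", "c1t2d0", "c1t3d0"],
            ["c1t4d0"], "default", "web", "200G", "/export/web", "755"):
        print(line)
```

Seeing the six commands side by side is mostly a reminder of how much the point-and-click flow hides.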

I now had a filesystem I could mount on my clients. There was no messing around with dfstab and no setup to make an NFS server. Of course, you can also use ZFS features such as quotas, compression, and checksumming.


Aside from being simple to use and reliable, a NAS device should be practical, offer good observability, and perform reasonably. The 7410 I used occupied 6 RU: 2 RU for the server and 4 RU for the J4400. That space held 16.8 TB of raw capacity (SATA drives), the server, and the option to add SSDs as a cache (L2ARC) to help I/O performance. That's a nice combination of capacity and cheap SATA storage, with the ability to achieve good performance. How about observability?
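The capacity and density figures above follow from simple arithmetic. Here is a small sketch, assuming decimal units (1 TB = 1000 GB, as drive vendors count). It computes raw capacity only; usable space after RAID, spares, and ZFS overhead will be lower.

```python
def raw_density(drives, drive_gb, rack_units):
    """Return (raw TB, total RU, raw TB per RU) using decimal units.

    Raw capacity only: RAID parity, hot spares, and filesystem overhead
    are not subtracted.
    """
    raw_tb = drives * drive_gb / 1000
    total_ru = sum(rack_units.values())
    return raw_tb, total_ru, raw_tb / total_ru


if __name__ == "__main__":
    # 24 x 700 GB SATA drives; 2 RU server head plus 4 RU J4400
    tb, ru, per_ru = raw_density(24, 700, {"7410 server": 2, "J4400": 4})
    print(f"{tb:.1f} TB raw in {ru} RU ({per_ru:.1f} TB per RU)")
```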

Appliance analytics is, quite frankly, an incredible way to gather and display performance information. It is built on DTrace, which gives very detailed visibility into what's happening on the 7410. On top of that, it provides up-to-the-second graphical reports and also stores historical data. Graphically, I can see disk I/O, network I/O, cache utilization, and backup/restore information, to name just a few. Within each category, one can drill down in numerous ways, such as NFS operations by share or network bytes by direction. It's all point and click, and it's very easy to get the information you're interested in. For a much better analysis of the 7000 analytics, check out this document.


Finally, performance. After all, that's ultimately what I'm interested in. To be a reasonable solution, the 7410 can't be a race car with a four-cylinder engine. Does this piece fit into the Web2.0 framework? Does it make sense to use it to consolidate storage?

The Web2.0 kit uses a large centralized location to store large files, images, and so on. To avoid replicating that storage while gaining the features above, the 7410 is an interesting choice. Using a configuration similar to the one in the Web2.0 blog (zones for replicating and consolidating the LAMP stack), we tested the 7410 as our storage device with a Sun Fire T5120 as the LAMP consolidation server. The goal was to see whether the user count could scale while keeping response times within the benchmark's guidelines. In this case, the metric requires response times under 1 to 3 seconds, depending on the type of transaction (such as adding an event or user, or viewing details).
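Checking a run against per-transaction limits like these can be sketched as follows. The post only says the limits range from 1 to 3 seconds by transaction type. The operation names and the exact limit assigned to each are hypothetical, and judging the limit at the 90th percentile (a common convention for such benchmarks) is an assumption, not something stated above.

```python
import math

# Hypothetical per-operation limits in seconds; the post only says the
# limits range from 1 to 3 seconds depending on the transaction type.
LIMITS_S = {"view_event": 1.0, "add_event": 3.0, "add_user": 3.0}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of response times."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def failing_operations(samples_by_op, limits=LIMITS_S, pct=90):
    """Return, sorted, the operations whose pct-th percentile exceeds its limit.

    The 90th-percentile default is an assumption for illustration.
    """
    return sorted(op for op, samples in samples_by_op.items()
                  if percentile(samples, pct) > limits[op])


if __name__ == "__main__":
    run = {"view_event": [0.4, 0.6, 0.9], "add_event": [1.2, 2.5, 3.4]}
    print(failing_operations(run) or "all operations within limits")
```

A percentile is used here rather than an average so that a long tail of slow requests can't hide behind many fast ones.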

What I found was that even when scaling to a rather large user count (2400), response times for all transaction types stayed within those guidelines. There seems to be an idea that NAS I/O latency makes it a poor choice. That could be true, but one shouldn't just assume it. These results show that, from the end user's perspective, I/O latency was not adversely affecting this workload. Drilling down further, iostat showed average response times under 4 milliseconds for both DAS and NAS, so no wonder it wasn't an issue. But that's easy to claim if I'm not really doing much in the way of throughput.

Time for another analytics snapshot: it showed throughput of 65 MB/s while consuming very little CPU on the 7410. That's a pretty good load for a single 1 Gb connection. Also of note, at that higher user count, the T5120 used only 11% more CPU with NAS than with DAS. With a more robust network, I would expect to push even further with better overall CPU efficiency. On the bright side, the 7410 still has plenty of CPU and I/O capacity left to use.
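To put 65 MB/s in perspective against a single gigabit link, here is a quick sketch of the arithmetic. It assumes decimal units (1 MB = 10^6 bytes, 1 Gb = 10^9 bits), which is an assumption about how the throughput was reported. It also ignores Ethernet, TCP/IP, and NFS protocol overhead, so the real headroom is somewhat smaller than the result suggests.

```python
def link_utilization(throughput_mb_s, link_gbit_s=1.0):
    """Fraction of a link's raw line rate used by a given payload throughput.

    Decimal units are assumed (1 MB = 10**6 bytes, 1 Gb = 10**9 bits);
    protocol overhead is ignored.
    """
    bits_per_s = throughput_mb_s * 1e6 * 8
    return bits_per_s / (link_gbit_s * 1e9)


if __name__ == "__main__":
    print(f"{link_utilization(65):.0%} of a 1 Gb/s link")
```

With these assumptions, 65 MB/s is roughly half of a gigabit link's raw rate.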


With the feature set of ZFS, plus management, analytics, mass storage, efficiency, and performance, the 7410 seems worth considering for consolidation, depending on the needs of your environment. Hopefully we'll have the opportunity to push this even further in the future; if so, I'll be sure to post about it.

Monday Oct 08, 2007

A quick bio is in order, I suppose. I joined Sun's Strategic Applications Engineering Group back in 2000 as a sysadmin. Since then, I have moved on to be a benchmark engineer focusing on commercial workloads on Sun's midrange and high-end SPARC-based servers. The goal is to produce, among other things, world-record benchmark results. Of course, working at Sun also lets me do some of the other things I enjoy in life, such as trying to hit little white golf balls, attempting to navigate hills with slick white powder (sometimes ice) on them, and trying my hand at cycling.

One of my recent areas of focus has been Postgres performance analysis. To that end, a colleague of mine (Glenn Fawcett) and I have developed a presentation for this year's CEC. It shows how to obtain performance statistics and which existing frameworks help gather this information. We also relate it to gathering performance data in Oracle, and we ask and answer some of the more common questions we see about database performance. What is the throughput? What is the response time? What resources are being consumed?
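As a small illustration of the first question, throughput can be derived from two samples of a cumulative counter. The query below uses PostgreSQL's real `pg_stat_database` view and its `xact_commit`/`xact_rollback` columns. The helper function name is hypothetical, and running the query and timing the samples is left to whatever client you use. This is a sketch, not the method from the presentation.

```python
# Cumulative committed + rolled-back transactions across all databases.
TXN_QUERY = "SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database;"


def transactions_per_second(count_start, count_end, interval_s):
    """Average throughput between two samples of the cumulative counter.

    Hypothetical helper: run TXN_QUERY at the start and end of an interval
    and pass both results plus the elapsed seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (count_end - count_start) / interval_s


if __name__ == "__main__":
    print(transactions_per_second(120_000, 126_000, 60))
```

Sampling a cumulative counter twice, rather than reading it once, gives the rate over the interval you actually care about.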

If you have an interest in that kind of stuff, feel free to download and check it out.

This blog copyright 2009 by Marcus Heckel