Read the original article at blogs.sun.com


I had an interesting discussion with two colleagues about the possible benefit of putting a system's swap space on an SSD.

If I consider the gain in latency that an SSD brings over a capacity disk (in the region of 100x), the answer seems obvious: swapping, or more precisely paging, must be much faster with an SSD. Since RAM is expensive compared with SSD, I could even be tempted to design a system with a small amount of RAM and a large amount of swap space on SSDs. In other words, is trying to prevent my system from paging still a fight worth fighting?

Let's try to shed some light on these questions.

Paging takes place when my system runs out of RAM, either because more processes are created or because existing processes require more memory (check this article for details about how to monitor paging). At some point, the operating system keeps looking for pages in RAM that can be moved out to the swap space, while at the same time bringing back into RAM pages that were paged out and are needed by running applications. Such a system is commonly described as paging heavily, or thrashing. At this point, the performance of my system drops sharply: copying memory pages back and forth between RAM and disk slows down the whole system, mainly because of disk performance. Moving the swap space from a disk to an SSD does not reduce this activity; it only makes it faster. Bear in mind that the CPU has no direct access to the swap space, and therefore none to the SSD either. For the CPU to access data or instructions that have been paged out, they still need to be copied back into RAM, which brings us to another side effect of paging: it creates traffic on the I/O bus.
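To see why a faster swap device reduces the cost of each page fault but not the number of faults, here is a minimal sketch in Python. It is a toy model under stated assumptions: strict LRU replacement (Solaris actually uses a clock-style page scanner), a fixed number of page frames standing in for RAM, and a hypothetical workload that cycles through the same set of pages. The names `count_page_faults` and `cyclic_workload` are illustrative only.

```python
from collections import OrderedDict


def count_page_faults(accesses, frames):
    """Simulate LRU page replacement and return the number of page faults.

    Toy model (assumption): `frames` is the number of physical page frames
    in RAM, and each element of `accesses` is a virtual page number.
    """
    resident = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1  # miss: the page must be read back in from swap
        if len(resident) >= frames:
            resident.popitem(last=False)  # page out the least recently used page
        resident[page] = None
    return faults


def cyclic_workload(working_set, passes):
    """Hypothetical workload: touch pages 0..working_set-1 in order, `passes` times."""
    return [page for _ in range(passes) for page in range(working_set)]


if __name__ == "__main__":
    frames = 100
    for working_set in (80, 100, 101, 120):
        accesses = cyclic_workload(working_set, passes=10)
        faults = count_page_faults(accesses, frames)
        print(f"working set {working_set:>3} pages: {faults:>4} faults / {len(accesses)} accesses")
```

With 100 frames, a working set of 100 pages faults only 100 times over ten passes, but a working set of 101 pages faults on all 1010 accesses: once the working set no longer fits, every access goes to swap. An SSD makes each of those faults cheaper, yet the count stays the same.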

In addition, even before paging becomes critical, other things happen on my system as the demand for RAM starts to grow. I am using ZFS for my storage, and ZFS keeps its primary cache, the ARC, in RAM. When RAM comes under pressure, the ARC shrinks. Data evicted from the ARC goes to the ZFS level 2 cache, the L2ARC. The L2ARC can be located either on disks or on SSDs, but as soon as it is involved there is additional traffic on the I/O bus, which now competes with the traffic created by paging. Eventually, when the L2ARC is full as well, data falls out of the cache altogether. Long story short, if I am running an application that generates a lot of I/O, the RAM shortage hurts its performance.
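The cascade from RAM to the cache device can be illustrated with a deliberately simplified two-tier cache. This is a sketch, not ZFS code: both tiers here use plain LRU, whereas the real ARC uses adaptive replacement and feeds the L2ARC asynchronously. `TwoTierCache`, `shrink_l1`, and the counters are hypothetical names; `device_reads` stands in for the extra I/O bus traffic to the cache device, and `pool_reads` for reads that miss both tiers.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy model of a RAM cache backed by a secondary cache device.

    Assumption: both tiers are plain LRU. This only shows the direction in
    which data moves under memory pressure, not how ZFS really works.
    """

    def __init__(self, l1_size, l2_size):
        if l1_size < 1 or l2_size < 0:
            raise ValueError("need l1_size >= 1 and l2_size >= 0")
        self.l1_size = l1_size
        self.l2_size = l2_size
        self.l1 = OrderedDict()  # "ARC": lives in RAM
        self.l2 = OrderedDict()  # "L2ARC": lives on a cache device
        self.device_reads = 0    # hits served by the cache device (I/O bus traffic)
        self.pool_reads = 0      # misses that go all the way to the pool disks

    def _demote_overflow(self):
        # Move least recently used RAM entries to the cache device until RAM fits.
        while len(self.l1) > self.l1_size:
            block, data = self.l1.popitem(last=False)
            self.l2[block] = data
            # When the cache device is full, its oldest entries are dropped entirely.
            while len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)

    def shrink_l1(self, new_size):
        """Simulate RAM pressure reducing the size of the primary cache."""
        if new_size < 1:
            raise ValueError("new_size must be >= 1")
        self.l1_size = new_size
        self._demote_overflow()

    def read(self, block):
        if block in self.l1:
            self.l1.move_to_end(block)
            return self.l1[block]
        if block in self.l2:
            self.device_reads += 1
            data = self.l2.pop(block)  # promote back into RAM
        else:
            self.pool_reads += 1
            data = f"data-{block}"     # placeholder for a read from the pool
        self.l1[block] = data
        self._demote_overflow()
        return data


if __name__ == "__main__":
    cache = TwoTierCache(l1_size=4, l2_size=4)
    for block in range(8):
        cache.read(block)   # blocks 4-7 stay in RAM, 0-3 are demoted
    cache.shrink_l1(2)      # RAM pressure: 4 and 5 demoted, 0 and 1 dropped
    cache.read(4)           # served from the cache device
    cache.read(0)           # no longer cached anywhere: goes to the pool
    print(cache.device_reads, cache.pool_reads)
```

In this run, one read is served by the cache device and nine reach the pool, eight of them from the initial cold reads.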

Finally, we compared the performance of an SSD with that of a disk, but in terms of latency an SSD is still about 1000x slower than RAM, so the penalty of paging (i.e. moving pages from RAM to SSD) remains noticeable. In the end, even though SSDs can improve paging performance, preventing my system from paging is still a must if I want to get the best out of it. If I have extra money to spend on performance after increasing the RAM of my system, and if my application is I/O intensive, I would rather buy an SSD for the L2ARC than for swap. That will certainly have a positive impact on the I/O performance of my application.
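The effect of that remaining latency gap can be put into numbers with the classic effective access time formula, EAT = (1 - p) * t_ram + p * t_fault, where p is the fraction of memory accesses that page-fault. The sketch below assumes a hypothetical 100 ns RAM access and derives the other latencies from the ratios quoted above (SSD about 1000x slower than RAM, capacity disk about 100x slower than SSD). It ignores the cost of writing out victim pages and of queueing, so it is optimistic.

```python
def effective_access_time_ns(fault_rate, ram_ns, fault_service_ns):
    """Average memory access time when a fraction of accesses page-fault.

    EAT = (1 - p) * t_ram + p * t_fault
    """
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * ram_ns + fault_rate * fault_service_ns


# Assumed latencies, chosen only to match the ratios quoted in the text
RAM_NS = 100             # hypothetical RAM access time: 100 ns
SSD_NS = RAM_NS * 1_000  # SSD roughly 1000x slower than RAM -> 100 us
DISK_NS = SSD_NS * 100   # capacity disk roughly 100x slower than SSD -> 10 ms

if __name__ == "__main__":
    for p in (0.0, 0.0001, 0.001):
        ssd = effective_access_time_ns(p, RAM_NS, SSD_NS)
        disk = effective_access_time_ns(p, RAM_NS, DISK_NS)
        print(f"fault rate {p:.4f}: SSD swap {ssd / RAM_NS:7.1f}x RAM, "
              f"disk swap {disk / RAM_NS:7.1f}x RAM")
```

Under these assumptions, one fault per thousand accesses roughly doubles the average access time with SSD-backed swap (about 200 ns) and makes it about 100 times slower with disk-backed swap (about 10,100 ns). The SSD helps a lot, but the only way back to RAM speed is to bring p back toward zero.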

Comments:

I am surprised you would waste your time with such a dumb idea! It should be darn obvious without the need to test it.

Posted by bill gates on November 23, 2009 at 12:03 PM CET #

The fastest I/O is an I/O that doesn't happen. The latency of going to swap even with an SSD is much larger than memory latency. Just buy more RAM to avoid the need to do I/O to the swap device and be done with it.

Posted by 192.18.8.1 on November 23, 2009 at 01:53 PM CET #

I've had similar thoughts, although I wouldn't go so far as to design a system around SSD paging space. In my use case, I have many (20 to 50) distinct workloads on a server, each encapsulated in a zone. Allocation of reservable memory (zone.max-swap) is desired because:

- Some zones will have short spikes in reservable memory requirements.
- Some zones are running J2EE servers where the vendor of the app inside the JVM says that the JVM must be given x GB of memory, even though it is observed to never use more than x/4 (RSS vs. VSZ).

It is now becoming common practice to increase the swap size on systems well beyond the 8 GB I used to allocate. That is, a 146 GB root disk is split into two 16 GB root partitions (root and the Live Upgrade alternate root), with the rest going to swap. If the system ever starts paging to this swap space, there is very little chance that a sysadmin will be able to compete with the workloads to get in and whack whoever is using more memory than expected. Using rcapd to force particular workloads to page only influences which workloads are most likely to page; the same systemic performance problem remains.

I hypothesize that with SSD as swap space, things would get slow once heavy paging starts, but the system would remain responsive enough that applications would still function and a sysadmin could get onto the box to alleviate the pain. I would expect applications to have poor response times, but not so poor that connections would time out.

Allocating 100+ GB of SSD to this purpose seems silly, as attention would need to be paid to the problem long before 100 GB is used. I suspect that using a zvol as swap with the combination of SSD for ZIL and L2ARC would give the right balance of performance and cost.

Posted by Mike Gerdts on November 23, 2009 at 02:28 PM CET #

As we used to say on early virtual memory systems in the 1970s, "there's no paging like no paging." As long as there is a hierarchy of access rates and latencies, especially separated by two or more orders of magnitude, there will always be high value in optimizing systems to keep frequently-accessed or time-critical data and code in RAM.

Similar studies were done and published repeatedly in the early 1980s, when lucky mainframers had "solid state drums" - essentially big boxes of DRAM that emulated a zero-seek, low-latency, low-capacity, high-speed I/O device that otherwise resembled a disk drive. Beretvas and Tetzlaff at IBM did a lot of good work in this area that is still valid in a modern computing context. The gist of it all was that SSDs are fantastic additions to a system, but bigger RAM always wins.

Posted by Ross Patterson on November 23, 2009 at 04:28 PM CET #

On a different topic, current SSDs are designed for persistent data, not transient data. Such data has a low write-to-read ratio: once written, a file may be read many times before being rewritten. Paging and swapping workloads tend to be almost evenly balanced, with every page/segment/whatever being written out once and read back in once. The parts used in persistent SSDs tend to have high write-cycle times and high-but-limited write-count lifetimes (i.e., flash). You're better off with a big box of RAM that emulates a disk than with a flash drive if all you're using it for is paging. Format it during system startup and then mount it, and you won't mind that it isn't persistent.
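The endurance concern can be made concrete with a back-of-the-envelope wear-out estimate: a flash device with perfect wear leveling can absorb roughly its capacity times its P/E (program/erase) cycle rating in physical writes. All figures below (a 32 GB device rated for 10,000 P/E cycles, 50 MB/s of sustained page-outs, 20 GB/day of file writes) are hypothetical, and `write_amplification` is left at 1.0, the best case.

```python
def flash_lifetime_days(capacity_gb, pe_cycles, writes_gb_per_day, write_amplification=1.0):
    """Rough wear-out estimate for a flash device.

    Assumes perfect wear leveling: every block is erased equally often,
    so the device can absorb capacity * P/E cycles of physical writes.
    """
    if writes_gb_per_day <= 0:
        raise ValueError("writes_gb_per_day must be positive")
    total_writable_gb = capacity_gb * pe_cycles
    return total_writable_gb / (writes_gb_per_day * write_amplification)


def mb_per_s_to_gb_per_day(mb_per_s):
    """Convert a sustained write rate in MB/s to GB/day (decimal units)."""
    return mb_per_s * 86_400 / 1_000


if __name__ == "__main__":
    # Hypothetical device and workloads, for illustration only
    capacity_gb, pe_cycles = 32, 10_000
    paging = mb_per_s_to_gb_per_day(50)  # sustained page-outs at 50 MB/s
    file_data = 20                       # persistent data: 20 GB written per day
    print(f"paging device: {flash_lifetime_days(capacity_gb, pe_cycles, paging):.0f} days")
    print(f"file data:     {flash_lifetime_days(capacity_gb, pe_cycles, file_data):.0f} days")
```

With these made-up numbers the paging device wears out in about 74 days, while the same device holding persistent data lasts about 16,000 days. Real devices and workloads will differ, but the ratio shows why a write-heavy paging workload is a poor fit for flash.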

To go back to the mainframe experience, we used to deliberately pre-load SSD paging devices with shared, read-only, high-use pages, thus getting the most bang for our bucks. In a Unix environment, one obvious choice would be libc.so.

Posted by Ross Patterson on November 23, 2009 at 04:38 PM CET #

This idea should be picked up by embedded systems builders (mainly smartphones), if it really works as advertised. Since they haven't yet, my suspicion is that it works better on paper than it does in reality.

My $.02

Posted by Gerry on November 23, 2009 at 06:46 PM CET #

I was wondering the exact same thing that this blog discussed!

We have some embedded systems, with limited memory, and this would seem like an alternate way to extend the life of those systems.

After seeing what the read and write speeds of the flash are, I would like to see some benchmarks, and leave it to the consumer to determine whether it is adequate.

Posted by David on November 24, 2009 at 05:50 PM CET #
