Effects of Flash/SSD on PostgreSQL - PostgreSQL East 2009
I presented a talk at PostgreSQL East 2009 at Drexel University this morning. The topic was "Effects of Flash/SSDs on PostgreSQL". Here are the slides from the presentation.
If you have questions, please leave comments.


If I read this right, you ended up testing both HDD and SSD with write caching enabled. Wouldn't it be more sensible to compare them both with write caching disabled? Is it somehow safe to run a database system on an SSD with write caching? The best practice so far has been to disable write caching on HDDs, and I don't see anything that would have changed that with SSDs.
Posted by Peter Eisentraut on April 06, 2009 at 05:46 AM EDT #
Well, CELKO and I began chatting about SSD on newsgroups quite a while ago (and he discussed SSD in "Thinking in Sets"). That led me to finally start a blog, specifically to discuss why SSD makes sense. Unfortunately, this paper doesn't get why SSD makes sense. It merely drops an existing 0NF (I suspect) database onto SSD. Not much sense to that. To make sense, as I have discussed, one has to strip out the redundant data (go to 5NF, which can deflate the data by an order of magnitude or more) as part of the migration to SSD. Then you get not only speed, but logical consistency you didn't have before.
Maybe it's time for Sun or Postgres to pay me to do that full test I have been wanting to do. :)
Posted by Robert Young on April 06, 2009 at 12:10 PM EDT #
Peter,
One of the things I heard back was that the write cache on SSDs is really needed for write wear leveling. The early generation of SSDs does not seem to have battery-backed write cache RAM, but the industry does seem to be solving the problem in the next generation of SSDs. In the meantime, please check with your SSD manufacturer on whether turning off the write cache is supported. Otherwise the best option is to use mirroring for SSDs.
Hope this helps.
Posted by Jignesh Shah on April 09, 2009 at 12:09 AM EDT #
Hey!
Do you think the performance could be improved in a hybrid setup?
* Storing WALs on a regular hard disk (more capacity, mostly sequential accesses)
* Storing tablespaces on the SSD (less capacity, mostly random accesses)
Sounds like it would make sense.
Marti
Posted by Marti Raudsepp on April 10, 2009 at 09:40 AM EDT #
Hi Marti,
If your workload has more of a read bottleneck than a write bottleneck, then yes, your suggestion of putting the WAL on a regular disk and the tablespaces on SSDs will help. Though you might want to make sure that your SSD is big enough to hold the data that is causing the read latency.
Posted by Jignesh Shah on April 10, 2009 at 09:59 AM EDT #
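The sizing point in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation names, sizes, read counts, the 64 GiB capacity, and the tablespace name ssd_space are all made up, and the greedy "most reads per byte first" rule is one simple heuristic, not something proposed in the talk. The emitted ALTER TABLE ... SET TABLESPACE statements assume the tablespace was already created on the SSD with CREATE TABLESPACE.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    name: str
    size_bytes: int    # on-disk size; assumed to be greater than zero
    random_reads: int  # block reads observed over some sample window


def pick_for_ssd(relations: List[Relation], ssd_capacity_bytes: int) -> List[Relation]:
    """Greedy choice: most random reads per byte first, skipping what no longer fits."""
    ranked = sorted(relations, key=lambda r: r.random_reads / r.size_bytes, reverse=True)
    chosen, used = [], 0
    for rel in ranked:
        if used + rel.size_bytes <= ssd_capacity_bytes:
            chosen.append(rel)
            used += rel.size_bytes
    return chosen


def tablespace_sql(chosen: List[Relation], tablespace: str = "ssd_space") -> List[str]:
    """Build the statements that would move the chosen tables (hypothetical tablespace name)."""
    return [f"ALTER TABLE {r.name} SET TABLESPACE {tablespace};" for r in chosen]


GIB = 1024 ** 3

# Hypothetical figures, for illustration only.
relations = [
    Relation("orders", 40 * GIB, 9_000_000),
    Relation("order_lines", 90 * GIB, 4_000_000),
    Relation("customers", 10 * GIB, 5_000_000),
    Relation("audit_log", 200 * GIB, 100_000),
]

chosen = pick_for_ssd(relations, 64 * GIB)
for statement in tablespace_sql(chosen):
    print(statement)
```

With real numbers, the read counts would come from the database's own statistics rather than from guesses, and anything that does not fit stays on the regular disks alongside the WAL.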
Jignesh,
How does one actually "disable" write caching on the drives through ZFS? I thought cache flush is enabled by default, which tells the drives to flush their cache at each write. Isn't this the equivalent of "disabling write cache"? And by disabling cache flushing, are we in effect "enabling" write caching on the target devices? I'm just confused by the terminology being thrown around.
Also, I've tested the SSDs with cache flush on and off. With flushing off there was maybe a 5% difference for sync writes. I'm assuming, like what you heard, that this is because the write cache is mostly used for wear leveling, with very little of it actually used for write caching. I couldn't find what the cache size is, but some blogs mentioned that the Intel X25-Es do have 64MB of volatile write cache. Is this true? There are also rumors that the capacitor on the X25-Es will hold enough charge to flush whatever is in the cache to disk. Have you seen or heard anything like that?
In our environment we only use SSDs for ZIL offload, which is working great now that some of the firmware/driver issues have been worked out with OpenSolaris and the LSI 1068E rev B2 based StorageTek HBA.
Thanks for the information :)
Posted by Robert K on June 17, 2009 at 06:04 PM EDT #
The Intel X25-E has 256KB (yes, kilobytes) of cache.
Posted by Bao604 on August 03, 2009 at 06:55 PM EDT #
Jignesh,
Thank you very much for publishing this. It was very interesting and I hope you are going to publish more stuff like that.
One thing I'd be interested to see added to your test: a PCI-E SSD (like Fusion-IO). Its IO rate is supposed to be more than 20x higher than that of an X25...
Cheers,
Mike
Posted by Mike on August 08, 2009 at 05:32 AM EDT #
Bao604, the Intel X25-E write cache is 64MB, not 256KB. The 256KB part found on the board is SRAM processor cache. HDD write cache is always DRAM.
Posted by 24.118.154.187 on August 09, 2009 at 12:05 PM EDT #
Jignesh, Re:
>>"...write cache on SSDs are really needed for Write Wear Leveling. The early generation of SSDs does not seem to have battery backed up Write Cache RAM but the industry does seem to solve the problem in the next generation of SSDs."
Ok...so flash SSD doesn't work without a volatile DRAM write cache, but the problem here is that database systems rely on the drive's write acknowledgement to ensure that data has been written to non-volatile media.
Mirroring does not solve the problem: a power loss would mean both SSD devices lose data. Wouldn't it be better to have the database server cache or buffer writes at the application level and turn off the DRAM cache on the SSD? That's what we do today when we turn off the HDD write cache. Why would we accept a volatile write cache on SSDs when we don't permit one on HDDs?
I just saw a result published where SSD IOPS dropped to around 1.5% of advertised performance when the volatile DRAM write-cache was disabled.
http://petereisentraut.blogspot.com/2009/07/solid-state-drive-benchmarks-and-write.html
Posted by Rick E. on August 09, 2009 at 12:28 PM EDT #
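For readers who want to see the effect debated in this thread on their own hardware, a rough probe is to time small writes that are each followed by fsync. The sketch below is an illustration, not a benchmark from the talk. The 8192-byte block (PostgreSQL's default page size), the write counts, and the function names are assumptions. The "one synced write per platter revolution" ceiling is only a rule of thumb for a rotating disk with its volatile cache off and a single writer; an SSD has no equivalent simple ceiling, so there the useful comparison is the same probe with the drive's write cache on and off.

```python
import os
import tempfile
import time


def synced_writes_per_second(directory: str, writes: int = 200, block_size: int = 8192) -> float:
    """Append `writes` blocks to a scratch file, calling fsync after each one."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data down to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed if elapsed > 0 else float("inf")


def rotational_ceiling(rpm: int) -> float:
    """Rule of thumb: at most one synced write per revolution for a single writer."""
    return rpm / 60.0


# Probe the system temp directory; point `directory` at the device under test instead.
rate = synced_writes_per_second(tempfile.gettempdir(), writes=20)
print(f"measured: {rate:.0f} synced writes/s")
print(f"7200 rpm rule-of-thumb ceiling: {rotational_ceiling(7200):.0f} synced writes/s")
```

A measured rate far above the rule-of-thumb ceiling on a rotating disk suggests that some cache between the application and the platter is acknowledging writes early. Whether that cache survives a power loss is exactly the question raised above, and this probe cannot answer it.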