Friday September 26, 2008
Tamp - a Lightweight Multi-Threaded Compression Utility
Packages for Solaris (x86 and SPARC) and a source tarball are available below.
Many years ago (more than I care to remember), I saw an opportunity to improve the performance of a database backup. This was before the time of Oracle on-line backup, so the best choice at that time was to:
The obvious thing to improve here is the time between steps 1 and 3. We had a multi-CPU system running this database, so it occurred to me that perhaps compressing the export may speed things up.
I say "may" because it is important to remember that if the compression utility has lower throughput than the output of the database export (i.e. raw output; excluding any I/O operations to save that data) we may just end up with a different bottleneck, and not run any faster; perhaps even slower.
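To make that reasoning concrete, here is a minimal sketch of the pipeline as a "slowest stage wins" model. The function names and every number in the usage note are illustrative assumptions, not measurements from the system described here.

```python
def uncompressed_throughput(export_mb_s, disk_mb_s):
    """Sustained rate (MB/s) when the export is written straight to disk."""
    return min(export_mb_s, disk_mb_s)


def compressed_throughput(export_mb_s, compress_mb_s, disk_mb_s, reduction):
    """Sustained rate (MB/s of *uncompressed* data) for export -> compress -> disk.

    reduction is the fraction of bytes removed by compression (0 <= reduction < 1).
    """
    # The disk only has to absorb the compressed bytes, so in terms of
    # uncompressed data it can keep up with a proportionally higher rate.
    disk_equivalent = disk_mb_s / (1.0 - reduction)
    # The slowest stage sets the pace for the whole pipeline.
    return min(export_mb_s, compress_mb_s, disk_equivalent)


if __name__ == "__main__":
    # Hypothetical rates: export 50 MB/s, disk 30 MB/s, 60% reduction.
    print("no compression :", uncompressed_throughput(50, 30))
    print("slow compressor:", compressed_throughput(50, 20, 30, 0.6))
    print("fast compressor:", compressed_throughput(50, 200, 30, 0.6))
```

With these made-up rates, the slow compressor becomes the new bottleneck and the run gets slower than writing uncompressed data, while the fast compressor lets the export itself set the pace.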
As it happens, this era also pre-dated gzip and other newer compression utilities. So, using the venerable old "compress", it actually was slower. It did save some disk space, because Oracle export files are eminently compressible.
So, I went off looking for a better compression utility. I was now more interested in something that was fast. It needed to not be the bottleneck in the whole process.
What I found did the trick: it reduced the export time by 20-30%, and saved some disk space as well. It saved time because it was able to compress at least as fast as Oracle's "exp" utility could produce data, and it eliminated some of the I/O, which was the real bottleneck.
I came across a similar situation more recently - I was again doing "cold" database restores and wanted to speed them up. It was a little more challenging this time, as the restore was already parallel at the file level, and there were more files than CPUs involved (72). In the end, I could not speed up my 8-odd minute restore of ~180GB, unless I already had the source files in memory (via the filesystem cache). That would only work in some cases, and is unlikely to work in the "real world", where you would not normally want this much spare memory to be available to the filesystem.
Anyway, it took my restore down to about 3 minutes in cases where all my compressed backup files were in memory - this was because it had now eliminated all read I/O from the set of arrays holding my backup. This meant I had eliminated all competing I/O from the set of arrays where I was re-writing the database files.
I could not even remember the name of the utility I used years ago, but I knew already that I would need something better. The computers of 2008 have multiple cores, and often multiple hardware threads per core. All of the current included-in-the-distro compression utilities (well, almost all utilities) for Unix are still single-threaded - a very effective way to limit throughput on a multi-CPU system.
Now, there are some multi-threaded compression utilities available, if not widely available:
Here is a chart showing some utilities I have tested on a 64-way Sun T5220. The place to be on this chart is toward the bottom right-hand corner.
Here is a table with some of the numbers from that chart:
| Utility | Reduction (%) | Elapsed (s) |
|---|---|---|
| tamp | 66.18 | 0.31 |
| pigz --fast | 71.18 | 1.04 |
| pbzip2 --fast | 77.17 | 4.17 |
| gzip --fast | 71.10 | 16.13 |
| gzip | 75.73 | 40.29 |
| compress | 61.61 | 18.21 |
To answer your question - yes, tamp really is more than 50 times faster than "gzip --fast".
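That ratio can be checked directly from the elapsed times in the table. The snippet below is just arithmetic on those figures; the helper name `speedup_over` is mine.

```python
# Elapsed times in seconds, copied from the table above.
elapsed = {
    "tamp": 0.31,
    "pigz --fast": 1.04,
    "pbzip2 --fast": 4.17,
    "gzip --fast": 16.13,
    "gzip": 40.29,
    "compress": 18.21,
}


def speedup_over(baseline: str, other: str) -> float:
    """How many times faster `baseline` finished than `other`."""
    return elapsed[other] / elapsed[baseline]


if __name__ == "__main__":
    for name in elapsed:
        if name != "tamp":
            print(f"tamp vs {name}: {speedup_over('tamp', name):.1f}x")
```

For "gzip --fast" this works out to roughly 52 times (16.13 s / 0.31 s).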
The utility I have developed is called tamp. As the name suggests, it does not aim to provide the best compression (although it is better than compress, and sometimes beats "gzip --fast").
It is however a proper parallel implementation of an already fast compression algorithm.
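For readers unfamiliar with the general idea, here is a minimal sketch of block-parallel compression. It is not tamp's code or file format: it assumes zlib at its fastest level in place of QuickLZ, a made-up length-prefixed framing, and a hypothetical block size, purely to show how independent blocks can be compressed on several threads at once.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 256 * 1024  # hypothetical block size, not taken from tamp


def compress_parallel(data: bytes, workers: int = 4,
                      block_size: int = BLOCK_SIZE) -> bytes:
    """Compress fixed-size blocks independently on a pool of threads."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # zlib releases the GIL while compressing, so threads can overlap.
        # pool.map returns results in input order, which keeps blocks in sequence.
        packed = list(pool.map(lambda b: zlib.compress(b, 1), blocks))
    out = bytearray()
    for p in packed:
        out += struct.pack(">I", len(p))  # 4-byte big-endian length prefix
        out += p
    return bytes(out)


def decompress_blocks(blob: bytes) -> bytes:
    """Reverse compress_parallel by walking the length-prefixed blocks."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        (n,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        out += zlib.decompress(blob[pos:pos + n])
        pos += n
    return bytes(out)


if __name__ == "__main__":
    sample = b"highly repetitive export data " * 100_000
    blob = compress_parallel(sample)
    assert decompress_blocks(blob) == sample
    print(f"{len(sample)} bytes -> {len(blob)} bytes")
```

Compressing blocks independently costs a little compression ratio, because each block starts with no history, but it lets every block proceed without waiting for its neighbours.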
If you wish to use it, feel free to download it. I will be blogging in the near future on a different performance test I conducted using tamp.
Tamp makes use of the compression algorithm from QuickLZ version 1.40. I have tested a couple of other algorithms, and the code in tamp.c can be easily modified to use a different algorithm. You can get QuickLZ from here (you will need to download the source yourself if you want to build tamp).
Posted at 12:02PM Sep 26, 2008 by timc in Performance | Comments[3]
Question 1:
Why do you not use LZO 2.03 (released Apr 30, 2008)? You wrote in the README that it "is not widely available".
Question 2:
If using LZO, will the resulting file be compatible with "lzop" (http://www.lzop.org)? Is it possible to unpack the resulting file with lzop? In that case there would be a very widely available unpacker.
Question 3:
Would it be possible for you to modify the program to create a zip-compatible file?
Posted by Eisbaer on September 27, 2008 at 03:43 PM PDT #
1. I did test with LZO 2.03, so I know that it can be used with the LZO macro in tamp.c. I have chosen to use QuickLZ for better performance.
2. Files compressed with tamp are not compatible with any other decompression utility.
3. I do not intend to develop something that is compatible with the ZIP format.
Posted by Tim on September 28, 2008 at 08:31 AM PDT #
Hi Eisbaer,
There's a fast .zip compatible compressor on http://www.quicklz.com/zip.html
It's not as fast as LZO or QuickLZ but still a lot faster than gzip -1.
I'm not working on the project anymore and it hasn't been tested very well. It also doesn't support files larger than 2 GB.
Posted by Lasse Reinhold on September 30, 2008 at 02:57 AM PDT #