The Good, the Blog & the Ugly - Tim Cook's Weblog


http://blogs.sun.com/timc/date/20080926 Friday September 26, 2008

Tamp - a Lightweight Multi-Threaded Compression Utility

UPDATE: Tamp has been ported to Linux, and is now at version 2.5

Packages for Solaris (x86 and SPARC), and a source tarball are available below.

Back Then

Many years ago (more than I care to remember), I saw an opportunity to improve the performance of a database backup. This was before the time of Oracle on-line backup, so the best choice at that time was to:

  1. shut down the database
  2. export to disk
  3. start up the database
  4. back up the export to tape

The obvious thing to improve here is the time between steps 1 and 3. We had a multi-CPU system running this database, so it occurred to me that perhaps compressing the export may speed things up.

I say "may" because it is important to remember that if the compression utility has lower throughput than the output of the database export (i.e. raw output; excluding any I/O operations to save that data) we may just end up with a different bottleneck, and not run any faster; perhaps even slower.
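The reasoning above amounts to taking the minimum over the stage rates. Here is a rough sketch of that arithmetic in shell. The MB/s figures are entirely hypothetical (none were measured on the system described here), and `effective_rate` is just an illustrative helper name.

```shell
# effective_rate PRODUCER COMPRESSOR DISK RATIO
#   PRODUCER, COMPRESSOR: uncompressed MB/s each stage can sustain
#   DISK: MB/s the disks can write; RATIO: compressed size / original size
# Prints the uncompressed MB/s of the whole pipeline (its slowest stage).
effective_rate() {
    awk -v p="$1" -v c="$2" -v d="$3" -v r="$4" 'BEGIN {
        p += 0; c += 0; d += 0; r += 0   # force numeric comparison
        disk = d / r                     # disk rate in uncompressed MB/s
        m = p
        if (c < m) m = c
        if (disk < m) m = disk
        printf "%.1f\n", m
    }'
}

# Hypothetical figures: the export produces 100 MB/s, the disks write 60 MB/s.
effective_rate 100 100000 60 1      # no compression: disk-bound, prints 60.0
effective_rate 100 20 60 0.34       # slow compressor: prints 20.0 (slower!)
effective_rate 100 500 60 0.34      # fast compressor: prints 100.0
```

"No compression" is modelled here as an infinitely fast compressor with a ratio of 1. The second call is the "compress" case from the story: the compressor itself becomes the bottleneck.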

As it happens, this era also pre-dated gzip and other newer compression utilities. So, using the venerable old "compress", it actually was slower. It did save some disk space, because Oracle export files are eminently compressible.

So, I went off looking for a better compression utility. I was now more interested in something that was fast. It needed to not be the bottleneck in the whole process.

What I found did the trick: it reduced the export time by 20-30%, and saved some disk space as well. It saved time because it could compress at least as fast as Oracle's "exp" utility could produce data, and it eliminated some of the I/O, which was the real bottleneck.

More Recently

I came across a similar situation more recently - I was again doing "cold" database restores and wanted to speed them up. It was a little more challenging this time, as the restore was already parallel at the file level, and there were more files than CPUs involved (72). In the end, I could not speed up my 8-odd minute restore of ~180GB, unless I already had the source files in memory (via the filesystem cache). That would only work in some cases, and is unlikely to work in the "real world", where you would not normally want this much spare memory to be available to the filesystem.

Anyway, it took my restore down to about 3 minutes in cases where all my compressed backup files were in memory. This was because it had eliminated all read I/O from the set of arrays holding my backup, which meant there was no competing I/O on the arrays where I was re-writing the database files.

Multi-Threaded Lightweight Compression

I could not even remember the name of the utility I used years ago, but I knew already that I would need something better. The computers of 2008 have multiple cores, and often multiple hardware threads per core. All of the current included-in-the-distro compression utilities (well, almost all utilities) for Unix are still single-threaded - a very effective way to limit throughput on a multi-CPU system.

Now, there are some multi-threaded compression utilities available, if not widely available:

  • PBZIP2 is a parallel implementation of BZIP2. You can find out more here
  • PIGZ is a parallel implementation of GZIP, although it turns out it is not possible to decompress a GZIP stream with more than one thread. PIGZ is available here.

Here is a chart showing some utilities I have tested on a 64-way Sun T5220. The place to be on this chart is toward the bottom right-hand corner.

Here is a table with some of the numbers from that chart:

Utility          Reduction (%)   Elapsed (s)
tamp                     66.18          0.31
pigz --fast              71.18          1.04
pbzip2 --fast            77.17          4.17
gzip --fast              71.10         16.13
gzip                     75.73         40.29
compress                 61.61         18.21

To answer your question - yes, tamp really is 50-plus-times faster than "gzip --fast".
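If you want to check that claim, the ratios can be worked out from the elapsed times in the table. This is a small sketch; `speedup_vs_tamp` is a made-up helper name, and the only data in it are the figures quoted above.

```shell
# Elapsed seconds copied from the table above, as "utility:seconds".
# Prints how many times longer each utility took than tamp.
speedup_vs_tamp() {
    awk -F: '
        { name[NR] = $1; secs[NR] = $2 }
        $1 == "tamp" { base = $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%-14s %6.1fx\n", name[i], secs[i] / base
        }
    ' <<'EOF'
tamp:0.31
pigz --fast:1.04
pbzip2 --fast:4.17
gzip --fast:16.13
gzip:40.29
compress:18.21
EOF
}

speedup_vs_tamp
```

The "gzip --fast" line comes out at 52.0x, which is where "50-plus-times" comes from.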

Tamp

The utility I have developed is called tamp. As the name suggests, it does not aim to provide the best compression (although it is better than compress, and sometimes beats "gzip --fast").

It is however a proper parallel implementation of an already fast compression algorithm.
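To illustrate the general idea of block-parallel compression (and only the idea), here is a sketch that uses stock tools. It is not how tamp works: tamp is a single multi-threaded program built on QuickLZ, whereas this splits a file and runs several gzip processes. It assumes an xargs that supports -P (GNU and BSD versions do; yours may not), and `parallel_gzip` is a hypothetical name.

```shell
# parallel_gzip INPUT OUTPUT [JOBS] [CHUNK_BYTES]
# The body runs in a subshell "( ... )" so its variables stay private.
parallel_gzip() (
    src=$1 out=$2 jobs=${3:-4} chunk=${4:-1048576}
    tmp=$(mktemp -d) || exit 1
    status=0
    {
        # 1. Cut the input into fixed-size pieces: chunk.aaaa, chunk.aaab, ...
        split -a 4 -b "$chunk" "$src" "$tmp/chunk." &&
        # 2. Compress the pieces, up to $jobs gzip processes at a time.
        ls "$tmp"/chunk.* | xargs -n 1 -P "$jobs" gzip -1 &&
        # 3. Join the results in order. Concatenated gzip members form a
        #    valid gzip stream, so plain "gzip -dc" can read the output.
        cat "$tmp"/chunk.*.gz > "$out"
    } || status=1
    rm -rf "$tmp"
    exit "$status"
)
```

For example, `parallel_gzip export.dmp export.dmp.gz 8` would compress with up to 8 processes, and `gzip -dc export.dmp.gz` would restore the original. Unlike a streaming utility, this sketch needs temporary space for the chunks and cannot read from a pipe.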

If you wish to use it, feel free to download it. I will be blogging in the near future on a different performance test I conducted using tamp.

Compression Algorithm

Tamp makes use of the compression algorithm from QuickLZ version 1.40. I have tested a couple of other algorithms, and the code in tamp.c can be easily modified to use a different algorithm. You can get QuickLZ from here (you will need to download the source yourself if you want to build tamp).

Resources

http://blogs.sun.com/timc/date/20080906 Saturday September 06, 2008

Installing Solaris from a USB Disk

I regularly do a full install of a Solaris Development release onto my laptop. Why full? Well, that is another story for another day, but it is not because the Solaris Upgrade software, including Live Upgrade, is lacking.

I no longer see the sense in burning a DVD to do this, and I know that Solaris can boot from a USB device.

I used James C. Liu's blog as an inspiration, but the following is what I have found worked well to boot an install image located on a USB disk. You may also be interested in the Solaris Ready USB FAQ.

NOTE: This procedure only has a chance of working if you have a version of Solaris 10 or later that uses GRUB and has a USB driver that works with your drive.

  1. Set up an 8GB "Solaris2" partition on the USB drive using fdisk. Make it the active partition.
  2. Set up a UFS slice using all but the first cylinder of that 8GB as slice 0 using format. Run newfs. Mount.

    The first cylinder ends up being dedicated to a "boot" slice. I do not know what it is used for, perhaps avoidance of overwriting PC-style partition table & boot program.

  3. Mount the DVD ISO using lofiadm/mount (hint: google lofiadm solaris iso)
  4. Use cpio to copy the contents of the DVD ISO into the UFS partition on the USB drive, e.g.:

    # cd <rootdir of DVD ISO>
    # find . | cpio -pdum <rootdir of USB filesystem>
    

  5. Run installgrub to install the stage1 & stage2 files from the DVD ISO onto the USB drive. If the filesystem on your USB drive is mounted from /dev/dsk/c2t0d0s0, for example, then use:

    # cd <rootdir of DVD ISO>
    # /sbin/installgrub boot/grub/stage1 boot/grub/stage2 /dev/rdsk/c2t0d0s0
    

  6. Boot off the USB disk. It uses the same GRUB install that would be on a DVD.
  7. Now, I cannot remember whether the next step was to:

    • Wait for the install to fail (unable to find distribution), or:

    • Exit/quit out of installation

    ...but you need to get to a shell.

  8. Manually mount the USB partition at /cdrom

    NOTE: your controller numbers are probably not as you expect at this point, so double-check what you are mounting.

  9. Re-start the install
    I used "suninstall". I think you can use "solaris-install" instead.
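Steps 3 to 5 above can be pulled together into a script along these lines. Treat it as a sketch under assumptions rather than something I ran as-is: the function name, the /mnt/iso mount point and the DRY_RUN switch are all invented for illustration, and the device names are whatever your system reports.

```shell
# copy_iso_to_usb ISO USB_MNT USB_RDSK
#   ISO       absolute path to the DVD image (lofiadm needs a full path)
#   USB_MNT   where the UFS slice from step 2 is mounted
#   USB_RDSK  raw device of that slice, e.g. /dev/rdsk/c2t0d0s0
# ISO_MNT (default /mnt/iso) is a hypothetical mount point for the image.
# With DRY_RUN=1 the commands are only printed, not run.
copy_iso_to_usb() {
    iso=$1 usb_mnt=$2 usb_rdsk=$3
    iso_mnt=${ISO_MNT:-/mnt/iso}

    if [ "${DRY_RUN:-0}" = 1 ]; then
        lofi=/dev/lofi/1     # placeholder; lofiadm prints the real name
        echo "lofiadm -a $iso"
        echo "mount -F hsfs -o ro $lofi $iso_mnt"
        echo "cd $iso_mnt && find . | cpio -pdum $usb_mnt"
        echo "/sbin/installgrub $iso_mnt/boot/grub/stage1 $iso_mnt/boot/grub/stage2 $usb_rdsk"
        return 0
    fi

    # Step 3: attach the image as a block device and mount it read-only.
    lofi=$(lofiadm -a "$iso") || return 1
    mount -F hsfs -o ro "$lofi" "$iso_mnt" || return 1
    # Step 4: copy the whole tree onto the USB filesystem.
    (cd "$iso_mnt" && find . | cpio -pdum "$usb_mnt") || return 1
    # Step 5: install GRUB stage1/stage2 from the image onto the USB slice.
    /sbin/installgrub "$iso_mnt/boot/grub/stage1" \
        "$iso_mnt/boot/grub/stage2" "$usb_rdsk"
}
```

Running it first as `DRY_RUN=1 copy_iso_to_usb /export/sol.iso /mnt/usb /dev/rdsk/c2t0d0s0` (paths hypothetical) lets you check the device names before anything is written. Step 8 is then an ordinary `mount -F ufs` of the USB slice's block device at /cdrom.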

The install seemed to run fine from there, however it went through a sysconfig stage after the reboot.

Then I ended up with one teeny problem - my X server would not start.

I discovered some issues with fonts, and then decided to check the install log. I discovered a number of packages had reported status like:


Installation of <SUNWxwfnt> partially failed.
19997 blocks
pkgadd: ERROR: class action script did not complete successfully

Installation of <SUNWxwcft> partially failed.

Installation of <SUNW5xmft> partially failed.

Installation of <SUNW5ttf> partially failed.

Installation of <SUNWolrte> partially failed.

Installation of <SUNWhttf> partially failed.

I have since pkgrm/pkgadd-ed these packages (using -R while running the laptop on an older release with the new boot environment mounted), and all is now well.
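If you hit the same thing, something like the following can pull the affected package names out of the install log and print the matching pkgrm/pkgadd commands. It is a sketch: the helper names are made up, and the log file, alternate root and package directory are arguments you must supply for your own system.

```shell
# failed_pkgs LOG...
# Prints the names of packages whose installation "partially failed".
failed_pkgs() {
    sed -n 's/^Installation of <\(.*\)> partially failed\.$/\1/p' "$@"
}

# reinstall_cmds LOG ALTROOT PKGDIR
# Prints (does not run) a pkgrm/pkgadd pair for each failed package.
#   ALTROOT  mount point of the new boot environment (passed to -R)
#   PKGDIR   directory holding the packages on the install media
reinstall_cmds() {
    failed_pkgs "$1" | while read -r pkg; do
        echo "pkgrm -R $2 $pkg"
        echo "pkgadd -R $2 -d $3 $pkg"
    done
}
```

Review the printed commands before running any of them; both pkgrm and pkgadd will prompt for confirmation unless told otherwise.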

http://blogs.sun.com/timc/date/20080904 Thursday September 04, 2008

Building GCC 4.x on Solaris

I needed to build GCC 4.3.1 for my x86 system running a recent development build of Solaris. I thought I would share what I discovered, and then improved on.

I started with Paul Beach's Blog on the same topic, but I knew it had a couple of shortcomings, namely:

  • No mention of a couple of pre-requisites that are mentioned in the GCC document Prerequisites for GCC
  • A mysterious "cannot compute suffix of object files" error in the build phase
  • No resolution of how to generate binaries that have a useful RPATH (see Shared Library Search Paths for a discussion on the importance of RPATH).

I found some help on this via this forum post, but here is my own cheat sheet.

  1. Download & install GNU Multiple Precision Library (GMP) version 4.1 (or later) from sunfreeware.com. This will end up located in /usr/local.
  2. Download, build & install MPFR Library version 2.3.0 (or later) from mpfr.org. This will also end up in /usr/local.
  3. Download & unpack the GCC 4.x base source (the one of the form gcc-4.x.x.tar.gz) from gcc.gnu.org
  4. Download my example config_make script, edit as desired (you probably want to change OBJDIR and PREFIX, and you may want to add other configure options).
  5. Run the config_make script
  6. "gmake install" as root (although I instead create the directory matching PREFIX, make it writable by the account doing the build, then "gmake install" using that account).

You should now have GCC binaries that look for the shared libraries they need in /usr/sfw/lib, /usr/local/lib and PREFIX/lib, without anyone needing to set LD_LIBRARY_PATH. In particular, modern versions of Solaris will have a libgcc_s.so in /usr/sfw/lib.
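For readers without the config_make script to hand, here is a sketch of the kind of commands such a script might run. It is not a copy of that script. It assumes the Solaris linker is in use (LD_OPTIONS is read by Solaris ld, not by GNU ld), that GMP and MPFR are in /usr/local as in steps 1 and 2, and that C and C++ are the languages wanted. The function name is hypothetical.

```shell
# gcc_build_cmds SRCDIR OBJDIR PREFIX
# Prints the commands for an out-of-tree GCC build whose binaries carry
# an RPATH for /usr/sfw/lib, /usr/local/lib and PREFIX/lib.
gcc_build_cmds() {
    srcdir=$1 objdir=$2 prefix=$3
    # -R entries that the Solaris linker should record as RPATH (assumption:
    # passed via LD_OPTIONS so every link step in the build picks them up).
    rpath="-R/usr/sfw/lib -R/usr/local/lib -R$prefix/lib"
    cat <<EOF
LD_OPTIONS='$rpath'; export LD_OPTIONS
mkdir -p $objdir && cd $objdir
$srcdir/configure --prefix=$prefix --with-gmp=/usr/local --with-mpfr=/usr/local --enable-languages=c,c++
gmake
EOF
}
```

For example, `gcc_build_cmds /export/src/gcc-4.3.1 /export/obj/gcc /opt/gcc-4.3.1` (paths hypothetical) prints four lines that can be reviewed and then piped to sh. The "gmake install" from step 6 is left as a separate, deliberate step.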

If you copy your GMP and MPFR shared libraries (which seem to be needed by parts of the compiler) into PREFIX/lib, you will also have a self-contained directory tree that you can deploy to any similar system more simply (e.g. via rsync, tar, cpio, "scp -pr", ...)



This is a personal weblog, I do not speak for my employer.