Whilst I'm waiting for my home directory to reappear, I thought I'd mention some of the work
I've done to support easy booting of domains in Xen.
For dom0 to be able to boot a para-virtualised domU, it needs to be able to bootstrap it. In
particular, it needs to be able to read the kernel file and its associated ramdisk so it can
hand off control to the kernel's entry point when the domain is created. So these files must
be made accessible in dom0. Previously, you had to copy the files out of the domU filesystem
into dom0 by hand. This was often difficult (consider getting files off an
ext2 filesystem in a Solaris dom0), and was obviously prone to errors such as forgetting to
update the copies when upgrading the kernel.
For a while now Xen has had support for a bootloader. This runs in userspace and is responsible
for copying out the files (those specified by kernel and ramdisk in the domain's config file)
to a temporary directory in dom0; the files are then passed on to the domain builder. Xen has
shipped with a bootloader called pygrub. Whilst somewhat confusingly
named, it essentially emulated the grub menu. It had backends for a couple of Linux filesystems
written in Python and worked by searching for a grub.conf file, then presenting a
lookalike grub menu for the user to interact with. When an entry was selected, the specified
files would be read off the filesystem and passed back to the builder.
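To make the menu step concrete, here is a minimal sketch in Python of reading a legacy grub.conf-style menu and picking an entry. It is not pygrub's actual parser: the function names `parse_grub_conf` and `select_entry`, the dictionary layout and the sample file paths are all made up for illustration, and only the `default`, `title`, `kernel` and `initrd` directives are handled.

```python
def parse_grub_conf(text):
    """Return (default_index, entries) from legacy grub.conf-style text."""
    default = 0
    entries = []
    current = None  # entry being filled in; None until the first "title"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        # Accept "key=value" as well as "key value".
        if "=" in key:
            key, _, rest = key.partition("=")
            value = (rest + " " + value).strip()
        if key == "default" and value.isdigit():
            default = int(value)
        elif key == "title":
            current = {"title": value, "kernel": None, "args": "", "ramdisk": None}
            entries.append(current)
        elif current is not None and key == "kernel":
            # First word is the kernel path, the rest are its arguments.
            path, args = (value.split(None, 1) + ["", ""])[:2]
            current["kernel"] = path
            current["args"] = args
        elif current is not None and key == "initrd":
            current["ramdisk"] = value
    return default, entries


def select_entry(entries, default, choice=None):
    """Pick the user's choice if one was made, otherwise the default entry."""
    index = default if choice is None else choice
    return entries[index]


# Hypothetical menu file contents, for illustration only.
sample = """
default=0
timeout 5
title Linux
    root (hd0,0)
    kernel /vmlinuz-2.6 ro root=/dev/sda1
    initrd /initrd-2.6.img
title Older kernel
    kernel /vmlinuz-old
"""

default, entries = parse_grub_conf(sample)
entry = select_entry(entries, default)
print(entry["kernel"], entry["ramdisk"])
```

Passing `choice=None` models the non-interactive case, where the default entry is taken without anyone having to sit at a menu.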
This worked reasonably well for Linux, but we felt there were a number of problems. First, the
interactive menu only worked for first boot; subsequent reboots would automatically choose
an entry without allowing user interaction (though this is now fixed in xen-unstable). Its
interactive nature seemed quite a stumbling block for things like remote domain management;
you really don't want to babysit domain creation. Also, the implementation of the filesystem
backends wasn't ideal; there was only limited Linux filesystem support, and it didn't work
very well.
We've adapted pygrub to help with some of these issues. First, we replaced the filesystem
code with a C library called libfsimage. The intention here is to provide a stable API for
accessing filesystem images from userspace. Thus it provides a simple interface for reading
files from a filesystem image and a plugin architecture to provide the filesystem support.
This plugin API is also stable, allowing filesystems past, present and future to be
transparently supported. Currently there are plugins for ext2, reiserfs, ufs and iso9660,
and we expect to have a zfs plugin soon. We borrowed the grub code for all of these plugins
to simplify the implementation, but the API allows for any implementation.
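The shape of that design (one small read interface on top, interchangeable filesystem plugins underneath) can be sketched in a few lines of Python. To be clear, this is not the libfsimage API, which is a C library: every name here (`ToyFsPlugin`, `register`, `open_image`, `read_from_image`) and the toy on-disk format are invented purely to show the idea of probing an image with each plugin in turn.

```python
class ToyFsPlugin:
    """Hypothetical plugin for a made-up format: a magic line, then 'path=contents' lines."""

    name = "toyfs"
    MAGIC = b"TOYFS\n"

    def mount(self, image):
        """Return a handle if this plugin recognises the image, else None."""
        if not image.startswith(self.MAGIC):
            return None  # not ours; let another plugin try
        files = {}
        for line in image[len(self.MAGIC):].splitlines():
            if not line:
                continue
            path, _, data = line.partition(b"=")
            files[path.decode()] = data
        return files

    def read_file(self, handle, path):
        """Return the contents of 'path' from a mounted handle."""
        return handle[path]


PLUGINS = []  # registry of available filesystem plugins


def register(plugin):
    PLUGINS.append(plugin)


def open_image(image):
    """Ask each registered plugin in turn whether it understands the image."""
    for plugin in PLUGINS:
        handle = plugin.mount(image)
        if handle is not None:
            return plugin, handle
    raise ValueError("no plugin recognises this filesystem image")


def read_from_image(image, path):
    """The simple interface callers see: image in, file contents out."""
    plugin, handle = open_image(image)
    return plugin.read_file(handle, path)


register(ToyFsPlugin())
image = b"TOYFS\n/boot/kernel=kernel-bytes\n/boot/ramdisk=ramdisk-bytes\n"
print(read_from_image(image, "/boot/kernel"))
```

The caller never needs to know which filesystem is in the image; supporting a new one only means registering another plugin.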
Some people were suggesting solutions involving loopback mounts. This was problematic for us
for two main reasons. First, filesystem support in the different dom0 OSes is far from complete;
for example, Solaris has no ext2 support, and Linux has no (real) ZFS support. Second, and more
seriously, it exposes a significant gap in terms of isolation: the dom0 kernel FS code
must be entirely resilient against a corrupt domU filesystem image. If we are to consider domUs
as untrusted, it doesn't make sense to leave this open as an attack vector.
Another simple change we made was to allow operation without a grub.conf at all. You can
specify a kernel and ramdisk and make pygrub automatically load them from the domU filesystem. Even
easier, you can leave out all configuration altogether, and a Solaris domU will automatically boot
the correct kernel and ramdisk. This makes setting up your config for a domU much easier.
pygrub understands both fdisk partitions and Solaris slices, so simply specifying the disk
will cause the bootloader to look for the root slice and grab the right files to boot.
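As a rough illustration, a domain config relying on this might look something like the sketch below (domain config files use Python syntax). The domain name, memory size, image path and pygrub install path are placeholders I've assumed rather than values from a real setup, and the exact disk specification will depend on your dom0.

```python
# Hypothetical config for a Solaris domU; all values below are placeholders.
name = "solaris-domu"
memory = 1024

# Just name the disk: the bootloader is left to find the root slice itself.
disk = ["file:/export/domu/solaris.img,0,w"]

# Assumed install location of pygrub; adjust for your system.
bootloader = "/usr/bin/pygrub"

# Deliberately no kernel or ramdisk settings here. Setting them would
# instead ask pygrub to load those specific files from the domU filesystem.
```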
There's more work we can do yet, of course.