So here's an issue: What's the best way to split up the Solaris name space into datasets? Should it be split at all? I'm going to answer the second question right away with my strong opinion that YES, we should split up the name space into separate datasets. We don't need to do it any more for space reasons (back in the days of 400 MB disks, we couldn't fit all of what shipped with the Solaris operating system onto a single disk), but now that having multiple file systems is EASY (no need to preallocate a separate slice for each one), maybe we should do it for other reasons. Here are a few:
- It might make sense to have different qualities of service for different parts of the name space. Maybe /opt could be compressed, for example.
- When you clone a bootable environment (i.e., a root file system and its subordinate file systems), you might want to include some parts of the old boot environment by reference, not by cloning. For example, since /var/adm/log reflects the history of the overall system, not just a single boot environment, maybe you want to have just one copy which is shared among the various boot environments. If that directory were its own file system, it would be easier to share it among different bootable environments.
- Eventually, we'd like to support booting from other kinds of pool configurations besides simple mirrors. But to do that, we need to have the files that are crucial for booting on each disk in the pool. That happens automatically with mirrors, but not for RAID-Z. So how do we make sure that some files are available on each disk? Well, I had considered a file attribute of some kind: some kind of "treat this file specially" interface to ZFS. But that would be a pain, because every time we wrote out, say, the boot archive, we would have to give it the "special" treatment, whatever that is. ZFS's "quality of service" boundary is the dataset, not the file. So what if we created a new dataset property, the "make this dataset available to the booter" property? It's not clear how to implement that and I don't want to get into it (I'll leave it to the ZFS internals gurus to figure that one out), but if we had such a property, then we could assign it to the root file system and automatically get bootability. However that property is implemented, it would probably involve replicating the entire dataset on each disk. And that's a good reason for keeping the root file system small, which means splitting off /usr, /opt, and perhaps other parts of the name space into separate datasets.
- Splitting the name space into separate file systems might have some advantages for zones.
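To make the per-dataset "quality of service" idea concrete, here is a minimal sketch that builds the `zfs create` command lines for one possible split. The pool name `rpool`, the dataset names, and the choice of which directories to split off are assumptions for illustration only, not a proposed layout. The script only prints the commands; it does not run them.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: the pool name "rpool" and these dataset names are
# assumptions for illustration, not a recommended or shipped configuration.
Layout = List[Tuple[str, Dict[str, str]]]

LAYOUT: Layout = [
    ("rpool/usr", {"mountpoint": "/usr"}),
    # /opt gets a different quality of service: compression turned on.
    ("rpool/opt", {"mountpoint": "/opt", "compression": "on"}),
    ("rpool/var", {"mountpoint": "/var"}),
]


def zfs_create_commands(layout: Layout) -> List[List[str]]:
    """Build one `zfs create` argument list per dataset in the layout."""
    commands = []
    for dataset, props in layout:
        cmd = ["zfs", "create"]
        # Emit properties in sorted order so the output is deterministic.
        for name, value in sorted(props.items()):
            cmd += ["-o", f"{name}={value}"]
        cmd.append(dataset)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in zfs_create_commands(LAYOUT):
        print(" ".join(cmd))
```

Because properties are set per dataset, changing the policy for one part of the name space is a one-line change in the table rather than something that has to be applied file by file.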
My gut feeling is that the most controversial part of the name space as far as where to make the divisions will be /var. Should /var as a whole be a separate dataset? Is there value in splitting off some of the subdirectories of /var as separate datasets?
I'll be working on a plan for these and other issues. Watch this space for more questions and updates. I welcome your comments on this, which can be posted either here or at the zfs-discuss@opensolaris.org alias.
All I suggest is that the default install is NOT sub-divided, but the install process prompts the end-user for their preferred layout, should they wish to complicate things further.
Problem solved. =)
By the way, will swap be available as a separate ZFS dataset [by default] or will there be a raw implementation on the same ZFS boot disk? If the raw option, we should still have the full ZFS disk cache enabled...right?
Posted by Wes Williams on April 23, 2007 at 01:05 PM MDT #
It'd be worth looking to windwards for the 'zfs split' functionality that's on the way.
I'm looking for an easy way to boot zfs clones, ideally with live upgrade support, that can transparently survive either disk in the mirror dying.
As ever, really appreciate your work on this. And a pony :)
Posted by Dick Davies on April 23, 2007 at 02:49 PM MDT #
Posted by Gary Mills on April 23, 2007 at 03:29 PM MDT #
Posted by Nico on April 23, 2007 at 04:05 PM MDT #
Posted by Nico on April 23, 2007 at 04:08 PM MDT #
Thus far, it's sounding like we want /, /tmp, /var, /var/tmp, /opt, and /usr. Any thoughts on /usr/local? /home? /export? /sbin? I know /sbin is part of / right now, but if we're talking about sharing filesystems among zones, then perhaps that could be split off?
Posted by Mark J Musante on April 24, 2007 at 07:02 AM MDT #
Posted by lalt on April 24, 2007 at 08:32 AM MDT #
Posted by Jim Laurent on April 24, 2007 at 09:22 AM MDT #
Posted by Ron Halstead on April 30, 2007 at 09:25 AM MDT #
One of the things that I've had to fight with Solaris for a long time was the concept of /export/home as the default home area.
I *always* end up disabling autofs for a few reasons, but primarily because I don't like the kludge of /export/home. I like the neat and clean simplicity of /home, without autofs having to overmount it on login.
Next, I think the ability to migrate an existing directory into a new zfs set would be in order.
Let's say you do your install, and elect not to create /var/crash as a separate area, and decide later to make it separate, or /var/spool or something else that is in use quite a bit.
Now, instead of having to shut down to single-user mode, rename the old directory, create the new ZFS mountpoint, copy the data, and then bring the box back up to multiuser mode, give us the ability to migrate a subdirectory to a separate subfs online (possibly through the use of snapshots or whatever mechanism is needed).
Posted by Larry B on July 28, 2008 at 03:24 PM MDT #
In regards to the DoD GEN003620: CAT III...
Sounds to me like someone at the DoD hasn't been awake for the last 10 years.
Solaris doesn't go down if root fills up, doesn't need to have separate filesystems/mount points for /home or /var.
They've done experiments where they kept an NFS server online for months while scripts kept the entire root filesystem (which included /var, /opt, and /export/home) 100% full the entire time. They got bored with it and shut down the experiment after proving there was no issue with Solaris running on a full root filesystem.
Granted, logging goes to hell with a full root, but that happens regardless of which filesystem the logs are written to.
Posted by Larry B on July 28, 2008 at 03:28 PM MDT #
Hello
I was wondering if there is any method to create different datasets within rpool during the initial installation, e.g., for /opt. I see there is an option for /var, but I can't figure it out for /opt. All I want is root, /var, and /opt on different datasets.
Can you offer some input on this?
Thank you.
Birut Patel.
NYC
Posted by Birut Patel on March 05, 2009 at 11:32 AM MST #