Solaris Volume Manager root mirror problems on S10
There are a couple of bugs that we found in S10 that make it look like Solaris Volume Manager (SVM) root mirroring does not work at all. Unfortunately, we found these bugs after the release went out. These bugs will be patched, but I wanted to describe the problems a bit and describe some workarounds.
On a lot of systems that use SVM for root mirroring, there are only two disks. When you set up the configuration, you put one or more metadbs (state database replicas) on each disk to hold the SVM configuration information.
SVM implements a metadb quorum rule: during the system boot, if a majority (more than 50%) of the metadbs is not available, the system should boot into single-user mode so that you can fix things up. You can read more about this here.
On a two-disk system, there is no way to set things up so that more than 50% of the metadbs will still be available if either disk dies.
When SVM does not have metadb quorum during the boot, it is supposed to leave all of the metadevices read-only and boot into single-user mode. This gives you a chance to confirm that you are using the right SVM configuration, and to make sure you don't corrupt any of your data, before you clean up the dead metadbs.
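The quorum rule described above boils down to a strict-majority test. The Python below is a minimal sketch for illustration only, not SVM's actual code; the function names `has_quorum` and `boot_outcome` are hypothetical, and the example assumes two metadbs on each of two disks.

```python
def has_quorum(available: int, total: int) -> bool:
    """Return True if strictly more than half of the metadbs are available."""
    return 2 * available > total


def boot_outcome(available: int, total: int) -> str:
    """Describe the boot behavior the quorum rule is meant to produce."""
    if has_quorum(available, total):
        return "multi-user, metadevices read-write"
    return "single-user, metadevices read-only"


# Hypothetical layout: two disks with two metadbs each (4 in total).
print(boot_outcome(available=4, total=4))  # both disks healthy -> multi-user, metadevices read-write
print(boot_outcome(available=2, total=4))  # one disk dead, exactly half left -> single-user, metadevices read-only
```

Note that losing one of two equally populated disks leaves exactly half of the replicas, which is not a majority.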
What a lot of people do when they set up a root mirror is pull one of the disks to check that the system will still boot and run OK. If you do this experiment on a two-disk configuration running S10, the system will panic very early in the boot process and go into an infinite panic/reboot cycle.
What is happening here is a bug related to UFS logging, which is on by default in S10. Because there is no metadb quorum, the root mirror stays read-only, and that triggers a bug in the UFS log rolling code. This in turn leaves UFS in a bad state, which causes the system to panic.
We're testing the fix for this bug right now, but in the meantime it is easy to work around this bug by disabling logging on the root filesystem. You can do that by specifying the "nologging" option in the last (mount options) field of the vfstab entry for root. You should reboot once before doing any SVM experiments (like pulling a disk) to ensure that UFS has rolled the log and is no longer using logging on root.
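For illustration, here is a minimal Python sketch of the vfstab change, assuming the standard seven-field layout (device to mount, device to fsck, mount point, FS type, fsck pass, mount at boot, mount options), where "-" means no options. The helper `add_nologging` and the metadevice name `d10` are hypothetical; in practice you would back up /etc/vfstab and make this one-field edit by hand.

```python
def add_nologging(line: str) -> str:
    """Return one vfstab line (no trailing newline) with "nologging" set if it is the root entry."""
    fields = line.split()
    # Leave comments, blank lines, and any entry other than "/" untouched.
    if line.lstrip().startswith("#") or len(fields) != 7 or fields[2] != "/":
        return line
    # "-" means no options; otherwise the options are comma-separated.
    options = [] if fields[6] == "-" else fields[6].split(",")
    # Drop any explicit logging/nologging setting, then request nologging.
    options = [opt for opt in options if opt not in ("logging", "nologging")]
    options.append("nologging")
    fields[6] = ",".join(options)
    return "\t".join(fields)


# Hypothetical root entry on metadevice d10 (fsck pass 1, "mount at boot" is "no" for root).
root_entry = "/dev/md/dsk/d10\t/dev/md/rdsk/d10\t/\tufs\t1\tno\t-"
print(add_nologging(root_entry))  # the last field becomes "nologging"
```

Reversing the workaround once the patch is installed is the same edit in the other direction: remove "nologging" (or set "logging") in that last field.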
Once a patch for this bug is out, you will definitely want to remove this workaround from the vfstab entry, since UFS logging offers so many performance and availability benefits.
By the way, UFS logging is also on by default in the S9 9/04 release, but that code does not suffer from this bug.
The second problem we found is not as serious as the UFS bug. It has to do with an interaction with the Service Management Facility (SMF), which is new in S10, and again it is related to not having metadb quorum during the boot. What should happen is that the system enters single-user mode so you can clean up the dead metadbs. Instead, it boots all the way to multi-user, but since the root device is still read-only, things don't work very well. This turned out to be a missing dependency that we didn't catch when we integrated SVM and SMF. We'll have a patch for this too, but this problem is much less serious: you can still log in as root and clean up the dead metadbs so that you can then reboot with a good metadb quorum.
Both problems occur because there is no metadb quorum, so the root metadevice remains read-only after a boot with a dead disk. If you have a third disk onto which you can add a metadb, you can reduce the likelihood of hitting these problems, since losing one disk won't cause you to lose quorum during boot.
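To see why a third disk helps, the sketch below checks every single-disk failure against the same majority rule. The helper `fatal_single_disk_losses`, the disk names, and the replica counts are all hypothetical; the point is only the arithmetic. With two disks, even an uneven split still loses quorum when the more heavily populated disk dies, while one extra replica on a third disk lets any one disk fail.

```python
from typing import List, Mapping


def fatal_single_disk_losses(layout: Mapping[str, int]) -> List[str]:
    """List the disks whose loss alone would leave no more than half of the metadbs."""
    total = sum(layout.values())
    # Quorum needs a strict majority: 2 * remaining > total.
    return [disk for disk, count in layout.items() if 2 * (total - count) <= total]


# Hypothetical metadb layouts (disk name -> number of replicas on it).
layouts = {
    "two disks, 2+2": {"disk0": 2, "disk1": 2},
    "two disks, 3+2": {"disk0": 3, "disk1": 2},
    "three disks, 2+2+1": {"disk0": 2, "disk1": 2, "disk2": 1},
}

for name, layout in layouts.items():
    bad = fatal_single_disk_losses(layout)
    if bad:
        verdict = "loses quorum if " + " or ".join(bad) + " fails"
    else:
        verdict = "survives any single disk loss"
    print(f"{name}: {verdict}")
```

This only covers one failure at a time; losing two of the three disks would still cost you quorum.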
Given these kinds of problems, you might wonder why SVM bothers to implement the metadb quorum at all. Why not just trust the metadbs that are alive? SVM is conservative and always chooses the path that ensures you won't lose data or use stale data. There are various corner cases to worry about when SVM cannot be sure it is using the most current data. For example, in a two-disk mirror configuration, you might run for a while on the first disk with the second disk powered down. Later you might reboot off the second disk (because it is now powered up) while the first disk is powered down. At this point you would be using the stale data on the mirror, possibly without even realizing it. The metadb quorum rule gives you a chance to intervene and fix up the configuration when SVM cannot do it automatically.
This quorum problem is precisely the reason that I have suggested (no formal RFE yet) that Sun hardware come with a small solid-state disk (think USB thumb drive) to hold a copy of the metadbs.
I haven't filed this RFE because it seems to make more sense to me to simply implement hardware mirroring in all of the hardware, though not the way the V440 did it, with only 2 of the 4 disks.
Posted by Mike Gerdts on April 01, 2005 at 09:47 AM PST #
Posted by Ian McGinley on April 05, 2005 at 08:57 PM PDT #
Posted by Jerry Uanino on April 19, 2005 at 06:31 AM PDT #
Posted by Peter Tribble on April 19, 2005 at 06:44 AM PDT #
Posted by Chris Albertson on August 04, 2005 at 08:17 PM PDT #
Posted by gerald.jelinek on August 05, 2005 at 04:19 PM PDT #
Posted by Jim Willey on September 09, 2005 at 08:09 AM PDT #
Posted by Peter Bauer on October 26, 2005 at 11:23 PM PDT #
Posted by Jerry Jelinek on November 03, 2005 at 01:48 PM PST #
Posted by Michael Pye on December 07, 2005 at 06:38 AM PST #
Posted by Jerry Jelinek on December 07, 2005 at 08:39 AM PST #
Posted by Michael Pye on December 07, 2005 at 09:11 AM PST #
Posted by John McQueen on February 09, 2006 at 01:49 PM PST #
Posted by Michael Pye on February 16, 2006 at 09:24 AM PST #
Posted by Michael Pye on April 07, 2006 at 06:53 AM PDT #
Unfortunately, I have set up two X2100 production servers with an S10 two-disk RAID-1 configuration that includes mirroring the root filesystem. I have already set up "nologging" for /, but I still do not know how to handle a disk failure carefully.
Wouldn't it make sense to just unmirror at least the root filesystem before a disk failure occurs? If unmirroring is advisable, what about regularly rsyncing / between the two disks to keep the root filesystems in sync? If one disk fails, you should at least be able to boot from the other one, and after the disk is replaced, user data should still be resynced automatically.
Let me ask one last question regarding the metadb quorum. What's the problem with just storing additional state database replicas on a USB stick? If a disk fails in a two-disk RAID-1 configuration, you should still have more than 50% of the metadbs available. By the way: is it feasible to add metadbs to a running setup?
I am using S10 03/05 on one machine and S10 01/06 on the other. Because I was advised not to put three or more metadbs on each disk on an x86 system, I have configured RAID-1 with only two metadbs on each disk.
Posted by Werner Dworaczek on July 14, 2006 at 10:35 PM PDT #
Posted by fdasfdsa on October 12, 2006 at 07:03 PM PDT #
Unfortunately, it would seem that this issue is *still* not fixed as of x86 Sol10u3 (11/06) with recent patches.
Host: x4100 with two internal SAS disks. One of the disks has gone screwy, and the system wedged. When I reset and try to boot (even with -s added to the multiboot line for single-user mode), I get:
"WARNING: Error writing ufs log state
WARNING: ufs log for / changed state to Error
WARNING: Please umount(1M) / and run fsck(1M) "
I'm trying to netboot off a JumpStart server to get a prompt so I can mount root without UFS logging and deal with the metadb replicas, but I'm not having much luck so far.
I've submitted a SunSolve case regarding this bug but haven't yet gotten anyone to acknowledge it. The closest is a claim that I need to have an altbootpath defined in bootenv.rc, but it doesn't seem as though that's relevant here.
It does seem, though, that we should disable the default UFS logging on /, which I really don't want to have to do (and shouldn't have to, especially for a bug that's gone unfixed for 2.5 years).
Posted by Anthony on October 30, 2007 at 12:06 PM PDT #