Back when I got my first real break as a sysadmin, one of my first tasks was to upgrade the Uni's finance office server, a SPARCserver 1000 running Solaris 2.5 with a gaggle of external unipacks and multipacks for Oracle 7.$mumble. I organised an outage with the DBAs and the Finance stakeholders, practised installing Solaris 2.6 on a new system (we'd just got an E450), and at the appointed time on the Saturday morning I rocked up and got to work on my precisely specified upgrade plan.
That all went swimmingly (though sloooooooowly) until the time came to reboot after the final SDS 4.1 mirror had been created. The primary system board decided that it really didn't like me, and promptly died along with the boot PROM.
PANIC!!
At that point I didn't know all that much about the innards of the SS1000, otherwise I probably would have just engaged in some swaptronics with the other three boards. However, I was green, nervous, and - by that point - very tired of sitting in a cold, loud machine room for 12 hours. I turned the box off, rang the local Sun support office and left a message (we didn't have weekend coverage on any of our systems then), rang my boss and the primary stakeholder in the Finance unit, and went home.
Come Monday morning, all hell broke loose - the Accounts groups were unable to do any work, and the DBAs had to do a very quick enable of the DR system so I could get time to work on the problem with Sun. The "quick enable" took around 4 hours, if I'm remembering it correctly. Fortunately for me, not only were the DBAs quite sympathetic and very quick to help, but Miriam on the support phone number (who later hired me) was able to diagnose the problem and organise a service call to replace the faulty board. She also calmed me down, which I really, really appreciated. (Thankyou Miriam!)
So ... why am I dredging this up? Because I've just done a LiveUpgrade (LU) from Solaris Nevada build 91 to build 93, with ZFS root, and it took me a shade under 90 minutes. Total. Including the post-installation reboot. Not only would I have gone all gooey at the idea of being able to do something like LU back in that job, but if I could have done it with ZFS and not had to reconfigure all the uni- and multi-pack devices, I probably could have had the whole upgrade done in around 4 or 5 hours rather than 12. (Remember, of course, that while the SS1000 could take quite a few CPUs, they were still very very very very sloooooooooow.)
Here's a transcript of this evening's upgrade:
# uname -a
SunOS gedanken 5.11 snv_91 i86pc i386 i86xpv
(remove the snv_91 LU packages)
pkgrm SUNWlu... packages from snv_91
(add the snv_93 LU packages)
pkgadd SUNWlu... packages from snv_93
(Create my LU config)
# lucreate -n snv_93 -p rpool
Checking GRUB menu...
Analyzing system configuration.
No name for current boot environment.
INFORMATION: The current boot environment is not named - assigning name .
Current boot environment is named .
Creating initial configuration for primary boot environment .
The device is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name PBE Boot Device .
Comparing source boot environment file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Cloning file systems from boot environment to create boot environment .
Creating snapshot for on .
Creating clone for on .
Setting canmount=noauto for > in zone on .
Saving existing file in top level dataset for BE as //boot/grub/menu.lst.prev.
File propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE in GRUB menu
Population of boot environment successful.
Creation of boot environment successful.
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 50.0G 151G 35K /rpool
rpool/ROOT 7.06G 151G 18K legacy
rpool/ROOT/snv_91 7.06G 151G 7.06G /
rpool/ROOT/snv_91@snv_93 71.5K - 7.06G -
rpool/ROOT/snv_93 128K 151G 7.06G /tmp/.alt.luupdall.2695
rpool/WinXP-Host0-Vol0 3.57G 151G 3.57G -
rpool/WinXP-Host0-Vol0@install 4.74M - 3.57G -
rpool/dump 4.00G 151G 4.00G -
rpool/export 7.47G 151G 19K /export
rpool/export/home 7.47G 151G 7.47G /export/home
rpool/gate 5.86G 151G 5.86G /opt/gate
rpool/hometools 2.10G 151G 2.10G /opt/hometools
rpool/optcsw 225M 151G 225M /opt/csw
rpool/optlocal 1.20G 151G 1.20G /opt/local
rpool/scratch 14.4G 151G 14.4G /scratch
rpool/swap 4G 155G 64.6M -
# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
snv_91 yes yes yes no -
snv_93 yes no no yes -
Golly, that was so easy! Here I was RTFMing for the LU-with-UFS syntax... not needed at all.
# time luupgrade -u -s /media/SOL_11_X86 -n snv_93
No entry for BE in GRUB menu
Copying failsafe kernel from media.
Uncompressing miniroot
Uncompressing miniroot archive (Part2)
13367 blocks
Creating miniroot device
miniroot filesystem is
Mounting miniroot at
Mounting miniroot Part 2 at
Validating the contents of the media .
The media is a standard Solaris media.
The media contains an operating system upgrade image.
The media contains version <11>.
Constructing upgrade profile to use.
Locating the operating system upgrade program.
Checking for existence of previously scheduled Live Upgrade requests.
Creating upgrade profile for BE .
Checking for GRUB menu on ABE .
Saving GRUB menu on ABE .
Checking for x86 boot partition on ABE.
Determining packages to install or upgrade for BE .
Performing the operating system upgrade of the BE .
CAUTION: Interrupting this process may leave the boot environment unstable
or unbootable.
Upgrading Solaris: 100% completed
Installation of the packages from this media is complete.
Restoring GRUB menu on ABE .
Adding operating system patches to the BE .
The operating system patch installation is complete.
ABE boot partition backing deleted.
PBE GRUB has no capability information.
PBE GRUB has no versioning information.
ABE GRUB is newer than PBE GRUB. Updating GRUB.
GRUB update was successful.
Configuring failsafe for system.
Failsafe configuration is complete.
INFORMATION: The file on boot
environment contains a log of the upgrade operation.
INFORMATION: The file on boot
environment contains a log of cleanup operations required.
WARNING: <3> packages failed to install properly on boot environment .
INFORMATION: The file on
boot environment contains a list of packages that failed to
upgrade or install properly.
INFORMATION: Review the files listed above. Remember that all of the files
are located on boot environment . Before you activate boot
environment , determine if any additional system maintenance is
required or if additional media of the software distribution must be
installed.
The Solaris upgrade of the boot environment is partially complete.
Installing failsafe
Failsafe install is complete.
real 83m24.299s
user 13m33.199s
sys 24m8.313s
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 52.5G 148G 36.5K /rpool
rpool/ROOT 9.56G 148G 18K legacy
rpool/ROOT/snv_91 7.07G 148G 7.06G /
rpool/ROOT/snv_91@snv_93 18.9M - 7.06G -
rpool/ROOT/snv_93 2.49G 148G 5.53G /tmp/.luupgrade.inf.2862
rpool/WinXP-Host0-Vol0 3.57G 148G 3.57G -
rpool/WinXP-Host0-Vol0@install 4.74M - 3.57G -
rpool/dump 4.00G 148G 4.00G -
rpool/export 7.47G 148G 19K /export
rpool/export/home 7.47G 148G 7.47G /export/home
rpool/gate 5.86G 148G 5.86G /opt/gate
rpool/hometools 2.10G 148G 2.10G /opt/hometools
rpool/optcsw 225M 148G 225M /opt/csw
rpool/optlocal 1.20G 148G 1.20G /opt/local
rpool/scratch 14.4G 148G 14.4G /scratch
rpool/swap 4G 152G 64.9M -
# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
snv_91 yes yes yes no -
snv_93 yes no no yes -
# luactivate snv_93
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE
Saving existing file in top level dataset for BE as //etc/bootsign.prev.
WARNING: <3> packages failed to install properly on boot environment .
INFORMATION: on boot
environment contains a list of packages that failed to upgrade or
install properly. Review the file before you reboot the system to
determine if any additional system maintenance is required.
Generating boot-sign for ABE
Saving existing file in top level dataset for BE as //etc/bootsign.prev.
Generating partition and slice information for ABE
Copied boot menu from top level dataset.
Generating direct boot menu entries for PBE.
Generating xVM menu entries for PBE.
Generating direct boot menu entries for ABE.
Generating xVM menu entries for ABE.
Disabling splashimage
Re-enabling splashimage
No more bootadm entries. Deletion of bootadm entries is complete.
Changing GRUB menu default setting to <0>
Done eliding bootadm entries.
**********************************************************************
The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.
**********************************************************************
In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:
1. Boot from Solaris failsafe or boot in single user mode from the Solaris
Install CD or Network.
2. Mount the Parent boot environment root slice to some directory (like
/mnt). You can use the following command to mount:
mount -Fzfs /dev/dsk/c1t0d0s0 /mnt
3. Run utility with out any arguments from the Parent boot
environment root slice, as shown below:
/mnt/sbin/luactivate
4. luactivate, activates the previous working boot environment and
indicates the result.
5. Exit Single User mode and reboot the machine.
**********************************************************************
Modifying boot archive service
Propagating findroot GRUB for menu conversion.
File propagation successful
File propagation successful
File propagation successful
File propagation successful
Deleting stale GRUB loader from all BEs.
File deletion successful
File deletion successful
File deletion successful
Activation of boot environment successful.
# date
Friday, 4 July 2008 9:45:41 PM EST
# init 6
propagating updated GRUB menu
Saving existing file in top level dataset for BE as //boot/grub/menu.lst.prev.
File propagation successful
File propagation successful
File propagation successful
File propagation successful
Here I reboot and then log in.
# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
snv_91 yes no no yes -
snv_93 yes yes yes no -
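If you wanted to script that "did I actually land in the new BE?" check rather than eyeball the table, something like the following might do. It's only a minimal sketch: it assumes the lustatus column layout shown in this transcript (name, complete, active now, active on reboot, can delete, copy status) and a POSIX-ish awk, and the function names active_be and on_reboot_be are made up for illustration, not part of Live Upgrade.

```shell
#!/bin/sh
# Sketch only: pull BE names out of lustatus-style output on stdin.
# Assumes the column layout seen in the transcript:
#   $1 = BE name, $3 = "Active Now", $4 = "Active On Reboot".
# active_be and on_reboot_be are hypothetical helper names.

# Print the BE that is active right now.
active_be() {
    awk '
        /^-+/ { in_table = 1; next }          # data rows follow the dashed rule
        in_table && $3 == "yes" { print $1 }
    '
}

# Print the BE that will be active after the next reboot.
on_reboot_be() {
    awk '
        /^-+/ { in_table = 1; next }
        in_table && $4 == "yes" { print $1 }
    '
}

# Example use on a live system (not run here):
#   lustatus | active_be
```

Piping the table straight from lustatus keeps the helpers free of any Live Upgrade dependency, so they can be tried against pasted output first.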
# lufslist -n snv_91
boot environment name: snv_91
Filesystem fstype device size Mounted on Mount Options
----------------------- -------- ------------ ------------------- --------------
/dev/zvol/dsk/rpool/swap swap 4294967296 - -
rpool/ROOT/snv_91 zfs 20630528 / -
# lufslist -n snv_93
boot environment name: snv_93
This boot environment is currently active.
This boot environment will be active on next system boot.
Filesystem fstype device size Mounted on Mount Options
----------------------- -------- ------------ ------------------- --------------
/dev/zvol/dsk/rpool/swap swap 4294967296 - -
rpool/ROOT/snv_93 zfs 10342821376 / -
Cor! That was so easy I think I need to fall off my chair.
Thinking about this for a moment, I needed just 6 commands and around 90 minutes to upgrade my laptop. If only I'd had this technology available to me back then.
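For the record, the sequence from the transcript could be wrapped up roughly like this. Treat it as a sketch under assumptions, not a tested procedure. The BE name, pool and media path are the ones used above. The package directory and package names are placeholders you'd have to supply yourself, since I only wrote "SUNWlu..." earlier. The lu_upgrade and run helpers are invented for this example. It defaults to a dry run that only prints each command.

```shell
#!/bin/sh
# Sketch of the upgrade steps from the transcript. Dry run by default.
# lu_upgrade and run are hypothetical helpers, not Live Upgrade tools.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: lu_upgrade NEW_BE POOL MEDIA PKG_DIR PKG...
#   PKG_DIR and PKG... are placeholders for wherever the new build's
#   LU packages live and what they are called (assumption: pkgadd -d
#   is pointed at a directory of packages).
lu_upgrade() {
    if [ "$#" -lt 5 ]; then
        echo "usage: lu_upgrade NEW_BE POOL MEDIA PKG_DIR PKG..." >&2
        return 2
    fi
    new_be=$1
    pool=$2
    media=$3
    pkg_dir=$4
    shift 4

    # Stop at the first step that fails. The transcript does not show
    # what luupgrade returns when a few packages fail to install, so
    # check its logs yourself before activating.
    run pkgrm "$@" || return 1                          # old build's LU packages
    run pkgadd -d "$pkg_dir" "$@" || return 1           # new build's LU packages
    run lucreate -n "$new_be" -p "$pool" || return 1    # clone the current BE
    run luupgrade -u -s "$media" -n "$new_be" || return 1
    run luactivate "$new_be" || return 1
    echo "Activated $new_be. Reboot with 'init 6' or shutdown, not reboot or halt."
}
```

Calling it as `lu_upgrade snv_93 rpool /media/SOL_11_X86 /path/to/lu/pkgs PKG_A PKG_B` prints the plan. Setting DRY_RUN=0 would run it for real. The final reboot is deliberately left as a manual step, because luactivate insists on init or shutdown.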
Finally, let me send a massive, massive thankyou to the install team and the ZFS team for all their hard work to get these technologies integrated and working pretty darned smoothly together.