Wednesday December 24, 2008
Migration is the action of moving a domain from one physical platform to another physical platform. Cold migration is moving a domain which is not active (the domain is either stopped or unbound). Warm migration is moving a domain which is active. For warm migration, the domain to migrate is suspended on the source platform, then the entire state of the domain is transferred over the network to the target platform. Finally the domain is resumed on the target platform.
Virtual I/O Dynamic Reconfiguration allows virtual services (vsw, vds) and virtual devices (vnet, vdisk) to be dynamically added to a domain without having to reboot that domain.
Network NIU Hybrid I/O is a feature available on UltraSPARC-T2 based systems which enables a virtual network interface of a guest domain to directly access a physical network interface owned by an I/O domain. That way, a guest domain gets the performance of a physical network interface while using a virtual network interface.
Network VLAN support provides VLAN 802.1Q support to virtual network interfaces and virtual switches. This makes it possible to assign a port-vlan-id and vlan-ids to virtual network interfaces and virtual switches.
Performance of the virtual network interface has been improved by about 30%.
A virtual switch can now be associated with an interface which is a link aggregation of several network interfaces.
Virtual Disk Multipathing enables a virtual disk to be connected to its backend through different service domains, and ensures the availability of the virtual disk in case one of the service domains becomes unavailable.
Single-slice disk support has been improved and Solaris can now be installed on such a disk. Also single-slice disks are now visible with the format(1m) command.
The XML interface to the Domain Manager is now public, fully supported and documented. Applications can now connect to the Domain Manager using the XMPP protocol, and the XML support provides a convenient way to interface an application with the Domain Manager.
This release also includes many other improvements and bug fixes. Among others:
| Num. of Domains | CPUs/Domain | Memory/Domain | Comments |
|---|---|---|---|
| 128 | 2 | 4GB | each has two cpu threads |
| 32 | 8 | 16GB | each domain can have an entire cpu core (1 core has 8 threads) |
| 4 | 64 | 128GB | each domain can have an entire cpu chip (1 chip has 8 cores) |
| 1 | 256 | 512GB | the domain has all 4 cpu chips (4 chips x 8 cores x 8 threads = 256) |
Of course, the number of domains and the associated number of cpus and amount of memory can be adjusted depending on your needs.
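Each row of the table is simply the T5440's 256 CPU threads and 512GB of memory divided evenly among the domains. Here is a minimal sketch of that arithmetic, assuming an even split and ignoring any resources the hypervisor or the primary domain may keep for itself. The `even_split` helper is hypothetical, not part of LDoms:

```c
#include <stdbool.h>

/* T5440 totals used in the table:
 * 4 chips x 8 cores x 8 threads = 256 threads, and 512GB of memory. */
#define T5440_THREADS 256
#define T5440_MEM_GB  512

/* Hypothetical helper: split threads and memory evenly among ndomains.
 * Returns false if ndomains is not positive or does not divide the
 * thread count evenly. Resources reserved by the hypervisor or the
 * primary domain are deliberately ignored in this sketch. */
static bool even_split(int ndomains, int *threads, int *mem_gb)
{
    if (ndomains <= 0 || T5440_THREADS % ndomains != 0)
        return false;
    *threads = T5440_THREADS / ndomains;
    *mem_gb = T5440_MEM_GB / ndomains;
    return true;
}
```

For example, `even_split(32, &t, &m)` gives 8 threads and 16GB per domain, the second row of the table.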
The T5440 also has 4 PCI buses, which means that you can create up to 4 I/O domains. Such domains can have direct access to physical I/O resources, like a network card or a physical disk. So you can effectively split the T5440 into 4 fully independent systems (for example 4 domains, each with 64 cpu threads and 128GB of memory) which have their own physical disks and network interfaces and do not depend on any other domain, because they don't need to use virtual I/O.
With such a large number of domains (up to 128), it is important to be able to easily move a domain from one system to another, in case you need to shut down the entire platform (for example during maintenance) or if you have some new hardware available and you want to free some resources on an existing system (for example to allocate more cpus or memory to the existing domains). You can easily do this with the Domain Migration feature of LDoms 1.1. Migration is done with a single command:
primary# ldm migrate domain_to_migrate system_to_migrate_to
Then the system will automatically select the appropriate type of migration depending on the state of the domain to migrate:
The figure below illustrates the difference between virtual I/O and hybrid I/O by showing the path of the network packets in the different modes:
Configuring a virtual network interface in hybrid mode is very simple: just specify "mode=hybrid" when adding or setting a virtual network interface. For example:
primary# ldm add-vnet mode=hybrid vnet0 primary-vsw0 ldg1
or
primary# ldm set-vnet mode=hybrid vnet0 ldg1
Note that the setting "mode=hybrid" is just a hint to the system so that it tries to use hybrid I/O. If the system is unable to use the hybrid mode (for example because the virtual switch is not associated with an XAUI adapter), then the system will automatically fall back to the legacy mode and use virtual I/O.
The T5120 and T5220 can have up to 2 XAUI adapters, and each XAUI adapter can be shared 3 times in hybrid mode. That means that you can have up to 6 domains with a virtual network interface using hybrid I/O.
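The fallback rule and the sharing limit can be pictured as a small allocator. This is a hypothetical model, not the LDoms implementation: it assumes 2 XAUI adapters that can each be shared 3 times (as on the T5120/T5220), and that a vnet asking for `mode=hybrid` gets hybrid I/O only when its virtual switch is backed by an XAUI adapter with a free share; in every other case it falls back to virtual I/O. The names `xaui_adapter` and `assign_vnet` are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SHARES_PER_XAUI 3   /* each XAUI adapter can be shared 3 times */
#define MAX_XAUI        2   /* T5120/T5220: up to 2 XAUI adapters */

enum vnet_mode { VNET_VIRTUAL_IO, VNET_HYBRID_IO };

/* Hypothetical per-adapter bookkeeping (not an LDoms data structure). */
struct xaui_adapter {
    int shares_used;
};

/* Decide which mode a vnet ends up in. 'xaui' is the XAUI adapter behind
 * the vnet's virtual switch, or NULL if the switch uses another interface.
 * mode=hybrid is only a hint: any failure falls back to virtual I/O. */
static enum vnet_mode assign_vnet(struct xaui_adapter *xaui, bool want_hybrid)
{
    if (!want_hybrid || xaui == NULL)
        return VNET_VIRTUAL_IO;             /* no hint, or no XAUI adapter */
    if (xaui->shares_used >= SHARES_PER_XAUI)
        return VNET_VIRTUAL_IO;             /* adapter already fully shared */
    xaui->shares_used++;
    return VNET_HYBRID_IO;
}
```

Spreading eight `mode=hybrid` requests across both adapters yields six hybrid vnets, matching the limit of 6 domains; the remaining two quietly use virtual I/O.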
See Raghuram's blog for more details about Network Hybrid I/O.
In addition, LDoms 1.1 provides a solution to that problem for virtual disk I/O. If multiple service domains have access to the same virtual disk backend (for example a file on an NFS server, or a shared LUN on a SAN), then a virtual disk can be associated with all these service domains, and the path used to access the virtual disk's backend will change depending on the availability of the service domains.
Virtual disk multipathing is configured by putting the vdsdevs representing the same virtual disk backend into the same multipathing group (mpgroup). This is done with the add-vdsdev command. For example, if we have two service domains (primary and alternate), each with a vds service (primary-vds0 and alternate-vds0), and each service domain is able to access the same NFS file /home/domain/ldg1/vdisk0, then we can put that backend file into the same mpgroup "foo":
primary# ldm add-vdsdev mpgroup=foo /home/domain/ldg1/vdisk0 vdisk0@primary-vds0
primary# ldm add-vdsdev mpgroup=foo /home/domain/ldg1/vdisk0 vdisk0@alternate-vds0
Finally the backend file can be exported as a virtual disk to the domain ldg1:
primary# ldm add-vdisk vdisk0 vdisk0@primary-vds0 ldg1
That way the virtual disk will be accessible in domain ldg1, primarily through the primary domain.
But if the primary domain goes down then the virtual disk will remain accessible through
the alternate domain. This is illustrated in the following figures:
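Conceptually, a virtual disk in an mpgroup has one path per service domain, and I/O goes through a path whose service domain is available. The sketch below models that selection with hypothetical names (`vdisk_path`, `select_path`); it is not the vdc driver's code, and it assumes the path given to "ldm add-vdisk" (here vdisk0@primary-vds0) is tried first, as in the example above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one path of an mpgroup: a vdsdev exported by a
 * service domain, e.g. vdisk0@primary-vds0 or vdisk0@alternate-vds0. */
struct vdisk_path {
    const char *vdsdev;
    bool service_domain_up;
};

/* Return the first path whose service domain is up, or NULL if every
 * service domain is down (the virtual disk is then unavailable). */
static const struct vdisk_path *
select_path(const struct vdisk_path *paths, size_t npaths)
{
    for (size_t i = 0; i < npaths; i++)
        if (paths[i].service_domain_up)
            return &paths[i];
    return NULL;
}
```

With both service domains up, the primary path is used; when the primary domain goes down, the alternate path is selected, and with both down there is no usable path.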
The virtual disk backend can be a physical disk, a physical disk slice, a file or a volume from a volume management framework (like ZFS, SVM, VxVM...).
A backend is exported from a domain with the command "ldm add-vdsdev" (or "ldm add-vdiskserverdevice"):
# ldm add-vdsdev <backend> <volume_name>@<service_name>
And it is assigned to another domain with the command "ldm add-vdisk":
# ldm add-vdisk <disk_name> <volume_name>@<service_name> <domain>
Note that a backend is effectively exported when the domain <domain> is bound.
Virtual Disk Export Options
There are two ways a backend can be exported as a virtual disk: either as a full disk or as a single slice disk. Currently, the way a backend is exported depends on the type of backend (whether it is a disk, a slice, a file or a volume). The next section (Virtual Disk Backend) explains how each type of backend is exported.
When a backend is exported to a domain as a full disk, it will appear in that domain as a regular disk with 8 slices (s0 to s7). Such a disk is visible with the format(1m) command and its partition table can be changed using either the fmthard(1m) or format(1m) command.
A full disk will also be visible from the Solaris installer and can be selected as a disk device on which Solaris can be installed.
When a backend is exported to a domain as a single slice disk, it will appear in that domain as a disk with a single partition (s0). Such a disk is not visible with the format(1m) command and its partition table can not be changed.
A single slice disk will not be visible from the Solaris installer and cannot be selected as a disk device on which Solaris can be installed.
The virtual disk backend is the location where the data of a virtual disk are actually stored. This backend can be a physical disk, a physical disk slice, a file or a volume (ZFS, SVM, VxVM...). The way a backend is exported (either as a full disk or as a single slice disk) depends on the type of backend. Remember that it is not possible to install Solaris on a single slice disk.
| Backend | Export | Solaris Installation |
|---|---|---|
| Physical Disk | Full Disk | Possible |
| Physical Disk Slice | Single Slice Disk | Not Possible |
| File | Full Disk | Possible |
| Volume (ZFS, SVM, VxVM...) | Single Slice Disk* | Not Possible* |
(*) This will change once bug 6514091 (vDisk server should export volumes as full disks) is fixed.
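The table can be read as a simple lookup from backend type to export mode, and from export mode to whether the Solaris installer sees the disk. A sketch of that lookup, with hypothetical enum and function names, reflecting the table as it stands (i.e. before bug 6514091 is fixed):

```c
#include <stdbool.h>

enum backend_type  { BACKEND_DISK, BACKEND_SLICE, BACKEND_FILE, BACKEND_VOLUME };
enum export_mode   { EXPORT_FULL_DISK, EXPORT_SINGLE_SLICE };

/* Export mode per backend type, as listed in the table. Volumes are
 * single slice disks until bug 6514091 is fixed. */
static enum export_mode export_mode_of(enum backend_type b)
{
    switch (b) {
    case BACKEND_DISK:
    case BACKEND_FILE:
        return EXPORT_FULL_DISK;
    case BACKEND_SLICE:
    case BACKEND_VOLUME:
    default:
        return EXPORT_SINGLE_SLICE;
    }
}

/* The Solaris installer only sees (and can install on) full disks. */
static bool solaris_installable(enum backend_type b)
{
    return export_mode_of(b) == EXPORT_FULL_DISK;
}
```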
Here is some additional information about each type of backend:
A physical disk is exported as a full disk. In that case, virtual disk drivers (vds and vdc) forward I/Os from the virtual disk and act as a pass-through to the physical disk.
A physical disk can be exported by exporting the slice 2 (s2) of the disk.
Example: exporting a physical disk as a virtual disk
To export the physical disk c1t48d0 as a virtual disk, we have to export the slice 2 of that disk (c1t48d0s2):
# ldm add-vdsdev /dev/dsk/c1t48d0s2 c1t48d0@primary-vds0
Once the disk is exported, it can be assigned to a domain. Here it is
assigned to the domain "test":
# ldm add-vdisk pdisk c1t48d0@primary-vds0 test
Finally the disk is accessible from the guest domain "test" as a full
disk (i.e. a regular disk with 8 slices); here the disk is accessible as c0d1:
# ls -1 /dev/dsk/c0d1s*
/dev/dsk/c0d1s0
/dev/dsk/c0d1s1
/dev/dsk/c0d1s2
/dev/dsk/c0d1s3
/dev/dsk/c0d1s4
/dev/dsk/c0d1s5
/dev/dsk/c0d1s6
/dev/dsk/c0d1s7
A physical disk slice is exported as a single slice disk. In that case, virtual disk drivers (vds and vdc) forward I/Os from the virtual disk and act as a pass-through to the physical disk slice.
Example: exporting a physical disk slice as a virtual disk
To export the slice 0 of the physical disk c1t57d0 as a virtual disk, we have to export the device corresponding to that slice (c1t57d0s0):
# ldm add-vdsdev /dev/dsk/c1t57d0s0 c1t57d0s0@primary-vds0
Once the slice is exported, it can be assigned to a domain. Here it is
assigned to the domain "test":
# ldm add-vdisk pslice c1t57d0s0@primary-vds0 test
Finally the disk is accessible from the guest domain "test" as a single
slice disk (i.e. a disk with only 1 slice: s0); here the disk is accessible
as c0d13:
# ls -1 /dev/dsk/c0d13s*
/dev/dsk/c0d13s0
A file is exported as a full disk. In that case, virtual disk drivers (vds and vdc) forward I/Os from the virtual disk and manage the partitioning of the virtual disk. The file is effectively a disk image storing the data of all slices of the virtual disk.
When a file is exported as a virtual disk and no partitioning information is stored in that file, the system will automatically write a default disk label into the file and define a default partitioning with two slices (0 and 2) covering the entire disk. Note that this behavior will change once bug 6575050 (vds should support unformatted disks) is fixed.
Example: exporting a file as a virtual disk
To export the file /ldoms/domain/test/fdisk0 as a virtual disk, we first have to create it. The size of the file will define the size of the virtual disk. Here we create a 100MB blank file to get a 100MB virtual disk:
# mkfile 100m /ldoms/domain/test/fdisk0
Then the file can be directly exported as a virtual disk:
# ldm add-vdsdev /ldoms/domain/test/fdisk0 fdisk0@primary-vds0
Once the file is exported, it can be assigned to a domain. Here it is
assigned to the domain "test":
# ldm add-vdisk fdisk fdisk0@primary-vds0 test
Finally the disk is accessible from the guest domain "test" as a full
disk (i.e. a regular disk with 8 slices); here the disk is accessible as c0d5:
# ls -1 /dev/dsk/c0d5s*
/dev/dsk/c0d5s0
/dev/dsk/c0d5s1
/dev/dsk/c0d5s2
/dev/dsk/c0d5s3
/dev/dsk/c0d5s4
/dev/dsk/c0d5s5
/dev/dsk/c0d5s6
/dev/dsk/c0d5s7
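Because the size of the file defines the size of the virtual disk, its capacity in 512-byte blocks is the file size divided by 512: "mkfile 100m" creates a file of 100 x 1024 x 1024 = 104857600 bytes, i.e. 204800 blocks. A sketch of that arithmetic using stat(2); the helper names are hypothetical, and the sketch ignores any space the default label may take.

```c
#include <stdint.h>
#include <sys/stat.h>

#define VDISK_BLOCK_SIZE 512  /* assumed block size of the virtual disk */

/* Number of 512-byte blocks in a backend of 'bytes' bytes. */
static int64_t bytes_to_blocks(int64_t bytes)
{
    return bytes / VDISK_BLOCK_SIZE;
}

/* Hypothetical helper: blocks of a file used as a virtual disk backend,
 * or -1 if the file cannot be stat'ed (e.g. it does not exist). */
static int64_t file_disk_blocks(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return bytes_to_blocks((int64_t)st.st_size);
}
```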
A volume is exported as a single slice disk. In that case, virtual disk drivers (vds and vdc) forward I/Os from the virtual disk and act as a pass-through to the volume.
Example: exporting a ZFS volume as a virtual disk
To export the ZFS volume zdisk0 as a virtual disk, we first have to create it. The size of the volume will define the size of the virtual disk. Here we create a 100MB volume to get a 100MB virtual disk:
# zfs create -V 100m ldoms/domain/test/zdisk0
Then we have to export the device corresponding to that ZFS volume:
# ldm add-vdsdev /dev/zvol/dsk/ldoms/domain/test/zdisk0 zdisk0@primary-vds0
Once the volume is exported, it can be assigned to a domain. Here it is
assigned to the domain "test":
# ldm add-vdisk zdisk0 zdisk0@primary-vds0 test
Finally the disk is accessible from the guest domain "test" as a single slice disk
(i.e. a disk with only 1 slice: s0); here the disk is accessible as c0d9:
# ls -1 /dev/dsk/c0d9s*
/dev/dsk/c0d9s0
FAQ
You have probably only exported single slice disks (disk slices or volumes). The Solaris installer does not handle single slice disks, so it thinks that the system has no disk and the installation fails.
You need to export a full disk (a physical disk or a file) and start the installation again.
You can export a physical CDROM/DVD like you export a physical disk by exporting the slice 2 (s2) of the CDROM/DVD. However the exported CDROM/DVD will be seen as a regular disk and not as a CDROM/DVD, and you can access the content of the CDROM/DVD from Solaris but you can not boot from that CDROM/DVD.
As a consequence, you can export a Solaris CDROM/DVD, but you cannot use it to install Solaris by booting the exported CDROM/DVD. So you cannot install a guest domain from a CDROM/DVD. This will be improved when bug 6434615 (vDisk needs to support booting/installing from DVDs) is fixed.
You are probably exporting as a virtual disk a backend that is not accessible or that can not be exported (for example a file that does not exist). On the service domain, check the /var/adm/messages file for any error messages from the vds driver. This should give you some hints about what is wrong with which backend.
For example, a message like this one:
vds: [ID 877446 kern.info] vd_setup_vd(): /ldoms/domain/test/fdisk/fdisk01 is currently inaccessible (error 2)
means that /ldoms/domain/test/fdisk/fdisk01 cannot be exported because it does not exist (error 2 = ENOENT = No such file or directory, see "man -s2 intro").
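When the messages file contains many such lines, the error number can be extracted and decoded programmatically. A hypothetical helper, assuming the "(error N)" format of the single message shown above (other vds messages may be formatted differently):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract N from the "(error N)" part of a vds
 * message. Returns -1 if the message contains no such part. */
static int vds_error_number(const char *msg)
{
    const char *p = strstr(msg, "(error ");
    int err;

    if (p == NULL || sscanf(p, "(error %d)", &err) != 1)
        return -1;
    return err;
}

/* Error 2 is ENOENT: the backend (e.g. a file) does not exist. */
static bool vds_backend_missing(const char *msg)
{
    return vds_error_number(msg) == ENOENT;
}
```

`strerror(vds_error_number(msg))` then gives the system's description of the error, "No such file or directory" for error 2.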
When you export a file as a virtual disk, you may want to access the content of the disk image when the guest domain is down and you may want to mount one of the slices defined in the disk image (i.e. in the file). Unfortunately this is currently not possible.
lofi is currently not able to deal with a disk image, and it will present the disk as a single slice. If a slice is defined at the beginning of the file (i.e. at offset 0 of the virtual disk), then you may be able to access that slice using lofi, but any other slice will be inaccessible. This should be improved once bug 4765069 (RFE: lofiadm should be VTOC aware) is fixed.
Currently the only way to access any slice of a disk image (file) is to create a guest domain, export the file as a virtual disk to that domain and access the corresponding slice of the virtual disk from the guest domain.
Solaris 10 8/07 and patch 120011-14 contain several other fixes for LDoms; you can check Liam's blog for details.
So if you want to set up a split PCI configuration on this newer Sun Fire T2000, you have to add either a Fibre Channel or a SCSI host adapter in one of the PCI-E or PCI-X slots on bus pci@7c0 (bus_b) like this:
So I wrote a small program (vdlinux) which corrects these invalid values. It just has to be run on a Linux disk image, and then that disk image can be used with LDoms without any fancy tricks (like changing the disk label before and after binding the domain). The utility also applies the workaround for bug 6544963 if needed.
# vdlinux bootlinux
Incorrect number of partition (0).
Updating number of partition to 8.
Incorrect vtoc sanity (0).
Correcting vtoc sanity (600ddeee).
Applying workaround for bug 6544963.
Updating label checksum.
You only need to run the program once on your Linux disk image, so it is best to do it right after you have generated your disk image and unbound the domain. If you run the program again, it will just say that the label is correct:
# vdlinux bootlinux
Label looks correct.
Then you can start your Linux domain as a regular domain without having to care about the label of the Linux disk image.
The vdlinux utility is available here:
The source file can be compiled with the following command:
cc -o vdlinux vdlinux.c
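The corrections reported by vdlinux can be summarized as follows. This is not the vdlinux source: it is a sketch that works on a hypothetical, already-decoded view of the two VTOC fields (not the raw on-disk label layout), using only the values visible in the output above (8 partitions, sanity value 0x600ddeee).

```c
#include <stdbool.h>
#include <stdint.h>

#define EXPECTED_NPARTS 8            /* value vdlinux writes back */
#define EXPECTED_SANITY 0x600DDEEEu  /* value vdlinux writes back */

/* Hypothetical decoded view of the two VTOC fields vdlinux reports on;
 * this is NOT the on-disk struct dk_label layout. */
struct vtoc_fields {
    uint16_t nparts;
    uint32_t sanity;
};

/* Correct the fields as the vdlinux output describes. Returns true if
 * anything changed, in which case the label checksum must then be
 * recomputed ("Updating label checksum."). */
static bool fix_linux_vtoc(struct vtoc_fields *v)
{
    bool changed = false;

    if (v->nparts != EXPECTED_NPARTS) {
        v->nparts = EXPECTED_NPARTS;
        changed = true;
    }
    if (v->sanity != EXPECTED_SANITY) {
        v->sanity = EXPECTED_SANITY;
        changed = true;
    }
    return changed;
}
```

A second call on the same fields returns false, which corresponds to the "Label looks correct." message.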
I/O domains
Logical domains which have direct access to the hardware are called I/O domains. You will always have at least one I/O domain: the first domain created on the system, i.e. the primary domain. You can then create additional I/O domains by removing some hardware resources from the primary domain and assigning them to another domain. Ultimately, the number of I/O domains you can create depends on the hardware resources available on your system, and therefore on the type of system you are using.
PCI buses on Sun Fire T2000
Let's look at the Sun Fire T2000 Server for a concrete example. On this system, the smallest hardware resource you can assign to a domain is an entire PCI bus, and the Sun Fire T2000 has only two PCI buses, hence you can create a maximum of two I/O domains.
The two PCI buses of the Sun Fire T2000 server are initially assigned to the primary domain. The buses are identified as pci@780 (or bus_a) and pci@7c0 (or bus_b) and they are connected to the following devices:
As you can see, both buses have two network interfaces, but other resources are not so evenly spread: pci@7c0 (bus_b) has all the internal disks, the DVD-ROM and 4 PCI slots while pci@780 (bus_a) has only one PCI slot.
So there is no problem creating an I/O domain with bus pci@7c0 (bus_b), because it provides all the basic hardware resources you need (i.e. a disk and a network interface). But with bus pci@780 (bus_a), you only get some network interfaces and no disk. Hence, to create an I/O domain with pci@780 (bus_a), you will have to add a PCI-E card (either a Fibre Channel or a SCSI host adapter) in PCI-E slot 0 to get access to some storage devices. You also have to ensure that the card you add can be used to boot the system.
Configuration of the primary domain
Initially both PCI buses are assigned to the primary domain. You can verify this with the "ldm list-bindings" command:
primary# ldm list-bindings primary
...
IO: pci@780 (bus_a)
pci@7c0 (bus_b)
...
However, to be able to split the PCI buses, the primary domain should only be using devices from one PCI bus and, most of the time, you will use devices from bus pci@7c0 (bus_b) because the system disk of the primary domain is an internal disk.
You can check the disks used by looking at the path of the disk devices:
primary# ls -l /dev/dsk
...
lrwxrwxrwx 1 root root 65 Feb 2 17:19 /dev/dsk/c1t0d0s0 -> ../../devices/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a
...
You have to ensure that the system disk is on bus pci@7c0 (bus_b) and that no disk on bus pci@780 (bus_a) is being used by the primary domain.
You also have to check that the network interfaces used by the primary domain are also on bus pci@7c0 (bus_b). To do so, look at the path of the e1000g interfaces:
primary# ls -l /dev/e1000g*
You have to ensure that the network interfaces you are using (especially the primary network interface) are on bus pci@7c0 (bus_b).
If your primary network interface (for example e1000g0) is not on bus pci@7c0 (bus_b) then you will have to reconfigure your system so that it uses another interface (for example e1000g2) which has to be on bus pci@7c0 (bus_b). If you have to change the network interface, don't forget to correctly reconnect the network cables (for example move the network cable from e1000g0 to e1000g2).
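The check on the device paths boils down to looking at the first component after /devices. Below is a hypothetical helper that classifies a symlink target, such as those printed by "ls -l /dev/dsk" or "ls -l /dev/e1000g*", by PCI bus. It assumes the bus is always the first component after "devices/", as in the disk path shown above.

```c
#include <string.h>

/* Hypothetical helper: return "pci@780" (bus_a) or "pci@7c0" (bus_b) for a
 * /devices path such as
 * "../../devices/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a",
 * or NULL if the path is on neither bus. */
static const char *pci_bus_of(const char *devpath)
{
    const char *p = strstr(devpath, "devices/");

    if (p == NULL)
        return NULL;
    p += strlen("devices/");
    if (strncmp(p, "pci@780/", 8) == 0)
        return "pci@780";
    if (strncmp(p, "pci@7c0/", 8) == 0)
        return "pci@7c0";
    return NULL;
}
```

Every device used by the primary domain should map to "pci@7c0" before bus_a is removed.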
After checking that the primary domain is only using devices from bus pci@7c0 (bus_b), you can remove bus pci@780 (bus_a) from the configuration of the primary domain. This can be done using the "ldm remove-io" command:
primary# ldm remove-io bus_a primary
The reconfiguration is not immediate, and you will have to reboot the primary domain for the removal of pci@780 (bus_a) to take effect. After the primary domain is rebooted, you can check that it now only owns bus pci@7c0 (bus_b):
primary# ldm list-bindings primary
...
IO: pci@7c0 (bus_b)
...
Configuration of the alternate I/O domain
Now that PCI bus pci@780 (bus_a) is available, you can assign it to another domain. To do so, you just have to use the "ldm add-io" command while configuring your alternate domain:
primary# ldm create alternate
primary# ldm set-vcpu 4 alternate
primary# ldm set-mem 4G alternate
primary# ldm add-io bus_a alternate
This creates an alternate I/O domain with 4 cpus, 4GB of memory and the PCI bus pci@780 (bus_a). After the alternate domain is configured, it can be started as a regular domain with the "ldm bind" and "ldm start" commands:
primary# ldm bind alternate
primary# ldm start alternate
When the alternate domain is bound, you can check that it is using bus pci@780 (bus_a):
primary# ldm list-bindings alternate
...
IO: pci@780 (bus_a)
...
Then you can connect to the console of that domain to install it. The installation can be done through the network with a "boot net", as when installing a regular SPARC system.
Differences on Sun Fire T1000
You can set up the same configuration on a Sun Fire T1000 Server. The Sun Fire T1000 has two PCI buses similar to the two PCI buses of the Sun Fire T2000: pci@780 (bus_a) and pci@7c0 (bus_b). But the Sun Fire T1000 has no PCI-E or PCI-X slots on bus pci@7c0 (bus_b). Fortunately it still has PCI-E slot 0 on bus pci@780 (bus_a), which can be used to plug in an FC or SCSI host adapter to connect some storage for the alternate domain.
Virtual I/O Failover
Once you have more than one I/O domain, you can configure virtual I/O failover for guest domains. Check out Narayan's blog for details: Part One and Part Two.
Linux on UltraSPARC-T2
Anyway, it works in the end, and it works fine. You can see the result in a demo on Ash's blog. Note that the demo and the tests have been done on a system with an UltraSPARC-T2 processor, so Linux does work with the UltraSPARC-T2. Here is a log of the boot sequence:
{0} ok boot
Boot device: rootdisk File and args:
SILO Version 1.4.13
boot: linux.2623
Allocated 8 Megs of memory at 0x40000000 for kernel
Loaded kernel version 2.6.23
Remapping the kernel... done.
OF stdout device is: /virtual-devices@100/console@1
Booting Linux...
[585488.953894] VIO: Adding device channel-devices
[585488.954061] VIO: Adding device vnet-port-0-0
[585488.954202] VIO: Adding device vdc-port-0-0
[585488.954350] VIO: Adding device ds-0
... snip ...
* Running local boot scripts (/etc/rc.local) [ OK ]
Ubuntu gutsy (development branch) t2k-linux1 ttyS0
t2k-linux1 login: root
Password:
Last login: Fri Aug 10 10:48:13 2007 on ttyS0
Linux t2k-linux1 2.6.23-rc1 #1 SMP Sun Jul 29 21:19:34 PDT 2007 sparc64
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
root@t2k-linux1:~# uname -a
Linux t2k-linux1 2.6.23-rc1 #1 SMP Sun Jul 29 21:19:34 PDT 2007 sparc64 GNU/Linux
root@t2k-linux1:~# grep CPU /proc/cpuinfo | wc -l
60
root@t2k-linux1:~# cat /proc/cpuinfo
cpu : UltraSparc T1 (Niagara)
fpu : UltraSparc T1 integrated FPU
prom : OBP 4.27.0.build_03***PROTOTYPE BUILD*** 2007/07/27 18:48
type : sun4v
ncpus probed : 60
ncpus active : 60
D$ parity tl1 : 0
I$ parity tl1 : 0
... snip ...
MMU Type : Hypervisor (sun4v)
State:
CPU0: online
CPU1: online
CPU2: online
CPU3: online
... snip ...
CPU57: online
CPU58: online
CPU59: online
Linux identifies the processor as an UltraSPARC-T1, but this is really an UltraSPARC-T2. The evidence is that the UltraSPARC-T1 has only 32 threads, and here we have a Linux domain running with 60 (yes, 60!) cpus. The UltraSPARC-T2 has 64 threads, and this system was configured with a primary domain running Solaris with 4 cpus and a guest domain running Linux with 60 cpus. Note that we have to use a Linux 2.6.23 kernel to be able to boot on the UltraSPARC-T2 processor.
Linux Domain bind/start Tricks
When you have a Linux disk image, Dave Miller and Fabio mention some tricky steps to be able to boot from that disk image because the LDoms virtual disk server mangles partition tables.
Here is a simpler procedure: let's say you have a Linux disk image /ldoms/disklinux and you have configured the domain linux-domain to use that image. If you just do an "ldm bind" and "ldm start" of linux-domain, Linux will not boot correctly. What you need to do is:
# dd if=/ldoms/disklinux of=/ldoms/labellinux count=1
# dd if=/dev/zero of=/ldoms/disklinux count=1 conv=notrunc
# ldm bind linux-domain
# dd if=/ldoms/labellinux of=/ldoms/disklinux count=1 conv=notrunc
# ldm start linux-domain
And then you can start Linux.
Why do we need to do that? On the Linux disk image, you will have a fake Sun VTOC disk label that defines no partitions. If you directly bind and start the Linux domain with this disk label, then the virtual disk server will read the label and, accordingly, see that no partition is defined. Later, when Linux starts, it will ask the virtual disk server to read from slice 2, but as no partition is defined, the virtual disk server will return an error and Linux will be unable to read from the disk.
When we erase the disk label and bind the domain, the virtual disk server creates a default partitioning with partition 2 representing the entire disk. After the domain is bound, the virtual disk server does not read the disk label again, so the original label can be restored. Then, when Linux reads from slice 2, there is no problem because the virtual disk server now knows about slice 2.
This will be improved in a future version of the virtual disk server driver, and probably with a change in the Linux virtual disk driver, so that none of these tricks are required to start a Linux domain.
The good news is that bug 6531557 (format(1m) does not work with virtual disks) has just been fixed in Solaris Nevada and OpenSolaris. So the format(1m) command now works with virtual disks in an LDoms guest domain. The fix for Solaris 10 should come later as a patch.
Note that some format(1m) sub-commands will still not work, because they only work with SCSI disks and virtual disks do not currently appear as SCSI disks (even when a virtual disk is created from a physical SCSI disk).
The following shows which format(1m) sub-commands work with virtual disks and which do not:
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
Commands in green work with virtual disks.
More good news: this fix also solves some underlying problems, such as doing I/O using an absolute disk offset or providing the correct virtual disk size. These problems did not impact end users, but they were causing trouble for developers such as Dave Miller and Fabio, who are working hard to get Linux running with LDoms.
Although this fix does not solve all the problems Dave and Fabio are facing, it lays the foundation for a future fix which should hopefully solve everything by introducing support for unformatted disks (bug 6575050); this will avoid the hacks currently required to use a Linux disk image.
Problem
In some cases, when a file is used as a virtual disk, the label of that virtual disk can be lost when rebinding a domain (ldm bind) using that file (or a copy of that file) as a virtual disk.
For example, if a domain uses a file as a virtual disk and the Solaris system gets installed on that virtual disk (using boot net), then everything runs without any problem as long as the domain is not unbound. If the domain is unbound (using ldm unbind), then the label on the file used as a virtual disk might be lost the next time a domain using that file is bound (using ldm bind). In such a situation, the newly bound domain will be unable to use the system installed on the virtual disk and it will fail to boot with an error like "the file does not appear to be bootable", or it might fail to mount the root filesystem with an error like "vfs_mountroot: cannot mount root".
This problem is referenced as bug 6544963 (vdisk can lose label when rebinding domain)
Workaround
To prevent this problem, you need to use the following script fcksum. This script will check if you are in the case where the label will not be correctly validated during the next "ldm bind", and if this is the case it will change the label and its checksum so that it can be correctly validated. The script should be run on any file that has been used as a virtual disk and for which the disk label or disk partitioning has been changed. The script should be run right after the domain using the virtual disk is unbound (ldm unbind) for the first time.
For example, if file filedisk is used by a domain as a virtual disk and if the Solaris system is being installed onto that virtual disk, then you should run the script after doing the first "ldm unbind" on that domain. Note that the script should be run before doing any "ldm bind", otherwise you can lose the disk label.
The syntax to run the script is: ./fcksum filedisk
Note that the script will first backup the existing label of the file in a file named label.file.day_time.
Here is the output you will get if your label is updated:
$ ./fcksum rootdisk
Backing up original label in label.rootdisk.070314_201917
Changing checksum
0x1fe: 0xe456 = 0x6456
Changing dummy field
0x1b8: 0 = 0x8000
Label checksum has been updated
Otherwise if the label does not need to be updated, you will get:
$ ./fcksum rootdisk
Backing up original label in label.rootdisk.070314_201005
Label checksum is okay
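The values printed by fcksum are consistent with the standard Sun VTOC label checksum: the 16-bit word at offset 0x1fe is chosen so that the XOR of all 256 big-endian 16-bit words of the 512-byte label is zero. Setting the dummy field at 0x1b8 from 0 to 0x8000 therefore requires flipping the same bit in the checksum, and 0xe456 XOR 0x8000 = 0x6456. Below is a sketch of that arithmetic. The function names are hypothetical, and it does not reproduce fcksum's logic for deciding when a label needs to be changed.

```c
#include <stdbool.h>
#include <stdint.h>

#define DK_LABEL_SIZE   512
#define DK_CKSUM_OFFSET 0x1fe   /* last 16-bit word, as in the fcksum output */

/* Read a big-endian 16-bit word (SPARC byte order). */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* XOR of all 16-bit words of the label except the checksum word. */
static uint16_t label_cksum(const uint8_t label[DK_LABEL_SIZE])
{
    uint16_t sum = 0;

    for (int off = 0; off < DK_CKSUM_OFFSET; off += 2)
        sum ^= be16(label + off);
    return sum;
}

/* A label is consistent when the stored checksum equals the computed one,
 * i.e. when the XOR over all 256 words is zero. */
static bool label_cksum_ok(const uint8_t label[DK_LABEL_SIZE])
{
    return label_cksum(label) == be16(label + DK_CKSUM_OFFSET);
}

/* Store a freshly computed checksum in big-endian order. */
static void label_set_cksum(uint8_t label[DK_LABEL_SIZE])
{
    uint16_t sum = label_cksum(label);

    label[DK_CKSUM_OFFSET] = (uint8_t)(sum >> 8);
    label[DK_CKSUM_OFFSET + 1] = (uint8_t)(sum & 0xff);
}
```

Starting from a label whose checksum is 0xe456, writing 0x8000 at offset 0x1b8 and recomputing gives 0x6456, the same change fcksum reports.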
Download the fcksum script (use the Save Link As... option of your browser).