
Thursday Oct 15, 2009

A cool feature of LDoms 1.2 software is support for multiple guest domains accessing the same vdisk backend.

This is handy if you've got a file system that supports concurrent writes from multiple hosts.

Keep in mind, though, that while you can have a 1:many relationship between backends and virtual disks (vdisks), the virtual disk server device (vdsdev) to vdisk relationship is still 1:1.

From the Logical Domains 1.2 Administration Guide:

Export a Virtual Disk Backend Multiple Times

A virtual disk backend can be exported multiple times either through the same or different virtual disk servers. Each exported instance of the virtual disk backend can then be assigned to either the same or different guest domains.

When a virtual disk backend is exported multiple times, it should not be exported with the exclusive (excl) option. Specifying the excl option will only allow exporting the backend once. The backend can be safely exported multiple times as a read-only device with the ro option.


Caution –

When a virtual disk backend is exported multiple times, applications running on guest domains and using that virtual disk are responsible for coordinating and synchronizing concurrent write access to ensure data coherency.


The following example describes how to add the same virtual disk to two different guest domains through the same virtual disk service.

  1. Export the virtual disk backend two times from a service domain by using the following commands.


    # ldm add-vdsdev [options={ro,slice}] backend volume1@service-name
    # ldm add-vdsdev -f [options={ro,slice}] backend volume2@service-name
    

    Note that the second ldm add-vdsdev command uses the -f option to force the second export of the backend. Use this option when using the same backend path for both commands and when the virtual disk servers are located on the same service domain.

  2. Assign the exported backend to each guest domain by using the following commands.

    The disk-name can be different for ldom1 and ldom2.


    # ldm add-vdisk [timeout=seconds] disk-name volume1@service-name ldom1
    # ldm add-vdisk [timeout=seconds] disk-name volume2@service-name ldom2

So given that example, you might do something like:

# ldm add-vdsdev /dev/dsk/c0t1d0s0 vol1@primary-vds0
# ldm add-vdsdev -f /dev/dsk/c0t1d0s0 vol2@primary-vds0
# ldm add-vdisk vdisk1 vol1@primary-vds0 ldom1
# ldm add-vdisk vdisk1 vol2@primary-vds0 ldom2
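If you're sharing a backend with more than two guests, keeping track of the volume names and the -f flag gets tedious. Here's a rough sketch of a helper that prints (but doesn't run) the commands for any number of domains. The function name and the volN naming are my own assumptions, not anything from the admin guide, and it only handles a single virtual disk service.

```shell
#!/bin/sh
# Hypothetical helper: print the ldm commands that export one backend once per
# guest domain and attach each export as a vdisk. Nothing is executed.
# Usage: share_backend BACKEND VDS DISKNAME DOMAIN...
share_backend() {
    backend=$1 vds=$2 disk=$3
    shift 3
    n=1
    for dom in "$@"; do
        # The first export is plain; every later export of the same backend
        # path on the same service domain needs -f.
        if [ "$n" -eq 1 ]; then
            echo "ldm add-vdsdev ${backend} vol${n}@${vds}"
        else
            echo "ldm add-vdsdev -f ${backend} vol${n}@${vds}"
        fi
        echo "ldm add-vdisk ${disk} vol${n}@${vds} ${dom}"
        n=$((n + 1))
    done
}

share_backend /dev/dsk/c0t1d0s0 primary-vds0 vdisk1 ldom1 ldom2
```

With those arguments it prints the same four commands as my example, just interleaved per domain. Once the output looks right you could pipe it to sh. And per the caution above, nothing here coordinates writes between the guests.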


Wednesday Oct 14, 2009

I was excited when I saw that Solaris 10 10/09 was to support ZFS rooted flash JumpStarts.

From the What's New document:

ZFS Features and Changes

The following section summarizes new features in the ZFS file system.

  • ZFS and Flash installation support – In the Solaris 10 10/09 release, you can set up a JumpStart profile to identify a flash archive of a ZFS root pool. For more information, see the Solaris ZFS Administration Guide.

Awesome.

So I hopped over to the JumpStart and Advanced Installations doc; I can never remember the profile syntax for a zpool.  I browsed to the install_type section, and found:

install_type Keyword (ZFS and UFS)

The install_type keyword is required in every profile. For a UFS installation, several options are available. The only option available for a ZFS installation is the initial_install keyword. This option installs a new Solaris OS on a system. The profile syntax is the following:

install_type initial_install

Note –

The following UFS options are not available for a ZFS installation.

Cannot be installed!?  Things were looking dim.

Following the link from the What's New doc, I checked out the ZFS Admin Guide.  All the way at the bottom of the Installing a ZFS Root File System (Flash Archive Installation) section, I found:

Example 5–2 Installing a System with a ZFS Flash Archive

After the master system is installed or upgraded to the Solaris 10 10/09 release, create a Flash archive of the ZFS root pool. For example:


# flarcreate -n zfs10u8BE zfs10u8flar
Full Flash
Checking integrity...
Integrity OK.
Running precreation scripts...
Precreation scripts done.
Determining the size of the archive...
The archive will be approximately 4.94GB.
Creating the archive...
Archive creation complete.
Running postcreation scripts...
Postcreation scripts done.

Running pre-exit scripts...
Pre-exit scripts done.

On the system that will be used as the installation server, create a JumpStart profile as you would for installing any system. For example, the following profile is used to install the zfs10u8flar archive.

install_type flash_install
archive_location nfs system:/export/jump/zfs10u8flar
partitioning explicit
pool rpool auto auto auto mirror c0t1d0s0 c0t0d0s0

Now I'm getting hopeful again.  Maybe the Advanced Installations doc has a misprint.

After testing it, sure enough, ZFS rooted flash_install works!  If you do a lot of provisioning, this is huge.

A quick note -- my profile syntax was just a shade different and it still worked -- which was good, because we're using a 3rd-party provisioning tool which might have been hard to fix.

My profile:

install_type flash_install
archive_location nfs://10.1.1.1/usr/local/flars/sol10-1009.flar
partitioning explicit
pool rpool auto auto auto c0d0s0

Keep in mind that this was to install a flar that was generated on a system that was already ZFS rooted.  That system was jumped (initial_install) with this profile:

install_type initial_install
system_type server
cluster SUNWCuser
pool rpool auto auto auto c0d0s0
bootenv installbe bename zfsroot dataset /var

The differences are subtle, but enough to bomb the jump if you get it wrong.
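Since a one-keyword slip is enough to bomb the jump, it may be worth generating the flash_install profile rather than hand-editing it. Here's a small sketch that emits the same shape as the two flash_install profiles above: a single disk like mine, or a mirror like the ZFS Admin Guide example. The function name is hypothetical, it assumes the pool is always called rpool, and the archive_location value is passed through untouched.

```shell
#!/bin/sh
# Hypothetical helper: print a JumpStart profile for a ZFS-root flash_install.
# One disk gives a plain pool line; two or more disks give a mirror.
# Usage: zfs_flash_profile ARCHIVE_LOCATION DISK...
zfs_flash_profile() {
    archive=$1
    shift
    echo "install_type flash_install"
    echo "archive_location ${archive}"
    echo "partitioning explicit"
    if [ "$#" -gt 1 ]; then
        echo "pool rpool auto auto auto mirror $*"
    else
        echo "pool rpool auto auto auto $1"
    fi
}

zfs_flash_profile nfs://10.1.1.1/usr/local/flars/sol10-1009.flar c0d0s0
```

For the Admin Guide's NFS form, quote the whole value: zfs_flash_profile "nfs system:/export/jump/zfs10u8flar" c0t1d0s0 c0t0d0s0.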

Monday Oct 12, 2009

I was wondering today if kernel caging is enabled on one of my systems.  Sure, you can grep it out of /var/adm/messages, but I wanted a way to interrogate the kernel itself.

MDB to the rescue.

To see if kernel caging is enabled:

# echo "kernel_cage_enable::print" | mdb -k
0x1

This should work for any parameter in /etc/system, as long as the module that defines it is loaded.

UPDATE: Thanks to Bill Hathaway for pointing out that mdb -k opens the kernel read-only; mdb writes to kernel memory only when you add the -w switch, so the command above is safe to run.
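To check several tunables at once, one approach is to turn every set line in /etc/system into a ::print dcmd and feed the lot to mdb -k. This is a rough sketch and the function name is hypothetical. It assumes module-qualified entries (set module:variable=value) map onto mdb's module`symbol scoping syntax, and that the kernel carries type information for each variable, which ::print relies on.

```shell
#!/bin/sh
# Hypothetical helper: turn each "set" line of an /etc/system-style file into
# an mdb ::print dcmd. Comment lines (starting with *) and other directives
# are ignored.
# Usage: system_to_mdb /etc/system | mdb -k
system_to_mdb() {
    sed -n 's/^[[:space:]]*set[[:space:]]\{1,\}\([^=[:space:]]\{1,\}\)[[:space:]]*=.*/\1/p' "$1" |
    while IFS= read -r name; do
        case $name in
            # set module:variable=value -> module`variable (mdb symbol scoping)
            *:*) echo "${name%%:*}\`${name#*:}::print" ;;
            *)   echo "${name}::print" ;;
        esac
    done
}
```

Run it as system_to_mdb /etc/system | mdb -k. Without -w, mdb stays read-only.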

Solaris 10 Update 8, also known as 10/09, was just released.  It has a bunch of cool new features.

Here are the new features I'm most interested in playing with:

  • Turbo-charged SVR4 packaging.  Despite its old age, I'm a big fan of the pkg format.  The 10/09 enhancements are supposed to make everything from initial installs, to LiveUpgrades, to zone installations faster.  Awesome.
  • Zone parallel patching.  Huge possibilities here.  Patching zones has been a sore area for me.  Anything that makes patching systems with non-global zones better would be great.
  • ZFS root flash archive-based installation.  The installer used to core dump if you tried to install a flar onto a ZFS rooted system.  In my mind, this is one of the last steps to a mature ZFS rooted file system.
  • ZFS user and group quotas.
  • Solaris 8 and 9 Container support.  I've read rumors that support for Solaris 8 and 9 Containers is bundled with the OS.  Historically you needed the Solaris 8 or Solaris 9 media if you wanted to install a Branded Zone, so if this is true it would alleviate a big headache for me.
After I've had some time to play with these features I'll post my findings.

This weekend I got the chance to work on my SST project a bit.  I used to think that I'd be able to knock out the new version all by myself.  But with family and day-job responsibilities I just don't have the cycles.  Therein lies the beauty of open source.  If I can distribute the load among 4 or 5 of my colleagues, the task becomes quite doable.

To facilitate the distribution of the test and development tasks, I'm working on a standard methodological approach. SST is all shell scripts, so just about anybody can dive in and help.  If everybody approaches the problem from the same angle, things become even easier.

If you're interested in SST development, feel free to shoot me an email at jason.callaway@sun.com.

Saturday Oct 10, 2009

I went for a quick ride today while I was waiting for my Solaris 10 Update 8 VirtualBox to load.

Apparently I live in hill country.  Lung-burning hill country.

Ride-20091010-1

I've just started riding again after breaking my foot a few months ago.   It's great to be able to ride again, but boy am I out of shape.

Going over some railroad tracks I nearly ate some pavement.  There was about a six-inch gap between the tracks and the road.  Nearly crashing reminded me to take another picture, though.

Ride-20091010-2

Weather's supposed to be nice tomorrow too.

Back to work on SST.  Now that 10/09 is out, I suppose we'll focus on that for SST:LV 5.0.

Saturday Sep 26, 2009

Here's another great image from the LDoms Community Cookbook at wikis.sun.com.

From the Sun SPARC Enterprise T5440 section:

I'm going to print these topology images and keep them on my cube wall.  If you find yourself doing a lot of T-Class administration, they're invaluable.

Friday Sep 25, 2009

Split PCI bus capability has become pretty common among the T-Class systems.  The ability to split the PCI bus provides multiple I/O Domain support.  With extra I/O Domains you can do all sorts of cool stuff like NIC fail-over and even direct card control.

If you've ever had trouble understanding (or in my case even just remembering) how the split bus configurations work, here's the site to check out: LDoms Community Cookbook, Section 1 - Hardware & Split PCI.

This page (and the whole site) is fantastic.  Check out this image -- it makes understanding the bus allocations trivial:


I've definitely got to spend some more time on wikis.sun.com.

Monday Sep 21, 2009

I ran into a problem today with spaces in /etc/syslog.conf.  Any spaces between the selectors and actions upset the m4 parser -- you've got to use tabs.

I'm using a deployment / provisioning tool (not xVM) that wants to put spaces in that file.  Very frustrating.

Here's a one-liner Perl script that strips out the spaces and replaces them with tabs.  The file will look a little ugly, but at least it'll work.

# perl -pi'*.bak' -e 's/^(\S+?\.\S+)\s*(\S*)/$1\t\t$2/' /etc/syslog.conf

There are a hundred ways to skin this cat, but this was quick and only one line, which is what I needed.

Enjoy.
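If you want to know which lines need fixing before (or after) running the one-liner, here's a sketch of a check that flags any non-comment line whose selector is followed by a space instead of a tab. The function name is hypothetical. It exits non-zero when it finds a bad line, so it could go at the end of a provisioning script.

```shell
#!/bin/sh
# Hypothetical check: report syslog.conf lines whose selector field is followed
# by a space rather than a tab. Comments and blank lines are skipped.
# Exits 1 if any bad line is found, 0 otherwise.
# Usage: check_syslog_conf /etc/syslog.conf
check_syslog_conf() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        # The selector runs up to the first space or tab; a space there is an error.
        match($0, /^[^ \t]+/) && substr($0, RLENGTH + 1, 1) == " " {
            printf "%d: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Each reported line is prefixed with its line number in the file.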

Thursday Feb 05, 2009

Back in January, I posted about an issue with a freshly-jumpstarted system.  During the first reboot of the system, I got the error:

glm: cannot load driver
Cannot load drivers for /pci@1c,600000/scsi@2/disk@0,0:a
Can't load the root filesystem

Mike Gerdts kindly commented with some details about the error:

Your root disk is (as known by OBP):

/pci@1c,600000/scsi@2/disk@0,0:a

In order for Solaris to know how to talk to scsi@2 (a SCSI device), it needs to know how to talk to pci@1c,600000 (the PCI bus that leads to the SCSI bus). Because of a bug in the installer, the PCI driver provided by SUNWpd is not installed. By adding the package, you installed the PCI driver that now makes it so that the SCSI device is accessible.

FWIW, I've run into the same problem installing into LDoms. Since my installations tend to be via jumpstart, I have added the following to my jumpstart profiles:

package SUNWpd add

Mike's solution is a good one.  As it turns out, the SUNWCXall metacluster is the only metacluster that includes the SUNWpd and SUNWpdu packages.  (They actually have their own cluster called SUNWCpd, which is what's added by SUNWCXall.) You can see for yourself if you mount up the install media and examine Solaris_10/Product/.clustertoc.

Since I was installing SUNWCuser, and my jumpstart profile (which I inherited and never scrutinized) added the package SUNWpdu and not SUNWpd, this error is not surprising.  What does surprise me is the fact that this error has shown up on only one of my systems, a V240, and none of my others.  I'll check to see if there are any system-specific clusters that are missing the SUNWCpd cluster.
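That check is easy to script. Here's a sketch that prints every cluster or metacluster in .clustertoc that lists a given package (or cluster) as a direct member. It assumes the file's layout: each entry opens with CLUSTER= or METACLUSTER=, lists members on SUNW_CSRMEMBER= lines, and closes with END. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each CLUSTER/METACLUSTER entry in a .clustertoc
# file that lists the given name on a SUNW_CSRMEMBER= line.
# Usage: who_includes PATH_TO_CLUSTERTOC MEMBER
who_includes() {
    awk -F= -v want="$2" '
        $1 == "CLUSTER" || $1 == "METACLUSTER" { name = $2 }
        $1 == "SUNW_CSRMEMBER" && $2 == want   { print name }
        $1 == "END"                            { name = "" }
    ' "$1"
}
```

Against the install media, something like who_includes /cdrom/Solaris_10/Product/.clustertoc SUNWpd should point at SUNWCpd; run it again with SUNWCpd to see which metaclusters pull that in.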

I love it when things make sense.

Wednesday Jan 21, 2009

I ran into a wacky problem today on a V240.  On the reboot after applying a jumpstart installation, the boot failed with the error:

glm: cannot load driver
Cannot load drivers for /pci@1c,600000/scsi@2/disk@0,0:a
Can't load the root filesystem

After a little Googling, I found an OpenSolaris forum entry that references a known bug and workaround for V210s.

I gave the workaround a shot, and it worked on my V240.  Keep in mind that I updated the firmware to 4.22.33 before trying the workaround and still had the glm error.  My Solaris 10 version is 5/08 with the 127127-11 kernel.

The workaround

After applying the jumpstart image, and after the first failed reboot:

  1. Boot from a Solaris 10 DVD into single-user mode
    1. {1} ok boot cdrom -s
  2. Mount / and /var on /a.  (/var is on slice 3 in this example)
    1. # mount /dev/dsk/c1t0d0s0 /a
    2. # mount /dev/dsk/c1t0d0s3 /a/var
  3. Reinstall the SUNWpd package
    1. # cd /cdrom/Solaris_10/Product
    2. # pkgadd -a /a/var/sadm/install/admin -R /a -d . SUNWpd
  4. Reboot
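For the record, here are the same steps as a shell sketch, to be run from the DVD's single-user shell. The device names are the ones from this example. The run wrapper and the DRY_RUN variable are my own additions so you can review the commands before they execute; the pkgadd line is copied from step 3, with -d pointing at the Product directory instead of cd'ing into it.

```shell
#!/bin/sh
# Sketch of the SUNWpd recovery steps above. Device names match this example
# (root on c1t0d0s0, /var on c1t0d0s3); override them for your system.
# DRY_RUN=1 prints each command instead of executing it.
ROOT_DEV=${ROOT_DEV:-/dev/dsk/c1t0d0s0}
VAR_DEV=${VAR_DEV:-/dev/dsk/c1t0d0s3}
PRODUCT=${PRODUCT:-/cdrom/Solaris_10/Product}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinstall_sunwpd() {
    run mount "$ROOT_DEV" /a || return 1
    run mount "$VAR_DEV" /a/var || return 1
    # -R /a installs into the mounted root disk rather than the DVD environment.
    run pkgadd -a /a/var/sadm/install/admin -R /a -d "$PRODUCT" SUNWpd
    run reboot
}
```

Review it first with DRY_RUN=1 reinstall_sunwpd, then run reinstall_sunwpd for real.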

The pkgadd throws errors, but seems to succeed.

## Executing postinstall script.
Reboot client to install driver.
exec failed. error=2.

Installation of <SUNWpd> was successful.

I haven't had the chance to trace out this error and see 1) what's causing it and 2) why this works, but it does seem to work.

UPDATE: I've discovered that this is mostly my fault, and posted about it here.

Thursday Dec 18, 2008

Well, it seems like my dreams of using a vdsdev with a zvol backend as a boot-device for Solaris 10 5/08 have been dashed.

Here's my environment:

  • T5220
    • Solaris 10 5/08 127127-11
    • LDoms 1.0.3
    • Firmware 7.1.6.d
  • Guest domain
    • Installed via JET
    • Flar-based install from same Solaris 10 5/08 image as the T5220

While it is possible to present a zvol to a guest domain as a single slice, Solaris 10 5/08 does not support a single-slice install.  So you could install on a whole disk, or on an image file from within a zfs dataset, and then present more storage after the jump.  But you can't use a zvol as your boot-device.

This problem seems to be fixed with Solaris 10 10/08:

Before Solaris 10 10/08 (Update 6), the installation of Solaris is possible only on a full disk, not on a single-slice disk.
Starting with Solaris 10 10/08 (Update 6), the installation of Solaris is also possible on a single-slice disk. In that case, if the Solaris installation is done using the UFS filesystem then only the root partition must be defined and this partition must use all the disk space.

Next week I'll be working on our 10/08 build, so I'll be able to give this a try.  10/08 supports a zfs root, which is essential -- more on that in a moment.

So why do I care about a zvol boot-device?  LUNs.  Say you have a big LUN presented to your control domain.  ZFS is an easy and effective way of carving up the LUN and giving storage to your guest / service / I/O domains.  You get block-level snapshots, easy administration, and pretty decent performance.  Oh, and it's free too.  But without zvols you have to make an image file to use as a backend for your vdsdev, e.g.,

# zfs create lunpool/guestldom
# mkfile 36g /lunpool/guestldom/imgfile
# ldm add-vdsdev /lunpool/guestldom/imgfile guestldom-system-disk@primary-vds0

This is all well and good.  But mkfile pads the image file that you create with 0s.  This can take quite a while.  Sure, you could do it once, and then snapshot and clone the containing dataset.  But then all of your image files would be dependent upon that initial snapshot.  Not a huge deal, but it could be annoying.

Zvols do a bit to address this issue, but not completely.  You can create a zvol of arbitrary size very quickly, e.g.,

# zfs create -V 36g lunpool/guestldom

There are no 0s to be written; the command comes back in a snap.  But according to the LDoms cookbook, your guest domain will need to be installed with a single filesystem mounted at /.  This is a big problem if we're talking UFS.  For example, how will you isolate, say, /var from /kernel?  You could maybe use some fs quotas, but that's not really ideal.  And what about swap?  I have no idea how that will work -- with only s0, will swap live in a file on top of UFS?  That'll be interesting to see; I'll post about that next week when I get to try it.

A zfs root could solve this problem.  /var can be isolated from /.  Swap gets its own dataset.  I'm a little worried about where the zfs database will live; I'll have to see.
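To put the backend options side by side, here's a sketch that prints the commands for each: a one-time golden image file, per-guest clones of it, or a per-guest zvol. The function name, the golden dataset, and the assumption that datasets sit at their default /pool/name mountpoints are all mine. Zvols show up as block devices under /dev/zvol/dsk, which is the path handed to ldm add-vdsdev here.

```shell
#!/bin/sh
# Hypothetical helper: print (never run) the commands for one of three backend
# choices. Assumes a dataset named POOL/golden for the image-file approach and
# default /POOL/NAME mountpoints.
# Usage: guest_backend golden|clone|zvol POOL GUEST [SIZE]
guest_backend() {
    mode=$1 pool=$2 guest=$3 size=${4:-36g}
    vol="${guest}-system-disk@primary-vds0"
    case $mode in
        golden)
            # One-time cost: mkfile writes every zero up front.
            echo "zfs create ${pool}/golden"
            echo "mkfile ${size} /${pool}/golden/imgfile"
            echo "zfs snapshot ${pool}/golden@base"
            ;;
        clone)
            # Cheap per guest, but every clone depends on golden@base.
            echo "zfs clone ${pool}/golden@base ${pool}/${guest}"
            echo "ldm add-vdsdev /${pool}/${guest}/imgfile ${vol}"
            ;;
        zvol)
            # Returns immediately; the zvol's block device is under /dev/zvol/dsk.
            echo "zfs create -V ${size} ${pool}/${guest}"
            echo "ldm add-vdsdev /dev/zvol/dsk/${pool}/${guest} ${vol}"
            ;;
        *)
            echo "usage: guest_backend golden|clone|zvol POOL GUEST [SIZE]" >&2
            return 1
            ;;
    esac
}
```

For example, guest_backend zvol lunpool guestldom 36g prints the zfs create -V line from above plus the matching add-vdsdev. The GUEST argument is ignored in golden mode, so a placeholder like - works there. Whether a 5/08 guest can then boot from the zvol is exactly the open question in this post.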

So... I've still got more questions than answers.  But one thing is clear: upgrade to 10/08!


Wednesday Dec 17, 2008

I almost rode the BWI loop today, but I had so much fun at Rockburn Branch last time that I made a last minute change of plans.  After all, how wet could the trails be?

The answer is pretty darn wet.

Upstream from here on the way back I lost my momentum and had to put my foot down in shin-deep water.  Boy, did that make for a cold foot on the way back.

Riding up the hill on the far side of this picture I crashed -- nothing bad, but enough to shake my confidence for riding over wet logs.  Those things are slippery like ice.

This trail will be loads of fun when it's dry.  There are some great logs and jumps set up.  Near the logs I encountered strange but beautiful dead trees that were covered with ivy.

Sorry about the fuzzy pictures.  It's hard to hold the iPhone still when you're breathing hard.  Actually, the camera is somewhat difficult to use since you've got no idea what your focal length or exposure is, or anything, really.

Back to work -- still haven't figured out how to get a zvol vdsdev working as a guest ldom root.

Tuesday Dec 16, 2008

I wasn't really thorough in my troubleshooting here, so take this with a grain of salt, but maybe it'll save you some time.

In order to get a ZFS vdsdev working in LDom 1.0.3 on Solaris 10 5/08 on a T5220 I had to upgrade the firmware to 7.1.6.d.

The documentation says that 7.1.x is the minimum required for LDom 1.0.3.  But whenever I tried to jump the guest domain, the guest would complain about no disks being present.

After upgrading to 7.1.6.d I was able to get image files within a ZFS dataset working as vdsdevs.  I still can't get zvols to work, although I haven't tried very hard.

Here's the process I followed to upgrade the firmware and create the ZFS vdsdev.

  • Copy the patch to your control domain and unzip.
  • Download the firmware to your system controller:

# cd <pkgdir>
# sysfwdownload Sun_System_Firmware-7_1_6_d-SPARC_Enterprise_T5120+T5220.pkg

  • If you haven't already, make yourself a CLI_mode=ALOM user on the system controller:

-> create /SP/users/yourUserName role=Administrator cli_mode=alom

  • Shut down the control domain.
  • Log onto the system controller with that new user and install the new firmware:

sc> flashupdate -s 127.0.0.1

  • Power back up and make your ZFS vdsdevs.  (Just a sketch of some commands, not a complete process...)

# zfs create some/data/set
# mkfile 36g /some/data/set/imagefile
# ldm add-vdsdev /some/data/set/imagefile vol0@primary-vds0
# ldm add-vdisk vdisk0 vol0@primary-vds0 dom0

For a more complete treatment of LDoms, check out the 1.0.3 Administration Guide (pdf).  ZFS in LDoms is addressed in the LDoms I/O Best Practices: Data Reliability blueprint.  That blueprint targets OpenSolaris, though, which may explain why I can't get the zvols to work.

Monday Dec 15, 2008

I went for a ride at lunch today. Rockburn Branch Park has some very nice single-track.  The trails range from smooth dirt and gravel to rocky and technical.  From what I understand, these trails can take you all the way into Patapsco.

Rockburn Branch Park in Elkridge, MD

It was great.  I'm trying to get better at the work/life balance.

Hopefully by getting out to hit the trails (which are less than 4 miles from my customer site -- awesome!), I'll be able to balance the scales a little.


This blog copyright 2010 by Jason Callaway