Availability Engineering
Sun Cluster Oasis
Monday Jul 14, 2008
LDoms guest domains supported as Solaris Cluster nodes

Folks, when late last year we announced support for Solaris Cluster in LDoms I/O domains in this blog entry, we also hinted at support for LDoms guest domains. It has taken a bit longer than we envisaged, but I am pleased to report that SC Marketing has just announced support for LDoms guest domains with Solaris Cluster!

So, what exactly does "support" mean here? It means that you can create an LDoms guest domain running Solaris, and then treat that guest domain as a cluster node by installing SC software (specific version and patch information is noted later in this post) inside the guest domain and have the SC software work with the virtual devices in the guest domain. The technically inclined reader would, at this point, have several questions pop into their head... How exactly does SC work with virtual devices? What do I have to do to make SC recognize these devices? Are there any differences between how SC is configured in LDoms guest domains vs. non-virtualized environments? Read on below for a high-level summary of the specifics:

  • For shared storage devices (i.e. those accessible from multiple cluster nodes), the virtual device must be backed by a full SCSI LUN. That means, no file backed virtual devices, no slices, no volumes. This limitation is required because SC needs advanced features in the storage devices to guarantee data integrity and those features are available only for virtual storage devices backed by full SCSI LUNs.

  • One may need to use storage which is unshared (i.e. accessed from only one cluster node), for things such as OS image installation for the guest domain. For such usage, any type of virtual device can be used, including those backed by files in the I/O domain. However, make sure to configure such virtual devices to be synchronous. Check the LDoms documentation and release notes on how to do that. Currently (as of July 2008) one needs to add "set vds:vd_file_write_flags = 0" to the /etc/system file in the I/O domain exporting the file. This is required because the Cluster stores some key configuration information on the root filesystem (in /etc/cluster) and it expects that the information written to this location is written synchronously to the disks. If the root filesystem of the guest domain is on a file in the I/O domain, it needs this setting to be synchronous.

  • Network based storage (NAS etc.) is fine when used from within the guest domain. Check cluster support matrix for specifics. LDoms guest domains don't change this support.

  • For the cluster private interconnect, the LDoms virtual device "vnet" can be used just fine; however, the virtual switch to which it maps must have the option "mode=sc" specified for it. So essentially, for the ldm add-vsw subcommand, you would add another argument "mode=sc" on the command line while creating the virtual switch which would be used for the cluster private interconnect inside the guest domains. This option enables a fastpath in the I/O domain for the Cluster heartbeat packets so that those packets do not compete with application network packets in the I/O domain for resources. This greatly improves the reliability of the Cluster heartbeats, even under heavy load, leading to a very stable cluster membership for applications to work with. Note, however, that good engineering practices should still be followed while sizing your server resources (both in the I/O domain and in the guest domains) for the application load expected on the system.

  • With this announcement, all features of Solaris Cluster supported in non-virtualized environments are supported in LDoms guest domains, unless explicitly noted in the SC release notes. Some limitations come from LDoms themselves, such as the lack of jumbo frame support over virtual networks or the lack of link-based failure detection with IPMP in guest domains. Check the LDoms documentation and release notes for such limitations, as support for such missing features is improving all the time.

  • For support of specific applications with LDoms guest domains and SC, check with your ISV. Support for applications in LDoms guest domains is improving all the time, so check often.

  • Software version requirements. LDoms_1.0.3 or higher, S10U5 and patches 137111-01, 137042-01, 138042-02, and 138056-01 or higher are required in BOTH the LDoms guest domains as well as in the I/O domains exporting virtual devices to the guest domains. Solaris Cluster SC32U1 (3.2 2/08) with patch 126106-15 or higher is required in the LDoms guest domains.

  • Licensing for SC in LDoms guest domains follows the same model as that for the I/O domains. You basically pay for the physical server, irrespective of how many guest domains and I/O domains are deployed on that physical server.

    This covers the high level overview of how SC is to be deployed inside the LDoms guest domains. Check out the SC Release notes for additional details, and some sample configurations. The whole virtualization space is evolving very rapidly and new developments are happening ever so quickly. Keep this blog page bookmarked and visit it frequently to find out how Solaris Cluster is evolving along with this space.
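
As a rough sketch of two of the points above (the "mode=sc" virtual switch for the private interconnect, and the synchronous-write setting for file-backed disks), the commands might look like the following. The names nxge1, private-vsw1, primary, guest1 and vnet1 are made up for illustration, and the helper functions are not part of LDoms or SC. The script only prints the ldm commands unless DRY_RUN=0 is set; check the ldm man page for the exact syntax in your LDoms release.

```shell
#!/bin/sh
# Dry-run helper: print each command instead of executing it.
# Set DRY_RUN=0 to really run the commands (as root, in the control domain).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Returns 0 if the synchronous-write setting quoted in the post is present
# in the given file (normally /etc/system in the I/O domain).
has_sync_vds_setting() {
    grep -q '^set[[:space:]][[:space:]]*vds:vd_file_write_flags[[:space:]]*=[[:space:]]*0[[:space:]]*$' "$1"
}

# Hypothetical names: nxge1 (physical NIC), private-vsw1 (virtual switch),
# primary (I/O domain), guest1 (guest domain), vnet1 (virtual NIC).
setup_private_interconnect() {
    # mode=sc asks for the Cluster heartbeat fastpath on this switch.
    run ldm add-vsw mode=sc net-dev=nxge1 private-vsw1 primary
    # Attach a vnet in the guest domain to that switch.
    run ldm add-vnet vnet1 private-vsw1 guest1
}

setup_private_interconnect
```

In the I/O domain, something like `has_sync_vds_setting /etc/system || echo "sync setting missing"` would flag a missing entry before a guest root disk is placed on a file.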

    Cheers!

    Ashutosh Tripathi
    Solaris Cluster Engineering

    Posted at 02:47PM Jul 14, 2008 in Sun  |  Comments[21]

    Comments:

    [Trackback] Finally, it's done! As of today, LDoms are supported as nodes in Sun Cluster. Yes, that's right, guest domains. The details are in a blog by Ashutosh Tripathi, in the release notes of the current...

    Posted by Die Kernspalter on July 14, 2008 at 10:34 PM PDT #

    Any idea when a cluster service for LDoms will be out? I.e., with clustered I/O domains, when one dies it moves the LDoms across, etc.

    Posted by kangcool on July 19, 2008 at 04:54 PM PDT #

    Hi kang,

    There is a technical issue with the death of I/O domains. The guest domains keep running, just blocked on I/O. When the I/O domain comes back, the guest domains continue. So, if we fail over all the guest domains when the I/O domain dies, there is potential for problems. We are looking to solve the technical issue.

    It would help us if you describe the deployment problem you are trying to solve. Would running SC inside the guest domains work for your deployment?

    Regards,
    -ashu

    Posted by ashu on July 24, 2008 at 08:55 AM PDT #

    Just thinking of virtual hosting.

    I.e., here is an image that you think is a real box, do what you want, etc.

    But what if the box "blows up"? You lose all your LDoms guests. Joy.

    OK, you can restart each manually, but that could be a pain, so why not get software to do it?

    VMware now has this feature.

    Posted by kangcool on July 24, 2008 at 02:36 PM PDT #

    Hi kang,

    Yes that makes sense. But we have to differentiate between a box actually "blowing up" (as you put it), vs the box (or to be specific, the I/O domains on the box), just going for a reboot.

    Having said that, I do see the point that failing over the guest domains themselves has value. I did mention that technical issue in my earlier comment. We would have to see what is the best way to go about dealing with that.

    Thanks for your feedback,
    -ashu

    Posted by ashu on July 25, 2008 at 05:09 PM PDT #

    This is a topic I'm very interested in as well. Has any progress been made in determining whether or not this will be an offering? Is it on the roadmap and, if so, what is the forecasted timeframe for availability? Failover of domains would be just as beneficial as failover of containers.

    Posted by Jeff on September 08, 2008 at 02:26 PM PDT #

    Folks,

    To anyone who reads Jeff's comments above and wonders where the response is... Since Jeff's questions relate to product roadmaps and timelines, we are discussing this privately with some Sun folks in the loop.

    So, no... we don't ignore user comments here on the cluster Oasis!! :-) :-)

    Regards,
    -ashu

    Posted by ashu on September 09, 2008 at 03:20 PM PDT #

    Just for clarification: I have a single T6320 with two guest domains and one I/O domain. Is it supported for me to map the same physical device(s) from each guest LDOM? Or is this the fencing issue? This will be the only cluster on this box and the two guest LDOMs will be the only servers sharing the back-end disk.
    Thanks,
    Kyle

    Posted by Kyle on September 15, 2008 at 08:52 AM PDT #

    Hi Kyle,

    If you are running the SC software inside the two guest LDoms, indeed, there is a restriction on this configuration currently. You cannot share the same LUN across 2 guest domains on the same box.

    You should check out the SC release notes at http://wikis.sun.com/display/SunCluster/Sun+Cluster+3.2+2-08+Release+Notes#SunCluster3.22-08ReleaseNotes-optguestdomain

    See the section entitled "SPARC: Guidelines for Logical Domains in a Cluster", second bullet, which talks about "Fencing" and explicitly states that this is not supported because there is currently no supported way to disable fencing.

    Regards,
    -ashu

    Posted by ashu on September 15, 2008 at 01:14 PM PDT #

    Hi, I am new to LDOMs but have 3 T5440 servers and have worked out how to set up LDOMs. My problem is that I am also using HP IVM on IA64 machines, which is the HP virtualisation technology. With IVM I can set up 2 virtual machines and a virtual shared device on one IVM server; I change the parameters on the virtual file I created, telling it that it is sharable, and then I can mount it on both IVMs. Then I install Serviceguard on both virtual nodes and, Bob's your uncle, I can configure a completely virtual cluster that I can install OVO on; this all works just like a real cluster. Is this going to be possible with LDOMs and SC in future? Or is it possible now? I have the latest SC 3.2 bits; however, I cannot work out how to mount a virtual disk file on both LDOMs. So it appears that LDOMs and SC do not work the same way as HP IVM. Am I correct in my assumption, or am I just missing some commands to share the virtual file?

    Cheers

    Steve

    Posted by Steve Luther on July 24, 2009 at 07:39 AM PDT #

    Hi Steve,

    Looks like the basic issue you have here is about sharing a virtual disk from multiple LDoms, on different servers. First thing to note is that with SC, the only type of virtual disks supported as shared storage are complete LUNs (so no file based virtual disks, for example).

    So basically you use the "ldm add-vds" and "ldm add-vdsdev" commands on the control domains on all servers to share the same disk (on shared storage) to guest LDoms on multiple servers. These guest domains would be running SC software.

    Did that help or were you looking for specific syntax for the ldm command? Check out the man page of the ldm command which is very well written.

    HTH,
    -ashu

    Posted by ashu on July 24, 2009 at 11:56 AM PDT #
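
The commands ashu mentions above could be strung together roughly as follows, run in the control domain of each server that exports the shared LUN. This is only a sketch: the device path c2t1d0s2 and the names primary-vds0, shared-lun0, shared-disk0 and guest1 are hypothetical, and the final add-vdisk step (assigning the exported volume to the guest) is an addition not spelled out in the comment. The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print each command instead of executing it.
# Set DRY_RUN=0 to really run the commands (as root, in the control domain).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical names: c2t1d0 (LUN on shared storage), primary-vds0 (virtual
# disk service), shared-lun0 (volume), shared-disk0 (vdisk), guest1 (guest).
export_shared_lun() {
    # Create a virtual disk service in the control domain.
    run ldm add-vds primary-vds0 primary
    # Back the volume with the full LUN; by Solaris convention, slice 2 of an
    # SMI-labeled disk covers the whole disk.
    run ldm add-vdsdev /dev/dsk/c2t1d0s2 shared-lun0@primary-vds0
    # Assign the volume to the guest domain that will run SC.
    run ldm add-vdisk shared-disk0 shared-lun0@primary-vds0 guest1
}

export_shared_lun
```

The same sequence would be repeated on the other server(s), pointing at the same LUN on the shared storage.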

    Hi Ashu,

    Thanks for replying. It did sort of clear up the matter of using virtual files as shared devices, but it is not really what I wanted to hear. No, I don't need the actual syntax for it; I have already done that bit. So it appears that the Sun LDOM stuff is not as good as the HP IVM stuff. Do you know if this will be changed in the future? Sun would have a really good virtualisation solution if it worked similarly to HP's IVM stuff by allowing a shared virtual file, rather than having to mount a LUN, which takes away from the complete virtualisation solution.

    One more question: I have created a 40 GB file which I mounted on an LDOM and installed with my jumpstart server, and this all works fine. Is it possible for me to unmount that 40 GB file, copy it, mount it on another LDOM, and get it working just by changing the hostname and IP address once it is booted? I can do this with HP IVM virtual files. If this is possible it will certainly be an advantage. If you haven't already guessed, I am an IVM administrator for HP OpenView, but as I use both IVM and LDOM every day, it would be nice if they were both capable of doing similar things.

    Cheers

    Steve

    Posted by Steve Luther on July 25, 2009 at 01:50 AM PDT #

    Hi Steve,

    On the second question, yes, you can certainly copy the image file and reuse it after changing the host information. As a matter of fact, if your original image was on a ZFS zvol you could use the zfs clone operation to do this even better (avoiding copying the blocks which would be the same in the two images, which would be a LOT of blocks given that we are talking about an OS image).

    On the first question, restriction on the kind of shared devices comes from the Cluster requirement to fence the shared devices. With SC32U2, you can try disabling fencing on the device (using the cldevice command, option default_fencing). That should work. If not, just holler.

    And once your evaluation finishes, it would be great to hear about your experience.

    Hope that helps,
    -ashu

    Posted by ashu on July 27, 2009 at 02:27 PM PDT #
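
Both suggestions in the reply above can be sketched as commands. The dataset names (tank/ldoms/guest1-os, guest2-os), the snapshot name golden and the DID device d5 are hypothetical. The fencing value "nofencing" is an assumption about the cldevice default_fencing property in SC 3.2 1/09; confirm the accepted values in the cldevice man page before using it. The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print each command instead of executing it.
# Set DRY_RUN=0 to really run the commands (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clone an installed OS image held in a ZFS zvol (run in the I/O domain).
# A clone shares unchanged blocks with its origin snapshot, so nothing is
# copied up front.
clone_os_image() {
    run zfs snapshot tank/ldoms/guest1-os@golden
    run zfs clone tank/ldoms/guest1-os@golden tank/ldoms/guest2-os
}

# Turn off fencing for one shared device (run on a cluster node).
# $1 is the DID device name; "nofencing" is an assumed property value.
disable_fencing() {
    run cldevice set -p default_fencing=nofencing "$1"
}

clone_os_image
disable_fencing d5
```

The clone would then be exported to the second guest domain like any other virtual disk backend, and the hostname and IP address changed after first boot.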

    Cheers Ashu,

    I will look into it and keep you updated as to my progress.

    Cheers
    Steve

    Posted by Steve Luther on July 28, 2009 at 01:38 AM PDT #

    Hi Ashu,

    One more quick question: the mkfile command creates files with -rw-------T file permissions. What is the T for, and how can I get it onto the copied virtual file?

    Cheers

    Steve

    Posted by Steve Luther on August 04, 2009 at 02:08 AM PDT #

    Hi Steve,

    The sticky bit (T) is turned on by mkfile for historical reasons. Briefly: this is because mkfile was most often used to create temporary swap files. Turning on the sticky bit on such files was reasonable because it meant that the OS would try to keep pages from the swapfile in memory. With the current Solaris implementation of the virtual memory subsystem, this is no longer necessary or useful. mkfile, of course, is nowadays used for many varied purposes. I don't believe the Solaris virtual memory implementation even looks at this flag anymore, but I did not actually double check that.

    So, I would say don't worry about it. But if you must, you can always do "chmod +t" on the copied file. Since the execute permission for others is off, this is the magic combination to turn on this flag.

    HTH,
    -ashu

    Posted by ashu on August 04, 2009 at 09:15 AM PDT #

    Hi Ashu,

    I recently set up an environment with Sun Cluster in Logical Domains and it works well. Where it failed was at a complete storage LUN failure on one cluster node under Logical Domains. Put another way, all LUN paths failed. The Logical Domain just hangs and does nothing. Is this because of the above I/O control domain limitation? I wanted it to reboot as soon as it finds that all paths to the vdisk have failed, and once rebooted, move all services back to the other node. Currently I have to manually reboot the box, although services move well to the other node.
    Appreciate your early response.

    Thanks
    Vijay

    Posted by vijay upreti on August 05, 2009 at 11:33 AM PDT #

    Hi Vijay,

    Handling of storage failures by SC should be no different between regular Solaris nodes and LDoms.

    I presume you have looked at scdpm(1M) man page and played with the settings? What did it report for the failed disk from different nodes?

    Regards,
    -ashu

    Posted by ashu on August 05, 2009 at 12:04 PM PDT #

    Thanks Ashu

    Steve

    Posted by Steve Luther on August 07, 2009 at 04:10 AM PDT #

    Hi Ashu,

    Nope, I didn't play around with any settings. And what I can see is that with the LUN path failures, the Logical Domain itself hangs, even without the cluster. So it looks like some other weird LDom issue?

    Thanks
    Vijay

    Posted by Vijay Upreti on September 16, 2009 at 07:18 AM PDT #

    Hi Vijay,

    I believe the default behaviour for vdisks is to wait forever for the I/O to resume. This would show up as "hang" if the failed vdisk happened to host guest domain OS. Try specifying "timeout=30" to the add-vdisk command. That would mean the I/O would return with EIO after 30 seconds if the service domain is down. If the vdisk hosted the OS, you are still hosed, Solaris cannot continue and would most likely panic, but at least you get something tangible (a panic!) to look at, instead of sitting around wondering what is going on.

    So much for the virtual disk behaviour itself. Things get even more interesting if we are talking about a cluster and failures of shared disks. Here SC's disk path monitoring framework comes into play. Once you have played with the vdisk timeouts a bit and understand the behaviour, I encourage you to experiment with cluster disk path monitoring. Take a look at "man -M /usr/cluster/man/ scdpm" to get an idea of what you can do with the feature.

    HTH,
    -ashu

    Posted by ashu on September 16, 2009 at 03:12 PM PDT #
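
The timeout suggestion above might look like this on the command line. The names shared-disk0, shared-lun0@primary-vds0 and guest1 are hypothetical, and the wrapper function is only for illustration. The script prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print each command instead of executing it.
# Set DRY_RUN=0 to really run the command (as root, in the control domain).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add a virtual disk whose I/O fails (instead of waiting forever) if the
# service domain stays down for more than $1 seconds.
add_vdisk_with_timeout() {
    run ldm add-vdisk timeout="$1" shared-disk0 shared-lun0@primary-vds0 guest1
}

add_vdisk_with_timeout 30
```

A short timeout suits shared data disks, where the cluster can react to the I/O error; as noted above, a guest whose OS disk times out will most likely panic.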
