Sun Blog Brad Beadles

Thursday Jul 12, 2007

Knowing that I may be a little rusty, since I've been away from the day-to-day work of systems administration, I decided to set up a T2000 with LDOMS.  So how rusty am I?  Rustier than I thought.  It reminded me of laying off working out for a while, then waking up one day and deciding to go run a mile or two and hit the weights.  At first it isn't so bad, until you realize that things aren't going the way they used to when you were doing this regularly.  It was still fun.  And at the end, you flex and say, "Oh yeah, I've still got it," as if you've just defeated some worldly challenge.  Either way, you do feel good about accomplishing something.  I had almost forgotten this triumphant feeling.  Wow, it felt good.  Just ask my wife: I came downstairs from the office at 1:30am boasting about defeating this T2000 and LDOMS.

Let's take a look at this not-so-big challenge (if you're not rusty).  First, we'll start with an overview of the things you need to do, and then talk about the learning experiences that came from each of the major tasks in getting LDOMs up and running on the T2000.

Overview of Tasks:
  • Load Solaris 10 (11/06) on the T2000
  • Add the required OS Patches & Packages
  • Create a zpool & zfs filesystems for guest domain disk image files (and other things)
  • Update the T2000 System Controller Firmware
  • Setup the Control Domain
  • Create a Guest Domain
  • Jumpstart the Guest Domain
  • Sing "We Are the Champions"
Load Solaris 10 (11/06) on the T2000:

    Stuff you will need:
  • Here's where you can get the LDOM documentation.  Please read both docs first.
  • Here's where you can get the Beginner's Guide to LDOMS.  A good first read.
  • Solaris 10 (11/06) media or Jumpstart server with S10 (11/06).
  • LDOM Packages LDOMS_Manager-1_0-RR.zip.
  • Patches:
    • Solaris Kernel Patch 118833-36
    • LDOM Patch 124921-02
    • Solaris Patch 125043-01
    • System Firmware Patch - I used 126399-01 for a T2000
    • I also recommend the 10_Recommended patch bundle, which will take care of your Solaris patches.  Double-check with showrev -p.
  • Here's where you can download LDOM 1.0
  • Here's where you can download Patches
  • Here's a good starting point for LDOM Reference materials.
I loaded Solaris 10 on the T2000 from DVD.  I needed to get to the ok prompt so that I could boot from the DVD drive.  No problem: I hooked the T2000 up to port 6 of an already installed terminal concentrator.  Then I telnet'd to the terminal concentrator's IP address on that port and logged into the T2000's system controller.  I decided now was the time to go ahead and set up the system console to work over the network.

bb@hippo:~ >telnet <terminal-concentrator-ip> 5006

Please login: admin
Please Enter password: admin

sc> setupsc
  NOTE:  Here I answered the prompts to enable networking, gave the SC an IP address and a gateway address, and changed the prompt.
sc> resetsc  
NOTE:  You need to reset the system controller for the changes to take effect.

arakeen-sc>showplatform 
NOTE:  Displays platform details and status

arakeen-sc>showhost 
NOTE: Shows flash firmware versions
Host flash versions:
   Hypervisor 1.4.1 2007/04/02 16:37
   OBP 4.26.1 2007/04/02 16:26
   POST 4.26.0 2007/03/26 16:45

arakeen-sc>showfaults 
NOTE: I did this because I had fault lights on the system's front panel.

arakeen-sc>clearfault <UUID>
NOTE:  I cleared the fault after plugging in the second power supply.  The UUID (reported by showfaults) identifies the failed component.

arakeen-sc>clearasrdb 
NOTE:  Clears the ASR (Automatic System Recovery) blacklist database.

arakeen-sc>  
NOTE: Press Ctrl-] (the Control key plus the right bracket) to get out of the telnet session to the terminal concentrator.

bb@hippo:~ >ssh arakeen-sc  NOTE: I decided to use the network management port to get to the console via the system controller; it's less choppy.

arakeen-sc>console -f  NOTE: This gets you to the ok prompt (the T2000's console) if the host isn't booted.  -f forces write access to the console.

ok boot cdrom  NOTE:  Boot from cdrom (okay, it really is a DVD drive, but cdrom is the device alias that points to the DVD device) so that you can load the OS.

The T2000 boots from the DVD and automatically starts the installation process.  Answer the questions in the text-based installer.  Note: I had to use Esc-2 for the F2 key due to my terminal emulation setup.

Once it was installed, I logged into the T2000 and took a look around.  Everything looked good, so off to the next step.

Add the required OS Patches & Packages:

Once the base OS was loaded, I decided to create a zpool and a zfs filesystem to store my downloads of the patches and packages.  I also used a zfs filesystem for my jumpstart server in the control domain, and a separate zfs filesystem for each of my LDOMS.  Having a separate zfs filesystem for each LDOM lets me zfs snapshot and zfs clone a gold LDOM disk image file.  I can then use that image to create duplicate LDOMS easily, without having to jumpstart them.  All I would have to do is a sys-unconfig for the cloned LDOM.  Now that is really, really cool!!!!

Create a zpool & zfs filesystems for guest domain disk image files (and other things):

Here's what I did on the control domain arakeen:

root@arakeen:/> zpool create tank c1t0d0s4

root@arakeen:/> zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
tank                   44.2G   9.15G   35.1G    20%  ONLINE     -


root@arakeen:/>zfs create tank/Downloads
root@arakeen:/>zfs create tank/jumpstart
root@arakeen:/>zfs create tank/LDOMS
root@arakeen:/>zfs create tank/LDOMS/ldg1

root@arakeen:/>zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
tank                  9.15G  34.4G  28.5K  /tank
tank/Downloads         275M  34.4G   275M  /tank/Downloads
tank/LDOMS            5.00G  34.4G  25.5K  /tank/LDOMS
tank/LDOMS/ldg1       5.00G  34.4G  5.00G  /tank/LDOMS/ldg1
tank/jumpstart        3.88G  34.4G  3.88G  /tank/jumpstart

Next I downloaded the above patches and packages into my /tank/Downloads directory and unzipped them.  The plan was to install them in single-user mode, so, out of old habit, I shut down and booted into single-user mode.

root@arakeen:/> shutdown -i0 -g0 -y
ok boot -s

All was good: I entered the root password and was in single-user mode.  So I went to cd into /tank/Downloads.  OOPS, nothing there!  What in the world?  That's right, zfs filesystems don't get mounted in single-user mode.  So I hit Ctrl-D to take me back to multi-user mode and copied the zip files over to /var/tmp.  I then installed all the patches.  NOTE:  I probably could have just done a zfs mount tank/Downloads and been fine.  Call me RUSTY.

Okay, maybe now is the time to talk a little about the T2000 configuration I'm using.  For one, I was very lucky to find one in a lab, so I'm not complaining.  The T2000 has 8 cores, 4GB of memory, and one 73GB disk.  That's not a great candidate for running LDOMS, and I know it.  But it was available, and the limited memory and disk taught me a bunch.  I'll point those lessons out as they come up.

Update the T2000 System Controller Firmware:

First of all, check out the README file for this patch (126399-01) for installation instructions.  Basically, there are two ways to flash-upgrade your T2000 system controller firmware: 1.) from the Solaris console, using the sysfwdownload utility, and 2.) using the flashupdate command on the system controller console.  As you might have guessed, the T2000 I was using would not support the first method.  So I had to use flashupdate, which requires an FTP server that the system controller can reach.  Good thing I had already configured the system controller to use the network management port.  The only thing I had to do was set up my Solaris 10 laptop as an FTP server.  That was easy enough: just enable ftp via "svcadm enable ftp".

This was really a non-event.  However, please read the READMEs included with the patch.

What about LiveUpgrade?

Here's where I decided, hey, wouldn't it be cool to have a partition (slice) reserved for LiveUpgrade?  I should have done this the first time.  It is the recommended practice for patching and upgrades, not to mention a great failsafe alternate boot environment.  Besides, I now wanted to see what would happen if I did a LiveUpgrade to Solaris 10 Update 4, coming out in the next couple of months.  How would this affect my LDOMS configuration?  I'll save that for another blog.
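For reference, here is a minimal sketch of how the reserved slice could be used once the new release is available.  It assumes the boot disk is c1t0d0, with slice 3 set aside for the alternate boot environment (as in the partition table below).  The boot environment names s10u3/s10u4, the function name, and the media path are hypothetical.  It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Default is a dry run that only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical boot environment names; slice 3 is the one reserved for LiveUpgrade.
CURRENT_BE=s10u3
NEW_BE=s10u4
ALT_ROOT_SLICE=/dev/dsk/c1t0d0s3

# lu_upgrade MEDIA_PATH: copy the running BE onto slice 3, upgrade the copy
# from the given OS image, and make it the BE that boots next.
lu_upgrade() {
    media=$1
    run lucreate -c "$CURRENT_BE" -n "$NEW_BE" -m "/:$ALT_ROOT_SLICE:ufs"
    run luupgrade -u -n "$NEW_BE" -s "$media"
    run luactivate "$NEW_BE"
    # LiveUpgrade wants init or shutdown here, not reboot.
    run init 6
}
```

Usage, assuming the new media is mounted at /cdrom/cdrom0: `DRY_RUN=0 lu_upgrade /cdrom/cdrom0`.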

Okay, to do this I'm going to have to repartition my disk.  Bummer.  Call me RUSTY; I should have used the best-practice layout.  So I went back and reinstalled with better disk partitioning to utilize LiveUpgrade.  But wait a minute: my boot cdrom hangs with "Assertion failed: nvlist_lookup_uint64(zhp->zpool_config, "pool_guid", &theguid) == 0, file ../common/libzfs_import.c, line 336, function pool_active".  Now that's nasty.  This set me back a few hours.  NOTE:  This is a bug when installing from DVD onto a disk that already has zfs on it.  There is a workaround where you break out of the install and then restart it.  I wasn't able to get that to work, so I just deleted the zpool along with its zfs filesystems.  I should have searched sunsolve.sun.com and Googled the error message.  I would have saved tons of time trying to figure out what I was doing wrong.

Here's what my partition table looks like:


root@arakeen:/>prtvtoc /dev/rdsk/c1t0d0s2
    .
  Part      Tag    Flag     Cylinders         Size            Blocks
    0       root    wm       0 -  1648        8.00GB    (1649/0/0)    16780224    Root Filesystem
    1       swap    wu    1649 -  3297        8.00GB    (1649/0/0)    16780224    SWAP SPACE
    2     backup    wm       0 - 14086       68.35GB    (14087/0/0)  143349312    Whole Disk
    3 unassigned    wm    3298 -  4946        8.00GB    (1649/0/0)    16780224    Reserved for LiveUpgrade
    4 unassigned    wm    4947 - 14086       44.35GB    (9140/0/0)    93008640    Zpool space
    5 unassigned    wm       0                0         (0/0/0)              0
    6 unassigned    wm       0                0         (0/0/0)              0
    7 unassigned    wm       0                0         (0/0/0)              0

Note:  You might think 8GB of swap is high for 4GB of RAM.  It probably is, but 4GB isn't a lot of memory for running multiple LDOMs, and it isn't much for 32 CPUs (8 cores x 4 threads).
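The Size and Blocks columns in the partition table follow directly from the cylinder ranges, which is handy when sizing a LiveUpgrade slice before repartitioning.  Here is a small sketch that assumes 512-byte blocks and the 10176 blocks per cylinder implied by slice 0 (16780224 / 1649).  Other disks will have a different geometry, and the function names are made up for illustration.

```shell
#!/bin/sh
# Blocks per cylinder on this particular disk, derived from slice 0 in the
# table above: 16780224 blocks / 1649 cylinders = 10176. Other disks differ.
BLOCKS_PER_CYL=10176

# slice_blocks FIRST_CYL LAST_CYL: number of 512-byte blocks in the slice
slice_blocks() {
    echo $(( ($2 - $1 + 1) * BLOCKS_PER_CYL ))
}

# slice_gb FIRST_CYL LAST_CYL: slice size in GB (1 GB = 1073741824 bytes),
# rounded to two decimals the way the table shows it
slice_gb() {
    awk -v b="$(slice_blocks "$1" "$2")" 'BEGIN { printf "%.2fGB\n", b * 512 / 1073741824 }'
}
```

For example, `slice_gb 4947 14086` reproduces the 44.35GB shown for the zpool slice, and `slice_blocks 0 14086` gives the 143349312 blocks of the whole-disk slice 2.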

Setup the Control Domain:

Now we're on to the fun stuff, right?

So the first thing we need to do is start up the LDOM Manager and set up the control domain.  I used the Admin Guide to walk me through the steps, which looked like this:

root@arakeen:/>svcadm enable ldmd  Note:  This starts the LDOM Manager daemon.
root@arakeen:/>ldm  Note: This shows you the many command-line options and parameters.  It also confirms that your PATH is set up.
root@arakeen:/>ldm ls -l Note:  This shows you all of the available resources.

Now it is time to set up the default services for the control domain (the control domain is named primary by default):
  • vdiskserver - virtual disk server
  • vswitch - virtual switch service
  • vconscon - virtual console concentrator service
On my first attempt, I set up the control domain's resources before I set up the services.  I believe this wasted some time when I then tried to set up the services, so I recommend setting up the services first.  Otherwise you may run into some weird behavior and error messages.  For the errors I got, the release notes said to restart ldmd (svcadm restart ldmd).  The other thing that happened was that I wasn't able to set up the vdiskserver.  So what I finally ended up doing was resetting back to the factory-default configuration (ldm set-config factory-default).  Remember, I did this out of order, which is why I had to start over.  Going back to the factory-default configuration for the control domain was the easiest way to do that.

root@arakeen:/>ldm add-vds primary-vds0 primary
root@arakeen:/>ldm add-vcc port-range=5000-5100 primary-vcc0 primary
root@arakeen:/>ldm add-vsw net-dev=e1000g0 primary-vsw0 primary

Let's see what primary (the control domain) looks like now:

root@arakeen:/>ldm list-services primary

Vldc:   primary-vldc0
Vldc:   primary-vldc3
Vds:    primary-vds0
                vdsdev: vol1    device=/tank/LDOMS/ldg1/bootdisk.img
Vcc:    primary-vcc0
                port-range=5000-5100
Vsw:    primary-vsw0
                mac-addr=0:14:4f:f8:92:db
                net-dev=e1000g0
                mode=prog,promisc


With the services set up, it is now time to set up system resources for the control domain.  By default, all resources are assigned to primary (the control domain), so in order to set up guest domains you need to release some of those resources.  I couldn't find any definitive recommendations for how much the control domain should keep, so here's what I would start with and why.
  • 4 vcpus - This keeps the control domain aligned with the 4 threads of a single core.  If you put 2 vcpus (threads) of the same core into each of 2 separate LDOMs, those LDOMs share that core.  If possible, it is better not to share a core between multiple LDOMs, to minimize possible contention; it all depends on the LDOMs' workloads.  A core can be shared by up to 4 LDOMs, since there are 4 threads per core.  This is how you can get 32 LDOMs on an 8-core T2000.
  • 4GB of memory - This is based on conversations and experimentation with different memory settings.  You can go lower than 4GB if you are not going to use ZFS in the control domain as virtual storage for guest domains.  Remember, the T2000 I was using only had 4GB of memory in total.  Initially I gave the control domain 2GB, and everything was good until I started jumpstarting my first guest domain.  At that point the jumpstart hung in the middle of adding packages to the disk image file on zfs.  I increased the control domain to 3GB and was able to squeak through jumpstarting the guest domain.  Others have indicated that if you are using ZFS you should have at least 4GB of memory in the control domain.
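The core-alignment and memory arguments above come down to a little arithmetic.  This sketch assumes what the ldm output further down suggests: the MAU assigned to primary covers cpuset (0, 1, 2, 3), so physical strands are numbered contiguously, 4 per core.  The memory helper ignores whatever the hypervisor keeps for itself, so treat its result as an upper bound.  The function names are hypothetical.

```shell
#!/bin/sh
# T2000 in this post: 8 UltraSPARC T1 cores x 4 hardware threads (strands) each.
CORES=8
THREADS_PER_CORE=4

# total_vcpus: how many vcpus there are to hand out across all domains
total_vcpus() {
    echo $(( CORES * THREADS_PER_CORE ))
}

# cores_for PID...: the distinct cores the given physical strand ids sit on,
# assuming strands 0-3 are core 0, 4-7 are core 1, and so on
cores_for() {
    for pid in "$@"; do
        echo $(( pid / THREADS_PER_CORE ))
    done | sort -n | uniq | tr '\n' ' ' | sed 's/ $//'
}

# mem_left TOTAL_MB ASSIGNED_MB...: memory (MB) not yet assigned to a domain
mem_left() {
    left=$1
    shift
    for m in "$@"; do
        left=$(( left - m ))
    done
    echo "$left"
}
```

For example, `cores_for 2 3 4 5` prints `0 1`, meaning a domain with those strands straddles two cores.  `mem_left 4096 3072 396` prints 628, which is roughly what is left after primary's 3G and one 396M guest.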
So here's how I set up primary's (the control domain's) resources:

root@arakeen:/> ldm set-mau 1 primary
root@arakeen:/>ldm set-vcpu 4 primary
root@arakeen:/>ldm set-memory 3G primary

Let's check the config:

root@arakeen:/> ldm ls -l primary

Name:   primary
State:  active
Flags:  transition,control,vio service
OS:    
Util:   1.0%
Uptime: 59m
Vcpu:   4
        vid    pid    util strand
        0      0      2.3%   100%
        1      1      1.0%   100%
        2      2      0.6%   100%
        3      3      0.2%   100%
Mau:    1
        mau cpuset (0, 1, 2, 3)
Memory: 3G
        real-addr        phys-addr        size           
        0x8000000        0x8000000        3G
Vars:   reboot-command=cr ." Ignoring auto-boot? setting for this boot." cr
IO:     pci@780 (bus_a)
        pci@7c0 (bus_b)
Vldc:   primary-vldc0   [num_clients=4]
Vldc:   primary-vldc3   [num_clients=7]
Vds:    primary-vds0    [num_clients=1]
                vdsdev: vol1    device=/tank/LDOMS/ldg1/bootdisk.img
Vcc:    primary-vcc0    [num_clients=1]
                port-range=5000-5100
Vsw:    primary-vsw0    [num_clients=1]
                mac-addr=0:14:4f:f8:92:db
                net-dev=e1000g0
                mode=prog,promisc
Vcons:  SP

Now it's time to store our configuration, which we do with:

root@arakeen:/> ldm add-config initial   NOTE:  This saves the configuration on the system controller (ALOM).
root@arakeen:/> ldm ls-config

factory-default {current}
initial [next]

Now reboot, and then we can move on to setting up our first guest domain.
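The control domain steps above can be collected into a small dry-run script, in the order recommended here: services first, then resources, then save the configuration.  This is a sketch.  The values are the ones used in this post, the function and variable names are hypothetical, and it prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Default is a dry run that only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Values from this post; adjust for your own box.
NET_DEV=e1000g0
PRIMARY_VCPUS=4
PRIMARY_MEM=3G
CONFIG_NAME=initial

# setup_primary: services first, then shrink primary's resources, then save
# the configuration to the system controller so it is used after the reboot.
setup_primary() {
    run ldm add-vds primary-vds0 primary
    run ldm add-vcc port-range=5000-5100 primary-vcc0 primary
    run ldm add-vsw net-dev="$NET_DEV" primary-vsw0 primary
    run ldm set-mau 1 primary
    run ldm set-vcpu "$PRIMARY_VCPUS" primary
    run ldm set-memory "$PRIMARY_MEM" primary
    run ldm add-config "$CONFIG_NAME"
}
```

After the reboot, the guest consoles on ports 5000-5100 are served by the vntsd daemon.  The LDoms Administration Guide has you enable it with `svcadm enable vntsd`, a step not shown in the transcript here.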

Create a Guest Domain:

I needed a boot disk for the guest domain, and I had already created the zfs filesystem tank/LDOMS/ldg1 above.  I decided to use a file on top of the zfs filesystem so that I could snapshot it, clone the snapshot, and use the clone as a bootable disk image file for another domain.  I created a 5GB file via:

root@arakeen:/tank/LDOMS/ldg1> mkfile 5G bootdisk.img

Here are the commands I used to set up my guest domain ldg1:

root@arakeen:/> ldm add-domain ldg1

root@arakeen:/> ldm add-vcpu 8 ldg1

root@arakeen:/> ldm add-memory 396M ldg1  NOTE:  I used 396MB as I only have 4G total and needed 3GB min. for using ZFS in control domain.

root@arakeen:/> ldm add-vnet vnet1 primary-vsw0 ldg1

root@arakeen:/> ldm add-vdiskserverdevice /tank/LDOMS/ldg1/bootdisk.img vol1@primary-vds0

root@arakeen:/> ldm add-vdisk vdisk1 vol1@primary-vds0 ldg1

root@arakeen:/> ldm set-variable auto-boot\?=false ldg1

root@arakeen:/> ldm bind-domain ldg1

root@arakeen:/> ldm start-domain ldg1

root@arakeen:/> telnet localhost 5000

You should now be at the ok prompt, just like you would be on a physical system.  That's cool.  We need to set up our devaliases so that we can boot off the right devices; please refer to the Administration Guide and/or the Beginner's Guide for the details.  I set up a devalias called vdisk1 for my disk and vnet1 for my network, then changed my boot-device variable to vdisk1 vnet1.
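You repeat the sequence above for every guest, so it can be wrapped in a small shell function.  This is a minimal sketch.  The function name and the DRY_RUN switch are my own additions, and it only echoes the ldm commands unless DRY_RUN=0 is set.  The volume name is a parameter because each disk back end exported by primary-vds0 needs its own name, so ldg1's vol1 can't be reused.

```shell
#!/bin/sh
# Default is a dry run that only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_guest NAME VCPUS MEMORY IMAGE VOLUME
# Same ldm sequence as above. VOLUME must not already be in use on
# primary-vds0 (vol1 is taken by ldg1's disk).
create_guest() {
    name=$1 vcpus=$2 mem=$3 image=$4 vol=$5
    run ldm add-domain "$name"
    run ldm add-vcpu "$vcpus" "$name"
    run ldm add-memory "$mem" "$name"
    run ldm add-vnet vnet1 primary-vsw0 "$name"
    run ldm add-vdiskserverdevice "$image" "$vol@primary-vds0"
    run ldm add-vdisk vdisk1 "$vol@primary-vds0" "$name"
    # Quoted so the shell does not treat ? as a glob character.
    run ldm set-variable 'auto-boot?=false' "$name"
    run ldm bind-domain "$name"
    run ldm start-domain "$name"
}
```

For example, `DRY_RUN=0 create_guest ldg2 8 396M /tank/LDOMS/ldg2/bootdisk.img vol2`.  Afterwards, `ldm list` should show which console port each guest was given, and that is the port to telnet to on localhost.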

Jumpstart the Guest Domain:

I used the control domain as a jumpstart server.  I did this by mounting the DVD and running:

root@arakeen:/cdrom/sol_10_1106_sparc/s0/Solaris_10/Tools>setup_install_server

Then I took a few shortcuts, knowing this isn't the best or recommended way to jumpstart a server.  I skipped setting up a profile, a sysidcfg file, etc.  I just wanted access to the Solaris bits and a boot server so I could boot the guest domain and interactively install the bits on the virtual disk.  I know I should have taken the time to set up the jumpstart server correctly.  Don't stone me!  Next I ran:

root@arakeen:/cdrom/sol_10_1106_sparc/s0/Solaris_10/Tools>add_install_client -e 0:14:4f:fa:b5:48 ldg1 sun4v

This did the necessary stuff for me to be able to:

ok boot vnet1 - install
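For comparison, here is a sketch of the same two jumpstart steps with an explicit target directory.  It assumes the install image is copied onto the tank/jumpstart dataset created earlier, since setup_install_server takes the destination directory as its argument.  It then runs add_install_client from the copied image's Tools directory, as the Solaris installation docs do.  The client name (ldg1 here) also has to resolve on the server, e.g. via /etc/hosts.  The function names are hypothetical, and the commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Default is a dry run that only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Tools directory on the mounted Solaris 10 11/06 DVD (path from this post).
MEDIA_TOOLS=/cdrom/sol_10_1106_sparc/s0/Solaris_10/Tools
# Assumption: the install image is copied onto the tank/jumpstart dataset.
INSTALL_DIR=/tank/jumpstart

# make_install_server: copy the Solaris image from the DVD into INSTALL_DIR.
make_install_server() {
    run "$MEDIA_TOOLS/setup_install_server" "$INSTALL_DIR"
}

# add_client MAC HOST: register a sun4v guest domain as a network boot client.
# Assumption: MAC is the guest's vnet address, as listed by ldm ls -l HOST.
add_client() {
    run "$INSTALL_DIR/Solaris_10/Tools/add_install_client" -e "$1" "$2" sun4v
}
```

Usage: `make_install_server`, then `add_client 0:14:4f:fa:b5:48 ldg1`, and boot the guest with `boot vnet1 - install` as above.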

Oh no!  It never got its IP address to start the booting process.  What's up with this?  I spent a lot of time messing around trying to figure out why RARP wasn't working.  Well, if you remember reading the docs, the vswitch is a layer 2 switch, and by default the control domain can't talk to the guest domains through it, because the control domain's own traffic goes out the physical interface instead.  Since my RARP/jumpstart server was the control domain, the guest's requests never reached it.  Okay, that's cool, I'll just plumb vsw0 in the control domain.  No, that didn't work: the control domain's physical interface (e1000g0) still couldn't see the broadcast from vsw0.  Long story short, I had to unplumb e1000g0 and plumb vsw0 in its place, per the install guide!!!!
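As I read the install guide, the fix boils down to moving the control domain's IP address from e1000g0 to vsw0.  Here is a dry-run sketch.  The IP address, netmask, and function name are placeholders.  Since it takes e1000g0 down, run it from the console rather than over a network session on that interface.

```shell
#!/bin/sh
# Default is a dry run that only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Placeholder addresses: substitute the IP and netmask e1000g0 had before.
PRIMARY_IP=192.0.2.10
PRIMARY_MASK=255.255.255.0

# move_ip_to_vsw: give the control domain's address to vsw0 so the control
# domain and the guests can reach each other through the virtual switch.
move_ip_to_vsw() {
    run ifconfig vsw0 plumb
    run ifconfig e1000g0 down unplumb
    run ifconfig vsw0 "$PRIMARY_IP" netmask "$PRIMARY_MASK" broadcast + up
    # Rename the hostname file so vsw0 is configured at boot instead of e1000g0.
    run mv /etc/hostname.e1000g0 /etc/hostname.vsw0
}
```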

Starting to feel good now!!  I'm booting and waiting for the first install screen.  As you can tell, I did get booted and answered all the install questions.  And yes, I rebooted after the install and could log in.  Is it time to sing yet?  Let's try one more thing: let's clone the /tank/LDOMS/ldg1 filesystem, create a new guest domain, and boot it from the cloned filesystem.  Here's how it went:

root@arakeen:/>zfs snapshot tank/LDOMS/ldg1@july12-1920

Let's take a look to see what happened.

root@arakeen:/>zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
tank                  19.2G  24.4G  28.5K  /tank
tank/Downloads         275M  24.4G   275M  /tank/Downloads
tank/LDOMS            15.0G  24.4G  28.5K  /tank/LDOMS
tank/LDOMS/ldg1       5.00G  24.4G  5.00G  /tank/LDOMS/ldg1
tank/LDOMS/ldg1@july12-1920      0      -  5.00G  -                           <--- Here's the snapshot
tank/LDOMS/ldg2       33.2M  24.4G  5.00G  /tank/LDOMS/ldg2
tank/jumpstart        3.88G  24.4G  3.88G  /tank/jumpstart

Notice that the snapshot uses no space, and if I:

root@arakeen:/>cd /tank/LDOMS/ldg1/.zfs/snapshot/july12-1920/
root@arakeen:/tank/LDOMS/ldg1/.zfs/snapshot/july12-1920>ls -l
total 10492946
-rw------T   1 root     root     5368709120 Jun 25 15:33 bootdisk.img
-rwxr-xr-x   1 root     root         646 Jun 25 15:29 fcksum
-rw-r--r--   1 root     root         512 Jun 25 15:30 label.bootdisk.img.070625_153008

Now let's clone it so that I can use it:

root@arakeen:/>zfs clone tank/LDOMS/ldg1@july12-1920 tank/LDOMS/ldg3
root@arakeen:/>zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
tank                  19.2G  24.4G  28.5K  /tank
tank/Downloads         275M  24.4G   275M  /tank/Downloads
tank/LDOMS            15.0G  24.4G  29.5K  /tank/LDOMS
tank/LDOMS/ldg1       5.00G  24.4G  5.00G  /tank/LDOMS/ldg1
tank/LDOMS/ldg1@july12-1920      0      -  5.00G  -
tank/LDOMS/ldg2       33.2M  24.4G  5.00G  /tank/LDOMS/ldg2
tank/LDOMS/ldg3           0  24.4G  5.00G  /tank/LDOMS/ldg3         <---- Note: no space used yet!  Once I boot it and change the hostname etc., this will change.
tank/jumpstart        3.88G  24.4G  3.88G  /tank/jumpstart
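The snapshot-and-clone step can be wrapped the same way.  Here is a minimal dry-run sketch with a hypothetical function name.  Per the note further down, running sys-unconfig inside the gold guest before taking the snapshot avoids clones that boot with the original's identity.

```shell
#!/bin/sh
# Default is a dry run that only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clone_boot_image SRC_DATASET SNAPSHOT NEW_DATASET
# The clone shares blocks with the snapshot, so it uses no extra space
# until the new domain starts writing to its bootdisk.img.
clone_boot_image() {
    src=$1 snap=$2 dst=$3
    run zfs snapshot "$src@$snap"
    run zfs clone "$src@$snap" "$dst"
}
```

The new domain's vdiskserverdevice then points at bootdisk.img inside the clone's mountpoint, for example /tank/LDOMS/ldg3/bootdisk.img.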

So now I created an ldg2 guest domain:

root@arakeen:/> ldm add-domain ldg2
root@arakeen:/> ldm add-vcpu 8 ldg2
root@arakeen:/> ldm add-memory 396M ldg2  NOTE:  I used 396MB as I only have 4G total and needed 3GB min. for using ZFS in control domain.
root@arakeen:/> ldm add-vnet vnet1 primary-vsw0 ldg2
root@arakeen:/> ldm add-vdiskserverdevice /tank/LDOMS/ldg2/bootdisk.img vol2@primary-vds0
root@arakeen:/> ldm add-vdisk vdisk1 vol2@primary-vds0 ldg2
root@arakeen:/> ldm set-variable auto-boot\?=false ldg2
root@arakeen:/> ldm bind-domain ldg2
root@arakeen:/> ldm start-domain ldg2

root@arakeen:/>telnet localhost 5000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Connecting to console "ldg1" in group "ldg1" ....
Press ~? for control options ..

ldg1 console login:


Note that it says ldg1 and not ldg2.  That is because I didn't do a sys-unconfig before I cloned so the new guest domain has the same identity as what I cloned.  So if I want both domains online at once I would just do a sys-unconfig of one of the domains and reboot and answer the identification questions etc.

Also, there is currently a bug: if you unbind and then rebind the domain, you can lose the disk label of bootdisk.img.  That label is the first block inside the file; remember, the file looks like a physical disk to the guest domain.  The workaround is to run fcksum after you unbind and before you bind again.

NOW WE CAN SING "We Are the Champions"!!!!

