Tuesday Sep 15, 2009

I have a little Python script that I use personally to do small OpenSolaris guest installs. I use it for VirtualBox and xVM guests, but it should work fine on metal too, assuming you add the drivers needed for your system.

It's a simple text based installer. Now, I'm not much of a Python coder, the script is something I play around with in my spare time, it's not finished, and it's not supported by Sun, etc, etc. :-) But I thought some folks would find it useful, so I'm sharing it.

A 2009.06 based xVM install is something around 380M vs 3G+. b122 is quite a bit bigger due to some dependency bloat. For an xVM guest, boot the 2009.06 iso, log in as jack, grab the installer and run it...

: core2[1]#; virt-install -n opensolaris -r 1024 -p --nographics -l /net/192.168.0.71/tank/isos/solaris/os2009.06.iso -f /vdisks/opensolaris 
[CUT]
opensolaris console login: jack
Password: 
Last login: Tue Sep 15 05:19:57 from core2.lan
Sun Microsystems Inc.   SunOS 5.11      snv_111b        November 2008
jack@opensolaris:~$ wget http://blogs.sun.com/mrj/resource/slim-guest-installer
[CUT]
05:28:04 (69.94 KB/s) - `slim-guest-installer' saved [16096/16096]
jack@opensolaris:~$ chmod a+x slim-guest-installer 
jack@opensolaris:~$ pfexec ./slim-guest-installer

Thanks for choosing to install the OpenSolaris OS! Before you start, review
the Release Notes for this release for a list of known problems. The release
notes can be found at
   http://opensolaris.org/os/project/indiana/resources/relnotes/200906/x86

****
NOTICE: THIS INSTALLER ONLY SUPPORTS INSTALLING TO A WHOLE DISK. ALL DATA
ON THE DISK YOU INSTALL TO WILL BE DESTROYED.
****

Please Select Install Disk

  AVAILABLE DISK SELECTIONS:
	0.  /dev/dsk/c7t0d0p0  21459755520 bytes
Specify disk (enter its number or 'q' to quit): 0

NOTE: ALL DATA ON THIS DISK WILL BE DESTROYED.

Install on /dev/rdsk/c7t0d0p0 (yes or no): yes
Configuring ZFS Root:................ COMPLETE
Installing packages...
DOWNLOAD                                    PKGS       FILES     XFER (MB)
SUNWopenssl                                39/67   6042/8542   57.27/93.32
[CUT]

For VirtualBox, boot the 2009.06 iso, log in as jack, grab the installer and run it with an additional option (--profile=vbox-guest) to specify VirtualBox packages.

opensolaris console login: jack
Password: 
Last login: Tue Sep 15 05:19:57 from core2.lan
Sun Microsystems Inc.   SunOS 5.11      snv_111b        November 2008
jack@opensolaris:~$ wget http://blogs.sun.com/mrj/resource/slim-guest-installer
[CUT]
05:28:04 (69.94 KB/s) - `slim-guest-installer' saved [16096/16096]
jack@opensolaris:~$ chmod a+x slim-guest-installer 
jack@opensolaris:~$ pfexec ./slim-guest-installer --profile=vbox-guest

Have Fun!

Friday Apr 10, 2009

First, a couple of answers to some questions from the first post..

James, re: "be able (after a PXE boot) to mount NFS or iSCSI and switch (or layer) root". No. I'm really going for a standalone, extremely stripped down, ramdisk based image right now. No swap. No disk support. This wouldn't be something you would use for a NAS appliance. For that, you would be better off going with a stripped down pkg based OpenSolaris image. I can get those down to ~350M using a "custom installer".

Benoit, re: "And in a memory usage how light can solaris go?" Good question, let's take a look :-)

When I do get a little free time to play around with this stuff, I generally do it using OpenSolaris xVM domUs since it takes about a second to test then reboot these images.

The last post I did was an i86pc based image. Here's a domU with the same bits loaded. We're at ~15M for disk usage.

Here's the py file for my domU.. Notice I'm using the dom0's unix (since I'm building the ramdisk based on dom0's bits).

: alpha[1]#; cat /tank/guests/micro/guest.py 
name = "micro"
vcpus = 1
memory = "256"
kernel = "/platform/i86xpv/kernel/unix"
ramdisk = "/tank/guests/micro/ramdisk"
extra = "/platform/i86xpv/kernel/unix"
vif = ['']
on_shutdown = "destroy"
on_reboot = "restart"
on_crash = "preserve"
: alpha[1]#; 

: alpha[1]#; xm create -c /tank/guests/micro/guest.py
Using config file "/tank/guests/micro/guest.py".
Started domain micro
v3.3.2-rc1-pre-xvm chgset 'Mon Apr 06 20:13:29 2009 -0400 18424:97250633e58b'
SunOS Release 5.11 Version onnv-3.3-mrj 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
strplumb: failed to initialize drv/dld
# df -lk
Filesystem            kbytes    used   avail capacity  Mounted on
/ramdisk:a             38255   15116   19314    44%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  205012       0  205012     0%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
#

Now, we are really stripped down (for Solaris)... Let's add in kmdb and enough of mdb to let us do an mdb -K.

I have a very very ugly python script I use to build up my ramdisk...

: alpha[1]#; ./micro.py 
USAGE: ./micro.py cfg disk sizeM

Here's the config file I'm using for my domU after uncommenting kmdb... Notice I don't even have syscalls in at this point...

: alpha[1]#; cat domu.files
@kmdb32

@kernel32
@i86xpv32
@init32
#@syscall32

#@mount
#@uidcache

#@net32
#/usr/bin/ln

#@devfsadm32
#@basic32
#@ssh32
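
micro.py itself isn't shown in this post, so the following is only a guess at how a file like domu.files could be read. It assumes that lines starting with # are disabled, lines starting with @ name a group of files, and anything else is a literal path. The function name and the return value are made up for illustration.

```python
def parse_files_config(text):
    """Split a domu.files-style config into group names and literal paths.

    Assumed format (inferred from the listing above, not from micro.py):
      - blank lines are ignored
      - lines starting with '#' are commented out
      - lines starting with '@' name a group of files
      - anything else is a literal path
    """
    groups, paths = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@"):
            groups.append(line[1:])
        else:
            paths.append(line)
    return groups, paths


sample = """@kmdb32

@kernel32
@i86xpv32
@init32
#@syscall32
#/usr/bin/ln
"""

groups, paths = parse_files_config(sample)
print("groups:", groups)
print("paths:", paths)
```

With the sample above only the four uncommented groups come back, which matches the "no syscalls yet" state of the image.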

I just luupgraded my system to b112. Let's build a new image using that... So how much memory are we using?

: alpha[1]#; ./micro.py domu.files /tank/guests/micro/ramdisk 40
NOTICE: overwriting disk: /tank/guests/micro/ramdisk

: alpha[1]#; xm create -c /tank/guests/micro/guest.py 
Using config file "/tank/guests/micro/guest.py".
Started domain micro
v3.1.4-xvm chgset 'Mon Mar 30 23:29:09 2009 -0700 15914:bb9557896640'
SunOS Release 5.11 Version snv_112 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: Invalid iBFT table 0x1
strplumb: failed to initialize drv/dld
# df -lk
Filesystem            kbytes    used   avail capacity  Mounted on
/ramdisk:a             38255   19246   15184    56%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  204992       0  204992     0%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
# mdb -K

Welcome to kmdb
kmdb: no terminal data available for TERM=
kmdb: failed to set terminal type to `', using `vt100'
Loaded modules: [ scsi_vhci mac xpv_psm ufs unix krtld genunix specfs xpv_uppc
 ]
[0]> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                       8922                34   14%
Anon                          140                 0    0%
Exec and libs                  78                 0    0%
Page cache                    550                 2    1%
Free (cachelist)              708                 2    1%
Free (freelist)             53089               207   84%
Balloon                         0                 0    0%

Total                       63487               247
[0]> 

Less than 40M... Not great, but not bad either. There are some code changes we could do to get things smaller, but it's not big enough that it would matter at this point. Since I'm building a 40M ramdisk, we need space for that too. So I would need around 80-90M total for this image.
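
To make the arithmetic explicit, here is a small sketch that converts the ::memstat page counts to MB and adds the ramdisk on top. It assumes 4K pages (the x86 base page size) and that memstat truncates to whole MB, which agrees with the numbers in the table above. The helper name is mine.

```python
PAGE_SIZE = 4096  # bytes; x86 base page size (assumed for this domU)


def pages_to_mb(pages):
    """Convert a page count to whole megabytes, truncating like the table."""
    return pages * PAGE_SIZE // (1024 * 1024)


kernel_mb = pages_to_mb(8922)    # "Kernel" row from ::memstat
total_mb = pages_to_mb(63487)    # "Total" row from ::memstat
ramdisk_mb = 40                  # size passed to micro.py

print("kernel MB:", kernel_mb)
print("total MB:", total_mb)
print("kernel + ramdisk MB:", kernel_mb + ramdisk_mb)
```

Kernel plus ramdisk comes to 74M, so 80-90M leaves a little headroom for everything else.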

Let's do a quick test, changing the domU's memory to 80M.

< memory = "256"

> memory = "80"

: alpha[1]#; ./micro.py domu.files /tank/guests/micro/ramdisk 40
NOTICE: overwriting disk: /tank/guests/micro/ramdisk
: alpha[1]#; xm create -c /tank/guests/micro/guest.py 
Using config file "/tank/guests/micro/guest.py".
Started domain micro
v3.1.4-xvm chgset 'Mon Mar 30 23:29:09 2009 -0700 15914:bb9557896640'
SunOS Release 5.11 Version snv_112 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: Invalid iBFT table 0x1
strplumb: failed to initialize drv/dld
# mdb -K
WARNING: retrying of kmdb allocation of 0x600000 bytes

Welcome to kmdb
kmdb: no terminal data available for TERM=
kmdb: failed to set terminal type to `', using `vt100'
Loaded modules: [ scsi_vhci mac xpv_psm ufs unix krtld genunix specfs xpv_uppc
 ]
[0]> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                       5766                22   31%
Anon                          140                 0    1%
Exec and libs                   0                 0    0%
Page cache                      1                 0    0%
Free (cachelist)             1259                 4    7%
Free (freelist)             11265                44   61%
Balloon                         0                 0    0%

Total                       18431                71
[0]> 

Interesting... What about a 64-bit kernel? It's going to be bigger of course. But if you need a 64-bit kernel, memory shouldn't be an issue.

: alpha[1]#; xm create -c /tank/guests/micro/guest64.py 
Using config file "/tank/guests/micro/guest64.py".
Started domain micro
v3.1.4-xvm chgset 'Mon Mar 30 23:29:09 2009 -0700 15914:bb9557896640'
SunOS Release 5.11 Version snv_112 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
strplumb: failed to initialize drv/dld
# df -lk
Filesystem            kbytes    used   avail capacity  Mounted on
/ramdisk:a             38255   31187    3243    91%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  195480       0  195480     0%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
# mdb -K

Welcome to kmdb
kmdb: no terminal data available for TERM=
kmdb: failed to set terminal type to `', using `vt100'
Loaded modules: [ scsi_vhci mac xpv_psm ufs unix krtld genunix specfs xpv_uppc
 ]
[0]> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      15575                60   25%
Anon                          192                 0    0%
Exec and libs                  93                 0    0%
Page cache                    771                 3    1%
Free (cachelist)             1203                 4    2%
Free (freelist)             45653               178   72%
Balloon                         0                 0    0%

Total                       63487               247
[0]> 

OK, now let's do something interesting... Let's bring up networking enough so we can ping, etc. In my domu.files file, I'm going to bring in @syscall32, @mount, @net32, and /usr/bin/ln.

One thing you notice when you start playing with this stuff is that things can grow very fast when you start pulling in user bins (due to all the libraries which can be pulled in too). You would think things like reboot and poweroff would have a small impact. Not so :-)
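
A rough way to see why this happens: each binary drags in its libraries, and those drag in their own. The sketch below walks a dependency map and returns everything a single binary ends up needing. The map here is invented purely for illustration (the names are placeholders, not real Solaris libraries); on a real system you would build it from something like ldd output.

```python
def closure(binary, deps):
    """Return every library reachable from `binary` in the `deps` map."""
    seen = set()
    stack = [binary]
    while stack:
        item = stack.pop()
        for dep in deps.get(item, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Hypothetical dependency map, for illustration only.
deps = {
    "tool": ["libx.so", "liby.so"],
    "libx.so": ["libz.so"],
    "liby.so": ["libz.so", "libw.so"],
    "libz.so": [],
    "libw.so": ["libx.so"],
}

needed = closure("tool", deps)
print(sorted(needed))
```

One small binary with two direct dependencies ends up needing four libraries here, and the real numbers get much worse.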

I have a custom init bin which configures the system and starts up a shell (no SMF, etc). I'll manually configure it though so you can see what I'm doing to get the system up.

Notice I'm setting up a dev link for the NIC. We don't have devfsadm in this particular ramdisk. Obviously you would pre-create the link on the ramdisk, or pull in the devfsadm bits. But I thought it was an interesting thing to show.

: alpha[1]#; xm create -c /tank/guests/micro/guest.py 
Using config file "/tank/guests/micro/guest.py".
Started domain micro
v3.1.4-xvm chgset 'Mon Mar 30 23:29:09 2009 -0700 15914:bb9557896640'
SunOS Release 5.11 Version snv_112 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: Invalid iBFT table 0x1
# mount -o remount,rw /devices/ramdisk:a /
# /sbin/soconfig -f /etc/sock2path
# ifconfig lo0 plumb 127.0.0.1 netmask 255.255.255.0 up
# cd /dev
# ln -s ../devices/xpvd/xnf@0:xnf0 xnf0
# ifconfig xnf0 plumb 192.168.0.91 netmask 255.255.255.0 up
# route add default 192.168.0.1
add net default: gateway 192.168.0.1
# ping 192.168.0.1
192.168.0.1 is alive
# df -lk
Filesystem            kbytes    used   avail capacity  Mounted on
/devices/ramdisk:a     38255   28442    5988    83%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  198852       0  198852     0%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
# mdb -K

Welcome to kmdb
kmdb: no terminal data available for TERM=
kmdb: failed to set terminal type to `', using `vt100'
Loaded modules: [ scsi_vhci mac xpv_psm ufs unix krtld genunix specfs xpv_uppc
 ]
[0]> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      11628                45   18%
Anon                          140                 0    0%
Exec and libs                 129                 0    0%
Page cache                    711                 2    1%
Free (cachelist)             1435                 5    2%
Free (freelist)             49444               193   78%
Balloon                         0                 0    0%

Total                       63487               247
[0]> 

What's next? I'm trying to get VirtualBox (CLI only) running (RDP for external access) :-)

Monday Mar 30, 2009

Congratulations to Intel on their new Intel Xeon 5500 series. It is a truly remarkable CPU, and in my opinion, will be looked back on as a game changer in the industry.

There are features which are easy to see, such as the integrated memory controllers and the Quick Path Interconnects (QPI) (which connect the CPUs and IO bridges to each other).

But, as you would expect, they also continue to improve the guts of the CPUs. For example, with the 5500 series and associated IO chipsets come virtualization improvements, including better virtualized CPU performance and the ability to safely pass through an IO device to a virtualized guest.

With Intel's help, we backported a lot of the new functionality from xen-unstable.hg into OpenSolaris xVM in April of 2008, including the big ones, Extended Page Table (EPT) support and Virtual Processor IDs (VPID). So we've been ready for a while :-)

With the EPT support, you can bypass the shadow page code in the hypervisor for fully virtualized guests. This gives you a nice performance improvement, and has the added benefit of not having to run a *lot* of complex, and occasionally buggy, code.

I've just scratched the surface of the virtualization improvements in the Intel Xeon 5500 series. For folks who enjoy CPU technology, we live in exciting times.

Monday Jan 22, 2007

What's an iPhone without the phone? My guess, the nextgen iPod with ichat w/ audio, or a skype like app? Seems like the logical progression to me...

Monday Jul 17, 2006

Bringing up Solaris domain 0 (dom0) on Xen was surprisingly easy. Mostly because all of the hard work was already done by other people. The hard work which remained, was also done by other people :-)

I apologize in advance for giving credit to the wrong folks or for taking credit for something I didn't do. This was such a blur, it all tends to blend together...

Obviously, this won't cover everything. I tried to talk about some of the more interesting parts. Well, interesting is relative of course :-)

To start with, you need to be able to build Xen on Solaris. You could actually cheat and start with a Xen image and skip all the user apps to manage domUs. But that seems kind of pointless unless you have tons of bodies to throw at the effort, which we don't, thankfully.

John L and Dave already had Xen building, so all I had to do was ask them what I needed to do to build it. The first thing you need are changes to the gcc and binutils that are shipped in /usr/sfw. Which is why you need to download unofficially updated SUNWgcc, SUNWgccruntime, and SUNWbinutils packages in order to build the xen sources on Solaris (they will be officially updated at some point in the future).

There were two things that John L fixed. The first one was a bug in how we build gcc (can't find its own ld scripts). See this bug.

The second fix was to add a -divide option to the binutils gas so it does not treat / as a comment. John got this change back into the binutils CVS repository, but it hasn't made it out in a release yet (as far as I know).

Of course, Dave and John L had to change stuff in the xen.hg gate to get it to compile too. If you look at the source, you'll notice there are a few things we don't try and compile currently, e.g. hvm related support. Then, of course, you need to test it to make sure the xen binary works (user apps would have to wait until Solaris dom0 was up). Not sure if it just worked or they had to debug it, but it was working by the time I got to it :-)

So I built my xen gate, put xen.gz in /boot (starting with 32-bit dom0), and tried to boot an i86xen (vs i86pc) version of the kernel debugger (kmdb). Again, I was following footsteps here. John L had done a ton of work getting kmdb to work in domU (since we already had Solaris domU running on a Linux dom0). And Todd and/or John L had already debugged kmdb on a Solaris dom0. So I was at the kmdb prompt, ready to venture into unknown territory.

So before I could boot my Solaris dom0, I had to build one. Up to this point, we only had the driver changes we needed for domU. Before xen, we only had one x86 "platform", i86pc.

This is unlike SPARC, which usually gets a new "platform" for every major architecture change (e.g. sun4m, sun4u, sun4v). On SPARC, you'll also see machine specific platmods and platform directories to provide additional functionality and modules which are specific to a given machine (e.g. /platform/SUNW,Sun-Fire-880).

For xen (on x86), we have a new "platform", i86xen. For Solaris dom0, we were missing all of the drivers which were in i86pc (i.e. they did not show up in i86xen). The vast majority of these drivers aren't platform specific and can go into intel, i.e. they don't have any code specific to a platform (which today means i86pc and i86xen). So I had to try to move each driver over to intel and see if it had platform specific code or not. Since there was only one intel "platform" in the past, the lines were a little gray at times. But I finally got through it and ended up moving around 40 drivers in src/uts and a little over 15 in closed/uts, to intel from i86pc. For the rest, I needed to create makefiles in i86xen to build a platform specific version of these drivers.

Now I had a Solaris dom0 kernel to boot. I set up my cap-eye install kernel, rebooted into kmdb, and :c'd into a new world. The majority of the hard work was already done bringing up domU. The CPU and VM code for domU, done by Tim, Todd, and Joe, just worked for domain 0. That made life very simple.

The first problem I ran into was the internal pci config access setup in mlsetup. It was initially shutoff for domU, I had added it back in for dom0. However, this requires a call to the BIOS, which xen doesn't allow. So I changed the code to default to PCI_MECHANISM_1 for i86xen dom0.
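
For anyone who hasn't looked at it, PCI configuration mechanism 1 needs no BIOS call: you write an address to I/O port 0xCF8 and then read or write the data at port 0xCFC. The sketch below only builds that address value, following the standard bit layout. It is a plain illustration, not the Solaris code, and the function name is mine.

```python
PCI_CONFIG_ADDRESS = 0xCF8  # I/O port the address value is written to
PCI_CONFIG_DATA = 0xCFC     # I/O port the data is then read from or written to


def pci_mech1_address(bus, dev, func, reg):
    """Build the 32-bit value written to PCI_CONFIG_ADDRESS (mechanism 1).

    Bit 31 is the enable bit, bits 23-16 the bus, bits 15-11 the device,
    bits 10-8 the function, and bits 7-2 the dword-aligned register offset.
    """
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8 and 0 <= reg < 256):
        raise ValueError("bus/dev/func/reg out of range")
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


print(hex(pci_mech1_address(0, 31, 3, 0x10)))
```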

From there, the next problem I ran into was that ins/outs weren't working. That was fixed with a HYPERVISOR_physdev_op (PHYSDEVOP_SET_IOPL), which ended up being slightly wrong and was fixed by Todd before we released.

Now I was at the point where we are attaching drivers and the drivers are trying to map in their registers. Joe had done a bunch of work in the VM getting the infrastructure ready for foreign PFNs, which are basically PFNs which are tagged to mark them as containing the real MFN, instead of being present in the mfn_list. Since this was the first time trying that code out, I ran into a couple of minor bugs. The more interesting problem was that Xen was using one of the software PTE bits in a debug version of Xen, which conflicted with the bit we were using to mark the page as foreign. I commented out that feature, rebuilt Xen, and continued on while Joe worked on changing the PTE software bits to be encoded instead of individual flags, to avoid bit 2 in the PTE software field.
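
As a toy model of the foreign PFN idea: a normal PFN is looked up in the mfn_list, while a foreign PFN carries the MFN itself plus a tag saying so. The tag bit below is an arbitrary choice for illustration and is NOT the real Solaris encoding; the function names are also made up.

```python
FOREIGN_TAG = 1 << 40  # hypothetical tag bit, not the value Solaris uses


def make_foreign_pfn(mfn):
    """Wrap a machine frame number so it can travel as a tagged PFN."""
    return mfn | FOREIGN_TAG


def is_foreign(pfn):
    return bool(pfn & FOREIGN_TAG)


def pfn_to_mfn(pfn, mfn_list):
    """Foreign PFNs already hold the MFN; ordinary ones go through mfn_list."""
    if is_foreign(pfn):
        return pfn & ~FOREIGN_TAG
    return mfn_list[pfn]


mfn_list = [100, 205, 7]            # toy PFN -> MFN table for this domain
device_regs_mfn = 0xFEE00           # made-up machine frame of a device register page

print(pfn_to_mfn(1, mfn_list))
print(hex(pfn_to_mfn(make_foreign_pfn(device_regs_mfn), mfn_list)))
```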

I had already changed the code in rootnex to convert the MFN (device register access) to a foreign PFN during ddi_regs_map_setup(). So once the PTE software bits were cleared we were sailing through the driver reading its device registers and on to mapping memory for device DMA.

I had also modified the rootnex dma bind routine. When we're building dma cookies, we need to put MFNs in the cookies instead of PFNs. I had a couple of bugs in that code, fixed that up, then ran into the contig alloc code path. I hadn't coded up the contig alloc code path changes yet (where we want to allocate physically contiguous memory). So I cheated and temporarily took out all the drivers which required contig alloc, and did the contig alloc code at a later time (my boot device didn't need it :-) )
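
Here is a simplified picture of why MFNs matter for DMA and why contig alloc needs its own work. Pages that are contiguous as PFNs are not necessarily contiguous as MFNs, so a buffer that would be one cookie on bare metal can split into several. This is a sketch with a made-up mfn_list and a hypothetical build_cookies helper, not the rootnex code.

```python
PAGE_SIZE = 4096  # bytes; x86 base page size


def build_cookies(pfns, mfn_list):
    """Translate PFNs to MFNs and merge machine-contiguous pages.

    Returns a list of (machine_address, length_in_bytes) tuples.
    """
    runs = []  # each entry is [first_mfn, page_count]
    for pfn in pfns:
        mfn = mfn_list[pfn]
        if runs and runs[-1][0] + runs[-1][1] == mfn:
            runs[-1][1] += 1          # extends the previous run
        else:
            runs.append([mfn, 1])     # starts a new cookie
    return [(first * PAGE_SIZE, count * PAGE_SIZE) for first, count in runs]


mfn_list = [50, 51, 90, 52]           # toy PFN -> MFN table
cookies = build_cookies([0, 1, 2, 3], mfn_list)
for addr, length in cookies:
    print(hex(addr), length)
```

Four PFN-contiguous pages turn into three cookies here, because only the first two happen to be adjacent in machine memory.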

Now I was up to vfs_mountroot(). This is where the Solaris drivers start taking over disk activity and stop using the BIOS to load blocks. This is also where we first start noticing problems if interrupts don't work.

This is where I handed off to Stu :-). This was the last of the hard problems. Stu had been busily working on Solaris dom0 interrupt support. A mix of event channels, pcplusmp, ACPI, and APICs. Something I would never wish on anyone. Stu got it up and working remarkably fast (something he should talk about :-)) and I was back and running up to the console handover.

The console config code is a little bit messy in Solaris. I waded through that for a little bit. All of the code was originally in the common intel part of the code. I moved the platform specific code to i86pc and i86xen, and now have a different implementation in i86xen which basically always sends the Solaris console to the Xen console. Not sure if it will stay that way in the end, but that makes the most sense IMO.

And from there, I was at the multi-user prompt..

Here are some other interesting problems I ran into during the bringup. I had to have isa fail to attach on a Solaris domU. The ISA leaf drivers assume the device is present and bad things happen. There were a couple of places in the kernel with hard coded physical addresses which they try to map in (e.g. psm_map_phys_new; the lower 1M of memory, used for BIOS tables, etc.; and xsvc used by Xorg/Xsun). And we found out the hard way that Xen's low mem alloc implementation is Linux specific: it only allocates memory < 4G && > 2G. We need to redo our first pass at implementing memory constrained allocs.

As far as booting 64-bit Solaris dom0, it booted up the first time.

Well, that's enough for now. I'll save the bringup of domUs on a Solaris dom0 for the next post. That was a little more challenging...