OpenSolaris HPC Developer Edition 1.0 is just around the corner - and while the image is meant to work mostly with VMware and Sun desktop virtualization products (VMware Player, VMware Workstation, xVM VirtualBox), it should not be a problem to get it running on any hypervisor that supports vmdk images.

I am lucky enough to have access to test releases (probably thanks to the fact that I am working on one of the components) and I have found a few small problems that can make an attempt to run the HPC Developer Edition fail. The problems are mostly related to subtle differences between virtualization products that are visible to the guest operating system (HDD controller type, NIC type etc.), and a little bit to OpenSolaris itself. Luckily, they are not that hard to fix if you know where to look and what to do.

So if you have downloaded the OpenSolaris HPC Developer Edition, have the vmdk image lying on your hard drive and want to get it running with VirtualBox, try following the next paragraphs (I do not guarantee it will work for you, but it did for me ;) - BTW, I assume that you know the VirtualBox basics).

First, you have to create a new virtual machine profile:

  • set the operating system to Solaris - OpenSolaris 64bit (this is quite important, as 32bit OpenSolaris can have performance issues with VirtualBox)
  • set memory to 2048 MB (it should work with less, but if you can spare more, use more)
  • set the hard disk to an existing file (naturally, the HPC Developer Edition vmdk file)

Once the wizard is finished, continue to fine-tune the virtual machine properties:

  • use 32 MB of video memory (it will give you the option to run in seamless mode)
  • make sure that the VT-x/AMD-V extensions are enabled (check that your CPU supports them and that the feature is enabled in the BIOS)
  • preferably, use Intel PRO/1000 MT (Desktop) as the virtual NIC, and use NAT
  • and definitely change the HDD controller type to SCSI (LsiLogic) (you will need to check "Enable Additional Controller" first)

Once you have prepared the virtual machine profile, cross your fingers and start the machine (just kidding, do not worry, it will work) - just be sure that you select the 64bit version of OpenSolaris to boot.

If everything goes well, you should get to the usual graphical login screen - but it may happen (a VirtualBox+OpenSolaris issue) that the service responsible for it fails to start, and you will see the boot progress bar "progressing" forever. If booting takes unreasonably long, try hitting "Enter" (while VirtualBox has the focus) - it should switch you to a text login prompt. Just log in as "hpcuser/hcpuser" and run the following command:

pfexec svcadm restart gdm

After a while, the graphical login screen will appear - just log in and wait till the system is fully up.

The easier part is over, it's time for the fun ride. It is impossible to list everything that can go wrong (hopefully nothing, and you will not have to fix anything), but I faced these 2 problems (both related to network setup), and both of them can appear regardless of whether you run the HPC Developer Edition in VMware, VirtualBox or whatever:

  • the zones (node1, node2) are not reachable
  • the IP addresses assigned to zones (node1, node2) are not available

The first problem can occur because we changed the virtual NIC (OpenSolaris in the HPC Developer Edition found out that the MAC address had changed, and plugged in a new network interface with a different name).

There is a small chance that the problem will not appear in the final release, if the distribution is based on the OVF format, which carries information about the virtual NIC MAC address. Even so, it can still happen that the MAC specified in the OVF file is already taken on your system, and you will need to fix it.

There is an easy solution, though:

  • switch to superuser (pfexec su - )
  • halt zones (zoneadm -z node1 halt; zoneadm -z node2 halt)
  • check the name of your interface (ifconfig -a)
    • most probably, you will have an interface named "e1000g1" (you should see "e1000g1", "e1000g1:1", "e1000g1:2", "e1000g1:3" among the listed interfaces). If not, I hope you are able to figure out what your actual NIC name is
  • go to /etc/zones
  • edit both files node1.xml and node2.xml
    • search for string "e1000g0" and replace it with "e1000g1" (or the name of your NIC)
  • voila, you can boot the zones again (zoneadm -z node1 boot; zoneadm -z node2 boot)

Give the system a little time, and you will be able to check that the zones are reachable (using ping, for example).
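
If you prefer not to edit node1.xml and node2.xml by hand, the replacement can be scripted. This is only a minimal sketch, assuming a POSIX-style shell and sed; the function name swap_nic and its arguments are made up for this example and are not part of the image. It does not use an in-place option of sed, since not every sed has one, and it keeps a .bak copy of each file it touches.

```shell
# swap_nic OLD NEW FILE...
# Replace the interface name OLD with NEW in every given FILE.
# (swap_nic is a hypothetical helper, not something shipped with the image.)
swap_nic() {
    old=$1
    new=$2
    shift 2
    for f in "$@"; do
        # Keep a backup copy next to the original file.
        cp "$f" "$f.bak" || return 1
        # Read from the backup and overwrite the original with the result.
        sed "s/$old/$new/g" "$f.bak" > "$f" || return 1
    done
}
```

With the zones halted, it could be called as superuser like this: swap_nic e1000g0 e1000g1 /etc/zones/node1.xml /etc/zones/node2.xml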

The second problem occurs because we have chosen NAT as the network type (but it can happen with bridged networking as well) - the virtual network (inside the HPC Developer Edition virtual machine) is exposed to the host network. It can then happen that the settings used in the virtual machine collide with the settings of your host network (most probably the virtual subnet matches your host subnet, and because of that, the IP addresses assigned to the zones might already be taken).
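
Whether two addresses fall into the same subnet can be checked mechanically. The following is only a sketch, assuming a POSIX-style shell with 64-bit arithmetic; the helper names ip_to_int and same_subnet are made up for this example. The netmask has to be given in dotted form (255.255.255.0), not in the hex form (ffffff00) that ifconfig prints on OpenSolaris.

```shell
# ip_to_int A.B.C.D
# Print the dotted IPv4 address as a single integer.
# (ip_to_int and same_subnet are hypothetical helpers for illustration.)
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP1 IP2 NETMASK
# Succeed (exit status 0) when both addresses share the same network part.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

For example, with 192.168.137.10 standing in for a hypothetical address on your host network: same_subnet 192.168.137.77 192.168.137.10 255.255.255.0 && echo "same subnet, check for collisions"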

As in the first case, there is a small chance that the final release will not suffer from this problem - a possible solution would be to use Crossbow technology inside the HPC Developer Edition and not rely on the physical (virtualized) interface. The problem is that Crossbow has small issues that most probably will not be resolved before the official release of the HPC Developer Edition. But maybe in release 1.1 .. :)

The solution can look like this:

  • the image is using following IP addresses
    • 192.168.137.77 - for zone with name node2
    • 192.168.137.76 - for zone with name node1
    • 192.168.137.75 - for global zone with name hpcdistro
  • the subnet is specified by the virtualization tool; you can find out the value by using 'ifconfig -a', for example. In the case of VirtualBox, the netmask will most probably be "ffffff00" or "255.255.255.0"
    • if your host is sitting on the same subnet, or you run another virtual machine with the same virtualization tool that is sharing the network (for example, you have another virtual machine running in VirtualBox that is using the NAT network) AND one of the IP addresses above is used in your host network or in that other virtual machine, read on. Otherwise you should be safe.
  • OK, let's assume you have a colliding IP address for node2. The easiest solution is to pick an unused IP address and update the network configuration in the image:
    • switch to super user (pfexec su - )
    • halt the zone node2 (zoneadm -z node2 halt)
    • edit the file /etc/zones/node2.xml
      • search for string "192.168.137.77" and replace it with new IP (for example, "192.168.137.79")
    • edit file /etc/hosts
      • search for string "192.168.137.77" and replace it with new IP that you used above
    • boot up zone node2 (zoneadm -z node2 boot)
    • login to zone node1 (zlogin node1)
    • edit file /etc/hosts
      • search for string "192.168.137.77" and replace it with new IP that you used above
    • logout (exit :-D)
    • login to zone node2 (zlogin node2)
    • edit file /etc/hosts
      • search for string "192.168.137.77" and replace it with new IP that you used above
    • logout (exit)
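
The search-and-replace steps in the global zone can also be done with sed. A minimal sketch, assuming a POSIX-style shell; the function name replace_ip is invented for this example. The one detail worth noting is that the dots in the old address are escaped, because an unescaped "." in a sed pattern matches any character.

```shell
# replace_ip OLD_IP NEW_IP FILE...
# Replace OLD_IP with NEW_IP in every given FILE, keeping FILE.bak as backup.
# (replace_ip is a hypothetical helper, not something shipped with the image.)
replace_ip() {
    old_ip=$1
    new_ip=$2
    shift 2
    # Turn 192.168.137.77 into 192\.168\.137\.77 so the dots match literally.
    pattern=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    for f in "$@"; do
        cp "$f" "$f.bak" || return 1
        sed "s/$pattern/$new_ip/g" "$f.bak" > "$f" || return 1
    done
}
```

In the global zone, with node2 halted, this would be: replace_ip 192.168.137.77 192.168.137.79 /etc/zones/node2.xml /etc/hosts. The function is not defined inside the zones, so after zlogin the /etc/hosts files of node1 and node2 still have to be edited as described above.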

That's it. Check that the IP addresses are updated (ifconfig -a) and that the zone is reachable at the new IP (using ping IPADDRESS and also ping node2). To check that the basic stuff works, you can check whether both execution daemons for SGE are up and running:

source /opt/sge/default/common/settings.sh
qstat -f
queuename                      qtype resv/used/tot. load_avg arch   states
---------------------------------------------------------------------------------
all.q@node1                    BIP   0/0/1          0.25     sol-amd64
---------------------------------------------------------------------------------
all.q@node2                    BIP   0/0/1          0.21     sol-amd64

That's it. You can try the other features now :)

PS: The above how-to was tested on three platforms (to check that it really works). They are:

  • Sun XFire 2200 M2 (2 x AMD dual core Opteron, 4 GB RAM) running OpenSolaris 2009.06 and VirtualBox 2.2.4
  • Asus M50V (Intel Core2 Duo, 4 GB RAM) running Windows Vista 64bit and VirtualBox 3.0beta1
  • Toshiba Portege R600 (Intel Core2 Duo, 3 GB RAM) running OpenSolaris 2009.06 and VirtualBox 2.2.4

Comments:

You need to change the gdm restart command above to:

pfexec gdm restart

That should make the login screen appear correctly.
Barton & Bogdan

Posted by Barton Fiske on June 23, 2009 at 11:53 AM PDT

This blog copyright 2009 by Michal Bachorik