20060920 Wednesday September 20, 2006
Multi-OS, Multi-platform netbooting

As part of my day job, I run a small lab of machines for my team. We've got 15 Sun x86 servers, split about evenly between v60x and v20z machines, plus a bunch of SPARC workstations.

One of the important features of this lab is the ability to punch a great big RESET button, and get a fresh OS install on a machine. So there is a network boot and install server, holding install images of the current versions of Solaris NV, 10, and 9; both SPARC and x86 versions. Plus a rudimentary bit of logic to have a user select which image to install, and a bit of minimal configuration.

Maintaining this takes a surprising amount of fiddling. I had SPARC network boot down cold a long time ago, so that part gives me few headaches. The main thing that needs to be done there is making sure the OS images are kept up to date. The area that gives me trouble now is the x86 machines.

For the SPARC machines, I've got pretty much just one generic set of configuration for all machines. About all I need to keep track of is which OS version to install on which machine, and I'm done.

The simple problem with x86 network boot and install is that it's in flux, and each machine behaves somewhat differently. A good example comes in the differences between the v60x and v20z machines. I have no access to any of the graphical consoles, and I sit 140 miles away from this lab. So everything is done remotely, mostly through CLI tools and access to serial consoles. Well, the serial console on the v60x is wired to the second serial port (ttyb, or COM2). The v20z's are a more complex story. From the OS perspective, the serial port is the first one (ttya, or COM1). From my perspective, they are a pain to get to, since I have to ssh into the v20z service processor and issue a command there. Getting x86 machines to boot, ignore the graphical console, and use the appropriate serial port requires different configuration for the v20z and the v60x, and a slightly different process.

Next, how Solaris itself boots has been in flux. Starting with Solaris 10 U1, there was a shift from the old way of booting to using GRUB. I think that this is a great thing, but it did mean that I had to spend a bit of time figuring out how to configure both boot systems, and write a script to set one or the other up on demand. Plus, I had to deal with Solaris 9's quirks in that regard - an effort I've pretty much abandoned. Everyone - for doing any serious work on x86 machines, don't use Solaris 9. Solaris 10 U1 or later works a lot better.

Lately, I took on a new challenge - figuring out how to net boot and install Red Hat Enterprise Linux. To my surprise, it wasn't that hard - this Sun Blueprint led me through what I needed to configure.

The hardest part, really, was dealing with Red Hat's kickstart configuration files. Red Hat doesn't seem to have a way for me to provide a configuration template and tell it 'Yes, this boot is doing DHCP for installing, use the address assigned via DHCP as your static address once done' - an idiom that Solaris easily gets right.
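One way to picture that idiom is a small helper that takes whatever address the machine was handed over DHCP and renders it as static interface settings. This is only a rough sketch: the function name and the sample addresses are made up, the keys are the usual ones from Red Hat's ifcfg files, and how the values would be read from the lease and written out at the end of a kickstart install is deliberately left out.

```python
# Sketch only: render static settings for an interface from the address it
# received over DHCP. static_ifcfg and the sample values are hypothetical.


def static_ifcfg(device, ipaddr, netmask, gateway):
    """Return the text of an ifcfg-<device> file with static addressing."""
    settings = [
        ("DEVICE", device),
        ("BOOTPROTO", "static"),
        ("ONBOOT", "yes"),
        ("IPADDR", ipaddr),
        ("NETMASK", netmask),
        ("GATEWAY", gateway),
    ]
    # One KEY=value pair per line, in the order listed above.
    return "".join(f"{key}={value}\n" for key, value in settings)


if __name__ == "__main__":
    # Example addresses from the documentation range, not from the lab.
    print(static_ifcfg("eth0", "192.0.2.25", "255.255.255.0", "192.0.2.254"), end="")
```

The point is only that the DHCP answer and the final static configuration carry the same information, so the conversion itself is trivial once the installer exposes the lease.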

My last challenge - I noticed that Solaris 10, Solaris NV, and Red Hat can all boot via GRUB, and yes! I can have one GRUB menu that boots any of those OSes. Only ... because there are differences between the v20z, v60x, and VMware - I have to have different boot templates for each of them in the menu.lst. Grr.

Here's an incomplete recipe to do a netboot of a random x86 machine. There may well be pieces missing, so don't complain to me if you try exactly what I put here and it doesn't work.

First, there's a magic DHCP macro that needs defining:

Also on the boot server, there needs to be a file boot/grub/menu.lst available through TFTP.

Contents of menu.lst:


default=0
timeout=30
#
# Solaris 10
title Solaris_10 Jumpstart
kernel /I86PC.Solaris_10-2/multiboot kernel/unix - install -B install_config=192.0.2.1:/export/home/jumpstart,sysid_config=192.0.2.1:/export/home/jumpstart,install_media=192.0.2.1:/export/s10u2/combined.s10x_u2wos/latest
module /I86PC.Solaris_10-2/x86.miniroot
#
# Solaris NV
title Solaris_11 Jumpstart
kernel /I86PC.Solaris_11-1/multiboot kernel/unix - install -B install_config=192.0.2.1:/export/home/jumpstart,sysid_config=192.0.2.1:/export/home/jumpstart,install_media=192.0.2.1:/space/nv/combined.nvx_wos/latest
module /I86PC.Solaris_11-1/x86.miniroot
#
# Red Hat Linux
title Red Hat Linux 4 Update 3 - 32bit
kernel /rhel4-u3-i386/vmlinuz ksdevice=eth0 ks=nfs:192.0.2.1:/export/redhat/kickstart/ks.cfg load_ramdisk=1 network
initrd /rhel4-u3-i386/initrd.img
#
# Red Hat Linux
title Red Hat Linux 4 Update 3 - 32bit v60x
kernel /rhel4-u3-i386/vmlinuz ksdevice=eth0 ks=nfs:192.0.2.1:/export/redhat/kickstart/v60x.cfg load_ramdisk=1 network console=ttyS1,9600n8
initrd /rhel4-u3-i386/initrd.img
#
# Red Hat Linux
title Red Hat Linux 4 Update 3 - 64bit
kernel /rhel4-u3-x86_64/vmlinuz ksdevice=eth0 ks=nfs:192.0.2.1:/export/redhat/kickstart/ks64.cfg load_ramdisk=1 network
initrd /rhel4-u3-x86_64/initrd.img
#
# Red Hat Linux
title Red Hat Linux 4 Update 3 - 64bit v20z
kernel /rhel4-u3-x86_64/vmlinuz ksdevice=eth0 ks=nfs:192.0.2.1:/export/redhat/kickstart/v20z.cfg load_ramdisk=1 network console=ttyS0,9600n8
initrd /rhel4-u3-x86_64/initrd.img

Here, you can see that I have Solaris 10, Solaris NV, and RHEL 4 all booting from one file.
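Since the Red Hat stanzas differ only in the image directory, the kickstart file, and the serial console argument, they could be generated from one template instead of being copied by hand. A minimal sketch, assuming the paths and console settings shown in the menu.lst above; the names CONSOLES and rhel_stanza are hypothetical, and this is not the script that actually runs in the lab.

```python
# Sketch: render GRUB menu.lst stanzas for the Red Hat installs, varying only
# the serial console argument per machine type. CONSOLES and rhel_stanza are
# hypothetical names; the values are copied from the menu.lst listing.

BOOT_SERVER = "192.0.2.1"

# Machine type -> Linux console argument (None means no console argument).
CONSOLES = {
    "generic": None,
    "v60x": "ttyS1,9600n8",  # second serial port (COM2)
    "v20z": "ttyS0,9600n8",  # first serial port (COM1)
}


def rhel_stanza(title, image_dir, ks_file, machine="generic"):
    """Return one menu.lst stanza (title, kernel, initrd) as a string."""
    kernel_args = [
        f"/{image_dir}/vmlinuz",
        "ksdevice=eth0",
        f"ks=nfs:{BOOT_SERVER}:/export/redhat/kickstart/{ks_file}",
        "load_ramdisk=1",
        "network",
    ]
    console = CONSOLES[machine]
    if console is not None:
        kernel_args.append(f"console={console}")
    return "\n".join([
        f"title {title}",
        "kernel " + " ".join(kernel_args),
        f"initrd /{image_dir}/initrd.img",
    ])


if __name__ == "__main__":
    print(rhel_stanza("Red Hat Linux 4 Update 3 - 32bit v60x",
                      "rhel4-u3-i386", "v60x.cfg", "v60x"))
```

Adding a new machine type then means adding one entry to the console table rather than another hand-edited pair of stanzas.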

20060712 Wednesday July 12, 2006
Parallels and VMware

Lately, I've had an opportunity to try out both Parallels Desktop and VMware Server. Herein are a few notes and comments on the two.

My goal with virtualization software is to create an easily rebuildable, easily disposable, easily replicatable lab environment. I frequently have a need for a Solaris environment where I can load up various servers from the Java Enterprise System stack, and muck with them, or show others how to make them work. So, I want to see how well Solaris 10 works (or doesn't) in these environments. I gather that in this regard, I'm an atypical user - many seem to want to run a consumer-oriented operating system made by a Seattle based company in these virtual machines. This doesn't interest me at all.

I have two environments where I'm trying out these pieces of software:

1) W2100z running the Linux version of Java Desktop System 3.
2) MacBook Pro running Mac OS X 10.4.7.

There are quite a few differences between the platforms. Both Parallels and VMware run on the W2100z. VMware doesn't have a version out yet for the Mac, so Parallels is the only game in town there. The W2100z has 4GB of RAM, the MacBook Pro only 2GB. The W2100z has two AMD Opteron processors, and the MacBook Pro has a dual-core processor.

However, Parallels runs pretty much the same on both platforms, and a virtual machine created on the Mac runs just fine on the W2100z. And, both VMware and Parallels are clearly pieces of software targeting the same problem in the same way, and are very comparable in function and features. Solaris works in both, although somewhat differently.

VMware notes:

* There are few differences that I could note between VMware Workstation and VMware Server. VMware Server is free, and seems to do all I want (create/change virtual machine settings, networking, etc.) Plus, the free VMware Player is a bonus I like - this means I can create a virtual machine running Solaris, and hand it out along with the player for people to use. (Potentially very useful for teaching labs.)
* VMware has a BIOS that runs on boot - and oh boy, you have to be quick to catch it while it is running and change settings.
* Solaris doesn't seem to need any extra drivers - either VMware emulates common hardware Solaris has drivers for, or the VMware drivers are built-in.
* Audio works.
* X works, no problems - using the Xsun server. Haven't tried Xorg yet.

Parallels notes:

* My general impression is that Parallels works fine, but just doesn't have the level of finish that VMware does.
* Parallels doesn't have a BIOS where you can change settings - boot sequence is changed through the virtual machine settings.
* Parallels provides a network driver for Solaris.
* Video seems to work just fine.
* Audio doesn't work. Parallels provides an emulated audio device, but Solaris doesn't seem to recognize it.
* X works, but you need to use the Xorg server, and configure it to use a PS/2 mouse and keyboard. Otherwise, you start up X and nothing happens! Fun.
* Parallels can't handle running on a machine with 4GB RAM or more. I had to tweak the BIOS on the W2100z to not use all the physical memory so that Parallels could start up.

General Solaris notes:

* Solaris doesn't like suspend and resume. At least, every time I've tried it, in both Parallels and VMware, it just freezes up afterwards.
* Solaris isn't very comfortable configuring itself through DHCP.

Conclusion:

VMware, even the free Player, is a much better virtualization solution than Parallels. There's nothing particularly wrong with Parallels; it just doesn't seem to be as mature as VMware. However, Parallels got to the Mac first, and is even endorsed by Apple, which makes it the best way to run multiple OSes on Apple Intel hardware. I have no interest in setting up Boot Camp on my MacBook Pro - I've been through setting up a multi-boot machine, and am done with that now, thank you very much.

20051116 Wednesday November 16, 2005
6-way boot

I've finally managed to get my W2100z to boot 6 OSes. It can now do:
  1. Windows XP 64
  2. Solaris NV build 27
  3. Solaris 10
  4. JDS 3
  5. RHEL 3
  6. FreeBSD
And, thanks to GRUB now being integrated into Solaris NV, half of those are natively booted from GRUB. The rest use chainloaders.
20050330 Wednesday March 30, 2005
FreeBSD!

Che suggested trying to install Plan9 or one of the BSDs on my W2100z. I've tried Plan9, FreeBSD, OpenBSD, and NetBSD, and the only one where the install worked was FreeBSD 5.3 (amd64). It's a bit touchy - FreeBSD doesn't always seem to recognize my USB keyboard. And I haven't been able to get X to start yet.
20050321 Monday March 21, 2005
Mild disappointment

OpenDarwin 7.2.1 doesn't install on my Sun Java Workstation W2100z. Not that I expected it would. Any non-mainstream OS like this would have to have drivers for all the hardware, and it looks like OpenDarwin doesn't have drivers for the SCSI bus or disks, since it didn't recognize them as valid install targets. However, I do currently have a bit of a hodgepodge of OSes on the W2100z: Solaris 10, Red Hat Enterprise 3, JDS 3, and Windows XP 64. I'd also like to have Solaris 9 on there, which would make a 5-way multi-boot ... I've got one partition reserved for a sixth OS, any suggestions? ;-)
20050204 Friday February 04, 2005
System failures

So, starting last Friday, I had a series of embarrassing system failures on the servers I run in my office, here at the Catnip Coast. I feel they are embarrassing because they all were preventable - and I really should have been thinking better, and recovered more quickly.

Quick background: for a few years now, I've run a DNS/web/mail server at home for my family and friends. This serves a handful of domains I've registered for myself (holyhippie .net, .org, .com; and catnipcoast.com) and one I registered for my college friends (backtable.org). There are about a dozen people with mail accounts on this system, and 3 people outside my house use it on a semi-regular basis. My wife uses this as her primary mail account, and I use it as a secondary account.

The first incarnation of this server was on a SPARCstation 5 named fnord. He ran the site quite well until about two years ago, when I got an Ultra 2 (named muscat) to replace him. I had in my office at the time the hardware for the third incarnation: a Sun Fire V120 (named bocana).

Muscat has the ability to have two internal, hot-pluggable hard drives. I had originally set it up so the system was mirrored across both drives; in case of a failure in one, it wouldn't take the whole system down. A few months ago, the bearings on one of the drives started to go, and it developed an unbearable high-pitched whine. Rather than live with it whining at me, I just pulled the drive, and let muscat run on just one.

The first event: Friday, a bit before 11AM, a power failure happens. The first thing I notice is that a bunch of things on my desk suddenly shut off, but my laptop stays on. I think "Power to the house is out". It takes a couple of seconds to realize that is not the case; instead, it is the UPS that has failed (Uninterruptible Power Supply: a battery for computers, so that if the power fails, the computer won't).

Now, the effect of the UPS failing was to take down exactly the machines I needed to be up: muscat, the DSL router, my wireless, my monitor ... many things are now dead in the water, and I start scrambling to re-plug everything in.

Once I have things plugged in, I try to bring muscat back up. He won't boot. The failure is odd and cryptic. It seems to me like the software doing the mirroring of the root drive is having a brain fart. Much cursing on my part ensues; see, I don't have a CD-ROM for muscat now (he only can use SCSI), and I don't have on hand another way to boot him to where I can start recovering.

So, I start shuffling things around, and making sure that things won't go astray. Bocana was sitting there, waiting to be configured in muscat's place, so I start setting bocana up to take over muscat's job.

Saturday, I go on a quest to the closest Fry's (a 40 minute drive away) to find a SCSI CD-ROM. I figure there is no other retail place in the area likely to carry such a thing; and I'm surprised to find that Fry's doesn't. While I'm there, I have a realization: muscat's hard drives probably will plug into bocana. I also pick up a new UPS for home. I figured that the battery for the old UPS might be replaceable, but right now it was less hassle (and not that expensive) to just get a new one.

Saturday evening, when I get home, I try it out. Sure enough, I can take the drives out of muscat and plug them into bocana. I do a bit of fiddling, but I can't figure out how to repair the drive to make it bootable again. Oh well - at least I can get the data off the drive, and I do just that. I put a bit more work in, and bocana now can do almost all the DNS, web, DHCP and firewall stuff that muscat used to do. I'm happy at the progress, so I leave the LDAP and mail stuff for later.

Sunday, it's time to do projects around the house. Up for today: replacing the dimmer in the dining room. This means that I have to turn off breakers until I find the right one. It turns out that my office is on the same breaker as the dining room overhead light. The new UPS is plugged in, but nothing is plugged into it yet.

When I get back to my office after finishing work, I try to boot bocana. It won't boot. The failure is odd and cryptic - and seems like the software doing the mirroring of the root drive is having a brain fart. This time, the cursing is at about twice the volume as before.

I now have two dead systems, no way to boot either (bocana didn't have a CD-ROM either), and no other systems I can plug the hard drives into.

I spend a lot of time Sunday night packing up bocana and muscat, and getting ready to drive to the office - a mere 2.5 hour drive. I leave Monday, 7:30AM.

Once I get to the office, tons of stuff is happening. I'm running to various meetings, and stealing time in between to work on bocana. I find a CD-ROM in another system in my lab that will go into bocana - so at least I can boot it now. I still can't figure out how to recover the drives. However, since bocana's data was mirrored, I can experiment on one half of the mirror without risking everything. I work on this until 7PM, then drive home. Meanwhile, Valkyrie has called a couple of times, politely complaining that she can't check her email.

Sometime around now, I have a realization - I've been trying to solve this problem the wrong way. I've been trying to get the mirroring fixed, and haven't been able to. What I can do instead is tell the system to ignore the mirroring, and treat the disk partition that has one half of the mirror as if it were just a normal filesystem.
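For the curious, here is roughly what "ignore the mirroring" amounts to, written as a sketch rather than the exact steps I typed. It assumes the mirroring software is Solaris Volume Manager, where the root entry in /etc/vfstab points at a metadevice and /etc/system carries a rootdev line for the md pseudo-device. The metadevice d0, the slice c0t0d0s0, and both function names are hypothetical.

```python
# Sketch, assuming Solaris Volume Manager style paths. It only transforms
# text; it does not touch any files. d0 and c0t0d0s0 are hypothetical names.


def unmirror_vfstab(vfstab_text, metadevice, slice_name):
    """Point vfstab entries for one metadevice at a plain disk slice."""
    md_block = f"/dev/md/dsk/{metadevice}"
    md_raw = f"/dev/md/rdsk/{metadevice}"
    out = []
    for line in vfstab_text.splitlines():
        fields = line.split()
        # Only rewrite lines whose "device to mount" is the metadevice;
        # comments and other entries pass through untouched.
        if len(fields) >= 2 and fields[0] == md_block:
            fields[0] = f"/dev/dsk/{slice_name}"
            if fields[1] == md_raw:
                fields[1] = f"/dev/rdsk/{slice_name}"
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


def drop_md_rootdev(system_text):
    """Remove the rootdev line that makes the kernel boot from the md device."""
    kept = [line for line in system_text.splitlines()
            if not line.strip().startswith("rootdev:/pseudo/md@")]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = "/dev/md/dsk/d0\t/dev/md/rdsk/d0\t/\tufs\t1\tno\t-\n"
    print(unmirror_vfstab(sample, "d0", "c0t0d0s0"), end="")
```

With both files edited on the surviving half of the mirror, the system has no reason to go looking for the mirror at boot and mounts the slice directly.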

Things start falling together once I've realized this. I can now boot bocana off of one of his old drives, without mirroring. I'm back to where I was Sunday morning - and feverishly work on finishing the job of moving all of the services that muscat used to provide to bocana. It takes a while, but around 3AM Tuesday, everything seems finished and working.

The embarrassing part - both of the key realizations that led to fixing things (that I could put muscat's drives in bocana, and how to turn off mirroring) are things I have known for a while, and done before. I should have realized them right off, and been able to resurrect muscat on Friday, without having to spend many hours on the road and more hacking away.

Regardless, things are fixed now, and all should be working fine. In the process, I got to get familiar with Solaris 10 - and there's some really cool new stuff in it.

Copyright (C) 2003, Capitan Holy Hippie's ramblings