Using Sun tech at Strathclyde Sun@Strathclyde

Saturday Apr 25, 2009

I woke up today to find that one of the disks in my home server had failed overnight. I was actually able to work this out while still in bed shortly after waking up, because I could hear it clicking and whirring pathetically as it tried to spin up - not a nice way to start your day. As I write this I'm in the process of filing an RMA to get the disk replaced, which promises to be a painful, drawn-out process, but hey - at least my data is still safe thanks to ZFS (so long as none of my other disks decide to break - not inconceivable, seeing as they're all identical...).

However, hardware faults aren't always audible, so I was pleased to see that my script for detecting hardware faults and then emailing me had triggered. Here's what I got sent:

-------- Original Message --------
Subject: Hardware failed on zebedee
Date:    Sat, 25 Apr 2009 13:54:02 +0200
From:    lamsey@zebedee

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Apr 25 09:32:07 43d4b6e4-1219-e9d5-bac5-f829b8fb2f2a  ZFS-8000-D3    Major    

Fault class : fault.fs.zfs.device

Description : A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for
              more information.

Response    : No automated response will occur.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

Logging into my server and running zpool status -x showed me which disk was at fault (c4d0), and a bit of searching in the output of prtconf -v allowed me to work out the serial number of the affected disk (more specifically, it allowed me to work out the serial numbers of the disks which were still working, meaning I could work out which physical disk was broken by a process of elimination after cracking the box open).
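The first step (picking the broken device out of the zpool status output) can be sketched as a small filter. This is only an illustration: failed_devices is a made-up helper name, and the set of states it treats as "broken" (FAULTED, UNAVAIL, REMOVED, OFFLINE) is an assumption rather than anything the original output confirms. It's written in plain POSIX shell, so it should behave the same under ksh.

```sh
#!/bin/sh
# Sketch only: list the rows of a zpool status config table whose STATE
# column shows a non-working state. Reads the status text on stdin, so it
# can be tried against saved output as well as a live pool.
# failed_devices is a hypothetical helper, not part of ZFS.
failed_devices() {
    awk '
        $1 == "pool:" { in_config = 0 }        # a new pool section begins
        $1 == "NAME"  { in_config = 1; next }  # header row of the config table
        in_config && ($2 == "FAULTED" || $2 == "UNAVAIL" ||
                      $2 == "REMOVED" || $2 == "OFFLINE") { print $1 }
    '
}
```

Something like zpool status -x | failed_devices would then print the name of any device in one of those states. DEGRADED rows (usually the pool and the raidz group) are left out deliberately, since they describe the containers rather than the disk at fault.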

So, how do I achieve the above? The answer is actually incredibly simple. The content of the email is just the output of fmadm faulty, a command which interrogates Solaris' FMA (Fault Management Architecture) feature to see if there are any hardware issues on a system. Wrap it up in a script (the below is based on one I found on the 'net eons ago and can no longer find), and you end up with something like:

lamsey@zebedee:bin$ cat check_hardware.ksh
#!/bin/ksh
# Public domain. Use as you wish.

EMAIL=liam@lamsey.co.uk
TMPFILE=/tmp/fmadm.output.$$

# run fmadm and cut away the first two lines (headers)
/usr/bin/pfexec /usr/sbin/fmadm faulty | /usr/bin/sed 1,2d > $TMPFILE

# Check if the file size is greater than zero. This means we got
# some output from fmadm and therefore some hardware may be bad.
# Using HTML here means we can use <pre> to preserve formatting.
if [ -s $TMPFILE ]; then
    (
        /usr/bin/echo "Subject: Hardware failed on `hostname`"
        /usr/bin/echo "From: lamsey@zebedee"
        /usr/bin/echo "MIME-Version: 1.0"
        /usr/bin/echo "Content-Type: text/html"
        /usr/bin/echo "Content-Disposition: inline"
        /usr/bin/echo
        /usr/bin/echo '<pre>'
        # don't just use the temp file, it's missing headers
        /usr/bin/pfexec /usr/sbin/fmadm faulty
        /usr/bin/echo '</pre>'
    ) | /usr/local/bin/msmtp -a 1and1 $EMAIL
fi

# clean up the temp file
/usr/bin/rm -f $TMPFILE
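As a variation on the script above, here's a rough sketch of the same check without the temp file: fmadm is run once, its output is kept in a variable, and the email is only built if anything is left after the two header lines. Treat it as an illustration rather than a drop-in replacement; has_faults and build_mail are hypothetical helper names, and the usage shown in the comments assumes the same pfexec, msmtp account and addresses as the original.

```sh
#!/bin/sh
# Sketch only: the helper names below are made up for illustration.

# has_faults REPORT
# Succeeds if anything remains once the first two (header) lines are removed,
# mirroring the "sed 1,2d" plus "-s" test in the original script.
has_faults() {
    [ -n "$(printf '%s\n' "$1" | sed 1,2d)" ]
}

# build_mail HOST FROM
# Wraps the report read from stdin in the same headers the original sends,
# using <pre> so the column layout survives in an HTML mail.
build_mail() {
    printf 'Subject: Hardware failed on %s\n' "$1"
    printf 'From: %s\n' "$2"
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: text/html\n'
    printf 'Content-Disposition: inline\n'
    printf '\n'
    printf '<pre>\n'
    cat
    printf '</pre>\n'
}

# Possible usage (assumes the same commands as the original script):
#   report=$(/usr/bin/pfexec /usr/sbin/fmadm faulty)
#   if has_faults "$report"; then
#       printf '%s\n' "$report" |
#           build_mail "$(hostname)" lamsey@zebedee |
#           /usr/local/bin/msmtp -a 1and1 liam@lamsey.co.uk
#   fi
```

Keeping the report in a variable means the text that triggered the alert is the same text that gets mailed, and there's nothing in /tmp to clean up afterwards.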

Simply slap a call to the above script into your crontab, ideally running at least once a day, and you're good to go. Note that I use msmtp for sending emails automatically as it's a heck of a lot easier to configure than sendmail (which is important if you use an ISP like o2 which blocks outgoing SMTP traffic, preventing you from using sendmail in its out-of-the-box configuration). It doesn't come with Solaris though, so you'll need to compile it if you want to do the same (very simple, works fine with configure / make / make install).
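For the crontab step, an entry along these lines (added via crontab -e) would run the check every morning at 07:00. The path is an assumption for illustration; point it at wherever check_hardware.ksh actually lives.

```
# minute hour day-of-month month day-of-week command
# (script path below is hypothetical)
0 7 * * * /export/home/lamsey/bin/check_hardware.ksh
```

Because the script calls everything by absolute path, it shouldn't depend on cron's fairly minimal PATH.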

Edit (01/5/09): I received the replacement disk today (took them long enough...). Slammed it into the server, issued a quick zpool replace shared c4d0 command, and all is good with the world again :-)

lamsey@zebedee:~$ zpool status shared
  pool: shared
 state: ONLINE
 scrub: resilver completed after 3h13m with 0 errors on Fri May  1 17:12:23 2009
config:

        NAME        STATE     READ WRITE CKSUM
        shared      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0  217M resilvered
            c3d1    ONLINE       0     0     0  217M resilvered
            c4d0    ONLINE       0     0     0  236G resilvered
            c4d1    ONLINE       0     0     0  217M resilvered
            c6d1    ONLINE       0     0     0  217M resilvered

errors: No known data errors
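If you'd rather not keep re-running zpool status by hand while the resilver grinds away, a check like the one below could be dropped into a loop or a script. It's a minimal sketch: resilver_finished is a made-up helper name, and it simply looks for the "resilver completed" wording seen in the output above, which other ZFS releases may phrase differently.

```sh
#!/bin/sh
# Sketch only: succeeds if the zpool status text on stdin reports a finished
# resilver. Matches the wording shown in the output above; that wording is
# an assumption for any other ZFS release.
resilver_finished() {
    grep 'resilver completed' >/dev/null
}

# Possible usage:
#   zpool status shared | resilver_finished && echo "resilver done"
```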

Friday Apr 17, 2009

OK, first off: I'm an idiot. You'd think that after forgetting to take pictures/videos at my last few tech demos, I'd have managed to remember to do so today during the Sun@Strathclyde OSUM's visit to Linlithgow. Sadly, no - completely forgot. Had the camera in my bag and everything, just completely failed to use it. Epic lose :-(

The day, however, did go pretty well - we were entertained by the manager of Sun's Executive Briefing Centre at Linlithgow, who kindly showed us around the campus, including a tour of Project Blackbox and the cavernous Solution Centre lab. We also, after a brief spell of warbling from myself, enjoyed a great talk from two experienced Sun engineers about science, technology and engineering from a Sun perspective - really interesting stuff, and plenty of useful food for thought for the future.

All in all, a pretty good day, and a chance for students to get a glimpse of the inner workings of a high-tech computing company. I think we'll have to see about doing this again next year, so don't worry if you missed it!

Tuesday Apr 14, 2009

I was asked today for some info on how to get to Sun's Guillemont Park location by train from Glasgow. This is a journey I've made many times, so I actually know pretty well off the top of my head how to negotiate the national rail network to the area around GMP. With that in mind, here's the info I put together - hopefully it'll be useful to other people at some point.

Here are the rail instructions. Obviously, if you're coming from somewhere other than Glasgow, you'll have a different starting station, but if it's anywhere north of Birmingham the below should still apply.

When you want to check times or book tickets, go to somewhere like http://www.nationalrail.co.uk and search for "from: Glasgow (all stations)" to either "Farnborough (all stations)" or "Camberley". Between those end stations, you'll get all three of the train lines coming into the area around Guillemont Park.

Usually, it'll be a direct line from Glasgow to London Euston (if it's the West Coast main line, usually Virgin) or King's Cross / St. Pancras (if it's the East Coast main line, usually National Express). The east coast route takes longer as it goes via Edinburgh, but National Express have free wifi on their trains whereas Virgin don't.

From either London station, it'll then be a tube transfer to Waterloo. From Euston it's about half a dozen stations on the Northern line, no changes needed. I think you might need one change coming from King's Cross, but I can't remember off the top of my head which lines.

From Waterloo, you'll get a South West Trains service to whichever end station you chose. If it's Farnborough Main, it'll most likely be a direct service. Camberley will most likely need a change at either Ascot or Ash Vale. Farnborough North will involve a change at Reading or Woking, I think - not sure, I've only been on this line once.

There are also routes to these stations that avoid London - I've been through Reading and Birmingham before, rather than the big London terminals. I'd go via London for preference, but you might find the website gives you other options.

Once you get to your chosen end station, though, you need to get to Guillemont Park itself. Unfortunately, GMP is roughly equidistant from all the surrounding train stations, meaning it's a walk of a few miles from whichever one you arrive at.

Handily, however, Sun and Nokia run a shuttle bus service between their campuses and most of the surrounding rail stations (with Fleet the notable exception - take note if you're moving down there and don't have transport for getting to GMP!). Here's a Google Map I threw together which shows GMP's location relative to the shuttle bus routes:

The map above can be found here, while the shuttle bus timetables can be found here.

I hope this is of use to someone at some point. Feel free to leave a comment if you want any more info!