I was doing some work and thought, "Wouldn't it be nice to have my own cluster?" I'm guessing not many people have those types of revelations, and probably fewer decide they should go ahead and solve the problem. I wanted a cheap, small, light, quiet, low-power, easy-to-pack cluster that could sit on my desk without my even having to think about it. I did a little research on what equipment to use and decided on the following:
(1) EEE PC 901 -- A good option for a management node: it is small and portable, and its built-in screen lets it serve as a front end to the cluster.
(3) fit-pc2 -- A tiny PC in a box. Although not considered a workhorse, the Atom-based system can get the job done with its two 32-bit cores.
The network for the cluster uses whatever 10/100/1000 switch is lying around (in my case, a spare unused wireless router).
The OS for the system will be CentOS 5.3 (i386) with all of the cluster tools being supplied by the Sun HPC Software, Linux Edition (sorry, consider this part an advertisement for the project I work on). OK, with all that selected, let's build us a cluster.
Installing the EEE PC 901
BOOTING
Because the EEE PC 901 doesn't have a CD-ROM drive, and the ethernet card in the machine isn't recognized by the CentOS installer (therefore no network install), I found a USB stick that was big enough to hold the CentOS DVD installer.
To create this, I did the following:
- mount the CentOS iso (/mnt)
- fdisk /dev/sdb (create a single partition and set its type to b, W95 FAT32; double-check that /dev/sdb really is the USB stick)
- mkfs.vfat /dev/sdb1
- mount the USB drive (/media/disk)
- rsync -rv /mnt/isolinux /mnt/images /media/disk/
- mkdir /media/disk/syslinux
- syslinux -d syslinux /dev/sdb1
- mv /media/disk/isolinux/* /media/disk/syslinux/
- rmdir /media/disk/isolinux
- mv /media/disk/syslinux/isolinux.bin /media/disk/syslinux/syslinux.bin
- mv /media/disk/syslinux/isolinux.cfg /media/disk/syslinux/syslinux.cfg
- Copy the CentOS iso image to /media/disk/linux.iso
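Before rebooting, it can be worth checking that the stick ended up with the layout these steps describe. Here's a minimal sketch of such a check; it isn't part of the procedure I actually ran. The helper name check_usb_layout is made up for illustration, and the paths it looks for are only the ones named in the steps above (the real isolinux directory holds more files than that, such as the kernel and initrd).

```shell
#!/bin/sh
# Hypothetical helper: verify that a mounted USB stick contains the paths
# the steps above should have produced. Prints every problem it finds and
# returns non-zero if anything is wrong.
check_usb_layout() {
    mnt=${1:-/media/disk}
    problems=0
    for path in syslinux/syslinux.bin syslinux/syslinux.cfg images linux.iso; do
        if [ ! -e "$mnt/$path" ]; then
            echo "missing: $mnt/$path"
            problems=1
        fi
    done
    # The isolinux directory should be gone after the mv/rmdir steps.
    if [ -e "$mnt/isolinux" ]; then
        echo "unexpected: $mnt/isolinux still exists"
        problems=1
    fi
    return "$problems"
}
```

With the stick still mounted, something like `check_usb_layout /media/disk && echo "stick looks ready"` would run the check.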
Now, boot from the USB drive. I've noticed that the EEE seems to have intermittent trouble detecting flash drives as bootable. If that's the case, you might need to power-cycle a few times and check your boot device settings in the BIOS options. It'll work, it just might take a little magic.
Once the machine boots, you should be at a very unpretty CentOS installer screen. Click through the defaults until you get to the "Installation Method" screen. At this point, you want to choose "Hard Drive" and then "/dev/sda1". If all goes well, the installer will continue on and look a lot more "normal".
PARTITIONING
There's always a question of how best to partition the hard drive of a machine. At one time I was pretty hard core about what partitions were used, how big they were, extended or primary, etc. Now, I've realized that just using the default partitioning scheme works best for most people. One thing to keep in mind is that for clustering, your management node could need more disk space in certain places, namely wherever it stores the images you'll be using on the rest of the cluster. Depending on your diskless or diskful provisioner, the location of these images could change, so be aware of those decisions. In this setup, the images will mainly live in /var/lib/oneSIS, so as long as /var has a fair amount of disk space we should be good.
And now a quick note about the EEE PC 901. Mine is the "linux" version, which includes a 20GB Solid State Device. This is physically represented as two separate drives: a faster 4GB drive and a slightly slower 16GB drive. Normally, when installing Linux on an EEE, you would want to put as much of the OS as possible onto the 4GB drive and save the rest for /home. For this layout I'm going to devote the 4GB disk to the initial CentOS install, and then, over the course of this document, carve up the 16GB as needed.
** NOTE **
The CentOS installer is going to want to include any USB drives it finds in the partitioning scheme. This includes the one you booted off of, so make sure that it's only paying attention to the drives you want it to.
I noticed that the installer attempts to install grub onto /dev/sda by default (that's the USB stick, which is thoroughly unuseful). You can remove the USB flash drive from the grub options (not required), but do check "Configure advanced boot loader options". Then click Next, and on the next screen click "Change Drive Order" and push /dev/hdc (or your actual hard drive) up to the top position. After clicking OK, you should see your actual hard drive selected as the MBR location for grub (phew).
PACKAGE SELECTION
At this point, we'll go ahead and just leave the default package selection (which is just "Desktop - Gnome"). While you can install anything you want from here (it's your box, do what you want), I like to keep it minimal and do things later. One thing to note on the EEE PC 901 is that the ethernet and wireless networking aren't supported by default. So, perhaps also installing the "Development Tools" would be helpful here.
Well, if all went well, the system should be installing; this can take a little bit of time ... so, entertain yourself in some way.
** NOTE **
I noticed that on my system, the CentOS installer saw the hard drives as /dev/hdc and /dev/hdd. I didn't really do anything about it because I didn't know what the repercussions might be (from what I can tell, there aren't any; that just seems to be where Asus put the SSD on the bus). After the installation and reboot, I also noticed that my system wouldn't boot without the USB stick plugged in (it seems that grub was installed there instead of on the hard drive). So, if this happens to you, here's what I did (though you might need to run through the next section of this guide to get to a login prompt).
- Boot the system with the USB stick, but make sure it actually boots into Linux and not the installer.
- Log in to the system as root (you can either log in as root, or su to root later; sure wish CentOS set up sudo by default).
- Pull up a terminal and "cd /boot/grub"
- cp grub.conf /tmp
- rm -rf /boot/grub/*
- grub-install /dev/hdc (make sure this matches whatever your boot device should be)
- mv /tmp/grub.conf /boot/grub && ln -s grub.conf /boot/grub/menu.lst
- Modify the /boot/grub/grub.conf so that the CentOS entry looks like this (specifically the root part):
title CentOS (2.6.18-128.el5)
    root (hd0,0)
    kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
    initrd /initrd-2.6.18-128.el5.img
After a reboot, the system came right back up without needing the USB drive.
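If you want to see at a glance which disk each grub entry points at before rebooting, a small awk wrapper can list them. This is only a sketch, assuming the GRUB legacy grub.conf format shown above; grub_roots is a name I made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: for each "title" entry in a GRUB legacy config,
# print the title followed by the device named on its "root" line.
grub_roots() {
    awk '$1 == "title" { t = $0; sub(/^[ \t]*title[ \t]+/, "", t) }
         $1 == "root"  { print t ": " $2 }' "${1:-/boot/grub/grub.conf}"
}
```

Running `grub_roots` as root on the management node should print one line per boot entry; for the entry above, that line ends in (hd0,0).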
SYSTEM CONFIGURATION
After the initial reboot, there are some quick configurations that CentOS wants you to do. Included are Firewall and SELinux options. For both of these, I usually just disable them since in a cluster they tend to be a hindrance. If you do need added security, the firewall is your best bet, but I'd recommend leaving SELinux disabled.
At this point, you should be at a login screen and ready to start really configuring the management node. As a long time user of sudo, this is a good moment to go ahead and get it running. In a terminal, the first thing we need to do is enable the "wheel" group in the sudoers file:
- As root run visudo
- Look for the line that looks like:
# %wheel ALL=(ALL) ALL
and remove the leading "#".
- Save the file and quit.
At this point, anyone in the "wheel" group can use sudo (with a password) for all commands on the system. Now we need to add ourselves to the wheel group:
usermod -a -G wheel <your username>
The -a matters: without it, usermod -G replaces the user's existing supplementary groups instead of appending to them. For the change to take effect, you'll need to log out and back in; after that, sudo will be available.
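If sudo still refuses to cooperate after logging back in, two quick checks narrow things down: is the %wheel line really uncommented, and is your user really in the group? Here's a rough sketch of both. The helper names wheel_enabled and has_group are made up for illustration, and the pattern only recognizes the exact "%wheel ALL=(ALL) ALL" form shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given sudoers file contains an
# uncommented "%wheel ALL=(ALL) ALL" line.
wheel_enabled() {
    grep -Eq '^[[:space:]]*%wheel[[:space:]]+ALL=\(ALL\)[[:space:]]+ALL' "$1"
}

# Hypothetical helper: succeeds if group $1 appears in the
# space-separated group list $2 (for example, the output of "id -nG").
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

As root, `wheel_enabled /etc/sudoers && echo "wheel enabled"` covers the first check, and `has_group wheel "$(id -nG <your username>)" && echo "in wheel"` covers the second.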
If you're using an EEE PC 901 like I am, you'll notice that you don't have wireless or ethernet. CentOS doesn't include the needed module for the ethernet card that comes with the 901 (Attansic L1 Gigabit Ethernet Adapter). Looking online, it seems that the module that's needed is "atl1e", so we'll install it:
- On a working system: wget http://elrepo.org/linux/elrepo/el5/i386/RPMS/kmod-atl1e-1.2.40.3-1.el5.elrepo.i686.rpm
- Copy the rpm to your flash disk and move it over to the EEE.
- On the eee: rpm -Uvh <path to>/kmod-atl1e-1.2.40.3-1.el5.elrepo.i686.rpm
- Reboot
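After the reboot, it's easy to confirm that the driver actually loaded. The sketch below reads the kernel's module list; module_loaded is a made-up helper name, and the optional second argument (an alternate file in /proc/modules format) is only there so the function can be tried out on a machine without the card.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if module $1 is listed in a
# /proc/modules-style file ($2, defaulting to /proc/modules).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

On the EEE, `module_loaded atl1e && echo "atl1e is loaded"` should print the message once the kmod package is installed and the machine has rebooted.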
A good idea at this point is to take a break and let the computer churn on updates. You generally want to make sure your system is up-to-date, so we'll do that now before we start really configuring the cluster software. So, run "sudo yum update", accept the updates it wants to do, and then go and do something else for a bit.
<-- Time Passes -->
So, now we have what amounts to a laptop running CentOS. Not overly exciting, nor the goal of this little ditty, so let's keep going. First thing I want to do is give my system (and cluster for that matter) a name. I'm not going to judge your choice or offer suggestions; I'll just use the name "frosted" (thanks to Michael MacDonald for the name Frosted Mini-Cluster). With that, I have a naming scheme for my cluster:
| Node Type | Name | IP-Address |
|---|---|---|
| Management/Login | frosted | 10.0.10.1 |
| Compute | frosted0 | 10.0.10.2 |
| Compute | frosted1 | 10.0.10.3 |
| Compute | frosted2 | 10.0.10.4 |
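The scheme in this table is regular enough to generate. As a small illustration, here's a sketch that prints /etc/hosts-style lines for it. The function name gen_hosts and its arguments are my own invention, and the SHSLE tooling may well manage host entries its own way, so treat this as a picture of the naming scheme rather than a required step.

```shell
#!/bin/sh
# Hypothetical helper: print hosts-file lines for one management node
# plus N compute nodes.
#   $1 = cluster name, $2 = number of compute nodes,
#   $3 = first three octets of the private network (default 10.0.10)
gen_hosts() {
    name=$1
    count=$2
    net=${3:-10.0.10}
    # The management/login node takes .1 and the bare cluster name.
    printf '%s.1\t%s\n' "$net" "$name"
    # Compute nodes are numbered from 0 and start at .2.
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s.%d\t%s%d\n' "$net" $((i + 2)) "$name" "$i"
        i=$((i + 1))
    done
}

gen_hosts frosted 3
```

Called as shown, it prints four tab-separated lines, from 10.0.10.1 for frosted through 10.0.10.4 for frosted2, matching the table.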
For a cluster like this, we'll just go ahead and double up the responsibilities of the management node by also making it the login node. All the rest of the nodes in the cluster can be used for computing (or whatever). At this point, we're really building a cluster, so I'll just point to the Sun HPC Software, Linux Edition documentation. One thing to keep in mind is that this cluster is 32-bit, whereas SHSLE is 64-bit. Most of the tools that are a part of SHSLE can easily be rebuilt for 32-bit; it just takes time. The rest of this write-up is the result of my rebuilding the stack and making my own ISO. I'm not quite sure if we (the Giraffe team) can do anything to help others with a 32-bit version, but I might be able to at least make my ISO available in some way.
When going through the oneSIS configuration, a very important step is making sure you correctly build the initramfs so that the fit-pc2 has a working network device. Here's the command that I used:
mk-initramfs-oneSIS -w r8169 -f -s 8192 /tftpboot/images/centos5.3-onesis/initrd-2.6.18-128.1.10.el5.img 2.6.18-128.1.10.el5
With that, you should be good to go with booting the nodes off of this image.
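Notice that the kernel version shows up twice in that command, once in the initrd file name and once as the final argument, and it changes whenever the image's kernel is updated. To keep the two in sync, a small wrapper can build the command line from a single version string. This is only a sketch: initramfs_cmd is a made-up name, the flags are copied verbatim from the command above, and it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the mk-initramfs-oneSIS command line for a
# given kernel version, reusing the exact flags from the command above.
#   $1 = kernel version, $2 = network module (default r8169),
#   $3 = image directory (default /tftpboot/images/centos5.3-onesis)
initramfs_cmd() {
    kver=$1
    module=${2:-r8169}
    image_dir=${3:-/tftpboot/images/centos5.3-onesis}
    echo "mk-initramfs-oneSIS -w $module -f -s 8192 $image_dir/initrd-$kver.img $kver"
}

initramfs_cmd 2.6.18-128.1.10.el5
```

Once the printed line looks right, piping it to a root shell (for example, `initramfs_cmd <version> | sudo sh`) would actually run it.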
CONFIGURING THE FIT-PC2's
All the way up till now, the shiny new fit-pc2's have been sitting there unused. Now's their moment. The first thing we need to do is get them ready for network booting. The fit-pc2 has built-in support for booting over ethernet, but by default it's disabled (at least on my fit-pc2s). So, we need to turn them on, go into the CMOS settings (hitting F2 frantically), and set the following options:
Main -> Large Disk Access Mode: -> Other
Advanced -> Installed O/S: -> Other
Advanced -> Boot-time Diagnostics Screen: -> Enabled
Advanced -> QuickBoot Mode: -> Enabled
Intel -> Chipset Control Sub-Menu -> LAN PXE BOOT: -> Enabled
Exit -> Exit Saving Changes
So, with that, we should be able to turn them on and bask in the glow of our shiny new cluster.
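Once the glow has worn off, a quick way to see which nodes actually came up is to probe each one from the management node. Here's a minimal sketch: check_nodes is a made-up helper, it assumes the node names resolve on the management node, and the PROBE variable (defaulting to a single ping with a two-second timeout) is my own addition so the probe command can be swapped out.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named host answers a probe.
# PROBE holds the command used to test one host; by default it is a
# single ping that waits at most 2 seconds for a reply.
check_nodes() {
    for host in "$@"; do
        if ${PROBE:-ping -c 1 -W 2} "$host" >/dev/null 2>&1; then
            echo "$host: up"
        else
            echo "$host: down"
        fi
    done
}
```

For this cluster, the call would be `check_nodes frosted0 frosted1 frosted2`.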
USING THE CLUSTER
There are a number of tools that come included with the Sun HPC Software Stack that help us utilize the cluster. One such tool is pdsh (parallel shell), which allows us to run commands on all of the nodes in parallel. So, for example, let's say we want to check the time on all of the nodes, we can do the following:
[makia@frosted ~]$ pdsh -a date
frosted0: Sat Jun 27 14:41:51 MDT 2009
frosted1: Sat Jun 27 14:42:04 MDT 2009
frosted2: Sat Jun 27 14:42:03 MDT 2009
[makia@frosted ~]$
Or, if we want to see combined output, we can also include dshbak:
[makia@frosted ~]$ pdsh -a "df" | grep /home | dshbak -c
----------------
frosted[0-2]
----------------
10.0.10.1:/home 3850336 2525376 1129376 70% /home
[makia@frosted ~]$
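The frosted[0-2] in that output is a compact host range: dshbak -c folds nodes with identical output under one header, and here it stands for frosted0 through frosted2. To make the notation concrete, here's a sketch that expands a range of that shape. The function expand_range is made up for illustration and handles only a single prefix[lo-hi] range, not the comma-separated lists or zero-padded numbers the real tools accept.

```shell
#!/bin/sh
# Hypothetical helper: expand "prefix[lo-hi]" into one host name per
# line. Anything that is not in that exact form is printed unchanged.
expand_range() {
    case $1 in
        *\[*-*\])
            prefix=${1%%\[*}      # text before the "["
            range=${1#*\[}        # "lo-hi]"
            range=${range%\]}     # "lo-hi"
            lo=${range%-*}
            hi=${range#*-}
            i=$lo
            while [ "$i" -le "$hi" ]; do
                echo "$prefix$i"
                i=$((i + 1))
            done
            ;;
        *)
            echo "$1"
            ;;
    esac
}

expand_range 'frosted[0-2]'
```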
As part of the SunHPC Software Stack, we also include slurm (and also a way to configure it). Once it's configured, it can help us run jobs across the system (including MPI jobs, using OpenMPI which is also included):
[makia@frosted mpi]$ mpicc -o hello hello.c
[makia@frosted mpi]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up infinite 3 idle frosted[0-2]
[makia@frosted mpi]$ salloc -N3
salloc: Granted job allocation 3798
[makia@frosted mpi]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3798 compute bash makia R 0:07 3 frosted[0-2]
[makia@frosted mpi]$ mpirun -mca btl self,tcp hello
frosted0: hello world from process 0 of 6
frosted1: hello world from process 3 of 6
frosted2: hello world from process 5 of 6
frosted0: hello world from process 1 of 6
frosted1: hello world from process 2 of 6
frosted2: hello world from process 4 of 6
[makia@frosted mpi]$ exit
exit
salloc: Relinquishing job allocation 3798
[makia@frosted mpi]$
The stack also comes with some nice monitoring tools. Here's ganglia output for the cluster (it even had a slight load on it, just for added drama).

CONCLUSION
That's it. A minicluster. Not too difficult to build, takes up little-to-no desk space, extremely quiet, and kinda cool to have sitting around. Usefulness varies with whatever you need a cluster for. I decided I wanted to see how the Selfish Detour benchmarks would hold up. Here's the graph (it's pretty, but also shows the nodes are pretty noisy):

Feel free to leave comments or questions.




