Set up a local NFS file system
In order to install HPC-CT, you need a shared filesystem, visible from all nodes of your cluster.
In my case, I had to do this first.
Let us call node0 your headnode (which will be the server of the NFS filesystem).
Start on node0
%svcadm -v enable -r network/nfs/server
%mkdir /tools
%chmod 777 /tools
%share -F nfs -o rw /tools
Add the share command to /etc/dfs/dfstab
%cat /etc/dfs/dfstab
share -F nfs -o rw /tools
and the directory will be shared automatically after a reboot.
Now on all other client nodes (node1 to nodeN) do
%mkdir /tools
%mount -F nfs node0:/tools /tools
and add a line at the end of
%cat /etc/vfstab
server:/disk - /mount_point nfs - yes rw,soft
node0:/tools - /tools nfs - yes rw,soft
Password-free rsh
The next step is to set up password-free rsh for root.
Edit or create a .rhosts file containing the hostnames and the login:
%cat ~/.rhosts
node0 root
node1 root
node2 root
nodeN root
and add the hostnames to the file
%cat /etc/hosts.equiv
node0
node1
node2
nodeN
You should now be able to create files under /tools and run an rsh nodeN command
without any password prompt.
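The two files above follow a simple pattern, so they can be generated from a node list instead of being typed by hand on every node. The following is a minimal sketch, assuming a POSIX shell (not the csh prompt used in this post); the function names make_rhosts and make_hosts_equiv are hypothetical helpers, not part of Solaris or HPC-CT.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Solaris or HPC-CT.

# Print one "hostname login" line per node, the format used in ~/.rhosts above.
make_rhosts() {
    login=$1
    shift
    for node in "$@"; do
        printf '%s %s\n' "$node" "$login"
    done
}

# Print one hostname per line, the format used in /etc/hosts.equiv above.
make_hosts_equiv() {
    for node in "$@"; do
        printf '%s\n' "$node"
    done
}

# Example: show the contents for a three-node cluster.
make_rhosts root node0 node1 node2
make_hosts_equiv node0 node1 node2
```

Redirect the output of each function into ~/.rhosts and /etc/hosts.equiv once you are happy with what is printed.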
Installing HPC-CT
Now it is time to install HPC-CT 8.2. Download the latest version from here.
Stay on your headnode, node0, and put sun-hpc-ct-8.2-SunOS-i386.tar.gz under the shared filesystem /tools
%cd /tools
%gunzip -c sun-hpc-ct-8.2-SunOS-i386.tar.gz | tar xvf -
%cd sun-hpc-ct-8.2-SunOS-i386/Product/Install_Utilities/bin
%./ctinstall -n node0,node1,node2,nodeN -r rsh
For more information, see here
for the HPC-CT installation guide. You do not need the IB
network during the installation of HPC-CT. The interconnect is selected at
run time, not at install time.
For the time being, Solaris uses the uDAPL protocol. This protocol requires a TCP interface to be up and running.
Check with
% ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.60.20.183 netmask ffffff00 broadcast 10.60.20.255
ether 0:1e:68:2f:1d:9e
that this is the case. You can already try to run an MPI program by specifying the TCP interface:
%mpirun -np 2 -mca btl sm,tcp,self -mca plm_rsh_agent rsh -hostfile ./hostfile ./a.out
%cat hostfile
node0 slots=1
node1 slots=1
Configuring the IB interface
Check that the IB updates and packages are installed.
Run
%pkginfo -x | grep -i ib
Within a long list you should see something like this:
<snip>
SUNWhermon Sun IB Hermon HCA driver
SUNWib Sun InfiniBand Framework
SUNWibsdp Sun InfiniBand layered Sockets Direct Protocol
SUNWibsdpib Sun InfiniBand Sockets Direct Protocol
SUNWibsdpu Sun InfiniBand pseudo Sockets Direct Protocol Admin
<snip>
If you see nothing here, you will have to install the IB patches from the install image.
If you are using an earlier version of Solaris10 X86 (5/09), you can get these packages from here.
Check the /usr/sbin/datadm command
%datadm -v
If you see nothing, you have to check whether or not you have this file :
%cat /usr/share/dat/SUNWudaplt.conf
#
# Copyright 2008 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
# ident "@(#)SUNWudaplt.conf 1.3 08/10/16 SMI"
#
driver_name=tavor u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " "
driver_name=arbel u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " "
driver_name=hermon u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " "
Run the following command on all nodes
%datadm -a /usr/share/dat/SUNWudaplt.conf
Now datadm should display this
%datadm -v
ibd0 u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " " "driver_name=hermon"
and you should have a file
%cat /etc/dat/dat.conf
ibd0 u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " " "driver_name=hermon"
Now reboot all nodes.
If you have made no mistakes, they should all come back with the directory /tools NFS-mounted and
password-free rsh working, and datadm should return the line shown above.
Check if the IB interface is seen under
%ll /dev/ib*
3120 2 lrwxrwxrwx 1 root other 29 Nov 11 15:43 /dev/ibd -> ../devices/pseudo/clone@0:ibd
92901 2 lrwxrwxrwx 1 root root 72 Nov 16 10:09 /dev/ibd0 -> ../devices/pci@0,0/pci8086,25f8@4/pci15b3,673c@0/ibport@2,ffff,ipib:ibd0
Here my interface is called ibd0. You may have another number at the end.
Now we have to configure the ibd0 interface. In my example, I decided
to assign the following IP addresses to the ibd0 interfaces.
(Before doing this, check with ping that these addresses are really unused.)
node0 5.6.134.50
node1 5.6.134.51
node2 5.6.134.52
etc ...
Now run the ifconfig command on every node with the correct IP address.
On node0
%ifconfig ibd0 plumb 5.6.134.50 broadcast 5.6.255.255 netmask 255.255.0.0 up
On node1
%ifconfig ibd0 plumb 5.6.134.51 broadcast 5.6.255.255 netmask 255.255.0.0 up
etc ...
The ibd0 interface should now be plumbed and show
%ifconfig ibd0
ibd0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 4
inet 5.6.134.50 netmask ffff0000 broadcast 5.6.255.255
ipib 0:1:0:4a:fe:80:0:0:0:0:0:0:0:21:28:0:1:3e:5c:90
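The broadcast address in this output follows directly from the IP address and the netmask: every bit that is zero in the netmask is set to one in the address. The following is a minimal sketch of that calculation, assuming a POSIX shell with arithmetic expansion (for example ksh or bash, not the csh prompt used in this post); broadcast_addr is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compute the IPv4 broadcast address from an
# address and a dotted netmask. Each octet is (ip OR (255 - mask)).
broadcast_addr() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $ip $mask
    IFS=$old_ifs
    printf '%d.%d.%d.%d\n' \
        $(( $1 | (255 - $5) )) \
        $(( $2 | (255 - $6) )) \
        $(( $3 | (255 - $7) )) \
        $(( $4 | (255 - $8) ))
}

# The values used for node0 in this example; prints 5.6.255.255
broadcast_addr 5.6.134.50 255.255.0.0
```

This is handy as a quick check before typing the broadcast value into the ifconfig command line.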
Finally, to make this interface persistent across reboots, you have to create on
every node a file that contains the IP address of the ibd0 interface.
On node0
%cat /etc/hostname.ibd0
5.6.134.50
and on node1
%cat /etc/hostname.ibd0
5.6.134.51
etc ...
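With many nodes it is easy to mix up addresses, so it can help to print the whole plan first. The following is a minimal sketch, assuming a POSIX shell and assuming the numbering scheme of this example, where node i gets 5.6.134.(50+i); ib_addr is a hypothetical helper name, and the range 0 to 2 is only an illustration.

```shell
#!/bin/sh
# Hypothetical helper: map node index i to 5.6.134.(50+i),
# the addressing scheme assumed in this example.
ib_addr() {
    printf '5.6.134.%d\n' $(( 50 + $1 ))
}

# Print the address that belongs in /etc/hostname.ibd0 on each node.
i=0
while [ "$i" -le 2 ]; do
    printf 'node%d %s\n' "$i" "$(ib_addr "$i")"
    i=$(( i + 1 ))
done
```

On each node, the second column is the single line to put into /etc/hostname.ibd0.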
As a test, you should be able to ping all IP addresses from all nodes.
Do a last sanity check by looking at
%ldd /opt/SUNWhpc/HPC8.2/sun/lib/openmpi/mca_btl_udapl.so
and check that all libraries are found
Now you are ready for rock'n roll and you can run
%setenv LD_LIBRARY_PATH /opt/SUNWhpc/HPC8.2/sun/lib
%mpirun -np 2 -mca btl sm,self,udapl -mca plm_rsh_agent rsh -x LD_LIBRARY_PATH -hostfile ./hostfile ./a.out
with the same hostfile as above
%cat hostfile
node0 slots=1
node1 slots=1
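The hostfile has one "hostname slots=N" line per node, so it can be generated from a node list. The following is a minimal sketch, assuming a POSIX shell; make_hostfile is a hypothetical helper name, not an HPC-CT tool.

```shell
#!/bin/sh
# Hypothetical helper: print one "hostname slots=N" line per node,
# the hostfile format used in the mpirun examples above.
make_hostfile() {
    slots=$1
    shift
    for node in "$@"; do
        printf '%s slots=%s\n' "$node" "$slots"
    done
}

# Example: the two-node hostfile used above.
make_hostfile 1 node0 node1
```

Redirect the output to ./hostfile to use it with the mpirun commands shown in this post.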
Some additional remarks
As you have seen from the examples above, HPC-CT will look for the best
way to communicate with the hosts mentioned in the hostfile by
searching for the fastest possible interconnect.
Let us suppose that node0 and node1 are connected over IB (as described above), while node2
and node3 are on the TCP interconnect. Running
%mpirun -np 4 -mca btl sm,self,tcp,udapl -mca plm_rsh_agent rsh -x LD_LIBRARY_PATH -hostfile ./hostfile ./a.out
%cat hostfile
node0 slots=1
node1 slots=1
node2 slots=1
node3 slots=1
will use IB between node0 and node1 and the TCP network for the rest. If you
force the IB network by setting -mca btl sm,self,udapl, the run
will fail with an error message.
