ISV engineering's HPC web log For HPC ISVs & OSS

Friday Feb 20, 2009

Building OpenMPI 1.2.9 for OpenSolaris with Studio 12


The last version of OpenMPI 1.2.x has been released. OpenMPI 1.2.9 is out and available for download from http://www.openmpi.org/

I built 1.2.9rc1 in a previous blog entry, and this build is similar, but a big fix for OpenSolaris (which I documented in that entry) has made it into the source tree. This is great: you will find the build goes really smoothly.

Getting started

  • OpenSolaris 2008.11

  • OpenMPI 1.2.9

  • Studio 12 compilers

  • GNU utilities

OpenMPI 1.2.9 can be obtained at: http://www.openmpi.org/

For OpenSolaris, use the package manager and the opensolaris.org repository to download the Studio 12 compilers for C/C++ and the GNU utilities. The repository is great, and as a developer on OpenSolaris you should get used to this tool for managing your environment. It is a fantastic addition and I use it heavily.


Building OpenMPI 1.2.9


# gzip -d openmpi-1.2.9.tar.gz

# tar xvf openmpi-1.2.9.tar

This is a basic source delivery that uses configure to set up the build environment. Configure will pick up the Studio compilers if you set the environment variables CC, CFLAGS, CXX, CXXFLAGS, F77, and FFLAGS:

# export CC=/export/home/langston/COMPILER/SUNWspro/bin/cc

# export CXX=/export/home/langston/COMPILER/SUNWspro/bin/CC

# export F77=/export/home/langston/COMPILER/SUNWspro/bin/f77

Here /export/home/langston/COMPILER/SUNWspro is where I have Sun Studio 12 installed.
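The three exports above can be derived from a single root so the Studio location is written only once. This is a minimal sketch, not part of OpenMPI or Studio: the helper name set_studio_env and the STUDIO_HOME variable are hypothetical, and CFLAGS, CXXFLAGS, and FFLAGS are left untouched.

```shell
# Hypothetical helper: point configure at the Studio compilers under one root.
# STUDIO_HOME is an assumed variable name; the default is the path used in this entry.
set_studio_env() {
    studio_home=${1:-${STUDIO_HOME:-/export/home/langston/COMPILER/SUNWspro}}

    # Refuse to export anything if one of the compiler drivers is missing.
    for tool in cc CC f77; do
        if [ ! -x "$studio_home/bin/$tool" ]; then
            echo "missing compiler: $studio_home/bin/$tool" >&2
            return 1
        fi
    done

    CC=$studio_home/bin/cc
    CXX=$studio_home/bin/CC
    F77=$studio_home/bin/f77
    export CC CXX F77
}
```

Running set_studio_env with no argument before configure should give the same environment as the exports shown earlier, and it fails early if the compilers are not where you expect them.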

# cd openmpi-1.2.9

# ./configure --prefix=/usr/local/openmpi1.29

You can point the prefix anywhere you like, but I am finding it best to put each OpenMPI build in its own location rather than let it default to /usr/local. I have too many versions running around and I need them isolated from each other.
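With one prefix per build, it is easy to see which versions are installed. The following is a small sketch under the assumption that every build lives in a directory whose name starts with "openmpi" under a common root, as in this entry. The helper name list_openmpi_prefixes is hypothetical.

```shell
# Hypothetical helper: print every directory under ROOT (default /usr/local)
# whose name starts with "openmpi" and that contains an executable bin/mpicc.
list_openmpi_prefixes() {
    root=${1:-/usr/local}
    for dir in "$root"/openmpi*; do
        if [ -x "$dir/bin/mpicc" ]; then
            echo "$dir"
        fi
    done
    return 0
}
```

A directory that matches the name but has no bin/mpicc (for example a build that was configured but never installed) is skipped.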

# gmake


Installing OpenMPI and running an example


# gmake install

Make sure the install's bin directory is in your PATH, in my case /usr/local/openmpi1.29/bin, so that tools such as mpicc are found.

Make sure LD_LIBRARY_PATH includes /usr/local/openmpi1.29/lib.

# export PATH=$PATH:/usr/local/openmpi1.29/bin

# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi1.29/lib
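If you switch between several installs, the two exports can be wrapped in a helper that avoids duplicate entries. This is a sketch only: path_prepend and use_openmpi are hypothetical names, and unlike the exports above it prepends rather than appends, so the chosen install wins over any other OpenMPI already on the path. It also avoids an empty leading entry when LD_LIBRARY_PATH was not set beforehand.

```shell
# path_prepend VAR DIR: put DIR at the front of the colon-separated list held
# in the variable named VAR, unless DIR is already listed. Hypothetical helper.
path_prepend() {
    var=$1
    dir=$2
    eval "current=\${$var:-}"
    case ":$current:" in
        *":$dir:"*) ;;                                   # already present
        *) eval "$var=\$dir\${current:+:\$current}" ;;   # DIR, then old value if any
    esac
    export "$var"
}

# use_openmpi PREFIX: select one OpenMPI install for this shell session.
# The default prefix is the one used in this entry.
use_openmpi() {
    prefix=${1:-/usr/local/openmpi1.29}
    path_prepend PATH "$prefix/bin"
    path_prepend LD_LIBRARY_PATH "$prefix/lib"
}
```

Calling use_openmpi twice with the same prefix leaves both variables unchanged the second time.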

# cd examples

langston@alpha:~/OPENMPI/openmpi-1.2.9/examples$ gmake hello_c

mpicc -g hello_c.c -o hello_c

langston@alpha:~/OPENMPI/openmpi-1.2.9/examples$ ldd hello_c

libmpi.so.0 => /usr/local/openmpi1.29/lib/libmpi.so.0

libopen-rte.so.0 => /usr/local/openmpi1.29/lib/libopen-rte.so.0

libopen-pal.so.0 => /usr/local/openmpi1.29/lib/libopen-pal.so.0

libsocket.so.1 => /lib/libsocket.so.1

libnsl.so.1 => /lib/libnsl.so.1

libm.so.2 => /lib/libm.so.2

libthread.so.1 => /lib/libthread.so.1

libc.so.1 => /lib/libc.so.1

libmp.so.2 => /lib/libmp.so.2

libmd.so.1 => /lib/libmd.so.1

libscf.so.1 => /lib/libscf.so.1

libuutil.so.1 => /lib/libuutil.so.1

libgen.so.1 => /lib/libgen.so.1

langston@alpha:~/OPENMPI/openmpi-1.2.9/examples$ orterun --mca btl tcp,self -np 2 hello_c

Hello, world, I am 0 of 2

Hello, world, I am 1 of 2
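With several versions installed, the point of the ldd listing above is that the three OpenMPI libraries resolve under the prefix you just installed. That check can be scripted. The sketch below is a hypothetical helper (mpi_libs_outside is not an OpenMPI tool) that reads ldd output on standard input and assumes the "name => path" line format shown above.

```shell
# mpi_libs_outside PREFIX: read ldd output on stdin and print every OpenMPI
# library line (libmpi, libopen-rte, libopen-pal) that does NOT resolve under
# PREFIX/lib. No output means all three come from the expected install.
mpi_libs_outside() {
    prefix=$1
    while IFS= read -r line; do
        case $line in
            *libmpi.so*|*libopen-rte.so*|*libopen-pal.so*)
                case $line in
                    *"=> $prefix/lib/"*) ;;          # resolved inside the prefix
                    *) printf '%s\n' "$line" ;;      # resolved elsewhere or not found
                esac ;;
        esac
    done
}
```

For the build in this entry the call would be: ldd hello_c | mpi_libs_outside /usr/local/openmpi1.29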


Caveats


  • For Sun, a great alternative is ClusterTools 8, which is based on OpenMPI 1.3.

Comments:

I am really interested in parallel computing and I am studying OpenMPI, just a newbie.
I greatly appreciate your entry and very glad to learn how to run a parallel program in OpenSolaris.

Thank you so much for your entry.

However, I encountered a problem when following your thorough instructions.

Everything went very smoothly until the end of the instructions.
When I ran the command

$ orterun --mca btl tcp,self -np 2 hello_c
Password:

It required me to enter a password, but I wonder what password was required? I tried root password but it didn't work.

Please help me figure out the problem. Or do I have to install another package to run the example?

Thank you again

Posted by Le Duy Khanh on March 07, 2009 at 06:23 AM PST #

My first thought is that you are using ssh for security. Can you ssh to the
server you are trying to run your mpi job on? The password is probably your
password. If so, you can make an entry in the known_hosts file in ~/.ssh which
will allow you to run without a password as your user. If the system you are
trying to run on is not in this file, you will always be required to enter a
password, which would be of the user trying to execute orterun. I make note
of this in case you are trying to run over something like Sun Grid Engine,
whereby it could be a different user.

But it's just a guess ...

Something else to try: see if you can rsh to the same server. This may tell
you if ssh is the security mechanism.

Jim

Posted by James Langston on March 07, 2009 at 06:01 PM PST #

Also, test to see if you can rsh without a password; it occurred to me that rsh could create the same scenario for you. If you cannot, then you must create a ~/.rhosts file for your user listing the machines you need to run your mpi job on.

Also, by default rsh is not running:

svcadm enable svc:/network/shell:default
svcadm enable svc:/network/login:rlogin

Then run a simple test -

rsh myHostToRunMPIon ls

This should run ls on the node, without a password prompt or a permission denied error.

Jim

Posted by James Langston on March 07, 2009 at 06:21 PM PST #

Hix, I forgot to tell you that I am also an OpenSolaris newbie ^^

I am using OpenSolaris on my Acer laptop and there is no connection to any servers or hosts. I followed exactly what you instructed in your post.

1) "My first thought is that you are using ssh for security".
-> How can I check if ssh is used for my security?
Does having a "~/.ssh" folder in my home directory prove that I use SSH?

2) "If so, you can make an entry in the known_hosts file in ~/.ssh"
-> In my ~/.ssh folder there are only 2 files: id_dsa and id_dsa.pub. There is no "known_hosts" file in the folder. What kind of entry did you mean for me to make?

3) "you are trying to run over something like Sun Grid Engine"
=> How can I check if I am trying to run over something? This situation is not likely because I am just an OpenSolaris newbie and I don't know "Sun Grid Engine", but I did install ClusterTools 8 on my OpenSolaris. Does ClusterTools 8 cause any problem?

4) "test to see if you can rsh without a password" ?
=> Following your instructions, I enabled rsh and then tried to run

# rsh 192.168.1.2 ls
permission denied

Where 192.168.1.2 is my IP.
Then, I tried

$ pfexec rsh 192.168.1.2 ls
permission denied

"This should run ls on the node, without a password prompt or a permission denied error." => But "permission denied"

What is the problem?

Thanks

Posted by Le Duy Khanh on March 07, 2009 at 10:44 PM PST #

I think you are on to the issue now. You need to fix the rsh problem first; after that I think you should be fine.

- Don't worry about the first 3 bullet items; concentrate on getting rsh to work without a password. Also, did you try to enter the password of the user executing orterun?

Make sure you have these enabled (enabled should be true):

svcs -l svc:/network/shell:default
svcs -l svc:/network/login:rlogin

If they are not enabled, enable them with:

svcadm enable svc:/network/shell:default
svcadm enable svc:/network/login:rlogin

I have been able to run successfully with or without this service enabled, but something is prompting you for a password, and this is what I think it is. Or, as I said earlier, ssh could be activated as well; it may be how you originally configured your system when you built it. You may want to run:

truss -f -o /tmp/truss.out orterun --mca btl tcp,self -np 2 hello_c

and take a look at the output of truss for any errors which may be network or socket related.

You can include the truss output here and I can take a look at it.

Jim

Posted by James Langston on March 08, 2009 at 05:25 PM PDT #

Hi James,

I checked both

$ svcs -l svc:/network/shell:default
$ svcs -l svc:/network/login:rlogin

And they are both online.

Let me clearly explain what I encountered.

First, I followed your instructions and everything was OK except the last one.

When I typed

$ orterun --mca btl tcp,self -np 2 hello_c

a dialog appeared (it appeared just once, the first time I ran orterun; the next times it didn't appear again)

http://i196.photobucket.com/albums/aa33/strongdevil/01.jpg

I didn't know which password was required, so I entered my user password.
I was not sure whether it was accepted or not. The dialog disappeared, but the console still required a password

http://i196.photobucket.com/albums/aa33/strongdevil/02.jpg

I entered both the user password and the root password but it didn't help. After 3 tries, a message appeared

http://i196.photobucket.com/albums/aa33/strongdevil/03.jpg

I tried to execute your instruction

$ truss -f -o /tmp/truss.out orterun --mca btl tcp,self -np 2 hello_c

But a password was required again

http://i196.photobucket.com/albums/aa33/strongdevil/04.jpg

Hixhix, I really don't know what to do and what is the problem. Please help me.

Posted by 203.162.3.163 on March 09, 2009 at 04:47 AM PDT #

The truss command will record all activity and commands up to the request for the password. You can stop orterun at that point with <Ctrl-C> and then look at the file.

Jim

Posted by 65.185.4.252 on March 09, 2009 at 06:26 AM PDT #

Hi Jim,

This was the status when I ran the 2 commands
***************************
# svcs -l svc:/network/shell:default
fmri svc:/network/shell:default
name rsh
enabled true
state online
next_state none
state_time Mon Mar 09 16:41:37 2009
restarter svc:/network/inetd:default
contract_id
dependency require_any/error svc:/network/loopback (online)
dependency optional_all/error svc:/milestone/network (online)

# svcs -l svc:/network/login:rlogin
fmri svc:/network/login:rlogin
name remote login
enabled true
state online
next_state none
state_time Mon Mar 09 16:41:37 2009
restarter svc:/network/inetd:default
***************************
They were all running.

I include the truss.out here
http://12a1nhc.com/duyhoai/KhanhBKITSun/truss.out
Hix, there are about 3000 lines in the truss.out file. Hix:(

By the way, when I ran "gmake" and "gmake install", a mail like this one was sent to "root"

***************************
From MAILER-DAEMON Tue Mar 10 21:42:39 2009
Return-Path: <MAILER-DAEMON@LDKLap_Opensolaris.local>
Received: from localhost (localhost)
by LDKLap_Opensolaris.local (8.14.3+Sun/8.14.3) id n2AEgdMO025643;
Tue, 10 Mar 2009 21:42:39 +0700 (ICT)
Date: Tue, 10 Mar 2009 21:42:39 +0700 (ICT)
From: Mail Delivery Subsystem <MAILER-DAEMON>
Message-Id: <200903101442.n2AEgdMO025643@LDKLap_Opensolaris.local>
To: <root@LDKLap_Opensolaris.local>
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
boundary="n2AEgdMO025643.1236696159/LDKLap_Opensolaris.local"
Subject: Returned mail: see transcript for details
Auto-Submitted: auto-generated (failure)
Content-Length: 11860
***************************

Hope it could be helpful.

Please help me to figure out the problem. I have to start studying OpenMPI as soon as possible but I got stuck on the easiest example. Also, I don't want to install another OS like Ubuntu. My friend could run the example pretty well on Ubuntu.

Thank you so much ^^

Posted by Le Duy Khanh on March 10, 2009 at 08:12 AM PDT #

The truss output is very, very helpful. Looking at the output, the request for the password is there. Looking back a few lines, you will see that there are ssh requests coming in, and in particular for root. And stepping back just a little further, you will see Kerberos initialization. Your system has been built very securely.

You will need to set up key pairs for your user (and maybe root?) in order to run without passwords. After you set that up and can ssh without passwords, you should be able to run. Otherwise, you may have to ping the OpenMPI user community to see if anyone has set up OpenMPI with the high security model you have set up. I have not.

Jim

Posted by James Langston on March 10, 2009 at 09:51 AM PDT #

Thanks Jim,

I would greatly appreciate it if you could copy the lines where you see ssh and Kerberos. I couldn't see them :((

Yes, I previously tried to create a key pair for my user like this

$ ssh-keygen -b 1024 -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/export/home/member//.ssh/id_dsa):
Created directory '/export/home/member//.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /export/home/member//.ssh/id_dsa.
Your public key has been saved in /export/home/member//.ssh/id_dsa.pub.
The key fingerprint is:
3e:f4:f8:c4:39:91:53:67:78:88:56:e7:8c:c5:39:37 member@rampage

But it doesn't seem to help. About root: in my OpenSolaris, root is a role, not a real user. Therefore, do I have to switch root to a user and then create a key pair for it?

I am using OpenSolaris 2008.11. I also use the default security of OpenSolaris.

Could you please give me some advice, such as: do I have to reinstall OpenSolaris with less security, or should I install another OS like Ubuntu for its convenience?

Thank you so much Jim

Posted by Le Duy Khanh on March 11, 2009 at 05:57 AM PDT #

Could you send me a new truss output? I am looking to see what may have changed with your keys set.

Jim

Posted by James Langston on March 11, 2009 at 06:32 AM PDT #

Also, found this on the OpenMPI website

http://www.open-mpi.org/faq/?category=rsh#ssh-keys

Jim

Posted by James Langston on March 11, 2009 at 06:36 AM PDT #

Thank you so much Jim. I can do it ^^ I forgot to chmod $HOME and the .ssh folder to 755 ^^

I greatly appreciate your help over the past few weeks.

Again, thank you ^^

Posted by Le Duy Khanh on March 16, 2009 at 08:38 AM PDT #

