Pascal's Weblog
The Grid...



Wednesday Jun 20, 2007
What's new?
I moved to a different project a couple of months ago. We just delivered the first version of our product. Here is an overview, here is the link to it, and here you will find what service tags are about.

I keep in touch with some of the developers on the Sun Grid team. Here are some features recently delivered by this team (in case someone actually reads this blog and has not seen them somewhere else, and since it is still called 'The Grid'!). More information is available at network.com

Posted at 10:16AM Jun 20, 2007 by Pascal Ledru in Grid Computing  |  Comments[0]

Friday Nov 10, 2006
SC'06
The largest supercomputing conference in the world, SC06, is next week and I will be there presenting Sun Grid. Sun will also be presenting a number of other products in the HPC area.

I am one of the experts! (Well, my boss thinks I am!) Please stop by our booth (#605); you will get some free hours on Sun Grid. And if I don't know the answer to one of your questions, I will get back to you as soon as I can!

Grid Expert!

Posted at 12:51PM Nov 10, 2006 by Pascal Ledru in Grid Computing  |  Comments[1]

DRMAA
I initially gave examples of how to distribute an application over the Grid using the Sun Grid Engine commands (e.g., qsub). Another option is to use the Distributed Resource Management Application API (DRMAA), which lets you submit jobs directly from within a program. Sample code in Java looks like this:


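         // Submit 50 jobs to run; session, exec, jt and ids are declared earlier
         // in the program (jt and ids are arrays with at least 50 entries)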
         for (int i = 0; i < 50; i++) {
           System.out.println("Creating Remote Command: " + i);
           jt[i] = session.createJobTemplate();
           jt[i].setRemoteCommand(exec);
           ids[i] = session.runJob(jt[i]);
         }

A full example is provided on the Developer Network site by Fay Salwen, one of our Grid experts.
Posted at 09:46AM Nov 10, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Thursday Nov 09, 2006
Globus client certificates
These steps are used to connect to a GridFTP server and set up the security certificates appropriately; they are provided by Gabriele Carcassi, a senior engineer here at Sun.

We will be using the Commodity Grid (CoG) Kits on the client.

On your machine, as yourself:
> ./cog-4_1_4/bin/grid-cert-request -cn "Pascal Ledru"
> scp .globus/usercert_request.pem root@myserver.central.sun.com:usercert_request.ledru.pem

On myserver, as root:
> /usr/local/globus-4.0.3/bin/grid-ca-sign -in usercert_request.ledru.pem -out usercert.ledru.pem

On the client machine, as yourself:
> scp root@myserver.central.sun.com:usercert.ledru.pem .globus/usercert.pem
> ./install/cog-4_1_4/bin/grid-proxy-init

On the server, edit the /etc/grid-security/grid-mapfile to add an entry such as:
"/O=Grid/OU=GlobusTest/OU=simpleCA-myserver/CN=Pascal Ledru" globus

On the client, try the file-transfer script:
 ./cog-file-transfer -s file:///home/pl108086/test.txt -d gridftp://myserver.central.sun.com:2811/tmp/TEST.txt


Et voila...
Posted at 02:00PM Nov 09, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Globus on Solaris 10 x86
I just built the Globus Toolkit on Solaris 10 / x86 (AMD Opteron). Here are the steps I went through, in case they are useful to anyone:

From the Download site, pick up the source bundle: gt4.0.3-all-source-installer.tar.

Make sure you have GNU tar and GNU make (I did not have them by default and installed them in /usr/local/bin).

As root, create the directory /usr/local/globus-4.0.3

Create a globus user. As this user, here are the environment variables I have:

> printenv
HZ=
SHELL=/bin/bash
TERM=xterm
LC_ALL=C
MAIL=/var/mail/globus
PATH=/usr/local/bin:/usr/bin:/usr/sfw/bin:/usr/dist/share/devpro/5.x-i386/bin:/usr/ccs/bin:/usr/dist/exe
PWD=/etc/grid-security
LANG=C
TZ=US/Mountain
SHLVL=2
HOME=/users/globus
LOGNAME=globus
LC_CTYPE=C
_=/usr/bin/tcsh
HOSTTYPE=i86pc
VENDOR=sun
OSTYPE=solaris
MACHTYPE=i386
USER=globus
GROUP=other
HOST=myhost
GPT_LOCATION=/usr/local/globus-4.0.3
CC=/usr/dist/share/devpro/5.x-i386/bin/cc
GLOBUS_LOCATION=/usr/local/globus-4.0.3
GCC=/usr/sfw/bin/gcc

Run:

 ./configure --prefix=$GLOBUS_LOCATION --with-flavor=gcc32dbg

Then:
make

I ran into a couple of problems. Initially I did not have ar and pod2man in my path, so I corrected my PATH by adding:
/usr/ccs/bin:/usr/dist/exe

Then, I ran into this problem:
file:/tmp/testgt/gt4.0.3-all-source-installer/source-trees/wsrf/java/core/source/build.xml:211: Compile failed; see the compiler error output for details.

This was due to the fact that I used a JDK 1.5 compiler, where enum is a reserved word, and it is used in:
import org.apache.axis.enum.Scope;
I edited the build file to add the -source 1.4 option, as in:
source="1.4"


Then, I waited and waited and waited.... Eventually:
echo "Your build completed successfully.  Please run make install."
Your build completed successfully.  Please run make install.


Then:
> make install
/usr/local/globus-4.0.3/sbin/gpt-postinstall
running /usr/local/globus-4.0.3/setup/globus/setup-globus-common..[ Changing to /usr/local/globus-4.0.3/setup/globus ]
creating globus-sh-tools-vars.sh

running /usr/local/globus-4.0.3/setup/globus/setup-globus-job-manager-fork..[ Changing to /usr/local/globus-4.0.3/setup/globus ]
find-fork-tools: WARNING: "Cannot locate mpiexec"
find-fork-tools: WARNING: "Cannot locate mpirun"
checking for mpiexec... no
checking for mpirun... no
find-fork-tools: creating ./config.status
config.status: creating fork.pm
..Done

I then tried to run setup-simple-ca as the globus user:
> $GLOBUS_LOCATION/setup/globus/setup-simple-ca

Got this error:
creating CA config package... /usr/local/globus-4.0.3/setup/globus/setup-simple-ca: GLOBUS_LOCATION=/usr/local/globus-4.0.3: is not an identifier

So I changed
#! /bin/sh
to
#! /bin/bash

Got this error:
creating CA config package...
ERROR: Unable to configure the simple_ca 9a7cd758 setup package
I corrected GPT_LOCATION as follows:
setenv GPT_LOCATION /usr/local/globus-4.0.3
Then I ran as root:
>  $GLOBUS_LOCATION/setup/globus_simple_ca_9a7cd758_setup/setup-gsi -default
>  $GLOBUS_LOCATION/bin/grid-cert-request -host myhost.central.sun.com
Posted at 11:34AM Nov 09, 2006 by Pascal Ledru in Grid Computing  |  Comments[1]

Wednesday Sep 27, 2006
Compute Server 101
Sun recently came up with a new technology for writing applications for the Sun Grid Compute Utility. Under the covers, it is based on Jini.

It is available for download on the Developer Network web site.

Installation notes are also available at this site.

I installed the NetBeans module and picked one of the examples, integralpi. I built the project, which generates a zip file, gridjob.zip, in the dist directory. I then logged in to Sun Grid, created a resource using gridjob.zip, created a job using gridjob.zip with "start" as the executable, and submitted the job. The job ran for a few minutes; I downloaded the result and checked the value of pi in the cs_master file:

cat cs_master.o68935.2

Sleeping for 0 seconds
Your job-array 68936.4-102:1 ("cs_ephemeralworker -ephemeral") has been submitted
Sep 27, 2006 4:43:43 PM com.sun.computeserver.master.Master run
INFO: Master starting
Sep 27, 2006 4:44:05 PM com.sun.computeserver.integralpi.PiGenerator done
INFO: Master PI:3.141590652413811199416023994572794606520775069556890851511296930826780611010577374635946745511251244209674355608893092721700668334960937500
Sep 27, 2006 4:44:05 PM com.sun.computeserver.master.Master doWaitForFeedback
INFO: Master exiting via WAIT
Sep 27, 2006 4:44:05 PM com.sun.computeserver.mm.MasterAspect doOutput
INFO:
Job Succeeded

Sep 27, 2006 4:44:05 PM com.sun.computeserver.mm.MasterAspect doOutput
INFO:
 Master Job Start Time       : 2006-09-27 16:43:43.674
 Master Job End Time         : 2006-09-27 16:44:05.187
 Total Job Execution Time    : 21.513 Seconds
 Total Processed Tasks       : 100
 Total Failed Tasks          : 0
 Total Submitted Tasks       : 100
             Generated       : 100
             Fissioned       : 0
 Task Throughput             : 4.7 Tasks per Second
 Average Task Execution Time : 0.215 Seconds per Task

By default, 100 tasks are used. You can see the output of each worker:

cat cs_ephemeralworker.o68936.86

Sleeping for 8 seconds
Sep 27, 2006 4:44:03 PM com.sun.computeserver.worker.WorkerWatcher init
INFO: Worker active on nyc1r226cpn06.retail.nyc1.sungrid.net
Sep 27, 2006 4:44:09 PM com.sun.computeserver.worker.WorkerWatcher goAway
INFO: Worker exiting nyc1r226cpn06.retail.nyc1.sungrid.net
Posted at 11:59AM Sep 27, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Monday Sep 18, 2006
And the result is...
I ran a quick experiment with each node executing 20 million tests:

1 node:    Average: 3.1411545250056307 with 1 nodes.
2 nodes:   Average: 3.141313165330142 with 2 nodes.
20 nodes:  Average: 3.1416765781116878 with 20 nodes.
200 nodes: Average: 3.141568127864779 with 200 nodes.
Actually 3.1415 is a fairly good approximation: π to one MILLION decimal places

Note that, to avoid writing a different script for each run, I use the ability to pass a parameter to a job to specify the number of clients:

#!/bin/ksh
#$ -N Buffon
#$ -cwd

# Set the environment variables
if [ -f $HOME/.profile ]; then
   . $HOME/.profile
fi

numClients=$1

echo "Starting the server..."
echo "Number of clients: "$numClients
svrResp=$(qsub -N server startServers.ksh)
echo svrResp is $svrResp
svrJobId=$(echo "$svrResp" | awk '{print $3}')
echo svrJobId is $svrJobId

# Wait until the server is started
status="not running"
until [ "$status" == "r" ]
do
  status=$( qstat | nawk '/'$svrJobId'/ {print $5}' )
  echo Server job status is $status
  sleep 10
done

#Wait until the serverhost file is created
filename="$HOME/serverhost"
until test -f $filename
do
  sleep 10
done
# then pull the server node name from the file
servernode=$(cat $filename)
rm $filename

echo "Server is running on" $servernode "Submitting a set of clients to the grid for remote execution..."
qsub -N clients -t 1-$numClients startClient.ksh $servernode

echo "Submitting a cleanup job that will wait until the clients are complete"
qsub -hold_jid clients cleanup.ksh $svrJobId
Posted at 11:34AM Sep 18, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Tuesday Sep 12, 2006
Monte Carlo Simulations
The problems we have examined so far were based on distributing pieces of a large problem over several nodes to speed up the computation. Basically, a master handed out pieces of the job to workers, which calculated them, and then collected the results.

Amazingly, there is a class of applications even easier to distribute on a Grid. For Monte Carlo simulations, the same program is executed on each node and the master only averages the results!

The most famous example of a Monte Carlo simulation is used to approximate the value of π.

Monte Carlo simulations are widely used for simulating the behavior of various physical and mathematical systems. They are also used in finance.
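
Here is a minimal sketch of that idea in Java. It assumes the classic "random points in a unit square" estimate of π, which is not necessarily the experiment run on the Grid, and the class and method names (MonteCarloPi, estimate, average) are made up for illustration. Each call to estimate stands in for one node; the master part is only the average.

```java
import java.util.Random;

// Hypothetical sketch: class and method names are illustrative only.
class MonteCarloPi {

  // One node's worth of work: throw random points into the unit square
  // and count how many fall inside the quarter circle of radius 1.
  static double estimate(long trials, long seed) {
    Random random = new Random(seed);
    long inside = 0;
    for (long t = 0; t < trials; t++) {
      double x = random.nextDouble();
      double y = random.nextDouble();
      if (x * x + y * y <= 1.0) {
        inside++;
      }
    }
    // The quarter circle has area pi/4, so pi is about 4 * inside / trials.
    return 4.0 * inside / trials;
  }

  // The master's only job: average the per-node estimates.
  static double average(double[] estimates) {
    double sum = 0.0;
    for (double e : estimates) {
      sum += e;
    }
    return sum / estimates.length;
  }

  public static void main(String[] args) {
    int nodes = 4;
    double[] estimates = new double[nodes];
    for (int node = 0; node < nodes; node++) {
      // Each node needs its own seed, otherwise all nodes repeat the same trials.
      estimates[node] = estimate(1000000L, 1000L + node);
    }
    System.out.println("Average: " + average(estimates) + " with " + nodes + " nodes.");
  }
}
```

Since the nodes never talk to each other, the only thing to watch is that each one uses a different random seed.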
Posted at 12:49PM Sep 12, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Wednesday Sep 06, 2006
A large job
To try something a little bit larger, I ran the Mersenne prime search program described below on 100 nodes, searching for all the Mersenne primes with an exponent up to 25,000; the last one in this interval is 2^23209 - 1, discovered in 1979 (less than 30 years ago!). The program ran for 12 hours and was simple to write (a simple Java program); it was just a matter of uploading it to the Grid, submitting it, and downloading the result. While it is of course difficult to compare with other searches, this page describes how the 32nd Mersenne prime was found: with a specialized program written for a Cray-2 supercomputer.
Posted at 08:55AM Sep 06, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Friday Sep 01, 2006
No Multicasting
As an option for discovering one job from another job, I tried to multicast a request. It did not seem to work, so I opened the Developer's Guide and read that... indeed, network multicasting is disabled on the Grid!
Posted at 04:07PM Sep 01, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Thursday Aug 31, 2006
Job Monitoring
While running a long job, I ran into the problem of wanting to know what was going on with it. As the files to download are only available when the job terminates, it is not easy to get intermediate data directly.

Below is a suggestion from one of my colleagues here at Sun. Let's extend the application I presented previously to get intermediate data. If you recall the interface of my Server:

public interface MersenneServer extends Remote {
  public int[] getInterval() throws RemoteException;
  public void postResult(Result result) throws RemoteException;
  public String getStatus() throws RemoteException;
}


I added an administration interface. For this simple example, just a method getStatus() which allows us to access intermediate data.
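
As a rough sketch of what could sit behind getStatus(), here is a small thread-safe progress counter. Everything in it (the ProgressTracker name, its methods, the wording of the status string) is hypothetical and is not taken from my server; the assumption is simply that getInterval() and postResult() would each notify it, and getStatus() would return its summary.

```java
// Hypothetical helper: none of these names come from the actual server code.
class ProgressTracker {
  private final int totalIntervals;
  private int handedOut = 0;
  private int completed = 0;
  private int primesFound = 0;

  ProgressTracker(int totalIntervals) {
    this.totalIntervals = totalIntervals;
  }

  // Would be called from getInterval() each time a slice is given to a client.
  synchronized void intervalHandedOut() {
    handedOut++;
  }

  // Would be called from postResult() when a client reports back.
  synchronized void resultPosted(int primesInSlice) {
    completed++;
    primesFound += primesInSlice;
  }

  // The string getStatus() could return to the monitoring job.
  synchronized String status() {
    return completed + "/" + totalIntervals + " intervals done, "
        + (handedOut - completed) + " in progress, "
        + primesFound + " primes found";
  }

  public static void main(String[] args) {
    ProgressTracker tracker = new ProgressTracker(80);
    tracker.intervalHandedOut();
    tracker.intervalHandedOut();
    tracker.resultPosted(3);
    System.out.println(tracker.status());
  }
}
```

The methods are synchronized because RMI may serve several clients on different threads at the same time.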

Once this job is submitted, you just need to submit a second job that calls this new method and terminates; the progress is then available for download from the Grid Web UI.

The trick here is to find the location of the initial server. Recall that all jobs are submitted as different users on a server selected by the Grid Engine. To find the identity of the server, this solution relies on the qstat command.

Submit a job with a somewhat unique id:
svrResp=$(qsub -N mserv08 startServers.ksh)

Here the id is 'mserv08'. When the monitoring job is submitted, it first executes the qstat command:
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  58219 0.50500 clients    user0103     r     08/25/2006 16:51:22 all.q@nyc1r213cpn21.retail.nyc     1 8
  58642 0.60500 start      user0104     r     08/27/2006 00:50:38 all.q@nyc1r213cpn24.retail.nyc   102
  58219 0.50500 clients    user0103     r     08/25/2006 16:51:22 all.q@nyc1r213cpn32.retail.nyc     1 12
  59914 0.50500 clients    user0105     r     08/31/2006 00:04:37 all.q@nyc1r214cpn14.retail.nyc     1 5
  58219 0.50500 clients    user0103     r     08/25/2006 16:51:22 all.q@nyc1r214cpn15.retail.nyc     1 16
  58219 0.50500 clients    user0103     r     08/25/2006 16:51:22 all.q@nyc1r214cpn20.retail.nyc     1 1
  59917 0.50500 Admin      user0107     r     08/31/2006 00:05:37 all.q@nyc1r219cpn24.retail.nyc     1
  58219 0.50500 clients    user0103     r     08/25/2006 16:51:22 all.q@nyc1r219cpn28.retail.nyc     1 11
  58219 0.50500 clients    user0103     r     08/25/2006 16:51:22 all.q@nyc1r220cpn04.retail.nyc     1 13
  59913 0.50500 mserv08    user0105     r     08/31/2006 00:04:07 all.q@nyc1r220cpn17.retail.nyc     1
  58219 0.50500 clients    user0103     r     08/25/2006 16:51:22 all.q@nyc1r220cpn21.retail.nyc     1 14

The monitoring job then looks for the server the initial job is running on: nyc1r220cpn17.retail.nyc. After printing a hostname, I noticed that the full format of the host was nyc1r220cpn17.retail.nyc1.sungrid.net, so the job appends '1.sungrid.net' to get the fully qualified name (possibly nyc1r220cpn17 alone would have worked too). It then executes an RMI lookup on the server and calls the administration method:
Local host: nyc1r219cpn24.retail.nyc1.sungrid.net
Looking for host where process: mserv08 is running
Host is: nyc1r220cpn17.retail.nyc1.sungrid.net
Looking up rmi://nyc1r220cpn17.retail.nyc1.sungrid.net/MersenneSolver...
Status: ....
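
The host lookup can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are invented, the column positions are read off the qstat output shown above (name in column 3, queue in column 8), and the '1.sungrid.net' suffix is the workaround described in this post, which is specific to this grid.

```java
// Hypothetical sketch: names are illustrative, column positions come from
// the qstat output format shown in this post.
class QstatHostFinder {

  static final String SAMPLE =
      "job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID\n"
      + "  59917 0.50500 Admin      user0107     r     08/31/2006 00:05:37 all.q@nyc1r219cpn24.retail.nyc     1\n"
      + "  59913 0.50500 mserv08    user0105     r     08/31/2006 00:04:07 all.q@nyc1r220cpn17.retail.nyc     1\n";

  // Returns the (truncated) host of the first job with the given name,
  // or null if no such job is listed.
  static String findHost(String qstatOutput, String jobName) {
    for (String line : qstatOutput.split("\n")) {
      String[] cols = line.trim().split("\\s+");
      if (cols.length < 8 || !cols[2].equals(jobName)) {
        continue; // header, separator, or a different job
      }
      String queue = cols[7]; // e.g. all.q@nyc1r220cpn17.retail.nyc
      int at = queue.indexOf('@');
      if (at >= 0) {
        return queue.substring(at + 1);
      }
    }
    return null;
  }

  // qstat cuts the host name short; appending this suffix is the workaround
  // used in this post and only applies to this particular grid.
  static String qualify(String truncatedHost) {
    return truncatedHost + "1.sungrid.net";
  }

  public static void main(String[] args) {
    String host = findHost(SAMPLE, "mserv08");
    System.out.println("Host is: " + qualify(host));
  }
}
```

This is also why the job needs a somewhat unique name: the lookup takes the first row whose name matches.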


The drawback of this solution is that you need to submit a second job to monitor the initial real job, and this will cost you one dollar (even if the job takes only a couple of seconds!).
Posted at 08:10AM Aug 31, 2006 by Pascal Ledru in Grid Computing  |  Comments[1]

Friday Aug 18, 2006
Distributed Mersenne Primes on Sun Grid
Just finished running the distributed version of my test program. It only took 15 minutes using 5 processors of the Grid. An almost perfect speedup!


What I think I need to get used to when using the Grid are the few scripts that use the Grid Engine commands to start all the processes on the nodes of the Grid. They are basically the same as the ones used by the RMI examples listed on the community site, but I will still list these scripts as I found them useful:

cat Mersenne.ksh
#!/bin/ksh
#$ -N Mersenne
#$ -cwd

# Set the environment variables
if [ -f $HOME/.profile ]; then
   . $HOME/.profile
fi

numClients=5

echo "Starting the server..."
svrResp=$(qsub -N server startServers.ksh)
echo svrResp is $svrResp
svrJobId=$(echo "$svrResp" | awk '{print $3}')
echo svrJobId is $svrJobId

# Wait until the server is started
status="not running"
until [ "$status" == "r" ]
do
  status=$( qstat | nawk '/'$svrJobId'/ {print $5}' )
  echo Server job status is $status
  sleep 10
done

#Wait until the serverhost file is created
filename="$HOME/serverhost"
until test -f $filename
do
  sleep 10
done
# then pull the server node name from the file
servernode=$(cat $filename)
rm $filename

echo "Server is running on" $servernode "Submitting a set of clients to the grid for remote execution..."
qsub -N clients -t 1-$numClients startClient.ksh $servernode

echo "Submitting a cleanup job that will wait until the clients are complete"
qsub -hold_jid clients cleanup.ksh $svrJobId



 cat startServers.ksh
#!/bin/ksh
#$ -N startServers
#$ -cwd

echo "Starting the registry in the background..."
rmiregistry &

# Wait until the registry is started
# (the loop ends as soon as ps shows an rmiregistry process)
until ps -ef | grep "[r]miregistry" > /dev/null
do
  echo "Waiting for rmiregistry..."
  sleep 10
done
echo "rmiregistry is running"

# Place the name of this host in a file, so that clients can read it
hostname > $HOME/serverhost
echo "The servers location is" $(hostname)

echo "Starting the server..."
java MersenneServerImpl



 cat startClient.ksh
#!/bin/ksh
#$ -N startClient
#$ -cwd

servernode=$1

echo "Starting a client on $(hostname) to talk to server running on" $servernode
java MersenneClient $servernode



cat cleanup.ksh
#!/bin/ksh
#$ -N cleanup
#$ -cwd

# Set the environment variables
if [ -f $HOME/.profile ]; then
   . $HOME/.profile
fi

echo Killing the server, job number $1 ...
qdel $1

Posted at 03:10PM Aug 18, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Learning the hard way!
I learnt this one the hard way!

Do not hardcode the location of commands (e.g., Grid commands) in environment variables, as that location could change. DO NOT do something like:

#!/bin/ksh
#$ -N Test_qstat
#$ -cwd

SGETOOLS=/home/sgeadmin/n1ge60/bin/sol-x86
export SGETOOLS

echo "Starting Test program..."
$SGETOOLS/qstat
echo "End Test program..."



Instead use:

#!/bin/ksh
#$ -N Test_qstat
#$ -cwd

# Set the environment variables
if [ -f $HOME/.profile ]; then
   . $HOME/.profile
fi

echo "Starting Test program..."
echo `which qstat`
qstat
echo "End Test program..."

Posted at 02:24PM Aug 18, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Wednesday Aug 16, 2006
Grid Engine crash course
The three commands to get started with the Grid Engine are:
- qsub
- qdel
- qstat


qsub is used to submit a job to a grid, qstat is used to see which jobs are running on a grid, and qdel is used to delete a job from the grid. Here is a quick example:

cat startServer.ksh
#!/bin/ksh

SGETOOLS=/home/sgeadmin/N1GE/bin/sol-amd64
export SGETOOLS

numClients=10


$SGETOOLS/qsub -N client -t 1-$numClients startClient.ksh


This program starts 10 copies of the client program on the grid. Each client has access to some environment variables, such as SGE_TASK_ID:
> cat startClient.ksh
#!/bin/ksh
host=$( hostname )
echo "Starting client on" $host with ID  $SGE_TASK_ID >> /home/pascal/test1/file$host
proc=0
while [ "$proc" == 0 ]
do
  echo $proc
done
The output of qstat looks like:

> qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node01a                      1 1
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node01b                      1 2
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node02a                      1 3
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node02b                      1 10
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node03a                      1 9
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node04a                      1 7
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node04b                      1 5
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node05a                      1 8
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node05b                      1 6
   1221 0.55500 client     pascal       r     08/16/2006 17:39:50 all.q@node06b                      1 4
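
One thing a client can do with SGE_TASK_ID is pick its own share of a larger range of work. The sketch below is only an illustration: the TaskSlice class, the sliceFor method, and the range 1..5000 are made up, and it assumes the job was submitted with -t 1-10 as in the script above.

```java
// Hypothetical sketch: names and the 1..5000 range are illustrative only.
class TaskSlice {

  // Splits the inclusive range [first, last] into numTasks nearly equal
  // slices and returns {start, end} for the given 1-based task ID.
  static int[] sliceFor(int taskId, int numTasks, int first, int last) {
    int total = last - first + 1;
    int base = total / numTasks;
    int extra = total % numTasks; // the first 'extra' tasks get one more item
    int start = first + (taskId - 1) * base + Math.min(taskId - 1, extra);
    int size = base + (taskId <= extra ? 1 : 0);
    return new int[] {start, start + size - 1};
  }

  public static void main(String[] args) {
    int taskId = 1; // fallback when the variable is missing or not a number
    String raw = System.getenv("SGE_TASK_ID");
    if (raw != null) {
      try {
        taskId = Integer.parseInt(raw.trim());
      } catch (NumberFormatException ex) {
        // Not running as an array task: keep the fallback.
      }
    }
    // Assumes the job was submitted with -t 1-10.
    int[] slice = sliceFor(taskId, 10, 1, 5000);
    System.out.println("Task " + taskId + " handles " + slice[0] + ".." + slice[1]);
  }
}
```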

Posted at 02:48PM Aug 16, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]

Tuesday Aug 15, 2006
Distributed Mersenne Primes
Let's rewrite the previous program to take advantage of multiple CPUs! The idea is to have a master generating pieces of work, and some workers executing slices of the work and posting their results to the master. This is fairly straightforward to write in Java using RMI! Starting with the interface defining the master:

import java.rmi.*;

public interface MersenneServer extends Remote {
  public int[] getInterval() throws RemoteException;
  public void postResult(int[] values) throws RemoteException;
}

Workers simply grab slices of the work and test whether 2^p - 1 is prime for each exponent p in the slice:

import java.math.*;
import java.rmi.*;
import java.rmi.server.*;
import java.util.ArrayList;

public class MersenneClient {

  static final BigInteger one  = new BigInteger("1");
  static final BigInteger two  = new BigInteger("2");
  static final BigInteger four = new BigInteger("4");

  public MersenneClient() throws RemoteException {
  }

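  // Lucas-Lehmer test: returns true when 2^p - 1 is prime
  private static boolean LucasLehmerTest(int p) {
    if (p == 2) return true; // 2^2 - 1 = 3 is prime; the loop below needs p >= 3
    BigInteger s = four;
    BigInteger n = one.shiftLeft(p).subtract(one); // n = 2^p - 1
    for (int i = 3; i <= p; i++) {
      s = s.multiply(s).subtract(two).mod(n);
    }
    // n is prime when the final residue is zero
    return s.signum() == 0;
  }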


  public static void main(String[] args) {
    String host = args[0];
    String name = "rmi://" + host + "/MersenneSolver";
    System.out.println("Looking up " + name + "...");
    MersenneServer server = null;

    try {
      server = (MersenneServer)Naming.lookup(name);
    } catch (Exception ex) {
      System.out.println("Caught an exception looking up Solver.");
      ex.printStackTrace();
      System.exit(1);
    }

    while (true) {
      try {
        int[] interval = server.getInterval();
        if (interval == null) break; // no more intervals
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (int i = interval[0]; i <= interval[1]; i++) {
          if (LucasLehmerTest(i)) {
            list.add(i);
          }
        }
        int[] values = new int[list.size()];
        for (int i = 0; i < list.size(); i++) values[i] = list.get(i);
        server.postResult(values);
      } catch (RemoteException ex) {
        System.out.println("Caught remote exception.");
        System.out.println("Probably server shutdown as all intervals are evaluated");
        System.exit(1);
      }
    }
  }

}
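
To see what the loop in LucasLehmerTest computes, here is a small worked example by hand for p = 5 (chosen only for illustration). The loop runs p - 2 = 3 times, for i = 3, 4, 5, starting from s = 4 and reducing modulo n = 2^5 - 1 = 31:

```latex
\begin{align*}
s_0 &= 4 \\
s_1 &= (4^2 - 2) \bmod 31 = 14 \\
s_2 &= (14^2 - 2) \bmod 31 = 194 \bmod 31 = 8 \\
s_3 &= (8^2 - 2) \bmod 31 = 62 \bmod 31 = 0
\end{align*}
```

The final value is 0, so 2^5 - 1 = 31 is prime. For p = 4 the same steps modulo 15 give 14 and then 194 mod 15 = 14, which is not 0, so 2^4 - 1 = 15 is rejected.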


The implementation of the master generates slices and checks if all slices have been evaluated:

import java.rmi.*;
import java.rmi.server.UnicastRemoteObject;
public class MersenneServerImpl extends UnicastRemoteObject implements MersenneServer {

  private int i = 1;
  private int interval = 100;

  //private int totalInterval = 40;
  private int totalInterval = 80;

  public MersenneServerImpl() throws RemoteException {
  }

  // calculate the Mersenne primes with an exponent up to 5000
  // break the range into small intervals
  // each client will test the primes within a given interval
  public synchronized int[] getInterval() throws RemoteException {
    if (i >= 5000) return null;
    //if (i >= 3000) return null;
    if (i >= 2000) interval = 50;
    int j = i;
    int k = i + interval-1;
    i = i + interval;
    return new int[] {j, k};
  }

  public synchronized void postResult(int[] values) throws RemoteException {
    for (int i = 0; i < values.length; i++) {
      System.out.println("2^" + values[i] + "-1 is prime");
    }
    // check if we should exit
    totalInterval--;
    if (totalInterval == 0) System.exit(0);
  }

  public static void main(String[] args) {
    try {
      String name = "MersenneSolver";
      System.out.println("Registering Mersenne Solver");
      MersenneServerImpl solver = new MersenneServerImpl();
      Naming.rebind(name, solver);
      System.out.println("Remote Solver ready...");
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }

}
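
Note that totalInterval has to match the number of slices getInterval() hands out, otherwise the server exits too early or never exits. A quick standalone check, replaying the same slicing rule without any RMI (the IntervalCount class is just a throwaway name for this sketch):

```java
// Throwaway sketch: replays the slicing rule of getInterval() and counts
// how many intervals it produces.
class IntervalCount {

  // Returns {number of intervals, upper bound of the last interval}.
  static int[] countIntervals() {
    int i = 1;
    int interval = 100;
    int count = 0;
    int lastUpper = 0;
    while (i < 5000) {                 // getInterval() returns null once i >= 5000
      if (i >= 2000) interval = 50;    // smaller slices for the larger exponents
      lastUpper = i + interval - 1;
      i = i + interval;
      count++;
    }
    return new int[] {count, lastUpper};
  }

  public static void main(String[] args) {
    int[] result = countIntervals();
    System.out.println(result[0] + " intervals, last one ends at " + result[1]);
  }
}
```

There are 20 slices of 100 exponents up to 2000 and then 60 slices of 50 up to 5000, which is where the value 80 comes from.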


I first tested this program locally using this simple script:

#!/bin/ksh

echo "Starting registry"
rmiregistry &

# Wait until the registry is started
# (the loop ends as soon as ps shows an rmiregistry process)
until ps -ef | grep "[r]miregistry" > /dev/null
do
  echo "Waiting for rmiregistry..."
  sleep 10
done
echo "rmiregistry is running"

echo "Starting server"
java MersenneServerImpl &
sleep 10
echo "Starting client"
java MersenneClient localhost &
sleep 2
echo "Starting client"
java MersenneClient localhost &


Tomorrow, I will go over the Grid Engine commands to actually run this application on a Grid.
Posted at 11:41AM Aug 15, 2006 by Pascal Ledru in Grid Computing  |  Comments[0]