ISV engineering's HPC web log For HPC ISVs & OSS

Tuesday Jun 30, 2009

Getting an application (in this case, GLPK) into the OpenSolaris /contrib repository.[Read More]

Wednesday Jun 24, 2009

We are pleased to announce the availability of Sun HPC ClusterTools 8.2.   HPC ClusterTools provides a high-performance, production-quality Message Passing Interface (MPI) for Sun x86 and SPARC based systems.   HPC ClusterTools 8.2 continues to build on the success of CT8 where support for Linux, in addition to Solaris, was first introduced.  


HPC ClusterTools 8.2 is based on Open MPI 1.3.3 (open-mpi.org) and includes the following:

- InfiniBand, GbE, 10GbE, and Myrinet interconnect support

- IB multi-rail, IB QDR, and Mellanox ConnectX support

- MPI profiling with Sun Studio Analyzer, plus support for VampirTrace

- Application suspend/resume support

- Automatic Path Migration support

- Performance and scalability, including shared memory optimizations

- DTrace providers on Solaris

- Sun Studio, PGI, Intel, Pathscale, and GNU/gcc compiler support

- Plug-ins for Sun Grid Engine (SGE) and Portable Batch System (PBS)

- Totalview and Allinea DDT parallel debugger support

- Full MPI-2 standard implementation, including MPI I/O and one-sided communication

- Support for Linux (RHEL, SLES, CentOS), OpenSolaris, and Solaris

- Full Service Support offerings available from Sun

More details can be found at http://www.sun.com/software/products/clustertools/index.xml

We are announcing the availability of the Sun Grid Engine 6.2 Update 3

release.


Sun Grid Engine 6.2u3 is a feature update release. It delivers cloud

connectivity functionality, initial power saving support, and a few much

demanded features, and it completes the Microsoft Vista support.


A. What's new

=============


Amazon Elastic Cloud EC2 Adapter

--------------------------------

The Service Domain Manager (SDM) adds connectivity to Amazon Elastic Cloud EC2

and the ability to flexibly add execution hosts as needed on demand.


Initial Power Saving Support

----------------------------

A new power saving scheme in SDM enables the creation of a special resource

spare pool in which systems can be powered on or off when added to or removed from

this spare pool.


Service Domain Manager (SDM) Simple Install

-------------------------------------------

It is now possible to install and run an SDM system with only one JVM per

(managed or master) host. Previously, the system was using up to three

separate JVMs per host. This new feature simplifies installation,

configuration and maintenance.


SGE Inspect - Sun Grid Engine Inspect Module

-------------------------------------------

A new Java-based Sun Grid Engine Inspect module lets you monitor SGE

clusters and the Service Domain Manager (SDM).


Exclusive Host Scheduling

-------------------------

Exclusive host scheduling allows users to request that jobs and parallel

tasks run exclusively on a host if allowed by an administrator.


Microsoft Windows Vista Display Support

---------------------------------------

With the release of Sun Grid Engine 6.2u3, the display_win_gui feature is

now fully supported. display_win_gui can now be used to display a job GUI on

the visible Desktop, on both the 32- and 64-bit versions of Windows Vista

(Enterprise and Ultimate Edition), and on Windows Server 2008.


This feature allows a Sun Grid Engine job to request the "display_win_gui"

complex attribute, which launches a GUI on the currently visible Desktop on

the Windows host that displays job information. This works only if the job

is a native Windows application.


Changes to licensing

====================


This Sun Grid Engine version changes the terms under which the software can

be used. Without a valid Sun Grid Engine license, evaluation use is only

permitted for 90 days. The courtesy binaries that will be made available

will continue to allow unlimited use but will not include the Amazon EC2

adapter and the SGE Inspect modules.


Relevant links

==============


Download:

http://www.sun.com/software/sge


Documentation:

http://wikis.sun.com/display/gridengine62u3


Release Notes and detailed information on new features:

http://wikis.sun.com/display/gridengine62u3/Release+Notes


Patch Matrix:

http://wikis.sun.com/display/gridengine62u3/Patch+Matrix


Man Pages Online:

http://gridengine.sunsource.net/manpages.html


List of fixed bugs:

http://gridengine.sunsource.net/project/gridengine/62patches.txt


Announcing Sun HPC Software, Developer Edition 1.0 for OpenSolaris, which

provides a pre-configured, integrated development environment to enable

developers to quickly and efficiently create, debug and deploy parallel

applications. It takes advantage of virtualization for easy installation and

seamlessly integrates with a cloud environment for extreme scalability.


At a Glance

- Fully featured HPC development environment distributed as a virtual machine

(VM).

- Pre-configured as a grid enabled, virtual HPC cluster comprised of three

OpenSolaris zones.

- Turn-key parallel application development environment with distributed

resource management and cloud connectivity built in and ready to go.

- Sun Studio and Sun HPC ClusterTools come pre-installed and configured,

providing Fortran, C and C++ compilers, MPI libraries, performance analysis

and debugging tools, high performance scientific libraries and an intuitive IDE

for application development.

- Sample applications are included in the installation to get you up and

running quickly.


Learn more at:

http://www.sun.com/software/products/hpcsoftware/hpcdev


This software is particularly aimed at professors and students for parallel

programming classes since it is very lightweight and provides a complete

development environment for HPC on a laptop.


Sun Studio 12 Update 1, the latest production release of Sun Studio Compilers and Tools, is now available for download:

http://developers.sun.com/sunstudio/


Sun Studio 12 Update 1 is supported on Solaris 10, OpenSolaris (2008.11 and 2009.06), and the leading Linux distributions (SuSE Linux Enterprise Server 10, Red Hat Enterprise Linux 5, CentOS 5). Feature highlights since Sun Studio 12 include:


* C, C++ and Fortran compiler optimizations for the latest UltraSPARC and SPARC64-based architectures

* C, C++ and Fortran compiler optimizations for the latest x86 architectures from Intel and AMD, including SSSE3, SSE4a, SSE4.1, and SSE4.2 compiler intrinsics support

* Compiler, debugger, and profiling support for OpenMP 3.0

* Profiling of distributed MPI-based applications

* DLight - New tool for unified application and system profiling using Dynamic Tracing (DTrace) technology on Solaris platforms

* dbxTool - New stand-alone graphical debugger

* Highly tuned and parallelized scientific libraries, including ScaLAPACK

* Updated IDE based on NetBeans 6.5.1 software


Sun Studio 12 Update 1 Features page: http://developers.sun.com/sunstudio/features/index.jsp

Sun Studio 12 Update 1 Press Release: http://www.sun.com/aboutsun/pr/2009-06/sunflash.20090623.1.xml

Sun Studio Blogging Contest: http://developers.sun.com/sunstudio/community/campaigns/blogcontest_062009/welcome.jsp

Monday May 04, 2009

New version of HAR performance monitoring tool. [Read More]

Friday Apr 17, 2009

Sun's new Nehalem-based servers are ready for running LS-DYNA in a very efficient way. (This is a reposting from my Sun blog.)

[Read More]

Tuesday Apr 14, 2009


Having been involved in MD Nastran performance tuning and benchmarking on Sun hardware for many years, I always take notice when I see stand-out performance on new hardware.  I saw exactly that during some recent MD Nastran benchmarking on the new Intel Xeon Processor 5500 Series (aka Nehalem) found in the Sun Fire X4270 server.  This benchmarking was part of a larger effort to gather performance data across various Sun hardware configurations; my goal was to study the effects of different processor, disk, and memory configurations on MD Nastran performance (elapsed and CPU times).  The Sun Fire X4270 showed exceptional performance (reduced elapsed times) compared with every other platform and configuration I tested.  As one example from my benchmark study, I chose the following Sun Fire X4150 configuration as the baseline for this blog:

    Sun Fire X4150 Server

    2x Xeon X5460 3.1 GHz processors, 24GB RAM

    4x 146GB 10K RPM SAS drives

    OS: Solaris 10


     The new Sun Fire X4270 configuration:

     Sun Fire X4270

     2x Xeon X5570 2.9 GHz processors, 24GB RAM

     4x 146GB 10K RPM SAS drives

     OS: Solaris 10


On both servers I used one disk for the Solaris 10 OS, MD Nastran binary, and the standard MD Nastran *.f04,*.f06, and *.log output files.  I configured the remaining  disks with ZFS and used these for the MD Nastran database files (more on ZFS later).


The table below shows the "% reduction in elapsed time" for the MD Nastran MDR3 benchmarks on the Sun Fire X4270 compared to the Sun Fire X4150:


 Benchmark    Reduction in elapsed time
 getrag       34%
 gm20a_1_1    27%
 md0mdf1_1    32%
 xl1fn40      32%
 vl0sst1      16%
 xl0imf1      55%
 xl0tdf1_1    17%
 xx0cmd2_1    35%

A description of these MD Nastran MDR3 benchmarks can be found on MSC.Software's performance web site.
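
For readers who want to compute the same metric on their own runs, the percentages above are relative reductions in elapsed time. Here is a minimal shell sketch, assuming elapsed times measured in seconds. The function name pct_reduction and the sample timings are hypothetical; the table above reports only percentages, not raw times.

```shell
# pct_reduction OLD NEW
# Prints the percent reduction in elapsed time going from OLD to NEW,
# rounded to the nearest whole percent. Both values use the same unit.
pct_reduction() {
    awk -v old="$1" -v new="$2" 'BEGIN {
        if (old <= 0) { print "error: baseline must be positive"; exit 1 }
        printf "%.0f\n", (old - new) * 100 / old
    }'
}

# Hypothetical timings, for illustration only: a job that took 200 s
# on the baseline server and 90 s on the new server.
pct_reduction 200 90    # prints 55
```

A result of 55 means the new run needed 55% less wall-clock time than the baseline, which is how the table entries should be read.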

 


 

As I mentioned earlier, this performance data came out of a larger study I was involved in to look at various machine configurations and their effect on MD Nastran performance.  I'll cover that study in more detail in a future blog; here are a few highlights:

1. Solid State Drive (SSD) performance vs SAS disk:

During my performance study with SSDs I saw some noteworthy performance improvements (elapsed time reductions) on some of the individual MD Nastran benchmarks above, and also when I ran a combination of these benchmarks concurrently on the same machine.  In some cases I saw up to a 56% reduction in elapsed time when using the SSDs compared to the internal SAS disks, with the amount of reduction corresponding to the amount of I/O and memory used by the benchmark(s).  For example, the relatively large MD Nastran DMP (Distributed Memory Parallel) benchmark "xx0cmd2_8, DMP=8" was 49% faster (elapsed time) using the SSDs compared to the internal SAS disks, while smaller benchmarks like "getrag" showed a 19% reduction in elapsed time.  I was also able to get a significant performance improvement using the SSDs when I ran a combination of Nastran benchmarks concurrently on one machine.  For example, running 3 benchmarks concurrently (two getrag jobs and one xl1fn40 job), which together used 20GB of memory and generated 1TB of total I/O, I saw a 44-56% reduction in elapsed time with the SSDs compared to the internal disks.  The reason for the range of 44-56% is explained in the next section below on "ZFS Intent Log (ZIL)".


Here's the configuration I used for this SSD benchmarking:

     Sun Fire X4270

     2x Xeon X5570 2.9 GHz processors, 24GB RAM

     4x 146GB 10K RPM SAS drives (formatted with ZFS (RAID 0))

     3x 32GB SSDs (formatted with ZFS (RAID 0))

     OS: Solaris 10


2. ZFS Intent Log (ZIL):

I've blogged in the past on the benefits and "ease-of-use" of using "ZFS with MD Nastran".

Recently I discovered a ZFS configuration option worth noting, related to the ZFS Intent Log (ZIL).  By simply turning off the ZIL I was able to turn the 44% reduction in elapsed time (mentioned above in my discussion on SSD performance) into a 56% reduction.  The ZIL is the mechanism ZFS uses to guarantee that synchronous writes are not lost in the event of a machine crash.  However, if you're running an MD Nastran "scratch (scr=yes)" job, you will probably be comfortable experimenting with the ZIL turned off to see whether your particular mix of jobs benefits.  I'm currently running various Nastran benchmark combinations with the ZIL turned off and will post the results of that study in a future blog.

To turn off the ZIL edit the /etc/system file and add the following:

set zfs:zil_disable=1
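
If you script this edit, it helps to make it safe to run more than once. The following is a minimal sketch, assuming bash or ksh and a POSIX grep (on Solaris 10 that would be /usr/xpg4/bin/grep). The helper name add_line_once is hypothetical, and the example deliberately works on a scratch file rather than the real /etc/system. Settings in /etc/system are read at boot, so the change takes effect after the next reboot.

```shell
# add_line_once FILE LINE
# Appends LINE to FILE only if an identical line is not already present,
# so running it twice does not duplicate the setting.
add_line_once() {
    file=$1
    line=$2
    if ! grep -Fqx -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstrate on a scratch file. Point it at /etc/system (as root) only
# after keeping a backup copy of the original file.
scratch=$(mktemp)
add_line_once "$scratch" 'set zfs:zil_disable=1'
add_line_once "$scratch" 'set zfs:zil_disable=1'
cat "$scratch"    # the setting appears once
```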

 

For more information:

1. Sun Fire X4270 server 

2. MD Nastran 

3. Sun Fire X4150 server 

4. Solaris 10 OS 

5. Solid State Drive (SSD) 

 

 

 

 

Thursday Mar 19, 2009

 In my prior blogs in this series on integrating "Sun Grid Engine and MSC.Software's MD Nastran" I described the following:

1. "Sun Grid Engine and MD Nastran" [recommended SGE configurations/queues for MD Nastran users]  

2. "Part 1--How to submit "MD Nastran" (serial) jobs with Sun Grid Engine"

3. "Part 2--How to submit "MD Nastran" DMP (Distributed Memory Parallel) jobs with Sun Grid Engine"  

4. "Part 3--How to configure consumable resources (Disk, Memory, and License Tokens)"

In this final blog in the series I'll describe some SGE configuration details I mentioned in the earlier blogs.

1.  How to create "runtime limiting queues"
2.  How to create an "SGE parallel environment"

First, a quick comment on my use of SGE's CLI (command line interface) instead of the GUI interface tool (QMON).
I continue to give examples using the SGE's CLI instead of the  GUI interface (QMON) to configure SGE because I've found the CLI to be an extremely flexible and powerful way to change settings once you become familiar with SGE. [That said, I still go back to the SGE QMON GUI interface tool when I'm not sure how something works--then it's a really great way to see the relationship of a feature or setting to the other components within SGE]



Section I:  "How to create runtime limiting queues"

Step #1. Execute the qconf command to make a small, medium, and large queue

#qconf -aq small.q
#qconf -aq medium.q
#qconf -aq large.q

Step #2: Modify the queues "small.q, medium.q, and large.q" to have different h_rt (runtime (elapsed time) limits).


--> First, set the small.q to have complex_value h_rt=5 minutes
# qconf -mattr queue complex_values h_rt=00:05:00 small.q
"complex_values" of "small.q" is empty - Adding new element(s).
root@tm19-232 modified "small.q" in cluster queue list

-->Second, set the medium.q to complex value h_rt=30 minutes
# qconf -mattr queue complex_values h_rt=00:30:00 medium.q
"complex_values" of "medium.q" is empty - Adding new element(s).
root@tm19-231 modified "medium.q" in cluster queue list

-->Third, set the large.q to complex value h_rt="very large number"
# qconf -mattr queue complex_values h_rt=99:00:00 large.q
"complex_values" of "large.q" is empty - Adding new element(s).

Now verify that the above values/limits have been set correctly:
For example, to check time limit of 5 minutes (300 seconds) for small.q:
tm19-231:/dpl/sge.sc08#qconf -sq small.q
qname                 small.q
hostlist              tm19-231
.....
complex_values        h_rt=300
.....
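
Note that qconf reports the limit in seconds (h_rt=300) even though it was entered as 00:05:00. A small sketch of that conversion, useful for checking the other queues; the function name hms_to_seconds is hypothetical and not part of SGE:

```shell
# hms_to_seconds HH:MM:SS
# Converts an hours:minutes:seconds limit into plain seconds, the form
# in which "qconf -sq" displays h_rt in the listing above.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# The three limits used for small.q, medium.q, and large.q.
for limit in 00:05:00 00:30:00 99:00:00; do
    echo "$limit = $(hms_to_seconds "$limit") seconds"
done
```

The first line of output, "00:05:00 = 300 seconds", matches the h_rt=300 shown for small.q.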


Section II:   How to make a parallel environment (PE) called "nastran" for Nastran DMP (Distributed Memory Parallel) jobs:
First, some background.
What is a parallel environment?  A parallel environment within SGE enables concurrent computing on parallel platforms in networked environments.
Before you continue you might want to also read my earlier blog on MD Nastran's DMP (Distributed Memory Parallel) capability:  

Here are the steps for creating a Nastran "parallel environment" in SGE.

Step #1: Create the "nastran" parallel environment

First check to see what parallel environment(s) already exist in your SGE environment:
tm19-231:/dpl/sge.sc08#qconf -spl
make

Now add a  parallel environment  (let's call it "nastran") to SGE using the following "qconf -ap" command:
[the -ap option (add parallel environment) displays an editor containing a parallel environment configuration template.]
tm19-231:/dpl/sge.sc08#qconf -ap nastran
pe_name           nastran
slots              0
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE

Now edit two of the default settings above ("slots" and "allocation_rule"): (1) "slots 16" to utilize all cores in my two 8-core machines, and (2) "allocation_rule $round_robin" so that SGE spreads a job's slots across the machines in the parallel environment queue, one host at a time. The listing below also shows "control_slaves" set to TRUE and "job_is_first_task" set to FALSE.

tm19-231:/dpl/sge.sc08#qconf -sp nastran
pe_name            nastran
slots              16
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
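
To get a feel for what $round_robin does compared with the default $pe_slots (which keeps all of a job's slots on a single host), here is a simplified model in shell. It is only a sketch of the dealing pattern, not SGE's scheduler: it ignores per-host slot limits and load thresholds, and the function name round_robin_alloc is hypothetical.

```shell
# round_robin_alloc TASKS HOST...
# Prints "host count" pairs showing how many of TASKS slots each host
# receives when slots are dealt out one host at a time.
round_robin_alloc() {
    tasks=$1
    shift
    nhosts=$#
    if [ "$nhosts" -eq 0 ]; then
        echo "round_robin_alloc: no hosts given" >&2
        return 1
    fi
    i=0
    for host in "$@"; do
        # Dealing one slot per host in turn gives every host the base
        # share, and the first (tasks mod nhosts) hosts one extra slot.
        share=$(( tasks / nhosts ))
        if [ "$i" -lt $(( tasks % nhosts )) ]; then
            share=$(( share + 1 ))
        fi
        echo "$host $share"
        i=$(( i + 1 ))
    done
}

# A 2-slot job over the two hosts used in this post: one slot each.
round_robin_alloc 2 tm19-231 tm19-232
```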



Step #2: Attach/associate one or more "queues" to the "nastran" parallel environment using "qconf -mq"
...for this example I'm only attaching queue "large.q" to the parallel environment "nastran".

tm19-231:/dpl/sge.sc08#qconf -mq large.q
qname                 large.q
hostlist              tm19-231 tm19-232
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 8
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
"/var/tmp/9512-O3JEa7" 50 lines, 1494 characters

....now edit the above "pe_list"  setting to add the "nastran" parallel environment
to this "large.q" queue.

qname                 large.q
hostlist              tm19-231 tm19-232
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make nastran
rerun                 FALSE
slots                 8
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
"/var/tmp/9512-O3JEa7" 50 lines, 1494 characters

Step #3: You're now ready to submit jobs to the "nastran" parallel environment:

[The queues available for these jobs are only those that have been associated with the parallel environment "nastran". In my example only "large.q" is associated with the "nastran" parallel environment; however, you can also add other queues (like small.q and medium.q described earlier).  In addition to being associated with the parallel environment, a queue must also satisfy any resource requirements specified with "qsub -l" when you submit your job.]

...as described in a prior blog here's the script I used to submit a Nastran job to the parallel environment "nastran".

#cat nastran_qsub.sh
qsub -pe nastran 2 -q large.q  -l mem_free=6G -l nastran_tokens=`token_estimate.sh`  -l export_size=10G  -S /bin/ksh nastran_sge.sh

The above SGE command line defines the following:
1. "qsub" : qsub is the SGE command used to submit the Nastran job submittal script "nastran_sge.sh" to the parallel environment "nastran" I configured within SGE.  The standard Nastran job submittal command "mdnast2008..." that Nastran users are familiar with is in the script "nastran_sge.sh" (see Step 2. below).

2. "-pe nastran 2" : This tells qsub to submit "nastran_sge.sh" to the SGE parallel environment "nastran".  The "nastran_sge.sh" script will then start two MD Nastran jobs (running in parallel) on one or more of the host machines defined in the queue (large.q).
[Suitable queues for this job, like large.q, are queues that are associated with the parallel environment "nastran". Suitable queues must also satisfy the resource requirements specified with qsub -l (see item 4 below).]

3. "-q large.q" : I configured three SGE queues for my Nastran environment (small.q, medium.q, and large.q), each one having a different limit on the amount of elapsed time allowed for jobs within the queue.  In this example I chose large.q to handle a long-running (elapsed time) job.

4. "-l mem_free=6G -l nastran_tokens=`token_estimate.sh` -l export_size=10G" : These define the "complex resource attributes" that I configured within SGE. If any of these requirements (free memory, tokens, or disk space) is not met, the job will not be dispatched.
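
When experimenting with different slot counts and resource requests, it can help to print the qsub command line before actually submitting it. A minimal sketch follows; the function name build_nastran_qsub is hypothetical, and the token count of 100 is a placeholder for whatever token_estimate.sh would return for a real job.

```shell
# build_nastran_qsub SLOTS QUEUE MEM TOKENS DISK
# Prints (does not run) the qsub command line described above, so the
# resource requests can be reviewed before the job is submitted.
build_nastran_qsub() {
    echo "qsub -pe nastran $1 -q $2 -l mem_free=$3 -l nastran_tokens=$4 -l export_size=$5 -S /bin/ksh nastran_sge.sh"
}

# Same values as the example in this post; 100 tokens is a placeholder.
build_nastran_qsub 2 large.q 6G 100 10G
```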



This concludes my series of blogs on integrating "Sun Grid Engine and MD Nastran".  Feel free to send me comments or suggestions on additional configurations/queues/settings that you think might be useful to MD Nastran users. Based on the input I receive I may start up another series of blogs on this topic to cover those additional suggestions.













 

 

Wednesday Mar 18, 2009

Sun Studio Express 3/09, the official build used for the Sun Studio 12

Update 1 Early Access Program, is now available for download:

http://developers.sun.com/sunstudio/downloads/express/index.jsp


The Sun Studio Early Access build is available on Solaris, OpenSolaris

and the latest Linux distributions (Red Hat Enterprise Linux, SuSE Linux

Enterprise, CentOS, Ubuntu). Feature highlights since the Sun Studio 12

release include:


    * C/C++/Fortran compiler optimizations for the latest x86

architectures from Intel and AMD including SSSE3, SSE4a, SSE4.1, SSE4.2

compiler intrinsics support

    * C/C++/Fortran compiler optimizations for the latest UltraSPARC

and SPARC64-based architectures

    * DLight - New tool to utilize and visualize the power of Solaris

Dynamic Tracing (DTrace) technology

    * dbxTool - New stand-alone GUI debugger

    * Full OpenMP 3.0 compilers and tools support

    * MPI performance analysis in the Performance Analyzer

    * NetBeans IDE 6.5 including new remote development features


These new features are described in the readme and wiki pages:

Readme: http://developers.sun.com/sunstudio/downloads/ssx/express_March2009.html

Wiki pages: http://wikis.sun.com/display/SunStudio/Sun+Studio+Express+March+2009+Release


The Early Access program gives developers a chance to evaluate new

capabilities of Sun Studio software, provide their feedback to the

product team and influence future product releases.  Please encourage

your customers' participation in this program so that we can gather

valuable feedback to assess the readiness of our release.


Sun Studio 12 Update 1 Early Access Program:

http://developers.sun.com/sunstudio/overview/earlyaccess/index.jsp


Tuesday Mar 17, 2009

LUG09 - Seventh Annual Lustre User Group Meeting

April 16-17, 2009

Cavallo Point Lodge

Sausalito, California


Preliminary agenda: 

https://www.regonline.com/custImages/241834/LUG/090312lug09agenda-P1.pdf


Registration is now open for the Lustre User Group, the premier event for learning new 

technical information, acquiring best practices, and sharing knowledge about Lustre

 technology. LUG09 is a once-a-year opportunity for users to get answers, advice, and 

suggestions regarding their specific Lustre implementations. 


Attendees will have access to experts and peers who will share their real-world experiences. 

With updates on the community development project, Birds of a Feather sessions, demos, 

and tutorials, LUG09 is the perfect opportunity to meet with the Lustre development team 

and discuss upcoming enhancements and capabilities.


Hurry! To take advantage of the $350 Early Bird registration rate, you must register 

by April 1, 2009.

http://www.regonline.com/LUG09



Special Course Available: Lustre Advanced Administration and Support

April 15, 2009

Cavallo Point Lodge

Sausalito, California


For the first time ever, a special course on Lustre Advanced Administration and Support 

will be offered on April 15 before the User Group meeting. This course is designed for 

people who already have a good understanding of and experience with the Lustre File System. 

The class will cover a host of advanced architectural and support techniques. Space will be 

limited for this course and tuition discounts will be offered for LUG attendees.


Register now for the Lustre Advanced Administration and Support Seminar. LUG attendees 

will get a special discount on course registration.


See you at LUG09!


If you have any questions or interest about LUG09, please contact us at

LUG2009@SUN.COM


Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters.

For this setup we will use the following software packages:

1. Ganglia - the core Ganglia package

2. Zlib - zlib compression libraries

3. Libgcc - low-level runtime library

4. Rrdtool - Round Robin Database graphing tool

5. Apache web server with php support

You can get packages 1-3 from sunfreeware (choose the build for your architecture, x86 or SPARC).

Unzip and Install the packages

1. gzip -d ganglia-3.0.7-sol10-sparc-local.gz
  
   pkgadd -d ./ganglia-3.0.7-sol10-sparc-local
 
2. gzip -d zlib-1.2.3-sol10-sparc-local.gz 
  
   pkgadd -d ./zlib-1.2.3-sol10-sparc-local 
  
3. gzip -d libgcc-3.4.6-sol10-sparc-local.gz
  
   pkgadd -d ./libgcc-3.4.6-sol10-sparc-local
  

4. You will need pkgutil from Blastwave in order to install the rrdtool software packages

  

/usr/sfw/bin/wget http://blastwave.network.com/csw/unstable/sparc/5.8/pkgutil-1.2.1,REV=2008.11.28-SunOS5.8-sparc-CSW.pkg.gz


 
gunzip pkgutil-1.2.1,REV=2008.11.28-SunOS5.8-sparc-CSW.pkg.gz


 
pkgadd -d pkgutil-1.2.1,REV=2008.11.28-SunOS5.8-sparc-CSW.pkg

 

Now you can install packages with all required dependencies with a single command:

/opt/csw/bin/pkgutil -i rrdtool
  
5. You will need to download Apache, PHP, and the core libraries from Cool Stack

Core libraries used by other packages

bzip2 -d CSKruntime_1.3.1_sparc.pkg.bz2
  
pkgadd -d ./CSKruntime_1.3.1_sparc.pkg
  


Apache 2.2.9,  PHP 5.2.6

 bzip2 -d CSKamp_1.3.1_sparc.pkg.bz2
  
 pkgadd -d ./CSKamp_1.3.1_sparc.pkg
  
The following packages are available:
 
  1  CSKapache2     Apache httpd
  
                    (sparc) 2.2.9
   
  2  CSKmysql32     MySQL 5.1.25 32bit
    
                    (sparc) 5.1.25
 
  3  CSKphp5        PHP 5
  
                    (sparc) 5.2.6
  
Select package(s) you wish to process (or 'all' to process
   
all packages). (default: all) [?,??,q]:1,3
 

Select options 1 and 3.

Enable the web server service

svcadm enable svc:/network/http:apache22-csk 
 

Verify it is working

svcs svc:/network/http:apache22-csk


STATE          STIME    FMRI


online         17:02:13 svc:/network/http:apache22-csk
  

 Locate the Web server  DocumentRoot

grep DocumentRoot /opt/coolstack/apache2/conf/httpd.conf
 
DocumentRoot "/opt/coolstack/apache2/htdocs" 
  

Copy the Ganglia directory tree

cp -rp /usr/local/doc/ganglia/web  /opt/coolstack/apache2/htdocs/ganglia
 

Change the rrdtool path on  /opt/coolstack/apache2/htdocs/ganglia/conf.php

from /usr/bin/rrdtool  to /opt/csw/bin/rrdtool
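
If you prefer not to edit the file by hand, the same change can be made with sed. This is a minimal sketch, assuming bash or ksh and that mktemp is available; the helper name set_rrdtool_path is hypothetical, and the sample line written to the scratch file is only an assumption about what the rrdtool setting in conf.php looks like.

```shell
# set_rrdtool_path FILE
# Replaces /usr/bin/rrdtool with /opt/csw/bin/rrdtool everywhere in FILE.
set_rrdtool_path() {
    file=$1
    tmp=$(mktemp)
    # Use '|' as the sed delimiter so the slashes in the paths need no
    # escaping. Writing back with cat keeps FILE's owner and permissions.
    sed 's|/usr/bin/rrdtool|/opt/csw/bin/rrdtool|g' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstrate on a scratch file with an assumed sample line.
scratch=$(mktemp)
echo 'define("RRDTOOL", "/usr/bin/rrdtool");' > "$scratch"
set_rrdtool_path "$scratch"
cat "$scratch"
```

Running the function a second time leaves the file unchanged, because the old path no longer appears in it.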


 


 

Generate the default gmond configuration

/usr/local/sbin/gmond --default_config > /etc/gmond.conf
 

Edit /etc/gmond.conf and change name = "unspecified" to name = "grid1" (this is our grid name).

Start the gmond daemon

/usr/local/sbin/gmond

Verify that it has started:

ps -ef | grep gmond 
nobody  3774 1 0 16:57:41 ? 0:55 /usr/local/gmond
 

In order to debug any problem, try:

/usr/local/sbin/gmond --debug=9
 

Build the directory for the rrd images

mkdir -p /var/lib/ganglia/rrds
chown -R nobody  /var/lib/ganglia/rrds  
  
Add the following line to /etc/gmetad.conf
data_source "grid1"  localhost 
  

Start the gmetad daemon

/usr/local/sbin/gmetad
  

Verify it -->

 ps -ef | grep gmetad

nobody  4350     1   0 17:10:30 ?           0:24 /usr/local/sbin/gmetad
 

To debug any problem

  
/usr/local/sbin/gmetad --debug=9

Point your browser to: http://server-name/ganglia


  

  

  

Wednesday Mar 04, 2009

We are announcing today the availability of the Sun Grid Engine 6.2 Update 2

release. We are also announcing a major reconstruction of the SGE wiki docs

which will lead to better usability and navigation.


SGE 6.2u2 is a "feature update" release. We are delivering a few much

demanded features, scalability improvements and memory footprint reductions

in huge HPC clusters, and bug fixes.


There are no changes in licensing and pricing. Patches will be available

within the next 24 hours on Sunsolve. Open source courtesy binaries will be

made available next week.


What's new

==========


GUI Installer

------------

Sun Grid Engine 6.2u2 comes with a new GUI installer to simplify the

installation process. The GUI installer enables you to easily install a

whole cluster interactively. To install a cluster, you need to set up the

environment in a similar way to an automatic installation.


Job Submission Verifiers (JSVs)

-------------------------------

JSVs allow users and administrators to define rules that determine which

jobs are allowed to enter into a cluster and which jobs should be rejected

immediately. A JSV is a script or binary that can be used to verify, modify,

or reject a job during the time of job submission or on the master host.


Consumable Resources Per Job

----------------------------

Consumable complex attributes can now be configured per job. Such

consumables are consumed as requested and are no longer multiplied by the

requested slots. This makes resource requests for parallel jobs much easier

to define, especially when using slot ranges.
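

The difference is easiest to see with numbers. A minimal sketch of the arithmetic in shell; the function name consumed_amount and the mode names per_slot and per_job are hypothetical labels for this example, not SGE configuration keywords.

```shell
# consumed_amount REQUEST SLOTS MODE
# Prints how much of a consumable a job uses up: a per-slot consumable
# is multiplied by the number of slots, a per-job consumable is not.
consumed_amount() {
    case $3 in
        per_slot) echo $(( $1 * $2 )) ;;
        per_job)  echo "$1" ;;
        *) echo "consumed_amount: unknown mode '$3'" >&2; return 1 ;;
    esac
}

# A parallel job requesting 10 units of a resource on 8 slots.
consumed_amount 10 8 per_slot   # prints 80
consumed_amount 10 8 per_job    # prints 10
```

With a slot range, the per-slot total depends on how many slots the scheduler finally grants, while the per-job amount stays the same.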


jemalloc Library

----------------

Linux distributions (x64 platforms) come with a default memory allocator

library which is not as efficient as the open source jemalloc memory

allocator library also used by the Firefox browser. SGE 6.2 Update 2

replaces the native Linux malloc library with the jemalloc library. This has

a positive effect on the master host performance in large and high

throughput Sun Grid Engine clusters on Linux and reduces the memory

footprint by up to 20%. This will lead to a significant performance increase.


Relevant links

--------------


   Download:

   http://www.sun.com/software/sge/get_it.jsp


   Documentation

   http://wikis.sun.com/display/gridengine62u2


   Release Notes:

   http://wikis.sun.com/display/gridengine62u2/Release+Notes


   Patch Matrix:

   http://wikis.sun.com/display/gridengine62u2/Patch+Matrix


   Man Pages Online:

   http://gridengine.sunsource.net/manpages.html


Monday Feb 23, 2009

Please mark your calendars for the upcoming Sun HPC Consortium event

This year's European event will take place in the city of Hamburg, Germany.

We will meet before the ISC'09 event (http://www.supercomp.de/isc09/) starting on June 23rd.

As usual, we will have our customers and partners report on their latest research and systems.

This is the place to hear the latest technology updates from Sun and how partners and customers are using it. 

*LUG09 - Seventh Annual Lustre User Group Meeting*

*April 16-17, 2009*

*The Lodge at Golden Gate*

*Sausalito, California*


Sun Microsystems welcomes you to the Lustre User Group, the premier

event for learning new technical information, acquiring best practices,

and sharing knowledge about Lustre technology. LUG09 is a once-a-year

opportunity for users to get answers, advice, and suggestions regarding

their specific Lustre implementations.


Attendees will have access to experts and peers who will share their

real-world experiences. With updates on the community development

project, Birds of a Feather sessions, demos, and tutorials, LUG09 is the

perfect opportunity to meet with the Lustre development team and discuss

upcoming enhancements and capabilities.

*Lustre Advanced User Seminar*


Lustre Advanced User Seminar will be offered on *April 15* before the

User Group meeting. This seminar is designed for senior systems

administrators, engineers and integrators needing more comprehensive

knowledge of Lustre Administration and Troubleshooting techniques. Prior

completion of Lustre Administration and Support Level 1 (ES-288) and/or

prior experience administrating Lustre is strongly recommended in order

to receive maximum value from this seminar. Space will be limited and

registration fee discounts will be offered for LUG attendees.



Links to the LUG09 Registration Site will be posted soon at Lustre.org

<http://lustre.org/>, so stay tuned for further announcements.

If you have any questions or interest about LUG09, please contact us at

LUG2009@SUN.COM <mailto:LUG2009@SUN.COM>


See you at LUG09!


Sunday Feb 22, 2009

In this blog I continue my discussion on how to integrate "Sun Grid Engine and MD Nastran".

I'll explain how to utilize SGE to manage disk space, memory, and MD Nastran license tokens.

In my first blog in this series I discussed 2 frequent questions that Nastran users have before submitting their jobs:

1.) "What machine(s) have enough disk space to satisfy the Nastran output database file requirements (scratch and scr300)?"

2.) "What machine(s) have enough physical memory for optimum Nastran performance?"

I also mentioned in the prior blog that I wanted to show how to solve the issue of managing Nastran jobs in an environment that has a
limited pool of license "tokens"--(utilizing a combination of MSC's "ESTIMATE" program together with SGE's ability to
track "consumable resources").


Before I begin here's some background on SGE terminology that I'll be referring to:

1.) SGE has what's called "resource attributes" that are stored in an entity called the Grid Engine "complex".
Disk space, machine memory, and license tokens are examples of resource attributes.

2.) A resource attribute can be classified/defined as a "consumable resource"--when this is the case SGE will
do the bookkeeping necessary to track the availability of the resource (increasing or decreasing the
availability based on usage). Note that this availability can be monitored/modified through two different
methods: (1) by telling SGE how much of the resource you'll be using on the SGE job submittal command
(e.g., -l disk_space=10G), and/or (2) by using a "load sensor" script that dynamically determines the availability
of the "consumable resource"--a script that runs in the background and periodically determines the availability
of a particular resource.

 

The rest of this blog will be separated into 3 sections:

Section 1: "How to manage disk space"

Section 2: "How to manage Memory"

Section 3: "How to manage license tokens"

 



-->Section 1: "How to manage disk space"

 


Step #1: Create a "load sensor" script that does the following: (1) dynamically determines disk space on the filesystem of interest to you,
and (2) sets the corresponding "resource attribute" to the available disk space (in this case I created a "consumable resource" called "export_size").
[In Step 2 below I show how I added this "export_size" consumable resource to the SGE environment.]

In this example, I decided to use "/export" as the filesystem of interest--to change this to your filesystem just change the line [FS="/export"].

Here's the script (disk_space.sh) that I used to determine free disk space on /export:

tm19-231:$PWD#cat disk_space.sh

#!/bin/sh

FS="/export"

export FS

myhost=`uname -n`

ende=false

while [ $ende = false ]; do

   # ----------------------------------------

   # wait for an input

   #

   read input

   result=$?

   if [ $result != 0 ]; then

      ende=true

      break

   fi


   if [ "$input" = "quit" ]; then

      ende=true

      break

   fi

   #set export_size to free space on /export

     dfoutput="`df -kh $FS | tail -1`"

     diskfree=`echo $dfoutput | awk '{ print $4}'`

     echo begin

     echo "$myhost:export_size:${diskfree}"

     echo end

done

# reached once stdin is closed or "quit" is received

exit 0

## end of disk_space.sh script
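
The load sensor protocol used above (wait for a line on stdin, then print a begin/end block containing host:attribute:value) can also be sketched with the free-space lookup split into small functions. This is a minimal sketch, not the script shown above: the function names free_kb and report_block are hypothetical, and it assumes that "df -Pk" (POSIX output format, 1K blocks) is available and that a value with a "K" suffix is acceptable for a MEMORY-type attribute.

```shell
#!/bin/sh
# Sketch only: free_kb and report_block are hypothetical helper names.

# Print the available space, in 1K blocks, of the filesystem holding $1.
# Assumes POSIX "df -P" output: one header line, then one data line whose
# fourth column is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print one load report block in the begin/end format used by disk_space.sh.
# $1 = host name, $2 = attribute name, $3 = value
report_block() {
    echo begin
    echo "$1:$2:$3"
    echo end
}
```

Inside the read loop, the report would then be a single call such as `report_block "$myhost" export_size "$(free_kb "$FS")K"`. Reporting a plain number of kilobytes avoids depending on how "df -h" rounds and abbreviates sizes on a given platform.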

 

Step #2:  Create a consumable resource called "export_size" using the "qconf -mc" edit command:


tm19-231:/dpl/sge.sc08#qconf -mc

"/var/tmp/2481-9OzK0o" 55 lines, 4571 characters

#name               shortcut   type        relop requestable consumable default  urgency

#----------------------------------------------------------------------------------------

...

export_size         e_size     MEMORY      <=    YES         YES        0        0

...

# >#< starts a comment but comments are not saved across edits --------
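
After saving the complex, it can be worth confirming that the new attribute really is marked consumable. The following is a sketch under stated assumptions: complex_is_consumable is a hypothetical helper name, and it assumes the column order shown in the listing above (name, shortcut, type, relop, requestable, consumable, default, urgency).

```shell
#!/bin/sh
# Sketch only: complex_is_consumable is a hypothetical helper name.

# complex_is_consumable NAME
# Reads complex definition lines on stdin and succeeds only if NAME is
# defined and its sixth column (consumable) is something other than NO.
complex_is_consumable() {
    awk -v name="$1" '
        $1 == name { found = 1; if ($6 != "NO") ok = 1 }
        END { exit (found && ok) ? 0 : 1 }
    '
}
```

A typical use would be to pipe the output of "qconf -sc" (show complex) into it, for example `qconf -sc | complex_is_consumable export_size && echo "export_size is consumable"`.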

 


Step #3:  Point SGE to the above disk_space.sh script (Step #1), customized for your site's storage environment.

In my example I'm assuming all machines will use /export for their local Nastran scratch space. This

can be modified to include more than one filesystem by adding another complex attribute setting

in this script; you can also have one for each host.


Now use "qconf -mconf global" to indicate that this load_sensor should apply to all hosts.

Note that you can list more than one script (comma-separated) for the load_sensor below:

 

# qconf -mconf global

"/var/tmp/9800-xONwHt" 50 lines, 1938 characters

execd_spool_dir              /gridware/sge/default/spool

mailer                       /bin/mailx

xterm                        /usr/openwin/bin/xterm

load_sensor                  /gridware/sge/util/resources/loadsensors/tmpspace.sh, \

                             /dpl/sge.sc08/disk_space.sh

prolog                       none

epilog                       none

shell_start_mode             posix_compliant

login_shells                 sh,ksh,csh,tcsh

min_uid                      0

min_gid                      0

user_lists                   none

xuser_lists                  none

projects                     none

xprojects                    none

enforce_project              false

enforce_user                 auto

load_report_time             00:00:40

max_unheard                  00:05:00

reschedule_unknown           00:00:00

loglevel                     log_warning

administrator_mail           dale.layfield@sun.com

"/var/tmp/9800-xONwHt" 50 lines, 1938 characters

 

 

Step #3a: As an optional step you can set the consumable resource export_size value to an initial size for all hosts.
You can use "qconf -me global" to modify/edit the exec host (global) setting to "export_size=500G".

To verify this setting use the following "show" command: 

tm19-231:/dpl/sge.sc08#qconf -se global

hostname              global

load_scaling          NONE

complex_values        nastran_tokens=500,export_size=500G

load_values           NONE

processors            0

user_lists            NONE

xuser_lists           NONE

projects              NONE

xprojects             NONE

usage_scaling         NONE

report_variables      NONE

tm19-231:/dpl/sge.sc08#
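
If you want to check a single value from a script rather than reading the whole listing, the complex_values line can be split on commas. This is a minimal sketch: complex_value is a hypothetical helper name, and it assumes the complex_values entry fits on one line as in the output above (a long entry that is wrapped onto continuation lines would need extra handling).

```shell
#!/bin/sh
# Sketch only: complex_value is a hypothetical helper name.

# complex_value NAME
# Reads "qconf -se <host>" style output on stdin and prints the value
# assigned to NAME on the complex_values line. Fails if NAME is absent.
complex_value() {
    awk -v name="$1" '
        $1 == "complex_values" {
            n = split($2, pairs, ",")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                if (kv[1] == name) { print kv[2]; found = 1 }
            }
        }
        END { if (!found) exit 1 }
    '
}
```

For example, `qconf -se global | complex_value export_size` would print the initial size configured for the global host.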

 

Step #4: Now you can verify that the "export_size" is being dynamically set to the

actual /export free space. On my machine (hostname tm19-231) there is 50G available/free in /export:

 


tm19-231:/dpl/sge.sc08#qhost -F

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS

-------------------------------------------------------------------------------

global                  -               -     -       -       -       -       -

   gc:nastran_tokens=500.000000

   gc:export_size=500.000G

tm19-231                sol-amd64       8  0.09    8.0G    1.7G    8.0G     0.0

   gc:nastran_tokens=500.000000

   hl:export_size=50.000G

...............
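
The same check can be scripted by picking one attribute for one host out of the "qhost -F" listing. The sketch below is an assumption-laden illustration, not part of my setup: host_resource is a hypothetical helper name, and it relies on the layout shown above, where host rows start in the first column and attribute rows are indented and look like "hl:export_size=50.000G".

```shell
#!/bin/sh
# Sketch only: host_resource is a hypothetical helper name.

# host_resource HOST NAME
# Reads "qhost -F" style output on stdin and prints the value reported
# for attribute NAME under HOST. Fails if nothing is found.
host_resource() {
    awk -v host="$1" -v name="$2" '
        /^[^ \t]/ { current = $1; next }     # unindented line: host row or header
        current == host {
            sub(/^[ \t]+/, "")               # drop the indentation
            n = index($0, ":" name "=")      # e.g. "hl:export_size=50.000G"
            if (n > 0) {
                print substr($0, n + length(name) + 2)
                found = 1
            }
        }
        END { if (!found) exit 1 }
    '
}
```

With the output above, `qhost -F | host_resource tm19-231 export_size` would print the value reported by the load sensor for that host, while asking for the host "global" would print the configured initial value.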

Step #5: You can now submit your Nastran jobs with a request for the amount of disk space your job needs
and SGE will only dispatch the job to hosts that meet your requirements.

For example, because the following qsub command requests 10GB of disk space "-l export_size=10G" the job will only be
dispatched to host(s) that have sufficient free space in /export.

#cat nastran_qsub.sh
qsub -pe nastran 2 -q large.q  -l mem_free=6G -l nastran_tokens=`token_estimate.sh`  -l export_size=10G  -S /bin/ksh nastran_sge.sh

 

-->Section 2: "How to manage Memory"

Making sure that your job only runs on machine(s) with sufficient physical memory can be accomplished by using the built-in resource attribute "mem_free".  As shown in the above example, you would specify how much memory you need using
"-l mem_free=nG"; your job would then be dispatched only to those machine(s) that meet this request.

 



-->Section 3: "How to manage license tokens"

 

As I discussed in a prior blog, I wanted to find a way to avoid the scenario where a Nastran job can
stop/"time out" partway through the analysis because there are no more Nastran license tokens left in
the "token" pool. This scenario can occur if the customer site has a limited license
"token" pool. The problem then arises when two or more Nastran jobs are running.
First, the jobs request an initial set of license tokens to get started.
Then later during the analysis these same job(s) may need more tokens because of some additional
Nastran "feature" that gets invoked.  At this point there may not be enough "tokens" left in the license pool,
resulting in the job(s) stopping/"timing out" for lack of license tokens.

One solution for this issue is to use MSC Software's "ESTIMATE" program.  The ESTIMATE program does a quick scan
and analysis of the Nastran input file to determine its resource requirements (disk, memory, and license tokens).  This
information can then be used along with a "consumable resource" to ensure that sufficient license tokens are
available. You can read more about the ESTIMATE program at: http://www.mscsoftware.com/support/prod_support/mdnastran/cog.pdf

Here's what I did to configure SGE to work with MSC's ESTIMATE program:
 
Step #1:  Create a script (estimate.sh) that will call MSC's "ESTIMATE" program.

tm19-231:/dpl/sge.sc08/dmp_jobs#cat estimate.sh
#
MD2008=/msc/nastran/bin/md2008
#
$MD2008 estimate  dmp_101c.dat report=keyword out=estimate.out


The output from the above estimate.sh script (with this particular Nastran input file) is:

tm19-231:/dpl/sge.sc08/dmp_jobs#estimate.sh
ESTIMATE - (Version 2008.0 Jun  6 2008 12:16:44)

 ======================
 Licensing Information:
 ======================
 Features Required:
NASTRAN
Token counts for each feature may be adjusted by modifying "/msc/nastran/md2008/solaris/feature.lis"
 Feature         Tokens
 NASTRAN            250
 Total:             250

WORDSIZE=32
K=1024
BUFFSIZE=8193
SOL=101
SE=No
SOLVER=Direct
NGRID=3811
NDOF=10845
N0D=202
N1D=0
N2D=0
N3D=575
MEMORY=128.0MB
DISK=65.4MB
DBALL=33.6MB
SCRATCH=16.8MB
SCR300=11.8MB
SDBALL=7813MB
SSCR=7813MB
ERRORS=1
JID=./dmp_101c.dat
tm19-231:/dpl/sge.sc08/dmp_jobs#
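
The keyword report above also carries the memory and disk estimates (MEMORY=, DISK=, SCRATCH=, SCR300=), which could feed the "-l mem_free=" and "-l export_size=" requests in the same way the token count feeds "-l nastran_tokens=". The following is only a sketch: estimate_to_sge is a hypothetical helper name, it handles only values reported in MB as in the output above, and rounding up to a whole number of megabytes is my assumption about what makes a safe request. Which keyword best represents your scratch requirement is a site decision.

```shell
#!/bin/sh
# Sketch only: estimate_to_sge is a hypothetical helper name.

# estimate_to_sge KEYWORD
# Reads an ESTIMATE keyword report (KEY=VALUE lines) on stdin and prints
# the value of KEYWORD rounded up to whole megabytes with an "M" suffix,
# e.g. MEMORY=128.0MB -> 128M. Fails if KEYWORD is missing or not in MB.
estimate_to_sge() {
    awk -F= -v key="$1" '
        $1 == key && $2 ~ /MB$/ {
            v = $2 + 0                 # numeric prefix of e.g. "65.4MB"
            n = int(v)
            if (n < v) n++             # round up
            print n "M"
            found = 1
        }
        END { if (!found) exit 1 }
    '
}
```

Assuming the keyword report is available in a file, a request could then be built as `-l mem_free=$(estimate_to_sge MEMORY < report_file)`, where report_file is a placeholder for wherever you keep that output.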


Step #2: Now parse the above "ESTIMATE" output and return the number of required license tokens.

tm19-231:/dpl/sge.sc08/dmp_jobs#cat token_estimate.sh
#
estimate.sh > estimate.out 2> /dev/null
nawk '/^ *Total/ {print $2}' < estimate.out
#

... token_estimate.sh returns the total number of Nastran license tokens required for this job-->250:
tm19-231:/dpl/sge.sc08/dmp_jobs#token_estimate.sh
250
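
One weakness of parsing with a bare pattern is that an ESTIMATE failure produces an empty string, which would turn the qsub request into "-l nastran_tokens=" with no value. A slightly more defensive variant is sketched below; total_tokens is a hypothetical helper name, and the only thing it assumes about the ESTIMATE output is the "Total:" line shown above.

```shell
#!/bin/sh
# Sketch only: total_tokens is a hypothetical helper name.

# total_tokens
# Reads ESTIMATE output on stdin and prints the token count from the
# "Total:" line. Exits non-zero if no whole-number total is found, so the
# caller can stop before submitting the job.
total_tokens() {
    awk '
        $1 == "Total:" && $2 ~ /^[0-9]+$/ { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    '
}
```

A submit script could then do something like `tokens=$(estimate.sh 2>/dev/null | total_tokens) || exit 1` and pass `-l nastran_tokens=$tokens` to qsub only when a count was actually obtained.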


Step #3: Create a "consumable resource" for Nastran tokens.
 
tm19-231:/dpl/sge.sc08#qconf -mc
...
nastran_tokens      nt         INT         <=    FORCED      YES        0        0
...


Step #4: Set the consumable resource "nastran_tokens" to the total amount available in the
token pool, and make this available to all hosts and queues (i.e., "global").
 
For example, I added "nastran_tokens=500" to the exec host (global) setting using the SGE edit
command "qconf -me global".

To verify this setting use the following "show" command:

tm19-231:/dpl/sge.sc08#qconf -se global

hostname              global

load_scaling          NONE

complex_values        nastran_tokens=500,export_size=500G

load_values           NONE

processors            0

user_lists            NONE

xuser_lists           NONE

projects              NONE

xprojects             NONE

usage_scaling         NONE

report_variables      NONE


Step #5: Now you can submit your Nastran job and specify the number of Nastran tokens your job needs
using "-l nastran_tokens=`token_estimate.sh`" on the SGE command line
(in this example nastran_tokens=`token_estimate.sh` will resolve to nastran_tokens=250).
So, for example, with 500 total tokens available you would be able to run 2 of these 250
token jobs concurrently; any additional jobs would not be dispatched until one of these
jobs completed and released its license tokens.
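
The arithmetic behind that example is simply integer division of the pool size by the per-job request. As a tiny sketch (max_concurrent_jobs is a hypothetical helper name, and it assumes every job requests the same number of tokens):

```shell
#!/bin/sh
# Sketch only: max_concurrent_jobs is a hypothetical helper name.

# max_concurrent_jobs POOL PER_JOB
# Prints how many jobs that each request PER_JOB tokens fit into a pool of
# POOL tokens at the same time. Fails if PER_JOB is not positive.
max_concurrent_jobs() {
    [ "$2" -gt 0 ] || return 1
    echo $(( $1 / $2 ))
}
```

With the numbers used here, `max_concurrent_jobs 500 250` gives 2; a job needing 300 tokens would leave room for only one such job at a time, because the remaining 200 tokens are not enough for a second one.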

Here's an example taken from my earlier blog showing the use of the nastran_tokens attribute for a distributed memory parallel (DMP) Nastran job:

#cat nastran_qsub.sh
qsub -pe nastran 2 -q large.q  -l mem_free=6G -l nastran_tokens=`token_estimate.sh`  -l export_size=10G  -S /bin/ksh nastran_sge.sh

 


 

In my next blog in this series I'll show how I configured the various queues and parallel environment for MD Nastran.

 

 

 

Friday Feb 20, 2009

Building OpenMPI 1.2.9 for OpenSolaris with Studio 12

Wondering what relevance Solaris and Sun's SPARC and CMT technology have in HPC? Do you care about security with HPC?

Read HPC Wire's latest news piece and learn how Canada's High Performance Computing Virtual Laboratory (HPCVL) is using Solaris, SPARC and CMT to provide computational services for a fairly traditional set of HPC applications, including everything from biomedical research to computational fluid dynamics in a super secure environment.

HPC Wire reports, "The HPCVL has a cluster of 8 Sun SPARC Enterprise M9000 servers, each with 64 quad-core 2.52 GHz Sparc64 VII processors supporting two hardware threads per core, for compute intensive jobs with large memory requirements. There is also a cluster of 7 Sun Fire 25000 servers, each of which has 72 dual-core UltraSPARC-IV+ processors, aimed at a similar (but perhaps less demanding) workload. HPCVL's Victoria Falls cluster is built from 73 Sun SPARC Enterprise T5140 Servers, each of which has two UltraSparc T2 chips with 8 cores apiece, each supporting 8 hardware threads. At full capacity this cluster can support just over 9,300 threads, and the system provides a throughput compute platform for HPCVL's users. All of the systems use Sun's Grid Engine workload management tool, and run Solaris."

The HPCVL's Web page calls it "one of Canada's leading secure HPC environments".

The full article can be found here: http://www.hpcwire.com/features/Canadian-HPC-Lab-Maintains-Warm-Relationship-with-Sun-39811502.html

Building Plot3D on OpenSolaris with Studio 12

Sunday Feb 08, 2009

Continuing from my last blog in this series on integrating "Sun Grid Engine and MD Nastran" I will now  describe how to submit MSC.Software's MD Nastran DMP (Distributed Memory Parallel) jobs to SGE. In subsequent blogs I'll explain how I configured the SGE queues and parallel execution environment  "nastran" referenced below. 

 

Here's a brief overview of how all this works for submitting DMP jobs:


First, some background. MD Nastran offers the ability to run certain solution sequences in parallel using the Message Passing Interface (MPI), an industry-wide standard library for C and Fortran message-passing programs.   For the Solaris x86 version of MD Nastran, Sun HPC ClusterTools (based on Open MPI) is used as the MPI implementation.


There are 2 basic requirements for submitting  MD Nastran (DMP) jobs: 

(1) You must specify the number of parallel processes you want (MD Nastran command line keyword  "dmp=n"), 

(2) And, if you want to spread the parallel processes out to more than just the local machine you are submitting Nastran from, then you must specify the target "host" machine(s) using the MD Nastran command line keyword "hosts=" (e.g., hosts=node1:node2:...).  With these basic requirements in mind I show below how to use the SGE qsub command to control a Nastran DMP job submission, including the determination of the "host" machines to run these parallel jobs.  I also show how to use MSC's job resource estimation tool ("estimate") to automate the determination of the required resources (memory, disk, and license tokens).


So, here's what you would do to run a DMP (Distributed Memory Parallel) MD Nastran job using SGE.

 

Step #1. Create a script containing the qsub command.

....for example,

#cat nastran_qsub.sh

qsub -pe nastran 2 -q large.q  -l mem_free=6G -l nastran_tokens=`token_estimate.sh`  -l export_size=10G  -S /bin/ksh nastran_sge.sh


 


 


The above SGE command line defines the following:

1. "qsub" : qsub is the SGE command that I used to submit the Nastran job submittal script "nastran_sge.sh" to the parallel environment (nastran) I configured within SGE. The standard Nastran job submittal command "mdnast2008..." that Nastran users are familiar with is in the script "nastran_sge.sh" (see Step 2. below).

2. "-pe nastran 2" : This tells qsub to submit "nastran_sge.sh" to the SGE parallel execution environment (nastran).  The "nastran_sge.sh" script will then start two Nastran jobs (running in parallel) on one or more of the host machines as defined in the queue (large.q).
[The queues that are suitable for this job, like large.q, are queues that are associated with the parallel environment "nastran" by the parallel environment configuration. Suitable queues also must satisfy the resource requirements specified by the qsub -l options (see item 4. below).]   I'll explain how I configured the parallel environment "nastran" and the associated queues in a subsequent blog.

3. "-q large.q" : I configured three SGE queues for my Nastran environment (small.q, medium.q, and large.q), each one having a different limit on the amount of elapsed time allowed for jobs within the queue.  In this example I chose large.q to handle a long-running (elapsed time) job.

4. "-l mem_free=6G -l nastran_tokens=`token_estimate.sh` -l export_size=10G" defines the "complex resource attributes" that I configured within SGE.  I'll explain how I created these in a later blog, but basically they allow the user to define the amount of available real memory (6GB in this case) required for this job, the number of Nastran license tokens that are required, and also the required amount of disk space (10GB in this case) for the Nastran database files. If any of these requirements (free memory, tokens, or disk space) is not met, the job will not be dispatched.
[In this example, I only show MSC's "estimate" program being used to calculate the required license tokens.  In a subsequent blog in this series I'll show
when/how you can use "estimate" to also automate the calculation of the memory and disk requirements.]


Step #2.  Create a wrapper script (referenced in Step 1. above) containing  the standard MD Nastran DMP job submittal command:

...for example,

#cat nastran_sge.sh

#! /bin/ksh

#

# sge_nast: Sun Grid Engine wrapper script to use with MSC.Nastran V2001.0.9 and greater.

#

# Usage: qsub -pe nastran  $Nproc .... nastran_sge.sh

#

#

#Set nastran information for Head node:

mdnast2008=/msc/nastran/bin/mdnast2008

#

#

#Set up list of host(s) for use by mdnast2008 job submittal command (see below):

#

HOSTS=""

while read FILE

do

       NODE=`echo $FILE | awk '{ print $1}'`

       HOST0="$HOSTS"

       HOSTS="$HOST0:$NODE"

       echo $HOSTS

done < $PE_HOSTFILE

# Remove leading ':'

HOSTS=`echo $HOSTS | sed 's/://'`

echo $HOSTS

echo $PE_HOSTFILE

echo $NSLOTS

NSLOTS=`echo $NSLOTS`

#

# Got hosts names, now run Nastran with DMP (Distributed Memory Parallel) spreading the parallel Nastran jobs across the list of

#  host computers ($HOSTS):

#

$mdnast2008 /dpl/sge.sc08/dmp_jobs/dmp_101c.dat out=out.dmp.sge.2008.r2.101c dmp=$NSLOTS hosts=$HOSTS scr=yes bat=no auth=1700@matsci.sfbay

#

# End

#

 

 

 


 

The above script uses the following SGE parallel execution environment variables: 

NSLOTS – The number of queue slots in use by a parallel job.  

PE_HOSTFILE – The path of a file that contains the definition of the virtual parallel machine that is assigned to a parallel job by the grid engine system. This variable is used for parallel jobs only. It contains the list of "host" machines that satisfy the resource and queue requirements as specified on the above qsub command in Step 1.
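
The host-list loop in the wrapper script can also be written as a single awk pass over the file named by PE_HOSTFILE. This is a minimal sketch rather than the script I used: pe_hosts is a hypothetical helper name, and it assumes the host name is the first whitespace-separated column of each line, which is what the wrapper's own awk call relies on.

```shell
#!/bin/sh
# Sketch only: pe_hosts is a hypothetical helper name.

# pe_hosts FILE
# Prints the first column of every non-empty line of FILE, joined with ":",
# which is the separator the wrapper script passes to "hosts=".
pe_hosts() {
    awk 'NF > 0 { printf "%s%s", sep, $1; sep = ":" } END { print "" }' "$1"
}
```

Inside the wrapper this would reduce the loop and the "remove leading ':'" step to `HOSTS=$(pe_hosts "$PE_HOSTFILE")`. Because the separator is only emitted before the second and later hosts, there is no leading colon to strip afterwards.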

 


 

The above mdnast2008 command will result in the parallel Nastran jobs being started on different hosts (e.g., dmp=2 hosts=tm19-231:tm19-232).


Below is an example of the output you'll see using the above scripts:

Note that the output from the SGE "qstat -f" command  shown below shows an "r" to indicate the 2 nastran jobs are running: 
    200 0.55500 nastran_sg dpl          r     02/08/2009 17:52:10     1  

If one of the requested resources had not been available (memory, disk, or tokens) then instead of "r" you would  see the following output from qstat -f, indicating that the job has been put into a "hold/wait" status, pending availability of the requested resource(s).
############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    201 0.55500 nastran_sg dpl          qw    02/08/2009 18:05:58     2        

 


 


tm19-231#nastran_qsub.sh

Your job 200 ("nastran_sge.sh") has been submitted

tm19-231#

tm19-231#qstat -f

queuename                      qtype resv/used/tot. load_avg arch          states

---------------------------------------------------------------------------------

all.q@tm19-232                  BIP   0/0/8          0.20     sol-amd64     

---------------------------------------------------------------------------------

large.q@tm19-231               BIP   0/1/1          0.10     sol-amd64     

    200 0.55500 nastran_sg dpl          r     02/08/2009 17:52:10     1        

---------------------------------------------------------------------------------

large.q@tm19-232               BIP   0/1/1          0.20     sol-amd64     

    200 0.55500 nastran_sg dpl          r     02/08/2009 17:52:10     1        

---------------------------------------------------------------------------------

medium.q@tm19-232           BIP   0/0/4          0.20     sol-amd64     

---------------------------------------------------------------------------------

small.q@tm19-231               BIP   0/0/4          0.10     sol-amd64     
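
To check the state of a job from a script instead of reading the listing by eye, the state column can be pulled out of the "qstat -f" output. The sketch below assumes the layout shown above, where a job row starts with the job ID and the state code (r, qw, ...) is the fifth column; job_states is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: job_states is a hypothetical helper name.

# job_states JOBID
# Reads "qstat -f" style output on stdin and prints each distinct state
# code shown for JOBID (a parallel job appears once per queue instance).
job_states() {
    awk -v id="$1" '$1 == id && NF >= 5 { print $5 }' | sort -u
}
```

For the listing above, `qstat -f | job_states 200` would print a single "r", even though job 200 appears under two queue instances.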



In subsequent blogs in this series I'll describe in detail how I configured SGE for the above MD Nastran parallel job submittal, including how to create the "nastran" parallel environment, creating consumable resources (nastran_tokens), using MSC.Software's "estimate" program, and other MD Nastran specific configurations that I found useful in conjunction with SGE.



 

Thursday Feb 05, 2009

Building MPICH2-1.0.8 for OpenSolaris with Sun Studio 12

Monday Feb 02, 2009

Building OpenMPI 1.3 for OpenSolaris with Sun Studio 12

Friday Jan 30, 2009

Continuing from my first blog in this series on integrating "Sun Grid Engine and MD Nastran" I would now like to describe how to submit MD Nastran (serial) jobs to SGE. In subsequent blogs I'll explain how I configured the SGE queues referenced below. 
 
To begin, MD Nastran is typically submitted from a command line interface.  This interface already has some built-in "job queuing" features that allow the Sun Grid Engine to interface easily with Nastran.
 
This built-in "job queuing" feature of Nastran allows Sun Grid Engine users to define their own keywords--this provides users the ability to customize the Nastran job submittal for use with SGE.
 
Here's a brief overview of how this works:
When the “queue” keyword is specified on a Nastran job submittal command line, the user can then "define" a corresponding “submit”  keyword in a Nastran configuration file (called an RC file). The "submit" keyword consists of a list of queue names (queue_list) followed by the command definition for the queues.  The “submit” keyword  defines the command used to run a job when a “queue” keyword is specified that matches a queue name in the submit keyword’s queue_list. The command_definition for this "submit" keyword can contain keyword names enclosed in percent “%” signs that are replaced with the value of the keyword before the command is run.  You can learn more about this at:  http://www.mscsoftware.com/support/prod_support/mdnastran/cog.pdf
So, here's what you would do to run a Nastran job using SGE and the command line features available within the Nastran job submittal scripts. 

 

Step #1. Create an MD Nastran "configuration file" sge_rc--referred to as an "RC file":
#cat sge_rc
scr=yes
bat=no
app=no
submit=small.q,medium.q,large.q=qsub  -q %queue%  -l nastran_tokens=%qoption% -l   export_size=10G -S /bin/ksh  %job% 

The above "submit=" line defines the following:
1. Three SGE queues configured for Nastran (small.q, medium.q, and large.q) {I'll show how I configured these Nastran queues in a later blog}
2. "qsub" is the SGE command that will be used to submit the Nastran jobs to SGE.
3. "-q  %queue%" is the SGE keyword that specifies which SGE queue to use, where %queue% will be replaced by the "queue=" value specified by the user on the Nastran command line (see Nastran command line below: "queue=small.q")
4. "-l nastran_tokens=%qoption% -l export_size=10G" defines the "complex resource attributes" that I configured within SGE.  I'll explain how I created these in a later blog, but basically they allow the user to define the number of Nastran license tokens that are required to run this job and also the required amount of disk space (10GB in this case) for the Nastran database files. If either of these requirements (tokens or disk space) is not met, the job will not be dispatched.
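
To make the placeholder substitution in the "submit=" line concrete, here is a small sketch that mimics it with sed. It only illustrates the idea and is not how Nastran performs the substitution internally: expand_submit is a hypothetical helper name, the value passed for %job% is a stand-in for whatever Nastran actually substitutes there, and the values must not contain the characters "|", "&", or a backslash.

```shell
#!/bin/sh
# Sketch only: expand_submit is a hypothetical helper name.

# expand_submit TEMPLATE QUEUE QOPTION JOB
# Prints TEMPLATE with %queue%, %qoption% and %job% replaced by the given
# values, imitating the keyword substitution described above.
expand_submit() {
    printf '%s\n' "$1" |
        sed -e "s|%queue%|$2|g" -e "s|%qoption%|$3|g" -e "s|%job%|$4|g"
}
```

Calling it with the template from sge_rc, queue small.q, and a qoption of 250 yields a command line of the form "qsub -q small.q -l nastran_tokens=250 -l export_size=10G -S /bin/ksh ...", which is what SGE ends up receiving.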

 

Step #2.  Create a Nastran submittal script with the necessary SGE keywords:
#cat nastran.sh 
MDNAST2008=/msc/nastran/bin/mdnast2008 
$MDNAST2008 v10101.dat  queue=small.q rcf=sge_rc qoption=`token_estimate.sh`    

 
The above MDNAST2008 command defines the following:
1. "queue=" specifies which SGE queue to use--in this example it's small.q
2. "qoption=" is a special Nastran keyword that allows you to specify your own job submission parameters for SGE (in this case the RC file (sge_rc), defined above, will have the parameter %qoption% replaced by the value returned from the script `token_estimate.sh`). {I'll explain in a later blog how this particular script `token_estimate.sh`  calculates the number of license tokens required for this job.}
 
The above RC file (sge_rc) and Nastran submittal script (nastran.sh) will allow you to submit Nastran (serial) jobs to SGE.  
In my next blog I'll describe how to submit DMP (distributed parallel) MD Nastran jobs.  

Tuesday Jan 27, 2009

I’ve been working recently to more tightly integrate MSC Software’s MD Nastran application with Sun Grid Engine, with the goal of  documenting “best  practices” and  “how-to” guidelines for the most useful SGE configurations for the typical  MD Nastran user.

[For those of you not familiar with the MD Nastran application you can find out more at http://www.mscsoftware.com/products/nastran.cfm.]

Having worked closely over the years with MSC’s software development and engineering staff  I enlisted their advice on  useful SGE configurations for MD Nastran users.   Out of those discussions and from my own “wish list” (from many years of Nastran performance  benchmarking and tuning activities) I came up with the following ideas for useful SGE queues/configurations for MD Nastran users:

1.   First,  I believe most engineers want to have machines available for quick turnaround jobs (5-30 minutes) and other machines for long   running Nastran jobs (perhaps hours or days).   So, for that I wanted to create at least 3 runtime limiting SGE job queues:

For example:

3 Queues:  Small, Medium, Large:

Where each queue would limit the amount of elapsed time allowed for a MD Nastran job.

Queue definition/limitation:

Small:     Time limit:  5 minutes elapsed

Medium: Time limit:  30 minutes elapsed

Large:     Time limit:  unlimited

2.  Next, I wanted to configure these SGE queues to handle what I’ve found to be two important questions when running Nastran jobs:

    1. "What machine(s) have enough disk space to satisfy the often large Nastran output database files (scratch and scr300)?"
    2. "What machine(s) have enough physical memory to avoid the performance slowdown when Nastran matrix operations "spill" to disk?"
Nastran provides the user a way to determine these values (i.e., “memory to avoid spill" and database file “high-water” size).

I’ll explain how to obtain these values in a later blog.

3.       And finally, I wanted to find a way to help with the situation where a Nastran job can stop/”time out” partway through the analysis because there are no more Nastran license tokens left in the “token” pool. This scenario can occur if the customer site has a limited license “token” pool. The problem then arises when two or more Nastran jobs are running. First, the jobs request an initial set of license tokens to get started.  Then later during the analysis these job(s) may need more tokens because of some additional Nastran “feature” that gets invoked.  At this point there may not be enough “tokens” left in the license pool, resulting in the job(s) “timing out”/stopping for lack of license tokens.

4.     I’ve also received additional suggestions since I started on this effort and I’ll show how to add those features also in later blogs:

Some of those suggestions are:

1.       For DMP (distributed parallel) jobs ensure that the distributed jobs run on the same CPU (processor speed).

2.       For DMP jobs ensure that no more than one of the distributed jobs is run on any machine.

3.      For DMP jobs ensure the network interconnect is the same for all jobs.

 The reason why MD Nastran DMP jobs would benefit from the above is that efficient/scalable DMP processing requires that each distributed job finish its portion of the analysis in approximately the same amount of time (i.e., if one machine is slower than the rest it becomes the bottleneck for achieving good scaling).

So, with all the above Nastran/SGE “wish list” items in mind  I’ll be blogging over the coming weeks on how I went about configuring my SGE environment to satisfy these MD Nastran queues/configurations.
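
As a small illustration of the three runtime-limited queues from item 1 above, a submit script could choose the queue from an expected elapsed time. This is only a sketch: pick_queue is a hypothetical helper name, the queue names small.q, medium.q, and large.q are assumed, and the expected run time is something the user would have to supply.

```shell
#!/bin/sh
# Sketch only: pick_queue is a hypothetical helper name.

# pick_queue MINUTES
# Prints the queue whose elapsed-time limit fits a job expected to run for
# MINUTES minutes: up to 5 -> small.q, up to 30 -> medium.q, else large.q.
pick_queue() {
    if [ "$1" -le 5 ]; then
        echo small.q
    elif [ "$1" -le 30 ]; then
        echo medium.q
    else
        echo large.q
    fi
}
```

For example, `pick_queue 20` prints medium.q. A job that overruns its estimate would still hit the limit of the queue it was placed in, so a generous estimate is the safer choice.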

Saturday Jan 24, 2009

For the past 3 years I've been involved in supporting the UCSD Rocks team in their port to Solaris x86_64.  The Rocks development team at UCSD is comprised of a group of talented software engineers who are always looking for ways to make it easier to deploy, manage, and upgrade clusters.

Last year at SC08 I joined the Rocks team as they demonstrated their latest "Rocks and Solaris" cluster distribution in the Sun Booth. In their demo you had the chance to see that Rocks now brings the same ease of installation for Solaris cluster deployment that Rocks has had for Linux for several years.

You can download a pre-release (alpha) version of this "Rocks on Solaris" at: 

https://wiki.rocksclusters.org/wiki/index.php/Rocks_on_Solaris

I'm sure the Rocks team would appreciate any feedback you can give them on this alpha version.

What you'll see in this version is:

1. Fully automated provisioning of Solaris compute nodes and support for Thumper/Thor appliances from a Linux frontend.

2. Rolls support

3. MPI support using Sun HPC ClusterTools

4. Sun Grid Engine support

Wednesday Jan 21, 2009

Having been involved in MD Nastran performance tuning and benchmarking activities on Sun hardware for many years, I'm always looking for optimal configurations with our latest hardware and storage options. I recently did some benchmarking on a Sun Fire x4450 with four 2.4GHz processors (24 cores), 80GB of memory, and 8 internal 146GB disks. I used one disk for the Solaris 10 OS, the Nastran binary, and the standard Nastran *.f04, *.f06, and *.log output files. I configured the remaining 7 disks with ZFS and used these for the Nastran database files (more on ZFS later).

What I discovered that's noteworthy is the performance when running multiple Nastran jobs on this configuration. I was able to run multiple concurrent jobs (6 of MSC's standard performance benchmarks) with only a “10% slowdown” in elapsed time compared to running them standalone/sequentially. The total I/O for the 6 jobs was ~2TB.

I think it's noteworthy because MD Nastran is a highly I/O- and compute-intensive application, and you will often be told to run your Nastran jobs on separate machines in order to get “acceptable” performance. The reason for this recommendation is that the combination of an MD Nastran analysis job's memory requirements (as specified by the “mem=” keyword on the Nastran command line) and its frequently large amount of I/O can have a significant impact on MD Nastran performance. This impact can be even greater if multiple Nastran jobs are run on the same machine and use the same file system.

 

So, the advice to spread Nastran jobs out across separate machines holds in many cases, but as I discovered with the above configuration, you can get surprisingly good performance on one node (given the right hardware configuration).

I believe this relatively small slowdown of 10% is in large part due to the ZFS filesystem, which does an excellent job of caching Nastran's I/O using the machine's available memory (80GB in this case). Before ZFS became available it was necessary to configure the I/O caching manually for Nastran (via /etc/system parameters), but with ZFS this is all automatic, and I've consistently seen it do an excellent job with Nastran's I/O (producing high CPU-to-I/O ratios on Nastran runs).

Besides this excellent I/O performance, ZFS is also extremely easy to configure for Nastran: you simply create the ZFS pool with a single command and you're ready to start running your Nastran jobs.

Here are some notes on how I configured ZFS and ran Nastran for these benchmarks:

...create the ZFS pool (with 7 of the 8 disks on this x4450); by default it is mounted at /ZFS1:
zpool create ZFS1 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0

...now run the Nastran benchmark jobs pointing to /ZFS1 for the Nastran database files.
mdnast2008 input_file scr=yes sdir=/ZFS1
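
One possible way to start several such jobs at the same time is sketched below. It only reuses the command line shown above; the helper names and the input deck file names are hypothetical, and the actual launch is left commented out because it needs MD Nastran installed.

```python
import subprocess


def nastran_cmd(input_file, sdir="/ZFS1", binary="mdnast2008"):
    """Build the command line shown in the post, pointing sdir at the ZFS pool."""
    return [binary, input_file, "scr=yes", f"sdir={sdir}"]


def run_concurrently(commands):
    """Start every command at once, then wait for all; returns the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical input decks; substitute the benchmark files you actually run.
    decks = [f"bench{i}.dat" for i in range(1, 7)]
    for cmd in (nastran_cmd(d) for d in decks):
        print(" ".join(cmd))
    # To launch for real (requires mdnast2008 on the PATH):
    # run_concurrently([nastran_cmd(d) for d in decks])
```

Timing the concurrent run against the same jobs run one after another gives the kind of elapsed-time comparison described above.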

Tuesday Jan 20, 2009

Building OpenMPI 1.2.9rc1 for OpenSolaris with Sun Studio 12[Read More]

Monday Jan 19, 2009

Building hybrid MPI/OpenMP application using Studio and OpenSolaris[Read More]

Monday Jan 12, 2009

Building Apache C++ stdlib for OpenSolaris using Sun Studio[Read More]

Tuesday Dec 23, 2008

This post contains important information about the target audience of

this beta release, new functionality, the duration of this SGE beta program,

and how to get support and provide feedback.


Content

-------

1. Audience of this beta program

2. Duration of the beta program and release date

3. New functionality delivered with this release

4. Installing SGE 6.2u2beta in parallel to a production cluster

5. Beta program feedback and evaluation support


1. Audience of this beta program

--------------------------------

This Beta is intended for users who already have experience with the Sun

Grid Engine software or DRM (Distributed Resource Management) systems of

other vendors. This beta adds new features to the SGE 6.2 software. Users

new to DRM systems or users who are seeking a production ready release

should use the Sun Grid Engine 6.2 Update 1 (SGE 6.2u1) release which is

available from


   http://www.sun.com/software/gridware/get_it.jsp


For the shipping SGE 6.2u1 release we are offering free 30-day evaluation

email support.


2. Duration of the Beta program and release date

------------------------------------------------

This beta program lasts until Monday, February 2, 2009. The final release of

Sun Grid Engine 6.2 Update 2 is planned for March 2009.


3. New functionality delivered with this release

------------------------------------------------


Sun Grid Engine 6.2 Update 2 (SGE 6.2u2) is a feature update release for SGE

6.2 which adds the following new functionality to the product:


   - a GUI-based installer that helps new users install the software more

     easily. It complements the existing CLI-based installation routine

   - new support for 32-bit and 64-bit editions of Microsoft Windows Vista

     (Enterprise and Ultimate Edition), Windows Server 2003R2 and Windows

     Server 2008.

   - a client- and server-side Job Submission Verifier (JSV), which allows an

     administrator to control, enforce and adjust job requests, including

     job rejection. JSV scripts can be written in any scripting language,

     e.g. Unix shells, Perl or TCL.

   - consumable resource attributes can now be requested per job. This makes

     resource requests for parallel jobs much easier to define, especially

     when using slot ranges.

   - on Linux, the use of the 'jemalloc' malloc library improves performance

     and reduces memory requirements

   - the use of the poll(2) system call instead of select(2) on Linux

     systems improves scalability of qmaster in extremely large clusters
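
As background on the last item: select(2) can only watch descriptors below FD_SETSIZE (commonly 1024), while poll(2) takes an array of descriptors with no such fixed cap. The sketch below shows the poll interface through Python's standard `select.poll` wrapper. It is only an illustration of the system call and is not qmaster code; the helper name `wait_readable` is hypothetical.

```python
import select
import socket


def wait_readable(socks, timeout_ms=1000):
    """Return the sockets that have data to read, using poll(2) via select.poll."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    return [by_fd[fd] for fd, event in poller.poll(timeout_ms)
            if event & select.POLLIN]


# Demo: of two connected socket pairs, only the first has pending data.
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()
a_send.sendall(b"job report")
ready = wait_readable([a_recv, b_recv])
print(len(ready), ready[0] is a_recv)
for s in (a_send, a_recv, b_send, b_recv):
    s.close()
```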


4. Installing SGE 6.2u2 in parallel to a production cluster

-----------------------------------------------------------

As with every SGE release, it is safe to install multiple Grid Engine

clusters running multiple versions in parallel if all of the following

settings are different:


   - <sge_root> directory

   - ports (environment variables) for qmaster and execution

     daemons

   - unique "cluster name" - from SGE 6.2 the cluster name is

     appended to the name of the system wide startup scripts

   - group id range ("gid_range")
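
A rough way to sanity-check two planned installations against this list is sketched below. The dictionary keys, paths, ports and ranges are hypothetical examples. The gid_range check is also stricter than the list above, since it flags overlapping ranges and not only identical ones; that is an assumption of this sketch.

```python
def parse_gid_range(text):
    """Parse a gid_range such as '20000-20100' into (low, high)."""
    low, high = (int(part) for part in text.split("-"))
    return low, high


def conflicts(cluster_a, cluster_b):
    """Return the names of settings that would clash between two clusters."""
    keys = ("sge_root", "qmaster_port", "execd_port", "cluster_name")
    clashes = [key for key in keys if cluster_a[key] == cluster_b[key]]
    a_low, a_high = parse_gid_range(cluster_a["gid_range"])
    b_low, b_high = parse_gid_range(cluster_b["gid_range"])
    if a_low <= b_high and b_low <= a_high:  # the two ranges overlap
        clashes.append("gid_range")
    return clashes


# Hypothetical values for a production cluster and a beta cluster.
production = {"sge_root": "/opt/sge62u1", "qmaster_port": 6444,
              "execd_port": 6445, "cluster_name": "prod",
              "gid_range": "20000-20100"}
beta = {"sge_root": "/opt/sge62u2beta", "qmaster_port": 6544,
        "execd_port": 6545, "cluster_name": "beta",
        "gid_range": "20050-20150"}
print(conflicts(production, beta))
```

With these example values only the group id ranges collide, so the beta cluster would need a different gid_range before being installed next to production.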


Starting with SGE 6.2, the Accounting and Reporting Console (ARCo) can

accept reporting data from multiple Sun Grid Engine clusters. If you follow

the installation directions for ARCo and use a unique cluster name for

this beta release, there is no risk of losing or mixing reporting data from

multiple SGE clusters.


5. Beta Program Feedback and Evaluation Support

-----------------------------------------------

We welcome your feedback and questions on this Beta. We ask that you

restrict your questions to those specific to this Beta release. If you are

seeking general evaluation support for the Sun Grid Engine software,

please subscribe to the free evaluation support by downloading and using the

shipping version of SGE 6.2 Update 1.


The following email aliases are available:


   Technical support alias:    sge-beta@sun.com

   Feedback on documentation:  sge-beta-doc@sun.com

   General feedback:           sge-beta-feedback@sun.com