I’ve been working recently to more tightly integrate MSC Software’s MD Nastran application with Sun Grid Engine, with the goal of documenting “best practices” and “how-to” guidelines for the most useful SGE configurations for the typical MD Nastran user.
[If you are not familiar with the MD Nastran application, you can find out more at http://www.mscsoftware.com/products/nastran.cfm.]
Having worked closely over the years with MSC’s software development and engineering staff, I enlisted their advice on useful SGE configurations for MD Nastran users. Out of those discussions, and from my own “wish list” (built up over many years of Nastran performance benchmarking and tuning), I came up with the following ideas for useful SGE queues/configurations:
1. First, I believe most engineers want some machines available for quick-turnaround jobs (5-30 minutes) and other machines for long-running Nastran jobs (perhaps hours or days). For that, I wanted to create at least three runtime-limiting SGE job queues:
For example, 3 queues (Small, Medium, Large),
where each queue limits the amount of elapsed time allowed for an MD Nastran job.
Queue definition/limitation:
Small: Time limit: 5 minutes elapsed
Medium: Time limit: 30 minutes elapsed
Large: Time limit: unlimited
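As a rough illustration of the three-queue scheme above, here is a small Python sketch that picks the smallest queue whose elapsed-time limit covers a job's estimated runtime. The queue names and limits come from the list above; the function name `pick_queue`, and the idea of choosing a queue from a runtime estimate, are assumptions made for this example and are not part of SGE itself.

```python
import math

# Elapsed-time limits in seconds, taken from the queue list above.
# math.inf stands in for "unlimited".
QUEUE_LIMITS = [
    ("Small", 5 * 60),
    ("Medium", 30 * 60),
    ("Large", math.inf),
]


def pick_queue(estimated_seconds):
    """Return the first (smallest) queue whose limit covers the estimate."""
    if estimated_seconds < 0:
        raise ValueError("estimated runtime must be non-negative")
    for name, limit in QUEUE_LIMITS:
        if estimated_seconds <= limit:
            return name
    # Not reachable in practice: the Large queue has no limit.
    raise RuntimeError("no queue accepts this job")


if __name__ == "__main__":
    for minutes in (3, 20, 600):
        print(f"{minutes:>4} min estimate -> {pick_queue(minutes * 60)} queue")
```

A job estimated at exactly 5 minutes still lands in Small here; whether the boundary should be inclusive is a policy choice for the site.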
2. Next, I wanted to configure these SGE queues to answer two questions that I’ve found to be important when running Nastran jobs:
- "Which machine(s) have enough disk space to hold the often large Nastran output database files (scratch and scr300)?"
- "Which machine(s) have enough physical memory to avoid the performance slowdown when Nastran matrix operations 'spill' to disk?"
I’ll explain how to obtain these values in a later blog.
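To make the two questions concrete, the following Python sketch filters a list of hosts down to those with enough free disk and memory for a job. It only illustrates the matching logic; the `Host` record, the host names, and all of the numbers are hypothetical, and this is not how SGE stores or reports host resources.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical snapshot of one execution host's free resources."""
    name: str
    free_disk_gb: float
    free_mem_gb: float


def eligible_hosts(hosts, need_disk_gb, need_mem_gb):
    """Names of hosts that can hold the scratch files and keep the job in memory."""
    return [
        h.name
        for h in hosts
        if h.free_disk_gb >= need_disk_gb and h.free_mem_gb >= need_mem_gb
    ]


if __name__ == "__main__":
    # Made-up inventory for illustration only.
    inventory = [
        Host("node01", free_disk_gb=50, free_mem_gb=8),
        Host("node02", free_disk_gb=500, free_mem_gb=8),
        Host("node03", free_disk_gb=500, free_mem_gb=64),
    ]
    print(eligible_hosts(inventory, need_disk_gb=200, need_mem_gb=16))
```

If the returned list is empty, the job would have to wait until some host frees up enough of both resources.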
3. And finally, I wanted to find a way to help with the situation where a Nastran job can stop or “time out” partway through the analysis because there are no more Nastran license tokens left in the “token” pool. This scenario can occur if the customer site has a limited license token pool. The problem arises when two or more Nastran jobs are running: first, the jobs request an initial set of license tokens to get started. Later in the analysis, a job may need more tokens because some additional Nastran “feature” gets invoked. At that point there may not be enough tokens left in the license pool, and the job(s) stop or “time out” for lack of license tokens.
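The token shortfall can be shown with a toy model. The Python sketch below compares two policies on a shared pool: letting each job take its initial tokens and ask for more later, versus admitting a job only if its total (peak) token need fits in what is left. The pool size, the job names, and the token counts are invented for illustration, and admitting by peak need is just one possible approach, not a description of how Nastran licensing or SGE actually behaves.

```python
def run_incremental(total_tokens, jobs):
    """Jobs grab initial tokens, then request extras mid-run.

    jobs is a list of (name, initial_tokens, extra_tokens).
    Returns the names of jobs that started but could not get their extras.
    """
    free = total_tokens
    started = []
    for name, initial, extra in jobs:
        if initial <= free:
            free -= initial
            started.append((name, extra))
    stalled = []
    for name, extra in started:
        if extra <= free:
            free -= extra
        else:
            stalled.append(name)  # job "times out" partway through
    return stalled


def admit_by_peak(total_tokens, jobs):
    """Admit a job only if its peak need (initial + extra) fits right now.

    Returns (running, pending) lists of job names.
    """
    free = total_tokens
    running, pending = [], []
    for name, initial, extra in jobs:
        peak = initial + extra
        if peak <= free:
            free -= peak
            running.append(name)
        else:
            pending.append(name)  # waits in the queue instead of failing mid-run
    return running, pending


if __name__ == "__main__":
    jobs = [("jobA", 40, 30), ("jobB", 40, 30)]  # hypothetical token needs
    print("stalled mid-run:", run_incremental(100, jobs))
    print("running, pending:", admit_by_peak(100, jobs))
```

With these made-up numbers, both jobs start under the incremental policy and both stall when they ask for more tokens, while the peak-based policy runs one job and holds the other back until tokens are returned.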
4. I’ve also received additional suggestions since I started on this effort, and I’ll show how to add those features in later blogs.
Some of those suggestions are:
1. For DMP (distributed parallel) jobs, ensure that the distributed jobs run on CPUs of the same processor speed.
2. For DMP jobs, ensure that no more than one of the distributed jobs runs on any one machine.
3. For DMP jobs, ensure that the network interconnect is the same for all of the distributed jobs.
The reason MD Nastran DMP jobs benefit from the above is that efficient, scalable DMP processing requires each distributed job to finish its portion of the analysis in approximately the same amount of time (i.e., if one machine is slower than the rest, it becomes the bottleneck for achieving good scaling).
So, with all the above Nastran/SGE “wish list” items in mind, I’ll be blogging over the coming weeks on how I went about configuring my SGE environment to provide these MD Nastran queues/configurations.
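The bottleneck effect is easy to see with a little arithmetic. In the simplified Python sketch below, the elapsed time of a DMP run is taken to be the time of its slowest partition, and communication cost is ignored entirely; both are assumptions made for illustration. The work units and relative machine speeds are invented numbers, not Nastran measurements.

```python
def dmp_elapsed(work_per_part, speeds):
    """Elapsed time of the run: the slowest partition sets the pace."""
    return max(work / speed for work, speed in zip(work_per_part, speeds))


def dmp_speedup(work_per_part, speeds, reference_speed=1.0):
    """Speedup relative to running all the work serially on a reference machine."""
    serial_time = sum(work_per_part) / reference_speed
    return serial_time / dmp_elapsed(work_per_part, speeds)


if __name__ == "__main__":
    work = [100, 100, 100, 100]      # equal partitions, arbitrary work units
    matched = [1.0, 1.0, 1.0, 1.0]   # four machines of the same speed
    one_slow = [1.0, 1.0, 1.0, 0.5]  # one machine at half speed

    print("matched :", dmp_elapsed(work, matched), dmp_speedup(work, matched))
    print("one slow:", dmp_elapsed(work, one_slow), dmp_speedup(work, one_slow))
```

In this toy case, a single half-speed machine doubles the elapsed time and halves the speedup, even though the other three machines finish on time and then sit idle.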

The license token problem is well known, and there is no real way of eliminating it entirely. Mark Olesen has written some scripts and devised a clever way of minimizing the race condition.
http://wiki.gridengine.info/wiki/index.php/Olesen-FLEXlm-Integration
It would be a great contribution if you could describe the steps for integrating Nastran with SGE and link it to:
http://wiki.gridengine.info/wiki/index.php/Main_Page#Application_Integration
Posted by Melvin Koh on January 27, 2009 at 11:18 PM PST #
How do I configure projects in Sun Grid Engine 6.2 (qmon window)?
Posted by sugan g on February 12, 2009 at 11:18 PM PST #
What should be done to finish jobs that are in the pending list? And where does the user have to submit jobs, on the master or on a slave?
Posted by sugan g on February 12, 2009 at 11:24 PM PST #
Melvin,
Thanks for the reference to Olesen's work on the race condition.
What I'll be describing in an upcoming blog is how I use one of MSC Software's tools, called "estimate",
to pre-scan the Nastran input file and determine the total number of license tokens that will be
consumed during the run. With this knowledge, and an SGE consumable resource set to the total
available licenses for the customer site, the Nastran user can make the request to run the job. The
value of using the "estimate" tool is that it provides an automated method to determine each job's
token usage (which varies depending on the features used during execution).
By the way, I'll also be showing how you can use this "estimate" tool to determine Nastran disk and memory usage.
Posted by Dale Layfield on February 13, 2009 at 10:24 AM PST #
Sugan,
Regarding what should be done to finish pending jobs: the jobs will be released/dispatched
once the requested resource is available. For example, if -l disk_space=10G is requested on the qsub
submittal and it's not currently available on any host, then the job will be placed in the
"pending list". Once the space is available, the job will be released/dispatched to a host for
execution.
On showing how to use the qmon GUI, I may do that in a later blog. My experience
has been that once you get familiar with the command line interface it's fairly easy to configure
SGE (although the qmon GUI comes in handy while you're becoming familiar with SGE).
Posted by Dale Layfield on February 24, 2009 at 11:58 AM PST #