In my prior blogs in this series on integrating "Sun Grid Engine and MSC.Software's MD Nastran" I described the following:
1. "Sun Grid Engine and MD Nastran" [recommended SGE configurations/queues for MD Nastran users]
2. "Part 1--How to submit "MD Nastran" (serial) jobs with Sun Grid Engine"
3. "Part 2--How to submit "MD Nastran" DMP (Distributed Memory Parallel) jobs with Sun Grid Engine"
4. "Part 3--How to configure consumable resources (Disk, Memory, and License Tokens)"
In this final blog in the series I'll describe two SGE configuration details that I mentioned in the earlier blogs:
1. How to create "runtime limiting queues"
2. How to create an "SGE parallel environment"
First, a quick comment on my use of SGE's CLI (command line interface) instead of the GUI tool (QMON).
I continue to give examples using the CLI to configure SGE because I've found it to be an extremely flexible and powerful way to change settings once you become familiar with SGE. [That said, I still go back to QMON when I'm not sure how something works. It's a really great way to see the relationship of a feature or setting to the other components within SGE.]
Section I: "How to create runtime limiting queues"
Step #1: Execute the qconf command to make a small, medium, and large queue (each "qconf -aq" opens the queue template in an editor; save and exit to create the queue)
#qconf -aq small.q
#qconf -aq medium.q
#qconf -aq large.q
Step #2: Modify the queues small.q, medium.q, and large.q to have different h_rt (elapsed runtime) limits.
--> First, set small.q to complex value h_rt=5 minutes
# qconf -mattr queue complex_values h_rt=00:05:00 small.q
"complex_values" of "small.q" is empty - Adding new element(s).
root@tm19-232 modified "small.q" in cluster queue list
--> Second, set medium.q to complex value h_rt=30 minutes
# qconf -mattr queue complex_values h_rt=00:30:00 medium.q
"complex_values" of "medium.q" is empty - Adding new element(s).
root@tm19-231 modified "medium.q" in cluster queue list
--> Third, set large.q to complex value h_rt=99 hours (in effect, a very large limit)
# qconf -mattr queue complex_values h_rt=99:00:00 large.q
"complex_values" of "large.q" is empty - Adding new element(s).
Now verify that the above limits have been set correctly.
For example, to check the 5-minute (300-second) time limit for small.q:
tm19-231:/dpl/sge.sc08#qconf -sq small.q
qname small.q
hostlist tm19-231
.....
complex_values h_rt=300
.....
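The three limits above can be summarized in a small dry-run sketch. It only prints the "qconf -mattr" commands from Step #2, each with the number of seconds that "qconf -sq" would be expected to report for that limit (as in the h_rt=300 output for small.q). The helper names hms_to_seconds and print_limit_commands are my own placeholders and are not part of SGE.

```shell
#!/bin/sh
# Dry run only: prints commands, changes nothing in SGE.

# hms_to_seconds (hypothetical helper): convert HH:MM:SS to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print one "qconf -mattr" command per queue, with the limit in seconds.
print_limit_commands() {
    for spec in small.q=00:05:00 medium.q=00:30:00 large.q=99:00:00; do
        queue=${spec%%=*}   # text before the "="
        limit=${spec#*=}    # text after the "="
        echo "qconf -mattr queue complex_values h_rt=$limit $queue   # $(hms_to_seconds "$limit") seconds"
    done
}

print_limit_commands
```

Printing the commands first makes it easy to review them before running them by hand as root.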
Section II: "How to make a parallel environment (PE) called "nastran" for Nastran DMP (Distributed Memory Parallel) jobs"
First, some background.
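What is a parallel environment? Within SGE, a parallel environment (PE) is a configuration object that describes how parallel (multi-slot) jobs are run: how many slots the PE may use in total, how those slots are spread across hosts, and which start and stop procedures run around the job. Queues that list the PE can then run jobs submitted with "qsub -pe".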
Before you continue, you might also want to read my earlier blog on MD Nastran's DMP (Distributed Memory Parallel) capability.
Here are the steps to create a Nastran "parallel environment" in SGE.
Step #1: Create the "nastran" parallel environment
First check to see what parallel environment(s) already exist in your SGE environment:
tm19-231:/dpl/sge.sc08#qconf -spl
make
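If you script this check, the list printed by "qconf -spl" can be tested for a given name before you add it. This is a minimal sketch, assuming one PE name per line as in the output above; pe_listed is a hypothetical helper name, and the sample list stands in for real "qconf -spl" output.

```shell
# pe_listed NAME: read PE names on stdin (one per line, as printed by
# "qconf -spl") and succeed only if NAME matches a whole line exactly.
pe_listed() {
    grep -qxF -- "$1"
}

# Stand-in for the "qconf -spl" output shown above.
existing_pes="make"

if echo "$existing_pes" | pe_listed nastran; then
    echo "nastran already exists"
else
    echo "nastran not found; add it with: qconf -ap nastran"
fi
```

On a live system the input would come from "qconf -spl | pe_listed nastran" instead of the sample variable.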
Now add a parallel environment (let's call it "nastran") to SGE using the following "qconf -ap" command:
[the -ap option (add parallel environment) displays an editor containing a parallel environment configuration template.]
tm19-231:/dpl/sge.sc08#qconf -ap nastran
pe_name nastran
slots 0
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $pe_slots
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
accounting_summary FALSE
Now edit two of the default settings above, "slots" and "allocation_rule": (1) "slots 16" to utilize all cores in my two 8-core machines, and (2) "allocation_rule $round_robin" so that SGE spreads a job's slots across the machines in the parallel environment's queues, one slot per host at a time. [Note that the listing below also shows "control_slaves TRUE" and "job_is_first_task FALSE", which differ from the template defaults above.] The result:
tm19-231:/dpl/sge.sc08#qconf -sp nastran
pe_name nastran
slots 16
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
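The same settings can also be kept in a file instead of being typed into the editor. The sketch below only writes the file and prints a follow-up command. I'm assuming here that your SGE version accepts a PE definition from a file via "qconf -Ap" (check "man qconf" first). The file path and the function name write_nastran_pe are placeholders of mine.

```shell
# write_nastran_pe FILE: write the "nastran" PE settings shown above to FILE.
write_nastran_pe() {
    # The quoted 'EOF' keeps $round_robin literal instead of expanding it.
    cat > "$1" <<'EOF'
pe_name            nastran
slots              16
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
EOF
}

# Placeholder location; nothing is registered with SGE by this script.
pe_file="${TMPDIR:-/tmp}/nastran_pe.conf"
write_nastran_pe "$pe_file"
echo "Wrote $pe_file; to register it (assumed): qconf -Ap $pe_file"
```

Keeping the definition in a file makes it easy to review and to re-create the PE on another cluster.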
Step #2: Attach/associate one or more "queues" to the "nastran" parallel environment using "qconf -mq"
...for this example I'm only attaching queue "large.q" to the parallel environment "nastran".
tm19-231:/dpl/sge.sc08#qconf -mq large.q
qname large.q
hostlist tm19-231 tm19-232
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make
rerun FALSE
slots 8
tmpdir /tmp
shell /bin/csh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
"/var/tmp/9512-O3JEa7" 50 lines, 1494 characters
....now edit the above "pe_list" setting to add the "nastran" parallel environment to this "large.q" queue.
qname large.q
hostlist tm19-231 tm19-232
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make nastran
rerun FALSE
slots 8
tmpdir /tmp
shell /bin/csh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
"/var/tmp/9512-O3JEa7" 50 lines, 1494 characters
Step #3: You're now ready to submit jobs to the "nastran" parallel environment:
[The only queues available for these jobs are those that have been associated with the parallel environment "nastran". In my example only "large.q" is associated with it; however, you can also add other queues (like the small.q and medium.q queues described earlier). In addition to being associated with the parallel environment, a queue must also satisfy any resource requirement specified with "qsub -l" when you submit your job.]
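Adding the PE to further queues can be done without retyping the whole queue definition. This is a minimal sketch, assuming the queue configuration is available as text (for example from "qconf -sq large.q"); add_pe is a hypothetical helper that appends a PE name to the "pe_list" line only if it is missing.

```shell
# add_pe NAME: filter a queue configuration on stdin and append NAME to the
# pe_list line unless it is already listed; all other lines pass unchanged.
add_pe() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            found = 0
            for (i = 2; i <= NF; i++) if ($i == pe) found = 1
            if (!found) {
                if ($2 == "NONE") sub(/NONE/, pe)   # empty list: replace NONE
                else $0 = $0 " " pe                 # otherwise append the name
            }
        }
        { print }'
}

# Stand-in for two lines of "qconf -sq large.q" output.
printf 'qname large.q\npe_list make\n' | add_pe nastran
```

On a live system the edited text could be written to a file and loaded with "qconf -Mq"; for a single attribute, "qconf -aattr queue pe_list nastran large.q" should have the same effect. Check both against the qconf man page for your SGE version before relying on them.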
...as described in a prior blog, here's the script I used to submit a Nastran job to the parallel environment "nastran".
#cat nastran_qsub.sh
qsub -pe nastran 2 -q large.q -l mem_free=6G -l nastran_tokens=`token_estimate.sh` -l export_size=10G -S /bin/ksh nastran_sge.sh
The above SGE command line defines the following:
1. "qsub" : qsub is the SGE command used to submit the Nastran job submittal script "nastran_sge.sh" to the parallel environment "nastran" I configured within SGE. The standard Nastran job submittal command "mdnast2008..." that Nastran users are familiar with is inside the script "nastran_sge.sh".
2. "-pe nastran 2" : This tells qsub to submit "nastran_sge.sh" to the SGE parallel environment "nastran" and to request two slots. The "nastran_sge.sh" script will then start two MD Nastran jobs (running in parallel) on one or more of the host machines defined in the queue (large.q).
[Suitable queues for this job, like large.q, are those associated with the parallel environment "nastran". They must also satisfy the resource requirements given with qsub -l (see item 4. below).]
3. "-q large.q" : I configured three SGE queues for my Nastran environment (small.q, medium.q, and large.q)--each one having a different limit on the amount of elapsed time allowed for jobs within the queue. In this example I chose large.q to handle a long (elapsed time) running job.
4. "-l mem_free=6G -l nastran_tokens=`token_estimate.sh` -l export_size=10G" : These options define the "complex resource attributes" that I configured within SGE. If any of these requirements (free memory, tokens, or disk space) is not met, the job will not be dispatched.
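To make the queue choice in item 3 concrete, here is a small sketch that maps an expected elapsed time to one of the three queues, using the limits set in Section I (300 seconds for small.q, 1800 seconds for medium.q). The function name pick_queue is hypothetical and was not part of the original setup. Keep in mind that for a DMP job the chosen queue must also list the "nastran" PE in its pe_list, and in the example above only large.q does.

```shell
# pick_queue SECONDS: print the smallest queue whose h_rt limit covers the
# expected elapsed time (limits taken from Section I).
pick_queue() {
    if [ "$1" -le 300 ]; then
        echo small.q
    elif [ "$1" -le 1800 ]; then
        echo medium.q
    else
        echo large.q
    fi
}

pick_queue 240     # a job expected to take 4 minutes
pick_queue 7200    # a job expected to take 2 hours
```

The printed name could then be passed to "qsub -q" in place of the hard-coded large.q.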
This concludes my series of blogs on integrating "Sun Grid Engine and MD Nastran". Feel free to send me comments or suggestions on additional configurations/queues/settings that you think might be useful to MD Nastran users. Based on the input I receive I may start up another series of blogs on this topic to cover those additional suggestions.
