Tuesday, August 08, 2006 | Stephan Grell's Weblog
N1GE 6 - Scheduler Hacks: Exclusive master host for the master task

1) Sorting queues by sequence number:
Description: Grid Engine can sort hosts / queues by sequence number. Assuming that we have only one cluster queue
and the parallel environment is configured to use fill-up, we can
assign the compute queue instances a smaller sequence number than the
master machines. The job requests the pe it should run in and the master
machine as the masterq. This way, all slave tasks run on the compute
nodes, which are filled up first, and the master task is singled out to
the master machine due to its special request.
Advantages: If the environment has more than one master host, wildcards in the masterq request can be used to select one of the master hosts. This approach makes best use of all resources and is
easy to set up, understand, and debug. This setup also has the least
performance impact.
Problems: As soon as there are not enough compute
nodes available, the scheduler will assign more than one task to the
master machine.
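
The behaviour described above, including the overflow problem, can be sketched as a small toy model. This is not Grid Engine code: the host names, slot counts, and the `allocate` function are assumptions made purely for illustration, and the real scheduler considers much more than sequence numbers.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    name: str            # hypothetical queue instance / host name
    seq_no: int          # sequence number used for sorting
    free_slots: int
    is_master: bool = False


# Assumed example cluster: two compute hosts (seq_no 0), one master host (seq_no 1).
QUEUES = [
    QueueInstance("small1", seq_no=0, free_slots=4),
    QueueInstance("small2", seq_no=0, free_slots=4),
    QueueInstance("master1", seq_no=1, free_slots=4, is_master=True),
]


def allocate(queues, total_slots):
    """Toy fill-up allocation with a pinned master task.

    One task goes to a master queue instance (the -masterq request);
    the remaining tasks fill queue instances in ascending seq_no order.
    """
    alloc = {q.name: 0 for q in queues}
    free = {q.name: q.free_slots for q in queues}

    master = next((q for q in queues if q.is_master and free[q.name] > 0), None)
    if master is None:
        raise RuntimeError("no free master slot")
    alloc[master.name] += 1
    free[master.name] -= 1

    remaining = total_slots - 1
    # sorted() is stable, so hosts with equal seq_no keep their list order.
    for q in sorted(queues, key=lambda q: q.seq_no):
        take = min(free[q.name], remaining)
        alloc[q.name] += take
        free[q.name] -= take
        remaining -= take
    if remaining:
        raise RuntimeError("not enough free slots")
    return alloc
```

In this model, `allocate(QUEUES, 6)` puts one task on `master1` and five on the compute hosts, while `allocate(QUEUES, 10)` runs out of compute slots and places a second task on `master1`, which is exactly the problem noted above.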
Configuration: Change the queue sort order in the scheduler config:
    qconf -msconf
    queue_sort_method    seqno
The queue on the small hosts gets:
    qconf -mq <queue>
    seq_no    0
The queue for the master hosts gets:
    qconf -mq <queue>
    seq_no    1
A job submit would look like:
    qsub -pe <PE> 6 -masterq "*@master*" ...

2) Making excessive use of pe objects and cluster queues:
Description: Each slot on a master host needs its
own cluster queue and its own pe. The compute nodes are combined under
one cluster queue that carries all pe objects used on the master hosts.
Each master cluster queue has exactly one slot. The job submit now
requests both the master queue and the pe it should run in
via wildcards.
Advantages: Achieves the goal.
Problems: Many configuration objects. Slows down
the scheduler quite a bit.
Configuration: I will leave the configuration for this
one open. It should not be complicated...
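
Since the configuration is left open here, the following is only a toy model of why the idea works, not a configuration. It assumes that a job with a wildcard pe request ends up running in exactly one matching pe, and it approximates Grid Engine wildcards with Python's `fnmatch`. All queue names, pe names, and slot counts are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical objects: every master slot has its own 1-slot cluster queue and pe.
MASTER_QUEUES = {
    "m1.q@master": {"slots": 1, "pe": "pe_m1"},
    "m2.q@master": {"slots": 1, "pe": "pe_m2"},
}
# The single compute cluster queue is assumed to carry every pe listed above.
COMPUTE_QUEUES = {
    "compute.q@small1": 4,
    "compute.q@small2": 4,
}


def place(pe_pattern, masterq_pattern, total_slots, busy_master=()):
    """Return (pe, allocation) for the first usable master queue, or None.

    Within one pe only a single master slot exists, so the job either gets
    exactly one task on the master host or cannot be placed at all.
    """
    for qname, q in MASTER_QUEUES.items():
        if qname in busy_master:
            continue
        if not fnmatch(qname, masterq_pattern) or not fnmatch(q["pe"], pe_pattern):
            continue
        alloc = {qname: 1}                    # the master task
        remaining = total_slots - 1
        for cname, slots in COMPUTE_QUEUES.items():
            take = min(slots, remaining)
            if take:
                alloc[cname] = take
            remaining -= take
        if remaining == 0:
            return q["pe"], alloc
    return None                               # job stays pending in this model
```

In this sketch a 6-slot request such as `place("pe_*", "m*.q@master", 6)` gets one task on a master queue and five on the compute hosts, and a 10-slot request returns `None` instead of spilling extra tasks onto the master host.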
3) Using load adjustments:
Description: The scheduler uses load adjustments to avoid overloading a host. The system can be configured in such a way that the scheduler starts no more than one task on a host in a scheduling run, even though more slots are available. We will use this configuration to achieve the desired goal.
Advantages: Achieves exactly what we are looking
for without any additional configuration objects.
Problems: Slows down scheduling. Only one job
requesting the master host will be started in one scheduling run.
Supporting backup master hosts is not easy.
The master machine is only allowed to have one queue instance, or all
queue instances of the master machine have to share the same load
threshold. If that is not the case, it will not work.
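
The mechanism can be illustrated with a toy model of a single scheduling run. It is a sketch under assumptions, not the scheduler's actual algorithm: the threshold of 4 and the adjustment of 4.1 follow the values used in this post, the host data mirrors the example setup, and the `scheduling_run` function and its tie-breaking are invented for illustration. The masterq request itself is not modelled, only the one-task-per-run cap.

```python
# Values taken from this post's example; everything else is assumed.
LOAD_THRESHOLD = {"big": 4.0}     # load_thresholds [big=load_avg=4]
JOB_LOAD_ADJUSTMENT = 4.1         # job_load_adjustments load_avg=4.1


def scheduling_run(hosts, pending_tasks):
    """Simulate one scheduling run.

    hosts maps a host name to {"load_avg": float, "free_slots": int}.
    Returns a dict of host name -> number of tasks started in this run.
    The input is not modified; the artificial load lives only inside the
    run, mimicking a decay time that is shorter than the schedule interval.
    """
    adjusted = {name: h["load_avg"] for name, h in hosts.items()}
    free = {name: h["free_slots"] for name, h in hosts.items()}
    started = {}
    for _ in range(pending_tasks):
        # A host is usable while it has slots and its adjusted load is
        # below its threshold (hosts without a threshold are never capped).
        candidates = [
            n for n in hosts
            if free[n] > 0 and adjusted[n] < LOAD_THRESHOLD.get(n, float("inf"))
        ]
        if not candidates:
            break
        host = min(candidates, key=lambda n: adjusted[n])  # sort by load
        started[host] = started.get(host, 0) + 1
        free[host] -= 1
        adjusted[host] += JOB_LOAD_ADJUSTMENT  # artificial load per task
    return started


HOSTS = {
    "big": {"load_avg": 0.02, "free_slots": 4},
    "small1": {"load_avg": 0.00, "free_slots": 1},
    "small2": {"load_avg": 0.02, "free_slots": 1},
}
```

With five pending tasks, `scheduling_run(HOSTS, 5)` starts only one task on `big` although it has four free slots, because the first task pushes its adjusted load past the threshold. A later run starts from the real load again and can place one more task there.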
Configuration: I have the following setup:

    qstat -f
    queuename                      qtype used/tot. load_avg arch          states
    ----------------------------------------------------------------------------
    all.q@big                      BIP   0/4       0.02     sol-sparc64
    ----------------------------------------------------------------------------
    small.q@small1                 BIP   0/1       0.00     lx24-amd64
    ----------------------------------------------------------------------------
    small.q@small2                 BIP   0/1       0.02     sol-sparc64

And a configured pe in all queue instances:

    qconf -sp make
    pe_name           make
    slots             999
    user_lists        NONE
    xuser_lists       NONE
    start_proc_args   NONE
    stop_proc_args    NONE
    allocation_rule   $fill_up
    control_slaves    TRUE
    job_is_first_task FALSE
    urgency_slots     min

We now go ahead and change the load_thresholds in the all.q@big queue instance to a load value that is not used in the other queue instances, such as:

    qconf -sq all.q
    qname             all.q
    hostlist          big
    seq_no            0
    load_thresholds   NONE,[big=load_avg=4]

The load threshold used has to be a real load value and cannot be a fixed or consumable value. The next step to make our environment work is to change the scheduler configuration to the following:

    qconf -ssconf
    algorithm                  default
    schedule_interval          0:2:0
    maxujobs                   0
    queue_sort_method          load
    job_load_adjustments       load_avg=4.100000
    load_adjustment_decay_time 0:0:1

By changing the scheduler configuration to use job_load_adjustments like this, the scheduler adds an artificial load to each host that will run a task. With this configuration we can start one task on the master machine in each scheduling run. Since the load_adjustment_decay_time is only 1 second, the scheduler has forgotten about the artificial load by the next scheduling run and can start a new task on the master host. This way, we achieve what we have been looking for.

Extended Configuration: If multiple master hosts
are required, one needs to create one pe object per master host. The
compute hosts are part of all pe objects. The same rule as above still
applies: each master host is only allowed to have one queue instance.
The configuration of the all.q queue would look as follows:

    qconf -sq all.q
    qname             all.q
    hostlist          big
    seq_no            0
    load_thresholds   NONE,[big=load_avg=4],[big1=load_avg=4],[big2=load_avg=4]
    pe_list           big_pe big1_pe big2_pe,[big=big_pe],[big1=big1_pe],[big2=big2_pe]

The job submit would look like:

    qsub -pe "big*" 5 -masterq "all.q@big*" ....

( Aug 08 2006, 09:44:12 AM CEST )