Stephan Grell's Weblog

Tuesday, April 25, 2006

N1GE 6 - Scheduler Hacks: Separate master host for PE jobs
When a PE job is distributed over a range of hosts, the PE provides a set of allocation rules. These rules allow the admin to specify that a host should be filled up first before another is used, that each host is used before any host runs a second task, or that the job uses a specified number of slots on each host it runs on. This covers most of the use cases around PE jobs.
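The three rules can be pictured with a small toy model. This is only a sketch under simplifying assumptions, not the scheduler's implementation: the function `allocate` and its arguments are made up for illustration, hosts are considered in the order given, and the rule names follow the `allocation_rule` values `$fill_up`, `$round_robin`, and a fixed integer.

```python
def allocate(rule, free_slots, wanted):
    """Toy model of three PE allocation rules (illustrative only).

    rule:       "$fill_up", "$round_robin", or a positive int (slots per host)
    free_slots: dict host -> free slots, in the order hosts are considered
    wanted:     total number of slots the job asks for
    Returns a dict host -> granted slots, or None if the request cannot be met.
    """
    granted = {host: 0 for host in free_slots}
    remaining = wanted

    if rule == "$fill_up":
        # Use up one host completely before moving on to the next one.
        for host, free in free_slots.items():
            take = min(free, remaining)
            granted[host] = take
            remaining -= take
    elif rule == "$round_robin":
        # Give every host one task before any host gets a second one.
        progress = True
        while remaining > 0 and progress:
            progress = False
            for host, free in free_slots.items():
                if remaining > 0 and granted[host] < free:
                    granted[host] += 1
                    remaining -= 1
                    progress = True
    elif isinstance(rule, int) and rule > 0:
        # Exactly `rule` slots on every host the job uses.
        if wanted % rule != 0:
            return None
        for host, free in free_slots.items():
            if remaining > 0 and free >= rule:
                granted[host] = rule
                remaining -= rule
    else:
        raise ValueError("unknown allocation rule: %r" % (rule,))

    if remaining > 0:
        return None
    return {host: n for host, n in granted.items() if n > 0}


if __name__ == "__main__":
    hosts = {"big": 4, "small1": 1, "small2": 1}
    print(allocate("$fill_up", hosts, 5))
    print(allocate("$round_robin", hosts, 5))
    print(allocate(1, hosts, 3))
```

None of these rules can express "one slot on the master host, fill-up everywhere else", which is the gap the rest of this entry works around.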
In this comment I would like to sketch out a scenario that cannot be addressed with the existing allocation rules: the master host runs only the master task of a job, while all other hosts are used according to the fill-up allocation rule. This comes in handy if the master task of a job requires a lot of memory while the slave tasks do the computation, and only one machine with a lot of memory is available. The big machine can and should run multiple master tasks of this kind of job.

There are two solutions to the problem. One could separate the memory-intensive computation out into an extra job and work with job dependencies, or one could configure N1GE to handle the above use case as specified without any job modifications.

I have the following setup:

qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@big                      BIP   0/4       0.02     sol-sparc64
----------------------------------------------------------------------------
small.q@small1                 BIP   0/1       0.00     lx24-amd64
----------------------------------------------------------------------------
small.q@small2                 BIP   0/1       0.02     sol-sparc64

The following PE is configured in all queue instances:

qconf -sp make
pe_name           make
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   NONE
stop_proc_args    NONE
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

We now go ahead and change load_thresholds in the all.q@big queue instance to use a load value that is not used as a threshold in the other queue instances, such as:

qconf -sq all.q
qname                 all.q
hostlist              big
seq_no                0
load_thresholds       NONE,[big=load_avg=4]

The load threshold has to use a real load value; it cannot be a fixed or consumable value.

The next step to make our environment work is to change the scheduler configuration to the following:

qconf -ssconf
algorithm                         default
schedule_interval                 0:2:0
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              load_avg=4.000000
load_adjustment_decay_time        0:0:1

With job_load_adjustments configured like this, the scheduler adds an artificial load to each host that will run a task. On the big host, this artificial load of 4 reaches the load_avg=4 threshold as soon as one task has been placed there, so no further task is dispatched to it in the same scheduling run. With this configuration we can therefore start one task on the big machine in each scheduling run. Since the load_adjustment_decay_time is only 1 second, the scheduler has forgotten about the artificial load by the next scheduling run and can start a new task on the big host. This way, we achieve what we have been looking for.
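The interplay between the load threshold and the load adjustment can be played through with a small simulation. This is a toy model and not scheduler code: the `Host` class and `scheduling_run` function are hypothetical, the measured load is held constant at 0.02, the threshold is assumed to trip once the adjusted load reaches it, and the adjustment is assumed to have fully decayed before each new run (1 second decay against a 2 minute schedule interval).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    slots: int
    load_avg: float                         # measured load, held constant here
    load_threshold: Optional[float] = None  # None models load_thresholds NONE
    used: int = 0
    adjustment: float = 0.0                 # artificial load added by the scheduler

    def can_take_task(self) -> bool:
        if self.used >= self.slots:
            return False
        if self.load_threshold is None:
            return True
        # Assumption: the host is overloaded once the adjusted load reaches the threshold.
        return self.load_avg + self.adjustment < self.load_threshold


def scheduling_run(host, pending, load_adjustment):
    """Dispatch as many pending tasks to `host` as one run allows.

    Returns (tasks started in this run, tasks still pending).
    """
    host.adjustment = 0.0  # the previous run's artificial load has decayed
    started = 0
    while pending > 0 and host.can_take_task():
        host.used += 1
        host.adjustment += load_adjustment
        pending -= 1
        started += 1
    return started, pending


if __name__ == "__main__":
    big = Host("big", slots=4, load_avg=0.02, load_threshold=4.0)
    pending = 5
    for run in range(1, 6):
        started, pending = scheduling_run(big, pending, load_adjustment=4.0)
        print("run", run, "started", started, "used", big.used, "pending", pending)
```

In this model the big host accepts exactly one task per run until its four slots are used. With a load adjustment of 0 the same loop would place four tasks in the first run, which is the behaviour the configuration is meant to avoid.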

One important note:
The big machine is only allowed to have one queue instance, or all queue instances of the big machine have to share the same load threshold. If that is not the case, it will not work.
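This precondition is easy to check once the queue instances and their thresholds are written down. The helper below is a hypothetical sketch: it does not query N1GE, it only takes (queue, host, load_thresholds) tuples that one would copy by hand from the qconf -sq output, and the queue name extra.q in the example is invented.

```python
def hosts_with_mixed_thresholds(queue_instances):
    """Return the hosts whose queue instances do not share one load threshold.

    queue_instances: iterable of (queue, host, load_thresholds) tuples,
    with load_thresholds given as a string such as "load_avg=4" or "NONE".
    """
    seen = {}
    for _queue, host, thresholds in queue_instances:
        seen.setdefault(host, set()).add(thresholds)
    return sorted(host for host, values in seen.items() if len(values) > 1)


if __name__ == "__main__":
    setup = [
        ("all.q", "big", "load_avg=4"),
        ("small.q", "small1", "NONE"),
        ("small.q", "small2", "NONE"),
    ]
    print(hosts_with_mixed_thresholds(setup))
    # A second, invented queue on big with a different threshold breaks the setup.
    print(hosts_with_mixed_thresholds(setup + [("extra.q", "big", "NONE")]))
```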


( Apr 25 2006, 10:37:37 AM CEST )
