Stephan Grell's Weblog

Monday, April 18, 2005

N1GE 6 - Scheduler Hacks: job execution priority

The nice level of a job can be set in different ways. The simple way is to leave the reprioritization feature turned off (which is the default setting) and set the nice level via the queue configuration.

qconf -mq all.q
   priority  0

All jobs running in the queue instance will run with the defined nice level. One can now easily configure different cluster queues (such as low, medium, and high priority) with different nice levels.

This is the easy way and allows the user to decide how important the job is by submitting it to a specific cluster queue.

This approach is not always fine-grained enough. Sometimes it is important to rank the jobs based on the scheduling priority. A high priority job should not only be scheduled as fast as possible but also run on a lower nice level than low priority jobs. The importance ranking for the scheduling decision is done via the ticket policy and others. But only the ticket policy has a direct impact on the job nice level when "reprioritize" is enabled. There are two places to enable and control job reprioritization:

qconf -mconf
   reprioritize  0

qconf -msconf
   reprioritize_interval  0:0:0

One could assume that one can also influence the reprioritization via:

qconf -mconf <host_name>

but, even though the setting is accepted, it does not have an effect. The "reprioritize" flag enables/disables the feature. If it is set to true, the execd will monitor the usage of each job that it is running. It knows the number of tickets for each job and will ensure that the usage ratio between the jobs matches the ticket ratio between the jobs. Every job gets initial start tickets. The scheduler will most certainly change them while the job is running. Therefore we have the reprioritize_interval, which controls how often the updated tickets are sent to the execd, so that the ratio between the usage reflects the ratio between the tickets via the nice level. Since it takes some time to adjust the jobs' usage via the nice level, the tickets should not be sent too often. The recommendation is 2 minutes for the reprioritize_interval.
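
To make the "usage ratio should follow ticket ratio" idea concrete, here is a minimal Python sketch. It is not the execd's actual algorithm. The function name, the job ids, and the numbers are made up for illustration. It only compares each job's share of the usage with its share of the tickets:

```python
def usage_vs_tickets(jobs):
    """Compare each job's usage share with its ticket share.

    jobs maps a job id to a (tickets, usage) pair.
    Returns job id -> (actual usage share - target ticket share).
    A positive value means the job used more than its tickets justify.
    Illustration only; this is not how the execd computes nice levels.
    """
    total_tickets = sum(tickets for tickets, _ in jobs.values())
    total_usage = sum(usage for _, usage in jobs.values())
    deltas = {}
    for job_id, (tickets, usage) in jobs.items():
        target = tickets / total_tickets if total_tickets else 0.0
        actual = usage / total_usage if total_usage else 0.0
        deltas[job_id] = actual - target
    return deltas


# Hypothetical jobs: A holds 90% of the tickets, but both used the same CPU time.
jobs = {"A": (9000, 50.0), "B": (1000, 50.0)}
for job_id, delta in usage_vs_tickets(jobs).items():
    hint = "ahead of its share" if delta > 0 else "at or behind its share"
    print(job_id, round(delta, 3), hint)
```

In this made-up case job B is ahead of its share, so it is the one that would have to run at a higher nice level until the usage ratio moves toward the ticket ratio.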

If the reprioritize_interval is set to 0:0:0, the reprioritize feature is disabled (i.e. reprioritize is set to 0). It also works the other way around: setting reprioritize_interval to a non-zero value enables the feature by setting reprioritize to 1.
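
The coupling between the two settings can be written down as a tiny Python sketch. The helper names are hypothetical, and the code only mirrors the behaviour described in this post; it does not read or change any Grid Engine configuration:

```python
def interval_seconds(hms):
    """Convert an h:m:s string such as '0:2:0' into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def reprioritize_flag(reprioritize_interval):
    """Return the reprioritize value (0 or 1) implied by the interval,
    following the rule described above: 0:0:0 disables, anything else enables."""
    return 1 if interval_seconds(reprioritize_interval) > 0 else 0


print(reprioritize_flag("0:0:0"))  # interval of zero: feature off
print(reprioritize_flag("0:2:0"))  # the recommended 2 minutes: feature on
```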

A sample setup with two projects:

PRJ10 100 functional shares
PRJ1   10 functional shares

qstat shows:

JobId     P      S   Project  Tot-Tkt   ovrts   otckt  ftckt   stckt   
--------------------------------------------------
223670 1.50000   qw    PRJ10   25000       0       0   25000       0
223671 0.59091   qw     PRJ1    2272       0       0    2272       0
223672 0.50000    r       NA       0       0       0       0       0
223673 0.50000    r       NA       0       0       0       0       0
223674 0.50000    r       NA       0       0       0       0       0

Top output (note the changes in the nice level):
1)
  PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
11137 sg144703   1  40  -10 4240K 3352K cpu/3    2:35 21.93% work
11139 sg144703   1  37   -9 4240K 3352K cpu/2    2:31 21.10% work
11749 sg144703   1   0   17 4240K 3352K cpu/0    1:07 17.66% work
11743 sg144703   1   0   12 4240K 3352K run      0:54 17.20% work
11751 sg144703   1   0   19 4240K 3352K run      1:04 14.00% work

2)
  PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
11137 sg144703   1  30  -10 4240K 3352K cpu/1    3:23 23.92% work
11139 sg144703   1  27   -9 4240K 3352K cpu/2    3:19 23.41% work
11743 sg144703   1   0   19 4240K 3352K run      1:21 16.28% work
11751 sg144703   1   0   18 4240K 3352K cpu/3    1:36 15.28% work
11749 sg144703   1   8   17 4240K 3352K run      1:37 12.66% work

3)
11137 sg144703   1  30  -10 4240K 3352K run      4:30 24.02% work
11139 sg144703   1  27   -9 4240K 3352K cpu/1    4:26 23.92% work
11751 sg144703   1   0   19 4240K 3352K cpu/3    2:25 16.32% work
11749 sg144703   1   0   15 4240K 3352K run      2:17 13.70% work
11743 sg144703   1   0   17 4240K 3352K run      1:56 11.83% work

And the qstat usage output:

 job-id project          department state cpu        mem       io    tckts ovrts otckt ftckt stckt
 ----------------------------------------------------------------------
 223670 PRJ10            defaultdep r     0:00:04:41 1.13824 0.00000 90909     0     0 90909     0
 223671 PRJ1             defaultdep r     0:00:04:37 1.11933 0.00000  9090     0     0  9090     0
 223672 NA               defaultdep r     0:00:02:04 0.50110 0.00000     0     0     0     0     0
 223673 NA               defaultdep r     0:00:02:25 0.58774 0.00000     0     0     0     0     0
 223674 NA               defaultdep r     0:00:02:29 0.60243 0.00000     0     0     0     0     0
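
The ftckt values of the running jobs can be reproduced with simple arithmetic. The following Python sketch assumes a pool of 100000 functional tickets (inferred from the numbers above, not read from the configuration), splits it between the projects by their functional shares, and divides each project's part evenly among its jobs. The function name is made up, and the real scheduler takes more inputs into account:

```python
def split_functional_tickets(total_tickets, project_shares, job_projects):
    """Split a ticket pool by project shares, then evenly per job.

    project_shares: project name -> functional shares
    job_projects:   job id -> project name (None for jobs without a project)
    Jobs without a known project get 0 tickets. Integer division is used,
    which is an assumption that happens to match the qstat output.
    """
    share_sum = sum(project_shares.values())
    jobs_per_project = {}
    for project in job_projects.values():
        if project in project_shares:
            jobs_per_project[project] = jobs_per_project.get(project, 0) + 1
    tickets = {}
    for job_id, project in job_projects.items():
        if project not in project_shares:
            tickets[job_id] = 0
            continue
        project_tickets = total_tickets * project_shares[project] // share_sum
        tickets[job_id] = project_tickets // jobs_per_project[project]
    return tickets


shares = {"PRJ10": 100, "PRJ1": 10}
jobs = {223670: "PRJ10", 223671: "PRJ1", 223672: None, 223673: None, 223674: None}
print(split_functional_tickets(100000, shares, jobs))
```

With 100 and 10 shares the pool is split 10:1, which gives 90909 and 9090 tickets for the two project jobs and 0 for the jobs without a project.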

The machine used for this test had 4 processors, so there were always enough CPUs for the PRJ10 and PRJ1 jobs. Therefore they have more or less the same usage. The other jobs have to share the remaining resources with the other tasks and are way behind. The min / max values for the nice level are defined in the source file: source/daemons/execd/ptf.h

A different job mix results in different nice levels:

qstat output:

JobId     P      S   Project  Tot-Tkt   ovrts   otckt  ftckt   stckt  
----------------------------------------------------
223675 1.50000    r    PRJ10   30303       0       0   30303       0
223676 1.50000    r    PRJ10   30303       0       0   30303       0
223677 1.50000    r    PRJ10   30303       0       0   30303       0
223678 0.80000    r     PRJ1    9090       0       0    9090       0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

top:
1)
21625 sg144703   1  40  -10 4240K 3352K cpu/1    1:32 20.61% work
21589 sg144703   1  47   -9 4240K 3352K run      1:34 20.50% work
21590 sg144703   1  37   -9 4240K 3352K cpu/0    1:34 20.30% work
21633 sg144703   1   0   16 4240K 3352K run      1:21 18.31% work

The nice range used might be a bit extreme. There are two switches to specify the range: the settings PTF_MIN_PRIORITY and PTF_MAX_PRIORITY control the nice range used. They can be set via:

qconf -mconf
execd_params       PTF_MIN_PRIORITY=19, PTF_MAX_PRIORITY=0
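
To get a feeling for what a narrower range means, here is a purely hypothetical Python sketch that maps a job's ticket share linearly into the configured range. The linear mapping and the function name are assumptions for illustration; the execd uses its own logic to pick the nice value inside the range:

```python
def nice_for_share(share, min_priority=19, max_priority=0):
    """Map a ticket share in [0, 1] linearly onto a nice range.

    share 1.0 -> max_priority (most favoured nice value)
    share 0.0 -> min_priority (least favoured nice value)
    Hypothetical linear mapping, not the execd implementation.
    """
    share = min(1.0, max(0.0, share))  # keep the share inside [0, 1]
    return round(min_priority + (max_priority - min_priority) * share)


# Range from the execd_params line above: nice values stay between 19 and 0.
print(nice_for_share(1.0), nice_for_share(0.0))
# Range observed in the top output above: nice values between 19 and -10.
print(nice_for_share(1.0, 19, -10), nice_for_share(0.0, 19, -10))
```

Whatever the real mapping looks like, with PTF_MAX_PRIORITY=0 no job is pushed to a negative nice level any more.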

( Apr 18 2005, 05:00:32 PM CEST )

