Thursday, July 21, 2005 | Stephan Grell's Weblog
N1GE 6 - Scheduler Hacks: Sorting queues
    queue_sort_method            load
    job_load_adjustments         np_load_avg=0.50
    load_adjustment_decay_time   0:7:30
    load_formula                 np_load_avg

This setting uses the load for sorting. For each started job it adds 0.5 to the load of the host the job runs on, and that added load decays over 7.5 minutes.

Hint: If a host has more than one slot, the load adjustment can lead to not all slots on that host being used, because the next job might overload the host. qstat -j <job_id> shows the reasons why a job was not dispatched, including the hosts that will not be used due to load adjustments.

If np_load_avg is used for the load adjustments and the load formula, the number of processors in a machine is taken into account.

Example (using job_load_adjustments np_load_avg=1.5). As one can see, not all slots are used.

    es-ergb01-01% qstat -f
    queuename                      qtype used/tot. load_avg arch          states
    ----------------------------------------------------------------------------
    all.q@host1                    BIP   1/5       0.03     lx24-amd64
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 8
    ----------------------------------------------------------------------------
    all.q@host2                    BIP   3/5       0.78     sol-sparc64
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 5
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 7
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 11
    ----------------------------------------------------------------------------
    all.q@host3                    BIP   2/5       0.28     sol-sparc64
        103 0.55500 job        sg144703     t     07/21/2005 09:10:04     1 6
        103 0.55500 job        sg144703     t     07/21/2005 09:10:04     1 12
    ----------------------------------------------------------------------------
    all.q@host4                    BIP   1/5       0.16     sol-x86
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 10
    ----------------------------------------------------------------------------
    all.q@host5                    BIP   0/5       0.01     sol-x86
    ----------------------------------------------------------------------------
    test.q@host1                   BIP   1/5       0.03     lx24-amd64
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 2
    ----------------------------------------------------------------------------
    test.q@host2                   BIP   0/5       0.78     sol-sparc64   D
    ----------------------------------------------------------------------------
    test.q@host3                   BIP   2/5       0.28     sol-sparc64
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 3
        103 0.55500 job        sg144703     t     07/21/2005 09:10:04     1 9
    ----------------------------------------------------------------------------
    test.q@host4                   BIP   1/5       0.16     sol-x86
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 4
    ----------------------------------------------------------------------------
    test.q@host5                   BIP   1/5       0.01     sol-x86
        103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 1

    ############################################################################
    PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
    ############################################################################
        103 0.00000 job        sg144703     qw    07/21/2005 09:10:02     1 13-20:1

    qstat -j 103
    scheduling info:
      queue instance "test.q@ori" dropped because it is overloaded:
        np_load_avg=2.511719 (= 0.011719 + 2.50 * 1.000000 with nproc=1) >= 1.75
      queue instance "all.q@ori" dropped because it is overloaded:
        np_load_avg=2.511719 (= 0.011719 + 2.50 * 1.000000 with nproc=1) >= 2.05
      queue instance "all.q@carc" dropped because it is overloaded:
        np_load_avg=2.515000 (= 0.015000 + 2.50 * 2.000000 with nproc=1) >= 2.05
      queue instance "test.q@carc" dropped because it is overloaded:
        np_load_avg=2.515000 (= 0.015000 + 2.50 * 2.000000 with nproc=1) >= 1.75
      queue instance "test.q@gimli" dropped because it is overloaded:
        np_load_avg=1.945312 (= 0.070312 + 2.50 * 3.000000 with nproc=1) >= 1.75
      queue instance "all.q@nori" dropped because it is overloaded:
        np_load_avg=2.580078 (= 0.080078 + 2.50 * 2.000000 with nproc=1) >= 2.05
      queue instance "test.q@nori" dropped because it is overloaded:
        np_load_avg=2.580078 (= 0.080078 + 2.50 * 2.000000 with nproc=1) >= 1.75
      queue instance "all.q@es-ergb01-01" dropped because it is overloaded:
        np_load_avg=2.070312 (= 0.195312 + 2.50 * 3.000000 with nproc=1) >= 2.05
      queue instance "all.q@gimli" dropped because it is overloaded:
        np_load_avg=2.570312 (= 0.070312 + 2.50 * 4.000000 with nproc=1) >= 2.05

As we can see, this configuration can be a very powerful tool for setting up rather complicated environments. However, there are cases where one would like to ensure that a certain queue is used before another queue. (I use "queue" here to refer to both cluster queues and queue instances.) In these cases, one can assign a sequence number to the queues via qconf -mq <cluster queue name>:

    seq_no                0

This sequence number is used when the scheduler configuration is changed to:

    queue_sort_method     seqno

After this change, queue instances with a low seq_no are chosen first. If multiple queue instances have the same sequence number, the configured load value determines which queue instance is picked. This means that if all queue instances have the same seq_no and the scheduler is set to sort by seq_no, it ultimately sorts by the load of the hosts.

Example:
"test.q" has a sequence number of 0
"all.q" has a sequence number of 2

    queuename                      qtype used/tot. load_avg arch          states
    ----------------------------------------------------------------------------
    test.q@host1                   BIP   2/5       0.26     lx24-amd64
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 4
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 8
    ----------------------------------------------------------------------------
    test.q@host2                   BIP   0/5       0.58     sol-sparc64   D
    ----------------------------------------------------------------------------
    test.q@host3                   BIP   4/5       0.44     sol-sparc64
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 3
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 5
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 7
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 9
    ----------------------------------------------------------------------------
    test.q@host4                   BIP   2/5       0.08     sol-x86
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 2
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 6
    ----------------------------------------------------------------------------
    test.q@host5                   BIP   2/5       0.01     sol-x86
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 1
        108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 10
    ----------------------------------------------------------------------------
    all.q@host1                    BIP   0/5       0.26     lx24-amd64
    ----------------------------------------------------------------------------
    all.q@host2                    BIP   0/5       0.58     sol-sparc64
    ----------------------------------------------------------------------------
    all.q@host3                    BIP   0/5       0.44     sol-sparc64
    ----------------------------------------------------------------------------
    all.q@host4                    BIP   0/5       0.08     sol-x86
    ----------------------------------------------------------------------------
    all.q@host5                    BIP   0/5       0.01     sol-x86

As one can see, only test.q was used, and within test.q the load values had an effect.

(Jul 21 2005, 09:35:42 AM CEST)
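To make the two sorting methods from this post a bit more tangible, here is a small Python sketch of the logic as I described it above. It is not the scheduler's code. The names QueueInstance, decay_weight, adjusted_load and sort_queue_instances are made up for illustration, the linear decay is an assumption, and so is the division by the processor count. That division is what makes the "all.q@carc" line from qstat -j add up (0.015 + 2.50 * 2 / 2 = 2.515), although the output itself prints nproc=1.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    name: str      # e.g. "test.q@host1"
    seq_no: int    # seq_no from the cluster queue configuration
    load: float    # load value of the host (here: np_load_avg)


def decay_weight(job_age_s: float, decay_time_s: float) -> float:
    """Fraction of a load adjustment still in effect (assumed: linear decay)."""
    if decay_time_s <= 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - job_age_s / decay_time_s))


def adjusted_load(load, adjustment, job_ages_s, decay_time_s, nproc=1):
    """Reported load plus the decayed adjustment of each recently started job.

    Dividing by nproc is an assumption for per-processor (np_) load values.
    """
    extra = sum(adjustment * decay_weight(age, decay_time_s) for age in job_ages_s)
    return load + extra / nproc


def sort_queue_instances(instances, queue_sort_method):
    """Order queue instances as described for 'load' and 'seqno'."""
    if queue_sort_method == "load":
        return sorted(instances, key=lambda q: q.load)
    if queue_sort_method == "seqno":
        # Lowest seq_no first; equal seq_no values fall back to the load.
        return sorted(instances, key=lambda q: (q.seq_no, q.load))
    raise ValueError(f"unknown queue_sort_method: {queue_sort_method}")


if __name__ == "__main__":
    # Host loads and sequence numbers taken from the second example above.
    loads = {"host1": 0.26, "host2": 0.58, "host3": 0.44, "host4": 0.08, "host5": 0.01}
    instances = [
        QueueInstance(f"{queue}@{host}", seq_no, load)
        for queue, seq_no in (("all.q", 2), ("test.q", 0))
        for host, load in loads.items()
    ]
    for q in sort_queue_instances(instances, "seqno"):
        print(q.name, q.seq_no, q.load)
```

The sketch leaves out queue states (test.q@host2 is disabled in the example and would never get a job), slot counts and load thresholds. It only shows the ordering: with seqno, every test.q instance comes before any all.q instance, and within test.q the host with the lowest load comes first.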
Trackback URL: http://blogs.sun.com/sgrell/entry/n1ge_6_scheduler_hacks_sorting