Stephan Grell's Weblog

20050721 Thursday July 21, 2005

N1GE 6 - Scheduler Hacks: Sorting queues

I just received a question about how to use the queue sequence numbers and what to do with them. I will give a short overview in this post and hope to provide enough pointers for your own experiments. According to the documentation, the scheduler can sort the queue instances in two ways:

  • load based (from the hosts)
  • sequence number based (from the queues)
Load-based sorting is configured by default, including load adjustments. During the scheduling cycle, the load adjustments are added to the load of the host that will run the job. This ensures that one gets a kind of round-robin job distribution. The load adjustment wears off over time and is replaced by the real value in the host's load report interval. The important configuration values for queue sorting are (scheduler configuration, qconf -msconf):

queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg

With this setting the scheduler sorts by load, adds 0.5 to the load of a host for each job started there, and lets that added load decay over 7.5 minutes.
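To make the arithmetic concrete, here is a minimal Python sketch of how such an adjusted load could be computed. It rests on two assumptions that are not spelled out above: the adjustment decays linearly from its full value at job start to zero after load_adjustment_decay_time, and for np_ values the adjustment is divided by the number of processors. The function name and its parameters are made up for illustration and are not part of Grid Engine.

```python
def adjusted_load(reported_load, job_ages_s, adjustment=0.5,
                  decay_time_s=450.0, nproc=1):
    """Return the reported load plus the decayed adjustments of recent jobs.

    reported_load -- last load value reported by the host
    job_ages_s    -- seconds since each recently dispatched job was started
    adjustment    -- value from job_load_adjustments (0.50 in the example)
    decay_time_s  -- load_adjustment_decay_time in seconds (0:7:30 = 450 s)
    nproc         -- processor count (assumed divisor for np_ load values)
    """
    extra = 0.0
    for age in job_ages_s:
        # Assumption: linear decay, 100% at age 0, 0% at decay_time_s and later.
        remaining = max(0.0, 1.0 - age / decay_time_s)
        extra += adjustment * remaining
    return reported_load + extra / nproc


if __name__ == "__main__":
    # One job that was just started on a host reporting a load of 0.03.
    print(adjusted_load(0.03, [0.0]))
    # The same job half way through the decay time.
    print(adjusted_load(0.03, [225.0]))
```

Under these assumptions a freshly started job pushes the host further down the sorted list, and the effect fades as the real load report takes over.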

Hint:
If a host has more than one slot, the load adjustment can prevent all slots on that host from being used, because the next job might overload the host. qstat -j <job_id> shows the reasons why a job was not dispatched, including the hosts that will not be used due to load adjustments. If np_load_avg is used for both the load adjustments and the load formula, the number of processors in the machine is taken into account.

 Example (using job_load_adjustments np_load_avg=1.5). As one can see, not all slots are used.
es-ergb01-01% qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@host1                     BIP   1/5       0.03     lx24-amd64
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 8
----------------------------------------------------------------------------
all.q@host2                    BIP   3/5       0.78     sol-sparc64
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 5
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 7
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 11
----------------------------------------------------------------------------
all.q@host3                   BIP   2/5       0.28     sol-sparc64
    103 0.55500 job        sg144703     t     07/21/2005 09:10:04     1 6
    103 0.55500 job        sg144703     t     07/21/2005 09:10:04     1 12
----------------------------------------------------------------------------
all.q@host4                    BIP   1/5       0.16     sol-x86
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 10
----------------------------------------------------------------------------
all.q@host5                    BIP   0/5       0.01     sol-x86
----------------------------------------------------------------------------
test.q@host1                    BIP   1/5       0.03     lx24-amd64
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 2
----------------------------------------------------------------------------
test.q@host2                   BIP   0/5       0.78     sol-sparc64   D
----------------------------------------------------------------------------
test.q@host3                   BIP   2/5       0.28     sol-sparc64
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 3
    103 0.55500 job        sg144703     t     07/21/2005 09:10:04     1 9
----------------------------------------------------------------------------
test.q@host4                    BIP   1/5       0.16     sol-x86
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 4
----------------------------------------------------------------------------
test.q@host5                    BIP   1/5       0.01     sol-x86
    103 0.55500 job        sg144703     r     07/21/2005 09:10:04     1 1

############################################################################
 PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    103 0.00000 job        sg144703     qw    07/21/2005 09:10:02     1 13-20:1

qstat -j 103
scheduling info:
                            queue instance "test.q@ori" dropped because it is overloaded: np_load_avg=2.511719 (= 0.011719 + 2.50 * 1.000000 with nproc=1) >= 1.75
                            queue instance "all.q@ori" dropped because it is overloaded: np_load_avg=2.511719 (= 0.011719 + 2.50 * 1.000000 with nproc=1) >= 2.05
                            queue instance "all.q@carc" dropped because it is overloaded: np_load_avg=2.515000 (= 0.015000 + 2.50 * 2.000000 with nproc=1) >= 2.05
                            queue instance "test.q@carc" dropped because it is overloaded: np_load_avg=2.515000 (= 0.015000 + 2.50 * 2.000000 with nproc=1) >= 1.75
                            queue instance "test.q@gimli" dropped because it is overloaded: np_load_avg=1.945312 (= 0.070312 + 2.50 * 3.000000 with nproc=1) >= 1.75
                            queue instance "all.q@nori" dropped because it is overloaded: np_load_avg=2.580078 (= 0.080078 + 2.50 * 2.000000 with nproc=1) >= 2.05
                            queue instance "test.q@nori" dropped because it is overloaded: np_load_avg=2.580078 (= 0.080078 + 2.50 * 2.000000 with nproc=1) >= 1.75
                            queue instance "all.q@es-ergb01-01" dropped because it is overloaded: np_load_avg=2.070312 (= 0.195312 + 2.50 * 3.000000 with nproc=1) >= 2.05
                            queue instance "all.q@gimli" dropped because it is overloaded: np_load_avg=2.570312 (= 0.070312 + 2.50 * 4.000000 with nproc=1) >= 2.05
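If one wants to process these messages in a script, the lines can be picked apart with a regular expression. The following Python sketch is only an illustration: the pattern is derived from the sample lines shown above, not from a documented format, so it may need adjusting for other releases. The names OVERLOAD_RE and parse_overloaded are hypothetical.

```python
import re

# Pattern derived from the sample "scheduling info" lines above (assumption:
# other releases may word the message differently).
OVERLOAD_RE = re.compile(
    r'queue instance "(?P<queue>[^"]+)" dropped because it is overloaded: '
    r'(?P<name>\w+)=(?P<value>[0-9.]+) .*>= (?P<threshold>[0-9.]+)'
)


def parse_overloaded(text):
    """Return (queue instance, load name, load value, threshold) tuples."""
    rows = []
    for line in text.splitlines():
        match = OVERLOAD_RE.search(line)
        if match:
            rows.append((
                match.group("queue"),
                match.group("name"),
                float(match.group("value")),
                float(match.group("threshold")),
            ))
    return rows


if __name__ == "__main__":
    sample = ('queue instance "test.q@ori" dropped because it is overloaded: '
              'np_load_avg=2.511719 (= 0.011719 + 2.50 * 1.000000 '
              'with nproc=1) >= 1.75')
    for row in parse_overloaded(sample):
        print(row)
```

The text passed to parse_overloaded would be the captured output of qstat -j <job_id>.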

As we can see, this configuration can be a very powerful tool for setting up rather complicated environments. However, there are cases where one would like to ensure that a certain queue is used before another queue. (I am using "queue" here to refer to cluster queues and queue instances together.) In these cases, one can assign a sequence number to the queues via qconf -mq <cluster queue name>:

seq_no                0


This sequence number is used when the scheduler configuration is changed to:

queue_sort_method                 seqno


After this change, queue instances with a low seq_no will be chosen first. If there are multiple queue instances with the same sequence number, the configured load value is used to determine which queue instance to pick. This means that if all queue instances have the same seq_no and the scheduler is configured to sort by seq_no, it ultimately sorts by the load of the hosts.

Example:
"test.q" has a sequence number of 0
"all.q" has a sequence number of 2

queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
test.q@host1                   BIP   2/5       0.26     lx24-amd64
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 4
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 8
----------------------------------------------------------------------------
test.q@host2           BIP   0/5       0.58     sol-sparc64   D
----------------------------------------------------------------------------
test.q@host3                   BIP   4/5       0.44     sol-sparc64
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 3
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 5
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 7
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 9
----------------------------------------------------------------------------
test.q@host4                   BIP   2/5       0.08     sol-x86
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 2
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 6
----------------------------------------------------------------------------
test.q@host5                   BIP   2/5       0.01     sol-x86
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 1
    108 0.55500 job        sg144703     r     07/21/2005 09:24:44     1 10
----------------------------------------------------------------------------
all.q@host1                    BIP   0/5       0.26     lx24-amd64
----------------------------------------------------------------------------
all.q@host2                   BIP   0/5       0.58     sol-sparc64
----------------------------------------------------------------------------
all.q@host3                    BIP   0/5       0.44     sol-sparc64
----------------------------------------------------------------------------
all.q@host4                    BIP   0/5       0.08     sol-x86
----------------------------------------------------------------------------
all.q@host5                     BIP   0/5       0.01     sol-x86

As one can see, only test.q was used, and within test.q the load values had an effect.
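The ordering rule described above (sequence number first, load as the tie-breaker) can be sketched in a few lines of Python. This is only a model of the rule as stated in this post, not the scheduler's actual code. The dictionary layout and the function names are invented for the illustration, and the sample values are a subset of the qstat output above.

```python
def _by_seqno(instance):
    # Lower seq_no first; equal seq_no falls back to the load value.
    return (instance["seq_no"], instance["load"])


def _by_load(instance):
    return instance["load"]


def sort_queue_instances(instances, method="seqno"):
    """Return the queue instances in the order they would be considered."""
    key = _by_seqno if method == "seqno" else _by_load
    return sorted(instances, key=key)


# Subset of the example above: test.q has seq_no 0, all.q has seq_no 2.
EXAMPLE = [
    {"name": "all.q@host1", "seq_no": 2, "load": 0.26},
    {"name": "all.q@host4", "seq_no": 2, "load": 0.08},
    {"name": "all.q@host5", "seq_no": 2, "load": 0.01},
    {"name": "test.q@host1", "seq_no": 0, "load": 0.26},
    {"name": "test.q@host4", "seq_no": 0, "load": 0.08},
    {"name": "test.q@host5", "seq_no": 0, "load": 0.01},
]

if __name__ == "__main__":
    for qi in sort_queue_instances(EXAMPLE, "seqno"):
        print(qi["name"], qi["seq_no"], qi["load"])
```

With method="load" the sequence numbers are ignored and the two queues interleave by host load.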


( Jul 21 2005, 09:35:42 AM CEST )

20050714 Thursday July 14, 2005

N1GE 6 - health monitoring

Software such as our Grid Engine can be a critical component in a production environment, and its correct functioning has the highest priority. However, there are cases in which the grid goes down or one of its components is not available. When this happens, the administrator or the software has to react right away. N1GE 6 provides two ways to monitor the correct functioning of its components:

- the heartbeat file at: <CELL>/common/heartbeat
- qping.

Qping was enhanced quite a bit over the different update releases. The u4 update contains a fully functional version, and that is the version I refer to in this post.

1) Heartbeat file:
The heartbeat file contains a simple number that is increased at a fixed interval. If that number does not change for a couple of minutes, the qmaster has most likely stopped running.
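A check along these lines could look like the following Python sketch. It assumes that the file holds nothing but the number as text, and it takes the path as a parameter because the location depends on the installation. The function names and the two-minute default are choices made for this illustration only.

```python
import time
from pathlib import Path


def read_heartbeat(path):
    """Return the counter stored in the heartbeat file as an integer."""
    # Assumption: the file contains only the number, possibly with whitespace.
    return int(Path(path).read_text().strip())


def heartbeat_is_alive(path, wait_s=120.0, sleep=time.sleep):
    """Read the counter twice, wait_s seconds apart, and report a change.

    The comparison is "different" rather than "greater", so a counter that
    restarts from a lower value still counts as a sign of life.
    """
    before = read_heartbeat(path)
    sleep(wait_s)
    return read_heartbeat(path) != before
```

It would be called with the full path to <CELL>/common/heartbeat; a False result after a couple of minutes is the signal to look at the qmaster.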

2) qping:
Qping gives a more comprehensive way of monitoring the grid. It can be used to monitor the qmaster and the execd daemons. Depending on the parameters it is invoked with, one gets a heartbeat replacement or detailed information about the status of the daemon. I will give a short introduction to qping; for more information, consult the qping(1) man page. The monitoring part of the qping command can be executed from any machine by any user.

Heartbeat file replacement:

Command: qping <MASTER_HOST> $SGE_QMASTER_PORT qmaster 1
         qping <EXECD_HOST> $SGE_EXECD_PORT execd 1

Output:  07/14/2005 14:38:19 endpoint scrabe.workgroup/qmaster/1 at port 7171 is up since 194 seconds

The output format is:
<DATE> <TIME> endpoint <MASTER_HOST/qmaster/1> at port <PORT_NUMBER> is up since <SECONDS> seconds
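For a monitoring script, this one-line format is easy to split into fields. The Python sketch below builds its pattern from the single sample line above, so the exact layout (date format, wording) is an assumption about other versions. PING_RE and parse_ping are names made up for the example.

```python
import re

# Pattern built from the sample output line above.
PING_RE = re.compile(
    r'^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) endpoint '
    r'(?P<host>[^/\s]+)/(?P<name>[^/\s]+)/(?P<id>\d+) '
    r'at port (?P<port>\d+) is up since (?P<uptime>\d+) seconds$'
)


def parse_ping(line):
    """Return the fields of a qping status line, or None if it does not match."""
    match = PING_RE.match(line.strip())
    if match is None:
        return None
    return {
        "date": match.group("date"),
        "time": match.group("time"),
        "host": match.group("host"),
        "component": match.group("name"),
        "id": int(match.group("id")),
        "port": int(match.group("port")),
        "uptime_s": int(match.group("uptime")),
    }


if __name__ == "__main__":
    print(parse_ping("07/14/2005 14:38:19 endpoint scrabe.workgroup/qmaster/1 "
                     "at port 7171 is up since 194 seconds"))
```

A None result, or an uptime that suddenly drops between two checks, would both be worth an alert.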

Extensive health information:

Command: qping -f <MASTER_HOST> $SGE_QMASTER_PORT qmaster 1
         qping -f <EXECD_HOST> $SGE_EXECD_PORT execd 1

output:
07/14/2005 14:38:10:
SIRM version:             0.1
SIRM message id:          2
start time:               07/14/2005 14:35:05 (1121344505)
run time [s]:             185
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 3
status:                   0
info:                     TET: R (4.71) | EDT: R (0.71) | SIGT: R (184.61) | MT(1): R (6.17) | MT(2): R (4.62) | OK

The important information that we did not get in the other output is the per-thread monitoring and the number of messages in the read buffer. The per-thread information allows one to have more fine-grained monitoring and to detect deadlocks in the master. The number of messages in the read buffer can be used as an indicator of an overloaded qmaster. The qping in updates 4 and 5 shows only one MT thread even though two are used. This will be changed, as one can see in the output above.
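To use the per-thread information in a script, the info line can be split at the "|" separators. The Python sketch below follows the layout of the sample output above. It assumes that the number in parentheses is the time in seconds since the thread last reported, which is my reading of the output and should be checked against the qping(1) man page. The function names and the threshold are hypothetical.

```python
import re

# One entry looks like "MT(1): R (6.17)"; the last field is the overall state.
THREAD_RE = re.compile(r'^(?P<name>[^:]+): (?P<state>\w+) \((?P<secs>[0-9.]+)\)$')


def parse_info(info):
    """Split the info value into ([(thread, state, seconds), ...], overall)."""
    parts = [part.strip() for part in info.split("|")]
    overall = parts[-1]
    threads = []
    for part in parts[:-1]:
        match = THREAD_RE.match(part)
        if match:
            threads.append((match.group("name"), match.group("state"),
                            float(match.group("secs"))))
    return threads, overall


def stale_threads(threads, limit_s):
    """Return the names of threads whose time value exceeds limit_s."""
    # Assumption: the value is the number of seconds since the last update.
    return [name for name, _state, secs in threads if secs > limit_s]


if __name__ == "__main__":
    line = ("TET: R (4.71) | EDT: R (0.71) | SIGT: R (184.61) | "
            "MT(1): R (6.17) | MT(2): R (4.62) | OK")
    threads, overall = parse_info(line)
    print(overall, threads)
```

A sensible limit differs per thread; in the sample the SIGT value is close to the run time of the daemon, so a single global threshold would flag it.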

The other functions of qping belong to the debug and analysis domain and are definitely worth playing with.

( Jul 14 2005, 03:31:55 PM CEST )

