Stephan Grell's Weblog

20060425 Tuesday April 25, 2006

N1GE 6 - Monitoring the qmaster
With update 7 of the N1GE 6 software we added a new switch to monitor the qmaster. The qmaster monitoring provides statistics for each thread, showing what it has been busy with and how much time it spent on it. There are two switches to control the statistics output:

qconf -mconf
qmaster_params               Monitor_Time=0:0:20 LOG_Monitor_Message=1

MONITOR_TIME
Specifies the time interval at which the monitoring information is printed. Monitoring is disabled by default and is enabled by specifying an interval. Monitoring is per thread, and the output is written to the messages file or displayed by the "qping -f" command line tool. Example: MONITOR_TIME=0:0:10 generates and prints the monitoring information approximately every 10 seconds. The specified time is a guideline, not a fixed interval. The interval actually used is printed and can be anything between 9 and 20 seconds in this example.

LOG_MONITOR_MESSAGE
By default, the monitoring information is logged to the messages file. In addition, it is made available to qping on request. The messages file can become quite big if monitoring is enabled all the time. This switch therefore allows you to disable logging to the messages file, in which case the monitoring data is only available via "qping -f".

A description of the output format can be found here.

Example output in the qmaster messages file ($SGE_ROOT/<CELL>/spooling/qmaster/messages):

04/25/2006 19:06:17|qmaster|scrabe|P|EDT: runs: 1.20r/s (clients: 1.00 mod: 0.05/s ack: 0.05/s blocked: 0.00 busy: 0.00 | events: 0.05/s added: 0.05/s skipt: 0.00/s) out: 0.00m/s APT: 0.0001s/m idle: 99.99% wait: 0.00% time: 19.98s
04/25/2006 19:06:17|qmaster|scrabe|P|MT(2): runs: 0.25r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.05,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.05/s) out: 0.05m/s APT: 0.0002s/m idle: 100.00% wait: 0.00% time: 20.10s
04/25/2006 19:06:18|qmaster|scrabe|P|MT(1): runs: 0.19r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.05,g:0.00,m:0.05,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.05m/s APT: 0.0001s/m idle: 100.00% wait: 0.00% time: 21.15s
04/25/2006 19:06:27|qmaster|scrabe|P|TET: runs: 0.67r/s (pending: 9.00 executed: 0.67/s) out: 0.00m/s APT: 0.0205s/m idle: 98.63% wait: 0.00% time: 21.00s
04/25/2006 19:06:37|qmaster|scrabe|P|EDT: runs: 1.60r/s (clients: 1.00 mod: 0.05/s ack: 0.05/s blocked: 0.00 busy: 0.00 | events: 1.10/s added: 1.10/s skipt: 0.00/s) out: 0.05m/s APT: 0.0002s/m idle: 99.97% wait: 0.00% time: 20.00s
04/25/2006 19:06:39|qmaster|scrabe|P|MT(1): runs: 0.37r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.14,g:0.00,m:0.05,d:0.00,c:0.00,t:0.05,p:0.00)/s event-acks: 0.05/s) out: 0.32m/s APT: 0.0024s/m idle: 99.91% wait: 0.00% time: 21.55s
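
To work with these lines in a script, the fields common to all thread types can be pulled out with a regular expression. The following is a minimal Python sketch. The pattern is derived only from the example lines above and is an assumption, not an official format description; the function name parse_monitor_line is made up for illustration.

```python
import re

# Assumed layout, taken from the example lines above:
# <date> <time>|qmaster|<host>|P|<THREAD>: runs: ..r/s (...) out: ..m/s
#   APT: ..s/m idle: ..% wait: ..% time: ..s
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|qmaster\|(?P<host>[^|]+)\|P\|"
    r"(?P<thread>[A-Z]+(?:\(\d+\))?): runs: (?P<runs>[\d.]+)r/s .*"
    r"out: (?P<out>[\d.]+)m/s APT: (?P<apt>[\d.]+)s/m "
    r"idle: (?P<idle>[\d.]+)% wait: (?P<wait>[\d.]+)% time: (?P<time>[\d.]+)s$"
)

NUMERIC_FIELDS = ("runs", "out", "apt", "idle", "wait", "time")


def parse_monitor_line(line):
    """Return the common fields of one monitoring line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    for key in NUMERIC_FIELDS:
        record[key] = float(record[key])
    return record
```

The thread-specific part in parentheses is skipped on purpose, since it differs between EDT, MT and TET lines.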



If we use the following settings:

qconf -mconf
qmaster_params               Monitor_Time=0:0:20 LOG_Monitor_Message=0

We then need to use qping to access the monitoring messages. This should be the preferred way, because we get the statistics from the communication layer together with the statistics from the qmaster. Here is an example:

04/25/2006 19:09:53:
SIRM version:             0.1
SIRM message id:          3
start time:               04/25/2006 08:45:06 (1145947506)
run time [s]:             37487
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 3
status:                   0
info:                     TET: R (1.99) | EDT: R (0.99) | SIGT: R (37486.73) | MT(1): R (3.99) | MT(2): R (0.99) | OK
Monitor:
04/25/2006 19:09:47 | TET: runs: 0.40r/s (pending: 9.00 executed: 0.40/s) out: 0.00m/s APT: 0.0001s/m idle: 100.00% wait: 0.00% time: 20.00s
04/25/2006 19:09:37 | EDT: runs: 1.00r/s (clients: 1.00 mod: 0.00/s ack: 0.00/s blocked: 0.00 busy: 0.00 | events: 0.00/s added: 0.00/s skipt: 0.00/s) out: 0.00m/s APT: 0.0001s/m idle: 99.99% wait: 0.00% time: 20.00s
04/25/2006 08:45:07 | SIGT: no monitoring data available
04/25/2006 19:09:36 | MT(1): runs: 0.15r/s (execd (l:0.04,j:0.04,c:0.04,p:0.04,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.00m/s APT: 0.0002s/m idle: 100.00% wait: 0.00% time: 26.86s
04/25/2006 19:09:39 | MT(2): runs: 0.14r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.00m/s APT: 0.0000s/m idle: 100.00% wait: 0.00% time: 21.04s
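
If the qping output is captured as text, the per-thread idle percentage can be read from the "Monitor:" section. This is a small Python sketch that assumes the layout shown in the example above; the function name monitor_idle is hypothetical, and how the text is captured is left out.

```python
import re

# One entry in the Monitor section, as in the example above:
# <date> <time> | <THREAD>: <statistics or "no monitoring data available">
ENTRY_RE = re.compile(r"^(?P<stamp>\S+ \S+) \| (?P<thread>[A-Z]+(?:\(\d+\))?): (?P<body>.*)$")
IDLE_RE = re.compile(r"idle: ([\d.]+)%")


def monitor_idle(qping_text):
    """Map each thread name to its idle percentage (None if no data is reported)."""
    idle = {}
    in_monitor = False
    for raw in qping_text.splitlines():
        line = raw.strip()
        if line == "Monitor:":
            in_monitor = True  # everything after this header is per-thread data
            continue
        if not in_monitor:
            continue
        entry = ENTRY_RE.match(line)
        if entry is None:
            continue
        found = IDLE_RE.search(entry.group("body"))
        idle[entry.group("thread")] = float(found.group(1)) if found else None
    return idle
```

A thread such as SIGT, which reports no monitoring data, shows up with the value None.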





( Apr 25 2006, 07:14:12 PM CEST ) Permalink

N1GE 6 - Scheduler Hacks: Separate master host for pe jobs
For the distribution of pe jobs over a range of hosts, the pe provides a set of allocation rules. These rules allow the admin to specify that a host should be filled up first before another is used, that each host is used before any host runs a second task, or that the job uses a specified number of slots on each host it is using. This covers most of the use cases around pe jobs.
In this comment I would like to sketch out a scenario that cannot be addressed with the existing allocation rules: the exclusive use of the master host by the master task, while all other hosts use the fill-up allocation rule. This comes in handy if the master task of a job requires a lot of memory while the slave tasks do the computation, and only one machine with a lot of memory is available. The big machine can and should run multiple master tasks of this kind of job.

There are two solutions to the problem. One could separate the memory-intensive computation out into an extra job and work with job dependencies, or one could configure N1GE to handle the above use case as specified without any job modifications.

I have the following setup:

qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@big                      BIP   0/4       0.02     sol-sparc64
----------------------------------------------------------------------------
small.q@small1                 BIP   0/1       0.00     lx24-amd64
----------------------------------------------------------------------------
small.q@small2                 BIP   0/1       0.02     sol-sparc64

And a configured pe in all queue instances:

qconf -sp make
pe_name               make
slots                 999
user_lists            NONE
xuser_lists           NONE
start_proc_args       NONE
stop_proc_args        NONE
allocation_rule       $fill_up
control_slaves        TRUE
job_is_first_task     FALSE
urgency_slots         min

We now change load_thresholds in the all.q@big queue instance to a load value that is not used in the other queue instances, such as:

qconf -sq all.q
qname                 all.q
hostlist              big
seq_no                0
load_thresholds       NONE,[big=load_avg=4]

The load threshold used here has to be a real load value; it cannot be a fixed or consumable value.

The next step to make our environment work is to change the scheduler configuration to the following:

qconf -ssconf
algorithm                         default
schedule_interval                 0:2:0
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              load_avg=4.000000
load_adjustment_decay_time        0:0:1

By changing the scheduler configuration to use job_load_adjustments like this, the scheduler adds an artificial load to each host that will run a task. With this configuration we can start one task on the big machine in each scheduling run. Since the load_adjustment_decay_time is only 1 second, the scheduler has forgotten about the artificial load by the next scheduling run and can start a new task on the big host. This way, we achieve what we have been looking for.
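
The effect can be illustrated with a toy model in Python. This is not the scheduler's actual algorithm, only a sketch of the reasoning above: it assumes the big host accepts a task while its adjusted load_avg is below the threshold of 4, that each started task adds the artificial load of 4, and that this load has fully decayed before the next run. Slots, the small hosts and any scaling of load values are deliberately ignored, and the function names are made up.

```python
def schedule_run(pending_masters, big_load, threshold=4.0, adjustment=4.0):
    """Model one scheduling run on the big host.

    Returns (tasks started in this run, master tasks still pending).
    """
    started = 0
    adjusted_load = big_load
    while pending_masters > 0 and adjusted_load < threshold:
        started += 1
        pending_masters -= 1
        # job_load_adjustments: artificial load for the task just placed
        adjusted_load += adjustment
    return started, pending_masters


def simulate(pending_masters, runs, big_load=0.02):
    """Run several scheduling runs; the artificial load is gone before each one."""
    history = []
    for _ in range(runs):
        # load_adjustment_decay_time (1s) is far shorter than schedule_interval,
        # so every run starts again from the measured load only.
        started, pending_masters = schedule_run(pending_masters, big_load)
        history.append(started)
    return history
```

With three pending master tasks, simulate(3, 4) yields one start per run until the tasks are used up. If the measured load of the big host is already at or above 4, the model starts nothing.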

One important note:
The big machine is only allowed to have one queue instance, or all queue instances of the big machine have to share the same load threshold. If that is not the case, it will not work.


( Apr 25 2006, 10:37:37 AM CEST ) Permalink

