Stephan Grell's Weblog
Stephan Grell's Weblog

20060425 Dienstag April 25, 2006

N1GE 6 - Monitoring the qmaster
With the update 7 of the N1GE 6 software we added a new switch to monitor the qmaster. The qmaster monitoring allows to get statistics on each thread displaying what they have been busy with and how much time they spend on it. There are two switches to controll the statistic output:

qconf -mconf
qmaster_params               Monitor_Time=0:0:20 LOG_Monitor_Message=1

MONITOR_TIME
Specifies the time interval when the monitoring information should be printed. The monitoring is disabled per default and can be  enabled by specifying an interval. The monitoring is per thread and is written to the messages file or displayed by the "qping -f" command line tool. Example: MONITOR_TIME=0:0:10 generates the monitoring information most likely every 10 seconds and prints it. The specified time is a guideline and not a fixed interval. The used interval is printed and can be everything between 9 seconds and 20 in this example.

LOG_MONITOR_MESSAGE
The monitoring information is logged into the messages files per default. In addition it is provided for qping and can be requested by it. The messages files can become quite big, if the monitoring is enabled all the time, therefore this switch allows to disable the logging into the messages files and the monitoring data will only be available via "qping -f".

A description of the output format can be found here.

Example output in the qmaster messages file ($SGE_ROOT/<CELL>/spooling/qmaster/messages):

04/25/2006 19:06:17|qmaster|scrabe|P|EDT: runs: 1.20r/s (clients: 1.00 mod: 0.05/s ack: 0.05/s blocked: 0.00 busy: 0.00 | events: 0.05/s added: 0.05/s skipt: 0.00/s) out: 0.00m/s APT: 0.0001s/m idle: 99.99% wait: 0.00% time: 19.98s
04/25/2006 19:06:17|qmaster|scrabe|P|MT(2): runs: 0.25r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.05,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.05/s) out: 0.05m/s APT: 0.0002s/m idle: 100.00% wait: 0.00% time: 20.10s
04/25/2006 19:06:18|qmaster|scrabe|P|MT(1): runs: 0.19r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.05,g:0.00,m:0.05,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.05m/s APT: 0.0001s/m idle: 100.00% wait: 0.00% time: 21.15s
04/25/2006 19:06:27|qmaster|scrabe|P|TET: runs: 0.67r/s (pending: 9.00 executed: 0.67/s) out: 0.00m/s APT: 0.0205s/m idle: 98.63% wait: 0.00% time: 21.00s
04/25/2006 19:06:37|qmaster|scrabe|P|EDT: runs: 1.60r/s (clients: 1.00 mod: 0.05/s ack: 0.05/s blocked: 0.00 busy: 0.00 | events: 1.10/s added: 1.10/s skipt: 0.00/s) out: 0.05m/s APT: 0.0002s/m idle: 99.97% wait: 0.00% time: 20.00s
04/25/2006 19:06:39|qmaster|scrabe|P|MT(1): runs: 0.37r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.14,g:0.00,m:0.05,d:0.00,c:0.00,t:0.05,p:0.00)/s event-acks: 0.05/s) out: 0.32m/s APT: 0.0024s/m idle: 99.91% wait: 0.00% time: 21.55s



If we use the following settings:

qconf -mconf
qmaster_params               Monitor_Time=0:0:20 LOG_Monitor_Message=0

We will need to use qping to gain access to the monitoring messages. Thiis should be the prefered way because we will get the statics from the communication layer with the statistics in the qmaster. Here is an example:

04/25/2006 19:09:53:
SIRM version:             0.1
SIRM message id:          3
start time:               04/25/2006 08:45:06 (1145947506)
run time [s]:             37487
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 3
status:                   0
info:                     TET: R (1.99) | EDT: R (0.99) | SIGT: R (37486.73) | MT(1): R (3.99) | MT(2): R (0.99) | OK
Monitor:
04/25/2006 19:09:47 | TET: runs: 0.40r/s (pending: 9.00 executed: 0.40/s) out: 0.00m/s APT: 0.0001s/m idle: 100.00% wait: 0.00% time: 20.00s
04/25/2006 19:09:37 | EDT: runs: 1.00r/s (clients: 1.00 mod: 0.00/s ack: 0.00/s blocked: 0.00 busy: 0.00 | events: 0.00/s added: 0.00/s skipt: 0.00/s) out: 0.00m/s APT: 0.0001s/m idle: 99.99% wait: 0.00% time: 20.00s
04/25/2006 08:45:07 | SIGT: no monitoring data available
04/25/2006 19:09:36 | MT(1): runs: 0.15r/s (execd (l:0.04,j:0.04,c:0.04,p:0.04,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.00m/s APT: 0.0002s/m idle: 100.00% wait: 0.00% time: 26.86s
04/25/2006 19:09:39 | MT(2): runs: 0.14r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.00m/s APT: 0.0000s/m idle: 100.00% wait: 0.00% time: 21.04s





( Apr 25 2006, 07:14:12 PM CEST ) Permalink

Kommentare:

Senden Sie einen Kommentar:

Kommentare sind ausgeschaltet.

Archive
Sprache
Links
Referenzierte URLs