
Tuesday, April 25, 2006
N1GE 6 - Monitoring the qmaster
With update 7 of the N1GE 6 software we added a way to monitor the qmaster. The
qmaster monitoring provides statistics for each thread, showing
what the thread has been busy with and how much time it spent on it. There
are two switches to control the statistics output:
qconf -mconf
qmaster_params Monitor_Time=0:0:20 LOG_Monitor_Message=1
MONITOR_TIME
Specifies the time interval at which the monitoring information is
printed. The monitoring is disabled by default and can be
enabled by specifying an interval. The monitoring is per thread and is
written to the messages file or displayed by the "qping -f" command
line tool. Example: MONITOR_TIME=0:0:10 generates and prints the monitoring
information approximately every 10 seconds. The specified
time is a guideline and not a fixed interval. The interval actually used is
printed with each record and can be anything between 9 and 20 seconds in this example.
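To make the h:m:s notation concrete, here is a minimal Python sketch that converts a MONITOR_TIME value into seconds. The function name is made up for illustration and is not part of N1GE; the sketch assumes the three-field hours:minutes:seconds form used in the examples.

```python
def monitor_time_to_seconds(value: str) -> int:
    """Convert an h:m:s interval such as "0:0:10" into seconds."""
    fields = value.strip().split(":")
    if len(fields) != 3:
        raise ValueError(f"expected h:m:s, got {value!r}")
    hours, minutes, seconds = (int(field) for field in fields)
    return hours * 3600 + minutes * 60 + seconds


print(monitor_time_to_seconds("0:0:10"))  # 10
print(monitor_time_to_seconds("0:0:20"))  # 20
```

Keep in mind that the result is only the requested interval; the interval actually used is reported in the "time:" field of each record.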
LOG_MONITOR_MESSAGE
By default, the monitoring information is logged to the messages
file. In addition, it is made available to qping, which can request
it. The messages file can become quite big if the monitoring is
enabled all the time. This switch therefore allows you to disable the
logging to the messages file (LOG_MONITOR_MESSAGE=0), so that the monitoring
data is only available via "qping -f".
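If you want to poll the monitoring data from a script, the qping call can be assembled as in the following Python sketch. It assumes the usual qping argument order (host, port, component name, component id), that the qmaster is addressed as component "qmaster" with id 1, and that the port is taken from SGE_QMASTER_PORT with 6444 as a fallback; the helper name is hypothetical, and you should check these values against your own installation.

```python
import os
from typing import List, Optional


def qping_monitor_command(host: str, port: Optional[int] = None) -> List[str]:
    """Build the argument list for a "qping -f" call against the qmaster.

    Assumption: component name "qmaster" and component id 1.
    """
    if port is None:
        # Assumption: fall back to 6444 if SGE_QMASTER_PORT is not set.
        port = int(os.environ.get("SGE_QMASTER_PORT", "6444"))
    return ["qping", "-f", host, str(port), "qmaster", "1"]


# Only builds and prints the command; pass the list to subprocess.run() to execute it.
print(" ".join(qping_monitor_command("scrabe", 6444)))
```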
A description of the output format can be found here.
Example output in the qmaster messages file
($SGE_ROOT/<CELL>/spooling/qmaster/messages):
04/25/2006 19:06:17|qmaster|scrabe|P|EDT: runs: 1.20r/s (clients: 1.00 mod: 0.05/s ack: 0.05/s blocked: 0.00 busy: 0.00 | events: 0.05/s added: 0.05/s skipt: 0.00/s) out: 0.00m/s APT: 0.0001s/m idle: 99.99% wait: 0.00% time: 19.98s
04/25/2006 19:06:17|qmaster|scrabe|P|MT(2): runs: 0.25r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.05,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.05/s) out: 0.05m/s APT: 0.0002s/m idle: 100.00% wait: 0.00% time: 20.10s
04/25/2006 19:06:18|qmaster|scrabe|P|MT(1): runs: 0.19r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.05,g:0.00,m:0.05,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.05m/s APT: 0.0001s/m idle: 100.00% wait: 0.00% time: 21.15s
04/25/2006 19:06:27|qmaster|scrabe|P|TET: runs: 0.67r/s (pending: 9.00 executed: 0.67/s) out: 0.00m/s APT: 0.0205s/m idle: 98.63% wait: 0.00% time: 21.00s
04/25/2006 19:06:37|qmaster|scrabe|P|EDT: runs: 1.60r/s (clients: 1.00 mod: 0.05/s ack: 0.05/s blocked: 0.00 busy: 0.00 | events: 1.10/s added: 1.10/s skipt: 0.00/s) out: 0.05m/s APT: 0.0002s/m idle: 99.97% wait: 0.00% time: 20.00s
04/25/2006 19:06:39|qmaster|scrabe|P|MT(1): runs: 0.37r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.14,g:0.00,m:0.05,d:0.00,c:0.00,t:0.05,p:0.00)/s event-acks: 0.05/s) out: 0.32m/s APT: 0.0024s/m idle: 99.91% wait: 0.00% time: 21.55s
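Records like these are easy to post-process. The following Python sketch pulls the common fields (thread, runs, out, APT, idle, wait, time) out of one record. It assumes that each record occupies a single line in the real messages file (the wrapping here comes from the page layout) and that every record ends with the "out: ... time: ...s" tail seen in the examples; the regular expression and the function name are mine, not part of N1GE.

```python
import re

# Matches one monitoring record as it appears in the qmaster messages file.
RECORD_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|"
    r"(?P<daemon>[^|]+)\|(?P<host>[^|]+)\|(?P<level>[^|]+)\|"
    r"(?P<thread>[A-Z]+(?:\(\d+\))?): runs: (?P<runs>[\d.]+)r/s .*"
    r"out: (?P<out>[\d.]+)m/s APT: (?P<apt>[\d.]+)s/m "
    r"idle: (?P<idle>[\d.]+)% wait: (?P<wait>[\d.]+)% time: (?P<time>[\d.]+)s$"
)

NUMERIC_FIELDS = ("runs", "out", "apt", "idle", "wait", "time")


def parse_monitor_record(line: str):
    """Return the fields of one monitoring record as a dict, or None if no match."""
    match = RECORD_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    for name in NUMERIC_FIELDS:
        record[name] = float(record[name])
    return record


sample = ("04/25/2006 19:06:27|qmaster|scrabe|P|TET: runs: 0.67r/s "
          "(pending: 9.00 executed: 0.67/s) out: 0.00m/s APT: 0.0205s/m "
          "idle: 98.63% wait: 0.00% time: 21.00s")
record = parse_monitor_record(sample)
print(record["thread"], record["idle"], record["time"])  # TET 98.63 21.0
```

The thread-specific part in parentheses is skipped on purpose, because its layout differs between EDT, MT and TET.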
If we use the following settings:
qconf -mconf
qmaster_params Monitor_Time=0:0:20 LOG_Monitor_Message=0
We then need to use qping to access the monitoring
messages. This should be the preferred way, because it delivers the
statistics from the communication layer together with the statistics from the
qmaster. Here is an example:
04/25/2006 19:09:53:
SIRM version:             0.1
SIRM message id:          3
start time:               04/25/2006 08:45:06 (1145947506)
run time [s]:             37487
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 3
status:                   0
info:                     TET: R (1.99) | EDT: R (0.99) | SIGT: R (37486.73) | MT(1): R (3.99) | MT(2): R (0.99) | OK
Monitor:                  04/25/2006 19:09:47 | TET: runs: 0.40r/s (pending: 9.00 executed: 0.40/s) out: 0.00m/s APT: 0.0001s/m idle: 100.00% wait: 0.00% time: 20.00s
                          04/25/2006 19:09:37 | EDT: runs: 1.00r/s (clients: 1.00 mod: 0.00/s ack: 0.00/s blocked: 0.00 busy: 0.00 | events: 0.00/s added: 0.00/s skipt: 0.00/s) out: 0.00m/s APT: 0.0001s/m idle: 99.99% wait: 0.00% time: 20.00s
                          04/25/2006 08:45:07 | SIGT: no monitoring data available
                          04/25/2006 19:09:36 | MT(1): runs: 0.15r/s (execd (l:0.04,j:0.04,c:0.04,p:0.04,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.00m/s APT: 0.0002s/m idle: 100.00% wait: 0.00% time: 26.86s
                          04/25/2006 19:09:39 | MT(2): runs: 0.14r/s (execd (l:0.00,j:0.00,c:0.00,p:0.00,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.00m/s APT: 0.0000s/m idle: 100.00% wait: 0.00% time: 21.04s
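A quick way to see how loaded the qmaster threads are is to pull the idle percentage per thread out of the "Monitor:" section. The Python sketch below does that on captured "qping -f" output. It collapses all whitespace first, so it does not matter whether the records are wrapped, and it simply skips threads such as SIGT that report no monitoring data. The function name and the pattern are assumptions based on the output shown above.

```python
import re

# "<thread>: runs: ... idle: <percent>%", e.g. "| MT(1): runs: 0.15r/s ... idle: 100.00%"
IDLE_RE = re.compile(r"\|\s*([A-Z]+(?:\(\d+\))?): runs: .*? idle: ([\d.]+)%")


def idle_by_thread(qping_output: str) -> dict:
    """Map thread name to idle percentage for every record that carries data."""
    flat = " ".join(qping_output.split())  # undo line wrapping and alignment
    return {thread: float(idle) for thread, idle in IDLE_RE.findall(flat)}


sample = """Monitor:
04/25/2006 19:09:47 | TET: runs: 0.40r/s (pending: 9.00 executed:
0.40/s) out: 0.00m/s APT: 0.0001s/m idle: 100.00% wait: 0.00% time:
20.00s
04/25/2006 08:45:07 | SIGT: no monitoring data available
"""
print(idle_by_thread(sample))  # {'TET': 100.0}
```

A thread whose idle value stays low over several intervals is a good starting point for a closer look at its thread-specific counters.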
( Apr 25 2006, 07:14:12 PM CEST )