Wednesday, April 13, 2005 | Stephan Grell's Weblog
N1GE 6 - Scheduler Hacks: Comment on the qmaster <-> scheduler protocol

From time to time, a message like the following shows up in the qmaster messages file ($SGE_CELL/spool/qmaster/messages):

04/12/2005 09:16:28|qmaster|xxx|E| orders user/project version (16468) isnot uptodate (16469) for user/project "PRJ147"

I would like to explain how this can happen and why it is not necessarily a bug when these messages are logged.

The scheduler is implemented as an event client. This means that it receives an event whenever an object in the qmaster is added, removed, or modified. These events are usually delivered to the event clients right away or with a delay that the event client can specify. In the case of the scheduler, events are delivered every schedule_interval (default 15s). The event delivery not only updates the data in the scheduler but also triggers a scheduling run.

Depending on the number of jobs and their complexity, it can take a while before a scheduling run has finished. With a few tens of thousands of jobs in the system, a run might take longer than the scheduling interval. In this case, a second event client configuration setting comes into play. It specifies what the event master should do when events are not acknowledged or the client is busy. In the case of the scheduler, no events are sent while the event client is marked as busy. This means that the scheduling data is not updated during a scheduling run.

It can happen that an administrator modifies an object during a scheduling run. This leads to the error message shown at the beginning. After every scheduling run, the scheduler sends a package of orders to the qmaster. While the qmaster executes the orders, it validates them and ensures that the affected objects did not change. If such a change is detected, the order is ignored and we see an error message that the order failed.

Commands which might lead to the error message:
- qconf -mq // modify a queue
- qmod // change a queue
- qconf -clearusage
- qconf -mprj // modify a project
and others.
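
To make the version check more concrete, here is a minimal sketch in Python of how such a race could play out. It is not the actual N1GE code: the classes Project, Order, and Qmaster and their methods are hypothetical names made up for this illustration, and the log text simply mimics the message quoted above. The only assumption taken from the description is that every object carries a version number and that an order is ignored when the version the scheduler saw no longer matches the current one.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    version: int = 0


@dataclass
class Order:
    project: str
    seen_version: int  # object version in the scheduler's (possibly stale) data


@dataclass
class Qmaster:
    """Hypothetical stand-in for the qmaster, for illustration only."""
    projects: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)

    def modify(self, name):
        # Simulates an administrative change such as `qconf -mprj`.
        self.projects[name].version += 1

    def snapshot(self):
        # The versions the scheduler knows after its last event delivery.
        return {name: p.version for name, p in self.projects.items()}

    def execute(self, orders):
        # Apply orders whose version still matches; log and skip the rest.
        applied = []
        for order in orders:
            current = self.projects[order.project].version
            if order.seen_version != current:
                self.messages.append(
                    f"orders user/project version ({order.seen_version}) "
                    f"isnot uptodate ({current}) "
                    f'for user/project "{order.project}"'
                )
                continue
            applied.append(order)
        return applied


qmaster = Qmaster(projects={"PRJ147": Project("PRJ147", version=16468)})

view = qmaster.snapshot()   # scheduling run starts; the scheduler is now busy
qmaster.modify("PRJ147")    # admin edits the project mid-run; no event is sent
orders = [Order("PRJ147", view["PRJ147"])]

applied = qmaster.execute(orders)  # run ends; the order carries a stale version
print(applied)              # [] : the stale order was ignored
print(qmaster.messages[0])
```

In this sketch, the next scheduling run would start from an updated snapshot, so an order created from it would pass the check again.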

Due to bugs in the event master, these error messages were logged quite frequently in older versions (N1GE 6.0 FCS, u1, u2, and u3). However, if nobody changed anything and these error messages are still logged, one might have found a bug.

( Apr 13 2005, 02:13:36 PM CEST ) Comments [1]
Trackback URL: http://blogs.sun.com/sgrell/entry/n1ge_6_scheduler_hacks_comment
Posted by Werea on September 17, 2005 at 06:19 PM CEST #