Stephan Grell's Weblog

Tuesday, April 05, 2005

N1GE 6 - Scheduler Hacks: "least used" / "fill up" configuration

I also want to use this blog to talk about some "hacks" around the N1 Grid Engine 6 software and its scheduler. The scheduler in the Grid Engine project is, in theory, a very convenient tool. It decides for the user where the jobs run. The user only has to specify a few constraints that the execution host has to meet. However, some configuration settings are not very intuitive. The one I want to talk about today is the configuration of "least used host first" and "fill up host".
I will not go into the details of the Grid Engine terminology, and I assume that the reader has a basic understanding of the N1 Grid Engine 6 software.
By default, the scheduler distributes jobs based on load: it looks at every host and assigns a job to the host with the least load. This is not always desired. There are use cases where the scheduler should distribute the jobs equally over all available hosts and only assign multiple jobs to one host when all hosts are in use, or, on the contrary, fill up one host before assigning jobs to the next host.
I think that equal distribution will be the more common use case. For example, it is useful when over-subscribing the hosts in the grid. "Over-subscription" means that one host executes more jobs than it has CPUs. The setting "use least used host first" ensures that all available CPUs are used before Grid Engine starts to over-subscribe a host.
In my example, I will set up a grid with two hosts ("host_A", "host_B") and one cluster queue, "all.q". The hosts are referenced in the host group "@allhosts".
'qconf -sq all.q' will show (reduced to the important details):

qname                 all.q
hostlist              @allhosts
slots                 1,[host_A=4],[host_B=4]

We see that each host can run 4 jobs at the same time. To prepare the hosts for "least used host first" or "fill up", we run 'qconf -me host_A' and set complex_values slots=4:

hostname              host_A
load_scaling          NONE
complex_values        slots=4

We make the same change for host_B ('qconf -me host_B').
This setting is needed because the scheduler distributes jobs to the hosts based on load values. A load value can be reported by an external script (values such as load_avg, mem_use, ...) or it can be a consumable. We use a consumable for our configuration. To give the scheduler access to the value, we need to define it for each host, as we just did. The N1 Grid Engine 6 software will now count the running jobs not only at the queue level but also at the host level. If the sum of slots over all queue instances on a given host is larger than the value defined for that host, the scheduler limits the number of running jobs on that host to the value in the host configuration. If all queue instances on that host together have fewer slots than defined for the host, the number of running jobs will not exceed the number of slots in the queue instances.
With the preparation done, we need to tell the scheduler to use the least used host first or to fill a host up.
To enable "use least used host first", we run 'qconf -msconf' and set "queue_sort_method load" and "load_formula -slots":
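The capping rule described above can be written down as a small toy model. This is only a sketch of the rule as stated in this post, not Grid Engine code; the function name effective_host_capacity and its parameters are made up for the illustration.

```python
def effective_host_capacity(queue_instance_slots, host_slots=None):
    """Upper bound on concurrently running jobs on one host (toy model).

    queue_instance_slots: slots of each queue instance on the host.
    host_slots: the host-level "complex_values slots=N" value,
                or None if no host-level value is configured.
    """
    queue_total = sum(queue_instance_slots)
    if host_slots is None:
        # Without a host-level consumable only the queue instances limit jobs.
        return queue_total
    # The smaller of the two limits wins.
    return min(queue_total, host_slots)


# One queue instance with 4 slots, host limit 4 (the setup in this post).
print(effective_host_capacity([4], host_slots=4))
# Two queue instances with 4 slots each: the host limit caps the total.
print(effective_host_capacity([4, 4], host_slots=4))
# Fewer queue slots than the host limit: the queue slots are the bound.
print(effective_host_capacity([2], host_slots=4))
```

In the two-host example of this post, both limits are 4, so each host runs at most 4 jobs either way.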

algorithm                         default
schedule_interval                 0:2:0
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              NONE
load_adjustment_decay_time        0:0:0
load_formula                      -slots
schedd_job_info                   true
flush_submit_sec                  1
flush_finish_sec                  1

To enable "fill up host", we run 'qconf -msconf' and set "queue_sort_method load" and "load_formula slots":

algorithm                         default
schedule_interval                 0:2:0
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              NONE
load_adjustment_decay_time        0:0:0
load_formula                      slots
schedd_job_info                   true
flush_submit_sec                  1
flush_finish_sec                  1

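To see why the sign in load_formula switches between the two behaviours, here is a toy simulation. It assumes that the "slots" value in the formula is the number of still-free slots on a host and that the host with the lowest formula value is picked first; both are my simplifications for illustration. The dispatch function is hypothetical and is not how the real scheduler is implemented.

```python
def dispatch(free_slots, n_jobs, load_formula):
    """Place n_jobs one at a time on the host with the lowest formula value.

    free_slots: mapping host name -> free slots (the consumable).
    load_formula: function of the free slot count, e.g. lambda s: -s.
    Returns a mapping host name -> number of jobs placed there.
    """
    free = dict(free_slots)
    placed = {host: 0 for host in free}
    for _ in range(n_jobs):
        candidates = [h for h in free if free[h] > 0]
        if not candidates:
            break  # every host is full, remaining jobs stay pending
        # Lowest "load" wins; ties are broken by host name for determinism.
        best = min(candidates, key=lambda h: (load_formula(free[h]), h))
        free[best] -= 1
        placed[best] += 1
    return placed


hosts = {"host_A": 4, "host_B": 4}

# load_formula -slots: the host with the most free slots looks least loaded.
least_used = dispatch(hosts, 4, lambda slots: -slots)
print(least_used)  # {'host_A': 2, 'host_B': 2}

# load_formula slots: the host with the fewest free slots looks least loaded.
fill_up = dispatch(hosts, 4, lambda slots: slots)
print(fill_up)  # {'host_A': 4, 'host_B': 0}
```

With four jobs, the "-slots" formula spreads them two and two, while the "slots" formula puts all four on one host before touching the other.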
If performance is critical to you, I am not sure that I can recommend this configuration. If that is the case, please use profiling to validate the performance impact.
Having set up the scheduler this way, one might wonder how this setting works together with the parallel environment (PE) allocation rule. By default, whatever is specified in the PE overrides the scheduler configuration. Only if "pe_slots" is set as the allocation rule is the scheduler configuration used.
Posted: Apr 05 2005, 09:50:24 PM CEST

