Melvin Koh's Weblog
I'm just a contractor
Thursday May 10, 2007
Sun Grid Engine 6.1 - Name changes
Sun Grid Engine (note that the N1 has been dropped) has a new version. The latest version, 6.1, contains several new features, but the one I have anticipated most is the new resource quota capability. We had customers asking for this feature a long time ago, and the only way to achieve it was with plenty of scripting. Now, SGE 6.1 supports defining fine-grained resource limits at the user, queue and host level. For this feature, a new command, "qquota", has been introduced. For more details about resource quotas, see here.
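To give an idea of what such a limit looks like, here is a minimal Python sketch that renders a resource quota set in the brace-delimited text form that, as far as I recall, qconf accepts (e.g. via "qconf -Arqs <file>"). The rule name "max_u_slots", the description and the limit of 10 slots are made up for illustration; check the sge_resource_quota man page before relying on the exact layout.

```python
def render_quota_set(name, description, limits, enabled=True):
    """Render a resource quota set as text.

    `limits` is a list of rule strings such as "users {*} to slots=10".
    The field layout follows my recollection of sge_resource_quota(5).
    """
    lines = [
        "{",
        f"   name         {name}",
        f'   description  "{description}"',
        f"   enabled      {'TRUE' if enabled else 'FALSE'}",
    ]
    # One "limit" line per rule, kept in the order given.
    lines += [f"   limit        {rule}" for rule in limits]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rule: every user is capped at 10 slots cluster-wide.
    print(render_quota_set(
        "max_u_slots",
        "Per-user slot cap (illustrative)",
        ["users {*} to slots=10"],
    ))
```

Once a rule set like this is loaded, running "qquota" as a normal user should show the limits that apply to that user.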
Posted by melvin
( May 10 2007, 02:26:44 PM SGT )
Tuesday September 19, 2006
N1GE Scheduler
Stephan Grell from the N1GE team has left Sun. Although I've never met him, I've read a lot of his postings on the mailing list; he was always very helpful and knowledgeable, especially about N1GE scheduling. Stephan posted plenty of N1GE tips in his blog, so before his blog account gets deleted, I'll repost some of his past entries here.
From Stephan Grell's profiling blog entry:
N1GE 6 - Profiling
The Grid Engine software provides a profiling facility to determine
where the qmaster and the scheduler spend their time. This was
introduced long before the N1GE 6 software. With the development of
N1GE 6 it was greatly improved, and its improvement continued over
the different updates we had for the N1GE 6 software. It was used very
extensively to analyse bottlenecks and find misconfigurations in
existing installations. Until now, the source code was the only
documentation for the output format, which might change with every new
update and release. Lately a document was added to the source
repository to give a brief overview of the output format and the
different switches. The document is not complete, though it is a good
start.
Profiling document
Posted by melvin
( Sep 19 2006, 11:29:25 AM SGT )
Tuesday July 04, 2006
Manage complex using hostgroups
A customer of mine is complaining that managing the host complexes in their cluster is extremely tedious: they have ~200 exec hosts, and each host has to be modified individually. Since their hosts are already grouped in hostgroups, why not use the hostgroups to ease the management?
E.g. setcomplex @hostgroupA mycomplex=5
will automatically set the complex on each of the hosts listed in @hostgroupA. I hacked up a simple script, written in Perl:
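#!/usr/bin/perl
# Usage: setcomplex @hostgroup complex=value
use strict;
use warnings;
die "Usage: $0 \@hostgroup complex=value\n" unless @ARGV == 2;
my ($group, $complex) = @ARGV;
# "qconf -shgrp" prints a group_name line, then a hostlist that may span lines
my @lines = `qconf -shgrp $group 2>&1`;
shift(@lines);                  # drop the "group_name ..." line
my $string = join("", @lines);
$string =~ s/\\//g;             # remove every line-continuation backslash
my @hosts = split(" ", $string);
shift(@hosts);                  # drop the "hostlist" keyword
foreach my $host (@hosts) {
    # set the complex value on each execution host in the group
    system("qconf", "-mattr", "exechost", "complex_values", $complex, $host);
}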
Of course, we don't have to stop here. We can extend the script to perform many other host-specific management tasks. The flexibility of hostgroups allows us to define many hostgroups for many purposes.
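As a starting point for such extensions, here is a rough Python sketch of the same idea, with the parsing separated from the qconf calls so that it can be tried without a cluster. The sample output and the host names node01 to node03 are invented; I am assuming that "qconf -shgrp" prints a "hostlist" line continued with trailing backslashes, and that an empty group shows NONE.

```python
def parse_hostgroup(text):
    """Return the host names found in `qconf -shgrp`-style output."""
    # Join continuation lines: a backslash right before a newline.
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        fields = line.split()
        if fields and fields[0] == "hostlist":
            # An empty hostgroup is assumed to be printed as NONE.
            return [h for h in fields[1:] if h != "NONE"]
    return []


def mattr_commands(hosts, complex_value):
    """Build one `qconf -mattr` argument list per host (nothing is executed)."""
    return [
        ["qconf", "-mattr", "exechost", "complex_values", complex_value, host]
        for host in hosts
    ]


# Invented sample of what the command might print for a three-host group.
SAMPLE = "group_name @hostgroupA\nhostlist node01 node02 \\\n         node03\n"

if __name__ == "__main__":
    for cmd in mattr_commands(parse_hostgroup(SAMPLE), "mycomplex=5"):
        print(" ".join(cmd))
```

On a real cluster, each argument list could be passed to subprocess.run, and the SAMPLE text replaced by the captured output of "qconf -shgrp".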
Posted by melvin
( Jul 04 2006, 05:49:13 PM SGT )
Thursday June 29, 2006
Qmaster Monitoring
Very detailed monitoring of the qmaster, as described by Stephan, will be useful for performance tuning.
Qmaster Monitoring
Posted by melvin
( Jun 29 2006, 10:36:52 AM SGT )
Monday June 12, 2006
Avoid oversubscription for overlapped queues
It is common to have multiple queues for different purposes when designing an N1GE cluster. A large cluster that I designed for AIST, the F32 cluster, uses 4 specific queues and 1 general queue (all.q). The 4 queues have different ACLs for different groups of users, and all.q overlaps these queues. Since each host has 2 processors, the queues are configured with 2 slots per host, so up to 4 jobs may run on a single host (2 in all.q + 2 in a specific queue). Here is a tip on how to prevent oversubscription in the overlapping queues.
The trick is to assign the slots complex to each execution host and set its value to the number of processors the host has. E.g.
qconf -me <exec_host>
..
complex_values slots=<number_of_cpus_or_slots>
...
Now the total number of jobs across the queues running on this host will not be more than the value assigned.
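The effect of the host-level limit can be illustrated with a small Python sketch. This is only a toy model of the admission check, not how the scheduler is implemented; it assumes single-slot jobs, and the queue name "special.q" is made up.

```python
def can_start(queue, running, queue_slots, host_slots=None):
    """Decide whether one more single-slot job may start on this host.

    running     -- dict: queue name -> jobs currently running on the host
    queue_slots -- dict: queue name -> slots configured for that queue
    host_slots  -- host-level "slots" complex value, or None if not set
    """
    # Per-queue limit: the queue instance itself must have a free slot.
    if running.get(queue, 0) >= queue_slots[queue]:
        return False
    # Host-level limit: the total across all queues must stay below the cap.
    if host_slots is not None and sum(running.values()) >= host_slots:
        return False
    return True


if __name__ == "__main__":
    queue_slots = {"all.q": 2, "special.q": 2}
    running = {"all.q": 2, "special.q": 0}   # all.q already fills both CPUs
    print(can_start("special.q", running, queue_slots))                # no host cap
    print(can_start("special.q", running, queue_slots, host_slots=2))  # with cap
```

Without the host-level value, the specific queue still accepts jobs even though both processors are busy; with slots=2 on the host, it does not.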
Posted by melvin
( Jun 12 2006, 11:32:33 AM SGT )
Monday June 05, 2006
Failover using Shadow Host
A) Pre-installation environment setup for N1GE6
1. Copy all the necessary N1GE binary files onto the system, unzip them and put them together in one directory (e.g. /opt/n1ge6u8 = $SGE_ROOT)
2. Ensure all the services and configuration are set up before the actual N1GE installation (the services must be available on boot up)
i) Ensure that the NFS servers and NFS clients are configured correctly
ii) Ensure that the required users (sgeadmin and the normal users) are created and are able to write to their own directories
iii) Ensure the hostnames of all machines are in /etc/hosts with the appropriate IP addresses if they are not in the DNS
iv) Ensure the port numbers for the N1GE qmaster and execution daemons are added to /etc/services (sge_qmaster 536/tcp, sge_execd 537/tcp)
v) Ensure the RPC services (server and client) are set up correctly (e.g. rpcinfo -p, /sbin/service portmap status, /sbin/service nfs start).
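Step iv) is easy to get wrong on one or two hosts in a large cluster, so a quick check helps. Here is a small Python sketch that looks for the two entries in the text of a services file; it is only an illustration, and the sample text in it is invented.

```python
def find_service_ports(services_text, names=("sge_qmaster", "sge_execd")):
    """Return {service name: port} for the wanted TCP entries in services text."""
    found = {}
    for line in services_text.splitlines():
        # Strip trailing comments, then split into "name port/proto aliases...".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] in names and fields[1].endswith("/tcp"):
            found[fields[0]] = int(fields[1].split("/")[0])
    return found


def missing_services(services_text, names=("sge_qmaster", "sge_execd")):
    """List the wanted services that have no TCP entry."""
    found = find_service_ports(services_text, names)
    return [name for name in names if name not in found]


if __name__ == "__main__":
    # Invented sample; on a real host, read the contents of /etc/services.
    sample = "ssh 22/tcp\nsge_qmaster 536/tcp\nsge_execd 537/tcp  # N1GE\n"
    print(find_service_ports(sample))
    print(missing_services("ssh 22/tcp\n"))
```

Running the same check on every host before the installation should catch a missing entry early.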
B) Installation of Berkeley DB Spooling Server
1. Run the './inst_sge -db' command on the server that you have assigned as the RPC spooling server. Note that the DB spooling server must not be the qmaster server. Accept the default for every option and write down the values of these two fields after installation:
- Spooling server name
- DB spooling directory
2. Verify that:
- the sgebdb startup script /etc/rc.d/init.d/sgebdb is created
- the sgebdb daemon is running on the spooling DB server (ps -ef | grep sge)
C) Installation of Qmaster, Execution Host
1. Install the qmaster by invoking ./install_qmaster
i) Select the Berkeley DB option
ii) Choose “Y” when you are prompted to use the DB spooling server
iii) Specify the spooling server name and DB spooling directory when prompted for them
2. Verify that the qmaster is installed successfully by running “ps -ef | grep sge” and checking that the sge_qmaster and sge_schedd daemons are running
D) Installation of Shadow Host
1. Type ./inst_sge -sm
2. Verify that:
- the sge_shadowd daemon is running (ps -ef | grep sge)
- there is an entry in the $SGE_ROOT/$SGE_CELL/common/shadow_masters file
E) Important Environment Variables
To change how long the shadow host waits before taking over after the master host goes down, set the following environment variables:
SGE_CHECK_INTERVAL: controls the interval at which sge_shadowd checks the heartbeat file (60 seconds by default)
SGE_GET_ACTIVE_INTERVAL: controls the interval after which a sge_shadowd instance tries to take over when the heartbeat file has not changed
SGE_DELAY_TIME: controls how long sge_shadowd pauses if a takeover bid fails. Used only when there is more than one shadow host
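To see which values a shadow daemon would pick up from a given environment, a small Python sketch like the one below can help. Only the 60-second default for SGE_CHECK_INTERVAL comes from the notes above; the 240 and 600 second defaults for the other two variables are my recollection of the sge_shadowd man page and should be verified against your release.

```python
import os

# Defaults in seconds. 60 is stated above; 240 and 600 are assumptions
# recalled from the sge_shadowd man page, not confirmed here.
DEFAULTS = {
    "SGE_CHECK_INTERVAL": 60,
    "SGE_GET_ACTIVE_INTERVAL": 240,
    "SGE_DELAY_TIME": 600,
}


def shadowd_settings(environ=None):
    """Return the effective interval settings for a given environment mapping."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(name)
        # An unset or empty variable falls back to the assumed default.
        settings[name] = int(raw) if raw else default
    return settings


if __name__ == "__main__":
    for name, seconds in shadowd_settings().items():
        print(f"{name}={seconds}")
```

The variables have to be set in the environment that starts sge_shadowd for them to have any effect.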
F) Verification of Failover
To verify that the shadow host setup is correct, we need to simulate a qmaster failure so that the shadow daemon is activated.
Note: A common mistake when simulating the failure is to stop the qmaster daemon using "sgemaster stop" or even a plain "kill". These commands shut down the qmaster gracefully, which is equivalent to a normal shutdown of the service, and the shadow host will not take over under these circumstances. When the qmaster shuts down normally, it creates an empty "lock" file under the "$SGE_ROOT/$SGE_CELL/spool/qmaster/" directory. If the shadow daemon sees this file, it will never activate the failover. Thus, the proper way to test the failover is to stop the qmaster daemon with "kill -9". It is fine to kill the "sge_schedd" daemon as well, although it is not really necessary.
i) Verify that the shadow daemon is running on the shadow host (ps -fe | grep sge)
ii) Kill the qmaster (kill -9)
iii) Wait for the interval specified for the shadow host to take over (default is about 10 minutes).
iv) Verify that the qmaster and scheduler daemons are started (ps -fe | grep sge)
v) The handover messages are logged in the following files under the $SGE_ROOT/$SGE_CELL/spool/qmaster directory:
- messages_qmaster
- messages_shadowd.
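If the shadow host does not take over during the test, the first thing I would check is whether the "lock" file mentioned in the note exists. A minimal Python sketch of that check follows; the cell name "default" is an assumption, so substitute your own $SGE_CELL.

```python
import os
import tempfile


def qmaster_lock_path(sge_root, sge_cell="default"):
    """Path of the lock file that a graceful qmaster shutdown leaves behind."""
    return os.path.join(sge_root, sge_cell, "spool", "qmaster", "lock")


def graceful_shutdown_detected(sge_root, sge_cell="default"):
    """True if the lock file exists, i.e. the shadow daemon will not take over."""
    return os.path.isfile(qmaster_lock_path(sge_root, sge_cell))


if __name__ == "__main__":
    # Demonstrate on a throwaway directory tree instead of a real $SGE_ROOT.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.dirname(qmaster_lock_path(root)))
        print(graceful_shutdown_detected(root))      # no lock file yet
        open(qmaster_lock_path(root), "w").close()   # simulate a clean shutdown
        print(graceful_shutdown_detected(root))
```

On a real installation, pass the value of $SGE_ROOT instead of the temporary directory.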
Posted by melvin
( Jun 05 2006, 11:23:06 PM SGT )
This is a personal weblog, I do not speak for my employer.