Grid Engine@FOSS.in
Thursday Jan 18, 2007
After I started working on Grid Engine, it was really tempting to show someone what I had learned. Then came FOSS.in (FOSS stands for Free and Open Source Software).
We tried to get some really good apps that had been grid-enabled ("GRIDIZED"). Though we had a tough time getting things together for the event, the outcome was great.
The crowd was great, ranging from novices to people who had implemented internal grids. And we had the chance to explain Grid Engine in detail.
Grid Engine at Sun Stall
We had set up three v20z's and a Metropolis to form a four-node mini grid totalling 7 CPUs (2+1+2+2). But at the stall, one of the 2-CPU v20z's would not run, and we also had to share another 2-CPU v20z with a different demo. So we set up a zone on one of the v20z's and still managed to get 4 nodes (5 CPUs).
We shared the main cell directory and the binaries over NFS across the machines. We had also set up an MPI (mpi-1.2) parallel environment (PE) on the grid. It took us some time to set things up, but soon we had it up and running.
The apps were
- a biotech app, fastDNA, looking for 6 slots to run
- a movie rendering app (using povray) also requiring 6 slots
- some batch and array jobs from the examples.
Demo
We started by running fastDNA. As the non-global zone and the global zone are part of a single host, one could see that the utilization was the same for both of those nodes. As fastDNA ran, one could see how its jobs got distributed to the queue instances. The load levels started shooting up, and one could see the scheduler messages:
- for dropping a queue because it was full, and
- for dropping a queue when the load_avg exceeded the threshold
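The two scheduler messages above can be pictured with a small Python sketch. It is only an illustration, not Grid Engine code: the `QueueInstance` class, the `scheduling_status` function, the host names and the 1.75 per-CPU threshold are all assumptions made for the example, and dividing the load by the CPU count is a simplification chosen here.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    """Hypothetical stand-in for one queue instance on one host."""
    name: str
    slots_total: int
    slots_used: int
    load_avg: float
    num_proc: int


def scheduling_status(qi, load_threshold=1.75):
    """Return why a queue instance is skipped, or 'eligible'.

    load_threshold is an assumed per-CPU load limit for this sketch.
    """
    # Case 1: every slot is already taken, so the queue is full.
    if qi.slots_used >= qi.slots_total:
        return "dropped: queue full"
    # Case 2: the load per CPU has reached the threshold.
    if qi.load_avg / qi.num_proc >= load_threshold:
        return "dropped: load threshold exceeded"
    return "eligible"


if __name__ == "__main__":
    # Made-up numbers, not measurements from the demo.
    demo = [
        QueueInstance("all.q@node1", 2, 2, 1.0, 2),
        QueueInstance("all.q@node2", 2, 1, 4.2, 2),
        QueueInstance("all.q@node3", 1, 0, 0.3, 1),
    ]
    for qi in demo:
        print(qi.name, "->", scheduling_status(qi))
```

With these made-up values, the first instance is skipped because it is full, the second because of its load, and the third stays available for new jobs.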
Finally the job finished after 1 1/2 hrs on the setup. We then scheduled Helloworld (the movie rendering app), which finished in less than 40 mins. The takeaways from the demo were:
- The range of scheduling policies in Grid Engine
- Granular resource control (we showed subordinate queues, load sensors, host complexes)
- Job suspension using the suspension threshold (someone asked about job migration; unfortunately the apps didn't have checkpoint support)
- The range of OSes it runs on (the Windows host surprised them; we also ran a few example jobs on BrandZ!)
- The meta-scheduler* feature for coordinating 2 Grid Engines, and the ability to interact with Globus
*Correct me if this does not qualify as a meta-scheduler feature
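For readers who have not seen threshold-based suspension, here is a rough Python sketch of the idea. It is a simplification under stated assumptions: the function name `jobs_to_suspend`, the threshold of 3.0, the job names and the "first jobs in the list" selection order are all invented for illustration and do not describe how Grid Engine actually picks jobs.

```python
def jobs_to_suspend(running_jobs, load_per_cpu, suspend_threshold=3.0, nsuspend=1):
    """Pick jobs to suspend in one check interval.

    All parameter values are assumptions for this sketch:
    suspend_threshold is a per-CPU load limit, and nsuspend is the
    number of jobs suspended per interval while the load stays high.
    """
    # Below the threshold nothing is suspended.
    if load_per_cpu < suspend_threshold:
        return []
    # At or above it, suspend up to nsuspend jobs this interval.
    # Taking the first jobs in the list is an arbitrary choice here.
    return running_jobs[:nsuspend]


if __name__ == "__main__":
    jobs = ["render.1", "render.2", "render.3"]  # hypothetical job names
    print(jobs_to_suspend(jobs, load_per_cpu=1.0))
    print(jobs_to_suspend(jobs, load_per_cpu=3.4))
    print(jobs_to_suspend(jobs, load_per_cpu=3.4, nsuspend=2))
```

Suspending only a few jobs per interval, rather than all of them at once, gives the load a chance to drop before more work is paused.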










