DanT's GridBlog  
 
Friday March 20, 2009
Rube Goldberg Gone Wild

I just can't not post this. Assuming they're not cheating by editing the film, this is easily the largest Rube Goldberg machine I've ever seen, and they're really creative about the elements they used. They do lose points for not using live animals, though.

(Anyone have a better link for this video? I'm sure it's on YouTube or Google Video somewhere, but at the moment, I can't get it to play again, so I don't have details with which to search for it.)

Permalink Comments [0] (2009-03-20 05:48:54.0/2009-03-20 05:48:54.0)
Trackback: http://blogs.sun.com/templedf/entry/rube_goldberg_gone_wild
 
Thursday March 19, 2009
Podcast: New Installer in Sun Grid Engine 6.2 Update 2

I just posted a new podcast on the new installer in Sun Grid Engine 6.2u2. Check it out.

Permalink Comments [0] (2009-03-19 14:05:27.0/2009-03-19 14:05:27.0)
Trackback: http://blogs.sun.com/templedf/entry/podcast_new_installer_in_sun
 
Monday March 16, 2009
New Installer in Sun Grid Engine 6.2 Update 2

In my previous post, I talked about the new installer that is included with Sun Grid Engine 6.2u2. Lubos, one of our core team engineers in Prague (as opposed to the Service Domain Manager or QA teams), has just posted a couple of videos of the new installer. The first one shows how to make sure the new installer can be used with the machines you're planning to use for your cluster. Because the new installer can install an entire cluster at once, it has to be able to contact all the machines destined for the cluster, and that's where the setup comes in. The second one actually shows off the new installer. Lubos also has some screenshots of the new installer posted.

Permalink Comments [0] (2009-03-16 07:24:35.0/2009-03-16 07:24:35.0)
Trackback: http://blogs.sun.com/templedf/entry/new_install_in_sun_grid
 
Thursday March 05, 2009
Sun Grid Engine 6.2 Update 2 Is Out!

Sun Grid Engine 6.2u2 is now available. If you're not excited, you should be. First off, don't let the name fool you. 6.2u2 is not just bug fixes. It's a full feature release, and contains some great features. What features? Glad you asked.

First and foremost, job submission verifiers (JSVs). It's a feature we added specifically for TACC, but it's one that will be useful for almost everyone. In fact, I suspect that we'll discover it's the answer to some of the classic Sun Grid Engine problems. What is it? Before 6.2u2, there was no way to prevent a job from being submitted. It was (and still is) possible to choose not to schedule a job after it's been submitted, but before 6.2u2, that's all you could do. With 6.2u2 and JSV, you now have the option to insert a step between submission and acceptance. With that step, you can choose to accept or reject the job submission, but you can also choose to modify the job before accepting it, and that's where the magic comes in.
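To make the accept/reject/modify idea concrete, here's a minimal Python sketch of the kind of decision a verification step makes. It is not the real JSV interface: the `Job` and `Verdict` types, the `verify` function, the 64-slot limit, and the "general" default project are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for a job submission (not an SGE type)."""
    owner: str
    project: Optional[str] = None
    slots: int = 1


@dataclass(frozen=True)
class Verdict:
    """Outcome of the verification step: rejected, or accepted (maybe modified)."""
    accepted: bool
    job: Optional[Job] = None
    reason: str = ""


def verify(job: Job, max_slots: int = 64, default_project: str = "general") -> Verdict:
    """Sit between submission and acceptance and decide what happens to the job."""
    # Reject: the submission never becomes a job at all.
    if job.slots > max_slots:
        return Verdict(False, None, f"at most {max_slots} slots may be requested")
    # Modify: fill in a missing attribute, then accept the changed job.
    if job.project is None:
        return Verdict(True, replace(job, project=default_project), "project defaulted")
    # Accept unchanged.
    return Verdict(True, job)


if __name__ == "__main__":
    print(verify(Job("alice", slots=128)))
    print(verify(Job("bob")))
    print(verify(Job("carol", project="chem", slots=4)))
```

The interesting branch is the middle one: the submission is quietly repaired instead of being bounced back to the user.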

The verification step is handled through scripts or binaries. There's a new submission option, -jsv, that adds a JSV to the submission. That means you can pick up JSVs from anywhere that you can stash a submission option: most notably the global sge_request file, your user sge_request file, and the directory's sge_request file, but also the DRMAA native specification, the DRMAA job category, the enigmatic -@ switch, and, of course, the command line itself. The -jsv switch is cumulative, so if you have one in several of those places, several JSVs will be run for your submission. It's worth noting that all of the JSV sources listed above are controlled by the user, except the global sge_request file, and even that can be overridden with the -clear switch.

So far, we've only talked about the client side. JSVs can also come in on the server side. In the global host configuration, an administrator can configure a single JSV. Unlike on the client side, where every JSV is started from scratch with every job submission, on the server side the JSV is started once and queried repeatedly. The reason is that on the client side, performance isn't a big issue, but on the server side, the cost of forking and execing the JSV for every job submission can have a huge impact. By keeping the JSV running, we save that cost. The big advantage of the server-side JSV is that users can't circumvent it. If you really need to enforce a policy with a JSV, the server side is the place to do it.

Now, if you're thinking fast, you might question the point of the server-side JSV when users can change everything about the job using qalter after it's submitted. Well, so did we. When you configure a server-side JSV, users are no longer allowed to modify jobs after submission unless you specifically grant the ability to do so, and even then it's limited to the job attributes that you allow them to modify.

JSV is a huge topic, and I could probably go on for days about it. Instead I'll save it for a white paper and move on.

The next big feature in 6.2u2 is the new installer. You now have the option of using the old interactive text-based installer or a new graphical installer. The graphical installer has several important advantages. First, it lets you install an entire cluster at once. It actually sits on top of the auto-installer and reuses that same functionality to install remote nodes. The graphical installer, however, will first verify that all the nodes are reachable before the installation starts, so the installation won't quietly hang on an unreachable node. It also accepts wildcarded host name and IP address ranges, which makes installing a huge cluster much simpler.

The third major feature is that we've added support for Microsoft Windows Vista (Ultimate and Enterprise) and Server 2003R2 and 2008. Both 32-bit and 64-bit versions are available. Harald (who you should encourage to start blogging!) worked really hard on ironing out the issues with the changes in the OS. We still rely on SFU for the Windows execution daemons, except that it's now called SUA.

The fourth big feature is job-level parallel job resource requests. Before 6.2u2, whenever a parallel job requested a resource, SGE would implicitly multiply that resource request by the number of assigned slaves (because each slave requests the resource on the host where it runs). That makes sense with, say, memory, where requesting 4GB really means that every slave should have 4GB. It doesn't make any sense for other things, like some software licenses. Now with 6.2u2, the administrator can flag a resource as job level, meaning that it is not multiplied by the number of assigned slaves when requested by a parallel job. In most cases, a resource that shouldn't be multiplied for one job shouldn't be multiplied for any job. There may be exceptions to the rule, but I doubt there will be many. I'd love to hear your feedback, though.
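The arithmetic is simple, but it's easy to get wrong in your head, so here's a small Python sketch of it. The function name `total_request` and its `job_level` flag are hypothetical; this only illustrates the multiplication rule described above, not how the scheduler actually implements it.

```python
def total_request(amount: int, slaves: int, job_level: bool) -> int:
    """Total amount of a resource consumed by a parallel job.

    amount    -- what the job asked for (e.g. 4 for "4GB", 1 for "one license")
    slaves    -- number of slaves assigned to the parallel job
    job_level -- True if the administrator flagged the resource as job level
    """
    if slaves < 1:
        raise ValueError("a parallel job needs at least one slave")
    # Job-level resources are counted once per job;
    # everything else is multiplied by the number of assigned slaves.
    return amount if job_level else amount * slaves


if __name__ == "__main__":
    # 4GB of memory requested by an 8-slave job: every slave gets its own 4GB.
    print(total_request(4, 8, job_level=False))  # 32
    # One software license flagged as job level: still just one license.
    print(total_request(1, 8, job_level=True))   # 1
```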

The last two new features aren't so much features as improvements. Starting with 6.2u2, the 64-bit Linux binaries use the jemalloc library instead of the default Linux malloc. The impact on performance and memory footprint is significant, in some cases as much as a 20% improvement. Also, starting with 6.2u2, the Linux binaries use poll() instead of select() in the commlib. For some flavors of Linux, the use of select() made it difficult to scale past a couple thousand hosts. With the commlib now using poll(), I've seen SGE scale to well over 6000 Linux nodes.
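For anyone who hasn't used poll(), here's a minimal Python sketch of the pattern on Linux. It is not commlib code; `wait_readable` is a hypothetical helper written for illustration. One commonly cited limitation of select() is that it works on fixed-size descriptor sets (FD_SETSIZE, typically 1024 on Linux), whereas poll() takes a list of descriptors with no such compile-time cap.

```python
import select
import socket


def wait_readable(socks, timeout_ms=1000):
    """Return the sockets that have data to read, using poll() rather than select()."""
    poller = select.poll()
    by_fd = {s.fileno(): s for s in socks}
    for fd in by_fd:
        # Register interest in "data available to read" for each descriptor.
        poller.register(fd, select.POLLIN)
    # poll() returns (fd, event mask) pairs for descriptors that are ready.
    return [by_fd[fd] for fd, event in poller.poll(timeout_ms) if event & select.POLLIN]


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        print(wait_readable([b], timeout_ms=50))   # nothing sent yet
        a.sendall(b"hello")
        print(wait_readable([b]) == [b])           # b is now readable
    finally:
        a.close()
        b.close()
```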

And on top of all that, there is the usual pile of bug fixes. A handful of qmaster and scheduler issues cropped up recently in 6.2 and 6.2u1, but with 6.2u2 those should all now be resolved.

I highly recommend giving 6.2u2 a try, if for no reason other than JSV. Let me know what you think!

Permalink Comments [2] (2009-03-05 09:46:14.0/2009-03-05 09:46:14.0)
Trackback: http://blogs.sun.com/templedf/entry/sun_grid_engine_6_21
 
 