DanT's GridBlog  
 
Wednesday September 03, 2008
Big Just Got Huge

I'm sure you're all familiar with the TACC Ranger system by now. 62,000 cores, 125TB RAM, a couple of petabytes of storage, #4 on the Top500 list, all running one gigantic Sun Grid Engine cluster under a single qmaster. As if that weren't exciting enough, I have some new amazingness to share.

I heard a couple of weeks ago from one of the Sun engineers onsite at TACC that they have successfully run a 60,000-core parallel job on Ranger. For those of you who are familiar with MPI, I'll give you a moment to recover. For those of you who aren't, a parallel job is a distributed application with multiple cooperating tasks running across multiple machines. In this case, it's a single application instance composed of cooperating tasks spread across several thousand servers. Yes, really.
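If the "cooperating tasks" idea is new to you, here's a toy sketch in Python. To be clear, this is not MPI and has nothing to do with how the job on Ranger is written: threads on one machine stand in for tasks on separate servers, and the names `partial_sum` and `run_parallel_job` are made up for illustration. Each task gets a rank, works on its own slice of the problem, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rank, size, n):
    """Work done by one task: sum this rank's share of the integers below n."""
    # Strided split: rank r handles r, r + size, r + 2 * size, ...
    return sum(range(rank, n, size))


def run_parallel_job(size, n):
    """Start `size` cooperating tasks and combine their results (a "reduce")."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda rank: partial_sum(rank, size, n), range(size)))
    return sum(partials)


# Eight tasks together produce the same answer as one task doing all the work.
total = run_parallel_job(size=8, n=1_000_000)
print(total == sum(range(1_000_000)))
```

In a real parallel job the tasks are separate processes on separate hosts, started by the batch system and talking to each other over the network, which is what makes launching tens of thousands of them so hard.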

Even more unbelievable, this feat was accomplished with a special branch of the Sun Grid Engine 6.1 release using the old SSH-based parallel job support. (The 6.2 Grid Engine release includes a more scalable "built-in" method for starting parallel jobs that blows the doors off the old RSH- or SSH-based model.) When TACC has completed the upgrade to 6.2, the scalability numbers will be outrageous!

This 60k-core job is part of a facial recognition application being developed by PNNL. The application can recognize faces in images faster than real time using the Ranger system at TACC. The reason the job didn't use all 62k+ cores in the system is administrative: there isn't a single queue that spans every host yet. That will be remedied soon, I'm told.

  The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.