The Navel of Narcissus
Josh Simons' Coordinates in the Blogosphere

Saturday November 12, 2005

Sun HPC Consortium Day I

The Sun HPC Consortium meeting kicked off promptly at 8am in the Bell Harbor Conference Center on the Seattle waterfront. Three Sun customers spoke in the morning session.

Jim Pepin, University of Southern California, USA

First up was Jim Pepin, from the University of Southern California. Jim is the USC Information Services CTO and Director of their HPC Center. His talk was titled Building and Benchmarking a 10TF Linux Cluster.

The USC cluster comprises a mix of more than 1800 Sun, Dell, and IBM nodes connected by a Myrinet interconnect fabric. In addition to the cluster, the center also has a Sun Fire 15K with 72 processors and 288 GB of memory, which is used for jobs requiring very large amounts of shared memory.

Jim identified several challenges unique to building large clusters. Power and airflow are the primary issues. Power draw during benchmark runs is about twice the idle draw. Jim focused on the amount of power consumed under load, but to me the real issue was that the multiplier was only a factor of two: an idle cluster still draws roughly half of its full-load power, so idle power consumption in the datacenter is huge. With respect to cooling, he mentioned that they can see the effects of non-uniform cooling during their benchmark runs, as some processors near the tops of their racks automatically declock (and slow down) when they overheat.

Jim also cited wiring as a major issue at this scale. They recently upgraded their Myrinet infrastructure and spent about 12 hours running new cables. The primary issues include density, testing, and power cabling. In addition, this large mass of cabling can interfere with effective cooling.

Graham Mowbray, ACEnet, Canada

Our second customer speaker was Graham Mowbray, the Executive Director of ACEnet. His talk was titled Transforming Research in Atlantic Canada.

ACEnet (Atlantic Computational Excellence Network) is a program designed to support distributed collaboration amongst 20+ universities and institutions across Canada's four Atlantic provinces: Newfoundland, Nova Scotia, Prince Edward Island, and New Brunswick.

Because the population of the Atlantic provinces is sparse and widely dispersed, collaboration through distributed technology is seen as essential for the region's researchers to reach critical mass.

ACEnet has decided to partner with Sun to deliver the value of distributed technologies to Atlantic Canada. As Graham described it, they were looking for a partner with several important characteristics:

First, they realized that they did not have the human resources to 'self-build' their solution; they needed a partner to help. Second, they realized that they didn't need to buy the future right now, but they did need a partner whose vision is consistent with ACEnet's, so that the necessary elements would be available when they need them. They also wanted a vendor who would be around for a long time, and they preferred a single vendor as a way to reduce complexity.

And, at the end of the day, having the appropriate hardware was just "table stakes" to be considered as a partner. What carried more weight was having a strong grid vision and a real commitment to a substantial partnership over the long term.

Sean Smith, University of Queensland, Australia

Our third speaker was Sean Smith, Director of the Centre for Computational Molecular Science (CCMS) at the University of Queensland. His talk was titled Computational Molecular Science in Nanotechnology and Biotechnology: Implemented on Sun's Intel Xeon and AMD Opteron Grid Platforms.

CCMS hosts Australia's first Sun Center of Excellence, the Sun Center of Excellence for Simulation of Bio- and Nano-Systems. CCMS is also the site of Australia's second-largest cluster resource.

CCMS runs a set of clusters built from Sun Fire V60x and V20z two-processor servers connected with Gigabit Ethernet. These systems run both single-process and parallel jobs, using a combination of ISV and homegrown applications.

CCMS also uses an older Sun Fire V880 with 8 processors and 32 GB of memory. As Prof. Smith said, "it's an older system, but still useful for quantum chemistry codes that just won't fit in a cluster node."

CCMS runs mostly Fedora Core 3, along with other Fedora releases, plus Solaris 9 on its large shared-memory machines. They use the Intel and Portland Group compilers and generally see a performance advantage of roughly 25% on Opteron over Xeon for their applications. CCMS uses the Linux version of Sun's Grid Engine software.

Dr. Smith ended with an overview of the particular scientific questions being explored at the Center. I can't do that justice here, so I'd recommend a visit to the Center's website for more details.


(2005-11-12 14:08:02.0)

Trackback URL: http://blogs.sun.com/simons/entry/sun_hpc_consortium_day_i