The Navel of Narcissus
Josh Simons' Coordinates in the Blogosphere

Saturday November 12, 2005

Sun HPC Consortium Afternoon I

My second and final installment from Day I of the Sun HPC Consortium meeting in Seattle. We had two customer presentations, which I summarize below.

L. Eric Greenwade, Idaho National Laboratory (INL), USA

Eric Greenwade is the HPC Architect at the Idaho National Laboratory and at the newly formed Center for Advanced Modeling and Simulation (CAMS). His talk was titled Ozone: Sun v20z Cluster Retrospective.

INL is charged by the US Department of Energy (DOE) to lead the revitalization of nuclear energy in the US. It was quite interesting to hear about some of the nuclear engineering applications with which INL is involved. They are working with NASA on space transportation architectures (both robotic and crewed vehicles) and are also working on homeland security applications: photonuclear detection of nuclear materials, including cargo container detection of SNM (special nuclear materials). As Eric says, you actually don't WANT to detect SNM in cargo containers. :-)

With respect to HPC computing infrastructure, Eric described a heterogeneous site with a majority of Sun equipment (other systems include some old Cray vector machines and some generic Linux clusters). INL uses Sun Enterprise servers and storage to form the backbone and has a 494-processor v20z cluster (called Ozone) with a dual GigE network dedicated to message-passing traffic.

According to Eric, their v20z cluster has allowed INL to leapfrog their original goals by two years. They've achieved 20-80x performance increases with the system and it is the centerpiece of the INL HPC environment. The system is the first step in INL's 10-year strategic HPC plan, which takes them to over 1 PetaFLOP within the decade.

David De Roure, University of Southampton, UK

Our second customer speaker in the afternoon was David De Roure, who is the head of Grid and Pervasive Computing at the University of Southampton. His talk was titled WUN Grid -- The Worldwide University Network Grid.

The Worldwide University Network is an international alliance of 16 research-led institutions in the UK, US, and Scandinavia. The WUN came first, followed by the WUN Grid, which is a deployment of grid technology across these institutions to form a virtual organization. One of David's main points was that the pre-existing relationships and trust between WUN member organizations were a key factor in the success of establishing a shared grid infrastructure between the members. With these relationships they were able to easily overcome the typical organizational barriers that often hobble international grid efforts.

Interestingly, the WUN Grid decided to focus their initial efforts in a non-traditional area for grid computing. Their priorities are arts and humanities and social sciences. These areas were chosen in part because it was felt that even modest efforts in these areas could yield large results. Their infrastructure priorities were similarly non-traditional: first data grids, then collaborative grids, and then finally computational grids.

David mentioned several efforts, one of which I'll sketch here. They've established an arts and humanities effort that has come to be known unofficially as the Culture Grid. It is linked to HASTAC in the US, which is a strategic alliance of scientists, humanists, artists, social theorists, legal specialists and information technology specialists. In the UK, it is linked with the UK Arts and Humanities Research Council and the Humanities Data Service. In addition, they have linked to the Global Grid Forum's Humanities, Arts, and Social Sciences Research Group.


(2005-11-12 18:33:04.0)

Sun HPC Consortium Day I

The Sun HPC Consortium meeting kicked off promptly at 8am in the Bell Harbor Conference Center on the Seattle waterfront. Three Sun customers spoke in the morning session.

Jim Pepin, University of Southern California, USA

First up was Jim Pepin, from the University of Southern California. Jim is the USC Information Services CTO and Director of their HPC Center. His talk was titled Building and Benchmarking a 10TF Linux Cluster.

The USC cluster comprises a mixture of over 1800 Sun, Dell, and IBM nodes with a Myrinet interconnect fabric. In addition to their cluster, the center also has a Sun Fire 15K with 72 processors and 288GB memory, which is used for jobs requiring very large amounts of shared memory.

Jim identified several unique challenges in building large clusters. Power and airflow are primary issues. Power draw during benchmark runs is about 2X over idle. Jim focused on the amount of power consumed under load, but to me the real issue was that the multiplier was only a factor of two--idle power consumption in the datacenter is huge. With respect to cooling, he mentioned they can see the effects of non-uniform cooling during their benchmark runs as some processors near the tops of their racks automatically declock (and slow down) as they overheat.
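To make my point about the 2X multiplier concrete, here is a rough back-of-the-envelope sketch in Python. It is my own illustration, not anything Jim presented: the function name and the 50% busy fraction are hypothetical, and the only figure taken from the talk is the roughly 2X load-to-idle ratio. It computes how much of a node's average power draw is the idle floor that would be consumed even if no job ever ran.

```python
def idle_floor_share(load_multiplier: float, busy_fraction: float) -> float:
    """Share of average power draw that is the always-on idle floor.

    load_multiplier: power under load divided by idle power (about 2 in the talk).
    busy_fraction:   hypothetical fraction of time the node runs at full load.
    """
    if load_multiplier < 1.0:
        raise ValueError("load_multiplier must be at least 1")
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    # Average draw expressed in units of idle power; the idle wattage cancels out.
    average_relative = busy_fraction * load_multiplier + (1.0 - busy_fraction)
    return 1.0 / average_relative


if __name__ == "__main__":
    # Assumed busy fractions, chosen only for illustration.
    for busy in (1.0, 0.5):
        share = idle_floor_share(2.0, busy)
        print(f"busy {busy:.0%}: idle floor is {share:.0%} of average draw")
```

Under these assumptions, even a cluster that is loaded all of the time spends half of its power on the idle floor, and one that is busy half of the time spends about two thirds of it there.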

Jim also mentioned wiring as a huge issue at this scale. They recently upgraded their Myrinet infrastructure and spent about 12 hours running new cables. Primary issues include density, testing, and power cabling. In addition, this large mass of cabling can also interfere with effective cooling.

Graham Mowbray, ACEnet, Canada

Our second customer speaker was Graham Mowbray, the Executive Director of ACEnet. His talk was titled Transforming Research in Atlantic Canada.

ACEnet (Atlantic Computational Excellence Network) is a program designed to support distributed collaboration amongst 20+ universities and institutions across Canada's four Atlantic provinces: Newfoundland, Nova Scotia, Prince Edward Island, and New Brunswick.

Because the Atlantic provinces have a sparse and very distributed population, distributed technology collaboration is seen as an important need to reach critical mass for research effectiveness in the region.

ACEnet has decided to partner with Sun to deliver the value of distributed technologies to Atlantic Canada. As Graham described it, they were looking for a partner with several important characteristics:

First, they realized that they did not have the human resources to 'self-build' their solution--they needed a partner to help. Second, they realized that they didn't need to buy the future right now, but they did need to find a partner whose vision is consistent with ACEnet's so that elements would be available in the future when they need them. They also wanted a vendor who would be around for a long time, and they preferred a single vendor as a way to reduce complexity.

And, at the end of the day, having the appropriate hardware was just "table stakes" to be considered as a partner. What carried more weight was having a strong grid vision and a real commitment to a substantial partnership over the long term.

Sean Smith, University of Queensland, Australia

Our third speaker was Sean Smith, Director of the Centre for Computational Molecular Science (CCMS) from the University of Queensland. His talk was titled Computational Molecular Science in Nanotechnology and Biotechnology: Implemented on Sun's Intel Xeon and AMD Opteron Grid Platforms.

CCMS hosts Australia's first Sun Center of Excellence, the Sun Center of Excellence for Simulation of Bio- and Nano-Systems. CCMS is also the site of Australia's second largest cluster resource.

CCMS runs a set of clusters using a mix of Sun Fire v60x and v20z two-processor servers connected with Gigabit Ethernet. These systems are used to run a mix of single-process and parallel jobs using a mixture of ISV and homegrown applications.

CCMS also uses an older Sun Fire V880 with 8 processors and 32 GB of memory. As Prof. Smith said, "it's an older system, but still useful for quantum chemistry codes that just won't fit in a cluster node."

This Sun customer is running mostly Fedora Core 3 along with other Fedora revs and Solaris 9 on their large shared-memory machines. They use Intel and Portland Group compilers and generally see a 25% or so performance advantage on Opteron over Xeon for their applications. CCMS uses the Linux version of Sun's Grid Engine software.

Dr. Smith ended with an overview of the particular scientific questions being explored at the Center. I can't do justice to that and so would recommend instead a visit to the Center's website for more details.


(2005-11-12 14:08:02.0)

