Thursday Jun 28, 2007

Switching Subjects

Until very recently, Sun's been hard to find in the world of high performance computing (HPC). Back in the 2001 timeframe, we lost the recipe for the P in HPC - customers that wanted performance were no longer looking at small numbers of big systems (a traditional Sun specialty), they were looking to large numbers, clusters, of small systems. And in 2001, that wasn't our focus.

But over the last five years, we've been investing to change that. Our Galaxy and Niagara product lines are just about the fastest growing products at Sun. OpenSolaris is beginning to catch a wave of adoption on small systems, and we've been doubling down on compiler optimization and language innovation. All with a focus on extreme efficiency/performance. If there were ever a time to reenter the market, it's now.

Rather than mimic the competition, we started out examining the issues and challenges facing the largest HPC installations we could find. Performance was certainly the main priority. But there were others - and not what you'd expect if your idea of a cluster was three PCs in a closet.

At three or four hundred computers, the challenges of building a cluster shift around quite a bit. Dissipating heat, sourcing enough power, managing software versioning or hardware failures, just to name a few. Get to three or four thousand nodes, and all of a sudden everything from weight (floor loading), to the bend radius of optical cabling, to massive software provisioning challenges, even the speed with which data can be moved around a room become critical factors. And that's where we decided to focus our efforts, at the extreme - on the assumption it would one day become the norm (as so often is the case in this industry).

I've read quite a lot of feedback from pundits and analysts over the past few days, and wanted to be sure to respond to one item - from those who believe the high end supercomputing marketplace is small, esoteric, and has very slim profit margins.

The high end of the supercomputing marketplace is small, esoteric, and has very small profit margins - they're absolutely right.

And like the world of free software (in which no one's going to get rich selling to the open source community), no one's going to build a profitable business selling to the academics and researchers who dominate the extremes of HPC.

That's not the point.

The academic supercomputing community (there's that word again) sets the pace for enterprise computing across the world – which has grabbed on to HPC for an array of real world challenges, from virus, disease, and drug discovery, to customer purchase pattern analytics, capital markets trading, energy discovery, dynamic resource management - you name it, it's one of the fastest growing segments in the marketplace. Proving that what starts in academia, ends up on main street. Industry looks to academia and research institutions to understand the innovations that enable breakthrough scale and performance (just ask Linus - who, come to think of it, still hasn't responded to my dinner invite... I hope it's not my cooking.)

What We Announced

In Dresden, Germany, earlier in the week, we announced the Constellation System - a set of generally available building blocks any customer, educational or commercial, can use to build from a few teraflops system, to more than a 2 petaflops system. As a part of this broad announcement, we unveiled a few component elements - notably...

Our commitment to the rise of OpenSolaris in the HPC community – joining Linux as a reliable, resilient platform for petaflops scale systems (those capable of executing a thousand trillion floating point operations per second). What's driving preference for OpenSolaris? Legendary support for huge memory configurations, integrated virtualization, DTrace and the ZFS file system are probably the biggest drivers – but support for ROCKS, a price tag that says FREE/open source, and the fact it'll run on any server built are a big help, too. Success in HPC is a very high priority for the Solaris team, and an area of investment for us and our partners. (And no, this doesn't lessen our focus on Linux - if we can combine licenses, it'll amplify it.)

Second, we unveiled an integrated 48 blade rack that supports all volume microprocessors, AMD, Niagara and Intel – in the same rack, with standardized I/O. Picture on the left. We also announced a new blade, Pegasus, designed purely for HPC grids. No seatbelts, no redundant anything, just raw compute performance.

Third, and most importantly, we unveiled Project Magnum (at right), an absolutely massive (3,456 port - click here to find out the significance of that number) infiniband (IB) switch – designed to alleviate a ton (three tons, actually) of the cabling, weight, expense and latency nightmares saddling most supercomputing facilities. This one innovation, courtesy of the extraordinary Systems team led by chief architect Andy Bechtolsheim, allows those with serious computing needs to dispense with a massive amount of complexity and expense. The largest competitive IB switch in the market today is 288 ports - so you'd need a lot of them (with an equivalent proliferation of support nodes, cabling and complexity) to match Magnum. In an industry where size matters, we're feeling plucky. (We expect the economics behind Magnum to prove out around 420 nodes – so even if you're building a little grid, Magnum pays for itself.)

Our view is we can reduce by a factor of two or three, at least, the cost and complexity of building a supercomputer – in an academic or commercial environment. Bringing general purpose systems, and volume economics, back to a market that was starting to turn proprietary. What the Constellation System allows for is a transition from this first picture...

To this, a vastly simpler, lighter, easier to manage/maintain Petaflops scale HPC installation.

Three tons lighter, three times less expensive to build, a fraction of the cabling and vastly simpler to manage. And at up to two petaflops, I'm quite convinced we could spank Bobby Fischer...

For those interested in the details behind our win at the Texas Advanced Computing Center (TACC), here's what they're running:

TFLOPs: approximately 500 TFLOPs
Magnums: 2 (>2000 4x IB ports each, expandable to 6,912 ports)
Thumpers: 72 (1.728 PB)
Metadata storage: STK6450 RAID (9.3 TB)
Tape storage: STK SL8500
Storage/Data Management: SAM/QFS
Racks: 82
IB NEMs: 328
Pegasus blades: 3936
Aggregate memory size: 123 TB
Number of cores: 62,976

Total racks: 94
Approx footprint: 2,037 sq ft
Approx power: 2.4 MW
IB cable length: ~14 kilometers

To put that in perspective, their computing facility will be about half the size of an NBA basketball court. Not exactly small - and in fact, likely the largest on earth.
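As a quick sanity check, the TACC figures listed above can be divided through to get per-rack and per-blade numbers. The sketch below is only arithmetic on that list. The unit conventions (1 TB of memory = 1024 GB, 1 PB of storage = 1000 TB) and every name in the code are assumptions for illustration, not part of the announcement.

```python
# Figures copied from the TACC list above.
RACKS = 82             # compute racks ("Racks: 82")
IB_NEMS = 328
PEGASUS_BLADES = 3936
CORES = 62_976
MEMORY_TB = 123        # aggregate memory
THUMPERS = 72
STORAGE_TB = 1728      # 1.728 PB, assuming 1 PB = 1000 TB


def per_unit_figures() -> dict:
    """Derive per-rack, per-blade and per-Thumper ratios by plain division."""
    return {
        "blades_per_rack": PEGASUS_BLADES / RACKS,
        "nems_per_rack": IB_NEMS / RACKS,
        "cores_per_blade": CORES / PEGASUS_BLADES,
        # Assumes binary units for memory: 1 TB = 1024 GB.
        "memory_gb_per_blade": MEMORY_TB * 1024 / PEGASUS_BLADES,
        "storage_tb_per_thumper": STORAGE_TB / THUMPERS,
    }


if __name__ == "__main__":
    for name, value in per_unit_figures().items():
        print(f"{name}: {value:g}")
```

Under those assumptions the division works out to 48 blades and 4 IB NEMs per rack, 16 cores and 32 GB of memory per blade, and 24 TB per Thumper. The 48 blades per rack matches the rack described earlier; the other ratios are derived here rather than quoted specs.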

And for those curious as to why we settled on 3,456 ports...

____________________________

Begin forwarded message:
From: Andreas Bechtolsheim
Date: June 28, 2007 6:58:59 AM PDT
To: Jonathan Schwartz
Cc: John Fowler
Subject: 3,456

We implement a 5-stage fabric, and with a 24-port switching element
the maximum number of ports is n*n/2*n/2, or 24*12*12 = 3456.

Other Infiniband switches in the market today are 3-stage fabrics
and they have n*n/2 or 24*12 = 288 ports.

Now you can build a 5-stage 3456 port switch with 12 288-port switches
and 288 24-port leaf switches but you end up with 300 boxes occupying
about 456U of rack space or 12 racks, and 6912 cables.
We use one double rack with 1152 cables, so it is 1/6th the space,
1/6th the cables and 1/6th the weight.

On Jun 28, 2007, at 6:36 AM, Jonathan Schwartz wrote:

so - why 3,456 ports?

----------------------------
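For anyone who wants to replay the arithmetic in Andy's message, here is a minimal sketch. The function name `max_ports` is made up for illustration, and extending the n*n/2 pattern to any odd stage count is my generalization; the email itself only gives the 3-stage and 5-stage cases.

```python
def max_ports(radix: int, stages: int) -> int:
    """Maximum end ports for a fabric built from `radix`-port switching elements.

    Follows the pattern in the email: n*n/2 for 3 stages, n*n/2*n/2 for
    5 stages. Each additional pair of stages multiplies the count by n/2.
    """
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be a positive even number")
    if stages < 1 or stages % 2 == 0:
        raise ValueError("stages must be a positive odd number")
    extra_pairs = (stages - 1) // 2
    return radix * (radix // 2) ** extra_pairs


# The two cases given in the email, with 24-port elements.
magnum_ports = max_ports(24, 5)
three_stage_ports = max_ports(24, 3)

# The build-it-yourself alternative described in the email.
core_switches = 12        # 288-port switches
leaf_switches = 288       # 24-port leaf switches
diy_boxes = core_switches + leaf_switches
diy_cables = 6912
magnum_cables = 1152
cable_ratio = diy_cables // magnum_cables

if __name__ == "__main__":
    print("5-stage ports:", magnum_ports)
    print("3-stage ports:", three_stage_ports)
    print("boxes in the DIY build:", diy_boxes)
    print("cable reduction factor:", cable_ratio)
```

With a radix of 24 this gives the 288 and 3456 from the email, 300 boxes for the alternative build, and a cable count that differs by a factor of six.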

And last, but certainly not least - if you'd like to try a supercomputer on an hourly basis, just point your browser to network.com... we've made a ton of progress in the past six months...
