Switching Subjects
Until very recently, Sun's been hard to find in the world of high performance computing (HPC). Back in the 2001 timeframe, we lost the recipe for the P in HPC - customers that wanted performance were no longer looking at small numbers of big systems (a traditional Sun specialty), they were looking to large numbers, clusters, of small systems. And in 2001, that wasn't our focus.
But over the last five years, we've been investing to change that. Our Galaxy and Niagara product lines are just about the fastest growing products at Sun. OpenSolaris is beginning to catch a wave of adoption on small systems, and we've been doubling down on compiler optimization and language innovation. All with a focus on extreme efficiency/performance. If there were ever a time to reenter the market, it's now.
Rather than mimic the competition, we started out examining the issues and challenges facing the largest HPC installations we could find. Performance was certainly the main priority. But there were others - and not what you'd expect if your idea of a cluster was three PCs in a closet.
At three or four hundred computers, the challenges of building a cluster shift around quite a bit. Dissipating heat, sourcing enough power, managing software versioning or hardware failures, just to name a few. Get to three or four thousand nodes, and all of a sudden everything from weight (floor loading), to the bend radius of optical cabling, to massive software provisioning challenges, even the speed with which data can be moved around a room become critical factors. And that's where we decided to focus our efforts, at the extreme - on the assumption it would one day become the norm (as so often is the case in this industry).
I've read quite a lot of feedback from pundits and analysts over the past few days, and wanted to be sure to respond to one item - from those who believe the high end supercomputing marketplace is small, esoteric, and has very slim profit margins.
The high end of the supercomputing marketplace is small, esoteric, and has very small profit margins - they're absolutely right.
And like the world of free software (in which no one's going to get rich selling to the open source community), no one's going to build a profitable business selling to the academics and researchers who dominate the extremes of HPC.
That's not the point.
The academic supercomputing community (there's that word again) sets the pace for enterprise computing across the world – which has grabbed on to HPC for an array of real world challenges, from virus, disease, and drug discovery, to customer purchase pattern analytics, capital markets trading, energy discovery, dynamic resource management - you name it, it's one of the fastest growing segments in the marketplace. Proving that what starts in academia, ends up on main street. Industry looks to academia and research institutions to understand the innovations that enable breakthrough scale and performance (just ask Linus - who, come to think of it, still hasn't responded to my dinner invite... I hope it's not my cooking.)
What We Announced
In Dresden, Germany, earlier in the week, we announced the Constellation System - a set of generally available building blocks any customer, educational or commercial, can use to build anything from a system of a few teraflops to one of more than 2 petaflops. As a part of this broad announcement, we unveiled a few component elements - notably...
Our commitment to the rise of OpenSolaris in the HPC community - joining Linux as a reliable, resilient platform for petaflops scale systems (those capable of executing a thousand trillion floating-point operations per second). What's driving preference for OpenSolaris? Legendary support for huge memory configurations, integrated virtualization, DTrace and the ZFS file system are probably the biggest drivers - but support for ROCKS, a price tag that says FREE/open source, and the fact it'll run on any server built are a big help, too. Success in HPC is a very high priority for the Solaris team, and an area of investment for us and our partners. (And no, this doesn't lessen our focus on Linux - if we can combine licenses, it'll amplify it.)
Second, we unveiled an integrated 48 blade rack that supports all volume microprocessors, AMD, Niagara and Intel - in the same rack, with standardized I/O. Picture on the left. We also announced a new blade, Pegasus, designed purely for HPC grids. No seatbelts, no redundant anything, just raw compute performance.
Third, and most importantly, we unveiled Project Magnum (at right), an absolutely massive InfiniBand (IB) switch with 3,456 ports (the significance of that number is explained at the end of this post) - designed to alleviate a ton (three tons, actually) of the cabling, weight, expense and latency nightmares saddling most supercomputing facilities. This one innovation, courtesy of the extraordinary Systems team led by chief architect Andy Bechtolsheim, allows those with serious computing needs to dispense with a massive amount of complexity and expense. The largest competitive IB switch in the market today is 288 ports - so you'd need a lot of them (with an equivalent proliferation of support nodes, cabling and complexity) to match Magnum. In an industry where size matters, we're feeling plucky. (We expect the economics behind Magnum to prove out around 420 nodes - so even if you're building a little grid, Magnum pays for itself.)
Our view is we can reduce by a factor of two or three, at least, the cost and complexity of building a supercomputer – in an academic or commercial environment. Bringing general purpose systems, and volume economics, back to a market that was starting to turn proprietary. What the Constellation System allows for is a transition from this first picture...

To this, a vastly simpler, lighter, easier to manage/maintain Petaflops scale HPC installation.
Three tons lighter, three times less expensive to build, a fraction of the cabling and vastly simpler to manage. And at up to two petaflops, I'm quite convinced we could spank Bobby Fischer...
For those interested in the details behind our win at the Texas Advanced Computing Center (TACC), here's what they're running:
TFLOPS: approximately 500 teraflops
Magnums: 2 (>2000 4x IB ports each, expandable to 6,912 ports)
Thumpers: 72 (1.728 PB)
Metadata storage: STK6450 RAID (9.3 TB)
Tape storage: STK SL8500
Storage/Data Management: SAM/QFS
Racks: 82
IB NEMs: 328
Pegasus blades: 3936
Aggregate memory size: 123 TB
Number of cores: 62,976
Total racks: 94
Approx footprint: 2,037 sq ft
Approx power: 2.4 MWatts
IB cable length: ~14 kilometers
To put that in perspective, their computing facility will be about half the size of an NBA basketball court. Not exactly small - and in fact, likely the largest on earth.
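For anyone who wants to check how the TACC figures fit together, here is a minimal sketch in Python. The totals are copied from the list above and the 48 blades per rack comes from the rack description earlier in this post. The per-unit ratios it derives (NEMs per rack, cores per blade, capacity per Thumper, memory per blade) are not stated anywhere in the post. They are simply what the totals imply, and the memory figure assumes "TB" means 1,024 GB.

```python
# Totals as listed in the post.
BLADES = 3936
BLADE_RACKS = 82          # the "Racks: 82" line
BLADES_PER_RACK = 48      # from the 48 blade rack described earlier
IB_NEMS = 328
CORES = 62_976
MEMORY_TB = 123
THUMPERS = 72
THUMPER_STORAGE_TB = 1728  # 1.728 PB expressed in TB
MAGNUMS = 2
PORTS_PER_MAGNUM = 3456


def derived_ratios() -> dict:
    """Return per-unit figures implied by the totals (derived, not quoted)."""
    return {
        # 82 racks of 48 blades should account for every Pegasus blade.
        "blades_from_racks": BLADE_RACKS * BLADES_PER_RACK,
        "nems_per_rack": IB_NEMS / BLADE_RACKS,
        "cores_per_blade": CORES / BLADES,
        "tb_per_thumper": THUMPER_STORAGE_TB / THUMPERS,
        # Assumption: binary units, 1 TB = 1024 GB.
        "gb_memory_per_blade": MEMORY_TB * 1024 / BLADES,
        # Two fully populated Magnums give the "expandable to" port count.
        "max_ib_ports": MAGNUMS * PORTS_PER_MAGNUM,
    }


if __name__ == "__main__":
    for name, value in derived_ratios().items():
        print(f"{name}: {value}")
```

The useful check is the first one: 82 racks at 48 blades apiece gives exactly the 3,936 blades listed, which suggests the "Racks: 82" line counts blade racks and "Total racks: 94" includes everything else.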
And for those curious as to why we settled on 3,456 ports...
____________________________
Begin forwarded message:
From: Andreas Bechtolsheim
Date: June 28, 2007 6:58:59 AM PDT
To: Jonathan Schwartz
Cc: John Fowler
Subject: 3,456
We implement a 5-stage fabric, and with a 24-port switching element
the maximum number of ports is n*n/2*n/2, or 24*12*12 =3456.
Other Infiniband switches in the market today are 3-stage fabrics
and they have n*n/2 or 24*12 = 288 ports.
Now you can build a 5-stage 3456 port switch with 12 288-port switches
and 288 24-port leaf switches but you end up with 300 boxes occupying
about 456U of rack space or 12 racks, and 6912 cables.
We use one double rack with 1152 cables, so it is 1/6th the space,
1/6th the cables and 1/6th the weight.
On Jun 28, 2007, at 6:36 AM, Jonathan Schwartz wrote:
so - why 3,456 ports?
----------------------------
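Andy's arithmetic is easy to reproduce. The sketch below, in Python, is only an illustration of the formulas in his note: the function names are made up for this example, and the cable count for the build-it-yourself option assumes one cable per host port plus one per leaf-to-core uplink, which is one way of counting that lands on his 6,912 figure.

```python
MAGNUM_CABLES = 1152  # cable count quoted in the note for one double rack


def fat_tree_ports(radix: int, stages: int) -> int:
    """Maximum end ports of a fabric built from radix-port switching elements.

    3 stages gives n * n/2; 5 stages gives n * n/2 * n/2, as in the note.
    """
    if stages < 1 or stages % 2 == 0:
        raise ValueError("stages must be odd: 1, 3, 5, ...")
    if radix % 2:
        raise ValueError("radix must be even")
    tiers = (stages + 1) // 2
    return radix * (radix // 2) ** (tiers - 1)


def discrete_build(radix: int = 24) -> dict:
    """Size a 5-stage fabric assembled from separate leaf and core boxes."""
    half = radix // 2
    core_size = fat_tree_ports(radix, 3)   # ports on each 3-stage core switch
    total = fat_tree_ports(radix, 5)       # host-facing ports overall
    leaves = total // half                 # each leaf: half down, half up
    cores = (leaves * half) // core_size   # enough core ports for all uplinks
    host_cables = total                    # assumed: one cable per host port
    uplink_cables = leaves * half          # assumed: one cable per uplink
    return {
        "ports": total,
        "leaf_switches": leaves,
        "core_switches": cores,
        "boxes": leaves + cores,
        "cables": host_cables + uplink_cables,
    }


if __name__ == "__main__":
    build = discrete_build()
    print(build)
    print("cable ratio vs. Magnum:", build["cables"] / MAGNUM_CABLES)
```

With a 24-port element this gives 288 ports at three stages and 3,456 at five, and the discrete build comes out at 288 leaf switches, 12 core switches, 300 boxes and 6,912 cables, six times the 1,152 quoted for Magnum.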
And last, but certainly not least - if you'd like to try a supercomputer on an hourly basis, just point your browser to network.com... we've made a ton of progress in the past 6 months...
Posted on 10:52PM Jun 28, 2007 | Comments[43]

Posted by Dennis on June 29, 2007 at 12:20 AM PDT #
Posted by Anantha on June 29, 2007 at 03:23 AM PDT #
Posted by Paul on June 29, 2007 at 03:28 AM PDT #
Posted by Peter on June 29, 2007 at 03:34 AM PDT #
Posted by as on June 29, 2007 at 04:44 AM PDT #
Posted by Austin on June 29, 2007 at 06:12 AM PDT #
Posted by Paul Morriss on June 29, 2007 at 06:35 AM PDT #
Posted by observer on June 29, 2007 at 06:57 AM PDT #
Posted by Wayne Abbott on June 29, 2007 at 06:59 AM PDT #
Posted by Mike Coe on June 29, 2007 at 09:58 AM PDT #
I was so pleased today when I met with representatives of said company and I was able to point to the turnaround that Sun is demonstrating via an open culture and concrete solutions such as Constellation, Blade, Opteron-based products, Magnum, BlackBox, ZFS, DTrace, OpenJDK, JavaFX et al.
Congratulations to Sun's employees, you, Andy Bechtolsheim, Jeff Bonwick and the various teams for an open and results-oriented approach to computing. I think there is a good chance Sun will be here in the next 5 years. Now I hope you have time for that anticipated dinner and that we will soon see a bootable ZFS as part of the Linux kernel.
Posted by J.F. Zarama on June 29, 2007 at 12:26 PM PDT #
Posted by Jan on June 29, 2007 at 01:13 PM PDT #
Posted by Gumby on June 29, 2007 at 01:13 PM PDT #
Posted by Gumby on June 29, 2007 at 01:30 PM PDT #
Posted by Gumby on June 29, 2007 at 01:35 PM PDT #
Posted by Stenley on June 29, 2007 at 02:21 PM PDT #
Gumby: Your choice of handle is perhaps more revealing about you than, on the evidence of your comment, you'll ever be capable of imagining.
You got a cell phone? Ever browsed the web? Ever bought or sold anything on ebay? If you've done any of those things, you probably touched a Sun product without ever even knowing it.
In this blog entry, Jonathan explains how Sun's stock got into the doghouse, and how Sun might see sunlight again.
If the only audience for Jonathan's blog were his employees, his blog would be pointless. He could just use email within Sun. His blog is aimed at users of Sun products, potential users of Sun products, partners, financial analysts, anybody else interested in how the network brings people together, and, probably lastly, Sun employees.
Posted by Paul Davies on June 29, 2007 at 02:58 PM PDT #
Posted by nobody on June 29, 2007 at 04:05 PM PDT #
Posted by nobody on June 29, 2007 at 04:12 PM PDT #
Posted by Ryan on June 29, 2007 at 11:10 PM PDT #
Posted by chris on June 30, 2007 at 02:33 AM PDT #
We want an OS free of patent threats.
We are willing to buy anything as long as it is clean and straight.
With OpenSolaris and OpenJDK, we are almost there, but for the marketing.
Java was made for Jini use. J2ME runs on mobiles. But people don't know that as much - "SUN is shining in the sky. Stanford University Network? What is that? A group of professors?"
And we really want UltraSPARCs and the lot, put more than one of them in a box, instead of bothering too much about clock speed.
-----------------------------
What you've done with HPC(3456), do with the PC - 3,4,5,6 CPUs. And RAM _is_ cheap.
-----------------------------
And for once, for Heaven's sake, use the _established channel_ of distributors and vendors.
Direct sellers are booted out by cartels and politicians.
Mighty Dell had to bend.
No one here knows about SUN grid (IIRC, it's been around since 2003).
No one knows about Sun Global Desktop, dammit!
You should really do some marketing here. There are tons of Java programmers out here. What are you waiting for?
Jon, I hate you and your company.
You make the breakthroughs, year after year, don't market them, and let others copy and earn.
W H Y ?
Just as a piece of superb design backed by a robust people-participation methodology, Java needs more fame with non-tech humans. What to say of OpenJDK, all the Java code out there and OpenSolaris.
Finally, stick to your promise, GPLv3.
And, let us all make Heyyyyy!! while the SUN shines.
Posted by Arvinda on June 30, 2007 at 04:50 AM PDT #
Posted by Al on June 30, 2007 at 07:57 AM PDT #
I don't know if anyone noticed this: An e-mail was sent from the CEO at 6:36 "am", and it was promptly answered by one of Sun's top employees at 6:58 (OK OK let's make it 6:59) am.
Can't see how in the world this company will not succeed.
Posted by W. Wayne Liauh on June 30, 2007 at 01:41 PM PDT #
In the area of InfiniBand HPC clusters, Constellation is an interesting concept. SGI released a similar concept with its Altix ICE. It will be interesting to see how IBM, with BladeCenter H, HP, with C-Class, Dell, combined with Cisco, Voltaire, and QLogic respond.
The 48 blade chassis also explains why Sun never attempted to buy a dense rack computer vendor such as Rackable or Verari.
I would like to see Sun also offer the 48 blade Constellation chassis as an SP play (with low-power chips), now that Rackable seems to be struggling.
Posted by Mark on June 30, 2007 at 01:47 PM PDT #
Posted by John Biggs on July 01, 2007 at 02:49 AM PDT #
Posted by Kevin on July 01, 2007 at 05:48 PM PDT #
Posted by Rob on July 02, 2007 at 07:30 AM PDT #
Posted by netique on July 02, 2007 at 08:32 AM PDT #
Posted by Kevin on July 02, 2007 at 11:19 AM PDT #
Posted by Phenom on July 02, 2007 at 02:57 PM PDT #
Posted by Tim Scanlon on July 02, 2007 at 06:50 PM PDT #
Posted by Jeffrey Fall on July 02, 2007 at 10:30 PM PDT #
Posted by Jeffrey Fall on July 02, 2007 at 11:00 PM PDT #
One suggestion that would probably help you immensely, if for no reason other than marketing purposes: please provide plug-ins for Netbeans for these programs, in particular for R. Eclipse has a mediocre plug-in available. I much prefer Netbeans but it's limited in its support of languages. The R user base is huge. Providing a cross-platform IDE would be good for everyone.
You could even include a link to network.com as well as a connection from within Netbeans.
Posted by Computing Guy on July 03, 2007 at 06:49 AM PDT #
Posted by Alan on July 03, 2007 at 08:35 AM PDT #
Posted by Jacob Mathai on July 03, 2007 at 08:42 AM PDT #
Posted by DMCC on July 04, 2007 at 01:59 AM PDT #
Posted by Ivy on July 05, 2007 at 03:07 AM PDT #
Posted by Bill on July 06, 2007 at 09:51 AM PDT #
Posted by Phenom-non on July 06, 2007 at 08:39 PM PDT #
Posted by Rob-a-not on July 06, 2007 at 08:43 PM PDT #
Posted by Fred on July 08, 2007 at 06:11 AM PDT #