Thursday May 28, 2009

The Jülich VIP data center tour was a rare treat. I can only imagine anthropologists examining the site hundreds of years from now and wondering why anyone would run 5 MW of power and thousands of gallons of water an hour into the basement of a gymnasium-sized building.

Running a PetaFlop BlueGene system, a 2000-node Sun cluster, and a 1000-node Bull cluster takes a lot of cooling water. The facilities work that goes into a modern HPC data center is an amazing feat of mechanical engineering in and of itself, never mind the computers.

If you look closely you will see the chilled water connector from the Sun Constellation System rear cooling door at the bottom right of the rack. A flexible connector is used to connect to the under-floor chilled water supply. There is also a top-of-door connector (not shown) for customers who run chilled water pipes above the racks.

Here is another shot of the under-floor piping.

Sun also has a gas refrigerant cooling door option for the Sun Constellation System. The gas refrigerant door requires smaller diameter piping to the racks and can have other advantages, although it does require an external unit, which can sit outside the computer room and typically has its own chilled water heat exchanger. Sun's data center design team can help design the ideal data center cooling system for your Sun Constellation System.

Tuesday May 26, 2009

Everyone from traditional web hosting companies to university HPC centers is rebranding themselves with the Cloud Computing moniker these days, but what's the reality behind the hype? We certainly have many great hardware and software products that can be used for HPC, web hosting, and just about anything else you would want to do "in a cloud", but when Sun talks about cloud computing, there are a few key defining concepts we focus on:
  • Virtualization
  • Multi-tenancy
  • Real-time, user-controlled provisioning
  • Pay-per-use

    We believe there will be many different types of clouds, including public clouds like Amazon's offerings and the Sun Cloud, enterprise clouds, clouds run by service providers, and other hybrid offerings.

    My group at Sun, besides selling large HPC systems, is responsible for helping customers build enterprise clouds using Sun's technology. Many of our customers are starting down the path of building enterprise clouds, but most are not ready to talk about it in public. So I was very excited to read about NASA's enterprise cloud, called Nebula, and how it uses the Sun Lustre file system as a key part of its cloud architecture. The Nebula web site gives a detailed description of their Lustre implementation.

    My group has, of course, worked on most of the large Lustre deployments, including many on non-Sun hardware, that have been done around the world. One thing we realized is that not everyone has rocket scientists on their staff, and even if they do, they don't always want to spend their time custom-designing Lustre storage systems. So to help HPC and enterprise cloud customers simplify and accelerate the deployment of Lustre, we have created the Sun Lustre Storage System. Scaling from 1 to over 100 GB/sec, the modular architecture of the Sun Lustre Storage System makes it easy for anyone to deploy Lustre.

    In the coming days, you will be hearing a lot more about the Sun Cloud, and many other Sun technologies being used in our cloud deployments.

  • The Jülich web site has been updated with a nice shot of the Sun Constellation System. They also have details on the technical configuration.

    Here is a good view of three of the six Sun Magnum QDR switches at Jülich. Each switch has 648 QDR IB ports exposed as 216 CXP 12x connectors.
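The connector count above is simple arithmetic: a 12x InfiniBand connector bundles three 4x links, so 648 ports fit on 216 connectors. A minimal sketch in Python; the constant and function names are purely illustrative, and the six-switch total assumes all six switches are configured identically, which the post does not state outright.

```python
# Port figures quoted above for one Sun Magnum QDR switch.
PORTS_PER_SWITCH = 648        # 4x QDR InfiniBand ports
LINKS_PER_CXP = 12 // 4       # a 12x connector carries three 4x links
SWITCH_COUNT = 6              # switches at Jülich (assumed identical)


def cxp_connectors(ports: int, links_per_connector: int = LINKS_PER_CXP) -> int:
    """Return the number of connectors needed to expose `ports` 4x ports."""
    # Ceiling division, so a partially filled connector still counts.
    return -(-ports // links_per_connector)


connectors = cxp_connectors(PORTS_PER_SWITCH)
total_ports = PORTS_PER_SWITCH * SWITCH_COUNT
print(connectors, total_ports)
```

The same three-to-one ratio is why a fabric cabled with 12x cables needs roughly a third as many cables as one cabled port by port with 4x cables.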

    I won't show pictures of the other vendor's IB rack, but just think about this with three times as many cables.

    Of course, the cabling gets even more challenging under the floor.

    But it sure looks nice when you are all done.

    And here is the team to thank for all that hard work. Yours truly was just there for the photo op; I have to say I didn't help with any of this.

    Memorial Day started about 9 hours too early for me, as the first rays of sunlight broke through the bottom of the window shade in the United Airlines 747 as we descended towards Frankfurt airport. I'm visiting Germany this week for the grand opening of the new Jülich Supercomputer Center and its 2000-node Sun Constellation System. The Jülich system is one of the first large QDR-based InfiniBand supercomputers, but we expect that 40 Gb/sec QDR technology will rapidly replace the previous generation 20 Gb/sec DDR technology in large clusters, not only because of its higher bandwidth but also because of the improved latency of QDR.

    The Jülich system also features a Sun Lustre Storage System directly connected to its InfiniBand network, using multiple Lustre Object Storage Servers (OSS) to provide high-speed parallel access to a large single-namespace file system that is easily expandable to petabytes of storage and tens or even hundreds of GB/sec of storage bandwidth (Oak Ridge National Labs has achieved over 200 GB/sec on their Sun Lustre system).

    One unique feature of the Jülich system is its InfiniBand fabric, which uses both Sun and Mellanox QDR switches. Besides the 2000-node Sun Constellation System using Sun Magnum QDR switches, the Jülich QDR fabric also supports a 1000-node Bull cluster using Mellanox QDR switches. While both the Sun and Bull supercomputers are built out of 2-socket Intel Nehalem compute nodes, the physical size and complexity of the systems stand in stark contrast. Using regular 4x IB cables to connect to the Mellanox switches, the Bull cluster, while having only half the number of compute nodes, requires more cables than the Sun Constellation System with its 3-in-1 12x cables. In addition, the Sun Constellation System racks require no internal cables to connect the compute nodes to their built-in "QNEM", the world's first in-chassis QDR leaf switch. While most Sun Constellation Systems use the QNEM to build a fully connected "fat tree" IB fabric, the QNEM also supports mesh and 3D Torus IB fabrics, the latter being used in a Sun Constellation System being deployed at Sandia National Labs in the US.

    Bull does a good job of packing 72 of their Nehalem compute nodes into a single rack, but counting its IB racks, the Bull cluster still requires almost 2x the floorspace of the Sun Constellation System, which sports 96 compute nodes in each rack.
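To put the per-rack densities in perspective, here is a rough Python sketch that counts compute racks only, using the node counts and densities quoted in this post. It does not try to reproduce the floorspace comparison, because the number of IB switch racks, storage racks, and service aisles is not given here; the function name is illustrative.

```python
import math

# Figures quoted in the post.
SUN_NODES, SUN_NODES_PER_RACK = 2000, 96
BULL_NODES, BULL_NODES_PER_RACK = 1000, 72


def compute_racks(nodes: int, nodes_per_rack: int) -> int:
    """Racks needed for the compute nodes alone, rounding up."""
    return math.ceil(nodes / nodes_per_rack)


sun_racks = compute_racks(SUN_NODES, SUN_NODES_PER_RACK)
bull_racks = compute_racks(BULL_NODES, BULL_NODES_PER_RACK)
density_ratio = SUN_NODES_PER_RACK / BULL_NODES_PER_RACK
print(sun_racks, bull_racks, round(density_ratio, 2))
```

The remainder of the footprint comes from everything this sketch leaves out, most notably the separate IB switch racks.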

    Jülich chose Sun's new water-cooled rear door option for the Sun Constellation System, greatly simplifying the cooling design of their data center. Depending on the exact CPU and memory configuration, Sun Constellation System racks can require 30-40 kW of cooling per rack, which calls for some sort of supplemental cooling. Sun provides both water-cooled and refrigerant gas cooled rear door options for Sun Constellation System racks. This approach has an advantage over in-row or top-of-rack supplemental cooling systems in that no supplemental fans are required: air is moved through the cooling doors using only the blade chassis's built-in fans. The supplemental fans in in-row and top-of-rack systems are often left out of customers' power-usage calculations. Sun's Data Center Efficiency practice can help customers design more efficient data centers, be it an entirely new, from-the-ground-up data center or a retrofit of an existing one.
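For a feel of what 30 to 40 kW per rack means for the chilled water loop, the standard heat balance P = m * c * dT gives the water flow each door needs. This is only a back-of-the-envelope sketch in Python: it assumes all of the rack's heat ends up in the water, and the 6 K temperature rise is a hypothetical design value, not a Sun or Jülich specification.

```python
# Assumed physical constant and design value (neither comes from the post).
WATER_SPECIFIC_HEAT = 4186.0   # J/(kg*K), approximate for liquid water
ASSUMED_DELTA_T = 6.0          # K, hypothetical supply-to-return temperature rise


def water_flow_kg_per_s(heat_load_kw: float, delta_t: float = ASSUMED_DELTA_T) -> float:
    """Mass flow of water needed to carry away heat_load_kw, from P = m * c * dT."""
    return heat_load_kw * 1000.0 / (WATER_SPECIFIC_HEAT * delta_t)


for rack_kw in (30.0, 40.0):   # per-rack range quoted above
    flow = water_flow_kg_per_s(rack_kw)
    print(f"{rack_kw:.0f} kW rack: about {flow:.2f} kg/s of water")
```

Since a kilogram of water is close to a liter, the result can be read as liters per second per rack; a smaller temperature rise means proportionally more flow.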

    Well, it is time to head off to the grand opening ceremonies. I'll be back afterwards with more of the story.

    Wednesday May 13, 2009

    In between planning for our upcoming HPC Consortium user group meeting prior to next month's ISC09 conference, I've had a busy week meeting with a number of our HPC partners. I started off the week meeting with Cray's CEO Pete Ungaro. Cray is one of our Lustre partners, offering their customers storage solutions based on the Sun Lustre file system. I also met with Sun partner Integrated Media Technologies. IMT is one of our first partners to qualify for our new Sun HPC Elite partner program, bringing their expertise in Lustre, InfiniBand, and GPGPU technology to our customers. Speaking of GPGPU, yesterday I met with Shanker Trivedi, Nvidia's new VP of sales for GPGPU and professional graphics technologies. A number of Sun customers are adding GPGPUs to their Sun clusters, most notably the TiTech TSUBAME supercomputer, which now includes 170 Nvidia Tesla GPGPUs.

    Want to hear more about Sun's latest plans for Lustre, InfiniBand, and GPGPU technology? June is a wonderful month to visit Germany and the HPC Consortium user group meeting in Hamburg promises to bring you updates in all these areas. There is still time to register at the reduced early bird registration fee. I hope to see many of you there.

    Thursday May 07, 2009

    Probably the software you are already using today! One of the biggest challenges with any new service is to get people to use it. Adoption always precedes monetization. I tried out the Dropbox service the day it was launched, as I happened to be sitting in an airport with an hour to kill when I received the email invite. They have a great front-end, at least on my Mac where I've tried it, but to be honest I don't use it much anymore because I've reached the 2 GB limit of my free account. By contrast, I've been using the beta version of OpenOffice "save to cloud" since the day Sun launched its internal testing, and I save virtually all of my documents to the cloud these days by default. Other than 1 or 2 documents I might need to edit on a flight, any other documents I need to access offline tend to be cached in my email and are accessible offline in that manner. I use Google Docs too for a few personal files that I want to share with family members, but not for mainstream use.

    So at least in my trivial example, the killer app for cloud computing is the one I already use - OpenOffice. I don't have to take any extra steps to drag files between folders, I just use save to cloud and open from cloud instead of save and open.

    One could easily expand this notion to almost any enterprise software. Do you want to buy special cloud backup software for your database or do you just want a "backup to cloud" button for the database you already use?

    I'm not saying there isn't room for innovation in apps like Dropbox, and ultimately the "front end" of clouds and the "back end" are not necessarily linked. That is why Sun is promoting a set of Open Cloud APIs so that in the future a company like Dropbox can decide to focus on innovating on the front end/client and simply use an existing cloud back end, without getting locked into that back end.

    Saturday May 02, 2009

    I had not originally planned to pack in LAX-SFO-FRA-PNQ-BOM-SIN-CDG-ORD-LAX in one week, but thanks to modern air travel I was able to make all of last week's important customer events. Everyone I talked to was quite excited about Sun's new HPC-specific blade products, which, along with our new QDR (quad data rate - 40 Gb/sec) InfiniBand and Lustre-powered open storage products, are bringing great new levels of performance to the Sun Constellation System. We have had our first customer Linpack runs on both 3D Torus and fat tree IB configs, and while we are not quite ready to publicly discuss results, they are pretty amazing. It should make for some interesting Top500 announcements at ISC this June.

    Well, since I can't talk about our latest Linpack results, I thought I would share my world airport tips.

    SFO is simply the best for international connections, now that the walkway from Terminal 3 (United) to the International terminal is complete. Now if only the United flight attendants would update their script to not confuse people with, "please take the shuttle to the international terminal". Due to a late departure from LAX, I had only 30 minutes to make my first connection. Luckily, my plane pulled up to SFO gate 75, less than a five minute walk from my international gate.

    United to Lufthansa connections can require quite a bit longer walk in Frankfurt, and for some reason the airlines are determined to make it just plain hard to get to India. Well, I guess it might have something to do with geography too. Six hours in Frankfurt was plenty of time to sample both the United Red Carpet Club lounge as well as the Lufthansa lounge. But six hours is a long time to spend in an airport, no matter how many lounges you visit.

    Pune, India, is an interesting city, especially when you arrive at 3:30 am, one of the few times the streets are uncrowded. With 600 new car registrations every day, and no new roads, well, you get the picture. Tata Motors is based in Pune, as I am sure thousands of Tata's new Nanos will be before long. Officially the Nano is a four-passenger vehicle, but given that I've seen more passengers on an Indian motorcycle, I expect the occasional Nano will be found with 5 or more passengers.

    Mumbai (Bombay) was destined to be another six hour layover on my way out of the country. Privatization is greatly improving service at India's airports. Sometimes too much. The Jet Airways staff seemed so proud to provide a modern "kneeling" airport bus to transport us no more than 20 yards from our plane's parking spot to the terminal. Then came the dreaded domestic to international terminal transfer bus. Just a few years ago, said "bus" resembled a pre-WWII relic of engineering. While today it's a modern bus with air conditioning, it was still nearly an hour's wait followed by a slow 45 minute crawl across the airfield, including what seemed like a 10 minute standoff with an Airbus A320 before the driver finally went around it via what appeared to be an illegal shortcut.

    No surprise given its history near the center of the SARS and avian (H5N1) flu outbreaks, Singapore already had their thermal imaging monitors out scanning all incoming travelers for telltale signs of the new H1N1 flu. While this was definitely my shortest visit to Singapore at less than 24 hours, the Sun Singapore team was as efficient as ever, having organized a great HPC Symposium for about fifty customers from across Asia South.

    Singapore Airlines deserves special mention for making my 11-hour flight to Paris as restful as an 11-hour flight can be. I stepped off the plane at 6:30 am and, about half a dozen customer meetings later, stepped into my hotel room at about 9 pm and collapsed. Luckily I had a late start the next morning and felt recovered after my first full night of sleep of the week. I sincerely thank Europe's PRACE Project for extending their vendor briefings an additional day to meet with me.

    Having enjoyed favorable tail winds most of the week, it was now time for payback, and my flight back to Chicago, at over 9 hours, was considerably late. At the risk of spreading one of the best kept secrets of international travel, the US Global Entry program got me past a huge line of several hundred travelers and through immigration and customs, and despite United's text message that I had been rebooked, I had glimmers of hope of making my final flight home. Alas, the train from Chicago's terminal 5 to terminal 2 was even more crowded, as was the security line, and I missed my connection. Luckily, United had already rebooked me on a flight only 90 minutes later, and I was soon home.

    Next week, I'm staying in California. Maybe even for two weeks :)

    This blog copyright 2009 by marchamilton