Wednesday November 22, 2006 | The Navel of Narcissus Josh Simons' Coordinates in the Blogosphere |
Sun Not Selected for HPCS Phase III: My Thoughts

DARPA announced yesterday [PDF] that IBM and Cray have been selected to proceed to Phase III of their High Productivity Computing Systems (HPCS) program. Sun will not receive funding for Phase III. While this is understandably a big disappointment to the many Sun people who have worked on Phase I and Phase II of the program, in my own view this is a net positive for Sun. And I don't think that's just sour grapes. Sun did a lot of very creative, future-looking work in Phase II. We've learned a great deal about productivity for HPC, and have generated some very interesting and relevant technologies. I don't think we are going to throw all of that away simply because we didn't win funding for Phase III. Instead, we are now free to pick and choose from our new toolbox and incorporate the most appropriate technologies at our own pace into our future products. I believe that execution flexibility will yield a better result for Sun and for our HPC customers in the long term. Does this lack of HPCS funding mean Sun is no longer committed to HPC? No. Does it mean Sun will not be playing in the high-end, at PetaFLOPs scale? No. In fact, systems of this size built from more commodity-oriented approaches will likely be deployed earlier than the HPCS program's milestones. And I expect Sun will play a leading role in this space.

(2006-11-22 08:17:29.0) Permalink Comments [4]

Toyota Service

I'm in the midst of a Toyota service experience, and so far, so good. I took my RAV4 down to the local Jiffy Lube yesterday for an oil change and discovered two things. First, that the hood would not unlatch, and second, that the vehicle was leaking some sort of fluid at a noticeable rate. Noticeable, that is, when the helpful Jiffy Lube employee pointed it out, for which I'm very thankful. Jiffy Lube sent me on my way since they could do nothing for me. Luckily, I was on the Automile--a stretch of Rt. 1 south of Boston famous for having a car dealership for pretty much every make of car on the market. The Toyota dealership was just a mile or so down the road (the Automile these days stretches for considerably more than a mile). I had to wait about two hours for a diagnosis, which isn't great, but they were busy and I dropped in unannounced, so I'm certainly not complaining. They got the hood latch unjammed without a problem. The fluid was transmission fluid. There's a leak in the pan and the hoses are all rotted out. Yesterday, they were initially hopeful they could get the parts for today, but found that one of the parts wasn't even in the country and would have to be ordered. They told me they'd have a better idea on Monday how long it would take to get the repair done. This morning, I received eight email messages from Toyota, each titled SPECIAL ORDER PART ARRIVED. So far, they've received:
The remaining part is on back-order. While it's frustrating to not have the vehicle, I do appreciate the email notifications as a way of keeping me informed about forward progress. (2006-11-22 07:33:00.0) Permalink Comments [3] Danvers Chemical Explosion The explosion early this morning in Danvers was at a chemical plant owned by CAI that makes inks, not a propane facility as was initially reported on the national news. Thankfully, there have been only injuries reported and no deaths despite the fact that over 30 buildings were destroyed or damaged. A woman called into one of the local news shows to describe her experiences. She was in Salem, an adjacent town, babysitting her nephews. She said she was awoken just before 3am by the biggest noise she'd ever heard. Her bed was shaking and she thought initially it was an earthquake. She said she was very frightened and couldn't get through to the police to find out what had happened. Outside she and a neighbor saw a large, mushroom-shaped cloud of very black smoke rising into the air. She said she wondered if they had been bombed. They got in the car (she didn't mention whether she had the nephews with her or not)...and then the story went in a direction I wasn't expecting at all. The next thing I hear is that she got to the site so quickly that she was able to get a really good view of what was going on. WHAT? Hello? You are woken up by an enormous explosion, you see a mushroom cloud of smoke shooting into the air, you are responsible for watching over several children---AND YOU DRIVE TO GROUND ZERO? (2006-11-22 07:14:57.0) Permalink Comments [0] New Logan Runway Opens on Thanksgiving Day After some thirty years of controversy, a new runway is scheduled to open at Boston's Logan airport this Thursday. Runway 14/32 is 5000 feet long and is suitable for small jets and props, which should allow the FAA to free capacity on other runways by shunting smaller planes to this new runway. When it's operational, that is. 
Under agreements to limit noise for South Shore residents, the runway will only be used when the wind blows out of the northwest and only when it exceeds 10 knots. According to the FAA, current capacity reductions due to wind from the northwest occur frequently enough at Logan that this new runway should deliver material improvements to throughput. The above is based, for the most part, on this article.

(2006-11-20 05:01:41.0) Permalink Comments [0]

Marmottan Museum (Paris)

While packing to fly home tomorrow, I just found an old brochure in my bag left over from a vacation my wife and I took to Paris with my parents a few years ago. It's a brochure for the Marmottan Museum, one of my favorite places from that trip. The museum is known for three reasons: its collection of Empire furniture, art, etc., its collection of medieval sculpture, and, finally, its Monet collection. The museum has 150 Monets, donated by the artist's son. I saw the Monet show several years ago at the Boston Museum of Fine Arts. I enjoyed it, but it did not affect me the way the Marmottan did. The main lower level of the museum has a floorplan in the shape of a keyhole: a long rectangular space that ends in a circular area. As I walked down the rectangular space, the curved wall at the end began to come into view. As I stepped forward to view the entire circular space, I was stunned by the beauty of the display. In front of me was a wide, sweeping view of a series of the Giverny Nymphéas with such rich colors and patterns that it took my breath away. All I could do was sit and look. It was absolutely marvelous. The museum is located at 2 rue Louis Boilly 75016 Paris. Next time you are in Paris, do check it out. You will not be disappointed.

(2006-11-17 20:50:58.0) Permalink Comments [1]

Tampa Bay Sunset

I'm flying home to Providence from Tampa tomorrow morning now that Supercomputing '06 is over. I hope to blog today's two panels at some point--both were excellent.
The first explored whether FPGAs might be the basis of the next big thing in HPC. The second was a very thoughtful and thought-provoking discussion of the impact of multi-core processors on HPC. In the meantime, a sunset shot to close this trip...
(2006-11-17 20:30:28.0) Permalink Comments [3] Multi-Core Processing for Dummies AMD gave away this booklet at their Supercomputing '06 booth. It's a marketing blurb and not a technical document, but it does cover the basics of the value proposition at a high level. I have a few copies if anyone (at Sun) is interested in taking a look.
(2006-11-17 18:11:53.0) Permalink Comments [1]

In Your Face!

I mentioned in an earlier post that researchers from the University of Houston gave a talk this past weekend at Sun's HPC Consortium meeting in Florida about remote sensing of a person's physiological and mental state. I didn't realize at the time that these technologies would be demo'ed in the Sun booth at Supercomputing '06. I tried the infrared imaging first. There are three pieces of information that can be derived from analysis of infrared video. First, if your carotid artery (side of neck) can be imaged over time, your pulse rate can be determined. Second, if your nostrils can be imaged, then your breathing rate can be detected. I was surprised how strong and easily detected this signal is: the nostrils turn black when inhaling and red/orange when exhaling--like two lighthouses flashing in the distance. And, third, your stress level can be assessed by monitoring the temperature of the proximal regions of your eye sockets.
Me: Inhaling and apparently a little stressed though it's hard to tell with the cool shades Another demo used two cameras and a flash to capture stereo images of show attendees. The two views are used to compute facial 3D geometry and create a polygon model. Once the face has been modeled, a database is searched to find matches with earlier scans. I was impressed with how fast this ran and with how accurate it was. One thing did seem to confuse it though: reflections on my glasses prevented good imaging near my eyes and usually resulted in a search failure. You can see a little of the reflection effect below.
Polygonal Man

(2006-11-17 17:51:43.0) Permalink Comments [0]

MPI Engineers Gone Wild

Two engineers from Sun's MPI engineering team are here in Tampa at Supercomputing '06, primarily to staff our ClusterTools demo station and talk with customers. In addition, though, Rolf Vandevaart sat on a panel session on Monday at our HPC Aces training meeting which was held after our Sun HPC Consortium meeting at the Westin Innisbrook Golf Resort.

Rolf at the ClusterTools demo station

Don Kerr gave a talk in the booth theatre about the history of Sun's involvement with MPI over the last 10 years, including a good description of our current involvement as active, contributing members of the Open MPI open source effort. It's difficult doing talks in a show booth, but Don did a great job.

Don discussing Sun's involvement in open standards and Open MPI

Imagine my surprise when I returned to our booth yesterday afternoon and found Don moving up from live presentations to the world of video interviews. He was lit by bright lights, surrounded by a camera and sound crew, and looked to be having a great time as he was taped for an internal Sun piece on Supercomputing.

(2006-11-16 06:22:45.0) Permalink Comments [2]

MIT Exploring Wireless Power Delivery

Wouldn't it be nice if our mobile devices could all be recharged wirelessly? Researchers at MIT are revisiting some of Nikola Tesla's thinking about wireless power. Read about it here.

(2006-11-15 15:17:24.0) Permalink Comments [0]

Scaling Infiniband Clusters

The sobering graphic below was presented this morning by Lloyd Dickman from QLogic at an InfiniBand session here at Supercomputing '06 in Tampa.

The link-level bit error rate (BER) allowed by the InfiniBand standard results in some alarming overall error rates when the number of links is scaled to the level required to build large clusters using current approaches.
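To get a feel for why the aggregate numbers turn alarming, here is a back-of-the-envelope sketch in Python. It is not a reconstruction of Lloyd Dickman's graphic. The BER of 1e-12, the 10 Gb/s link rate, and the link counts are illustrative assumptions of mine, as is the simplification that every link is fully busy and that bit errors are independent.

```python
def mean_seconds_between_errors(links, ber, bits_per_second):
    """Average time between bit errors anywhere in the fabric.

    Simplified model (an assumption, not taken from the talk): every link
    runs at full rate all the time and bit errors are independent, so the
    expected error rate is just links * ber * bits_per_second.
    """
    errors_per_second = links * ber * bits_per_second
    return 1.0 / errors_per_second


# Illustrative values only; neither figure comes from the presentation.
ASSUMED_BER = 1e-12        # assumed link-level bit error rate
ASSUMED_LINK_RATE = 10e9   # assumed link rate in bits per second

for links in (1, 100, 1_000, 10_000):
    seconds = mean_seconds_between_errors(links, ASSUMED_BER, ASSUMED_LINK_RATE)
    print(f"{links:>6} links: one bit error roughly every {seconds:g} s")
```

Under these assumptions the time between errors shrinks in direct proportion to the link count, which is the basic point: an error rate that looks negligible on one link is anything but once thousands of links are involved.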
Clearly there is value in any efforts to reduce the BER or to reduce the number of links required to build large clusters. Of course, software must ultimately be resilient to these failures, but keeping the underlying hardware error rate at manageable levels will allow applications to make forward progress at reasonable and useful rates. (2006-11-15 10:08:23.0) Permalink Comments [2] An Observation on Sun Branding Supercomputing '06 here in Tampa includes a very large vendor exhibition with hundreds of companies, universities, research laboratories, etc. participating. As I finished a tour of the show floor yesterday and was walking towards the Sun booth, I was struck by how different it was from any other booth at the show.
It wasn't the content: We have as much flashy gear and as many demo stations as any other large vendor in our 50' x 50' booth. It wasn't the activity level: I think we have a rather high number of visitors given our excellent location near the main entrance to the hall, and given also the number of interesting new products we're displaying (e.g., Thumper, Sun Fire 8000.) But that wasn't what I'd noticed.
It took me a few minutes to realize it was our branding program I was reacting to. In that mass of hard-edged, high-tech focused graphics, our earth-toned imaging with naturalistic themes was a little island of warmth in a big, cold sea of uninspiring and unremarkable messaging. Green trees, a red surfboard, children. It made the booth a warmer and more welcoming place to visit.
(2006-11-15 09:45:58.0) Permalink Comments [0] Sun HPC Consortium: Day III Customer Talks In addition to numerous Sun technology and product talks, we had one final customer talk today here at the Sun HPC Consortium in Tampa. Sensor Driven HPC for Medical Applications
George Zouridakis (left) and Marc Garbey (right) from the Department of Computer Science at the University of Houston spoke about the use of sensor technology for non-invasive and remote sensing of biomedical data. They ask a basic question: Can an individual's internal state be assessed via remote sensing or other non-invasive sensing technology? Preliminary results indicate this is possible through a multi-modal approach. Researchers at the University are exploring several areas, among them the use of infrared to monitor human physiology from a distance--for example, heart rate. Others are looking at multimodal facial recognition and geometric modeling to extract facial feature information. By merging data from MRI, optical and other imaging techniques and routing this sensor data to their large compute cluster, University researchers are able to do near real-time analysis with the ultimate aim of understanding brain function, detecting cognitive impairment, or assessing the state of mind of subjects from a distance. I would imagine that such remote sensing would be very interesting to TSA and other agencies for obvious reasons.

Some interesting tidbits. Heart rate is assessed by motion tracking a moving subject's carotid artery. Breathing rate can be assessed by monitoring the nasal area. And capturing certain data in the area of the eyes can be used to assess stress level.

(2006-11-13 09:18:47.0) Permalink Comments [0]

Sun HPC Consortium: Day II Customer Talks
As we started Day II of the Sun HPC Consortium in Tampa, I was impressed to see 80 or so customers and partners in their seats at 8:30 on a Sunday morning. We had three customer talks: A tour of HPC at Mississippi State, a discussion of Thumper and ZFS as a high-capacity data store for particle physics research, and a status report on TSUBAME, currently the 7th largest supercomputer in the world. We also learned about the surprising perils of bubble formation in nuclear reactor cooling systems.

HPC at Mississippi State

Trey Breckenridge, HPC Research and Operations Administrator, and Roger Smith, Senior Computer Specialist, gave a tag-team talk about the science and engineering foci at Mississippi State. They also gave a brief history of HPC systems at MSU, including their latest, big HPC cluster from Sun. We also heard about how staff from the site helped save lives in the aftermath of Katrina. The HPC Collaboratory (HPC^2) at Mississippi State includes five centers:
Roger Smith then gave us an overview of Mississippi State's long involvement with HPC and their long relationship with Sun and Sun hardware. They built their first cluster, MADEM, in 1987 from Sun gear. Their second cluster, called SuperMSPARC, was built in 1993 and included Ethernet, Myrinet, and ATM interconnects between eight SPARCstation 10s with a total of 32 processors. I didn't realize that the original Myrinet drivers for SunOS/Solaris were done at MSU and that this cluster predated the first Beowulf cluster by a year. Cool.

MSU's latest system, Raptor, is a cluster with 512 Sun X2200 M2 diskless nodes, each with 8GB of memory. They are using GbE between nodes and 10 GbE pipes between the 16 racks that comprise the system. Their 2048 Opteron cores deliver a peak performance of about 10 TFLOPs. As Roger put it, a human doing one computation per second by hand would take about 338,000 years to do the work this system can do in one second.

Sun has a program called Customer Ready Systems (CRS) that can pre-build and pre-configure systems for customers prior to shipment. I've been aware of the program for a long time, but never heard a customer talk about the experience until today. It was impressive. The first eleven of their racks arrived by truck at 5pm on a Monday. The final five arrived that Wednesday at 5pm. By 7pm that same day, the entire system was assembled, booted, and running. As Roger said, the CRS approach hugely reduced the effort needed by MSU staff since so much work had been done previously back at Sun. And it also drastically reduced the amount of trash left on-site, which can be a very significant issue for large systems like this.

The presentation closed with a short time-lapse movie documenting the installation procedure. It looked a lot different than the typical build-a-cluster-from-scratch movie, which usually shows each system cabinet being populated incrementally, server by server. Contrast that with the MSU installation in which full racks appear in quick succession on the floor. I nominate the MSU movie for "shortest movie in this genre" award. And that's a good thing.

Thumper for the Teeny
Martin Gasthuber from Deutsches Elektronen-Synchrotron (DESY) in Hamburg spoke about the enormous data processing and storage requirements of the tiny world of particle physics. As he said, they are "hunting the smallest, using the biggest." He estimates they need about 100K of today's fastest CPUs and they generate about 15 PB of new data per year currently (moving to exabytes).

dCache is a key component of their multi-tiered, grid-based approach. It is designed to be used as a building block to create very large, modular storage systems that deliver both high bandwidth and large capacity. The system has few dependencies: it requires only a JVM, a local file system, and one or more GbE connections. Most of the components are written in Java and testing has shown there is no real I/O penalty with this approach--they get excellent performance. Having now built four generations of storage boxes, the dCache team has learned several important lessons:
According to Gasthuber, Thumper and ZFS fit their requirements perfectly and they've already validated that the combination addresses most of their issues. Performance is already higher than expected and they are looking forward to moving to 10 GbE over time. They will soon have about 160TB of Thumper space online. See http://www.dcache.org for details.

TSUBAME Update

The people's Supercomputer at Tokyo Institute of Technology
Professor Satoshi Matsuoka presented an update on Tokyo Tech's TSUBAME supercomputer, currently the 7th largest computer in the world, and the largest supercomputer in the world based on Sun hardware. The system, which comprises 76 racks of compute, storage, and networking infrastructure, sits in approximately 350 square meters of floor space. The equipment weighs about 60 tons. The system has 648 Sun X4600 nodes, each with 16 Opteron cores and 32/64 GB memory. The interconnect is Voltaire Infiniband and storage capacity is about 1.1 PB, using Thumpers. There are also 360 Clearspeed accelerator cards installed in the system. Cooling and power were perhaps the largest challenges in deploying TSUBAME. It took over a year for Tokyo Tech, Sun, and NEC to work out a solution. Given the space available, the installation required a power density of 700 watts per square foot, well above the current datacenter state of the art, which can handle only 500 watts per square foot. The solution includes some interesting aspects. For example, the under-floor space is used for cabling, but not airflow. All cooling is handled through large ceiling vents with a low (3m) ceiling. Airflow is very fast and made faster through the use of narrow aisles. Hairdos do not survive for long in this machine room. Matsuoka-san commented that the choice of fat nodes was important in that it allows for maximum parallel programming flexibility and reduces node count, which increases both manageability and availability. The system is designed for both capability (very large jobs) and capacity (lots of smaller jobs.) Since its installation this Spring, the system has had an availability of over 99%. There have been frequent faults as you'd expect with a system of this size, but with local effects only: Any affected jobs are automatically restarted by Sun Grid Engine. Most of the issues have been software problems that were fixed with either reboots or patches. 
There have been very few hardware problems. Matsuoka-san ended his talk with a short survey of the science being done with TSUBAME. Areas include simulation of an Earth magnetosphere inversion (ported from the Earth Simulator), high resolution typhoon simulation, protein folding, and TNT explosions. Most interesting to me, however, was the bubble simulation work being done on TSUBAME. Due to the high temperatures involved in nuclear reactor cooling, bubbles naturally form in cooling pipes as water is vaporized. Vapor, however, has a reduced heat conductance which means that if bubbles somehow adhere to the walls of cooling pipes, there is a real danger that the pipes may melt, creating a serious safety problem. Summaries of Day III talks are here. (2006-11-13 08:33:54.0) Permalink Comments [0] Sun HPC Consortium: Day I Customer Talks
The Sun HPC Consortium meeting in Tampa started at 8am Saturday morning with registration and breakfast. We had a full day of talks with an excellent selection of speakers, including three customers who took us from NASCAR in South Carolina, to the pleasures and pains of building a huge new datacenter at USC, to the world of Canadian secure grid portals. Clemson University's Computational Center for Mobility Systems
Dr. James Leylek, of the Clemson University International Center for Automotive Research, gave a talk titled, Clemson University's Computational Center for Mobility Systems, which highlighted Clemson's ongoing and future involvement in the Southeast's regional automotive ecosystem--the largest in the U.S. The CU-CCMS is intended to be a technology anchor for the Clemson University International Center for Automotive Research campus in Greenville, South Carolina. Dr. Leylek presented an overview of the approximately $16M worth of computational infrastructure, which includes a 200-node Infiniband cluster using Sun v40z, dual-core Opterons (4 processor, 32GB), and a 72-processor Sun Fire 25K with 680GB of memory.

The second half of the talk focused on advances in CFD at Clemson, with a focus on aerodynamics. His motivating example involved analysis to optimize a NASCAR body shape for both short-track races that require lots of aerodynamic down-force, and long-track races that require high top-end speeds. Boundary layer control and the ability to predict laminar to turbulent transitions is a key requirement and a difficult problem. Dr. Leylek presented some selected results illustrating significant advances in simulation fidelity for this difficult problem, including the simulation of multiple effects simultaneously, something not doable with current commercial packages.

Power and Cooling at USC
James Pepin from USC was our second customer presenter. He gave a fairly frightening talk about the construction of USC's new datacenter, which required a $30M investment and included the installation of some truly impressive pieces of gear. This new facility is designed to support a variety of computational requirements including HPC (physics, chemistry, natural language processing, etc.), library technology, and some non-HPC, but critical IT infrastructure. Their current HPC computing capability includes a 5384 processor cluster with a peak performance of about 13.8 TFLOPs. I was interested to hear that they routinely run 512 processor jobs at their site.

USC has been wrestling with significant physical infrastructure problems in their current datacenter: power, air conditioning, airflow hot spots, how to handle a/c failures, wiring density, power cabling, and blocked cooling due to massive cabling infrastructure. They've configured their new space with both HPC (high-density) and non-HPC areas. The datacenter space is about 8000 square feet and the HPC space is about 5000 square feet. The HPC space is configured to handle 13-15 kilowatts per rack, while the "normal" or non-HPC space is configured for 3-4 kilowatts per rack. There is, however, some built-in expansion capability in the design as well in recognition of increasing power and cooling requirements over time. The building is designed to handle eight megawatts. Currently installed equipment can cool 2.5 megawatts. They have four 350-ton chillers, three two-megawatt generators, and 5.5 megawatts of UPS. This is some very heavy duty infrastructure!

Secure Grid Computing
Ken Edgecombe, Executive Director of the High Performance Computing Virtual Laboratory (HPCVL), delivered our last customer talk of the day. Before discussing their secure grid portal, Dr. Edgecombe gave a quick sketch of the center's computing resources. These include three Sun Fire 15K servers each with 72 CPUs and 288GB of memory, and seven Sun Fire 25Ks each with 72 dual-core UltraSPARC IV+ processors and 576GB of memory. HPCVL in addition has about 160TB of disk storage capacity and 480 TB of tape storage. Some serious hardware. The Secure Grid Portal is currently hosted on two Sun Fire T2000 systems with UltraSPARC T1 processors--a nice example of the fact that large HPC centers have significant non-floating point workloads like any other large IT customer. HPCVL is a Certificate Authority and uses this capability to issue digital certificates to manage identities for the HPCVL Secure Grid Portal. The portal itself was designed with several requirements in mind:
The Portal is used by researchers across Canada and by some from outside as well. It is based on Sun technology, including the new Tarantella-based technology recently made available by Sun. It's a web-based portal with real-time access to computational capabilities back at HPCVL. It includes a capability that lets users invite expert support personnel to share their session and work with them interactively to debug problems. Dr. Edgecombe finished his presentation with a live demo of the Portal. He showed several simulations running in real-time in Canada, being displayed locally with reasonably smooth graphics in our conference room in Tampa. He also edited an OpenOffice document in a Portal window. It was pretty slick.

We had several interesting talks by Sun personnel as well. I may post some brief highlights of those talks at some point. Summaries of Day II talks are here. [Thanks to Tony Warner for supplying several photos for this entry.]

(2006-11-12 14:25:22.0) Permalink Comments [0]