Tuesday July 31, 2007 | The Navel of Narcissus: Josh Simons' Coordinates in the Blogosphere
There's Something About Mary
I had one of those "There's Something About Mary" moments at the coffee bar in Sun's Menlo Park cafeteria this morning. I ordered a large chai, which was delivered with a wonderful head of foam on top. As I popped a lid onto the cup, a dollop of foam went shooting straight up out of the drinking hole at high velocity. And never came down. At least, I couldn't find it... (2007-07-31 15:24:54.0)
HazMat: A Driving Game for Adults
I remember my parents keeping us kids occupied on long vacation drives with car-related games. Our favorite was spotting license plates from as many different US states and Canadian provinces as we could find. If you commute to work, why not pass the time, and learn a little about the many chemicals that underpin our society, by collecting hazardous materials codes from nearby trucks? As an added bonus, you will join me in being horrified at how oblivious some drivers seem to be to the consequences of playing chicken with these big rigs. To play HazMat, you'll need to keep track of the codes you see on truck placards. The placards look like this:
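In this example, the "3" represents the hazard class of the material being transported. There are nine such classes. They are:
Class 1: Explosives
Class 2: Gases
Class 3: Flammable (and combustible) liquids
Class 4: Flammable solids, spontaneously combustible materials, and dangerous-when-wet materials
Class 5: Oxidizers and organic peroxides
Class 6: Poisonous (toxic) materials and infectious substances
Class 7: Radioactive materials
Class 8: Corrosive materials
Class 9: Miscellaneous hazardous materials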
You will also need a copy of the US government's hazardous materials table to decipher the four-digit codes. For example, the "1203" on the sample placard means gasoline or gasohol. I found a PDF copy of the table here. Here are a few of the more unusual placard numbers I've seen commuting on Rt 128 in the Boston area.
So, give HazMat a try. Have fun, collect them all... and be careful out there. (2007-07-31 15:12:57.0)
Have a Compiler, Tool, or HPC question? Operators are standing by!
Actually, not operators. Even better: members of the Sun Studio engineering team are standing by to answer your questions in a live chat session every day this week from 9am to 11am Pacific time, starting today. Questions about our C, Fortran, or C++ compilers? Performance analysis? High Performance Computing? Solaris Express Developer Edition? Whatever: everything is fair game, so have at it and give them your toughest questions. Go directly to the live chat session here. Read Roman's blog intro here. (2007-07-24 09:20:13.0)
Diamond Hunting in New York
I went hunting for Herkimer Diamonds this weekend on my way to and from visiting a friend in western New York. Herkimer Diamonds aren't real diamonds; they are doubly terminated quartz crystals sought after by mineral collectors. There is something very elemental and fun about sitting in a rock pit smashing open rocks with a hammer to reveal well-formed, free-floating crystals nestled in little vugs within the rock matrix. Here are two photos showing crystals I found in this manner.
I visited both the Herkimer Diamond Mines and the Ace of Diamonds Mine, which are next door to each other on Rt 28 in Herkimer, NY. Herkimer Diamond Mines is the more upscale of the two businesses, featuring a nice mineral museum and an extensive mineral shop. Ace of Diamonds is more rustic and, I think, caters to a more serious crowd of "miners". It has a basic general store and offers a wider selection of tools for rent. Whichever establishment you choose, you will eventually find yourself either pounding on a pile of rocks or, if you have more time and are more serious, trying to hammer, chisel, and pry your way into a mineral pocket in the ledge visible at both sites. Bring your own tools if you like, and definitely bring safety glasses. (2007-07-19 19:13:49.0)
Linux on SPARC: Dave Miller Dissects Logical Domains (LDOMS)
Dave Miller (yes, that Dave Miller) is working to bring up Linux as a guest operating system on Sun's new SPARC virtualization technology, Logical Domains (LDOMS). He has posted a short technical walkthrough of LDOMS on his blog. (2007-07-16 07:13:07.0)
Brian Eno's 77 Million Paintings
I attended the North American premiere of Brian Eno's 77 Million Paintings last night. This was the third night of the premiere, held as a private event for Long Now members simultaneously in San Francisco and in Second Life. I attended the SL event, my first virtual party.
[Photo: Entrance to the venue.]
[Photo: Eno dropped in for a short while to say hello.]
[Photo: And strolled in to visit the installation itself in the next room.]
blueair.tv did a fantastic job creating the SL venue. The installation was wonderfully done, with subdued lighting, rich wooden flooring, and an excellent Eno mix playing in the background. Bravo!
(2007-07-02 19:02:49.0)
HPC Consortium: Summary Blog Entry with Pointers
I've completed my series of blog entries about Sun's HPC Consortium meeting in Dresden last week. All of the customer talks and a selection of the Sun talks are covered; due to time constraints, I wasn't able to cover all of the Sun talks or any of the partner talks. You might be amazed at how long it took to create the entries referenced below. Here are pointers to the blog entries about customer talks:
And here are pointers to blog entries covering a selection of the Sun talks:
And here are a few entries with details of last week's announcement of the Sun Constellation System:
The next HPC Consortium meeting will be held in Reno in November, just prior to SC07. (2007-07-02 13:03:45.0)
HPC Consortium: Big Science Means Big Compute and Big Data at CERN
Our final two customer talks at the Sun HPC Consortium meeting in Dresden last week both focused on aspects of CERN's Large Hadron Collider (LHC) Project. LHC is Big Science answering Big Questions. Helge Meinhard of CERN IT spoke first, giving an overview of CERN and the LHC, followed by a discussion of the IT infrastructure and requirements underlying the LHC science. Martin Gasthuber from DESY then spoke further about storage and compute issues for the LHC.
[Photo: The CERN accelerator ring is 27 km in circumference and lies at a depth of 50-150 meters.]
CERN was founded in 1954 as the European laboratory for particle physics. It has 20 participating member countries, 3000 staff members, and about 6500 visiting scientists (from 500 institutions and 80 countries). The visiting scientists constitute CERN's user base.
The Large Hadron Collider is a proton-proton collider capable of 14 TeV collision energies. This makes it by far the world's most powerful accelerator; second place is held by the 2 TeV accelerator at Fermilab. Four experiments are positioned around the LHC tunnel, each represented by a mass of human-dwarfing gear positioned in the accelerator's beam. The accelerator can fire 300 bunches of 100 billion protons each, with the same number fired in the opposite direction, producing up to 40 million collisions per second at each of the four interaction sites. The entire ring is lined with superconducting magnets cooled to about two kelvin to keep the protons on the correct track. The four experiments are called ATLAS, CMS, ALICE, and LHCb. The first beams are expected in 2008. The computational requirements at CERN, while large, are also embarrassingly parallel (or pleasingly parallel: no need to be embarrassed), meaning the data can be processed independently and in parallel, with no need for complex problem decompositions or for high-speed, low-latency interconnects. In addition, the problems are very integer intensive, with little or no floating-point requirement. As compute and storage requirements continue to grow, power and cooling have also become huge issues: CERN now predicts that its 2.5 MW datacenter will run out of power in 2009-2010. [I suggested to Helge at lunch before his talk that Sun's Niagara (N1, N2) processor may well be ideally suited to this massive throughput problem, a thought that was echoed by a customer during Helge's presentation to the Consortium.] The 40 million collisions per second at each of the four experiment sites will generate a huge data volume. This will be filtered and reduced within the collector itself to a few hundred megabytes per second, or about 15 Petabytes per year for the four experiments.
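To make "embarrassingly parallel" concrete, here is a minimal Python sketch. The `Event` type and the `reconstruct` and `process_all` functions are hypothetical stand-ins, not CERN software: each event is analyzed purely from its own data, so events can be handed to any number of workers in any order and the results combined afterwards with no communication between workers. A thread pool keeps the example self-contained; a real farm would spread events across processes and many machines.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one filtered collision event."""
    event_id: int
    hits: tuple[int, ...]  # integer readings: the work is integer, not floating point


def reconstruct(event: Event) -> tuple[int, int]:
    """Toy per-event analysis. It reads only this event's data, never another's,
    which is what makes the overall workload embarrassingly parallel."""
    score = sum(h * h for h in event.hits) % 1_000_003
    return event.event_id, score


def process_all(events: list[Event], workers: int = 4) -> dict[int, int]:
    """Farm events out to independent workers; no worker talks to any other,
    so no problem decomposition or low-latency interconnect is needed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(reconstruct, events))


if __name__ == "__main__":
    events = [Event(i, (i, i + 1, i + 2)) for i in range(1000)]
    serial = dict(map(reconstruct, events))
    # Splitting the work across 8 workers gives the same answer as doing it serially.
    print(process_all(events, workers=8) == serial)
```

Because each result depends on one event only, adding workers (or partner sites) changes throughput but not the answer.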
Each event corresponds to a few megabytes of filtered data, and it is these events that can be processed in parallel at CERN and its partner sites. CERN has calculated that it will need about 142 Mega SPECint2000 of processing, 57 Petabytes of disk storage, and 43 Petabytes of tape to process and store the data from these experiments. They equate this to on the order of 30K CPUs and 100,000 disks. Because this is too much for CERN to handle alone, a multi-tiered consortium of organizations has been established to distribute the processing and analysis of LHC data around the world. Data will flow from CERN to a set of Tier1 sites that will perform initial processing and long-term data curation while also distributing the data to a large set of Tier2 sites for final processing and analysis. The intent is essentially to make this entire distributed infrastructure look like one huge compute facility, even though the Tier1 and Tier2 centers are autonomous, cooperating organizations rather than parts of a single large entity. Thus, while there are many commonalities across the sites, there is no mandated standard hardware or software. The sites have, however, settled on commodity processors, and all use Scientific Linux, a lightly modified and recompiled version of Red Hat Enterprise Linux. As for specific infrastructure requirements, Ethernet is adequate for the workload. Servers are stripped-down HPC boxes with commodity processors and 2 Gbytes of memory per core. Processors are chosen for their ability to score well on SPECint2000.
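The sizing figures quoted above can be cross-checked with a little arithmetic. The sketch below uses only numbers from the talk as reported here, assumes decimal units (1 PB = 10^15 bytes), and, for the data rate, assumes the 15 PB is spread evenly over a full calendar year; if the experiments record for only part of the year, the rate while running is correspondingly higher.

```python
"""Back-of-the-envelope checks on the LHC computing figures quoted in the talk."""

PB = 10**15          # decimal petabyte, in bytes
GB = 10**9
MB = 10**6
SECONDS_PER_YEAR = 365 * 24 * 3600

# Figures as reported from the CERN talk.
SPECINT2000_TOTAL = 142e6   # "142 Mega SPECint2000"
CPUS = 30_000               # "on the order of 30K CPUs"
DISK_PB = 57                # disk storage, Petabytes
DISKS = 100_000             # "100,000 disks"
DATA_PB_PER_YEAR = 15       # filtered data, all four experiments


def specint_per_cpu() -> float:
    """SPECint2000 rating each CPU must deliver on average."""
    return SPECINT2000_TOTAL / CPUS


def gb_per_disk() -> float:
    """Average capacity per disk implied by the disk count."""
    return DISK_PB * PB / DISKS / GB


def average_mb_per_s(seconds_of_running: float = SECONDS_PER_YEAR) -> float:
    """Average recording rate if the yearly volume is spread over the given time."""
    return DATA_PB_PER_YEAR * PB / seconds_of_running / MB


if __name__ == "__main__":
    print(f"{specint_per_cpu():,.0f} SPECint2000 per CPU")
    print(f"{gb_per_disk():,.0f} GB per disk")
    print(f"{average_mb_per_s():,.0f} MB/s averaged over a full year")
```

Running it gives roughly 4,700 SPECint2000 per CPU, about 570 GB per disk, and an average of about 476 MB/s, which squares with the "few hundred megabytes per second" quoted for the filtered output.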
Martin Gasthuber from DESY then spoke about computing and storage at the LHC Tier1 and Tier2 sites. He also gave some brief background on DESY, which is the largest national HEP (high energy physics) lab in Europe. Its accelerator, HERA, is due to be replaced by two new facilities that will concentrate on photon physics. HERA was scheduled to be turned off last week in preparation for construction of the new units. DESY is a Tier2 site for two LHC experiments: ATLAS and CMS. Martin explained that Tier1 sites concentrate on reconstruction of data sent back from Tier2 sites, a CPU-bound activity with sequential data access patterns, while Tier2 sites perform simulation work, which requires CPU cycles and sequential writes with generally low bandwidth needs. User analysis is also performed at Tier2 sites; it involves chaotic data access and IO-bound jobs. There are large differences in capabilities between large and small sites in the LHC processing hierarchy. Some have much experience running experiments on-site and with the technology generally, and others do not. In addition, the number of staff available per site can vary widely. As these experiments will run for several years, sites will be upgrading their computational infrastructure over time. They will not, however, upgrade their entire site with each procurement. Instead, sites will continuously grow their resources to meet the demands of experiments and will recycle the oldest components of their infrastructure. Thus, a site is expected to replace 10-30% of its compute resources at a time rather than upgrade all resources simultaneously. For the LHC experiments, there has been some degree of basic standardization of the compute infrastructure, as Helge Meinhard indicated in his talk. In particular, x86, Gigabit Ethernet, TCP/IP, and Linux have all become part of the standard approach to processing LHC data.
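Martin's point that a site may replace 10-30% of its compute resources at a time can be illustrated with a toy model. In the sketch below, `refresh_fleet` is a hypothetical helper, each node is represented only by its purchase year, and the oldest nodes are assumed to be retired first; the 100-node site and the 25% refresh rate are likewise illustrative assumptions.

```python
from __future__ import annotations


def refresh_fleet(purchase_years: list[int], year: int, fraction: float,
                  growth: int = 0) -> list[int]:
    """Retire the oldest `fraction` of nodes, replace them with nodes bought in
    `year`, and add `growth` extra new nodes. Returns the updated fleet."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    fleet = sorted(purchase_years)            # oldest purchase years first
    n_replace = round(len(fleet) * fraction)
    kept = fleet[n_replace:]                  # drop the oldest n_replace nodes
    return kept + [year] * (n_replace + growth)


if __name__ == "__main__":
    fleet = [2007] * 100                      # hypothetical 100-node site
    for year in range(2008, 2013):
        fleet = refresh_fleet(fleet, year, fraction=0.25)
        generations = {y: fleet.count(y) for y in sorted(set(fleet))}
        print(year, generations)
```

With a 25% refresh each year, this site settles into a state where four hardware generations coexist and no node is more than three years old, so a mix of generations is the normal condition of a site rather than an exception.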
Currently, most computing is done with 1U dual-core systems, though quad-core systems are starting to move in. Blades are still very much in the minority, according to Martin, and he isn't sure why. While the compute infrastructure has settled into what is viewed as a reasonably optimal configuration, it is harder to do the same for disk systems. Disk storage is more complex and more prone to surprises, and disk failures have greater consequences. Commodity components are important, but operational costs become more important over time. ZFS and Solaris 10 have been examined as a way of providing a stable lower level of storage infrastructure on which higher-level storage services (e.g., dCache) are layered. Meanwhile, more LHC sites are beginning to use Sun's Thumper (Sun Fire x4500) ultra-dense disk storage systems. With respect to file systems, a few centers use GPFS, but it is rare. To date, no HEP site has deployed Lustre, though there is interest at some Tier2 sites in using it to support user data analysis. Based on their testing and analysis of ZFS, the sites feel that the combination of ZFS and Solaris solves the critical data integrity issues seen with other approaches; in their view, this technology solves the problem completely. There is currently about one Petabyte of Thumper storage deployed across Tier1 and Tier2 sites, and that number is expected to rise to approximately four Petabytes by the end of this summer. (2007-07-02 12:12:14.0)
HPC Consortium: Sun HPC at Clemson University (Why Big SMPs Matter)
James Leylek, Director of the Computational Center for Mobility Systems at Clemson University (CU-CCMS), spoke at Sun's HPC Consortium meeting in Dresden this past week. He presented a brief overview of the Center and its mission and gave a status update on the Center's computational infrastructure, including an explanation of why CU-CCMS believes strongly in both large SMPs and small-node clusters for HPC. Since his last update in November, the Center's computational infrastructure has been put in place. It includes a Sun Fire E25K with 72 UltraSPARC IV+ processors and 680 Gbytes of memory; two Sun Fire E6900 systems, each with 24 UltraSPARC IV+ processors and 384 Gbytes of memory; 1600 cores' worth of Sun Fire V20z systems connected with Voltaire InfiniBand; and a variety of workstations. All of the Big Iron runs Solaris 10, while the V20z cluster runs SUSE Linux. The infrastructure has a peak performance rating of about 11 TFLOPS. As Dr. Leylek explained, the mission of the Center is to provide a balanced computational approach that satisfies a diverse set of requirements from the ten major technical groups it serves (e.g., fluid dynamics, acoustics, mechanics, vehicle design, human modelling). Also, because CU-CCMS is not a research organization and must deliver results on time and within budget, it focuses on supplying stable, reliable infrastructure for its customers. The wide range of systems at CU-CCMS reflects an understanding that one size does not fit all for HPC applications: not everything parallelizes onto clusters. For example, adaptive multi-grid computations are considered memory monsters that benefit from the immense capabilities found within the single Solaris image of an E25K or E6900. At the Center, billion-element finite element simulations are viewed as a starting point for full vehicle simulations. They are dealing with big problems.
As the Center prepared to bring its computational capabilities online, what scared the CU-CCMS staff the most was the actual act of setting up and deploying this infrastructure. With Clemson, Sun, Cisco, and Voltaire all responsible for key aspects of the infrastructure, they were worried that coordinating all of these efforts was going to be an absolute nightmare. In response, Sun assigned a program manager to run the entire integration process. As Dr. Leylek said, the Sun program manager put together the most detailed integration plan he had ever seen in his life. In addition, as work progressed, all status was posted to a site accessible to all participating parties, which helped maintain coordination and promote problem solving throughout the process. In the end, Dr. Leylek said, what they had feared would be a nightmare turned out to be as seamless and painlessly smooth as it could have been. (2007-07-02 08:34:02.0)
Totally flipped out...
.ǝɔɐ1d ɹǝ11np ɐ ǝq p1noʍ p1ɹoʍ ǝɥʇ uǝɥʇ 'sıɥʇ ǝʞı1 sbuıɥʇ op ʇ,upıp ʎǝɥʇ ɟı 'ǝsɹnoɔ ɟo .sʎɐp ǝsǝɥʇ uo ǝɯıʇ ɹıǝɥʇ puǝds ǝ1doǝd ʇɐɥʍ ǝɯ oʇ buızɐɯɐ s,ʇı (2007-07-02 07:22:54.0)
HPC Consortium: Sun Visualization System at the LCN
Andrew Gormanly of the London Centre for Nanotechnology gave a talk this week at Sun's HPC Consortium meeting in Dresden. He spoke about LCN's experiences with Sun's visualization products. The LCN is a collaboration between University College London and Imperial College London, focusing on nanoCAD and nano-fabrication, among other areas, and exploring both biological and non-biological fabrication techniques for building nano-scale structures. The Center's visualization requirements include the ability to visualize very large data sets and to give both business decision makers and lab researchers intuitive access to these capabilities.
Workstations were considered and rejected because they cannot offer the computing power needed to handle the large data sets used at the Center. LCN has been working in collaboration with Sun to create a visualization system based on Sun's Scalable Visualization Software. Their system uses a large, active stereo screen, a Sun Fire x4600 as the application engine, a Sun Fire x4500 ("Thumper") node for storage, and two headless Sun Ultra 40 systems as rendering engines. Voltaire InfiniBand is used as the system interconnect. From a user's perspective, one logs into the x4600 and runs an application that appears to generate locally the 3D stereo renderings seen on the large display. What actually happens is a bit more complicated, though it does not interfere with the intuitive user experience. The application is indeed running on the x4600, taking advantage of the large memory and processing power of this SMP node. However, the graphical rendering is performed on the two Ultra 40 machines, each of which computes the image for one half of the display surface and sends its pixel stream to the display. This feat is accomplished through Sun's Scalable Visualization Software, which transparently interposes itself on the application's OpenGL calls and routes the graphics requests to the Ultra 40s over the InfiniBand link. While LCN currently uses two graphics workstations, there is no reason in principle that they could not extend the approach to four workstations, each responsible for one quarter of the display screen. As the collaboration with Sun continues, LCN is interested in exploring the use of Sun Grid Engine to allow compute nodes beyond their single x4600 to be used as part of this system. In addition, they are interested in exporting graphics output to desktops and Sun Ray thin clients to share the visualization system's capabilities more broadly within LCN.
To do this, they will also use Sun's Shared Visualization Software.
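The division of labour in the LCN system, with each Ultra 40 rendering one half of the display, is a screen-space split. The sketch below computes the pixel rectangle each rendering node would own for two nodes (assumed here to be left and right halves) or four nodes (quadrants). `Tile` and `split_display` are hypothetical names for illustration only; they are not part of Sun's Scalable Visualization Software, which performs this routing transparently by intercepting the application's OpenGL calls. The 1920x1080 resolution in the usage lines is likewise an assumption.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def split_display(width: int, height: int, renderers: int) -> List[Tile]:
    """Divide a display into one rectangle per rendering node:
    2 renderers -> left/right halves, 4 renderers -> quadrants."""
    layouts = {1: (1, 1), 2: (2, 1), 4: (2, 2)}  # renderers -> (columns, rows)
    if renderers not in layouts:
        raise ValueError("this sketch only handles 1, 2 or 4 renderers")
    cols, rows = layouts[renderers]
    tiles = []
    for r in range(rows):
        for c in range(cols):
            # Integer boundaries so odd sizes are still covered exactly once.
            x0, x1 = c * width // cols, (c + 1) * width // cols
            y0, y1 = r * height // rows, (r + 1) * height // rows
            tiles.append(Tile(x0, y0, x1 - x0, y1 - y0))
    return tiles


if __name__ == "__main__":
    for tile in split_display(1920, 1080, renderers=4):
        print(tile)
```

Each node renders only its own rectangle, so moving from two to four renderers roughly halves the pixels each one has to produce.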
(2007-07-01 07:21:42.0)
HPC Consortium: A Customer View of ZFS
Thomas Nau of the University of Ulm gave a customer view of ZFS at Sun's HPC Consortium meeting in Dresden this past week. His talk was titled "ZFS – How safe do you think your data is without?" Thomas motivated his discussion by asking the audience whether it would be okay to miss the one event in a trillion that would lead to the discovery of a new particle, to lose all of the email on your mail server, to lose access to all of your mp3 or video files, or, perhaps worse, to be unaware of most of the errors that have occurred at your site. Currently, storing data in a file system is a matter of trust. We trust that the disk drives, the controllers, the multiple pieces of firmware, the battery backup, the cabling, the adapters, and the operating system all perform well enough to protect our data. And we hope as well that the "human factor" does not cause data loss. But of course there are things that can and will go wrong: bit rot, phantom writes, DMA errors, driver and firmware bugs, accidental overwrites, misdirected reads and writes, and so on. In addition, because volume managers and file systems are commonly separate pieces of software, the volume manager has no knowledge of the importance of particular pieces of data, for example, critical metadata whose loss would lead to the loss of an entire file or file system rather than "just" some data within an individual file. This "all data is equal" view further increases the vulnerability of stored data. Thomas then shared a case study involving data corruption at the University of Ulm. It was one of those nightmare scenarios: the loss of email services for the entire university. It wasn't as if they hadn't thought about data protection at Ulm.
Their mail service is supplied by a two-node cluster and two disk arrays fully connected through a SAN, with offsite mirrors, regular backups, etc. And yet, one day one of the email servers panicked with a "freeing free inode" error message. After running fsck on the file system for 10+ hours, during which no user access was allowed, they believed they had fixed the problem, since fsck had found and fixed several issues. They rebooted, and the system crashed within ten seconds. Another fsck, and they saw the same crash again. Because they had already been planning a migration to ZFS, they took this opportunity to invest an additional 40 hours in copying all of their email data into a ZFS file system, after rigging up a temporary email server and recovering enough of people's important email to carry them to the following weekend, when they could do the swap. They have had no problems since moving to ZFS. The Infrastructure Department is still doing a root-cause analysis of this failure, but believes at this point that a power failure about four weeks before the outage may have somehow let their mirrors get out of sync. As Thomas pointed out, ZFS cannot eliminate hardware problems or change the math concerning the number of failures that will result from the failure rates of underlying components. Nor can it make human decisions smarter. But it can detect errors and inform you about them. It will detect out-of-sync mirrors, and it will correct underlying problems if you let it. Thomas then gave further details on several of ZFS's more prominent safety features, including the ubiquitous use of checksums to provide end-to-end data integrity, the built-in volume manager that allows ZFS to selectively double- or triple-replicate file system metadata depending on its importance, and the copy-on-write approach ZFS uses to avoid ever overwriting valid data that is in use. For more information on ZFS, go here.
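Thomas's point about detecting and repairing out-of-sync mirrors can be made concrete with a toy model. In the sketch below, each block's checksum is stored in the pointer that refers to it (as ZFS does with its parent block pointers) rather than next to the data, so a read can tell which side of a mirror is correct and rewrite the other. `Mirror` and `BlockPointer` are hypothetical classes, not ZFS code; SHA-256 stands in for ZFS's configurable checksums, and metadata replication and copy-on-write are not modeled.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> bytes:
    # SHA-256 stands in for whichever checksum the pool is configured to use.
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class BlockPointer:
    """Hypothetical pointer: the expected checksum travels with the reference,
    not with the data, so a stale or misdirected block cannot vouch for itself."""
    address: int
    expected: bytes


@dataclass
class Mirror:
    """Toy two-way mirror: each side maps block address -> bytes."""
    sides: list[dict[int, bytes]] = field(default_factory=lambda: [{}, {}])

    def write(self, address: int, data: bytes) -> BlockPointer:
        for side in self.sides:
            side[address] = data
        return BlockPointer(address, checksum(data))

    def read(self, ptr: BlockPointer) -> bytes:
        """Return data matching the pointer's checksum, repairing bad copies."""
        good = None
        for side in self.sides:
            data = side.get(ptr.address)
            if data is not None and checksum(data) == ptr.expected:
                good = data
                break
        if good is None:
            raise OSError("every copy failed its checksum: data is unrecoverable")
        for side in self.sides:
            if side.get(ptr.address) != good:   # self-heal the divergent copy
                side[ptr.address] = good
        return good


if __name__ == "__main__":
    mirror = Mirror()
    ptr = mirror.write(7, b"university mail spool")
    mirror.sides[0][7] = b"university mail sp00l"   # one side silently diverges
    print(mirror.read(ptr))                          # verified data is returned
    print(mirror.sides[0][7] == mirror.sides[1][7])  # both sides agree again
```

A conventional mirror that simply reads whichever side responds has no way to know which copy is right; here the checksum held by the parent arbitrates, and the divergent side is rewritten on the next read.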
(2007-07-01 03:58:10.0)