Monday, November 26, 2007 | The Navel of Narcissus: Josh Simons' Coordinates in the Blogosphere
The Horror of Recognition
I came across the above photo several months ago while browsing old yearbook photos on my alma mater's web site. My first reaction was, "Oh my, they look incredibly geeky." My second reaction was, "Hey, I recognize the guy with the sideburns!" My third reaction was, "Oh, wait. Crap, that's me sitting next to him! What a pair." Inexplicably, my fourth reaction was, "Hey, I should post this on the blog." So there you go: a glimpse into computing and attire at Harvard in the early eighties. And let's not forget eyeglass styles.
(2007-11-26 19:37:38.0)
HPC Podcast: Innovating@Sun
Hal Stern and I recently discussed Sun and High Performance Computing on his podcast, Innovating@Sun. We talked about Sun's Constellation System components for HPC, trends and challenges in HPC, and Sun's history in HPC. We also discussed software, including Solaris for HPC. Check it out on blogs.sun.com or on iTunes. Running time: 17 minutes.
(2007-11-23 17:05:33.0)
Cool Math: Pick's Theorem
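The post that follows checks Pick's Theorem, A = I + B/2 - 1, by counting grid points by hand. As a companion, here is a minimal Python sketch that does the counting mechanically for any simple polygon with integer vertex coordinates listed in order. It counts interior (I) and boundary (B) points by brute force and cross-checks Pick's area against the shoelace formula, a standard area formula swapped in here purely as an independent check. The function names are illustrative assumptions, not anything from the original post.

```python
from fractions import Fraction
from math import gcd


def shoelace_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def on_segment(p, a, b):
    """True if lattice point p lies on the closed segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return (cross == 0
            and min(ax, bx) <= px <= max(ax, bx)
            and min(ay, by) <= py <= max(ay, by))


def inside(p, vertices):
    """Ray-casting test; assumes p is not on the boundary."""
    x, y = p
    result = False
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            # Exact (rational) x-coordinate where the edge crosses that line.
            x_cross = x1 + Fraction((y - y1) * (x2 - x1), y2 - y1)
            if x < x_cross:
                result = not result
    return result


def count_lattice_points(vertices):
    """Brute-force count of interior (I) and boundary (B) grid points."""
    n = len(vertices)
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    interior = boundary = 0
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            p = (x, y)
            if any(on_segment(p, vertices[i], vertices[(i + 1) % n]) for i in range(n)):
                boundary += 1
            elif inside(p, vertices):
                interior += 1
    return interior, boundary


def boundary_by_gcd(vertices):
    """B without brute force: an edge (dx, dy) has gcd(|dx|, |dy|) lattice steps."""
    n = len(vertices)
    return sum(gcd(abs(vertices[(i + 1) % n][0] - vertices[i][0]),
                   abs(vertices[(i + 1) % n][1] - vertices[i][1]))
               for i in range(n))


def pick_area(interior, boundary):
    """Pick's Theorem: A = I + B/2 - 1."""
    return interior + boundary / 2 - 1


rectangle = [(0, 0), (6, 0), (6, 7), (0, 7)]  # a 6 x 7 rectangle like the green one
I, B = count_lattice_points(rectangle)
print(I, B, pick_area(I, B), shoelace_area(rectangle))  # 30 26 42.0 42.0
```

Because the vertices are exact integers, there are no "in" versus "on" judgment calls: a point either lies on an edge or it does not. The brute-force scan grows with the polygon's bounding box, which is fine for hand-sized figures.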
You see on the left three simple polygons. (A polygon is simple if its boundary does not cross itself.) How would you determine the areas of these shapes? The rectangle is easy. The blue polygon is a little more tedious, since you need to count the interior grid squares. But how would you find the area of the red polygon? As it turns out, you can easily compute the area of any simple polygon whose vertices lie on a regular, square grid using Pick's Theorem, which says that the area of such a polygon is A = I + B/2 - 1, where I is the number of grid points in the interior of the polygon and B is the number of grid points lying on its boundary. I find it amazing that this works for any simple polygon. We can see by inspection that the green rectangle has area 42 (6 x 7). Let's apply Pick's Theorem. There are 30 grid points in the interior and 26 grid points on the boundary: 30 + 26/2 - 1 = 42. Magic. :-) Now let's try the blue polygon. I = 25 and B = 52, so Pick's Theorem says the area is 25 + 52/2 - 1 = 50, which is correct by inspection. By my count, the red polygon's area is 70 + 24/2 - 1 = 81. My lines are a little fat, so I made some (consistent) judgment calls about "in" versus "on"; your count may differ slightly from mine. Visit this page to explore Pick's Theorem with an interactive Java applet. See this page for one proof of the theorem.
(2007-11-22 18:39:49.0)
HPC Consortium Presentations now Online
Most of the slides used at the HPC Consortium meeting in Reno are now posted to the consortium web site. These include customer talks, many or all of the partner talks, and some of the Sun talks. I counted 29 presentations as of this writing. Go here.
(2007-11-21 20:38:11.0)
Lustre Update
Peter Braam, founder of Cluster File Systems Inc. and now VP of Lustre at Sun, gave an update on Lustre to the 150+ Sun HPC customers who attended the HPC Consortium meeting in Reno prior to the Supercomputing '07 conference. He spoke briefly about Lustre's place in the HPC market, citing examples of its use in the TOP500. After his talk, we learned that the most current list (released at Supercomputing) shows Lustre in use on 7 of the 10 largest supercomputers in the world. While it is used at the very high end, Lustre also has a strong presence in oil & gas, digital animation, EDA (electronic design automation), and at several large ISPs, to name a few areas. The current Lustre release is 1.6, which brought major usability improvements over 1.4, a version some customers still run today. Version 1.8 is targeted for the second quarter of 2008 and will include support for ZFS and Solaris on the storage server. Version 2.0 is scheduled for the fourth quarter of 2008 and will add a clustered metadata capability and server network striping. These are the plans; insert standard caveats about engineering plans here. Peter talked about the post-acquisition integration into Sun and described it as smooth. While it was disruptive to some (notably, Peter himself), the team has remained largely structured as it was. I do know that some Sun managers and engineers have joined the Lustre team, which I think is a great way to help the team continue its transition into Sun. We've also paired Lustre engineers and managers with old Sun hands as mentors to help ease the transition. It helps, I'm sure, that some of these mentors themselves came into Sun from small companies through acquisition. From all I've heard, the integration seems to be going quite well. In terms of the business ramifications of the acquisition, continuity is the theme: Lustre is still open source, the customer support model is unchanged, and we will continue doing business with Lustre's various OEMs.
And, of course, Linux continues to be a focus while we also work to expand Solaris support. So, what's up with ZFS and Lustre? Lustre servers today are built on the Linux ext3 local filesystem, and CFS was able to achieve extreme performance with it. Version 1.8 will add support for ZFS with the intent of hardening Lustre and driving for even higher levels of scalability. The servers will run in user space using user-space ZFS code, and server migration tools will be available for customers wishing to migrate from an ext3-based server to one based on ZFS. Version 1.8 will also add a network request scheduler to improve I/O scheduling, based on work done at Oak Ridge National Laboratory (ORNL), a Lustre Center of Excellence, on Jaguar, their 8000-client HPC cluster. One funny point Peter made: Sun is back in Phase III of the DARPA HPCS program. Recall (perhaps) that Sun was not selected to proceed from Phase II to Phase III, but CFS was to supply the file system for Cray's solution, and so Sun is back. :-) As part of our involvement in HPCS Phase III, there will be significant future enhancements to Lustre to support some fairly daunting requirements on file creation rates, client bandwidths, and extremely large file counts. All good news for the HPC community at large.
(2007-11-20 16:01:17.0)
ClusterTools 7.1 Now Available for Free Download
ClusterTools 7.1, which includes the latest version of Sun's MPI library for Solaris x86/x64 and SPARC, is now available for free download here. This release adds support for 32- and 64-bit Intel-based platforms, improves support for third-party parallel debuggers, improves memory usage for communication, adds PBS Pro validation, and bundles additional bug fixes contributed by the Open MPI community.
(2007-11-19 11:57:51.0)
University of Warsaw: New HPC Perspectives and Prospects
Marek Niezgodka, Director of the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw, spoke this weekend at the HPC Consortium meeting in Reno. ICM is a high-end computing center for research and applications in Poland, a national laboratory in computational and informational sciences, and a partner and leader on multiple grid projects. ICM research focuses on several areas, including:
In addition to research activities, the Center is heavily involved in delivering wide-area services, such as numerical weather prediction for central Europe at a 4 km horizontal resolution, plus additional prediction for the northern Atlantic and Asia. ICM also functions as a knowledge repository and as a healthcare grid for cardiology, and it offers large-scale data processing and analysis for industry and the public sector. ICM is currently undergoing a significant infrastructure expansion, including doubling its staff to approximately 300 by 2010 and expanding data capacity to between 5 and 10 Petabytes by 2009. Compute capacity will be expanded to a total of approximately 100 TFLOPs. This deployment is currently underway and will be completed in 2008. The core of this system is built with Sun Constellation components, including Thumper (X4500) storage.
(2007-11-12 14:28:05.0)
Multicore Performance Analysis Tools from Academia
Karl Fuerlinger, from the Innovative Computing Laboratory at the University of Tennessee at Knoxville, spoke about multicore performance analysis tools at the HPC Consortium meeting here in Reno yesterday. He focused on tools available from academia rather than vendor-supplied tools. In Karl's view, vendor tools are powerful, commercially supported, and typically limited to the vendor's platform, while academic tools are generally cross-platform, often include advanced or experimental techniques like automated performance analysis, and often focus more on high levels of scalability. Popular academic tools include:
Karl pointed out that these academic tools generally interoperate with each other. For example, most of the above tools can use PAPI to access performance counter information. Profiles gathered by several of these tools can be visualized with TAU, and trace data collected with them can be fed into the KOJAK/SCALASCA automatic trace analysis capabilities. Traces generated from TAU or KOJAK/SCALASCA can be visualized with Vampir.
(2007-11-12 14:19:47.0)
100 TFLOPs Insufficient?
James Leylek, Executive Director of the Clemson University Computational Center for Mobility Systems, spoke at the HPC Consortium meeting about the computational requirements for simulating vehicle-related phenomena. A main point of Dr. Leylek's talk was that unsteady simulations are required to adequately model the physical behavior of mobility systems. In many cases, unsteady or turbulent mechanisms dominate in this class of problems: boundary layer issues, laminar-to-turbulent transitions, so-called Type II transient flows, etc. The key, though, is finding appropriate numerical techniques to perform these simulations. Typical mobility application areas include Formula 1 race cars, airplane wing design, engine fan design, aircraft carriers, submarines, engine block cooling, and blood flow through artificial hearts. As an example of the problem sizes in this space, Leylek described what is required to simulate the aerodynamics of a Formula 1 race car: 300M finite element volumes, with eight equations per volume, for a total of about 2.4B equations to be solved. And because of the unsteady nature of the flows around these bodies, the simulations must be run for tens of thousands of time steps. This means that even dedicating 100 TFLOPs to a single team would not be enough to allow the dozens of "what if" experiments needed during the vehicle design phase. When one realizes that aerodynamics is just one of several attributes that must be simulated for this one application area, the situation becomes even more daunting. There are a number of numerical methods that can be used to perform these simulations. Full unsteady simulation will remain impractical until much larger computational facilities are available at a more affordable cost. In the meantime, what to do?
The Computational Center for Mobility Systems at Clemson brings together a large amount of Sun HPC gear and the algorithmic expertise needed to team with companies and other organizations on these simulations, using the unique capabilities of semi-deterministic stress model (SDSM) techniques to deliver value to their partners in the shorter term. The point is to be smarter about how these problems are solved and not be intimidated by the computational requirements predicted by extrapolations based on brute-force methodologies.
(2007-11-12 09:09:24.0)
UltraSPARC T2 for HPC: A Customer Assessment
Dieter an Mey, HPC Team Lead at RWTH Aachen's Center for Computing and Communication, presented an evaluation of the suitability of Sun's UltraSPARC T2 processor for High Performance Computing at the HPC Consortium meeting in Reno. The Aachen study compares systems with the T2 processor against systems with Sun's UltraSPARC IV processor, with AMD Opteron processors, and with Intel Woodcrest and Clovertown processors. The test cases used were representative of a range of applications and attributes that are important to users at Aachen. I will briefly summarize the results here and recommend that those interested in more detail visit this page for a full explanation of the methodology and the detailed results. Aachen examined several performance kernels: memory bandwidth, LINPACK, and sparse matrix-vector multiplication. They also examined results for several applications, including TFS, which is used to model nasal flow for computer-aided surgery. This code can be run in several ways using OpenMP for parallelization. They also ran FLOWer and a code that does contact analysis of bevel gears. In addition to these application tests, Aachen ran multiple instances of applications simultaneously to assess the throughput capabilities of each system. A power and performance/power analysis was also done. The results showed that a combination of T2-based systems and x64/x86 systems would be ideal for Aachen. Very cache-friendly codes, such as the bevel gear code, did not benefit as much from the T2 (Niagara 2) architecture and performed better on the Intel- and AMD-based systems. TFS, on the other hand, performed better in throughput mode on the T2 system. In both cases the best results were 2X better than the alternative. That is, the Intel/AMD systems generally did about 2X better than the T2 system on cache-friendly codes, while the T2 system was 2X better in cases where memory bandwidth was a limiting factor.
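Sparse matrix-vector multiplication, one of the kernels in the Aachen study, is a textbook example of a memory-bound code, which helps explain why memory bandwidth decided several of these comparisons. The Python sketch below is not Aachen's benchmark; it is a minimal illustration, assuming the common CSR (compressed sparse row) storage format, 8-byte values, and 4-byte indices. `spmv_intensity` is a hypothetical helper that gives an idealized flops-per-byte estimate: each nonzero costs two flops but twelve bytes of matrix data plus an indirect load of x, so throughput tracks memory bandwidth far more than peak flops.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    values[k]  -- k-th nonzero, stored row by row
    col_idx[k] -- column index of values[k]
    row_ptr[r] -- position in values where row r starts (length = rows + 1)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for r in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # One multiply and one add per nonzero, but two streamed loads plus
            # an indirect (gathered) load of x: memory traffic dominates.
            acc += values[k] * x[col_idx[k]]
        y[r] = acc
    return y


def spmv_intensity(nnz, n_rows, n_cols, value_bytes=8, index_bytes=4):
    """Idealized flops per byte for CSR SpMV, counting each array element once.

    Assumes x is read from memory only once (perfect caching), so this is an
    upper bound on intensity; real codes move more bytes per flop.
    """
    flops = 2 * nnz
    traffic = (nnz * (value_bytes + index_bytes)   # values + col_idx
               + (n_rows + 1) * index_bytes        # row_ptr
               + n_cols * value_bytes              # x, read once
               + n_rows * value_bytes)             # y, written once
    return flops / traffic


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
print(round(spmv_intensity(nnz=10**8, n_rows=10**7, n_cols=10**7), 3))  # 0.143
```

At roughly 0.14 flops per byte, even under the generous caching assumption, the kernel's speed is set mainly by how fast memory can feed the cores, which is the regime where the post reports the T2 doing well.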
(2007-11-12 08:12:43.0)
Ranger Update: TACC's Path to Petascale
Jay Boisseau, Director of the Texas Advanced Computing Center, site of Sun's largest supercomputer installation to date, gave an update on the Ranger system to the HPC Consortium in Reno. Ranger is the first in a series of annual Track 2 NSF procurements motivated by the findings of the NSF Cyberinfrastructure Strategic Plan, which is available in PDF format here. Several institutions are involved in this procurement. TACC / UT Austin provides project leadership, hosts and runs Ranger, and provides user support, among other things. ICES / UT Austin provides algorithmic expertise and applications collaborations. The Cornell Center for Advanced Computing (formerly the Cornell Theory Center) provides large-scale data management and analysis, as well as training. Arizona State HPCI contributes user support and technology evaluation and insertion. So, just how big is this Big Iron? Just over half a PetaFLOP (504 TFLOPs), built with 3936 Sun four-socket blades, each socket populated by a four-core 2.0 GHz AMD Barcelona processor, for a total of almost 63,000 cores. Memory is big as well, with 2 GB per core (32 GB/node) for a total of 125 Terabytes in the Ranger system. This being Texas, the disk subsystem does not disappoint: 1.7 Petabytes of storage built from 72 Sun X4500 (Thumper) I/O servers, each with 24 Terabytes, delivering a total aggregate bandwidth of 72 GB/sec. The largest filesystem built on this storage offers one Petabyte. The system interconnect is InfiniBand, using Mellanox's latest ConnectX InfiniBand cards and two of Sun's 3456-port Magnum switches. Interconnect link bandwidth is approximately 10 Gb/sec and latency is approximately 2.3 µs. Physically, the system fits in 96 racks (82 compute, 12 support, 2 switches) that sit in about 4500 square feet along with 116 APC InRow cooling units. Due to the density of the Sun solution, floor space has not really been an issue.
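The Ranger figures above hang together arithmetically. The short Python check below simply multiplies out the blade, socket, core, memory, and storage counts quoted in the post. The one number not taken from the post is the per-core rate of 4 double-precision flops per cycle, an assumption about Barcelona's SSE add and multiply units; with it, peak comes out to roughly 504 TFLOPs.

```python
# Figures quoted in the post.
BLADES = 3936
SOCKETS_PER_BLADE = 4
CORES_PER_SOCKET = 4
CLOCK_GHZ = 2.0
GB_PER_CORE = 2
THUMPERS = 72
TB_PER_THUMPER = 24
AGGREGATE_IO_GB_S = 72

# Assumption (not from the post): 4 double-precision flops per core per cycle,
# i.e. one 2-wide SSE add plus one 2-wide SSE multiply each cycle on Barcelona.
FLOPS_PER_CYCLE = 4

cores = BLADES * SOCKETS_PER_BLADE * CORES_PER_SOCKET      # "almost 63,000"
peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000   # GFLOPs -> TFLOPs
memory_tb = cores * GB_PER_CORE / 1000                     # decimal terabytes
disk_pb = THUMPERS * TB_PER_THUMPER / 1000                 # decimal petabytes
io_per_thumper_gb_s = AGGREGATE_IO_GB_S / THUMPERS         # GB/s per I/O server

print(f"cores: {cores:,}")                               # cores: 62,976
print(f"peak: {peak_tflops:.1f} TFLOPs")                 # peak: 503.8 TFLOPs
print(f"memory: {memory_tb:.1f} TB")                     # memory: 126.0 TB
print(f"disk: {disk_pb:.3f} PB")                         # disk: 1.728 PB
print(f"I/O per Thumper: {io_per_thumper_gb_s:.1f} GB/s")  # I/O per Thumper: 1.0 GB/s
```

Memory works out to about 126 TB in decimal units, consistent with the roughly 125 TB quoted, and the 72 GB/sec aggregate corresponds to about 1 GB/sec from each Thumper.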
Power requirements, on the other hand, are quite daunting for a system of this size: 1 MW of the 3.4 MW required to run Ranger is needed for cooling. I was impressed to hear that Jay expects a significant number of applications to sustain 50-100 TFLOPs on Ranger; that is some serious application scaling! He predicted there will be a double-digit number of codes using over 10,000 cores by the end of 2008 and expects a few of these to run later this year. In terms of software environment, Ranger is a Linux cluster that uses the ROCKS provisioning software to handle OS and application deployments, Lustre as its scalable parallel filesystem, and the OpenFabrics stack to control the InfiniBand interconnect. In addition, at least two MPI implementations will be used on Ranger: MVAPICH and Open MPI. Several compiler suites will be available, including Sun Studio, PathScale, and the Portland Group compilers. Sun Grid Engine will be used for job scheduling. Ranger's impact on the capabilities of the TeraGrid will be considerable: it will make more CPU hours available to TeraGrid users than all other current TeraGrid systems combined. At 504 TFLOPs, Ranger is 5X larger than the current top TeraGrid system. Jay ended with a brief summary of the status of the Ranger installation, which is ongoing. He characterized most things as good: TACC is happy with Barcelona performance, the Sun Constellation blades, the performance of the InfiniBand fabric (Sun switch and Mellanox card), Thumper performance, the Sun racks, the APC cooling solution, and Sun Grid Engine. There have been some BIOS issues that Sun and AMD have been working through, and there have been some expected component failures due to the very large number of components in this system. The most vexing problem, which we hope has now been solved, involved manufacturing issues with the special InfiniBand cables used in the Constellation system.
Apparently, some step in the manufacturing process introduced a crimp that caused connectivity problems. Correctly manufactured cables are now being put in place. As Jay said, the delays introduced by these problems have been largely overcome through the extremely hard work of Sun and TACC personnel. I know the Sun folks I've talked with are working incredibly hard to make TACC successful. The system is expected to be online in early December.
(2007-11-11 21:45:41.0)
Managing Petabytes with SAM-QFS
Bryan Banister, Manager, Storage Systems and Production Servers, at the San Diego Supercomputer Center (SDSC) spoke at the HPC Consortium meeting in Reno about their SAM-QFS deployment, which forms an important part of their HPC infrastructure. As background, Bryan briefly described each of the large clusters installed at SDSC: a 15.6 TFLOPs IBM cluster, a 3.1 TFLOPs IBM Itanium cluster, and a 17.1 TFLOPs IBM BlueGene/L system. Lots of IBM gear on the compute side. Sun's presence is much more visible on the storage side of the SDSC HPC facility. As a TeraGrid site, SDSC is heavily involved in serving large amounts of data to its user base. Grid computing requires migrating input data to remote sites for processing and then migrating the results either back to the home system or on to another site for additional processing. As an example, Bryan described a recent computation done by the Southern California Earthquake Center (SCEC), which generated 47 TBytes of output data and took five days to run on 240 processors. In the near future, the center would like to do a 1 PByte run that will require transferring that data and then processing for 20 days on 1000 processors. To do this, they will require 10 GByte/sec parallel file system transfer rates, with higher rates needed further out. SDSC currently has about 2 PBytes of online (disk) storage and an additional 5 PBytes of near-line storage in HPSS and Sun's SAM-QFS. Of that 5 PBytes, 1.2 PBytes are on SAM-QFS. Interestingly, though, 85M of the approximately 130M files are stored on SAM-QFS, because SAM-QFS handles small files better than HPSS. Bryan showed several architectural diagrams, which I cannot reproduce here, detailing how SAM-QFS fits into the overall IT infrastructure at SDSC and how it interacts with the TeraGrid infrastructure. The talk closed with a brief description of the emerging storage technologies that SDSC considers important.
These included solid-state disk (SSD/flash) for high performance, 8 Gb Fibre Channel, SATA drives, RAID6, expansion and adoption of 10 GbE, MAID for transient data, and, as QDR (quad data rate) becomes available, InfiniBand as a combined storage and cluster interconnect technology.
(2007-11-11 19:50:46.0)
A Secure Attribute-Based Infrastructure for Distributed Computational Environments
Arnie Miles, Senior Systems Architect and Assistant Professor of Computer Science at Georgetown University, spoke yesterday at the HPC Consortium in Reno. He presented a proposal for a new approach to grid computing, to be driven by a new effort called the Thebes Consortium. A basic tenet of the Thebes Consortium is that current grid computing approaches are failing to live up to expectations. In particular, global grid efforts have focused too narrowly on specific communities and have not dealt appropriately with scale (they typically assume a modest number of large components), and attempts to expand these frameworks have only made the situation more difficult. Georgetown University has created the Consortium (with funding support from Sun) to design and build a new generation of middleware, and it is actively looking for others interested in joining the effort. Philosophically, the Thebes Consortium believes the following statements are true and must be considered when designing a new grid infrastructure:
And, above all, the Thebes Consortium believes in Scalability, Security, and Simplicity. If these principles resonate with you, learn more about the consortium by reading its whitepaper here.
(2007-11-11 14:07:50.0)
ARSC's New x86_64 Supercomputer
Greg Newby, Chief Scientist at the Arctic Region Supercomputing Center (ARSC) at the University of Alaska Fairbanks, spoke today about Midnight, their new Sun-based supercomputer. For the curious, the name comes from "Land of the Midnight Sun." Midnight contains 2312 compute cores in 413 nodes, configured in 19 racks. The aggregate peak performance of its dual-core AMD processors is about 12.1 TFLOPs. It was interesting to see that ARSC had opted for a mix of node sizes for Midnight. There are 55 X4600 nodes, each with 16 cores and 64 GB of memory. In addition, there are 358 four-core X2200 M2 nodes, each with 16 GB of memory. It's also noteworthy that ARSC opted for a generous memory configuration of 4 GB per core rather than the more standard 1-2 GB/core. The system uses a 4X SDR InfiniBand interconnect built around two Voltaire 288-port switches. The interconnect is used both for MPI communication and for connection to the Lustre cluster file system. One switch has 256 X2200 M2 nodes attached, while the other hosts the X4600 nodes and the remaining X2200 M2 systems. The software stack used on Midnight includes the following components:
In closing, Greg shared some of his experiences with testing and accepting such a large system, and ended by noting that Sun must improve its support of Linux-based systems to be successful in these engagements. As it happened, Peter Braam (founder of Cluster File Systems, maker of Lustre, and now a VP at Sun) was the next speaker at the Consortium. More later on what Peter said in response to Greg's comments regarding Sun's support for Linux.
(2007-11-10 22:09:49.0)
Compute Power for C²A²S²E
Dr. Alfred Geiger, Head of Solutions & Innovations Scientific & Technical ICT, T-Systems Solutions for Research GmbH, gave the first customer presentation at the HPC Consortium here in Reno. His talk focused on the compute requirements of C²A²S²E, the Center for Computer Applications in Aerospace Science and Engineering. C²A²S²E is one of four European centers focused on aspects of a European initiative in aeronautics that aims to sustain growth in the air transport market without increasing the industry's environmental impact. As one would expect, numerical flow simulations will play a critical role in achieving this goal, and these simulations will require very large computational resources. They span a broad range, including ice prediction, low-speed wing design, flutter prediction, ground effect modelling, etc. All of these problems require CFD capabilities on the order of a million-fold increase over current simulation capabilities. The mainstay of the required computation is a CFD code called Tau. Because Tau uses a multigrid approach with unstructured meshes, typically with 60 neighbors per domain, overall application performance will depend heavily on the ability to efficiently handle small message transfers at high message rates. InfiniBand has consequently been chosen as the interconnect for the C²A²S²E HPC system, with Sun's Constellation switch (Magnum) as the switch fabric. In evaluating processors, T-Systems found that while IBM's BlueGene/L system delivered the best absolute performance, the AMD Barcelona processor was the best choice on performance per watt. Dr. Geiger estimates that from a 2-3 year total cost of ownership (TCO) perspective, power and cooling costs are on par with hardware acquisition costs. The Barcelona processor has therefore been chosen as the processor for this project.
The system will consist of 758 nodes with 16 GB of memory and 10 nodes with 32 GB, for a total of 6144 cores, all connected with Sun's InfiniBand switching technology. The system will run SLES9 and will make a variety of compiler suites (including Sun Studio) available to C²A²S²E users.
(2007-11-10 12:54:34.0)