Sunday November 11, 2007 | The Navel of Narcissus: Josh Simons' Coordinates in the Blogosphere
Managing Petabytes with SAM-QFS
Bryan Banister, Manager, Storage Systems and Production Servers, at the San Diego Supercomputer Center (SDSC), spoke at the HPC Consortium meeting in Reno about their SAM-QFS deployment, which forms an important part of their HPC infrastructure. As background, Bryan briefly described each of the large clusters installed at SDSC: a 15.6 TFLOPs IBM cluster, a 3.1 TFLOPs IBM Itanium cluster, and a 17.1 TFLOPs IBM BlueGene/L system. Lots of IBM gear on the compute side. Sun's presence is much more visible on the storage side of the SDSC HPC facility.

As a TeraGrid site, SDSC is heavily involved in serving large amounts of data to its user base. Grid computing requires migrating input data to remote sites for processing and then either migrating the results back to the home system or on to another site for additional processing. As an example, Bryan described a recent computation done by the Southern California Earthquake Center (SCEC) that generated 47 TBytes of output data and took five days to run on 240 processors. In the near future, the center would like to do a 1 PByte run that will require transferring that data and then processing it for 20 days on 1000 processors. To do this, they will require parallel file system transfer rates of 10 GBytes/sec, with higher rates needed soon after.

SDSC currently has about 2 PBytes of online (disk) storage and an additional 5 PBytes of near-line storage in HPSS and Sun's SAM-QFS. Of that 5 PBytes, 1.2 PBytes are on SAM-QFS, though it is interesting that 85M of the approximately 130M files are stored on SAM-QFS because SAM-QFS handles small files better than HPSS.

Bryan showed several architectural diagrams, which I cannot reproduce here, that detail how SAM-QFS fits into the overall IT infrastructure at SDSC and how it interacts with the TeraGrid infrastructure. The talk closed with a brief description of the emerging storage technologies that SDSC considers important. These included solid state disk (SSD/flash) for high performance, 8 Gb Fibre Channel technology, SATA drives, RAID6, expansion/adoption of 10 GbE, MAID for transient data, and, as QDR (quad data rate) becomes available, InfiniBand as a combined storage and cluster interconnect technology.

(2007-11-11 19:50:46.0)

A Secure Attribute-Based Infrastructure for Distributed Computational Environments
Arnie Miles, Senior Systems Architect and Assistant Professor of Computer Science at Georgetown University, spoke yesterday at the HPC Consortium in Reno. He presented a proposal for a new approach to grid computing to be driven by a new effort called the Thebes Consortium. A basic tenet of the Thebes Consortium is that current grid computing approaches are failing to live up to expectations. In particular, global grid efforts have over-focused on specific communities and have not dealt appropriately with scale (they typically assume a modest number of large components), and attempts to expand these frameworks have only served to make the situation more difficult. Georgetown University has created the Consortium (with funding support from Sun) to design and build a new generation of middleware. They are actively looking for others interested in joining this effort. Philosophically, the Thebes Consortium believes the following statements are true and must be considered when designing a new grid infrastructure:
And, above all, the Thebes Consortium believes in Scalability, Security, and Simplicity. If these principles resonate with you, learn more about the consortium by reading their whitepaper here.

(2007-11-11 14:07:50.0)

ARSC's New x86_64 Supercomputer
Greg Newby, Chief Scientist at the Arctic Region Supercomputing Center (ARSC) at the University of Alaska Fairbanks, spoke today about Midnight, their new Sun-based HPC supercomputer. For the curious, the name comes from "Land of the Midnight Sun." Midnight contains 2312 computational cores across 413 nodes, configured in 19 racks. The aggregate peak performance of its dual-core AMD processors is about 12.1 TFLOPs. It was interesting to see that ARSC had opted for a mix of node sizes for Midnight. There are 55 X4600 nodes, each with 16 cores and 64 GB of memory. In addition, there are 358 4-core X2200 M2 nodes, each with 16 GB of memory. It's also noteworthy that ARSC opted for a generous memory configuration of 4 GB per core rather than the more standard 1-2 GB/core. The system uses a 4X SDR InfiniBand interconnect built around two Voltaire 288-port switches. The interconnect is used both for MPI communication and for connection to the Lustre cluster file system. One switch is configured with 256 attached X2200 M2 nodes, while the other hosts the X4600 nodes and the balance of the X2200 M2 systems. The software stack used on Midnight includes the following components:
In closing, Greg shared some of his experiences with testing and accepting such a large system and ended by noting that Sun must improve its support of Linux-based systems in order to be successful with these engagements. As it happened, Peter Braam (founder of Cluster File Systems, maker of Lustre, and now a VP at Sun) was the next speaker at the Consortium. More later on what Peter said in response to Greg's comments regarding Sun's support for Linux.

(2007-11-10 22:09:49.0)
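As a quick cross-check of the Midnight figures quoted in the post above, here is a minimal Python sketch that recomputes the node, core, and memory totals from the per-node numbers. The `NodeType` class, the `totals` helper, and the variable names are my own illustration, not anything ARSC showed; the only inputs are the counts given in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """One class of compute node (hypothetical helper type for this sketch)."""
    name: str
    count: int
    cores_per_node: int
    mem_gb_per_node: int


# Per-node figures as quoted in the talk.
MIDNIGHT = [
    NodeType("X4600", count=55, cores_per_node=16, mem_gb_per_node=64),
    NodeType("X2200 M2", count=358, cores_per_node=4, mem_gb_per_node=16),
]


def totals(node_types):
    """Return (nodes, cores, memory in GB) summed over all node types."""
    nodes = sum(n.count for n in node_types)
    cores = sum(n.count * n.cores_per_node for n in node_types)
    mem_gb = sum(n.count * n.mem_gb_per_node for n in node_types)
    return nodes, cores, mem_gb


nodes, cores, mem_gb = totals(MIDNIGHT)
gb_per_core = mem_gb / cores

# One switch carries 256 X2200 M2 nodes; the other carries the X4600 nodes
# plus the remaining X2200 M2 nodes.
second_switch_nodes = 55 + (358 - 256)

print(f"nodes={nodes} cores={cores} memory={mem_gb} GB")
print(f"memory per core={gb_per_core} GB")
print(f"compute nodes on second switch={second_switch_nodes}")
```

The totals come out to 413 nodes, 2312 cores, and 9248 GB of memory, or exactly 4 GB per core, which matches the numbers Greg gave. The second switch ends up with 157 compute nodes attached; the sketch does not account for any ports used by Lustre servers or inter-switch links, since those details were not part of the talk summary.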