Sunday June 15, 2008 | The Navel of Narcissus Josh Simons' Coordinates in the Blogosphere
HPC Consortium: High Performance Computing Virtual Laboratory
Wrapping up our first day here in Dresden at the Sun HPC Consortium meeting, Ken Edgecombe, Director of the High Performance Computing Virtual Laboratory (HPCVL) in Kingston, Canada, spoke briefly about their new UltraSPARC T2 Plus cluster. Intrigued by the UltraSPARC T2 performance analysis work done by RWTH Aachen on their HPC workloads, the team at HPCVL decided to take a closer look at the architecture of the UltraSPARC T2 Plus (which adds multi-socket capability via a coherency protocol) and its suitability for their own application mix. Based on that analysis, they have now installed a cluster of 78 Sun SPARC T5140 nodes.
Ken presented several graphs showing how well the T5140 scales on a variety of workloads, including FFT and two chemistry codes, compared to their existing UltraSPARC IV systems. On their FFT test, the T2 Plus and the UltraSPARC IV+ reached parity at about 32 threads, meaning that a one-rack-unit server with a processor running at 1.2 GHz was able to deliver results that required multiple boards' worth of more expensive and faster UltraSPARC IV+ processors.
The penchant of programmers and others for using processor clock speed as a proxy for application performance was called out by Ken as a challenge to the adoption of these new multi-core, multi-threaded processors. Getting users to think beyond single-threaded performance to overall application and workload performance is key. Ken cited an example involving a grad student who had done some CFD performance comparisons and concluded that the T5140 system was "absolutely useless" for the CFD team. In fact, a look at the data made it clear that while the system was by no means a panacea for them, it did deliver significant overall value on some problem sizes.
Another problem with adoption is the lack of movement on the part of the third-party software community to take advantage of multi-core and multi-threading.
This is an issue that goes well beyond Sun, though we are seeing it earlier due to our aggressive use of both multiple cores and multiple threads within those cores. This is indeed the coming software crisis that analysts and others are starting to talk about.
(2008-06-15 10:13:05.0)
HPC Consortium: Georgetown University
Arnie Miles, Senior Systems Architect and Assistant Professor of Computer Science, spoke today here in Dresden at the Sun HPC Consortium meeting. His topic was the Thebes Grid Middleware Consortium, of which he is a founder. Specifically, he covered two topics: the Security Token Service (STS), and resource description and discovery, potentially using Ganglia as an enabling technology.
SWITCH has implemented the Security Token Service, which allows users to access remote grid resources using only their local security credentials. With trust relationships between peer STS instances and between a local STS instance and local resources/applications, a user in administrative domain A can access a remote application in administrative domain B using only their local username and password. The local STS contacts the remote STS to retrieve tokens that the user agent can then use to access the remote STS and retrieve the appropriate set of access credentials, which are then used to contact the remote application directly. In addition to being useful in a distributed grid environment, the STS approach can be used to simplify access to multiple local applications that have different security token requirements.
Arnie also described ongoing work to develop a common resource description and discovery mechanism that could be used to enable uniform access to resources controlled by differing distributed resource management systems. His approach is based on the observation that Ganglia, which is commonly used in many HPC installations, already implements an XML-based resource description language and can be queried to extract resource information. A higher-level meta-scheduler can then make placement decisions using resource information from heterogeneous administrative domains that has been homogenized by the Ganglia resource description language.
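To make the Ganglia idea concrete, here is a minimal sketch of how a meta-scheduler might read Ganglia-style XML and choose a host. It is not Arnie's implementation: the sample document, the host and cluster names, the metric values, and the `parse_hosts` and `pick_host` helpers are all assumptions made up for illustration, and the placement rule is deliberately simplistic.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: names and values are made up, and the attributes
# shown are a small assumed subset of what a Ganglia daemon would report.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="domain-a">
    <HOST NAME="node01">
      <METRIC NAME="cpu_num" VAL="8"/>
      <METRIC NAME="load_one" VAL="6.5"/>
    </HOST>
    <HOST NAME="node02">
      <METRIC NAME="cpu_num" VAL="8"/>
      <METRIC NAME="load_one" VAL="1.0"/>
    </HOST>
  </CLUSTER>
  <CLUSTER NAME="domain-b">
    <HOST NAME="t5140-01">
      <METRIC NAME="cpu_num" VAL="128"/>
      <METRIC NAME="load_one" VAL="40.0"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def parse_hosts(xml_text):
    """Flatten the XML into one record per host, across all clusters."""
    root = ET.fromstring(xml_text)
    hosts = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            # Collect this host's metrics as a name -> value mapping.
            metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
            hosts.append({
                "cluster": cluster.get("NAME"),
                "host": host.get("NAME"),
                "cpus": int(metrics.get("cpu_num", 0)),
                "load": float(metrics.get("load_one", 0.0)),
            })
    return hosts


def pick_host(hosts, cpus_needed):
    """Hypothetical placement rule: the host with the most idle CPUs that fits."""
    candidates = [h for h in hosts if h["cpus"] - h["load"] >= cpus_needed]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h["cpus"] - h["load"])


if __name__ == "__main__":
    choice = pick_host(parse_hosts(SAMPLE_XML), cpus_needed=4)
    print(choice["cluster"], choice["host"])
```

Because hosts from both domains end up in the same record format, the placement rule does not need to know which resource manager controls each one. In a real deployment the XML would be fetched from a running Ganglia daemon rather than embedded as a string.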
The Ganglia-based approach is an interesting idea, and first steps have now been taken to allow security credentials to be represented so that the STS approach described above can be enabled in a grid environment.
(2008-06-15 08:56:29.0)
HPC Consortium: Storage
Storage was the theme of the first set of talks at the Sun HPC Consortium meeting here in Dresden. Peter Braam, Vice President for Lustre, spoke first about Open Storage, which we at Sun believe marks an important shift within our industry, comparable to the shift we've seen towards Open Servers and that we expect to see in the future with networking. Open Storage in a nutshell: an approach that leverages open source software, open architectures, common components, and the interoperability of open standards to create innovative storage products with breakthrough economics. For example, we do not believe expensive, closed, hardware RAID controllers are a part of the open storage future. Instead, data integrity will be delivered with software like Sun's ZFS filesystem, with its end-to-end data integrity model, using inexpensive disks and the considerable capabilities of increasingly powerful standard compute servers.
Peter Bojanic, Director of the Lustre Group, spoke next. He started with some fun facts about Lustre, Sun's parallel cluster file system, which joined Sun's portfolio with the acquisition of Cluster File Systems, Inc. Some of the superlatives he mentioned: 25,000 clients accessing a single Lustre file system on Red Storm at Sandia National Laboratories, and CEA achieving an aggregate 100 GB/sec transfer rate with their Lustre configuration. He also pointed out that Lustre is used on 7 of the 10 largest supercomputers in the world (ref. Nov 2007 TOP500). Peter spent the bulk of his time discussing the Lustre roadmap, which my fingers were not nimble enough to capture in any detail. I expect the slides to be posted at some point on the Consortium website, so watch for them there.
Harriet Coverston, Sun Distinguished Engineer, spoke about Shared QFS and the Storage Archive Manager (SAM), two storage products that are well-known in the HPC community.
Perhaps less well-known is that QFS has an increasing footprint with non-HPC enterprise customers who need its scalability, performance, and reliability. Home Box Office (HBO) is a great example of this. Read about their use of QFS here. After giving an overview of Shared QFS and SAM, Harriet spoke about her group's plans to move to what she called intelligent storage. The intent is to move from a traditional SAN approach to one that embraces T10-based object storage mechanisms, which will greatly increase QFS scalability from its current limit of 256 clients to the range of thousands of clients. Read more about object-based storage here.
Our cluster of storage-related talks ended with a presentation by Chris Wood, CTO for Sun's Storage and Data Management Practice. Chris focused on how Sun plans to deliver complete, modular, and scalable storage solutions rather than merely pieces of excellent product and technology. The fundamental problem is how best to satisfy a wide range of potentially conflicting user requirements, the most common of which are high performance, low cost, high capacity, and a single architectural approach for all workloads. Sun's approach uses a modular architecture that can grow and shrink based on customer requirements and that leverages Sun's hardware and software. Examples include SAM-QFS for high performance and data archiving, Lustre for additional scalability and performance, the x4500 storage server, high-performance network interface cards (for example, Neptune), and current and future x86 and T2-based servers.
(2008-06-15 07:34:28.0)
HPC Consortium Kickoff in Dresden
We've just kicked off the Sun HPC Consortium meeting here in Dresden, where it is early afternoon on Sunday. Most of the expected 80 or so customers are in attendance as Marc Hamilton, Sun's Vice President for Systems Practice Americas, gives an overview of Sun and HPC. The full agenda for this two-day event is here. As you can see, the talks are a good mixture of Sun technical talks and customer talks, one of the hallmarks of this Sun event. I plan to blog as many of the customer talks as I can over the next few days.
(2008-06-15 04:59:23.0)