Wednesday February 22, 2006
There really is an "expertise gap" out there in the High Performance Computing (HPC) community. One of the goals of Sun's HPCS Productivity Team has been to understand what's really going on, as opposed to what people like to complain about. Susan Squires, our group's anthropologist, would be the first to tell you that those aren't always the same thing.
Some key people in the HPC community have been telling us (in various ways) that they are constrained by the expertise needed for HPC application development, so we looked into it. We gathered several kinds of data and analyzed them using a variety of methods. It turns out that this "expertise gap" idea is right on target, and we can now say quite a bit about what it looks like. It takes lots of education and years of experience for people to learn this kind of programming, and only a very few ever get really good at it. We see the problem as an inevitable consequence of the way HPC software gets developed; this will have to change if we're going to get the kind of dramatic productivity increase that DARPA is seeking with its funding of the HPCS program.
Sue Squires, Larry Votta, and I wrote up some of these conclusions for the recent Workshop on Productivity and Performance in High-End Computing (P-PHEC) in a paper we titled "Yes, There Is an 'Expertise Gap' in HPC Application Development" (download the PDF). Here's the abstract:
The High Productivity Computing Systems (HPCS) program seeks a tenfold productivity increase in High Performance Computing (HPC), where productivity is understood to be a composite of system performance, system robustness, programmability, portability, and administrative concerns. Of these, programmability is the least well understood and perceived to be the most problematic. It has been suggested that an “expertise gap” is at the heart of the problem in HPC application development. Preliminary results from research conducted by Sun Microsystems and other participants in the HPCS program confirm that such an “expertise gap” does exist and does exert a significant confounding influence on HPC application development. Further, the nature of the “expertise gap” appears not to be amenable to previously proposed solutions such as “more education” and “more people.” A productivity improvement of the scale sought by the HPCS program will require fundamental transformations in the way HPC applications are developed and maintained.