Monday February 27, 2006
Now, as for the mission partners: after talking to quite a few of them, I have come to the conclusion that they do not agree on the exact nature of the expertise gap, although they all agree that there is one. I am reminded of the blind men and the elephant. The expertise gap a mission partner perceives depends on the type of projects they undertake and the type of expertise they most need. No matter what the type of gap, at the system level I believe complexity is the cause.
So we can either reduce the complexity, or we can take each expertise domain separately and try to close the gap at that level.
And yes, if we take this second approach, I think there are different solutions depending on the type of expertise gap: education, an appropriate team mix and practices, abstract languages, and so on. In fact, we may have to do this as an intermediate step.
The problem is multifaceted, and the near-term solutions will have to be as well.
At least we are beginning to understand the highly variable nature of the expertise gap.
Expertise in the program itself, exemplified by Don's case, is a problem seen in the maintenance of all large and complex programs. It is actually unrelated to expertise in the tools, since that is easily acquired. There's a lot of work going on in this area, but nobody actually seems to use it. Most tools for understanding programs become shelfware, a fact that should deeply worry tool developers. Of course, the problem is made much worse by the complexity that performance tuning adds to the program.
I don't think this is the "expertise gap" bemoaned by the HPC community.
I really think that they are talking about the second gap: expertise in the techniques of performance tuning. You don't demonstrate that education is not a potential solution for this problem, though it certainly hasn't been a good solution up to now. In fact, I think that we don't know how to educate people in scaling and tuning, though we have some success with apprenticeships and similar techniques. Maybe we need performance tuning workshops, similar to writers' workshops.
Reducing complexity helps both gaps, of course, but reducing the complexity of performance tuning has double leverage. It aids the complex job of performance tuning directly, and it reduces the complexity of the code itself, reducing the first gap. If programmers didn't have to worry about performance, they could all write compact, beautiful code.
I think that you need to call out both problems explicitly, and propose separate remedies for both of them. The different solutions may well be at odds with one another, too. In particular, the solution to individual program expertise is often more abstraction. The solution to tuning is often less abstraction. We need to be very explicit about what part of the problem we are attacking with what tools. I think separating the kinds of "expertise gap" might be a good start.
There really is an "expertise gap" out there in the High Performance Computing (HPC) community. One of the goals of Sun's HPCS Productivity Team has been to understand what's really going on, as opposed to what people like to complain about. Susan Squires, our group's anthropologist, would be the first to tell you that those aren't always the same thing.
Some key people in the HPC community have been telling us (in various ways) that they are constrained by the expertise needed for HPC application development, so we looked into it. We gathered several kinds of data and analyzed it using a variety of methods. It turns out that this "expertise gap" idea is right on target, and we can now say quite a bit about how it looks. It takes lots of education and years of experience for people to learn this kind of programming, and only a very few ever get really good at it. We see the problem as an inevitable consequence of the way HPC software gets developed; this will have to change if we're going to get the kind of dramatic productivity increase that DARPA is seeking with their funding of the HPCS program.
Sue Squires, Larry Votta, and I wrote up some of these conclusions for the recent Workshop on Productivity and Performance in High-End Computing (P-PHEC) in a paper we titled "Yes, There Is an 'Expertise Gap' in HPC Application Development" (download the PDF). Here's the abstract:
The High Productivity Computing Systems (HPCS) program seeks a tenfold productivity increase in High Performance Computing (HPC), where productivity is understood to be a composite of system performance, system robustness, programmability, portability, and administrative concerns. Of these, programmability is the least well understood and perceived to be the most problematic. It has been suggested that an “expertise gap” is at the heart of the problem in HPC application development. Preliminary results from research conducted by Sun Microsystems and other participants in the HPCS program confirm that such an “expertise gap” does exist and does exert a significant confounding influence on HPC application development. Further, the nature of the “expertise gap” appears not to be amenable to previously proposed solutions such as “more education” and “more people.” A productivity improvement of the scale sought by the HPCS program will require fundamental transformations in the way HPC applications are developed and maintained.
One of the main characteristics of a computer science mentality is the ability to jump very quickly between levels of abstraction, between a low level and a high level, almost unconsciously. Another characteristic is that a computer scientist tends to be able to deal with nonuniform structures (case 1, case 2, case 3), while a mathematician will tend to want one unifying axiom that governs an entire system. ... Experience shows that about one person in 50 has a computer scientist's way of looking at things.
That's a fascinating observation that ties together a lot of the data about the expertise gap. It's not just expertise, so training won't really handle it. Education in the true sense might, and apprenticeship might help, but it might just take genetics. It also fits what I was saying about levels of abstraction.
It makes you really want something like Fortress, so the library writer, who is a computer scientist, can paper over the gaps while the mathematician/physicist can think at his own level.
Allow me to introduce myself. Mike Ball has been inviting me to join in the conversation here, and I'm finally able to jump in, once we dispense with a few formalities.
I'm Michael Van De Vanter, and I've been working with Mike and others at Sun on the DARPA-funded supercomputer project for about a year and a half. It is a real privilege to participate in such an ambitious program. We've been chartered to do no less than completely rethink, from the ground up, how computing systems help people get real work done. The program focuses very specifically on the HPC world, but that doesn't diminish the scope of the challenge.
I'm part of the Core Productivity Team, along with Larry Votta (our lead), Susan Squires (our anthropologist), and most recently Victoria Livschitz; we work with lots of other HPCS groups inside Sun to build an understanding of Productivity as a relationship between a whole system (not just the various hardware and software parts) and the context (human, organizational, political) in which it is deployed. We're also chartered to take a deeper look into programmability, which is one of the key aspects of Productivity, and that's where my background comes into play. I've been writing software, teaching programming, and doing research into how to build tools that help people write software for many years (but not in HPC, where I'm a newbie). You can learn more about my background on my personal home page.
I've just returned from the Third Workshop on Productivity and Performance in High-End Computing held last Sunday in Austin, TX. It was a great chance to talk with some of the other folks looking at these questions, not only from the other HPCS Program Vendors (IBM and Cray) but also the broader research community. There are lots of people right now working on this big question, and we're all trying to learn as much from one another as we can.
I'll say more about the workshop and the paper we wrote for it in a subsequent post; I'll also mention what a pleasure it was to visit Ira Baxter, CEO of Semantic Designs, while I was in Austin.
Anyway, while my partners are busy preparing posters for the presentation, my talk is frozen and I have a little bit of time, so I thought I'd talk a little about abstraction.
The mantra in the software engineering community is that raising the level of abstraction (the language level) increases productivity. There is certainly some truth to that. One poster child for abstraction is garbage collection, which replaces explicit allocation and deallocation with the abstraction of infinite memory. For programs where this works, it truly does improve productivity. All is not, however, sweetness and light.
I recently sat in on a panel of HPC managers and programmers talking about what would improve their lives. A manager got up and said that he wanted a high-level language that matched the abstraction of mathematics, in which the compiler and runtime took care of all the performance issues. A programmer got up and said that he wanted a high-level language that gave him explicit control over data layout and similar performance-related issues. Of course, as usual, both are right. The manager is right to want an abstract language that makes it easier to translate mathematics into programs. The programmer, on the other hand, has to face the fact that the abstraction offered is never perfect. In particular, programs, unlike equations, have performance characteristics. They may be too slow or too big, and the programmer, armed only with the high-level code, has no control over either. What he needs is a way to bypass the abstraction, which is imperfect in exactly the ways he cares most about, and to manipulate those performance elements directly.
Guy Steele has been designing a language, Fortress, that tries to meet both requirements. You can read about it in general or in excruciating detail, but the important idea for our current discussion is that it separates the abstract specification of the algorithm from the implementation details that determine speed. The other HPCS vendors have their own languages with similar characteristics: Cray is developing Chapel, and IBM is developing X10. The PGAS languages UPC and Co-Array Fortran, which I mentioned earlier, are another, though lower-level, attempt to add data placement to the language while keeping algorithmic development simple. These languages are a recognition that the abstractions offered for programming mathematics are broken in ways that matter for HPC.
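To make that separation a bit more concrete, here is a minimal Python sketch under stated assumptions. It is not Fortress, Chapel, or X10 syntax, and all class and function names are hypothetical. The algorithm (a matrix-vector product) is written once against an abstract element accessor, while the storage layout, the kind of detail that determines speed on real hardware, is chosen separately. Python will not show the performance difference; the point is only that the mathematics and the layout live in different places.

```python
from typing import List, Protocol


class Matrix(Protocol):
    """Abstract view of a matrix: the algorithm sees only this interface."""
    rows: int
    cols: int

    def get(self, i: int, j: int) -> float: ...


class RowMajor:
    """Hypothetical layout: elements stored row by row in one flat list (C-style)."""

    def __init__(self, data: List[List[float]]) -> None:
        self.rows, self.cols = len(data), len(data[0])
        self.flat = [x for row in data for x in row]

    def get(self, i: int, j: int) -> float:
        return self.flat[i * self.cols + j]


class ColMajor:
    """Hypothetical layout: elements stored column by column (Fortran-style)."""

    def __init__(self, data: List[List[float]]) -> None:
        self.rows, self.cols = len(data), len(data[0])
        self.flat = [data[i][j] for j in range(self.cols) for i in range(self.rows)]

    def get(self, i: int, j: int) -> float:
        return self.flat[j * self.rows + i]


def matvec(a: Matrix, x: List[float]) -> List[float]:
    """The 'mathematical' algorithm, y_i = sum_j a_ij * x_j, independent of layout."""
    return [sum(a.get(i, j) * x[j] for j in range(a.cols)) for i in range(a.rows)]


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]
print(matvec(RowMajor(data), x))  # [-2.0, -2.0]
print(matvec(ColMajor(data), x))  # [-2.0, -2.0]
```

Switching from `RowMajor` to `ColMajor` changes only the data placement, never `matvec`; in an HPC language that separation is what lets the performance programmer tune layout without rewriting the mathematics.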
Sometimes, though, abstraction fails simply because it's harder to use than more direct approaches, and you are much better off directly manipulating the objects that matter. My favorite example is the game editor offered with products like WarCraft. (Note that I resisted the temptation to add a link to the gaming company!) When you are building a field to play the game on, you don't want to write a loop that puts mountains to the N, E, and S, but not in the middle, and a ridge running from NW to SE. No, you want to say "Mountain. There. There. There (no, remove that one). There....". If you don't like computer games, consider a GUI builder; it's the same sort of thing. This is a case where direct manipulation of the objects of interest beats going through a set of indirect steps. Oh, yes, this is a lot more abstract than bits, but it reduces the task to a concrete one, not an abstract one.
More on this later, I need to knock off for some sleep.