Sunday, January 29, 2006
Essentially all modern HPC code makes use of parallelism to speed up execution. Parallelism isn't the only way to speed things up, but it's the most general way, and the other approaches are usually used in massively parallel systems anyway. We'll talk about these other approaches some other time.
There are two common approaches to tying computers together for HPC. One is a symmetric multiprocessor, or SMP, which consists of a group of processors sharing a common memory. These are very common, and are becoming even more common with multi-core chips. The other is a cluster of computers interconnected by some high-speed network. The two techniques can be combined to build a cluster of SMPs. In fact, the most common HPC system in the near future will probably be a cluster of multi-core Opterons.
The most common tool for writing parallel programs is MPI, or the Message Passing Interface library. This is a library, used from either Fortran or C/C++, that handles data transfer and synchronization among processes.
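To make "data transfer and synchronization among processes" a little more concrete, here is a minimal sketch of the message-passing style. It is not MPI and does not use the MPI API: it uses only Python's standard library, with threads standing in for processes and queues standing in for the interconnect. The names `worker` and `parallel_sum` are hypothetical and exist only for this illustration. The workers never touch shared data directly; they receive a chunk, compute a partial sum, and send it back, which is roughly the scatter-then-reduce pattern an MPI program would spell out by hand.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of numbers, sum it, and send the partial sum back."""
    chunk = inbox.get()               # blocking receive: waits for a message
    outbox.put((rank, sum(chunk)))    # send the partial result to the root


def parallel_sum(data, nworkers=4):
    """Sum `data` by scattering chunks to workers and reducing their results.

    Illustrative only: threads and queues stand in for MPI processes and
    messages. This is an assumption of the sketch, not how MPI is built.
    """
    inboxes = [queue.Queue() for _ in range(nworkers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(nworkers)
    ]
    for t in threads:
        t.start()

    # Scatter: worker `rank` gets every nworkers-th element starting at `rank`.
    for rank in range(nworkers):
        inboxes[rank].put(data[rank::nworkers])

    # Reduce: collect exactly one partial sum from each worker.
    total = 0
    for _ in range(nworkers):
        _rank, partial = results.get()
        total += partial

    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # prints 5050
```

Even in this toy version, the programmer has to decide who sends what to whom and in what order results are collected. A real MPI program carries the same bookkeeping, plus buffers, datatypes, and communicators.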
If we look at real-world usage we discover that it's dominated by MPI.
| Language or library | Count |
| --- | --- |
| F90/F95 | 17 |
| C/C++ | 3 |
| MPI | 20 |
| OpenMP | 4 (as an alternative to MPI) |
| (Sca)LAPACK | 5 |
| NetCDF | 2 |
| PETSc | 1 |
A larger survey of about 300 users at NCSA asked how their programs are parallelized.
| Parallelization method | Share of users |
| --- | --- |
| MPI | 44% |
| OpenMP | 14% |
| Mixed MPI/OpenMP | 11% |
| Automatic | 8% |
Unfortunately, though MPI works well, it's not easy to use, since it's a very low-level library. A good part of our productivity study has been spent looking for alternatives.