CP's research showed that:

- PCW: Parallel Computers work in a large class of scientific and engineering computations.

The book quantifies and exemplifies this assertion.
In Chapter 2, we provide a national overview of parallel computing activities during the last decade. Chapter 3 is somewhat speculative, as it attempts to provide a framework to quantify the PCW statement above.
We will show, more precisely, that parallel computing only works in a ``scaling'' fashion for a special class of problems, which we call synchronous and loosely synchronous.
By scaling, we mean that the parallel implementation extends efficiently to systems with large numbers of nodes, without the speedup levelling off. These concepts are quantified in Chapter 3 with a simple performance model, described in detail in [Fox:88a].
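To make the scaling argument concrete, the sketch below implements a grain-size performance model in the spirit of the one in [Fox:88a]; the overhead constant, the communication-to-calculation ratio, and the grain size are illustrative assumptions, not measured values.

```python
# Sketch of a simple scaling performance model in the spirit of [Fox:88a].
# For a d-dimensional domain decomposition with grain size n (points per node),
# communication overhead falls as n**(-1/d), so at fixed grain size the
# overhead is independent of node count and speedup does not level off.

def speedup(num_nodes, grain_size, dim=2, t_comm_over_t_calc=10.0, c=1.0):
    """Speedup S = N / (1 + f_C), with overhead f_C ~ (t_comm/t_calc) * c / n**(1/d)."""
    f_c = t_comm_over_t_calc * c / grain_size ** (1.0 / dim)
    return num_nodes / (1.0 + f_c)

# Scaling: doubling nodes at fixed grain size keeps efficiency constant.
for n_nodes in (16, 64, 256, 1024):
    s = speedup(n_nodes, grain_size=4096)
    print(f"{n_nodes:5d} nodes: speedup {s:8.1f}, efficiency {s / n_nodes:.2f}")
```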
The book is organized with applications and software issues growing in complexity in later chapters. Chapter 4 describes the cleanest, regular synchronous applications, which included many of our initial successes. However, we already see the essential points:
- DD: Domain decomposition (or data parallelism) is a universal source of scalable parallelism.

- MP: Our software model was simple explicit message passing, with each node of a parallel processor running conventional sequential code and communicating with other nodes via subroutine calls.

CrOS and its follow-on Express, described in Chapter 5, support this software paradigm. Explicit message passing is still an important software model and, in many cases, the only viable approach to high-performance parallel implementations on MIMD machines.
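As a concrete illustration of DD and MP working together, here is a minimal sketch of a one-dimensional domain decomposition with an explicit boundary exchange. It uses mpi4py as a modern stand-in for the CrOS/Express calls; the array contents and the update rule are purely illustrative, not taken from any CP application.

```python
# Hypothetical sketch of the DD + MP paradigm: each node runs the same
# sequential code on its own subdomain and exchanges boundary values by
# explicit message passing.  mpi4py stands in for CrOS/Express here.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.full(8, float(rank))     # this node's piece of the domain (DD)
left = (rank - 1) % size            # periodic neighbours
right = (rank + 1) % size

# Exchange edge values with both neighbours: the explicit message-passing
# step (MP), done loosely synchronously by every node at the same phase.
right_edge = comm.sendrecv(local[-1], dest=right, source=left)
left_edge = comm.sendrecv(local[0], dest=left, source=right)

# Each node then updates its interior with conventional sequential code.
local[0] = 0.5 * (left_edge + local[1])
local[-1] = 0.5 * (local[-2] + right_edge)
```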
Chapters 6 through 9 confirm these lessons with an extension to more irregular problems. Loosely synchronous problem classes are harder to parallelize, but still rest on the basic principles DD and MP. Chapter 7 describes a special class of applications, termed embarrassingly parallel, in which scaling parallelism is guaranteed by the independence of the separate components of the problem.
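A minimal sketch of the embarrassingly parallel class follows: because the components are independent, they can be farmed out to any number of nodes with no inter-task communication at all. The Monte Carlo integrand is only an illustrative stand-in for the applications of Chapter 7.

```python
# Embarrassingly parallel: independent tasks, no communication between them,
# so scaling parallelism is automatic.
import random
from multiprocessing import Pool

def sample_block(seed):
    """One independent component, e.g. a block of Monte Carlo samples."""
    rng = random.Random(seed)
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(100_000))
    return hits / 100_000

if __name__ == "__main__":
    with Pool() as pool:
        blocks = pool.map(sample_block, range(16))   # 16 independent tasks
    print("pi estimate:", 4.0 * sum(blocks) / len(blocks))
```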
Chapters 10 and 11 describe parallel computing tools developed within CP. DIME supports parallel mesh generation and adaptation, and its use in general finite element codes. Initially, we thought load balancing would be a major stumbling block for parallel computing because, formally, it is an NP-complete (intractable) optimization problem. However, effective heuristic methods were developed which avoid the exponential time complexity of NP-complete problems by searching for good, but not exact, minima.
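One such heuristic is sketched below under simplifying assumptions: orthogonal recursive bisection, which, instead of searching for the exact optimum of the NP-complete partitioning problem, recursively splits the points at median coordinates along alternating axes. This runs in polynomial time and yields well-balanced subdomains; it is a generic illustration of the idea, not the DIME code itself.

```python
# Heuristic load balancing by orthogonal recursive bisection: split the
# points at the median coordinate, alternating axes, so every subdomain
# receives nearly equal work without solving the NP-complete problem exactly.
def bisect_points(points, n_parts, axis=0):
    """Split `points` (list of (x, y) tuples) into n_parts balanced groups."""
    if n_parts == 1:
        return [points]
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    half = n_parts // 2
    nxt = 1 - axis                               # alternate splitting axis
    return (bisect_points(pts[:mid], half, nxt)
            + bisect_points(pts[mid:], n_parts - half, nxt))

parts = bisect_points([(i % 7, i % 5) for i in range(100)], 4)
print([len(p) for p in parts])                   # roughly equal work per node
```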
In Chapter 12, we describe the most complex irregular loosely synchronous problems, which include some of the hardest problems tackled in CP.
As described earlier, we implemented essentially all the applications described in this book using explicit, user-generated message passing. In Chapter 13, we describe our initial efforts to produce a higher-level data-parallel Fortran environment, which should provide a more attractive software environment for the user. High Performance Fortran has been adopted as an informal industry standard for this language.
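To show the contrast with explicit message passing, the fragment below writes one relaxation sweep in data-parallel style. HPF expresses this with Fortran 90 array syntax plus distribution directives; the NumPy analogue here is only meant to convey the style, since in HPF the compiler, not the user, would generate the node-to-node communication.

```python
# Data-parallel style: whole-array operations, no explicit messages.
# This NumPy fragment is an analogue of HPF array syntax, not HPF itself.
import numpy as np

u = np.zeros((64, 64))
u[0, :] = 1.0                                    # boundary condition

# One Jacobi relaxation sweep as a single array expression; a distributed
# version of this line is what generates, behind the scenes, the boundary
# exchanges that Chapters 4 through 9 write out by hand.
u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                        + u[1:-1, :-2] + u[1:-1, 2:])
```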
In Chapter 14, we describe the very difficult asynchronous problem class, for which scaling parallel algorithms and the correct software model are less clear. Chapters 15, 16, and 17 describe four software models, Zipcode, MOOSE, Time Warp, and MOVIE, which tackle asynchronous problems and the mixtures of asynchronous and loosely synchronous problems one finds in the complex system simulations and analyses typical of many real-world problems. Applications of this class are described in Chapter 18, with the application of Section 18.3 being an event-driven simulation, an important class of difficult-to-parallelize asynchronous applications.
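For readers unfamiliar with this class, here is a minimal event-driven simulation loop: a priority queue of timestamped events, where processing one event may schedule further events. The events themselves are invented for illustration; the irregular, data-dependent ordering is what makes the class hard to parallelize and motivates approaches such as Time Warp.

```python
# Minimal event-driven simulation: time advances by jumping between
# irregularly spaced events, so there is no global synchronization point.
import heapq

events = [(0.7, "B fires"), (0.2, "A fires"), (1.5, "C fires")]
heapq.heapify(events)                            # pending events by timestamp

clock = 0.0
while events:
    clock, what = heapq.heappop(events)          # next event anywhere in system
    print(f"t={clock:.1f}: {what}")
    if what == "A fires":                        # events may schedule new events
        heapq.heappush(events, (clock + 1.0, "A fires again"))
```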
In Chapter 19, we look to the future and describe some possibilities for the use of parallel computers in industry. Here we note that CP, and much of the national enterprise, concentrated on scientific and engineering computations; the examples and ``proof'' in this book that parallel computing works are focussed on such problems. However, scientific computation will not be the dominant industrial use of parallel computers, where information processing is most important. Parallel computers will be used for decision support in the military and large corporations, and to supply video, information, and simulation ``on demand'' for homes, schools, and other institutions. Such applications have recently been termed national challenges to distinguish them from the large-scale grand challenges which underpinned the initial HPCC initiative [FCCSET:94a]. The lessons CP and others have learnt from scientific computations will have general applicability across this wide range of industrial problems.
Chapter 20 includes a discussion of education in computational science, an unexpected byproduct of CP, and other retrospective remarks. The appendices list the CP reports, including those not cited directly in this book. Some information is available electronically by mailing citlib@caltech.edu.