Contents Foreword 1 Introduction 1.1 Introduction 1.2 The National Vision for ParallelComputation 1.3 Caltech Concurrent Computation Program 1.4 How Parallel Computing Works 2 Technical Backdrop 2.1 Introduction 2.2 Hardware Trends 2.2.1 Parallel Scientific Computers Before 1980 2.2.2 Early 1980s 2.2.3 Birth of the Hypercube 2.2.4 Mid-1980s 2.2.5 Late 1980s 2.2.6 Parallel Systems-1992 2.3 Software 2.3.1 Languages and Compilers 2.3.2 Tools 2.4 Summary 3 A Methodology for Computation 3.1 Introduction The Process of Computation and ComplexSystems Examples of Complex Systems and TheirSpace-Time Structure 3.4 The Temporal Properties of ComplexSystems 3.5 Spatial Properties of Complex Systems 3.6 Compound Complex Systems 3.7 Mapping Complex Systems 3.8 Parallel Computing Works? 4 Synchronous Applications I 4.1 QCD and the Beginning of CP 4.2 Synchronous Applications 4.3 Quantum Chromodynamics 4.3.1 Introduction 4.3.2 Monte Carlo 4.3.3 QCD 4.3.4 Lattice QCD 4.3.5 Concurrent QCD Machines 4.3.6 QCD on the Caltech Hypercubes 4.3.7 QCD on the Connection Machine 4.3.8 Status and Prospects 4.4 Spin Models 4.4.1 Introduction 4.4.2 Ising Model 4.4.3 Potts Model 4.4.4 XY Model 4.4.5 O(3) Model 4.5 An Automata Model of Granular Materials 4.5.1 Introduction 4.5.2 Comparison to Particle Dynamics Models 4.5.3 Comparison to Lattice Gas Models 4.5.4 The Rules for the Lattice Grain Model 4.5.5 Implementation on a Parallel Computer 4.5.6 Simulations 4.5.7 Conclusion Express and CrOS - Loosely Synchronous Message Passing 5.1 Multicomputer Operating Systems 5.2 A ``Packet'' History of Message-passingSystems 5.2.1 Prehistory 5.2.2 Application-driven Development 5.2.3 Collective Communication 5.2.4 Automated Decomposition-whoami 5.2.5 ``Melting''-a Non-crystalline Problem 5.2.6 The Mark III 5.2.7 Host Programs 5.2.8 A Ray Tracer-and an ``Operating System'' 5.2.9 The Crystal Router 5.2.10 Portability 5.2.11 Express 5.2.12 Other Message-passing Systems 5.2.13 What Did We Learn? 5.2.14 Conclusions 5.3 Parallel Debugging 5.3.1 Introduction and History 5.3.2 Designing a Parallel Debugger 5.3.3 Conclusions 5.4 Parallel Profiling 5.4.1 Missing a Point 5.4.2 Visualization 5.4.3 Goals in Performance Analysis 5.4.4 Overhead Analysis 5.4.5 Event Tracing 5.4.6 Data Distribution Analysis 5.4.7 CPU Usage Analysis 5.4.8 Why So Many Separate Tools? 5.4.9 Conclusions 6 Synchronous Applications II 6.1 Computational Issues in SynchronousProblems Convectively-Dominated Flows and theFlux-Corrected Transport Technique 6.2.1 An Overview of the FCT Technique 6.2.2 Mathematics and the FCT Algorithm 6.2.3 Parallel Issues 6.2.4 Example Problem 6.2.5 Performance and Results 6.2.6 Summary Magnetism in the High-TemperatureSuperconductor Materials 6.3.1 Introduction 6.3.2 The Computational Algorithm 6.3.3 Parallel Implementation and Performance 6.3.4 Physics Results 6.3.5 Conclusions Phase Transitions in Two-dimensionalQuantum Spin Systems 6.4.1 The case of : Antiferromagnetic Transitions Origin of the Interaction Simulation Results Theoretical Interpretation Comparison with Experiments The Case of : Quantum XY Model and theTopological Transition A Brief History Evidence for the Transition Implications A Hierarchical Scheme for SurfaceReconstruction and Discontinuity Detection 6.5.1 Multigrid Method with Discontinuities 6.5.2 Interacting Line Processes Generic Look-up Table and Specific Parametrization Pyramid on a Two-Dimensional Mesh of Processors 6.5.5 Results for Orientation Constraints 6.5.6 Results for Depth Constraints 6.5.7 Conclusions 6.6 Character Recognition by Neural Nets 6.6.1 MLP in General 6.6.2 Character Recognition using MLP 6.6.3 The Multiscale Technique 6.6.4 Results 6.6.5 Comments and Variants on the Method An Adaptive Multiscale Scheme for Real-Time Motion Field Estimation 6.7.1 Errors in Computing the Motion Field 6.7.2 Adaptive Multiscale Scheme on a Multicomputer 6.7.3 Conclusions 6.8 Collective Stereopsis 7 Independent Parallelism 7.1 Embarrassingly Parallel Problem Structure 7.2 Dynamically Triangulated Random Surfaces 7.2.1 Introduction 7.2.2 Discretized Strings 7.2.3 Computational Aspects 7.2.4 Performance of String Program 7.2.5 Conclusion 7.3 Numerical Study of High-T Spin Systems 7.4 Statistical Gravitational Lensing 7.5 Parallel Random Number Generators Parallel Computing in Neurobiology: The GENESIS Project 7.6.1 What Is Computational Neurobiology? 7.6.2 Parallel Computers? Problems with Most Present Day Parallel Computers 7.6.4 What is GENESIS? 7.6.5 Task Farming 7.6.6 Distributed Modelling via the Postmaster Element Full Matrix Algorithms and Their Applications 8.1 Full and Banded Matrix Algorithms 8.1.1 Matrix Decomposition 8.1.2 Basic Matrix Arithmetic 8.1.3 Matrix Multiplication for Banded Matrices 8.1.4 Systems of Linear Equations 8.1.5 The Gauss-Jordan Method 8.1.6 Other Matrix Algorithms 8.1.7 Concurrent Linear Algebra Libraries 8.1.8 Problem Structure 8.1.9 Conclusions Quantum Mechanical Reactive ScatteringUsing a High-Performance ParallelComputer 8.2.1 Introduction 8.2.2 Methodology 8.2.3 Parallel Algorithm 8.2.4 Results and Discussion Studies of Electron-Molecule Collisions onDistributed-Memory Parallel Computers 8.3.1 Introduction 8.3.2 The SMC Method and Its Implementation 8.3.3 Parallel Implementation 8.3.4 Performance Mark IIIfp Intel Machines 8.3.5 Selected Results 8.3.6 Conclusion 9 Loosely Synchronous Problems 9.1 Problem Structure Geomorphology by Micromechanical Simulations Plasma Particle-in-Cell Simulation of anElectron Beam Plasma Instability 9.3.1 Introduction 9.3.2 GCPIC Algorithm 9.3.3 Electron Beam Plasma Instability Performance Results for One-DimensionalElectrostatic Code 9.3.5 One-Dimensional Electromagnetic Code 9.3.6 Dynamic Load Balancing 9.3.7 Summary 9.4 Computational Electromagnetics LU Factorization of Sparse, Unsymmetric Jacobian Matrices 9.5.1 Introduction 9.5.2 Design Overview 9.5.3 Reduced-Communication Pivoting 9.5.4 New Data Distributions 9.5.5 Performance Versus Scattering 9.5.6 Performance Order 13040 Example Order 2500 Example 9.5.7 Conclusions Concurrent DASSL Applied to Dynamic Distillation Column Simulation 9.6.1 Introduction 9.6.2 Mathematical Formulation 9.6.3 proto-Cdyn - Simulation Layer Template Structure Problem Preformulation 9.6.4 Concurrent Formulation Overview Single Integration Step The Integration Computations Single Residuals Jacobian Computation Exploitation of Latency The LU Factorization Forward- and Back-solving Steps Residual Communication 9.6.5 Chemical Engineering Example 9.6.6 Conclusions 9.7 Adaptive Multigrid 9.7.1 Introduction 9.7.2 The Basic Algorithm 9.7.3 The Adaptive Algorithm 9.7.4 The Concurrent Algorithm 9.7.5 Summary 9.8 Munkres Algorithm for Assignment 9.8.1 Introduction 9.8.2 The Sequential Algorithm 9.8.3 The Concurrent Algorithm Optimization Methods for Neural Nets:Automatic Parameter Tuning and FasterConvergence 9.9.1 Deficiencies of Steepest Descent 9.9.2 The ``Bold Driver'' Network The Broyden-Fletcher-Goldfarb-Shanno One-StepMemoryless Quasi-Newton Method 9.9.4 Parallel Optimization 9.9.5 Experiment: the Dichotomy Problem 9.9.6 Experiment: Time Series Prediction 9.9.7 Summary 10 DIME Programming Environment DIME: Portable Software for IrregularMeshes for Parallel or SequentialComputers 10.1.1 Applications and Extensions 10.1.2 The Components of DIME 10.1.3 Domain Definition 10.1.4 Mesh Structure 10.1.5 Refinement 10.1.6 Load Balancing 10.1.7 Summary DIMEFEM: High-level Portable Irregular-Mesh Finite-Element Solver 10.2.1 Memory Allocation 10.2.2 Operations and Elements 10.2.3 Navier-Stokes Solver 10.2.4 Results 10.2.5 Summary 11 Load Balancing and Optimization 11.1 Load Balancing as an Optimization Problem 11.1.1 Load Balancing a Finite-Element Mesh 11.1.2 The Optimization Problem and Physical Analogy 11.1.3 Algorithms for Load Balancing 11.1.4 Simulated Annealing 11.1.5 Recursive Bisection 11.1.6 Eigenvalue Recursive Bisection 11.1.7 Testing Procedure 11.1.8 Test Results 11.1.9 Conclusions Applications and Extensions of the Physical Analogy 11.3 Physical Optimization An Improved Method for the Travelling Salesman Problem 11.4.1 Background on Local Search Heuristics Background on Markov Chains and SimulatedAnnealing 11.4.3 The New Algorithm-Large-Step Markov Chains 11.4.4 Results Irregular Loosely Synchronous Problems 12.1 Irregular Loosely Synchronous Problems Are Hard Simulation of the Electrosensory System of the Fish Gnathonemus petersii 12.2.1 Physical Model 12.2.2 Mathematical Theory 12.2.3 Results 12.2.4 Summary 12.3 Transonic Flow 12.3.1 Compressible Flow Algorithm 12.3.2 Adaptive Refinement 12.3.3 Examples 12.3.4 Performance 12.3.5 Summary 12.4 Tree Codes for N-body Simulations 12.4.1 Oct-Trees 12.4.2 Computing Forces 12.4.3 Parallelism in Tree Codes 12.4.4 Acquiring Locally Essential Data 12.4.5 Comments on Performance Fast Vortex Algorithm and ParallelComputing 12.5.1 Vortex Methods 12.5.2 Fast Algorithms 12.5.3 Hypercube Implementation 12.5.4 Efficiency of Parallel Implementation 12.5.5 Results 12.6 Cluster Algorithms for Spin Models 12.6.1 Monte Carlo Calculations of Spin Models 12.6.2 Cluster Algorithms 12.6.3 Parallel Cluster Algorithms 12.6.4 Self-labelling 12.6.5 Global Equivalencing 12.6.6 Other Algorithms 12.6.7 Summary 12.7 Sorting 12.7.1 The Merge Strategy 12.7.2 The Bitonic Algorithm 12.7.3 Shellsort or Diminishing Increment Algorithm 12.7.4 Quicksort or Samplesort Algorithm Hierarchical Tree-Structures as Adaptive Meshes 12.8.1 Introduction 12.8.2 Adaptive Structures 12.8.3 Tree as Grid 12.8.4 Conclusion 13 Data Parallel C and Fortran 13.1 High-Level Languages 13.1.1 High Performance Fortran Perspective Problem Architecture and Message-Passing Fortran 13.1.3 Problem Architecture and Fortran 77 A Software Tool for Data Partitioning and Distribution 13.2.1 Is Any Assistance Really Needed? 13.2.2 Overview of the Tool 13.2.3 Dependence-based Data Partitioning 13.2.4 Mapping Data to Processors Communication Analysis and Performance Improvement Transformations 13.2.6 Communication Analysis Algorithm 13.2.7 Static Performance Estimator 13.2.8 Conclusion 13.3 Fortran 90 Experiments 13.4 Optimizing Compilers by Neural Networks 13.5 ASPAR 13.5.1 Degrees of Difficulty 13.5.2 Various Parallelizing Technologies 13.5.3 The Local View 13.5.4 The ``Global'' View 13.5.5 Global Strategies 13.5.6 Dynamic Data Distribution 13.5.7 Conclusions 13.6 Coherent Parallel C 13.7 Hierarchical Memory 14 Asynchronous Applications Asynchronous Problems and a Summary of Basic Problem Classes 14.2 Melting in Two Dimensions 14.2.1 Problem Description 14.2.2 Solution Method 14.2.3 Concurrent Update Procedure 14.2.4 Performance Analysis 14.3 Computer Chess 14.3.1 Sequential Computer Chess The Evaluation Function Quiescence Searching Iterative Deepening The Hash Table The Opening The Endgame 14.3.2 Parallel Computer Chess: The Hardware 14.3.3 Parallel Alpha-Beta Pruning Analysis of Alpha-Beta Pruning Global Hash Table 14.3.4 Load Balancing 14.3.5 Speedup Measurements 14.3.6 Real-time Graphical Performance Monitoring 14.3.7 Speculation 14.3.8 Summary High-Level Asynchronous Software Systems 15.1 Asynchronous Software Paradigms MOOS II: An Operating System forDynamic Load Balancing on the iPSC/1 15.2.1 Design of MOOSE 15.2.2 Dynamic Load-Balancing Support 15.2.3 What We Learned 15.3 Time Warp 16 The Zipcode Message-Passing System 16.1 Overview of Zipcode 16.2 Low-Level Primitives 16.2.1 CE/RK Overview 16.2.2 Interface with the CE/RK system 16.2.3 CE Functions CE Programs 16.2.4 RK Calls 16.2.5 Zipcode Calls Zipcode Class-Independent Calls Mailer Creation Predefined Mailer Classes Y-Class Z-Class L-Class G1-Class G2-Class Letter-Generating Primitives Letter-Consuming Primitives G3-Class 16.3 High-Level Primitives 16.3.1 Invoices 16.3.2 Packing and Unpacking 16.3.3 The Packed-Message Functions 16.3.4 Fortran Interface 16.4 Details of Execution 16.4.1 Initialization/Termination 16.4.2 Process Creation/Destruction 16.5 Conclusions MOVIE - Multitasking Object-oriented Visual Interactive Environment 17.1 Introduction 17.1.1 The Beginning 17.1.2 Towards the MOVIE System 17.1.3 Current Status and Outlook 17.2 System Overview 17.2.1 The MOVIE System in a Nutshell 17.2.2 MovieScript as Virtual Machine Language 17.2.3 Data-Parallel Computing 17.2.4 Model for MIMD-parallelism 17.2.5 Distributed Computing 17.2.6 Object Orientation 17.2.7 Integrated Visualization Model DPS/NeWS X/Motif/OpenLook AVS/Explorer 3D MOVIE Integration 17.2.8 ``In Large'' Extensibility Model 17.2.9 CASE Tools MetaDictionary C Language Naming Conventions MetaIndex Makefile Model Documentation Model 17.2.10 Planned MOVIE Applications Machine Vision Neural Networks Databases Global Change High Energy Physics Data Analysis Expert Systems Command and Control Virtual Reality 17.3 Map Separates 17.3.1 Problem Specification 17.3.2 Test Case 17.3.3 Segmentation via RGB Clustering 17.3.4 Comparison with JPL Neural Net Results 17.3.5 Edge Detection via Zero Crossing 17.3.6 Towards the Map Expert System 17.3.7 Summary The Ultimate User Interface: VirtualReality 17.4.1 Overall Assessment 17.4.2 Markets and Application Areas 17.4.3 VR at Syracuse University 17.4.4 MOVIE as VR Operating Shell Complex System Simulation and Analysis 18.1 MetaProblems and MetaSoftware 18.1.1 Applications 18.1.2 Asynchronous versus Loosely Synchronous? 18.1.3 Software for Compound Problems ISIS: An Interactive Seismic ImagingSystem 18.2.1 Introduction 18.2.2 Concepts of Interactive Imaging 18.2.3 Geologist-As-Analyst 18.2.4 Why Interactive Imaging? 18.2.5 System Design 18.2.6 Performance Considerations 18.2.7 Trace Manager 18.2.8 Display Manager 18.2.9 User Interface 18.2.10 Computation 18.2.11 Prototype System 18.2.12 Conclusions 18.3 Parallel Simulations that Emulate Function 18.3.1 The Basic Simulation Structure 18.3.2 The Run-Time Environment-the Centaur Operating System 18.3.3 SDI Simulation Evolution Simulation Framework and Synchronization Control 18.4 Multitarget Tracking 18.4.1 Nature of the Problem 18.4.2 Tracking Techniques Single-Target Tracking Multitarget Tracking 18.4.3 Algorithm Overview 18.4.4 Two-dimensional Mono Tracking Two-dimensional Track Extensions Two-dimensional Report Formation Track Initialization 18.4.5 Three-dimensional Tracking Track Extension Associations 19 Parallel Computing in Industry 19.1 Motivation 19.2 Examples of Industrial Applications 20 Computational Science 20.1 Lessons 20.2 Computational Science Selected Biographic Information References Index About this document ...