Contents
Foreword
1 Introduction
1.1 Introduction
1.2 The National Vision for Parallel Computation
1.3 Caltech Concurrent Computation Program
1.4 How Parallel Computing Works
2 Technical Backdrop
2.1 Introduction
2.2 Hardware Trends
2.2.1 Parallel Scientific Computers Before 1980
2.2.2 Early 1980s
2.2.3 Birth of the Hypercube
2.2.4 Mid-1980s
2.2.5 Late 1980s
2.2.6 Parallel Systems-1992
2.3 Software
2.3.1 Languages and Compilers
2.3.2 Tools
2.4 Summary
3 A Methodology for Computation
3.1 Introduction
3.2 The Process of Computation and Complex Systems
3.3 Examples of Complex Systems and Their Space-Time Structure
3.4 The Temporal Properties of Complex Systems
3.5 Spatial Properties of Complex Systems
3.6 Compound Complex Systems
3.7 Mapping Complex Systems
3.8 Parallel Computing Works?
4 Synchronous Applications I
4.1 QCD and the Beginning of C3P
4.2 Synchronous Applications
4.3 Quantum Chromodynamics
4.3.1 Introduction
4.3.2 Monte Carlo
4.3.3 QCD
4.3.4 Lattice QCD
4.3.5 Concurrent QCD Machines
4.3.6 QCD on the Caltech Hypercubes
4.3.7 QCD on the Connection Machine
4.3.8 Status and Prospects
4.4 Spin Models
4.4.1 Introduction
4.4.2 Ising Model
4.4.3 Potts Model
4.4.4 XY Model
4.4.5 O(3) Model
4.5 An Automata Model of Granular Materials
4.5.1 Introduction
4.5.2 Comparison to Particle Dynamics Models
4.5.3 Comparison to Lattice Gas Models
4.5.4 The Rules for the Lattice Grain Model
4.5.5 Implementation on a Parallel Computer
4.5.6 Simulations
4.5.7 Conclusion
5 Express and CrOS - Loosely Synchronous Message Passing
5.1 Multicomputer Operating Systems
5.2 A ``Packet'' History of Message-passing Systems
5.2.1 Prehistory
5.2.2 Application-driven Development
5.2.3 Collective Communication
5.2.4 Automated Decomposition-whoami
5.2.5 ``Melting''-a Non-crystalline Problem
5.2.6 The Mark III
5.2.7 Host Programs
5.2.8 A Ray Tracer-and an ``Operating System''
5.2.9 The Crystal Router
5.2.10 Portability
5.2.11 Express
5.2.12 Other Message-passing Systems
5.2.13 What Did We Learn?
5.2.14 Conclusions
5.3 Parallel Debugging
5.3.1 Introduction and History
5.3.2 Designing a Parallel Debugger
5.3.3 Conclusions
5.4 Parallel Profiling
5.4.1 Missing a Point
5.4.2 Visualization
5.4.3 Goals in Performance Analysis
5.4.4 Overhead Analysis
5.4.5 Event Tracing
5.4.6 Data Distribution Analysis
5.4.7 CPU Usage Analysis
5.4.8 Why So Many Separate Tools?
5.4.9 Conclusions
6 Synchronous Applications II
6.1 Computational Issues in Synchronous Problems
6.2 Convectively-Dominated Flows and the Flux-Corrected Transport Technique
6.2.1 An Overview of the FCT Technique
6.2.2 Mathematics and the FCT Algorithm
6.2.3 Parallel Issues
6.2.4 Example Problem
6.2.5 Performance and Results
6.2.6 Summary
6.3 Magnetism in the High-Temperature Superconductor Materials
6.3.1 Introduction
6.3.2 The Computational Algorithm
6.3.3 Parallel Implementation and Performance
6.3.4 Physics Results
6.3.5 Conclusions
6.4 Phase Transitions in Two-dimensional Quantum Spin Systems
6.4.1 The case of : Antiferromagnetic Transitions
Origin of the Interaction
Simulation Results
Theoretical Interpretation
Comparison with Experiments
6.4.2 The Case of : Quantum XY Model and the Topological Transition
A Brief History
Evidence for the Transition
Implications
6.5 A Hierarchical Scheme for Surface Reconstruction and Discontinuity Detection
6.5.1 Multigrid Method with Discontinuities
6.5.2 Interacting Line Processes
6.5.3 Generic Look-up Table and Specific Parametrization
6.5.4 Pyramid on a Two-Dimensional Mesh of Processors
6.5.5 Results for Orientation Constraints
6.5.6 Results for Depth Constraints
6.5.7 Conclusions
6.6 Character Recognition by Neural Nets
6.6.1 MLP in General
6.6.2 Character Recognition using MLP
6.6.3 The Multiscale Technique
6.6.4 Results
6.6.5 Comments and Variants on the Method
6.7 An Adaptive Multiscale Scheme for Real-Time Motion Field Estimation
6.7.1 Errors in Computing the Motion Field
6.7.2 Adaptive Multiscale Scheme on a Multicomputer
6.7.3 Conclusions
6.8 Collective Stereopsis
7 Independent Parallelism
7.1 Embarrassingly Parallel Problem Structure
7.2 Dynamically Triangulated Random Surfaces
7.2.1 Introduction
7.2.2 Discretized Strings
7.2.3 Computational Aspects
7.2.4 Performance of String Program
7.2.5 Conclusion
7.3 Numerical Study of High-Tc Spin Systems
7.4 Statistical Gravitational Lensing
7.5 Parallel Random Number Generators
7.6 Parallel Computing in Neurobiology: The GENESIS Project
7.6.1 What Is Computational Neurobiology?
7.6.2 Parallel Computers?
7.6.3 Problems with Most Present Day Parallel Computers
7.6.4 What is GENESIS?
7.6.5 Task Farming
7.6.6 Distributed Modelling via the Postmaster Element
8 Full Matrix Algorithms and Their Applications
8.1 Full and Banded Matrix Algorithms
8.1.1 Matrix Decomposition
8.1.2 Basic Matrix Arithmetic
8.1.3 Matrix Multiplication for Banded Matrices
8.1.4 Systems of Linear Equations
8.1.5 The Gauss-Jordan Method
8.1.6 Other Matrix Algorithms
8.1.7 Concurrent Linear Algebra Libraries
8.1.8 Problem Structure
8.1.9 Conclusions
8.2 Quantum Mechanical Reactive Scattering Using a High-Performance Parallel Computer
8.2.1 Introduction
8.2.2 Methodology
8.2.3 Parallel Algorithm
8.2.4 Results and Discussion
8.3 Studies of Electron-Molecule Collisions on Distributed-Memory Parallel Computers
8.3.1 Introduction
8.3.2 The SMC Method and Its Implementation
8.3.3 Parallel Implementation
8.3.4 Performance
Mark IIIfp
Intel Machines
8.3.5 Selected Results
8.3.6 Conclusion
9 Loosely Synchronous Problems
9.1 Problem Structure
9.2 Geomorphology by Micromechanical Simulations
9.3 Plasma Particle-in-Cell Simulation of an Electron Beam Plasma Instability
9.3.1 Introduction
9.3.2 GCPIC Algorithm
9.3.3 Electron Beam Plasma Instability
9.3.4 Performance Results for One-Dimensional Electrostatic Code
9.3.5 One-Dimensional Electromagnetic Code
9.3.6 Dynamic Load Balancing
9.3.7 Summary
9.4 Computational Electromagnetics
9.5 LU Factorization of Sparse, Unsymmetric Jacobian Matrices
9.5.1 Introduction
9.5.2 Design Overview
9.5.3 Reduced-Communication Pivoting
9.5.4 New Data Distributions
9.5.5 Performance Versus Scattering
9.5.6 Performance
Order 13040 Example
Order 2500 Example
9.5.7 Conclusions
9.6 Concurrent DASSL Applied to Dynamic Distillation Column Simulation
9.6.1 Introduction
9.6.2 Mathematical Formulation
9.6.3 proto-Cdyn - Simulation Layer
Template Structure
Problem Preformulation
9.6.4 Concurrent Formulation
Overview
Single Integration Step
The Integration Computations
Single Residuals
Jacobian Computation
Exploitation of Latency
The LU Factorization
Forward- and Back-solving Steps
Residual Communication
9.6.5 Chemical Engineering Example
9.6.6 Conclusions
9.7 Adaptive Multigrid
9.7.1 Introduction
9.7.2 The Basic Algorithm
9.7.3 The Adaptive Algorithm
9.7.4 The Concurrent Algorithm
9.7.5 Summary
9.8 Munkres Algorithm for Assignment
9.8.1 Introduction
9.8.2 The Sequential Algorithm
9.8.3 The Concurrent Algorithm
9.9 Optimization Methods for Neural Nets: Automatic Parameter Tuning and Faster Convergence
9.9.1 Deficiencies of Steepest Descent
9.9.2 The ``Bold Driver'' Network
9.9.3 The Broyden-Fletcher-Goldfarb-Shanno One-Step Memoryless Quasi-Newton Method
9.9.4 Parallel Optimization
9.9.5 Experiment: the Dichotomy Problem
9.9.6 Experiment: Time Series Prediction
9.9.7 Summary
10 DIME Programming Environment
10.1 DIME: Portable Software for Irregular Meshes for Parallel or Sequential Computers
10.1.1 Applications and Extensions
10.1.2 The Components of DIME
10.1.3 Domain Definition
10.1.4 Mesh Structure
10.1.5 Refinement
10.1.6 Load Balancing
10.1.7 Summary
10.2 DIMEFEM: High-level Portable Irregular-Mesh Finite-Element Solver
10.2.1 Memory Allocation
10.2.2 Operations and Elements
10.2.3 Navier-Stokes Solver
10.2.4 Results
10.2.5 Summary
11 Load Balancing and Optimization
11.1 Load Balancing as an Optimization Problem
11.1.1 Load Balancing a Finite-Element Mesh
11.1.2 The Optimization Problem and Physical Analogy
11.1.3 Algorithms for Load Balancing
11.1.4 Simulated Annealing
11.1.5 Recursive Bisection
11.1.6 Eigenvalue Recursive Bisection
11.1.7 Testing Procedure
11.1.8 Test Results
11.1.9 Conclusions
11.2 Applications and Extensions of the Physical Analogy
11.3 Physical Optimization
11.4 An Improved Method for the Travelling Salesman Problem
11.4.1 Background on Local Search Heuristics
11.4.2 Background on Markov Chains and Simulated Annealing
11.4.3 The New Algorithm-Large-Step Markov Chains
11.4.4 Results
12 Irregular Loosely Synchronous Problems
12.1 Irregular Loosely Synchronous Problems Are Hard
12.2 Simulation of the Electrosensory System of the Fish Gnathonemus petersii
12.2.1 Physical Model
12.2.2 Mathematical Theory
12.2.3 Results
12.2.4 Summary
12.3 Transonic Flow
12.3.1 Compressible Flow Algorithm
12.3.2 Adaptive Refinement
12.3.3 Examples
12.3.4 Performance
12.3.5 Summary
12.4 Tree Codes for N-body Simulations
12.4.1 Oct-Trees
12.4.2 Computing Forces
12.4.3 Parallelism in Tree Codes
12.4.4 Acquiring Locally Essential Data
12.4.5 Comments on Performance
12.5 Fast Vortex Algorithm and Parallel Computing
12.5.1 Vortex Methods
12.5.2 Fast Algorithms
12.5.3 Hypercube Implementation
12.5.4 Efficiency of Parallel Implementation
12.5.5 Results
12.6 Cluster Algorithms for Spin Models
12.6.1 Monte Carlo Calculations of Spin Models
12.6.2 Cluster Algorithms
12.6.3 Parallel Cluster Algorithms
12.6.4 Self-labelling
12.6.5 Global Equivalencing
12.6.6 Other Algorithms
12.6.7 Summary
12.7 Sorting
12.7.1 The Merge Strategy
12.7.2 The Bitonic Algorithm
12.7.3 Shellsort or Diminishing Increment Algorithm
12.7.4 Quicksort or Samplesort Algorithm
12.8 Hierarchical Tree-Structures as Adaptive Meshes
12.8.1 Introduction
12.8.2 Adaptive Structures
12.8.3 Tree as Grid
12.8.4 Conclusion
13 Data Parallel C and Fortran
13.1 High-Level Languages
13.1.1 High Performance Fortran Perspective
13.1.2 Problem Architecture and Message-Passing Fortran
13.1.3 Problem Architecture and Fortran 77
13.2 A Software Tool for Data Partitioning and Distribution
13.2.1 Is Any Assistance Really Needed?
13.2.2 Overview of the Tool
13.2.3 Dependence-based Data Partitioning
13.2.4 Mapping Data to Processors
13.2.5 Communication Analysis and Performance Improvement Transformations
13.2.6 Communication Analysis Algorithm
13.2.7 Static Performance Estimator
13.2.8 Conclusion
13.3 Fortran 90 Experiments
13.4 Optimizing Compilers by Neural Networks
13.5 ASPAR
13.5.1 Degrees of Difficulty
13.5.2 Various Parallelizing Technologies
13.5.3 The Local View
13.5.4 The ``Global'' View
13.5.5 Global Strategies
13.5.6 Dynamic Data Distribution
13.5.7 Conclusions
13.6 Coherent Parallel C
13.7 Hierarchical Memory
14 Asynchronous Applications
14.1 Asynchronous Problems and a Summary of Basic Problem Classes
14.2 Melting in Two Dimensions
14.2.1 Problem Description
14.2.2 Solution Method
14.2.3 Concurrent Update Procedure
14.2.4 Performance Analysis
14.3 Computer Chess
14.3.1 Sequential Computer Chess
The Evaluation Function
Quiescence Searching
Iterative Deepening
The Hash Table
The Opening
The Endgame
14.3.2 Parallel Computer Chess: The Hardware
14.3.3 Parallel Alpha-Beta Pruning
Analysis of Alpha-Beta Pruning
Global Hash Table
14.3.4 Load Balancing
14.3.5 Speedup Measurements
14.3.6 Real-time Graphical Performance Monitoring
14.3.7 Speculation
14.3.8 Summary
15 High-Level Asynchronous Software Systems
15.1 Asynchronous Software Paradigms
15.2 MOOS II: An Operating System for Dynamic Load Balancing on the iPSC/1
15.2.1 Design of MOOSE
15.2.2 Dynamic Load-Balancing Support
15.2.3 What We Learned
15.3 Time Warp
16 The Zipcode Message-Passing System
16.1 Overview of Zipcode
16.2 Low-Level Primitives
16.2.1 CE/RK Overview
16.2.2 Interface with the CE/RK system
16.2.3 CE Functions
CE Programs
16.2.4 RK Calls
16.2.5 Zipcode Calls
Zipcode Class-Independent Calls
Mailer Creation
Predefined Mailer Classes
Y-Class
Z-Class
L-Class
G1-Class
G2-Class
Letter-Generating Primitives
Letter-Consuming Primitives
G3-Class
16.3 High-Level Primitives
16.3.1 Invoices
16.3.2 Packing and Unpacking
16.3.3 The Packed-Message Functions
16.3.4 Fortran Interface
16.4 Details of Execution
16.4.1 Initialization/Termination
16.4.2 Process Creation/Destruction
16.5 Conclusions
17 MOVIE - Multitasking Object-oriented Visual Interactive Environment
17.1 Introduction
17.1.1 The Beginning
17.1.2 Towards the MOVIE System
17.1.3 Current Status and Outlook
17.2 System Overview
17.2.1 The MOVIE System in a Nutshell
17.2.2 MovieScript as Virtual Machine Language
17.2.3 Data-Parallel Computing
17.2.4 Model for MIMD-parallelism
17.2.5 Distributed Computing
17.2.6 Object Orientation
17.2.7 Integrated Visualization Model
DPS/NeWS
X/Motif/OpenLook
AVS/Explorer
3D MOVIE
Integration
17.2.8 ``In Large'' Extensibility Model
17.2.9 CASE Tools
MetaDictionary
C Language Naming Conventions
MetaIndex
Makefile Model
Documentation Model
17.2.10 Planned MOVIE Applications
Machine Vision
Neural Networks
Databases
Global Change
High Energy Physics Data Analysis
Expert Systems
Command and Control
Virtual Reality
17.3 Map Separates
17.3.1 Problem Specification
17.3.2 Test Case
17.3.3 Segmentation via RGB Clustering
17.3.4 Comparison with JPL Neural Net Results
17.3.5 Edge Detection via Zero Crossing
17.3.6 Towards the Map Expert System
17.3.7 Summary
17.4 The Ultimate User Interface: Virtual Reality
17.4.1 Overall Assessment
17.4.2 Markets and Application Areas
17.4.3 VR at Syracuse University
17.4.4 MOVIE as VR Operating Shell
18 Complex System Simulation and Analysis
18.1 MetaProblems and MetaSoftware
18.1.1 Applications
18.1.2 Asynchronous versus Loosely Synchronous?
18.1.3 Software for Compound Problems
18.2 ISIS: An Interactive Seismic Imaging System
18.2.1 Introduction
18.2.2 Concepts of Interactive Imaging
18.2.3 Geologist-As-Analyst
18.2.4 Why Interactive Imaging?
18.2.5 System Design
18.2.6 Performance Considerations
18.2.7 Trace Manager
18.2.8 Display Manager
18.2.9 User Interface
18.2.10 Computation
18.2.11 Prototype System
18.2.12 Conclusions
18.3 Parallel Simulations that Emulate Function
18.3.1 The Basic Simulation Structure
18.3.2 The Run-Time Environment-the Centaur Operating System
18.3.3 SDI Simulation Evolution
18.3.4 Simulation Framework and Synchronization Control
18.4 Multitarget Tracking
18.4.1 Nature of the Problem
18.4.2 Tracking Techniques
Single-Target Tracking
Multitarget Tracking
18.4.3 Algorithm Overview
18.4.4 Two-dimensional Mono Tracking
Two-dimensional Track Extensions
Two-dimensional Report Formation
Track Initialization
18.4.5 Three-dimensional Tracking
Track Extension Associations
19 Parallel Computing in Industry
19.1 Motivation
19.2 Examples of Industrial Applications
20 Computational Science
20.1 Lessons
20.2 Computational Science
A Selected Biographic Information
B References
Index