It is possible to generate a huge amount of data related to parallel performance. One can vary the size of the problem and/or the number of processors. Performance can also be related to various problem parameters, such as the nonuniformity of the particle distribution. Parallel overheads can be identified and attributed to communication, load imbalance, synchronization, or additional calculation in the parallel code [Salmon:90a]. All these provide useful diagnostics and can be used to predict performance on a variety of machines. However, they also tend to obscure the fact that the ultimate goal of parallel computation is to perform simulations larger or faster than would otherwise be possible.
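As a sketch of how such overheads are commonly summarized (the symbols below are introduced here for illustration and are not measurements from the runs described next), one can lump them into a single fractional overhead. Writing $T_1$ for the best sequential time and $T_P$ for the time on $P$ processors, the overhead $f$ and the resulting parallel efficiency $\varepsilon$ are
\[
f = \frac{P\,T_P - T_1}{T_1}, \qquad
\varepsilon = \frac{T_1}{P\,T_P} = \frac{1}{1+f},
\]
where $f$ collects the contributions of communication, load imbalance, synchronization, and any extra arithmetic performed by the parallel code.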
Rather than analyze a large number of statistics, we restrict ourselves to the following ``bald'' facts.
In 1992, the 512-processor Intel Delta at Caltech evolved two astrophysical simulations with 17.15 million bodies for approximately 600 time steps. The machine ran at an aggregate speed exceeding 5000 MFLOPS. The systems under study were simulated regions of the universe 100 Mpc (megaparsecs) and 25 Mpc in diameter, which were initialized with random density fluctuations consistent with the ``cold dark matter'' hypothesis and the recent results on the anisotropy of the microwave background radiation. The data from these runs exceeded 25 Gbytes and are analyzed in [Zurek:93a]. Salmon and Warren were recipients of the 1992 Gordon Bell Prize for performance in practical parallel processing research.