An important performance metric is
*parallel efficiency*.
Parallel efficiency, E(N, P), for a problem of size N on P
processors is defined in the usual way [13] as

E(N, P) = T_seq(N) / (P * T(N, P)),

where T(N, P) is the runtime of the parallel algorithm, and
T_seq(N) is the runtime of the best sequential algorithm.
An implementation is said to be *scalable* if the efficiency is
an increasing function of N/P, the problem size per processor
(in the case of dense matrix computations, N = n^2, the number
of words in the input).
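As a concrete illustration of the definition above, the following sketch computes E(N, P) from measured runtimes; the function name and the example timings are hypothetical, not from the paper.

```python
def parallel_efficiency(t_seq, t_par, p):
    """Parallel efficiency E(N, P) = T_seq(N) / (P * T(N, P)).

    t_seq : runtime of the best sequential algorithm, T_seq(N)
    t_par : runtime of the parallel algorithm on p processors, T(N, P)
    p     : number of processors, P
    """
    return t_seq / (p * t_par)

# Hypothetical example: a run that takes 2.0 s on 64 processors,
# versus 100.0 s for the best sequential algorithm:
#   E = 100.0 / (64 * 2.0) = 0.78125
```

Perfect speedup corresponds to E = 1; an efficiency that rises as N/P grows is what the scalability criterion above asks for.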

We will also measure the *performance* of our algorithm in
Megaflops/sec (or Gigaflops/sec). This is appropriate for large
dense linear algebra computations, since floating point dominates
communication. For a scalable algorithm with N/P held fixed,
we expect the performance to be proportional to P.
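Such a performance figure is obtained by dividing an operation count by the measured runtime. The sketch below assumes the standard (2/3)n^3 flop count for dense LU factorization; the function names and the example problem size are illustrative, not taken from the paper.

```python
def lu_flops(n):
    """Approximate flop count, (2/3) * n^3, for LU factorization
    of a dense n x n matrix (standard leading-order count)."""
    return (2.0 / 3.0) * n ** 3

def gflops_rate(flop_count, seconds):
    """Performance in Gigaflops/sec: flops executed per second / 1e9."""
    return flop_count / seconds / 1e9

# Hypothetical example: factoring a 4000 x 4000 matrix in 10 s gives
#   (2/3) * 4000**3 / 10 / 1e9  ~=  4.27 Gigaflops/sec
```

If the algorithm is scalable and N/P is held fixed, this rate should grow roughly linearly with the number of processors P.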

We seek to increase the performance of our algorithms by reducing the overhead due to load imbalance and data movement, and by restructuring the algorithms themselves. How the data are distributed over the memory hierarchy of a computer is of fundamental importance to all three of these factors. In this section we present extensive performance results on various platforms for the ScaLAPACK factorization and reduction routines. Performance data for the symmetric eigensolver (PDSYEVX) are presented in [5].
