ScaLAPACK is designed to give high efficiency on MIMD distributed memory concurrent supercomputers, such as the Intel Paragon, IBM SP series, and the Cray T3 series. In addition, the software is designed so that it can be used with clusters of workstations through a networked environment and with a heterogeneous computing environment via PVM or MPI. Indeed, ScaLAPACK can run on any machine that supports either PVM or MPI. See Chapter 5 for some examples of the performance achieved by ScaLAPACK routines.

The ScaLAPACK strategy for combining efficiency with portability is to construct the software so that as much as possible of the computation is performed by calls to the Parallel Basic Linear Algebra Subprograms (PBLAS). The PBLAS [26, 104] perform global computation by relying on the Basic Linear Algebra Subprograms (BLAS) [93, 59, 57] for local computation and the Basic Linear Algebra Communication Subprograms (BLACS) [54, 113] for communication.

The efficiency of ScaLAPACK software depends on the use of block-partitioned algorithms and on efficient implementations of the BLAS and the BLACS being provided by computer vendors (and others) for their machines. Thus, the BLAS and the BLACS form a low-level interface between ScaLAPACK software and different machine architectures. Above this level, all of the ScaLAPACK software is portable.
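To make the layering concrete, the following sketch (not taken from the Guide; the 2 x 2 process grid, the 4 x 4 matrix size, and the 2 x 2 blocking factor are arbitrary illustrative choices) shows the typical calling sequence for a PBLAS routine: the BLACS set up the process grid and carry the communication, while the local computation on each process is delegated to the BLAS.

```
      PROGRAM PBSKCH
      INTEGER            ICTXT, NPROW, NPCOL, MYROW, MYCOL, INFO
      INTEGER            DESCA( 9 ), DESCB( 9 ), DESCC( 9 )
      DOUBLE PRECISION   A( 2, 2 ), B( 2, 2 ), C( 2, 2 )
*     Join the BLACS and set up a 2 x 2 process grid
*     (the communication layer).
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', 2, 2 )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*     Build array descriptors for 4 x 4 matrices distributed in
*     2 x 2 blocks; the argument before INFO is the local leading
*     dimension of the array holding this process's blocks.
      CALL DESCINIT( DESCA, 4, 4, 2, 2, 0, 0, ICTXT, 2, INFO )
      CALL DESCINIT( DESCB, 4, 4, 2, 2, 0, 0, ICTXT, 2, INFO )
      CALL DESCINIT( DESCC, 4, 4, 2, 2, 0, 0, ICTXT, 2, INFO )
*     ... each process fills its local pieces of A and B here ...
*     Global multiply C := A*B: PDGEMM performs local computation
*     through the BLAS (DGEMM) and communication through the BLACS.
      CALL PDGEMM( 'N', 'N', 4, 4, 4, 1.0D0, A, 1, 1, DESCA,
     $             B, 1, 1, DESCB, 0.0D0, C, 1, 1, DESCC )
*     Release the grid and exit the BLACS.
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
      END
```

Only the two BLACS grid calls and the array descriptors expose the distributed-memory machinery; the PDGEMM call itself mirrors the sequential BLAS DGEMM interface, which is what makes the software above this layer portable.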

The BLAS, the PBLAS, and the BLACS are not, strictly speaking, part of
ScaLAPACK, although C source code for the PBLAS is included in the
ScaLAPACK distribution. Since the performance of the package depends
upon the BLAS and the BLACS being implemented efficiently, we have not
included those libraries in the ScaLAPACK distribution; a
machine-specific implementation of the BLAS and the BLACS should be
used. If a machine-optimized version of the BLAS is not available, a
Fortran 77 reference implementation of the BLAS is available from
*netlib* (see section 1.5). This code constitutes the ``model
implementation'' [58, 56]. The model implementation of the BLAS is not
expected to perform as well as a specially tuned implementation on
most high-performance computers -- on some machines it may give *much*
worse performance -- but it allows users to run ScaLAPACK codes on
machines that do not offer any other implementation of the BLAS.

If a vendor-optimized version of the BLACS is not available for a
specific architecture, efficiently ported versions of the BLACS are
available on *netlib*. Currently, the BLACS have been efficiently
ported to machine-specific message-passing libraries such as the IBM
(MPL) and Intel (NX) message-passing libraries, as well as to more
generic interfaces such as PVM and MPI. The overhead incurred by the
BLACS has been shown to be negligible [54].
For further details, refer to the *blacs* directory on *netlib*:

http://www.netlib.org/blacs/index.html
