The ScaLAPACK software library, scheduled for completion by the end of 1994, will extend the LAPACK library to run scalably on MIMD, distributed memory, concurrent computers [10, 11]. For such machines the memory hierarchy includes the off-processor memory of other processors, in addition to the hierarchy of registers, cache, and local memory on each processor. Like LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. The fundamental building blocks of the ScaLAPACK library are distributed memory versions of the Level 2 and Level 3 BLAS, and a set of Basic Linear Algebra Communication Subprograms (BLACS) [16, 25] for communication tasks that arise frequently in parallel linear algebra computations. In the ScaLAPACK routines, all interprocessor communication occurs within the distributed BLAS and the BLACS, so the source code of the top software layer of ScaLAPACK looks very similar to that of LAPACK.
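
The block-partitioned idea can be illustrated on a single processor with a minimal sketch in plain Python. This is not ScaLAPACK code: the function name `blocked_matmul` and the block-size parameter `nb` are hypothetical, chosen only for illustration. The three outer loops step over blocks, so each triple of blocks is reused many times before the computation moves on, which is the property that reduces traffic between levels of the memory hierarchy.

```python
def blocked_matmul(a, b, nb):
    """Multiply a (m x k) by b (k x n), working on nb x nb blocks.

    Illustrative only: a and b are lists of lists, and nb is an
    assumed block size, not a parameter of any real library routine.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Outer loops select one block of C, one of A, and one of B.
    for i0 in range(0, m, nb):
        for j0 in range(0, n, nb):
            for p0 in range(0, k, nb):
                # Inner loops touch only entries inside the selected blocks;
                # min() handles edge blocks when nb does not divide the size.
                for i in range(i0, min(i0 + nb, m)):
                    for p in range(p0, min(p0 + nb, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + nb, n)):
                            c[i][j] += a_ip * b[p][j]
    return c
```

In a distributed-memory setting the same loop structure applies, except that some of the blocks reside in the memory of other processors and must be communicated rather than simply loaded.
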
We envisage a number of user interfaces to ScaLAPACK. Initially, the interface
will be similar to that of LAPACK, with some additional arguments passed to
each routine to specify the data layout. Once this is in place, we intend
to modify the interface so that the arguments to each ScaLAPACK routine are
the same as in LAPACK. This will require hiding information about the data
distribution of each matrix and vector from the user.
We are also experimenting with object-based interfaces
for LAPACK and ScaLAPACK, with the goal of developing interfaces
compatible with Fortran 90 [10] and C++ [23].
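
The two interface stages described above can be contrasted in a short sketch. Everything here is an assumption made for illustration: the `Layout` and `DistMatrix` classes, their fields, and the `diagonal_owners` functions are hypothetical and are not part of ScaLAPACK. The sketch also assumes a two-dimensional block-cyclic mapping of blocks onto a process grid as the data layout. The first function takes the layout as an extra argument; the second takes only a matrix handle that carries its layout, so its argument list contains no layout information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout descriptor: block sizes and process grid shape."""
    mb: int          # rows per block
    nb: int          # columns per block
    grid_rows: int   # process grid height
    grid_cols: int   # process grid width

    def owner(self, i, j):
        """Grid coordinates of the process holding global entry (i, j),
        assuming a block-cyclic mapping that starts at process (0, 0)."""
        return ((i // self.mb) % self.grid_rows,
                (j // self.nb) % self.grid_cols)


@dataclass
class DistMatrix:
    """Hypothetical matrix handle that bundles dimensions with a layout."""
    rows: int
    cols: int
    layout: Layout


def diagonal_owners_explicit(n, layout):
    """First-stage style: the layout is an additional argument."""
    return [layout.owner(i, i) for i in range(n)]


def diagonal_owners(a):
    """Second-stage style: the layout is recovered from the handle."""
    return diagonal_owners_explicit(min(a.rows, a.cols), a.layout)
```

For example, with `a = DistMatrix(6, 6, Layout(2, 2, 2, 2))`, the call `diagonal_owners(a)` needs no layout arguments, while `diagonal_owners_explicit(6, a.layout)` returns the same list with the layout passed by hand.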