Following the initial release of LAPACK and the emerging importance of distributed-memory computing, work began on adapting LAPACK to distributed-memory architectures. Since porting software efficiently from one distributed-memory architecture to another is a challenging task, this work is an effort to establish standards for library development in the varied world of distributed-memory computing.
ScaLAPACK is an acronym for Scalable Linear Algebra PACKage, or Scalable LAPACK. As in LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. (For distributed-memory machines, the memory hierarchy includes the off-processor memory of other processors, in addition to the hierarchy of registers, cache, and local memory on each processor.) The fundamental building block of the ScaLAPACK library is a distributed-memory version of the Level 1, 2, and 3 BLAS, called the PBLAS (Parallel BLAS). The PBLAS are in turn built on the BLAS for computation on single nodes and on a set of Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks that arise frequently in parallel linear algebra computations. For optimal performance, it is necessary, first, that the BLAS be implemented efficiently on the target machine, and second, that an efficient version of the BLACS be available.
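To make the idea of a block-partitioned algorithm concrete, the following is a minimal sketch in Python of a matrix multiplication that works on one block at a time. It is an illustration only and is not taken from ScaLAPACK: the function name `blocked_matmul` and the `block` parameter are hypothetical, the code runs on a single processor, and a real implementation would hand each block update to an optimized Level 3 BLAS routine rather than to plain loops.

```python
def blocked_matmul(a, b, block=2):
    """Multiply a (m x k) by b (k x n), given as lists of rows, block by block.

    Hypothetical illustration of block partitioning; not ScaLAPACK code.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # The three outer loops walk over blocks of C, and over the matching
    # blocks of A and B that contribute to each block of C.
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            for p0 in range(0, k, block):
                # Update one block of C using one block of A and one of B.
                # All data touched here fits in a small working set, which
                # is what reduces traffic between levels of the memory
                # hierarchy.
                for i in range(i0, min(i0 + block, m)):
                    for j in range(j0, min(j0 + block, n)):
                        s = c[i][j]
                        for p in range(p0, min(p0 + block, k)):
                            s += a[i][p] * b[p][j]
                        c[i][j] = s
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b))
```

The result does not depend on the block size; only the order in which the data is visited changes. On a distributed-memory machine the same reasoning applies with "block held by another processor" as the most expensive level of the hierarchy.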
Versions of the BLACS exist for both MPI and PVM, as well as for the Intel series (NX), the IBM SP series (MPL), and the Thinking Machines CM-5 (CMMD). A vendor-optimized version of the BLACS is available for the Cray T3 series. Thus, ScaLAPACK is portable to any computer or network of computers that supports MPI or PVM (as well as the aforementioned native message-passing protocols).
Most of the ScaLAPACK code is written in standard Fortran 77; the PBLAS and the BLACS are written in C, but with Fortran 77 interfaces.
The first ScaLAPACK software was written in 1989-1990, and the appearance of the code has changed considerably since then as we have worked to make it resemble LAPACK and to enable code reuse from LAPACK.
The first public release (version 1.0) of ScaLAPACK occurred on February 28, 1995, and subsequent releases occurred in 1996.
The ScaLAPACK library is only one facet of the ``ScaLAPACK Project,'' which is a collaborative effort involving several institutions:
For further information on any of the related ScaLAPACK projects, please refer to the scalapack index on netlib:
http://www.netlib.org/scalapack/index.html
This Users' Guide describes version 1.5 of the dense and band matrix software package (ScaLAPACK).
The University of Tennessee, Knoxville, provided the routines for the solution of dense, band, and tridiagonal linear systems of equations, condition estimation and iterative refinement, for LU and Cholesky factorization, matrix inversion, full-rank linear least squares problems, orthogonal and generalized orthogonal factorizations, orthogonal transformation routines, reductions to upper Hessenberg, bidiagonal and tridiagonal form, and reduction of a symmetric-definite generalized eigenproblem to standard form. The BLACS, the PBLAS, and the HPF wrappers were also written at the University of Tennessee, Knoxville.
The University of California, Berkeley, provided the routines for the symmetric and generalized symmetric eigenproblem and the singular value decomposition.
Greg Henry at Intel Corporation provided the routines for the nonsymmetric eigenproblem.
Oak Ridge National Laboratory provided the out-of-core linear solvers for LU, Cholesky, and QR factorizations.
ScaLAPACK has been incorporated into several commercial packages, including the NAG Parallel Library, IBM Parallel ESSL, and Cray LIBSCI, and is being integrated into the VNI IMSL Numerical Library, as well as software libraries for Fujitsu, Hewlett-Packard/Convex, Hitachi, and NEC. Additional information can be found on the respective Web pages:
http://www.nag.co.uk:80/numeric/FM.html
http://www.rs6000.ibm.com/software/sp_products/esslpara.html
http://www.cray.com/PUBLIC/product-info/sw/PE/LibSci.html
http://www.sgi.com/Products/hardware/Power/ch_complib.html
http://www.vni.com/products/imsl/index.html
A number of technical reports have been written during the development of ScaLAPACK and published as LAPACK Working Notes by the University of Tennessee. Refer to the following URL for a complete set of working notes:
http://www.netlib.org/lapack/lawns/index.html
Many of these reports subsequently appeared as journal articles. The Bibliography gives the most recent published reference.
As the distributed-memory version of LAPACK, ScaLAPACK has drawn heavily on the software and documentation standards set by LAPACK. The test and timing software for the Level 2 and 3 BLAS was used as a model for the PBLAS test and timing software, and the ScaLAPACK test suite was patterned after the LAPACK test suite. Because of the large amount of software, all BLACS, PBLAS, and ScaLAPACK routines are maintained in basefiles whereby the codes can be re-extracted as needed. Final formatting of the software was done using Toolpack/1 [105].
We have tried to be consistent with our documentation and coding style throughout ScaLAPACK in the hope that it will serve as a model for other distributed-memory software development efforts. ScaLAPACK has been designed as a source of building blocks for larger parallel applications.
The development of ScaLAPACK was supported in part by National Science Foundation Grant ASC-9005933; by the Defense Advanced Research Projects Agency under contract DAAH04-95-1-0077, administered by the Army Research Office; by the Division of Mathematical, Information, and Computational Sciences, of the U.S. Department of Energy, under Contract DE-AC05-96OR22464; and by the National Science Foundation Science and Technology Center Cooperative Agreement CCR-8809615.
The performance results presented in this book were obtained using computer resources at various sites:
The cover of this book was designed by Andy Cleary at Lawrence Livermore National Laboratory.
We acknowledge with gratitude the support that we have received from the following organizations, and the help of individual members of their staff: Cornell Theory Center, Cray Research, a Silicon Graphics Company, IBM (Parallel ESSL Development and Research), Lawrence Berkeley National Laboratory, National Energy Research Scientific Computing Center (NERSC), Maui High Performance Computer Center, Minnesota Supercomputing Center, NAG Ltd., and Oak Ridge National Laboratory Center for Computational Sciences (CCS).
We also thank the many, many people who have contributed code, criticism, ideas and encouragement. We especially acknowledge the contributions of Mark Adams, Peter Arbenz, Scott Betts, Shirley Browne, Henri Casanova, Soumen Chakrabarti, Mishi Derakhshan, Frederic Desprez, Brett Ellis, Ray Fellers, Markus Hegland, Nick Higham, Adolfy Hoisie, Velvel Kahan, Xiaoye Li, Bill Magro, Osni Marques, Paul McMahan, Caroline Papadopoulos, Beresford Parlett, Loic Prylli, Yves Robert, Howard Robinson, Tom Rowan, Shilpa Singhal, Françoise Tisseur, Bernard Tourancheau, Anne Trefethen, Robert van de Geijn, and Andrey Zege.
We express appreciation to all those who helped in the preparation of this work, in particular to Gail Pieper for her tireless efforts in proofreading the draft and improving the quality of the presentation.
Finally, we thank all the test sites that received several test releases of the ScaLAPACK software and that ran an extensive series of test programs for us.