The routines in ScaLAPACK are classified into three broad categories: driver routines, computational routines, and auxiliary routines.
In general, the auxiliary routines perform no input error-checking. The exception is the auxiliary routines that are Level 2 equivalents of computational routines (e.g., PxGETF2, PxGEQR2, PxORMR2, PxORM2R); these routines perform local input error-checking.
Driver routines and computational routines are fully described in this Users' Guide; the auxiliary routines are not. A list of the auxiliary routines, with brief descriptions of their functions, is given in Appendix A.2. LAPACK auxiliary routines are also used whenever possible for local computation. Refer to the LAPACK Users' Guide [3] for details.
Strictly speaking, the PBLAS, BLAS, BLACS, and LAPACK are not part of ScaLAPACK. However, the ScaLAPACK routines make frequent calls to these packages.
ScaLAPACK also provides two matrix redistribution/copy routines for each data type [107, 49, 106]. These routines provide a truly general copy from any block-cyclically distributed (sub)matrix to any other block-cyclically distributed (sub)matrix. They are the only routines in the entire ScaLAPACK library that provide inter-context operations. Because of their generality, they may be used for many operations not usually associated with copy routines. For instance, they may be used to take a matrix on one process and distribute it across a process grid, or the reverse. If a supercomputer is grouped into a virtual parallel machine with a workstation, these routines can be used to move the matrix from the workstation to the supercomputer and back. Within ScaLAPACK, these routines are called to copy matrices from a two-dimensional process grid to a one-dimensional process grid. They can also be used to redistribute matrices so that each component library can work with the distribution that gives it maximal performance. For further details on these routines, refer to Appendix A.3.
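To make the idea of copying between two block-cyclic distributions concrete, the following is a minimal sketch in plain Python. It is not the ScaLAPACK interface and involves no message passing: the function names (`owner_and_local`, `distribute`, `gather`, `redistribute`) are hypothetical, indices are 0-based, the first block is assumed to sit on process 0, and each "process" is modeled as a dictionary of local entries. It only illustrates how the same global matrix maps onto a two-dimensional grid and onto a one-dimensional grid.

```python
def owner_and_local(g, nb, nprocs):
    """Map a 0-based global index to (owning process, 0-based local index)
    for a block-cyclic distribution with block size nb over nprocs processes,
    assuming the first block lives on process 0."""
    block = g // nb
    return block % nprocs, (block // nprocs) * nb + g % nb


def distribute(matrix, mb, nb, nprow, npcol):
    """Split a dense matrix (list of rows) over an nprow x npcol grid.
    Returns {(p, q): {(local_row, local_col): value}}."""
    pieces = {(p, q): {} for p in range(nprow) for q in range(npcol)}
    for i, row in enumerate(matrix):
        p, li = owner_and_local(i, mb, nprow)
        for j, value in enumerate(row):
            q, lj = owner_and_local(j, nb, npcol)
            pieces[(p, q)][(li, lj)] = value
    return pieces


def gather(pieces, m, n, mb, nb, nprow, npcol):
    """Rebuild the dense m x n matrix from its distributed pieces."""
    matrix = [[None] * n for _ in range(m)]
    for i in range(m):
        p, li = owner_and_local(i, mb, nprow)
        for j in range(n):
            q, lj = owner_and_local(j, nb, npcol)
            matrix[i][j] = pieces[(p, q)][(li, lj)]
    return matrix


def redistribute(pieces, m, n, src, dst):
    """Copy from one distribution to another.
    src and dst are (mb, nb, nprow, npcol) tuples. This model goes through
    a full dense copy for clarity; a real routine would not."""
    return distribute(gather(pieces, m, n, *src), *dst)


# A 5 x 6 matrix whose entry (i, j) is 10*i + j.
m, n = 5, 6
a = [[10 * i + j for j in range(n)] for i in range(m)]

grid_2d = (2, 2, 2, 2)   # 2 x 2 blocks on a 2 x 2 process grid
grid_1d = (m, 1, 1, 3)   # whole columns, one at a time, on a 1 x 3 grid

on_2d = distribute(a, *grid_2d)
on_1d = redistribute(on_2d, m, n, grid_2d, grid_1d)
```

In this model, global entry (4, 5) is stored on process (0, 0) of the two-dimensional grid and on process (0, 2) of the one-dimensional grid. The sketch gathers the whole matrix before scattering it again only to keep the index arithmetic visible; the routines described in Appendix A.3 operate on distributed data and should be consulted for the actual calling sequences.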