Next: Index of ScaLAPACK Routines Up: Troubleshooting Previous: System Error Messages

Poor Performance

The performance of ScaLAPACK ultimately depends on an efficient implementation of the BLAS and on a data distribution that balances the load across processes. Refer to Chapter 5.
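
As a rough illustration of how the data distribution affects load balance, the sketch below counts how many rows (or columns) of one matrix dimension each process owns when blocks of nb entries are dealt out round-robin. It is a minimal sketch in Python, used here only for illustration; the function names local_count and distribution are hypothetical, and the arithmetic is intended to mirror that of the ScaLAPACK tool routine NUMROC rather than to replace it.

```python
def local_count(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an n-long dimension owned by process iproc.

    The dimension is cut into blocks of nb entries that are dealt out
    round-robin to nprocs processes, starting at process isrcproc.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the first owner
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # blocks every process receives
    extra = nblocks % nprocs                       # leftover complete blocks
    if mydist < extra:
        count += nb                                # one more complete block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block
    return count


def distribution(n, nb, nprocs):
    """Per-process counts for one dimension, first block on process 0."""
    return [local_count(n, nb, p, 0, nprocs) for p in range(nprocs)]


if __name__ == "__main__":
    # Compare a small and a very large block size for n = 1000 on 4 processes.
    for nb in (64, 500):
        print(nb, distribution(1000, nb, 4))
```

For a dimension of 1000 on 4 processes, a block size of 64 gives the counts 256, 256, 256, 232, whereas a block size of 500 gives 500, 500, 0, 0 and leaves two processes with no data in that dimension.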

To avoid poor performance from ScaLAPACK routines, note the following recommendations:

One should use machine-specific optimized BLAS whenever they are available. Many manufacturers and research institutions have developed, or are developing, efficient versions of the BLAS for particular machines, and it is the BLAS that enable LAPACK and ScaLAPACK routines to achieve high performance with transportable software. Users are urged to determine whether such an implementation of the BLAS exists for their platform and, if it does, to use it to obtain the best performance. If no machine-specific implementation exists for a particular platform, one should consider installing a publicly available set of BLAS that requires only an efficient implementation of the matrix-matrix multiply BLAS routine xGEMM. Examples of such implementations are [35, 90]. A machine-specific and efficient implementation of xGEMM can itself be generated automatically by publicly available software such as [16]. A reference implementation of the Fortran 77 BLAS is also available from the blas directory on netlib. It is not expected to perform as well as a specially tuned implementation on most high-performance computers, and on some machines it may perform much worse, but it allows users to run LAPACK and ScaLAPACK software on machines that do not offer any other implementation of the BLAS.
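
The idea of a BLAS set that needs only a fast xGEMM can be sketched as follows. This is a simplified illustration in Python, not the method used by the implementations cited above: the names gemm and symm_lower_via_gemm are hypothetical, the matrices are plain lists of rows, and the symmetric matrix is copied whole into a full temporary, whereas a practical GEMM-based implementation would presumably work block by block. The point is only that a different Level 3 operation (a symmetric matrix product that reads just the lower triangle) can hand all of its arithmetic to a general matrix-matrix multiply.

```python
def gemm(a, b, c, alpha=1.0, beta=0.0):
    """Return alpha * (a times b) + beta * c for dense lists of rows.

    Stands in for an optimized xGEMM; here it is a plain triple loop.
    """
    m, k, n = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def symm_lower_via_gemm(a_lower, b):
    """Return A times b, where A is symmetric and only its lower triangle is read."""
    n = len(a_lower)
    # Mirror the lower triangle into a full temporary so that GEMM applies.
    full = [
        [a_lower[i][j] if i >= j else a_lower[j][i] for j in range(n)]
        for i in range(n)
    ]
    zero = [[0.0] * len(b[0]) for _ in range(n)]
    return gemm(full, b, zero)


if __name__ == "__main__":
    # The entry 99.0 sits in the upper triangle and is never referenced.
    a_lower = [[2.0, 99.0], [1.0, 3.0]]
    b = [[1.0], [2.0]]
    print(symm_lower_via_gemm(a_lower, b))
```

With this structure, tuning the single gemm kernel for a machine speeds up every routine layered on top of it.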

With the few exceptions mentioned in section 5.2.3, the performance achieved by the BLACS should be close to that of the underlying message-passing library it calls. Publicly available implementations of the BLACS exist for a range of native message-passing libraries, such as NX for the Intel supercomputers and MPL for the IBM SP series, as well as for more generic interfaces such as PVM and MPI. Users should therefore select the BLACS implementation that is based on the most efficient message-passing library available on their system. Some vendors, such as Cray and IBM, supply an optimized implementation of the BLACS for their systems. Users are urged to rely on these BLACS libraries whenever possible.

LWORK larger than WORK(1):
In some ScaLAPACK eigenvalue routines, such as those for the symmetric eigenproblem (PxSYEV and PxSYEVX/PxHEEVX) and the generalized symmetric eigenproblem (PxSYGVX/PxHEGVX), a larger value of LWORK can guarantee the orthogonality of the returned eigenvectors, at the risk of degraded performance of the algorithm. The minimum amount of workspace required is returned in the first element of the work array, WORK(1). Supplying more workspace than this minimum allows additional orthogonalization if the user wants it. Refer to section 5.3.6 and the leading comments of the source code for complete details.
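
The pattern of reading the minimum workspace from WORK(1) and then deciding how much more to supply can be sketched as follows. This is a minimal sketch in Python under several assumptions: standin_eigensolver is a hypothetical stand-in and not a ScaLAPACK routine, its workspace formula is made up, the workspace-query call with LWORK = -1 is assumed to be supported by the routine in question, and the amount of extra workspace is left to the caller because the real figure must be taken from the leading comments of the routine being called.

```python
def standin_eigensolver(n, lwork, work):
    """Hypothetical stand-in for an eigenvalue routine (NOT a ScaLAPACK call).

    It only imitates the convention described above: the minimum workspace
    is written to work[0] (WORK(1) in Fortran). The formula 5 * n is made up
    for illustration and is not the real requirement of any routine.
    """
    min_lwork = 5 * n
    work[0] = float(min_lwork)
    if lwork == -1:          # assumed workspace-query convention
        return 0
    if lwork < min_lwork:    # too little workspace: report an error
        return -1
    return 0                 # a real routine would do the computation here


def choose_lwork(n, extra=0):
    """Query the minimum workspace, then add optional extra space."""
    work = [0.0]
    info = standin_eigensolver(n, -1, work)
    if info != 0:
        raise RuntimeError("workspace query failed")
    return int(work[0]) + max(0, extra)


if __name__ == "__main__":
    n = 100
    # Trade memory (and possibly speed) for extra orthogonalization room.
    lwork = choose_lwork(n, extra=3 * n)
    work = [0.0] * lwork
    print(lwork, standin_eigensolver(n, lwork, work))
```

Passing extra=0 requests only the minimum, which corresponds to favoring performance over guaranteed orthogonality in the trade-off described above.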

Susan Blackford
Tue May 13 09:21:01 EDT 1997