Obtaining High Performance with ScaLAPACK Codes

We suggest the following approach to obtain high performance with ScaLAPACK codes:

The standard data distribution will typically achieve 25-50% of the peak performance possible, depending in part on how many processors are ignored, i.e., the difference between the number of processors available and the number actually used in the process grid. We do not recommend experimenting with different data distributions until acceptable (or nearly acceptable) performance has been achieved. If each individual node requires a block size larger than 64 to achieve near-peak performance on local matrix-matrix multiply, the block size may have to be increased. This is unlikely to be necessary, however, unless each node of the computer is a shared-memory multiprocessor with more than four processors.
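To make the cost of ignored processors concrete, the following minimal sketch counts how many processors are left idle for a given total. It assumes the process grid is the largest square grid that fits in the available processors. That grid shape is an assumption made here for illustration, and the function name square_grid is hypothetical, not part of ScaLAPACK or the BLACS.

```python
import math


def square_grid(total_procs):
    """Return (rows, cols, ignored) for the largest square grid that fits.

    Assumption for illustration: the grid is square, with side
    floor(sqrt(total_procs)). Processors beyond rows * cols are ignored.
    """
    if total_procs < 1:
        raise ValueError("need at least one processor")
    side = math.isqrt(total_procs)  # integer floor of the square root
    used = side * side
    return side, side, total_procs - used


if __name__ == "__main__":
    # Compare a few processor counts to see how much capacity sits idle.
    for p in (16, 20, 30, 64):
        rows, cols, ignored = square_grid(p)
        print(f"P={p}: grid {rows}x{cols}, ignored={ignored} "
              f"({ignored / p:.0%} of processors idle)")
```

Under this assumption, a processor count that is a perfect square wastes nothing, while a count just below the next perfect square leaves the largest fraction idle, which is one reason the achieved fraction of peak varies from machine to machine.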

Susan Blackford
Tue May 13 09:21:01 EDT 1997