Before experimenting with different data layouts, users should make sure that they are using the fastest BLACS and BLAS libraries available for their system.
Three major factors influence the performance of a ScaLAPACK routine: the flop rate achieved by the BLAS on each node, the computational load balance, and the communication costs.
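To make the load-balance factor concrete, the following is a minimal sketch that counts how many rows of one matrix dimension each process owns under a one-dimensional block-cyclic distribution. The counting logic is modeled on the ScaLAPACK tool routine NUMROC; the Python function names local_count and imbalance, and the "largest share divided by mean share" metric, are assumptions made for illustration and are not part of ScaLAPACK.

```python
def local_count(n, nb, iproc, nprocs, isrcproc=0):
    """Number of rows (or columns) of an n-long dimension owned by process
    iproc under a 1-D block-cyclic distribution with block size nb.
    Modeled on the logic of the ScaLAPACK tool routine NUMROC."""
    # Distance of this process from the process holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of complete blocks
    count = (nblocks // nprocs) * nb       # full rounds every process gets
    extra = nblocks % nprocs               # complete blocks left over
    if mydist < extra:
        count += nb                        # one extra complete block
    elif mydist == extra:
        count += n % nb                    # the trailing partial block, if any
    return count


def imbalance(n, nb, nprocs):
    """Illustrative metric (an assumption, not a ScaLAPACK quantity):
    largest local share divided by the mean share. 1.0 is a perfect balance."""
    counts = [local_count(n, nb, p, nprocs) for p in range(nprocs)]
    mean = sum(counts) / nprocs
    return max(counts) / mean


if __name__ == "__main__":
    n, nprocs = 1000, 4
    for nb in (1, 64, 250, 500):
        counts = [local_count(n, nb, p, nprocs) for p in range(nprocs)]
        print(nb, counts, round(imbalance(n, nb, nprocs), 3))
```

This sketch measures only how evenly the data in one dimension is divided; it does not model flop counts or communication. The block size also affects the other two factors, so a value that balances the data well is not necessarily the fastest choice overall.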