The choice of an appropriate data distribution depends heavily on the characteristics, or flow, of the computation performed by the algorithm. For dense matrix computations, ScaLAPACK assumes that the data are distributed according to the two-dimensional block-cyclic data layout scheme. This section presents this distribution, shows how the ScaLAPACK software encodes it, and describes the related software conventions.
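The following is a minimal sketch, not ScaLAPACK code, of how a two-dimensional block-cyclic layout assigns a matrix entry to a process. It assumes zero-based indices, an `mb` by `nb` block size, a `p_rows` by `q_cols` process grid, and that the first block sits on process (0, 0). The function names `owner_and_local` and `map_2d` are hypothetical. ScaLAPACK's own tool routines for this mapping (for example `INDXG2P` and `INDXG2L`) use one-based Fortran indexing and also take the source process as an argument.

```python
def owner_and_local(g: int, nb: int, nprocs: int) -> tuple[int, int]:
    """Map a zero-based global index g to (owning process, zero-based local
    index) in a 1-D block-cyclic layout with block size nb over nprocs
    processes, assuming the first block lives on process 0 (hypothetical
    helper, not a ScaLAPACK routine)."""
    block = g // nb                  # which global block g falls in
    proc = block % nprocs            # blocks are dealt round-robin to processes
    local_block = block // nprocs    # blocks this process holds before this one
    return proc, local_block * nb + g % nb


def map_2d(i: int, j: int, mb: int, nb: int,
           p_rows: int, q_cols: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Map global entry (i, j) to ((process row, process column),
    (local row, local column)) on a p_rows x q_cols process grid.
    Rows and columns are distributed independently of each other."""
    pr, li = owner_and_local(i, mb, p_rows)
    pc, lj = owner_and_local(j, nb, q_cols)
    return (pr, pc), (li, lj)


if __name__ == "__main__":
    # Show which process (row, col) owns each entry of a 6x6 matrix split
    # into 2x2 blocks on a 2x2 process grid.
    for i in range(6):
        print(" ".join(str(map_2d(i, j, 2, 2, 2, 2)[0]) for j in range(6)))
```

Because the row index alone fixes the process row and the column index alone fixes the process column, each process ends up holding a regular, strided pattern of whole blocks that it can store contiguously as a smaller local matrix.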
Dense matrix computations feature a large amount of parallelism, so a wide variety of distribution schemes have the potential to achieve high performance. The block-cyclic data layout was selected for the dense algorithms implemented in ScaLAPACK principally because of its scalability, load balance, and communication properties. With this layout, the block-partitioned computation proceeds in consecutive order, just like a conventional serial algorithm. This essential property of the block-cyclic data layout explains why the ScaLAPACK design has been able to reuse the numerical and software expertise of the sequential LAPACK library.
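To illustrate the load-balance argument, here is a small sketch under simplifying assumptions: a 1-D row distribution, zero-based indices, the first block on process 0, and a factorization in which, after step `k`, only the trailing rows `k..n-1` remain active. The helper name `trailing_rows_per_process` is hypothetical. It counts how many active rows each process still owns, first with small blocks dealt cyclically and then with one large contiguous block per process (a plain block distribution).

```python
import math


def trailing_rows_per_process(n: int, k: int, nb: int, nprocs: int) -> list[int]:
    """Count how many rows of the trailing submatrix (rows k..n-1) each
    process owns when n rows are split into blocks of nb rows that are
    dealt round-robin to nprocs processes, first block on process 0."""
    counts = [0] * nprocs
    for g in range(k, n):
        counts[(g // nb) % nprocs] += 1
    return counts


n, nprocs, k = 16, 4, 8

# Block-cyclic: blocks of 2 rows dealt round-robin to 4 processes.
cyclic = trailing_rows_per_process(n, k, nb=2, nprocs=nprocs)

# With nb = ceil(n / nprocs) each process gets a single contiguous block,
# which is the plain (non-cyclic) block distribution.
blocked = trailing_rows_per_process(n, k, nb=math.ceil(n / nprocs), nprocs=nprocs)

print("block-cyclic:", cyclic)   # block-cyclic: [2, 2, 2, 2]
print("block:       ", blocked)  # block:        [0, 0, 4, 4]
```

Halfway through this hypothetical factorization, the plain block distribution leaves processes 0 and 1 with no remaining work, while the block-cyclic layout still spreads the active rows evenly. Smaller blocks keep the imbalance at most about one block per process, at the cost of more frequent communication, which is one reason the block size is a tunable parameter.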