The ScaLAPACK routines for solving narrow-band and tridiagonal linear systems assume that their operands are distributed according to the block-column and block-row data distribution schemes. Specifically, the narrow-band or tridiagonal coefficient matrix is distributed in a block-column fashion, and the dense matrix of right-hand-side vectors is distributed in a block-row fashion. This section presents these distributions and demonstrates how the ScaLAPACK software encodes this essential information, as well as the related software conventions.
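The block-column and block-row distributions described above can be illustrated with a minimal sketch. It is not ScaLAPACK code: the function name `block_ranges` and the parameters `n`, `nprocs`, and `nb` are hypothetical, indices are zero-based, and the sketch assumes that the rows of the right-hand-side matrix are partitioned with the same block size as the columns of the coefficient matrix.

```python
def block_ranges(n, nprocs, nb):
    """Return the half-open global index range [start, stop) held by each
    process under a one-dimensional block distribution with block size nb.

    Assumption for this sketch: each process holds at most one block,
    so nb * nprocs must cover all n indices.
    """
    if n < 0 or nprocs < 1 or nb < 1:
        raise ValueError("n must be non-negative; nprocs and nb must be positive")
    if nb * nprocs < n:
        raise ValueError("nb * nprocs must be at least n")
    # Process p holds global indices p*nb .. (p+1)*nb - 1, clipped to n.
    return [(min(p * nb, n), min((p + 1) * nb, n)) for p in range(nprocs)]


# Columns of a 10-column band matrix over 4 processes with block size 3.
col_ranges = block_ranges(10, 4, 3)

# Rows of the right-hand-side matrix, partitioned conformally
# (same block size, assumed here for illustration).
row_ranges = block_ranges(10, 4, 3)

for p, (cols, rows) in enumerate(zip(col_ranges, row_ranges)):
    print(f"process {p}: columns {cols}, right-hand-side rows {rows}")
```

In this sketch the last process holds a shorter block whenever `n` is not a multiple of `nb`.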
A block data layout has been selected for narrow-band matrices. ScaLAPACK implements divide-and-conquer algorithms for these matrices because they offer much greater scope for exploiting parallelism than the corresponding adapted dense algorithms. The narrow-band or tridiagonal coefficient matrix is partitioned into blocks, and each block is processed independently, so the inherent parallelism of these divide-and-conquer methods is limited by the number of blocks; the number of blocks must therefore be at least equal to the desired degree of parallelism. However, the size of the reduced system is proportional to the number of blocks, and solving this reduced system is the major bottleneck to parallelism. It follows that a block layout in which each process has exactly one block allows maximum exploitation of parallelism while minimizing the size of the reduced system.
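As a rough illustration of the one-block-per-process choice, the sketch below picks the smallest block size for which the number of blocks does not exceed the number of processes. The function name `one_block_per_process` and its parameters are hypothetical and are not part of ScaLAPACK; the sketch only counts blocks and does not model the reduced system itself.

```python
def one_block_per_process(n, nprocs):
    """Return (nb, nblocks): the smallest block size nb with nb * nprocs >= n,
    and the number of blocks that results for a matrix of order n."""
    if n < 1 or nprocs < 1:
        raise ValueError("n and nprocs must be positive")
    nb = -(-n // nprocs)    # ceiling of n / nprocs
    nblocks = -(-n // nb)   # ceiling of n / nb, never more than nprocs
    return nb, nblocks


# A matrix of order 1000 on 8 processes: 8 blocks of 125 columns each.
nb, nblocks = one_block_per_process(1000, 8)
print(nb, nblocks)
```

When `n` is not a multiple of `nprocs`, this choice can leave the last process with a shorter block or, for small `n`, with no block at all; a smaller block size would create more blocks than processes and, by the argument above, a larger reduced system.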