ScaLAPACK assumes a one-dimensional block distribution for the band and tridiagonal routines. The block distribution is used when the computational load is distributed homogeneously over the global data. This distribution leads to a highly efficient implementation of the divide-and-conquer algorithms used in ScaLAPACK.
For convenience, we number the processes from 0 to P-1, the matrix rows from 1 to M, and the matrix columns from 1 to N. Figure 4.8 shows the two data layouts used in ScaLAPACK for solving narrow band linear systems. In both layouts, each submatrix is labeled with the number of the process that contains it. Process 0 owns the shaded submatrices.
Figure 4.8: The one-dimensional block-column and block-row distributions
Consider the layout illustrated on the left of figure 4.8, the one-dimensional block-column distribution. This distribution assigns a block of NB contiguous columns of a matrix to successive processes arranged in a one-dimensional process grid. Each process receives at most one block of columns of the matrix, i.e., $NB \cdot P \geq N$. Column k is stored on process $\lfloor (k-1)/NB \rfloor$. The maximum number of columns stored per process is given by NB. In the figure M=N=16 and P=4.
If the value of P evenly divides the value of N and NB = N / P, then each process owns a block of equal size. However, if this is not the case, then either the last process to receive a portion of the matrix will receive a smaller block than other processes, or some processes may receive an empty portion of the matrix.
The transpose of this layout, the one-dimensional block-row distribution, is shown on the right of figure 4.8.
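The column-to-process mapping can be illustrated with a short Python sketch. It is not part of ScaLAPACK: the function names are hypothetical, and the owner formula floor((k-1)/NB) is an assumption derived from the conventions above (columns numbered from 1, processes from 0, one block of NB contiguous columns per process).

```python
def owner_of_column(k, nb):
    """0-based process that stores global column k (1-based). Assumed mapping."""
    return (k - 1) // nb


def local_column_count(p, n, nb):
    """Number of columns of an N-column matrix held by process p."""
    first = p * nb + 1               # first global column in process p's block
    last = min((p + 1) * nb, n)      # last global column, clipped to N
    return max(0, last - first + 1)  # 0 when the block lies entirely past column N


def check_block_size(n, nprocs, nb):
    """Each process holds at most one block, so NB * P must cover all N columns."""
    if nb * nprocs < n:
        raise ValueError("NB * P < N: some process would need a second block")


# Layout of the figure: N = 16, P = 4, NB = N / P = 4, so all blocks are equal.
check_block_size(16, 4, 4)
even_counts = [local_column_count(p, 16, 4) for p in range(4)]

# P does not divide N (N = 10, NB = 3): the last process gets a smaller block.
short_counts = [local_column_count(p, 10, 3) for p in range(4)]

# N = 9, NB = 3: three processes hold everything, the fourth holds nothing.
empty_counts = [local_column_count(p, 9, 3) for p in range(4)]

print(even_counts, short_counts, empty_counts)
```

The three cases correspond to the situations described above: equal blocks, a smaller final block, and a process with an empty portion of the matrix.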
The block-column distribution scheme is the data layout that is used in the ScaLAPACK library for the coefficient matrix of the narrow band and tridiagonal solvers.
The block-row distribution scheme is the data layout that is used in the ScaLAPACK library for the right-hand-side matrix of the narrow band and tridiagonal solvers.
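As a rough illustration of how the two layouts pair up, the sketch below splits a small coefficient matrix by column blocks and a right-hand-side matrix by row blocks. It uses plain Python lists rather than any ScaLAPACK call. The helper names are hypothetical, and it assumes a square N x N coefficient matrix and the same block size NB for both matrices.

```python
def block_range(p, n, nb):
    """1-based global index range (first, last) of process p's block.
    The block is empty when first > last."""
    first = p * nb + 1
    last = min((p + 1) * nb, n)
    return first, last


def distribute(a, b, nprocs, nb):
    """Return one (local_a, local_b) pair per process.
    a: N x N coefficient matrix as a list of rows, split by column blocks.
    b: N x NRHS right-hand-side matrix as a list of rows, split by row blocks."""
    n = len(b)
    pieces = []
    for p in range(nprocs):
        first, last = block_range(p, n, nb)
        lo, hi = first - 1, last                # 0-based half-open slice
        local_a = [row[lo:hi] for row in a]     # every row, columns first..last
        local_b = b[lo:hi]                      # rows first..last, every column
        pieces.append((local_a, local_b))
    return pieces


# Small example: N = 6, P = 3, NB = 2, two right-hand sides.
n, nprocs, nb = 6, 3, 2
a = [[10 * (i + 1) + (j + 1) for j in range(n)] for i in range(n)]  # entry "ij"
b = [[i + 1, -(i + 1)] for i in range(n)]

pieces = distribute(a, b, nprocs, nb)
local_a1, local_b1 = pieces[1]
print(local_a1[0], local_b1)  # process 1: columns 3-4 of A, rows 3-4 of B
```

Under these assumptions, process p holds the columns of the coefficient matrix and the rows of the right-hand side that share the same global indices.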