Let us first discuss how to distribute
a narrow band matrix A
over a one-dimensional process grid
using a block-column distribution.
We assume that the coefficient band
matrix A
is of size (
)
with a bandwidth BW=2 if the matrix
A is symmetric positive definite,
and BWL=2 and BWU=2 if the matrix
A is nonsymmetric. The matrix A is
represented by the following.
If the matrix A is a nonsymmetric
band matrix, the user may
choose to perform partial pivoting
or no pivoting during the factorization
(PxGBTRF
or PxDBTRF,
respectively). Both strategies
assume a block-column distribution
of the coefficient matrix, but
additional storage is required
for fill-in if partial pivoting
is selected. First, let us assume
that we have selected no pivoting,
and we distribute this matrix onto
a process grid with a
block size of
. The
processes would contain the local
arrays found in figure 4.9.
Figure 4.9
also illustrates that the leading
dimension of the local arrays
containing the coefficient matrix
must be at least BWL+1+BWU for
the non-pivoting narrow band linear
solver.
Figure 4.9: Mapping of local arrays for nonsymmetric band matrix A
(no pivoting)
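The storage scheme described above can be sketched in a few lines of Python. This is only an illustration, not ScaLAPACK code: the function names pack_band and split_block_columns are hypothetical, the LAPACK-style convention of one matrix diagonal per row of the band array is an assumption, and so is the restriction to one block of columns per process. The sizes in the example are arbitrary and are not those of the figures.

```python
def pack_band(a, bwl, bwu):
    """Pack a dense n-by-n band matrix into (bwl + 1 + bwu)-row band storage.

    Assumed LAPACK-style layout (0-based): entry A(i, j) is stored in row
    bwu + i - j of column j, so each diagonal of A becomes one row.
    """
    n = len(a)
    rows = bwl + 1 + bwu  # minimum leading dimension without pivoting
    ab = [[None] * n for _ in range(rows)]  # None marks positions left unset
    for j in range(n):
        for i in range(max(0, j - bwu), min(n, j + bwl + 1)):
            ab[bwu + i - j][j] = a[i][j]
    return ab


def split_block_columns(ab, nb, nprocs):
    """Give process p the columns p*nb .. (p+1)*nb - 1 of the band array.

    Assumption for this sketch: one column block per process.
    """
    n = len(ab[0])
    if nb * nprocs < n:
        raise ValueError("nb * nprocs must cover all n columns")
    return [[row[p * nb:(p + 1) * nb] for row in ab] for p in range(nprocs)]


# Illustrative values only (not the sizes used in the figures).
n, bwl, bwu = 8, 2, 2
a = [[10 * (i + 1) + (j + 1) if -bwu <= i - j <= bwl else 0
      for j in range(n)] for i in range(n)]
ab = pack_band(a, bwl, bwu)
local_arrays = split_block_columns(ab, nb=4, nprocs=2)
```

Each element of local_arrays then has BWL+1+BWU = 5 rows, and the positions left as None correspond to entries that lie outside the matrix.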
If, however, we select partial pivoting
and distribute this same matrix onto a
process grid with a block
size of
, the processes would
contain the local arrays found in
figure 4.10.
The amount of additional storage
required for fill-in is represented
by F in the figure and is equal
to the sum of the lower bandwidth
(number of subdiagonals), BWL, and
the upper bandwidth (number of
superdiagonals), BWU. In this
example, BWL=2 and BWU=2.
Refer to the leading comments
of the routine PxGBTRF for
further details. Figure 4.10
also illustrates that the leading
dimension of the local arrays
containing the coefficient matrix
must be at least 2*(BWL+BWU)+1
for the partial pivoting narrow
band linear solver.
Figure 4.10: Mapping of local arrays for nonsymmetric band matrix
A (partial pivoting)
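The two leading-dimension bounds for the nonsymmetric case differ exactly by the fill-in storage F = BWL+BWU. A minimal sketch of that arithmetic follows; the helper name min_lld_nonsymmetric is hypothetical and is not a ScaLAPACK routine.

```python
def min_lld_nonsymmetric(bwl, bwu, pivoting):
    """Smallest leading dimension of the local band array.

    No pivoting:      bwl + 1 + bwu rows (the band itself).
    Partial pivoting: an extra F = bwl + bwu rows for fill-in,
                      giving 2*(bwl + bwu) + 1 rows in total.
    """
    band_rows = bwl + 1 + bwu
    fill_rows = bwl + bwu if pivoting else 0
    return band_rows + fill_rows


# The example in the text: BWL = 2, BWU = 2.
no_pivot_rows = min_lld_nonsymmetric(2, 2, pivoting=False)  # 5
pivot_rows = min_lld_nonsymmetric(2, 2, pivoting=True)      # 9
```

For BWL=2 and BWU=2 the local arrays therefore need at least 5 rows without pivoting and at least 9 rows with partial pivoting.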
Let us now assume that the matrix A is
symmetric positive definite band with BW=2,
and we distribute this matrix assuming lower
triangular storage (UPLO='L') onto a
process grid with a block size
.
The processes would contain the local arrays
found in figure 4.11. We would
then call, for example, the routine
PxPBTRF
with BW=2 to perform the
factorization.
Figure 4.11: Mapping of local arrays for symmetric positive definite
band matrix A (UPLO='L')
If we then distributed this same matrix assuming
upper triangular storage (UPLO='U') onto a
process grid with a block size
, the processes
would contain the local arrays found in figure 4.12.
Figure 4.12: Mapping of local arrays for symmetric positive definite
band matrix A (UPLO='U')
Figures 4.11 and 4.12 also illustrate that the leading dimension of the local arrays containing the coefficient matrix must be at least BW+1 for the symmetric positive definite narrow band linear solver.
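The difference between the two storage choices can be sketched as follows. The function name pack_symmetric_band is hypothetical, and the row placement (diagonal in the first row for UPLO='L', in the last row for UPLO='U') is the LAPACK band-storage convention, assumed here rather than taken from the figures. The matrix size is arbitrary.

```python
def pack_symmetric_band(a, bw, uplo):
    """Pack one triangle of a symmetric band matrix into bw + 1 rows.

    Assumed LAPACK-style layout (0-based):
      uplo == "L": A(i, j), i >= j, is stored in row i - j of column j.
      uplo == "U": A(i, j), i <= j, is stored in row bw + i - j of column j.
    """
    n = len(a)
    ab = [[None] * n for _ in range(bw + 1)]  # None marks positions left unset
    for j in range(n):
        if uplo == "L":
            # Diagonal in row 0, subdiagonals below it.
            for i in range(j, min(n, j + bw + 1)):
                ab[i - j][j] = a[i][j]
        elif uplo == "U":
            # Diagonal in row bw, superdiagonals above it.
            for i in range(max(0, j - bw), j + 1):
                ab[bw + i - j][j] = a[i][j]
        else:
            raise ValueError("uplo must be 'L' or 'U'")
    return ab


# Illustrative symmetric band matrix with BW = 2 (size chosen arbitrarily).
n, bw = 5, 2
a = [[10 * (max(i, j) + 1) + (min(i, j) + 1) if abs(i - j) <= bw else 0
      for j in range(n)] for i in range(n)]
lower = pack_symmetric_band(a, bw, "L")
upper = pack_symmetric_band(a, bw, "U")
```

Both arrays have BW+1 = 3 rows. The unset positions fall at the end of the rows for UPLO='L' and at the beginning of the rows for UPLO='U'.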
The notation in
figures 4.9, 4.10, 4.11,
and 4.12 and the
F notation in figure 4.10
signify positions of the local
array in which the user need
not store a value.
These storage positions must
nevertheless be allocated, and
they are overwritten
during the computation.
The matrix of
right-hand-side vectors B
(for example, used in
PxGBTRS, PxDBTRS,
and PxPBTRS)
is assumed to be a dense matrix
distributed in a block-row manner
across the process grid. Thus,
consecutive blocks of rows of
the matrix B are assigned to
successive processes in the
process grid, as described in
section 4.4.1.
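As a rough illustration of this block-row assignment, the sketch below lists which rows of B each process would own. The helper name block_row_ranges is hypothetical, and the sketch assumes one block of rows per process, so that the block size times the number of processes covers all rows.

```python
def block_row_ranges(n, nb, nprocs):
    """Return one (first_row, end_row) pair per process, end_row exclusive.

    Assumes one block of nb consecutive rows per process; trailing
    processes may hold fewer rows, or none at all.
    """
    if nb * nprocs < n:
        raise ValueError("nb * nprocs must cover all n rows")
    return [(min(p * nb, n), min((p + 1) * nb, n)) for p in range(nprocs)]


# Illustrative values: 8 rows, block size 3, 3 processes.
ranges = block_row_ranges(8, 3, 3)
```

With these values the three processes hold rows 0 to 2, 3 to 5, and 6 to 7 (0-based), so the last process has a shorter block.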