Next: Practical Algorithm
Up: Block Arnoldi Method
Previous: Block Arnoldi Method
  Contents
  Index
Let $A$ be a matrix of order $n$ and let $b$ be the block size. We say that

    $A V_j = V_j H_j + F_j E_j^{\ast}$                (135)

is a block Arnoldi reduction of length $j$ when $H_j$ is a band upper Hessenberg matrix
of order $j \cdot b$,
$V_j^{\ast} V_j = I_{j \cdot b}$, and $V_j^{\ast} F_j = 0$,
where $E_j$ denotes the last $b$ columns of the identity matrix of order $j \cdot b$.
For us, a band upper Hessenberg matrix is an upper triangular matrix
with $b$ subdiagonals.
The columns of $V_j$
are an orthogonal basis for the block Krylov subspace
$\mathcal{K}^{j}(A, V_1) = \mathrm{span}\{V_1, A V_1, \ldots, A^{j-1} V_1\}$.
If $j \cdot b = n$, then $F_j = 0$ and
$H_j = V_j^{\ast} A V_j$ is the
orthogonal reduction of $A$ into banded upper Hessenberg form.
We assume, for the moment, that $F_j$ is of full rank and
further suppose that the elements on the diagonal of the upper triangular
factor in the QR factorization of $F_j$ are positive.
Thus, a straightforward extension of the implicit Q
theorem [198, pp. 367-368]
gives that $H_j$ is (uniquely) specified by the starting block $V_1$.
Note that if $A = A^{\ast}$, then $H_j$ is a block tridiagonal matrix.
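As a concrete illustration, the reduction (135) can be sketched in NumPy. This is a hypothetical helper (not Algorithm 7.11 itself, and not the implementation discussed below): it assumes every residual block stays full rank, performs no recovery step, and does not force the diagonals of the triangular factors to be positive.

```python
import numpy as np

def block_arnoldi(A, S, j):
    """Sketch of a block Arnoldi reduction of length j.

    Builds V (n-by-j*b) with orthonormal columns, a band upper Hessenberg
    H of order j*b with b subdiagonals, and a residual block F such that
    A @ V = V @ H + F @ E.T, where E holds the last b columns of the
    identity. Assumes the residual blocks have full rank throughout.
    """
    n, b = S.shape
    V, _ = np.linalg.qr(S)                    # orthonormal starting block
    H = np.zeros((j * b, j * b))
    for k in range(j):
        W = A @ V[:, k * b:(k + 1) * b]       # apply A to a group of b vectors
        Hk = V.T @ W                          # coefficient matrix, one GEMM
        F = W - V @ Hk                        # block classical Gram-Schmidt
        G = V.T @ F                           # second step to secure orthogonality
        F -= V @ G
        Hk += G
        H[:(k + 1) * b, k * b:(k + 1) * b] = Hk
        if k + 1 < j:
            Q, R = np.linalg.qr(F)            # QR factorization of the residual
            H[(k + 1) * b:(k + 2) * b, k * b:(k + 1) * b] = R
            V = np.hstack([V, Q])             # grow the basis by b columns
    return V, H, F
```

Because each triangular factor $R$ is upper triangular, the subdiagonal blocks of $H$ stay within the band of $b$ subdiagonals.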
Algorithm 7.11 lists an algorithm to compute a block Arnoldi
reduction.
We now comment on some steps of Algorithm 7.11.
- (1)
- Here the QR factorization is computed via an iterated
CGS algorithm with a possible correction step.
See [96] for details and the simple test used to determine
whether a correction step is necessary.
One benefit of this scheme is that it allows the use of the Level 2
BLAS [133] matrix-vector multiplication
subroutine xGEMV (also see §10.2).
Moreover, this scheme also gives a simple way to fill out a rank-deficient
starting block. For instance,
if a third step of orthogonalization is needed when generating column $i$ of
the orthogonal factor, then the corresponding column of the starting block
is linearly dependent on its previous columns.
The $i$th diagonal element of the triangular factor
is set to zero, and a random unit vector is orthogonalized against
the previously generated columns of the basis.
- (3)
- This step allows the application of $A$ to a group of $b$ vectors. This might
prove essential when accessing $A$ is expensive. Clearly, the goal is
to amortize the cost of applying $A$ over several vectors.
- (4)
- This allows the use of the Level 3 BLAS [134]
matrix-matrix multiplication subroutine xGEMM
for computing the matrix of coefficients $V_j^{\ast} W$,
where $W$ is the block of vectors produced in step (3).
- (6)
- This is one step of block classical Gram-Schmidt (bCGR).
The Level 3 BLAS [134]
matrix-matrix multiplication subroutine xGEMM
is used for computing the rank-$j \cdot b$ update needed
for the computation of $F_j$.
To ensure the orthogonality of $F_j$ with $V_j$, a second step of
bCGR is performed except when $b = 1$. In this latter case, the simple test
in DGKS [96] is used to determine whether a second orthogonalization
step is needed. See [295] for details.
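The orthogonalization logic of this step can be sketched as follows. The function name is ours, and the $1/\sqrt{2}$ threshold is a conventional choice for the DGKS-style test, not quoted from [96]:

```python
import numpy as np

def bcgr_step(V, W):
    """One block classical Gram-Schmidt step (sketch).

    Orthogonalizes the block W against the orthonormal columns of V,
    returning the residual F and coefficients H with W = V @ H + F.
    For b > 1 a second step is always taken; for b = 1 a DGKS-style
    test decides whether reorthogonalization is needed.
    """
    b = W.shape[1]
    H = V.T @ W                  # rank update via one GEMM (xGEMM analogue)
    F = W - V @ H
    if b > 1:
        # blocks: always take the second orthogonalization step
        G = V.T @ F
        F, H = F - V @ G, H + G
    elif np.linalg.norm(F) < np.linalg.norm(W) / np.sqrt(2.0):
        # single vector: reorthogonalize only when the test fires
        G = V.T @ F
        F, H = F - V @ G, H + G
    return F, H
```

In either branch the identity $W = VH + F$ is preserved exactly, since the correction $G$ is added to $H$ and subtracted (through $V$) from $F$.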
The scheme given for computing the block Arnoldi reduction is equivalent to the one
proposed by Ruhe
[375] (for symmetric matrices) and is the one
used by the implicitly restarted block Lanczos code [24].
Although the approach in [24] cleanly deals with the problem
of a rank-deficient $F_j$, the implementation does not exploit the
ability to apply $A$ to a group of vectors as in (3) above. Instead, as proposed
in [375], $A$ is applied to each column of the current block in turn, followed by
computing the corresponding column of $H_j$ and $F_j$. Our
implementation further reorganizes Ruhe's approach so that the
computation of the matrix of coefficients $H_j$ is
separated from the QR factorization of $F_j$. The advantage is that
steps (3)-(5) reduce the cost of I/O by a factor of the block size and
increase the number of floating point operations per memory reference.
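For contrast, Ruhe's column-by-column variant can be sketched as below. This is an illustrative reconstruction under our own naming, not the code of [375] or [24]: each matrix-vector product is followed immediately by the orthogonalization of that single column, so the work stays at the Level 2 BLAS (xGEMV-like) granularity.

```python
import numpy as np

def arnoldi_columnwise(A, V, b):
    """Ruhe-style column-by-column sketch.

    V has orthonormal columns and its last b columns form the current
    block. A is applied to one column of that block at a time, and each
    new vector is orthogonalized and appended immediately, interleaving
    the coefficient computation with the QR factorization of the new
    block. Assumes no breakdown (each residual column is nonzero).
    """
    kb = V.shape[1]
    for i in range(b):
        w = A @ V[:, kb - b + i]     # one matrix-vector product per column
        h = V.T @ w                  # Level 2 BLAS work only
        f = w - V @ h
        f -= V @ (V.T @ f)           # second Gram-Schmidt step
        V = np.column_stack([V, f / np.linalg.norm(f)])
    return V
```

The blocked organization of steps (3)-(5) replaces these repeated matrix-vector products and vector orthogonalizations with a single application of $A$ to $b$ vectors and GEMM-level updates.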
Susan Blackford
2000-11-20