
Block Arnoldi Reductions

Let $A$ be a matrix of order $n$ and let $b$ be the block size. We say that

$$ A V_j = V_j H_j + F_j E_j^{\ast} \qquad (135) $$

is a block Arnoldi reduction of length $j$ when $H_j$ is a band upper Hessenberg matrix of order $jb$, $V_j^{\ast} V_j = I_{jb}$, and $V_j^{\ast} F_j = 0$; here $E_j$ denotes the last $b$ columns of the identity matrix of order $jb$. For us, a band upper Hessenberg matrix is an upper triangular matrix with $b$ subdiagonals. The columns of $V_j$ are an orthogonal basis for the block Krylov subspace

$$ \mathcal{K}^j(A, V_1) = \mathrm{span}\{ V_1, A V_1, \ldots, A^{j-1} V_1 \}. $$

If $n = jb$, then $F_j = 0$ and $A V_j = V_j H_j$ is the orthogonal reduction of $A$ into banded upper Hessenberg form. We assume, for the moment, that $F_j$ is of full rank and further suppose that the diagonal elements of the subdiagonal blocks of $H_j$ are positive. Thus, a straightforward extension of the implicit Q theorem [198, pp. 367-368] gives that the reduction is (uniquely) specified by the starting block $V_1$. Note that if $A = A^{\ast}$, then $H_j$ is a block tridiagonal matrix. Algorithm 7.11 lists an algorithm to compute a block Arnoldi reduction.
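As a concrete illustration of the reduction (135), the following is a minimal dense NumPy sketch; the function name `block_arnoldi` and the two-pass orthogonalization are illustrative choices, not the book's Algorithm 7.11 itself:

```python
import numpy as np

def block_arnoldi(A, V1, j):
    """Sketch of a block Arnoldi reduction A V = V H + F E^T of length j.

    A  : (n, n) array, V1 : (n, b) orthonormal starting block.
    Returns V (n, j*b) orthonormal, H (j*b, j*b) band upper Hessenberg
    with b subdiagonals, and the residual block F (n, b).
    """
    n, b = V1.shape
    V = np.zeros((n, j * b))
    H = np.zeros((j * b, j * b))
    V[:, :b] = V1
    F = None
    for i in range(j):
        W = A @ V[:, i*b:(i+1)*b]          # apply A to a block of b vectors
        Vi = V[:, :(i+1)*b]
        Hblk = Vi.T @ W                    # coefficients, one matrix product
        W = W - Vi @ Hblk                  # block Gram-Schmidt step
        C = Vi.T @ W                       # second pass for orthogonality
        W = W - Vi @ C
        Hblk = Hblk + C
        H[:(i+1)*b, i*b:(i+1)*b] = Hblk
        if i + 1 < j:
            Q, R = np.linalg.qr(W)         # next block and subdiagonal block
            V[:, (i+1)*b:(i+2)*b] = Q
            H[(i+1)*b:(i+2)*b, i*b:(i+1)*b] = R
        else:
            F = W                          # residual block of the reduction
    return V, H, F
```

In exact arithmetic $A V_j = V_j H_j + F_j E_j^{\ast}$ holds by construction, with $E_j$ the last $b$ columns of the identity of order $jb$.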

We now comment on some steps of Algorithm 7.11.

(1)
Here the QR factorization is computed via an iterated CGS algorithm with a possible correction step. See [96] for details and the simple test used to determine whether a correction step is necessary. One benefit of this scheme is that it allows the use of the Level 2 BLAS [133] matrix-vector multiplication subroutine xGEMV (also see §10.2). Moreover, this scheme also gives a simple way to fill out a rank-deficient starting block. For instance, if a third step of orthogonalization is needed when generating the $i$th column of the orthogonal factor, then that column is linearly dependent on the previous columns. The $i$th diagonal element of the triangular factor is set to zero, and a random unit vector, orthogonalized against the previously computed basis vectors and the first $i-1$ columns of the current block, is used instead.
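The scheme can be sketched in NumPy as below; the function name `cgs_qr`, the threshold `eta`, and the dependency tolerance are illustrative assumptions, not values taken from [96]:

```python
import numpy as np

def cgs_qr(X, eta=1.0 / np.sqrt(2.0)):
    """QR factorization of X by iterated classical Gram-Schmidt (CGS).

    After orthogonalizing a column against the previous ones, a correction
    step is taken when the norm drops by more than the factor eta (a
    DGKS-style test).  If the column is numerically dependent, the diagonal
    entry of R is set to zero and a random unit vector, orthogonalized
    against the previous columns, is used instead.
    """
    n, m = X.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    rng = np.random.default_rng(0)
    for i in range(m):
        v = X[:, i].astype(float)
        old = np.linalg.norm(v)
        h = Q[:, :i].T @ v                 # xGEMV-style projection
        v = v - Q[:, :i] @ h
        R[:i, i] = h
        if np.linalg.norm(v) < eta * old:  # correction step needed?
            c = Q[:, :i].T @ v
            v = v - Q[:, :i] @ c
            R[:i, i] += c
        nrm = np.linalg.norm(v)
        if nrm <= n * np.finfo(float).eps * old:
            R[i, i] = 0.0                  # column i is linearly dependent
            v = rng.standard_normal(n)     # fill out with a random direction
            v = v - Q[:, :i] @ (Q[:, :i].T @ v)
            Q[:, i] = v / np.linalg.norm(v)
        else:
            R[i, i] = nrm
            Q[:, i] = v / nrm
    return Q, R
```

For a full-rank $X$ this returns an orthonormal $Q$ and upper triangular $R$ with $QR = X$; the rank-deficient branch reproduces the fill-out strategy described above.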

(3)
This step allows the application of $A$ to a group of $b$ vectors at once. This might prove essential when accessing $A$ is expensive. Clearly, the goal is to amortize the cost of applying $A$ over several vectors.

(4)
This allows the use of the Level 3 BLAS [134] matrix-matrix multiplication subroutine xGEMM for computing the matrix of coefficients.
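To illustrate steps (3) and (4), the fragment below (with arbitrary illustrative sizes) shows that one block application of $A$ followed by one matrix-matrix product reproduces what $b$ separate matrix-vector products would compute:

```python
import numpy as np

rng = np.random.default_rng(1)
n, b, k = 6, 3, 4                                  # illustrative sizes
A = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, k)))   # current orthonormal basis
X = rng.standard_normal((n, b))                    # block of b vectors

# step (3): apply A to the whole block at once (amortizes access to A)
W = A @ X
# step (4): coefficients via one matrix-matrix product (xGEMM-like)
H = V.T @ W

# the same results, computed column by column (xGEMV-like)
W_cols = np.column_stack([A @ X[:, i] for i in range(b)])
H_cols = np.column_stack([V.T @ W_cols[:, i] for i in range(b)])
```

The block form performs the identical arithmetic but with far fewer passes over $A$ and $V$ in memory.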

(6)
This is one step of block classical Gram-Schmidt (bCGS). The Level 3 BLAS [134] matrix-matrix multiplication subroutine xGEMM is used for computing the rank-$jb$ update needed for the computation of the residual block. To ensure the orthogonality of the residual block with $V_j$, a second step of bCGS is performed except when the block size $b$ is equal to one. In this latter case, the simple test in DGKS [96] is used to determine whether a second orthogonalization step is needed. See [295] for details.
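One such two-pass block CGS step can be sketched as follows; the name `bcgs2` is illustrative, and the unconditional second pass corresponds to the block case described above:

```python
import numpy as np

def bcgs2(V, W):
    """Two passes of block classical Gram-Schmidt.

    Orthogonalizes the block W against the orthonormal basis V, returning
    the residual block F and coefficients H with W = V @ H + F.  Each
    projection and update maps onto one xGEMM-style matrix product.
    """
    H = V.T @ W          # block of coefficients
    F = W - V @ H        # block update of the residual
    C = V.T @ F          # second bCGS pass to secure orthogonality
    F = F - V @ C
    return F, H + C
```

In exact arithmetic the second pass changes nothing; in floating point it restores the orthogonality of $F$ to $V$ to working precision.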

The scheme given for computing the reduction is equivalent to the one proposed by Ruhe [375] (for symmetric matrices) and is the one used by the implicitly restarted block Lanczos code [24]. Although the approach in [24] cleanly deals with the problem of a rank-deficient residual block, the implementation does not exploit the ability to apply $A$ to a group of vectors as in (3) above. Instead, as proposed in [375], $A$ is applied to each column of the basis separately, followed by computing the corresponding column of the coefficient matrix and of the residual. Our implementation further reorganizes Ruhe's approach so that the computation of the matrix of coefficients is separated from the QR factorization of the residual block. The advantage is that steps (3)-(5) reduce the cost of I/O by a factor of the block size and increase the number of floating point operations per memory reference.

Susan Blackford 2000-11-20