Next: Practical Algorithm
Up: Block Arnoldi Method
Previous: Block Arnoldi Method
  Contents
  Index
Let $A$ be a matrix of order $n$ and let $b$ be the block size. We say that

    $A V_j = V_j H_j + F_j E_j^{\ast}$                (135)

is a block Arnoldi reduction of length $j$ when $H_j$ is a band upper Hessenberg matrix
of order $j \cdot b$,
$V_j^{\ast} V_j = I_{j \cdot b}$, and $V_j^{\ast} F_j = 0$,
where $E_j$ denotes the last $b$ columns of the identity matrix of order $j \cdot b$.
For us, a band upper Hessenberg matrix is an upper triangular matrix
with $b$ subdiagonals.
The columns of $V_j$
are an orthogonal basis for the block Krylov subspace
$\mathcal{K}^{j}(A, V_1) = \mathrm{span}\{V_1, A V_1, \ldots, A^{j-1} V_1\}$.
If $j \cdot b = n$, then $F_j = 0$ and
$H_j = V_j^{\ast} A V_j$ is the
orthogonal reduction of $A$ into banded upper Hessenberg form.
We assume, for the moment, that $F_j$ is of full rank and
further suppose that the elements on the diagonal of the upper triangular
factor in the QR factorization of $F_j$ are positive.
Thus, a straightforward extension of the implicit Q
theorem [198, pp. 367-368]
gives that $H_j$ is (uniquely) specified by the starting block $V_1$.
Note that if $A = A^{\ast}$, then $H_j$ is a block tridiagonal matrix.
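As a concrete illustration, the reduction (135) can be sketched in NumPy. This is a hypothetical helper (not Algorithm 7.11 itself, and not the implementation discussed below): it assumes every residual block stays full rank, performs no recovery step, and does not force the diagonals of the triangular factors to be positive.

```python
import numpy as np

def block_arnoldi(A, S, j):
    """Sketch of a block Arnoldi reduction of length j.

    Builds V (n-by-j*b) with orthonormal columns, a band upper Hessenberg
    H of order j*b with b subdiagonals, and a residual block F such that
    A @ V = V @ H + F @ E.T, where E holds the last b columns of the
    identity. Assumes the residual blocks have full rank throughout.
    """
    n, b = S.shape
    V, _ = np.linalg.qr(S)                    # orthonormal starting block
    H = np.zeros((j * b, j * b))
    for k in range(j):
        W = A @ V[:, k * b:(k + 1) * b]       # apply A to a group of b vectors
        Hk = V.T @ W                          # coefficient matrix, one GEMM
        F = W - V @ Hk                        # block classical Gram-Schmidt
        G = V.T @ F                           # second step to secure orthogonality
        F -= V @ G
        Hk += G
        H[:(k + 1) * b, k * b:(k + 1) * b] = Hk
        if k + 1 < j:
            Q, R = np.linalg.qr(F)            # QR factorization of the residual
            H[(k + 1) * b:(k + 2) * b, k * b:(k + 1) * b] = R
            V = np.hstack([V, Q])             # grow the basis by b columns
    return V, H, F
```

Because each triangular factor $R$ is upper triangular, the subdiagonal blocks of $H$ stay within the band of $b$ subdiagonals.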
Algorithm 7.11 lists an algorithm to compute a block Arnoldi
reduction.
We now comment on some steps of Algorithm 7.11.
- (1)
- Here the QR factorization is computed via an iterated
CGS algorithm with a possible correction step.
See [96] for details and the simple test used to determine
whether a correction step is necessary.
One benefit of this scheme is that it allows the use of the Level 2
BLAS [133] matrix-vector multiplication
subroutine xGEMV (also see §10.2).
Moreover, this scheme also gives a simple way to fill out a rank-deficient
starting block. For instance,
if a third step of orthogonalization is needed when generating column $i$ of
the orthogonal factor, then the corresponding column of the starting block
is linearly dependent on its previous columns.
The $i$th diagonal element of the triangular factor
is set to zero, and a random unit vector is orthogonalized against
the previously generated columns of the basis.
- (3)
- This step allows the application of $A$ to a group of $b$ vectors. This might
prove essential when accessing $A$ is expensive. Clearly, the goal is
to amortize the cost of applying $A$ over several vectors.
- (4)
- This allows the use of the Level 3 BLAS [134]
matrix-matrix multiplication subroutine xGEMM
for computing the matrix of coefficients $V_j^{\ast} W$,
where $W$ is the block of vectors produced in step (3).
- (6)
- This is one step of block classical Gram-Schmidt (bCGR).
The Level 3 BLAS [134]
matrix-matrix multiplication subroutine xGEMM
is used for computing the rank-$j \cdot b$ update needed
for the computation of $F_j$.
To ensure the orthogonality of $F_j$ with $V_j$, a second step of
bCGR is performed except when $b = 1$. In this latter case, the simple test
in DGKS [96] is used to determine whether a second orthogonalization
step is needed. See [295] for details.
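The orthogonalization logic of this step can be sketched as follows. The function name is ours, and the $1/\sqrt{2}$ threshold is a conventional choice for the DGKS-style test, not quoted from [96]:

```python
import numpy as np

def bcgr_step(V, W):
    """One block classical Gram-Schmidt step (sketch).

    Orthogonalizes the block W against the orthonormal columns of V,
    returning the residual F and coefficients H with W = V @ H + F.
    For b > 1 a second step is always taken; for b = 1 a DGKS-style
    test decides whether reorthogonalization is needed.
    """
    b = W.shape[1]
    H = V.T @ W                  # rank update via one GEMM (xGEMM analogue)
    F = W - V @ H
    if b > 1:
        # blocks: always take the second orthogonalization step
        G = V.T @ F
        F, H = F - V @ G, H + G
    elif np.linalg.norm(F) < np.linalg.norm(W) / np.sqrt(2.0):
        # single vector: reorthogonalize only when the test fires
        G = V.T @ F
        F, H = F - V @ G, H + G
    return F, H
```

In either branch the identity $W = VH + F$ is preserved exactly, since the correction $G$ is added to $H$ and subtracted (through $V$) from $F$.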
The scheme given for computing the block Arnoldi reduction is equivalent to the one
proposed by Ruhe
[375] (for symmetric matrices) and is the one
used by the implicitly restarted block Lanczos code [24].
Although the approach in [24] cleanly deals with the problem
of a rank-deficient $F_j$, the implementation does not exploit the
ability to apply $A$ to a group of vectors as in (3) above. Instead, as proposed
in [375], $A$ is applied to each column of the current block in turn, followed by
computing the corresponding column of $H_j$ and $F_j$. Our
implementation further reorganizes Ruhe's approach so that the
computation of the matrix of coefficients $H_j$ is
separated from the QR factorization of $F_j$. The advantage is that
steps (3)-(5) reduce the cost of I/O by a factor of the block size and
increase the number of floating point operations per memory reference.
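For contrast, Ruhe's column-by-column variant can be sketched as below. This is an illustrative reconstruction under our own naming, not the code of [375] or [24]: each matrix-vector product is followed immediately by the orthogonalization of that single column, so the work stays at the Level 2 BLAS (xGEMV-like) granularity.

```python
import numpy as np

def arnoldi_columnwise(A, V, b):
    """Ruhe-style column-by-column sketch.

    V has orthonormal columns and its last b columns form the current
    block. A is applied to one column of that block at a time, and each
    new vector is orthogonalized and appended immediately, interleaving
    the coefficient computation with the QR factorization of the new
    block. Assumes no breakdown (each residual column is nonzero).
    """
    kb = V.shape[1]
    for i in range(b):
        w = A @ V[:, kb - b + i]     # one matrix-vector product per column
        h = V.T @ w                  # Level 2 BLAS work only
        f = w - V @ h
        f -= V @ (V.T @ f)           # second Gram-Schmidt step
        V = np.column_stack([V, f / np.linalg.norm(f)])
    return V
```

The blocked organization of steps (3)-(5) replaces these repeated matrix-vector products and vector orthogonalizations with a single application of $A$ to $b$ vectors and GEMM-level updates.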
Susan Blackford
2000-11-20