In addition to the usual matrix-vector product, inner products and vector updates, the preconditioned GMRES method (see §) has a kernel where one new vector, $K^{-1}Av_j$, is orthogonalized against the previously built orthogonal set $\{v_1, v_2, \ldots, v_j\}$.
In our version, this is
done using Level 1 BLAS, which may be quite inefficient. To
incorporate Level 2 BLAS we can apply either Householder
orthogonalization or classical Gram-Schmidt twice (which mitigates
classical Gram-Schmidt's potential instability; see
Saad [179]). Both
approaches significantly increase the computational work, but using
classical Gram-Schmidt has the advantage that all inner products can
be performed simultaneously; that is, their communication can be
packaged. This may increase the efficiency of the computation
significantly.
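As an illustration (not part of the original text; the array names V and w and the routine names are assumptions), the following NumPy sketch contrasts the two kernels: a modified Gram-Schmidt step built from Level 1 BLAS operations, and classical Gram-Schmidt applied twice, in which all inner products of one pass are formed as a single matrix-vector product so that their communication can be packaged.

import numpy as np

def mgs_step(V, w):
    """Modified Gram-Schmidt: one Level 1 BLAS dot/axpy pair per column.

    V holds the previously built orthonormal vectors v_1, ..., v_j as
    columns.  Each inner product depends on the preceding update, so
    the j inner products must be computed, and their results
    communicated, one at a time.
    """
    h = np.empty(V.shape[1])
    for k in range(V.shape[1]):
        h[k] = V[:, k] @ w        # ddot  (Level 1 BLAS)
        w = w - h[k] * V[:, k]    # daxpy (Level 1 BLAS)
    return w, h

def cgs2_step(V, w):
    """Classical Gram-Schmidt applied twice.

    All inner products of one pass are formed at once as V^T w
    (Level 2 BLAS), so in a distributed setting their communication
    can be packaged into one message; the second pass mitigates the
    potential instability of a single classical Gram-Schmidt pass.
    """
    h = V.T @ w                   # all inner products at once (dgemv)
    w = w - V @ h                 # single vector update       (dgemv)
    c = V.T @ w                   # second, corrective pass
    w = w - V @ c
    return w, h + c

Both routines return the orthogonalized vector together with the coefficients that would enter the Hessenberg matrix.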
Another way to obtain more parallelism and data locality is to generate a basis $\{v_1, Av_1, \ldots, A^m v_1\}$ for the Krylov subspace first, and to orthogonalize this set afterwards; this is called m-step GMRES(m) (see Kim and Chronopoulos [137]). (Compare this to the GMRES method in §, where each new vector is immediately orthogonalized to all previous vectors.)
This approach does not increase the computational work and, in contrast to CG, the numerical instability due to generating a possibly near-dependent set is not necessarily a drawback, since the set is orthogonalized explicitly afterwards.
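The following sketch (again illustrative; A stands for the preconditioned operator $K^{-1}A$, v1 for the starting vector, and the routine names are assumptions) generates the Krylov basis first and only then orthogonalizes the whole set, here by a QR factorization:

import numpy as np

def m_step_basis(A, v1, m):
    """Generate the basis v_1, Av_1, ..., A^m v_1.

    The m matrix-vector products involve no intervening inner
    products, which improves data locality and leaves more freedom to
    overlap communication; the price is that the columns may become
    nearly dependent.
    """
    W = np.empty((v1.size, m + 1))
    W[:, 0] = v1
    for k in range(m):
        W[:, k + 1] = A @ W[:, k]
    return W

def orthogonalize_afterwards(W):
    """Orthogonalize the complete set in one step; a QR factorization
    is rich in Level 2/3 BLAS operations."""
    return np.linalg.qr(W)

The orthonormal columns of Q then replace the set that standard GMRES would have built one vector at a time.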