How to Use This Book

Department of Computer Science, University of Tennessee, Knoxville, TN 37996-1301.

Applied Mathematics Department, University of California, Los Angeles, CA 90024-1555.

Computer Science Division and Mathematics Department, University of California, Berkeley, CA 94720.

Mathematical Sciences Section, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6367.

National Institute of Standards and Technology, Gaithersburg, MD, 20899

Department of Mathematics, Utrecht University, Utrecht, the Netherlands.

For a discussion of BLAS as building blocks, see [144] [71] [70] [69] and LAPACK routines [3]. Also, see Appendix

For a more detailed account of the early history of CG methods, we refer the reader to Golub and O'Leary [108] and Hestenes [123].

Under certain conditions, one can show that the point Jacobi algorithm is optimal, or close to optimal, in the sense of reducing the condition number, among all preconditioners of diagonal form. This was shown by Forsythe and Strauss for matrices with Property A [99], and by van der Sluis [198] for general sparse matrices. For extensions to block Jacobi preconditioners, see Demmel [66] and Elsner [95].

The SOR and Gauss-Seidel matrices are never used as preconditioners, for a rather technical reason. SOR-preconditioning with optimal $\omega$ maps the eigenvalues of the coefficient matrix to a circle in the complex plane; see Hageman and Young [.3]HaYo:applied. In this case no polynomial acceleration is possible, i.e., the accelerating polynomial reduces to the trivial polynomial $P_n(x) = x^n$, and the resulting method is simply the stationary SOR method. Recent research by Eiermann and Varga [84] has shown that polynomial acceleration of SOR with suboptimal $\omega$ will yield no improvement over simple SOR with optimal $\omega$.

To be precise, if we make an incomplete factorization $M = LU$, we refer to positions in $L$ and $U$ when we talk of positions in the factorization. The matrix $M$ will have more nonzeros than $L$ and $U$ combined.

The zero refers to the fact that only ``level zero'' fill is permitted, that is, nonzero elements of the original matrix. Fill levels are defined by calling an element of level $k+1$ if it is caused by elements at least one of which is of level $k$. The first fill level is that caused by the original matrix elements.

In graph theoretical terms, $ILU(0)$ and $D$-$ILU$ coincide if the matrix graph contains no triangles.

is equally valid, but in practice harder to implement.

Writing

On a machine with IEEE Standard Floating Point Arithmetic, $\varepsilon = 2^{-24} \approx 10^{-7}$ in single precision, and $\varepsilon = 2^{-53} \approx 10^{-16}$ in double precision.

IEEE standard floating point arithmetic permits computations with $\pm\infty$ and NaN, or Not-a-Number, symbols.

Jack Dongarra
Mon Nov 20 08:52:54 EST 1995

How to Use This Book

We have divided this book into five main chapters. Chapter 1 gives the motivation for this book and the use of templates.

Chapter 2 describes stationary and nonstationary iterative methods. In this chapter we present both historical development and state-of-the-art methods for solving some of the most challenging computational problems facing researchers.

Chapter 3 focuses on preconditioners. Many iterative methods depend in part on preconditioners to improve performance and ensure fast convergence.

Chapter 4 provides a glimpse of issues related to the use of iterative methods. This chapter, like the preceding, is especially recommended for the experienced user who wishes to have further guidelines for tailoring a specific code to a particular machine. It includes information on complex systems, stopping criteria, data storage formats, and parallelism.

Chapter 5 includes overviews of related topics such as the close connection between the Lanczos algorithm and the Conjugate Gradient algorithm, block iterative methods, red/black orderings, domain decomposition methods, multigrid-like methods, and row-projection schemes.

The Appendices contain information on how the templates and BLAS software can be obtained. A glossary of important terms used in the book is also provided.

The field of iterative methods for solving systems of linear equations is in constant flux, with new methods and approaches continually being created, modified, tuned, and some eventually discarded. We expect the material in this book to undergo changes from time to time as some of these new approaches mature and become the state-of-the-art. Therefore, we plan to update the material included in this book periodically for future editions. We welcome your comments and criticisms of this work to help us in that updating process. Please send your comments and questions by email to templates@cs.utk.edu.

Overview of the Methods

Below are short descriptions of each of the methods to be discussed, along with brief notes on the classification of the methods in terms of the class of matrices for which they are most appropriate. In later sections of this chapter more detailed descriptions of these methods are given.

Stationary Methods
- Jacobi.
  The Jacobi method is based on solving for every variable locally with respect to the other variables; one iteration of the method corresponds to solving for every variable once. The resulting method is easy to understand and implement, but convergence is slow.
- Gauss-Seidel.
  The Gauss-Seidel method is like the Jacobi method, except that it uses updated values as soon as they are available. In general, if the Jacobi method converges, the Gauss-Seidel method will converge faster than the Jacobi method, though still relatively slowly.
- SOR.
  Successive Overrelaxation (SOR) can be derived from the Gauss-Seidel method by introducing an extrapolation parameter $\omega$. For the optimal choice of $\omega$, SOR may converge faster than Gauss-Seidel by an order of magnitude.
- SSOR.
  Symmetric Successive Overrelaxation (SSOR) has no advantage over SOR as a stand-alone iterative method; however, it is useful as a preconditioner for nonstationary methods.
Nonstationary Methods
- Conjugate Gradient (CG).
  The conjugate gradient method derives its name from the fact that it generates a sequence of conjugate (or orthogonal) vectors. These vectors are the residuals of the iterates. They are also the gradients of a quadratic functional, the minimization of which is equivalent to solving the linear system. CG is an extremely effective method when the coefficient matrix is symmetric positive definite, since storage for only a limited number of vectors is required.
- Minimum Residual (MINRES) and Symmetric LQ (SYMMLQ).
  These methods are computational alternatives for CG for coefficient matrices that are symmetric but possibly indefinite. SYMMLQ will generate the same solution iterates as CG if the coefficient matrix is symmetric positive definite.
- Conjugate Gradient on the Normal Equations: CGNE and CGNR.
  These methods are based on the application of the CG method to one of two forms of the normal equations for $Ax = b$. CGNE solves the system $(AA^T)y = b$ for $y$ and then computes the solution $x = A^T y$. CGNR solves $(A^T A)x = \tilde{b}$ for the solution vector $x$ where $\tilde{b} = A^T b$. When the coefficient matrix $A$ is nonsymmetric and nonsingular, the normal equations matrices $AA^T$ and $A^T A$ will be symmetric and positive definite, and hence CG can be applied. The convergence may be slow, since the spectrum of the normal equations matrices will be less favorable than the spectrum of $A$.
- Generalized Minimal Residual (GMRES).
  The Generalized Minimal Residual method computes a sequence of orthogonal vectors (like MINRES), and combines these through a least-squares solve and update. However, unlike MINRES (and CG) it requires storing the whole sequence, so that a large amount of storage is needed. For this reason, restarted versions of this method are used. In restarted versions, computation and storage costs are limited by specifying a fixed number of vectors to be generated. This method is useful for general nonsymmetric matrices.
- BiConjugate Gradient (BiCG).
  The Biconjugate Gradient method generates two CG-like sequences of vectors, one based on a system with the original coefficient matrix $A$, and one on $A^T$. Instead of orthogonalizing each sequence, they are made mutually orthogonal, or ``bi-orthogonal''. This method, like CG, uses limited storage. It is useful when the matrix is nonsymmetric and nonsingular; however, convergence may be irregular, and there is a possibility that the method will break down. BiCG requires a multiplication with the coefficient matrix and with its transpose at each iteration.
- Quasi-Minimal Residual (QMR).
  The Quasi-Minimal Residual method applies a least-squares solve and update to the BiCG residuals, thereby smoothing out the irregular convergence behavior of BiCG, which may lead to more reliable approximations. In full glory, it has a look ahead strategy built in that avoids the BiCG breakdown. Even without look ahead, QMR largely avoids the breakdown that can occur in BiCG. On the other hand, it does not effect a true minimization of either the error or the residual, and while it converges smoothly, it often does not improve on the BiCG in terms of the number of iteration steps.
- Conjugate Gradient Squared (CGS).
  The Conjugate Gradient Squared method is a variant of BiCG that applies the updating operations for the $A$-sequence and the $A^T$-sequence both to the same vectors. Ideally, this would double the convergence rate, but in practice convergence may be much more irregular than for BiCG, which may sometimes lead to unreliable results. A practical advantage is that the method does not need the multiplications with the transpose of the coefficient matrix.
- Biconjugate Gradient Stabilized (Bi-CGSTAB).
  The Biconjugate Gradient Stabilized method is a variant of BiCG, like CGS, but using different updates for the $A^T$-sequence in order to obtain smoother convergence than CGS.
- Chebyshev Iteration.
  The Chebyshev Iteration recursively determines polynomials with coefficients chosen to minimize the norm of the residual in a min-max sense. The coefficient matrix must be positive definite and knowledge of the extremal eigenvalues is required. This method has the advantage of requiring no inner products.
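
As an illustration of the nonstationary methods listed above, the following is a minimal sketch of unpreconditioned CG in plain Python. It is not the book's template: the function names `conjugate_gradient` and `tridiag_matvec`, the tolerance, and the test matrix tridiag(-1, 2, -1) are assumptions chosen for the example. Note that only a handful of vectors (`x`, `r`, `p`, `q`) are stored, which is the limited-storage property mentioned in the CG entry.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite operator."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A*x for the start x = 0
    p = list(r)                      # first search direction
    rho = sum(ri * ri for ri in r)   # squared residual norm
    for it in range(max_iter):
        if rho ** 0.5 <= tol:
            return x, it
        q = matvec(p)
        alpha = rho / sum(pi * qi for pi, qi in zip(p, q))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = sum(ri * ri for ri in r)
        p = [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x, max_iter


def tridiag_matvec(v):
    """Multiply by the SPD matrix tridiag(-1, 2, -1) (example matrix only)."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


b = [1.0] * 8
x, iterations = conjugate_gradient(tridiag_matvec, b)
residual = max(abs(bi - ai) for bi, ai in zip(b, tridiag_matvec(x)))
```

The two inner products per iteration (for `alpha` and for `rho_new`) are the synchronization points that matter when the method is run in parallel.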

Sparse Incomplete Factorizations

Efficient preconditioners for iterative methods can be found by performing an incomplete factorization of the coefficient matrix. In this section, we discuss the incomplete factorization of an $n \times n$ matrix $A$ stored in the CRS format, and routines to solve a system with such a factorization. At first we only consider a factorization of the $D$-$ILU$ type, that is, the simplest type of factorization in which no ``fill'' is allowed, even if the matrix has a nonzero in the fill position (see section ). Later we will consider factorizations that allow higher levels of fill. Such factorizations are considerably more complicated to code, but they are essential for complicated differential equations. The solution routines are applicable in both cases.

For iterative methods that involve a transpose matrix-vector product (BiCG, for example), we also need to consider solving a system with the transpose of the factorization.

Generating a CRS-based $D$-$ILU$ Incomplete Factorization

In this subsection we will consider a matrix split as $A = D_A + L_A + U_A$ in diagonal, lower and upper triangular part, and an incomplete factorization preconditioner of the form $(D + L_A) D^{-1} (D + U_A)$. In this way, we only need to store a diagonal matrix $D$ containing the pivots of the factorization.

Hence, it suffices to allocate for the preconditioner only a pivot array of length $n$ (pivots(1:n)). In fact, we will store the inverses of the pivots rather than the pivots themselves. This implies that during the system solution no divisions have to be performed.

Additionally, we assume that an extra integer array diag_ptr(1:n) has been allocated that contains the positions in val (and col_ind) of the diagonal elements in each row, that is, val(diag_ptr(i)) = $a_{i,i}$.

The factorization begins by copying the matrix diagonal

for i = 1, n
    pivots(i) = val(diag_ptr(i))
end;

Each elimination step starts by inverting the pivot

for i = 1, n
    pivots(i) = 1 / pivots(i)

For all nonzero elements $a_{i,j}$ with $j > i$, we next check whether $a_{j,i}$ is a nonzero matrix element, since this is the only element that can cause fill with $a_{i,j}$.

    for j = diag_ptr(i)+1, row_ptr(i+1)-1
        found = FALSE
        for k = row_ptr(col_ind(j)), diag_ptr(col_ind(j))-1
            if(col_ind(k) = i) then
                found = TRUE
                element = val(k)
            endif
        end;

If so, we update

        if (found = TRUE) then
           pivots(col_ind(j)) = pivots(col_ind(j))
                                - element * pivots(i) * val(j)
        endif
    end; 
end;
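
The pivot generation above can be sketched in Python as follows. This is an illustrative translation, not the book's code: it uses 0-based indexing, assumes the column indices in each row are sorted (so that entries before `diag_ptr[i]` lie in the lower triangle and entries after it in the upper triangle), and the function name `dilu_pivots` and the 4 x 4 test matrix tridiag(-1, 2, -1) are assumptions made for the example.

```python
def dilu_pivots(n, val, col_ind, row_ptr, diag_ptr):
    """Return the inverted pivots of a D-ILU factorization (0-based CRS)."""
    pivots = [val[diag_ptr[i]] for i in range(n)]   # copy the matrix diagonal
    for i in range(n):
        pivots[i] = 1.0 / pivots[i]                 # invert the pivot
        # visit a(i,col) with col > i
        for j in range(diag_ptr[i] + 1, row_ptr[i + 1]):
            col = col_ind[j]
            # search the lower part of row col for a(col,i)
            for k in range(row_ptr[col], diag_ptr[col]):
                if col_ind[k] == i:
                    pivots[col] -= val[k] * pivots[i] * val[j]
                    break
    return pivots


# tridiag(-1, 2, -1) of order 4 in CRS format (example data)
val = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_ind = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
row_ptr = [0, 2, 5, 8, 10]
diag_ptr = [0, 3, 6, 9]

pivots = dilu_pivots(4, val, col_ind, row_ptr, diag_ptr)
```

For this matrix the pivots before inversion are 2, 3/2, 4/3 and 5/4, so the stored inverses are 1/2, 2/3, 3/4 and 4/5.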

CRS-based Factorization Solve

The system $LUy = x$ can be solved in the usual manner by introducing a temporary vector $z$:

$$ Lz = x, \qquad Uy = z. $$

We have a choice between several equivalent ways of solving the system:

$$ LU = (D + L_A) D^{-1} (D + U_A) = (I + L_A D^{-1})(D + U_A) = (D + L_A)(I + D^{-1} U_A) = (I + L_A D^{-1}) D (I + D^{-1} U_A). $$

The first and fourth formulae are not suitable since they require both multiplication and division with $D$; the difference between the second and third is only one of ease of coding. In this section we use the third formula; in the next section we will use the second for the transpose system solution.

Both halves of the solution have largely the same structure as the matrix vector multiplication.

for i = 1, n
    sum =  0
    for j = row_ptr(i), diag_ptr(i)-1
        sum = sum + val(j) * z(col_ind(j))
    end;
    z(i) = pivots(i) * (x(i)-sum)
end;   
for i = n, 1, (step -1)
    sum = 0
    for j = diag_ptr(i)+1, row_ptr(i+1)-1
        sum = sum + val(j) * y(col_ind(j))
    end;
    y(i) = z(i) - pivots(i) * sum
end;

The temporary vector z can be eliminated by reusing the space for y; algorithmically, z can even overwrite x, but overwriting input data is in general not recommended.
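
A possible Python rendering of the two triangular sweeps is sketched below, under the same assumptions as the pseudocode (inverted pivots, sorted column indices) but with 0-based indexing. The name `dilu_solve` and the example data are hypothetical. The test matrix is tridiag(-1, 2, -1) of order 4; since a tridiagonal matrix produces no fill, the factorization reproduces the matrix itself here and the solve returns the solution of the original system.

```python
def dilu_solve(n, val, col_ind, row_ptr, diag_ptr, pivots, x):
    """Solve (D + L_A)(I + D^{-1} U_A) y = x; pivots holds the inverses of D."""
    z = [0.0] * n
    for i in range(n):                       # forward sweep: (D + L_A) z = x
        s = 0.0
        for j in range(row_ptr[i], diag_ptr[i]):
            s += val[j] * z[col_ind[j]]
        z[i] = pivots[i] * (x[i] - s)
    y = [0.0] * n
    for i in range(n - 1, -1, -1):           # backward sweep: (I + D^{-1} U_A) y = z
        s = 0.0
        for j in range(diag_ptr[i] + 1, row_ptr[i + 1]):
            s += val[j] * y[col_ind[j]]
        y[i] = z[i] - pivots[i] * s          # assigned once, after the inner loop
    return y


# tridiag(-1, 2, -1) of order 4 in CRS format (example data)
val = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_ind = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
row_ptr = [0, 2, 5, 8, 10]
diag_ptr = [0, 3, 6, 9]
pivots = [0.5, 2.0 / 3.0, 0.75, 0.8]         # inverted D-ILU pivots of this matrix

x = [0.0, 0.0, 0.0, 5.0]                     # equals A * [1, 2, 3, 4]
y = dilu_solve(4, val, col_ind, row_ptr, diag_ptr, pivots, x)
```

Only multiplications with `pivots` appear in the sweeps, which is the reason for storing the inverted pivots.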

CRS-based Factorization Transpose Solve

Solving the transpose system is slightly more involved. In the usual formulation we traverse rows when solving a factored system, but here we can only access columns of the matrices $L^T$ and $U^T$ (at less than prohibitive cost). The key idea is to distribute each newly computed component of a triangular solve immediately over the remaining right-hand-side.

For instance, if we write a lower triangular matrix as $L = (l_{*1}, l_{*2}, \ldots)$, then the system $Ly = x$ can be written as $x = l_{*1} y_1 + l_{*2} y_2 + \cdots$. Hence, after computing $y_1$ we modify $x \leftarrow x - l_{*1} y_1$, and so on. Upper triangular systems are treated in a similar manner. With this algorithm we only access columns of the triangular systems. Solving a transpose system with a matrix stored in CRS format essentially means that we access rows of $L$ and $U$.

The algorithm now becomes

for i = 1, n
    x_tmp(i) = x(i)
end;
for i = 1, n
    z(i) = x_tmp(i)
    tmp = pivots(i) * z(i)
    for j = diag_ptr(i)+1, row_ptr(i+1)-1
        x_tmp(col_ind(j)) = x_tmp(col_ind(j)) - tmp * val(j) 
    end;
end;
for i = n, 1, (step -1)
    y(i) = pivots(i) * z(i)
    for j = row_ptr(i), diag_ptr(i)-1
        z(col_ind(j)) = z(col_ind(j)) - val(j) * y(i)
    end;
end;

The extra temporary x_tmp is used only for clarity, and can be overlapped with z. Both x_tmp and z can be considered to be equivalent to y. Overall, a CRS-based preconditioner solve uses short vector lengths, indirect addressing, and has essentially the same memory traffic patterns as that of the matrix-vector product.
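
The transpose solve can be sketched in Python in the same spirit. Again this is only an illustration with 0-based indexing; the name `dilu_transpose_solve` and the nonsymmetric test matrix tridiag(-1, 4, -2) of order 3 are assumptions. Because the matrix is tridiagonal, the factorization is exact for it, so the result can be checked against the transposed system directly.

```python
def dilu_transpose_solve(n, val, col_ind, row_ptr, diag_ptr, pivots, x):
    """Solve the transposed factored system, touching only rows of the CRS matrix."""
    x_tmp = list(x)                          # keep the input vector intact
    z = [0.0] * n
    for i in range(n):
        z[i] = x_tmp[i]
        tmp = pivots[i] * z[i]
        # distribute z(i) over the remaining right-hand side (upper part of row i)
        for j in range(diag_ptr[i] + 1, row_ptr[i + 1]):
            x_tmp[col_ind[j]] -= tmp * val[j]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = pivots[i] * z[i]
        # distribute y(i) over earlier components (lower part of row i)
        for j in range(row_ptr[i], diag_ptr[i]):
            z[col_ind[j]] -= val[j] * y[i]
    return y


# A = [[4, -2, 0], [-1, 4, -2], [0, -1, 4]] in CRS format (example data)
val = [4.0, -2.0, -1.0, 4.0, -2.0, -1.0, 4.0]
col_ind = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
diag_ptr = [0, 3, 6]
pivots = [1.0 / 4.0, 2.0 / 7.0, 7.0 / 24.0]  # inverses of D = diag(4, 7/2, 24/7)

x = [2.0, 3.0, 8.0]                          # equals A^T * [1, 2, 3]
y = dilu_transpose_solve(3, val, col_ind, row_ptr, diag_ptr, pivots, x)
```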

Generating a CRS-based $ILU(k)$ Incomplete Factorization

Incomplete factorizations with several levels of fill allowed are more accurate than the $D$-$ILU$ factorization described above. On the other hand, they require more storage, and are considerably harder to implement (much of this section is based on algorithms for a full factorization of a sparse matrix as found in Duff, Erisman and Reid [80]).

As a preliminary, we need an algorithm for adding two vectors $x$ and $y$, both stored in sparse storage. Let lx be the number of nonzero components in $x$, let $x$ be stored in x, and let xind be an integer array such that

if xind(j) = i then x(j) = $x_i$.

Similarly, $y$ is stored as ly, y, yind.

We now add $x \leftarrow x + y$ by first copying y into a full vector w then adding w to x. The total number of operations will be $O(lx + ly)$:

% copy y into w
for i=1,ly
   w( yind(i) ) = y(i)
% add w to x wherever x is already nonzero
for i=1,lx
   if w( xind(i) ) <> 0
      x(i) = x(i) + w( xind(i) )
   w( xind(i) ) = 0
% add w to x by creating new components
% wherever x is still zero
for i=1,ly
   if w( yind(i) ) <> 0 then
      lx = lx+1
      xind(lx) = yind(i)
      x(lx) = w( yind(i) )
   endif
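
The sparse addition can be sketched in Python as follows (0-based indices; the function name `sparse_add` and the sample vectors are assumptions). One small departure from the pseudocode is labeled in the code: the work vector is zeroed again after new components are created, so that it could be reused.

```python
def sparse_add(x, xind, y, yind, n):
    """Add sparse vector (y, yind) into sparse vector (x, xind); n is the full length."""
    w = [0.0] * n
    for i in range(len(y)):              # copy y into w
        w[yind[i]] = y[i]
    for i in range(len(x)):              # add w to x wherever x is already nonzero
        if w[xind[i]] != 0:
            x[i] += w[xind[i]]
        w[xind[i]] = 0.0
    for i in range(len(y)):              # create new components where x is still zero
        if w[yind[i]] != 0:
            x.append(w[yind[i]])
            xind.append(yind[i])
            w[yind[i]] = 0.0             # not in the pseudocode: leave w clean for reuse
    return x, xind


x, xind = [1.0, 2.0], [0, 3]             # x has nonzeros in positions 0 and 3
y, yind = [10.0, 20.0], [3, 5]           # y has nonzeros in positions 3 and 5
x, xind = sparse_add(x, xind, y, yind, 6)
```

Apart from allocating `w`, each loop runs over the stored entries of one vector only, which gives the operation count stated above.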

In order to add a sequence of vectors $x \leftarrow x + \sum_k y^{(k)}$, we add the $y^{(k)}$ vectors into $w$ before executing the writes into $x$. A different implementation would be possible, where $w$ is allocated as a sparse vector and its sparsity pattern is constructed during the additions. We will not discuss this possibility any further.

For a slight refinement of the above algorithm, we will add levels to the nonzero components: we assume integer vectors xlev and ylev of length lx and ly respectively, and a full length level vector wlev corresponding to w. The addition algorithm then becomes:

% copy y into w
for i=1,ly
   w( yind(i) )    = y(i)
   wlev( yind(i) ) = ylev(i)
% add w to x wherever x is already nonzero;
% don't change the levels
for i=1,lx
   if w( xind(i) ) <> 0
      x(i) = x(i) + w( xind(i) )
   w( xind(i) ) = 0
% add w to x by creating new components
% wherever x is still zero;
% carry over levels
for i=1,ly
   if w( yind(i) ) <> 0 then
      lx = lx+1
      x(lx)    = w( yind(i) )
      xind(lx) = yind(i)
      xlev(lx) = wlev( yind(i) )
   endif

We can now describe the factorization. The algorithm starts out with the matrix A, and gradually builds up a factorization M of the form $M = (D + L)(I + D^{-1} U)$, where $L$, $D^{-1}$, and $D^{-1} U$ are stored in the lower triangle, diagonal and upper triangle of the array M respectively. The particular form of the factorization is chosen to minimize the number of times that the full vector w is copied back to sparse form.

Specifically, we use a sparse form of the following factorization scheme:

for k=1,n
   for j=1,k-1
      for i=j+1,n
         a(k,i) = a(k,i) - a(k,j)*a(j,i)
   for j=k+1,n
      a(k,j) = a(k,j)/a(k,k)

This is a row-oriented version of the traditional `left-looking' factorization algorithm.
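
To make the scheme concrete, here is a dense Python sketch of the same loop nest (0-based indices), followed by a check that the stored factors reproduce the matrix. The name `row_factor` and the 3 x 3 test matrix are assumptions for illustration. In this dense version the diagonal is kept as it is rather than inverted, and the strict upper triangle holds the normalized rows.

```python
def row_factor(a):
    """In-place row-oriented factorization of a dense matrix (list of lists)."""
    n = len(a)
    for k in range(n):
        for j in range(k):
            for i in range(j + 1, n):
                a[k][i] -= a[k][j] * a[j][i]
        for j in range(k + 1, n):
            a[k][j] /= a[k][k]           # normalize the upper part of row k
    return a


a0 = [[4.0, 1.0, 2.0], [2.0, 5.0, 1.0], [1.0, 2.0, 6.0]]
f = row_factor([row[:] for row in a0])
n = len(a0)

# lower factor (with diagonal) and unit upper factor read off from f
lower = [[f[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
upper = [[1.0 if i == j else (f[i][j] if j > i else 0.0) for j in range(n)]
         for i in range(n)]
product = [[sum(lower[i][k] * upper[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
```

Each row `k` is completed using only rows `0..k-1`, which is what allows the sparse version to work with one dense work vector per row.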

We will describe an incomplete factorization that controls fill-in through levels (see equation ( )). Alternatively we could use a drop tolerance (section ), but this is less attractive from an implementation point of view. With fill levels we can perform the factorization symbolically at first, determining storage demands and reusing this information through a number of linear systems of the same sparsity structure. Such preprocessing and reuse of information is not possible with fill controlled by a drop tolerance criterion.

The matrix arrays A and M are assumed to be in compressed row storage, with no particular ordering of the elements inside each row, but arrays adiag and mdiag point to the locations of the diagonal elements.

for row=1,n
%  go through elements A(row,col) with col<row
   COPY ROW row OF A() INTO DENSE VECTOR w
   for col=aptr(row),aptr(row+1)-1
      if aind(col) < row then
         acol = aind(col)
         MULTIPLY ROW acol OF M() BY A(col)
         SUBTRACT THE RESULT FROM w
         ALLOWING FILL-IN UP TO LEVEL k
      endif
   INSERT w IN ROW row OF M()
% invert the pivot
   M(mdiag(row)) = 1/M(mdiag(row))
% normalize the row of U
   for col=mptr(row),mptr(row+1)-1
      if mind(col) > row
         M(col) = M(col) * M(mdiag(row))

The structure of a particular sparse matrix is likely to apply to a sequence of problems, for instance on different time-steps, or during a Newton iteration. Thus it may pay off to perform the above incomplete factorization first symbolically to determine the amount and location of fill-in and use this structure for the numerically different but structurally identical matrices. In this case, the array for the numerical values can be used to store the levels during the symbolic factorization phase.

Parallelism

Pipelining: See: Vector computer.
Vector computer: Computer that is able to process consecutive identical operations (typically additions or multiplications) several times faster than intermixed operations of different types. Processing identical operations this way is called `pipelining' the operations.
Shared memory: See: Parallel computer.
Distributed memory: See: Parallel computer.
Message passing: See: Parallel computer.
Parallel computer: Computer with multiple independent processing units. If the processors have immediate access to the same memory, the memory is said to be shared; if processors have private memory that is not immediately visible to other processors, the memory is said to be distributed. In that case, processors communicate by message passing.

In this section we discuss aspects of parallelism in the iterative methods discussed in this book.

Since the iterative methods share most of their computational kernels we will discuss these independent of the method. The basic time-consuming kernels of iterative schemes are:

inner products,
vector updates,
matrix-vector products, e.g., $Ap^{(i)}$ (for some methods also $A^T p^{(i)}$),
preconditioner solves.

We will examine each of these in turn. We will conclude this section by discussing two particular issues, namely computational wavefronts in the SOR method, and block operations in the GMRES method.

Inner products

The computation of an inner product of two vectors can be easily parallelized; each processor computes the inner product of corresponding segments of each vector (local inner products or LIPs). On distributed-memory machines the LIPs then have to be sent to other processors to be combined for the global inner product. This can be done either with an all-to-all send where every processor performs the summation of the LIPs, or by a global accumulation in one processor, followed by a broadcast of the final result. Clearly, this step requires communication.

For shared-memory machines, the accumulation of LIPs can be implemented as a critical section where all processors add their local result in turn to the global result, or as a piece of serial code, where one processor performs the summations.
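
The idea of local inner products can be sketched with a serial Python simulation. No real processors or message passing are involved; the function names, the contiguous segmentation, and the sample vectors are assumptions made for the example. The second function plays the role of the single processor that performs the accumulation.

```python
def local_inner_products(x, y, num_procs):
    """Split x and y into contiguous segments and return one LIP per segment."""
    n = len(x)
    bounds = [(p * n) // num_procs for p in range(num_procs + 1)]
    return [sum(x[i] * y[i] for i in range(bounds[p], bounds[p + 1]))
            for p in range(num_procs)]


def global_inner_product(lips):
    """Serial accumulation of the LIPs, as done by one processor."""
    total = 0.0
    for lip in lips:
        total += lip
    return total


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0] * 7
lips = local_inner_products(x, y, 3)     # segments [0,2), [2,4), [4,7)
result = global_inner_product(lips)
```

On a distributed-memory machine the accumulation step is where communication occurs; the local sums themselves need none.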

Overlapping communication and computation

Clearly, in the usual formulation of conjugate gradient-type methods the inner products induce a synchronization of the processors, since they cannot progress until the final result has been computed: updating $x^{(i+1)}$ and $r^{(i+1)}$ can only begin after completing the inner product for $\alpha_i$. Since on a distributed-memory machine communication is needed for the inner product, we cannot overlap this communication with useful computation. The same observation applies to updating $p^{(i)}$, which can only begin after completing the inner product for $\beta_{i-1}$.

The figure below (``A rearrangement of Conjugate Gradient for parallelism'') shows a variant of CG in which all communication time may be overlapped with useful computations. This is just a reorganized version of the original CG scheme, and is therefore precisely as stable. Another advantage over other approaches (see below) is that no additional operations are required.

This rearrangement is based on two tricks. The first is that updating the iterate is delayed to mask the communication stage of the inner product. The second trick relies on splitting the (symmetric) preconditioner as $M = LL^T$, so one first computes $L^{-1} r^{(i)}$, after which the inner product $r^{(i)T} M^{-1} r^{(i)}$ can be computed as $s^T s$ where $s = L^{-1} r^{(i)}$. The computation of $L^{-T} s$ will then mask the communication stage of the inner product.
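
The second trick rests on a short identity, written out here for a symmetric preconditioner split as $M = LL^T$ and a residual $r$. The symbols $M$, $L$, $r$ and $s$ are a reconstruction for illustration and may differ from the notation of the figure.

```latex
\begin{align*}
r^T M^{-1} r &= r^T (L L^T)^{-1} r \\
             &= r^T L^{-T} L^{-1} r \\
             &= (L^{-1} r)^T (L^{-1} r) \\
             &= s^T s, \qquad s = L^{-1} r, \\
M^{-1} r     &= L^{-T} s .
\end{align*}
```

The inner product is therefore available as soon as the first triangular solve is finished, and the second triangular solve, $L^{-T} s$, can proceed while the local results are being combined.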

Figure: A rearrangement of Conjugate Gradient for parallelism

Under the assumptions that we have made, CG can be efficiently parallelized as follows:

The communication required for the reduction of the inner product for can be overlapped with the update for , (which could in fact have been done in the previous iteration step).
The reduction of the inner product for can be overlapped with the remaining part of the preconditioning operation at the beginning of the next iteration.
The computation of a segment of can be followed immediately by the computation of a segment of , and this can be followed by the computation of a part of the inner product. This saves on load operations for segments of and .

For a more detailed discussion see Demmel, Heath and Van der Vorst [67]. This algorithm can be extended trivially to preconditioners of $LDL^T$ form, and nonsymmetric preconditioners in the Biconjugate Gradient Method.

Fewer synchronization points

Several authors have found ways to eliminate some of the synchronization points induced by the inner products in methods such as CG. One strategy has been to replace one of the two inner products typically present in conjugate gradient-like methods by one or two others in such a way that all inner products can be performed simultaneously. The global communication can then be packaged. A first such method was proposed by Saad [182] with a modification to improve its stability suggested by Meurant [156]. Recently, related methods have been proposed by Chronopoulos and Gear [55], D'Azevedo and Romine [62], and Eijkhout [88]. These schemes can also be applied to nonsymmetric methods such as BiCG. The stability of such methods is discussed by D'Azevedo, Eijkhout and Romine [61].

Another approach is to generate a number of successive Krylov vectors (see § ) and orthogonalize these as a block (see Van Rosendale [210], and Chronopoulos and Gear [55]).

Vector updates

Vector updates are trivially parallelizable: each processor updates its own segment.

Stationary Iterative Methods

Iterative methods that can be expressed in the simple form

$$ x^{(k)} = B x^{(k-1)} + c $$

(where neither $B$ nor $c$ depend upon the iteration count $k$) are called stationary iterative methods. In this section, we present the four main stationary iterative methods: the Jacobi method, the Gauss-Seidel method, the Successive Overrelaxation (SOR) method and the Symmetric Successive Overrelaxation (SSOR) method. In each case, we summarize their convergence behavior and their effectiveness, and discuss how and when they should be used. Finally, in § , we give some historical background and further notes and references.
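
As a small illustration of the stationary form, the sketch below builds an iteration matrix and constant vector for the Jacobi method and applies the same update at every step. The names `B`, `c`, `jacobi_iteration_matrix` and `stationary_iterate`, the dense list-of-lists storage, and the 3 x 3 test system are assumptions made for the example.

```python
def jacobi_iteration_matrix(a, b):
    """Return (B, c) with B = -D^{-1}(A - D) and c = D^{-1} b (Jacobi splitting)."""
    n = len(a)
    B = [[0.0 if i == j else -a[i][j] / a[i][i] for j in range(n)]
         for i in range(n)]
    c = [b[i] / a[i][i] for i in range(n)]
    return B, c


def stationary_iterate(B, c, x0, steps):
    """Apply x <- B x + c a fixed number of times; B and c never change."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        x = [sum(B[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
    return x


a = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]                     # chosen so that the solution is [1, 2, 3]
B, c = jacobi_iteration_matrix(a, b)
x = stationary_iterate(B, c, [0.0, 0.0, 0.0], 50)
```

The test matrix is strictly diagonally dominant, so the Jacobi iteration converges for it; for general matrices convergence is not guaranteed.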

Matrix-vector products

The matrix-vector products are often easily parallelized on shared-memory machines by splitting the matrix in strips corresponding to the vector segments. Each processor then computes the matrix-vector product of one strip. For distributed-memory machines, there may be a problem if each processor has only a segment of the vector in its memory. Depending on the bandwidth of the matrix, we may need communication for other elements of the vector, which may lead to communication bottlenecks. However, many sparse matrix problems arise from a network in which only nearby nodes are connected. For example, matrices stemming from finite difference or finite element problems typically involve only local connections: matrix element $a_{i,j}$ is nonzero only if variables $i$ and $j$ are physically close. In such a case, it seems natural to subdivide the network, or grid, into suitable blocks and to distribute them over the processors. When computing $Ap_k$, each processor requires the values of $p_k$ at some nodes in neighboring blocks. If the number of connections to these neighboring blocks is small compared to the number of internal nodes, then the communication time can be overlapped with computational work. For more detailed discussions on implementation aspects for distributed memory systems, see De Sturler [63] and Pommerell [175].
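
The strip decomposition can be sketched in Python for a matrix in CRS format. This is a serial illustration only: the function names `strip_matvec` and `remote_entries`, the 0-based indexing, and the 4 x 4 test matrix tridiag(-1, 2, -1) are assumptions. The second function lists the vector entries that a strip would have to obtain from other processors on a distributed-memory machine.

```python
def strip_matvec(val, col_ind, row_ptr, p, first_row, last_row):
    """Product of rows first_row..last_row-1 of a CRS matrix with the vector p."""
    out = []
    for i in range(first_row, last_row):
        s = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            s += val[j] * p[col_ind[j]]
        out.append(s)
    return out


def remote_entries(col_ind, row_ptr, first_row, last_row):
    """Indices of p outside [first_row, last_row) that the strip refers to."""
    needed = set()
    for j in range(row_ptr[first_row], row_ptr[last_row]):
        if not first_row <= col_ind[j] < last_row:
            needed.add(col_ind[j])
    return sorted(needed)


# tridiag(-1, 2, -1) of order 4 in CRS format (example data)
val = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_ind = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
row_ptr = [0, 2, 5, 8, 10]
p = [1.0, 2.0, 3.0, 4.0]

top = strip_matvec(val, col_ind, row_ptr, p, 0, 2)       # rows 0 and 1
bottom = strip_matvec(val, col_ind, row_ptr, p, 2, 4)    # rows 2 and 3
top_needs = remote_entries(col_ind, row_ptr, 0, 2)
bottom_needs = remote_entries(col_ind, row_ptr, 2, 4)
```

For this banded matrix each strip needs a single entry from its neighbor, which is the favorable situation described above: few connections to other blocks relative to the work inside the strip.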

Preconditioning

Preconditioning is often the most problematic part of parallelizing an iterative method. We will mention a number of approaches to obtaining parallelism in preconditioning.

Discovering parallelism in sequential preconditioners.

Certain preconditioners were not developed with parallelism in mind, but they can be executed in parallel. Some examples are domain decomposition methods (see § ), which provide a high degree of coarse grained parallelism, and polynomial preconditioners (see § ), which have the same parallelism as the matrix-vector product.

Incomplete factorization preconditioners are usually much harder to parallelize: using wavefronts of independent computations (see for instance Paolini and Radicati di Brozolo [170]) a modest amount of parallelism can be attained, but the implementation is complicated. For instance, a central difference discretization on regular grids gives wavefronts that are hyperplanes (see Van der Vorst [205] [203]).

More parallel variants of sequential preconditioners.

Variants of existing sequential incomplete factorization preconditioners with a higher degree of parallelism have been devised, though they are perhaps less efficient in purely scalar terms than their ancestors. Some examples are: reorderings of the variables (see Duff and Meurant [79] and Eijkhout [85]), expansion of the factors in a truncated Neumann series (see Van der Vorst [201]), various block factorization methods (see Axelsson and Eijkhout [15] and Axelsson and Polman [21]), and multicolor preconditioners.

Multicolor preconditioners have optimal parallelism among incomplete factorization methods, since the minimal number of sequential steps equals the color number of the matrix graph. For theory and applications to parallelism see Jones and Plassmann [128] [127].
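As an illustration of how a coloring of the matrix graph limits the number of sequential steps, the following sketch colors the graph of a 5-point stencil with a simple greedy heuristic. The function names `grid_graph` and `greedy_coloring` are hypothetical, and a greedy coloring is only an assumption made for brevity: in general it does not attain the color number of the graph, although for this grid in its natural ordering it recovers the red-black coloring.

```python
def grid_graph(nx, ny):
    """Adjacency lists of the 5-point stencil on an nx-by-ny grid (natural ordering)."""
    adj = {k: [] for k in range(nx * ny)}
    for iy in range(ny):
        for ix in range(nx):
            k = iy * nx + ix
            if ix + 1 < nx:            # east neighbour
                adj[k].append(k + 1)
                adj[k + 1].append(k)
            if iy + 1 < ny:            # north neighbour
                adj[k].append(k + nx)
                adj[k + nx].append(k)
    return adj


def greedy_coloring(adj):
    """Give each node the smallest color not used by an already colored neighbour."""
    color = {}
    for v in sorted(adj):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


adj = grid_graph(4, 3)
color = greedy_coloring(adj)
num_colors = max(color.values()) + 1   # number of sequential steps per sweep
```

All unknowns of one color are mutually independent, so a sweep over the matrix needs only `num_colors` sequential steps, each of which is fully parallel.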

Fully decoupled preconditioners.

If all processors execute their part of the preconditioner solve without further communication, the overall method is technically a block Jacobi preconditioner. While the parallel execution of such preconditioners is very efficient, they may not be as effective as more complicated, less parallel preconditioners, since the improvement in the number of iterations may be only modest. To get a bigger improvement while retaining the efficient parallel execution, Radicati di Brozolo and Robert [178] suggest constructing incomplete decompositions on slightly overlapping domains. This requires communication similar to that for matrix-vector products.

Wavefronts in the Gauss-Seidel and Conjugate Gradient methods

At first sight, the Gauss-Seidel method (and the SOR method which has the same basic structure) seems to be a fully sequential method. A more careful analysis, however, reveals a high degree of parallelism if the method is applied to sparse matrices such as those arising from discretized partial differential equations.

We start by partitioning the unknowns in wavefronts. The first wavefront contains those unknowns that (in the directed graph of $D - L$) have no predecessor; subsequent wavefronts are then sets (this definition is not necessarily unique) of successors of elements of the previous wavefront(s), such that no successor/predecessor relations hold among the elements of this set. It is clear that all elements of a wavefront can be processed simultaneously, so the sequential time of solving a system with $D - L$ can be reduced to the number of wavefronts.
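The partitioning into wavefronts can be sketched as follows. This is a minimal illustration, assuming the sparsity pattern of the strictly lower triangular part is given as a list of predecessor lists; the names `wavefronts` and `grid_lower_rows` are hypothetical. Each unknown is placed in the earliest wavefront allowed by its predecessors, which is one of the possible (non-unique) choices.

```python
def wavefronts(lower_rows):
    """lower_rows[i] lists the columns j < i with a nonzero in row i of the
    lower triangular matrix, i.e. the predecessors of unknown i."""
    level = [0] * len(lower_rows)
    for i, preds in enumerate(lower_rows):
        # Unknown i can be computed one step after its latest predecessor.
        level[i] = 1 + max((level[j] for j in preds), default=-1)
    fronts = [[] for _ in range(max(level) + 1)] if level else []
    for i, lev in enumerate(level):
        fronts[lev].append(i)
    return fronts


def grid_lower_rows(nx, ny):
    """Strictly lower triangular pattern of the 5-point stencil, natural ordering."""
    rows = []
    for iy in range(ny):
        for ix in range(nx):
            preds = []
            if ix > 0:
                preds.append(iy * nx + ix - 1)      # west neighbour
            if iy > 0:
                preds.append((iy - 1) * nx + ix)    # south neighbour
            rows.append(preds)
    return rows


fronts = wavefronts(grid_lower_rows(4, 3))
```

For this grid the wavefronts are the diagonals `ix + iy = constant`, so the triangular solve takes `nx + ny - 1` sequential steps instead of `nx * ny`.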

Next, we observe that the unknowns in a wavefront can be computed as soon as all wavefronts containing its predecessors have been computed. Thus we can, in the absence of tests for convergence, have components from several iterations being computed simultaneously. Adams and Jordan [2] observe that in this way the natural ordering of unknowns gives an iterative method that is mathematically equivalent to a multi-color ordering.

In the multi-color ordering, all wavefronts of the same color are processed simultaneously. This reduces the number of sequential steps for solving the Gauss-Seidel matrix to the number of colors, which is the smallest number $d$ such that wavefront $i$ contains no elements that are a predecessor of an element in wavefront $i+d$.

As demonstrated by O'Leary [164], SOR theory still holds in an approximate sense for multi-colored matrices. The above observation that the Gauss-Seidel method with the natural ordering is equivalent to a multicoloring cannot be extended to the SSOR method or wavefront-based incomplete factorization preconditioners for the Conjugate Gradient method. In fact, tests by Duff and Meurant [79] and an analysis by Eijkhout [85] show that multicolor incomplete factorization preconditioners in general may take a considerably larger number of iterations to converge than preconditioners based on the natural ordering. Whether this is offset by the increased parallelism depends on the application and the computer architecture.

Blocked operations in the GMRES method

In addition to the usual matrix-vector product, inner products and vector updates, the preconditioned GMRES method has a kernel where one new vector, $M^{-1}Av^{(j)}$, is orthogonalized against the previously built orthogonal set $\{v^{(1)}, v^{(2)}, \ldots, v^{(j)}\}$. In our version, this is done using Level 1 BLAS, which may be quite inefficient. To incorporate Level 2 BLAS we can apply either Householder orthogonalization or classical Gram-Schmidt twice (which mitigates classical Gram-Schmidt's potential instability; see Saad [185]). Both approaches significantly increase the computational work, but using classical Gram-Schmidt has the advantage that all inner products can be performed simultaneously; that is, their communication can be packaged. This may increase the efficiency of the computation significantly.
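A minimal sketch of classical Gram-Schmidt applied twice is given below. It is written with plain Python lists for clarity, whereas a practical code would express each pass as two Level 2 BLAS matrix-vector products; the names `dot` and `cgs2` are hypothetical, and the sketch assumes the new vector is not (numerically) in the span of the basis.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cgs2(basis, w):
    """Orthogonalize w against the orthonormal vectors in basis using
    classical Gram-Schmidt twice; returns the unit vector, the accumulated
    coefficients, and the norm of the orthogonalized vector."""
    h_total = [0.0] * len(basis)
    for _ in range(2):
        # All inner products use the same w, so they are independent of each
        # other and can be computed (and communicated) together.
        h = [dot(v, w) for v in basis]
        w = [wi - sum(hj * v[i] for hj, v in zip(h, basis))
             for i, wi in enumerate(w)]
        h_total = [a + b for a, b in zip(h_total, h)]
    norm = math.sqrt(dot(w, w))
    return [wi / norm for wi in w], h_total, norm


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
q, h, norm = cgs2(basis, [1.0, 2.0, 3.0])
```

In contrast, modified Gram-Schmidt updates `w` after every single inner product, which forces one communication step per basis vector.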

Another way to obtain more parallelism and data locality is to generate a basis $\{v^{(1)}, Av^{(1)}, \ldots, A^{m}v^{(1)}\}$ for the Krylov subspace first, and to orthogonalize this set afterwards; this is called $m$-step GMRES($m$) (see Kim and Chronopoulos [139]). (Compare this to the standard GMRES method, where each new vector is immediately orthogonalized to all previous vectors.) This approach does not increase the computational work and, in contrast to CG, the numerical instability due to generating a possibly near-dependent set is not necessarily a drawback.

Remaining topics

The Lanczos Connection

As discussed by Paige and Saunders in [168] and by Golub and Van Loan in [109], it is straightforward to derive the conjugate gradient method for solving symmetric positive definite linear systems from the Lanczos algorithm for solving symmetric eigensystems and vice versa. As an example, let us consider how one can derive the Lanczos process for symmetric eigensystems from the (unpreconditioned) conjugate gradient method.

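Suppose we define the $n \times k$ matrix $R_k$ by

\[ R_k = [r^{(0)}, r^{(1)}, \ldots, r^{(k-1)}], \]

and the $k \times k$ upper bidiagonal matrix $B_k$ by

\[ B_k = \begin{bmatrix} 1 & -\beta_1 & & & \\ & 1 & -\beta_2 & & \\ & & \ddots & \ddots & \\ & & & 1 & -\beta_{k-1} \\ & & & & 1 \end{bmatrix}, \]

where the sequences $\{r^{(k)}\}$ and $\{\beta_k\}$ are defined by the standard conjugate gradient algorithm. From the equations

\[ p^{(j)} = r^{(j-1)} + \beta_{j-1} p^{(j-1)}, \quad j = 2, 3, \ldots, k, \]

and $p^{(1)} = r^{(0)}$, we have $R_k = P_k B_k$, where

\[ P_k = [p^{(1)}, p^{(2)}, \ldots, p^{(k)}]. \]

Assuming the elements of the sequence $\{p^{(j)}\}$ are $A$-conjugate, it follows that

\[ \hat{T}_k = R_k^T A R_k = B_k^T \hat{\Lambda}_k B_k \]

is a tridiagonal matrix since

\[ \hat{\Lambda}_k = P_k^T A P_k = \mathrm{diag}\bigl(p^{(1)T} A p^{(1)}, \ldots, p^{(k)T} A p^{(k)}\bigr) \]

is diagonal and $B_k$ is bidiagonal.

Since span$\{p^{(1)}, \ldots, p^{(j)}\}$ = span$\{r^{(0)}, \ldots, r^{(j-1)}\}$ and since the elements of $\{r^{(j)}\}$ are mutually orthogonal, it can be shown that the columns of the $n \times k$ matrix $Q_k = R_k \Delta^{-1}$ form an orthonormal basis for the subspace span$\{r^{(0)}, Ar^{(0)}, \ldots, A^{k-1}r^{(0)}\}$, where $\Delta$ is a diagonal matrix whose $i$th diagonal element is $\|r^{(i-1)}\|_2$. The columns of the matrix $Q_k$ are the Lanczos vectors (see Parlett [171]) whose associated projection of $A$ is the tridiagonal matrix

\[ T_k = Q_k^T A Q_k = \Delta^{-1} B_k^T \hat{\Lambda}_k B_k \Delta^{-1}. \]

The extremal eigenvalues of $T_k$ approximate those of the matrix $A$. Hence, the diagonal and subdiagonal elements of $T_k$, which are readily available during iterations of the conjugate gradient algorithm, can be used to construct $T_k$ after $k$ CG iterations. This allows us to obtain good approximations to the extremal eigenvalues (and hence the condition number) of the matrix $A$ while we are generating approximations, $x^{(k)}$, to the solution of the linear system $Ax = b$.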
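The construction can be sketched in a few lines. Writing $\alpha_j$ for the CG step length and $\beta_j$ for the ratio of successive squared residual norms, expanding the product defining the tridiagonal matrix gives diagonal entries $1/\alpha_1$ and $1/\alpha_j + \beta_{j-1}/\alpha_{j-1}$, and off-diagonal entries $-\sqrt{\beta_j}/\alpha_j$. The code below is an illustration under these assumptions, with hypothetical names (`cg_lanczos`, `count_below`, `extreme_eigs`) and a simple Sturm-sequence bisection standing in for a library eigenvalue routine.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cg_lanczos(A, b, k):
    """Run k unpreconditioned CG steps from x = 0 and return the diagonal
    and off-diagonal of the Lanczos tridiagonal matrix."""
    x, r = [0.0] * len(b), b[:]
    p, rho = r[:], dot(r, r)
    alphas, betas = [], []
    for _ in range(k):
        q = matvec(A, p)
        alpha = rho / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
        alphas.append(alpha)
        betas.append(beta)
    diag = [1.0 / alphas[0]] + [1.0 / alphas[j] + betas[j - 1] / alphas[j - 1]
                                for j in range(1, k)]
    off = [-math.sqrt(betas[j]) / alphas[j] for j in range(k - 1)]
    return diag, off


def count_below(diag, off, x):
    """Number of eigenvalues of the tridiagonal matrix below x (Sturm sequence)."""
    count, d = 0, 1.0
    for i in range(len(diag)):
        d = diag[i] - x - (off[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = 1e-300
        if d < 0.0:
            count += 1
    return count


def extreme_eigs(diag, off):
    """Smallest and largest eigenvalue by bisection inside Gershgorin bounds."""
    radius = 2.0 * max((abs(o) for o in off), default=0.0)
    lo, hi = min(diag) - radius, max(diag) + radius

    def kth(k):
        a, b = lo, hi
        for _ in range(100):
            m = 0.5 * (a + b)
            if count_below(diag, off, m) > k:
                b = m
            else:
                a = m
        return 0.5 * (a + b)

    return kth(0), kth(len(diag) - 1)


# Example: the 5-by-5 matrix tridiag(-1, 2, -1); after 5 steps T_5 has the
# same eigenvalues as A (up to rounding), the extremes being 2 -/+ sqrt(3).
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
diag, off = cg_lanczos(A, [1.0, 0.0, 0.0, 0.0, 0.0], n)
lam_min, lam_max = extreme_eigs(diag, off)
```

Calling `cg_lanczos` with fewer steps than the matrix order gives approximations to the extremal eigenvalues rather than exact values, which is the situation of practical interest.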

For a nonsymmetric matrix $A$, an equivalent nonsymmetric Lanczos algorithm (see Lanczos [142]) would produce a nonsymmetric tridiagonal matrix $T_k$ whose extremal eigenvalues (which may include complex-conjugate pairs) approximate those of $A$. The nonsymmetric Lanczos method is equivalent to the BiCG method.

Block and $s$-step Iterative Methods

The methods discussed so far are all subspace methods, that is, in every iteration they extend the dimension of the subspace generated. In fact, they generate an orthogonal basis for this subspace, by orthogonalizing the newly generated vector with respect to the previous basis vectors.

However, in the case of nonsymmetric coefficient matrices the newly generated vector may be almost linearly dependent on the existing basis. To prevent break-down or severe numerical error in such instances, methods have been proposed that perform a look-ahead step (see Freund, Gutknecht and Nachtigal [101], Parlett, Taylor and Liu [172], and Freund and Nachtigal [102]).

Several new, unorthogonalized, basis vectors are generated and are then orthogonalized with respect to the subspace already generated. Instead of generating a basis, such a method generates a series of low-dimensional orthogonal subspaces.

The $s$-step iterative methods of Chronopoulos and Gear [55] use this strategy of generating unorthogonalized vectors and processing them as a block to reduce computational overhead and improve processor cache behavior.

If conjugate gradient methods are considered to generate a factorization of a tridiagonal reduction of the original matrix, then look-ahead methods generate a block factorization of a block tridiagonal reduction of the matrix.

A block tridiagonal reduction is also effected by the Block Lanczos algorithm and the Block Conjugate Gradient method (see O'Leary [163]). Such methods operate on multiple linear systems with the same coefficient matrix simultaneously, for instance with multiple right hand sides, or the same right hand side but with different initial guesses. Since these block methods use multiple search directions in each step, their convergence behavior is better than for ordinary methods. In fact, one can show that the spectrum of the matrix is effectively reduced by the $n_b$ smallest eigenvalues, where $n_b$ is the block size.

The Jacobi Method

The Jacobi method is easily derived by examining each of the $n$ equations in the linear system $Ax = b$ in isolation. If in the $i$th equation

\[ \sum_{j=1}^{n} a_{i,j} x_j = b_i \]

we solve for the value of $x_i$ while assuming the other entries of $x$ remain fixed, we obtain

\[ x_i = \Bigl(b_i - \sum_{j \neq i} a_{i,j} x_j\Bigr) / a_{i,i}. \]

This suggests an iterative method defined by

\[ x_i^{(k)} = \Bigl(b_i - \sum_{j \neq i} a_{i,j} x_j^{(k-1)}\Bigr) / a_{i,i}, \]

which is the Jacobi method. Note that the order in which the equations are examined is irrelevant, since the Jacobi method treats them independently. For this reason, the Jacobi method is also known as the method of simultaneous displacements, since the updates could in principle be done simultaneously.

Simultaneous displacements, method of: Jacobi method.

In matrix terms, the componentwise definition of the Jacobi method can be expressed as

\[ x^{(k)} = D^{-1}(L + U)x^{(k-1)} + D^{-1}b, \]

where the matrices $D$, $-L$, and $-U$ represent the diagonal, the strictly lower-triangular, and the strictly upper-triangular parts of $A$, respectively.

The pseudocode for the Jacobi method is given in the figure captioned "The Jacobi Method". Note that an auxiliary storage vector, $\bar{x}$, is used in the algorithm. It is not possible to update the vector $x$ in place, since values from $x^{(k-1)}$ are needed throughout the computation of $x^{(k)}$.

Figure: The Jacobi Method
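The iteration described above can be sketched in Python as follows. This is an illustrative version for small dense matrices, with a hypothetical function name and a simple stopping test on the size of the update (an assumption; other convergence tests are possible). The auxiliary vector `x_bar` plays the role of the auxiliary storage vector mentioned in the text.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=1000):
    """Jacobi iteration for A x = b; returns the iterate and the step count."""
    n = len(b)
    x = x0[:]
    for k in range(1, max_iter + 1):
        x_bar = [0.0] * n                  # auxiliary vector: x is still needed
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_bar[i] = (b[i] - s) / A[i][i]
        change = max(abs(new - old) for new, old in zip(x_bar, x))
        x = x_bar
        if change < tol:
            return x, k
    return x, max_iter


# A small diagonally dominant example whose solution is (1, 1, 1).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x, steps = jacobi(A, b, [0.0, 0.0, 0.0])
```

Because every entry of `x_bar` depends only on the previous iterate, the loop over `i` can be executed in any order or in parallel.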

Reduced System Preconditioning

Reduced system: Linear system obtained by eliminating certain variables from another linear system. Although the number of variables is smaller than for the original system, the matrix of a reduced system generally has more nonzero entries. If the original matrix was symmetric and positive definite, then the reduced system has a smaller condition number.

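As we have seen earlier, a suitable preconditioner for CG is a matrix $M$ such that the system

\[ M^{-1}Ax = M^{-1}b \]

requires fewer iterations to solve than $Ax = b$ does, and for which systems $Mz = r$ can be solved efficiently. The first property is independent of the machine used, while the second is highly machine dependent. Choosing the best preconditioner involves balancing those two criteria in a way that minimizes the overall computation time. One balancing approach used for matrices $A$ arising from $5$-point finite difference discretization of second order elliptic partial differential equations (PDEs) with Dirichlet boundary conditions involves solving a reduced system. Specifically, for an $n \times n$ grid, we can use a point red-black ordering of the nodes to get

\[ Ax = \begin{bmatrix} D_R & C \\ C^T & D_B \end{bmatrix} \begin{bmatrix} x_R \\ x_B \end{bmatrix} = \begin{bmatrix} f_R \\ f_B \end{bmatrix}, \]

where $D_R$ and $D_B$ are diagonal, and $C$ is a well-structured sparse matrix with $5$ nonzero diagonals if $n$ is even and $4$ nonzero diagonals if $n$ is odd. Applying one step of block Gaussian elimination (or computing the Schur complement; see Golub and Van Loan [109]) we have

\[ \begin{bmatrix} I & O \\ -C^T D_R^{-1} & I \end{bmatrix} \begin{bmatrix} D_R & C \\ C^T & D_B \end{bmatrix} \begin{bmatrix} x_R \\ x_B \end{bmatrix} = \begin{bmatrix} I & O \\ -C^T D_R^{-1} & I \end{bmatrix} \begin{bmatrix} f_R \\ f_B \end{bmatrix}, \]

which reduces to

\[ \begin{bmatrix} D_R & C \\ O & D_B - C^T D_R^{-1} C \end{bmatrix} \begin{bmatrix} x_R \\ x_B \end{bmatrix} = \begin{bmatrix} f_R \\ f_B - C^T D_R^{-1} f_R \end{bmatrix}. \]

With proper scaling (left and right multiplication by $D_B^{-1/2}$), we obtain from the second block equation the reduced system

\[ (I - H^T H)\, y = g, \]

where $H = D_R^{-1/2} C D_B^{-1/2}$, $y = D_B^{1/2} x_B$, and $g = D_B^{-1/2}(f_B - C^T D_R^{-1} f_R)$. The reduced system is of order $n^2/2$ for even $n$ and of order $(n^2 - 1)/2$ for odd $n$. Once $y$ is determined, the solution $x$ is easily retrieved from $y$: $x_B = D_B^{-1/2} y$ and $x_R = D_R^{-1}(f_R - C x_B)$. The values on the black points are those that would be obtained from a red/black ordered SSOR preconditioner on the full system, so we expect faster convergence.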
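The reduced system approach can be sketched for the model case of the 5-point Laplacian on an $n \times n$ grid. The sketch assumes a unit-scaled stencil, so that $D_R = D_B = 4I$ and $C$ is minus the red-black neighbour coupling; the operator $I - H^T H$ is applied implicitly rather than formed. All function names (`nsum`, `couple`, `reduced_op`, `cg`) are hypothetical, and vectors are stored as flat lists over the whole grid with zeros at points of the other colour.

```python
RED, BLACK = 0, 1   # colour of grid point (i, j) is (i + j) % 2


def nsum(u, n, i, j):
    """Sum of the values of u at the grid neighbours of point (i, j)."""
    s = 0.0
    if i > 0:
        s += u[(i - 1) * n + j]
    if i < n - 1:
        s += u[(i + 1) * n + j]
    if j > 0:
        s += u[i * n + j - 1]
    if j < n - 1:
        s += u[i * n + j + 1]
    return s


def couple(u, n, colour, scale):
    """scale * (neighbour sum of u) at points of one colour, zero elsewhere."""
    return [scale * nsum(u, n, i, j) if (i + j) % 2 == colour else 0.0
            for i in range(n) for j in range(n)]


def reduced_op(y, n):
    """(I - H^T H) y, where H = C / 4 for the unit-scaled 5-point Laplacian."""
    hy = couple(y, n, RED, -0.25)        # H y lives on the red points
    hthy = couple(hy, n, BLACK, -0.25)   # H^T (H y) lives on the black points
    return [a - b for a, b in zip(y, hthy)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cg(op, b, tol=1e-12, max_iter=500):
    x, r = [0.0] * len(b), b[:]
    p, rho = r[:], dot(r, r)
    for _ in range(max_iter):
        if rho ** 0.5 < tol:
            break
        q = op(p)
        alpha = rho / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        p = [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x


n = 6
ones = [1.0] * (n * n)
# Right-hand side f = A * ones, so the exact solution is all ones.
f = [4.0 - nsum(ones, n, i, j) for i in range(n) for j in range(n)]
# g = D_B^{-1/2} (f_B - C^T D_R^{-1} f_R), nonzero on the black points only.
g = [0.5 * (f[i * n + j] + 0.25 * nsum(f, n, i, j)) if (i + j) % 2 == BLACK
     else 0.0 for i in range(n) for j in range(n)]
y = cg(lambda v: reduced_op(v, n), g)
x_black = [0.5 * yi for yi in y]                        # x_B = D_B^{-1/2} y
x = [x_black[i * n + j] if (i + j) % 2 == BLACK
     else 0.25 * (f[i * n + j] + nsum(x_black, n, i, j))  # x_R = D_R^{-1}(f_R - C x_B)
     for i in range(n) for j in range(n)]
```

The iteration only ever touches the black unknowns; the red unknowns are recovered by a single local update at the end.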

The number of nonzero coefficients is reduced, although the coefficient matrix of the reduced system has nine nonzero diagonals. Therefore it has higher density and offers more data locality. Meier and Sameh [150] demonstrate that the reduced system approach on hierarchical memory machines such as the Alliant FX/8 is considerably faster than unpreconditioned CG for Poisson's equation on large $n \times n$ grids.

For $3$-dimensional elliptic PDEs, the reduced system approach yields a block tridiagonal matrix $C$ having diagonal blocks of the structure of $C$ from the $2$-dimensional case and off-diagonal blocks that are diagonal matrices. Computing the reduced system explicitly leads to an unreasonable increase in the computational complexity of solving $Ax = b$. The matrix products required to solve the reduced system would therefore be performed implicitly, which could significantly decrease performance. However, as Meier and Sameh show [150], the reduced system approach can still be faster than the conjugate gradient method with Jacobi preconditioning for $3$-dimensional problems.

Domain decomposition method: Solution method for linear systems based on a partitioning of the physical domain of the differential equation. Domain decomposition methods typically involve (repeated) independent system solution on the subdomains, and some way of combining data from the subdomains on the separator part of the domain.

Domain Decomposition Methods

In recent years, much attention has been given to domain decomposition methods for linear elliptic problems that are based on a partitioning of the domain of the physical problem. Since the subdomains can be handled independently, such methods are very attractive for coarse-grain parallel computers. On the other hand, it should be stressed that they can be very effective even on sequential computers.

In this brief survey, we shall restrict ourselves to the standard second order self-adjoint scalar elliptic problems in two dimensions of the form:

\[ -\nabla \cdot (a(x,y) \nabla u) = f(x,y), \]

where $a(x,y)$ is a positive function on the domain $\Omega$, on whose boundary the value of $u$ is prescribed (the Dirichlet problem). For more general problems, and a good set of references, the reader is referred to the series of proceedings [177] [135] [107] [49] [48] [47] and the surveys [196] [51].

For the discretization of this elliptic problem, we shall assume for simplicity that $\Omega$ is triangulated by a set $T_H$ of nonoverlapping coarse triangles (subdomains) $\Omega_i$, $i = 1, \ldots, p$, with $n_H$ internal vertices. The $\Omega_i$'s are in turn further refined into a set of smaller triangles $T_h$ with $n$ internal vertices in total. Here $H$, $h$ denote the coarse and fine mesh size respectively. By a standard Ritz-Galerkin method using piecewise linear triangular basis elements, we obtain an $n \times n$ symmetric positive definite linear system $Au = f$.

Generally, there are two kinds of approaches depending on whether the subdomains overlap with one another (Schwarz methods) or are separated from one another by interfaces (Schur Complement methods, iterative substructuring).

We shall present domain decomposition methods as preconditioners for the linear system $Au = f$ or for a reduced (Schur Complement) system $S_B u_B = g_B$ defined on the interfaces in the non-overlapping formulation. When used with the standard Krylov subspace methods discussed elsewhere in this book, the user has to supply a procedure for computing $Av$ or $S_B w$ given $v$ or $w$, and the algorithms to be described herein compute $K^{-1}v$. The computation of $Av$ is a simple sparse matrix-vector multiply, but $S_B w$ may require subdomain solves, as will be described later.

Overlapping Subdomain Methods

In this approach, each substructure $\Omega_i$ is extended to a larger substructure $\Omega_i'$ containing $n_i'$ internal vertices and all the triangles $T_i' \subset T_h$ within a distance $\delta$ from $\Omega_i$, where $\delta$ refers to the amount of overlap.

Let $A_i'$, $A_H$ denote the discretizations of the elliptic problem on the subdomain triangulation $T_i'$ and the coarse triangulation $T_H$ respectively.

Let $R_i^T$ denote the extension operator which extends (by zero) a function on $T_i'$ to $T_h$, and $R_i$ the corresponding pointwise restriction operator. Similarly, let $R_H^T$ denote the interpolation operator which maps a function on the coarse grid $T_H$ onto the fine grid $T_h$ by piecewise linear interpolation, and $R_H$ the corresponding weighted restriction operator.

With these notations, the Additive Schwarz Preconditioner $K_{as}$ for the system $Au = f$ can be compactly described as:

\[ K_{as}^{-1} v = R_H^T A_H^{-1} R_H v + \sum_{i=1}^{p} R_i^T A_i'^{-1} R_i v. \]

Note that the right hand side can be computed using $p$ subdomain solves using the $A_i'$'s, plus a coarse grid solve using $A_H$, all of which can be computed in parallel. Each term $R_i^T A_i'^{-1} R_i v$ should be evaluated in three steps: (1) Restriction: $v_i = R_i v$, (2) Subdomain solves for $w_i$: $A_i' w_i = v_i$, (3) Interpolation: $y_i = R_i^T w_i$. The coarse grid solve is handled in the same manner.

The theory of Dryja and Widlund [76] shows that the condition number of $K_{as}^{-1} A$ is bounded independently of both the coarse grid size $H$ and the fine grid size $h$, provided there is ``sufficient'' overlap between $\Omega_i$ and $\Omega_i'$ (essentially it means that the ratio $\delta/H$ of the distance $\delta$ of the boundary $\partial\Omega_i'$ to $\partial\Omega_i$ should be uniformly bounded from below as $h \rightarrow 0$). If the coarse grid solve term is left out, then the condition number grows as $O(1/H^2)$, reflecting the lack of global coupling provided by the coarse grid.

For the purpose of implementations, it is often useful to interpret the definition of $K_{as}$ in matrix notation. Thus the decomposition of $\Omega$ into $\Omega_i'$'s corresponds to partitioning of the components of the vector $u$ into $p$ overlapping groups of index sets $I_i$'s, each with $n_i'$ components. The $n_i' \times n_i'$ matrix $A_i'$ is simply a principal submatrix of $A$ corresponding to the index set $I_i$. $R_i^T$ is an $n \times n_i'$ matrix defined by its action on a vector $u_i$ defined on $T_i'$ as:

\[ (R_i^T u_i)_j = (u_i)_j \quad \text{if } j \in I_i, \]

but is zero otherwise. Similarly, the action of its transpose $R_i u$ forms an $n_i'$-vector by picking off the components of $u$ corresponding to $I_i$. Analogously, $R_H^T$ is an $n \times n_H$ matrix with entries corresponding to piecewise linear interpolation, and its transpose can be interpreted as a weighted restriction matrix. If $T_h$ is obtained from $T_H$ by nested refinement, the action of $R_H$, $R_H^T$ can be efficiently computed as in a standard multigrid algorithm. Note that the matrices $R_i^T$, $R_i$, $R_H^T$, $R_H$ are defined by their actions and need not be stored explicitly.

We also note that in this algebraic formulation, the preconditioner $K_{as}$ can be extended to any matrix $A$, not necessarily one arising from a discretization of an elliptic problem. Once we have the partitioning index sets $I_i$'s, the matrices $R_i$, $A_i'$ are defined. Furthermore, if $A$ is positive definite, then $A_i'$ is guaranteed to be nonsingular. The difficulty is in defining the ``coarse grid'' matrices $A_H$, $R_H$, which inherently depend on knowledge of the grid structure. This part of the preconditioner can be left out, at the expense of a deteriorating convergence rate as $p$ increases. Radicati and Robert [178] have experimented with such an algebraic overlapping block Jacobi preconditioner.
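The algebraic form of the preconditioner, without the coarse grid term, can be sketched as follows. The example is a hedged illustration only: it uses a small dense matrix (the 1D model matrix tridiag(-1, 2, -1)), overlapping index sets chosen by hand, a dense Gaussian elimination as the subdomain solver, and hypothetical function names. Each term is evaluated in the three steps of restriction, subdomain solve, and extension by zero.

```python
def solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, n):
            m = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= m * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def additive_schwarz(A, index_sets, v):
    """Sum over the index sets of R_i^T A_i'^{-1} R_i v (no coarse grid term)."""
    z = [0.0] * len(v)
    for idx in index_sets:                    # the terms are independent
        A_i = [[A[r][c] for c in idx] for r in idx]   # principal submatrix
        v_i = [v[r] for r in idx]                     # (1) restriction
        w_i = solve(A_i, v_i)                         # (2) subdomain solve
        for r, w in zip(idx, w_i):                    # (3) extension by zero
            z[r] += w
    return z


def pcg(A, b, precond, tol=1e-10, max_iter=200):
    x, r = [0.0] * len(b), b[:]
    z = precond(r)
    p, rz = z[:], dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        q = matvec(A, p)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n, p_sub, overlap = 16, 4, 1
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
size = n // p_sub
index_sets = [list(range(max(0, i * size - overlap), min(n, (i + 1) * size + overlap)))
              for i in range(p_sub)]
b = matvec(A, [1.0] * n)          # exact solution is all ones
x, iters = pcg(A, b, lambda v: additive_schwarz(A, index_sets, v))
```

Since the loop over the index sets has no dependencies between its passes, each subdomain solve can be assigned to a different processor.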

Non-overlapping Subdomain Methods

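The easiest way to describe this approach is through matrix notation. The set of vertices of $T_h$ can be divided into two groups: the set of interior vertices $I$ of all $\Omega_i$, and the set of vertices $B$ that lie on the boundaries $\bigcup_i \partial\Omega_i$ of the coarse triangles $\Omega_i$ in $T_H$. We shall re-order $u$ and $f$ as $u \equiv (u_I, u_B)^T$ and $f \equiv (f_I, f_B)^T$ corresponding to this partition. In this ordering, the system $Au = f$ can be written as follows:

\[ \begin{bmatrix} A_I & A_{IB} \\ A_{IB}^T & A_B \end{bmatrix} \begin{bmatrix} u_I \\ u_B \end{bmatrix} = \begin{bmatrix} f_I \\ f_B \end{bmatrix}. \]

We note that since the subdomains are uncoupled by the boundary vertices, $A_I = \mathrm{blockdiagonal}(A_i)$ is block-diagonal with each block $A_i$ being the stiffness matrix corresponding to the unknowns belonging to the interior vertices of subdomain $\Omega_i$.

By a block LU-factorization of $A$, the partitioned system can be written as:

\[ \begin{bmatrix} I & 0 \\ A_{IB}^T A_I^{-1} & I \end{bmatrix} \begin{bmatrix} A_I & A_{IB} \\ 0 & S_B \end{bmatrix} \begin{bmatrix} u_I \\ u_B \end{bmatrix} = \begin{bmatrix} f_I \\ f_B \end{bmatrix}, \]

where

\[ S_B \equiv A_B - A_{IB}^T A_I^{-1} A_{IB} \]

is the Schur complement of $A_B$ in $A$.

By eliminating $u_I$ in the factored system, we arrive at the following equation for $u_B$:

\[ S_B u_B = g_B \equiv f_B - A_{IB}^T A_I^{-1} f_I. \]

We note the following properties of this Schur Complement system:

$S_B$ inherits the symmetric positive definiteness of $A$.
$S_B$ is dense in general and computing it explicitly requires as many solves on each subdomain as there are points on each of its edges.
The condition number of $S_B$ is $O(h^{-1})$, an improvement over the $O(h^{-2})$ growth for $A$.
Given a vector $v_B$ defined on the boundary vertices $B$ of $T_h$, the matrix-vector product $S_B v_B$ can be computed according to $A_B v_B - A_{IB}^T (A_I^{-1} (A_{IB} v_B))$, where $A_I^{-1}$ involves $p$ independent subdomain solves using $A_i^{-1}$.
The right hand side $g_B$ can also be computed using $p$ independent subdomain solves.

These properties make it possible to apply a preconditioned iterative method to the Schur complement system $S_B u_B = g_B$, which is the basic method in the nonoverlapping substructuring approach. We will also need some good preconditioners to further improve the convergence of the Schur system.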
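The matrix-vector product with the Schur complement, computed without ever forming it, can be sketched as follows. This is an illustration under simplifying assumptions: a small dense symmetric matrix (the 1D model matrix tridiag(-1, 2, -1)), hand-chosen interior blocks and interface nodes, and a dense Gaussian elimination standing in for the subdomain solver. The function names are hypothetical.

```python
def solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, n):
            m = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= m * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def interior_correction(A, subdomains, boundary, t):
    """A_IB^T A_I^{-1} t_I, using one independent solve per interior block A_i.
    t is a full-length vector; only its interior entries are used."""
    out = [0.0] * len(boundary)
    for idx in subdomains:
        A_i = [[A[r][c] for c in idx] for r in idx]
        w = solve(A_i, [t[r] for r in idx])
        for k, r in enumerate(boundary):
            out[k] += sum(A[r][c] * wc for c, wc in zip(idx, w))
    return out


def schur_matvec(A, subdomains, boundary, v_B):
    """S_B v_B = A_B v_B - A_IB^T (A_I^{-1} (A_IB v_B)) without forming S_B."""
    # Multiply the boundary columns of A by v_B: interior rows give A_IB v_B,
    # boundary rows give A_B v_B.
    t = [sum(A[r][c] * v for c, v in zip(boundary, v_B)) for r in range(len(A))]
    corr = interior_correction(A, subdomains, boundary, t)
    return [t[r] - ck for r, ck in zip(boundary, corr)]


n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
boundary = [2, 5]                          # interface nodes B
subdomains = [[0, 1], [3, 4], [6, 7]]      # interior blocks, decoupled by B
u = [1.0] * n
f = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]   # f = A u

# Right-hand side g_B = f_B - A_IB^T A_I^{-1} f_I of the Schur system.
corr_f = interior_correction(A, subdomains, boundary, f)
g_B = [f[r] - ck for r, ck in zip(boundary, corr_f)]
# Applying S_B to the interface values of the true solution reproduces g_B.
S_u_B = schur_matvec(A, subdomains, boundary, [u[r] for r in boundary])
```

Each application of `schur_matvec` costs one solve per subdomain, which is why the number of iterations on the interface system matters so much in practice.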

We shall first describe the Bramble-Pasciak-Schatz preconditioner [36]. For this, we need to further decompose $B$ into two non-overlapping index sets:

\[ B = E \cup V_H, \]

where $V_H = \bigcup_k V_k$ denotes the set of nodes corresponding to the vertices $V_k$'s of $T_H$, and $E = \bigcup_i E_i$ denotes the set of nodes on the edges $E_i$'s of the coarse triangles in $T_H$, excluding the vertices belonging to $V_H$.

In addition to the coarse grid interpolation and restriction operators $R_H^T$, $R_H$ defined before, we shall also need a new set of interpolation and restriction operators for the edges $E_i$'s. Let $R_{E_i}$ be the pointwise restriction operator (an $n_{E_i} \times n_B$ matrix, where $n_{E_i}$ is the number of vertices on the edge $E_i$ and $n_B$ is the number of nodes in $B$) onto the edge $E_i$ defined by its action

\[ (R_{E_i} u_B)_j = (u_B)_j \quad \text{if } j \in E_i, \]

but is zero otherwise. The action of its transpose extends by zero a function defined on $E_i$ to one defined on $B$.

Corresponding to this partition of $B$, $S_B$ can be written in the block form:

\[ S_B = \begin{bmatrix} S_E & S_{EV} \\ S_{EV}^T & S_V \end{bmatrix}. \]

The block $S_E$ can again be block partitioned, with most of the subblocks being zero. The diagonal blocks $S_{E_i}$ of $S_E$ are the principal submatrices of $S_B$ corresponding to $E_i$. Each $S_{E_i}$ represents the coupling of nodes on the interface $E_i$ separating two neighboring subdomains.

In defining the preconditioner, the action of $S_{E_i}^{-1}$ is needed. However, as noted before, in general $S_{E_i}$ is a dense matrix which is also expensive to compute, and even if we had it, it would be expensive to compute its action (we would need to compute its inverse or a Cholesky factorization). Fortunately, many efficiently invertible approximations to $S_{E_i}$ have been proposed in the literature (see Keyes and Gropp [136]) and we shall use these so-called interface preconditioners for $S_{E_i}$ instead. We mention one specific preconditioner:

\[ M_{E_i} = \alpha_{E_i} K^{1/2}, \]

where $K$ is an $n_{E_i} \times n_{E_i}$ one dimensional Laplacian matrix, namely a tridiagonal matrix with $2$'s down the main diagonal and $-1$'s down the two off-diagonals, and $\alpha_{E_i}$ is taken to be some average of the coefficient $a(x,y)$ of the elliptic problem on the edge $E_i$. We note that since the eigen-decomposition of $K$ is known and computable by the Fast Sine Transform, the action of $M_{E_i}^{-1}$ can be efficiently computed.

With these notations, the Bramble-Pasciak-Schatz preconditioner is defined by its action on a vector $v_B$ defined on $B$ as follows:

\[ K_{BPS}^{-1} v_B = R_H^T A_H^{-1} R_H v_B + \sum_{E_i} R_{E_i}^T M_{E_i}^{-1} R_{E_i} v_B. \]

Analogous to the additive Schwarz preconditioner, the computation of each term consists of the three steps of restriction-inversion-interpolation and is independent of the others and thus can be carried out in parallel.

Bramble, Pasciak and Schatz [36] prove that the condition number of $K_{BPS}^{-1} S_B$ is bounded by $O(1 + \log^2(H/h))$. In practice, there is a slight growth in the number of iterations as $h$ becomes small (i.e., as the fine grid is refined) or as $H$ becomes large (i.e., as the coarse grid becomes coarser).

The $\log^2(H/h)$ growth is due to the coupling of the unknowns on the edges incident on a common vertex $V_k$, which is not accounted for in $K_{BPS}$. Smith [191] proposed a vertex space modification to $K_{BPS}$ which explicitly accounts for this coupling and therefore eliminates the dependence on $H$ and $h$. The idea is to introduce further subsets of $B$ called vertex spaces $X = \bigcup_k X_k$, with $X_k$ consisting of a small set of vertices on the edges incident on the vertex $V_k$ and adjacent to it. Note that $X$ overlaps with $E$ and $V_H$. Let $S_{X_k}$ be the principal submatrix of $S_B$ corresponding to $X_k$, and $R_{X_k}$, $R_{X_k}^T$ be the corresponding restriction (pointwise) and extension (by zero) matrices. As before, $S_{X_k}$ is dense and expensive to compute and factor/solve, but efficiently invertible approximations (some using variants of the $K^{1/2}$ operator defined before) have been developed in the literature (see Chan, Mathew and Shao [52]). We shall let $M_{X_k}$ be such a preconditioner for $S_{X_k}$. Then Smith's Vertex Space preconditioner is defined by:

\[ K_{VS}^{-1} v_B = R_H^T A_H^{-1} R_H v_B + \sum_{E_i} R_{E_i}^T M_{E_i}^{-1} R_{E_i} v_B + \sum_{X_k} R_{X_k}^T M_{X_k}^{-1} R_{X_k} v_B. \]

Smith [191] proved that the condition number of $K_{VS}^{-1} S_B$ is bounded independent of $H$ and $h$, provided there is sufficient overlap of $X_k$ with $B$.

Further Remarks

Multiplicative Schwarz Methods

As mentioned before, the Additive Schwarz preconditioner can be viewed as an overlapping block Jacobi preconditioner. Analogously, one can define a multiplicative Schwarz preconditioner which corresponds to a symmetric block Gauss-Seidel version. That is, the solves on each subdomain are performed sequentially, using the most current iterates as boundary conditions from neighboring subdomains. Even without conjugate gradient acceleration, the multiplicative method can take many fewer iterations than the additive version. However, the multiplicative version is not as parallelizable, although the degree of parallelism can be increased by trading off the convergence rate through multi-coloring the subdomains. The theory can be found in Bramble, et al. [37].

Inexact Solves

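The exact solves involving $A_i'^{-1}$, $A_i^{-1}$ and $A_H^{-1}$ in $K_{as}$, $K_{BPS}$, $K_{VS}$ can be replaced by inexact solves $\tilde{A}_i'^{-1}$, $\tilde{A}_i^{-1}$ and $\tilde{A}_H^{-1}$, which can be standard elliptic preconditioners themselves (e.g. multigrid, ILU, SSOR, etc.).

For the Schwarz methods, the modification is straightforward and the Inexact Solve Additive Schwarz Preconditioner is simply:

\[ \tilde{K}_{as}^{-1} v = R_H^T \tilde{A}_H^{-1} R_H v + \sum_{i=1}^{p} R_i^T \tilde{A}_i'^{-1} R_i v. \]

The Schur Complement methods require more changes to accommodate inexact solves. By replacing $A_H^{-1}$ by $\tilde{A}_H^{-1}$ in the definitions of $K_{BPS}$ and $K_{VS}$, we can easily obtain inexact preconditioners $\tilde{K}_{BPS}$ and $\tilde{K}_{VS}$ for $S_B$. The main difficulty is, however, that the evaluation of the product $S_B v_B$ requires exact subdomain solves in $A_I^{-1}$. One way to get around this is to use an inner iteration using $\tilde{A}_i$ as a preconditioner for $A_i$ in order to compute the action of $A_I^{-1}$. An alternative is to perform the iteration on the larger system $Au = f$ and construct a preconditioner from its block LU-factorization by replacing the terms $A_I$, $S_B$ by $\tilde{A}_I$, $\tilde{S}_B$ respectively, where $\tilde{S}_B$ can be either $\tilde{K}_{BPS}$ or $\tilde{K}_{VS}$. Care must be taken to scale $\tilde{A}_H$ and $\tilde{A}_i$ so that they are as close to $A_H$ and $A_i$ as possible respectively: it is not sufficient that the condition number of $\tilde{A}_H^{-1} A_H$ and $\tilde{A}_i^{-1} A_i$ be close to unity, because the scaling of the coupling matrix $A_{IB}$ may be wrong.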

Nonsymmetric Problems

The preconditioners given above extend naturally to nonsymmetric $A$'s (e.g., convection-diffusion problems), at least when the nonsymmetric part is not too large. The nice theoretical convergence rates can be retained provided that the coarse grid size $H$ is chosen small enough (depending on the size of the nonsymmetric part of $A$) (see Cai and Widlund [43]). Practical implementations (especially for parallelism) of nonsymmetric domain decomposition methods are discussed in [138] [137].

Choice of Coarse Grid Size $H$

Given $h$, it has been observed empirically (see Gropp and Keyes [111]) that there often exists an optimal value of $H$ which minimizes the total computational time for solving the problem. A small $H$ provides a better, but more expensive, coarse grid approximation, and requires more, but smaller, subdomain solves. A large $H$ has the opposite effect. For model problems, the optimal $H$ can be determined for both sequential and parallel implementations (see Chan and Shao [53]). In practice, it may pay to determine a near optimal value of $H$ empirically if the preconditioner is to be re-used many times. However, there may also be geometric constraints on the range of values that $H$ can take.

Multigrid Methods

Multigrid method: Solution method for linear systems based on restricting and extrapolating solutions between a series of nested grids.

Simple iterative methods (such as the Jacobi method, suitably damped) tend to damp out high frequency components of the error fastest (see the section on convergence of the Jacobi method). This has led people to develop methods based on the following heuristic:

1. Perform some steps of a basic method in order to smooth out the error.
2. Restrict the current state of the problem to a subset of the grid points, the so-called ``coarse grid'', and solve the resulting projected problem.
3. Interpolate the coarse grid solution back to the original grid, and perform a number of steps of the basic method again.

Steps 1 and 3 are called ``pre-smoothing'' and ``post-smoothing'' respectively; by applying this method recursively to step 2 it becomes a true ``multigrid'' method. Usually the generation of subsequently coarser grids is halted at a point where the number of variables becomes small enough that direct solution of the linear system is feasible.

The method outlined above is said to be a ``V-cycle'' method, since it descends through a sequence of subsequently coarser grids, and then ascends this sequence in reverse order. A ``W-cycle'' method results from visiting the coarse grid twice, with possibly some smoothing steps in between.
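The V-cycle can be sketched for the 1D model problem with the matrix tridiag(-1, 2, -1). The sketch makes several assumptions that are not taken from the text: damped Jacobi with relaxation factor 2/3 as the basic method, full weighting for restriction, linear interpolation, a number of grid intervals that is a power of two, and recursion down to a single unknown. The names `residual`, `smooth` and `v_cycle` are hypothetical.

```python
import math


def residual(u, f):
    """r = f - A u for A = tridiag(-1, 2, -1); u[0] and u[-1] are boundary zeros."""
    N = len(u) - 1
    r = [0.0] * (N + 1)
    for i in range(1, N):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1])
    return r


def smooth(u, f, steps, omega=2.0 / 3.0):
    """Damped Jacobi sweeps: u <- u + omega * D^{-1} r with D = 2 I."""
    for _ in range(steps):
        r = residual(u, f)
        u = [ui + omega * ri / 2.0 for ui, ri in zip(u, r)]
    return u


def v_cycle(u, f, pre=2, post=2):
    N = len(u) - 1
    if N == 2:                               # one unknown left: solve directly
        return [0.0, f[1] / 2.0, 0.0]
    u = smooth(u, f, pre)                    # step 1: pre-smoothing
    r = residual(u, f)
    Nc = N // 2
    rc = [0.0] * (Nc + 1)
    for i in range(1, Nc):
        # Full weighting (1/4, 1/2, 1/4), times 4 because the unscaled
        # stencil on the grid with doubled spacing carries a factor (2h)^2.
        rc[i] = r[2 * i - 1] + 2.0 * r[2 * i] + r[2 * i + 1]
    ec = v_cycle([0.0] * (Nc + 1), rc, pre, post)   # step 2: coarse problem
    for i in range(1, Nc):                   # step 3: interpolate correction
        u[2 * i] += ec[i]
    for i in range(Nc):
        u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1])
    return smooth(u, f, post)                # post-smoothing


N = 32
u_exact = [math.sin(3 * math.pi * i / N) + 0.5 * math.sin(15 * math.pi * i / N)
           for i in range(N + 1)]
u_exact[0] = u_exact[N] = 0.0
f = [0.0] + [2.0 * u_exact[i] - u_exact[i - 1] - u_exact[i + 1]
             for i in range(1, N)] + [0.0]
u = [0.0] * (N + 1)
for _ in range(10):
    u = v_cycle(u, f)
error = max(abs(a - b) for a, b in zip(u, u_exact))
```

Calling `v_cycle` twice on the coarse problem instead of once would turn this into a W-cycle.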

An analysis of multigrid methods is relatively straightforward in the case of simple differential operators such as the Poisson operator on tensor product grids. In that case, each next coarse grid is taken to have the double grid spacing of the previous grid. In two dimensions, a coarse grid will have one quarter of the number of points of the corresponding fine grid. Since the coarse grid is again a tensor product grid, a Fourier analysis (see for instance Briggs [42]) can be used. For the more general case of self-adjoint elliptic operators on arbitrary domains a more sophisticated analysis is needed (see Hackbusch [117], McCormick [148]). Many multigrid methods can be shown to have an (almost) optimal number of operations, that is, the work involved is proportional to the number of variables.

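From the above description it is clear that iterative methods play a role in multigrid theory as smoothers (see Kettler [133]). Conversely, multigrid-like methods can be used as preconditioners in iterative methods. The basic idea here is to partition the matrix on a given grid into a $2 \times 2$ block structure

$$A^{(i)} = \begin{pmatrix} A^{(i)}_{1,1} & A^{(i)}_{1,2} \\ A^{(i)}_{2,1} & A^{(i)}_{2,2} \end{pmatrix}$$

with the variables in the second block row corresponding to the coarse grid nodes. The matrix on the next grid is then an incomplete version of the Schur complement

$$A^{(i+1)} \approx S^{(i)} = A^{(i)}_{2,2} - A^{(i)}_{2,1} \left(A^{(i)}_{1,1}\right)^{-1} A^{(i)}_{1,2}.$$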

The coarse grid is typically formed based on a red-black or cyclic reduction ordering; see for instance Rodrigue and Wolitzer [180], and Elman [93].
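
As a small illustration of the Schur complement under a red-black ordering, the sketch below eliminates every other unknown of the one-dimensional matrix tridiag(-1, 2, -1). It assumes that the fine-fine block $A_{1,1}$ is diagonal, which holds in this example because the eliminated points are not coupled to each other; in that case the Schur complement can be formed exactly. In two dimensions the complement has more fill and an incomplete version would be used instead. The function name and index lists are hypothetical.

```python
def schur_complement(a, fine, coarse):
    """S = A22 - A21 * inv(A11) * A12, assuming the fine-fine block A11 is diagonal."""
    s = [[a[i][j] for j in coarse] for i in coarse]
    for p, i in enumerate(coarse):
        for q, j in enumerate(coarse):
            for k in fine:
                s[p][q] -= a[i][k] * a[k][j] / a[k][k]
    return s

# tridiag(-1, 2, -1) on seven unknowns (0-based numbering).
n = 7
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
fine = [0, 2, 4, 6]    # eliminated points
coarse = [1, 3, 5]     # points kept for the next grid
s = schur_complement(a, fine, coarse)
```

The result is again tridiagonal, with 1 on the diagonal and -1/2 next to it, so the coarse matrix has the same structure as the fine one. This is the cyclic reduction step mentioned above.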

Some multigrid preconditioners try to obtain optimality results similar to those for the full multigrid method. Here we will merely supply some pointers to the literature: Axelsson and Eijkhout [16], Axelsson and Vassilevski [22] [23], Braess [35], Maitre and Musy [145], McCormick and Thomas [149], Yserentant [218] and Wesseling [215].

Jack Dongarra
Mon Nov 20 08:52:54 EST 1995

Convergence of the Jacobi method

Iterative methods are often used for solving discretized partial differential equations. In that context a rigorous analysis of the convergence of simple methods such as the Jacobi method can be given.

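As an example, consider the boundary value problem

$$\mathcal{L}u = -u_{xx} = f \quad \text{on } (0,1), \qquad u(0) = u_0, \quad u(1) = u_1,$$

discretized by

$$Lu(x_i) = 2u(x_i) - u(x_{i-1}) - u(x_{i+1}) = f(x_i)/N^2 \quad \text{for } x_i = i/N, \quad i = 1, \ldots, N-1.$$

The eigenfunctions of the $\mathcal{L}$ and $L$ operator are the same: for $n = 1, \ldots, N-1$ the function $u_n(x) = \sin n\pi x$ is an eigenfunction of $L$ corresponding to $\lambda = 4\sin^2(n\pi/(2N))$. The eigenvalues of the Jacobi iteration matrix $B$ are then $\lambda(B) = 1 - \frac{1}{2}\lambda(L) = 1 - 2\sin^2(n\pi/(2N)) = \cos(n\pi/N)$.

From this it is easy to see that the damping factor for modes with $n$ small is close to $1$, so the low frequency modes are damped slowly. The high frequency modes (i.e., eigenfunctions $u_n$ with $n$ large) are damped quickly once the iteration is slightly underrelaxed; for the plain Jacobi matrix $B$ the factor $\cos(n\pi/N)$ approaches $-1$ as $n$ approaches $N$. The spectral radius of the Jacobi iteration matrix is $\cos(\pi/N) \approx 1 - \pi^2/(2N^2)$, and it is attained for the eigenfunction $u(x) = \sin \pi x$.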

Spectral radius: The spectral radius of a matrix $A$ is $\max\{|\lambda(A)|\}$.
Spectrum: The set of all eigenvalues of a matrix.

The type of analysis applied to this example can be generalized to higher dimensions and other stationary iterative methods. For both the Jacobi and Gauss-Seidel method (below) the spectral radius is found to be $1 - O(h^2)$ where $h$ is the discretization mesh width, i.e., $h = N^{-1/d}$ where $N$ is the number of variables and $d$ is the number of space dimensions.
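
The eigenvalues of the Jacobi iteration matrix for the one-dimensional example can be checked numerically. The sketch below assumes the matrix tridiag(-1, 2, -1) with zero boundary values, so that the Jacobi iteration matrix simply averages the two neighbors of each point. The function names are hypothetical.

```python
import math

def jacobi_apply(v):
    """Apply B = I - D^{-1} L for L = tridiag(-1, 2, -1); v holds the interior values."""
    m = len(v)
    out = []
    for i in range(m):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < m - 1 else 0.0
        out.append(0.5 * (left + right))
    return out

def jacobi_eigenvalue(n, big_n):
    """Eigenvalue of B belonging to the mode sin(n pi x) on a grid with spacing 1/N."""
    return 1.0 - 2.0 * math.sin(n * math.pi / (2 * big_n)) ** 2

big_n = 16
# Largest deviation of B u_n from lambda_n u_n over all modes n = 1, ..., N-1.
worst = 0.0
for n in range(1, big_n):
    mode = [math.sin(n * math.pi * i / big_n) for i in range(1, big_n)]
    image = jacobi_apply(mode)
    lam = jacobi_eigenvalue(n, big_n)
    worst = max(worst, max(abs(image[i] - lam * mode[i]) for i in range(len(mode))))

spectral_radius = max(abs(jacobi_eigenvalue(n, big_n)) for n in range(1, big_n))
```

Here `worst` stays at rounding level, confirming that each sine mode is an eigenvector, and `spectral_radius` equals $\cos(\pi/N)$, which approaches 1 like $1 - O(h^2)$ as the mesh is refined.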

Row Projection Methods

Most iterative methods depend on spectral properties of the coefficient matrix; for instance, some require the eigenvalues to be in the right half plane. A class of methods without this limitation is that of row projection methods (see Björck and Elfving [34], and Bramley and Sameh [38]). They are based on a block row partitioning of the coefficient matrix

$$A^T = [A_1, \ldots, A_m]$$

and iterative application of orthogonal projectors

$$P_i x = A_i (A_i^T A_i)^{-1} A_i^T x.$$

These methods have good parallel properties and seem to be robust in handling nonsymmetric and indefinite problems.

Row projection methods can be used as preconditioners in the conjugate gradient method. In that case, there is a theoretical connection with the conjugate gradient method on the normal equations (see the section on CG on the normal equations).
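
The simplest instance of the projection idea takes every block to be a single row of the coefficient matrix; this special case is commonly known as Kaczmarz's method. The sketch below is an illustration under that assumption, not the block algorithm of the cited papers: each step moves the iterate onto the hyperplane defined by one equation. The function names and the test matrix are hypothetical.

```python
def row_projection_sweep(a, b, x):
    """One sweep over all rows: project x onto the hyperplane of each equation in turn."""
    for row, rhs in zip(a, b):
        dot = sum(r * v for r, v in zip(row, x))
        scale = (rhs - dot) / sum(r * r for r in row)
        x = [v + scale * r for v, r in zip(x, row)]
    return x

def solve_by_row_projection(a, b, sweeps=200):
    """Repeat sweeps from a zero starting vector; a fixed sweep count keeps the sketch short."""
    x = [0.0] * len(a[0])
    for _ in range(sweeps):
        x = row_projection_sweep(a, b, x)
    return x

# A nonsymmetric matrix with a negative diagonal entry; the exact solution is (1, -1, 2).
a = [[2.0, 1.0, 0.0],
     [-1.0, 3.0, 1.0],
     [0.0, -1.0, -2.0]]
b = [1.0, -2.0, -3.0]
x = solve_by_row_projection(a, b)
```

No assumption on the eigenvalues of the matrix enters the iteration; only the rows and their norms are used.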

Obtaining the Software

A large body of numerical software is freely available 24 hours a day via an electronic service called Netlib. In addition to the template material, there are dozens of other libraries, technical reports on various parallel computers and software, test data, facilities to automatically translate FORTRAN programs to C, bibliographies, names and addresses of scientists and mathematicians, and so on. One can communicate with Netlib in one of a number of ways: by email, through anonymous ftp (netlib2.cs.utk.edu) or (much more easily) via the World Wide Web through a web browser such as Netscape or Mosaic. The URL for the Templates work is: http://www.netlib.org/templates/ . The HTML version of this book can be found at: http://www.netlib.org/templates/Templates.html .

To get started using Netlib, one sends a message of the form send index to netlib@ornl.gov. A description of the entire library should be sent to you within minutes (provided all the intervening networks as well as the Netlib server are up).

FORTRAN and C versions of the templates for each method described in this book are available from Netlib. A user sends a request by electronic mail as follows:

             mail netlib@ornl.gov

On the subject line or in the body, single or multiple requests (one per line) may be made as follows:

             send index from templates
             send sftemplates.shar from templates

The first request results in a return e-mail message containing the index from the library templates, along with brief descriptions of its contents. The second request results in a return e-mail message consisting of a shar file containing the single precision FORTRAN routines and a README file. The versions of the templates that are available are listed in the table below:

Save the mail message to a file called templates.shar. Edit the mail header from this file and delete the lines down to and including << Cut Here >>. In the directory containing the shar file, type

             sh templates.shar

No subdirectory will be created. The unpacked files will stay in the current directory. Each method is written as a separate subroutine in its own file, named after the method (e.g., CG.f, BiCGSTAB.f, GMRES.f). The argument parameters are the same for each, with the exception of the required matrix-vector products and preconditioner solvers (some require the transpose matrix). Also, the amount of workspace needed varies. The details are documented in each routine.

Note that the matrix-vector operations are accomplished using the BLAS [144] (many manufacturers have assembly coded these kernels for maximum performance), although a mask file is provided to link to user defined routines.

The README file gives more details, along with instructions for a test routine.

Overview of the BLAS

The BLAS give us a standardized set of basic codes for performing operations on vectors and matrices. BLAS take advantage of the Fortran storage structure and the structure of the mathematical system wherever possible. Additionally, many computers have the BLAS library optimized to their system. Here we use five routines:

SCOPY: copies a vector onto another vector
SAXPY: adds vector (multiplied by a scalar) to vector
SGEMV: general matrix vector product
STRMV: matrix vector product when the matrix is triangular
STRSV: solves for triangular matrix

The prefix ``S'' denotes single precision. This prefix may be changed to ``D'', ``C'', or ``Z'', giving the routine double, complex, or double complex precision. (Of course, the declarations would also have to be changed.) It is important to note that putting double precision into single variables works, but single into double will cause errors.

If we define $a_{i,j}$ = a(i,j) and $x_i$ = x(i), we can see what the code is doing:

ALPHA = SDOT( N, X, 1, Y, 1 ) computes the inner product of two vectors $x$ and $y$, putting the result in scalar $\alpha$.
The corresponding Fortran segment is
```
ALPHA = 0.0
DO I = 1, N
  ALPHA = ALPHA  + X(I)*Y(I)
ENDDO
```
CALL SAXPY( N, ALPHA, X, 1, Y, 1 ) multiplies a vector $x$ of length $n$ by the scalar $\alpha$, then adds the result to the vector $y$, putting the result in $y$.
The corresponding Fortran segment is
```
DO I = 1, N
  Y(I) = ALPHA*X(I) + Y(I)
ENDDO
```
CALL SGEMV( 'N', M, N, ONE, A, LDA, X, 1, ONE, B, 1 ) computes the matrix-vector product $Ax$ plus vector $b$, putting the resulting vector in $b$.
The corresponding Fortran segment:
```
DO J = 1, N
   DO I = 1, M
      B(I) = A(I,J)*X(J) + B(I)
   ENDDO
ENDDO
```
This illustrates a feature of the BLAS that often requires close attention: the result overwrites one of the input vectors. For example, we will use this routine to compute the residual vector $b - A\hat{x}$, where $\hat{x}$ is our current approximation to the solution $x$ (merely change the fourth argument to -1.0E0). Vector $b$ will be overwritten with the residual vector; thus, if we need it later, we will first copy it to temporary storage.
CALL STRMV( 'U', 'N', 'N', N, A, LDA, X, 1 ) computes the matrix-vector product $Ax$, putting the resulting vector in $x$, for upper triangular matrix $A$.
The corresponding Fortran segment is
```
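DO J = 1, N
   TEMP = X(J)
!  Add the strictly upper part of column J, scaled by the old X(J)
   DO I = 1, J-1
      X(I) = X(I) + TEMP*A(I,J)
   ENDDO
!  The diagonal term replaces X(J); it is not accumulated
   X(J) = TEMP*A(J,J)
ENDDO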
```

Note that the parameters in single quotes are for descriptions such as 'U' for `UPPER TRIANGULAR', 'N' for `No Transpose'. This feature will be used extensively, resulting in storage savings (among other advantages).

The variable LDA is critical for addressing the array correctly. LDA is the leading dimension of the two-dimensional array A, that is, LDA is the declared (or allocated) number of rows of the two-dimensional array $A$.
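
The role of LDA can be illustrated outside Fortran. The sketch below mimics the column-by-column storage of a Fortran array in a flat Python list, assuming 0-based indices, so that element $a_{i,j}$ sits at offset i + j*lda. The function `gemv_n` is a hypothetical stand-in that follows the loop structure of the SGEMV segment above; it is not the BLAS routine itself.

```python
def gemv_n(m, n, alpha, a, lda, x, beta, y):
    """y := alpha*A*x + beta*y for an m-by-n matrix A stored by columns in the flat list a."""
    for i in range(m):
        y[i] = beta * y[i]
    for j in range(n):
        for i in range(m):
            y[i] += alpha * a[i + j * lda] * x[j]
    return y

# A 2-by-3 matrix [[1, 2, 3], [4, 5, 6]] held in an array declared with 4 rows.
# The value 99.0 marks allocated storage that does not belong to the matrix.
lda = 4
a = [1.0, 4.0, 99.0, 99.0,
     2.0, 5.0, 99.0, 99.0,
     3.0, 6.0, 99.0, 99.0]
x = [1.0, 1.0, 1.0]
b = [10.0, 20.0]

# Residual b - A*x: alpha = -1 and beta = 1; a copy of b is overwritten with the result.
r = gemv_n(2, 3, -1.0, a, lda, x, 1.0, list(b))

# Passing the number of matrix rows (2) instead of the declared rows (4) reads padding.
r_wrong = gemv_n(2, 3, -1.0, a, 2, x, 1.0, list(b))
```

With the correct leading dimension the residual is (4, 5); with the wrong one the padding values enter the product. Copying `b` before the call corresponds to the advice above about saving the right hand side in temporary storage.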

Glossary

Adaptive methods

Iterative methods that collect information about the coefficient matrix during the iteration process, and use this to speed up convergence.

Backward error

The size of perturbations $\delta A$ of the coefficient matrix and $\delta b$ of the right hand side of a linear system $Ax = b$, such that the computed iterate $x^{(i)}$ is the solution of $(A + \delta A)x^{(i)} = b + \delta b$.

Band matrix

A matrix $A$ for which there are nonnegative constants $p$, $q$ such that $a_{i,j} = 0$ if $j < i - p$ or $j > i + q$. The two constants $p$, $q$ are called the left and right halfbandwidth respectively.

Black box

A piece of software that can be used without knowledge of its inner workings; the user supplies the input, and the output is assumed to be correct.

BLAS

Basic Linear Algebra Subprograms; a set of commonly occurring vector and matrix operations for dense linear algebra. Optimized (usually assembly coded) implementations of the BLAS exist for various computers; these will give a higher performance than implementation in high level programming languages.

Block factorization

See: Block matrix operations.

Block matrix operations

Matrix operations expressed in terms of submatrices.

Breakdown

The occurrence of a zero divisor in an iterative method.

Cholesky decomposition

Expressing a symmetric matrix $A$ as a product of a lower triangular matrix $L$ and its transpose $L^T$, that is, $A = LL^T$.

Condition number

See: Spectral condition number.

Convergence

Whether, and at what rate, an iterative method approaches the solution of a linear system. Convergence can be

Linear : some measure of the distance to the solution decreases by a constant factor in each iteration.
Superlinear : the measure of the error decreases by a growing factor.
Smooth : the measure of the error decreases in all or most iterations, though not necessarily by the same factor.
Irregular : the measure of the error decreases in some iterations and increases in others. This observation unfortunately does not imply anything about the ultimate convergence of the method.
Stalled : the measure of the error stays more or less constant during a number of iterations. As above, this does not imply anything about the ultimate convergence of the method.

Dense matrix

Matrix for which the number of zero elements is too small to warrant specialized algorithms to exploit these zeros.

Diagonally dominant matrix

See: Matrix properties

Direct method

An algorithm that produces the solution to a system of linear equations in a number of operations that is determined a priori by the size of the system. In exact arithmetic, a direct method yields the true solution to the system. See: Iterative method.

Distributed memory

See: Parallel computer.

Divergence

An iterative method is said to diverge if it does not converge in a reasonable number of iterations, or if some measure of the error grows unacceptably. However, growth of the error as such is no sign of divergence: a method with irregular convergence behavior may ultimately converge, even though the error grows during some iterations.

Domain decomposition method

Solution method for linear systems based on a partitioning of the physical domain of the differential equation. Domain decomposition methods typically involve (repeated) independent system solution on the subdomains, and some way of combining data from the subdomains on the separator part of the domain.

Field of values

Given a matrix $A$, the field of values is the set $\{x^T A x : x^T x = 1\}$. For symmetric matrices this is the range $[\lambda_{\min}(A), \lambda_{\max}(A)]$.

Fill

A position that is zero in the original matrix $A$ but not in an exact factorization of $A$. In an incomplete factorization, some fill elements are discarded.

Forward error

The difference between a computed iterate and the true solution of a linear system, measured in some vector norm.

Halfbandwidth

See: Band matrix.

Ill-conditioned system

A linear system for which the coefficient matrix has a large condition number. Since in many applications the condition number is proportional to (some power of) the number of unknowns, this should be understood as the constant of proportionality being large.

IML++

A mathematical template library in C++ of iterative methods for solving linear systems.

Incomplete factorization

A factorization obtained by discarding certain elements during the factorization process (`modified' and `relaxed' incomplete factorization compensate in some way for discarded elements). Thus an incomplete $LU$ factorization of a matrix $A$ will in general satisfy $A \neq LU$; however, one hopes that the factorization $LU$ will be close enough to $A$ to function as a preconditioner in an iterative method.

Iterate

Approximation to the solution of a linear system in any iteration of an iterative method.

Iterative method

An algorithm that produces a sequence of approximations to the solution of a linear system of equations; the length of the sequence is not given a priori by the size of the system. Usually, the longer one iterates, the closer one is able to get to the true solution. See: Direct method.

Krylov sequence

For a given matrix $A$ and vector $x$, the sequence of vectors $\{A^i x\}_{i \geq 0}$, or a finite initial part of this sequence.

Krylov subspace

The subspace spanned by a Krylov sequence.
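
As a small illustration of the two entries above, the sketch below generates the first few vectors of a Krylov sequence by repeated matrix-vector multiplication. The function name and the example data are hypothetical.

```python
def krylov_sequence(a, x, k):
    """Return the first k vectors x, Ax, A^2 x, ... of the Krylov sequence."""
    seq = [list(x)]
    for _ in range(k - 1):
        v = seq[-1]
        seq.append([sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))])
    return seq

a = [[2, 1],
     [0, 3]]
vectors = krylov_sequence(a, [0, 1], 3)
```

For this example the sequence starts with (0, 1), (1, 3), (5, 9); the Krylov subspace of a given dimension is the span of that many leading vectors.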

LAPACK

A mathematical library of linear algebra routines for dense system solution and eigenvalue calculations.

Lower triangular matrix

Matrix $A$ for which $a_{i,j} = 0$ if $j > i$.

$LQ$ factorization

A way of writing a matrix $A$ as a product of a lower triangular matrix $L$ and a unitary matrix $Q$, that is, $A = LQ$.

$LU$ factorization / $LU$ decomposition

Expressing a matrix $A$ as a product of a lower triangular matrix $L$ and an upper triangular matrix $U$, that is, $A = LU$.

$M$-Matrix

See: Matrix properties.

Matrix norms

See: Norms.

Matrix properties

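We call a square matrix $A$

Symmetric: if $a_{i,j} = a_{j,i}$ for all $i$, $j$.
Positive definite: if it satisfies $x^T A x > 0$ for all nonzero vectors $x$.
Diagonally dominant: if $a_{i,i} > \sum_{j \neq i} |a_{i,j}|$; the excess amount $\min_i \{a_{i,i} - \sum_{j \neq i} |a_{i,j}|\}$ is called the diagonal dominance of the matrix.
An $M$-matrix: if $a_{i,j} \leq 0$ for $i \neq j$, and it is nonsingular with $(A^{-1})_{i,j} \geq 0$ for all $i$, $j$.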

Message passing

See: Parallel computer.

Multigrid method

Solution method for linear systems based on restricting and extrapolating solutions between a series of nested grids.

Modified incomplete factorization

See: Incomplete factorization.

Mutually consistent norms

See: Norms.

Natural ordering

See: Ordering of unknowns.

Nonstationary iterative method

Iterative method that has iteration-dependent coefficients.

Normal equations

For a nonsymmetric or indefinite (but nonsingular) system of equations $Ax = b$, either of the related symmetric systems ($A^T A x = A^T b$) and ($A A^T y = b$; $x = A^T y$). For complex $A$, $A^T$ is replaced with $A^H$ in the above expressions.

Norms

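A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is called a vector norm if

$f(x) \geq 0$ for all $x$, and $f(x) = 0$ only if $x = 0$.
$f(\alpha x) = |\alpha| f(x)$ for all $\alpha$, $x$.
$f(x + y) \leq f(x) + f(y)$ for all $x$, $y$.

The same properties hold for matrix norms. A matrix norm and a vector norm (both denoted $\|\cdot\|$) are called a mutually consistent pair if for all matrices $A$ and vectors $x$, $\|Ax\| \leq \|A\| \, \|x\|$.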

Ordering of unknowns

For linear systems derived from a partial differential equation, each unknown corresponds to a node in the discretization mesh. Different orderings of the unknowns correspond to permutations of the coefficient matrix. The convergence speed of iterative methods may depend on the ordering used, and often the parallel efficiency of a method on a parallel computer is strongly dependent on the ordering used. Some common orderings for rectangular domains are:

The natural ordering; this is the consecutive numbering by rows and columns.
The red/black ordering; this is the numbering where all nodes with coordinates $(i,j)$ for which $i + j$ is odd are numbered before those for which $i + j$ is even.
The ordering by diagonals; this is the ordering where nodes are grouped in levels for which $i + j$ is constant. All nodes in one level are numbered before the nodes in the next level.

For matrices from problems on less regular domains, some common orderings are:

The Cuthill-McKee ordering; this starts from one point, then numbers its neighbors, and continues numbering points that are neighbors of already numbered points. The Reverse Cuthill-McKee ordering then reverses the numbering; this may reduce the amount of fill in a factorization of the matrix.
The Minimum Degree ordering; this orders the matrix rows by increasing numbers of nonzeros.
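
The Cuthill-McKee numbering described above amounts to a breadth-first traversal of the graph of the matrix. The sketch below is a minimal illustration that assumes a connected graph given as an adjacency dictionary, and visits the neighbors of each point in order of increasing degree (a common refinement that the description above does not prescribe). The function names and the example graph are hypothetical.

```python
from collections import deque

def cuthill_mckee(adj, start):
    """Number the start point, then its neighbors, then neighbors of numbered points."""
    degree = {v: len(adj[v]) for v in adj}
    order, seen = [start], {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in sorted(adj[v], key=lambda u: (degree[u], u)):
            if w not in seen:
                seen.add(w)
                order.append(w)
                queue.append(w)
    return order

def bandwidth(adj, order):
    """Largest distance |i - j| between coupled unknowns under the given numbering."""
    pos = {v: k for k, v in enumerate(order)}
    return max(abs(pos[v] - pos[w]) for v in adj for w in adj[v])

# A chain of five points whose original numbering is scrambled: 0 - 3 - 1 - 4 - 2.
adj = {0: [3], 1: [3, 4], 2: [4], 3: [0, 1], 4: [1, 2]}
cm = cuthill_mckee(adj, 0)
rcm = list(reversed(cm))          # the Reverse Cuthill-McKee ordering
before = bandwidth(adj, [0, 1, 2, 3, 4])
after = bandwidth(adj, rcm)
```

For this chain the original numbering couples unknowns up to three positions apart, while the reordered matrix is tridiagonal.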

Parallel computer

Computer with multiple independent processing units. If the processors have immediate access to the same memory, the memory is said to be shared; if processors have private memory that is not immediately visible to other processors, the memory is said to be distributed. In that case, processors communicate by message-passing.

Pipelining

See: Vector computer.

Positive definite matrix

See: Matrix properties.

Preconditioner

An auxiliary matrix in an iterative method that approximates in some sense the coefficient matrix or its inverse. The preconditioner, or preconditioning matrix, is applied in every step of the iterative method.

Red/black ordering

See: Ordering of unknowns.

Reduced system

Linear system obtained by eliminating certain variables from another linear system. Although the number of variables is smaller than for the original system, the matrix of a reduced system generally has more nonzero entries. If the original matrix was symmetric and positive definite, then the reduced system has a smaller condition number.

Relaxed incomplete factorization

See: Incomplete factorization.

Residual

If an iterative method is employed to solve for $x$ in a linear system $Ax = b$, then the residual corresponding to a vector $y$ is $b - Ay$.

Search direction

Vector that is used to update an iterate.

Shared memory

See: Parallel computer.

Simultaneous displacements, method of

Jacobi method.

Sparse matrix

Matrix for which the number of zero elements is large enough that algorithms avoiding operations on zero elements pay off. Matrices derived from partial differential equations typically have a number of nonzero elements that is proportional to the matrix size, while the total number of matrix elements is the square of the matrix size.

Spectral condition number

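The product

$$\|A\|_2 \cdot \|A^{-1}\|_2 = \frac{\lambda_{\max}^{1/2}(A^T A)}{\lambda_{\min}^{1/2}(A^T A)},$$

where $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest eigenvalues, respectively. For linear systems derived from partial differential equations in 2D, the condition number is proportional to the number of unknowns.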

Spectral radius

The spectral radius of a matrix $A$ is $\max\{|\lambda(A)|\}$.

Spectrum

The set of all eigenvalues of a matrix.

Stationary iterative method

Iterative method that performs in each iteration the same operations on the current iteration vectors.

Stopping criterion

Since an iterative method computes successive approximations to the solution of a linear system, a practical test is needed to determine when to stop the iteration. Ideally this test would measure the distance of the last iterate to the true solution, but this is not possible. Instead, various other metrics are used, typically involving the residual.

Storage scheme

The way elements of a matrix are stored in the memory of a computer. For dense matrices, this can be the decision to store rows or columns consecutively. For sparse matrices, common storage schemes avoid storing zero elements; as a result they involve indices, stored as integer data, that indicate where the stored elements fit into the global matrix.
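
One common sparse storage scheme keeps the nonzeros row by row, together with their column indices and a pointer to the start of each row (often called compressed row storage). The sketch below is a minimal illustration with 0-based indices; the function and array names are hypothetical.

```python
def to_crs(dense):
    """Convert a dense matrix (list of rows) to values, column indices, and row pointers."""
    val, col_ind, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                val.append(v)
                col_ind.append(j)
        row_ptr.append(len(val))   # start of the next row in val and col_ind
    return val, col_ind, row_ptr

def crs_matvec(val, col_ind, row_ptr, x):
    """y = A*x using only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        y.append(sum(val[k] * x[col_ind[k]] for k in range(row_ptr[i], row_ptr[i + 1])))
    return y

dense = [[10, 0, 0, -2],
         [3, 9, 0, 0],
         [0, 7, 8, 7],
         [0, 0, 0, 5]]
val, col_ind, row_ptr = to_crs(dense)
y = crs_matvec(val, col_ind, row_ptr, [1, 2, 3, 4])
```

The integer arrays `col_ind` and `row_ptr` are the indices mentioned above: they record where each stored element fits into the global matrix.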

Successive displacements, method of

Gauss-Seidel method.

Symmetric matrix

See: Matrix properties.

Template

Description of an algorithm, abstracting away from implementational details.

Tune

Adapt software for a specific application and computing environment in order to obtain better performance in that case only.

Upper triangular matrix

Matrix $A$ for which $a_{i,j} = 0$ if $j < i$.

Vector computer

Computer that is able to process consecutive identical operations (typically additions or multiplications) several times faster than intermixed operations of different types. Processing identical operations this way is called `pipelining' the operations.

Vector norms

See: Norms.

Notation

In this section, we present some of the notation we use throughout the book. We have tried to use standard notation that would be found in any current publication on the subjects covered.

Throughout, we follow several conventions:

Matrices are denoted by capital letters.
Vectors are denoted by lowercase letters.
Lowercase Greek letters usually denote scalars.
Matrix elements are denoted by doubly indexed lowercase letters.
Matrix subblocks are denoted by doubly indexed uppercase letters.

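We define matrix $A$ of dimension $m \times n$ and block dimension $m' \times n'$ as follows:

$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{bmatrix} \quad (a_{i,j} \in \mathbb{R}), \qquad A = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n'} \\ \vdots & & \vdots \\ A_{m',1} & \cdots & A_{m',n'} \end{bmatrix} \quad (A_{i,j} \in \mathbb{R}^{m_i \times n_j}).$$

We define vector $x$ of dimension $n$ as follows:

$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_i \in \mathbb{R}.$$

Other notation is as follows:

$I^{n \times n}$ (or simply $I$ if the size is clear from the context) denotes the identity matrix.
$A$ = diag$(a_{i,i})$ denotes that matrix $A$ has elements $a_{i,i}$ on its diagonal, and zeros everywhere else.
$x_i^{(k)}$ denotes the $i$th element of vector $x$ during the $k$th iteration.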

References

1: J. AARDEN AND K.-E. KARLSSON, Preconditioned CG-type methods for solving the coupled systems of fundamental semiconductor equations, BIT, 29 (1989), pp. 916-937.
2: L. ADAMS AND H. JORDAN, Is SOR color-blind?, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 490-506.
3: E. ANDERSON ET AL., LAPACK Users' Guide, SIAM, Philadelphia, 1992.
4: J. APPLEYARD AND I. CHESHIRE, Nested factorization, in Reservoir Simulation Symposium of the SPE, 1983. Paper 12264.
5: M. ARIOLI, J. DEMMEL, AND I. DUFF, Solving sparse linear systems with sparse backward error, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 165-190.
6: W. ARNOLDI, The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quart. Appl. Math., 9 (1951), pp. 17-29.
7: S. ASHBY, CHEBYCODE: A Fortran implementation of Manteuffel's adaptive Chebyshev algorithm, Tech. Rep. UIUCDCS-R-85-1203, University of Illinois, 1985.
8: S. ASHBY, T. MANTEUFFEL, AND J. OTTO, A comparison of adaptive Chebyshev and least squares polynomial preconditioning for Hermitian positive definite linear systems, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1-29.
9: S. ASHBY, T. MANTEUFFEL, AND P. SAYLOR, Adaptive polynomial preconditioning for Hermitian indefinite linear systems, BIT, 29 (1989), pp. 583-609.
10: S. F. ASHBY, T. A. MANTEUFFEL, AND P. E. SAYLOR, A taxonomy for conjugate gradient methods, SIAM J. Numer. Anal., 27 (1990), pp. 1542-1568.
11: C. ASHCRAFT AND R. GRIMES, On vectorizing incomplete factorizations and SSOR preconditioners, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 122-151.
12: O. AXELSSON, Incomplete block matrix factorization preconditioning methods. The ultimate answer?, J. Comput. Appl. Math., 12 & 13 (1985), pp. 3-18.
13: O. AXELSSON, A general incomplete block-matrix factorization method, Linear Algebra Appl., 74 (1986), pp. 179-190.
14: O. AXELSSON AND A. BARKER, Finite element solution of boundary value problems. Theory and computation, Academic Press, Orlando, Fl., 1984.
15: O. AXELSSON AND V. EIJKHOUT, Vectorizable preconditioners for elliptic difference equations in three space dimensions, J. Comput. Appl. Math., 27 (1989), pp. 299-321.
16: O. AXELSSON AND V. EIJKHOUT, The nested recursive two-level factorization method for nine-point difference matrices, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 1373-1400.
17: O. AXELSSON AND I. GUSTAFSSON, Iterative solution for the solution of the Navier equations of elasticity, Comput. Methods Appl. Mech. Engrg., 15 (1978), pp. 241-258.
18: O. AXELSSON AND G. LINDSKOG, On the eigenvalue distribution of a class of preconditioning matrices, Numer. Math., 48 (1986), pp. 479-498.
19: O. AXELSSON AND G. LINDSKOG, On the rate of convergence of the preconditioned conjugate gradient method, Numer. Math., 48 (1986), pp. 499-523.
20: O. AXELSSON AND N. MUNKSGAARD, Analysis of incomplete factorizations with fixed storage allocation, in Preconditioning Methods - Theory and Applications, D. Evans, ed., Gordon and Breach, New York, 1983, pp. 265-293.
21: O. AXELSSON AND B. POLMAN, On approximate factorization methods for block-matrices suitable for vector and parallel processors, Linear Algebra Appl., 77 (1986), pp. 3-26.
22: O. AXELSSON AND P. VASSILEVSKI, Algebraic multilevel preconditioning methods, I, Numer. Math., 56 (1989), pp. 157-177.
23: O. AXELSSON AND P. VASSILEVSKI, Algebraic multilevel preconditioning methods, II, SIAM J. Numer. Anal., 27 (1990), pp. 1569-1590.
24: O. AXELSSON AND P. S. VASSILEVSKI, A black box generalized conjugate gradient solver with inner iterations and variable-step preconditioning, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 625-644.
25: R. BANK, Marching algorithms for elliptic boundary value problems; II: The variable coefficient case, SIAM J. Numer. Anal., 14 (1977), pp. 950-970.
26: R. BANK, T. CHAN, W. COUGHRAN JR., AND R. SMITH, The Alternate-Block-Factorization procedure for systems of partial differential equations, BIT, 29 (1989), pp. 938-954.
27: R. BANK AND D. ROSE, Marching algorithms for elliptic boundary value problems. I: The constant coefficient case, SIAM J. Numer. Anal., 14 (1977), pp. 792-829.
28: R. E. BANK AND T. F. CHAN, An analysis of the composite step Biconjugate gradient method, Numerische Mathematik, 66 (1993), pp. 295-319.
29: R. E. BANK AND T. F. CHAN, A composite step bi-conjugate gradient algorithm for nonsymmetric linear systems, Numer. Alg., (1994), pp. 1-16.
30: G. BAUDET, Asynchronous iterative methods for multiprocessors, J. Assoc. Comput. Mach., 25 (1978), pp. 226-244.
31: R. BEAUWENS, On Axelsson's perturbations, Linear Algebra Appl., 68 (1985), pp. 221-242.
32: R. BEAUWENS, Approximate factorizations with S/P consistently ordered $M$-factors, BIT, 29 (1989), pp. 658-681.
33: R. BEAUWENS AND L. QUENON, Existence criteria for partial matrix factorizations in iterative methods, SIAM J. Numer. Anal., 13 (1976), pp. 615-643.
34: A. BJÖRCK AND T. ELFVING, Accelerated projection methods for computing pseudo-inverse solutions of systems of linear equations, BIT, 19 (1979), pp. 145-163.
35: D. BRAESS, The contraction number of a multigrid method for solving the Poisson equation, Numer. Math., 37 (1981), pp. 387-404.
36: J. H. BRAMBLE, J. E. PASCIAK, AND A. H. SCHATZ, The construction of preconditioners for elliptic problems by substructuring, I, Mathematics of Computation, 47 (1986), pp. 103-134.
37: J. H. BRAMBLE, J. E. PASCIAK, J. WANG, AND J. XU, Convergence estimates for product iterative methods with applications to domain decompositions and multigrid, Math. Comp., 57(195) (1991), pp. 1-21.
38: R. BRAMLEY AND A. SAMEH, Row projection methods for large nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 168-193.
39: C. BREZINSKI AND H. SADOK, Avoiding breakdown in the CGS algorithm, Numer. Alg., 1 (1991), pp. 199-206.
40: C. BREZINSKI, M. ZAGLIA, AND H. SADOK, Avoiding breakdown and near breakdown in Lanczos type algorithms, Numer. Alg., 1 (1991), pp. 261-284.
41: C. BREZINSKI, M. ZAGLIA, AND H. SADOK, A breakdown free Lanczos type algorithm for solving linear systems, Numer. Math., 63 (1992), pp. 29-38.
42: W. BRIGGS, A Multigrid Tutorial, SIAM, Philadelphia, 1987.
43: X.-C. CAI AND O. WIDLUND, Multiplicative Schwarz algorithms for some nonsymmetric and indefinite problems, SIAM J. Numer. Anal., 30 (1993), pp. 936-952.
44: T. CHAN, Fourier analysis of relaxed incomplete factorization preconditioners, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 668-680.
45: T. CHAN, L. DE PILLIS, AND H. VAN DER VORST, A transpose-free squared Lanczos algorithm and application to solving nonsymmetric linear systems, Tech. Rep. CAM 91-17, UCLA, Dept. of Math., Los Angeles, CA 90024-1555, 1991.
46: T. CHAN, E. GALLOPOULOS, V. SIMONCINI, T. SZETO, AND C. TONG, A quasi-minimal residual variant of the Bi-CGSTAB algorithm for nonsymmetric systems, SIAM J. Sci. Comp., 15(2) (1994), pp. 338-347.
47: T. CHAN, R. GLOWINSKI, J. PÉRIAUX, AND O. WIDLUND, eds., Domain Decomposition Methods, Philadelphia, 1989, SIAM. Proceedings of the Second International Symposium on Domain Decomposition Methods, Los Angeles, CA, January 14-16, 1988.
48: T. CHAN, R. GLOWINSKI, J. PÉRIAUX, AND O. WIDLUND, eds., Domain Decomposition Methods, Philadelphia, 1990, SIAM. Proceedings of the Third International Symposium on Domain Decomposition Methods, Houston, TX, 1989.
49: T. CHAN, R. GLOWINSKI, J. PÉRIAUX, AND O. WIDLUND, eds., Domain Decomposition Methods, SIAM, Philadelphia, 1991. Proceedings of the Fourth International Symposium on Domain Decomposition Methods, Moscow, USSR, 1990.
50: T. CHAN AND C.-C. J. KUO, Two-color Fourier analysis of iterative algorithms for elliptic problems with red/black ordering, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 767-793.
51: T. F. CHAN AND T. MATHEW, Domain decomposition algorithms, Acta Numerica, (1994), pp. 61-144.
52: T. F. CHAN, T. P. MATHEW, AND J. P. SHAO, Efficient variants of the vertex space domain decomposition algorithm, SIAM J. Sci. Comput., 15(6) (1994), pp. 1349-1374.
53: T. F. CHAN AND J. SHAO, Optimal coarse grid size in domain decomposition, J. Comput. Math., 12(4) (1994), pp. 291-297.
54: D. CHAZAN AND W. MIRANKER, Chaotic relaxation, Linear Algebra Appl., 2 (1969), pp. 199-222.
55: A. CHRONOPOULOS AND C. GEAR, $s$-step iterative methods for symmetric linear systems, J. Comput. Appl. Math., 25 (1989), pp. 153-168.
56: P. CONCUS AND G. GOLUB, A generalized conjugate gradient method for nonsymmetric systems of linear equations, in Computer methods in Applied Sciences and Engineering, Second International Symposium, Dec 15-19, 1975; Lecture Notes in Economics and Mathematical Systems, Vol. 134, Berlin, New York, 1976, Springer-Verlag.
57: P. CONCUS, G. GOLUB, AND G. MEURANT, Block preconditioning for the conjugate gradient method, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 220-252.
58: P. CONCUS, G. GOLUB, AND D. O'LEARY, A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations, in Sparse Matrix Computations, J. Bunch and D. Rose, eds., Academic Press, New York, 1976, pp. 309-332.
59: P. CONCUS AND G. H. GOLUB, Use of fast direct methods for the efficient numerical solution of nonseparable elliptic equations, SIAM J. Numer. Anal., 10 (1973), pp. 1103-1120.
60: E. CUTHILL AND J. MCKEE, Reducing the bandwidth of sparse symmetric matrices, in ACM Proceedings of the 24th National Conference, 1969.
61: E. D'AZEVEDO, V. EIJKHOUT, AND C. ROMINE, LAPACK working note 56: Reducing communication costs in the conjugate gradient algorithm on distributed memory multiprocessor, tech. report, Computer Science Department, University of Tennessee, Knoxville, TN, 1993.
62: E. D'AZEVEDO AND C. ROMINE, Reducing communication costs in the conjugate gradient algorithm on distributed memory multiprocessors, Tech. Rep. ORNL/TM-12192, Oak Ridge National Lab, Oak Ridge, TN, 1992.
63: E. DE STURLER, A parallel restructured version of GMRES(m), Tech. Rep. 91-85, Delft University of Technology, Delft, The Netherlands, 1991.
64: E. DE STURLER AND D. R. FOKKEMA, Nested Krylov methods and preserving the orthogonality, Tech. Rep. Preprint 796, Utrecht University, Utrecht, The Netherlands, 1993.
65: S. DEMKO, W. MOSS, AND P. SMITH, Decay rates for inverses of band matrices, Mathematics of Computation, 43 (1984), pp. 491-499.
66: J. DEMMEL, The condition number of equivalence transformations that block diagonalize matrix pencils, SIAM J. Numer. Anal., 20 (1983), pp. 599-610.
67: J. DEMMEL, M. HEATH, AND H. VAN DER VORST, Parallel numerical linear algebra, in Acta Numerica, Vol. 2, Cambridge Press, New York, 1993.
68: S. DOI, On parallelism and convergence of incomplete LU factorizations, Appl. Numer. Math., 7 (1991), pp. 417-436.
69: J. DONGARRA, J. DUCROZ, I. DUFF, AND S. HAMMARLING, A set of level 3 Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 16 (1990), pp. 1-17.
70: J. DONGARRA, J. DUCROZ, S. HAMMARLING, AND R. HANSON, An extended set of FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 14 (1988), pp. 1-32.
71: J. DONGARRA, I. DUFF, D. SORENSEN, AND H. VAN DER VORST, Solving Linear Systems on Vector and Shared Memory Computers, SIAM, Philadelphia, PA, 1991.
72: J. DONGARRA AND E. GROSSE, Distribution of mathematical software via electronic mail, Comm. ACM, 30 (1987), pp. 403-407.
73: J. DONGARRA, C. MOLER, J. BUNCH, AND G. STEWART, LINPACK Users' Guide, SIAM, Philadelphia, 1979.
74: J. DONGARRA AND H. VAN DER VORST, Performance of various computers using standard sparse linear equations solving techniques, in Computer Benchmarks, J. Dongarra and W. Gentzsch, eds., Elsevier Science Publishers B.V., New York, 1993, pp. 177-188.
75: F. DORR, The direct solution of the discrete Poisson equation on a rectangle, SIAM Rev., 12 (1970), pp. 248-263.
76: M. DRYJA AND O. B. WIDLUND, Towards a unified theory of domain decomposition algorithms for elliptic problems, Tech. Rep. 486, also Ultracomputer Note 167, Department of Computer Science, Courant Institute, 1989.
77: D. DUBOIS, A. GREENBAUM, AND G. RODRIGUE, Approximating the inverse of a matrix for use in iterative algorithms on vector processors, Computing, 22 (1979), pp. 257-268.
78: I. DUFF, R. GRIMES, AND J. LEWIS, Sparse matrix test problems, ACM Trans. Math. Soft., 15 (1989), pp. 1-14.
79: I. DUFF AND G. MEURANT, The effect of ordering on preconditioned conjugate gradients, BIT, 29 (1989), pp. 635-657.
80: I. S. DUFF, A. M. ERISMAN, AND J. K. REID, Direct methods for sparse matrices, Oxford University Press, London, 1986.
81: T. DUPONT, R. KENDALL, AND H. RACHFORD, An approximate factorization procedure for solving self-adjoint elliptic difference equations, SIAM J. Numer. Anal., 5 (1968), pp. 559-573.
82: E. D'YAKONOV, The method of variable directions in solving systems of finite difference equations, Soviet Math. Dokl., 2 (1961), pp. 577-580. TOM 138, 271-274.
83: L. EHRLICH, An Ad-Hoc SOR method, J. Comput. Phys., 43 (1981), pp. 31-45.
84: M. EIERMANN AND R. VARGA, Is the optimal ω best for the SOR iteration method?, Linear Algebra Appl., 182 (1993), pp. 257-277.
85: V. EIJKHOUT, Analysis of parallel incomplete point factorizations, Linear Algebra Appl., 154-156 (1991), pp. 723-740.
86: V. EIJKHOUT, Beware of unperturbed modified incomplete point factorizations, in Proceedings of the IMACS International Symposium on Iterative Methods in Linear Algebra, Brussels, Belgium, R. Beauwens and P. de Groen, eds., 1992.
87: V. EIJKHOUT, LAPACK working note 50: Distributed sparse data structures for linear algebra operations, Tech. Rep. CS 92-169, Computer Science Department, University of Tennessee, Knoxville, TN, 1992.
88: V. EIJKHOUT, LAPACK working note 51: Qualitative properties of the conjugate gradient and Lanczos methods in a matrix framework, Tech. Rep. CS 92-170, Computer Science Department, University of Tennessee, Knoxville, TN, 1992.
89: V. EIJKHOUT AND B. POLMAN, Decay rates of inverses of banded M-matrices that are near to Toeplitz matrices, Linear Algebra Appl., 109 (1988), pp. 247-277.
90: V. EIJKHOUT AND P. VASSILEVSKI, Positive definiteness aspects of vectorizable preconditioners, Parallel Computing, 10 (1989), pp. 93-100.
91: S. EISENSTAT, Efficient implementation of a class of preconditioned conjugate gradient methods, SIAM J. Sci. Statist. Comput., 2 (1981), pp. 1-4.
92: R. ELKIN, Convergence theorems for Gauss-Seidel and other minimization algorithms, Tech. Rep. 68-59, Computer Science Center, University of Maryland, College Park, MD, Jan. 1968.
93: H. ELMAN, Approximate Schur complement preconditioners on serial and parallel computers, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 581-605.
94: H. ELMAN AND M. SCHULTZ, Preconditioning by fast direct methods for non self-adjoint nonseparable elliptic equations, SIAM J. Numer. Anal., 23 (1986), pp. 44-57.
95: L. ELSNER, A note on optimal block-scaling of matrices, Numer. Math., 44 (1984), pp. 127-128.
96: V. FABER AND T. MANTEUFFEL, Necessary and sufficient conditions for the existence of a conjugate gradient method, SIAM J. Numer. Anal., 21 (1984), pp. 315-339.
97: G. FAIRWEATHER, A. GOURLAY, AND A. MITCHELL, Some high accuracy difference schemes with a splitting operator for equations of parabolic and elliptic type, Numer. Math., 10 (1967), pp. 56-66.
98: R. FLETCHER, Conjugate gradient methods for indefinite systems, in Numerical Analysis Dundee 1975, G. Watson, ed., Berlin, New York, 1976, Springer Verlag, pp. 73-89.
99: G. FORSYTHE AND E. STRAUSS, On best conditioned matrices, Proc. Amer. Math. Soc., 6 (1955), pp. 340-345.
100: R. FREUND, Conjugate gradient-type methods for linear systems with complex symmetric coefficient matrices, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 425-448.
101: R. FREUND, M. GUTKNECHT, AND N. NACHTIGAL, An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, SIAM J. Sci. Comput., 14 (1993), pp. 137-158.
102: R. FREUND AND N. NACHTIGAL, QMR: A quasi-minimal residual method for non-Hermitian linear systems, Numer. Math., 60 (1991), pp. 315-339.
103: R. FREUND AND N. NACHTIGAL, An implementation of the QMR method based on coupled two-term recurrences, SIAM J. Sci. Statist. Comput., 15 (1994), pp. 313-337.
104: R. FREUND AND T. SZETO, A quasi-minimal residual squared algorithm for non-Hermitian linear systems, Tech. Rep. CAM Report 92-19, UCLA Dept. of Math., 1992.
105: R. W. FREUND, A transpose-free quasi-minimum residual algorithm for non-Hermitian linear systems, SIAM J. Sci. Comput., 14 (1993), pp. 470-482.
106: R. W. FREUND, G. H. GOLUB, AND N. M. NACHTIGAL, Iterative solution of linear systems, Acta Numerica, (1992), pp. 57-100.
107: R. GLOWINSKI, G. H. GOLUB, G. A. MEURANT, AND J. PéRIAUX, eds., Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, 1988. Proceedings of the First International Symposium on Domain Decomposition Methods for Partial Differential Equations, Paris, France, January 1987.
108: G. GOLUB AND D. O'LEARY, Some history of the conjugate gradient and Lanczos methods, SIAM Rev., 31 (1989), pp. 50-102.
109: G. GOLUB AND C. VAN LOAN, Matrix Computations, second edition, The Johns Hopkins University Press, Baltimore, 1989.
110: A. GREENBAUM AND Z. STRAKOS, Predicting the behavior of finite precision Lanczos and conjugate gradient computations, SIAM J. Mat. Anal. Appl., 13 (1992), pp. 121-137.
111: W. D. GROPP AND D. E. KEYES, Domain decomposition with local mesh refinement, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 967-993.
112: I. GUSTAFSSON, A class of first-order factorization methods, BIT, 18 (1978), pp. 142-156.
113: M. H. GUTKNECHT, The unsymmetric Lanczos algorithms and their relations to Padé approximation, continued fractions and the QD algorithm, in Proceedings of the Copper Mountain Conference on Iterative Methods, 1990.
114: M. H. GUTKNECHT, A completed theory of the unsymmetric Lanczos process and related algorithms, part I, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 594-639.
115: M. H. GUTKNECHT, Variants of Bi-CGSTAB for matrices with complex spectrum, SIAM J. Sci. Comp., 14 (1993), pp. 1020-1033.
116: M. H. GUTKNECHT, A completed theory of the unsymmetric Lanczos process and related algorithms, part II, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 15-58.
117: W. HACKBUSCH, Multi-Grid Methods and Applications, Springer-Verlag, Berlin, New York, 1985.
118: W. HACKBUSCH, Iterative Lösung großer schwachbesetzter Gleichungssysteme, Teubner, Stuttgart, 1991.
119: A. HADJIDIMOS, On some high accuracy difference schemes for solving elliptic equations, Numer. Math., 13 (1969), pp. 396-403.
120: L. HAGEMAN AND D. YOUNG, Applied Iterative Methods, Academic Press, New York, 1981.
121: W. HAGER, Condition estimators, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 311-316.
122: M. HESTENES AND E. STIEFEL, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand., 49 (1952), pp. 409-436.
123: M. R. HESTENES, Conjugacy and gradients, in A History of Scientific Computing, Addison-Wesley, Reading, MA, 1990, pp. 167-179.
124: N. HIGHAM, Experience with a matrix norm estimator, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 804-809.
125: K. JEA AND D. YOUNG, Generalized conjugate-gradient acceleration of nonsymmetrizable iterative methods, Linear Algebra Appl., 34 (1980), pp. 159-194.
126: O. JOHNSON, C. MICCHELLI, AND G. PAUL, Polynomial preconditioning for conjugate gradient calculation, SIAM J. Numer. Anal., 20 (1983), pp. 362-376.
127: M. JONES AND P. PLASSMANN, Parallel solution of unstructured, sparse systems of linear equations, in Proceedings of the Sixth SIAM conference on Parallel Processing for Scientific Computing, R. Sincovec, D. Keyes, M. Leuze, L. Petzold, and D. Reed, eds., SIAM, Philadelphia, pp. 471-475.
128: M. JONES AND P. PLASSMANN, A parallel graph coloring heuristic, SIAM J. Sci. Statist. Comput., 14 (1993), pp. 654-669.
129: W. JOUBERT, Lanczos methods for the solution of nonsymmetric systems of linear equations, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 926-943.
130: W. KAHAN, Gauss-Seidel methods of solving large systems of linear equations, PhD thesis, University of Toronto, 1958.
131: S. KANIEL, Estimates for some computational techniques in linear algebra, Mathematics of Computation, 20 (1966), pp. 369-378.
132: D. KERSHAW, The incomplete Cholesky-conjugate gradient method for the iterative solution of systems of linear equations, J. Comput. Phys., 26 (1978), pp. 43-65.
133: R. KETTLER, Analysis and comparison of relaxation schemes in robust multigrid and preconditioned conjugate gradient methods, in Multigrid Methods, Lecture Notes in Mathematics 960, W. Hackbusch and U. Trottenberg, eds., Springer-Verlag, Berlin, New York, 1982, pp. 502-534.
134: R. KETTLER, Linear multigrid methods in numerical reservoir simulation, PhD thesis, Delft University of Technology, Delft, The Netherlands, 1987.
135: D. E. KEYES, T. F. CHAN, G. MEURANT, J. S. SCROGGS, AND R. G. VOIGT, eds., Domain Decomposition Methods For Partial Differential Equations, SIAM, Philadelphia, 1992. Proceedings of the Fifth International Symposium on Domain Decomposition Methods, Norfolk, VA, 1991.
136: D. E. KEYES AND W. D. GROPP, A comparison of domain decomposition techniques for elliptic partial differential equations and their parallel implementation, SIAM J. Sci. Statist. Comput., 8 (1987), pp. s166-s202.
137: D. E. KEYES AND W. D. GROPP, Domain decomposition for nonsymmetric systems of equations: Examples from computational fluid dynamics, in Domain Decomposition Methods, proceedings of the Second International Symposium, Los Angeles, California, January 14-16, 1988, T. F. Chan, R. Glowinski, J. Periaux, and O. B. Widlund, eds., Philadelphia, 1989, SIAM, pp. 373-384.
138: D. E. KEYES AND W. D. GROPP, Domain decomposition techniques for the parallel solution of nonsymmetric systems of elliptic boundary value problems, Applied Num. Math., 6 (1989/1990), pp. 281-301.
139: S. K. KIM AND A. T. CHRONOPOULOS, A class of Lanczos-like algorithms implemented on parallel computers, Parallel Comput., 17 (1991), pp. 763-778.
140: D. R. KINCAID, J. R. RESPESS, D. M. YOUNG, AND R. G. GRIMES, ITPACK 2C: A Fortran package for solving large sparse linear systems by adaptive accelerated iterative methods, ACM Trans. Math. Soft., 8 (1982), pp. 302-322. Algorithm 586.
141: L. Y. KOLOTILINA AND A. Y. YEREMIN, On a family of two-level preconditionings of the incomplete block factorization type, Sov. J. Numer. Anal. Math. Modelling, (1986), pp. 293-320.
142: C. LANCZOS, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Nat. Bur. Stand., 45 (1950), pp. 255-282.
143: C. LANCZOS, Solution of systems of linear equations by minimized iterations, J. Res. Nat. Bur. Stand., 49 (1952), pp. 33-53.
144: C. LAWSON, R. HANSON, D. KINCAID, AND F. KROGH, Basic Linear Algebra Subprograms for FORTRAN usage, ACM Trans. Math. Soft., 5 (1979), pp. 308-325.
145: J. MAITRE AND F. MUSY, The contraction number of a class of two-level methods; an exact evaluation for some finite element subspaces and model problems, in Multigrid methods, Proceedings, Köln-Porz, 1981, W. Hackbusch and U. Trottenberg, eds., vol. 960 of Lecture Notes in Mathematics, 1982, pp. 535-544.
146: T. MANTEUFFEL, The Tchebychev iteration for nonsymmetric linear systems, Numer. Math., 28 (1977), pp. 307-327.
147: T. MANTEUFFEL, An incomplete factorization technique for positive definite linear systems, Mathematics of Computation, 34 (1980), pp. 473-497.
148: S. MCCORMICK, Multilevel Adaptive Methods for Partial Differential Equations, SIAM, Philadelphia, 1989.
149: S. MCCORMICK AND J. THOMAS, The Fast Adaptive Composite grid (FAC) method for elliptic equations, Mathematics of Computation, 46 (1986), pp. 439-456.
150: U. MEIER AND A. SAMEH, The behavior of conjugate gradient algorithms on a multivector processor with a hierarchical memory, J. Comput. Appl. Math., 24 (1988), pp. 13-32.
151: U. MEIER-YANG, Preconditioned conjugate gradient-like methods for nonsymmetric linear systems, tech. rep., CSRD, University of Illinois, Urbana, IL, April 1992.
152: J. MEIJERINK AND H. VAN DER VORST, An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix, Mathematics of Computation, 31 (1977), pp. 148-162.
153: J. MEIJERINK AND H. VAN DER VORST, Guidelines for the usage of incomplete decompositions in solving sets of linear equations as they occur in practical problems, J. Comput. Phys., 44 (1981), pp. 134-155.
154: R. MELHEM, Toward efficient implementation of preconditioned conjugate gradient methods on vector supercomputers, Internat. J. Supercomput. Appls., 1 (1987), pp. 77-98.
155: G. MEURANT, The block preconditioned conjugate gradient method on vector computers, BIT, 24 (1984), pp. 623-633.
156: G. MEURANT, Multitasking the conjugate gradient method on the CRAY X-MP/48, Parallel Comput., 5 (1987), pp. 267-280.
157: N. MUNKSGAARD, Solving sparse symmetric sets of linear equations by preconditioned conjugate gradients, ACM Trans. Math. Software, 6 (1980), pp. 206-219.
158: N. NACHTIGAL, S. REDDY, AND L. TREFETHEN, How fast are nonsymmetric matrix iterations?, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 778-795.
159: N. NACHTIGAL, L. REICHEL, AND L. TREFETHEN, A hybrid GMRES algorithm for nonsymmetric matrix iterations, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 796-825.
160: N. M. NACHTIGAL, A Look-Ahead Variant of the Lanczos Algorithm and its Application to the Quasi-Minimal Residual Methods for Non-Hermitian Linear Systems, PhD thesis, MIT, Cambridge, MA, 1991.
161: Y. NOTAY, Solving positive (semi)definite linear systems by preconditioned iterative methods, in Preconditioned Conjugate Gradient Methods, O. Axelsson and L. Kolotilina, eds., vol. 1457 of Lecture Notes in Mathematics, Nijmegen, 1989, pp. 105-125.
162: Y. NOTAY, On the robustness of modified incomplete factorization methods, Internat. J. Comput. Math., 40 (1992), pp. 121-141.
163: D. O'LEARY, The block conjugate gradient algorithm and related methods, Linear Algebra Appl., 29 (1980), pp. 293-322.
164: D. O'LEARY, Ordering schemes for parallel processing of certain mesh problems, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 620-632.
165: T. C. OPPE, W. D. JOUBERT, AND D. R. KINCAID, NSPCG user's guide, version 1.0: A package for solving large sparse linear systems by various iterative methods, Tech. Rep. CNA-216, Center for Numerical Analysis, University of Texas at Austin, Austin, TX, April 1988.
166: J. M. ORTEGA, Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press, New York and London, 1988.
167: C. PAIGE, B. PARLETT, AND H. VAN DER VORST, Approximate solutions and eigenvalue bounds from Krylov subspaces, Numer. Lin. Alg. Appls., 29 (1995), pp. 115-134.
168: C. PAIGE AND M. SAUNDERS, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617-629.
169: C. C. PAIGE AND M. A. SAUNDERS, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Soft., 8 (1982), pp. 43-71.
170: G. PAOLINI AND G. RADICATI DI BROZOLO, Data structures to vectorize CG algorithms for general sparsity patterns, BIT, 29 (1989), pp. 703-718.
171: B. PARLETT, The symmetric eigenvalue problem, Prentice-Hall, London, 1980.
172: B. N. PARLETT, D. R. TAYLOR, AND Z. A. LIU, A look-ahead Lanczos algorithm for unsymmetric matrices, Mathematics of Computation, 44 (1985), pp. 105-124.
173: D. PEACEMAN AND H. H. RACHFORD, JR., The numerical solution of parabolic and elliptic differential equations, J. Soc. Indust. Appl. Math., 3 (1955), pp. 28-41.
174: C. POMMERELL, Solution of Large Unsymmetric Systems of Linear Equations, vol. 17 of Series in Micro-electronics, Hartung-Gorre Verlag, Konstanz, 1992.
175: C. POMMERELL, Solution of large unsymmetric systems of linear equations, PhD thesis, Swiss Federal Institute of Technology, Zürich, Switzerland, 1992.
176: E. POOLE AND J. ORTEGA, Multicolor ICCG methods for vector computers, Tech. Rep. RM 86-06, Department of Applied Mathematics, University of Virginia, Charlottesville, VA, 1986.
177: A. QUARTERONI, J. PERIAUX, Y. KUZNETSOV, AND O. WIDLUND, eds., Domain Decomposition Methods in Science and Engineering, vol. 157 of Contemporary Mathematics, Providence, RI, 1994, AMS. Proceedings of the Sixth International Symposium on Domain Decomposition Methods, June 15-19, 1992, Como, Italy.
178: G. RADICATI DI BROZOLO AND Y. ROBERT, Vector and parallel CG-like algorithms for sparse non-symmetric systems, Tech. Rep. 681-M, IMAG/TIM3, Grenoble, France, 1987.
179: J. REID, On the method of conjugate gradients for the solution of large sparse systems of linear equations, in Large Sparse Sets of Linear Equations, J. Reid, ed., Academic Press, London, 1971, pp. 231-254.
180: G. RODRIGUE AND D. WOLITZER, Preconditioning by incomplete block cyclic reduction, Mathematics of Computation, 42 (1984), pp. 549-565.
181: Y. SAAD, The Lanczos biorthogonalization algorithm and other oblique projection methods for solving large unsymmetric systems, SIAM J. Numer. Anal., 19 (1982), pp. 485-506.
182: Y. SAAD, Practical use of some Krylov subspace methods for solving indefinite and nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 203-228.
183: Y. SAAD, Practical use of polynomial preconditionings for the conjugate gradient method, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 865-881.
184: Y. SAAD, Preconditioning techniques for indefinite and nonsymmetric linear systems, J. Comput. Appl. Math., 24 (1988), pp. 89-105.
185: Y. SAAD, Krylov subspace methods on supercomputers, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 1200-1232.
186: Y. SAAD, SPARSKIT: A basic tool kit for sparse matrix computation, Tech. Rep. CSRD TR 1029, CSRD, University of Illinois, Urbana, IL, 1990.
187: Y. SAAD, A flexible inner-outer preconditioned GMRES algorithm, SIAM J. Sci. Comput., 14 (1993), pp. 461-469.
188: Y. SAAD AND M. SCHULTZ, Conjugate gradient-like algorithms for solving nonsymmetric linear systems, Mathematics of Computation, 44 (1985), pp. 417-424.
189: Y. SAAD AND M. SCHULTZ, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 856-869.
190: G. L. G. SLEIJPEN AND D. R. FOKKEMA, Bi-CGSTAB(ℓ) for linear equations involving unsymmetric matrices with complex spectrum, Elec. Trans. Numer. Anal., 1 (1993), pp. 11-32.
191: B. F. SMITH, Domain decomposition algorithms for partial differential equations of linear elasticity, Tech. Rep. 517, Department of Computer Science, Courant Institute, 1990.
192: P. SONNEVELD, CGS, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 36-52.
193: R. SOUTHWELL, Relaxation Methods in Theoretical Physics, Clarendon Press, Oxford, 1946.
194: H. STONE, Iterative solution of implicit approximations of multidimensional partial differential equations, SIAM J. Numer. Anal., 5 (1968), pp. 530-558.
195: P. SWARZTRAUBER, The methods of cyclic reduction, Fourier analysis and the FACR algorithm for the discrete solution of Poisson's equation on a rectangle, SIAM Rev., 19 (1977), pp. 490-501.
196: P. L. TALLEC, Domain decomposition methods in computational mechanics, Computational Mechanics Advances, 1994.
197: C. TONG, A comparative study of preconditioned Lanczos methods for nonsymmetric linear systems, Tech. Rep. SAND91-8240, Sandia Nat. Lab., Livermore, CA, 1992.
198: A. VAN DER SLUIS, Condition numbers and equilibration of matrices, Numer. Math., 14 (1969), pp. 14-23.
199: A. VAN DER SLUIS AND H. VAN DER VORST, The rate of convergence of conjugate gradients, Numer. Math., 48 (1986), pp. 543-560.
200: H. VAN DER VORST, Iterative solution methods for certain sparse linear systems with a non-symmetric matrix arising from PDE-problems, J. Comput. Phys., 44 (1981), pp. 1-19.
201: H. VAN DER VORST, A vectorizable variant of some ICCG methods, SIAM J. Sci. Statist. Comput., 3 (1982), pp. 350-356.
202: H. VAN DER VORST, Large tridiagonal and block tridiagonal linear systems on vector and parallel computers, Parallel Comput., 5 (1987), pp. 45-54.
203: H. VAN DER VORST, (M)ICCG for 2D problems on vector computers, in Supercomputing, A. Lichnewsky and C. Saguez, eds., North-Holland, 1988.
204: H. VAN DER VORST, High performance preconditioning, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 1174-1185.
205: H. VAN DER VORST, ICCG and related methods for 3D problems on vector computers, Computer Physics Communications, 53 (1989), pp. 223-235.
206: H. VAN DER VORST, The convergence behavior of preconditioned CG and CG-S in the presence of rounding errors, in Preconditioned Conjugate Gradient Methods, O. Axelsson and L. Y. Kolotilina, eds., vol. 1457 of Lecture Notes in Mathematics, Berlin, New York, 1990, Springer-Verlag.
207: H. VAN DER VORST, Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 631-644.
208: H. VAN DER VORST AND J. MELISSEN, A Petrov-Galerkin type method for solving Ax = b, where A is symmetric complex, IEEE Trans. Magnetics, 26 (1990), pp. 706-708.
209: H. VAN DER VORST AND C. VUIK, GMRESR: A family of nested GMRES methods, Numer. Lin. Alg. Applic., 1 (1994), pp. 369-386.
210: J. VAN ROSENDALE, Minimizing inner product data dependencies in conjugate gradient iteration, Tech. Rep. 172178, ICASE, NASA Langley Research Center, 1983.
211: R. VARGA, Matrix Iterative Analysis, Prentice-Hall Inc., Englewood Cliffs, NJ, 1962.
212: P. VASSILEVSKI, Preconditioning nonsymmetric and indefinite finite element matrices, J. Numer. Alg. Appl., 1 (1992), pp. 59-76.
213: V. VOEVODIN, The problem of non-self-adjoint generalization of the conjugate gradient method is closed, U.S.S.R. Comput. Maths. and Math. Phys., 23 (1983), pp. 143-144.
214: H. F. WALKER, Implementation of the GMRES method using Householder transformations, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 152-163.
215: P. WESSELING, An Introduction to Multigrid Methods, Wiley, Chichester, 1991.
216: O. WIDLUND, A Lanczos method for a class of non-symmetric systems of linear equations, SIAM J. Numer. Anal., 15 (1978), pp. 801-812.
217: D. YOUNG, Iterative solution of large linear systems, Academic Press, New York, 1971.
218: H. YSERENTANT, On the multilevel splitting of finite element spaces, Numer. Math., 49 (1986), pp. 379-412.

Jack Dongarra
Mon Nov 20 08:52:54 EST 1995

Index

ad hoc SOR method

see method, ad hoc SOR

asynchronous method

see method, asynchronous

Bi-CGSTAB method

see method, Bi-CGSTAB

Bi-Conjugate Gradient Stabilized method

see method, Bi-CGSTAB

bi-orthogonality

in BiCG: BiConjugate Gradient (BiCG)
in QMR: Quasi-Minimal Residual (QMR)

BiCG method

see method, BiCG

BiConjugate Gradient method

see method, BiCG

BLAS

Why Use Templates?

block methods

breakdown

avoiding by look-ahead: Convergence
in Bi-CGSTAB: Convergence
in BiCG: Convergence, Quasi-Minimal Residual (QMR)
in CG for indefinite systems: MINRES and SYMMLQ

CG method

see method, CG

CGNE method

see method, CGNE

CGNR method

see method, CGNR

CGS method

see method, CGS

chaotic method

see method, asynchronous

Chebyshev iteration

see method, Chebyshev iteration

codes

C++: Why Use Templates?
FORTRAN: Why Use Templates?
MATLAB: Why Use Templates?

complex systems

Conjugate Gradient method

see method, CG

Conjugate Gradient Squared method

see method, CGS

convergence

irregular: Glossary
linear: Glossary
of Bi-CGSTAB
of BiCG
of CG
of CGNR and CGNE: Theory
of CGS
of Chebyshev iteration
of Gauss-Seidel: The Gauss-Seidel Method
of Jacobi
of MINRES: MINRES and SYMMLQ
of QMR
of SSOR: The Symmetric Successive
smooth: Glossary
stalled: Glossary
superlinear: Glossary

data structures

diffusion

artificial: Modified incomplete factorizations

domain decomposition

multiplicative Schwarz
non-overlapping subdomains
overlapping subdomains
Schur complement: Domain Decomposition Methods
Schwarz: Domain Decomposition Methods

fill-in strategies

see preconditioners, point incomplete factorizations

FORTRAN codes

see codes, FORTRAN

Gauss-Seidel method

see method, Gauss-Seidel

Generalized Minimal Residual method

see method, GMRES

GMRES method

see method, GMRES

ill-conditioned systems

using GMRES on: Implementation

implementation

of Bi-CGSTAB
of BiCG
of CG
of CGS
of Chebyshev iteration
of GMRES
of QMR

IMSL

inner products

as bottlenecks: Implementation, Chebyshev Iteration, Comparison with other
avoiding with Chebyshev: Chebyshev Iteration, Comparison with other, Implementation

irregular convergence

see convergence, irregular

ITPACK

Choosing the Value

Jacobi method

see method, Jacobi

Krylov subspace

Theory

Lanczos

and CG: Theory

LAPACK

linear convergence

see convergence, linear

LINPACK

MATLAB codes

see codes, MATLAB

method

ad hoc SOR: Notes and References
adaptive Chebyshev: Chebyshev Iteration, Comparison with other
adaptive Chebyshev: Chebyshev Iteration, Comparison with other
asynchronous: Notes and References
Bi-CGSTAB: What Methods Are , Overview of the , (ii, )
Bi-CGSTAB: What Methods Are , Overview of the , (ii, )
Bi-CGSTAB: What Methods Are , Overview of the , (ii, )
Bi-CGSTAB: What Methods Are , Overview of the , (ii, )
Bi-CGSTAB2: Convergence
BiCG: What Methods Are , Overview of the , (ii, )
BiCG: What Methods Are , Overview of the , (ii, )
BiCG: What Methods Are , Overview of the , (ii, )
BiCG: What Methods Are , Overview of the , (ii, )
CG: What Methods Are , Overview of the , (ii, )
CG: What Methods Are , Overview of the , (ii, )
CG: What Methods Are , Overview of the , (ii, )
CG: What Methods Are , Overview of the , (ii, )
CG: What Methods Are , Overview of the , (ii, )
CGNE: What Methods Are , Overview of the , (ii, )
CGNE: What Methods Are , Overview of the , (ii, )
CGNE: What Methods Are , Overview of the , (ii, )
CGNE: What Methods Are , Overview of the , (ii, )
CGNR: What Methods Are , Overview of the , (ii, )
CGNR: What Methods Are , Overview of the , (ii, )
CGNR: What Methods Are , Overview of the , (ii, )
CGNR: What Methods Are , Overview of the , (ii, )
CGS: What Methods Are , Overview of the , (ii, )
CGS: What Methods Are , Overview of the , (ii, )
CGS: What Methods Are , Overview of the , (ii, )
CGS: What Methods Are , Overview of the , (ii, )
chaotic: Notes and References, seemethod, asynchronous
chaotic: Notes and References, seemethod, asynchronous
Chebyshev iteration: What Methods Are , Iterative Methods, Overview of the , (ii, )
Chebyshev iteration: What Methods Are , Iterative Methods, Overview of the , (ii, )
Chebyshev iteration: What Methods Are , Iterative Methods, Overview of the , (ii, )
Chebyshev iteration: What Methods Are , Iterative Methods, Overview of the , (ii, )
Chebyshev iteration: What Methods Are , Iterative Methods, Overview of the , (ii, )
Chebyshev iteration: What Methods Are , Iterative Methods, Overview of the , (ii, )
Chebyshev iteration: What Methods Are , Iterative Methods, Overview of the , (ii, )
Chebyshev iteration: What Methods Are , Iterative Methods, Overview of the , (ii, )
domain decomposition: (ii, )
domain decomposition: (ii, )
Gauss-Seidel: What Methods Are , Overview of the , Stationary Iterative Methods, (ii, )
GMRES: What Methods Are , Overview of the , (ii, )
Jacobi: What Methods Are , Overview of the , Stationary Iterative Methods, (ii, )
MINRES: What Methods Are , Overview of the , (ii, )
of simultaneous displacements: see method, Jacobi
of successive displacements: see method, Gauss-Seidel
QMR: What Methods Are , Overview of the , (ii, )
relaxation: Notes and References, Notes and References
SOR: What Methods Are , Overview of the , Stationary Iterative Methods, (ii, )
SSOR: What Methods Are , Overview of the , Stationary Iterative Methods, (ii, )
SYMMLQ: What Methods Are , Overview of the , (ii, )

minimization property

in Bi-CGSTAB: Convergence
in CG: Theory, MINRES and SYMMLQ
in MINRES: MINRES and SYMMLQ

MINRES method

see method, MINRES

multigrid

(, )

NAG

Overview of the , Overview of the

nonstationary methods

(, )

normal equations

overrelaxation

Choosing the Value

parallelism

(, )

in BiCG: Implementation
in CG: Implementation
in Chebyshev iteration: Implementation
in GMRES: Implementation, Implementation
in QMR: Implementation
inner products: (, )
matrix-vector products: (, )
vector updates: Vector updates

preconditioners

(, )

ADI: (, )
block factorizations: (, )
block tridiagonal: (, )
central differences: (, )
cost: (, )
fast solvers: (, )
incomplete factorization: (, )
left: Left and right
point incomplete factorizations: (, )
point Jacobi: (, )
polynomial: (, )
reduced system: (, )
right: Left and right
SSOR: (, )
symmetric part: (, )

QMR method

see method, QMR

Quasi-Minimal Residual method

see method, QMR

relaxation method

see method, relaxation

residuals

in BiCG: BiConjugate Gradient (BiCG)
in CG: Conjugate Gradient Method

restarting

in BiCG: Convergence
in GMRES: Generalized Minimal Residual , Theory, Implementation

row projection methods

(, )

search directions

in BiCG: BiConjugate Gradient (BiCG)
in CG: Conjugate Gradient Method , Conjugate Gradient Method , Theory

smooth convergence

see convergence, smooth

software

obtaining: (ii, )

SOR method

see method, SOR

sparse matrix storage

(, )

BCRS: (, )
CCS: (, )
CDS: (, )
CRS: (, )
JDS: (, )
SKS: (, )

SSOR method

see method, SSOR

stalled convergence

see convergence, stalled

Stationary methods

(, )

stopping criteria

(, )

Successive Overrelaxation method

see method, SOR

superlinear convergence

see convergence, superlinear

Symmetric LQ method

see method, SYMMLQ

Symmetric Successive Overrelaxation method

see method, SSOR

SYMMLQ method

see method, SYMMLQ

template