.EQ delim @@ .EN .TL A Preconditioned Conjugate Gradient Method for Solving a Class of Non-Symmetric Linear Systems .AU J.J. Dongarra, G.K. Leaf, and M. Minkoff .AI Argonne National Laboratory .QS Summary - This report describes a conjugate gradient preconditioning scheme for solving a certain system of equations which arises in the solution of a three dimensional partial differential equation. The problem involves solving systems of equations where the matrices are large, sparse, and non-symmetric. .QE .sp 2 .FS * Work supported in part by the Applied Mathematical Sciences Research Program (KC-04-02) of the Office of Energy Research of the U.S. Department of Energy under Contract W-31-109-Eng-38. .FE .NH STATEMENT OF THE PROBLEM .PP We are concerned with the solution of large-scale linear systems where the coefficient matrix arises from a finite difference approximation (seven point) to an elliptic partial differential equation in three dimensions. Such systems can arise in the modeling of three dimensional, transient, two-phase flow systems [1,2]. The elliptic equation governs the pressure (or pressure difference) and has to be solved numerically at each time step in the course of the transient analysis. Because the physical problem involves convection, the elliptic equation contains first order partial derivatives, which implies that the usual seven point finite difference approximation leads to a non-symmetric coefficient matrix. Since these matrices arise from a finite difference approximation in three dimensions, the matrices will tend to be large. For example, a finite difference mesh of @15 times 15 times 30@ mesh cells leads to a linear system of order 6750 with about 45,000 non-zero coefficients in the matrix for a density of about .001. Thus the class of matrices will be large, sparse, and non-symmetric.
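The counts quoted above can be reproduced with a short sketch (the function name is ours); it assumes the standard seven point coupling, with couplings across the domain boundary absent:

```python
# Sketch: order and sparsity of the seven point finite difference
# matrix arising from an Nx x Ny x Nz mesh (assumed helper, not from
# the report).  Each cell couples to itself and up to six neighbors;
# two couplings per pair of opposing boundary faces are missing.
def system_stats(nx, ny, nz):
    n = nx * ny * nz                                   # order of the system
    nnz = 7 * n - 2 * (nx * ny + ny * nz + nx * nz)    # non-zero coefficients
    density = nnz / float(n * n)
    return n, nnz, density

n, nnz, density = system_stats(15, 15, 30)
print(n, nnz, round(density, 4))   # 6750 45000 0.001
```

For the @15 times 15 times 30@ mesh this reproduces the order 6750, the roughly 45,000 non-zeros, and the density of about .001 cited in the text.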
In this report we intend to review several possible approaches for solving systems of this type, describe an implementation based on the use of incomplete LU factorization coupled with a conjugate gradient method, and finally present some numerical results for this algorithm. .NH MATRIX GENERATION .PP In general, we are interested in solving linear systems @A x ~=~ b @ where @A@ is large, sparse, and non-symmetric. A simple finite difference approximation to a Poisson equation with convection (or transport) terms can be used as the model problem for generating such matrices. The matrices used in this report will be generated in the following manner: .PP Consider the rectangular domain .EQ D ~=~ [ 0 , ~ L sub x ] ~ times ~ [ 0 , L sub y ] ~times~ [ 0 , ~ L sub z ] .EN in three dimensions and the elliptic equation .EQ I (2.1) -{ partial sup 2 phi } over { partial x sup 2 } ~-~ { partial sup 2 phi } over { partial y sup 2 } ~-~ { partial sup 2 phi } over { partial z sup 2 } ~+~ OMEGA ( x,y,z) phi ~+~ V vec ( x,y,z ) cdot grad phi ~=~ F (x,y,z) .EN subject to the boundary conditions @{ partial phi } over { partial n } ~=~ 0 @ on the four vertical faces, ( @ partial over { partial n } @ denotes the normal derivative), and either .EQ I (2.2) phi ~=~ G sup L (x,y,0) ~or~ { partial phi } over {partial n } ~=~ 0 .EN on the bottom face, and either .EQ I (2.3) phi ~=~ G sup U ( x,y , L sub z ) ~or~ {partial phi } over { partial n } ~=~ 0 .EN on the top face. If a Neumann condition is imposed on all faces, then we shall specify the value of @ phi @ at some point in order to ensure uniqueness. We shall assume that @ OMEGA , V vec , F @ are given bounded functions with @ OMEGA ~>=~ 0 @ everywhere in @D@. .PP For any three integers @ N sub x , N sub y, N sub z @ we generate a simple mesh-centered difference approximation of the seven point type, using centered differences on the convective terms.
Let @ DELTA x ~=~ L sub x / N sub x @, @~ DELTA y ~=~ L sub y / N sub y, @ @DELTA z ~=~ L sub z / N sub z,@ @x sub i ~=~ i DELTA x,@ @0 <= i <= N sub x,@ @ y sub j ~=~ j DELTA y , ~ 0 <= j <= N sub y , z sub k ~=~ k DELTA z , @ @0 <= k <= N sub z @. We use an overbar to denote the midpoint of a mesh interval, for example @ x bar sub i ~=~ half ( x sub i-1 ~+~ x sub i ) @. We evaluate @ OMEGA , F , G sup L , ~and~ G sup U @ at the mesh centers; thus for example @ OMEGA sub ijk ~=~ OMEGA ( x bar sub i , y bar sub j , z bar sub k ) @. The convection velocity @ V vec ~=~ ( V sup x , V sup y , V sup z ) @ is evaluated at the midpoints of the mesh cell faces. Thus for example, .EQ V sub i sup x ~=~ V sup x ( x sub i , y bar sub j , z bar sub k ) , ~ V sub j sup y ~=~ V sup y ( x bar sub i , y sub j , z bar sub k ) , ~and~ V sub k sup z ~=~ V sup z ( x bar sub i , y bar sub j , z sub k ) . .EN Let any quantity associated with the @ ( i , j , k ) sup th @ cell be denoted with a zero subscript, the @(i +- 1 , j , k ) sup th @ cell with a @ left { pile { 1 above 2 } right } @, the @( i,j +- 1 , k ) sup th @ cell with a @ left { pile { 3 above 4 } right } @, and the @( i , j , k +- 1 ) sup th @ cell with a @left { pile { 5 above 6 } right } @ subscript.
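As a sketch of the mesh notation just introduced (the function name is ours), the points @x sub i ~=~ i DELTA x@ and midpoints @x bar sub i@ for one coordinate can be generated as:

```python
# Sketch of the one-dimensional mesh quantities used above:
# grid points x_i = i*dx for 0 <= i <= N, and cell midpoints
# xbar_i = (x_{i-1} + x_i)/2 for 1 <= i <= N.
def mesh(L, N):
    dx = L / N
    x = [i * dx for i in range(N + 1)]                       # x_0 .. x_N
    xbar = [0.5 * (x[i - 1] + x[i]) for i in range(1, N + 1)]  # midpoints
    return dx, x, xbar

dx, x, xbar = mesh(1.0, 4)
print(xbar)   # [0.125, 0.375, 0.625, 0.875]
```

The same construction applies independently in @y@ and @z@; the coefficient functions are then sampled at the resulting cell centers and face midpoints.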
With this notation a seven point approximation for equation (2.1) has the form .EQ I (2.4) a sub 0 phi sub 0 ~+~ sum from l=1 to 6 a sub l phi sub l ~=~ f sub 0 .EN where at an interior cell .EQ I a)~~~a sub 0 ~=~ 2 ( 1 / DELTA x sup 2 ~+~ 1 / DELTA y sup 2 ~+~ 1 / DELTA z sup 2 ) ~+~ OMEGA sub 0 .EN .EQ I (2.5) b)~~~a sub 1 ~=~ - 1 / DELTA x sup 2 ~-~ V sub i-1 sup x / 2 DELTA x , ~~ a sub 2 ~=~ - 1 / DELTA x sup 2 ~+~ V sub i sup x / 2 DELTA x .EN .EQ I c)~~~ a sub 3 ~=~ - 1 / DELTA y sup 2 ~-~ V sub j-1 sup y / 2 DELTA y , ~~ a sub 4 ~=~ - 1/ DELTA y sup 2 ~+~ V sub j sup y / 2 DELTA y .EN .EQ I d) ~~~ a sub 5 ~=~ - 1 / DELTA z sup 2 ~-~ V sub k-1 sup z / 2 DELTA z , ~~ a sub 6 ~=~ - 1/ DELTA z sup 2 ~+~ V sub k sup z / 2 DELTA z .EN At the boundary cells these coefficients are modified according to the boundary conditions. For example, if the @ ( i , j , k ) sup th @ cell is on the bottom interior face @( k = 1,~2 <= i <= N sub x -1 , ~ 2 <= j <= N sub y -1 ) @, then we use .EQ I (2.6) a sub 0 ~=~ 2 ( 1/ DELTA x sup 2 ~+~ 1 / DELTA y sup 2 ) ~+~ 3 / DELTA z sup 2 ~+~ OMEGA sub 0 ~+~ V bar sub 0 sup z / 2 DELTA z , ~~ a sub 5 ~=~ 0 .EN for Dirichlet conditions and .EQ I (2.7) a sub 0 ~=~ 2 ( 1/ DELTA x sup 2 ~+~ 1 / DELTA y sup 2 ) ~+~ 1 / DELTA z sup 2 ~+~ OMEGA sub 0 ~-~ V bar sub 0 sup z / 2 DELTA z , ~~ a sub 5 ~=~ 0 .EN for Neumann conditions where @ V bar sub 0 sup z ~=~ V sup z ( x bar sub i , y bar sub j , 0 ) @. Similar modifications occur on the other faces, edges, and corners depending on the imposed boundary conditions. We do not illustrate the modifications to the right hand side since we are primarily interested in generating matrices. We order the unknowns @ phi sub ijk @ using a @( kij )@ ordering with the linear index .EQ I (2.8) m ~=~ k ~+~(i - 1 ) N sub z ~+~ (j-1) N sub z N sub x , ~~ 1 <= k <= N sub z, ~ 1 <= i <= N sub x, ~ 1 <= j <= N sub y .
.EN .PP In this manner we generate a class of large sparse matrices @ A @ which are typical of the class of matrices we are concerned with in this report. We observe that the convective terms ensure that in general these matrices will be non-symmetric. This particular class of matrices is rather simple in that the second order terms in the elliptic operator come from the Laplacian. However, in the applications which motivated this study, the elliptic operator had variable coefficients in the second order terms. When considering methods, it is this larger and more general class of matrices which we have in mind. To illustrate the matrix structure we are dealing with we note that a @ 3 times 3 times 3 @ mesh would lead to a @ 27 times 27 @ matrix @ A @ with the sparsity pattern shown in Figure 1. .sp 30 .EQ FIGURE~1. .EN .EQ SPARSITY~PATTERN~FOR~3~ times ~3~ times ~3~~ CASE. .EN .NH BACKGROUND .PP As a background to some of the work going on in this area we will briefly sketch a few of the alternative approaches for solving this problem. .PP In a paper by Concus, Golub, and O'Leary [8] a generalized conjugate gradient method is presented. The original matrix @A@ is split into two matrices of the form @ M ~=~ half ( A ~+~ A sup T ) @ and @ N ~=~ half ( A sup T ~-~ A ) @, such that @ M ~=~ M sup T @, @ N ~=~ - N sup T @ and @Mx ~=~ Nx + f @. The form of the matrix @M@, it is hoped, is such that the system @ M eta ~=~ b @ is easy to solve. If this is not the case then one must resort to a scheme in which an inner and an outer iteration is used. The inner iteration is used to solve @ M eta ~=~ b @ and the outer iteration to find a solution to the original problem. .PP In a paper by Manteuffel [9], a method is described based on Tchebychev polynomials in the complex plane.
For the solution of the linear system @ A x ~=~ f @ the following iterative method can be used: .EQ I x sub i+1 ~=~ - alpha sub i A x sub i ~+~ ( 1 ~+~ beta sub i ) x sub i ~-~ beta sub i x sub i-1 ~+~ alpha sub i f .EN Manteuffel shows that this method converges to the solution of @ A x ~=~ f @ if the spectrum of @A@ is enclosed in an ellipse, with foci @d-c@, @d+c@, in the right half plane and if @ alpha sub i@ and @ beta sub i @ are chosen to be .EQ I alpha sub i ~=~ { 2 T sub i (d/c) } over {c T sub i+1 (d/c) } .EN .EQ I beta sub i ~=~ { T sub i-1 (d/c ) } over { T sub i+1 ( d/c ) } .EN where @ T sub i @ is the @i sup th @ Tchebychev polynomial .EQ I T sub i (z) ~=~ cosh ( i cosh sup -1 (z) ) .EN for @z@ complex. Manteuffel [10] has provided an algorithm for estimating the parameters @d@ and @c@ adaptively. .PP In order to improve the speed of convergence of the algorithm one may use a preconditioning matrix and solve the resulting system. Results involving a preconditioning of Manteuffel's algorithm are given by van der Vorst and van Kats [11]. They considered the use of a fast Poisson solver, an incomplete Crout factorization (with a parameter included to avoid partial pivoting), and an incomplete Cholesky factorization as preconditioners. Also, fill-in on stripes adjacent to the bands in the original matrix is allowed in the Cholesky approach. Results for a discretization of a two-dimensional convection-diffusion equation are provided. Van der Vorst and van Kats conclude that Manteuffel's algorithm with incomplete Crout preconditioning provides a competitive approach. .PP In two papers [12,13] Axelsson has studied iterative methods for the numerical solution of boundary value problems. In particular he extends the use of conjugate gradient methods to nonlinear problems and to constrained boundary value problems. In [13] he studies various incomplete factorizations, i.e. LU factorizations which approximate @A@.
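To make the idea of an incomplete factorization concrete, the following sketch performs a no-fill incomplete LU (commonly called ILU(0)). It is our own minimal illustration, not the algorithm of [13], and it uses dense storage for clarity; production codes work on sparse formats:

```python
# Minimal no-fill incomplete LU sketch (illustrative, names ours):
# Gaussian elimination restricted to the nonzero pattern of A; any
# update that would create a new nonzero (fill-in) is simply dropped.
def ilu0(A):
    n = len(A)
    pattern = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0}
    LU = [row[:] for row in A]   # L (unit diagonal) and U stored together
    for k in range(n - 1):
        for i in range(k + 1, n):
            if (i, k) not in pattern:
                continue
            LU[i][k] /= LU[k][k]              # multiplier l_ik
            for j in range(k + 1, n):
                if (i, j) in pattern:         # update only inside the pattern;
                    LU[i][j] -= LU[i][k] * LU[k][j]   # fill-in is neglected
    return LU
```

For a matrix whose pattern admits no fill (e.g. a tridiagonal matrix) this coincides with the exact LU factorization; in general the product of the factors only approximates @A@, which is precisely what makes it cheap enough to use as a preconditioner.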
If, during factorization, we allow no fill-in or accept fill-in only in certain entries, we might expect to obtain a good incomplete factorization. However it is possible that the factorization process may be unstable since the dropping of fill-in may cause pivot elements to become negative (assuming the matrix was originally positive definite). In [14] Meijerink and van der Vorst show that for M-matrices the process is in fact stable. This is, however, not the case in general. To overcome this problem Gustafsson [15] presents a modification in which neglected fill-in elements are added to the diagonal. He proves this method to be stable for diagonally dominant matrices (although this method is again unstable for general matrices). Munksgaard and Axelsson [16] extend this approach by adding neglected elements to the diagonal or other elements in the row. A major property of this technique is that row sums are preserved. This is important since for second order Laplace problems using constant mesh spacing @h@ [15] the condition number of A is @O( h sup -2 )@ whereas the condition number of the preconditioned matrix is @O( h sup -1 )@. This result holds if the row sums of A and the preconditioning matrix differ by no more than @O(h)@. Thus by preserving row sums we reduce the condition number of the preconditioned matrix which, in turn, affects the number of conjugate gradient iterations. In addition to the above preselection of fill-in elements, Munksgaard and Axelsson [16] have also considered the use of dynamic fill-in. In this approach fill-in elements are selected during factorization based on their magnitude. .PP Another approach to dealing with the instability of incomplete decomposition is presented by Kershaw [5]. Kershaw identifies the unstable pivots as the decomposition proceeds, and these pivots are perturbed to create a stable approximate factorization.
He characterizes these pivots in terms of the number of significant digits available on the machine and determines a correction to provide a "best possible" inverse. Also a bound on the resulting error matrix is provided. His approach is applicable to both complete and incomplete factorizations. Finally, Kershaw shows that for a complete factorization, if @p@ pivots have been perturbed by his procedure then with exact arithmetic, only @2p ~+~ 1@ conjugate gradient iterations are needed. .PP When considering splittings of the matrix @A ~=~ M ~-~ N@ to be used with conjugate gradient algorithms we would like to have a criterion for selecting a "best" splitting. In [23] Greenbaum considers splittings when @A@ is symmetric positive definite. A sharp upper bound on the error is given and a comparison test is presented for determining when one splitting always yields a smaller error than another splitting. .PP In [17] Greenbaum considers the effect of finite precision arithmetic on the conjugate gradient algorithm. She shows that the effect of finite precision versus exact arithmetic is to cause the finite precision conjugate gradient algorithm to periodically resemble the exact arithmetic algorithm associated with an earlier step and a different starting point. .sp .NH BACKGROUND TO CONJUGATE GRADIENT AND PRECONDITIONING .PP The method of conjugate gradients has been known for some time [7]. In theory the algorithm has finite termination after @n@ steps when the matrix, say @A@, of order @n@ is symmetric positive definite. Much of the early interest in the method waned since the finite termination of the algorithm no longer holds in the presence of roundoff errors, and as a direct method the algorithm is not competitive with Gaussian elimination with respect to either accuracy or number of operations. The algorithm has a simple form and can be easily translated into a computer program.
A conjugate gradient algorithm can be stated as: .sp 2 Given a system @ A x ~=~ b@ of @n@ linear equations where the matrix @A@ is symmetric positive definite and an initial vector @x sub 0 @, form the corresponding residual .EQ r sub 0 ~=~ b ~-~ A x sub 0. .EN Set @p sub 0 ~=~ r sub 0 @ and for @ i ~=~ 0,1,2, ... @ find @x sub i+1 , r sub i+1 , p sub i+1 , a sub i , ~and~ b sub i @ using the equations .EQ a sub i ~=~ { r sub i sup T r sub i } over { p sub i sup T A p sub i } .EN .EQ x sub i+1 ~=~ x sub i ~+~ a sub i p sub i .EN .EQ (4.1) r sub i+1 ~=~ r sub i ~-~ a sub i A p sub i .EN .EQ b sub i ~=~ { r sub i+1 sup T r sub i+1 } over { r sub i sup T r sub i } .EN .EQ p sub i+1 ~=~ r sub i+1 ~+~ b sub i p sub i . .EN Termination is controlled by monitoring @ r sub i ~or~ p sub i @. .PP Some of the basic properties of the C-G method are [7]: .EQ I (i)~~~~~ mark r sub i sup T p sub j ~=~ 0 ~~~~~i > j , .EN .EQ I (ii)~~~~ lineup p sub i sup T A p sub j ~=~ 0 ~~~~~i != j , .EN .EQ I (iii)~~~ lineup when~ p sub 0 ~=~ r sub 0 ~~ then ~ r sub i sup T r sub j ~=~ 0 ~~~~~i != j , .EN .EQ I (iv)~~~ lineup when ~ p sub 0 ~=~ r sub 0 , ~the~ sets~ "{" p sub j "}" sub j=0 sup i-1 ~ and~ "{" r sub j "}" sub j=0 sup i-1 .EN .I .in +2 each form a basis for @ S sub i ~=~ Span "{" r sub 0 , A r sub 0 , ... , A sup i-1 r sub 0 "}" @ and the residual vector @ r sub i @ minimizes the quadratic form @Q ( r ) ~=~ r sup T A sup -1 r @ over all vectors of the form @r ~=~ r sub 0 ~+~ A s , ~ s \(mo S sub i @. Since @ S sub i+1 \(sp S sub i @ we see that @ Q(r sub i ) @ will monotonically decrease as @ i @ increases. .in .R .PP When dealing with very large and sparse matrices C-G has a very attractive property from an algorithmic standpoint, namely only inner products need be performed. The inner product calculations will involve a row of the matrix and a full vector. When the matrix is sparse the amount of work is diminished substantially.
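A direct transcription of (4.1) into code might look like the following sketch (dense storage for clarity; the helper names are ours):

```python
# Sketch of the conjugate gradient iteration (4.1) for a symmetric
# positive definite A stored as a list of rows.
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, x0, tol=1e-12, maxit=100):
    x = x0[:]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    p = r[:]                                           # p_0 = r_0
    rr = dot(r, r)
    for _ in range(maxit):
        if rr ** 0.5 <= tol:          # terminate on the residual norm
            break
        Ap = matvec(A, p)
        a = rr / dot(p, Ap)           # a_i = r_i^T r_i / p_i^T A p_i
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        bi = rr_new / rr              # b_i = r_{i+1}^T r_{i+1} / r_i^T r_i
        p = [ri + bi * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

x = cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
print(x)   # approximately [1/11, 7/11]
```

Only matrix-vector products and inner products appear, which is the property noted above that makes the method attractive for large sparse systems.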
.PP Another attractive feature of the C-G method is that, unlike some of the other iterative methods for finding solutions to linear systems, one does not need an estimate of the extreme eigenvalues or other parameters for convergence. .PP In order to accelerate the convergence of the C-G method a widely used technique is to precondition the matrix @A@ in such a way as to cluster the eigenvalues, or said another way, to produce a matrix which is in some sense close to the identity matrix. .PP We turn now to the use of preconditioning techniques combined with C-G applied to non-symmetric matrices. We first observe from the C-G algorithm (4.1) that if @ A ~=~ I @, then the C-G method would converge in one step. In this case we have .EQ I r sub 0 ~=~ b ~-~ x sub 0 , .EN .EQ I (4.2) p sub 0 ~=~ r sub 0 , .EN .EQ I a sub 0 ~=~ { || r sub 0 || sup 2 } over { || r sub 0 || sup 2 } ~=~ 1, .EN .EQ I x sub 1 ~=~ x sub 0 ~+~ p sub 0 ~=~ x sub 0 ~+~ b ~-~ x sub 0 ~=~ b . .EN .sp Thus, intuitively we will converge rapidly when the coefficient matrix is close, in some sense, to the identity matrix. This is the guiding principle in the use of preconditioning. .PP Consider a linear system .EQ I (4.3) B x ~=~ f , .EN where @B@ is not necessarily symmetric but is well conditioned. Suppose we have an approximate factorization .EQ I (4.4) B approx LU , .EN then the following relations are immediate .EQ I (4.5) B ( L U ) sup -1 ~ approx ~ I , ~ ( L U ) sup -T B sup T ~ approx I , .EN .EQ I (4.6) ( L U ) sup -1 B ~ approx ~ I , ~ B sup T ( L U ) sup -T ~ approx I , .EN .EQ I (4.7) L sup -1 B U sup -1 ~ approx ~ I , ~ U sup -T B sup T L sup -T ~ approx I . .EN These three sets of relations can be used to generate six variations of a preconditioned C-G method for non-symmetric matrices. .PP For the first three variations we start from the system of the form .EQ I (4.8) B sup T B x ~=~ B sup T f . .EN Our goal is to derive a system @A y ~=~ g@ where @A~=~D sup T D @ with @D ~ approx ~ I@.
We observe that equation (4.8) is equivalent to the system .EQ I ( L U ) sup -T B sup T B ( L U ) sup -1 L U x ~=~ ( L U ) sup -T B sup T f . .EN Hence, setting .EQ I (4.9) A ~=~ ( L U ) sup -T B sup T B ( L U ) sup -1 ~=~ [ B ( L U ) sup -1 ] sup T [ B ( L U ) sup -1 ] .EN .EQ I y ~=~ L U x .EN .EQ I g ~=~ ( L U ) sup -T B sup T f .EN we have the system .EQ I A y ~=~ g , .EN with @ A ~ approx ~ I @ and symmetric positive definite. .PP The second variation uses relation (4.6) and starts from equation (4.3) by multiplying by @ (L U ) sup -1 @. Then .EQ I ( L U ) sup -1 B x ~=~ ( L U ) sup -1 f , .EN from which we find .EQ I B sup T ( L U ) sup -T ( L U ) sup -1 B x ~=~ B sup T ( L U ) sup -T ( L U ) sup -1 f. .EN Setting .EQ I A ~=~ B sup T ( L U ) sup -T ( L U ) sup -1 B ~=~ [ ( L U ) sup -1 B ] sup T [ ( L U ) sup -1 B ] .EN .EQ I (4.10) y ~=~ x .EN .EQ I g ~=~ B sup T ( L U ) sup -T ( LU) sup -1 f .EN we have a system @ A y ~=~ g @ of the desired type. .PP The third variation starts with equation (4.3) and uses (4.7) to obtain .EQ I U sup -T B sup T L sup -T L sup -1 B U sup -1 U x ~=~ U sup -T B sup T L sup -T L sup -1 f . .EN Thus, setting .EQ I (4.11) A ~=~ U sup -T B sup T L sup -T L sup -1 B U sup -1 ~=~ [ L sup -1 B U sup -1 ] sup T [ L sup -1 B U sup -1 ] .EN .EQ I y ~=~ U x .EN .EQ I g ~=~ U sup -T B sup T L sup -T L sup -1 f .EN we have the system of the desired type. This variation was suggested in the appendix of [5]. .PP The last three variations are based on the equations, .EQ I (4.12) B B sup T z ~=~ f , ~~~ x ~=~ B sup T z . .EN We are drawn to this equation by an error analysis of Equation (4.12) by Paige [19]. In the analysis, Paige shows that the square of the condition number of @B@ does not enter into the error bounds, only the condition number itself. .PP In this case, our goal is to obtain a system @Ay ~=~ g@ where @ A ~=~ DD sup T @ with @ D ~ approx ~ I@.
Starting with @ B x ~=~ f @, it follows that .EQ I B ( L U ) sup -1 L U x ~=~ f , .EN and setting, .EQ I L U x ~=~ ( L U ) sup -T B sup T y , .EN we have .EQ I B ( L U ) sup -1 ( L U ) sup -T B sup T y ~=~ f . .EN Hence, setting .EQ I A ~=~ B ( L U ) sup -1 ( L U ) sup -T B sup T ~=~ [ B ( L U ) sup -1 ] [ B ( L U ) sup -1 ] sup T .EN .EQ I (4.13) x ~=~ ( L U ) sup -1 ( L U ) sup -T B sup T y .EN .EQ I g ~=~f .EN we have a system @ A y ~=~ g @. .PP Finally, starting with equation (4.12) and using relation (4.6), we find .EQ I ( L U ) sup -1 B B sup T ( L U ) sup -T ( L U ) sup T z ~=~ ( L U ) sup -1 f . .EN Thus we can set .EQ I (4.14) A ~=~ ( L U ) sup -1 B B sup T ( L U ) sup -T ~=~ [ ( L U ) sup -1 B ] [ ( L U ) sup -1 B ] sup T .EN .EQ I y ~=~ ( L U ) sup T z , ~~~~~ x ~=~ B sup T z .EN .EQ I g ~=~ ( L U ) sup -1 f , .EN to obtain a system of the desired type. .PP Starting with @ B x ~=~ f @ and having relation (4.7) in mind, we find .EQ I L sup -1 B U sup -1 U x ~=~ L sup -1 f , .EN then setting @ U x ~=~ U sup -T B sup T L sup -T y @, we find .EQ I L sup -1 B U sup -1 U sup -T B sup T L sup -T y ~=~ L sup -1 f . .EN Setting .EQ I A ~=~ L sup -1 B U sup -1 U sup -T B sup T L sup -T ~=~ [ L sup -1 B U sup -1 ] [ L sup -1 B U sup -1 ] sup T .EN .EQ I (4.15) g ~=~ L sup -1 f , .EN .EQ I x ~=~ U sup -1 U sup -T B sup T L sup -T y .EN we have a system @ A y ~=~ g @ of the desired type. .PP For each of the six variants, the basic C-G algorithm (4.1) can be applied. For the first three variants @ A ~=~ D sup T D @, while for the last three @ A ~=~ D D sup T @, where the matrix @D@ is the preconditioned matrix of the form @B ( LU) sup -1 @, @(LU) sup -1 B @, or @ L sup -1 B U sup -1 @. To summarize, when we apply the C-G algorithm to each of these variants, we obtain the following algorithms. .TS center,doublebox; c|c|c|c.
I II III _ @D ~=~ B ( L U ) sup -1@ @D ~=~ ( L U ) sup -1 B@ @D ~=~ L sup -1 B U sup -1@ @D ~=~ B ( L U ) sup -1@ @y ~=~ LUx@ @y ~=~ x@ @y ~=~ Ux@ @g ~=~ D sup T f @ @g ~=~ D sup T ( LU) sup -1 f@ @g ~=~ D sup T L sup -1 f @ \R- \R- \R- \R- initial @y sub 0 ~=~ LUx sub 0 , r sub 0 ~=~ f - Bx sub 0 @ phase @R sub 0 ~=~ D sup T r sub 0 @ @R sub 0 ~=~ D sup T (LU) sup -1 r sub 0@ @R sub 0 ~=~ D sup T L sup -1 r sub 0@ @p sub 0 ~=~ R sub 0 @ @p sub 0 ~=~ R sub 0 @ @p sub 0 ~=~ R sub 0 @ \R- \R- \R- \R- .T& c|c c c. @a sub i ~=~ || R sub i || sup 2 / || Dp sub i || sup 2 @ @y sub i+1 ~=~ y sub i ~+~ a sub i p sub i @ Iterative @R sub i+1 ~=~ R sub i ~-~ a sub i D sup T D p sub i @ phase @b sub i ~=~ || R sub i+1 || sup 2 / || R sub i || sup 2 @ @ p sub i+1 ~=~ R sub i+1 ~+~ b sub i p sub i @ \R- \R- \R- \R- .T& c|c|c|c. Final phase @x ~=~ (LU) sup -1 y@ @x ~=~ y @ @x ~=~ U sup -1 y @ .TE .I .ce Table 4.1 .R .TS center,doublebox; c|c|c|c. IV V VI _ @D ~=~ B ( L U ) sup -1@ @D ~=~ ( L U ) sup -1 B@ @D ~=~ L sup -1 B U sup -1@ @D ~=~ B ( L U ) sup -1@ @y ~=~ z@ @y ~=~ (L U ) sup T z@ @y ~=~ z@ @g ~=~ f @ @g ~=~ ( LU) sup -1 f@ @g ~=~ L sup -1 f @ @x ~=~ (LU) sup -1 (LU) sup -T B sup T z @ @x ~=~ B sup T z @ @x ~=~ U sup -1 U sup -T B sup T L sup -T z @ \R- \R- \R- \R- initial @y sub 0 ~=~ x sub 0 , g ~=~ f @ @ y sub 0 ~=~ x sub 0 , g ~=~ (LU) sup -1 f @ @y sub 0 ~=~ x sub 0 , ~ g ~=~ L sup -1 f @ phase @R sub 0 ~=~ g ~-~ D D sup T y sub 0 @ @R sub 0 ~=~ g ~-~ D D sup T y sub 0 @ @R sub 0 ~=~ g ~-~ D D sup T y sub 0 @ @p sub 0 ~=~ R sub 0 @ @p sub 0 ~=~ R sub 0 @ @p sub 0 ~=~ R sub 0 @ \R- \R- \R- \R- .T& c|c c c c. @a sub i ~=~ || R sub i || sup 2 / || D sup T p sub i || sup 2 @ @y sub i+1 ~=~ y sub i ~+~ a sub i p sub i @ Iterative @R sub i+1 ~=~ R sub i ~-~ a sub i D D sup T p sub i @ phase @b sub i ~=~ || R sub i+1 || sup 2 / || R sub i || sup 2 @ @ p sub i+1 ~=~ R sub i+1 ~+~ b sub i p sub i @ \R- \R- \R- \R- .T& c|c|c|c.
Final phase @x ~=~ (LU) sup -1 (LU) sup -T B sup T y@ @x ~=~ B sup T (LU) sup -T y @ @x ~=~ U sup -1 U sup -T B sup T L sup -T y @ .TE .I .ce Table 4.2 .R .PP We note that when the algorithms are written in this form, the iterative portion of each algorithm is the same. Also note that in the last three variants, the algorithms operate in the @y@-space and thereby require an initial estimate @ y sub 0 @. If these latter three algorithms were embedded in some larger iterative procedure which was based in the @x@-space, the @y@-vector would have to be stored between calls in order to use the previous time step solution as an initial approximation for the next time step. We also observe that for the last three variants, we have equated a guess @ x sub 0 @ in the @x@-space to a guess @ y sub 0 @ in the @y@-space, ignoring the fact that @x sub 0 ~ != ~ y sub 0 @; in fact they are related by the expressions shown in the final phase of Table 4.2. .PP Considering the first three algorithms in Table 4.1, we observe that in each case five vectors are needed: the given vectors @x@ and @f@ with the three auxiliary vectors @r@, @p@, and @s@. In terms of computational work we see that the first three variants differ only in the final phase, with variant II having the least work. For the last three variants in Table 4.2, we observe that in general we will have to keep six vectors: @x@ and @f@ with the four auxiliary vectors @y@, @r@, @p@, and @s@. The last three variants do differ in the work involved in the initial and final phases. However, multiplying a vector by the matrices @L sup -1 @ or @ U sup -1 @ involves only @ half n sup 2 @ operations; thus all three variants involve the same amount of work. .NH PROVIDING FILL-IN FOR INCOMPLETE FACTORIZATION .PP The incomplete factorization described in section 4 allowed storage for elements of @L@ and @U@ in the same positions as in the original matrix @A@.
No additional elements were allowed to be formed during the decomposition. If this condition is relaxed, and fill-in is allowed to occur in positions other than those occupied by the original matrix, a family of factorizations can be produced [4]. .PP Intuitively it is hoped that the approximate factors @L@ and @U@ will more closely approximate the "true" factors of the original matrix as more and more fill-in is provided. Thus, @D sup T D@ or @DD sup T @ will more closely approximate the identity and fewer iterations will be expected in order to obtain a solution. On the other hand, as the fill-in increases, the computation time per iteration increases, so that the overall computation time may not be decreased even though the number of iterations is reduced. When dealing with fill-in, there are three aspects to be considered: first, the amount of fill-in; second, the location of the fill-in; and third, the nature of the fill-in. We have restricted our fill-in to occur on diagonal stripes adjacent to the original pattern. With this fill-in pattern, we have considered the effect of providing additional fill-in stripes. (In our approach, we have provided additional storage for fill-in to occur on these diagonal stripes.) .NH ERROR ANALYSIS .PP The iterative phase of the above algorithms is centered around the computation of @ f ~=~ D sup T D p @ or @ f ~=~ D D sup T p @ where @D@ is one of the following three quantities: @A(LU) sup -1 @, @ (LU) sup -1 A@, or @L sup -1 A U sup -1 @. In examining the behavior of the algorithm it is important to understand the nature of the errors made in this computation. .PP We will concentrate on the equation @f ~=~ D sup T D p @ where @D ~=~ A ( LU) sup -1 @; the analysis follows in a similar fashion for the other definitions of @D@ and @f@. We let @ B ~=~ LU @, and the equation for @f@ can be written as: .sp .EQ I f ~=~ B sup -T A sup T A B sup -1 p.
.EN .sp We are interested in determining how the computed @f@, say @f bar @, is affected by the condition of @B@. .PP We will perform the following operations to compute @f@: .sp .EQ I B p sub 1 ~=~ p ~;~~ solve~for~p sub 1 .EN .EQ I p sub 2 ~=~ A p sub 1 ~;~~form ~p sub 2 .EN .EQ I p sub 3 ~=~ A sup T p sub 2 ~;~~form~p sub 3 .EN .EQ I B sup T f ~=~ p sub 3 ~;~~ solve ~for~ f. .EN .sp There are four sources of error, one in each of the above steps. Using a Wilkinson [18] rounding error analysis we know that the computed solution @f bar @ will satisfy .sp .EQ I (B ~+~ E sub 1 ) p bar sub 1 ~=~ p .EN .EQ I p bar sub 2 ~=~ ( A ~+~ E sub 2 ) p bar sub 1 .EN .EQ I p bar sub 3 ~=~ ( A sup T ~+~ E sub 3 ) p bar sub 2 .EN .EQ I (B sup T ~+~ E sub 4 ) f bar ~=~ p bar sub 3 , .EN .EQ I where~ || E sub i || ~ <= ~ epsilon sub i || A || ~for ~i~=~ 2~and~3, ~~ || E sub i || ~ <= ~ epsilon sub i || B || ~for~i~=~1~and~4. .EN .sp The error bounds for @ f bar @ can be stated as: .sp .I Theorem: If @ || A || ~ <= ~ sigma || B || @ then @ { || f ~-~ f bar || } over { || p || } ~ <= ~ 6 xi kappa sigma || A B sup -1 || ~+~ 9 xi sup 2 kappa sup 2 sigma sup 2 @, where @ kappa ~=~ || B || ~ || B sup -1 || @, @ sigma ~ <= ~ 1 @, and @ xi @ reflects the machine precision. .sp Proof: .sp .R The computed @f@ can be written as .sp .EQ I f bar ~=~ ( B sup -T ~+~ F sub 1 ) ( A sup T ~+~ E sub 3 )(A ~+~ E sub 4 ) (B sup -1 ~+~ F sub 2 ) p . .EN .sp where @ || E sub i || ~ <= ~ epsilon sub i || A || @ , @ || F sub i || ~ <= ~ epsilon sub i || B sup -1 || @ and @ epsilon sub i ~ <= ~ g(n) epsilon @, where @g(n)@ is a modest function of @n@, the order of the matrix, and @ epsilon @ is the machine precision. Expanding we get .sp .EQ I f bar ~=~ ( B sup -T A sup T ~+~ B sup -T E sub 3 ~+~ F sub 1 A sup T ~+~ F sub 1 E sub 3 )( A B sup -1 ~+~ A F sub 2 ~+~ E sub 4 B sup -1 ~+~ E sub 4 F sub 2 ) p.
.EN .sp If we let .sp .EQ I H ~=~ A B sup -1 .EN .EQ I G sub 1 ~=~ B sup -T E sub 3 ~+~ F sub 1 A sup T ~+~ F sub 1 E sub 3 .EN .EQ I G sub 2 ~=~ A F sub 2 ~+~ E sub 4 B sup -1 ~+~ E sub 4 F sub 2 .EN .sp then .EQ I f bar ~=~ ( H sup T ~+~ G sub 1 )( H ~+~ G sub 2 ) p .EN .sp We can now determine bounds for @ G sub 1 , ~ G sub 2 @. If we assume @ || A || ~ <= ~ sigma || B || @ then .sp .EQ I || G sub 1 || mark ~ <= ~ epsilon sub 3 || A || ~ || B sup -1 || ~+~ epsilon sub 1 || A || ~ || B sup -1 || ~+~ epsilon sub 1 epsilon sub 3 || A || ~ || B sup -1 || .EN .EQ I lineup ~ <= ~ ( 2 xi kappa ~+~ xi sup 2 kappa ) sigma .EN .EQ I lineup ~ <= ~ 3 xi kappa sigma .EN .sp .EQ I || G sub 2 || mark ~ <= ~ epsilon sub 2 || A || ~ || B sup -1 || ~+~ epsilon sub 4 || A || ~ || B sup -1 || ~+~ epsilon sub 2 epsilon sub 4 || A || ~ || B sup -1 || .EN .EQ I lineup ~ <= ~ ( 2 xi kappa ~+~ xi sup 2 kappa ) sigma .EN .EQ I lineup ~ <= ~ 3 xi kappa sigma .EN .sp where @ kappa ~=~ || B || ~ || B sup -1 || @ and @ xi ~=~ max from i ~{ epsilon sub i }@. Then we can write the errors made in computing @ f @ as .sp .EQ I { || f ~-~ f bar || } over { || p || } ~~ <= ~~ 6 xi kappa sigma || H || ~+~ 9 xi sup 2 kappa sup 2 sigma sup 2 . ~~~~~~~~~~~~~~~~~~~~~~\(sq .EN .sp .PP If @ xi kappa ~ < ~ 1 @ then the order of magnitude of the error bound is the same as that for the direct solution. Whenever the square of the condition number occurs in the error bound for the solution, it is effectively multiplied by the square of the precision or something smaller. The norm @ || H || @ reflects how effective the preconditioner is; this quantity is expected to be of order unity. .sp .NH RESULTS .PP The matrices used in this study were generated from the finite difference equations described in section 2. We chose the domain to be the unit cube (@ L sub x ~=~ L sub y ~=~ L sub z ~=~ 1 @) and throughout the study we used the following data.
.EQ I V sup x ( x ,y,z) ~=~ 800 x(1~-~x)y(1~-~y)zR sup x (x,y,z) .EN .EQ I (7.1) V sup y ( x ,y,z) ~=~ 800 x(1~-~x)y(1~-~y)zR sup y (x,y,z) .EN .EQ I V sup z ( x ,y,z) ~=~ 4xyz sup 2 .EN where @ R sup x (x,y,z) ~=~ left { pile {1 above {x ~-~ half }} right } @, @ R sup y ( x,y,z) ~=~ left { pile { 1 above { y ~-~ half }} right } @. When @ R sup x ~=~ R sup y ~ == ~ 1@, the problem type will be referred to as non-rotational; the other case will be referred to as rotational. For our problem the absorption distribution @OMEGA ( x,y,z)@ is zero, and the source distribution @F(x,y,z) ~=~ x sup 2 yz@. For the boundary conditions, when Dirichlet conditions are used, .EQ G sup L (x,y,0) ~ == ~ 1 ~~and~~G sup U (x,y,1) ~ == ~ 2 . .EN We also allow Neumann conditions to be imposed on either the top or the bottom or both. When Neumann conditions are used on both the top and bottom, the resulting singular matrix is modified by zeroing out the first row and column and placing a constant on the diagonal. This corresponds to fixing the value of the solution in the first mesh cell. .PP Within the class of matrices generated in this manner, we have the following parameters at our disposal: .sp .in 1 .I a) Size of the mesh, i.e. the order of the matrix. .sp b) Combination of boundary conditions used. For example, Dirichlet top and bottom, Dirichlet top and Neumann bottom, Neumann top and bottom, etc. .sp c) Use of a rotational or non-rotational velocity field. .R .in .PP In this numerical study we have focused our attention on the following three areas. .br 1. Differences in the behavior of the six variants discussed in section 4. .br 2. Effects of fill-in along stripes on the computational times for achieving a solution. .br 3. Comparison of these six variants with an algorithm designed and implemented by Manteuffel [9,10] based on the use of Tchebychev iteration for non-symmetric linear systems.
.br .PP To illustrate the differences in the six variants we used a problem based on a @ 7 times 7 times 7 @ mesh with no rotation in the velocity field. We used two types of boundary conditions. The first case used Dirichlet conditions on the top and bottom, while the second case used Neumann conditions on the top and bottom with the solution value fixed in the first mesh cell. In both cases we compared the six variants using no fill-in and using one inner and one outer stripe for fill-in. The results are shown in Tables 7.1 and 7.2. (The computations were run on a VAX 11/780 under UNIX and the timings are reported in seconds. For all the tables @D-D@ refers to Dirichlet conditions imposed on the top and bottom, and @N-N@ refers to Neumann conditions imposed on the top and bottom.) .sp .TS center,box; c|c s||c s c|c s||c s c|c|c||c|c n|n|n||n|n. no fill-in one inner and outer storage - 2287 fill-in storage 3551 _ variant iterations time iterations time _ 1 40 31 32 30 2 36 27 28 27 3 46 33 36 33 4 39 29 31 30 5 35 27 27 27 6 45 32 35 33 .TE .EQ Table 7.1 .EN .EQ D-D .EN .TS center,box; c|c s||c s c|c s||c s c|c|c||c|c n|n|n||n|n. no fill-in one inner and outer storage - 2287 fill-in storage 3551 _ variant iterations time iterations time _ 1 62 43 50 46 2 50 36 42 38 3 70 48 57 50 4 60 41 49 44 5 48 35 41 38 6 66 46 53 48 .TE .EQ Table 7.2 .EN .EQ N-N .EN .sp In all cases we see that the @2 sup nd @ and the @ 5 sup th @ variants perform the best while the @ 3 sup rd@ and the @ 6 sup th @ variants perform the worst. Moreover, the variations in performance are significant, with a reduction factor of at least 0.75 in all cases. .PP Using the same two test problems we studied the effects of fill-in on the number of iterations and ultimately on the iteration time.
The results for the @ 2 sup nd @ variant are shown in Tables 7.3 and 7.4, which give the number of iterations needed to achieve convergence for a given pair @ (m,n)@, where @m@ is the number of inner fill-in stripes and @n@ is the number of outer fill-in stripes. The results are surprising and discouraging. They show that the number of iterations needed for convergence is extremely insensitive to the number of fill-in stripes. Clearly, as far as iteration count is concerned, there is no point in considering fill-in beyond the (1,1) pattern. From Tables 7.5 and 7.6 it is clear that for these cases, there is no point in using any fill-in. .sp .KS .TS center,box; c|c s s s s s c|n n n n n n n|n n n n n n. number inner stripes outer 0 1 2 3 4 5 _ 0 36 34 34 34 34 33 1 34 28 29 29 29 29 2 34 29 28 29 29 29 3 34 29 29 29 29 29 4 34 29 29 29 29 29 5 34 29 29 29 29 29 .TE .EQ Table 7.3 .EN .EQ Number~of~iterations~for~D-D~,~2 sup nd ~variant .EN .KE .KS .TS center,box; c|c s s s s s c|n n n n n n n|n n n n n n. number inner stripes outer 0 1 2 3 4 5 _ 0 50 47 47 47 47 47 1 47 42 42 42 42 43 2 47 42 42 42 43 43 3 47 42 42 43 43 43 4 47 42 42 43 43 43 5 47 42 42 42 42 43 .TE .EQ Table 7.4 .EN .EQ Number~of~iterations~for~N-N~,~2 sup nd ~variant .EN .KE .sp One observation made in the course of these experiments is that although the number of iterations decreases as the amount of fill-in allowed in the matrix increases, the overall time to solve the problem never decreases. This can be understood by looking at the iterative portion of the algorithm, which is based on matrix-vector products. As the matrix is allowed to have more non-zero elements, the time to do a matrix-vector product increases. This can be seen in Tables 7.5 and 7.6. .sp .KS .TS center,box; c|c s c|n n n|n n. number inner outer 0 1 _ 0 27 29 1 29 27 .TE .EQ Table 7.5 .EN .EQ Iteration~Time~for ~D-D, ~2 sup nd ~ variant .EN .KE .KS .TS center,box; c|c s c|n n n|n n.
number inner outer 0 1 _ 0 36 38 1 37 38 .TE .EQ Table 7.6 .EN .EQ Iteration~Time~for ~N-N, ~2 sup nd ~ variant .EN .KE .sp .PP We have compared the method as described above with a non-symmetric linear equation solver based on the use of Tchebychev polynomials in the complex plane. The Tchebychev iteration is based on work of Manteuffel [9,10], and adaptively estimates the optimal iteration parameters. This method will converge whenever the spectrum of @A@ can be enclosed in an ellipse that does not contain the origin. It can be shown that the Tchebychev iteration is optimal among polynomial-based methods. The software to do the Tchebychev iteration was provided by Manteuffel. .PP In the comparison we took a finite difference mesh of @ 15 times 15 times 30@ mesh cells, which led to a system of order 6750, with 46288 non-zero coefficients in the matrix for a density of about 0.1 per cent. In the first case we used Dirichlet conditions on the top and bottom @(D-D)@, and in the second case we used Neumann conditions on the top and bottom @(N-N)@, with a point fixed to ensure a unique solution as before. The preconditioner was the same in both cases. .sp .KS .TS center,box; c|c s c|c c c|c c. method conditions CG Tchebychev _ D-D 65 min 51 min 168 iter 144 iter N-N 82 min 346 min 248 iter 2686 iter .TE .EQ Table 7.7 .EN .EQ Comparison~ CG~ vs~ Tchebychev .EN .KE .sp Table 7.7 presents results for the total computation time and iterations. When considering time, note that the Tchebychev method requires periodic reevaluation of the eigenvalue estimates. In Figure 7.1 graphs of the iteration count versus the residual are given. Figure 7.1a gives results for (D-D) and Figure 7.1b gives results for (N-N). In Figure 7.1c the (N-N) results are graphed for just the first 400 iterations.
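.PP
For reference, the basic (non-adaptive) Tchebychev recurrence can be sketched as follows. This is a minimal Python illustration for a spectrum enclosed in an ellipse with center theta and focal distance delta, following the standard three-term formulation; it is not Manteuffel's adaptive code, and the function and parameter names are ours.

```python
import numpy as np

def tchebychev(A, b, theta, delta, tol=1e-10, maxit=500):
    """Non-adaptive Tchebychev iteration for A x = b.  The spectrum of
    A is assumed to lie in an ellipse centered at theta with focal
    distance delta that excludes the origin.  Illustrative sketch;
    Manteuffel's code additionally estimates theta and delta
    adaptively during the iteration."""
    x = np.zeros(len(b))
    r = b - A @ x
    sigma1 = theta / delta
    rho = 1.0 / sigma1
    d = r / theta                       # first correction
    for _ in range(maxit):
        x = x + d
        r = r - A @ d
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        rho_next = 1.0 / (2.0 * sigma1 - rho)
        # three-term Tchebychev recurrence for the next correction
        d = rho_next * rho * d + (2.0 * rho_next / delta) * r
        rho = rho_next
    return x
```

In practice theta and delta must come from eigenvalue estimates, and the method's performance is sensitive to their quality; since the default parameters were used in the Tchebychev code, this may be one source of the slow @N-N@ convergence observed in Table 7.7.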
Note that the CG residual is not a monotonically decreasing function since the only residual guaranteed to decrease is @R sub i @ for the second variant in Table 4.2 and the graph deals with the residual in equation (4.1). .sp .KS .sp 23 .EQ Figure ~~ 7.1a ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Figure~~7.1b .EN .KE .KS .sp 23 .EQ Figure ~~ 7.1c .EN .KE .sp With both methods we required the residual to be less than @10 sup -13 @ before convergence was signaled. The results from this problem show that the CG variant used (variant II of Table 4.1) is competitive with the Tchebychev iteration for Dirichlet conditions. When Neumann conditions are imposed, the Tchebychev approach is very slow to converge. (It should be noted that the default parameters were used in the Tchebychev code.) If no information is known about the problem, the CG variant would be a better choice. .sp .NH REFERENCES .R .sp 1 1. F.H. Harlow and A.A. Amsden, .I Flow of Interpenetrating Material Phases, .R J. Comp. Phys. 18, 440-464 (1975). .sp 1 2. V.L. Shah, et al., .I Some Numerical Results with the COMMIX-2 Computer Code, .R Technical Memorandum ANL-CT-79-30 (1979). .sp 1 3. A. Greenbaum, .I Comparison of Splittings Used with the Conjugate Gradient Algorithm, .R Numer. Math. 33, 181-194 (1979). .sp 1 4. O. Axelsson and N. Munksgaard, .I A Class of Preconditioned Conjugate Gradient Methods for the Solution of a Mixed Finite Element Discretization of the Biharmonic Operator, .R Inter. J. for Num. Meth. Eng. 14, 1001-1019 (1979). .sp 1 5. D. Kershaw, .I On the Problem of Unstable Pivots in the Incomplete LU-Conjugate Gradient Method, .R J. Comp. Phy. 38, 114-123 (1980). .sp 1 6. J. Daniel, .I The Approximate Minimization of Functionals, .R Prentice-Hall, 1971. .sp 1 7. M.R. Hestenes and E. Stiefel, .I Methods of Conjugate Gradients for Solving Linear Systems, .R NBS J. Res. 49, 409-436 (1952). .sp 1 8. P. Concus, G. Golub, and D.
O'Leary, .I A Generalized Conjugate Gradient Method for the Numerical Solution of Elliptic Partial Differential Equations, .R Sparse Matrix Computations, Ed. J. Bunch and D. Rose, Academic Press (1976). .sp 1 9. T. Manteuffel, .I The Tchebychev Iteration for Nonsymmetric Linear Systems, .R Numer. Math. 28, 307-327 (1977). .sp 1 10. T. Manteuffel, .I Adaptive Procedure for Estimating Parameters for the Nonsymmetric Tchebychev Iteration, .R Numer. Math. 31, 183-208 (1978). .sp 1 11. H.A. van der Vorst and J.M. van Kats, .I Manteuffel's Algorithm with Preconditioning for the Iterative Solution of Certain Sparse Linear Systems with a Nonsymmetric Matrix, .R Academisch Computer Centrum Report TR-11, Utrecht, The Netherlands, August 1979. .sp 1 12. O. Axelsson, .I On Optimization Methods in the Numerical Solution of Boundary Value Problems: A Survey, .R Univ. of Texas Center for Numerical Analysis Report CNA-137, Austin, Texas, 1978. .sp 1 13. O. Axelsson, .I Conjugate Gradient Type Methods for Unsymmetric and Inconsistent Systems of Linear Equations, .R Linear Algebra and its Applications 29, 1-16 (1980). .sp 1 14. J.A. Meijerink and H.A. van der Vorst, .I An Iterative Solution Method for Linear Systems of which the Coefficient Matrix is a Symmetric M-matrix, .R Math. of Comp. 31, 148-162 (1977). .sp 1 15. I. Gustafsson, .I A Class of First Order Factorization Methods, .R BIT 18, 142-156 (1978). .sp 1 16. N. Munksgaard and O. Axelsson, .I Analysis of Incomplete Factorizations with Fixed Storage Allocation, .R submitted to SIAM Jour. on Scientific and Statistical Computing. .sp 1 17. A. Greenbaum, .I Behavior of the Conjugate Gradient Algorithm in Finite Precision Arithmetic, .R Lawrence Livermore Laboratory, UCRL-85752, March 1981. .sp 1 18. J.H. Wilkinson, .I Rounding Errors in Algebraic Processes, .R Notes on Applied Sciences No. 32, Her Majesty's Stationery Office, London, Prentice-Hall, New Jersey (1963). .sp 1 19. C.C.
Paige, .I An Error Analysis of a Method for Solving Matrix Equations, .R Math. Comp. 27, 355-359, April 1973.