As described for his chemistry application in Section 8.2, Hipes has studied the use of the Gauss-Jordan (GJ) algorithm as a means of solving systems of linear equations [Hipes:89b]. On a sequential computer, LU factorization followed by forward reduction and back substitution is preferable over GJ for solving linear systems since the former has a lower operation count. Another apparent drawback of GJ is that it has generally been believed that the right hand sides must be available a priori, which in applications requiring the solution for multiple right-hand sides is a handicap. Hipes' work has shown that this is not the case, and that a well-written, parallel GJ solver is significantly more efficient than using LU factorization with triangular solvers on hypercubes.
As noted by Gerasoulis, et al. [Gerasoulis:88a], GJ does not require the solution of triangular systems. The solution of such systems by LU factorization features an outer loop of fixed length and two inner loops of decreasing length, whereas GJ has two outer fixed-length loops and only one inner loop of decreasing length. GJ is, therefore, intrinsically more parallel than the LU solver, and its better load balance compensates for its higher operation count. Hipes has pointed out that the multipliers generated in the GJ algorithm can be saved where zeros are produced in the coefficient matrix. The entries in the coefficient matrix are, therefore, overwritten by the GJ multipliers, and we shall call this the GJ factorization (although we are not actually expressing the original matrix A as the product of two matrices). It is now apparent that the right-hand side matrix does not have to be known in advance, since a solution can be obtained using the previously computed multipliers. Another factor, noted by Hipes, favoring the use of the GJ solver on a multiprocessor, is the larger grain size maintained throughout the GJ factorization and solution phases, and the lower communication cost in the GJ solution phase.
Hipes has implemented his GJ solver on the Caltech/JPL Mark III and nCUBE-1 hypercubes, and compared the performance with the usual LU solver [Hipes:89d]. In the GJ factorization, a scattered column decomposition is used, similar to that shown in Figure 8.1(d). This ensures good load balance as columns become eliminated in the course of the algorithm. In the LU factorization, both rows and columns are eliminated so a scattered block decomposition is used. On both machines, it was found that the GJ approach is faster for sufficiently many right-hand sides.