The LU factorization applies a sequence of Gaussian eliminations to form , where and are matrices, and is an matrix. is unit lower triangular (lower triangular with 1's on the main diagonal), is upper triangular, and is a permutation matrix, which is stored in a vector.
At the -th step of the computation (), it is assumed that the submatrix of (, ) is to be partitioned as follows,
where the block is , is , is , and is . is a unit lower triangular matrix, and is an upper triangular matrix.
At first, a sequence of Gaussian eliminations is performed on the first panel of (i.e., and ). Once this is completed, the matrices , , and are known, and we can rearrange the block equations,
The LU factorization can be done by recursively applying the steps outlined above to the matrix . Figure 3 shows a snapshot of the block LU factorization. It shows how the column panel, and , and the row panel, and , are computed, and how the trailing submatrix is updated. In the figure, the shaded areas represent data for which the corresponding computations are completed. Later, row interchanges will be applied to and .
Figure 3: A snapshot of block LU factorization.
The computation of the above steps in the LAPACK routine, DGETRF, involves the following operations:
The corresponding parallel implementation of the ScaLAPACK routine, PDGETRF, proceeds as follows: