Section 5 described a logical (or virtual) matrix decomposition in
which the global index (m,n) is mapped to a position, (p,q), in a logical
process template, a position, (b,d), in a logical array of blocks local to the
process, and a position, (i,j), in a logical array of matrix elements local
to the block. The block cyclic decomposition is thus hierarchical and
attempts to represent the hierarchical memory of advanced-architecture
computers. Although the parallel LU factorization algorithm can be specified
solely in terms of this logical hierarchical memory, its performance
depends on how the logical memory is mapped to physical memory.
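The three-level mapping from (m,n) to (p,q), (b,d), and (i,j) can be made concrete with a short sketch. It assumes 0-based indices, an r×s block size, a P×Q process template, and the standard block cyclic layout with the first block on process (0,0). The symbols r, s, P, and Q and the function names are illustrative assumptions, not notation taken from this passage, and the Section 5 mapping may differ in detail (e.g. 1-based indexing).

```python
from typing import NamedTuple


class BlockCyclicPosition(NamedTuple):
    """Hypothetical container for the three levels of the logical hierarchy."""
    p: int  # process row in the P x Q template
    q: int  # process column
    b: int  # block row index local to process (p, q)
    d: int  # block column index local to process (p, q)
    i: int  # element row index local to block (b, d)
    j: int  # element column index local to block (b, d)


def global_to_local(m: int, n: int, r: int, s: int, P: int, Q: int) -> BlockCyclicPosition:
    """Map global index (m, n) to (p, q), (b, d), (i, j); assumes 0-based indices."""
    # Row dimension: blocks of r rows are dealt cyclically over P process rows.
    p = (m // r) % P
    b = m // (r * P)
    i = m % r
    # Column dimension: blocks of s columns are dealt cyclically over Q process columns.
    q = (n // s) % Q
    d = n // (s * Q)
    j = n % s
    return BlockCyclicPosition(p, q, b, d, i, j)


def local_to_global(pos: BlockCyclicPosition, r: int, s: int, P: int, Q: int) -> tuple:
    """Inverse mapping: recover the global index (m, n) from its hierarchical position."""
    m = (pos.b * P + pos.p) * r + pos.i
    n = (pos.d * Q + pos.q) * s + pos.j
    return m, n


if __name__ == "__main__":
    # 2x2 blocks on a 2x3 process template.
    print(global_to_local(10, 5, r=2, s=2, P=2, Q=3))
```

With 2×2 blocks on a 2×3 template, global element (10,5) lands on process (1,2), in local block (2,0), at position (0,1) within that block. Treating the row and column dimensions independently mirrors the separable form of the decomposition; how the local blocks are then laid out in physical memory is a separate choice that this logical mapping leaves open.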