A consequence of the requirement that is that the block loops may appear in any order.
Suppose, without loss of generality, that
Then the flux of data per unit surface area across the
faces of the tiles normal to is greater than that across the other faces. We would
choose to make the block loop innermost, because doing so avoids storing to memory the data
that flow across the faces normal to when going from one tile to the next.
For example, this criterion leads us to choose a
``left-looking'' block Gaussian elimination or block Householder QR
method in preference to a ``right-looking'' method,
a choice that further reduces memory traffic.
See the examples of Section 7 for illustrations of how this technique is applied.
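
The difference between the two loop orders can be illustrated with a minimal sketch. It is not the blocking procedure described in this paper. It is an unblocked, unpivoted LU factorization in which each column stands in for a block column. The function names and the store counter are hypothetical. The counter rests on a simplifying assumption: only writes to the matrix itself count as memory traffic, and the working column of the left-looking variant is assumed to stay in fast memory.

```python
def lu_right_looking(a):
    """Right-looking LU without pivoting.

    Returns (lu, stores): L and U packed in one matrix (unit diagonal of L
    implied) and the number of writes made to the matrix.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    stores = 0
    for k in range(n):
        # Form column k of L.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            stores += 1
        # Eagerly update the whole trailing submatrix; every entry is
        # written back at every step k.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
                stores += 1
    return a, stores


def lu_left_looking(a):
    """Left-looking LU without pivoting; same return convention.

    Column j is read once, updated with all previously computed columns
    while held in a local buffer (assumed to fit in fast memory), and
    written back exactly once.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    stores = 0
    for j in range(n):
        col = [a[i][j] for i in range(n)]
        # Apply the deferred updates from columns 0 .. j-1.
        for k in range(j):
            for i in range(k + 1, n):
                col[i] -= a[i][k] * col[k]
        # Scale the subdiagonal part to obtain column j of L.
        for i in range(j + 1, n):
            col[i] /= col[j]
        # Single write-back of the finished column.
        for i in range(n):
            a[i][j] = col[i]
            stores += 1
    return a, stores
```

Under this counting model, an 8 x 8 matrix costs the right-looking variant 168 stores and the left-looking variant 64, while both produce the same factors. Neither function pivots, so the sketch is only suitable for matrices that need no pivoting, such as diagonally dominant ones.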