From the discussion above, we see that a valid partition of may be obtained by choosing any integer vector in . The tiles so obtained are slices of the index space . These are not sufficiently small, however, to allow for all their necessary data to fit in the local memory of a given computer.
In terms of the corresponding program, tiling by slicing with a single hyperplane can be achieved by a loop reindexing of one loop followed by strip mining of that loop (and only that loop) and interchange to make the one resulting strip-loop outermost. In the case of matrix multiply, for example, this would result in a four-loop program in which the innermost three loops do operations and use data. (For, no matter which loop we strip mine, one of the matrices is indexed by the two unchanged loop indices and so is completely accessed.)
As the matrix multiply example indicates, we need to be able to strip mine all the loops in order to be able to work with tiles whose data sets can be made arbitrarily small. In this section we investigate the problem of fully tiling loop nests.
We can state this problem as follows. Given the index space and the dependence matrix D, choose k linearly independent vectors (the columns of A are a basis for k-space) such that each .
The partition induced by A and a k-vector of block size parameters b is obtained by slicing into slices of thickness with a knife aligned perpendicular to , then slicing again with thickness and with the knife rotated so that it is perpendicular to (making long, narrow strips rather than slices) and so on, until one has sliced k times, finally obtaining tiles that are shaped, except at the boundaries of , like parallelepipeds whose faces are perpendicular to the basis vectors.
Thus, in order to fully tile a loop nest with arbitrary dependences D, we must be able to choose a basis in the cone .
Should we be satisfied with any such basis? What if its elements are nearly linearly dependent? Then we have tiles that are quite elongated, with some very small angles and a low ratio of volume (which measures the number of lattice points, or iterations to be performed) to surface area. The surface area is a measure of the amount of data that must be moved into local memory in order to carry out the work of a tile. In general, the data moved in is the data required because of dependences of iterations in the tile on iterations of other tiles. The iterations near the edges require this data from outside.
The (k-1) dimensional volume of the tile, which grows like , is also a measure of the amount of local memory needed to carry out the work of each tile.
Let us therefore calculate how the choice of A determines the volume-to-surface ratio of the induced tiles. We first answer the question for the tiling that results when . We obtain a formula for the ratio when , then we show how varying changes both the ratio and the amount of local storage needed. In later sections we consider generalizing to tiles with non-unit aspect ratios.