From the discussion above, we see that a valid partition of may be obtained by choosing any integer vector
in
. The tiles so obtained are slices of the index space
. These are not sufficiently small,
however, to allow for all their necessary data to fit in the local memory of a given computer.
In terms of the corresponding program, tiling by slicing with a single hyperplane can be achieved by
a loop reindexing of one loop followed by strip mining of that loop (and only that loop)
and interchange to make the
one resulting strip-loop outermost. In the case of matrix multiply, for example, this would result in a
four-loop program in which the innermost three loops do operations and use
data.
(For, no matter which
loop we strip mine, one of the matrices is indexed by the two unchanged loop indices and so is completely
accessed.)
As the matrix multiply example indicates, we need to be able to strip mine all the loops in order to be able to work with tiles whose data sets can be made arbitrarily small. In this section we investigate the problem of fully tiling loop nests.
We can state this problem as follows. Given the index space
and the dependence matrix D, choose k linearly independent vectors
(the columns of A are a basis for k-space) such that each
.
The partition induced by A and a k-vector of block size
parameters b is obtained by slicing into slices of thickness
with a knife aligned perpendicular to
, then slicing again with thickness
and with the knife rotated so that it is
perpendicular to
(making long, narrow
strips rather than slices) and so on, until one has sliced k times, finally obtaining tiles
that are shaped, except at the boundaries of
, like parallelepipeds whose faces are perpendicular to the
basis vectors.
Thus, in order to fully tile a loop nest with arbitrary dependences D, we must be able to choose a basis
in the cone .
Should we be satisfied with any such basis? What if its elements are nearly linearly dependent? Then we have tiles that are quite elongated, with some very small angles and a low ratio of volume (which measures the number of lattice points, or iterations to be performed) to surface area. The surface area is a measure of the amount of data that must be moved into local memory in order to carry out the work of a tile. In general, the data moved in is the data required because of dependences of iterations in the tile on iterations of other tiles. The iterations near the edges require this data from outside.
The (k-1) dimensional volume of the tile, which grows like ,
is also a measure of the amount of local memory needed to carry out the work of each tile.
Let us therefore calculate how the choice of A determines the volume-to-surface ratio of the
induced tiles. We first answer the question for the tiling that results when
.
We obtain a formula for the ratio when
, then
we show how varying
changes both the ratio and the amount of local storage needed.
In later sections we consider generalizing to tiles with non-unit aspect ratios.