next up previous
Next: Geometric Considerations Up: Statement of the Problem Previous: Definitions

Tiling with Hyperplanes

From the discussion above, we see that a valid partition of tex2html_wrap_inline1556 may be obtained by choosing any integer vector in tex2html_wrap_inline1766. The tiles so obtained are slices of the index space tex2html_wrap_inline1556. These are not sufficiently small, however, to allow for all their necessary data to fit in the local memory of a given computer.

In terms of the corresponding program, tiling by slicing with a single hyperplane can be achieved by a loop reindexing of one loop followed by strip mining of that loop (and only that loop) and interchange to make the one resulting strip-loop outermost. In the case of matrix multiply, for example, this would result in a four-loop program in which the innermost three loops do tex2html_wrap_inline1860 operations and use tex2html_wrap_inline1258 data. (For, no matter which loop we strip mine, one of the matrices is indexed by the two unchanged loop indices and so is completely accessed.)

As the matrix multiply example indicates, we need to be able to strip mine all the loops in order to be able to work with tiles whose data sets can be made arbitrarily small. In this section we investigate the problem of fully tiling loop nests.

We can state this problem as follows. Given the index space tex2html_wrap_inline1864 and the dependence matrix D, choose k linearly independent vectors tex2html_wrap_inline1870 (the columns of A are a basis for k-space) such that each tex2html_wrap_inline1876.

The partition induced by A and a k-vector of block size parameters b is obtained by slicing tex2html_wrap_inline1556 into slices of thickness tex2html_wrap_inline1886 with a knife aligned perpendicular to tex2html_wrap_inline1888, then slicing again with thickness tex2html_wrap_inline1890 and with the knife rotated so that it is perpendicular to tex2html_wrap_inline1892 (making long, narrow strips rather than slices) and so on, until one has sliced k times, finally obtaining tiles that are shaped, except at the boundaries of tex2html_wrap_inline1556, like parallelepipeds whose faces are perpendicular to the basis vectors.

Thus, in order to fully tile a loop nest with arbitrary dependences D, we must be able to choose a basis in the cone tex2html_wrap_inline1766.

Should we be satisfied with any such basis? What if its elements are nearly linearly dependent? Then we have tiles that are quite elongated, with some very small angles and a low ratio of volume (which measures the number of lattice points, or iterations to be performed) to surface area. The surface area is a measure of the amount of data that must be moved into local memory in order to carry out the work of a tile. In general, the data moved in is the data required because of dependences of iterations in the tile on iterations of other tiles. The iterations near the edges require this data from outside.

The (k-1) dimensional volume of the tile, which grows like tex2html_wrap_inline1904, is also a measure of the amount of local memory needed to carry out the work of each tile.

Let us therefore calculate how the choice of A determines the volume-to-surface ratio of the induced tiles. We first answer the question for the tiling that results when tex2html_wrap_inline1908. We obtain a formula for the ratio when tex2html_wrap_inline1910, then we show how varying tex2html_wrap_inline1704 changes both the ratio and the amount of local storage needed. In later sections we consider generalizing to tiles with non-unit aspect ratios.

next up previous
Next: Geometric Considerations Up: Statement of the Problem Previous: Definitions

Jack Dongarra
Tue Feb 18 15:39:11 EST 1997