Local Memory Requirements

Next: Optimal Choice of Block Up: Precise Storage and I/O Previous: Choosing the Ordering of

Local Memory Requirements

We will make the simplifying assumption that the same computation, producing and consuming the same number of data, is done at every point of . The memory required to execute a tile depends on the order in which the individual points of the tile are executed. For this analysis, we assume that the points along hyperplanes normal to a given integer k-vector are executed simultaneously. We need to store the values produced at earlier iterations that are required by the iterations on this hyperplane. The number of such values is again given by the sum of volumes of ``shadows'' as

Here is the volume of the intersection of the hyperplane and the given tile, i.e., of the set of iteration points computed at time t. The maximum is taken over the relevant values of t.

This largest volume is a function of the tile dimensions and of the shape of the tile as well as of the time coordinate . In general, it can be larger than the faces, but by no more than a constant factor (at most ). It may be much smaller, as in this case: Let

and let and so that the tiles are long and narrow and are nearly aligned with the axis. The faces of these tiles have lengths of 10 and about 10.5. If we take , then the set of points in the tile that are simultaneously executed is small; there are at most two. On the other hand, if we choose , then there are 10 such points. So our earlier assertion that face volume is a good measure of memory required is in doubt.

This is not, however, a real possibility. The example above depends on highly elongated tiles. This happens because the basis vectors (the columns of A) are close, and this in turn is due to a narrow cone . But in order for to be used in scheduling as described above, we must have . The difficulties described above are associated with a choice of nearly orthogonal to all of the basis vectors a, which are confined to lie in a narrow cone. Such a vector cannot also be in the cone.

Next: Optimal Choice of Block Up: Precise Storage and I/O Previous: Choosing the Ordering of

Jack Dongarra
Tue Feb 18 15:39:11 EST 1997