next up previous
Next: Optimal Choice of Block Up: Precise Storage and I/O Previous: Choosing the Ordering of

Local Memory Requirements

We will make the simplifying assumption that the same computation, producing and consuming the same number of data, is done at every point of tex2html_wrap_inline1556. The memory required to execute a tile depends on the order in which the individual points of the tile are executed. For this analysis, we assume that the points along hyperplanes normal to a given integer k-vector tex2html_wrap_inline2412 are executed simultaneously. We need to store the values produced at earlier iterations that are required by the iterations on this hyperplane. The number of such values is again given by the sum of volumes of ``shadows'' as
displaymath2404
Here tex2html_wrap_inline2416 is the volume of the intersection of the hyperplane tex2html_wrap_inline2418 and the given tile, i.e., of the set of iteration points computed at time t. The maximum is taken over the relevant values of t.

This largest volume is a function of the tile dimensions and of the shape of the tile as well as of the time coordinate tex2html_wrap_inline2412. In general, it can be larger than the faces, but by no more than a constant factor (at most tex2html_wrap_inline2426). It may be much smaller, as in this case: Let
displaymath2405
and let tex2html_wrap_inline2428 and tex2html_wrap_inline2430 so that the tiles are long and narrow and are nearly aligned with the tex2html_wrap_inline1506 axis. The faces of these tiles have lengths of 10 and about 10.5. If we take tex2html_wrap_inline2434, then the set of points in the tile that are simultaneously executed is small; there are at most two. On the other hand, if we choose tex2html_wrap_inline2436, then there are 10 such points. So our earlier assertion that face volume is a good measure of memory required is in doubt.

This is not, however, a real possibility. The example above depends on highly elongated tiles. This happens because the basis vectors (the columns of A) are close, and this in turn is due to a narrow cone tex2html_wrap_inline1766. But in order for tex2html_wrap_inline2412 to be used in scheduling as described above, we must have tex2html_wrap_inline2444. The difficulties described above are associated with a choice of tex2html_wrap_inline2412 nearly orthogonal to all of the basis vectors a, which are confined to lie in a narrow cone. Such a vector cannot also be in the cone.


next up previous
Next: Optimal Choice of Block Up: Precise Storage and I/O Previous: Choosing the Ordering of

Jack Dongarra
Tue Feb 18 15:39:11 EST 1997