Next: Determining the total execution
Up: Allowing for communication-computation overlap
Previous: Allowing for communication-computation overlap
Hypotheses (H2), (H3) and (H4) may appear very restricting. However,
we point out the following justifications:
- Tile shape
- We assume that the tiles are rectangular. This is to be understood as the outcome of previous program transformations.
The first step in tiling amounts to determining the best
size and size of the tiles, assuming an infinite grid of virtual processors.
Because this step will lead to tiles whose edges are parallel to extremal dependence vectors in the convex hull
of the dependence cone, we can perform a unimodular transformation and rewrite the original loop nest along the edge axes. The resulting domain may not be a rectangular, but we can approximate it using the smallest bounding box.
- Dependence vectors
- We assume that dependences are summarized by the vector pair
. Note that these
are dependences between tiles, not between elementary computations. In Example 1, we had 5 dependence vectors in the original loop nest,
but we ended up with after tiling. This is a very general situation if the tiles are large enough. For instance,
having a dependence vector
(0,a) with between tiles, instead of having vector (0,1),
would mean unusually long dependences in the loop nest (in Example 1,
a(i,j) would depend upon a(i,j-8) but not on a(i,j-x) for ).
Note that having (0,a) in addition to (0,1) as a dependence
vector between tiles is simply redundant.
On the other hand, hypotheses (H5) and (H6) are unnecessarily restrictive,
because the mapping and scheduling of the tiles should be an output decision of the procedure of tiling with limited resources, rather than being given a priori. We overcome this restriction in Section 3.5.
Jack Dongarra
Sat Feb 8 08:17:58 EST 1997