next up previous
Next: Choosing the Ordering of Up: Automatic Blocking of Nested Previous: Other Applications

Precise Storage and I/O Requirements

 

In this section, we develop formulae that give precise measures of the storage required for execution of a tile and of the number of data (input and output from local memory) required for execution of a tile. These can be used to state more precisely the optimization problem that should be solved in determining the tiling basis.

Consider I/O requirements first. Now, let E be an integer tex2html_wrap_inline2316 matrix whose columns represent the data required to satisfy the true dependences in an index space. Consider the loop nest

 mmmmmmmmmm¯

for i = 1 to n do

for j = 1 to n do

a[i,j] = a[i,j-1] + b[i,j-1] ; od

b[i,j] = a[i,j-1] + b[i-1,j-1] + c[i] ; od

od

In this loop nest, the dependences are
displaymath2306
A given iteration requires one datum from the iteration at distance tex2html_wrap_inline2330 and two data from the iteration at distance tex2html_wrap_inline2332. Thus, the matrix E is
displaymath2307

In addition to the data computed at other iterations in the index space, for which dependence directions have been established, other data may be required in order to execute a tile, for example, the c data in the example above. We express these data requirements through a second matrix, C. Each column of C corresponds to a datum (such as c[i] in the example) that is used in common by a number of iterations. It gives the smallest displacement in the index space between iterations that use the datum. For the example above,
displaymath2308
since all iterations with fixed i use the value c[i]. If, for example, c0[i] were used for tex2html_wrap_inline2350 and c1[i] were used at iterations tex2html_wrap_inline2354, then we would have
displaymath2309

We are now ready to state the I/O required to execute a tile. We assume that no data are available in local memory to begin with and that all data that may be needed later must be written back from local memory at completion of the tile's execution. Let tex2html_wrap_inline2356. Let tex2html_wrap_inline2358. The amount of data is given by
eqnarray616
Here tex2html_wrap_inline2366 is the volume of the face of the tile normal to the tiling basis vector tex2html_wrap_inline2368, tex2html_wrap_inline2370 is a normalized tiling basis vector, and tex2html_wrap_inline2372. That this is correct follows from the observation that the grid points at the face of a tile depend on values created at iteration points in a ``shadow''; the shadow points are points not in the tile from which a dependence into the tile emanates. For each column d of E the corresponding shadow has as its base a face of the tile, say the face normal to tex2html_wrap_inline2368, and as one of its sides the vector whose direction is -d and whose tail is at any corner of the face. This shadow has height tex2html_wrap_inline2382 and base area tex2html_wrap_inline2366, so it has volume tex2html_wrap_inline2386. The factor 2 multiplying E expresses the fact that data that are responsible for dependences must be read in and written out, while data that are used but not created are read in but not written back.

The volume of faces is explained in Section 2.3.




next up previous
Next: Choosing the Ordering of Up: Automatic Blocking of Nested Previous: Other Applications

Jack Dongarra
Tue Feb 18 15:39:11 EST 1997