In this section we present solutions to two important instances of the general problem
of optimal choice of the block size parameters . We assume that
a set of tiling hyperplane normals A has been chosen and that the data fluxes E and C are known,
as are the dependences D. The choice of the outermost point loop index --
, will also play a role.
Here our viewpoint is somewhat more realistic than in Section 3. We take into account the fact that not all the data required by the execution of a tile must be read a priori. Instead, we consider the order in which the tiles iterations are processed and assume that the needed data are read (or written) at the time they are needed.
We need some constants to make our estimates precise.
Let the amount of work per grid point be
(the appropriate units for
and the constants
that follow
are seconds, so that the machine characteristics are included through these constants).
Let the flux of data per unit surface area across the face normal to
be
.
The way that
depends on E, C, and
was explained in Section 5.
First we consider the case k = 2 with the assumption that is one of the two tiling vectors,
say
. Then the amount of local memory required is proportional to
and is independent of
. The total work done is
and the amount of
data referred to is
.
Thus the ratio of computation time to memory access time is
as . (We have redefined the dimensionless parameter
here).
See Figure 5.
Figure 5: Reuse ratio vs. tile length. Note:
In this case, therefore, we always take as large as possible
(subject only to the size of the problem being solved) and obtain the ratio shown. This ratio is
the product of a ratio of work per iteration
to data per iteration
and the number of
grid points
that fit in the local memory. Note in particular that for large problems, for which
can be taken so large that the asymptote is approached, neither the data per unit surface in the direction
of
(that is,
) nor the particular choice of tile length in the
direction (that is,
)
plays a role. Similar conclusions are reached if we model execution time rather than the computation to communication
ratio
. Note also that if the ratio
is larger than
,
then we choose
instead of
.
The discussion above is little changed if we allow arbitrary . What matters is that we fix all but one of
the block size parameters and allow the other to grow, provided that with the given choice of
the memory
requirement is independent of this one parameter. For that to be true, all we need is that
should not
be close to
rather than the much stronger requirement
.
Next let us take and k > 2. Again, we fix all but one of the block size parameters,
in this case
and allow the other one to grow, limited only by problem size.
Let . Memory size places some upper limit on B.
Let the memory required per unit surface in the hyperplane normal to
be M. Thus, for the given choice
of the block size parameters, the local memory required is
.
If the available local memory has room for
data, then B is constrained by
The amount of work per unit distance in the direction is
.
Finally, the data required per unit distance in the direction is
Thus
With the constraint on b given by (7), the maximizing choice of b is
where .