We break down the problem into two subcases depending on the values taken by
the function f, whose argument ranges from 1 to ;
This result is disappointing in that we end up with degenerate tiles in most
practical situations. For instance, if (which is very
likely to happen in practice), then
, and the optimal tile
size is
, hardly a coarse-grain tiling!
The flaw is that the model is not accurate enough to capture the impact of data
locality and data reuse, which are the main objectives of tiling.
A first solution is to model the computation cost of a tile by
an affine expression such as
, where u represents some
access overhead. It is then straightforward to plug this expression into the
formula given for the total execution time T and to derive the
optimal tile size.
Another solution is to assume a fixed tile size, imposed by
a priori considerations such as the cache size. Again,
we can let
in Equation (1), and minimize T for
, say.
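As a purely illustrative sketch of the cache-driven choice of a fixed tile size, one might pick the largest square tile whose working set fits in cache. The cache size, the element width, and the assumption of three resident tiles (e.g. the three operands of a blocked matrix product) are hypothetical parameters for the example, not values from the text:

```python
import math

def tile_size_for_cache(cache_bytes: int, elem_bytes: int = 8,
                        n_tiles: int = 3) -> int:
    # Assume the working set of a b x b tile computation holds n_tiles
    # square tiles of elem_bytes-wide elements; return the largest b
    # such that n_tiles * b**2 * elem_bytes <= cache_bytes.
    return math.isqrt(cache_bytes // (n_tiles * elem_bytes))

# For a hypothetical 32 KB first-level cache and double-precision data:
b = tile_size_for_cache(32 * 1024)
```

This fixed b would then be substituted into the execution-time formula, leaving T to be minimized over the remaining free parameters.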