We can view the reorganized matrix-multiply program in two ways.
First, we can consider
the matrices *A*, *B*, and *C* as
matrices whose elements are matrices. In this case,
the inner three loops simply perform a multiply-add
of one such block-element. This is the view taken by most numerical analysts.
Second, we can derive the blocked program form the original, unblocked
program by a sequence of standard program transformations. First, the individual loops are
*strip mined*. For example, the loop

is replaced bymmmmmmmmmm¯

fori= 1 tondo

od

(Strip mining is a standard tool for dealing with vector registers. One may apply it ``legally'' to any loop. By legally, we mean that the transformed program computes the same result as before.) Strip mining, by itself, yields a six-loop program, but the order of the loops is not what is needed for a blocked algorithm. The second transformation we use ismmmmmmmmmm¯

fori0 = 1 tonstepbdo

fori=i0 to do

od

od

This approach to automatic blocking, through loop strip mining and interchange, was first advocated
by Wolfe [18]. It is derived from earlier work of Abu-Sufah, Kuck, and Lawrie on
optimization in a paged virtual-memory system [1]. Wolfe introduced the term *tiling*.
A *tile* is the collection of work to be done, *i.e.,* the set of values of the point loop indices,
for a fixed set of values of the block or outer loop indices. We like this terminology since it
allows us to distinguish what we are doing -- which is to decompose the work to be done into
small subtasks (the tiles) -- from the quite different task of
decomposing the data *a priori* into small subarrays
(the blocks), even though each tile does, in fact, act on blocks. Following Wolfe, we call the problem of
decomposing the work of a loop nest *index space tiling*.

Other authors have treated the issue of management of the memory hierarchy [8]. Some other treatments of the problem of automatic blocking have recently appeared [11], [4], [17], [18], [19]; none, however, gives the quantitative statments of the problem and the solution that we provide here.

__
__

Tue Feb 18 15:39:11 EST 1997