**Robert Schreiber
Research Institute for Advanced Computer Science
Mail Stop 230-5, NASA Ames Research Center
Mountain View, CA 94035
e-mail: **

**May 1990
**

**Abstract**
Blocked algorithms have much better properties of data locality
and therefore can be much more efficient than ordinary algorithms when a memory hierarchy is involved. On the
other hand, they are very difficult to write and to tune for particular machines.
Here we consider the reorganization of nested loops through the use of known program transformations in order
to create blocked algorithms automatically.
The program transformations we use are strip mining, loop interchange, and a variant
of loop skewing in which we allow invertible linear transformations (with integer coordinates) of the loop indices.
In this paper, we solve some problems concerning the optimal application of these transformations.
We show, in a very general setting, how to choose a nearly optimal
set of transformed indices. We then show, in one particular but rather frequently occurring situation,
how to choose an optimal set of block sizes.

block algorithm, parallel computing, compiler optimization, matrix computation, numerical methods for partial differential equations, program transformation, memory hierarchy.

???? 05C50, 15A23, 65F05, 65F50, 68M20.

- Introduction
- Statement of the Problem
- A Procedure for Choosing the Tiling Basis
- Other Applications
- Precise Storage and I/O Requirements
- Optimal Choice of Block Size
- Blocking Examples
- References
- About this document ...

Tue Feb 18 15:39:11 EST 1997