Research Institute for Advanced Computer Science
Mail Stop 230-5, NASA Ames Research Center
Mountain View, CA 94035
Jack J. Dongarra
Department of Computer Science
University of Tennessee
Knoxville, TN 37996-1301
Mathematical Sciences Section
Oak Ridge National Laboratory
Oak Ridge, TN 37831
Abstract Blocked algorithms have much better properties of data locality and therefore can be much more efficient than ordinary algorithms when a memory hierarchy is involved. On the other hand, they are very difficult to write and to tune for particular machines. Here we consider the reorganization of nested loops through the use of known program transformations in order to create blocked algorithms automatically. The program transformations we use are strip mining, loop interchange, and a variant of loop skewing in which we allow invertible linear transformations (with integer coordinates) of the loop indices. In this paper, we solve some problems concerning the optimal application of these transformations. We show, in a very general setting, how to choose a nearly optimal set of transformed indices. We then show, in one particular but rather frequently occurring situation, how to choose an optimal set of block sizes.
block algorithm, parallel computing, compiler optimization, matrix computation, numerical methods for partial differential equations, program transformation, memory hierarchy.
???? 05C50, 15A23, 65F05, 65F50, 68M20.