Tiling with resource constraints

Next: Discussion Up: Literature overview Previous: Back to Example

Tiling with resource constraints

Ohta et al. [14] aim at determining the best tile size under the following hypotheses:

(H1): There are P available processors interconnected as a ring.
(H2): The computation domain is a two-dimensional rectangle of size .
(H3): Tiles are rectangular and their edges are parallel to the axes (see Figure 2). The size of a tile is , where and are unknowns.
(H4): Dependences between tiles are summarized by the vector pair (as in Example 1).
(H5): Tiles are assigned to processor using a one-dimensional cyclic distribution: in other words, tile (i,j) is allocated to processor .
(H6): The scheduling of the tiles is column-wise: at step 0, processor executes tile (0,0) and the computed value is communicated to the adjacent processor (more precisely, a rectangular slice of width w and height is sent). At step 1, processors and execute tiles (0,1) and (1,0) simultaneously. After having executed a whole column of tiles, a processor moves on to its next column.

Figure 2: Mapping rectangular tiles onto a ring of processors.

A step is the time required to compute a tile and to communicate data. Ohta et al. [14] use the following expression:

where t is the elemental computation time, a is a communication start-up and b is the inverse of the communication bandwidth times the width w of the slice being communicated (the communication cost is a linear expression in the message size).

To compute the total execution time, Ohta et al. [14] use the formula , where is the latency (the step at which the last processor begins to work) and is the number of tiles per processor (assumed to be an integer). Using the approximation , they derive the total execution time T as

The execution time is found to be minimal when choosing

Next: Discussion Up: Literature overview Previous: Back to Example

Jack Dongarra
Sat Feb 8 08:17:58 EST 1997