next up previous
Next: Discussion Up: Literature overview Previous: Back to Example

Tiling with resource constraints

 

Ohta et al. [14] aim at determining the best tile size under the following hypotheses:

(H1)
There are P available processors interconnected as a ring.
(H2)
The computation domain is a two-dimensional rectangle of size tex2html_wrap_inline822.
(H3)
Tiles are rectangular and their edges are parallel to the axes (see Figure 2). The size of a tile is tex2html_wrap_inline824, where tex2html_wrap_inline826 and tex2html_wrap_inline828 are unknowns.
(H4)
Dependences between tiles are summarized by the vector pair tex2html_wrap_inline830 (as in Example 1).
(H5)
Tiles are assigned to processor using a one-dimensional cyclic distribution: in other words, tile (i,j) is allocated to processor tex2html_wrap_inline834.
(H6)
The scheduling of the tiles is column-wise: at step 0, processor tex2html_wrap_inline838 executes tile (0,0) and the computed value is communicated to the adjacent processor tex2html_wrap_inline842 (more precisely, a rectangular slice of width w and height tex2html_wrap_inline828 is sent). At step 1, processors tex2html_wrap_inline842 and tex2html_wrap_inline852 execute tiles (0,1) and (1,0) simultaneously. After having executed a whole column of tiles, a processor moves on to its next column.

  figure152
Figure 2: Mapping rectangular tiles onto a ring of processors.

A step is the time required to compute a tile and to communicate data. Ohta et al. [14] use the following expression:
displaymath858
where t is the elemental computation time, a is a communication start-up and b is the inverse of the communication bandwidth times the width w of the slice being communicated (the communication cost is a linear expression in the message size).

To compute the total execution time, Ohta et al. [14] use the formula tex2html_wrap_inline868, where tex2html_wrap_inline870 is the latency (the step at which the last processor begins to work) and tex2html_wrap_inline872 is the number of tiles per processor (assumed to be an integer). Using the approximation tex2html_wrap_inline874, they derive the total execution time T as
displaymath878
The execution time is found to be minimal when choosing
displaymath880


next up previous
Next: Discussion Up: Literature overview Previous: Back to Example

Jack Dongarra
Sat Feb 8 08:17:58 EST 1997