Next: Discussion
Up: Literature overview
Previous: Back to Example
Ohta et al. [14] aim at determining the best tile size under the
following hypotheses:
- (H1)
- There are P available processors interconnected as a ring.
- (H2)
- The computation domain is a two-dimensional rectangle of size
.
- (H3)
- Tiles are rectangular and their edges are parallel to the axes
(see Figure 2). The size of a tile is
, where and are unknowns.
- (H4)
- Dependences between tiles are summarized by the vector pair
(as in Example 1).
- (H5)
- Tiles are assigned to processor using a one-dimensional cyclic distribution: in other words, tile (i,j) is allocated to
processor .
- (H6)
- The scheduling of the tiles is column-wise:
at step 0, processor executes tile (0,0) and the computed value is communicated to the adjacent processor (more precisely, a rectangular
slice of width w and height is sent). At step 1,
processors and execute tiles (0,1) and (1,0) simultaneously.
After having executed a whole column of tiles, a processor moves on to its
next column.
Figure 2: Mapping rectangular tiles onto a ring of processors.
A step is the time required to compute a tile and to communicate data.
Ohta et al. [14] use the following expression:
where t is the elemental computation time, a is a communication start-up
and b is the inverse of the communication bandwidth times the width w
of the slice being communicated (the communication cost is a
linear expression in the message size).
To compute the total execution time, Ohta et al. [14] use the
formula , where is the latency (the step
at which the last processor begins to work) and is the number of tiles per processor (assumed to be an integer). Using the approximation , they derive the total execution time T as
The execution time is found to be minimal when choosing
Next: Discussion
Up: Literature overview
Previous: Back to Example
Jack Dongarra
Sat Feb 8 08:17:58 EST 1997