next up previous
Next: Proof Up: Allowing for communication-computation overlap Previous: Proof

Optimal mapping and scheduling

 

Hypotheses (H5) and (H6) are very restrictive in that they impose the mapping of tiles to processors as well as their scheduling. The intuitive motivation for (H5) is that a cyclic distribution of tiles is quite natural to load-balance computations. Once the distribution of tiles to processors is fixed, there are several possible schedulings (any wavefront execution that goes along a left-to-right diagonal is valid). Specifying a column-wise execution may lead to the simplest code generation.

It turns out that (H5) and (H6) provide the best solution among all possible distributions of tiles to processors, which is a very strong result. This result holds true under the assumption that the communication cost for a tile is not larger than its computation cost. Since the communication cost for a tile grows linearly with its size, while the computation costs grows quadratically, this hypothesis will be satisfied if the tile is large enoughgif. This result is formally stated in the theorem below. Beforehand, we need to refine the communication cost as follows:

We pay a communication cost only when the two processors that own the neighboring tiles are not the same. So far we never paid any cost for vertical communications, and we always did for horizontal communications, because of hypothesis (H5). We had to refine the communication cost because in this section, we do not make any assumption on the mapping of tiles to processors. Depending upon the values of tex2html_wrap_inline1160 and tex2html_wrap_inline1162, the best mapping will be column-wise or row-wise:


theorem354


next up previous
Next: Proof Up: Allowing for communication-computation overlap Previous: Proof

Jack Dongarra
Sat Feb 8 08:17:58 EST 1997