According to hypothesis (H4), the computation proceeds column by column: when a processor has completed the execution of a whole column of tiles, it starts the next column that has been assigned to it. The time to process a whole tile column is therefore the number of tiles in the column multiplied by the time to compute a single tile.
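The column-wise rule of (H4) can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's model: the parameter names `tiles_per_column` and `tile_time` are hypothetical stand-ins for the original symbols, and the second function assumes the processor never has to wait on another processor, so dependences between columns are ignored here.

```python
# Hypothetical parameter names: the paper's own symbols for the number of
# tiles per column and the time to compute one tile are not reproduced here.

def column_time(tiles_per_column: int, tile_time: float) -> float:
    """Time for one processor to compute a whole column of tiles (H4)."""
    return tiles_per_column * tile_time


def column_finish_times(start: float, n_columns: int,
                        tiles_per_column: int, tile_time: float) -> list[float]:
    """Finish times of n_columns processed back to back by one processor.

    Assumption: the processor starts each assigned column as soon as the
    previous one is done and never waits for data from another processor.
    """
    c = column_time(tiles_per_column, tile_time)
    return [start + (k + 1) * c for k in range(n_columns)]


if __name__ == "__main__":
    print(column_time(8, 2.5))                  # 20.0
    print(column_finish_times(0.0, 3, 8, 2.5))  # [20.0, 40.0, 60.0]
```

With 8 tiles per column and 2.5 time units per tile, each column takes 20 time units, so three consecutive columns on the same processor finish at 20, 40 and 60.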
Now, according to hypothesis (H5), tile columns are distributed cyclically to processors. If a processor starts the execution of the first tile in a given column at time-step t, its right neighbor cannot start the execution of the first tile in the next column before time-step t plus a delay imposed by the dependence vector linking consecutive columns. This delay is computed as in Section 2.2, except that a communication cost is paid only when the tiles involved are owned by different processors. Two cases can occur:
Figure 4: Scheduling tiles with P = 3.