The algorithms for vectorization outlined above can be used on
parallel computers. For instance, variables on a wavefront
can be
processed in parallel, by dividing the wavefront over processors.
More radical approaches for increasing the parallelism in incomplete
factorizations are based on a renumbering of the problem variables.
For instance, on rectangular domains one could start numbering the
variables from all four corners simultaneously, thereby creating
four simultaneous wavefronts, and therefore four-fold parallelism
(see Dongarra, *et al.* [71],
Van der Vorst [204][202]) .
The most extreme case is the red/black ordering
(or for more general matrices the multi-color ordering) which gives
the absolute minimum number of sequential steps.

Multi-coloring is also an attractive method for vector computers. Since points of one color are uncoupled, they can be processed as one vector; see Doi [68], Melhem [154], and Poole and Ortega [176].

However, for such ordering strategies there is usually a trade-off between the degree of parallelism and the resulting number of iterations. The reason for this is that a different ordering may give rise to a different error matrix, in particular the norm of the error matrix may vary considerably between orderings. See experimental results by Duff and Meurant [79] and a partial explanation of them by Eijkhout [85].

Mon Nov 20 08:52:54 EST 1995