At each stage of the concurrent LU factorization, the pivot element is chosen by the user-defined pivot function. Then, the pivot row (new row of U) must be broadcast, and pivot column (new column of L) must be computed and broadcast on the logical process grid (cf., Figure 9.12), vertically and horizontally, respectively. Note that these are interchangeable operations. We use this degree of freedom to reduce the communication complexity of particular pivoting strategies, while impacting the effort of the LU factorization itself negligibly.
We define two ``correctness modes'' of pivoting functions. In the first
correctness mode, ``first row fanout,'' the exit conditions for the pivot
function are: All processes must know (the pivot process row); the
pivot process row must know
(the pivot process column) as well as
, the
-local matrix row of the pivot; and the pivot
process must know in addition the pivot value and
-local matrix
column
of the pivot. Partial column pivoting and preset
pivoting can be set up to satisfy these correctness conditions as follows. For
partial column pivoting, the krow is eliminated at the kstep of
the factorization. From this fact, each process can derive the process row
and
-local matrix row
using the row data
distribution function. Having identified themselves, the pivot-row processes
can look for the largest element in local matrix row
and choose
the pivot element globally among themselves via a combine. At
completion, this places
,
, and the pivot value in the
entire pivot process row. This completes the requirements for the ``first row
fanout'' correctness mode. For preset pivoting, the kelimination row
and column are both stored as
, and
each process knows these values without communication.
Furthermore, the pivot process looks up the
pivot value. Hence, preset pivoting satisfies the requirements of this
correctness mode also.
For ``first row fanout,'' the universal knowledge of and
knowledge of the pivot matrix row
by the pivot process
row, allow the vertical broadcast of this row (new
row of U). In addition, we broadcast
,
, and the pivot value simultaneously. This extends the
correct value of
to all processes, as well as
and
the pivot value to the pivot process column. Hence, the multiplier
(L) column may be correctly computed and broadcast .
Along with the multiplier column broadcast , we include
the pivot value. After this broadcast , all processes
have the correct indices
and
the pivot value. This provides all that's required to complete the
current elimination step.
For the second correctness mode ``first column fanout,'' the exit conditions
for the pivot function are: All processes must know and the entire
pivot process column must know
, the pivot value, and
.
The pivot process in addition knows
. Partial row pivoting can
be set up to satisfy these correctness conditions. The arguments are
analogous to partial column pivoting and are given in [Skjellum:90c].
For ``first column fanout,'' the entire pivot process column knows the
pivot value, and local column of the pivot. Hence, the multiplier
column may be computed by dividing the pivot matrix column by the pivot
value. This column of L can then be broadcast
horizontally, including the pivot value, , and
as
additional information. After this step, the entire ensemble has the
correct pivot value, and
; in addition, the pivot process row
has the correct
. Hence, the pivot matrix row may be
identified and broadcast . This second
broadcast completes the needed information in each
process for effecting the kelimination step.
Thus, when using partial row or partial column pivoting, only local combines of the pivot process column (respectively row) are needed. The other processes don't participate in the combine, as they must without this methodology. Preset pivoting implies no pivoting communication, except very occasionally (e.g., 1 in 5000 times), as noted in [Skjellum:90c], to remove memory unscalabilities. This pivoting approach is a direct savings, gained at a negligible additional broadcast overhead. See also [Skjellum:90c].