HPL_pdlaswp01N Broadcast a column panel L and swap the row panel U.
HPL_pdlaswp01N applies the NB row interchanges to NN columns of the
trailing submatrix and broadcasts a column panel.
A "spread then roll" algorithm performs the swap and the broadcast of
the row panel U at once, resulting in a minimal communication volume
and a good use of the network connectivity, if available. With P process
rows and assuming bi-directional links, the running time of this
function can be approximated by:
(log_2(P)+(P-1)) * lat + K * NB * LocQ(N) / bdwth
where NB is the number of rows of the row panel U, N is the global
number of columns being updated, lat and bdwth are the latency and
bandwidth of the network for double precision real words. K is
a constant in (2,3] that depends on the achieved bandwidth during a
simultaneous message exchange between two processes. An empirical
optimistic value of K is typically 2.4.
PBCST (local input/output) HPL_T_panel *
On entry, PBCST points to the data structure containing the
panel (to be broadcast) information.
IFLAG (local input/output) int *
On entry, IFLAG indicates whether or not the broadcast has
already been completed. If not, probing will occur, and the
outcome will be contained in IFLAG on exit.
PANEL (local input/output) HPL_T_panel *
On entry, PANEL points to the data structure containing the
panel information.
NN (local input) const int
On entry, NN specifies the local number of columns of the
trailing submatrix to be swapped and broadcast starting at
the current position. NN must be at least zero.