Name
HPL_pdrpancrT Crout recursive panel factorization.
Synopsis
#include <hpl.h>
void HPL_pdrpancrT( HPL_T_panel * PANEL, const int M, const int N, const int ICOFF, double * WORK );
Description
HPL_pdrpancrT
recursively factorizes a panel of columns using the
recursive Crout variant of the usual one-dimensional algorithm.
The lower triangular N0-by-N0 upper block of the panel is stored in
transpose form.
Bi-directional exchange is used to perform the swap::broadcast
operations at once for one column in the panel. This results in a
lower number of slightly larger messages than usual. On P processes
and assuming bi-directional links, the running time of this function
can be approximated by (when N is equal to N0):
N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
N0^2 * ( M - N0/3 ) * gam2-3
where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm makes it possible to achieve
almost Level 3 BLAS performance in the panel factorization. On a
large number of modern machines, however, this operation is latency
bound, meaning that its cost can be estimated by the latency portion
alone, N0 * log_2(P) * lat. Mono-directional links will double this
communication cost.
Arguments
PANEL (local input/output) HPL_T_panel *
On entry, PANEL points to the data structure containing the
panel information.
M (local input) const int
On entry, M specifies the local number of rows of sub(A).
N (local input) const int
On entry, N specifies the local number of columns of sub(A).
ICOFF (global input) const int
On entry, ICOFF specifies the row and column offset of sub(A)
in A.
WORK (local workspace) double *
On entry, WORK is a work array of size at least 2*(4+2*N0).
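The minimum workspace size stated for WORK can be computed before allocating the buffer. This is a minimal sketch with hypothetical helper names. It assumes the size is counted in double elements, which is inferred from the double * type and not stated explicitly above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not an HPL routine): minimum number of elements
 * required for WORK, i.e. 2*(4+2*N0). Counting in doubles is an assumption. */
size_t min_work_len(int n0)
{
    assert(n0 >= 0);
    return 2u * (4u + 2u * (size_t)n0);
}

/* Allocates a workspace of the minimum size; returns NULL on failure.
 * The caller releases it with free(). */
double *alloc_work(int n0)
{
    return malloc(min_work_len(n0) * sizeof(double));
}
```

For example, N0 = 64 gives a minimum of 264 elements.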
See Also
HPL_dlocmax,
HPL_dlocswpN,
HPL_dlocswpT,
HPL_pdmxswp,
HPL_pdpancrN,
HPL_pdpancrT,
HPL_pdpanllN,
HPL_pdpanllT,
HPL_pdpanrlN,
HPL_pdpanrlT,
HPL_pdrpancrN,
HPL_pdrpanllN,
HPL_pdrpanllT,
HPL_pdrpanrlN,
HPL_pdrpanrlT,
HPL_pdfact