The performance of the Level 2 PBLAS routines depends on the performance of the Level 2 BLAS routines, which in turn depends on the bulk transfer rate from main memory.
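To make the dependence on memory bandwidth concrete, here is a minimal Python sketch of a bandwidth-bound ceiling for a matrix-vector multiply. It rests on an idealizing assumption of ours, not on any measurement in this guide: every entry of the N-by-N matrix A is read from main memory exactly once while x and y stay in cache, so roughly 2 flops are performed per word moved. The function name and the 1 GB/s figure in the usage note are hypothetical.

```python
def gemv_bandwidth_bound_mflops(bandwidth_bytes_per_s: float,
                                word_bytes: int = 8) -> float:
    """Hypothetical helper: upper bound on the Mflop/s of
    y <- alpha*A*x + beta*y when A must be streamed from main memory.

    Assumption: each of the N*N entries of A is read exactly once and
    x and y stay in cache, so about N*N words are moved for about
    2*N*N flops, i.e. roughly 2 flops per word of A.
    """
    words_per_s = bandwidth_bytes_per_s / word_bytes
    flops_per_word = 2.0  # one multiply and one add per entry of A
    return flops_per_word * words_per_s / 1e6


if __name__ == "__main__":
    # Hypothetical 1 GB/s memory system with 64-bit (8-byte) words.
    print(gemv_bandwidth_bound_mflops(1e9))
```

Under these assumptions a 1 GB/s memory system caps 64-bit matrix-vector multiply at about 250 Mflop/s no matter how fast the floating-point units are; caches and vector registers change the constant, not the fact that the rate tracks bandwidth.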
Table 5.6: Speed in Mflop/s for the PBLAS matrix-vector multiply routine PSGEMV/PDGEMV
Table 5.6 shows execution rates for the 64-bit matrix-vector multiply PBLAS routine PSGEMV/PDGEMV. The rates listed are for the matrix-vector product $y \leftarrow \alpha A x + \beta y$, where $A$ is a square matrix of order $N$, and $x$ and $y$ are vectors that are both distributed over a process column.
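As a reference for what each entry of Table 5.6 measures, the following Python sketch spells out the serial semantics of $y \leftarrow \alpha A x + \beta y$ and converts a run time into Mflop/s. It is only an illustration: the function names are hypothetical, the code is plain serial Python rather than a call into PBLAS, and it assumes the conventional count of $2N^2$ flops per product, which we have not confirmed is the exact count behind the table.

```python
import time
from typing import List


def gemv(alpha: float, a: List[List[float]], x: List[float],
         beta: float, y: List[float]) -> List[float]:
    """Hypothetical serial reference: return alpha*A*x + beta*y,
    with the square matrix A stored as a list of rows."""
    n = len(a)
    return [alpha * sum(a[i][j] * x[j] for j in range(n)) + beta * y[i]
            for i in range(n)]


def gemv_mflops(n: int, seconds: float) -> float:
    """Rate in Mflop/s, assuming the conventional 2*N*N flop count."""
    return 2.0 * n * n / seconds / 1e6


if __name__ == "__main__":
    n = 200
    a = [[1.0] * n for _ in range(n)]
    x = [1.0] * n
    y = [0.0] * n
    start = time.perf_counter()
    y = gemv(1.0, a, x, 0.0, y)
    elapsed = time.perf_counter() - start
    print(f"N={n}: {gemv_mflops(n, elapsed):.1f} Mflop/s (serial Python, not PBLAS)")
```

The rate printed here reflects the Python interpreter, not the hardware; the point is only the bookkeeping: the work grows as $N^2$ while the data touched also grows as $N^2$.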
The Level 3 PBLAS are not necessarily limited by memory bandwidth, because they perform many flops for each word of data involved: a matrix-matrix multiply of order $N$ performs $O(N^3)$ flops on only $O(N^2)$ data. Their flop rates are correspondingly higher.
Table 5.7: Speed in Mflop/s for the PBLAS matrix-matrix multiply routine PSGEMM/PDGEMM
Table 5.7 shows the performance results obtained by the general matrix-matrix multiply PBLAS routine PSGEMM/PDGEMM. These results were obtained for the matrix-matrix multiply operation $C \leftarrow \alpha A B + \beta C$, where $A$, $B$, and $C$ are square matrices of order $N$.
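The contrast between the Level 2 and Level 3 cases can be quantified as flops per word of memory traffic. The Python sketch below is a minimal, idealized model of our own, not data from Tables 5.6 or 5.7: it assumes the conventional flop counts ($2N^3$ for GEMM, $2N^2$ for GEMV) and the minimum possible traffic (each operand read once, each output written once). Real blocked implementations move more data than this, but the trend is the same. All function names are hypothetical.

```python
def gemm_flops(n: int) -> int:
    """Conventional flop count for C <- alpha*A*B + beta*C, order N."""
    return 2 * n ** 3


def gemm_flops_per_word(n: int) -> float:
    """Idealized ratio: A, B, and C read once and C written once,
    i.e. 4*N*N words, so the ratio grows like N/2."""
    return gemm_flops(n) / (4 * n * n)


def gemv_flops_per_word(n: int) -> float:
    """Idealized ratio for y <- alpha*A*x + beta*y: about 2*N*N flops on
    N*N + 3*N words (A, x, y read; y written), never reaching 2."""
    return 2 * n * n / (n * n + 3 * n)


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(f"N={n:5d}  GEMV {gemv_flops_per_word(n):6.3f}  "
              f"GEMM {gemm_flops_per_word(n):8.1f} flops/word")
```

Because the GEMM ratio grows with $N$ while the GEMV ratio stays below 2, each word brought from memory can be reused many times in cache or registers, which is why PSGEMM/PDGEMM can run much closer to peak than PSGEMV/PDGEMV.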