The performance of the Level 2 PBLAS routines depends on the performance of the Level 2 BLAS routines, which in turn depends on the bulk transfer rate from main memory.

**Table 5.6:** Speed in Mflop/s for the PBLAS matrix-vector multiply routine PSGEMV/PDGEMV

Table 5.6 shows execution rates for the 64-bit matrix-vector multiply PBLAS routine PSGEMV/PDGEMV. The rates listed are for a matrix-vector product, where *A* is a square matrix of order *N* and *x* and *y* are vectors that are both distributed over a process column.
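As a rough illustration of how a rate of this kind is derived, the sketch below converts a measured run time into Mflop/s. It assumes the conventional count of 2*N*² floating-point operations for a matrix-vector product with a square matrix of order *N*. The function name `gemv_mflops` and the timing value are hypothetical and are not taken from Table 5.6.

```python
def gemv_mflops(n: int, seconds: float) -> float:
    """Mflop/s for a matrix-vector product with an n-by-n matrix.

    Assumes the conventional count of 2*n*n floating-point operations
    (one multiply and one add per matrix element); lower-order terms
    such as scaling the vectors are ignored.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * n * n / seconds / 1.0e6


# Hypothetical timing for illustration only, not a measured value.
print(gemv_mflops(2000, 0.5))
```

The same conversion applies to any routine once its operation count is known; only the numerator changes.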

The Level 3 PBLAS are not necessarily limited by memory bandwidth because they perform many flops for each word involved. The flop rate is correspondingly higher.

**Table 5.7:** Speed in Mflop/s for the PBLAS matrix-matrix multiply routine PSGEMM/PDGEMM

Table 5.7 shows the performance results obtained by the general matrix-matrix multiply PBLAS routine PSGEMM/PDGEMM. These results have been obtained for the matrix-matrix multiply operation, where *A*, *B*, and *C* are square matrices of order *N*.
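The difference between the two levels can be made concrete with a simple count of flops per word of data touched. The sketch below is a simplified model, assuming the general BLAS forms y ← αAx + βy and C ← αAB + βC with square operands of order *N*, and counting each operand as read once and each result as written once. It ignores cache reuse and interprocess communication. The function names are hypothetical.

```python
def gemv_flops_per_word(n: int) -> float:
    """Flops per word touched for y <- alpha*A*x + beta*y, A of order n."""
    flops = 2 * n * n        # one multiply-add per element of A
    words = n * n + 3 * n    # read A, x, y; write y
    return flops / words


def gemm_flops_per_word(n: int) -> float:
    """Flops per word touched for C <- alpha*A*B + beta*C, all of order n."""
    flops = 2 * n ** 3       # n multiply-adds per element of C
    words = 4 * n * n        # read A, B, C; write C
    return flops / words


for n in (100, 1000, 4000):
    print(n, round(gemv_flops_per_word(n), 2), gemm_flops_per_word(n))
```

Under this model the matrix-vector ratio stays below 2 for every *N*, while the matrix-matrix ratio grows as *N*/2, which is consistent with the Level 3 routines being less constrained by memory bandwidth.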

