Performance of Selected PBLAS routines

The performance of the Level 2 PBLAS routines depends on the performance of the Level 2 BLAS routines, which in turn depends on the bulk transfer rate from main memory.
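To illustrate why a Level 2 operation tends to be limited by memory transfer, the following minimal sketch counts flops and words for a matrix-vector product with a square matrix of order n. The counting convention (2n^2 flops; each entry of A, x, and y moved once, plus one write of y), the function names, and the bandwidth figure passed in are assumptions made for illustration; they are not taken from the measurements reported here.

```python
def gemv_flops(n: int) -> int:
    """Flops for a matrix-vector product with a square matrix of order n:
    one multiply and one add per matrix entry (assumed convention)."""
    return 2 * n * n


def gemv_words(n: int) -> int:
    """Words moved under the assumed model: n*n for A, n for x,
    n to read y, and n to write y."""
    return n * n + 3 * n


def gemv_intensity(n: int) -> float:
    """Flops performed per word moved."""
    return gemv_flops(n) / gemv_words(n)


def bandwidth_bound_mflops(n: int, words_per_second: float) -> float:
    """Upper bound on Mflop/s if memory traffic were the only cost.
    words_per_second is a hypothetical sustained transfer rate."""
    return gemv_intensity(n) * words_per_second / 1.0e6


if __name__ == "__main__":
    # The intensity stays below 2 flops per word however large n becomes,
    # so the bound scales with the transfer rate, not with the peak flop rate.
    for n in (100, 1000, 10000):
        print(n, gemv_intensity(n), bandwidth_bound_mflops(n, 1.0e8))
```

Under this model the ratio never reaches 2 flops per word, which is why a faster memory system, not a faster floating-point unit, is what raises the achievable rate.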

Table 5.6: Speed in Mflop/s for the PBLAS matrix-vector multiply routine PSGEMV/PDGEMV

Table 5.6 shows execution rates for the 64-bit matrix-vector multiply PBLAS routine PSGEMV/PDGEMV. The rates listed are for a matrix-vector product of the general form $y \leftarrow \alpha A x + \beta y$, where A is a square matrix of order N and x and y are vectors that are both distributed over a process column.
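A rate in Mflop/s for such a product can be derived from the matrix order and a measured wall-clock time. The sketch below assumes the common convention of 2N^2 flops for a square matrix of order N; the convention actually used to produce the table is not stated in this section, and the function name and the timing value in the usage example are hypothetical.

```python
def gemv_mflops(n: int, seconds: float) -> float:
    """Convert a measured time for one matrix-vector product of order n
    into Mflop/s, assuming 2*n*n flops per product."""
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    return 2.0 * n * n / seconds / 1.0e6


if __name__ == "__main__":
    # Hypothetical timing, chosen only to show the arithmetic.
    print(gemv_mflops(2000, 0.05))
```

Because the time is the wall-clock time of the whole parallel operation, a rate computed this way is an aggregate over all processes, not a per-process figure.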

The Level 3 PBLAS are not necessarily limited by memory bandwidth because they perform many floating-point operations for each word involved. Their flop rates are correspondingly higher.

Table 5.7: Speed in Mflop/s for the PBLAS matrix-matrix multiply routine PSGEMM/PDGEMM

Table 5.7 shows the performance results obtained by the general matrix-matrix multiply PBLAS routine PSGEMM/PDGEMM. These results were obtained for a matrix-matrix multiply operation of the general form $C \leftarrow \alpha A B + \beta C$, where A, B, and C are square matrices of order N.
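The point about Level 3 routines performing many flops per word can be made concrete with a small count for square matrices of order n. This is a sketch under stated assumptions: 2n^3 flops per multiply, and an idealized model in which A, B, and C are each read once and C is written once. Real machines with finite caches move more data than this, so the ratio below is an optimistic bound, and the function names are illustrative only.

```python
def gemm_flops(n: int) -> int:
    """Flops for a square matrix-matrix multiply of order n
    (assumed convention: one multiply and one add per inner-product term)."""
    return 2 * n ** 3


def gemm_words(n: int) -> int:
    """Words moved in the idealized model: read A, B, and C once, write C once."""
    return 4 * n * n


def gemm_intensity(n: int) -> float:
    """Flops per word moved; grows linearly with n in this model."""
    return gemm_flops(n) / gemm_words(n)


def gemm_mflops(n: int, seconds: float) -> float:
    """Convert a measured time for one multiply of order n into Mflop/s."""
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    return gemm_flops(n) / seconds / 1.0e6


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, gemm_intensity(n))
```

In this model the ratio is n/2 flops per word, so it keeps growing with the matrix order. That growth is what lets a well-blocked Level 3 routine run closer to the floating-point peak than to the memory-transfer limit.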

Susan Blackford
Tue May 13 09:21:01 EDT 1997