Obtaining High Performance
Use the best BLAS and BLACS libraries available.
Start with a standard data distribution:
- A square or nearly square processor grid (Pr ≈ Pc) if P >= 9
- A one-dimensional processor grid (Pr=1,Pc=P) if P < 9
- Block size (NB) = 64
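
The starting distribution above can be sketched as a small helper. This is only an illustrative sketch, not part of ScaLAPACK: the function name choose_grid is hypothetical, and picking the largest divisor of P that does not exceed sqrt(P) is an assumed way to get a near-square grid when P is not a perfect square.

```python
import math

NB = 64  # starting block size from the list above


def choose_grid(p):
    """Return (Pr, Pc) for p processes using the rule of thumb above.

    Hypothetical helper for illustration; not a ScaLAPACK or BLACS routine.
    """
    if p < 1:
        raise ValueError("process count must be positive")
    if p < 9:
        # Fewer than 9 processes: one-dimensional grid, Pr = 1, Pc = P
        return 1, p
    # Assumption: take the largest divisor of p that is <= sqrt(p),
    # so that Pr <= Pc and Pr * Pc == p.
    pr = math.isqrt(p)
    while p % pr != 0:
        pr -= 1
    return pr, p // pr


if __name__ == "__main__":
    for p in (4, 9, 12, 16):
        print(p, choose_grid(p), "NB =", NB)
```

With this divisor-based choice, a prime P such as 11 falls back to a 1 x 11 grid; in that case it may be worth trying a slightly smaller process count that factors into a squarer grid.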
Determine whether reasonable performance is being achieved.
Identify the performance bottlenecks, if any.