Goals - Port LAPACK to Distributed-Memory Environments.
- Optimized compute and communication engines
- Block-partitioned algorithms (Level 3 BLAS) exploit the memory hierarchy and yield good per-node performance
- Whenever possible, use LAPACK algorithms and error bounds.
- Sustain performance as the problem size and number of processors grow
- Replace LAPACK algorithms that do not scale; put the new ones into LAPACK
- Isolate machine dependencies to BLAS and the BLACS
- Modularity: build a rich set of linear algebra tools (BLAS, BLACS, PBLAS)
- Calling interface similar to LAPACK
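
To make the block-partitioned goal above concrete, here is a minimal single-process sketch of a tiled matrix multiply. It only illustrates the loop structure that lets a Level 3 BLAS kernel reuse data while it sits in fast memory. It is not ScaLAPACK or BLAS code, and the function name `blocked_matmul` and the `block` parameter are hypothetical names chosen for this illustration.

```python
def blocked_matmul(a, b, block=2):
    """Multiply a (n x k) by b (k x m) one tile at a time.

    Illustrative only: `blocked_matmul` and `block` are assumed names,
    not part of LAPACK, ScaLAPACK, or any BLAS.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles of C, A, and B.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # The inner loops update one tile of C using one tile of A
                # and one tile of B, so each tile is reused many times
                # before the loops move on to the next one.
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b))
```

In a real library the innermost tile update would be a single call to an optimized matrix-multiply kernel, and the tile size would be tuned to the cache of the target machine.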