Goals - Port LAPACK to Distributed-Memory Environments.
Efficiency
- Optimized compute and communication engines
- Block-partitioned algorithms (Level 3 BLAS) exploit the memory hierarchy and yield good per-node performance
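The idea behind block partitioning can be sketched with a tiled matrix multiply. The matrices are split into nb x nb blocks, and each block of C is updated from one block of A and one block of B at a time. The working set therefore stays small enough to remain in fast memory while it is reused. This is a minimal Python illustration, assuming square tiles of a hypothetical size `nb`; the function name `blocked_matmul` is also hypothetical. It is not ScaLAPACK code. In practice this block update is delegated to an optimized Level 3 BLAS routine such as DGEMM.

```python
def blocked_matmul(A, B, nb=2):
    """Compute C = A * B tile by tile (illustrative sketch, not a BLAS routine).

    A is m x k, B is k x n, both given as lists of rows; nb is a
    hypothetical tile size chosen so that three tiles fit in cache.
    """
    m, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions of A and B must match")
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]
    # Loop over blocks of C (ii, jj) and over the shared dimension (kk).
    for ii in range(0, m, nb):
        for jj in range(0, n, nb):
            for kk in range(0, k, nb):
                # Block update: C[ii:, jj:] += A[ii:, kk:] * B[kk:, jj:],
                # clipped at the matrix edges for partial tiles.
                for i in range(ii, min(ii + nb, m)):
                    for p in range(kk, min(kk + nb, k)):
                        a = A[i][p]
                        for j in range(jj, min(jj + nb, n)):
                            C[i][j] += a * B[p][j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(A, B))  # [[58.0, 64.0], [139.0, 154.0]]
```

Each tile of A and B is reused across a whole block of C, so more arithmetic is done per word moved from memory. That reuse is what lets Level 3 BLAS approach peak node performance.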
Reliability
- Whenever possible, use LAPACK algorithms and error bounds.
Scalability
- Maintain performance as the problem size and number of processors grow
- Replace LAPACK algorithms that did not scale; incorporate the new ones back into LAPACK
Portability
- Isolate machine dependencies in the BLAS and the BLACS
Flexibility
- Modularity: build a rich set of linear algebra tools (BLAS, BLACS, PBLAS)
Ease-of-Use
- Calling interface similar to LAPACK