PBLAS -- Example Programs
This implementation of SUMMA was first suggested by:
- R. Agarwal, F. Gustavson, and M. Zubair, A High Performance Matrix Multiplication Algorithm on a Distributed-Memory Parallel Computer, Using Overlapped Communication, IBM Journal of Research and Development, Vol. 38, No. 6, pp. 673--681, 1994.
For a scalability analysis of this algorithm, see:
- R. van de Geijn and J. Watts, SUMMA: Scalable Universal Matrix Multiplication Algorithm, UT Tech Report CS-95-286, LAPACK Working Note #96, 1995.