http://www.cs.utexas.edu/users/rvdg/software.html Scalable Universal Matrix Multiplication Algorithm (SUMMA) <title_line>scalable implementation of common matrix multiplication operations <author>Robert A. van de Geijn Department of Computer Sciences University of Texas Austin, TX 78712 Jerrell Watts Scalable Concurrent Programming Laboratory California Institute of Technology Pasadena, California 91125 jwatts@scp.caltech.edu <abstract> We give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance results on the Intel Paragon system. </abstract> <category>numerical-linalg <keywords>GAMS D1b6. Matrix Multiplication </urc>