References
Next: About this document
Up: The Spectral Decomposition of
Previous: Acknowledgements
References
- 1
-
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen.
LAPACK Users' Guide, Release 1.0.
SIAM, Philadelphia, 1992.
235 pages.
- 2
-
E. Anderson, Z. Bai, and J. Dongarra.
Generalized QR factorization and its applictions.
Lin. Alg. Appl., 162-164:243-271, 1992.
- 3
-
L. Auslander and A. Tsao.
On parallelizable eigensolvers.
Advances in Applied Mathematics, 13:253-261, 1992.
- 4
-
Z. Bai and J. Demmel.
Design of a parallel nonsymmetric eigenroutine toolbox, Part II.
in preparation.
- 5
-
Z. Bai and J. Demmel.
On a block implementation of Hessenberg multishift QR iteration.
International Journal of High Speed Computing, 1(1):97-112,
1989.
(also LAPACK Working Note #8).
- 6
-
Z. Bai and J. Demmel.
Design of a parallel nonsymmetric eigenroutine toolbox, Part I.
In Proceedings of the Sixth SIAM Conference on Parallel
Proceesing for Scientific Computing. SIAM, 1993.
Long version available as UC Berkeley Computer Science report
all.ps.Z via anonymous ftp from tr-ftp.cs.berkeley.edu, directory
pub/tech-reports/csd/csd-92-718.
- 7
-
Z. Bai, J. Demmel, and M. Gu.
Inverse free parallel spectral divide and conquer algorithms for
nonsymmetric eigenproblems.
Computer Science Division Report UCB//CSD-94-793, UC Berkeley,
February 1994.
available by anonymous ftp to tr-ftp.cs.berkeley.edu in directory
pub/tech-reports/csd/csd-94-79.
- 8
-
A. N. Beavers and E. D. Denman.
A computational method for eigenvalue and eigenvectors of a matrix
with real eigenvalues.
Numer. Math., 21:389-396, 1973.
- 9
-
C. Bischof and X. Sun.
A divide and conquer method for tridiagonalizing symmetric matrices
with repeated eigenvalues.
MCS Report P286-0192, Argonne National Lab, 1992.
- 10
-
A. Ya. Bulgakov and S. K. Godunov.
Circular dichotomy of the spectrum of a matrix.
Siberian Math. J., 29(5):734-744, 1988.
- 11
-
R. Byers.
Numerical stability and instability in matrix sign function based
algorithms.
In C. Byrnes and A. Lindquist, editors, Computational and
Combinatorial Methods in Systems Theory, pages 185-200. North-Holland,
1986.
- 12
-
R. Byers.
Solving the algebraic Riccati equation with the matrix sign
function.
Lin. Alg. Appl., 85:267-279, 1987.
- 13
-
S. Chakrabarti, J. Demmel, and K. Yelick.
On the benefit of mixed parallelism.
Computer science division, University of California, 1994.
submitted to IPPS.
- 14
-
T. Chan.
Rank revealing QR factorizations.
Lin. Alg. Appl., 88/89:67-82, 1987.
- 15
-
J. Choi, J. Dongarra, and D. Walker.
PB-BLAS: A set of Parallel Block Basic Linear
Algebra Subprograms.
University of Tennessee, Knoxville, TN, 1993.
available in postscript from netlib/scalapack.
- 16
-
J. Choi, J. Dongarra, and D. Walker.
PUMMA: Parallel universal matrix multiplication algorithms on
distributed memory concurrent computers.
Computer Science Dept. Technical Report CS-93-187,
University of Tennessee, Knoxville, 1993.
(LAPACK Working Note #57).
- 17
-
M. Chu.
A note on the homotopy method for linear algebraic eigenvalue
problems.
Lin. Alg. Appl, 105:225-236, 1988.
- 18
-
T. Coleman and P. Plassman.
A parallel nonlinear least-squares solver: Theoretical analysis and
numerical results.
SIAM J. Sci. Comput., 13(3):771-793, 1992.
- 19
-
J. Demmel.
Trading off parallelism and numerical stability.
In G. Golub M. Moonen and B. de Moor, editors, Linear Algebra
for Large Scale and Real-Time Applications, pages 49-68. Kluwer Academic
Publishers, 1993.
NATO-ASI Series E: Applied Sciences, Vol. 232; Available as all.ps.Z
via anonymous ftp from tr-ftp.cs.berkeley.edu, in directory
pub/tech-reports/csd/csd-92-702.
- 20
-
J. Demmel, M. Heath, and H. van der Vorst.
Parallel numerical linear algebra.
In A. Iserles, editor, Acta Numerica, volume 2. Cambridge
University Press, 1993.
- 21
-
J. Dongarra.
Performance of various computers using standard linear equations
software.
Computer science dept. technical report, University of Tennessee,
Knoxville, TN, January 1994.
up-to-date version available in netlib/benchmark.
- 22
-
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling.
A set of Level 3 Basic Linear Algebra Subprograms.
ACM Trans. Math. Soft., 16(1):1-17, March 1990.
- 23
-
J. Dongarra, J. Du Croz, S. Hammarling, and Richard J. Hanson.
An Extended Set of FORTRAN Basic Linear Algebra
Subroutines.
ACM Trans. Math. Soft., 14(1):1-17, March 1988.
- 24
-
J. Dongarra and M. Sidani.
A parallel algorithm for the non-symmetric eigenvalue problem.
SIAM J. Sci. Comp., 14(3):542-569, May 1993.
- 25
-
J. Dongarra, R. van de Geijn, and C. Whaley.
A Users' Guide to the BLACS.
University of Tennessee, Knoxville, TN, 1993.
available in postscript from netlib/scalapack.
- 26
-
J. Dongarra and D. Walker.
The design of linear algebra libraries for high performance
computers.
Computer Science Dept. Technical Report CS-93-188,
University of Tennessee, Knoxville, June 1993.
(LAPACK Working Note #58).
- 27
-
A. Dubrulle.
The multishift QR algorithm: is it worth the trouble?
Palo Alto Scientific Center Report G320-3558x, IBM Corp., 1530 Page
Mill Road, Palo Alto, CA 94304, 1991.
- 28
-
P. Eberlein.
On the Schur decomposition of a matrix for parallel computation.
IEEE Trans. Comput., 36:167-174, 1987.
- 29
-
G. A. Geist and G. J. Davis.
Finding eigenvalues and eigenvectors of unsymmetric matrices using a
distributed memory multiprocessor.
Parallel Computing, 13(2):199-209, 1990.
- 30
-
S. K. Godunov.
Problem of the dichotomy of the spectrum of a matrix.
Siberian Math. J., 27(5):649-660, 1986.
- 31
-
M. Gu and S. Eisenstat.
An efficient algorithm for computing a rank-revealing QR
decomposition.
Computer Science Dept. Report YALEU/DCS/RR-967, Yale University, June
1993.
- 32
-
G. Henry and R. van de Geijn.
Parallelizing the QR algorithm for the unsymmetric algebraic
eigenvalue problem: myths and reality.
Computer Science Dept. Technical Report CS-94-244,
University of Tennessee, Knoxville, August 1994.
(LAPACK Working Note #79).
- 33
-
N. J. Higham.
Computing the polar decomposition - with applications.
SIAM J. Sci. Stat. Comput., 7:1160-1174, 1986.
- 34
-
P. Hong and C. T. Pan.
The rank revealing QR and SVD.
Math. Comp., 58:575-232, 1992.
- 35
-
J. Howland.
The sign matrix and the separation of matrix eigenvalues.
Lin. Alg. Appl., 49:221-232, 1983.
- 36
-
S. Huss-Lederman, A. Tsao, and G. Zhang.
A parallel implementation of the invariant subspace decomposition
algorithm for dense symmetric matrices.
In Proceedings of the Sixth SIAM Conference on Parallel
Proceesing for Scientific Computing. SIAM, 1993.
- 37
-
C. Kenney and A. Laub.
Rational iteration methods for the matrix sign function.
SIAM J. Mat. Anal. Appl., 21:487-494, 1991.
- 38
-
C. Lawson, R. Hanson, D. Kincaid, and F. Krogh.
Basic Linear Algebra Subprograms for Fortran usage.
ACM Trans. Math. Soft., 5:308-323, 1979.
- 39
-
T.-Y. Li and Z. Zeng.
Homotopy-determinant algorithm for solving nonsymmetric eigenvalue
problems.
Math. Comp., 59(200):483-502, October 1992.
- 40
-
T.-Y. Li, Z. Zeng, and L. Cong.
Solving eigenvalue problems of nonsymmetric matrices with real
homotopies.
SIAM J. Num. Anal., 29(1):229-248, 1992.
- 41
-
S. Lillevik.
The Touchstone 30 Gigaflop DELTA prototype.
In Sixth Distributed Memory Computing Conference
Proceedings. IEEE Computer Society Press, 1991.
- 42
-
C-C. Lin and E. Zmijewski.
A parallel algorithm for computing the eigenvalues of an unsymmetric
matrix on an SIMD mesh of processors.
Department of Computer Science TRCS 91-15, University of California,
Santa Barbara, CA, July 1991.
- 43
-
A. Littlefield.
Characterizing and tuning communication performance on the
Touchstone DELTA and iPSC/860.
In Proceedings of the 1992 Intel User's Group Meeting,
Dallas, TX, October 4-7 1992. Intel Corp.
- 44
-
A. N. Malyshev.
Parallel algorithm for solving some spectral problems of linear
algebra.
Lin. Alg. Appl., 188,189:489-520, 1993.
- 45
-
C. Paige.
Some aspects of generalized QR factorization.
In M. Cox and S. Hammarling, editors, Reliable Numerical
Computations. Clarendon Press, Oxford, 1990.
- 46
-
J. Roberts.
Linear model reduction and solution of the algebraic Riccati
equation.
Inter. J. Control, 32:677-687, 1980.
- 47
-
A. Sameh.
On Jacobi and Jacobi-like algorithms for a parallel computer.
Math. Comp., 25:579-590, 1971.
- 48
-
G. Shroff.
A parallel algorithm for the eigenvalues and eigenvectors of a
general complex matrix.
Num. Math., 58:779-805, 1991.
- 49
-
G. W. Stewart.
A Jacobi-like algorithm for computing the Schur decomposition of
a non-Hermitian matrix.
SIAM J. Sci. Stat. Comput., 6:853-864, 1985.
- 50
-
G. W. Stewart.
A parallel implementation of the QR algorithm.
Parallel Computing, 5:187-196, 1987.
- 51
-
G. W. Stewart.
Updating a rank-revealing ULV decomposition.
SIAM J. Mat. Anal. Appl., 14(2):494-499, April 1993.
- 52
-
Thinking Machines Corporation.
CM Fortran Reference Manual, December 1992.
- 53
-
Thinking Machines Corporation.
The Connection Machine CM-5 Technical Summary, 1992.
- 54
-
Thinking Machines Corporation.
CMSSL for CM Fortran: CM-5 Edition, version 3.1,
1993.
- 55
-
R. van de Geijn.
Implementing the QR Algorithm on an Array of Processors.
PhD thesis, University of Maryland, College Park, August 1987.
Computer Science Department Report TR-1897.
- 56
-
R. van de Geijn and D. Hudson.
Efficient parallel implementation of the nonsymmetric QR algorithm.
In J. Gustafson, editor, Hypercube Concurrent Computers and
Applications. ACM, 1989.
- 57
-
D. Watkins.
Shifting strategies for the parallel QR algorithm.
Dept. of pure and applied math. report, Washington State Univ.,
Pullman, WA, 1992.
- 58
-
R. Clint Whaley.
Basic linear algebra communication subroutines: Analysis and
implementation across multiple parallel architectures.
Technical report, University of Tennessee, Knoxville, June 1994.
(LAPACK Working Note #73).
Appendix: Performance data
In this appendix,
we record performance data of PUMMA (version 1.0) for matrix multiplication and
ScaLAPACK 1.0 (beta version) matrix inversion (LU factorization plus
triangular matrix inversion), QR decomposition with and without column
pivoting on Intel Touchstone Delta system and
performance data of CMSSL 3.2 subroutines for matrix multiplication,
matrix inversion, QR decomposition with and without column pivoting
on 32 PEs VUs CM-5. Parts of data are used to draw
Figures 2 and 4.
Table 10: Performance of matrix inversion (LU + Triangular
inversion), blocksize=30.
Table 9: Performance of PUMMA (version 1.0) matrix multiplication, use BLACS,
blocksize=10
Table 12: Performance of QR decomposition with column pivoting.
Table 11: Performance of QR decomposition method for solving the least squares
problem (QR decomposition + Triangular solver), blocksize=6.
Table 14: Performance of matrix multiplication and matrix inversion, and
QRP, CMSSL version 3.2
Table 13: Backward accuracy, timing in seconds and megaflops of the SDC
algorithm with Newton iteration on a 32 PEs with VUs CM-5.
Jack Dongarra
Mon Jan 9 11:07:50 EST 1995