# ====== index for papers ======

file advarch
file advarch-post

file army3
for Performance and Library Issues for Mathematical Software on High
, Performance Computers.
by Jack Dongarra and D.C. Sorensen
, Abstract
, This paper discusses
, some of the fundamental issues facing designers of mathematical software
, libraries for medium scale parallel processors such as the CRAY X-MP-4
, and the Denelcor HEP. We discuss the problems that arise with performance
, and demonstrate that it may be appropriate to exploit parallelism at all
, levels of the program, not just at the highest level. We give performance
, measurements indicating the efficiency of a linear algebra library written
, in terms of a few high level modules. These modules chosen at the matrix
, vector level extend the concept of the BLAS [13] and provide enough
, computational granularity to allow efficient implementations on a wide
, variety of architectures. Only three modules must be recoded for
, efficiency in order to transport the library to various machines. We
, report experience on machines as diverse as the CRAY X-MP and the Denelcor
, HEP. Finally, we report on some special algorithms for the HEP which
, take advantage of the fine grain parallelism capabilities.

file band8
file blas2-paper
file blas2-post
file blas2.alg
file blas3-post

file crayxmp
for Implementing Dense Linear Algebra Algorithms Using Multitasking on the
, CRAY X-MP-4 (or Approaching the Gigaflop)
by Jack J. Dongarra and Tom Hewitt

file directory

file experience2
for Multiprocessing Linear Algebra
, Algorithms on the CRAY X-MP-2:
, Experiences with Small Granularity
by Steve C. Chen, Jack J. Dongarra, and Christopher C. Hsiung
, Abstract
, This paper gives a brief overview of the CRAY X-MP-2 general-purpose
, multiprocessor system and discusses how it can be used effectively to
, solve problems that have small granularity. An implementation is described
, for linear algebra algorithms that solve systems of linear equations when
, the matrix is general and when the matrix is symmetric and positive definite.

file jack6-21
file japan3
file lapack-post

file netlib
for DISTRIBUTION OF MATHEMATICAL SOFTWARE VIA ELECTRONIC MAIL
by JACK J. DONGARRA and ERIC GROSSE
, A large collection of public-domain mathematical software
, is now available via electronic mail. Messages sent to "netlib@anl-mcs"
, (on the Arpanet/CSNET) or to "research!netlib" (on the UNIX® network)
, wake up a server that distributes items from the collection.
, For example, the one-line message "send index" gets a library catalog by
, return mail. We describe how to use the service and some of the issues
, in its implementation.

file perform
for Performance of Various Computers Using Standard
, Linear Equations Software in a Fortran Environment
by Jack J. Dongarra
, Abstract - This note compares the performance of different computer systems
, while solving dense systems of linear equations using the LINPACK
, software in a Fortran environment. About 100 computers, ranging from
, a CRAY X-MP to the 68000 based systems
, such as the Apollo and SUN Workstations to IBM PC's, are compared.

file perform-post

file pipeline
for Implementing Linear Algebra Algorithms for Dense Matrices
, on a Vector Pipeline Machine
by J.J. Dongarra, F.G. Gustavson and A. Karp
, Abstract - This paper examines common implementations of linear algebra
, algorithms; such as matrix-vector multiplication, matrix-matrix
, multiplication and the solution of linear equations. The different versions
, are examined for efficiency on a computer architecture which uses
, vector processing and has pipelined instruction execution. By using the
, advanced architectural features of such machines, one can usually achieve
, maximum performance, and tremendous improvements in terms of execution
, speed can be seen over conventional computers.

file prospect-post

file sblas
for A Proposal for an
, Extended Set of Fortran Basic Linear Algebra Subprograms
by Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson
, Abstract
, This paper describes an extension to the set of Basic Linear Algebra
, Subprograms. The extensions proposed are targeted at matrix vector
, operations which should provide for more efficient and portable
, implementations of algorithms for high performance computers.

file sblas-toms
file schedule2

file squeez-eig
for Squeezing the Most out of Eigenvalue Solvers on High-Performance Computers
by Jack J. Dongarra, Linda Kaufman and Sven Hammarling
, Abstract
, This paper describes modifications to many of the standard
, algorithms used in computing eigenvalues and eigenvectors of matrices.
, These modifications can dramatically
, increase the performance of the underlying software
, on high performance computers
, without resorting to assembler language, without significantly
, influencing the
, floating point operation count, and without affecting the
, roundoff error properties of the algorithms.
, The techniques are applied to a wide variety of algorithms
, and are beneficial in various architectural settings.

file squeezing
for Squeezing the Most out of an Algorithm
, in CRAY Fortran
by Jack J. Dongarra and Stanley C. Eisenstat
, Abstract
, This paper describes a technique for achieving super-vector performance on
, a CRAY-1 in a purely Fortran environment (i.e., without resorting to
, assembler language). The technique can be applied to a wide variety of
, algorithms in linear algebra, and is beneficial in other architectural
, settings.

file super-comp
for Comparison of the CRAY X-MP-4, Fujitsu VP-200, and Hitachi S-810/20:
, An Argonne Perspective
by Jack J. Dongarra and Alan Hinds
, Abstract
, A set of programs, gathered from major Argonne computer users,
, was run on the current generation of supercomputers:
, the CRAY X-MP-4, Fujitsu VP-200, and Hitachi S-810/20.
, The results show that a single processor of a CRAY X-MP-4
, is a consistently strong performer over a wide range of problems.
, The Fujitsu and Hitachi computers excel on highly vectorized programs and
, offer an attractive opportunity to sites with IBM-compatible computers.

file symeig
for A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem
by J.J. Dongarra and D. C. Sorensen
, In this paper we present a parallel algorithm for the symmetric algebraic
, eigenvalue problem. The algorithm is based upon a divide and conquer scheme
, suggested by Cuppen for computing the eigensystem of a symmetric tridiagonal
, matrix. We extend this idea to obtain a parallel algorithm
, that retains a number of active parallel processes that is greater than or
, equal to the initial number throughout the course of the computation.
, We give a new deflation technique which together with a robust root finding
, technique will assure computation of an eigensystem to full accuracy in the
, residuals and in the orthogonality of eigenvectors. A brief analysis of
, the numerical properties and sensitivity to round off error is presented to
, indicate where numerical difficulties may occur. The algorithm is able to
, exploit parallelism at all levels of the computation and is well suited to
, a variety of architectures. Computational results are presented for several
, machines. These results are very encouraging with respect to both accuracy
, and speedup. A surprising result is that the parallel algorithm, even when
, run in serial mode, can be significantly faster than the previously best
, sequential algorithm on large problems, and is effective on moderate size
, problems when run in serial mode.

file tred-post
file tredb
file tredb.post