<html>
<head>
<title>papers</title>
<meta name="waisindex" value="nse">
</head>
<body>
<h1>papers</h1>
<p>
Click <A HREF="http://www.netlib.org/master_counts2.html#papers">here</A> to see the number of accesses to this library.
<p><hr>
<pre>

# ====== index for papers ======

file	<a href="advarch">advarch</a>

file	<a href="advarch-post">advarch-post</a>

file	<a href="army3">army3</a>
for	Performance and Library Issues for Mathematical Software on High 
,	Performance Computers.
,	Abstract
,	This paper discusses 
,	some of the fundamental issues facing designers of mathematical software 
,	libraries for medium scale parallel processors such as the CRAY X-MP-4
,	and the Denelcor HEP. We discuss the problems that arise with performance 
,	and demonstrate that it may be appropriate to exploit parallelism at all 
,	levels of the program, not just at the highest level.  We give performance 
,	measurements indicating the efficiency of a linear algebra library written 
,	in terms of a few high level modules.  These modules, chosen at the
,	matrix-vector level, extend the concept of the BLAS [13] and provide enough
,	computational granularity to allow efficient implementations on a wide 
,	variety of architectures.  Only three modules must be recoded for
,	efficiency in order to transport the library to various machines.   We
,	report experience on machines as diverse as the CRAY X-MP and the Denelcor 
,	HEP.  Finally, we report on some special algorithms for the HEP which 
,	take advantage of the fine grain parallelism capabilities.
by	Jack Dongarra and D.C. Sorensen

file	<a href="band8">band8</a>

file	<a href="blas2-paper">blas2-paper</a>

file	<a href="blas2-post">blas2-post</a>

file	<a href="blas2.alg">blas2.alg</a>

file	<a href="blas3-post">blas3-post</a>

file	<a href="crayxmp">crayxmp</a>
by	Jack J. Dongarra and Tom Hewitt
for	Implementing Dense Linear Algebra Algorithms Using Multitasking on the
,	CRAY X-MP-4 (or Approaching the Gigaflop)

file	<a href="directory">directory</a>

file	<a href="experience2">experience2</a>
by	Steve C. Chen, Jack J. Dongarra, and Christopher C. Hsiung
for	Multiprocessing Linear Algebra Algorithms on the CRAY X-MP-2:
,	Experiences with Small Granularity
,	Abstract
,	This paper gives a brief overview of the CRAY X-MP-2 general-purpose 
,	multiprocessor system and discusses how it can be used effectively to 
,	solve problems that have small granularity. An implementation is described
,	for linear algebra algorithms that solve systems of linear equations when
,	the matrix is general and when the matrix is symmetric and positive definite.

file	<a href="jack6-21">jack6-21</a>

file	<a href="japan3">japan3</a>

file	<a href="lapack-post">lapack-post</a>

file	<a href="netlib">netlib</a>
by	JACK J. DONGARRA and ERIC GROSSE
for	DISTRIBUTION OF MATHEMATICAL SOFTWARE VIA ELECTRONIC MAIL
,	A large collection of public-domain mathematical software
,	is now available via electronic mail.  Messages sent to "netlib@anl-mcs"
,	(on the Arpanet/CSNET) or to "research!netlib" (on the UNIX&reg; network)
,	wake up a server that distributes items from the collection.
,	For example, the one-line message "send index" gets a library catalog by 
,	return mail.  We describe how to use the service and some of the issues
,	in its implementation.

file	<a href="perform">perform</a>
by	Jack J. Dongarra
for	Performance of Various Computers Using Standard Linear Equations
,	Software in a Fortran Environment
,	Abstract - This note compares the performance of different computer systems
,	while solving dense systems of linear equations using the LINPACK
,	software in a Fortran environment. About 100 computers, ranging from 
,	a CRAY X-MP through 68000-based systems 
,	such as the Apollo and SUN workstations to IBM PCs, are compared.

file	<a href="perform-post">perform-post</a>

file	<a href="pipeline">pipeline</a>
by	J.J. Dongarra, F.G. Gustavson and A. Karp 
for	Implementing Linear Algebra Algorithms for Dense Matrices
,	on a Vector Pipeline Machine
,	Abstract - This paper examines common implementations of linear algebra 
,	algorithms, such as matrix-vector multiplication, matrix-matrix
,	multiplication and the solution of linear equations. The different versions 
,	are examined for efficiency on a computer architecture which uses
,	vector processing and has pipelined instruction execution.  By using the 
,	advanced architectural features of such machines, one can usually achieve 
,	maximum performance, and tremendous improvements in terms of execution 
,	speed can be seen over conventional computers.

file	<a href="prospect-post">prospect-post</a>

file	<a href="sblas">sblas</a>
by	Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson
for	A Proposal for an Extended Set of Fortran Basic Linear Algebra Subprograms
,	Abstract
,	This paper describes an extension to the set of Basic Linear Algebra 
,	Subprograms. The extensions proposed are targeted at matrix-vector 
,	operations, which should provide for more efficient and portable 
,	implementations of algorithms for high performance computers.

file	<a href="sblas-toms">sblas-toms</a>

file	<a href="schedule2">schedule2</a>

file	<a href="squeez-eig">squeez-eig</a>
by	Jack J. Dongarra, Linda Kaufman, and Sven Hammarling
for	Squeezing the Most out of Eigenvalue Solvers on High-Performance Computers
,	Abstract
,	This paper describes modifications to many of the standard algorithms
,	used in computing eigenvalues and eigenvectors of matrices.  These
,	modifications can dramatically increase the performance of the
,	underlying software on high performance computers without resorting to
,	assembler language, without significantly influencing the floating
,	point operation count, and without affecting the roundoff error
,	properties of the algorithms.
,	The techniques are applied to a wide variety of algorithms
,	and are beneficial in various architectural settings.

file	<a href="squeezing">squeezing</a>
by	Jack J. Dongarra and Stanley C. Eisenstat
for	Squeezing the Most out of an Algorithm in CRAY Fortran
,	Abstract
,	This paper describes a technique for achieving super-vector performance on
,	a CRAY-1 in a purely Fortran environment (i.e., without resorting to 
,	assembler language).  The technique can be applied to a wide variety of 
,	algorithms in linear algebra, and is beneficial in other architectural 
,	settings.

file	<a href="super-comp">super-comp</a>
by	Jack J. Dongarra and Alan Hinds
for	Comparison of the CRAY X-MP-4, Fujitsu VP-200, and Hitachi S-810/20:
,	An Argonne Perspective
,	Abstract
,	A set of programs, gathered from major Argonne computer users, 
,	was run on the current generation of supercomputers:
,	the CRAY X-MP-4, Fujitsu VP-200, and Hitachi S-810/20.
,	The results show that a single processor of a CRAY X-MP-4 
,	is a consistently strong performer over a wide range of problems.
,	The Fujitsu and Hitachi computers excel on highly vectorized programs and
,	offer an attractive opportunity to sites with IBM-compatible computers.

file	<a href="symeig">symeig</a>
by	J.J. Dongarra and D. C. Sorensen
for	A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem
,	In this paper we present a parallel algorithm for the symmetric algebraic
,	eigenvalue problem.  The algorithm is based upon a divide and conquer scheme
,	suggested by Cuppen for computing the eigensystem of a symmetric tridiagonal
,	matrix.  We extend this idea to obtain a parallel algorithm 
,	that retains a number of active parallel processes that is greater than or 
,	equal to the initial number throughout the course of the computation.  
,	We give a new deflation technique which together with a robust root finding 
,	technique will assure computation of an eigensystem to full accuracy in the 
,	residuals and in the orthogonality of eigenvectors.   A brief analysis of 
,	the numerical properties and sensitivity to round off error is presented to 
,	indicate where numerical difficulties may occur.  The algorithm is able to 
,	exploit parallelism at all levels of the computation and is well suited to 
,	a variety of architectures.  Computational results are presented for several
,	machines.  These results are very encouraging with respect to both accuracy 
,	and speedup. A surprising result is that the parallel algorithm, even when 
,	run in serial mode, can be significantly faster than the previously best 
,	sequential algorithm on large problems, and is effective on moderate size 
,	problems when run in serial mode.

file	<a href="tred-post">tred-post</a>

file	<a href="tredb">tredb</a>

file	<a href="tredb.post">tredb.post</a>
</pre>
</body>
</html>