From nacomb@surfer.EPM.ORNL.GOV Sun Sep 29 16:01:11 1991
Return-Path: <nacomb@surfer.EPM.ORNL.GOV>
Received: from surfer.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61++/2.7s-UTK)
	id AA14148; Sun, 29 Sep 91 16:01:02 -0400
Received: by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA01787; Sun, 29 Sep 91 16:00:46 -0400
Date: Sun, 29 Sep 91 16:00:46 -0400
From: nacomb@surfer.EPM.ORNL.GOV (NA-NET)
Message-Id: <9109292000.AA01787@surfer.EPM.ORNL.GOV>
Subject: NA Digest, V. 91, # 39
Apparently-To: dongarra@cs.utk.edu
Status: R

NA Digest   Sunday, September 29, 1991   Volume 91 : Issue 39

Today's Editor: Cleve Moler

Today's Topics:

     Changed address for Sven-Ake Gustafson
     New Address for Alfio Quarteroni
     New Address for Angelika Bunse-Gerstner 
     Student Support for Parallel Circus
     Looking for a Research Position
     Journal Refereeing is Breaking Down
     3-level block-Toeplitz solver
     6th ACM Conference on Supercomputing
     FTPing Technical Reports
     Matrix Calculator with C Syntax
     Errata to "Numerical Recipes" by Press et all
     Special Issue of "Computing" for H.-J. Wacker
     Positions at the University of Minnesota.
     Householder Fellowship at Oak Ridge
     On the Odor of IEEE Arithmetic

Submissions for NA Digest: 
    Mail to na.digest@na-net.ornl.gov.

Information about NA-NET:
    Mail to na.help@na-net.ornl.gov.

-------------------------------------------------------

From: Sven-Ake Gustafson <gustafson@ifi.hsr.no>
Date: Tue, 24 Sep 1991 12:02:00 +0200
Subject: Changed address for Sven-Ake Gustafson

Changed address for Sven-Ake Gustafson:
I have returned from my sabbatical leave and from now
on my address will be:
HSR
Box 2557 Ullandhaug
N-4004 Stavanger
Norway
Phone: (office) +47 4874470
       (home)   +47 4890083
Fax: +47 4874236
email: gustafson@ifi.hsr.no


------------------------------

From: Alfio Quarteroni <ALFQUA@IPMMA1.POLIMI.IT>
Date: Thu, 26 Sep 1991 15:56 MET
Subject: New Address for Alfio Quarteroni

Effective September 1st, I am back at the Politecnico of Milan,
at the following address:

Alfio Quarteroni
Dipartimento di Matematica
Politecnico di Milano
Via Bonardi 9
20133 Milano - Italy
Phone: +39-2-2399-4525
Fax: +39-2-2399-4568
e-mail: alfqua%ipmma1.polimi.it@imiclvx.bitnet


------------------------------

From: Angelika Bunse-Gerstner <abg@informatik.uni-Bremen.de>
Date: Fri, 27 Sep 91 14:48:14 N
Subject: New Address for Angelika Bunse-Gerstner 

I moved from the university of Bielefeld to the university of Bremen, where
I took a new position at the department of mathematics and computer science.
My new address, phone and e-mail are as follows:

Address:
         Angelika Bunse-Gerstner
         Fachbereich Mathematik und Informatik
         Universitaet Bremen
         Postfach 33 04 40
         W-2800 Bremen
         FRG

Phone:   049-421-218-4205
email:   abg@informatik.uni-bremen.de


------------------------------

From: Gene Golub <golub@Cholesky.Stanford.EDU>
Date: Fri, 27 Sep 91 7:57:32 PDT
Subject: Student Support for Parallel Circus

We have recently been awarded a modest grant by the NSF for supporting
the travel and expenses of students attending the Parallel Circus
meetings during the next two years. The next Circus will take place in
Oak Ridge on Oct 25, 26. Those students requesting support should send
a one page proposal to Gene Golub (na.golub@na-net.ornl.gov) giving
the reasons for attending the meeting, and a budget for expenses. The
student(s) should indicate their research interests and plans.
 A letter verifying that the student is in good standing should be
sent independently by a faculty adviser.  This letter should give the
student's GPA.  We will be pleased to consider joint proposals which
would include the expenses of several students.

The deadline is 5pm (PST) on Oct 11 and we will try to get a response
by Oct 15.  Correspondence by e-mail is desirable.

Apostolos Gerasoulis          Gene Golub		Ahmed Sameh

Gene Golub, 
Computer Science Dept.
Office: Margaret Jacks Hall, Room 306
Office Phone: 415/723-3124	Home Phone: 415/323-0105
FAX number: (415) 725-7411

------------------------------

From: Vera Talanker <TALANKER@ALCVAX.PFC.MIT.EDU>
Date: Mon, 23 Sep 91 11:17 EST
Subject: Looking for a Research Position

This spring I will be completing my MS degree in Applied Mathematics
at Colorado School of Mines.  I am looking for a 1-2 year research position
in the field of medicine (image processing, modeling drug delivery, etc.)
in a European country.  Can anyone give me some suggestions where to
start looking?

Thanks,
Vera Talanker:   e-mail: on internet talanker@alcvax.pfc.mit.edu


------------------------------

From: Stephen Vavasis <vavasis@cs.cornell.edu>
Date: Mon, 23 Sep 91 16:08:59
Subject: Journal Refereeing is Breaking Down

It's my impression that the refereeing process in the numerical
community is breaking down.  Papers are taking longer and longer to
get refereed. The reason seems to be that people are writing more
papers while agreeing to referee fewer.  I reach this conclusion from
my own experience of lengthy publication delays, and also from talking
to some editors of SIAM journals.

I would like to propose a stringent solution to the problem.  Because
many people in the numerical community read and publish in SIAM
journals, I would propose that SIAM adopt the following policy.  SIAM
would maintain a list of people who carry their share in terms of
refereeing, and of people who repeatedly decline to referee SIAM
papers.  Anyone in the latter category would not be allowed to publish
in SIAM journals.

I realize that the problem I have identified is not specific to 
numerical analysis but is widespread throughout computer science and
probably other disciplines.  We in NA, however, are fortunate to
have a single organization (SIAM) that could actually take action
to combat the problem.

I would welcome comments on this proposal, sent to me by email or
posted to NA digest.  I have not yet discussed my proposal with people
in the SIAM publication department.

-- Steve Vavasis
vavasis@cs.cornell.edu


------------------------------

From: Khalil Kalbasi <TIMBER@kuhub.cc.ukans.edu>
Date: Thu, 19 Sep 1991 11:14 CDT
Subject: 3-level block-Toeplitz solver

I like to know if anyone knows of a 3-level block Toeplitz solver
with complex general blocks. I am doing an array antenna problem which
eventually needs to solve a linear complex system Ax=b. The matrix A is
very structured, i.e. it is a 3-level block Toeplitz with a complex general
blocks. The Netlib has  3-level block-circulant routines which is not 
any use to me. The structure of the matrix A (a 6x6 block  example) is shown 
below, where each M# is a general complex block. As shown, each 2x2 block is 
a general Toeplitz and the ovearl 6x6 blocks are also general Toeplitz.  

       |  M1   M2    M4   M5    M7  M8 |
       |  M3   M1    M6   M4    M9  M7 | 
       |                               |
       |  M10  M11   M1   M2    M4  M5 |
       |  M12  M10   M3   M1    M6  M4 |
       |                               |
       |  M13  M14   M10  M11   M1  M2 |
       |  M15  M13   M12  M10   M3  M1 |

I appreciate your help.

   Khalil Kalbasi
   Department of Electrical and Computer Engineering
   The University of Kansas, Lawrence, KS
   (913)864-7739


------------------------------

From: Stratis Gallopoulos <stratis@csrd.uiuc.edu>
Date: Mon, 23 Sep 91 12:28:59 CDT
Subject: 6th ACM Conference on Supercomputing

                CALL FOR PAPERS

6th ACM International Conference on Supercomputing
             Sponsored by ACM SIGARCH

                July 19-23, 1992
    Hyatt Crystal City Hotel, Washington D.C

                   General Chair
Ken Kennedy, Center for Research on Parallel Comp., Rice University

                    Program Chair
C. D. Polychronopoulos, CSRD, University of Illinois


The 6th ACM International Conference on Supercomputing 
is soliciting papers on significant new research results 
and experience in the development and use of supercomputing systems. 
Topics of interest include, but are not limited to
parallel and high-performance computer architectures, 
parallelizing compilers and programming environments, 
operating systems and performance evaluation, 
large-scale applications and algorithms, 
technologies, new experimental and commercial systems. 

               Program Committee

F. Allen (IBM, T.J. Watson), R. Allen (Kubota Pacific), 
Arvind (MIT), J. L. Baer (U. Washington), D. Bailey (NASA Ames), 
J. C. Browne (U. Texas-Austin), R. Cytron (Washington U.),
D. DeGroot (Texas Inst.), R. Esser (KFA), E. Gallopoulos (CSRD),
J. R. Gurd (U. Manchester), F. Hossfield (KFA), E. Houstis (Purdue U.),
W. Jalby (INRIA-IRISA), Y. Jegou (INRIA-IRISA), C. Koelbel (Rice U.),
J. McGraw (LLNL), Y. Muraoka (Waseda U.), A. Nicolau (UC Irvine),
T. Papatheodorou (U. Patras), J. Riganati (SRC), A. J. Smith (UC Berkeley),
H. Terada (Osaka U.), A. Veidenbaum (CSRD), S. Wallach (Convex).

Local Arrangements Chair: Duncan Buell, SRC
Finance Chair: A. Veidenbaum, CSRD
Publicity Chair: E. Gallopoulos. CSRD


               Paper Submissions

Authors should submit five copies of the full manuscript
to the program chairman or the appropriate regional chairman 
as follows:

 NORTH & SOUTH AMERICA         EUROPE & AFRICA        JAPAN & FAR EAST
 Constantine Polychronopoulos  Harry Wijshoff         Toshitugu Yuba
 Program Chairman              Region Chairman        Region Chairman
 CSRD                          Dept. Computer Sci.    Electrotechnical Lab.
 University of Illinois        Utrecht University     1-1-4 Umezono, Tsukuba
 104 S. Wright St.             Padualaan 14           Ibaraki 305
 Urbana, Illinois 61801, USA   3584 CH Utrecht        Japan
 (cdp@csrd.uiuc.edu)           The Netherlands        (yuba@etl.go.jp)
                          (harryw@cs.ruu.nl)

DEADLINE for Submissions is December 10, 1991. 
Authors will be notified about the outcome of 
the paper selection by March 7, 1992.

             Proposals for Tutorials

The conference also solicits proposals for half and full-day 
tutorials which will be offered before and after the conference. 
Proposals for tutorials on topics of wide interest to the 
supercomputing and parallel processing community should be 
submitted by December 10, 1991 to:

                      George Paul
                      Tutorials Chairman
                      IBM T.J. Watson Research Center
                      P.O. Box 704
                      Yorktown Heights, NY 10598, USA
                      (gp1@watson.ibm.com)

Tutorial topics of particular interest include, but are not 
limited to large-scale applications, numeric and symbolic 
computing, applications of supercomputing to engineering, 
physical and biological sciences, programming environments 
and visualization tools, architectures, compilers and languages
and operating systems.

             Conference Announcement

The conference announcement is available in ASCII and
PostScript via anonymous ftp from sp2.csrd.uiuc.edu
(128.174.162.51) in pub/ics-an and pub/ics-an.ps.

For more information contact C. D. Polychronopoulos,
Program Chairman, CSRD, University of Illinois,
104 S. Wright St., Urbana, Illinois 61801-2932, USA.
Tel. (217) 333-6773. FAX (217) 244-1351. E-mail: cdp@csrd.uiuc.edu.


------------------------------

From: Pete Stewart <stewart@cs.UMD.EDU>
Date: Mon, 23 Sep 91 14:40:31 -0400
Subject: FTPing Technical Reports

As regular readers of this digest know, I have made my current
technical reports available by anonymous ftp.  The response has been
encouraging, and several people have asked how to make their own
reports available.  Hence this little note.

First some background.  Ftp--short for file transfer program--is an
internet interface for moving files over the network.  The user can
send or get ASCII or binary files.  A particularly useful command is
"mget", which expands names (e.g., paper.*) and gets all the
resulting files.  Details may be found in the on-line manual.

Ordinarily one must have a valid account on a machine in order to ftp
to it.  This causes difficulties for people who want to make
programs, reports, and the like available to the general public.  The
fix is the anonymous ftp account, which gives people restricted
access to files on a machine.  Here for an example, is a little run
in which I listed some directories on my own machine.

   thales: ftp thales.cs.umd.edu
   Connected to thales.cs.umd.edu.
   220 thales.cs.umd.edu FTP server (SunOS 4.1) ready.
   Name (thales.cs.umd.edu:stewart): anonymous
   331 Guest login ok, send ident as password.
   Password:
   230 Guest login ok, access restrictions apply.
   ftp> cd pub
   250 CWD command successful.
   ftp> ls
   200 PORT command successful.
   150 ASCII data connection for /bin/ls (128.8.128.81,1315) (0 bytes).
   jeep
   matrixpert
   references
   reports
   226 ASCII Transfer complete.
   39 bytes received in 1.1 seconds (0.035 Kbytes/s)
   ftp> bye
   221 Goodbye.
   thales:

The procedure is to identify yourself as "anonymous".  You can give
any password, but it is usual to give your name or email address, so
the account manager can tell who is transfering files.  By convention
pub is the root directory for public files.

The first step in creating a repository of technical reports is to get
your systems staff to set up an anonymous ftp account.  For people who
are concerned about security, note that anonymous ftp is quite safe.
The user cannot get out of the ftp directory, and to my knowledge the
program has never been broken.  In any event, our own security
conscious staff has no hesitation about setting up such accounts.

Once you have the account, you will probably want to create a special
directory to hold the repository (tech reports are not the only thing
you are going to want to distribute:  see the above listing of my pub
directory).  At this point you must decide what form you want to
distribute the reports in.  For those who use TeX, there are three
options--tex files, dvi files, or postscript files.  Unless you stick
to vanilla TeX or LaTeX, tex files are not the way to go, since you
will also have to distribute any auxiliary files containing your
customized macros.  I went with dvi files because many places do not
have postscript printers, while most places have a utility (called
something like dvi2ps) that converts dvi files to postscript.  One
important drawback is that postscript graphics must be transfered in
separate files; however, you can use the mget feature of ftp to ease
some of the pain.

The next thing is to decide on what auxiliary information you want to
supply your customers.  I have organized my repository around three
files:  README, Contents, and Abstracts.  The README file contains
basic information about the repository.  The Contents file contains
the following information for each paper

   Authors
   Title
   Source
   Disposition
   Files

Here is a typical entry.

G. W. Stewart
"An Updating Algorithm for Subspace Tracking"
UMIACS-TR-90-86, CS-TR 2494, July 1990, Revised January 1991
To appear in IEEE Transactions on Acoustics, Speech and Signal
Processing
uast.dvi, uast.f (fortran program)

The Abstract file contains abstracts in addition to the above
information.  Setting up these files for the first time is the most
time-consuming part of the process.  They are very easy to update as
you add reports to your repository.

The last problem is to select names for your files.  I construct mine
from the first letters of the significant words in the title.  The
results are admitedly ugly, but it solves the problem of repeated
names.  In fact, two or three letters followed by an asterisk is
usually sufficient identification for an mget.  An alternative is to
use more conventional names with a date to insure uniqueness.

That's all there is to it.  If you try it, please let me know.
When there is a critical mass, I will undertake to edit a weekly
entry in the na-digest announcing newly available reports--a sort
of technet digest.  It would be a fitting birthday present for Gene
Golub, who has done so much to unite us electronically.

Pete Stewart
stewart@cs.umd.edu
na.pstewart@na-net.ornl.gov


------------------------------

From: Rick Perry <perry@ucis.vill.edu>
Date: 24 Sep 91 09:51:00 EDT
Subject: Matrix Calculator with C Syntax

I have been working on the development of a matrix calculator program,
similar to Matlab, but with the syntax of the C programming language.
This project started as a way to combine some of my C code for matrix
computations into a coherent package, but has evolved into a fairly complete
interactive C compiler, with matrices as the fundamental data type
(e.g. matrix A = [1 2; 3 4];  ++A;) and some operators from APL ( +/  */ ).

I am interested in hearing about similar development efforts or existing
products, and comments about the usefulness of such a package.  I am aware
of many packages, e.g. Matlab, Matrix, S, SMP, etc. but I've never heard
of any which use C syntax.  My program so far runs on Sun/Unix, VAX/VMS,
and IBM PC's, and I'll gladly make it available for anyone who's interested.

...Rick			perry@vu-vlsi.vill.edu, perry@ucis.vill.edu

Dr. Rick Perry, Department of Electrical Engineering
Villanova University, Villanova, PA 19085, 215-645-4969, -4970, hm: 259-8734


------------------------------

From: Przemek Klosowski <przemek@rrdstrad.nist.gov>
Date: 17 Sep 91 18:21:32 GMT
Subject: Errata to "Numerical Recipes" by Press et all

Hello!

  The merits (or demerits) of Numerical Recipes were discussed on the
net in a series of articles, but my feeling was that there was no hard
data on the specific problems and adopted solutions.  In particular I
remember someone from JPL complaining that the "robust fit" procedure
is not robust at all, but without giving the details. I think it might
be of use to catalog specific problems with NR's algorithms (for the
sake of focus I propose that we do the C version first), so that
others might avoid the traps that possibly exist.
   I would like to start such collection by posting several items, providing
varying detail, and asking people to add their own remarks/fixes. I ask
everyone to send me an e-mail copy of their followup, so that I don't miss
it (I normally read the newsgroups, but if the backlog is too large, I tend
to catch up). In the future, I will edit the contributions and make them
available to the public.
 
  Specific problem areas that I know of are:
 
1) In the discussion of Levenberg-Marquard algorithm (ch. 14, p. 544)
   for nonlinear minimization (mrqmin), the book gives a condition for
   the end of iterations that is not satisfied if the starting point is
   near the true minimum.

2) error in medfit---could the JPL person that reported it originally
   fill in the details?

3) The authors "cathegorically recommend that you {\it only} use FFTs
   with N a power of two". I understand that the latest wisdom goes
   against this---comments/code anyone?
   I have heard rumors that some of the FFT routines are broken.

4) the variable metric BFGS method (dfpmin) uses the traditional approach
   of calculating and using the Hessian. This (as mentioned in the text)
   is prone to roundoff errors, and that instead Gill/Murray method of
   calculating H^{-1} is much preferred (more than it is implied from the
   text). Anyone willing to comment on that? Anyone has a G/M version of
   dfpmin?

    Przemek Klosowski (przemek@ndcvx.cc.nd.edu)
    Physics Department
    University of Notre Dame IN 46556


------------------------------

From: Bob Mattheij <wstanw10@urc.tue.nl>
Date: Wed, 25 Sep 91 13:27:58 +0200
Subject: Special Issue of "Computing" for H.-J. Wacker
 
The journal Computing (Archives for Informatics and Numerical Computation)
will devote a special issue for papers dedicated to the unfortunate premature
death of its Editor H.-J. Wacker. As Professor Wacker was particularly active
in the promotion of the use of mathematics in industry, this special issue
should reflect these intentions and papers from the area of "Industrial Mathe-
matics" will be particularly welcome for this issue even if they would consti-
tute borderline cases with respect to the ordinary scope of Computing.
 
Manuscripts for this issue should be submitted (in duplicate) as soon as
possible to either
 
       Prof. Robert Mattheij           Prof. Hans J. Stetter
       Technical University     or     Technical University (115.2)
       Postbus 513                     Wiedner Hauptstrasse 8-10
       5600 MB Eindhoven               A-1040 Vienna
       The Netherlands                 Austria
 
The absolute deadline for the submission of manuscripts for this special issue
is February 15 1992.
 
Manuscripts must conform with the "Instructions to Authors" reprinted in each
issue of Computing. They will be subject to refereeing and the scientific
standards of the journal. Note that Computing has a category of "Short Communi-
cations" (besides "Original Papers"); manuscripts for the special issue may
also be submitted in this category.
 
     Please, give me your reactions to the above information as soon as pos-
sible. I will be away once more between 22 Sept. and 5 Oct.; thus it would be
good to have a reply by 20 Sept.
 
     With my best regards and my appreciation of your cooperation,
 
                                 Sincerely yours,
 
                                 Hans J. Stetter


------------------------------

From: Youcef Saad <saad@cs.umn.edu>
Date: Thu, 26 Sep 91 10:35:24 CDT
Subject: Positions at the University of Minnesota.

The Department of Computer Science at the University of Minnesota invites 
applications for faculty positions beginning September 1992. Positions 
are available at both the junior and senior levels and in all areas of
Computer Science with emphasis on compilers and operating systems, 
performance evaluation, architecture, numerical analysis and optimization, 
robotics and computer vision as well as all aspects of parallel and 
distributed computing. Requirements include a Ph.D. in Computer Science or 
a related discipline, excellence in research and a commitment to teaching.

The Minneapolis-St. Paul area is a major center for high technology and 
the computer industry. Faculty in the Department of Computer Science have 
access to outstanding computer facilities both within the department and 
at the various high performance computing centers on campus, including the 
Minnesota Supercomputer Institute and the Army High Performance Computer 
Research Center. The supercomputers available on campus are Cray-2,
Cray X-MP, and Thinking Machines CM-2 and CM-X massively parallel processors.

Applicants should submit a vitae and names of at least three references to: 
Chair, Faculty Recruiting Committee, Department of Computer Science, 
4-192 EE/CSci Building, University of Minnesota, 200 Union St. S.E., 
Minneapolis, Minnesota 55455. The closing date for receipt of applications 
is February 28, 1992.

The University of Minnesota is an equal opportunity employer and
welcomes applications from women and ethnic minorities.


------------------------------

From: Richard Sincovec <sincovec@sirius.EPM.ORNL.GOV>
Date: Thu, 26 Sep 91 14:46:42 -0400
Subject: Householder Fellowship at Oak Ridge

                    HOUSEHOLDER FELLOWSHIP

                OAK RIDGE NATIONAL LABORATORY

The Mathematical Sciences Section of Oak Ridge National Laboratory
invites outstanding candidates to apply for the 1992 Alston S.
Householder Fellowship in Scientific Computing.

Alston S. Householder was the organizer and founding Director of the
Mathematics Division (precursor of the current Mathematical Sciences
Section) at Oak Ridge National Laboratory (ORNL).  In recognition of
the seminal research contributions of Dr. Householder to the fields of
numerical analysis and scientific computing, a distinguished
postdoctoral fellowship program has been established at ORNL and named
in his honor.  The Householder Fellowship is supported by the Applied
Mathematical Sciences Subprogram of the U.S. Department of Energy.

The purposes of the Householder Fellowship are to promote innovative
research in scientific computing on advanced computer architectures
and to facilitate technology transfer from the laboratory research
environment to industry and academia through advanced training of new
computational scientists.  The Householder Fellowship is normally for
a term of two years.  Benefits of the Fellowship include a competitive
salary, fringe benefits, travel opportunities, access to
state-of-the-art computational facilities (including both parallel
architectures and high-performance personal workstations), and
collaborative research opportunities in a very active research program
in advanced scientific computing.  Competition for the appointment is
open to U.S. citizens and permanent residents.  Applicants should have
completed a doctoral degree in computer science, mathematics, or
statistics within three years prior to the appointment and have a
strong background and research interest in large-scale scientific
computing.

ORNL's Mathematical Sciences Section has research programs in
Computational Mathematics, Computer Performance Characterization,
Applied Analysis, Global Climate Modeling, and Computational
Statistics.  The precise research emphasis of the Householder Fellow
would necessarily depend to a great degree on the research interests
of the selected Fellow.  Areas of particular interest at ORNL, and in
which applicants would be especially encouraged, include:
computational linear algebra, parallel algorithms for partial
differential equations including fluid flow through porous media,
tools for the development and analysis of parallel algorithms,
computational statistics, scientific visualization, and ``grand
challenges'' in computational science.

Applicants should send a resume, statement of research goals, and
three letters of recommendation to Carl Ludemann, PhD Employment, Oak
Ridge National Laboratory, P.O.\ Box 2008, Oak Ridge, TN 37831-6216,
marked ``ATTN: Householder Fellowship.''  The deadline for applying is
December 13, 1991.  Finalists will be invited to visit ORNL in late
January 1992, and the selection committee's decision on the winning
candidate will be announced in February 1992.  The fellowship will
commence in 1992.

For further information, contact Richard F. Sincovec by phone at
615-574-3125 or by electronic mail at sincovecrf@ornl.gov.


------------------------------

From: James Demmel <demmel@robalo.Berkeley.edu>
Date: Thu, 26 Sep 91 16:38:32 PDT
Subject: On the Odor of IEEE Arithmetic

This is in response to Joe Grcar's message from NA Digest v. 91, i. 33:
IEEE Arithmetic Stinks. There were interesting responses from John Reid
and John Nash, and I will not repeat their arguments here. I would instead
like to discuss a few of the real ways in which current floating point 
computing environments remain deficient, and what efforts are being
made, or could be made to fix them. I would also like to discuss the SLI
arithmetic of Lozier, Olver and Turner, as well the arithmetic of Kulisch and 
Miranker.

A thorough discussion of all these issues would fill a long journal article,
so I will necessarily be terse.

IEEE Arithmetic:

The IEEE standards committee hoped to provide a good scientific computing 
environment, a complete provision of which would encompass arithmetic hardware, 
languages, compilers, and OS support. The committee reasonably felt no one
would listen if they tried to dictate too many details beyond the "pure"
floating point properties, and perhaps also realized they did not have
the expertise to make wise decisions covering all these areas. Instead,
they made suggestions in the hope that language, compiler and OS support
for the various IEEE features would follow. This hope has usually
gone unmet.

SUN C, SUN Fortran, and the Apple SANE environment have gone farthest in
providing a simple programming model for the user, but they have still
not gone far enough. No language provides easy (or any) ability to
write exception handlers, for example.

Joe Grcar may respond that he does not want to even consider writing his
own exception handlers. Fine. In that case, it is a simple matter to call
a standard SUN library function to enable trapping on all the usual 
exceptions. That the IEEE default is to continue computing with NANs and
infinities instead of trapping is something that people on the IEEE committee
were not unanimous about, but providing the usual unforgiving programming
model is certainly no problem. (And one could imagine providing a
different default environment, where one traps on exceptions, via a
compiler flag or some similar device).

It is clear that the time is ripe for a proposed standard set of interfaces
to features of IEEE (and other!) arithmetics not usually provided for in
conventional languages. The recent LCAS proposal of Mary Payne and
Brian Wichmann is a recent attempt, even if it had flaws (see the forthcoming
paper in the SIGNUM newsletter by W. Kahan on this). Work is continuing
on this, in the context of the NCEG (Numerical C Extensions Group, chaired
by Jim Thomas  - jimt@apple.com) and the LAPACK project (W. Kahan at Berkeley).
I hope to speak about this issues in more detail at Supercomputing 91 in
November.

Joe Grcar mentions the difficulty in doing error analysis, because of
infinities and gradual underflow. One reference on error analysis with
gradual underflow is my article "Underflow and the Reliability of 
Numerical Software" in the December 1984 SIAM J. Sci. Stat. Comput.
If one wishes to take underflow into account, for most codes the 
algebraic work involved is essentially the same as for conventional 
flush-to-zero systems: One uses the error model

    floating_point_result(a op b) = (a op b)*(1 + delta) + mu

where abs(delta) is bounded by machine epsilon and mu is the underflow error. 
For gradual underflow mu is bounded by the smallest subnormal number,
and for flush-to-zero, it is bounded by the underflow threshold, which is
larger by the factor 1/machine_precision.  It is easy to analyze many codes 
using this model, and to show that gradual underflow makes them more reliable 
than flush-to-zero. Indeed, for some codes, like polynomial evaluation, 
one can show that only with gradual underflow is there a "perfect" backward 
error in the following sense:

   The computed answer is the exact answer for a slightly different
   problem, where the perturbation is in the last few bits of every
   piece of data, whether it is normalized or not. In contrast, with
   flush-to-zero, almost no bits in any piece of the data my be
   significant if all the numbers are too close to the underflow
   threshold.

Grcar is concerned about a product like 

               i=n
          p = prod a(i)
               i=1

where an intermediate product underflows (but not to zero) but subsequent a(i) 
are so large that p is normalized. In this case p could have large relative 
error without warning. In flush-to-zero, p would be zero, which also has a 
large relative error. The concern must be over codes that test for underflow by
comparing p to zero, and recover somehow; these codes could break with IEEE. 
Such codes would have to have some a(i) less than 1 and some greater than 1
(otherwise once underflow occurred the final result would still be underflowed 
and detectable by comparing p against the underflow threshold instead of 0).
Thus, to be correct, they would have to have protection against overflow,
which complicates the code at least as much as gradual underflow. 
And to be portable to machines like the Cray or i860 which divide a/b via 
a*(1/b), they would still have to compare tiny numbers against a threshold 
instead of against zero to guard against problems.

Still, someone might have a code he thinks works, and want to run it
with as few changes as possible. Here are two approaches.  One way is 
to use the underflow flag to test for underflow.  A second way is to 
have a trap handler that traps on underflow and substitutes 0, mimicing 
old fashioned flush-to-zero. Both solutions presume access to these 
features of IEEE arithmetic, and so if they are not provided the user 
has a perfect right to complain. Complain about access to features of
IEEE arithmetic, that is, not about IEEE arithmetic itself.

Finally, the true benefit of IEEE arithmetic will not be seen until
significant applications are written using its features to great
advantage. One of the goals of LAPACK 2 is to write linear algebra
codes using IEEE which are significantly faster and/or more accurate
than their conventional counterparts. Simple examples are condition
estimation and eigenvector computation: Both involve solving potentially
very ill-conditioned triangular systems. Overflow happens occasionally,
but is generally NOT a "pathological" situation deserving an exception. 
For example, it may just mean eigenvalues are close together. 
Currently, LAPACK codes must scale inside the inner loop to avoid overflow, 
significantly slowing them down. The alternative is to run at top speed 
without scaling, test for overflow at the end, and only then run the 
scaling code if needed. In other words, IEEE provides an essentially free 
insurance policy against a rare but fatal problem, instead of paying the 
high premium of always scaling.

SLI Arithmetic:

Frank Olver responded to Joe Grcar's complaints about IEEE by saying
that we should throw out floating point completely and use sli
arithmetic. For the uninitiated, SLI arithmetic represents floating
point numbers as follows:

  s1*exp(s2*exp(exp(exp(...(exp(f))...))))

where s1 and s2 are +-1, and f is a fixed point number between 0 and 1.
The representation stores s1, s2, f and m, where m is the number of exp(.)
functions. With a 64 bit format and m limited to 0..7 (3 bits), the
largest number representable is quite unimaginably large.

I made a number of arguments against adopting this arithmetic in my paper
"On error analysis in arithmetic with varying relative precision", Proc.
8th IEEE Computer Arithmetic Symposium, 1987. Let me summarize these and
add a few not presented there.

The proponents of this arithmetic like it because it is free of over/underflow
(unless one divides by 0 or uses transcendental functions). This, in their
view, eliminates the need for infinity arithmetic or denormalized numbers.
However, the reason their system cannot over/underflow on +,-,*, or / is
that at the extremes of their representation, the representable numbers are
so far apart that the correctly rounded value of a+b or a*b is just max(a,b)
(assuming a and b are positive).  In other words, rounding uncertainty is 
so large that not only is there no relative precision in the usual sense, 
but there may not even be any relative precision in the (usual) exponent. 
Indeed, a + b = a * b = max(a,b) for positive a and b is precisely the 
semantics of the infinity symbol. I am all for having an infinity symbol,
but not a profusion of them as in SLI arithmetic.

The problem with expressions like prod(a(i)) possibly losing precision because 
of denormalized IEEE arithmetic recurs in SLI, but without any possibility 
of warning. Since numbers gradually lose (all) their relative precision
for enormous or tiny magnitudes, one could end up with a modest sized number
without any relative accuracy, and without recourse to exception handling or
underflow flags as warning. (But see my above mentioned paper for a
possible, if too-expensive, hardware fix.)

The test of any new arithmetic lies in the existence of applications, even if
they are only on paper for lack of an implementation. The only applications
of SLI I have seen have been unconvincing small problems for which equal or
better alternatives using standard arithmetic exist. Indeed, I think that 
there is a conservation of effort in writing robust code: any effort saved by 
SLI in handling over/underflow is lost in dealing with the variable relative 
precision.

Finally, there are better and more easily implemented schemes with similar
tradeoffs between exponent range and precision. A paper by H. Hamada of Hitachi
in the same IEEE Proceedings describes one such scheme; see my references to
Matsui and Iri for another. Hamada's scheme had most of the advantages and 
fewer of the disadvantages of SLI.  Hitachi considered building a machine with 
Hamada's arithmetic, and to my knowledge abandoned it, reflecting their
judgment that it was too slow and had insufficient applications.

Kulisch - Miranker Arithmetic

There is a proposal circulating to incorporate Kulisch/Miranker style
arithmetic as an extension of IEEE floating point arithmetic. I'd like
to summarize this proposal briefly and argue against it.

Kulisch/Miranker propose to add the ability to compute dot products of
arbitrary vectors exactly. In other words, 

   (1,1e38,-1e38).(1,1e38,1e38) = ( 1*1 + 1e38*1e38 ) - 1e38*1e38 = 1

would be computed correctly instead of as 0. This would appear to
require carrying the equivalent of at least about  2*2*38+7=159 decimal digits 
in IEEE single, and far more in double. This has been done in the past in
microcode and cache, as on the IBM 4361. They propose to add a single type,
DOTPRECISION, which one could only use to accumulate dot products, add and
subtract, and round (up, down or to nearest) to the other IEEE precisions.

Their motivation is to build algorithms using interval arithmetic with
standard precision for most operations and DOTPRECISION for a few
critical ones. Their algorithms, as implemented for example in the IBM
ACRITH package, claim to solve linear systems and get minimally wide
output intervals guaranteed to contain each solution component; i.e. the
interval bounds are adjacent floating point numbers. They have similar
codes for some eigenproblems, zeros of polynomials, linear programming,
and other problems.

A conference proceedings on this material may be found in
"A new approach to scientific computation", U. Kulisch and W. Miranker, eds.,
Academic Press, 1983.  Some of their claims are exaggerated, as described in
"Anomalies in the IBM ACRITH Package," W. Kahan and E. Leblanc, Proc.
7th IEEE Computer Arithmetic Symposium, 1985.

Here are the main arguments against adopting Kulisch's scheme:

1) None of their algorithms that I know of depend for correctness on an
exact inner product. And their accuracy also depends only weakly on
how much extra precision you have: As the precision grows, you can
solve more problems to "full accuracy", but the number of these problems
is very small and gets smaller as the precision increases. You have
to work hard to concoct examples which need anywhere near the precision
required for exact inner products in order to be solved fully accurately.
So the benefit from all those extra bits is small.

2) Many of their iterative schemes (such as for linear equation solving)
depend on working precision Gaussian elimination being accurate
enough to get a convergent iteration going. As long as there is a little
bit of convergence, they compute residuals accurately to get lots of
accuracy in the answer. But if the working precision Gaussian elimination
is not good enough, their extra precision does not help them. You would
need to do the Gaussian elimination itself in higher precision, which they
cannot do straightforwardly with their proposed data types and instruction set. 
(They appear to do higher precision Gaussian elimination in some cases in 
later ACRITH releases, although this is just speculation since the details are
proprietary. The only way I can imagine doing it using their arithmetic
is quite cumbersome, in contrast to using higher precision arithmetic directly.)

3) They do not address exponent range limitations, which are at least
as problematic in some software development as accuracy limitations.
This is true in polynomial evaluation, for example.

4) There is really a statistical ("RISC vs CISC") question underlying this: 
Of the following reasons for failure to get a good enough answer:

   (1) not enough precision in the residuals for iterative refinement
       to work
   (2) not enough precision in the initial Gaussian elimination to get
       convergence (however slow)
   (3) exponent range limitation

which is most likely to cause failure? Kulisch and Miranker only address (1).
Quadruple precision addresses all three, but (1) to a somewhat weaker extent
than Kulisch and Miranker. I do not have statistics (I doubt anyone does) but 
I feel strongly that the extra problems one can solve with quad because
it deals with (2) and (3) far outweigh the problems Kulisch can
solve but quad can't because of (1).

5) There are faster and simpler algorithms using (I believe) the same
hardware resources which they cannot use, or prefer not to use, because 
they limit themselves to dot products. Again, consider polynomial evaluation: 
if one could multiply a long accumulator by a single precision variable, 
using a loop like:

   real s,tmp
   DOTPRECISION d1, d1
   /* compute d2=s*d1 */
   d2=0
   repeat
     tmp=d1
     d1=d1-tmp
     d2=d2+s*tmp
   until d1=0
    
then one could just use the standard algorithm to evaluate polynomials
(Horner's rule) and it would work. Instead, they need a complicated 
library routine. In other words, their proposed instruction set makes
programmers have to completely rethink things unnecessarily.

So from a software point of view, it is much easier to develop algorithms
using quad than shoehorning everything into dot products. Kahan has more
to say on this in his article cited above.

6) Trying to get tight guaranteed error bounds is a laudable goal, and
interval arithmetic is a valuable tool, but it will not be possible to do
this reliably in systems without variable precision. The reason is that if one
tries to build larger applications out of interval building blocks
(like ACRITH routines), then even if the input data is exact 
(0-width intervals), the output of the first interval routine will be 
intervals. These intervals are then fed in to a second routine. How wide 
must the output intervals of the second routine be? It is easy to show that 
in general they must be COND2 times wider than the input intervals, 
where COND2 is the condition number of the problem being solved by the 
second routine. If these intervals are then used as input to a third routine, 
their width will be magnified by another condition number factor COND3. 
If we are computing with a fixed maximum precision, eventually these 
products of COND factors will make the intermediate intervals so large 
as to be useless. The only way around this is to completely reformulate 
the problem to avoid these condition numbers, which may be very difficult 
and time-consuming (and defeats the purpose of using these libraries, which is 
getting the answer faster as well as more accurately), or else to use 
variable precision. Tom Hull at the University of Toronto, Siegfried Rump at 
TU Hamburg, and others have proposed and sometimes built such systems.

Thus, I do not foresee wide use of interval arithmetic until variable 
precision is adequately supported, and this cannot be done without a 
more major architectural revision than a small change in the IEEE floating
point standard.

   Jim Demmel
   Computer Science Division and Mathematics Department
   University of California
   Berkeley, CA 94720
   demmel@cs.berkeley.edu

------------------------------

End of NA Digest
**************************
-------