**Today's Topics:**

- Change of address for Doug Meade
- Clarification of Florida Research Position
- IEEE Arithmetic Stinks
- Book on Total Least Squares
- Tenth Parallel Circus at Oak Ridge
- Ninth Parallel Circus Summary
- Contents: SIAM J. Optimization
- Contents: SIAM J. Computing

-------------------------------------------------------

From: Douglas Meade <meade@milo.math.scarolina.edu>

Date: Fri, 16 Aug 91 10:17:21 -0400

**Subject: Change of address for Doug Meade**

Douglas B. Meade, formerly affiliated with Purdue University, is now

located at the University of South Carolina. His new address is:

Department of Mathematics

University of South Carolina

Columbia, SC 29208-0001

Phone: (803) 777-6183

E-mail: meade@math.scarolina.edu

FAX: (803) 777-3783

------------------------------

From: Tim Davis <davis@nautilus.cis.ufl.edu>

Date: Fri, 16 Aug 91 10:49:22 -0400

**Subject: Clarification of Florida Research Position**

The position that was mentioned in two weeks ago in vol. 91, issue 31,

of the NA Digest is limited to someone who would pursue their Master's

degree and/or Ph.D. degree (preferably the latter) in the Computer and

Information Science Department at the University of Florida.

Tim Davis

------------------------------

From: Joe Grcar <sepp@snll-arpagw.llnl.gov>

Date: Wed, 14 Aug 91 20:03:13 PDT

**Subject: IEEE Arithmetic Stinks**

I think the IEEE arithmetic standard is bad. I realize this shot

across the bow is nothing compared to the big bore delivery of

those who devised the standard, but I'm shooting anyway. I'm

tired of people making simple things complicated (and writing a

lot of papers about it).

(1) The standard makes simple things complicated.

The first rule of engineering is: make it simple. The second

rule is: keep it simple. In contrast, the SUN OS documentation

for IEEE arithmetic is one hundred and sixty pages long.

(2) The standard makes precision indefinite.

When magnitudes become too small, the exponent sticks and the

representation degenerates to fixed point. When magnitudes

become too large, the numbers become "infinity."

(3) The standard makes error analysis impossible.

There is no machine epsilon. Fixed point arithmetic has no

error for addition and subtraction, but when fixed and floating

point numbers mix, multiplication and division can have large

relative error.

(4) The standard doesn't prevent failure, it only hides failure.

The standard allows arithmetic with fixed point, infinite and

"not a number" data. Programs run just fine with garbage in

and garbage out.

(5) The standard makes debugging difficult.

The standard allows programs to run with garbage for data.

These are the hardest programs to debug.

Here's an example of what can go wrong. An engineer installed a

large simulation program on his Sparc station. He got the wrong

versions of some libraries, so the data structures were

inconsistent. The Sparc's printed output consisted of "infinity"

and "not a number." The engineer said: "this program must be bad;

it won't work for the Sparc's IEEE arithmetic; it only works for

the Cray YMP's bad arithmetic." I had to laugh when I saw the

output and realized what had happened. According to the IEEE

standard, infinity is a perfectly good number---it just has no

precision.

I asked the engineer if he had read the SUN arithmetic documentaion.

He asked: "do you understand all that?" I had to admit I'm not in

the habit of writing interrupt handlers. The people who wrote the

IEEE standard have done something amazing. They've convinced billion

dollar companies to design special hardware just so expensive

workstations can have "not a number" for output.

In this example the cost of the IEEE's meddling was the wasted time

spent finding the hidden source of failure. In the old days, the

program would just crash, which is a reliable error detection

system. But what about the future? What will Captain Kirk do

when the navigational computer tells him: not a number? ("Lock

the phasers on Evans Hall, Mr. Chekov!")

Regards, Joe Grcar

------------------------------

From: SIAM Publications Department <SIAMPUBS@WILMA.WHARTON.UPENN.EDU>

Date: Tue, 13 Aug 91 11:54 EDT

**Subject: Book on Total Least Squares**

RECENTLY PUBLISHED AT SIAM

THE TOTAL LEAST SQUARES PROBLEM

COMPUTATIONAL ASPECTS AND ANALYSIS

By: Sabine Van Huffel and Joos Vandewalle

Frontiers in Applied Mathematics 9

"...TLS (Total Least Squares) represents a technique which synthesizes

statistical and numerical methodologies for solving problems arising in

many applications areas. The authors of this monograph have been leaders

in showing how to use TLS for solving a variety of problems, especially

those arising in a signal processing context. They give an elegant

presentation of the various aspects of the TLS problem. Their survey

encompasses the many elements required to understand the problem. It is

a pleasure to read such a clear account which is presented using standard

mathematical ideas and nomenclature."

- Gene H. Golub, Department of Computer Science, Stanford University

This is the first book published to be devoted entirely to total least

squares. The authors give a unified presentation of the TLS problem--

a description of its basic principle, the various algebraic, statistical,

and sensitivity properties of the problem are discussed, and

generalizations are presented. Applications are surveyed to facilitate

uses in an even wider range of applications. Whenever possible, comparison

is made with the well-known least squares methods.

Contents

Introduction; Basic Principles of the Total Least Squares Problem;

Extensions of the Basic Total Least Squares Problem; Direct Speed

Improvement of the Total Least Squares Computations; Iterative Speed

Improvement for Solving Slowly Varying Total Least Squares Problems;

Algebraic Connections Between Total Least Squares and Least Squares Problems;

Sensitivity Analysis of Total Least Squares and Least Squares Problems in

the Presence of Errors in all Data; Statistical Properties of the Total

Least Squares Problem; Algebraic Connections Between Total Least Squares

Estimation and Classical Linear Regression in Multicollinearity Problems;

Conclusions.

For further information, please contact SIAM Customer Service Department,

3600 University City Science Center, Philadelphia, PA 19104-2688; telephone:

215-382-9800; fax: 215-386-7999; e-mail: siam@wharton.upenn.edu.

------------------------------

From: Mike Leuze <leuze@aldebaran.EPM.ORNL.GOV>

Date: Tue, 13 Aug 91 10:17:36 EDT

**Subject: Tenth Parallel Circus at Oak Ridge**

FIRST ANNOUNCEMENT

of the

Tenth Parallel Circus

Location: Oak Ridge, Tennessee

Dates: October 25-26, 1991

Continuing the tradition that began at Yale in 1986, the Mathematical

Sciences Section and the Engineering Physics and Mathematics Division

at the Oak Ridge National Laboratory, Oak Ridge, Tennessee will be

hosting the Tenth Parallel Circus in Oak Ridge on Friday and Saturday,

October 25-26, 1991.

The Parallel Circus is an informal meeting which emphasizes parallel

algorithms for scientific computing.

This is the first time that the Circus will be held in the

southeastern part of the USA, and we hope to have many attendees from

the southeastern states, as well as from other parts of the USA and

other countries.

The circus is unique in that it is very informal, and thus allows

attendees to talk about the very latest results as well as interesting

work in progress. In previous meetings there was a very healthy mix

of industrial and academic participants, and there were lots of infor-

mal discussions.

GRADUATE STUDENTS ARE ESPECIALLY WELCOME.

Transportation

We have arranged special conference air fares with Delta Airlines. To

get the special rate and for details, call Delta or have your Travel

Agent call toll free at 1-800-221-1212 (8:00am - 11:00pm eastern time)

and ask for the Special Meetings Network. Please refer to file refer-

ence number J28116.

Accommodation

We have made arrangements for a block of discount rooms at the Comfort

Inn, Oak Ridge, for October 24, 25 and 26. The price per night is

$46.12 for a single room or $53.12 for a double room. To get the

discount rate mention "Parallel Circus" when you call for reserva-

tions.

Presentations

Each presentation will be about 30 minutes long -- 25 minutes plus 5

minutes for questions and discussions. Their actual lengths will

depend on the number of participants that wish to speak.

The circus will begin on Friday morning. Although there is no

prescribed program, we will probably end by early Saturday afternoon.

Participants who give a talk and leave are generally regarded as

anti-social so you should plan to attend all of the talks.

Organizers: Gene Golub, Michael Leuze, Esmond Ng

For further information please contact:

Ms. Joyce Nave

Mathematical Sciences Section

Oak Ridge National Laboratory

P.O. Box 2008, Bldg. 6012

Oak Ridge, Tennessee 37831-6367

Phone: (615) 576-4720

Fax: (615) 574-0680

e-mail: workshop@msr.epm.ornl.gov

------------------------------

From: Will Sawyer <sawyer@bernina.ethz.ch>

Date: Sat, 17 Aug 91 2:17:01 MET DST

**Subject: Ninth Parallel Circus Summary**

Better late but before the Tenth Parallel Circus...

--Will Sawyer

RECAP OF THE NINTH PARALLEL CIRCUS AT UCSB, MARCH 22-23, 1991

What is the latest word on parallel computing in the world of numerical

analysis? The expression "massive parallelization" was the main topic at the

latest Parallel Circus at the UCSB campus on 22 - 23rd March 1991.

The talks were various, casual, some provocative and all too short.

Indeed the two days offered only the opportunity to get acquainted and

for the speakers to give a short summary of their research on parallel

algorithms and issues.

The most provocative talk was given by Alan Karp who turned off the overhead

projectors and stated that, instead of presenting research, he wanted to

start an argument. Alan (IBM Palo Alto), who is on the committee for

proposing IBM's long term strategy for parallel processor software,

succeeded brilliantly by recommending that IBM support only the FX90

standard as parallel language of the future. In particular, he felt that

nearly every problem can be broken down into FX90 structure. What sort of

parallel problems, he queried, could not ultimately be described by its

constructs? The replies were numerous: chaotic iterations, monte-carlo

simulations and hierarchical methods, for example. Undaunted, Alan suggested

ways of bypassing the apparent obstacles.

The majority of the audience remained unconvinced by Alan's arguments. In a

spur-of-the-moment poll, most people expressed the feeling that additional

MIMD languages will be necessary to support the state-of-the-art and future

architectures.

Hardly less interesting was David Silvester's (visiting professor at Stanford's

Institute for Scientific Computing) talk on fast parallel finite element

methods. He spoke of a 160 Mflop code on a 4K processor CM1 -- and not just

the matrix multiplies but the entire code! -- written by students in his

course. They had never even heard of finite elements before the start of

the course... well, barely. How was such performance achieved? Firstly by

sticking to the simplest elements (rectangles) and secondly by careful node

numbering to avoid recurrences.

Everyone was aware of the wonders of cyclic reduction for tridiagonal

matrices. Izzy Nelken (University of Toronto) has applied the concept

to dense matrices. The resulting algorithm is similar (but not identical

to) that resulting from the Schur Complement.

Izzy reported that the complexity was identical to that of Gaussian

elimination and to an equal degree parallelizable. The error was generally

considerable smaller. On the negative side, he conceeded, the optimized

LAPACK routines were always quicker.

Andy Cleary of Sandia National Labs, reported on his sparse Cholesky

decomposition algorithm on a Connection Machine (CM). More precisely,

T

A = L L where A is large, sparse and reordered.

He pointed out that the CM was not conceived as a number cruncher. It was

envisioned for graphical manipulations; the needs of the numerical community

have changed Thinking Machine's thoughts, though. Groups of 32 single-bit

processors form so-called "sprint nodes" for floating-point computations.

The standard algorithm --- Gilbert and Schreiber's Sparse Cholesky ---

is less than optimal: it requires routers to distribute the frontal matrices,

the processors drop out after the system shrinks and there is no use of

symmetry of fronts.

Assistance comes from Kratzer's QR algorithm, which was altered for the

Connection Machine. The CM is configured as a "data-flow" array of sprint

nodes. Columns, not rows are mapped to the nodes. By utilizing communication

in two dimensions and using optimized Sparsepak routines to do the basic

computations he has improved the performance on the CM by an order of

magnitude over Gilbert and Schreiber's algorithm.

UCLA's Tony Chan started his talk with a general philosophical point of

view: the tough problems today are tightly-coupled problems. We are

confronted with a trade off: we try to split up the problem at the risk

of losing its relationship to the original problem.

Elliptic problems illustrate this trade-off. If one marches through the

grid over the multidimensional region in one direction, there is not

enough calculation to keep all of the parallel machine's processors going.

Red-Black ordering results in better parallelization but at a cost of global

accuracy. The multi-front method of van der Vorst, Meurant and Chan--Goovaerts

may be useful on a Cray X/MP with four processors, but not on a Connection

Machine with 64K processors. Finally, Tony spoke about his research in

circulant preconditioners which improve the condition numbers of

ill-conditioned matrices at a minimum of cost.

Myung Kim from UCSB described improvements on triangular linear systems'

solvers on a Hypercube. Triangular solvers, she pointed out, tend to leave

processors unoccupied with time. The trick to improving the backwards

insertion algorithm, based on the work of Coleman, Heath and Li, et al,

is to chop up and reform the matrix so as to keep as many processors busy

as possible. By increasingly clever partitioning of the matrix, efficiencies

of up to 80\% (as compared to 33\% for the naive backward insertion) may be

obtained.

Not only the numerical analysts were represented at the conference.

Mike Warren from Los Alamos National Labs spoke about N-body simulations

on parallel computers. The problem seems trivial: n bodies interact,

e.g. by way of gravitational forces.

For n approximately 10^4 this problem can be dealt with on a Cray. But already

at n approx. 10^5 the classical approach to the problem proves intractable

due to the enormous number of interactions. Clearly, help is needed.

The innovation from Barnes and Hut is the "hierarchical decomposition."

Since interactions of nearer bodies are much more significant than those with

distant ones, local clusters are isolated into groups. The local interactions

can be performed well in parallel, particularly when the groups contain

roughly equal number of points. The major difficulty is subsequent

consideration of the interactions between the groups. A fair estimation

of the error can be found by comparing the results from different

decompositions with that of the "exact" solution from the naive algorithm.

John De Pillis of the University of California at Riverside was

"convinced" to give a talk with no prior preparation. He discussed the

differential equation studied by Chin and Manteuffel, (excuse the LaTeX

notation)

\begin{equation}

-\varepsilon \Delta u + a(x,y)u_x + b(x,y)u_y + c(x,y)u = f \nonumber

\end{equation}

This can be solved by generating two interdependent sequences, one which

tends to zero and the other which tends to the solution, namely

v ====> 0

k

u ====> u where Au = f

k

These result in a "race" between the two sequences, an ideal application

for a dual-processor machine. The relevant question for further research

is indeed, whether more than two competing sequences can be found for this

or other problems.

David Bacon from IBM's TJ Watson Research Center introduced the idea

of "Optimistic Parallelization." If there is a fast, but not necessarily

accurate way to calculate a needed value along with a sure-fire but slow

way, one can attack a problem in the following way:

y = f (x);

Fast

cont || if error(x,y) then

abort "cont";

y = f (x);

Slow

"cont";

fi;

He solicited the audience for suggestions about how this method could be

applied to known problems. While no one had a concrete suggestion, the comment

was made that sometimes the answer from the fast method can be used in the slow

method, for example in incomplete decompositions.

Last, but not least, the computer scientists had a great deal to say about the

newest in parallel computing. Seth Goldstein of U.C. Berkeley discussed

the advent of parallel debuggers for MIMD architectures. The non-deterministic

nature of programs running in parallel arises from external events (interrupts)

and shared-memory access. Value logging, that is code to copy potentially

shared read-data to a log file, can give valuable information for a later

debugging phase. Unfortunately, it makes the code slow and creates large

log files.

The goal of Seth's research is to provide the opportunity for deterministic

replay during the debugging phase, create the smallest possible logs and

not to interfere with any part of the calculation. One way to go about this

is through event logging, namely logging cache traffic, which is deterministic

at the cost of running only one process at a time.

As with any good conference, the social events did not fall short. The

banquet at the "China House" in Goleta provided good food and enabled

good conversation, not only about numerical analysis.

The NEXT CIRCUS will take place on Oct 25, 26 in Oak Ridge, Tenn.

-- Will Sawyer with thanks to Gene Golub

------------------------------

From: SIAM Publications Department <SIAMPUBS@WILMA.WHARTON.UPENN.EDU>

Date: Wed, 14 Aug 91 17:23 EDT

**Subject: Contents: SIAM J. Optimization**

SIAM Journal on Optimization

November 1991 Volume 1, Number 4

CONTENTS

An Auction Algorithm for Shortest Paths

Dimitri Bertsekas

Direct Search Methods on Parallel Machines

J. E. Dennis, Jr. and Virginia Torczon

On the Impact of Automatic Differentiation on the Relative Performance

of Parallel Truncated Newton and Variable Metric Algorithms

L. C. W. Dixon

Parallel Constraint Distribution

M. C. Ferris and O. L. Mangasarian

Parallel Solution of Large-Scale Block-Diagonal Concave Minimization Problems

J. H. Glick, R. S. Maier, and J. B. Rosen

Parallel Genetic Algorithms Applied to the Traveling Salesman Problem

Prasanna Jog, Jung Y. Suh, and Dirk Van Gucht

A General-Purpose Parallel Algorithm for Unconstrained Optimization

Stephen G. Nash and Ariela Sofer

Acceleration and Parallelization of the Path-Following Method for a

Linearly Constrained Quadratic Problem

T. Nesterov and A. Nemirovsky

Orderings for Conjugate Gradient Preconditioning

James M. Ortega

An Interior Point Method for Block Angular Optimization

Gary L. Schultz and Robert R. Meyer

On the Rate of Convergence of a Partially Asynchronous Gradient Projection

Algorithm

Paul Tseng

Partitioned Dynamic Programming for Optimal Control

Stephen J. Wright

On the Fine-Grain Decomposition of Multicommodity Transportation Problems

Stavros A. Zenios

------------------------------

From: SIAM Publications Department <SIAMPUBS@WILMA.WHARTON.UPENN.EDU>

Date: Fri, 16 Aug 91 10:30 EDT

**Subject: Contents: SIAM J. Computing**

SIAM JOURNAL ON COMPUTING

DECEMBER 1991, Volume 20, Number 6

CREW PRAMs and Decision Trees

Noam Nisan

On the Complexity of String Matching: Lower Bounds

Zvi Galil and Raffaele Giancarlo

Locality in Distributed Graph Algorithms

Nathan Linial

Self-P-Printability and Polynomial Time Turing Equivalence to a Tally Set

Roy S. Rubinstein

New Upper Bounds in Klee's Measure Problem

Mark H. Overmars and Chee-Keng Yap

An Optimal Randomized Parallel Algorithm for Finding Connected Components

in a Graph

Hillel Gazit

Asymptotically Fast Triangularization of Matrices over Rings

James L. Hafner and Kevin S. McCurley

Noninteractive Zero Knowledge

Manuel Blum, Alfredo De Santis, Silvio Micali, and Guiseppe Persiano

Elimination of Infrequent Variables Improves Average Case Performance of

Satisfiability Algorithms

John Franco

On Sets with Efficient Implicit Membership Tests

Lane A. Hemachandra and Albrecht Hoene

Parallel Tree Contraction, Part II: Further Applications

Gary L. Miller and John H. Reif

An Almost Linear-Time Algorithm for the Dense Subset-Sum Problem

Zvi Galil and Oded Margalit

------------------------------

End of NA Digest

**************************

-------