**Today's Topics:**

- Change of Address for Arthur Wouk
- Minimization Subroutines Wanted
- A Matrix Equation
- Dictionary Covering Numerical Analysis Sought
- Scaling, Underflow, and Rank Deficiency
- Fortran 8X, C++, or What?
- ACM International Conference on Supercomputing

From: Arthur Wouk <wouk@cs.duke.edu>

Date: Mon, 30 Apr 90 10:24:14 EDT

Effective May 15 I will be moving from Durham to Boulder, Colorado.

My new address will be

Arthur Wouk

3849 Birchwood Drive

Boulder CO 80301

My new net address will be

wouk@alumni.colorado.edu

I remain fully retired.

------------------------------

From: Pat Nolan <nolan@meggie.stanford.edu>

Date: 27 Apr 90 00:19:03 GMT

I am looking for a nonlinear function-minimizing subroutine in

Fortran that works without derivatives. "Numerical Recipes"

recommends Powell's conjugate gradient method, and offers a

subroutine ready to use. However, it also says that Brent's

implementation of the method (too long to print in the book) is

better. Is Brent's code available anywhere on the network?

I've already checked netlib -- if it's there I missed it.

For the experts: Is there anything better than Brent/Powell for

the general case? (That is, not least-squares, no derivatives,

nonlinear, < 20 dimensions.)
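In the meantime, the flavor of a derivative-free direction-set method can be sketched in a few lines. The following is a simplified cyclic coordinate-descent sketch in Python, not Brent's or Powell's actual implementation (in particular, Powell's method also updates the search directions); the test function and tolerances are illustrative.

```python
# Simplified derivative-free minimizer: cyclic coordinate descent with a
# golden-section line search along each coordinate direction. Only a
# sketch of the direction-set idea; Brent's code adds careful safeguards.

def golden_line_search(f, x, d, a=-10.0, b=10.0, tol=1e-10):
    """Minimize t -> f(x + t*d) over [a, b] by golden-section search."""
    g = (5 ** 0.5 - 1) / 2          # golden ratio conjugate
    phi = lambda t: f([xi + t * di for xi, di in zip(x, d)])
    c, r = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if phi(c) < phi(r):
            b, r = r, c
            c = b - g * (b - a)
        else:
            a, c = c, r
            r = a + g * (b - a)
    t = (a + b) / 2
    return [xi + t * di for xi, di in zip(x, d)]

def coordinate_descent(f, x0, sweeps=5):
    """Repeatedly line-search along each coordinate direction in turn."""
    n = len(x0)
    x = list(x0)
    for _ in range(sweeps):
        for i in range(n):
            d = [1.0 if j == i else 0.0 for j in range(n)]
            x = golden_line_search(f, x, d)
    return x

# Example: a separable quadratic with its minimum at (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
xmin = coordinate_descent(f, [5.0, 5.0])
```

On a separable problem like this one, a single sweep already locates the minimum; nonseparable functions are what motivate Powell's direction updates.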

Patrick Nolan * Bitnet: PLN@SLACVM

Hansen Laboratories * Internet: nolan@meggie.stanford.edu

Stanford University * (36.64.0.180)

------------------------------

From: Farid Alizadeh <alizadeh@cs.umn.edu>

Date: 3 May 90 02:17:56 GMT

In the course of an optimization problem I have come across the

following problem (all matrices are symmetric):

Solve the following matrix equation:

Y + B + A*X*A = 0,

where A and B are known symmetric n x n matrices, and X and Y are unknown

matrices with the following property: for each (i,j) entry, exactly one of

X[i,j] and Y[i,j] is a variable and the other is 0. For example:

    [a 0 b]   [ 1 2 -1]   [ 3 5 -2]   [0 u 0]   [ 3 5 -2]
    [0 c 0] + [ 2 1  3] + [ 5 1  0] * [u 0 v] * [ 5 1  0] = 0
    [b 0 d]   [-1 3  2]   [-2 0  3]   [0 v 0]   [-2 0  3]

       Y    +     B     +      A    *    X    *      A    = 0

Now, it is obvious that this problem is equivalent to a system of m linear

equations in m unknowns where m= n(n+1)/2, and can be solved in

O(m^3)=O(n^6) operations. My question is can we do better, say in

O(n^3) or O(m^2)=O(n^4)? Notice that in the special case where Y=0, i.e.,

all the variables are in X, we can solve the problem in O(n^3) by multiplying

from the left and right by the inverse of A. I don't see how to extend this to the

more general problem. Is anyone aware of any method, or have any

suggestions? Iterative methods including methods that require computation

of eigenvalues and/or eigenvectors of A (and/or B) are acceptable, as long

as we get a significant improvement over O(n^6) flop count. Alternatively,

can someone put forward a plausible argument that O(m^3) is the best

I can hope for and the special structure of the problem is not special

enough to lend itself to a more clever solution?
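The Y = 0 special case can be checked on a tiny example. The following pure-Python 2x2 sketch uses made-up matrices and hand-rolled helpers to stay dependency-free; a real implementation would use a linear-algebra library.

```python
# Y = 0 special case: A*X*A = -B  =>  X = -inv(A) * B * inv(A),
# which costs O(n^3) instead of the O(n^6) dense-system approach.
# The 2x2 matrices below are illustrative.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[3.0, 1.0], [1.0, 2.0]]      # symmetric, nonsingular
B = [[1.0, 2.0], [2.0, 5.0]]      # symmetric

Ainv = inv2(A)
X = [[-e for e in row] for row in matmul(matmul(Ainv, B), Ainv)]

# The residual A*X*A + B should be zero up to roundoff.
R = matmul(matmul(A, X), A)
residual = max(abs(R[i][j] + B[i][j]) for i in range(2) for j in range(2))
```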

Farid Alizadeh

Computer Science Dept.

University of Minnesota

Minneapolis, Mn 55455

e-mail alizadeh@cs.umn.edu

------------------------------

From: John Pryce <PRYCE%rmcs.cranfield.ac.uk@NSFnet-Relay.AC.UK>

Date: Thu, 3 MAY 90 15:17:03 BST

Query from John Pryce (na.pryce), Maths Group, Royal Military College of

Science, Shrivenham, Swindon, UK.

For a Knowledge-Based System project at NAG Ltd we would very much like to get

hold of a mathematical dictionary

- With a solid coverage of numerical analysis terms and concepts at a level

likely to be useful to users of a NAG help system.

- If it is in machine-readable form (TeX), so much the better.

Does anyone know of such a work?

Replies to me at the above address.

------------------------------

From: Cleve Moler <moler@mathworks.com>

Date: Sun May 6 12:40:15 PDT 1990

Scaling, Underflow, and Rank Deficiency

by Cleve Moler

The MathWorks, Inc.

Several of the routines in LINPACK, and its derivatives like MATLAB,

have occasion to scale a column of a matrix. Let x be such a

column and let s be some norm of x . The algorithms involve

replacing x by x/s . In the various LU factorizations, s is

the largest element of x and x/s becomes a column of L .

In the QR and SVD factorizations, s is the length of x and

x/s generates a Householder reflection. LINPACK calls upon one

of the BLAS, DSCAL, to do the scaling. Unfortunately, DSCAL

multiplies a vector by a scalar because that is usually faster than

dividing by a scalar. So instead of the replacement

x = x/s

LINPACK computes

x = (1/s)*x

A difficulty occurs when all the elements of x are near the

underflow limit of some of the floating point representations.

Then s is nonzero and x/s has some elements of reasonable size,

but LINPACK fails because the temporary scalar 1/s overflows.
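This failure mode is easy to reproduce in any IEEE double precision environment. Here is a Python sketch; the values are chosen as exact powers of two (via `math.ldexp`) so the direct division is exact.

```python
import math

# s is a denormal: x/s is perfectly representable, but 1/s overflows,
# so the multiply-by-reciprocal trick turns a good answer into inf.
s = math.ldexp(1.0, -1040)    # 2^-1040, a denormal; 1/s = 2^1040 > realmax
x = math.ldexp(3.0, -1040)    # a "column entry" of the same tiny magnitude

safe = x / s                  # direct division: exactly 3.0
fast = (1.0 / s) * x          # DSCAL-style scaling: 1/s overflows to inf
```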

We knew about such difficulties when we were developing LINPACK,

but we felt they would be so rare that we didn't have to worry.

The floating point representations in common use at the time

(late '70s) had very few, if any, values whose reciprocals would

overflow. Moreover, you would have to have a really perverse

matrix to get near such values anyway.

I still think that judgement was correct. LINPACK has proved to

be very robust and I, for one, have never heard complaints from

LINPACK users about this overflow difficulty. But two recent

developments have prompted me to change my position slightly.

The first development is the emergence of the IEEE floating point

standard. IEEE floating point has "many" nonzero values whose

reciprocals overflow, namely all the denormals in the range

from 2^(-1024) to 2^(-1074), which is roughly 5.0E-309 to 5.0E-324.

(The situation is probably even worse with IEEE short precision, but

I never use it.) Some machines with IEEE floating point format

circumvent the standard by setting these tiny values to zero. But most

machines based on Intel and Motorola chips, including most PCs, Macs,

and Suns, handle them "correctly".

The second development is our appreciation of a class of problems

that are not strangely scaled, but which can provoke this difficulty.

The simplest example is to compute the QR or SVD factorizations of

the n-by-n matrix all of whose elements are one,

X = ones(n,n)

Other examples include x(i,j) = i or x(i,j) = i+j . The point

is these matrices are highly rank deficient and their elements

are related to each other by very simple formulas. In the first

step of the QR factorization of ones(n,n), for example, the vector

x is ones(n,1) and s = sqrt(n). The reduced matrix, X(2:n,2:n),

should be flat zero, but, if n is not a perfect square, roundoff

error will probably cause the reduced matrix to again be a constant,

nonzero matrix, but now the constant is a small integer multiple of

eps , the relative machine precision. For IEEE format, eps = 2^(-52) .

The process is repeated. The second step of the reduction

produces a constant matrix with elements on the order of eps^2,

the third gives elements of order eps^3 , and so on. It may

happen that exact cancellation produces an exact zero reduced matrix

and the subsequent steps can be skipped. But on some machines,

for some values of n , the process continues until the

range of denormals and underflow is approached. The solution

to the equation

eps^k = denormal

is k = 1022/52 ~= 20, so on IEEE machines at least 20 steps

are required.
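The estimate can be checked directly by repeated multiplication, a Python sketch of the k = 1022/52 arithmetic:

```python
import math

# Each QR step multiplies the surviving constant by roughly eps = 2^-52.
# Count how many steps it takes before eps^k falls below the smallest
# normalized double, 2^-1022, i.e. into the denormal/underflow range.
eps = math.ldexp(1.0, -52)              # IEEE double precision eps
smallest_normal = math.ldexp(1.0, -1022)

k = 0
value = 1.0
while value >= smallest_normal:
    value *= eps
    k += 1
# k is now the first power with eps^k in the denormal range.
```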

If you have access to MATLAB, you might want to try this:

      format compact
      format short e
      for n = 10:50
         home
         n
         X = ones(n,n);
         r = diag(qr(X))'
         pause
      end

If you don't have ready access to MATLAB, link this VMS Fortran

program with the LINPACK library.

      parameter (LDX = 50)
      double precision X(LDX,LDX), qraux(LDX), work(LDX)
      integer jpvt(LDX)
c
      do n = 1, LDX
         do j = 1, n
            do i = 1, n
               X(i,j) = 1.0d0
            enddo
         enddo
c
         job = 0
         call DQRDC(X,LDX,n,n,qraux,jpvt,work,job)
c
         write(6,'(5d15.6)') (X(i,i),i=1,n)
      enddo
      end

If the machine you use does not have IEEE arithmetic, or only

has IEEE format without denormals, you will probably not encounter

any difficulties, although conceivably you could get overflows

in either MATLAB's or LINPACK's DQRDC routine.

If you use a machine with full IEEE arithmetic, including denormals,

you will very likely see NaN's show up in the output for various

values of n larger than 20, including n = 45 on all the machines

of this type that I have tried so far.

The implications for SVD are worse than just getting NaN's in the

output -- the machine appears to "hang" for a while with DSVDC in

a loop trying to get the NaN's to converge to something negligible.

You can either wait for the iteration limit to be reached, which can

take a while, or face the consequences of breaking out of a tight loop

in the floating point unit.

As one example, here is what I get on my Sun SparcStation1

with n = 24 and X = ones(n,n):

      diag(qr(X))        svd(X)

     -4.8990e+00         2.4000e+01
      2.1298e-15         2.0849e-15
     -1.3875e-30         2.8902e-45
      6.0202e-46         3.6272e-76
     -8.6969e-62         1.0219e-106
      7.5288e-77         1.9146e-137
     -6.5086e-92         1.6693e-167
      2.8090e-107        2.6651e-198
     -1.2102e-122        1.4180e-244
      2.6018e-138        0
      0                  0
      0                  0
      ...                ...
      0                  0
      0                  0

We haven't got NaN's or nonconvergence yet, but you can see that

we're headed for trouble. I get nonconvergence in SVD when n = 28.

My first "fix" for this involved an attempt to detect, in a

machine-independent, portable (without using DMACH) manner, when

we were near the dangerous range, and then to set the vector to zero

and pretend that this was another roundoff error. In principle

this should be done in each LINPACK routine which calls DSCAL with

the reciprocal of some norm. It wasn't pretty.

The correct solution is to create another Basic Linear Algebra

Subroutine, DSCALD, which scales by division, rather than by

multiplication. Every place DSCAL is now called with 1.0/s,

call DSCALD with s instead.

The execution time of DSCALD is rarely significant; it is never the

inner loop. But if you wanted to speed it up, you could do the following.

Let

delta = 2^(-64)

This is not a machine dependent constant, but it is vaguely related

to the roundoff levels of all the machines we might be dealing with.

(We might have to revise the value of delta in another 10 years.)

For all the computers I know about:

if delta*s does not underflow, then 1/s will not overflow.

So the following algorithm for DSCALD is both safe and reasonably

efficient.

      function y = dscald(x,s)
         vector x, y
         scalar s
         constant delta = 2^(-64)
         if delta*s == 0
            y = x/s
         else
            y = (1/s)*x
         end
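The same logic transliterates directly into any language; here is a Python sketch of the proposed DSCALD (the name and the delta constant are from the text above, and the test values are illustrative denormals):

```python
import math

DELTA = math.ldexp(1.0, -64)   # 2^-64, the "safe zone" constant from the text

def dscald(x, s):
    """Scale vector x by 1/s without overflowing when s is tiny (DSCALD sketch)."""
    if DELTA * s == 0.0:
        # s is so small that 1/s could overflow: divide element by element.
        return [xi / s for xi in x]
    # Common case: one reciprocal, then n cheap multiplications.
    r = 1.0 / s
    return [r * xi for xi in x]

# Tiny denormal s: plain (1/s)*x would produce inf, dscald does not.
s = math.ldexp(1.0, -1040)
x = [math.ldexp(3.0, -1040), math.ldexp(1.0, -1040)]
y = dscald(x, s)
```

Note the branch test is `delta*s == 0`: if that product underflows, s is small enough that the reciprocal might overflow, so the safe division path is taken.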

I do not propose that all the LINPACK libraries in the world be

changed to use DSCALD. As far as I know, this difficulty only

occurs for a few very special matrices made up to show it can happen.

But it is something for other users of the BLAS and future library

developers to keep in mind. And we will incorporate it in future

releases of MATLAB.

------------------------------

From: Baker Kearfott <rbk5287@usl.edu>

Date: Sun, 6 May 90 13:13:21 CDT

Without stirring up too much of a political controversy, I would like to

gather opinions concerning the future of the programming languages we

use.

In particular, I have been using Fortran 77 for a long time. My primary

consideration has been portability and universality. Indeed, I have

seen several languages touted as infinitely superior come and go.

However, at this point I find myself in need of operator overloading, of

structures, of recursion, etc. We have a code in which the arithmetic

is in software, and in which we wish to try various arithmetic schemes.

This code also makes use of linked lists and of stacks. While

maintaining and improving the code, the objects we wish to associate

with elements of the lists and stacks change; these objects are also of

various data types. Also, we are starting to consider structured

systems (sparse and otherwise) within this framework. Such changes and

experimentation would require changing of subroutine argument lists in a

fairly deep calling hierarchy, and a consequent proliferation of code

versions, even though the basic algorithm has not changed. Furthermore,

the lack of operator overloading severely limits the ease with which we

can work with different test problems.
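As a concrete illustration of the missing feature, here is a Python sketch, since Fortran 77 has no equivalent; the interval type is a made-up example of "arithmetic in software", not code from the project described above.

```python
# A software arithmetic type with overloaded operators: once __add__ and
# __mul__ exist, formulas like a*x + b work unchanged for the new type,
# which is exactly what plain Fortran 77 cannot express.

class Interval:
    """Toy interval arithmetic; a stand-in for any software arithmetic."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

a = Interval(2.0, 3.0)
x = Interval(-1.0, 1.0)
b = Interval(0.5, 0.5)
y = a * x + b          # same expression syntax as ordinary arithmetic
```

Without overloading, every change of arithmetic scheme forces a rewrite of each expression into explicit subroutine calls.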

Much of the facility we need exists within specialized systems (which

just run on one machine, or which have only one of the capabilities we

need). However, I place a high premium on portability and the ability

to publish and distribute the actual code.

Fortran 8X seems to be crystallizing at this point. However, it may be

several years until reasonable compilers become available. Also, such

compilers probably would cost $$$ per machine, and money has been in

very short supply here.

Alternatively, more and more people seem to be using C++. Furthermore, it

seems that AT&T is very strongly encouraging it through various means

(including distribution at a nominal charge, and availability of a

Fortran to C converter). It seems that we could have C++ available on

most of our machines by this summer.

Do people think we could go wrong by switching to C++? Our goals are to

make our research easier, while maintaining (near) universal access to

our codes.

Baker Kearfott

rbk@usl.edu (Internet)

------------------------------

From: E. Gallopoulos <stratis@s6.csrd.uiuc.edu>

Date: Wed, 2 May 90 02:33:17 CDT

1990 ACM International Conference on Supercomputing

June 11-15, Amsterdam, the Netherlands

Vrije Universiteit and Centrum voor Wiskunde en Informatica

Sponsored by ACM SIGARCH

in association with ACM-SIGNUM, AICA, BCS-PPG, CTI, CSRD,

CWI, GI, INRIA, IPSJ, SBMAC, and SIAM-SIAG

CONFERENCE CO-CHAIRMEN

Ahmed Sameh, University of Illinois, USA

and

Henk van der Vorst, Delft University of Technology CWI,

the Netherlands

PROGRAM DIRECTOR

John R. Sopka, Digital Equipment Corporation, USA

PROGRAM COMMITTEES

AMERICA: E. Gallopoulos (Chair), F. Darema, E. Davidson,

G. Fox, D. Gannon, M. Heath, J. Hennessy, E. Houstis,

J. McGraw, Y. Patt, C. Polychronopoulos, J. Rice, Y. Saad,

J. E. Smith, H. Wijshoff.

EUROPE-AFRICA J. R. Gurd (Chair), I. S. Duff, W. K. Giloi,

F. Hossfeld, W. Jalby, P. Leca, T. S. Papatheodorou,

R. H. Perrot, P. Sguazerro, H. v. d. Vorst, J. v. Leeuwen,

M. Vanneschi.

JAPAN-FAR EAST Y. Muraoka (Chair), K. Asai, S. Ono, M. Ishii,

T. Kamimura, M. Koike, S. Nagashima, T. Shimada, H. Tanaka,

H. Terada.

ORGANIZING COMMITTEE D. DeGroot, H. te Riele,

M. Louter-Nool, F. Snijders, L. Verdonk.

INVITED LECTURES:

Burton Smith, "The Tera Computer System"

Tony Chan, "Parallel Multilevel Algorithms for PDE's"

William Wulf, "The Collaboratory: A Larger Context for Support

of Computational Science"

Wolfgang Fichtner, "Iterative Methods and Supercomputers for

VLSI Device Simulation"

Toshitsugu Yuba, "Dataflow Computer Development in Japan"

Ahmed Noor, "Strategies for Large-Scale Structural Problems on

High Performance Computers"

Piet van der Houwen, "Parallel ODE Solvers"

Anthony Hey, "Supercomputing with Transputers: Past,

Present and Future"

This is the fourth year of the ACM International Conference on Supercomputing.

The conference will be held at Vrije Universiteit. Conference Proceedings

will be published by ACM. In addition to the invited and contributed papers,

there will also be demonstrations of projects on Software Environments

for Scientific Computing, accompanied by panel discussions, and vendor exhibits.

The conference will also host the award of the first annual

Bell-Perfect Prize for high performance scientific computing.

SOCIAL EVENTS:

Sunday, 10 June: Reception and informal get together. Admission

free for participants.

Tuesday, 12 June: Conference banquet, to be served during a

boat trip through the Amsterdam canals and on the river Zaan to the

Zaanse Schans. Costs: Dfl. 80.-- per person.

Wednesday, 13 June: Excursion to either the Kroller-Muller Museum,

where part (i.e. the drawings) of the great Van Gogh 1990 exhibition

can be admired or to the Delta works.

Costs: Dfl. 60.-- per person.

Thursday, 14 June: Reception in a typically Dutch museum. Admission

free for participants.

Friday, 15 June: (A limited number of) Tickets are available for the

other part of the great Van Gogh 1990 exhibition (the Van Gogh

paintings) in the Van Gogh Museum in Amsterdam. Price per ticket:

Dfl. 20.--.

More information, including registration forms and a detailed

program, can be obtained from

Mr. Frans Snijders

CWI

P.O. Box 4079, Kruislaan 413

1009 AB Amsterdam, The Netherlands

fax: +31-20-5924199

email: franss@cwi.nl

------------------------------

End of NA Digest

**************************

-------