Next: Algorithmic Integrity Up: LAPACK Working Note 112: Previous: Communicating Floating Point Values

Considerations Due to Poor Arithmetic

As we expand the ScaLAPACK test suite to encompass more rigorous testing, particularly for floating point values close to the edge of representable numbers (as is present in the LAPACK test suite), we are reminded of additional dangers which must be avoided in floating point arithmetic. For example, it is a sad reflection that some compilers still do not implement complex arithmetic carefully. In particular, unscaled complex division still occurs on certain architectures, leading to unnecessary overflow. To handle this difficulty ScaLAPACK, as LAPACK, restricts the range of representable numbers by a call to routine PDLABAD (in double precision), the equivalent of the LAPACK routine DLABAD, which, if desired, takes the square root of the smallest and largest representable numbers for the computation to protect from unnecessary underflow or overflow. PDLABAD calls DLABAD locally on each process and then communicates the minimum and maximum value respectively. Arguably we should have separate routines for real and complex arithmetic, but since we hope that the need for DLABAD will eventually disappear we have so far resisted taking that step.

This is particularly irritating if one machine in a network is causing us to impose unnecessary restrictions on all the machines in the network, but without this, catastrophic results can occur during computations near the overflow or underflow thresholds.

Another problem that we have encountered during testing is in the way that subnormal (denormalized) numbers are handled on certain (near) IEEE architectures. By default, some architectures flush subnormal numbers to zero. Thus, if the computation involves numbers near underflow and a subnormal number is communicated to such a machine, the computational results may be invalid and the subsequent behavior unpredictable. Often such machines have a compiler switch to allow the handling of subnormal numbers, but it can be non-obvious and we cannot guarantee that users will use such a switch.

This behavior occurred during the heterogeneous testing of the linear least squares routines when the input test matrix was a full-rank matrix scaled near underflow. During the course of the computation a subnormal number was communicated, this value was unrecognized on receipt, and a floating point exception was flagged. The execution on the processor was killed, subsequently causing the execution on the other processors to hang. As we expand the test suite we expect to discover such behavior in other parts of ScaLAPACK, since we do not believe that there was anything special about the least squares routines.

A solution would be to replace subnormal numbers either with zero, or with the nearest normal number, but we are somewhat reluctant to implement this within ScaLAPACK, since this does not seem to be the right software level at which to do this.

A simple example program to illustrate this problem is given in Appendix A.

Next: Algorithmic Integrity Up: LAPACK Working Note 112: Previous: Communicating Floating Point Values

Jack Dongarra
Fri Aug 30 15:13:52 EDT 1996