The effects of two sources of error can be measured by the bounds in this chapter: roundoff error and input error. Roundoff error arises from rounding results of floating-point operations during the algorithm. Input error is error in the input to the algorithm from prior calculations or measurements. We describe roundoff error first, and then input error.
Almost all the error bounds ScaLAPACK provides are multiples of
relative machine precision,
which we abbreviate
by . Relative machine precision (epsilon) bounds the roundoff in
individual floating-point operations.
It may be loosely defined as the largest relative error
in any floating-point operation that neither overflows nor underflows.
(Overflow means the result is too
large to represent accurately, and underflow means the result is too
small to represent accurately.) Relative machine precision (epsilon) is
available either by
the function call
PSLAMCH(ICTXT, 'Epsilon') (or simply
PSLAMCH(ICTXT, 'E'))
in single
precision, or
by the function call PDLAMCH(ICTXT, 'Epsilon') (or PDLAMCH(ICTXT, 'E'))
in double precision.
See section 6.1 and
Table 6.1 for a discussion of common values of machine
epsilon.
PDLAMCH(ICTXT,`E') returns a single value for the selected machine parameter `E' on all processes within the context ICTXT. If these processes are running on a network of heterogeneous processors , with different floating-point arithmetics, then a ``safe'' common value is returned, the maximum value of machine epsilon for all the processors.
In case of overflow,
there are two common system responses:
stopping with an error message, or returning and
continuing to compute. The latter is the default response of IEEE
standard floating-point arithmetic [7, 8]
,
the most commonly used arithmetic. It is possible to
change this default to abort with an error message, which is
often useful for debugging.
In contrast to LAPACK, ScaLAPACK can take advantage of arithmetic
with
to accelerate the routines that compute eigenvalues
of symmetric matrices using
PxLAIECT
(the drivers PxSYEVX and PxSYGVX, and their complex counterparts).
PxLAIECT comes in two different versions, one in which arithmetic with
is available (the default) and one in which it is not.
When
is available, the inner loop of PxLAIECT is
accelerated by removing a branch to test for and avoid division by zero.
This speed advantage is realized only when arithmetic with
is as fast as arithmetic with normalized floating-point numbers; this
is usually but not always the case [42].
The compile time flag NO_IEEE can be used during installation to run
without using
; see the ScaLAPACK Installation Guide
for details [24].
Since underflow is almost always less significant than roundoff, we will not consider it further in this section (but see section 6.1).
Bounds on input errors
may be easily incorporated
into most ScaLAPACK error bounds.
Suppose the input data is accurate to, say, five decimal digits
(we discuss exactly what
this means in section 6.3). Then one simply replaces
by
in the error bounds.
Further Details: Floating-Point Arithmetic
Roundoff error is bounded in terms of the relative machine precision
,
which is the smallest value satisfying
where a and b are floating-point numbers ,
is any one of the four operations +, -,
, and
,
and
is the floating-point result of
.
Relative machine precision,
, is the smallest value for which this inequality
is true for all
, and for all a and b such that
is neither too large (magnitude exceeds the overflow
threshold)
nor too small
(is nonzero with magnitude less than the underflow threshold)
to be represented accurately in the machine.
We also assume
bounds the relative error in unary
operations such as square root:
A precise characterization of depends on the details of the
machine arithmetic and sometimes even of the compiler.
For example, if addition and
subtraction are implemented without a guard digit,
we must redefine
to be the smallest number
such that
In order to assure portability , machine parameters such as relative machine precision (epsilon), the overflow threshold and underflow threshold are computed at runtime by the auxiliary routine PxLAMCH. The alternative, keeping a fixed table of machine parameter values, would degrade portability because the table would have to be changed when moving from one machine or combination of machines, or even one compiler, to another.
Most machines (but not yet all) do have the same machine
parameters because they implement IEEE Standard Floating Point Arithmetic
[7, 8], which exactly specifies floating-point number
representations and operations. For
these machines, including all modern workstations and
PCs,
the values of these parameters are given in
Table 6.1.
Unfortunately, machines claiming to implement IEEE arithmetic may still compute different results from the same program and input. Here are some examples. Intel processors have 80-bit floating-point registers, and the fastest way to use them is to evaluate all results to 80-bit accuracy until they are stored back to memory in 32-bit or 64-bit format. The IBM RS/6000 has a fused multiply-add instruction that evaluates a+b*c with one rounding error instead of two. The DEC Alpha's default (fast) mode is to flush underflowed values to zero instead of returning subnormal numbers, which is the default demanded by the IEEE standard; in this mode the DEC Alpha aborts if it encounters a subnormal number. In all these cases machines may be made to operate absolutely identically, for example, by rounding all intermediate results back to single or double on an Intel machine, or by doing subnormal arithmetic carefully and slowly on a DEC Alpha. These heterogeneities lead to errors encountered only in parallel computing; see section 6.2 for further discussion.
Table 6.1: Values of machine parameters in IEEE floating-point arithmetic
As stated above, we will ignore underflow in discussing error bounds. Reference [37] discusses extending error bounds to include underflow and shows that for many common computations, when underflow occurs, it is less significant than roundoff.
Overflow historically resulted in an error message and
stopped execution,
in which case our error bounds would not apply.
But with IEEE floating-point arithmetic,
the default is that
overflow returns and execution continues.
Indeed, with IEEE arithmetic machines can continue to compute
past overflows, even division by zero, square roots of negative
numbers, etc., by producing
and NaN
(``Not a Number'') symbols according to special rules of arithmetic.
The default on many systems is to continue computing with these
symbols. Routine PxLAIECT exploits this arithmetic to accelerate
the computations of eigenvalues, as discussed above.
It is also possible to stop with an error message when overflow occurs,
a feature that is often useful for debugging.
The user should consult the system manual to see how to turn
error messages on or off.
Most of our error bounds will simply be proportional
to relative machine precision (epsilon). This means, for example, that if the
same problem in solved in double precision and single precision,
the error bound in double precision will be smaller than the
error bound in single precision by a factor
of .
In IEEE arithmetic, this
ratio is
,
meaning that one expects the double-precision answer
to have approximately nine more decimal digits correct
than the single-precision answer.
Like their counterparts in LAPACK. ScaLAPACK routines are generally insensitive to the details of rounding, provided all processes perform arithmetic identically. The one exception is PxLAIECT, as mentioned above. The next section discusses what can happen when processes do not perform arithmetic identically, that is, are heterogeneous.