ScaLAPACK routines return four types of floating-point output arguments:

- a scalar, such as an eigenvalue of a matrix,
- a vector, such as the solution $x$ of a linear system $Ax = b$,
- a matrix, such as a matrix inverse $A^{-1}$, and
- a subspace, such as the space spanned by one or more eigenvectors of a matrix.

This section provides measures for errors in these quantities, which we need in order to express error bounds.
First, consider scalars. Let the scalar $\hat{\alpha}$ be an approximation of the true answer $\alpha$. We can measure the difference between $\alpha$ and $\hat{\alpha}$ either by the absolute error $|\hat{\alpha} - \alpha|$ or, if $\alpha$ is nonzero, by the relative error $|\hat{\alpha} - \alpha| / |\alpha|$. Alternatively, it is sometimes more convenient to use $|\hat{\alpha} - \alpha| / |\hat{\alpha}|$ instead of the standard expression for relative error (see Further Details below). If the relative error of $\hat{\alpha}$ is, say, $10^{-5}$, we say that $\hat{\alpha}$ is accurate to 5 decimal digits.
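As a concrete illustration, here is a minimal Python/NumPy sketch of these definitions; the values of alpha and alpha_hat are invented for the example:

    import numpy as np

    alpha = 3.141592653589793   # "true" answer (illustrative)
    alpha_hat = 3.14160         # computed approximation

    abs_err = abs(alpha_hat - alpha)        # absolute error |alpha_hat - alpha|
    rel_err = abs_err / abs(alpha)          # relative error, valid since alpha != 0
    rel_err_alt = abs_err / abs(alpha_hat)  # alternative form of the relative error

    # a relative error of about 10**(-d) means roughly d accurate decimal digits
    digits = -np.log10(rel_err)
    print(f"{abs_err:.2e} {rel_err:.2e} {digits:.1f}")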
To measure the error in vectors, we need to measure the size or norm of a vector $x$. A popular norm is the magnitude of the largest component, $\max_{1 \le i \le n} |x_i|$, which we denote by $\|x\|_\infty$. This is read "the infinity norm of $x$." See Table 6.2 for a summary of norms.
Table 6.2: Vector and matrix norms

                      vector                                  matrix
  one-norm            $\|x\|_1 = \sum_i |x_i|$                $\|A\|_1 = \max_j \sum_i |a_{ij}|$
  two-norm            $\|x\|_2 = (\sum_i |x_i|^2)^{1/2}$      $\|A\|_2 = \max_{x \ne 0} \|Ax\|_2 / \|x\|_2$
  Frobenius norm      $\|x\|_F = \|x\|_2$                     $\|A\|_F = (\sum_{i,j} |a_{ij}|^2)^{1/2}$
  infinity-norm       $\|x\|_\infty = \max_i |x_i|$           $\|A\|_\infty = \max_i \sum_j |a_{ij}|$
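The norms in Table 6.2 map directly onto numpy.linalg.norm; a quick sketch, where the vector and matrix are arbitrary test data:

    import numpy as np

    x = np.array([1.0, 100.0, 9.0])
    A = np.array([[1.0, 2.0], [3.0, 4.0]])

    # vector norms from Table 6.2
    print(np.linalg.norm(x, 1))       # one-norm: sum of |x_i|
    print(np.linalg.norm(x, 2))       # two-norm (Euclidean)
    print(np.linalg.norm(x, np.inf))  # infinity-norm: max |x_i|

    # matrix norms from Table 6.2
    print(np.linalg.norm(A, 1))       # maximum absolute column sum
    print(np.linalg.norm(A, 2))       # spectral norm (largest singular value)
    print(np.linalg.norm(A, 'fro'))   # Frobenius norm
    print(np.linalg.norm(A, np.inf))  # maximum absolute row sum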
If $\hat{x}$ is an approximation to the exact vector $x$, we will refer to $\|\hat{x} - x\|_p$ as the absolute error in $\hat{x}$ (where $p$ is one of the values in Table 6.2) and refer to $\|\hat{x} - x\|_p / \|x\|_p$ as the relative error in $\hat{x}$ (assuming $\|x\|_p \ne 0$). As with scalars, we will sometimes use $\|\hat{x} - x\|_p / \|\hat{x}\|_p$ for the relative error. As above, if the relative error of $\hat{x}$ is, say, $10^{-5}$, we say that $\hat{x}$ is accurate to 5 decimal digits. The following example illustrates these ideas.
$$x = \begin{pmatrix} 1 \\ 100 \\ 9 \end{pmatrix}, \qquad \hat{x} = \begin{pmatrix} 1.1 \\ 99 \\ 11 \end{pmatrix}, \qquad \frac{\|\hat{x} - x\|_\infty}{\|x\|_\infty} = \frac{2}{100} = .02 .$$

Thus, we would say that $\hat{x}$ approximates $x$ to 2 decimal digits.
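The computation in this example can be reproduced directly, using the same $x$ and $\hat{x}$ as above:

    import numpy as np

    x = np.array([1.0, 100.0, 9.0])      # exact vector
    x_hat = np.array([1.1, 99.0, 11.0])  # approximation

    abs_err = np.linalg.norm(x_hat - x, np.inf)    # = 2
    rel_err = abs_err / np.linalg.norm(x, np.inf)  # = 2/100 = .02
    print(abs_err, rel_err)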
Errors in matrices may also be measured with norms. The most obvious generalization of $\|x\|_\infty$ to matrices would appear to be $\|A\| = \max_{i,j} |a_{ij}|$, but this does not have certain important mathematical properties that make deriving error bounds convenient. Instead, we will use

$$\|A\|_\infty = \max_i \sum_j |a_{ij}| ,$$

where $A$ is an m-by-n matrix, or

$$\|A\|_1 = \max_j \sum_i |a_{ij}| ;$$

see Table 6.2 for other matrix norms.
As before, $\|\hat{A} - A\|_p$ is the absolute error in $\hat{A}$, $\|\hat{A} - A\|_p / \|A\|_p$ is the relative error in $\hat{A}$, and a relative error in $\hat{A}$ of $10^{-5}$ means $\hat{A}$ is accurate to 5 decimal digits. The following example illustrates these ideas.

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad \hat{A} = \begin{pmatrix} 1.1 & 2.2 \\ 3.3 & 4.4 \end{pmatrix}, \qquad \frac{\|\hat{A} - A\|_\infty}{\|A\|_\infty} = \frac{.7}{7} = .1 ,$$

so $\hat{A}$ is accurate to 1 decimal digit.
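The same computation for the matrix example, using the infinity norm (maximum absolute row sum):

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    A_hat = np.array([[1.1, 2.2], [3.3, 4.4]])

    rel_err = (np.linalg.norm(A_hat - A, np.inf)
               / np.linalg.norm(A, np.inf))   # = .7 / 7 = .1
    print(rel_err)   # one decimal digit of accuracy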
We now introduce some related notation we will use in our error bounds. The condition number of a matrix $A$ is defined as $\kappa_p(A) \equiv \|A\|_p \cdot \|A^{-1}\|_p$, where $A$ is square and invertible, and $p$ is $\infty$ or one of the other possibilities in Table 6.2. The condition number measures how sensitive $A^{-1}$ is to changes in $A$; the larger the condition number, the more sensitive is $A^{-1}$. For example, for the same $A$ as in the last example,

$$\kappa_\infty(A) = \|A\|_\infty \, \|A^{-1}\|_\infty = 7 \cdot 3 = 21 .$$

ScaLAPACK error estimation routines typically compute a variable called RCOND, which is the reciprocal of the condition number (or an approximation of the reciprocal). The reciprocal of the condition number is used instead of the condition number itself in order to avoid the possibility of overflow when the condition number is very large. Also, some of our error bounds will use the vector of absolute values of $x$, $|x|$ ($|x|_i = |x_i|$), or similarly $|A|$ ($|A|_{ij} = |a_{ij}|$).
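A sketch of the condition number and the RCOND convention in NumPy. Note that ScaLAPACK estimates RCOND cheaply inside routines such as PxGESVX rather than forming $A^{-1}$ explicitly, so the explicit inverse below is only for illustration:

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])

    # infinity-norm condition number: ||A||_inf * ||A^{-1}||_inf
    kappa = (np.linalg.norm(A, np.inf)
             * np.linalg.norm(np.linalg.inv(A), np.inf))
    print(kappa)          # 21.0 for this A

    # ScaLAPACK routines return the reciprocal, RCOND, to avoid overflow
    rcond = 1.0 / kappa
    print(rcond)          # ~0.0476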
Now we consider errors in subspaces. Subspaces are the outputs of routines that compute eigenvectors and invariant subspaces of matrices. We need a careful definition of error in these cases for the following reason. The nonzero vector $x$ is called a (right) eigenvector of the matrix $A$ with eigenvalue $\lambda$ if $Ax = \lambda x$. From this definition, we see that $-x$, $2x$, or any other nonzero multiple of $x$ is also an eigenvector. In other words, eigenvectors are not unique. This means we cannot measure the difference between two supposed eigenvectors $\hat{x}$ and $x$ by computing $\|\hat{x} - x\|_2$, because this may be large while $\|\hat{x} - \beta x\|_2$ is small or even zero for some scalar $\beta$. This is true even if we normalize $x$ so that $\|x\|_2 = 1$, since both $x$ and $-x$ can be normalized simultaneously. Hence, to define error in a useful way, we need instead to consider the set $\mathcal{S}$ of all scalar multiples of $x$. The set $\mathcal{S} = \{\beta x : \beta \ \text{a scalar}\}$ is called the subspace spanned by $x$ and is uniquely determined by any nonzero member of $\mathcal{S}$. We will measure the difference between two such sets by the acute angle between them. Suppose $\hat{\mathcal{S}}$ is spanned by $\hat{x} \ne 0$ and $\mathcal{S}$ is spanned by $x \ne 0$. Then the acute angle between $\hat{\mathcal{S}}$ and $\mathcal{S}$ is defined as

$$\theta(\hat{x}, x) \equiv \arccos \frac{|\hat{x}^H x|}{\|\hat{x}\|_2 \, \|x\|_2} .$$
One can show that $\theta(\hat{x}, x)$ does not change when either $\hat{x}$ or $x$ is multiplied by any nonzero scalar. For example, if $\hat{x}$ and $x$ are as above, then $\theta(\alpha \hat{x}, \beta x) = \theta(\hat{x}, x) \approx .0209$ for any nonzero scalars $\alpha$ and $\beta$.
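A sketch of the acute angle, with a check of its scale invariance; the vectors are reused from the earlier example:

    import numpy as np

    def acute_angle(u, v):
        """Acute angle between the lines spanned by u and v."""
        c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(min(c, 1.0))   # clip rounding noise above 1

    x = np.array([1.0, 100.0, 9.0])
    x_hat = np.array([1.1, 99.0, 11.0])

    print(acute_angle(x_hat, x))            # ~0.0209
    print(acute_angle(2 * x_hat, -3 * x))   # identical: scale invariant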
Let us consider another way to interpret the angle between $\hat{\mathcal{S}}$ and $\mathcal{S}$. Suppose $\hat{x}$ is a unit vector ($\|\hat{x}\|_2 = 1$). Then there is a scalar $\beta$ such that

$$\|\hat{x} - \beta x\|_2 = 2 \sin \frac{\theta(\hat{x}, x)}{2} \approx \theta(\hat{x}, x) .$$

The approximation holds when $\theta(\hat{x}, x)$ is much less than 1 (less than .1 will do nicely). If $\hat{x}$ is an approximate eigenvector with error bound $\theta(\hat{x}, x) \le \bar{\theta}$, where $x$ is a true eigenvector, there is another true eigenvector $\tilde{x} = \beta x$ satisfying $\|\hat{x} - \tilde{x}\|_2 \le \bar{\theta}$. For example, if $\hat{x} = (1.1, 99, 11)^T / \|(1.1, 99, 11)^T\|_2$, then $\|\hat{x} - \tilde{x}\|_2 \approx .0209$ for $\tilde{x} = x / \|x\|_2$ with $x = (1, 100, 9)^T$.
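One can verify the identity $\|\hat{x} - \beta x\|_2 = 2 \sin(\theta/2)$ numerically; the sketch below picks $\beta$ so that $\beta x$ is the unit multiple of $x$ nearest $\hat{x}$:

    import numpy as np

    x = np.array([1.0, 100.0, 9.0])          # true eigenvector (from above)
    x_hat = np.array([1.1, 99.0, 11.0])
    x_hat = x_hat / np.linalg.norm(x_hat)    # normalize: ||x_hat||_2 = 1

    cos_t = abs(x_hat @ x) / np.linalg.norm(x)
    theta = np.arccos(min(cos_t, 1.0))       # acute angle

    # choose beta so that beta*x is the unit multiple of x nearest x_hat
    beta = np.sign(x_hat @ x) / np.linalg.norm(x)
    dist = np.linalg.norm(x_hat - beta * x)

    print(dist, 2 * np.sin(theta / 2), theta)   # all approximately .0209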
Finally, many of our error bounds will contain a factor $p(n)$ (or $p(m,n)$), which grows as a function of the matrix dimension $n$ (or dimensions $m$ and $n$). It represents a potentially different function for each problem. In practice, the true errors usually grow at most linearly; using $p(n) = 1$ in the error bound formulas will often give a reasonable estimate, while $p(n) = n$ is more conservative. Therefore, we will refer to $p(n)$ as a ``modestly growing'' function of $n$; however, it can occasionally be much larger (see Further Details below). For simplicity, the error bounds computed by the code fragments in the following sections will use $p(n) = 1$. This means these computed error bounds may occasionally slightly underestimate the true error. For this reason we refer to these computed error bounds as ``approximate error bounds.''
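To make the convention concrete, here is a hedged sketch of an approximate error bound with $p(n) = 1$ for a linear solve; the bound formula, roughly machine epsilon divided by RCOND, mirrors the style of the code fragments in the following sections but is not itself a ScaLAPACK call:

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    b = np.array([1.0, 1.0])
    x_hat = np.linalg.solve(A, b)        # computed solution

    eps = np.finfo(float).eps            # machine epsilon
    rcond = 1.0 / (np.linalg.norm(A, np.inf)
                   * np.linalg.norm(np.linalg.inv(A), np.inf))

    # approximate error bound with p(n) = 1:
    err_bound = eps / rcond
    print(err_bound)   # approximate bound on the relative error of x_hat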
Further Details: How to Measure Errors
The relative error $\|\hat{\alpha} - \alpha\| / \|\alpha\|$ in the approximation $\hat{\alpha}$ of the true solution $\alpha$ has a drawback: it often cannot be computed directly, because it depends on the unknown quantity $\|\alpha\|$. However, we can often instead estimate $\|\hat{\alpha} - \alpha\| / \|\hat{\alpha}\|$, since $\hat{\alpha}$ is known (it is the output of our algorithm). Fortunately, these two quantities are necessarily close together, provided either one is small, which is the only time they provide a useful bound anyway. For example, $\|\hat{\alpha} - \alpha\| / \|\hat{\alpha}\| \le .1$ implies $.9 \, \|\hat{\alpha}\| \le \|\alpha\| \le 1.1 \, \|\hat{\alpha}\|$, and hence

$$\frac{1}{1.1} \cdot \frac{\|\hat{\alpha} - \alpha\|}{\|\hat{\alpha}\|} \;\le\; \frac{\|\hat{\alpha} - \alpha\|}{\|\alpha\|} \;\le\; \frac{1}{.9} \cdot \frac{\|\hat{\alpha} - \alpha\|}{\|\hat{\alpha}\|} ,$$

so they can be used interchangeably.
Table 6.2 contains a variety of norms we will use to measure errors. These norms have the properties that $\|Ax\|_p \le \|A\|_p \cdot \|x\|_p$ and $\|AB\|_p \le \|A\|_p \cdot \|B\|_p$, where $p$ is one of 1, 2, $\infty$, and $F$. These properties are useful for deriving error bounds.
An error bound that uses a given norm may be changed into an error bound that uses another norm. This is accomplished by multiplying the first error bound by an appropriate function of the problem dimension. Table 6.3 gives the factors $c$ such that $\|x\|_p \le c \cdot \|x\|_q$, where $n$ is the dimension of $x$.
Table 6.3: Bounding one vector norm in terms of another

  Values of $c$ such that $\|x\|_p \le c \, \|x\|_q$, where $x$ is an $n$-vector:

                   q = 1        q = 2        q = $\infty$
    p = 1            1        $\sqrt{n}$        $n$
    p = 2            1            1          $\sqrt{n}$
    p = $\infty$     1            1             1
Table 6.4 gives the factors $c$ such that $\|A\|_p \le c \cdot \|A\|_q$, where $A$ is m-by-n.
Table 6.4: Bounding one matrix norm in terms of another

  Values of $c$ such that $\|A\|_p \le c \, \|A\|_q$, where $A$ is m-by-n:

                   q = 1           q = 2            q = F          q = $\infty$
    p = 1            1           $\sqrt{m}$       $\sqrt{m}$          $m$
    p = 2        $\sqrt{n}$          1                1            $\sqrt{m}$
    p = F        $\sqrt{n}$   $\sqrt{\min(m,n)}$      1            $\sqrt{m}$
    p = $\infty$     $n$         $\sqrt{n}$       $\sqrt{n}$           1
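These factors are easy to confirm numerically; the following sketch checks a few entries of Tables 6.3 and 6.4 on random data:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 7, 5
    x = rng.standard_normal(n)
    A = rng.standard_normal((m, n))

    # Table 6.3: ||x||_1 <= sqrt(n)*||x||_2 and ||x||_2 <= sqrt(n)*||x||_inf
    assert np.linalg.norm(x, 1) <= np.sqrt(n) * np.linalg.norm(x, 2)
    assert np.linalg.norm(x, 2) <= np.sqrt(n) * np.linalg.norm(x, np.inf)

    # Table 6.4: ||A||_1 <= sqrt(m)*||A||_2 and ||A||_2 <= sqrt(n)*||A||_1
    assert np.linalg.norm(A, 1) <= np.sqrt(m) * np.linalg.norm(A, 2)
    assert np.linalg.norm(A, 2) <= np.sqrt(n) * np.linalg.norm(A, 1)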
The two-norm of $A$, $\|A\|_2$, is also called the spectral norm of $A$ and is equal to the largest singular value $\sigma_{\max}(A)$ of $A$. We shall also need to refer to the smallest singular value $\sigma_{\min}(A)$ of $A$; its value can be defined in a similar way to the definition of the two-norm in Table 6.2, namely, as $\min_{x \ne 0} \|Ax\|_2 / \|x\|_2$ when $A$ has at least as many rows as columns, and as $\min_{x \ne 0} \|A^H x\|_2 / \|x\|_2$ when $A$ has more columns than rows. The two-norm, Frobenius norm, and singular values of a matrix do not change if the matrix is multiplied by a real orthogonal (or complex unitary) matrix.
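The extreme singular values and the invariance under orthogonal transformations can be checked with an SVD; the matrix here is random test data:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 4))

    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    print(s[0], np.linalg.norm(A, 2))        # sigma_max equals the two-norm
    print(s[-1])                             # sigma_min

    # multiplying by an orthogonal matrix Q changes none of them
    Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
    s2 = np.linalg.svd(Q @ A, compute_uv=False)
    print(np.allclose(s, s2))                # True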
Now we define subspaces spanned by more than one vector, and angles between subspaces. Given a set of $k$ n-dimensional vectors $x_1, \dots, x_k$, they determine a subspace $\mathcal{S}$ consisting of all their possible linear combinations $\{\sum_{i=1}^k \beta_i x_i, \ \beta_i \ \text{scalars}\}$. We also say that $x_1, \dots, x_k$ spans $\mathcal{S}$. The difficulty in measuring the difference between subspaces is that the sets of vectors spanning them are not unique. For example, $\{x_1, x_2\}$, $\{2x_1, 3x_2\}$, and $\{x_1, x_1 + x_2\}$ all determine the same subspace. Therefore, we cannot simply compare the subspaces spanned by $\{\hat{x}_1, \dots, \hat{x}_k\}$ and $\{x_1, \dots, x_k\}$ by comparing each $\hat{x}_i$ to $x_i$. Instead, we will measure the angle between the subspaces, which is independent of the spanning set of vectors. Suppose subspace $\hat{\mathcal{S}}$ is spanned by $\{\hat{x}_1, \dots, \hat{x}_k\}$ and subspace $\mathcal{S}$ is spanned by $\{x_1, \dots, x_k\}$. If $k=1$, we instead write more simply $\{\hat{x}\}$ and $\{x\}$. When $k=1$, we define the angle between $\hat{\mathcal{S}}$ and $\mathcal{S}$ as the acute angle between $\hat{x}$ and $x$. When $k>1$, we define the acute angle between $\hat{\mathcal{S}}$ and $\mathcal{S}$ as the largest acute angle between any vector $\hat{x}$ in $\hat{\mathcal{S}}$ and the closest vector $x$ in $\mathcal{S}$ to $\hat{x}$:

$$\theta(\hat{\mathcal{S}}, \mathcal{S}) \equiv \max_{\substack{\hat{x} \in \hat{\mathcal{S}} \\ \hat{x} \ne 0}} \; \min_{\substack{x \in \mathcal{S} \\ x \ne 0}} \; \theta(\hat{x}, x) .$$
ScaLAPACK routines that compute subspaces return vectors $\{\hat{x}_1, \dots, \hat{x}_k\}$ spanning a subspace $\hat{\mathcal{S}}$ that are orthonormal. This means the n-by-k matrix $\hat{X} = [\hat{x}_1, \dots, \hat{x}_k]$ satisfies $\hat{X}^H \hat{X} = I$. Suppose also that the vectors $\{x_1, \dots, x_k\}$ spanning $\mathcal{S}$ are orthonormal, so $X = [x_1, \dots, x_k]$ also satisfies $X^H X = I$. Then there is a simple expression for the angle between $\hat{\mathcal{S}}$ and $\mathcal{S}$:

$$\theta(\hat{\mathcal{S}}, \mathcal{S}) = \arccos \sigma_{\min}(\hat{X}^H X) .$$
For example, if

$$\hat{X} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \qquad \text{and} \qquad X = \begin{pmatrix} \cos(.01) & 0 \\ 0 & 1 \\ \sin(.01) & 0 \end{pmatrix},$$

then $\hat{X}^H X = \mathrm{diag}(\cos(.01), 1)$, so $\sigma_{\min}(\hat{X}^H X) = \cos(.01)$ and $\theta(\hat{\mathcal{S}}, \mathcal{S}) = .01$.
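The expression $\arccos \sigma_{\min}(\hat{X}^H X)$ is straightforward to evaluate; this sketch reproduces the example above:

    import numpy as np

    t = 0.01
    X_hat = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.0, 0.0]])
    X = np.array([[np.cos(t), 0.0],
                  [0.0,       1.0],
                  [np.sin(t), 0.0]])

    s_min = np.linalg.svd(X_hat.T @ X, compute_uv=False).min()
    theta = np.arccos(min(s_min, 1.0))   # clip rounding noise above 1
    print(theta)                         # 0.01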
As stated above, all our bounds will contain a factor $p(n)$ (or $p(m,n)$), which measures how roundoff errors can grow as a function of matrix dimension $n$ (or $m$ and $n$). In practice, the true error usually grows just linearly with $n$, or even more slowly, but we can generally prove only much weaker bounds of the form $p(n) = O(n^3)$. This is because we cannot rule out the extremely unlikely possibility of rounding errors all adding together instead of canceling on average. Using $p(n) = n^3$ would give pessimistic and unrealistic bounds, especially for large $n$, so we content ourselves with describing $p(n)$ as a ``modestly growing'' polynomial function of $n$. Using $p(n) = 1$ in the error bound formulas will often give a reasonable error estimate. For detailed derivations of various $p(n)$, see [71, 114, 84, 38].
There is also one situation where $p(n)$ can grow as large as $2^{n-1}$: Gaussian elimination with partial pivoting. This typically occurs only on specially constructed matrices presented in numerical analysis courses [114, p. 212]. However, the expert driver for solving linear systems, PxGESVX, provides error bounds incorporating $p(n)$, and so this rare possibility can be detected.
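The classic worst-case matrix for partial pivoting exhibits exactly this $2^{n-1}$ element growth, and a serial LU factorization reproduces it; a sketch using scipy.linalg.lu:

    import numpy as np
    from scipy.linalg import lu

    def growth_factor(n):
        # Classic worst case for partial pivoting: unit diagonal,
        # -1 below the diagonal, +1 in the last column.
        A = np.eye(n)
        A[np.tril_indices(n, -1)] = -1.0
        A[:, -1] = 1.0
        P, L, U = lu(A)                     # LU with partial pivoting
        return abs(U).max() / abs(A).max()  # element growth factor

    print(growth_factor(10))   # 512.0 == 2**(10-1)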