

How to Measure Errors

LAPACK routines return four types of floating-point output arguments: scalars, such as an eigenvalue of a matrix; vectors, such as the solution x of a linear system Ax=b; matrices, such as a matrix inverse $A^{-1}$; and subspaces, such as the space spanned by one or more eigenvectors of a matrix.

This section provides measures for errors in these quantities, which we need in order to express error bounds.

First consider scalars. Let the scalar $\hat{\alpha}$ be an approximation of the true answer $\alpha$. We can measure the difference between $\alpha$ and $\hat{\alpha}$ either by the absolute error $\vert \hat{\alpha} - \alpha \vert$, or, if $\alpha$ is nonzero, by the relative error $\vert \hat{\alpha} - \alpha \vert / \vert \alpha \vert$. Alternatively, it is sometimes more convenient to use $\vert \hat{\alpha} - \alpha \vert / \vert \hat{\alpha} \vert$ instead of the standard expression for relative error (see section 4.2.1). If the relative error of $\hat{\alpha}$ is, say, $10^{-5}$, then we say that $\hat{\alpha}$ is accurate to 5 decimal digits.
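For illustration, here is a short Fortran sketch (not an LAPACK code fragment; the values of alpha and alphahat below are made up for this example) that computes the absolute error, the relative error, and the corresponding number of correct decimal digits:

program scalar_error
  implicit none
  double precision :: alpha, alphahat, abserr, relerr
  ! true answer and its computed approximation (made-up values for illustration)
  alpha    = 3.14159265d0
  alphahat = 3.14158d0
  abserr = abs(alphahat - alpha)      ! absolute error
  relerr = abserr / abs(alpha)        ! relative error (alpha is nonzero)
  print *, 'absolute error         =', abserr
  print *, 'relative error         =', relerr
  print *, 'correct decimal digits ~', -log10(relerr)
end program scalar_error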

In order to measure the error in vectors, we need to measure the size or norm of a vector x. A popular norm is the magnitude of the largest component, $\max_{1 \leq i \leq n} \vert x_i\vert$, which we denote $\Vert x \Vert _{\infty}$. This is read the infinity norm of x. See Table 4.2 for a summary of norms.


Table 4.2: Vector and matrix norms
  Vector Matrix
one-norm $\Vert x\Vert _{1} = \sum_i \vert x_i\vert$ $\Vert A\Vert _{1} = \max_j \sum_i \vert a_{ij}\vert$
two-norm $\Vert x\Vert _2 = ( \sum_i \vert x_i\vert^2 )^{1/2}$ $\Vert A\Vert _2 = \max_{x \neq 0} \Vert Ax\Vert _2 / \Vert x\Vert _2$
Frobenius norm $\Vert x\Vert _F = \Vert x\Vert _2$ $\Vert A\Vert _F = ( \sum_{ij} \vert a_{ij}\vert^2 )^{1/2}$
infinity-norm $\Vert x\Vert _{\infty} = \max_i \vert x_i\vert$ $\Vert A\Vert _{\infty} = \max_i \sum_j \vert a_{ij}\vert$

If $\hat{x}$ is an approximation to the exact vector x, we will refer to $\Vert \hat{x} - x \Vert _{p}$ as the absolute error in $\hat{x}$ (where p is one of the values in Table 4.2), and refer to $\Vert \hat{x} - x \Vert _{p} / \Vert x \Vert _{p}$ as the relative error in $\hat{x}$ (assuming $\Vert x \Vert _{p} \neq 0$). As with scalars, we will sometimes use $\Vert \hat{x} - x \Vert _{p} / \Vert \hat{x} \Vert _{p}$ for the relative error. As above, if the relative error of $\hat{x}$ is, say, $10^{-5}$, then we say that $\hat{x}$ is accurate to 5 decimal digits. The following example illustrates these ideas:


\begin{displaymath}
x = \left( \begin{array}{c} 1 \\ 100 \\ 9 \end{array} \right)
\; \; {\rm and} \; \;
\hat{x} = \left( \begin{array}{c} 1.1 \\ 99 \\ 11 \end{array} \right)
\end{displaymath}


\begin{eqnarray*}
\Vert \hat{x} - x \Vert _{\infty} = 2 \; , \; &
\displaystyle{\frac{\Vert \hat{x} - x \Vert _{\infty}}{\Vert x \Vert _{\infty}} = .02} \; , \; &
\displaystyle{\frac{\Vert \hat{x} - x \Vert _{\infty}}{\Vert \hat{x} \Vert _{\infty}} = .0202} \\
\Vert \hat{x} - x \Vert _{1} = 3.1 \; , \; &
\displaystyle{\frac{\Vert \hat{x} - x \Vert _{1}}{\Vert x \Vert _{1}} = .0282} \; , \; &
\displaystyle{\frac{\Vert \hat{x} - x \Vert _{1}}{\Vert \hat{x} \Vert _{1}} = .0279}
\end{eqnarray*}


Thus, we would say that $\hat{x}$ approximates x to 2 decimal digits.
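These numbers can be checked with a few lines of Fortran. The following sketch (it calls no LAPACK routines) evaluates the infinity-norm and one-norm errors of the example vectors:

program vector_error
  implicit none
  double precision :: x(3), xhat(3), d(3)
  x    = (/ 1.0d0, 100.0d0,  9.0d0 /)
  xhat = (/ 1.1d0,  99.0d0, 11.0d0 /)
  d = xhat - x
  ! infinity-norm: magnitude of the largest component
  print *, 'abs/rel error, infinity-norm:', maxval(abs(d)), maxval(abs(d))/maxval(abs(x))
  ! one-norm: sum of the magnitudes of the components (relative to xhat here)
  print *, 'abs/rel error, one-norm:     ', sum(abs(d)), sum(abs(d))/sum(abs(xhat))
end program vector_error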

Errors in matrices may also be measured with norms. The most obvious generalization of $\Vert x \Vert _{\infty}$ to matrices would appear to be $\Vert A \Vert = \max_{i,j} \vert a_{ij}\vert$, but this does not have certain important mathematical properties that make deriving error bounds convenient (see section 4.2.1). Instead, we will use $\Vert A \Vert _{\infty} = \max_{1 \leq i \leq m} \sum_{j=1}^n \vert a_{ij}\vert$, where A is an m-by-n matrix, or $\Vert A \Vert _{1} = \max_{1 \leq j \leq n} \sum_{i=1}^m \vert a_{ij}\vert$; see Table 4.2 for other matrix norms. As before, $\Vert \hat{A} - A \Vert _{p}$ is the absolute error in $\hat{A}$, $\Vert \hat{A} - A \Vert _{p} / \Vert A \Vert _{p}$ is the relative error in $\hat{A}$, and a relative error in $\hat{A}$ of $10^{-5}$ means $\hat{A}$ is accurate to 5 decimal digits. The following example illustrates these ideas:

\begin{displaymath}
A = \left( \begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{array} \right)
\; \; {\rm and} \; \;
\hat{A} = \left( \begin{array}{ccc} .44 & 2.36 & 3.04 \\ 3.09 & 5.87 & 6.66 \\ 7.36 & 7.77 & 9.07 \end{array} \right)
\end{displaymath}


\begin{eqnarray*}
\Vert \hat{A} - A \Vert _{\infty} = 2.44 \; , \; &
\displaystyle{\frac{\Vert \hat{A} - A \Vert _{\infty}}{\Vert A \Vert _{\infty}} = .0976} \; , \; &
\displaystyle{\frac{\Vert \hat{A} - A \Vert _{\infty}}{\Vert \hat{A} \Vert _{\infty}} = .1008} \\
\Vert \hat{A} - A \Vert _{F} = 1.87 \; , \; &
\displaystyle{\frac{\Vert \hat{A} - A \Vert _{F}}{\Vert A \Vert _{F}} = .1075} \; , \; &
\displaystyle{\frac{\Vert \hat{A} - A \Vert _{F}}{\Vert \hat{A} \Vert _{F}} = .1082}
\end{eqnarray*}


so $\hat{A}$ is accurate to 1 decimal digit.
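The infinity-norm quantities above can be reproduced with the LAPACK auxiliary routine DLANGE, which returns the one, infinity, Frobenius, or max-element norm of a general matrix. This is only an illustrative sketch, not a code fragment from LAPACK itself:

program matrix_error
  implicit none
  double precision :: a(3,3), ahat(3,3), e(3,3), work(3), errinf
  double precision, external :: dlange   ! LAPACK general matrix norm routine
  ! matrices are stored by columns, so each group of three values below is one column
  a    = reshape( (/ 1d0, 4d0, 7d0,  2d0, 5d0, 8d0,  3d0, 6d0, 10d0 /), (/3,3/) )
  ahat = reshape( (/ .44d0, 3.09d0, 7.36d0,  2.36d0, 5.87d0, 7.77d0, &
                     3.04d0, 6.66d0, 9.07d0 /), (/3,3/) )
  e = ahat - a
  ! 'I' requests the infinity-norm (maximum absolute row sum)
  errinf = dlange( 'I', 3, 3, e, 3, work )
  print *, 'absolute error =', errinf                                   ! 2.44
  print *, 'relative error =', errinf / dlange( 'I', 3, 3, a, 3, work ) ! .0976
end program matrix_error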

Here is some related notation we will use in our error bounds. The condition number of a matrix A is defined as $\kappa_p (A) \equiv \Vert A\Vert _p \cdot \Vert A^{-1}\Vert _p$, where A is square and invertible, and p is $\infty$ or one of the other possibilities in Table 4.2. The condition number measures how sensitive $A^{-1}$ is to changes in A; the larger the condition number, the more sensitive is $A^{-1}$. For example, for the same A as in the last example,

\begin{displaymath}
A^{-1} \approx \left( \begin{array}{ccc} -.667 & -1.333 & 1 \\ -.667 & 3.667 & -2 \\ 1 & -2 & 1 \end{array} \right)
\; \; {\rm and} \; \; \kappa_{\infty} (A) = 158.33 \; \; .
\end{displaymath}

LAPACK error estimation routines typically compute a variable called RCOND, which is the reciprocal of the condition number (or an approximation of the reciprocal). The reciprocal of the condition number is used instead of the condition number itself in order to avoid the possibility of overflow when the condition number is very large. Also, some of our error bounds will use the vector of absolute values of x, $\vert x \vert$ ( $\vert x \vert _i = \vert x_i \vert$), or similarly $\vert A \vert$ ( $\vert A \vert _{ij} = \vert a_{ij} \vert$).
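As a sketch of how RCOND is typically obtained for a general matrix, the following program estimates the reciprocal of $\kappa_{\infty}(A)$ for the 3-by-3 matrix A of the previous example, using the double precision routines DLANGE, DGETRF and DGECON (error handling is kept to a minimum here):

program estimate_rcond
  implicit none
  integer, parameter :: n = 3
  double precision :: a(n,n), anorm, rcond, work(4*n)
  integer :: ipiv(n), iwork(n), info
  double precision, external :: dlange
  a = reshape( (/ 1d0, 4d0, 7d0,  2d0, 5d0, 8d0,  3d0, 6d0, 10d0 /), (/n,n/) )
  ! the infinity-norm of A must be saved before A is overwritten by its factors
  anorm = dlange( 'I', n, n, a, n, work )
  ! LU factorization with partial pivoting; A is overwritten by L and U
  call dgetrf( n, n, a, n, ipiv, info )
  if (info /= 0) stop 'A is exactly singular'
  ! estimate the reciprocal condition number in the infinity-norm
  call dgecon( 'I', n, a, n, anorm, rcond, work, iwork, info )
  print *, 'RCOND =', rcond                    ! about 1/158.33 = .0063
  print *, 'estimated kappa_inf(A) =', 1.0d0 / rcond
end program estimate_rcond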

Now we consider errors in subspaces. Subspaces are the outputs of routines that compute eigenvectors and invariant subspaces of matrices. We need a careful definition of error in these cases for the following reason. The nonzero vector x is called a (right) eigenvector of the matrix A with eigenvalue $\lambda$ if $Ax = \lambda x$. From this definition, we see that -x, 2x, or any other nonzero multiple $\beta x$ of x is also an eigenvector. In other words, eigenvectors are not unique. This means we cannot measure the difference between two supposed eigenvectors $\hat{x}$ and x by computing $\Vert \hat{x} - x \Vert _2$, because this may be large while $\Vert \hat{x} - \beta x \Vert _2$ is small or even zero for some $\beta \neq 1$. This is true even if we normalize x so that $\Vert x \Vert _2 = 1$, since both x and -x can be normalized simultaneously. So in order to define error in a useful way, we need to instead consider the set $\cal S$ of all scalar multiples $\{ \beta x \; , \; \beta {\rm ~a~scalar} \}$ of x. The set $\cal S$ is called the subspace spanned by x, and is uniquely determined by any nonzero member of $\cal S$. We will measure the difference between two such sets by the acute angle between them. Suppose $\hat{\cal S}$ is spanned by $\{ \hat x \}$ and $\cal S$ is spanned by $\{ x \}$. Then the acute angle between $\hat{\cal S}$ and $\cal S$ is defined as

\begin{displaymath}
\theta ( \hat{\cal S}, {\cal S} ) =
\theta ( \hat{x} , x ) \equiv
\arccos \frac{ \vert \hat{x}^H x \vert}{ \Vert \hat{x} \Vert _2 \cdot \Vert x \Vert _2 } \; \; .
\end{displaymath}

One can show that $\theta ( \hat{x} , x )$ does not change when either $\hat{x}$ or x is multiplied by any nonzero scalar. For example, if

\begin{displaymath}
x = \left( \begin{array}{c} 1 \\ 100 \\ 9 \end{array} \right)
\; \; {\rm and} \; \;
\hat{x} = \left( \begin{array}{c} 1.1 \\ 99 \\ 11 \end{array} \right)
\end{displaymath}

as above, then $\theta ( \gamma \hat{x} , \beta x ) = .0209$ for any nonzero scalars $\beta$ and $\gamma$.
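The value .0209 can be reproduced directly from the definition. The following sketch (it calls no LAPACK routines) evaluates $\theta ( \hat{x} , x )$ for the example vectors:

program subspace_angle
  implicit none
  double precision :: x(3), xhat(3), theta
  x    = (/ 1.0d0, 100.0d0,  9.0d0 /)
  xhat = (/ 1.1d0,  99.0d0, 11.0d0 /)
  ! theta(xhat,x) = arccos( |xhat^T x| / (||xhat||_2 * ||x||_2) )
  theta = acos( abs(dot_product(xhat, x)) / &
                ( sqrt(dot_product(xhat, xhat)) * sqrt(dot_product(x, x)) ) )
  print *, 'theta =', theta   ! about .0209; unchanged if x or xhat is rescaled
end program subspace_angle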

Here is another way to interpret the angle $\theta$ between $\hat{\cal S}$ and $\cal S$. Suppose $\hat{x}$ is a unit vector ( $\Vert \hat{x} \Vert _2 = 1$). Then there is a scalar $\beta$ such that

\begin{displaymath}
\Vert \hat{x} - \beta x \Vert _2 = \frac{\sqrt{2} \sin \theta}{\sqrt{1+ \cos \theta}} \approx \theta
\; .
\end{displaymath}

The approximation $\approx \theta$ holds when $\theta$ is much less than 1 (less than .1 will do nicely). If $\hat{x}$ is an approximate eigenvector with error bound $\theta ( \hat{x} , x ) \leq \bar{\theta} \ll 1$, where x is a true eigenvector, there is another true eigenvector $\beta x$ satisfying $\Vert \hat{x} - \beta x \Vert _2 \lesssim \bar{\theta}$. For example, if

\begin{displaymath}
\hat{x} = \left( \begin{array}{c} 1.1 \\ 99 \\ 11 \end{array} \right)
\Big/ \left\Vert ( 1.1 , \; 99 , \; 11 )^T \right\Vert _2
\; \; {\rm and} \; \;
x = \left( \begin{array}{c} 1 \\ 100 \\ 9 \end{array} \right)
\end{displaymath}

then $\Vert \hat{x} - \beta x \Vert _2 \approx .0209$ for $\beta \approx .01$.
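These numbers may be checked with a short sketch (illustration only): normalize $\hat{x}$ to a unit vector, choose $\beta = 1 / \Vert x \Vert _2$ so that $\beta x$ is the unit vector in the direction of x, as in the formula above, and evaluate $\Vert \hat{x} - \beta x \Vert _2$:

program nearest_multiple
  implicit none
  double precision :: x(3), xhat(3), beta
  x    = (/ 1.0d0, 100.0d0,  9.0d0 /)
  xhat = (/ 1.1d0,  99.0d0, 11.0d0 /)
  xhat = xhat / sqrt(dot_product(xhat, xhat))    ! make xhat a unit vector
  beta = 1.0d0 / sqrt(dot_product(x, x))         ! beta*x is a unit vector along x
  print *, 'beta  =', beta                                            ! about .01
  print *, 'error =', sqrt(dot_product(xhat - beta*x, xhat - beta*x)) ! about .0209
end program nearest_multiple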

Some LAPACK routines also return subspaces spanned by more than one vector, such as the invariant subspaces of matrices returned by xGEESX. The notion of angle between subspaces also applies here; see section 4.2.1 for details.

Finally, many of our error bounds will contain a factor p(n) (or p(m,n)), which grows as a function of matrix dimension n (or dimensions m and n). It represents a potentially different function for each problem. In practice, the true errors usually grow just linearly; using p(n)=10n in the error bound formulas will often give a reasonable bound. Therefore, we will refer to p(n) as a ``modestly growing'' function of n. However, it can occasionally be much larger; see section 4.2.1. For simplicity, the error bounds computed by the code fragments in the following sections will use p(n)=1. This means these computed error bounds may occasionally slightly underestimate the true error. For this reason we refer to these computed error bounds as ``approximate error bounds''.



