

Introduction

In this chapter we consider the singular value decomposition (SVD) of the $m$ by $n$ matrix $A$. We assume without loss of generality that $m \geq n$; if $m<n$ consider $A^*$. As described in §2.4, this decomposition may be written

\begin{displaymath}
A = U \Sigma V^*,
\end{displaymath} (6.1)

where $U = [u_1,\ldots,u_m]$ is an $m$ by $m$ unitary matrix, $V = [v_1,\ldots,v_n]$ is an $n$ by $n$ unitary matrix, and $\Sigma$ is an $m$ by $n$ diagonal matrix with entries $\Sigma_{ii} = \sigma_i$ for $i=1,\ldots,n$. The $u_i$ are the left singular vectors, the $v_i$ are the right singular vectors, and the $\sigma_i$ are the singular values. The singular values are nonnegative and sorted so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$. The number $r$ of nonzero singular values is the rank of $A$. If $A$ is real, $U$ and $V$ are real and orthogonal.

The factorization $A = U \Sigma V^*$ may also be written $AV = U \Sigma$, i.e., $Av_i = u_i \sigma_i$ for $i=1,\ldots,n$. It may equally be written $A^* U = V \Sigma^*$, i.e., $A^*u_i = v_i \sigma_i$ for $i=1,\ldots,n$ and $A^*u_i = 0$ for $i=n+1,\ldots,m$.

There are several ``smaller'' versions of the SVD that are often computed. Let $U_t = [u_1,\ldots,u_t]$ be an $m$ by $t$ matrix of the first $t$ left singular vectors, $V_t = [v_1,\ldots,v_t]$ be an $n$ by $t$ matrix of the first $t$ right singular vectors, and $\Sigma_t = {\rm diag}(\sigma_1 ,\ldots, \sigma_t)$ be a $t$ by $t$ matrix of the first $t$ singular values. Then we have the following SVD types.

Thin SVD.
$A = U_n \Sigma_n V_n^*$ is the thin (or economy-sized) SVD of $A$. The thin SVD is much smaller to store and faster to compute than the full SVD when $n \ll m$.

Compact SVD.
$A = U_{r} \Sigma_{r} V_{r}^*$ is a compact SVD of $A$. The compact SVD is much smaller to store and faster to compute than the thin SVD when $r \ll n$.

Truncated SVD.
$A_t = U_{t} \Sigma_{t} V_{t}^*$ is the rank-$t$ truncated (or partial) SVD of $A$, where $t < r$. Among all rank-$t$ matrices $B$, $B=A_t$ is the unique minimizer of $\Vert A - B \Vert _F$ and also minimizes (perhaps not uniquely) $\Vert A - B \Vert _2$. The truncated SVD is much smaller to store and cheaper to compute than the compact SVD when $t \ll r$ and is the most common form of the SVD computed in applications.

The thin SVD may also be written $A = \sum_{i=1}^n \sigma_i u_i v_i^*$. Each $(\sigma_i, u_i, v_i)$ is called a singular triplet. The compact and truncated SVDs may be written similarly (the sum going from $i=1$ to $r$, or $i=1$ to $t$, respectively).
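As a concrete illustration (a minimal NumPy sketch, not part of the original text; the matrix and sizes are arbitrary), the following verifies the singular triplet expansion of the thin SVD and the approximation error of the truncated SVD:

\begin{verbatim}
# Minimal sketch: thin SVD as a sum of singular triplets, and the
# truncated SVD's approximation error (sizes and data are arbitrary).
import numpy as np

m, n, t = 8, 5, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))

# Thin SVD: U is m by n, s holds sigma_1 >= ... >= sigma_n, Vh = V^*.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# A equals the sum of the rank-one terms sigma_i u_i v_i^*.
A_sum = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(n))
assert np.allclose(A, A_sum)

# Rank-t truncated SVD A_t: its Frobenius error is
# sqrt(sigma_{t+1}^2 + ... + sigma_n^2), and its 2-norm error is sigma_{t+1}.
A_t = U[:, :t] @ np.diag(s[:t]) @ Vh[:t, :]
assert np.isclose(np.linalg.norm(A - A_t, 'fro'), np.sqrt(np.sum(s[t:]**2)))
assert np.isclose(np.linalg.norm(A - A_t, 2), s[t])
\end{verbatim}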

The SVD of $A$ is closely related to the eigendecompositions of three related Hermitian matrices, $AA^*$, $A^* A$, and $H(A) \equiv [{0 \atop A^*} \; {A \atop 0}]$, which were described in §2.4.7. Most iterative algorithms for the SVD amount to applying an algorithm from Chapter 4 to one of these Hermitian matrices, so we review and expand that material here. The choice of $AA^*$, $A^* A$, or $H(A)$ depends on which singular values and vectors one wishes to compute. The cost of some algorithms, like shift-and-invert (see below), may differ significantly for $AA^*$, $A^* A$, and $H(A)$.

  1. Consider $A^* A$, which is an $n$ by $n$ Hermitian matrix. Its eigendecomposition is $A^*A = V (\Sigma^*\Sigma) V^*$, where $A = U \Sigma V^*$ is the SVD of $A$. Note that $\Sigma^*\Sigma = {\rm diag}(\sigma_1^2 , \ldots , \sigma_n^2)$. In other words, the eigenvectors of $A^* A$ are the right singular vectors of $A$, and the eigenvalues of $A^* A$ are the squares of the singular values of $A$.

    Thus, if we find the eigenvalues $\lambda_i$ and eigenvectors $v_i$ of $A^* A$, then we can recover the compact SVD of $A$ by taking $\sigma_i = \lambda_i^{1/2}$ for $i=1,\ldots,n$, the right singular vectors as $v_i$ for $i=1,\ldots,n$, and the first $r$ left singular vectors as $u_i = Av_i / \sigma_i$ (see the first sketch following this list). The left singular vectors $u_{r+1}$ through $u_m$ are not determined directly, but may be taken as any $m-r$ orthonormal vectors that are also orthogonal to $u_1$ through $u_r$. When $\sigma_i \neq 0$ is very small, i.e., $0 < \sigma_i \ll \sigma_1$, $u_i$ will not be determined very accurately.

  2. Consider $AA^*$, which is an $m$ by $m$ Hermitian matrix. Its eigendecomposition is $AA^* = U (\Sigma \Sigma^*) U^*$. Note that $\Sigma \Sigma^* = {\rm diag} ( \sigma_1^2,\ldots,\sigma_n^2 , 0,\ldots,0)$, with $m-n$ zeros after $\sigma_n^2$. In other words, the eigenvectors of $AA^*$ are the left singular vectors of $A$, and the eigenvalues of $AA^*$ are the squares of the singular values of $A$, together with the eigenvalue 0 repeated $m-n$ more times.

    Thus, if we find the eigenvalues $\lambda_i$ and eigenvectors $u_i$ of $AA^*$, then we can recover the compact SVD of $A$ by taking $\sigma_i = \lambda_i^{1/2}$ for $i=1,\ldots,n$, the left singular vectors as $u_i$ for $i=1,\ldots,m$, and the first $r$ right singular vectors as $v_i = A^*u_i / \sigma_i$. The right singular vectors $v_{r+1}$ through $v_n$ are not determined directly, but may be taken as any $n-r$ orthonormal vectors also orthogonal to $v_1$ through $v_r$. When $\sigma_i \neq 0$ is very small, i.e., $0 < \sigma_i \ll \sigma_1$, then $v_i$ will not be determined very accurately.

  3. Consider $H(A) = [{0^{m \times m} \atop A^*} \; {A \atop 0^{n \times n}}]$, which is an $(m+n)$ by $(m+n)$ Hermitian matrix. Write $U = [U_n^{m \times n}, \tilde{U}_n^{m \times (m-n)}]$ and $\Sigma = [{\Sigma_{n}^{n \times n} \atop 0^{(m-n) \times n}}]$. Then the eigendecomposition is $H(A) = Q \Lambda Q^*$, where $\Lambda = {\rm diag}( \Sigma_n , -\Sigma_n , O^{(m-n) \times (m-n)})$ and
    \begin{displaymath}
Q = \bmat{ccc}
\frac{1}{\sqrt{2}} U_n & - \frac{1}{\sqrt{2}} U_n & \tilde{U}_n \\
\frac{1}{\sqrt{2}} V & + \frac{1}{\sqrt{2}} V & O^{n \times (m-n)}
\emat .
\end{displaymath} (6.2)

    In other words, $\pm \sigma_i$ is an eigenvalue with unit eigenvector $\frac{1}{\sqrt{2}} [{\pm u_i \atop v_i}]$ for $i=1$ to $n$, and 0 is an eigenvalue with eigenvector $[{u_i \atop 0^{n \times 1}}]$ for $i>n$. Thus we can extract the singular values, left singular vectors, and right singular vectors of $A$ directly from the eigenvalues and eigenvectors of $H(A)$ (see the second sketch following this list). More precisely, we can directly extract the compact SVD from the eigenvalues and eigenvectors of $H(A)$. But if 0 is a singular value of $A$, then $0$ will be (at least) a double eigenvalue of $H(A)$, so $Q$ will not necessarily be of the form (6.2). (For example, if $A=0$ so that $H(A)=0$ too, then $Q$ can be an arbitrary orthogonal matrix not necessarily of the form (6.2).) In this case, suppose the columns of $[{\hat{U}^{m \times z} \atop \hat{V}^{n \times z}}]$ form an orthonormal basis spanning the eigenspace for the zero eigenvalue of $H(A)$; then the right singular vectors can be taken as any orthonormal vectors spanning ${\rm span}(\hat{V})$, and the left singular vectors can be taken as any orthonormal vectors spanning ${\rm span}(\hat{U})$.
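The following sketch (again NumPy, illustrative only; a small random full-rank matrix is assumed) carries out item 1 above, recovering the SVD of $A$ from the Hermitian eigendecomposition of $A^* A$:

\begin{verbatim}
# Minimal sketch of item 1: SVD of A from the eigendecomposition of A^* A.
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 5
A = rng.standard_normal((m, n))

lam, V = np.linalg.eigh(A.conj().T @ A)   # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]            # reorder to descending
sigma = np.sqrt(np.clip(lam, 0.0, None))  # clip guards tiny negative rounding

# Left singular vectors (all sigma_i > 0 here): u_i = A v_i / sigma_i.
U = A @ V / sigma                         # divides column i by sigma[i]
assert np.allclose(A, U @ np.diag(sigma) @ V.conj().T)
\end{verbatim}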
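Similarly, a sketch of item 3 (illustrative only; the random matrix has full rank, so the zero-eigenvalue complication does not arise): build $H(A)$, compute its eigendecomposition, and read off the singular triplets from the two halves of the eigenvectors:

\begin{verbatim}
# Minimal sketch of item 3: SVD of A from the eigendecomposition of H(A).
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 5
A = rng.standard_normal((m, n))

H = np.block([[np.zeros((m, m)), A],
              [A.conj().T, np.zeros((n, n))]])
lam, Q = np.linalg.eigh(H)        # ascending: -sigma's, zeros, +sigma's
lam, Q = lam[::-1], Q[:, ::-1]    # descending: +sigma's first

sigma = lam[:n]                   # the n largest eigenvalues are the sigma_i
# The eigenvector for +sigma_i is (1/sqrt(2)) [u_i; v_i] (up to a sign,
# which cancels in the product below); rescale the two halves.
U_n = np.sqrt(2.0) * Q[:m, :n]
V_n = np.sqrt(2.0) * Q[m:, :n]
assert np.allclose(A, U_n @ np.diag(sigma) @ V_n.conj().T)
\end{verbatim}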

We note that

\begin{displaymath}
H(A)^2 = \bmat{cc} AA^* & 0 \\ 0 & A^*A \emat ,
\end{displaymath}

so that two multiplications by $H(A)$ are equivalent to multiplication by both $AA^*$ and $A^* A$.
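This observation is easy to check numerically; the following sketch (illustrative only) applies $H(A)$ twice, implicitly via products with $A$ and $A^*$:

\begin{verbatim}
# Minimal sketch: two products with H(A) act as AA^* on the top block
# and as A^*A on the bottom block.
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 5
A = rng.standard_normal((m, n))
x, y = rng.standard_normal(m), rng.standard_normal(n)

def apply_H(z):
    # One matrix-vector product with H(A): H [x; y] = [A y; A^* x].
    top, bot = z[:m], z[m:]
    return np.concatenate([A @ bot, A.conj().T @ top])

z = apply_H(apply_H(np.concatenate([x, y])))
assert np.allclose(z[:m], A @ (A.conj().T @ x))   # AA^* x
assert np.allclose(z[m:], A.conj().T @ (A @ y))   # A^*A y
\end{verbatim}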

The correspondence between the SVD of $A$ and the eigendecomposition of $H(A)$ shows that the discussion of perturbation theory for eigenvalues and eigenvectors of Hermitian matrices in §4.1 and §4.8 applies directly to the SVD, as we now describe.

Perturbing $A$ to $A+E$ can change the singular values by at most $\Vert E\Vert _2$:

\begin{displaymath}
\vert \sigma_i (A+E) - \sigma_i(A) \vert \leq \Vert E\Vert _2.
\end{displaymath} (6.3)

Now suppose $(\hat{\sigma}, \hat{u}, \hat{v})$ is an approximation of the singular triplet $(\sigma_i, u_i, v_i)$, where $\hat{u}$ and $\hat{v}$ are unit vectors. The ``best'' $\hat{\sigma}$ corresponding to $\hat{u}$ and $\hat{v}$ is the Rayleigh quotient $\hat{\sigma} = {\rm Re}(\hat{u}^* A \hat{v})$, so we assume that $\hat{\sigma}$ has this value. Suppose $\hat{\sigma}$ is closer to $\sigma_i$ than any other $\sigma_j$, and let $\delta$ be the gap between $\hat{\sigma}$ and any other singular value: $\delta = \min_{j \neq i} \vert \hat{\sigma} - \sigma_j \vert$.

Define the residual vector $r$ as

\begin{displaymath}
r = \frac{1}{\sqrt{2}} \bmat{c} A\hat{v} - \hat{\sigma} \hat{u} \\
A^*\hat{u} - \hat{\sigma} \hat{v} \emat.
\end{displaymath}

If $r=0$ then $(\hat{\sigma}, \hat{u}, \hat{v})$ is an exact singular triplet, and our error bounds will be small when $\Vert r\Vert _2$ is small.

The difference between $\hat{\sigma}$ and $\sigma_i$ is bounded by

\begin{displaymath}
\vert \hat{\sigma} - \sigma_i \vert \leq \min \left\{ \Vert r\Vert _2 ,
\frac{\Vert r\Vert _2^2}{\delta} \right\}.
\end{displaymath}

Furthermore, the angles $\angle (\hat{v}, v_i)$ and $\angle (\hat{u}, u_i)$ between the approximate and true singular vectors are bounded by

\begin{displaymath}
\max \{ \sin \angle (\hat{v}, v_i) , \sin \angle (\hat{u}, u_i) \}
\leq \frac{\sqrt{2} \Vert r\Vert _2}{\delta}.
\end{displaymath}

In other words, the larger the gap $\delta$ between the approximate singular value $\hat{\sigma}$ and all the other $\sigma_j$, and the smaller the residual $r$, the tighter the bounds on the errors in $\hat{\sigma}$, $\hat{u}$, and $\hat{v}$.

$\Vert r\Vert _2$ is easy to compute given $A$, $\hat{u}$, and $\hat{v}$, but approximating $\delta$ requires more information about the singular values. Typically one approximates $\delta$ using the computed singular values nearest $\hat{\sigma}$: $\delta \approx \min \{ \hat{\sigma} - \hat{\sigma}_- , \hat{\sigma}_+ - \hat{\sigma} \}$, where $\hat{\sigma}_-$ is the next computed singular value smaller than $\hat{\sigma}$, and $\hat{\sigma}_+$ is the next computed singular value larger than $\hat{\sigma}$.
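The bounds above are easy to evaluate in practice. The following sketch (NumPy, illustrative only; the perturbation size and matrix are arbitrary) forms the Rayleigh quotient, the residual $r$, and the gap, and checks the two bounds for a perturbed singular triplet:

\begin{verbatim}
# Minimal sketch: Rayleigh quotient, residual, and the error bounds for
# an approximate singular triplet (sigma_hat, u_hat, v_hat).
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 5
A = rng.standard_normal((m, n))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

i = 1                                    # approximate the second triplet
u_hat = U[:, i] + 1e-3 * rng.standard_normal(m)
v_hat = Vh[i, :] + 1e-3 * rng.standard_normal(n)
u_hat /= np.linalg.norm(u_hat)
v_hat /= np.linalg.norm(v_hat)

sigma_hat = np.real(u_hat.conj() @ A @ v_hat)    # Rayleigh quotient
r = np.concatenate([A @ v_hat - sigma_hat * u_hat,
                    A.conj().T @ u_hat - sigma_hat * v_hat]) / np.sqrt(2.0)
rnorm = np.linalg.norm(r)

delta = np.min(np.abs(sigma_hat - np.delete(s, i)))  # gap to other sigma_j
assert abs(sigma_hat - s[i]) <= min(rnorm, rnorm**2 / delta)

sin_v = np.sqrt(max(0.0, 1.0 - (v_hat @ Vh[i, :])**2))  # sin of angle(v_hat, v_i)
assert sin_v <= np.sqrt(2.0) * rnorm / delta
\end{verbatim}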


