

Introduction

In this chapter we consider the singular value decomposition (SVD) of the $m$ by $n$ matrix $A$. We assume without loss of generality that $m \geq n$; if $m<n$ consider $A^*$. As described in §2.4, this decomposition may be written

\begin{displaymath}
A = U \Sigma V^*,
\end{displaymath} (6.1)

where $U = [u_1,\ldots,u_m]$ is an $m$ by $m$ unitary matrix, $V = [v_1,\ldots,v_n]$ is an $n$ by $n$ unitary matrix, and $\Sigma$ is an $m$ by $n$ diagonal matrix with entries $\Sigma_{ii} = \sigma_i$ for $i=1,\ldots,n$. The $u_i$ are the left singular vectors, the $v_i$ are the right singular vectors, and the $\sigma_i$ are the singular values. The singular values are nonnegative and sorted so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$. The number $r$ of nonzero singular values is the rank of $A$. If $A$ is real, $U$ and $V$ are real and orthogonal.

The factorization $A = U \Sigma V^*$ may also be written $AV = U \Sigma$, i.e., $Av_i = u_i \sigma_i$ for $i=1,\ldots,n$. It may equally be written $A^* U = V \Sigma^*$, i.e., $A^*u_i = v_i \sigma_i$ for $i=1,\ldots,n$ and $A^*u_i = 0$ for $i=n+1,\ldots,m$.

There are several ``smaller'' versions of the SVD that are often computed. Let $U_t = [u_1,\ldots,u_t]$ be an $m$ by $t$ matrix of the first $t$ left singular vectors, $V_t = [v_1,\ldots,v_t]$ be an $n$ by $t$ matrix of the first $t$ right singular vectors, and $\Sigma_t = {\rm diag}(\sigma_1 ,\ldots, \sigma_t)$ be a $t$ by $t$ matrix of the first $t$ singular values. Then we have the following SVD types.

Thin SVD.
$A = U_n \Sigma_n V_n^*$ is the thin (or economy-sized) SVD of $A$. The thin SVD is much smaller to store and faster to compute than the full SVD when $n \ll m$.

Compact SVD.
$A = U_{r} \Sigma_{r} V_{r}^*$ is a compact SVD of $A$. The compact SVD is much smaller to store and faster to compute than the thin SVD when $r \ll n$.

Truncated SVD.
$A_t = U_{t} \Sigma_{t} V_{t}^*$ is the rank-$t$ truncated (or partial) SVD of $A$, where $t < r$. Among all rank-$t$ matrices $B$, $B=A_t$ is the unique minimizer of $\Vert A - B \Vert _F$ and also minimizes (perhaps not uniquely) $\Vert A - B \Vert _2$. The truncated SVD is much smaller to store and cheaper to compute than the compact SVD when $t \ll r$ and is the most common form of the SVD computed in applications.

The thin SVD may also be written $A = \sum_{i=1}^n \sigma_i u_i v_i^*$. Each $(\sigma_i, u_i, v_i)$ is called a singular triplet. The compact and truncated SVDs may be written similarly (the sum going from $i=1$ to $r$, or $i=1$ to $t$, respectively).
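As a concrete illustration (a minimal NumPy sketch, not part of the original text; the matrix and sizes are arbitrary), the following verifies the singular triplet expansion of the thin SVD and the approximation error of the truncated SVD:

\begin{verbatim}
# Minimal sketch: thin SVD as a sum of singular triplets, and the
# truncated SVD's approximation error (sizes and data are arbitrary).
import numpy as np

m, n, t = 8, 5, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))

# Thin SVD: U is m by n, s holds sigma_1 >= ... >= sigma_n, Vh = V^*.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# A equals the sum of the rank-one terms sigma_i u_i v_i^*.
A_sum = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(n))
assert np.allclose(A, A_sum)

# Rank-t truncated SVD A_t: its Frobenius error is
# sqrt(sigma_{t+1}^2 + ... + sigma_n^2), and its 2-norm error is sigma_{t+1}.
A_t = U[:, :t] @ np.diag(s[:t]) @ Vh[:t, :]
assert np.isclose(np.linalg.norm(A - A_t, 'fro'), np.sqrt(np.sum(s[t:]**2)))
assert np.isclose(np.linalg.norm(A - A_t, 2), s[t])
\end{verbatim}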

The SVD of $A$ is closely related to the eigendecompositions of three related Hermitian matrices, $AA^*$, $A^* A$, and $H(A) \equiv [{0 \atop A^*} \; {A \atop 0}]$, which were described in §2.4.7. Most iterative algorithms for the SVD amount to applying an algorithm from Chapter 4 to one of these Hermitian matrices, so we review and expand that material here. The choice of $AA^*$, $A^* A$, or $H(A)$ depends on which singular values and vectors one wishes to compute. The cost of some algorithms, like shift-and-invert (see below), may differ significantly for $AA^*$, $A^* A$, and $H(A)$.

  1. Consider $A^* A$, which is an $n$ by $n$ Hermitian matrix. Its eigendecomposition is $A^*A = V (\Sigma^*\Sigma) V^*$, where $A = U \Sigma V^*$ is the SVD of $A$. Note that $\Sigma^*\Sigma = {\rm diag}(\sigma_1^2 , \ldots , \sigma_n^2)$. In other words, the eigenvectors of $A^* A$ are the right singular vectors of $A$, and the eigenvalues of $A^* A$ are the squares of the singular values of $A$.

    Thus, if we find the eigenvalues $\lambda_i$ and eigenvectors $v_i$ of $A^* A$, then we can recover the compact SVD of $A$ by taking $\sigma_i = \lambda_i^{1/2}$ for $i=1,\ldots,n$, the right singular vectors as $v_i$ for $i=1,\ldots,n$, and the first $r$ left singular vectors as $u_i = Av_i / \sigma_i$ (see the first sketch following this list). The left singular vectors $u_{r+1}$ through $u_m$ are not determined directly, but may be taken as any $m-r$ orthonormal vectors that are also orthogonal to $u_1$ through $u_r$. When $\sigma_i \neq 0$ is very small, i.e., $0 < \sigma_i \ll \sigma_1$, $u_i$ will not be determined very accurately.

  2. Consider $AA^*$, which is an $m$ by $m$ Hermitian matrix. Its eigendecomposition is $AA^* = U (\Sigma \Sigma^*) U^*$. Note that $\Sigma \Sigma^* = {\rm diag} ( \sigma_1^2,\ldots,\sigma_n^2 , 0,\ldots,0)$, with $m-n$ zeros after $\sigma_n^2$. In other words, the eigenvectors of $AA^*$ are the left singular vectors of $A$, and the eigenvalues of $AA^*$ are the squares of the singular values of $A$, together with the eigenvalue 0 repeated $m-n$ more times.

    Thus, if we find the eigenvalues $\lambda_i$ and eigenvectors $u_i$ of $AA^*$, then we can recover the compact SVD of $A$ by taking $\sigma_i = \lambda_i^{1/2}$ for $i=1,\ldots,n$, the left singular vectors as $u_i$ for $i=1,\ldots,m$, and the first $r$ right singular vectors as $v_i = A^*u_i / \sigma_i$. The right singular vectors $v_{r+1}$ through $v_n$ are not determined directly, but may be taken as any $n-r$ orthonormal vectors also orthogonal to $v_1$ through $v_r$. When $\sigma_i \neq 0$ is very small, i.e., $0 < \sigma_i \ll \sigma_1$, then $v_i$ will not be determined very accurately.

  3. Consider $H(A) = [{0^{m \times m} \atop A^*} \; {A \atop 0^{n \times n}}]$, which is an $(m+n)$ by $(m+n)$ Hermitian matrix. Write $U = [U_n^{m \times n}, \tilde{U}_n^{m \times (m-n)}]$ and $\Sigma = [{\Sigma_{n}^{n \times n} \atop 0^{(m-n) \times n}}]$. Then the eigendecomposition is $H(A) = Q \Lambda Q^*$, where $\Lambda = {\rm diag}( \Sigma_n , -\Sigma_n , O^{(m-n) \times (m-n)})$ and
    \begin{displaymath}
Q = \bmat{ccc}
\frac{1}{\sqrt{2}} U_n & - \frac{1}{\sqrt{2}} U_n & \tilde{U}_n \\
\frac{1}{\sqrt{2}} V & + \frac{1}{\sqrt{2}} V & O^{n \times (m-n)}
\emat .
\end{displaymath} (6.2)

    In other words, $\pm \sigma_i$ is an eigenvalue with unit eigenvector $\frac{1}{\sqrt{2}} [{\pm u_i \atop v_i}]$ for $i=1$ to $n$, and 0 is an eigenvalue with eigenvector $[{u_i \atop 0^{n \times 1}}]$ for $i>n$. Thus we can extract the singular values, left singular vectors, and right singular vectors of $A$ directly from the eigenvalues and eigenvectors of $H(A)$ (see the second sketch following this list). More precisely, we can directly extract the compact SVD from the eigenvalues and eigenvectors of $H(A)$. But if 0 is a singular value of $A$, then $0$ will be (at least) a double eigenvalue of $H(A)$, so $Q$ will not necessarily be of the form (6.2). (For example, if $A=0$ so that $H(A)=0$ too, then $Q$ can be an arbitrary orthogonal matrix not necessarily of the form (6.2).) In this case, suppose the columns of $[{\hat{U}^{m \times z} \atop \hat{V}^{n \times z}}]$ form an orthonormal basis spanning the eigenspace for the zero eigenvalue of $H(A)$; then the right singular vectors can be taken as any orthonormal vectors spanning ${\rm span}(\hat{V})$, and the left singular vectors can be taken as any orthonormal vectors spanning ${\rm span}(\hat{U})$.
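The following sketch (again NumPy, illustrative only; a small random full-rank matrix is assumed) carries out item 1 above, recovering the SVD of $A$ from the Hermitian eigendecomposition of $A^* A$:

\begin{verbatim}
# Minimal sketch of item 1: SVD of A from the eigendecomposition of A^* A.
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 5
A = rng.standard_normal((m, n))

lam, V = np.linalg.eigh(A.conj().T @ A)   # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]            # reorder to descending
sigma = np.sqrt(np.clip(lam, 0.0, None))  # clip guards tiny negative rounding

# Left singular vectors (all sigma_i > 0 here): u_i = A v_i / sigma_i.
U = A @ V / sigma                         # divides column i by sigma[i]
assert np.allclose(A, U @ np.diag(sigma) @ V.conj().T)
\end{verbatim}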
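Similarly, a sketch of item 3 (illustrative only; the random matrix has full rank, so the zero-eigenvalue complication does not arise): build $H(A)$, compute its eigendecomposition, and read off the singular triplets from the two halves of the eigenvectors:

\begin{verbatim}
# Minimal sketch of item 3: SVD of A from the eigendecomposition of H(A).
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 5
A = rng.standard_normal((m, n))

H = np.block([[np.zeros((m, m)), A],
              [A.conj().T, np.zeros((n, n))]])
lam, Q = np.linalg.eigh(H)        # ascending: -sigma's, zeros, +sigma's
lam, Q = lam[::-1], Q[:, ::-1]    # descending: +sigma's first

sigma = lam[:n]                   # the n largest eigenvalues are the sigma_i
# The eigenvector for +sigma_i is (1/sqrt(2)) [u_i; v_i] (up to a sign,
# which cancels in the product below); rescale the two halves.
U_n = np.sqrt(2.0) * Q[:m, :n]
V_n = np.sqrt(2.0) * Q[m:, :n]
assert np.allclose(A, U_n @ np.diag(sigma) @ V_n.conj().T)
\end{verbatim}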

We note that

\begin{displaymath}
H(A)^2 = \bmat{cc} AA^* & 0 \\ 0 & A^*A \emat ,
\end{displaymath}

so that two multiplications by $H(A)$ are equivalent to multiplication by both $AA^*$ and $A^* A$.
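This observation is easy to check numerically; the following sketch (illustrative only) applies $H(A)$ twice, implicitly via products with $A$ and $A^*$:

\begin{verbatim}
# Minimal sketch: two products with H(A) act as AA^* on the top block
# and as A^*A on the bottom block.
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 5
A = rng.standard_normal((m, n))
x, y = rng.standard_normal(m), rng.standard_normal(n)

def apply_H(z):
    # One matrix-vector product with H(A): H [x; y] = [A y; A^* x].
    top, bot = z[:m], z[m:]
    return np.concatenate([A @ bot, A.conj().T @ top])

z = apply_H(apply_H(np.concatenate([x, y])))
assert np.allclose(z[:m], A @ (A.conj().T @ x))   # AA^* x
assert np.allclose(z[m:], A.conj().T @ (A @ y))   # A^*A y
\end{verbatim}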

The correspondence between the SVD of $A$ and the eigendecomposition of $H(A)$ shows that the discussion of perturbation theory for eigenvalues and eigenvectors of Hermitian matrices in §4.1 and §4.8 applies directly to the SVD, as we now describe.

Perturbing $A$ to $A+E$ can change the singular values by at most $\Vert E\Vert _2$:

\begin{displaymath}
\vert \sigma_i (A+E) - \sigma_i(A) \vert \leq \Vert E\Vert _2.
\end{displaymath} (6.3)

Now suppose $(\hat{\sigma}, \hat{u}, \hat{v})$ is an approximation of the singular triplet $(\sigma_i, u_i, v_i)$, where $\hat{u}$ and $\hat{v}$ are unit vectors. The ``best'' $\hat{\sigma}$ corresponding to $\hat{u}$ and $\hat{v}$ is the Rayleigh quotient $\hat{\sigma} = {\rm Re}(\hat{u}^* A \hat{v})$, so we assume that $\hat{\sigma}$ has this value. Suppose $\hat{\sigma}$ is closer to $\sigma_i$ than any other $\sigma_j$, and let $\delta$ be the gap between $\hat{\sigma}$ and any other singular value: $\delta = \min_{j \neq i} \vert \hat{\sigma} - \sigma_j \vert$.

Define the residual vector $r$ as

\begin{displaymath}
r = \frac{1}{\sqrt{2}} \bmat{c} A\hat{v} - \hat{\sigma} \hat{u} \\
A^*\hat{u} - \hat{\sigma} \hat{v} \emat.
\end{displaymath}

If $r=0$ then $(\hat{\sigma}, \hat{u}, \hat{v})$ is an exact singular triplet, and our error bounds will be small when $\Vert r\Vert _2$ is small.

The difference between $\hat{\sigma}$ and $\sigma_i$ is bounded by

\begin{displaymath}
\vert \hat{\sigma} - \sigma_i \vert \leq \min \left\{ \Vert r\Vert _2 ,
\frac{\Vert r\Vert _2^2}{\delta} \right\}.
\end{displaymath}

Furthermore, the angles $\angle (\hat{v}, v_i)$ and $\angle (\hat{u}, u_i)$ between the approximate and true singular vectors are bounded by

\begin{displaymath}
\max \{ \sin \angle (\hat{v}, v_i) , \sin \angle (\hat{u}, u_i) \}
\leq \frac{\sqrt{2} \Vert r\Vert _2}{\delta}.
\end{displaymath}

In other words, the larger the gap $\delta$ between the approximate singular value $\hat{\sigma}$ and all the other $\sigma_j$, and the smaller the residual $r$, the tighter the bounds on the errors in $\hat{\sigma}$, $\hat{u}$, and $\hat{v}$.

$\Vert r\Vert _2$ is easy to compute given $A$, $\hat{u}$, and $\hat{v}$, but approximating $\delta$ requires more information about the singular values. Typically one approximates $\delta$ using the computed singular values nearest $\hat{\sigma}$: $\delta \approx \min \{ \hat{\sigma} - \hat{\sigma}_- , \hat{\sigma}_+ - \hat{\sigma} \}$, where $\hat{\sigma}_-$ is the next computed singular value smaller than $\hat{\sigma}$, and $\hat{\sigma}_+$ is the next computed singular value larger than $\hat{\sigma}$.
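The bounds above are easy to evaluate in practice. The following sketch (NumPy, illustrative only; the perturbation size and matrix are arbitrary) forms the Rayleigh quotient, the residual $r$, and the gap, and checks the two bounds for a perturbed singular triplet:

\begin{verbatim}
# Minimal sketch: Rayleigh quotient, residual, and the error bounds for
# an approximate singular triplet (sigma_hat, u_hat, v_hat).
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 5
A = rng.standard_normal((m, n))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

i = 1                                    # approximate the second triplet
u_hat = U[:, i] + 1e-3 * rng.standard_normal(m)
v_hat = Vh[i, :] + 1e-3 * rng.standard_normal(n)
u_hat /= np.linalg.norm(u_hat)
v_hat /= np.linalg.norm(v_hat)

sigma_hat = np.real(u_hat.conj() @ A @ v_hat)    # Rayleigh quotient
r = np.concatenate([A @ v_hat - sigma_hat * u_hat,
                    A.conj().T @ u_hat - sigma_hat * v_hat]) / np.sqrt(2.0)
rnorm = np.linalg.norm(r)

delta = np.min(np.abs(sigma_hat - np.delete(s, i)))  # gap to other sigma_j
assert abs(sigma_hat - s[i]) <= min(rnorm, rnorm**2 / delta)

sin_v = np.sqrt(max(0.0, 1.0 - (v_hat @ Vh[i, :])**2))  # sin of angle(v_hat, v_i)
assert sin_v <= np.sqrt(2.0) * rnorm / delta
\end{verbatim}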


