In this section, we briefly discuss methods for the computation of singular values and vectors of dense matrices. These methods are sometimes called transformation methods. Some or all of them are implemented in LAPACK, ScaLAPACK, and in the svd command in MATLAB.
While it is possible to apply the transformation methods of §4.2 to one of the Hermitian matrices $A^*A$, $AA^*$, or $\left[\begin{smallmatrix} 0 & A^* \\ A & 0 \end{smallmatrix}\right]$, the methods used in practice are specialized for the SVD and so are more efficient and accurate.
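For reference, the following standard identities (stated here as a brief aside) show how these three Hermitian matrices encode the SVD $A = U\Sigma V^*$ with singular values $\sigma_i$ and singular vectors $u_i$, $v_i$:
\[
A^*A = V\,\Sigma^2\,V^*, \qquad
AA^* = U\,\Sigma^2\,U^*, \qquad
\begin{bmatrix} 0 & A^* \\ A & 0 \end{bmatrix}
\begin{bmatrix} v_i \\ \pm u_i \end{bmatrix}
= \pm\sigma_i \begin{bmatrix} v_i \\ \pm u_i \end{bmatrix}.
\]
In particular, a singular value $\sigma_i$ appears only as the squared eigenvalue $\sigma_i^2$ of the first two matrices, which is one reason methods specialized for the SVD can compute tiny singular values more accurately than an eigensolver applied to $A^*A$ or $AA^*$.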
The most common transformation methods compute the thin SVD $A = U\Sigma V^*$ in the following three phases. (They can be easily modified to compute the full SVD, or just selected singular values and/or singular vectors, but we present just the thin SVD for simplicity.)
1. Reduce $A$ to bidiagonal form: compute unitary $U_1$ and $V_1$ so that $B = U_1^* A V_1$ is bidiagonal.
2. Compute the SVD of the bidiagonal matrix: $B = U_2 \Sigma V_2^*$.
3. Multiply $U = U_1 U_2$ and $V = V_1 V_2$ to obtain the thin SVD $A = U\Sigma V^*$.
Phase 1, reduction to bidiagonal form, is performed by LAPACK routine xGEBRD and ScaLAPACK routine PxGEBRD using a sequence of unitary Householder reflections applied alternately on the left and on the right, for a cost of $O(mn^2)$ floating point operations. If singular values only are desired, phase 2 costs just $O(n^2)$, which is much less than the cost of phase 1, and phase 3 is omitted. If all left and right singular vectors are desired, the cost depends strongly on which algorithm is selected, as described below.
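As a minimal illustration using SciPy's scipy.linalg.svd (one common interface to these LAPACK drivers; the sketch below is not taken from the text above), the thin SVD and the singular-values-only computation, which corresponds to omitting phase 3, can be requested as follows:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))        # dense m-by-n matrix, m >= n

# Thin SVD: U is m-by-n, s holds the n singular values, Vh is n-by-n.
U, s, Vh = linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vh))   # True

# Singular values only: phase 3 is skipped and phase 2 is cheap.
s_only = linalg.svd(A, compute_uv=False)
print(np.allclose(s, s_only))                # True
```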
The algorithms for phase 2 are analogous to those available for the symmetric tridiagonal eigenproblem as discussed in §4.2, with some differences outlined below.
Currently, the fastest available sequential or shared-memory parallel dense SVD routine is LAPACK's xGESDD, but a faster one is on the way. The fastest available distributed-memory parallel dense SVD routine is ScaLAPACK's PxGESVD, but again a better one is on the way.
Divide-and-conquer is implemented by LAPACK computational routine xBDSDC, which is used by LAPACK driver routine xGESDD to compute the SVD of a dense matrix. xGESDD is currently the LAPACK algorithm of choice for the SVD of dense matrices.
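For illustration (again using SciPy as one possible interface, which is an assumption about the reader's environment rather than part of the discussion above), the choice between the QR-iteration-based driver xGESVD and the divide-and-conquer driver xGESDD is exposed through the lapack_driver argument of scipy.linalg.svd:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 300))

# Divide-and-conquer driver (xGESDD); this is SciPy's default.
U1, s1, Vh1 = linalg.svd(A, full_matrices=False, lapack_driver="gesdd")

# QR-iteration-based driver (xGESVD), typically slower when all
# singular vectors of a large matrix are requested.
U2, s2, Vh2 = linalg.svd(A, full_matrices=False, lapack_driver="gesvd")

print(np.allclose(s1, s2))   # singular values agree to roundoff
```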
Recent advances reported in [168,128,358,356] promise an algorithm that costs only $O(n)$ work per singular triplet while guaranteeing orthogonality, but it is not yet complete. When it is, both LAPACK and ScaLAPACK will provide this algorithm, which should be the fastest in all cases.
If $A$ is banded, phase 1 can be performed more cheaply by LAPACK computational routine xGBBRD, but no LAPACK driver routines are available.
Finally, we mention Jacobi's method [114,198] for the SVD. This transformation method repeatedly multiplies $A$ on the right by elementary orthogonal matrices (Jacobi rotations) until $A$ converges to $U\Sigma$; the product of the Jacobi rotations is $V$. Jacobi is slower than any of the above transformation methods (it can be made to run within about twice the time of QR [136]) but has the useful property that for certain $A$ it can deliver the tiny singular values, and their singular vectors, much more accurately than any of the above methods, provided that it is properly implemented [118,115].
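The sketch below is a minimal one-sided Jacobi iteration in Python/NumPy, intended only to illustrate the idea and not the carefully implemented versions cited above (it omits pivoting, scaling, and the refined stopping tests they rely on): columns of a working copy of $A$ are orthogonalized in pairs by rotations applied on the right, the column norms converge to the singular values, and the accumulated rotations form $V$.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Thin SVD of an m-by-n matrix (m >= n) by one-sided Jacobi rotations.
    A minimal sketch: no pivoting, blocking, or scaling refinements."""
    W = A.astype(float).copy()          # working matrix, converges to U * Sigma
    m, n = W.shape
    V = np.eye(n)                       # accumulates the Jacobi rotations
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = W[:, p] @ W[:, p]
                beta = W[:, q] @ W[:, q]
                gamma = W[:, p] @ W[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue            # columns p and q already orthogonal
                converged = False
                # Rotation angle chosen to zero the (p, q) entry of W^T W.
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Apply the rotation on the right to W and accumulate it in V.
                Wp, Wq = W[:, p].copy(), W[:, q].copy()
                W[:, p], W[:, q] = c * Wp - s * Wq, s * Wp + c * Wq
                Vp, Vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = c * Vp - s * Vq, s * Vp + c * Vq
        if converged:
            break
    sigma = np.linalg.norm(W, axis=0)   # column norms are the singular values
    U = W / sigma                       # normalize columns (assumes full rank)
    return U, sigma, V

# Usage: compare with a reference SVD.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, V = one_sided_jacobi_svd(A)
print(np.allclose(A, U @ np.diag(s) @ V.T))                 # True
print(np.allclose(np.sort(s)[::-1], np.linalg.svd(A)[1]))   # True
```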