In this section, we briefly discuss methods for the computation of singular values and vectors of dense matrices. These methods are sometimes called transformation methods. Some or all of them are implemented in LAPACK, ScaLAPACK, and in the svd command in MATLAB.
While it is possible to apply the transformation methods of §4.2 to one of the Hermitian matrices $A^*A$, $AA^*$, or $\left[\begin{smallmatrix} 0 & A^* \\ A & 0 \end{smallmatrix}\right]$, the methods used in practice are specialized for the SVD and so are more efficient and accurate.
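For reference, the following standard identities (stated here as a brief aside) show how these three Hermitian matrices encode the SVD $A = U\Sigma V^*$ with singular values $\sigma_i$ and singular vectors $u_i$, $v_i$:
\[
A^*A = V\,\Sigma^2\,V^*, \qquad
AA^* = U\,\Sigma^2\,U^*, \qquad
\begin{bmatrix} 0 & A^* \\ A & 0 \end{bmatrix}
\begin{bmatrix} v_i \\ \pm u_i \end{bmatrix}
= \pm\sigma_i \begin{bmatrix} v_i \\ \pm u_i \end{bmatrix}.
\]
In particular, a singular value $\sigma_i$ appears only as the squared eigenvalue $\sigma_i^2$ of the first two matrices, which is one reason methods specialized for the SVD can compute tiny singular values more accurately than an eigensolver applied to $A^*A$ or $AA^*$.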
The most common transformation methods compute the thin SVD $A = U\Sigma V^*$ in the following three phases. (They can be easily modified to compute the full SVD, or just selected singular values and/or singular vectors, but we present just the thin SVD for simplicity.)
1. Reduce $A$ to bidiagonal form: compute unitary $U_1$ and $V_1$ so that $B = U_1^* A V_1$ is bidiagonal.
2. Compute the SVD of the bidiagonal matrix: $B = U_2 \Sigma V_2^*$.
3. Multiply $U = U_1 U_2$ and $V = V_1 V_2$ to obtain the thin SVD $A = U\Sigma V^*$.
Phase 1, reduction to bidiagonal form, is performed by LAPACK routine xGEBRD and ScaLAPACK routine PxGEBRD using a sequence of unitary Householder reflections applied alternately on the left and on the right, for a cost of $O(mn^2)$ floating point operations. If singular values only are desired, phase 2 costs just $O(n^2)$, which is much less than the cost of phase 1, and phase 3 is omitted. If all left and right singular vectors are desired, the cost depends strongly on which algorithm is selected, as described below.
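As a minimal illustration using SciPy's scipy.linalg.svd (one common interface to these LAPACK drivers; the sketch below is not taken from the text above), the thin SVD and the singular-values-only computation, which corresponds to omitting phase 3, can be requested as follows:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))        # dense m-by-n matrix, m >= n

# Thin SVD: U is m-by-n, s holds the n singular values, Vh is n-by-n.
U, s, Vh = linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vh))   # True

# Singular values only: phase 3 is skipped and phase 2 is cheap.
s_only = linalg.svd(A, compute_uv=False)
print(np.allclose(s, s_only))                # True
```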
The algorithms for phase 2 are analogous to those available for the symmetric tridiagonal eigenproblem as discussed in §4.2, with some differences outlined below.
Currently, the fastest available sequential or shared-memory parallel dense SVD routine is LAPACK's xGESDD, but a faster one is on the way. The fastest available distributed-memory parallel dense SVD routine is ScaLAPACK's PxGESVD, but again a better one is on the way.
Divide-and-conquer is implemented by LAPACK computational routine xBDSDC, which is used by LAPACK driver routine xGESDD to compute the SVD of a dense matrix. xGESDD is currently the LAPACK algorithm of choice for the SVD of dense matrices.
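For illustration (again using SciPy as one possible interface, which is an assumption about the reader's environment rather than part of the discussion above), the choice between the QR-iteration-based driver xGESVD and the divide-and-conquer driver xGESDD is exposed through the lapack_driver argument of scipy.linalg.svd:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 300))

# Divide-and-conquer driver (xGESDD); this is SciPy's default.
U1, s1, Vh1 = linalg.svd(A, full_matrices=False, lapack_driver="gesdd")

# QR-iteration-based driver (xGESVD), typically slower when all
# singular vectors of a large matrix are requested.
U2, s2, Vh2 = linalg.svd(A, full_matrices=False, lapack_driver="gesvd")

print(np.allclose(s1, s2))   # singular values agree to roundoff
```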
Recent advances reported in [168,128,358,356] promise an algorithm that costs only $O(n)$ work per singular triplet while guaranteeing orthogonality, but it is not yet complete. When it is, both LAPACK and ScaLAPACK will provide this algorithm, which should be the fastest in all cases.
If $A$ is banded, phase 1 can be performed more cheaply by LAPACK computational routine xGBBRD, but no LAPACK driver routines are available.
Finally, we mention Jacobi's method [114,198] for the SVD. This transformation method repeatedly multiplies $A$ on the right by elementary orthogonal matrices (Jacobi rotations) until $A$ converges to $U\Sigma$; the product of the Jacobi rotations is $V$. Jacobi is slower than any of the above transformation methods (it can be made to run within about twice the time of QR [136]) but has the useful property that for certain $A$ it can deliver the tiny singular values, and their singular vectors, much more accurately than any of the above methods, provided that it is properly implemented [118,115].
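The sketch below is a minimal one-sided Jacobi iteration in Python/NumPy, intended only to illustrate the idea and not the carefully implemented versions cited above (it omits pivoting, scaling, and the refined stopping tests they rely on): columns of a working copy of $A$ are orthogonalized in pairs by rotations applied on the right, the column norms converge to the singular values, and the accumulated rotations form $V$.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Thin SVD of an m-by-n matrix (m >= n) by one-sided Jacobi rotations.
    A minimal sketch: no pivoting, blocking, or scaling refinements."""
    W = A.astype(float).copy()          # working matrix, converges to U * Sigma
    m, n = W.shape
    V = np.eye(n)                       # accumulates the Jacobi rotations
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = W[:, p] @ W[:, p]
                beta = W[:, q] @ W[:, q]
                gamma = W[:, p] @ W[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue            # columns p and q already orthogonal
                converged = False
                # Rotation angle chosen to zero the (p, q) entry of W^T W.
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Apply the rotation on the right to W and accumulate it in V.
                Wp, Wq = W[:, p].copy(), W[:, q].copy()
                W[:, p], W[:, q] = c * Wp - s * Wq, s * Wp + c * Wq
                Vp, Vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = c * Vp - s * Vq, s * Vp + c * Vq
        if converged:
            break
    sigma = np.linalg.norm(W, axis=0)   # column norms are the singular values
    U = W / sigma                       # normalize columns (assumes full rank)
    return U, sigma, V

# Usage: compare with a reference SVD.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, V = one_sided_jacobi_svd(A)
print(np.allclose(A, U @ np.diag(s) @ V.T))                 # True
print(np.allclose(np.sort(s)[::-1], np.linalg.svd(A)[1]))   # True
```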