LAPACK 3.12.1
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ dgejsv()

subroutine dgejsv ( character*1 joba,
character*1 jobu,
character*1 jobv,
character*1 jobr,
character*1 jobt,
character*1 jobp,
integer m,
integer n,
double precision, dimension( lda, * ) a,
integer lda,
double precision, dimension( n ) sva,
double precision, dimension( ldu, * ) u,
integer ldu,
double precision, dimension( ldv, * ) v,
integer ldv,
double precision, dimension( lwork ) work,
integer lwork,
integer, dimension( * ) iwork,
integer info )

DGEJSV

Download DGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
!>
!> DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
!> matrix [A], where M >= N. The SVD of [A] is written as
!>
!>              [A] = [U] * [SIGMA] * [V]^t,
!>
!> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
!> diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
!> [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
!> the singular values of [A]. The columns of [U] and [V] are the left and
!> the right singular vectors of [A], respectively. The matrices [U] and [V]
!> are computed and stored in the arrays U and V, respectively. The diagonal
!> of [SIGMA] is computed and stored in the array SVA.
!> DGEJSV can sometimes compute tiny singular values and their singular vectors much
!> more accurately than other SVD routines, see below under Further Details.
!> 
Parameters
[in]JOBA
!>          JOBA is CHARACTER*1
!>        Specifies the level of accuracy:
!>       = 'C': This option works well (high relative accuracy) if A = B * D,
!>             with well-conditioned B and arbitrary diagonal matrix D.
!>             The accuracy cannot be spoiled by COLUMN scaling. The
!>             accuracy of the computed output depends on the condition of
!>             B, and the procedure aims at the best theoretical accuracy.
!>             The relative error max_{i=1:N}|d sigma_i| / sigma_i is
!>             bounded by f(M,N)*epsilon* cond(B), independent of D.
!>             The input matrix is preprocessed with the QRF with column
!>             pivoting. This initial preprocessing and preconditioning by
!>             a rank revealing QR factorization is common for all values of
!>             JOBA. Additional actions are specified as follows:
!>       = 'E': Computation as with 'C' with an additional estimate of the
!>             condition number of B. It provides a realistic error bound.
!>       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
!>             D1, D2, and well-conditioned matrix C, this option gives
!>             higher accuracy than the 'C' option. If the structure of the
!>             input matrix is not known, and relative accuracy is
!>             desirable, then this option is advisable. The input matrix A
!>             is preprocessed with QR factorization with FULL (row and
!>             column) pivoting.
!>       = 'G': Computation as with 'F' with an additional estimate of the
!>             condition number of B, where A=D*B. If A has heavily weighted
!>             rows, then using this condition number gives too pessimistic
!>             error bound.
!>       = 'A': Small singular values are the noise and the matrix is treated
!>             as numerically rank deficient. The error in the computed
!>             singular values is bounded by f(m,n)*epsilon*||A||.
!>             The computed SVD A = U * S * V^t restores A up to
!>             f(m,n)*epsilon*||A||.
!>             This gives the procedure the licence to discard (set to zero)
!>             all singular values below N*epsilon*||A||.
!>       = 'R': Similar as in 'A'. Rank revealing property of the initial
!>             QR factorization is used do reveal (using triangular factor)
!>             a gap sigma_{r+1} < epsilon * sigma_r in which case the
!>             numerical RANK is declared to be r. The SVD is computed with
!>             absolute error bounds, but more accurately than with 'A'.
!> 
[in]JOBU
!>          JOBU is CHARACTER*1
!>        Specifies whether to compute the columns of U:
!>       = 'U': N columns of U are returned in the array U.
!>       = 'F': full set of M left sing. vectors is returned in the array U.
!>       = 'W': U may be used as workspace of length M*N. See the description
!>             of U.
!>       = 'N': U is not computed.
!> 
[in]JOBV
!>          JOBV is CHARACTER*1
!>        Specifies whether to compute the matrix V:
!>       = 'V': N columns of V are returned in the array V; Jacobi rotations
!>             are not explicitly accumulated.
!>       = 'J': N columns of V are returned in the array V, but they are
!>             computed as the product of Jacobi rotations. This option is
!>             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
!>       = 'W': V may be used as workspace of length N*N. See the description
!>             of V.
!>       = 'N': V is not computed.
!> 
[in]JOBR
!>          JOBR is CHARACTER*1
!>        Specifies the RANGE for the singular values. Issues the licence to
!>        set to zero small positive singular values if they are outside
!>        specified range. If A .NE. 0 is scaled so that the largest singular
!>        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
!>        the licence to kill columns of A whose norm in c*A is less than
!>        DSQRT(SFMIN) (for JOBR = 'R'), or less than SMALL=SFMIN/EPSLN,
!>        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
!>       = 'N': Do not kill small columns of c*A. This option assumes that
!>             BLAS and QR factorizations and triangular solvers are
!>             implemented to work in that range. If the condition of A
!>             is greater than BIG, use DGESVJ.
!>       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
!>             (roughly, as described above). This option is recommended.
!>                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
!>        For computing the singular values in the FULL range [SFMIN,BIG]
!>        use DGESVJ.
!> 
[in]JOBT
!>          JOBT is CHARACTER*1
!>        If the matrix is square then the procedure may determine to use
!>        transposed A if A^t seems to be better with respect to convergence.
!>        If the matrix is not square, JOBT is ignored. This is subject to
!>        changes in the future.
!>        The decision is based on two values of entropy over the adjoint
!>        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
!>       = 'T': transpose if entropy test indicates possibly faster
!>        convergence of Jacobi process if A^t is taken as input. If A is
!>        replaced with A^t, then the row pivoting is included automatically.
!>       = 'N': do not speculate.
!>        This option can be used to compute only the singular values, or the
!>        full SVD (U, SIGMA and V). For only one set of singular vectors
!>        (U or V), the caller should provide both U and V, as one of the
!>        matrices is used as workspace if the matrix A is transposed.
!>        The implementer can easily remove this constraint and make the
!>        code more complicated. See the descriptions of U and V.
!> 
[in]JOBP
!>          JOBP is CHARACTER*1
!>        Issues the licence to introduce structured perturbations to drown
!>        denormalized numbers. This licence should be active if the
!>        denormals are poorly implemented, causing slow computation,
!>        especially in cases of fast convergence (!). For details see [1,2].
!>        For the sake of simplicity, this perturbations are included only
!>        when the full SVD or only the singular values are requested. The
!>        implementer/user can easily add the perturbation for the cases of
!>        computing one set of singular vectors.
!>       = 'P': introduce perturbation
!>       = 'N': do not perturb
!> 
[in]M
!>          M is INTEGER
!>         The number of rows of the input matrix A.  M >= 0.
!> 
[in]N
!>          N is INTEGER
!>         The number of columns of the input matrix A. M >= N >= 0.
!> 
[in,out]A
!>          A is DOUBLE PRECISION array, dimension (LDA,N)
!>          On entry, the M-by-N matrix A.
!> 
[in]LDA
!>          LDA is INTEGER
!>          The leading dimension of the array A.  LDA >= max(1,M).
!> 
[out]SVA
!>          SVA is DOUBLE PRECISION array, dimension (N)
!>          On exit,
!>          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
!>            computation SVA contains Euclidean column norms of the
!>            iterated matrices in the array A.
!>          - For WORK(1) .NE. WORK(2): The singular values of A are
!>            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
!>            sigma_max(A) overflows or if small singular values have been
!>            saved from underflow by scaling the input matrix A.
!>          - If JOBR='R' then some of the singular values may be returned
!>            as exact zeros obtained by  because they are
!>            below the numerical rank threshold or are denormalized numbers.
!> 
[out]U
!>          U is DOUBLE PRECISION array, dimension ( LDU, N ) or ( LDU, M )
!>          If JOBU = 'U', then U contains on exit the M-by-N matrix of
!>                         the left singular vectors.
!>          If JOBU = 'F', then U contains on exit the M-by-M matrix of
!>                         the left singular vectors, including an ONB
!>                         of the orthogonal complement of the Range(A).
!>          If JOBU = 'W'  .AND. (JOBV = 'V' .AND. JOBT = 'T' .AND. M = N),
!>                         then U is used as workspace if the procedure
!>                         replaces A with A^t. In that case, [V] is computed
!>                         in U as left singular vectors of A^t and then
!>                         copied back to the V array. This 'W' option is just
!>                         a reminder to the caller that in this case U is
!>                         reserved as workspace of length N*N.
!>          If JOBU = 'N'  U is not referenced, unless JOBT='T'.
!> 
[in]LDU
!>          LDU is INTEGER
!>          The leading dimension of the array U,  LDU >= 1.
!>          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
!> 
[out]V
!>          V is DOUBLE PRECISION array, dimension ( LDV, N )
!>          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
!>                         the right singular vectors;
!>          If JOBV = 'W', AND (JOBU = 'U' AND JOBT = 'T' AND M = N),
!>                         then V is used as workspace if the procedure
!>                         replaces A with A^t. In that case, [U] is computed
!>                         in V as right singular vectors of A^t and then
!>                         copied back to the U array. This 'W' option is just
!>                         a reminder to the caller that in this case V is
!>                         reserved as workspace of length N*N.
!>          If JOBV = 'N'  V is not referenced, unless JOBT='T'.
!> 
[in]LDV
!>          LDV is INTEGER
!>          The leading dimension of the array V,  LDV >= 1.
!>          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
!> 
[out]WORK
!>          WORK is DOUBLE PRECISION array, dimension (MAX(7,LWORK))
!>          On exit, if N > 0 .AND. M > 0 (else not referenced),
!>          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
!>                    that SCALE*SVA(1:N) are the computed singular values
!>                    of A. (See the description of SVA().)
!>          WORK(2) = See the description of WORK(1).
!>          WORK(3) = SCONDA is an estimate for the condition number of
!>                    column equilibrated A. (If JOBA = 'E' or 'G')
!>                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
!>                    It is computed using DPOCON. It holds
!>                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
!>                    where R is the triangular factor from the QRF of A.
!>                    However, if R is truncated and the numerical rank is
!>                    determined to be strictly smaller than N, SCONDA is
!>                    returned as -1, thus indicating that the smallest
!>                    singular values might be lost.
!>
!>          If full SVD is needed, the following two condition numbers are
!>          useful for the analysis of the algorithm. They are provided for
!>          a developer/implementer who is familiar with the details of
!>          the method.
!>
!>          WORK(4) = an estimate of the scaled condition number of the
!>                    triangular factor in the first QR factorization.
!>          WORK(5) = an estimate of the scaled condition number of the
!>                    triangular factor in the second QR factorization.
!>          The following two parameters are computed if JOBT = 'T'.
!>          They are provided for a developer/implementer who is familiar
!>          with the details of the method.
!>
!>          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
!>                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
!>                    probability simplex.
!>          WORK(7) = the entropy of A*A^t.
!> 
[in]LWORK
!>          LWORK is INTEGER
!>          Length of WORK to confirm proper allocation of work space.
!>          LWORK depends on the job:
!>
!>          If only SIGMA is needed (JOBU = 'N', JOBV = 'N') and
!>            -> .. no scaled condition estimate required (JOBE = 'N'):
!>               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
!>               ->> For optimal performance (blocked code) the optimal value
!>               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
!>               block size for DGEQP3 and DGEQRF.
!>               In general, optimal LWORK is computed as
!>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).
!>            -> .. an estimate of the scaled condition number of A is
!>               required (JOBA='E', 'G'). In this case, LWORK is the maximum
!>               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
!>               ->> For optimal performance (blocked code) the optimal value
!>               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
!>               In general, the optimal length LWORK is computed as
!>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),
!>                                                     N+N*N+LWORK(DPOCON),7).
!>
!>          If SIGMA and the right singular vectors are needed (JOBV = 'V'),
!>            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
!>            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
!>               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQF,
!>               DORMLQ. In general, the optimal length LWORK is computed as
!>               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),
!>                       N+LWORK(DGELQF), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).
!>
!>          If SIGMA and the left singular vectors are needed
!>            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
!>            -> For optimal performance:
!>               if JOBU = 'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
!>               if JOBU = 'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
!>               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
!>               In general, the optimal length LWORK is computed as
!>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
!>                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).
!>               Here LWORK(DORMQR) equals N*NB (for JOBU = 'U') or
!>               M*NB (for JOBU = 'F').
!>
!>          If the full SVD is needed: (JOBU = 'U' or JOBU = 'F') and
!>            -> if JOBV = 'V'
!>               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
!>            -> if JOBV = 'J' the minimal requirement is
!>               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
!>            -> For optimal performance, LWORK should be additionally
!>               larger than N+M*NB, where NB is the optimal block size
!>               for DORMQR.
!> 
[out]IWORK
!>          IWORK is INTEGER array, dimension (MAX(3,M+3*N)).
!>          On exit,
!>          IWORK(1) = the numerical rank determined after the initial
!>                     QR factorization with pivoting. See the descriptions
!>                     of JOBA and JOBR.
!>          IWORK(2) = the number of the computed nonzero singular values
!>          IWORK(3) = if nonzero, a warning message:
!>                     If IWORK(3) = 1 then some of the column norms of A
!>                     were denormalized floats. The requested high accuracy
!>                     is not warranted by the data.
!> 
[out]INFO
!>          INFO is INTEGER
!>           < 0:  if INFO = -i, then the i-th argument had an illegal value.
!>           = 0:  successful exit;
!>           > 0:  DGEJSV  did not converge in the maximal allowed number
!>                 of sweeps. The computed values may be inaccurate.
!> 
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
!>
!>  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
!>  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
!>  additional row pivoting can be used as a preprocessor, which in some
!>  cases results in much higher accuracy. An example is matrix A with the
!>  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
!>  diagonal matrices and C is well-conditioned matrix. In that case, complete
!>  pivoting in the first QR factorizations provides accuracy dependent on the
!>  condition number of C, and independent of D1, D2. Such higher accuracy is
!>  not completely understood theoretically, but it works well in practice.
!>  Further, if A can be written as A = B*D, with well-conditioned B and some
!>  diagonal D, then the high accuracy is guaranteed, both theoretically and
!>  in software, independent of D. For more details see [1], [2].
!>     The computational range for the singular values can be the full range
!>  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
!>  & LAPACK routines called by DGEJSV are implemented to work in that range.
!>  If that is not the case, then the restriction for safe computation with
!>  the singular values in the range of normalized IEEE numbers is that the
!>  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
!>  overflow. This code (DGEJSV) is best used in this restricted range,
!>  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
!>  returned as zeros. See JOBR for details on this.
!>     Further, this implementation is somewhat slower than the one described
!>  in [1,2] due to replacement of some non-LAPACK components, and because
!>  the choice of some tuning parameters in the iterative part (DGESVJ) is
!>  left to the implementer on a particular machine.
!>     The rank revealing QR factorization (in this code: DGEQP3) should be
!>  implemented as in [3]. We have a new version of DGEQP3 under development
!>  that is more robust than the current one in LAPACK, with a cleaner cut in
!>  rank deficient cases. It will be available in the SIGMA library [4].
!>  If M is much larger than N, it is obvious that the initial QRF with
!>  column pivoting can be preprocessed by the QRF without pivoting. That
!>  well known trick is not used in DGEJSV because in some cases heavy row
!>  weighting can be treated with complete pivoting. The overhead in cases
!>  M much larger than N is then only due to pivoting, but the benefits in
!>  terms of accuracy have prevailed. The implementer/user can incorporate
!>  this extra QRF step easily. The implementer can also improve data movement
!>  (matrix transpose, matrix copy, matrix transposed copy) - this
!>  implementation of DGEJSV uses only the simplest, naive data movement.
!> 
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
!>
!> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
!>     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
!>     LAPACK Working note 169.
!> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
!>     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
!>     LAPACK Working note 170.
!> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
!>     factorization software - a case study.
!>     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
!>     LAPACK Working note 176.
!> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
!>     QSVD, (H,K)-SVD computations.
!>     Department of Mathematics, University of Zagreb, 2008.
!> 
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 471 of file dgejsv.f.

474*
475* -- LAPACK computational routine --
476* -- LAPACK is a software package provided by Univ. of Tennessee, --
477* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
478*
479* .. Scalar Arguments ..
480 IMPLICIT NONE
481 INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
482* ..
483* .. Array Arguments ..
484 DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
485 $ WORK( LWORK )
486 INTEGER IWORK( * )
487 CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
488* ..
489*
490* ===========================================================================
491*
492* .. Local Parameters ..
493 DOUBLE PRECISION ZERO, ONE
494 parameter( zero = 0.0d0, one = 1.0d0 )
495* ..
496* .. Local Scalars ..
497 DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, COND_OK,
498 $ CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, MAXPRJ, SCALEM,
499 $ SCONDA, SFMIN, SMALL, TEMP1, USCAL1, USCAL2, XSC
500 INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
501 LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC,
502 $ L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN,
503 $ NOSCAL, ROWPIV, RSVEC, TRANSP
504* ..
505* .. Intrinsic Functions ..
506 INTRINSIC dabs, dlog, max, min, dble, idnint, dsign, dsqrt
507* ..
508* .. External Functions ..
509 DOUBLE PRECISION DLAMCH, DNRM2
510 INTEGER IDAMAX
511 LOGICAL LSAME
512 EXTERNAL idamax, lsame, dlamch, dnrm2
513* ..
514* .. External Subroutines ..
515 EXTERNAL dcopy, dgelqf, dgeqp3, dgeqrf, dlacpy,
516 $ dlascl,
519*
520 EXTERNAL dgesvj
521* ..
522*
523* Test the input arguments
524*
525 lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
526 jracc = lsame( jobv, 'J' )
527 rsvec = lsame( jobv, 'V' ) .OR. jracc
528 rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
529 l2rank = lsame( joba, 'R' )
530 l2aber = lsame( joba, 'A' )
531 errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
532 l2tran = lsame( jobt, 'T' )
533 l2kill = lsame( jobr, 'R' )
534 defr = lsame( jobr, 'N' )
535 l2pert = lsame( jobp, 'P' )
536*
537 IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
538 $ errest .OR. lsame( joba, 'C' ) )) THEN
539 info = - 1
540 ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
541 $ lsame( jobu, 'W' )) ) THEN
542 info = - 2
543 ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
544 $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
545 info = - 3
546 ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
547 info = - 4
548 ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
549 info = - 5
550 ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
551 info = - 6
552 ELSE IF ( m .LT. 0 ) THEN
553 info = - 7
554 ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
555 info = - 8
556 ELSE IF ( lda .LT. m ) THEN
557 info = - 10
558 ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
559 info = - 13
560 ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
561 info = - 15
562 ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
563 & (lwork .LT. max(7,4*n+1,2*m+n))) .OR.
564 & (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
565 & (lwork .LT. max(7,4*n+n*n,2*m+n))) .OR.
566 & (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
567 & .OR.
568 & (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
569 & .OR.
570 & (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
571 & (lwork.LT.max(2*m+n,6*n+2*n*n)))
572 & .OR. (lsvec .AND. rsvec .AND. jracc .AND.
573 & lwork.LT.max(2*m+n,4*n+n*n,2*n+n*n+6)))
574 & THEN
575 info = - 17
576 ELSE
577* #:)
578 info = 0
579 END IF
580*
581 IF ( info .NE. 0 ) THEN
582* #:(
583 CALL xerbla( 'DGEJSV', - info )
584 RETURN
585 END IF
586*
587* Quick return for void matrix (Y3K safe)
588* #:)
589 IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
590 iwork(1:3) = 0
591 work(1:7) = 0
592 RETURN
593 ENDIF
594*
595* Determine whether the matrix U should be M x N or M x M
596*
597 IF ( lsvec ) THEN
598 n1 = n
599 IF ( lsame( jobu, 'F' ) ) n1 = m
600 END IF
601*
602* Set numerical parameters
603*
604*! NOTE: Make sure DLAMCH() does not fail on the target architecture.
605*
606 epsln = dlamch('Epsilon')
607 sfmin = dlamch('SafeMinimum')
608 small = sfmin / epsln
609 big = dlamch('O')
610* BIG = ONE / SFMIN
611*
612* Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
613*
614*(!) If necessary, scale SVA() to protect the largest norm from
615* overflow. It is possible that this scaling pushes the smallest
616* column norm left from the underflow threshold (extreme case).
617*
618 scalem = one / dsqrt(dble(m)*dble(n))
619 noscal = .true.
620 goscal = .true.
621 DO 1874 p = 1, n
622 aapp = zero
623 aaqq = one
624 CALL dlassq( m, a(1,p), 1, aapp, aaqq )
625 IF ( aapp .GT. big ) THEN
626 info = - 9
627 CALL xerbla( 'DGEJSV', -info )
628 RETURN
629 END IF
630 aaqq = dsqrt(aaqq)
631 IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
632 sva(p) = aapp * aaqq
633 ELSE
634 noscal = .false.
635 sva(p) = aapp * ( aaqq * scalem )
636 IF ( goscal ) THEN
637 goscal = .false.
638 CALL dscal( p-1, scalem, sva, 1 )
639 END IF
640 END IF
641 1874 CONTINUE
642*
643 IF ( noscal ) scalem = one
644*
645 aapp = zero
646 aaqq = big
647 DO 4781 p = 1, n
648 aapp = max( aapp, sva(p) )
649 IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
650 4781 CONTINUE
651*
652* Quick return for zero M x N matrix
653* #:)
654 IF ( aapp .EQ. zero ) THEN
655 IF ( lsvec ) CALL dlaset( 'G', m, n1, zero, one, u, ldu )
656 IF ( rsvec ) CALL dlaset( 'G', n, n, zero, one, v, ldv )
657 work(1) = one
658 work(2) = one
659 IF ( errest ) work(3) = one
660 IF ( lsvec .AND. rsvec ) THEN
661 work(4) = one
662 work(5) = one
663 END IF
664 IF ( l2tran ) THEN
665 work(6) = zero
666 work(7) = zero
667 END IF
668 iwork(1) = 0
669 iwork(2) = 0
670 iwork(3) = 0
671 RETURN
672 END IF
673*
674* Issue warning if denormalized column norms detected. Override the
675* high relative accuracy request. Issue licence to kill columns
676* (set them to zero) whose norm is less than sigma_max / BIG (roughly).
677* #:(
678 warning = 0
679 IF ( aaqq .LE. sfmin ) THEN
680 l2rank = .true.
681 l2kill = .true.
682 warning = 1
683 END IF
684*
685* Quick return for one-column matrix
686* #:)
687 IF ( n .EQ. 1 ) THEN
688*
689 IF ( lsvec ) THEN
690 CALL dlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
691 CALL dlacpy( 'A', m, 1, a, lda, u, ldu )
692* computing all M left singular vectors of the M x 1 matrix
693 IF ( n1 .NE. n ) THEN
694 CALL dgeqrf( m, n, u,ldu, work, work(n+1),lwork-n,
695 $ ierr )
696 CALL dorgqr( m,n1,1, u,ldu,work,work(n+1),lwork-n,
697 $ ierr )
698 CALL dcopy( m, a(1,1), 1, u(1,1), 1 )
699 END IF
700 END IF
701 IF ( rsvec ) THEN
702 v(1,1) = one
703 END IF
704 IF ( sva(1) .LT. (big*scalem) ) THEN
705 sva(1) = sva(1) / scalem
706 scalem = one
707 END IF
708 work(1) = one / scalem
709 work(2) = one
710 IF ( sva(1) .NE. zero ) THEN
711 iwork(1) = 1
712 IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
713 iwork(2) = 1
714 ELSE
715 iwork(2) = 0
716 END IF
717 ELSE
718 iwork(1) = 0
719 iwork(2) = 0
720 END IF
721 iwork(3) = 0
722 IF ( errest ) work(3) = one
723 IF ( lsvec .AND. rsvec ) THEN
724 work(4) = one
725 work(5) = one
726 END IF
727 IF ( l2tran ) THEN
728 work(6) = zero
729 work(7) = zero
730 END IF
731 RETURN
732*
733 END IF
734*
735 transp = .false.
736 l2tran = l2tran .AND. ( m .EQ. n )
737*
738 aatmax = -one
739 aatmin = big
740 IF ( rowpiv .OR. l2tran ) THEN
741*
742* Compute the row norms, needed to determine row pivoting sequence
743* (in the case of heavily row weighted A, row pivoting is strongly
744* advised) and to collect information needed to compare the
745* structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
746*
747 IF ( l2tran ) THEN
748 DO 1950 p = 1, m
749 xsc = zero
750 temp1 = one
751 CALL dlassq( n, a(p,1), lda, xsc, temp1 )
752* DLASSQ gets both the ell_2 and the ell_infinity norm
753* in one pass through the vector
754 work(m+n+p) = xsc * scalem
755 work(n+p) = xsc * (scalem*dsqrt(temp1))
756 aatmax = max( aatmax, work(n+p) )
757 IF (work(n+p) .NE. zero) aatmin = min(aatmin,work(n+p))
758 1950 CONTINUE
759 ELSE
760 DO 1904 p = 1, m
761 work(m+n+p) = scalem*dabs( a(p,idamax(n,a(p,1),lda)) )
762 aatmax = max( aatmax, work(m+n+p) )
763 aatmin = min( aatmin, work(m+n+p) )
764 1904 CONTINUE
765 END IF
766*
767 END IF
768*
769* For square matrix A try to determine whether A^t would be better
770* input for the preconditioned Jacobi SVD, with faster convergence.
771* The decision is based on an O(N) function of the vector of column
772* and row norms of A, based on the Shannon entropy. This should give
773* the right choice in most cases when the difference actually matters.
774* It may fail and pick the slower converging side.
775*
776 entra = zero
777 entrat = zero
778 IF ( l2tran ) THEN
779*
780 xsc = zero
781 temp1 = one
782 CALL dlassq( n, sva, 1, xsc, temp1 )
783 temp1 = one / temp1
784*
785 entra = zero
786 DO 1113 p = 1, n
787 big1 = ( ( sva(p) / xsc )**2 ) * temp1
788 IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
789 1113 CONTINUE
790 entra = - entra / dlog(dble(n))
791*
792* Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
793* It is derived from the diagonal of A^t * A. Do the same with the
794* diagonal of A * A^t, compute the entropy of the corresponding
795* probability distribution. Note that A * A^t and A^t * A have the
796* same trace.
797*
798 entrat = zero
799 DO 1114 p = n+1, n+m
800 big1 = ( ( work(p) / xsc )**2 ) * temp1
801 IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
802 1114 CONTINUE
803 entrat = - entrat / dlog(dble(m))
804*
805* Analyze the entropies and decide A or A^t. Smaller entropy
806* usually means better input for the algorithm.
807*
808 transp = ( entrat .LT. entra )
809*
810* If A^t is better than A, transpose A.
811*
812 IF ( transp ) THEN
813* In an optimal implementation, this trivial transpose
814* should be replaced with faster transpose.
815 DO 1115 p = 1, n - 1
816 DO 1116 q = p + 1, n
817 temp1 = a(q,p)
818 a(q,p) = a(p,q)
819 a(p,q) = temp1
820 1116 CONTINUE
821 1115 CONTINUE
822 DO 1117 p = 1, n
823 work(m+n+p) = sva(p)
824 sva(p) = work(n+p)
825 1117 CONTINUE
826 temp1 = aapp
827 aapp = aatmax
828 aatmax = temp1
829 temp1 = aaqq
830 aaqq = aatmin
831 aatmin = temp1
832 kill = lsvec
833 lsvec = rsvec
834 rsvec = kill
835 IF ( lsvec ) n1 = n
836*
837 rowpiv = .true.
838 END IF
839*
840 END IF
841* END IF L2TRAN
842*
843* Scale the matrix so that its maximal singular value remains less
844* than DSQRT(BIG) -- the matrix is scaled so that its maximal column
845* has Euclidean norm equal to DSQRT(BIG/N). The only reason to keep
846* DSQRT(BIG) instead of BIG is the fact that DGEJSV uses LAPACK and
847* BLAS routines that, in some implementations, are not capable of
848* working in the full interval [SFMIN,BIG] and that they may provoke
849* overflows in the intermediate results. If the singular values spread
850* from SFMIN to BIG, then DGESVJ will compute them. So, in that case,
851* one should use DGESVJ instead of DGEJSV.
852*
853 big1 = dsqrt( big )
854 temp1 = dsqrt( big / dble(n) )
855*
856 CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
857 IF ( aaqq .GT. (aapp * sfmin) ) THEN
858 aaqq = ( aaqq / aapp ) * temp1
859 ELSE
860 aaqq = ( aaqq * temp1 ) / aapp
861 END IF
862 temp1 = temp1 * scalem
863 CALL dlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
864*
865* To undo scaling at the end of this procedure, multiply the
866* computed singular values with USCAL2 / USCAL1.
867*
868 uscal1 = temp1
869 uscal2 = aapp
870*
871 IF ( l2kill ) THEN
872* L2KILL enforces computation of nonzero singular values in
873* the restricted range of condition number of the initial A,
874* sigma_max(A) / sigma_min(A) approx. DSQRT(BIG)/DSQRT(SFMIN).
875 xsc = dsqrt( sfmin )
876 ELSE
877 xsc = small
878*
879* Now, if the condition number of A is too big,
880* sigma_max(A) / sigma_min(A) .GT. DSQRT(BIG/N) * EPSLN / SFMIN,
881* as a precaution measure, the full SVD is computed using DGESVJ
882* with accumulated Jacobi rotations. This provides numerically
883* more robust computation, at the cost of slightly increased run
884* time. Depending on the concrete implementation of BLAS and LAPACK
885* (i.e. how they behave in presence of extreme ill-conditioning) the
886* implementor may decide to remove this switch.
887 IF ( ( aaqq.LT.dsqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
888 jracc = .true.
889 END IF
890*
891 END IF
892 IF ( aaqq .LT. xsc ) THEN
893 DO 700 p = 1, n
894 IF ( sva(p) .LT. xsc ) THEN
895 CALL dlaset( 'A', m, 1, zero, zero, a(1,p), lda )
896 sva(p) = zero
897 END IF
898 700 CONTINUE
899 END IF
900*
901* Preconditioning using QR factorization with pivoting
902*
903 IF ( rowpiv ) THEN
904* Optional row permutation (Bjoerck row pivoting):
905* A result by Cox and Higham shows that the Bjoerck's
906* row pivoting combined with standard column pivoting
907* has similar effect as Powell-Reid complete pivoting.
908* The ell-infinity norms of A are made nonincreasing.
909 DO 1952 p = 1, m - 1
910 q = idamax( m-p+1, work(m+n+p), 1 ) + p - 1
911 iwork(2*n+p) = q
912 IF ( p .NE. q ) THEN
913 temp1 = work(m+n+p)
914 work(m+n+p) = work(m+n+q)
915 work(m+n+q) = temp1
916 END IF
917 1952 CONTINUE
918 CALL dlaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
919 END IF
920*
921* End of the preparation phase (scaling, optional sorting and
922* transposing, optional flushing of small columns).
923*
924* Preconditioning
925*
926* If the full SVD is needed, the right singular vectors are computed
927* from a matrix equation, and for that we need theoretical analysis
928* of the Businger-Golub pivoting. So we use DGEQP3 as the first RR QRF.
929* In all other cases the first RR QRF can be chosen by other criteria
930* (eg speed by replacing global with restricted window pivoting, such
931* as in SGEQPX from TOMS # 782). Good results will be obtained using
932* SGEQPX with properly (!) chosen numerical parameters.
933* Any improvement of DGEQP3 improves overall performance of DGEJSV.
934*
935* A * P1 = Q1 * [ R1^t 0]^t:
936 DO 1963 p = 1, n
937* .. all columns are free columns
938 iwork(p) = 0
939 1963 CONTINUE
940 CALL dgeqp3( m,n,a,lda, iwork,work, work(n+1),lwork-n, ierr )
941*
942* The upper triangular matrix R1 from the first QRF is inspected for
943* rank deficiency and possibilities for deflation, or possible
944* ill-conditioning. Depending on the user specified flag L2RANK,
945* the procedure explores possibilities to reduce the numerical
946* rank by inspecting the computed upper triangular factor. If
947* L2RANK or L2ABER are up, then DGEJSV will compute the SVD of
948* A + dA, where ||dA|| <= f(M,N)*EPSLN.
949*
950 nr = 1
951 IF ( l2aber ) THEN
952* Standard absolute error bound suffices. All sigma_i with
953* sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
954* aggressive enforcement of lower numerical rank by introducing a
955* backward error of the order of N*EPSLN*||A||.
956 temp1 = dsqrt(dble(n))*epsln
957 DO 3001 p = 2, n
958 IF ( dabs(a(p,p)) .GE. (temp1*dabs(a(1,1))) ) THEN
959 nr = nr + 1
960 ELSE
961 GO TO 3002
962 END IF
963 3001 CONTINUE
964 3002 CONTINUE
965 ELSE IF ( l2rank ) THEN
966* .. similarly as above, only slightly more gentle (less aggressive).
967* Sudden drop on the diagonal of R1 is used as the criterion for
968* close-to-rank-deficient.
969 temp1 = dsqrt(sfmin)
970 DO 3401 p = 2, n
971 IF ( ( dabs(a(p,p)) .LT. (epsln*dabs(a(p-1,p-1))) ) .OR.
972 $ ( dabs(a(p,p)) .LT. small ) .OR.
973 $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3402
974 nr = nr + 1
975 3401 CONTINUE
976 3402 CONTINUE
977*
978 ELSE
979* The goal is high relative accuracy. However, if the matrix
980* has high scaled condition number the relative accuracy is in
981* general not feasible. Later on, a condition number estimator
982* will be deployed to estimate the scaled condition number.
983* Here we just remove the underflowed part of the triangular
984* factor. This prevents the situation in which the code is
985* working hard to get the accuracy not warranted by the data.
986 temp1 = dsqrt(sfmin)
987 DO 3301 p = 2, n
988 IF ( ( dabs(a(p,p)) .LT. small ) .OR.
989 $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3302
990 nr = nr + 1
991 3301 CONTINUE
992 3302 CONTINUE
993*
994 END IF
995*
996 almort = .false.
997 IF ( nr .EQ. n ) THEN
998 maxprj = one
999 DO 3051 p = 2, n
1000 temp1 = dabs(a(p,p)) / sva(iwork(p))
1001 maxprj = min( maxprj, temp1 )
1002 3051 CONTINUE
1003 IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1004 END IF
1005*
1006*
1007 sconda = - one
1008 condr1 = - one
1009 condr2 = - one
1010*
1011 IF ( errest ) THEN
1012 IF ( n .EQ. nr ) THEN
1013 IF ( rsvec ) THEN
1014* .. V is available as workspace
1015 CALL dlacpy( 'U', n, n, a, lda, v, ldv )
1016 DO 3053 p = 1, n
1017 temp1 = sva(iwork(p))
1018 CALL dscal( p, one/temp1, v(1,p), 1 )
1019 3053 CONTINUE
1020 CALL dpocon( 'U', n, v, ldv, one, temp1,
1021 $ work(n+1), iwork(2*n+m+1), ierr )
1022 ELSE IF ( lsvec ) THEN
1023* .. U is available as workspace
1024 CALL dlacpy( 'U', n, n, a, lda, u, ldu )
1025 DO 3054 p = 1, n
1026 temp1 = sva(iwork(p))
1027 CALL dscal( p, one/temp1, u(1,p), 1 )
1028 3054 CONTINUE
1029 CALL dpocon( 'U', n, u, ldu, one, temp1,
1030 $ work(n+1), iwork(2*n+m+1), ierr )
1031 ELSE
1032 CALL dlacpy( 'U', n, n, a, lda, work(n+1), n )
1033 DO 3052 p = 1, n
1034 temp1 = sva(iwork(p))
1035 CALL dscal( p, one/temp1, work(n+(p-1)*n+1), 1 )
1036 3052 CONTINUE
1037* .. the columns of R are scaled to have unit Euclidean lengths.
1038 CALL dpocon( 'U', n, work(n+1), n, one, temp1,
1039 $ work(n+n*n+1), iwork(2*n+m+1), ierr )
1040 END IF
1041 sconda = one / dsqrt(temp1)
1042* SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
1043* N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1044 ELSE
1045 sconda = - one
1046 END IF
1047 END IF
1048*
1049 l2pert = l2pert .AND. ( dabs( a(1,1)/a(nr,nr) ) .GT. dsqrt(big1) )
1050* If there is no violent scaling, artificial perturbation is not needed.
1051*
1052* Phase 3:
1053*
1054 IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1055*
1056* Singular Values only
1057*
1058* .. transpose A(1:NR,1:N)
1059 DO 1946 p = 1, min( n-1, nr )
1060 CALL dcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1061 1946 CONTINUE
1062*
1063* The following two DO-loops introduce small relative perturbation
1064* into the strict upper triangle of the lower triangular matrix.
1065* Small entries below the main diagonal are also changed.
1066* This modification is useful if the computing environment does not
1067* provide/allow FLUSH TO ZERO underflow, for it prevents many
1068* annoying denormalized numbers in case of strongly scaled matrices.
1069* The perturbation is structured so that it does not introduce any
1070* new perturbation of the singular values, and it does not destroy
1071* the job done by the preconditioner.
1072* The licence for this perturbation is in the variable L2PERT, which
1073* should be .FALSE. if FLUSH TO ZERO underflow is active.
1074*
1075 IF ( .NOT. almort ) THEN
1076*
1077 IF ( l2pert ) THEN
1078* XSC = DSQRT(SMALL)
1079 xsc = epsln / dble(n)
1080 DO 4947 q = 1, nr
1081 temp1 = xsc*dabs(a(q,q))
1082 DO 4949 p = 1, n
1083 IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1084 $ .OR. ( p .LT. q ) )
1085 $ a(p,q) = dsign( temp1, a(p,q) )
1086 4949 CONTINUE
1087 4947 CONTINUE
1088 ELSE
1089 CALL dlaset( 'U', nr-1,nr-1, zero,zero, a(1,2),lda )
1090 END IF
1091*
1092* .. second preconditioning using the QR factorization
1093*
1094 CALL dgeqrf( n,nr, a,lda, work, work(n+1),lwork-n, ierr )
1095*
1096* .. and transpose upper to lower triangular
1097 DO 1948 p = 1, nr - 1
1098 CALL dcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1099 1948 CONTINUE
1100*
1101 END IF
1102*
1103* Row-cyclic Jacobi SVD algorithm with column pivoting
1104*
1105* .. again some perturbation (a "background noise") is added
1106* to drown denormals
1107 IF ( l2pert ) THEN
1108* XSC = DSQRT(SMALL)
1109 xsc = epsln / dble(n)
1110 DO 1947 q = 1, nr
1111 temp1 = xsc*dabs(a(q,q))
1112 DO 1949 p = 1, nr
1113 IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1114 $ .OR. ( p .LT. q ) )
1115 $ a(p,q) = dsign( temp1, a(p,q) )
1116 1949 CONTINUE
1117 1947 CONTINUE
1118 ELSE
1119 CALL dlaset( 'U', nr-1, nr-1, zero, zero, a(1,2),
1120 $ lda )
1121 END IF
1122*
1123* .. and one-sided Jacobi rotations are started on a lower
1124* triangular matrix (plus perturbation which is ignored in
1125* the part which destroys triangular form (confusing?!))
1126*
1127 CALL dgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1128 $ n, v, ldv, work, lwork, info )
1129*
1130 scalem = work(1)
1131 numrank = idnint(work(2))
1132*
1133*
1134 ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1135*
1136* -> Singular Values and Right Singular Vectors <-
1137*
1138 IF ( almort ) THEN
1139*
1140* .. in this case NR equals N
1141 DO 1998 p = 1, nr
1142 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1143 1998 CONTINUE
1144 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2),
1145 $ ldv )
1146*
1147 CALL dgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1148 $ work, lwork, info )
1149 scalem = work(1)
1150 numrank = idnint(work(2))
1151
1152 ELSE
1153*
1154* .. two more QR factorizations ( one QRF is not enough, two require
1155* accumulated product of Jacobi rotations, three are perfect )
1156*
1157 CALL dlaset( 'Lower', nr-1, nr-1, zero, zero, a(2,1),
1158 $ lda )
1159 CALL dgelqf( nr, n, a, lda, work, work(n+1), lwork-n,
1160 $ ierr)
1161 CALL dlacpy( 'Lower', nr, nr, a, lda, v, ldv )
1162 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2),
1163 $ ldv )
1164 CALL dgeqrf( nr, nr, v, ldv, work(n+1), work(2*n+1),
1165 $ lwork-2*n, ierr )
1166 DO 8998 p = 1, nr
1167 CALL dcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1168 8998 CONTINUE
1169 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2),
1170 $ ldv )
1171*
1172 CALL dgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1173 $ ldu, work(n+1), lwork, info )
1174 scalem = work(n+1)
1175 numrank = idnint(work(n+2))
1176 IF ( nr .LT. n ) THEN
1177 CALL dlaset( 'A',n-nr, nr, zero,zero, v(nr+1,1),
1178 $ ldv )
1179 CALL dlaset( 'A',nr, n-nr, zero,zero, v(1,nr+1),
1180 $ ldv )
1181 CALL dlaset( 'A',n-nr,n-nr,zero,one, v(nr+1,nr+1),
1182 $ ldv )
1183 END IF
1184*
1185 CALL dormlq( 'Left', 'Transpose', n, n, nr, a, lda, work,
1186 $ v, ldv, work(n+1), lwork-n, ierr )
1187*
1188 END IF
1189*
1190 DO 8991 p = 1, n
1191 CALL dcopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1192 8991 CONTINUE
1193 CALL dlacpy( 'All', n, n, a, lda, v, ldv )
1194*
1195 IF ( transp ) THEN
1196 CALL dlacpy( 'All', n, n, v, ldv, u, ldu )
1197 END IF
1198*
1199 ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1200*
1201* .. Singular Values and Left Singular Vectors ..
1202*
1203* .. second preconditioning step to avoid need to accumulate
1204* Jacobi rotations in the Jacobi iterations.
1205 DO 1965 p = 1, nr
1206 CALL dcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1207 1965 CONTINUE
1208 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1209*
1210 CALL dgeqrf( n, nr, u, ldu, work(n+1), work(2*n+1),
1211 $ lwork-2*n, ierr )
1212*
1213 DO 1967 p = 1, nr - 1
1214 CALL dcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1215 1967 CONTINUE
1216 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1217*
1218 CALL dgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1219 $ lda, work(n+1), lwork-n, info )
1220 scalem = work(n+1)
1221 numrank = idnint(work(n+2))
1222*
1223 IF ( nr .LT. m ) THEN
1224 CALL dlaset( 'A', m-nr, nr,zero, zero, u(nr+1,1), ldu )
1225 IF ( nr .LT. n1 ) THEN
1226 CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),
1227 $ ldu )
1228 CALL dlaset( 'A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),
1229 $ ldu )
1230 END IF
1231 END IF
1232*
1233 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1234 $ ldu, work(n+1), lwork-n, ierr )
1235*
1236 IF ( rowpiv )
1237 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1238*
1239 DO 1974 p = 1, n1
1240 xsc = one / dnrm2( m, u(1,p), 1 )
1241 CALL dscal( m, xsc, u(1,p), 1 )
1242 1974 CONTINUE
1243*
1244 IF ( transp ) THEN
1245 CALL dlacpy( 'All', n, n, u, ldu, v, ldv )
1246 END IF
1247*
1248 ELSE
1249*
1250* .. Full SVD ..
1251*
1252 IF ( .NOT. jracc ) THEN
1253*
1254 IF ( .NOT. almort ) THEN
1255*
1256* Second Preconditioning Step (QRF [with pivoting])
1257* Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1258* equivalent to an LQF CALL. Since in many libraries the QRF
1259* seems to be better optimized than the LQF, we do explicit
1260* transpose and use the QRF. This is subject to changes in an
1261* optimized implementation of DGEJSV.
1262*
1263 DO 1968 p = 1, nr
1264 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1265 1968 CONTINUE
1266*
1267* .. the following two loops perturb small entries to avoid
1268* denormals in the second QR factorization, where they are
1269* as good as zeros. This is done to avoid painfully slow
1270* computation with denormals. The relative size of the perturbation
1271* is a parameter that can be changed by the implementer.
1272* This perturbation device will be obsolete on machines with
1273* properly implemented arithmetic.
1274* To switch it off, set L2PERT=.FALSE. To remove it from the
1275* code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1276* The following two loops should be blocked and fused with the
1277* transposed copy above.
1278*
1279 IF ( l2pert ) THEN
1280 xsc = dsqrt(small)
1281 DO 2969 q = 1, nr
1282 temp1 = xsc*dabs( v(q,q) )
1283 DO 2968 p = 1, n
1284 IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1285 $ .OR. ( p .LT. q ) )
1286 $ v(p,q) = dsign( temp1, v(p,q) )
1287 IF ( p .LT. q ) v(p,q) = - v(p,q)
1288 2968 CONTINUE
1289 2969 CONTINUE
1290 ELSE
1291 CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2),
1292 $ ldv )
1293 END IF
1294*
1295* Estimate the row scaled condition number of R1
1296* (If R1 is rectangular, N > NR, then the condition number
1297* of the leading NR x NR submatrix is estimated.)
1298*
1299 CALL dlacpy( 'L', nr, nr, v, ldv, work(2*n+1), nr )
1300 DO 3950 p = 1, nr
1301 temp1 = dnrm2(nr-p+1,work(2*n+(p-1)*nr+p),1)
1302 CALL dscal(nr-p+1,one/temp1,work(2*n+(p-1)*nr+p),1)
1303 3950 CONTINUE
1304 CALL dpocon('Lower',nr,work(2*n+1),nr,one,temp1,
1305 $ work(2*n+nr*nr+1),iwork(m+2*n+1),ierr)
1306 condr1 = one / dsqrt(temp1)
1307* .. here need a second opinion on the condition number
1308* .. then assume worst case scenario
1309* R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1310* more conservative <=> CONDR1 .LT. DSQRT(DBLE(N))
1311*
1312 cond_ok = dsqrt(dble(nr))
1313*[TP] COND_OK is a tuning parameter.
1314
1315 IF ( condr1 .LT. cond_ok ) THEN
1316* .. the second QRF without pivoting. Note: in an optimized
1317* implementation, this QRF should be implemented as the QRF
1318* of a lower triangular matrix.
1319* R1^t = Q2 * R2
1320 CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1321 $ lwork-2*n, ierr )
1322*
1323 IF ( l2pert ) THEN
1324 xsc = dsqrt(small)/epsln
1325 DO 3959 p = 2, nr
1326 DO 3958 q = 1, p - 1
1327 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1328 IF ( dabs(v(q,p)) .LE. temp1 )
1329 $ v(q,p) = dsign( temp1, v(q,p) )
1330 3958 CONTINUE
1331 3959 CONTINUE
1332 END IF
1333*
1334 IF ( nr .NE. n )
1335 $ CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1336* .. save ...
1337*
1338* .. this transposed copy should be better than naive
1339 DO 1969 p = 1, nr - 1
1340 CALL dcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1341 1969 CONTINUE
1342*
1343 condr2 = condr1
1344*
1345 ELSE
1346*
1347* .. ill-conditioned case: second QRF with pivoting
1348* Note that windowed pivoting would be equally good
1349* numerically, and more run-time efficient. So, in
1350* an optimal implementation, the next call to DGEQP3
1351* should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1352* with properly (carefully) chosen parameters.
1353*
1354* R1^t * P2 = Q2 * R2
1355 DO 3003 p = 1, nr
1356 iwork(n+p) = 0
1357 3003 CONTINUE
1358 CALL dgeqp3( n, nr, v, ldv, iwork(n+1), work(n+1),
1359 $ work(2*n+1), lwork-2*n, ierr )
1360** CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1361** $ LWORK-2*N, IERR )
1362 IF ( l2pert ) THEN
1363 xsc = dsqrt(small)
1364 DO 3969 p = 2, nr
1365 DO 3968 q = 1, p - 1
1366 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1367 IF ( dabs(v(q,p)) .LE. temp1 )
1368 $ v(q,p) = dsign( temp1, v(q,p) )
1369 3968 CONTINUE
1370 3969 CONTINUE
1371 END IF
1372*
1373 CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1374*
1375 IF ( l2pert ) THEN
1376 xsc = dsqrt(small)
1377 DO 8970 p = 2, nr
1378 DO 8971 q = 1, p - 1
1379 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1380 v(p,q) = - dsign( temp1, v(q,p) )
1381 8971 CONTINUE
1382 8970 CONTINUE
1383 ELSE
1384 CALL dlaset( 'L',nr-1,nr-1,zero,zero,v(2,1),ldv )
1385 END IF
1386* Now, compute R2 = L3 * Q3, the LQ factorization.
1387 CALL dgelqf( nr, nr, v, ldv, work(2*n+n*nr+1),
1388 $ work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1389* .. and estimate the condition number
1390 CALL dlacpy( 'L',nr,nr,v,ldv,work(2*n+n*nr+nr+1),nr )
1391 DO 4950 p = 1, nr
1392 temp1 = dnrm2( p, work(2*n+n*nr+nr+p), nr )
1393 CALL dscal( p, one/temp1, work(2*n+n*nr+nr+p), nr )
1394 4950 CONTINUE
1395 CALL dpocon( 'L',nr,work(2*n+n*nr+nr+1),nr,one,temp1,
1396 $ work(2*n+n*nr+nr+nr*nr+1),iwork(m+2*n+1),ierr )
1397 condr2 = one / dsqrt(temp1)
1398*
1399 IF ( condr2 .GE. cond_ok ) THEN
1400* .. save the Householder vectors used for Q3
1401* (this overwrites the copy of R2, as it will not be
1402* needed in this branch, but it does not overwrite the
1403* Huseholder vectors of Q2.).
1404 CALL dlacpy( 'U', nr, nr, v, ldv, work(2*n+1), n )
1405* .. and the rest of the information on Q3 is in
1406* WORK(2*N+N*NR+1:2*N+N*NR+N)
1407 END IF
1408*
1409 END IF
1410*
1411 IF ( l2pert ) THEN
1412 xsc = dsqrt(small)
1413 DO 4968 q = 2, nr
1414 temp1 = xsc * v(q,q)
1415 DO 4969 p = 1, q - 1
1416* V(p,q) = - DSIGN( TEMP1, V(q,p) )
1417 v(p,q) = - dsign( temp1, v(p,q) )
1418 4969 CONTINUE
1419 4968 CONTINUE
1420 ELSE
1421 CALL dlaset( 'U', nr-1,nr-1, zero,zero, v(1,2), ldv )
1422 END IF
1423*
1424* Second preconditioning finished; continue with Jacobi SVD
1425* The input matrix is lower triangular.
1426*
1427* Recover the right singular vectors as solution of a well
1428* conditioned triangular matrix equation.
1429*
1430 IF ( condr1 .LT. cond_ok ) THEN
1431*
1432 CALL dgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u,
1433 $ ldu,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,info )
1434 scalem = work(2*n+n*nr+nr+1)
1435 numrank = idnint(work(2*n+n*nr+nr+2))
1436 DO 3970 p = 1, nr
1437 CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1438 CALL dscal( nr, sva(p), v(1,p), 1 )
1439 3970 CONTINUE
1440
1441* .. pick the right matrix equation and solve it
1442*
1443 IF ( nr .EQ. n ) THEN
1444* :)) .. best case, R1 is inverted. The solution of this matrix
1445* equation is Q2*V2 = the product of the Jacobi rotations
1446* used in DGESVJ, premultiplied with the orthogonal matrix
1447* from the second QR factorization.
1448 CALL dtrsm( 'L','U','N','N', nr,nr,one, a,lda, v,
1449 $ ldv )
1450 ELSE
1451* .. R1 is well conditioned, but non-square. Transpose(R2)
1452* is inverted to get the product of the Jacobi rotations
1453* used in DGESVJ. The Q-factor from the second QR
1454* factorization is then built in explicitly.
1455 CALL dtrsm('L','U','T','N',nr,nr,one,work(2*n+1),
1456 $ n,v,ldv)
1457 IF ( nr .LT. n ) THEN
1458 CALL dlaset('A',n-nr,nr,zero,zero,v(nr+1,1),ldv)
1459 CALL dlaset('A',nr,n-nr,zero,zero,v(1,nr+1),ldv)
1460 CALL dlaset('A',n-nr,n-nr,zero,one,v(nr+1,nr+1),
1461 $ ldv)
1462 END IF
1463 CALL dormqr('L','N',n,n,nr,work(2*n+1),n,work(n+1),
1464 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1465 END IF
1466*
1467 ELSE IF ( condr2 .LT. cond_ok ) THEN
1468*
1469* :) .. the input matrix A is very likely a relative of
1470* the Kahan matrix :)
1471* The matrix R2 is inverted. The solution of the matrix equation
1472* is Q3^T*V3 = the product of the Jacobi rotations (applied to
1473* the lower triangular L3 from the LQ factorization of
1474* R2=L3*Q3), pre-multiplied with the transposed Q3.
1475 CALL dgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr,
1476 $ u,
1477 $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1478 scalem = work(2*n+n*nr+nr+1)
1479 numrank = idnint(work(2*n+n*nr+nr+2))
1480 DO 3870 p = 1, nr
1481 CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1482 CALL dscal( nr, sva(p), u(1,p), 1 )
1483 3870 CONTINUE
1484 CALL dtrsm('L','U','N','N',nr,nr,one,work(2*n+1),n,u,
1485 $ ldu)
1486* .. apply the permutation from the second QR factorization
1487 DO 873 q = 1, nr
1488 DO 872 p = 1, nr
1489 work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1490 872 CONTINUE
1491 DO 874 p = 1, nr
1492 u(p,q) = work(2*n+n*nr+nr+p)
1493 874 CONTINUE
1494 873 CONTINUE
1495 IF ( nr .LT. n ) THEN
1496 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1497 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1498 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),
1499 $ ldv )
1500 END IF
1501 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1502 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1503 ELSE
1504* Last line of defense.
1505* #:( This is a rather pathological case: no scaled condition
1506* improvement after two pivoted QR factorizations. Other
1507* possibility is that the rank revealing QR factorization
1508* or the condition estimator has failed, or the COND_OK
1509* is set very close to ONE (which is unnecessary). Normally,
1510* this branch should never be executed, but in rare cases of
1511* failure of the RRQR or condition estimator, the last line of
1512* defense ensures that DGEJSV completes the task.
1513* Compute the full SVD of L3 using DGESVJ with explicit
1514* accumulation of Jacobi rotations.
1515 CALL dgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr,
1516 $ u,
1517 $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1518 scalem = work(2*n+n*nr+nr+1)
1519 numrank = idnint(work(2*n+n*nr+nr+2))
1520 IF ( nr .LT. n ) THEN
1521 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1522 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1523 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),
1524 $ ldv )
1525 END IF
1526 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1527 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1528*
1529 CALL dormlq( 'L', 'T', nr, nr, nr, work(2*n+1), n,
1530 $ work(2*n+n*nr+1), u, ldu, work(2*n+n*nr+nr+1),
1531 $ lwork-2*n-n*nr-nr, ierr )
1532 DO 773 q = 1, nr
1533 DO 772 p = 1, nr
1534 work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1535 772 CONTINUE
1536 DO 774 p = 1, nr
1537 u(p,q) = work(2*n+n*nr+nr+p)
1538 774 CONTINUE
1539 773 CONTINUE
1540*
1541 END IF
1542*
1543* Permute the rows of V using the (column) permutation from the
1544* first QRF. Also, scale the columns to make them unit in
1545* Euclidean norm. This applies to all cases.
1546*
1547 temp1 = dsqrt(dble(n)) * epsln
1548 DO 1972 q = 1, n
1549 DO 972 p = 1, n
1550 work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1551 972 CONTINUE
1552 DO 973 p = 1, n
1553 v(p,q) = work(2*n+n*nr+nr+p)
1554 973 CONTINUE
1555 xsc = one / dnrm2( n, v(1,q), 1 )
1556 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1557 $ CALL dscal( n, xsc, v(1,q), 1 )
1558 1972 CONTINUE
1559* At this moment, V contains the right singular vectors of A.
1560* Next, assemble the left singular vector matrix U (M x N).
1561 IF ( nr .LT. m ) THEN
1562 CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1),
1563 $ ldu )
1564 IF ( nr .LT. n1 ) THEN
1565 CALL dlaset('A',nr,n1-nr,zero,zero,u(1,nr+1),ldu)
1566 CALL dlaset('A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),
1567 $ ldu)
1568 END IF
1569 END IF
1570*
1571* The Q matrix from the first QRF is built into the left singular
1572* matrix U. This applies to all cases.
1573*
1574 CALL dormqr( 'Left', 'No_Tr', m, n1, n, a, lda, work, u,
1575 $ ldu, work(n+1), lwork-n, ierr )
1576
1577* The columns of U are normalized. The cost is O(M*N) flops.
1578 temp1 = dsqrt(dble(m)) * epsln
1579 DO 1973 p = 1, nr
1580 xsc = one / dnrm2( m, u(1,p), 1 )
1581 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1582 $ CALL dscal( m, xsc, u(1,p), 1 )
1583 1973 CONTINUE
1584*
1585* If the initial QRF is computed with row pivoting, the left
1586* singular vectors must be adjusted.
1587*
1588 IF ( rowpiv )
1589 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1590*
1591 ELSE
1592*
1593* .. the initial matrix A has almost orthogonal columns and
1594* the second QRF is not needed
1595*
1596 CALL dlacpy( 'Upper', n, n, a, lda, work(n+1), n )
1597 IF ( l2pert ) THEN
1598 xsc = dsqrt(small)
1599 DO 5970 p = 2, n
1600 temp1 = xsc * work( n + (p-1)*n + p )
1601 DO 5971 q = 1, p - 1
1602 work(n+(q-1)*n+p)=-dsign(temp1,work(n+(p-1)*n+q))
1603 5971 CONTINUE
1604 5970 CONTINUE
1605 ELSE
1606 CALL dlaset( 'Lower',n-1,n-1,zero,zero,work(n+2),n )
1607 END IF
1608*
1609 CALL dgesvj( 'Upper', 'U', 'N', n, n, work(n+1), n, sva,
1610 $ n, u, ldu, work(n+n*n+1), lwork-n-n*n, info )
1611*
1612 scalem = work(n+n*n+1)
1613 numrank = idnint(work(n+n*n+2))
1614 DO 6970 p = 1, n
1615 CALL dcopy( n, work(n+(p-1)*n+1), 1, u(1,p), 1 )
1616 CALL dscal( n, sva(p), work(n+(p-1)*n+1), 1 )
1617 6970 CONTINUE
1618*
1619 CALL dtrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1620 $ one, a, lda, work(n+1), n )
1621 DO 6972 p = 1, n
1622 CALL dcopy( n, work(n+p), n, v(iwork(p),1), ldv )
1623 6972 CONTINUE
1624 temp1 = dsqrt(dble(n))*epsln
1625 DO 6971 p = 1, n
1626 xsc = one / dnrm2( n, v(1,p), 1 )
1627 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1628 $ CALL dscal( n, xsc, v(1,p), 1 )
1629 6971 CONTINUE
1630*
1631* Assemble the left singular vector matrix U (M x N).
1632*
1633 IF ( n .LT. m ) THEN
1634 CALL dlaset( 'A', m-n, n, zero, zero, u(n+1,1), ldu )
1635 IF ( n .LT. n1 ) THEN
1636 CALL dlaset( 'A',n, n1-n, zero, zero, u(1,n+1),
1637 $ ldu )
1638 CALL dlaset( 'A',m-n,n1-n, zero, one,u(n+1,n+1),
1639 $ ldu )
1640 END IF
1641 END IF
1642 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1643 $ ldu, work(n+1), lwork-n, ierr )
1644 temp1 = dsqrt(dble(m))*epsln
1645 DO 6973 p = 1, n1
1646 xsc = one / dnrm2( m, u(1,p), 1 )
1647 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1648 $ CALL dscal( m, xsc, u(1,p), 1 )
1649 6973 CONTINUE
1650*
1651 IF ( rowpiv )
1652 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1653*
1654 END IF
1655*
1656* end of the >> almost orthogonal case << in the full SVD
1657*
1658 ELSE
1659*
1660* This branch deploys a preconditioned Jacobi SVD with explicitly
1661* accumulated rotations. It is included as optional, mainly for
1662* experimental purposes. It does perform well, and can also be used.
1663* In this implementation, this branch will be automatically activated
1664* if the condition number sigma_max(A) / sigma_min(A) is predicted
1665* to be greater than the overflow threshold. This is because the
1666* a posteriori computation of the singular vectors assumes robust
1667* implementation of BLAS and some LAPACK procedures, capable of working
1668* in presence of extreme values. Since that is not always the case, ...
1669*
1670 DO 7968 p = 1, nr
1671 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1672 7968 CONTINUE
1673*
1674 IF ( l2pert ) THEN
1675 xsc = dsqrt(small/epsln)
1676 DO 5969 q = 1, nr
1677 temp1 = xsc*dabs( v(q,q) )
1678 DO 5968 p = 1, n
1679 IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1680 $ .OR. ( p .LT. q ) )
1681 $ v(p,q) = dsign( temp1, v(p,q) )
1682 IF ( p .LT. q ) v(p,q) = - v(p,q)
1683 5968 CONTINUE
1684 5969 CONTINUE
1685 ELSE
1686 CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1687 END IF
1688
1689 CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1690 $ lwork-2*n, ierr )
1691 CALL dlacpy( 'L', n, nr, v, ldv, work(2*n+1), n )
1692*
1693 DO 7969 p = 1, nr
1694 CALL dcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1695 7969 CONTINUE
1696
1697 IF ( l2pert ) THEN
1698 xsc = dsqrt(small/epsln)
1699 DO 9970 q = 2, nr
1700 DO 9971 p = 1, q - 1
1701 temp1 = xsc * min(dabs(u(p,p)),dabs(u(q,q)))
1702 u(p,q) = - dsign( temp1, u(q,p) )
1703 9971 CONTINUE
1704 9970 CONTINUE
1705 ELSE
1706 CALL dlaset('U', nr-1, nr-1, zero, zero, u(1,2), ldu )
1707 END IF
1708
1709 CALL dgesvj( 'G', 'U', 'V', nr, nr, u, ldu, sva,
1710 $ n, v, ldv, work(2*n+n*nr+1), lwork-2*n-n*nr, info )
1711 scalem = work(2*n+n*nr+1)
1712 numrank = idnint(work(2*n+n*nr+2))
1713
1714 IF ( nr .LT. n ) THEN
1715 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1716 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1717 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1718 END IF
1719
1720 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1721 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1722*
1723* Permute the rows of V using the (column) permutation from the
1724* first QRF. Also, scale the columns to make them unit in
1725* Euclidean norm. This applies to all cases.
1726*
1727 temp1 = dsqrt(dble(n)) * epsln
1728 DO 7972 q = 1, n
1729 DO 8972 p = 1, n
1730 work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1731 8972 CONTINUE
1732 DO 8973 p = 1, n
1733 v(p,q) = work(2*n+n*nr+nr+p)
1734 8973 CONTINUE
1735 xsc = one / dnrm2( n, v(1,q), 1 )
1736 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1737 $ CALL dscal( n, xsc, v(1,q), 1 )
1738 7972 CONTINUE
1739*
1740* At this moment, V contains the right singular vectors of A.
1741* Next, assemble the left singular vector matrix U (M x N).
1742*
1743 IF ( nr .LT. m ) THEN
1744 CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1745 IF ( nr .LT. n1 ) THEN
1746 CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),
1747 $ ldu )
1748 CALL dlaset( 'A',m-nr,n1-nr, zero, one,u(nr+1,nr+1),
1749 $ ldu )
1750 END IF
1751 END IF
1752*
1753 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1754 $ ldu, work(n+1), lwork-n, ierr )
1755*
1756 IF ( rowpiv )
1757 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1758*
1759*
1760 END IF
1761 IF ( transp ) THEN
1762* .. swap U and V because the procedure worked on A^t
1763 DO 6974 p = 1, n
1764 CALL dswap( n, u(1,p), 1, v(1,p), 1 )
1765 6974 CONTINUE
1766 END IF
1767*
1768 END IF
1769* end of the full SVD
1770*
1771* Undo scaling, if necessary (and possible)
1772*
1773 IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1774 CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n,
1775 $ ierr )
1776 uscal1 = one
1777 uscal2 = one
1778 END IF
1779*
1780 IF ( nr .LT. n ) THEN
1781 DO 3004 p = nr+1, n
1782 sva(p) = zero
1783 3004 CONTINUE
1784 END IF
1785*
1786 work(1) = uscal2 * scalem
1787 work(2) = uscal1
1788 IF ( errest ) work(3) = sconda
1789 IF ( lsvec .AND. rsvec ) THEN
1790 work(4) = condr1
1791 work(5) = condr2
1792 END IF
1793 IF ( l2tran ) THEN
1794 work(6) = entra
1795 work(7) = entrat
1796 END IF
1797*
1798 iwork(1) = nr
1799 iwork(2) = numrank
1800 iwork(3) = warning
1801*
1802 RETURN
1803* ..
1804* .. END OF DGEJSV
1805* ..
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine dcopy(n, dx, incx, dy, incy)
DCOPY
Definition dcopy.f:82
subroutine dgelqf(m, n, a, lda, tau, work, lwork, info)
DGELQF
Definition dgelqf.f:142
subroutine dgeqp3(m, n, a, lda, jpvt, tau, work, lwork, info)
DGEQP3
Definition dgeqp3.f:149
subroutine dgeqrf(m, n, a, lda, tau, work, lwork, info)
DGEQRF
Definition dgeqrf.f:144
subroutine dgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, work, lwork, info)
DGESVJ
Definition dgesvj.f:340
integer function idamax(n, dx, incx)
IDAMAX
Definition idamax.f:71
subroutine dlacpy(uplo, m, n, a, lda, b, ldb)
DLACPY copies all or part of one two-dimensional array to another.
Definition dlacpy.f:101
double precision function dlamch(cmach)
DLAMCH
Definition dlamch.f:69
subroutine dlascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition dlascl.f:142
subroutine dlaset(uplo, m, n, alpha, beta, a, lda)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition dlaset.f:108
subroutine dlassq(n, x, incx, scale, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition dlassq.f90:122
subroutine dlaswp(n, a, lda, k1, k2, ipiv, incx)
DLASWP performs a series of row interchanges on a general rectangular matrix.
Definition dlaswp.f:113
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
real(wp) function dnrm2(n, x, incx)
DNRM2
Definition dnrm2.f90:89
subroutine dpocon(uplo, n, a, lda, anorm, rcond, work, iwork, info)
DPOCON
Definition dpocon.f:119
subroutine dscal(n, da, dx, incx)
DSCAL
Definition dscal.f:79
subroutine dswap(n, dx, incx, dy, incy)
DSWAP
Definition dswap.f:82
subroutine dtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
DTRSM
Definition dtrsm.f:181
subroutine dorgqr(m, n, k, a, lda, tau, work, lwork, info)
DORGQR
Definition dorgqr.f:126
subroutine dormlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
DORMLQ
Definition dormlq.f:165
subroutine dormqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
DORMQR
Definition dormqr.f:165
Here is the call graph for this function:
Here is the caller graph for this function: