LAPACK  3.6.1
LAPACK: Linear Algebra PACKage
subroutine zgejsv ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
character*1  JOBR,
character*1  JOBT,
character*1  JOBP,
integer  M,
integer  N,
complex*16, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( n )  SVA,
complex*16, dimension( ldu, * )  U,
integer  LDU,
complex*16, dimension( ldv, * )  V,
integer  LDV,
complex*16, dimension( lwork )  CWORK,
integer  LWORK,
double precision, dimension( * )  RWORK,
integer  LRWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

ZGEJSV

Download ZGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 ZGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^*,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
 [V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.

Arguments:

Parameters
[in]JOBA
          JOBA is CHARACTER*1
         Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
              with well-conditioned B and arbitrary diagonal matrix D.
              The accuracy cannot be spoiled by COLUMN scaling. The
              accuracy of the computed output depends on the condition of
              B, and the procedure aims at the best theoretical accuracy.
              The relative error max_{i=1:N}|d sigma_i| / sigma_i is
              bounded by f(M,N)*epsilon* cond(B), independent of D.
              The input matrix is preprocessed with the QRF with column
              pivoting. This initial preprocessing and preconditioning by
              a rank revealing QR factorization is common for all values of
              JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
              condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
              D1, D2, and well-conditioned matrix C, this option gives
              higher accuracy than the 'C' option. If the structure of the
              input matrix is not known, and relative accuracy is
              desirable, then this option is advisable. The input matrix A
              is preprocessed with QR factorization with FULL (row and
              column) pivoting.
       = 'G'  Computation as with 'F' with an additional estimate of the
              condition number of B, where A=D*B. If A has heavily weighted
              rows, then using this condition number gives too pessimistic
              error bound.
       = 'A': Small singular values are the noise and the matrix is treated
              as numerically rank defficient. The error in the computed
              singular values is bounded by f(m,n)*epsilon*||A||.
              The computed SVD A = U * S * V^* restores A up to
              f(m,n)*epsilon*||A||.
              This gives the procedure the licence to discard (set to zero)
              all singular values below N*epsilon*||A||.
       = 'R': Similar as in 'A'. Rank revealing property of the initial
              QR factorization is used do reveal (using triangular factor)
              a gap sigma_{r+1} < epsilon * sigma_r in which case the
              numerical RANK is declared to be r. The SVD is computed with
              absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
         Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
              of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
         Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
              are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
              computed as the product of Jacobi rotations. This option is
              allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
       = 'W': V may be used as workspace of length N*N. See the description
              of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
         Specifies the RANGE for the singular values. Issues the licence to
         set to zero small positive singular values if they are outside
         specified range. If A .NE. 0 is scaled so that the largest singular
         value of c*A is around SQRT(BIG), BIG=DLAMCH('O'), then JOBR issues
         the licence to kill columns of A whose norm in c*A is less than
         SQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
         where SFMIN=DLAMCH('S'), EPSLN=DLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
              BLAS and QR factorizations and triangular solvers are
              implemented to work in that range. If the condition of A
              is greater than BIG, use ZGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
              (roughly, as described above). This option is recommended.
                                             ===========================
         For computing the singular values in the FULL range [SFMIN,BIG]
         use ZGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
         If the matrix is square then the procedure may determine to use
         transposed A if A^* seems to be better with respect to convergence.
         If the matrix is not square, JOBT is ignored. This is subject to
         changes in the future.
         The decision is based on two values of entropy over the adjoint
         orbit of A^* * A. See the descriptions of WORK(6) and WORK(7).
       = 'T': transpose if entropy test indicates possibly faster
         convergence of Jacobi process if A^* is taken as input. If A is
         replaced with A^*, then the row pivoting is included automatically.
       = 'N': do not speculate.
         This option can be used to compute only the singular values, or the
         full SVD (U, SIGMA and V). For only one set of singular vectors
         (U or V), the caller should provide both U and V, as one of the
         matrices is used as workspace if the matrix A is transposed.
         The implementer can easily remove this constraint and make the
         code more complicated. See the descriptions of U and V.
[in]JOBP
          JOBP is CHARACTER*1
         Issues the licence to introduce structured perturbations to drown
         denormalized numbers. This licence should be active if the
         denormals are poorly implemented, causing slow computation,
         especially in cases of fast convergence (!). For details see [1,2].
         For the sake of simplicity, this perturbations are included only
         when the full SVD or only the singular values are requested. The
         implementer/user can easily add the perturbation for the cases of
         computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is COMPLEX*16 array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit,
          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
            computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For WORK(1) .NE. WORK(2): The singular values of A are
            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is COMPLEX*16 array, dimension ( LDU, N )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
                         then U is used as workspace if the procedure
                         replaces A with A^*. In that case, [V] is computed
                         in U as left singular vectors of A^* and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced, unless JOBT='T'.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is COMPLEX*16 array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
                         then V is used as workspace if the pprocedure
                         replaces A with A^*. In that case, [U] is computed
                         in V as right singular vectors of A^* and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced, unless JOBT='T'.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]CWORK
          CWORK is COMPLEX*16 array, dimension at least LWORK.     
[in]LWORK
          LWORK is INTEGER
          Length of CWORK to confirm proper allocation of workspace.
          LWORK depends on the job:

          1. If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
            1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'):
               LWORK >= 2*N+1. This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= N + (N+1)*NB. Here NB is the optimal
               block size for ZGEQP3 and ZGEQRF.
               In general, optimal LWORK is computed as 
               LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF)).        
            1.2. .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G'). In this case, LWORK the minimal
               requirement is LWORK >= N*N + 3*N.
               ->> For optimal performance (blocked code) the optimal value 
               is LWORK >= max(N+(N+1)*NB, N*N+3*N).
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), 
                                                     N+N*N+LWORK(ZPOCON)).

          2. If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
             (JOBU.EQ.'N')
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance, LWORK >= max(N+(N+1)*NB, 3*N,2*N+N*NB),
               where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQF,
               ZUNMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3), N+LWORK(ZPOCON), N+LWORK(ZGESVJ),
                       N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)).

          3. If SIGMA and the left singular vectors are needed
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB),
               where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZPOCON),
                        2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)). 
               
          4. If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and 
            4.1. if JOBV.EQ.'V'  
               the minimal requirement is LWORK >= 5*N+2*N*N. 
            4.2. if JOBV.EQ.'J' the minimal requirement is 
               LWORK >= 4*N+N*N.
            In both cases, the allocated CWORK can accommodate blocked runs
            of ZGEQP3, ZGEQRF, ZGELQF, ZUNMQR, ZUNMLQ.
[out]RWORK
          RWORK is DOUBLE PRECISION array, dimension at least LRWORK.
          On exit,
          RWORK(1) = Determines the scaling factor SCALE = RWORK(2) / RWORK(1)
                    such that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          RWORK(2) = See the description of RWORK(1).
          RWORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
                    SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
                    It is computed using SPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provied for
          a developer/implementer who is familiar with the details of
          the method.

          RWORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          RWORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT .EQ. 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.
          RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy
                    of diag(A^* * A) / Trace(A^* * A) taken as point in the
                    probability simplex.
          RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).)
[in]LRWORK
          LRWORK is INTEGER
          Length of RWORK to confirm proper allocation of workspace.
          LRWORK depends on the job:

       1. If only singular values are requested i.e. if 
          LSAME(JOBU,'N') .AND. LSAME(JOBV,'N') 
          then:
          1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, N + 2 * M ). 
          1.2. Otherwise, LRWORK  = max( 7, 2 * N ).
       2. If singular values with the right singular vectors are requested
          i.e. if 
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND. 
          .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F'))
          then:
          2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, N + 2 * M ). 
          2.2. Otherwise, LRWORK  = max( 7, 2 * N ).      
       3. If singular values with the left singular vectors are requested, i.e. if    
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, N + 2 * M ). 
          3.2. Otherwise, LRWORK  = max( 7, 2 * N ).    
       4. If singular values with both the left and the right singular vectors 
          are requested, i.e. if     
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, N + 2 * M ). 
          4.2. Otherwise, LRWORK  = max( 7, 2 * N ).    
[out]IWORK
          IWORK is INTEGER array, of dimension:
                If LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then 
                the dimension of IWORK is max( 3, 2 * N + M ).
                Otherwise, the dimension of IWORK is 
                -> max( 3, 2*N ) for full SVD
                -> max( 3, N ) for singular values only or singular
                   values with one set of singular vectors (left or right)
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3).EQ.1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
[out]INFO
          INFO is INTEGER
           < 0  : if INFO = -i, then the i-th argument had an illegal value.
           = 0 :  successfull exit;
           > 0 :  ZGEJSV  did not converge in the maximal allowed number
                  of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
June 2016
Further Details:
  ZGEJSV implements a preconditioned Jacobi SVD algorithm. It uses ZGEQP3,
  ZGEQRF, and ZGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
  diagonal matrices and C is well-conditioned matrix. In that case, complete
  pivoting in the first QR factorizations provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by ZGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (ZGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (ZGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: ZGEQP3) should be
  implemented as in [3]. We have a new version of ZGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank defficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the inital QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in ZGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of ZGEJSV uses only the simplest, naive data movement.  \par Contributors: 
  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)  \par References: 
@verbatim 

 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 519 of file zgejsv.f.

519 *
520 * -- LAPACK computational routine (version 3.6.1) --
521 * -- LAPACK is a software package provided by Univ. of Tennessee, --
522 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
523 * June 2016
524 *
525 * .. Scalar Arguments ..
526  IMPLICIT NONE
527  INTEGER info, lda, ldu, ldv, lwork, lrwork, m, n
528 * ..
529 * .. Array Arguments ..
530  COMPLEX*16 a( lda, * ), u( ldu, * ), v( ldv, * ),
531  $ cwork( lwork )
532  DOUBLE PRECISION sva( n ), rwork( * )
533  INTEGER iwork( * )
534  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
535 * ..
536 *
537 * ===========================================================================
538 *
539 * .. Local Parameters ..
540  DOUBLE PRECISION zero, one
541  parameter( zero = 0.0d0, one = 1.0d0 )
542  COMPLEX*16 czero, cone
543  parameter( czero = ( 0.0d0, 0.0d0 ), cone = ( 1.0d0, 0.0d0 ) )
544 * ..
545 * .. Local Scalars ..
546  COMPLEX*16 ctemp
547  DOUBLE PRECISION aapp, aaqq, aatmax, aatmin, big, big1,
548  $ cond_ok, condr1, condr2, entra, entrat, epsln,
549  $ maxprj, scalem, sconda, sfmin, small, temp1,
550  $ uscal1, uscal2, xsc
551  INTEGER ierr, n1, nr, numrank, p, q, warning
552  LOGICAL almort, defr, errest, goscal, jracc, kill, lsvec,
553  $ l2aber, l2kill, l2pert, l2rank, l2tran,
554  $ noscal, rowpiv, rsvec, transp
555 * ..
556 * .. Intrinsic Functions ..
557  INTRINSIC abs, dcmplx, dconjg, dlog, dmax1, dmin1, dble,
558  $ max0, min0, nint, dsqrt
559 * ..
560 * .. External Functions ..
561  DOUBLE PRECISION dlamch, dznrm2
562  INTEGER idamax, izamax
563  LOGICAL lsame
564  EXTERNAL idamax, izamax, lsame, dlamch, dznrm2
565 * ..
566 * .. External Subroutines ..
567  EXTERNAL dlassq, zcopy, zgelqf, zgeqp3, zgeqrf, zlacpy, zlascl,
570 *
571  EXTERNAL zgesvj
572 * ..
573 *
574 * Test the input arguments
575 *
576 
577  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
578  jracc = lsame( jobv, 'J' )
579  rsvec = lsame( jobv, 'V' ) .OR. jracc
580  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
581  l2rank = lsame( joba, 'R' )
582  l2aber = lsame( joba, 'A' )
583  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
584  l2tran = lsame( jobt, 'T' )
585  l2kill = lsame( jobr, 'R' )
586  defr = lsame( jobr, 'N' )
587  l2pert = lsame( jobp, 'P' )
588 *
589  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
590  $ errest .OR. lsame( joba, 'C' ) )) THEN
591  info = - 1
592  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
593  $ lsame( jobu, 'W' )) ) THEN
594  info = - 2
595  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
596  $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
597  info = - 3
598  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
599  info = - 4
600  ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
601  info = - 5
602  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
603  info = - 6
604  ELSE IF ( m .LT. 0 ) THEN
605  info = - 7
606  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
607  info = - 8
608  ELSE IF ( lda .LT. m ) THEN
609  info = - 10
610  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
611  info = - 13
612  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
613  info = - 15
614  ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
615  $ (lwork .LT. 2*n+1)) .OR.
616  $ (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
617  $ (lwork .LT. n*n+3*n)) .OR.
618  $ (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. 3*n))
619  $ .OR.
620  $ (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. 3*n))
621  $ .OR.
622  $ (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
623  $ (lwork.LT.5*n+2*n*n))
624  $ .OR. (lsvec .AND. rsvec .AND. jracc .AND.
625  $ lwork.LT.4*n+n*n))
626  $ THEN
627  info = - 17
628  ELSE IF ( lrwork.LT. max0(n+2*m,7)) THEN
629  info = -19
630  ELSE
631 * #:)
632  info = 0
633  END IF
634 *
635  IF ( info .NE. 0 ) THEN
636 * #:(
637  CALL xerbla( 'ZGEJSV', - info )
638  RETURN
639  END IF
640 *
641 * Quick return for void matrix (Y3K safe)
642 * #:)
643  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
644  iwork(1:3) = 0
645  rwork(1:7) = 0
646  RETURN
647  ENDIF
648 *
649 * Determine whether the matrix U should be M x N or M x M
650 *
651  IF ( lsvec ) THEN
652  n1 = n
653  IF ( lsame( jobu, 'F' ) ) n1 = m
654  END IF
655 *
656 * Set numerical parameters
657 *
658 *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
659 *
660  epsln = dlamch('Epsilon')
661  sfmin = dlamch('SafeMinimum')
662  small = sfmin / epsln
663  big = dlamch('O')
664 * BIG = ONE / SFMIN
665 *
666 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
667 *
668 *(!) If necessary, scale SVA() to protect the largest norm from
669 * overflow. It is possible that this scaling pushes the smallest
670 * column norm left from the underflow threshold (extreme case).
671 *
672  scalem = one / dsqrt(dble(m)*dble(n))
673  noscal = .true.
674  goscal = .true.
675  DO 1874 p = 1, n
676  aapp = zero
677  aaqq = one
678  CALL zlassq( m, a(1,p), 1, aapp, aaqq )
679  IF ( aapp .GT. big ) THEN
680  info = - 9
681  CALL xerbla( 'ZGEJSV', -info )
682  RETURN
683  END IF
684  aaqq = dsqrt(aaqq)
685  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
686  sva(p) = aapp * aaqq
687  ELSE
688  noscal = .false.
689  sva(p) = aapp * ( aaqq * scalem )
690  IF ( goscal ) THEN
691  goscal = .false.
692  CALL dscal( p-1, scalem, sva, 1 )
693  END IF
694  END IF
695  1874 CONTINUE
696 *
697  IF ( noscal ) scalem = one
698 *
699  aapp = zero
700  aaqq = big
701  DO 4781 p = 1, n
702  aapp = dmax1( aapp, sva(p) )
703  IF ( sva(p) .NE. zero ) aaqq = dmin1( aaqq, sva(p) )
704  4781 CONTINUE
705 *
706 * Quick return for zero M x N matrix
707 * #:)
708  IF ( aapp .EQ. zero ) THEN
709  IF ( lsvec ) CALL zlaset( 'G', m, n1, czero, cone, u, ldu )
710  IF ( rsvec ) CALL zlaset( 'G', n, n, czero, cone, v, ldv )
711  rwork(1) = one
712  rwork(2) = one
713  IF ( errest ) rwork(3) = one
714  IF ( lsvec .AND. rsvec ) THEN
715  rwork(4) = one
716  rwork(5) = one
717  END IF
718  IF ( l2tran ) THEN
719  rwork(6) = zero
720  rwork(7) = zero
721  END IF
722  iwork(1) = 0
723  iwork(2) = 0
724  iwork(3) = 0
725  RETURN
726  END IF
727 *
728 * Issue warning if denormalized column norms detected. Override the
729 * high relative accuracy request. Issue licence to kill columns
730 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
731 * #:(
732  warning = 0
733  IF ( aaqq .LE. sfmin ) THEN
734  l2rank = .true.
735  l2kill = .true.
736  warning = 1
737  END IF
738 *
739 * Quick return for one-column matrix
740 * #:)
741  IF ( n .EQ. 1 ) THEN
742 *
743  IF ( lsvec ) THEN
744  CALL zlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
745  CALL zlacpy( 'A', m, 1, a, lda, u, ldu )
746 * computing all M left singular vectors of the M x 1 matrix
747  IF ( n1 .NE. n ) THEN
748  CALL zgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
749  CALL zungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
750  CALL zcopy( m, a(1,1), 1, u(1,1), 1 )
751  END IF
752  END IF
753  IF ( rsvec ) THEN
754  v(1,1) = cone
755  END IF
756  IF ( sva(1) .LT. (big*scalem) ) THEN
757  sva(1) = sva(1) / scalem
758  scalem = one
759  END IF
760  rwork(1) = one / scalem
761  rwork(2) = one
762  IF ( sva(1) .NE. zero ) THEN
763  iwork(1) = 1
764  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
765  iwork(2) = 1
766  ELSE
767  iwork(2) = 0
768  END IF
769  ELSE
770  iwork(1) = 0
771  iwork(2) = 0
772  END IF
773  iwork(3) = 0
774  IF ( errest ) rwork(3) = one
775  IF ( lsvec .AND. rsvec ) THEN
776  rwork(4) = one
777  rwork(5) = one
778  END IF
779  IF ( l2tran ) THEN
780  rwork(6) = zero
781  rwork(7) = zero
782  END IF
783  RETURN
784 *
785  END IF
786 *
787  transp = .false.
788  l2tran = l2tran .AND. ( m .EQ. n )
789 *
790  aatmax = -one
791  aatmin = big
792  IF ( rowpiv .OR. l2tran ) THEN
793 *
794 * Compute the row norms, needed to determine row pivoting sequence
795 * (in the case of heavily row weighted A, row pivoting is strongly
796 * advised) and to collect information needed to compare the
797 * structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
798 *
799  IF ( l2tran ) THEN
800  DO 1950 p = 1, m
801  xsc = zero
802  temp1 = one
803  CALL zlassq( n, a(p,1), lda, xsc, temp1 )
804 * ZLASSQ gets both the ell_2 and the ell_infinity norm
805 * in one pass through the vector
806  rwork(m+n+p) = xsc * scalem
807  rwork(n+p) = xsc * (scalem*dsqrt(temp1))
808  aatmax = dmax1( aatmax, rwork(n+p) )
809  IF (rwork(n+p) .NE. zero)
810  $ aatmin = dmin1(aatmin,rwork(n+p))
811  1950 CONTINUE
812  ELSE
813  DO 1904 p = 1, m
814  rwork(m+n+p) = scalem*abs( a(p,izamax(n,a(p,1),lda)) )
815  aatmax = dmax1( aatmax, rwork(m+n+p) )
816  aatmin = dmin1( aatmin, rwork(m+n+p) )
817  1904 CONTINUE
818  END IF
819 *
820  END IF
821 *
822 * For square matrix A try to determine whether A^* would be better
823 * input for the preconditioned Jacobi SVD, with faster convergence.
824 * The decision is based on an O(N) function of the vector of column
825 * and row norms of A, based on the Shannon entropy. This should give
826 * the right choice in most cases when the difference actually matters.
827 * It may fail and pick the slower converging side.
828 *
829  entra = zero
830  entrat = zero
831  IF ( l2tran ) THEN
832 *
833  xsc = zero
834  temp1 = one
835  CALL dlassq( n, sva, 1, xsc, temp1 )
836  temp1 = one / temp1
837 *
838  entra = zero
839  DO 1113 p = 1, n
840  big1 = ( ( sva(p) / xsc )**2 ) * temp1
841  IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
842  1113 CONTINUE
843  entra = - entra / dlog(dble(n))
844 *
845 * Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
846 * It is derived from the diagonal of A^* * A. Do the same with the
847 * diagonal of A * A^*, compute the entropy of the corresponding
848 * probability distribution. Note that A * A^* and A^* * A have the
849 * same trace.
850 *
851  entrat = zero
852  DO 1114 p = n+1, n+m
853  big1 = ( ( rwork(p) / xsc )**2 ) * temp1
854  IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
855  1114 CONTINUE
856  entrat = - entrat / dlog(dble(m))
857 *
858 * Analyze the entropies and decide A or A^*. Smaller entropy
859 * usually means better input for the algorithm.
860 *
861  transp = ( entrat .LT. entra )
862  transp = .true.
863 *
864 * If A^* is better than A, take the adjoint of A.
865 *
866  IF ( transp ) THEN
867 * In an optimal implementation, this trivial transpose
868 * should be replaced with faster transpose.
869  DO 1115 p = 1, n - 1
870  a(p,p) = dconjg(a(p,p))
871  DO 1116 q = p + 1, n
872  ctemp = dconjg(a(q,p))
873  a(q,p) = dconjg(a(p,q))
874  a(p,q) = ctemp
875  1116 CONTINUE
876  1115 CONTINUE
877  a(n,n) = dconjg(a(n,n))
878  DO 1117 p = 1, n
879  rwork(m+n+p) = sva(p)
880  sva(p) = rwork(n+p)
881 * previously computed row 2-norms are now column 2-norms
882 * of the transposed matrix
883  1117 CONTINUE
884  temp1 = aapp
885  aapp = aatmax
886  aatmax = temp1
887  temp1 = aaqq
888  aaqq = aatmin
889  aatmin = temp1
890  kill = lsvec
891  lsvec = rsvec
892  rsvec = kill
893  IF ( lsvec ) n1 = n
894 *
895  rowpiv = .true.
896  END IF
897 *
898  END IF
899 * END IF L2TRAN
900 *
901 * Scale the matrix so that its maximal singular value remains less
902 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
903 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
904 * SQRT(BIG) instead of BIG is the fact that ZGEJSV uses LAPACK and
905 * BLAS routines that, in some implementations, are not capable of
906 * working in the full interval [SFMIN,BIG] and that they may provoke
907 * overflows in the intermediate results. If the singular values spread
908 * from SFMIN to BIG, then ZGESVJ will compute them. So, in that case,
909 * one should use ZGESVJ instead of ZGEJSV.
910 *
911  big1 = dsqrt( big )
912  temp1 = dsqrt( big / dble(n) )
913 *
914  CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
915  IF ( aaqq .GT. (aapp * sfmin) ) THEN
916  aaqq = ( aaqq / aapp ) * temp1
917  ELSE
918  aaqq = ( aaqq * temp1 ) / aapp
919  END IF
920  temp1 = temp1 * scalem
921  CALL zlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
922 *
923 * To undo scaling at the end of this procedure, multiply the
924 * computed singular values with USCAL2 / USCAL1.
925 *
926  uscal1 = temp1
927  uscal2 = aapp
928 *
929  IF ( l2kill ) THEN
930 * L2KILL enforces computation of nonzero singular values in
931 * the restricted range of condition number of the initial A,
932 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
933  xsc = dsqrt( sfmin )
934  ELSE
935  xsc = small
936 *
937 * Now, if the condition number of A is too big,
938 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
939 * as a precaution measure, the full SVD is computed using ZGESVJ
940 * with accumulated Jacobi rotations. This provides numerically
941 * more robust computation, at the cost of slightly increased run
942 * time. Depending on the concrete implementation of BLAS and LAPACK
943 * (i.e. how they behave in presence of extreme ill-conditioning) the
944 * implementor may decide to remove this switch.
945  IF ( ( aaqq.LT.dsqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
946  jracc = .true.
947  END IF
948 *
949  END IF
950  IF ( aaqq .LT. xsc ) THEN
951  DO 700 p = 1, n
952  IF ( sva(p) .LT. xsc ) THEN
953  CALL zlaset( 'A', m, 1, czero, czero, a(1,p), lda )
954  sva(p) = zero
955  END IF
956  700 CONTINUE
957  END IF
958 *
959 * Preconditioning using QR factorization with pivoting
960 *
961  IF ( rowpiv ) THEN
962 * Optional row permutation (Bjoerck row pivoting):
963 * A result by Cox and Higham shows that the Bjoerck's
964 * row pivoting combined with standard column pivoting
965 * has similar effect as Powell-Reid complete pivoting.
966 * The ell-infinity norms of A are made nonincreasing.
967  DO 1952 p = 1, m - 1
968  q = idamax( m-p+1, rwork(m+n+p), 1 ) + p - 1
969  iwork(2*n+p) = q
970  IF ( p .NE. q ) THEN
971  temp1 = rwork(m+n+p)
972  rwork(m+n+p) = rwork(m+n+q)
973  rwork(m+n+q) = temp1
974  END IF
975  1952 CONTINUE
976  CALL zlaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
977  END IF
978 
979 *
980 * End of the preparation phase (scaling, optional sorting and
981 * transposing, optional flushing of small columns).
982 *
983 * Preconditioning
984 *
985 * If the full SVD is needed, the right singular vectors are computed
986 * from a matrix equation, and for that we need theoretical analysis
987 * of the Businger-Golub pivoting. So we use ZGEQP3 as the first RR QRF.
988 * In all other cases the first RR QRF can be chosen by other criteria
989 * (eg speed by replacing global with restricted window pivoting, such
990 * as in xGEQPX from TOMS # 782). Good results will be obtained using
991 * xGEQPX with properly (!) chosen numerical parameters.
992 * Any improvement of ZGEQP3 improves overal performance of ZGEJSV.
993 *
994 * A * P1 = Q1 * [ R1^* 0]^*:
995  DO 1963 p = 1, n
996 * .. all columns are free columns
997  iwork(p) = 0
998  1963 CONTINUE
999  CALL zgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
1000  $ rwork, ierr )
1001 *
1002 * The upper triangular matrix R1 from the first QRF is inspected for
1003 * rank deficiency and possibilities for deflation, or possible
1004 * ill-conditioning. Depending on the user specified flag L2RANK,
1005 * the procedure explores possibilities to reduce the numerical
1006 * rank by inspecting the computed upper triangular factor. If
1007 * L2RANK or L2ABER are up, then ZGEJSV will compute the SVD of
1008 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
1009 *
1010  nr = 1
1011  IF ( l2aber ) THEN
1012 * Standard absolute error bound suffices. All sigma_i with
1013 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1014 * agressive enforcement of lower numerical rank by introducing a
1015 * backward error of the order of N*EPSLN*||A||.
1016  temp1 = dsqrt(dble(n))*epsln
1017  DO 3001 p = 2, n
1018  IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
1019  nr = nr + 1
1020  ELSE
1021  GO TO 3002
1022  END IF
1023  3001 CONTINUE
1024  3002 CONTINUE
1025  ELSE IF ( l2rank ) THEN
1026 * .. similarly as above, only slightly more gentle (less agressive).
1027 * Sudden drop on the diagonal of R1 is used as the criterion for
1028 * close-to-rank-defficient.
1029  temp1 = dsqrt(sfmin)
1030  DO 3401 p = 2, n
1031  IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1032  $ ( abs(a(p,p)) .LT. small ) .OR.
1033  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1034  nr = nr + 1
1035  3401 CONTINUE
1036  3402 CONTINUE
1037 *
1038  ELSE
1039 * The goal is high relative accuracy. However, if the matrix
1040 * has high scaled condition number the relative accuracy is in
1041 * general not feasible. Later on, a condition number estimator
1042 * will be deployed to estimate the scaled condition number.
1043 * Here we just remove the underflowed part of the triangular
1044 * factor. This prevents the situation in which the code is
1045 * working hard to get the accuracy not warranted by the data.
1046  temp1 = dsqrt(sfmin)
1047  DO 3301 p = 2, n
1048  IF ( ( abs(a(p,p)) .LT. small ) .OR.
1049  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1050  nr = nr + 1
1051  3301 CONTINUE
1052  3302 CONTINUE
1053 *
1054  END IF
1055 *
1056  almort = .false.
1057  IF ( nr .EQ. n ) THEN
1058  maxprj = one
1059  DO 3051 p = 2, n
1060  temp1 = abs(a(p,p)) / sva(iwork(p))
1061  maxprj = dmin1( maxprj, temp1 )
1062  3051 CONTINUE
1063  IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1064  END IF
1065 *
1066 *
1067  sconda = - one
1068  condr1 = - one
1069  condr2 = - one
1070 *
1071  IF ( errest ) THEN
1072  IF ( n .EQ. nr ) THEN
1073  IF ( rsvec ) THEN
1074 * .. V is available as workspace
1075  CALL zlacpy( 'U', n, n, a, lda, v, ldv )
1076  DO 3053 p = 1, n
1077  temp1 = sva(iwork(p))
1078  CALL zdscal( p, one/temp1, v(1,p), 1 )
1079  3053 CONTINUE
1080  CALL zpocon( 'U', n, v, ldv, one, temp1,
1081  $ cwork(n+1), rwork, ierr )
1082 *
1083  ELSE IF ( lsvec ) THEN
1084 * .. U is available as workspace
1085  CALL zlacpy( 'U', n, n, a, lda, u, ldu )
1086  DO 3054 p = 1, n
1087  temp1 = sva(iwork(p))
1088  CALL zdscal( p, one/temp1, u(1,p), 1 )
1089  3054 CONTINUE
1090  CALL zpocon( 'U', n, u, ldu, one, temp1,
1091  $ cwork(n+1), rwork, ierr )
1092  ELSE
1093  CALL zlacpy( 'U', n, n, a, lda, cwork(n+1), n )
1094  DO 3052 p = 1, n
1095  temp1 = sva(iwork(p))
1096  CALL zdscal( p, one/temp1, cwork(n+(p-1)*n+1), 1 )
1097  3052 CONTINUE
1098 * .. the columns of R are scaled to have unit Euclidean lengths.
1099  CALL zpocon( 'U', n, cwork(n+1), n, one, temp1,
1100  $ cwork(n+n*n+1), rwork, ierr )
1101 *
1102  END IF
1103  sconda = one / dsqrt(temp1)
1104 * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1105 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1106  ELSE
1107  sconda = - one
1108  END IF
1109  END IF
1110 *
1111  l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. dsqrt(big1) )
1112 * If there is no violent scaling, artificial perturbation is not needed.
1113 *
1114 * Phase 3:
1115 *
1116  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1117 *
1118 * Singular Values only
1119 *
1120 * .. transpose A(1:NR,1:N)
1121  DO 1946 p = 1, min0( n-1, nr )
1122  CALL zcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1123  CALL zlacgv( n-p+1, a(p,p), 1 )
1124  1946 CONTINUE
1125  IF ( nr .EQ. n ) a(n,n) = dconjg(a(n,n))
1126 *
1127 * The following two DO-loops introduce small relative perturbation
1128 * into the strict upper triangle of the lower triangular matrix.
1129 * Small entries below the main diagonal are also changed.
1130 * This modification is useful if the computing environment does not
1131 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1132 * annoying denormalized numbers in case of strongly scaled matrices.
1133 * The perturbation is structured so that it does not introduce any
1134 * new perturbation of the singular values, and it does not destroy
1135 * the job done by the preconditioner.
1136 * The licence for this perturbation is in the variable L2PERT, which
1137 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1138 *
1139  IF ( .NOT. almort ) THEN
1140 *
1141  IF ( l2pert ) THEN
1142 * XSC = SQRT(SMALL)
1143  xsc = epsln / dble(n)
1144  DO 4947 q = 1, nr
1145  ctemp = dcmplx(xsc*abs(a(q,q)),zero)
1146  DO 4949 p = 1, n
1147  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1148  $ .OR. ( p .LT. q ) )
1149 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1150  $ a(p,q) = ctemp
1151  4949 CONTINUE
1152  4947 CONTINUE
1153  ELSE
1154  CALL zlaset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1155  END IF
1156 *
1157 * .. second preconditioning using the QR factorization
1158 *
1159  CALL zgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1160 *
1161 * .. and transpose upper to lower triangular
1162  DO 1948 p = 1, nr - 1
1163  CALL zcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1164  CALL zlacgv( nr-p+1, a(p,p), 1 )
1165  1948 CONTINUE
1166 *
1167  END IF
1168 *
1169 * Row-cyclic Jacobi SVD algorithm with column pivoting
1170 *
1171 * .. again some perturbation (a "background noise") is added
1172 * to drown denormals
1173  IF ( l2pert ) THEN
1174 * XSC = SQRT(SMALL)
1175  xsc = epsln / dble(n)
1176  DO 1947 q = 1, nr
1177  ctemp = dcmplx(xsc*abs(a(q,q)),zero)
1178  DO 1949 p = 1, nr
1179  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1180  $ .OR. ( p .LT. q ) )
1181 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1182  $ a(p,q) = ctemp
1183  1949 CONTINUE
1184  1947 CONTINUE
1185  ELSE
1186  CALL zlaset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1187  END IF
1188 *
1189 * .. and one-sided Jacobi rotations are started on a lower
1190 * triangular matrix (plus perturbation which is ignored in
1191 * the part which destroys triangular form (confusing?!))
1192 *
1193  CALL zgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1194  $ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1195 *
1196  scalem = rwork(1)
1197  numrank = nint(rwork(2))
1198 *
1199 *
1200  ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1201 *
1202 * -> Singular Values and Right Singular Vectors <-
1203 *
1204  IF ( almort ) THEN
1205 *
1206 * .. in this case NR equals N
1207  DO 1998 p = 1, nr
1208  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1209  CALL zlacgv( n-p+1, v(p,p), 1 )
1210  1998 CONTINUE
1211  CALL zlaset( 'Upper', nr-1,nr-1, czero, czero, v(1,2), ldv )
1212 *
1213  CALL zgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1214  $ cwork, lwork, rwork, lrwork, info )
1215  scalem = rwork(1)
1216  numrank = nint(rwork(2))
1217 
1218  ELSE
1219 *
1220 * .. two more QR factorizations ( one QRF is not enough, two require
1221 * accumulated product of Jacobi rotations, three are perfect )
1222 *
1223  CALL zlaset( 'Lower', nr-1,nr-1, czero, czero, a(2,1), lda )
1224  CALL zgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1225  CALL zlacpy( 'Lower', nr, nr, a, lda, v, ldv )
1226  CALL zlaset( 'Upper', nr-1,nr-1, czero, czero, v(1,2), ldv )
1227  CALL zgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1228  $ lwork-2*n, ierr )
1229  DO 8998 p = 1, nr
1230  CALL zcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1231  CALL zlacgv( nr-p+1, v(p,p), 1 )
1232  8998 CONTINUE
1233  CALL zlaset('Upper', nr-1, nr-1, czero, czero, v(1,2), ldv)
1234 *
1235  CALL zgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1236  $ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1237  scalem = rwork(1)
1238  numrank = nint(rwork(2))
1239  IF ( nr .LT. n ) THEN
1240  CALL zlaset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1241  CALL zlaset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1242  CALL zlaset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1243  END IF
1244 *
1245  CALL zunmlq( 'Left', 'C', n, n, nr, a, lda, cwork,
1246  $ v, ldv, cwork(n+1), lwork-n, ierr )
1247 *
1248  END IF
1249 *
1250  DO 8991 p = 1, n
1251  CALL zcopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1252  8991 CONTINUE
1253  CALL zlacpy( 'All', n, n, a, lda, v, ldv )
1254 *
1255  IF ( transp ) THEN
1256  CALL zlacpy( 'All', n, n, v, ldv, u, ldu )
1257  END IF
1258 *
1259  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1260 *
1261 * .. Singular Values and Left Singular Vectors ..
1262 *
1263 * .. second preconditioning step to avoid need to accumulate
1264 * Jacobi rotations in the Jacobi iterations.
1265  DO 1965 p = 1, nr
1266  CALL zcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1267  CALL zlacgv( n-p+1, u(p,p), 1 )
1268  1965 CONTINUE
1269  CALL zlaset( 'Upper', nr-1, nr-1, czero, czero, u(1,2), ldu )
1270 *
1271  CALL zgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1272  $ lwork-2*n, ierr )
1273 *
1274  DO 1967 p = 1, nr - 1
1275  CALL zcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1276  CALL zlacgv( n-p+1, u(p,p), 1 )
1277  1967 CONTINUE
1278  CALL zlaset( 'Upper', nr-1, nr-1, czero, czero, u(1,2), ldu )
1279 *
1280  CALL zgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1281  $ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1282  scalem = rwork(1)
1283  numrank = nint(rwork(2))
1284 *
1285  IF ( nr .LT. m ) THEN
1286  CALL zlaset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1287  IF ( nr .LT. n1 ) THEN
1288  CALL zlaset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1289  CALL zlaset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1290  END IF
1291  END IF
1292 *
1293  CALL zunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1294  $ ldu, cwork(n+1), lwork-n, ierr )
1295 *
1296  IF ( rowpiv )
1297  $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1298 *
1299  DO 1974 p = 1, n1
1300  xsc = one / dznrm2( m, u(1,p), 1 )
1301  CALL zdscal( m, xsc, u(1,p), 1 )
1302  1974 CONTINUE
1303 *
1304  IF ( transp ) THEN
1305  CALL zlacpy( 'All', n, n, u, ldu, v, ldv )
1306  END IF
1307 *
1308  ELSE
1309 *
1310 * .. Full SVD ..
1311 *
1312  IF ( .NOT. jracc ) THEN
1313 *
1314  IF ( .NOT. almort ) THEN
1315 *
1316 * Second Preconditioning Step (QRF [with pivoting])
1317 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1318 * equivalent to an LQF CALL. Since in many libraries the QRF
1319 * seems to be better optimized than the LQF, we do explicit
1320 * transpose and use the QRF. This is subject to changes in an
1321 * optimized implementation of ZGEJSV.
1322 *
1323  DO 1968 p = 1, nr
1324  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1325  CALL zlacgv( n-p+1, v(p,p), 1 )
1326  1968 CONTINUE
1327 *
1328 * .. the following two loops perturb small entries to avoid
1329 * denormals in the second QR factorization, where they are
1330 * as good as zeros. This is done to avoid painfully slow
1331 * computation with denormals. The relative size of the perturbation
1332 * is a parameter that can be changed by the implementer.
1333 * This perturbation device will be obsolete on machines with
1334 * properly implemented arithmetic.
1335 * To switch it off, set L2PERT=.FALSE. To remove it from the
1336 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1337 * The following two loops should be blocked and fused with the
1338 * transposed copy above.
1339 *
1340  IF ( l2pert ) THEN
1341  xsc = dsqrt(small)
1342  DO 2969 q = 1, nr
1343  ctemp = dcmplx(xsc*abs( v(q,q) ),zero)
1344  DO 2968 p = 1, n
1345  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1346  $ .OR. ( p .LT. q ) )
1347 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1348  $ v(p,q) = ctemp
1349  IF ( p .LT. q ) v(p,q) = - v(p,q)
1350  2968 CONTINUE
1351  2969 CONTINUE
1352  ELSE
1353  CALL zlaset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1354  END IF
1355 *
1356 * Estimate the row scaled condition number of R1
1357 * (If R1 is rectangular, N > NR, then the condition number
1358 * of the leading NR x NR submatrix is estimated.)
1359 *
1360  CALL zlacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1361  DO 3950 p = 1, nr
1362  temp1 = dznrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1363  CALL zdscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1364  3950 CONTINUE
1365  CALL zpocon('Lower',nr,cwork(2*n+1),nr,one,temp1,
1366  $ cwork(2*n+nr*nr+1),rwork,ierr)
1367  condr1 = one / dsqrt(temp1)
1368 * .. here need a second oppinion on the condition number
1369 * .. then assume worst case scenario
1370 * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1371 * more conservative <=> CONDR1 .LT. SQRT(DBLE(N))
1372 *
1373  cond_ok = dsqrt(dsqrt(dble(nr)))
1374 *[TP] COND_OK is a tuning parameter.
1375 *
1376  IF ( condr1 .LT. cond_ok ) THEN
1377 * .. the second QRF without pivoting. Note: in an optimized
1378 * implementation, this QRF should be implemented as the QRF
1379 * of a lower triangular matrix.
1380 * R1^* = Q2 * R2
1381  CALL zgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1382  $ lwork-2*n, ierr )
1383 *
1384  IF ( l2pert ) THEN
1385  xsc = dsqrt(small)/epsln
1386  DO 3959 p = 2, nr
1387  DO 3958 q = 1, p - 1
1388  ctemp=dcmplx(xsc*dmin1(abs(v(p,p)),abs(v(q,q))),
1389  $ zero)
1390  IF ( abs(v(q,p)) .LE. temp1 )
1391 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1392  $ v(q,p) = ctemp
1393  3958 CONTINUE
1394  3959 CONTINUE
1395  END IF
1396 *
1397  IF ( nr .NE. n )
1398  $ CALL zlacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1399 * .. save ...
1400 *
1401 * .. this transposed copy should be better than naive
1402  DO 1969 p = 1, nr - 1
1403  CALL zcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1404  CALL zlacgv(nr-p+1, v(p,p), 1 )
1405  1969 CONTINUE
1406  v(nr,nr)=dconjg(v(nr,nr))
1407 *
1408  condr2 = condr1
1409 *
1410  ELSE
1411 *
1412 * .. ill-conditioned case: second QRF with pivoting
1413 * Note that windowed pivoting would be equaly good
1414 * numerically, and more run-time efficient. So, in
1415 * an optimal implementation, the next call to ZGEQP3
1416 * should be replaced with eg. CALL ZGEQPX (ACM TOMS #782)
1417 * with properly (carefully) chosen parameters.
1418 *
1419 * R1^* * P2 = Q2 * R2
1420  DO 3003 p = 1, nr
1421  iwork(n+p) = 0
1422  3003 CONTINUE
1423  CALL zgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1424  $ cwork(2*n+1), lwork-2*n, rwork, ierr )
1425 ** CALL ZGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1426 ** $ LWORK-2*N, IERR )
1427  IF ( l2pert ) THEN
1428  xsc = dsqrt(small)
1429  DO 3969 p = 2, nr
1430  DO 3968 q = 1, p - 1
1431  ctemp=dcmplx(xsc*dmin1(abs(v(p,p)),abs(v(q,q))),
1432  $ zero)
1433  IF ( abs(v(q,p)) .LE. temp1 )
1434 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1435  $ v(q,p) = ctemp
1436  3968 CONTINUE
1437  3969 CONTINUE
1438  END IF
1439 *
1440  CALL zlacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1441 *
1442  IF ( l2pert ) THEN
1443  xsc = dsqrt(small)
1444  DO 8970 p = 2, nr
1445  DO 8971 q = 1, p - 1
1446  ctemp=dcmplx(xsc*dmin1(abs(v(p,p)),abs(v(q,q))),
1447  $ zero)
1448 * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1449  v(p,q) = - ctemp
1450  8971 CONTINUE
1451  8970 CONTINUE
1452  ELSE
1453  CALL zlaset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1454  END IF
1455 * Now, compute R2 = L3 * Q3, the LQ factorization.
1456  CALL zgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1457  $ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1458 * .. and estimate the condition number
1459  CALL zlacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1460  DO 4950 p = 1, nr
1461  temp1 = dznrm2( p, cwork(2*n+n*nr+nr+p), nr )
1462  CALL zdscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1463  4950 CONTINUE
1464  CALL zpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1465  $ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1466  condr2 = one / dsqrt(temp1)
1467 *
1468 *
1469  IF ( condr2 .GE. cond_ok ) THEN
1470 * .. save the Householder vectors used for Q3
1471 * (this overwrittes the copy of R2, as it will not be
1472 * needed in this branch, but it does not overwritte the
1473 * Huseholder vectors of Q2.).
1474  CALL zlacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1475 * .. and the rest of the information on Q3 is in
1476 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1477  END IF
1478 *
1479  END IF
1480 *
1481  IF ( l2pert ) THEN
1482  xsc = dsqrt(small)
1483  DO 4968 q = 2, nr
1484  ctemp = xsc * v(q,q)
1485  DO 4969 p = 1, q - 1
1486 * V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1487  v(p,q) = - ctemp
1488  4969 CONTINUE
1489  4968 CONTINUE
1490  ELSE
1491  CALL zlaset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1492  END IF
1493 *
1494 * Second preconditioning finished; continue with Jacobi SVD
1495 * The input matrix is lower trinagular.
1496 *
1497 * Recover the right singular vectors as solution of a well
1498 * conditioned triangular matrix equation.
1499 *
1500  IF ( condr1 .LT. cond_ok ) THEN
1501 *
1502  CALL zgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1503  $ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1504  $ lrwork, info )
1505  scalem = rwork(1)
1506  numrank = nint(rwork(2))
1507  DO 3970 p = 1, nr
1508  CALL zcopy( nr, v(1,p), 1, u(1,p), 1 )
1509  CALL zdscal( nr, sva(p), v(1,p), 1 )
1510  3970 CONTINUE
1511 
1512 * .. pick the right matrix equation and solve it
1513 *
1514  IF ( nr .EQ. n ) THEN
1515 * :)) .. best case, R1 is inverted. The solution of this matrix
1516 * equation is Q2*V2 = the product of the Jacobi rotations
1517 * used in ZGESVJ, premultiplied with the orthogonal matrix
1518 * from the second QR factorization.
1519  CALL ztrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1520  ELSE
1521 * .. R1 is well conditioned, but non-square. Adjoint of R2
1522 * is inverted to get the product of the Jacobi rotations
1523 * used in ZGESVJ. The Q-factor from the second QR
1524 * factorization is then built in explicitly.
1525  CALL ztrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1526  $ n,v,ldv)
1527  IF ( nr .LT. n ) THEN
1528  CALL zlaset('A',n-nr,nr,czero,czero,v(nr+1,1),ldv)
1529  CALL zlaset('A',nr,n-nr,czero,czero,v(1,nr+1),ldv)
1530  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1531  END IF
1532  CALL zunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1533  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1534  END IF
1535 *
1536  ELSE IF ( condr2 .LT. cond_ok ) THEN
1537 *
1538 * The matrix R2 is inverted. The solution of the matrix equation
1539 * is Q3^* * V3 = the product of the Jacobi rotations (appplied to
1540 * the lower triangular L3 from the LQ factorization of
1541 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1542  CALL zgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1543  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1544  $ rwork, lrwork, info )
1545  scalem = rwork(1)
1546  numrank = nint(rwork(2))
1547  DO 3870 p = 1, nr
1548  CALL zcopy( nr, v(1,p), 1, u(1,p), 1 )
1549  CALL zdscal( nr, sva(p), u(1,p), 1 )
1550  3870 CONTINUE
1551  CALL ztrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1552  $ u,ldu)
1553 * .. apply the permutation from the second QR factorization
1554  DO 873 q = 1, nr
1555  DO 872 p = 1, nr
1556  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1557  872 CONTINUE
1558  DO 874 p = 1, nr
1559  u(p,q) = cwork(2*n+n*nr+nr+p)
1560  874 CONTINUE
1561  873 CONTINUE
1562  IF ( nr .LT. n ) THEN
1563  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1564  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1565  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1566  END IF
1567  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1568  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1569  ELSE
1570 * Last line of defense.
1571 * #:( This is a rather pathological case: no scaled condition
1572 * improvement after two pivoted QR factorizations. Other
1573 * possibility is that the rank revealing QR factorization
1574 * or the condition estimator has failed, or the COND_OK
1575 * is set very close to ONE (which is unnecessary). Normally,
1576 * this branch should never be executed, but in rare cases of
1577 * failure of the RRQR or condition estimator, the last line of
1578 * defense ensures that ZGEJSV completes the task.
1579 * Compute the full SVD of L3 using ZGESVJ with explicit
1580 * accumulation of Jacobi rotations.
1581  CALL zgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1582  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1583  $ rwork, lrwork, info )
1584  scalem = rwork(1)
1585  numrank = nint(rwork(2))
1586  IF ( nr .LT. n ) THEN
1587  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1588  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1589  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1590  END IF
1591  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1592  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1593 *
1594  CALL zunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1595  $ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1596  $ lwork-2*n-n*nr-nr, ierr )
1597  DO 773 q = 1, nr
1598  DO 772 p = 1, nr
1599  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1600  772 CONTINUE
1601  DO 774 p = 1, nr
1602  u(p,q) = cwork(2*n+n*nr+nr+p)
1603  774 CONTINUE
1604  773 CONTINUE
1605 *
1606  END IF
1607 *
1608 * Permute the rows of V using the (column) permutation from the
1609 * first QRF. Also, scale the columns to make them unit in
1610 * Euclidean norm. This applies to all cases.
1611 *
1612  temp1 = dsqrt(dble(n)) * epsln
1613  DO 1972 q = 1, n
1614  DO 972 p = 1, n
1615  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1616  972 CONTINUE
1617  DO 973 p = 1, n
1618  v(p,q) = cwork(2*n+n*nr+nr+p)
1619  973 CONTINUE
1620  xsc = one / dznrm2( n, v(1,q), 1 )
1621  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1622  $ CALL zdscal( n, xsc, v(1,q), 1 )
1623  1972 CONTINUE
1624 * At this moment, V contains the right singular vectors of A.
1625 * Next, assemble the left singular vector matrix U (M x N).
1626  IF ( nr .LT. m ) THEN
1627  CALL zlaset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1628  IF ( nr .LT. n1 ) THEN
1629  CALL zlaset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1630  CALL zlaset('A',m-nr,n1-nr,czero,cone,
1631  $ u(nr+1,nr+1),ldu)
1632  END IF
1633  END IF
1634 *
1635 * The Q matrix from the first QRF is built into the left singular
1636 * matrix U. This applies to all cases.
1637 *
1638  CALL zunmqr( 'Left', 'No_Tr', m, n1, n, a, lda, cwork, u,
1639  $ ldu, cwork(n+1), lwork-n, ierr )
1640 
1641 * The columns of U are normalized. The cost is O(M*N) flops.
1642  temp1 = dsqrt(dble(m)) * epsln
1643  DO 1973 p = 1, nr
1644  xsc = one / dznrm2( m, u(1,p), 1 )
1645  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1646  $ CALL zdscal( m, xsc, u(1,p), 1 )
1647  1973 CONTINUE
1648 *
1649 * If the initial QRF is computed with row pivoting, the left
1650 * singular vectors must be adjusted.
1651 *
1652  IF ( rowpiv )
1653  $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1654 *
1655  ELSE
1656 *
1657 * .. the initial matrix A has almost orthogonal columns and
1658 * the second QRF is not needed
1659 *
1660  CALL zlacpy( 'Upper', n, n, a, lda, cwork(n+1), n )
1661  IF ( l2pert ) THEN
1662  xsc = dsqrt(small)
1663  DO 5970 p = 2, n
1664  ctemp = xsc * cwork( n + (p-1)*n + p )
1665  DO 5971 q = 1, p - 1
1666 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
1667 * $ ABS(CWORK(N+(p-1)*N+q)) )
1668  cwork(n+(q-1)*n+p)=-ctemp
1669  5971 CONTINUE
1670  5970 CONTINUE
1671  ELSE
1672  CALL zlaset( 'Lower',n-1,n-1,czero,czero,cwork(n+2),n )
1673  END IF
1674 *
1675  CALL zgesvj( 'Upper', 'U', 'N', n, n, cwork(n+1), n, sva,
1676  $ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
1677  $ info )
1678 *
1679  scalem = rwork(1)
1680  numrank = nint(rwork(2))
1681  DO 6970 p = 1, n
1682  CALL zcopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
1683  CALL zdscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
1684  6970 CONTINUE
1685 *
1686  CALL ztrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1687  $ cone, a, lda, cwork(n+1), n )
1688  DO 6972 p = 1, n
1689  CALL zcopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
1690  6972 CONTINUE
1691  temp1 = dsqrt(dble(n))*epsln
1692  DO 6971 p = 1, n
1693  xsc = one / dznrm2( n, v(1,p), 1 )
1694  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1695  $ CALL zdscal( n, xsc, v(1,p), 1 )
1696  6971 CONTINUE
1697 *
1698 * Assemble the left singular vector matrix U (M x N).
1699 *
1700  IF ( n .LT. m ) THEN
1701  CALL zlaset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
1702  IF ( n .LT. n1 ) THEN
1703  CALL zlaset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
1704  CALL zlaset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
1705  END IF
1706  END IF
1707  CALL zunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1708  $ ldu, cwork(n+1), lwork-n, ierr )
1709  temp1 = dsqrt(dble(m))*epsln
1710  DO 6973 p = 1, n1
1711  xsc = one / dznrm2( m, u(1,p), 1 )
1712  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1713  $ CALL zdscal( m, xsc, u(1,p), 1 )
1714  6973 CONTINUE
1715 *
1716  IF ( rowpiv )
1717  $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1718 *
1719  END IF
1720 *
1721 * end of the >> almost orthogonal case << in the full SVD
1722 *
1723  ELSE
1724 *
1725 * This branch deploys a preconditioned Jacobi SVD with explicitly
1726 * accumulated rotations. It is included as optional, mainly for
1727 * experimental purposes. It does perfom well, and can also be used.
1728 * In this implementation, this branch will be automatically activated
1729 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1730 * to be greater than the overflow threshold. This is because the
1731 * a posteriori computation of the singular vectors assumes robust
1732 * implementation of BLAS and some LAPACK procedures, capable of working
1733 * in presence of extreme values. Since that is not always the case, ...
1734 *
1735  DO 7968 p = 1, nr
1736  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1737  CALL zlacgv( n-p+1, v(p,p), 1 )
1738  7968 CONTINUE
1739 *
1740  IF ( l2pert ) THEN
1741  xsc = dsqrt(small/epsln)
1742  DO 5969 q = 1, nr
1743  ctemp = dcmplx(xsc*abs( v(q,q) ),zero)
1744  DO 5968 p = 1, n
1745  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1746  $ .OR. ( p .LT. q ) )
1747 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1748  $ v(p,q) = ctemp
1749  IF ( p .LT. q ) v(p,q) = - v(p,q)
1750  5968 CONTINUE
1751  5969 CONTINUE
1752  ELSE
1753  CALL zlaset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1754  END IF
1755 
1756  CALL zgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1757  $ lwork-2*n, ierr )
1758  CALL zlacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
1759 *
1760  DO 7969 p = 1, nr
1761  CALL zcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1762  CALL zlacgv( nr-p+1, u(p,p), 1 )
1763  7969 CONTINUE
1764 
1765  IF ( l2pert ) THEN
1766  xsc = dsqrt(small/epsln)
1767  DO 9970 q = 2, nr
1768  DO 9971 p = 1, q - 1
1769  ctemp = dcmplx(xsc * dmin1(abs(u(p,p)),abs(u(q,q))),
1770  $ zero)
1771 * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
1772  u(p,q) = - ctemp
1773  9971 CONTINUE
1774  9970 CONTINUE
1775  ELSE
1776  CALL zlaset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1777  END IF
1778 
1779  CALL zgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
1780  $ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
1781  $ rwork, lrwork, info )
1782  scalem = rwork(1)
1783  numrank = nint(rwork(2))
1784 
1785  IF ( nr .LT. n ) THEN
1786  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1787  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1788  CALL zlaset( 'A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv )
1789  END IF
1790 
1791  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1792  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1793 *
1794 * Permute the rows of V using the (column) permutation from the
1795 * first QRF. Also, scale the columns to make them unit in
1796 * Euclidean norm. This applies to all cases.
1797 *
1798  temp1 = dsqrt(dble(n)) * epsln
1799  DO 7972 q = 1, n
1800  DO 8972 p = 1, n
1801  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1802  8972 CONTINUE
1803  DO 8973 p = 1, n
1804  v(p,q) = cwork(2*n+n*nr+nr+p)
1805  8973 CONTINUE
1806  xsc = one / dznrm2( n, v(1,q), 1 )
1807  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1808  $ CALL zdscal( n, xsc, v(1,q), 1 )
1809  7972 CONTINUE
1810 *
1811 * At this moment, V contains the right singular vectors of A.
1812 * Next, assemble the left singular vector matrix U (M x N).
1813 *
1814  IF ( nr .LT. m ) THEN
1815  CALL zlaset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
1816  IF ( nr .LT. n1 ) THEN
1817  CALL zlaset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
1818  CALL zlaset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
1819  END IF
1820  END IF
1821 *
1822  CALL zunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1823  $ ldu, cwork(n+1), lwork-n, ierr )
1824 *
1825  IF ( rowpiv )
1826  $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1827 *
1828 *
1829  END IF
1830  IF ( transp ) THEN
1831 * .. swap U and V because the procedure worked on A^*
1832  DO 6974 p = 1, n
1833  CALL zswap( n, u(1,p), 1, v(1,p), 1 )
1834  6974 CONTINUE
1835  END IF
1836 *
1837  END IF
1838 * end of the full SVD
1839 *
1840 * Undo scaling, if necessary (and possible)
1841 *
1842  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1843  CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1844  uscal1 = one
1845  uscal2 = one
1846  END IF
1847 *
1848  IF ( nr .LT. n ) THEN
1849  DO 3004 p = nr+1, n
1850  sva(p) = zero
1851  3004 CONTINUE
1852  END IF
1853 *
1854  rwork(1) = uscal2 * scalem
1855  rwork(2) = uscal1
1856  IF ( errest ) rwork(3) = sconda
1857  IF ( lsvec .AND. rsvec ) THEN
1858  rwork(4) = condr1
1859  rwork(5) = condr2
1860  END IF
1861  IF ( l2tran ) THEN
1862  rwork(6) = entra
1863  rwork(7) = entrat
1864  END IF
1865 *
1866  iwork(1) = nr
1867  iwork(2) = numrank
1868  iwork(3) = warning
1869 *
1870  RETURN
1871 * ..
1872 * .. END OF ZGEJSV
1873 * ..
subroutine zlacpy(UPLO, M, N, A, LDA, B, LDB)
ZLACPY copies all or part of one two-dimensional array to another.
Definition: zlacpy.f:105
integer function idamax(N, DX, INCX)
IDAMAX
Definition: idamax.f:53
subroutine zcopy(N, ZX, INCX, ZY, INCY)
ZCOPY
Definition: zcopy.f:52
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:65
subroutine zlaswp(N, A, LDA, K1, K2, IPIV, INCX)
ZLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: zlaswp.f:116
subroutine zgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
ZGEQRF VARIANT: left-looking Level 3 BLAS of the algorithm.
Definition: zgeqrf.f:151
subroutine zswap(N, ZX, INCX, ZY, INCY)
ZSWAP
Definition: zswap.f:52
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:145
subroutine zlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
ZLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: zlaset.f:108
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine zlassq(N, X, INCX, SCALE, SUMSQ)
ZLASSQ updates a sum of squares represented in scaled form.
Definition: zlassq.f:108
subroutine zunmqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
ZUNMQR
Definition: zunmqr.f:169
subroutine zgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO)
ZGESVJ
Definition: zgesvj.f:344
double precision function dznrm2(N, X, INCX)
DZNRM2
Definition: dznrm2.f:56
subroutine zungqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
ZUNGQR
Definition: zungqr.f:130
subroutine zunmlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
ZUNMLQ
Definition: zunmlq.f:169
integer function izamax(N, ZX, INCX)
IZAMAX
Definition: izamax.f:53
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:55
subroutine zgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, RWORK, INFO)
ZGEQP3
Definition: zgeqp3.f:161
subroutine zdscal(N, DA, ZX, INCX)
ZDSCAL
Definition: zdscal.f:54
subroutine dlassq(N, X, INCX, SCALE, SUMSQ)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f:105
subroutine zgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
ZGELQF
Definition: zgelqf.f:137
subroutine zlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
ZLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: zlascl.f:145
subroutine zpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, RWORK, INFO)
ZPOCON
Definition: zpocon.f:123
subroutine ztrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
ZTRSM
Definition: ztrsm.f:182
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine zlacgv(N, X, INCX)
ZLACGV conjugates a complex vector.
Definition: zlacgv.f:76

Here is the call graph for this function:

Here is the caller graph for this function: