0001:       SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA,
0002:      &                     MV, V, LDV, WORK, LWORK, INFO )
0003: *
0004: *  -- LAPACK routine (version 3.2)                                    --
0005: *
0006: *  -- Contributed by Zlatko Drmac of the University of Zagreb and     --
0007: *  -- Kresimir Veselic of the Fernuniversitaet Hagen                  --
0008: *  -- November 2008                                                   --
0009: *
0010: *  -- LAPACK is a software package provided by Univ. of Tennessee,    --
0011: *  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
0012: *
0013: * This routine is also part of SIGMA (version 1.23, October 23. 2008.)
0014: * SIGMA is a library of algorithms for highly accurate algorithms for
0015: * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the
0016: * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0.
0017: *
0018: *     -#- Scalar Arguments -#-
0019: *
0020:       IMPLICIT    NONE
0021:       INTEGER     INFO, LDA, LDV, LWORK, M, MV, N
0022:       CHARACTER*1 JOBA, JOBU, JOBV
0023: *
0024: *     -#- Array Arguments -#-
0025: *
0026:       DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ), WORK( LWORK )
0027: *     ..
0028: *
0029: *  Purpose
0030: *  ~~~~~~~
0031: *  DGESVJ computes the singular value decomposition (SVD) of a real
0032: *  M-by-N matrix A, where M >= N. The SVD of A is written as
0033: *                                     [++]   [xx]   [x0]   [xx]
0034: *               A = U * SIGMA * V^t,  [++] = [xx] * [ox] * [xx]
0035: *                                     [++]   [xx]
0036: *  where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
0037: *  matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
0038: *  of SIGMA are the singular values of A. The columns of U and V are the
0039: *  left and the right singular vectors of A, respectively.
0040: *
0041: *  Further Details
0042: *  ~~~~~~~~~~~~~~~
0043: *  The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
0044: *  rotations. The rotations are implemented as fast scaled rotations of
0045: *  Anda and Park [1]. In the case of underflow of the Jacobi angle, a
0046: *  modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
0047: *  column interchanges of de Rijk [2]. The relative accuracy of the computed
0048: *  singular values and the accuracy of the computed singular vectors (in
0049: *  angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
0050: *  The condition number that determines the accuracy in the full rank case
0051: *  is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
0052: *  spectral condition number. The best performance of this Jacobi SVD
0053: *  procedure is achieved if used in an  accelerated version of Drmac and
0054: *  Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
0055: *  Some tunning parameters (marked with [TP]) are available for the
0056: *  implementer.
0057: *  The computational range for the nonzero singular values is the  machine
0058: *  number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
0059: *  denormalized singular values can be computed with the corresponding
0060: *  gradual loss of accurate digits.
0061: *
0062: *  Contributors
0063: *  ~~~~~~~~~~~~
0064: *  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
0065: *
0066: *  References
0067: *  ~~~~~~~~~~
0068: * [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
0069: *     SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
0070: * [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
0071: *     singular value decomposition on a vector computer.
0072: *     SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
0073: * [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
0074: * [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
0075: *     value computation in floating point arithmetic.
0076: *     SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
0077: * [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
0078: *     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
0079: *     LAPACK Working note 169.
0080: * [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
0081: *     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
0082: *     LAPACK Working note 170.
0083: * [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
0084: *     QSVD, (H,K)-SVD computations.
0085: *     Department of Mathematics, University of Zagreb, 2008.
0086: *
0087: *  Bugs, Examples and Comments
0088: *  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
0089: *  Please report all bugs and send interesting test examples and comments to
0090: *  drmac@math.hr. Thank you.
0091: *
0092: *  Arguments
0093: *  ~~~~~~~~~
0094: *
0095: *  JOBA    (input) CHARACTER* 1
0096: *          Specifies the structure of A.
0097: *          = 'L': The input matrix A is lower triangular;
0098: *          = 'U': The input matrix A is upper triangular;
0099: *          = 'G': The input matrix A is general M-by-N matrix, M >= N.
0100: *
0101: *  JOBU    (input) CHARACTER*1
0102: *          Specifies whether to compute the left singular vectors
0103: *          (columns of U):
0104: *
0105: *          = 'U': The left singular vectors corresponding to the nonzero
0106: *                 singular values are computed and returned in the leading
0107: *                 columns of A. See more details in the description of A.
0108: *                 The default numerical orthogonality threshold is set to
0109: *                 approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
0110: *          = 'C': Analogous to JOBU='U', except that user can control the
0111: *                 level of numerical orthogonality of the computed left
0112: *                 singular vectors. TOL can be set to TOL = CTOL*EPS, where
0113: *                 CTOL is given on input in the array WORK.
0114: *                 No CTOL smaller than ONE is allowed. CTOL greater
0115: *                 than 1 / EPS is meaningless. The option 'C'
0116: *                 can be used if M*EPS is satisfactory orthogonality
0117: *                 of the computed left singular vectors, so CTOL=M could
0118: *                 save few sweeps of Jacobi rotations.
0119: *                 See the descriptions of A and WORK(1).
0120: *          = 'N': The matrix U is not computed. However, see the
0121: *                 description of A.
0122: *
0123: *  JOBV    (input) CHARACTER*1
0124: *          Specifies whether to compute the right singular vectors, that
0125: *          is, the matrix V:
0126: *          = 'V' : the matrix V is computed and returned in the array V
0127: *          = 'A' : the Jacobi rotations are applied to the MV-by-N
0128: *                  array V. In other words, the right singular vector
0129: *                  matrix V is not computed explicitly, instead it is
0130: *                  applied to an MV-by-N matrix initially stored in the
0131: *                  first MV rows of V.
0132: *          = 'N' : the matrix V is not computed and the array V is not
0133: *                  referenced
0134: *
0135: *  M       (input) INTEGER
0136: *          The number of rows of the input matrix A.  M >= 0.
0137: *
0138: *  N       (input) INTEGER
0139: *          The number of columns of the input matrix A.
0140: *          M >= N >= 0.
0141: *
0142: *  A       (input/output) REAL array, dimension (LDA,N)
0143: *          On entry, the M-by-N matrix A.
0144: *          On exit,
0145: *          If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C':
0146: *          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0147: *                 If INFO .EQ. 0,
0148: *                 ~~~~~~~~~~~~~~~
0149: *                 RANKA orthonormal columns of U are returned in the
0150: *                 leading RANKA columns of the array A. Here RANKA <= N
0151: *                 is the number of computed singular values of A that are
0152: *                 above the underflow threshold DLAMCH('S'). The singular
0153: *                 vectors corresponding to underflowed or zero singular
0154: *                 values are not computed. The value of RANKA is returned
0155: *                 in the array WORK as RANKA=NINT(WORK(2)). Also see the
0156: *                 descriptions of SVA and WORK. The computed columns of U
0157: *                 are mutually numerically orthogonal up to approximately
0158: *                 TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
0159: *                 see the description of JOBU.
0160: *                 If INFO .GT. 0,
0161: *                 ~~~~~~~~~~~~~~~
0162: *                 the procedure DGESVJ did not converge in the given number
0163: *                 of iterations (sweeps). In that case, the computed
0164: *                 columns of U may not be orthogonal up to TOL. The output
0165: *                 U (stored in A), SIGMA (given by the computed singular
0166: *                 values in SVA(1:N)) and V is still a decomposition of the
0167: *                 input matrix A in the sense that the residual
0168: *                 ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
0169: *
0170: *          If JOBU .EQ. 'N':
0171: *          ~~~~~~~~~~~~~~~~~
0172: *                 If INFO .EQ. 0
0173: *                 ~~~~~~~~~~~~~~
0174: *                 Note that the left singular vectors are 'for free' in the
0175: *                 one-sided Jacobi SVD algorithm. However, if only the
0176: *                 singular values are needed, the level of numerical
0177: *                 orthogonality of U is not an issue and iterations are
0178: *                 stopped when the columns of the iterated matrix are
0179: *                 numerically orthogonal up to approximately M*EPS. Thus,
0180: *                 on exit, A contains the columns of U scaled with the
0181: *                 corresponding singular values.
0182: *                 If INFO .GT. 0,
0183: *                 ~~~~~~~~~~~~~~~
0184: *                 the procedure DGESVJ did not converge in the given number
0185: *                 of iterations (sweeps).
0186: *
0187: *  LDA     (input) INTEGER
0188: *          The leading dimension of the array A.  LDA >= max(1,M).
0189: *
0190: *  SVA     (workspace/output) REAL array, dimension (N)
0191: *          On exit,
0192: *          If INFO .EQ. 0,
0193: *          ~~~~~~~~~~~~~~~
0194: *          depending on the value SCALE = WORK(1), we have:
0195: *                 If SCALE .EQ. ONE:
0196: *                 ~~~~~~~~~~~~~~~~~~
0197: *                 SVA(1:N) contains the computed singular values of A.
0198: *                 During the computation SVA contains the Euclidean column
0199: *                 norms of the iterated matrices in the array A.
0200: *                 If SCALE .NE. ONE:
0201: *                 ~~~~~~~~~~~~~~~~~~
0202: *                 The singular values of A are SCALE*SVA(1:N), and this
0203: *                 factored representation is due to the fact that some of the
0204: *                 singular values of A might underflow or overflow.
0205: *
0206: *          If INFO .GT. 0,
0207: *          ~~~~~~~~~~~~~~~
0208: *          the procedure DGESVJ did not converge in the given number of
0209: *          iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
0210: *
0211: *  MV      (input) INTEGER
0212: *          If JOBV .EQ. 'A', then the product of Jacobi rotations in DGESVJ
0213: *          is applied to the first MV rows of V. See the description of JOBV.
0214: *
0215: *  V       (input/output) REAL array, dimension (LDV,N)
0216: *          If JOBV = 'V', then V contains on exit the N-by-N matrix of
0217: *                         the right singular vectors;
0218: *          If JOBV = 'A', then V contains the product of the computed right
0219: *                         singular vector matrix and the initial matrix in
0220: *                         the array V.
0221: *          If JOBV = 'N', then V is not referenced.
0222: *
0223: *  LDV     (input) INTEGER
0224: *          The leading dimension of the array V, LDV .GE. 1.
0225: *          If JOBV .EQ. 'V', then LDV .GE. max(1,N).
0226: *          If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
0227: *
0228: *  WORK    (input/workspace/output) REAL array, dimension max(4,M+N).
0229: *          On entry,
0230: *          If JOBU .EQ. 'C',
0231: *          ~~~~~~~~~~~~~~~~~
0232: *          WORK(1) = CTOL, where CTOL defines the threshold for convergence.
0233: *                    The process stops if all columns of A are mutually
0234: *                    orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
0235: *                    It is required that CTOL >= ONE, i.e. it is not
0236: *                    allowed to force the routine to obtain orthogonality
0237: *                    below EPSILON.
0238: *          On exit,
0239: *          WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
0240: *                    are the computed singular vcalues of A.
0241: *                    (See description of SVA().)
0242: *          WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
0243: *                    singular values.
0244: *          WORK(3) = NINT(WORK(3)) is the number of the computed singular
0245: *                    values that are larger than the underflow threshold.
0246: *          WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
0247: *                    rotations needed for numerical convergence.
0248: *          WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
0249: *                    This is useful information in cases when DGESVJ did
0250: *                    not converge, as it can be used to estimate whether
0251: *                    the output is stil useful and for post festum analysis.
0252: *          WORK(6) = the largest absolute value over all sines of the
0253: *                    Jacobi rotation angles in the last sweep. It can be
0254: *                    useful for a post festum analysis.
0255: *
0256: *  LWORK   length of WORK, WORK >= MAX(6,M+N)
0257: *
0258: *  INFO    (output) INTEGER
0259: *          = 0 : successful exit.
0260: *          < 0 : if INFO = -i, then the i-th argument had an illegal value
0261: *          > 0 : DGESVJ did not converge in the maximal allowed number (30)
0262: *                of sweeps. The output may still be useful. See the
0263: *                description of WORK.
0264: *
0265: *     Local Parameters
0266: *
0267:       DOUBLE PRECISION   ZERO,  HALF,         ONE,         TWO
0268:       PARAMETER ( ZERO = 0.0D0, HALF = 0.5D0, ONE = 1.0D0, TWO = 2.0D0 )
0269:       INTEGER     NSWEEP
0270:       PARAMETER ( NSWEEP = 30 )
0271: *
0272: *     Local Scalars
0273: *
0274:       DOUBLE PRECISION AAPP, AAPP0, AAPQ,  AAQQ,    APOAQ,     AQOAP,
0275:      &        BIG,     BIGTHETA, CS,       CTOL,    EPSILON,   LARGE,
0276:      &        MXAAPQ,  MXSINJ,   ROOTBIG,  ROOTEPS, ROOTSFMIN, ROOTTOL,
0277:      &        SCALE,   SFMIN,    SMALL,    SN,      T,         TEMP1,
0278:      &        THETA,   THSIGN,   TOL
0279:       INTEGER BLSKIP,  EMPTSW,   i,        ibr,     IERR,      igl,
0280:      &        IJBLSK,  ir1,      ISWROT,   jbc,     jgl,       KBL,
0281:      &        LKAHEAD, MVL,      N2,       N34,     N4,        NBL,
0282:      &        NOTROT,  p,        PSKIPPED, q,       ROWSKIP,   SWBAND
0283:       LOGICAL APPLV,   GOSCALE,  LOWER,    LSVEC,   NOSCALE,   ROTOK,
0284:      &        RSVEC,   UCTOL,    UPPER
0285: *
0286: *     Local Arrays
0287: *
0288:       DOUBLE PRECISION FASTR(5)
0289: *
0290: *     Intrinsic Functions
0291: *
0292:       INTRINSIC DABS, DMAX1, DMIN1, DBLE, MIN0, DSIGN, DSQRT
0293: *
0294: *     External Functions
0295: *     .. from BLAS
0296:       DOUBLE PRECISION DDOT, DNRM2
0297:       EXTERNAL         DDOT, DNRM2
0298:       INTEGER          IDAMAX
0299:       EXTERNAL         IDAMAX
0300: *     .. from LAPACK
0301:       DOUBLE PRECISION DLAMCH
0302:       EXTERNAL         DLAMCH
0303:       LOGICAL          LSAME
0304:       EXTERNAL         LSAME
0305: *
0306: *     External Subroutines
0307: *     .. from BLAS
0308:       EXTERNAL  DAXPY,  DCOPY, DROTM, DSCAL, DSWAP
0309: *     .. from LAPACK
0310:       EXTERNAL  DLASCL, DLASET, DLASSQ, XERBLA
0311: *
0312:       EXTERNAL  DGSVJ0, DGSVJ1
0313: *
0314: *     Test the input arguments
0315: *
0316:       LSVEC = LSAME( JOBU, 'U' )
0317:       UCTOL = LSAME( JOBU, 'C' )
0318:       RSVEC = LSAME( JOBV, 'V' )
0319:       APPLV = LSAME( JOBV, 'A' )
0320:       UPPER = LSAME( JOBA, 'U' )
0321:       LOWER = LSAME( JOBA, 'L' )
0322: *
0323:       IF ( .NOT.( UPPER .OR. LOWER .OR. LSAME(JOBA,'G') ) ) THEN
0324:          INFO = - 1
0325:       ELSE IF ( .NOT.( LSVEC .OR. UCTOL .OR. LSAME(JOBU,'N') ) ) THEN
0326:          INFO = - 2
0327:       ELSE IF ( .NOT.( RSVEC .OR. APPLV .OR. LSAME(JOBV,'N') ) ) THEN
0328:          INFO = - 3
0329:       ELSE IF ( M .LT. 0 ) THEN
0330:          INFO = - 4
0331:       ELSE IF ( ( N .LT. 0 ) .OR. ( N .GT. M ) ) THEN
0332:          INFO = - 5
0333:       ELSE IF ( LDA .LT. M ) THEN
0334:          INFO = - 7
0335:       ELSE IF ( MV .LT. 0 ) THEN
0336:          INFO = - 9
0337:       ELSE IF ( ( RSVEC .AND. (LDV .LT. N ) ) .OR.
0338:      &          ( APPLV .AND. (LDV .LT. MV) ) ) THEN
0339:          INFO = -11
0340:       ELSE IF ( UCTOL .AND. (WORK(1) .LE. ONE) ) THEN
0341:          INFO = - 12
0342:       ELSE IF ( LWORK .LT. MAX0( M + N , 6 ) ) THEN
0343:          INFO = - 13
0344:       ELSE
0345:          INFO =   0
0346:       END IF
0347: *
0348: *     #:(
0349:       IF ( INFO .NE. 0 ) THEN
0350:          CALL XERBLA( 'DGESVJ', -INFO )
0351:          RETURN
0352:       END IF
0353: *
0354: * #:) Quick return for void matrix
0355: *
0356:       IF ( ( M .EQ. 0 ) .OR. ( N .EQ. 0 ) ) RETURN
0357: *
0358: *     Set numerical parameters
0359: *     The stopping criterion for Jacobi rotations is
0360: *
0361: *     max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
0362: *
0363: *     where EPS is the round-off and CTOL is defined as follows:
0364: *
0365:       IF ( UCTOL ) THEN
0366: *        ... user controlled
0367:          CTOL = WORK(1)
0368:       ELSE
0369: *        ... default
0370:          IF ( LSVEC .OR. RSVEC .OR. APPLV ) THEN
0371:             CTOL = DSQRT(DBLE(M))
0372:          ELSE
0373:             CTOL = DBLE(M)
0374:          END IF
0375:       END IF
0376: *     ... and the machine dependent parameters are
0377: *[!]  (Make sure that DLAMCH() works properly on the target machine.)
0378: *
0379:       EPSILON     = DLAMCH('Epsilon')
0380:       ROOTEPS     = DSQRT(EPSILON)
0381:       SFMIN       = DLAMCH('SafeMinimum')
0382:       ROOTSFMIN   = DSQRT(SFMIN)
0383:       SMALL       = SFMIN   / EPSILON
0384:       BIG         = DLAMCH('Overflow')
0385: *     BIG         = ONE    / SFMIN
0386:       ROOTBIG     = ONE   / ROOTSFMIN
0387:       LARGE       = BIG  / DSQRT(DBLE(M*N))
0388:       BIGTHETA    = ONE / ROOTEPS
0389: *
0390:       TOL         = CTOL * EPSILON
0391:       ROOTTOL     = DSQRT(TOL)
0392: *
0393:       IF ( DBLE(M)*EPSILON .GE. ONE ) THEN
0394:          INFO = - 5
0395:          CALL XERBLA( 'DGESVJ', -INFO )
0396:          RETURN
0397:       END IF
0398: *
0399: *     Initialize the right singular vector matrix.
0400: *
0401:       IF ( RSVEC )  THEN
0402:          MVL = N
0403:          CALL DLASET( 'A', MVL, N, ZERO, ONE, V, LDV )
0404:       ELSE IF ( APPLV ) THEN
0405:          MVL = MV
0406:       END IF
0407:       RSVEC = RSVEC .OR. APPLV
0408: *
0409: *     Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
0410: *(!)  If necessary, scale A to protect the largest singular value
0411: *     from overflow. It is possible that saving the largest singular
0412: *     value destroys the information about the small ones.
0413: *     This initial scaling is almost minimal in the sense that the
0414: *     goal is to make sure that no column norm overflows, and that
0415: *     DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
0416: *     in A are detected, the procedure returns with INFO=-6.
0417: *
0418:       SCALE    = ONE / DSQRT(DBLE(M)*DBLE(N))
0419:       NOSCALE  = .TRUE.
0420:       GOSCALE  = .TRUE.
0421: *
0422:       IF ( LOWER ) THEN
0423: *        the input matrix is M-by-N lower triangular (trapezoidal)
0424:          DO 1874 p = 1, N
0425:             AAPP = ZERO
0426:             AAQQ = ZERO
0427:             CALL DLASSQ( M-p+1, A(p,p), 1, AAPP, AAQQ )
0428:             IF ( AAPP .GT. BIG ) THEN
0429:                INFO = - 6
0430:                CALL XERBLA( 'DGESVJ', -INFO )
0431:                RETURN
0432:             END IF
0433:             AAQQ = DSQRT(AAQQ)
0434:             IF ( ( AAPP .LT. (BIG / AAQQ) ) .AND. NOSCALE ) THEN
0435:                SVA(p)   = AAPP * AAQQ
0436:             ELSE
0437:                NOSCALE = .FALSE.
0438:                SVA(p)   = AAPP * ( AAQQ * SCALE )
0439:                IF ( GOSCALE ) THEN
0440:                   GOSCALE = .FALSE.
0441:                   DO 1873 q = 1, p - 1
0442:                      SVA(q) = SVA(q)*SCALE
0443:  1873             CONTINUE
0444:                END IF
0445:             END IF
0446:  1874    CONTINUE
0447:       ELSE IF ( UPPER ) THEN
0448: *        the input matrix is M-by-N upper triangular (trapezoidal)
0449:          DO 2874 p = 1, N
0450:             AAPP = ZERO
0451:             AAQQ = ZERO
0452:             CALL DLASSQ( p, A(1,p), 1, AAPP, AAQQ )
0453:             IF ( AAPP .GT. BIG ) THEN
0454:                INFO = - 6
0455:                CALL XERBLA( 'DGESVJ', -INFO )
0456:                RETURN
0457:             END IF
0458:             AAQQ = DSQRT(AAQQ)
0459:             IF ( ( AAPP .LT. (BIG / AAQQ) ) .AND. NOSCALE ) THEN
0460:                SVA(p)   = AAPP * AAQQ
0461:             ELSE
0462:                NOSCALE = .FALSE.
0463:                SVA(p)   = AAPP * ( AAQQ * SCALE )
0464:                IF ( GOSCALE ) THEN
0465:                   GOSCALE = .FALSE.
0466:                   DO 2873 q = 1, p - 1
0467:                      SVA(q) = SVA(q)*SCALE
0468:  2873             CONTINUE
0469:                END IF
0470:             END IF
0471:  2874    CONTINUE
0472:       ELSE
0473: *        the input matrix is M-by-N general dense
0474:          DO 3874 p = 1, N
0475:             AAPP = ZERO
0476:             AAQQ = ZERO
0477:             CALL DLASSQ( M, A(1,p), 1, AAPP, AAQQ )
0478:             IF ( AAPP .GT. BIG ) THEN
0479:                INFO = - 6
0480:                CALL XERBLA( 'DGESVJ', -INFO )
0481:                RETURN
0482:             END IF
0483:             AAQQ = DSQRT(AAQQ)
0484:             IF ( ( AAPP .LT. (BIG / AAQQ) ) .AND. NOSCALE ) THEN
0485:                SVA(p)  = AAPP * AAQQ
0486:             ELSE
0487:                NOSCALE = .FALSE.
0488:                SVA(p)  = AAPP * ( AAQQ * SCALE )
0489:                IF ( GOSCALE ) THEN
0490:                   GOSCALE = .FALSE.
0491:                   DO 3873 q = 1, p - 1
0492:                      SVA(q) = SVA(q)*SCALE
0493:  3873             CONTINUE
0494:                END IF
0495:             END IF
0496:  3874    CONTINUE
0497:       END IF
0498: *
0499:       IF ( NOSCALE ) SCALE = ONE
0500: *
0501: *     Move the smaller part of the spectrum from the underflow threshold
0502: *(!)  Start by determining the position of the nonzero entries of the
0503: *     array SVA() relative to ( SFMIN, BIG ).
0504: *
0505:       AAPP = ZERO
0506:       AAQQ = BIG
0507:       DO 4781 p = 1, N
0508:          IF ( SVA(p) .NE. ZERO ) AAQQ = DMIN1( AAQQ, SVA(p) )
0509:          AAPP = DMAX1( AAPP, SVA(p) )
0510:  4781 CONTINUE
0511: *
0512: * #:) Quick return for zero matrix
0513: *
0514:       IF ( AAPP .EQ. ZERO ) THEN
0515:          IF ( LSVEC ) CALL DLASET( 'G', M, N, ZERO, ONE, A, LDA )
0516:          WORK(1) = ONE
0517:          WORK(2) = ZERO
0518:          WORK(3) = ZERO
0519:          WORK(4) = ZERO
0520:          WORK(5) = ZERO
0521:          WORK(6) = ZERO
0522:          RETURN
0523:       END IF
0524: *
0525: * #:) Quick return for one-column matrix
0526: *
0527:       IF ( N .EQ. 1 ) THEN
0528:          IF ( LSVEC )
0529:      &      CALL DLASCL( 'G',0,0,SVA(1),SCALE,M,1,A(1,1),LDA,IERR )
0530:          WORK(1) = ONE / SCALE
0531:          IF ( SVA(1) .GE. SFMIN ) THEN
0532:             WORK(2) = ONE
0533:          ELSE
0534:             WORK(2) = ZERO
0535:          END IF
0536:          WORK(3) = ZERO
0537:          WORK(4) = ZERO
0538:          WORK(5) = ZERO
0539:          WORK(6) = ZERO
0540:          RETURN
0541:       END IF
0542: *
0543: *     Protect small singular values from underflow, and try to
0544: *     avoid underflows/overflows in computing Jacobi rotations.
0545: *
0546:       SN    = DSQRT( SFMIN / EPSILON  )
0547:       TEMP1 = DSQRT(  BIG /  DBLE(N) )
0548:       IF ( (AAPP.LE.SN).OR.(AAQQ.GE.TEMP1)
0549:      &   .OR.((SN.LE.AAQQ).AND.(AAPP.LE.TEMP1)) ) THEN
0550:          TEMP1 = DMIN1(BIG,TEMP1/AAPP)
0551: *         AAQQ  = AAQQ*TEMP1
0552: *         AAPP  = AAPP*TEMP1
0553:       ELSE IF ( (AAQQ.LE.SN).AND.(AAPP.LE.TEMP1) ) THEN
0554:          TEMP1 = DMIN1( SN / AAQQ, BIG/(AAPP*DSQRT(DBLE(N))) )
0555: *         AAQQ  = AAQQ*TEMP1
0556: *         AAPP  = AAPP*TEMP1
0557:       ELSE IF ( (AAQQ.GE.SN).AND.(AAPP.GE.TEMP1) ) THEN
0558:          TEMP1 = DMAX1( SN / AAQQ, TEMP1 / AAPP )
0559: *         AAQQ  = AAQQ*TEMP1
0560: *         AAPP  = AAPP*TEMP1
0561:       ELSE IF ( (AAQQ.LE.SN).AND.(AAPP.GE.TEMP1) ) THEN
0562:          TEMP1 = DMIN1( SN / AAQQ, BIG / (DSQRT(DBLE(N))*AAPP))
0563: *         AAQQ  = AAQQ*TEMP1
0564: *         AAPP  = AAPP*TEMP1
0565:       ELSE
0566:          TEMP1 = ONE
0567:       END IF
0568: *
0569: *     Scale, if necessary
0570: *
0571:       IF ( TEMP1 .NE. ONE ) THEN
0572:          CALL DLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR )
0573:       END IF
0574:       SCALE = TEMP1 * SCALE
0575:       IF ( SCALE .NE. ONE ) THEN
0576:          CALL DLASCL( JOBA, 0, 0, ONE, SCALE, M, N, A, LDA, IERR )
0577:          SCALE = ONE / SCALE
0578:       END IF
0579: *
0580: *     Row-cyclic Jacobi SVD algorithm with column pivoting
0581: *
0582:       EMPTSW   = ( N * ( N - 1 ) ) / 2
0583:       NOTROT   = 0
0584:       FASTR(1) = ZERO
0585: *
0586: *     A is represented in factored form A = A * diag(WORK), where diag(WORK)
0587: *     is initialized to identity. WORK is updated during fast scaled
0588: *     rotations.
0589: *
0590:       DO 1868 q = 1, N
0591:          WORK(q) = ONE
0592:  1868 CONTINUE
0593: *
0594: *
0595:       SWBAND = 3
0596: *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
0597: *     if DGESVJ is used as a computational routine in the preconditioned
0598: *     Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
0599: *     works on pivots inside a band-like region around the diagonal.
0600: *     The boundaries are determined dynamically, based on the number of
0601: *     pivots above a threshold.
0602: *
0603:       KBL = MIN0( 8, N )
0604: *[TP] KBL is a tuning parameter that defines the tile size in the
0605: *     tiling of the p-q loops of pivot pairs. In general, an optimal
0606: *     value of KBL depends on the matrix dimensions and on the
0607: *     parameters of the computer's memory.
0608: *
0609:       NBL = N / KBL
0610:       IF ( ( NBL * KBL ) .NE. N ) NBL = NBL + 1
0611: *
0612:       BLSKIP = KBL**2
0613: *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
0614: *
0615:       ROWSKIP = MIN0( 5, KBL )
0616: *[TP] ROWSKIP is a tuning parameter.
0617: *
0618:       LKAHEAD = 1
0619: *[TP] LKAHEAD is a tuning parameter.
0620: *
0621: *     Quasi block transformations, using the lower (upper) triangular
0622: *     structure of the input matrix. The quasi-block-cycling usually
0623: *     invokes cubic convergence. Big part of this cycle is done inside
0624: *     canonical subspaces of dimensions less than M.
0625: *
0626:       IF ( (LOWER .OR. UPPER) .AND. (N .GT. MAX0(64, 4*KBL)) ) THEN
0627: *[TP] The number of partition levels and the actual partition are
0628: *     tuning parameters.
0629:       N4     = N / 4
0630:       N2     = N / 2
0631:       N34    = 3 * N4
0632:       IF ( APPLV ) THEN
0633:          q = 0
0634:       ELSE
0635:          q = 1
0636:       END IF
0637: *
0638:       IF ( LOWER ) THEN
0639: *
0640: *     This works very well on lower triangular matrices, in particular
0641: *     in the framework of the preconditioned Jacobi SVD (xGEJSV).
0642: *     The idea is simple:
0643: *     [+ 0 0 0]   Note that Jacobi transformations of [0 0]
0644: *     [+ + 0 0]                                       [0 0]
0645: *     [+ + x 0]   actually work on [x 0]              [x 0]
0646: *     [+ + x x]                    [x x].             [x x]
0647: *
0648:       CALL DGSVJ0(JOBV,M-N34,N-N34,A(N34+1,N34+1),LDA,WORK(N34+1),
0649:      &     SVA(N34+1),MVL,V(N34*q+1,N34+1),LDV,EPSILON,SFMIN,TOL,2,
0650:      &     WORK(N+1),LWORK-N,IERR )
0651: *
0652:       CALL DGSVJ0( JOBV,M-N2,N34-N2,A(N2+1,N2+1),LDA,WORK(N2+1),
0653:      &     SVA(N2+1),MVL,V(N2*q+1,N2+1),LDV,EPSILON,SFMIN,TOL,2,
0654:      &     WORK(N+1),LWORK-N,IERR )
0655: *
0656:       CALL DGSVJ1( JOBV,M-N2,N-N2,N4,A(N2+1,N2+1),LDA,WORK(N2+1),
0657:      &     SVA(N2+1),MVL,V(N2*q+1,N2+1),LDV,EPSILON,SFMIN,TOL,1,
0658:      &     WORK(N+1),LWORK-N,IERR )
0659: *
0660:       CALL DGSVJ0( JOBV,M-N4,N2-N4,A(N4+1,N4+1),LDA,WORK(N4+1),
0661:      &     SVA(N4+1),MVL,V(N4*q+1,N4+1),LDV,EPSILON,SFMIN,TOL,1,
0662:      &     WORK(N+1),LWORK-N,IERR )
0663: *
0664:       CALL DGSVJ0( JOBV,M,N4,A,LDA,WORK,SVA,MVL,V,LDV,EPSILON,
0665:      &     SFMIN,TOL,1,WORK(N+1),LWORK-N,IERR )
0666: *
0667:       CALL DGSVJ1( JOBV,M,N2,N4,A,LDA,WORK,SVA,MVL,V,LDV,EPSILON,
0668:      &     SFMIN,TOL,1,WORK(N+1),LWORK-N,IERR )
0669: *
0670: *
0671:       ELSE IF ( UPPER ) THEN
0672: *
0673: *
0674:       CALL DGSVJ0( JOBV,N4,N4,A,LDA,WORK,SVA,MVL,V,LDV,EPSILON,
0675:      &     SFMIN,TOL,2,WORK(N+1),LWORK-N,IERR )
0676: *
0677:       CALL DGSVJ0(JOBV,N2,N4,A(1,N4+1),LDA,WORK(N4+1),SVA(N4+1),MVL,
0678:      &     V(N4*q+1,N4+1),LDV,EPSILON,SFMIN,TOL,1,WORK(N+1),LWORK-N,
0679:      &     IERR )
0680: *
0681:       CALL DGSVJ1( JOBV,N2,N2,N4,A,LDA,WORK,SVA,MVL,V,LDV,EPSILON,
0682:      &     SFMIN,TOL,1,WORK(N+1),LWORK-N,IERR )
0683: *
0684:       CALL DGSVJ0( JOBV,N2+N4,N4,A(1,N2+1),LDA,WORK(N2+1),SVA(N2+1),MVL,
0685:      &     V(N2*q+1,N2+1),LDV,EPSILON,SFMIN,TOL,1,
0686:      &     WORK(N+1),LWORK-N,IERR )
0687: 
0688:       END IF
0689: *
0690:       END IF
0691: *
0692: *     -#- Row-cyclic pivot strategy with de Rijk's pivoting -#-
0693: *
0694:       DO 1993 i = 1, NSWEEP
0695: *
0696: *     .. go go go ...
0697: *
0698:       MXAAPQ = ZERO
0699:       MXSINJ = ZERO
0700:       ISWROT = 0
0701: *
0702:       NOTROT   = 0
0703:       PSKIPPED = 0
0704: *
0705: *     Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
0706: *     1 <= p < q <= N. This is the first step toward a blocked implementation
0707: *     of the rotations. New implementation, based on block transformations,
0708: *     is under development.
0709: *
0710:       DO 2000 ibr = 1, NBL
0711: *
0712:       igl = ( ibr - 1 ) * KBL + 1
0713: *
0714:       DO 1002 ir1 = 0, MIN0( LKAHEAD, NBL - ibr )
0715: *
0716:       igl = igl + ir1 * KBL
0717: *
0718:       DO 2001 p = igl, MIN0( igl + KBL - 1, N - 1)
0719: *
0720: *     .. de Rijk's pivoting
0721: *
0722:       q   = IDAMAX( N-p+1, SVA(p), 1 ) + p - 1
0723:       IF ( p .NE. q ) THEN
0724:          CALL DSWAP( M, A(1,p), 1, A(1,q), 1 )
0725:          IF ( RSVEC ) CALL DSWAP( MVL, V(1,p), 1, V(1,q), 1 )
0726:          TEMP1   = SVA(p)
0727:          SVA(p)  = SVA(q)
0728:          SVA(q)  = TEMP1
0729:          TEMP1   = WORK(p)
0730:          WORK(p) = WORK(q)
0731:          WORK(q) = TEMP1
0732:       END IF
0733: *
0734:       IF ( ir1 .EQ. 0 ) THEN
0735: *
0736: *        Column norms are periodically updated by explicit
0737: *        norm computation.
0738: *        Caveat:
0739: *        Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
0740: *        as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
0741: *        overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
0742: *        underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
0743: *        Hence, DNRM2 cannot be trusted, not even in the case when
0744: *        the true norm is far from the under(over)flow boundaries.
0745: *        If properly implemented DNRM2 is available, the IF-THEN-ELSE
0746: *        below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
0747: *
0748:          IF ((SVA(p) .LT. ROOTBIG) .AND. (SVA(p) .GT. ROOTSFMIN)) THEN
0749:             SVA(p) = DNRM2( M, A(1,p), 1 ) * WORK(p)
0750:          ELSE
0751:             TEMP1 = ZERO
0752:             AAPP  = ZERO
0753:             CALL DLASSQ( M, A(1,p), 1, TEMP1, AAPP )
0754:             SVA(p) = TEMP1 * DSQRT(AAPP) * WORK(p)
0755:          END IF
0756:          AAPP = SVA(p)
0757:       ELSE
0758:          AAPP = SVA(p)
0759:       END IF
0760: *
0761:       IF ( AAPP .GT. ZERO ) THEN
0762: *
0763:       PSKIPPED = 0
0764: *
0765:       DO 2002 q = p + 1, MIN0( igl + KBL - 1, N )
0766: *
0767:       AAQQ = SVA(q)
0768: *
0769:       IF ( AAQQ .GT. ZERO ) THEN
0770: *
0771:          AAPP0 = AAPP
0772:          IF ( AAQQ .GE. ONE ) THEN
0773:             ROTOK  = ( SMALL*AAPP ) .LE. AAQQ
0774:             IF ( AAPP .LT. ( BIG / AAQQ ) ) THEN
0775:                AAPQ = ( DDOT(M, A(1,p), 1, A(1,q), 1 ) *
0776:      &                  WORK(p) * WORK(q) / AAQQ ) / AAPP
0777:             ELSE
0778:                CALL DCOPY( M, A(1,p), 1, WORK(N+1), 1 )
0779:                CALL DLASCL( 'G', 0, 0, AAPP, WORK(p), M,
0780:      &              1, WORK(N+1), LDA, IERR )
0781:                AAPQ = DDOT( M, WORK(N+1),1, A(1,q),1 )*WORK(q) / AAQQ
0782:             END IF
0783:          ELSE
0784:             ROTOK  = AAPP .LE. ( AAQQ / SMALL )
0785:             IF ( AAPP .GT.  ( SMALL / AAQQ ) ) THEN
0786:                AAPQ = ( DDOT( M, A(1,p), 1, A(1,q), 1 ) *
0787:      &               WORK(p) * WORK(q) / AAQQ ) / AAPP
0788:             ELSE
0789:                CALL DCOPY( M, A(1,q), 1, WORK(N+1), 1 )
0790:                CALL DLASCL( 'G', 0, 0, AAQQ, WORK(q), M,
0791:      &              1, WORK(N+1), LDA, IERR )
0792:                AAPQ = DDOT( M, WORK(N+1),1, A(1,p),1 )*WORK(p) / AAPP
0793:             END IF
0794:          END IF
0795: *
0796:          MXAAPQ = DMAX1( MXAAPQ, DABS(AAPQ) )
0797: *
0798: *        TO rotate or NOT to rotate, THAT is the question ...
0799: *
0800:          IF ( DABS( AAPQ ) .GT. TOL ) THEN
0801: *
0802: *           .. rotate
0803: *[RTD]      ROTATED = ROTATED + ONE
0804: *
0805:             IF ( ir1 .EQ. 0 ) THEN
0806:                NOTROT   = 0
0807:                PSKIPPED = 0
0808:                ISWROT   = ISWROT  + 1
0809:             END IF
0810: *
0811:             IF ( ROTOK ) THEN
0812: *
0813:                AQOAP = AAQQ / AAPP
0814:                APOAQ = AAPP / AAQQ
0815:                THETA = - HALF * DABS( AQOAP - APOAQ ) / AAPQ
0816: *
0817:                IF ( DABS( THETA ) .GT. BIGTHETA ) THEN
0818: *
0819:                   T        = HALF / THETA
0820:                   FASTR(3) =   T * WORK(p) / WORK(q)
0821:                   FASTR(4) = - T * WORK(q) / WORK(p)
0822:                   CALL DROTM( M,   A(1,p), 1, A(1,q), 1, FASTR )
0823:                   IF ( RSVEC )
0824:      &            CALL DROTM( MVL, V(1,p), 1, V(1,q), 1, FASTR )
0825:                   SVA(q) = AAQQ*DSQRT( DMAX1(ZERO,ONE + T*APOAQ*AAPQ) )
0826:                   AAPP   = AAPP*DSQRT( ONE - T*AQOAP*AAPQ )
0827:                   MXSINJ = DMAX1( MXSINJ, DABS(T) )
0828: *
0829:                ELSE
0830: *
0831: *                 .. choose correct signum for THETA and rotate
0832: *
0833:                   THSIGN =  - DSIGN(ONE,AAPQ)
0834:                   T  = ONE / ( THETA + THSIGN*DSQRT(ONE+THETA*THETA) )
0835:                   CS = DSQRT( ONE / ( ONE + T*T ) )
0836:                   SN = T * CS
0837: *
0838:                   MXSINJ = DMAX1( MXSINJ, DABS(SN) )
0839:                   SVA(q) = AAQQ*DSQRT( DMAX1(ZERO, ONE+T*APOAQ*AAPQ) )
0840:                   AAPP   = AAPP*DSQRT( DMAX1(ZERO, ONE-T*AQOAP*AAPQ) )
0841: *
0842:                   APOAQ = WORK(p) / WORK(q)
0843:                   AQOAP = WORK(q) / WORK(p)
0844:                   IF ( WORK(p) .GE. ONE ) THEN
0845:                      IF ( WORK(q) .GE.  ONE ) THEN
0846:                         FASTR(3) =   T * APOAQ
0847:                         FASTR(4) = - T * AQOAP
0848:                         WORK(p)  = WORK(p) * CS
0849:                         WORK(q)  = WORK(q) * CS
0850:                         CALL DROTM( M,   A(1,p),1, A(1,q),1, FASTR )
0851:                         IF ( RSVEC )
0852:      &                  CALL DROTM( MVL, V(1,p),1, V(1,q),1, FASTR )
0853:                      ELSE
0854:                         CALL DAXPY( M,    -T*AQOAP, A(1,q),1, A(1,p),1 )
0855:                         CALL DAXPY( M, CS*SN*APOAQ, A(1,p),1, A(1,q),1 )
0856:                         WORK(p) = WORK(p) * CS
0857:                         WORK(q) = WORK(q) / CS
0858:                         IF ( RSVEC ) THEN
0859:                         CALL DAXPY(MVL,   -T*AQOAP, V(1,q),1,V(1,p),1)
0860:                         CALL DAXPY(MVL,CS*SN*APOAQ, V(1,p),1,V(1,q),1)
0861:                         END IF
0862:                      END IF
0863:                   ELSE
0864:                      IF ( WORK(q) .GE. ONE ) THEN
0865:                         CALL DAXPY( M,     T*APOAQ, A(1,p),1, A(1,q),1 )
0866:                         CALL DAXPY( M,-CS*SN*AQOAP, A(1,q),1, A(1,p),1 )
0867:                         WORK(p) = WORK(p) / CS
0868:                         WORK(q) = WORK(q) * CS
0869:                         IF ( RSVEC ) THEN
0870:                         CALL DAXPY(MVL, T*APOAQ,    V(1,p),1,V(1,q),1)
0871:                         CALL DAXPY(MVL,-CS*SN*AQOAP,V(1,q),1,V(1,p),1)
0872:                         END IF
0873:                      ELSE
0874:                         IF ( WORK(p) .GE. WORK(q) ) THEN
0875:                            CALL DAXPY( M,-T*AQOAP,   A(1,q),1,A(1,p),1 )
0876:                            CALL DAXPY( M,CS*SN*APOAQ,A(1,p),1,A(1,q),1 )
0877:                            WORK(p) = WORK(p) * CS
0878:                            WORK(q) = WORK(q) / CS
0879:                            IF ( RSVEC ) THEN
0880:                            CALL DAXPY(MVL, -T*AQOAP,  V(1,q),1,V(1,p),1)
0881:                            CALL DAXPY(MVL,CS*SN*APOAQ,V(1,p),1,V(1,q),1)
0882:                            END IF
0883:                         ELSE
0884:                            CALL DAXPY( M, T*APOAQ,    A(1,p),1,A(1,q),1)
0885:                            CALL DAXPY( M,-CS*SN*AQOAP,A(1,q),1,A(1,p),1)
0886:                            WORK(p) = WORK(p) / CS
0887:                            WORK(q) = WORK(q) * CS
0888:                           IF ( RSVEC ) THEN
0889:                           CALL DAXPY(MVL, T*APOAQ,    V(1,p),1,V(1,q),1)
0890:                           CALL DAXPY(MVL,-CS*SN*AQOAP,V(1,q),1,V(1,p),1)
0891:                           END IF
0892:                         END IF
0893:                      END IF
0894:                   ENDIF
0895:                END IF
0896: *
0897:             ELSE
0898: *              .. have to use modified Gram-Schmidt like transformation
0899:                CALL DCOPY( M, A(1,p), 1, WORK(N+1), 1 )
0900:                CALL DLASCL( 'G',0,0,AAPP,ONE,M,1,WORK(N+1),LDA,IERR )
0901:                CALL DLASCL( 'G',0,0,AAQQ,ONE,M,1,   A(1,q),LDA,IERR )
0902:                TEMP1 = -AAPQ * WORK(p) / WORK(q)
0903:                CALL DAXPY ( M, TEMP1, WORK(N+1), 1, A(1,q), 1 )
0904:                CALL DLASCL( 'G',0,0,ONE,AAQQ,M,1,   A(1,q),LDA,IERR )
0905:                SVA(q) = AAQQ*DSQRT( DMAX1( ZERO, ONE - AAPQ*AAPQ ) )
0906:                MXSINJ = DMAX1( MXSINJ, SFMIN )
0907:             END IF
0908: *           END IF ROTOK THEN ... ELSE
0909: *
0910: *           In the case of cancellation in updating SVA(q), SVA(p)
0911: *           recompute SVA(q), SVA(p).
0912: *
0913:             IF ( (SVA(q) / AAQQ )**2 .LE. ROOTEPS  ) THEN
0914:                IF ((AAQQ .LT. ROOTBIG).AND.(AAQQ .GT. ROOTSFMIN)) THEN
0915:                   SVA(q) = DNRM2( M, A(1,q), 1 ) * WORK(q)
0916:                ELSE
0917:                   T    = ZERO
0918:                   AAQQ = ZERO
0919:                   CALL DLASSQ( M, A(1,q), 1, T, AAQQ )
0920:                   SVA(q) = T * DSQRT(AAQQ) * WORK(q)
0921:                END IF
0922:             END IF
0923:             IF ( ( AAPP / AAPP0) .LE. ROOTEPS  ) THEN
0924:                IF ((AAPP .LT. ROOTBIG).AND.(AAPP .GT. ROOTSFMIN)) THEN
0925:                   AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)
0926:                ELSE
0927:                   T    = ZERO
0928:                   AAPP = ZERO
0929:                   CALL DLASSQ( M, A(1,p), 1, T, AAPP )
0930:                   AAPP = T * DSQRT(AAPP) * WORK(p)
0931:                END IF
0932:                SVA(p) = AAPP
0933:             END IF
0934: *
0935:          ELSE
0936: *        A(:,p) and A(:,q) already numerically orthogonal
0937:             IF ( ir1 .EQ. 0 ) NOTROT   = NOTROT + 1
0938: *[RTD]      SKIPPED  = SKIPPED  + 1
0939:             PSKIPPED = PSKIPPED + 1
0940:          END IF
0941:       ELSE
0942: *        A(:,q) is zero column
0943:          IF ( ir1. EQ. 0 ) NOTROT = NOTROT + 1
0944:          PSKIPPED = PSKIPPED + 1
0945:       END IF
0946: *
0947:       IF ( ( i .LE. SWBAND ) .AND. ( PSKIPPED .GT. ROWSKIP ) ) THEN
0948:          IF ( ir1 .EQ. 0 ) AAPP = - AAPP
0949:          NOTROT = 0
0950:          GO TO 2103
0951:       END IF
0952: *
0953:  2002 CONTINUE
0954: *     END q-LOOP
0955: *
0956:  2103 CONTINUE
0957: *     bailed out of q-loop
0958: *
0959:       SVA(p) = AAPP
0960: *
0961:       ELSE
0962:          SVA(p) = AAPP
0963:          IF ( ( ir1 .EQ. 0 ) .AND. (AAPP .EQ. ZERO) )
0964:      &        NOTROT=NOTROT+MIN0(igl+KBL-1,N)-p
0965:       END IF
0966: *
0967:  2001 CONTINUE
0968: *     end of the p-loop
0969: *     end of doing the block ( ibr, ibr )
0970:  1002 CONTINUE
0971: *     end of ir1-loop
0972: *
0973: * ... go to the off diagonal blocks
0974: *
0975:       igl = ( ibr - 1 ) * KBL + 1
0976: *
0977:       DO 2010 jbc = ibr + 1, NBL
0978: *
0979:          jgl = ( jbc - 1 ) * KBL + 1
0980: *
0981: *        doing the block at ( ibr, jbc )
0982: *
0983:          IJBLSK = 0
0984:          DO 2100 p = igl, MIN0( igl + KBL - 1, N )
0985: *
0986:          AAPP = SVA(p)
0987:          IF ( AAPP .GT. ZERO ) THEN
0988: *
0989:          PSKIPPED = 0
0990: *
0991:          DO 2200 q = jgl, MIN0( jgl + KBL - 1, N )
0992: *
0993:          AAQQ = SVA(q)
0994:          IF ( AAQQ .GT. ZERO ) THEN
0995:             AAPP0 = AAPP
0996: *
0997: *     -#- M x 2 Jacobi SVD -#-
0998: *
0999: *        Safe Gram matrix computation
1000: *
1001:          IF ( AAQQ .GE. ONE ) THEN
1002:             IF ( AAPP .GE. AAQQ ) THEN
1003:                ROTOK = ( SMALL*AAPP ) .LE. AAQQ
1004:             ELSE
1005:                ROTOK = ( SMALL*AAQQ ) .LE. AAPP
1006:             END IF
1007:             IF ( AAPP .LT. ( BIG / AAQQ ) ) THEN
1008:                AAPQ = ( DDOT(M, A(1,p), 1, A(1,q), 1 ) *
1009:      &                  WORK(p) * WORK(q) / AAQQ ) / AAPP
1010:             ELSE
1011:                CALL DCOPY( M, A(1,p), 1, WORK(N+1), 1 )
1012:                CALL DLASCL( 'G', 0, 0, AAPP, WORK(p), M,
1013:      &              1, WORK(N+1), LDA, IERR )
1014:                AAPQ = DDOT( M, WORK(N+1), 1, A(1,q), 1 ) *
1015:      &                WORK(q) / AAQQ
1016:             END IF
1017:          ELSE
1018:             IF ( AAPP .GE. AAQQ ) THEN
1019:                ROTOK = AAPP .LE. ( AAQQ / SMALL )
1020:             ELSE
1021:                ROTOK = AAQQ .LE. ( AAPP / SMALL )
1022:             END IF
1023:             IF ( AAPP .GT.  ( SMALL / AAQQ ) ) THEN
1024:                AAPQ = ( DDOT( M, A(1,p), 1, A(1,q), 1 ) *
1025:      &               WORK(p) * WORK(q) / AAQQ ) / AAPP
1026:             ELSE
1027:                CALL DCOPY( M, A(1,q), 1, WORK(N+1), 1 )
1028:                CALL DLASCL( 'G', 0, 0, AAQQ, WORK(q), M, 1,
1029:      &              WORK(N+1), LDA, IERR )
1030:                AAPQ = DDOT(M,WORK(N+1),1,A(1,p),1) * WORK(p) / AAPP
1031:             END IF
1032:          END IF
1033: *
1034:          MXAAPQ = DMAX1( MXAAPQ, DABS(AAPQ) )
1035: *
1036: *        TO rotate or NOT to rotate, THAT is the question ...
1037: *
1038:          IF ( DABS( AAPQ ) .GT. TOL ) THEN
1039:             NOTROT   = 0
1040: *[RTD]      ROTATED  = ROTATED + 1
1041:             PSKIPPED = 0
1042:             ISWROT   = ISWROT  + 1
1043: *
1044:             IF ( ROTOK ) THEN
1045: *
1046:                AQOAP = AAQQ / AAPP
1047:                APOAQ = AAPP / AAQQ
1048:                THETA = - HALF * DABS( AQOAP - APOAQ ) / AAPQ
1049:                IF ( AAQQ .GT. AAPP0 ) THETA = - THETA
1050: *
1051:                IF ( DABS( THETA ) .GT. BIGTHETA ) THEN
1052:                   T = HALF / THETA
1053:                   FASTR(3) =  T * WORK(p) / WORK(q)
1054:                   FASTR(4) = -T * WORK(q) / WORK(p)
1055:                   CALL DROTM( M,   A(1,p), 1, A(1,q), 1, FASTR )
1056:                   IF ( RSVEC )
1057:      &            CALL DROTM( MVL, V(1,p), 1, V(1,q), 1, FASTR )
1058:                   SVA(q) = AAQQ*DSQRT( DMAX1(ZERO,ONE + T*APOAQ*AAPQ) )
1059:                   AAPP   = AAPP*DSQRT( DMAX1(ZERO,ONE - T*AQOAP*AAPQ) )
1060:                   MXSINJ = DMAX1( MXSINJ, DABS(T) )
1061:                ELSE
1062: *
1063: *                 .. choose correct signum for THETA and rotate
1064: *
1065:                   THSIGN = - DSIGN(ONE,AAPQ)
1066:                   IF ( AAQQ .GT. AAPP0 ) THSIGN = - THSIGN
1067:                   T  = ONE / ( THETA + THSIGN*DSQRT(ONE+THETA*THETA) )
1068:                   CS = DSQRT( ONE / ( ONE + T*T ) )
1069:                   SN = T * CS
1070:                   MXSINJ = DMAX1( MXSINJ, DABS(SN) )
1071:                   SVA(q) = AAQQ*DSQRT( DMAX1(ZERO, ONE+T*APOAQ*AAPQ) )
1072:                   AAPP   = AAPP*DSQRT( ONE - T*AQOAP*AAPQ)
1073: *
1074:                   APOAQ = WORK(p) / WORK(q)
1075:                   AQOAP = WORK(q) / WORK(p)
1076:                   IF ( WORK(p) .GE. ONE ) THEN
1077: *
1078:                      IF ( WORK(q) .GE.  ONE ) THEN
1079:                         FASTR(3) =   T * APOAQ
1080:                         FASTR(4) = - T * AQOAP
1081:                         WORK(p)  = WORK(p) * CS
1082:                         WORK(q)  = WORK(q) * CS
1083:                         CALL DROTM( M,   A(1,p),1, A(1,q),1, FASTR )
1084:                         IF ( RSVEC )
1085:      &                  CALL DROTM( MVL, V(1,p),1, V(1,q),1, FASTR )
1086:                      ELSE
1087:                         CALL DAXPY( M,    -T*AQOAP, A(1,q),1, A(1,p),1 )
1088:                         CALL DAXPY( M, CS*SN*APOAQ, A(1,p),1, A(1,q),1 )
1089:                         IF ( RSVEC ) THEN
1090:                         CALL DAXPY( MVL, -T*AQOAP,  V(1,q),1, V(1,p),1 )
1091:                         CALL DAXPY( MVL,CS*SN*APOAQ,V(1,p),1, V(1,q),1 )
1092:                         END IF
1093:                         WORK(p) = WORK(p) * CS
1094:                         WORK(q) = WORK(q) / CS
1095:                      END IF
1096:                   ELSE
1097:                      IF ( WORK(q) .GE. ONE ) THEN
1098:                         CALL DAXPY( M,     T*APOAQ, A(1,p),1, A(1,q),1 )
1099:                         CALL DAXPY( M,-CS*SN*AQOAP, A(1,q),1, A(1,p),1 )
1100:                         IF ( RSVEC ) THEN
1101:                         CALL DAXPY(MVL,T*APOAQ,     V(1,p),1, V(1,q),1 )
1102:                         CALL DAXPY(MVL,-CS*SN*AQOAP,V(1,q),1, V(1,p),1 )
1103:                         END IF
1104:                         WORK(p) = WORK(p) / CS
1105:                         WORK(q) = WORK(q) * CS
1106:                      ELSE
1107:                         IF ( WORK(p) .GE. WORK(q) ) THEN
1108:                            CALL DAXPY( M,-T*AQOAP,   A(1,q),1,A(1,p),1 )
1109:                            CALL DAXPY( M,CS*SN*APOAQ,A(1,p),1,A(1,q),1 )
1110:                            WORK(p) = WORK(p) * CS
1111:                            WORK(q) = WORK(q) / CS
1112:                            IF ( RSVEC ) THEN
1113:                            CALL DAXPY( MVL, -T*AQOAP, V(1,q),1,V(1,p),1)
1114:                            CALL DAXPY(MVL,CS*SN*APOAQ,V(1,p),1,V(1,q),1)
1115:                            END IF
1116:                         ELSE
1117:                            CALL DAXPY(M, T*APOAQ,    A(1,p),1,A(1,q),1)
1118:                            CALL DAXPY(M,-CS*SN*AQOAP,A(1,q),1,A(1,p),1)
1119:                            WORK(p) = WORK(p) / CS
1120:                            WORK(q) = WORK(q) * CS
1121:                           IF ( RSVEC ) THEN
1122:                           CALL DAXPY(MVL, T*APOAQ,    V(1,p),1,V(1,q),1)
1123:                           CALL DAXPY(MVL,-CS*SN*AQOAP,V(1,q),1,V(1,p),1)
1124:                           END IF
1125:                         END IF
1126:                      END IF
1127:                   ENDIF
1128:                END IF
1129: *
1130:             ELSE
1131:                IF ( AAPP .GT. AAQQ ) THEN
1132:                   CALL DCOPY( M, A(1,p), 1, WORK(N+1), 1 )
1133:                   CALL DLASCL('G',0,0,AAPP,ONE,M,1,WORK(N+1),LDA,IERR)
1134:                   CALL DLASCL('G',0,0,AAQQ,ONE,M,1,   A(1,q),LDA,IERR)
1135:                   TEMP1 = -AAPQ * WORK(p) / WORK(q)
1136:                   CALL DAXPY(M,TEMP1,WORK(N+1),1,A(1,q),1)
1137:                   CALL DLASCL('G',0,0,ONE,AAQQ,M,1,A(1,q),LDA,IERR)
1138:                   SVA(q) = AAQQ*DSQRT(DMAX1(ZERO, ONE - AAPQ*AAPQ))
1139:                   MXSINJ = DMAX1( MXSINJ, SFMIN )
1140:                ELSE
1141:                   CALL DCOPY( M, A(1,q), 1, WORK(N+1), 1 )
1142:                   CALL DLASCL('G',0,0,AAQQ,ONE,M,1,WORK(N+1),LDA,IERR)
1143:                   CALL DLASCL('G',0,0,AAPP,ONE,M,1,   A(1,p),LDA,IERR)
1144:                   TEMP1 = -AAPQ * WORK(q) / WORK(p)
1145:                   CALL DAXPY(M,TEMP1,WORK(N+1),1,A(1,p),1)
1146:                   CALL DLASCL('G',0,0,ONE,AAPP,M,1,A(1,p),LDA,IERR)
1147:                   SVA(p) = AAPP*DSQRT(DMAX1(ZERO, ONE - AAPQ*AAPQ))
1148:                   MXSINJ = DMAX1( MXSINJ, SFMIN )
1149:                END IF
1150:             END IF
1151: *           END IF ROTOK THEN ... ELSE
1152: *
1153: *           In the case of cancellation in updating SVA(q)
1154: *           .. recompute SVA(q)
1155:             IF ( (SVA(q) / AAQQ )**2 .LE. ROOTEPS  ) THEN
1156:                IF ((AAQQ .LT. ROOTBIG).AND.(AAQQ .GT. ROOTSFMIN)) THEN
1157:                   SVA(q) = DNRM2( M, A(1,q), 1 ) * WORK(q)
1158:                ELSE
1159:                   T    = ZERO
1160:                   AAQQ = ZERO
1161:                   CALL DLASSQ( M, A(1,q), 1, T, AAQQ )
1162:                   SVA(q) = T * DSQRT(AAQQ) * WORK(q)
1163:                END IF
1164:             END IF
1165:             IF ( (AAPP / AAPP0 )**2 .LE. ROOTEPS  ) THEN
1166:                IF ((AAPP .LT. ROOTBIG).AND.(AAPP .GT. ROOTSFMIN)) THEN
1167:                   AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)
1168:                ELSE
1169:                   T    = ZERO
1170:                   AAPP = ZERO
1171:                   CALL DLASSQ( M, A(1,p), 1, T, AAPP )
1172:                   AAPP = T * DSQRT(AAPP) * WORK(p)
1173:                END IF
1174:                SVA(p) = AAPP
1175:             END IF
1176: *              end of OK rotation
1177:          ELSE
1178:             NOTROT   = NOTROT   + 1
1179: *[RTD]      SKIPPED  = SKIPPED  + 1
1180:             PSKIPPED = PSKIPPED + 1
1181:             IJBLSK   = IJBLSK   + 1
1182:          END IF
1183:       ELSE
1184:          NOTROT   = NOTROT   + 1
1185:          PSKIPPED = PSKIPPED + 1
1186:          IJBLSK   = IJBLSK   + 1
1187:       END IF
1188: *
1189:       IF ( ( i .LE. SWBAND ) .AND. ( IJBLSK .GE. BLSKIP ) ) THEN
1190:          SVA(p) = AAPP
1191:          NOTROT = 0
1192:          GO TO 2011
1193:       END IF
1194:       IF ( ( i .LE. SWBAND ) .AND. ( PSKIPPED .GT. ROWSKIP ) ) THEN
1195:          AAPP = -AAPP
1196:          NOTROT = 0
1197:          GO TO 2203
1198:       END IF
1199: *
1200:  2200    CONTINUE
1201: *        end of the q-loop
1202:  2203    CONTINUE
1203: *
1204:          SVA(p) = AAPP
1205: *
1206:       ELSE
1207: *
1208:          IF ( AAPP .EQ. ZERO ) NOTROT=NOTROT+MIN0(jgl+KBL-1,N)-jgl+1
1209:          IF ( AAPP .LT. ZERO ) NOTROT = 0
1210: *
1211:       END IF
1212: *
1213:  2100 CONTINUE
1214: *     end of the p-loop
1215:  2010 CONTINUE
1216: *     end of the jbc-loop
1217:  2011 CONTINUE
1218: *2011 bailed out of the jbc-loop
1219:       DO 2012 p = igl, MIN0( igl + KBL - 1, N )
1220:          SVA(p) = DABS(SVA(p))
1221:  2012 CONTINUE
1222: ***
1223:  2000 CONTINUE
1224: *2000 :: end of the ibr-loop
1225: *
1226: *     .. update SVA(N)
1227:       IF ((SVA(N) .LT. ROOTBIG).AND.(SVA(N) .GT. ROOTSFMIN)) THEN
1228:          SVA(N) = DNRM2( M, A(1,N), 1 ) * WORK(N)
1229:       ELSE
1230:          T    = ZERO
1231:          AAPP = ZERO
1232:          CALL DLASSQ( M, A(1,N), 1, T, AAPP )
1233:          SVA(N) = T * DSQRT(AAPP) * WORK(N)
1234:       END IF
1235: *
1236: *     Additional steering devices
1237: *
1238:       IF ( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR.
1239:      &     ( ISWROT .LE. N ) ) )
1240:      &   SWBAND = i
1241: *
1242:       IF ( (i .GT. SWBAND+1) .AND. (MXAAPQ .LT. DSQRT(DBLE(N))*TOL)
1243:      &   .AND. (DBLE(N)*MXAAPQ*MXSINJ .LT. TOL) ) THEN
1244:         GO TO 1994
1245:       END IF
1246: *
1247:       IF ( NOTROT .GE. EMPTSW ) GO TO 1994
1248: *
1249:  1993 CONTINUE
1250: *     end i=1:NSWEEP loop
1251: *
1252: * #:( Reaching this point means that the procedure has not converged.
1253:       INFO = NSWEEP - 1
1254:       GO TO 1995
1255: *
1256:  1994 CONTINUE
1257: * #:) Reaching this point means numerical convergence after the i-th
1258: *     sweep.
1259: *
1260:       INFO = 0
1261: * #:) INFO = 0 confirms successful iterations.
1262:  1995 CONTINUE
1263: *
1264: *     Sort the singular values and find how many are above
1265: *     the underflow threshold.
1266: *
1267:       N2 = 0
1268:       N4 = 0
1269:       DO 5991 p = 1, N - 1
1270:          q = IDAMAX( N-p+1, SVA(p), 1 ) + p - 1
1271:          IF ( p .NE. q ) THEN
1272:             TEMP1  = SVA(p)
1273:             SVA(p) = SVA(q)
1274:             SVA(q) = TEMP1
1275:             TEMP1   = WORK(p)
1276:             WORK(p) = WORK(q)
1277:             WORK(q) = TEMP1
1278:             CALL DSWAP( M, A(1,p), 1, A(1,q), 1 )
1279:             IF ( RSVEC ) CALL DSWAP( MVL, V(1,p), 1, V(1,q), 1 )
1280:          END IF
1281:          IF ( SVA(p) .NE. ZERO ) THEN
1282:             N4 = N4 + 1
1283:             IF ( SVA(p)*SCALE .GT. SFMIN ) N2 = N2 + 1
1284:          END IF
1285:  5991 CONTINUE
1286:       IF ( SVA(N) .NE. ZERO ) THEN
1287:          N4 = N4 + 1
1288:          IF ( SVA(N)*SCALE .GT. SFMIN ) N2 = N2 + 1
1289:       END IF
1290: *
1291: *     Normalize the left singular vectors.
1292: *
1293:       IF ( LSVEC .OR. UCTOL ) THEN
1294:          DO 1998 p = 1, N2
1295:             CALL DSCAL( M, WORK(p) / SVA(p), A(1,p), 1 )
1296:  1998    CONTINUE
1297:       END IF
1298: *
1299: *     Scale the product of Jacobi rotations (assemble the fast rotations).
1300: *
1301:       IF ( RSVEC ) THEN
1302:          IF ( APPLV ) THEN
1303:             DO 2398 p = 1, N
1304:                CALL DSCAL( MVL, WORK(p), V(1,p), 1 )
1305:  2398       CONTINUE
1306:          ELSE
1307:             DO 2399 p = 1, N
1308:                TEMP1 = ONE / DNRM2(MVL, V(1,p), 1 )
1309:                CALL DSCAL( MVL, TEMP1, V(1,p), 1 )
1310:  2399       CONTINUE
1311:          END IF
1312:       END IF
1313: *
1314: *     Undo scaling, if necessary (and possible).
1315:       IF ( ((SCALE.GT.ONE).AND.(SVA(1).LT.(BIG/SCALE)))
1316:      & .OR.((SCALE.LT.ONE).AND.(SVA(N2).GT.(SFMIN/SCALE))) ) THEN
1317:          DO 2400 p = 1, N
1318:             SVA(p) = SCALE*SVA(p)
1319:  2400    CONTINUE
1320:          SCALE = ONE
1321:       END IF
1322: *
1323:       WORK(1) = SCALE
1324: *     The singular values of A are SCALE*SVA(1:N). If SCALE.NE.ONE
1325: *     then some of the singular values may overflow or underflow and
1326: *     the spectrum is given in this factored representation.
1327: *
1328:       WORK(2) = DBLE(N4)
1329: *     N4 is the number of computed nonzero singular values of A.
1330: *
1331:       WORK(3) = DBLE(N2)
1332: *     N2 is the number of singular values of A greater than SFMIN.
1333: *     If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1334: *     that may carry some information.
1335: *
1336:       WORK(4) = DBLE(i)
1337: *     i is the index of the last sweep before declaring convergence.
1338: *
1339:       WORK(5) = MXAAPQ
1340: *     MXAAPQ is the largest absolute value of scaled pivots in the
1341: *     last sweep
1342: *
1343:       WORK(6) = MXSINJ
1344: *     MXSINJ is the largest absolute value of the sines of Jacobi angles
1345: *     in the last sweep
1346: *
1347:       RETURN
1348: *     ..
1349: *     .. END OF DGESVJ
1350: *     ..
1351:       END
1352: *
1353: