LAPACK  3.10.1
LAPACK: Linear Algebra PACKage

◆ dgesvj()

subroutine dgesvj ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
integer  M,
integer  N,
double precision, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( n )  SVA,
integer  MV,
double precision, dimension( ldv, * )  V,
integer  LDV,
double precision, dimension( lwork )  WORK,
integer  LWORK,
integer  INFO 
)

DGESVJ

Download DGESVJ + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGESVJ computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, where M >= N. The SVD of A is written as
                                    [++]   [xx]   [x0]   [xx]
              A = U * SIGMA * V^t,  [++] = [xx] * [ox] * [xx]
                                    [++]   [xx]
 where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
 matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
 of SIGMA are the singular values of A. The columns of U and V are the
 left and the right singular vectors of A, respectively.
 DGESVJ can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.
Parameters
[in]JOBA
          JOBA is CHARACTER*1
          Specifies the structure of A.
          = 'L': The input matrix A is lower triangular;
          = 'U': The input matrix A is upper triangular;
          = 'G': The input matrix A is general M-by-N matrix, M >= N.
[in]JOBU
          JOBU is CHARACTER*1
          Specifies whether to compute the left singular vectors
          (columns of U):
          = 'U': The left singular vectors corresponding to the nonzero
                 singular values are computed and returned in the leading
                 columns of A. See more details in the description of A.
                 The default numerical orthogonality threshold is set to
                 approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
          = 'C': Analogous to JOBU='U', except that user can control the
                 level of numerical orthogonality of the computed left
                 singular vectors. TOL can be set to TOL = CTOL*EPS, where
                 CTOL is given on input in the array WORK.
                 No CTOL smaller than ONE is allowed. CTOL greater
                 than 1 / EPS is meaningless. The option 'C'
                 can be used if M*EPS is satisfactory orthogonality
                 of the computed left singular vectors, so CTOL=M could
                 save few sweeps of Jacobi rotations.
                 See the descriptions of A and WORK(1).
          = 'N': The matrix U is not computed. However, see the
                 description of A.
[in]JOBV
          JOBV is CHARACTER*1
          Specifies whether to compute the right singular vectors, that
          is, the matrix V:
          = 'V':  the matrix V is computed and returned in the array V
          = 'A':  the Jacobi rotations are applied to the MV-by-N
                  array V. In other words, the right singular vector
                  matrix V is not computed explicitly, instead it is
                  applied to an MV-by-N matrix initially stored in the
                  first MV rows of V.
          = 'N':  the matrix V is not computed and the array V is not
                  referenced
[in]M
          M is INTEGER
          The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.
          M >= N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit :
          If JOBU = 'U' .OR. JOBU = 'C' :
                 If INFO = 0 :
                 RANKA orthonormal columns of U are returned in the
                 leading RANKA columns of the array A. Here RANKA <= N
                 is the number of computed singular values of A that are
                 above the underflow threshold DLAMCH('S'). The singular
                 vectors corresponding to underflowed or zero singular
                 values are not computed. The value of RANKA is returned
                 in the array WORK as RANKA=NINT(WORK(2)). Also see the
                 descriptions of SVA and WORK. The computed columns of U
                 are mutually numerically orthogonal up to approximately
                 TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'),
                 see the description of JOBU.
                 If INFO > 0 :
                 the procedure DGESVJ did not converge in the given number
                 of iterations (sweeps). In that case, the computed
                 columns of U may not be orthogonal up to TOL. The output
                 U (stored in A), SIGMA (given by the computed singular
                 values in SVA(1:N)) and V is still a decomposition of the
                 input matrix A in the sense that the residual
                 ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.

          If JOBU = 'N' :
                 If INFO = 0 :
                 Note that the left singular vectors are 'for free' in the
                 one-sided Jacobi SVD algorithm. However, if only the
                 singular values are needed, the level of numerical
                 orthogonality of U is not an issue and iterations are
                 stopped when the columns of the iterated matrix are
                 numerically orthogonal up to approximately M*EPS. Thus,
                 on exit, A contains the columns of U scaled with the
                 corresponding singular values.
                 If INFO > 0 :
                 the procedure DGESVJ did not converge in the given number
                 of iterations (sweeps).
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit :
          If INFO = 0 :
          depending on the value SCALE = WORK(1), we have:
                 If SCALE = ONE :
                 SVA(1:N) contains the computed singular values of A.
                 During the computation SVA contains the Euclidean column
                 norms of the iterated matrices in the array A.
                 If SCALE .NE. ONE :
                 The singular values of A are SCALE*SVA(1:N), and this
                 factored representation is due to the fact that some of the
                 singular values of A might underflow or overflow.
          If INFO > 0 :
          the procedure DGESVJ did not converge in the given number of
          iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
[in]MV
          MV is INTEGER
          If JOBV = 'A', then the product of Jacobi rotations in DGESVJ
          is applied to the first MV rows of V. See the description of JOBV.
[in,out]V
          V is DOUBLE PRECISION array, dimension (LDV,N)
          If JOBV = 'V', then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'A', then V contains the product of the computed right
                         singular vector matrix and the initial matrix in
                         the array V.
          If JOBV = 'N', then V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V, LDV >= 1.
          If JOBV = 'V', then LDV >= max(1,N).
          If JOBV = 'A', then LDV >= max(1,MV) .
[in,out]WORK
          WORK is DOUBLE PRECISION array, dimension (LWORK)
          On entry :
          If JOBU = 'C' :
          WORK(1) = CTOL, where CTOL defines the threshold for convergence.
                    The process stops if all columns of A are mutually
                    orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
                    It is required that CTOL >= ONE, i.e. it is not
                    allowed to force the routine to obtain orthogonality
                    below EPS.
          On exit :
          WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
                    are the computed singular values of A.
                    (See description of SVA().)
          WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
                    singular values.
          WORK(3) = NINT(WORK(3)) is the number of the computed singular
                    values that are larger than the underflow threshold.
          WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
                    rotations needed for numerical convergence.
          WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
                    This is useful information in cases when DGESVJ did
                    not converge, as it can be used to estimate whether
                    the output is still useful and for post festum analysis.
          WORK(6) = the largest absolute value over all sines of the
                    Jacobi rotation angles in the last sweep. It can be
                    useful for a post festum analysis.
[in]LWORK
          LWORK is INTEGER
          length of WORK, WORK >= MAX(6,M+N)
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, then the i-th argument had an illegal value
          > 0:  DGESVJ did not converge in the maximal allowed number (30)
                of sweeps. The output may still be useful. See the
                description of WORK.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
  The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
  rotations. The rotations are implemented as fast scaled rotations of
  Anda and Park [1]. In the case of underflow of the Jacobi angle, a
  modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
  column interchanges of de Rijk [2]. The relative accuracy of the computed
  singular values and the accuracy of the computed singular vectors (in
  angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
  The condition number that determines the accuracy in the full rank case
  is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
  spectral condition number. The best performance of this Jacobi SVD
  procedure is achieved if used in an  accelerated version of Drmac and
  Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
  Some tuning parameters (marked with [TP]) are available for the
  implementer.
  The computational range for the nonzero singular values is the  machine
  number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
  denormalized singular values can be computed with the corresponding
  gradual loss of accurate digits.
Contributors:
  ============

  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
 [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
     SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
 [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
     singular value decomposition on a vector computer.
     SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
 [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
 [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
     value computation in floating point arithmetic.
     SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
 [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
  ===========================
  Please report all bugs and send interesting test examples and comments to
  drmac@math.hr. Thank you.

Definition at line 335 of file dgesvj.f.

337 *
338 * -- LAPACK computational routine --
339 * -- LAPACK is a software package provided by Univ. of Tennessee, --
340 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
341 *
342 * .. Scalar Arguments ..
343  INTEGER INFO, LDA, LDV, LWORK, M, MV, N
344  CHARACTER*1 JOBA, JOBU, JOBV
345 * ..
346 * .. Array Arguments ..
347  DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
348  $ WORK( LWORK )
349 * ..
350 *
351 * =====================================================================
352 *
353 * .. Local Parameters ..
354  DOUBLE PRECISION ZERO, HALF, ONE
355  parameter( zero = 0.0d0, half = 0.5d0, one = 1.0d0)
356  INTEGER NSWEEP
357  parameter( nsweep = 30 )
358 * ..
359 * .. Local Scalars ..
360  DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
361  $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
362  $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
363  $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
364  $ THSIGN, TOL
365  INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
366  $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
367  $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
368  $ SWBAND
369  LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
370  $ RSVEC, UCTOL, UPPER
371 * ..
372 * .. Local Arrays ..
373  DOUBLE PRECISION FASTR( 5 )
374 * ..
375 * .. Intrinsic Functions ..
376  INTRINSIC dabs, max, min, dble, dsign, dsqrt
377 * ..
378 * .. External Functions ..
379 * ..
380 * from BLAS
381  DOUBLE PRECISION DDOT, DNRM2
382  EXTERNAL ddot, dnrm2
383  INTEGER IDAMAX
384  EXTERNAL idamax
385 * from LAPACK
386  DOUBLE PRECISION DLAMCH
387  EXTERNAL dlamch
388  LOGICAL LSAME
389  EXTERNAL lsame
390 * ..
391 * .. External Subroutines ..
392 * ..
393 * from BLAS
394  EXTERNAL daxpy, dcopy, drotm, dscal, dswap
395 * from LAPACK
396  EXTERNAL dlascl, dlaset, dlassq, xerbla
397 *
398  EXTERNAL dgsvj0, dgsvj1
399 * ..
400 * .. Executable Statements ..
401 *
402 * Test the input arguments
403 *
404  lsvec = lsame( jobu, 'U' )
405  uctol = lsame( jobu, 'C' )
406  rsvec = lsame( jobv, 'V' )
407  applv = lsame( jobv, 'A' )
408  upper = lsame( joba, 'U' )
409  lower = lsame( joba, 'L' )
410 *
411  IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
412  info = -1
413  ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
414  info = -2
415  ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
416  info = -3
417  ELSE IF( m.LT.0 ) THEN
418  info = -4
419  ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
420  info = -5
421  ELSE IF( lda.LT.m ) THEN
422  info = -7
423  ELSE IF( mv.LT.0 ) THEN
424  info = -9
425  ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
426  $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
427  info = -11
428  ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
429  info = -12
430  ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
431  info = -13
432  ELSE
433  info = 0
434  END IF
435 *
436 * #:(
437  IF( info.NE.0 ) THEN
438  CALL xerbla( 'DGESVJ', -info )
439  RETURN
440  END IF
441 *
442 * #:) Quick return for void matrix
443 *
444  IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
445 *
446 * Set numerical parameters
447 * The stopping criterion for Jacobi rotations is
448 *
449 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
450 *
451 * where EPS is the round-off and CTOL is defined as follows:
452 *
453  IF( uctol ) THEN
454 * ... user controlled
455  ctol = work( 1 )
456  ELSE
457 * ... default
458  IF( lsvec .OR. rsvec .OR. applv ) THEN
459  ctol = dsqrt( dble( m ) )
460  ELSE
461  ctol = dble( m )
462  END IF
463  END IF
464 * ... and the machine dependent parameters are
465 *[!] (Make sure that DLAMCH() works properly on the target machine.)
466 *
467  epsln = dlamch( 'Epsilon' )
468  rooteps = dsqrt( epsln )
469  sfmin = dlamch( 'SafeMinimum' )
470  rootsfmin = dsqrt( sfmin )
471  small = sfmin / epsln
472  big = dlamch( 'Overflow' )
473 * BIG = ONE / SFMIN
474  rootbig = one / rootsfmin
475  large = big / dsqrt( dble( m*n ) )
476  bigtheta = one / rooteps
477 *
478  tol = ctol*epsln
479  roottol = dsqrt( tol )
480 *
481  IF( dble( m )*epsln.GE.one ) THEN
482  info = -4
483  CALL xerbla( 'DGESVJ', -info )
484  RETURN
485  END IF
486 *
487 * Initialize the right singular vector matrix.
488 *
489  IF( rsvec ) THEN
490  mvl = n
491  CALL dlaset( 'A', mvl, n, zero, one, v, ldv )
492  ELSE IF( applv ) THEN
493  mvl = mv
494  END IF
495  rsvec = rsvec .OR. applv
496 *
497 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
498 *(!) If necessary, scale A to protect the largest singular value
499 * from overflow. It is possible that saving the largest singular
500 * value destroys the information about the small ones.
501 * This initial scaling is almost minimal in the sense that the
502 * goal is to make sure that no column norm overflows, and that
503 * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
504 * in A are detected, the procedure returns with INFO=-6.
505 *
506  skl= one / dsqrt( dble( m )*dble( n ) )
507  noscale = .true.
508  goscale = .true.
509 *
510  IF( lower ) THEN
511 * the input matrix is M-by-N lower triangular (trapezoidal)
512  DO 1874 p = 1, n
513  aapp = zero
514  aaqq = one
515  CALL dlassq( m-p+1, a( p, p ), 1, aapp, aaqq )
516  IF( aapp.GT.big ) THEN
517  info = -6
518  CALL xerbla( 'DGESVJ', -info )
519  RETURN
520  END IF
521  aaqq = dsqrt( aaqq )
522  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
523  sva( p ) = aapp*aaqq
524  ELSE
525  noscale = .false.
526  sva( p ) = aapp*( aaqq*skl)
527  IF( goscale ) THEN
528  goscale = .false.
529  DO 1873 q = 1, p - 1
530  sva( q ) = sva( q )*skl
531  1873 CONTINUE
532  END IF
533  END IF
534  1874 CONTINUE
535  ELSE IF( upper ) THEN
536 * the input matrix is M-by-N upper triangular (trapezoidal)
537  DO 2874 p = 1, n
538  aapp = zero
539  aaqq = one
540  CALL dlassq( p, a( 1, p ), 1, aapp, aaqq )
541  IF( aapp.GT.big ) THEN
542  info = -6
543  CALL xerbla( 'DGESVJ', -info )
544  RETURN
545  END IF
546  aaqq = dsqrt( aaqq )
547  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
548  sva( p ) = aapp*aaqq
549  ELSE
550  noscale = .false.
551  sva( p ) = aapp*( aaqq*skl)
552  IF( goscale ) THEN
553  goscale = .false.
554  DO 2873 q = 1, p - 1
555  sva( q ) = sva( q )*skl
556  2873 CONTINUE
557  END IF
558  END IF
559  2874 CONTINUE
560  ELSE
561 * the input matrix is M-by-N general dense
562  DO 3874 p = 1, n
563  aapp = zero
564  aaqq = one
565  CALL dlassq( m, a( 1, p ), 1, aapp, aaqq )
566  IF( aapp.GT.big ) THEN
567  info = -6
568  CALL xerbla( 'DGESVJ', -info )
569  RETURN
570  END IF
571  aaqq = dsqrt( aaqq )
572  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
573  sva( p ) = aapp*aaqq
574  ELSE
575  noscale = .false.
576  sva( p ) = aapp*( aaqq*skl)
577  IF( goscale ) THEN
578  goscale = .false.
579  DO 3873 q = 1, p - 1
580  sva( q ) = sva( q )*skl
581  3873 CONTINUE
582  END IF
583  END IF
584  3874 CONTINUE
585  END IF
586 *
587  IF( noscale )skl= one
588 *
589 * Move the smaller part of the spectrum from the underflow threshold
590 *(!) Start by determining the position of the nonzero entries of the
591 * array SVA() relative to ( SFMIN, BIG ).
592 *
593  aapp = zero
594  aaqq = big
595  DO 4781 p = 1, n
596  IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
597  aapp = max( aapp, sva( p ) )
598  4781 CONTINUE
599 *
600 * #:) Quick return for zero matrix
601 *
602  IF( aapp.EQ.zero ) THEN
603  IF( lsvec )CALL dlaset( 'G', m, n, zero, one, a, lda )
604  work( 1 ) = one
605  work( 2 ) = zero
606  work( 3 ) = zero
607  work( 4 ) = zero
608  work( 5 ) = zero
609  work( 6 ) = zero
610  RETURN
611  END IF
612 *
613 * #:) Quick return for one-column matrix
614 *
615  IF( n.EQ.1 ) THEN
616  IF( lsvec )CALL dlascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
617  $ a( 1, 1 ), lda, ierr )
618  work( 1 ) = one / skl
619  IF( sva( 1 ).GE.sfmin ) THEN
620  work( 2 ) = one
621  ELSE
622  work( 2 ) = zero
623  END IF
624  work( 3 ) = zero
625  work( 4 ) = zero
626  work( 5 ) = zero
627  work( 6 ) = zero
628  RETURN
629  END IF
630 *
631 * Protect small singular values from underflow, and try to
632 * avoid underflows/overflows in computing Jacobi rotations.
633 *
634  sn = dsqrt( sfmin / epsln )
635  temp1 = dsqrt( big / dble( n ) )
636  IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
637  $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
638  temp1 = min( big, temp1 / aapp )
639 * AAQQ = AAQQ*TEMP1
640 * AAPP = AAPP*TEMP1
641  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
642  temp1 = min( sn / aaqq, big / ( aapp*dsqrt( dble( n ) ) ) )
643 * AAQQ = AAQQ*TEMP1
644 * AAPP = AAPP*TEMP1
645  ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
646  temp1 = max( sn / aaqq, temp1 / aapp )
647 * AAQQ = AAQQ*TEMP1
648 * AAPP = AAPP*TEMP1
649  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
650  temp1 = min( sn / aaqq, big / ( dsqrt( dble( n ) )*aapp ) )
651 * AAQQ = AAQQ*TEMP1
652 * AAPP = AAPP*TEMP1
653  ELSE
654  temp1 = one
655  END IF
656 *
657 * Scale, if necessary
658 *
659  IF( temp1.NE.one ) THEN
660  CALL dlascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
661  END IF
662  skl= temp1*skl
663  IF( skl.NE.one ) THEN
664  CALL dlascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
665  skl= one / skl
666  END IF
667 *
668 * Row-cyclic Jacobi SVD algorithm with column pivoting
669 *
670  emptsw = ( n*( n-1 ) ) / 2
671  notrot = 0
672  fastr( 1 ) = zero
673 *
674 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
675 * is initialized to identity. WORK is updated during fast scaled
676 * rotations.
677 *
678  DO 1868 q = 1, n
679  work( q ) = one
680  1868 CONTINUE
681 *
682 *
683  swband = 3
684 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
685 * if DGESVJ is used as a computational routine in the preconditioned
686 * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
687 * works on pivots inside a band-like region around the diagonal.
688 * The boundaries are determined dynamically, based on the number of
689 * pivots above a threshold.
690 *
691  kbl = min( 8, n )
692 *[TP] KBL is a tuning parameter that defines the tile size in the
693 * tiling of the p-q loops of pivot pairs. In general, an optimal
694 * value of KBL depends on the matrix dimensions and on the
695 * parameters of the computer's memory.
696 *
697  nbl = n / kbl
698  IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
699 *
700  blskip = kbl**2
701 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
702 *
703  rowskip = min( 5, kbl )
704 *[TP] ROWSKIP is a tuning parameter.
705 *
706  lkahead = 1
707 *[TP] LKAHEAD is a tuning parameter.
708 *
709 * Quasi block transformations, using the lower (upper) triangular
710 * structure of the input matrix. The quasi-block-cycling usually
711 * invokes cubic convergence. Big part of this cycle is done inside
712 * canonical subspaces of dimensions less than M.
713 *
714  IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
715 *[TP] The number of partition levels and the actual partition are
716 * tuning parameters.
717  n4 = n / 4
718  n2 = n / 2
719  n34 = 3*n4
720  IF( applv ) THEN
721  q = 0
722  ELSE
723  q = 1
724  END IF
725 *
726  IF( lower ) THEN
727 *
728 * This works very well on lower triangular matrices, in particular
729 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
730 * The idea is simple:
731 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
732 * [+ + 0 0] [0 0]
733 * [+ + x 0] actually work on [x 0] [x 0]
734 * [+ + x x] [x x]. [x x]
735 *
736  CALL dgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
737  $ work( n34+1 ), sva( n34+1 ), mvl,
738  $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
739  $ 2, work( n+1 ), lwork-n, ierr )
740 *
741  CALL dgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
742  $ work( n2+1 ), sva( n2+1 ), mvl,
743  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
744  $ work( n+1 ), lwork-n, ierr )
745 *
746  CALL dgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
747  $ work( n2+1 ), sva( n2+1 ), mvl,
748  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
749  $ work( n+1 ), lwork-n, ierr )
750 *
751  CALL dgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
752  $ work( n4+1 ), sva( n4+1 ), mvl,
753  $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
754  $ work( n+1 ), lwork-n, ierr )
755 *
756  CALL dgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
757  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
758  $ ierr )
759 *
760  CALL dgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
761  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
762  $ lwork-n, ierr )
763 *
764 *
765  ELSE IF( upper ) THEN
766 *
767 *
768  CALL dgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
769  $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
770  $ ierr )
771 *
772  CALL dgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
773  $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
774  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
775  $ ierr )
776 *
777  CALL dgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
778  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
779  $ lwork-n, ierr )
780 *
781  CALL dgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
782  $ work( n2+1 ), sva( n2+1 ), mvl,
783  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
784  $ work( n+1 ), lwork-n, ierr )
785 
786  END IF
787 *
788  END IF
789 *
790 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
791 *
792  DO 1993 i = 1, nsweep
793 *
794 * .. go go go ...
795 *
796  mxaapq = zero
797  mxsinj = zero
798  iswrot = 0
799 *
800  notrot = 0
801  pskipped = 0
802 *
803 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
804 * 1 <= p < q <= N. This is the first step toward a blocked implementation
805 * of the rotations. New implementation, based on block transformations,
806 * is under development.
807 *
808  DO 2000 ibr = 1, nbl
809 *
810  igl = ( ibr-1 )*kbl + 1
811 *
812  DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
813 *
814  igl = igl + ir1*kbl
815 *
816  DO 2001 p = igl, min( igl+kbl-1, n-1 )
817 *
818 * .. de Rijk's pivoting
819 *
820  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
821  IF( p.NE.q ) THEN
822  CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
823  IF( rsvec )CALL dswap( mvl, v( 1, p ), 1,
824  $ v( 1, q ), 1 )
825  temp1 = sva( p )
826  sva( p ) = sva( q )
827  sva( q ) = temp1
828  temp1 = work( p )
829  work( p ) = work( q )
830  work( q ) = temp1
831  END IF
832 *
833  IF( ir1.EQ.0 ) THEN
834 *
835 * Column norms are periodically updated by explicit
836 * norm computation.
837 * Caveat:
838 * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
839 * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
840 * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
841 * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
842 * Hence, DNRM2 cannot be trusted, not even in the case when
843 * the true norm is far from the under(over)flow boundaries.
844 * If properly implemented DNRM2 is available, the IF-THEN-ELSE
845 * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
846 *
847  IF( ( sva( p ).LT.rootbig ) .AND.
848  $ ( sva( p ).GT.rootsfmin ) ) THEN
849  sva( p ) = dnrm2( m, a( 1, p ), 1 )*work( p )
850  ELSE
851  temp1 = zero
852  aapp = one
853  CALL dlassq( m, a( 1, p ), 1, temp1, aapp )
854  sva( p ) = temp1*dsqrt( aapp )*work( p )
855  END IF
856  aapp = sva( p )
857  ELSE
858  aapp = sva( p )
859  END IF
860 *
861  IF( aapp.GT.zero ) THEN
862 *
863  pskipped = 0
864 *
865  DO 2002 q = p + 1, min( igl+kbl-1, n )
866 *
867  aaqq = sva( q )
868 *
869  IF( aaqq.GT.zero ) THEN
870 *
871  aapp0 = aapp
872  IF( aaqq.GE.one ) THEN
873  rotok = ( small*aapp ).LE.aaqq
874  IF( aapp.LT.( big / aaqq ) ) THEN
875  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
876  $ q ), 1 )*work( p )*work( q ) /
877  $ aaqq ) / aapp
878  ELSE
879  CALL dcopy( m, a( 1, p ), 1,
880  $ work( n+1 ), 1 )
881  CALL dlascl( 'G', 0, 0, aapp,
882  $ work( p ), m, 1,
883  $ work( n+1 ), lda, ierr )
884  aapq = ddot( m, work( n+1 ), 1,
885  $ a( 1, q ), 1 )*work( q ) / aaqq
886  END IF
887  ELSE
888  rotok = aapp.LE.( aaqq / small )
889  IF( aapp.GT.( small / aaqq ) ) THEN
890  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
891  $ q ), 1 )*work( p )*work( q ) /
892  $ aaqq ) / aapp
893  ELSE
894  CALL dcopy( m, a( 1, q ), 1,
895  $ work( n+1 ), 1 )
896  CALL dlascl( 'G', 0, 0, aaqq,
897  $ work( q ), m, 1,
898  $ work( n+1 ), lda, ierr )
899  aapq = ddot( m, work( n+1 ), 1,
900  $ a( 1, p ), 1 )*work( p ) / aapp
901  END IF
902  END IF
903 *
904  mxaapq = max( mxaapq, dabs( aapq ) )
905 *
906 * TO rotate or NOT to rotate, THAT is the question ...
907 *
908  IF( dabs( aapq ).GT.tol ) THEN
909 *
910 * .. rotate
911 *[RTD] ROTATED = ROTATED + ONE
912 *
913  IF( ir1.EQ.0 ) THEN
914  notrot = 0
915  pskipped = 0
916  iswrot = iswrot + 1
917  END IF
918 *
919  IF( rotok ) THEN
920 *
921  aqoap = aaqq / aapp
922  apoaq = aapp / aaqq
923  theta = -half*dabs(aqoap-apoaq)/aapq
924 *
925  IF( dabs( theta ).GT.bigtheta ) THEN
926 *
927  t = half / theta
928  fastr( 3 ) = t*work( p ) / work( q )
929  fastr( 4 ) = -t*work( q ) /
930  $ work( p )
931  CALL drotm( m, a( 1, p ), 1,
932  $ a( 1, q ), 1, fastr )
933  IF( rsvec )CALL drotm( mvl,
934  $ v( 1, p ), 1,
935  $ v( 1, q ), 1,
936  $ fastr )
937  sva( q ) = aaqq*dsqrt( max( zero,
938  $ one+t*apoaq*aapq ) )
939  aapp = aapp*dsqrt( max( zero,
940  $ one-t*aqoap*aapq ) )
941  mxsinj = max( mxsinj, dabs( t ) )
942 *
943  ELSE
944 *
945 * .. choose correct signum for THETA and rotate
946 *
947  thsign = -dsign( one, aapq )
948  t = one / ( theta+thsign*
949  $ dsqrt( one+theta*theta ) )
950  cs = dsqrt( one / ( one+t*t ) )
951  sn = t*cs
952 *
953  mxsinj = max( mxsinj, dabs( sn ) )
954  sva( q ) = aaqq*dsqrt( max( zero,
955  $ one+t*apoaq*aapq ) )
956  aapp = aapp*dsqrt( max( zero,
957  $ one-t*aqoap*aapq ) )
958 *
959  apoaq = work( p ) / work( q )
960  aqoap = work( q ) / work( p )
961  IF( work( p ).GE.one ) THEN
962  IF( work( q ).GE.one ) THEN
963  fastr( 3 ) = t*apoaq
964  fastr( 4 ) = -t*aqoap
965  work( p ) = work( p )*cs
966  work( q ) = work( q )*cs
967  CALL drotm( m, a( 1, p ), 1,
968  $ a( 1, q ), 1,
969  $ fastr )
970  IF( rsvec )CALL drotm( mvl,
971  $ v( 1, p ), 1, v( 1, q ),
972  $ 1, fastr )
973  ELSE
974  CALL daxpy( m, -t*aqoap,
975  $ a( 1, q ), 1,
976  $ a( 1, p ), 1 )
977  CALL daxpy( m, cs*sn*apoaq,
978  $ a( 1, p ), 1,
979  $ a( 1, q ), 1 )
980  work( p ) = work( p )*cs
981  work( q ) = work( q ) / cs
982  IF( rsvec ) THEN
983  CALL daxpy( mvl, -t*aqoap,
984  $ v( 1, q ), 1,
985  $ v( 1, p ), 1 )
986  CALL daxpy( mvl,
987  $ cs*sn*apoaq,
988  $ v( 1, p ), 1,
989  $ v( 1, q ), 1 )
990  END IF
991  END IF
992  ELSE
993  IF( work( q ).GE.one ) THEN
994  CALL daxpy( m, t*apoaq,
995  $ a( 1, p ), 1,
996  $ a( 1, q ), 1 )
997  CALL daxpy( m, -cs*sn*aqoap,
998  $ a( 1, q ), 1,
999  $ a( 1, p ), 1 )
1000  work( p ) = work( p ) / cs
1001  work( q ) = work( q )*cs
1002  IF( rsvec ) THEN
1003  CALL daxpy( mvl, t*apoaq,
1004  $ v( 1, p ), 1,
1005  $ v( 1, q ), 1 )
1006  CALL daxpy( mvl,
1007  $ -cs*sn*aqoap,
1008  $ v( 1, q ), 1,
1009  $ v( 1, p ), 1 )
1010  END IF
1011  ELSE
1012  IF( work( p ).GE.work( q ) )
1013  $ THEN
1014  CALL daxpy( m, -t*aqoap,
1015  $ a( 1, q ), 1,
1016  $ a( 1, p ), 1 )
1017  CALL daxpy( m, cs*sn*apoaq,
1018  $ a( 1, p ), 1,
1019  $ a( 1, q ), 1 )
1020  work( p ) = work( p )*cs
1021  work( q ) = work( q ) / cs
1022  IF( rsvec ) THEN
1023  CALL daxpy( mvl,
1024  $ -t*aqoap,
1025  $ v( 1, q ), 1,
1026  $ v( 1, p ), 1 )
1027  CALL daxpy( mvl,
1028  $ cs*sn*apoaq,
1029  $ v( 1, p ), 1,
1030  $ v( 1, q ), 1 )
1031  END IF
1032  ELSE
1033  CALL daxpy( m, t*apoaq,
1034  $ a( 1, p ), 1,
1035  $ a( 1, q ), 1 )
1036  CALL daxpy( m,
1037  $ -cs*sn*aqoap,
1038  $ a( 1, q ), 1,
1039  $ a( 1, p ), 1 )
1040  work( p ) = work( p ) / cs
1041  work( q ) = work( q )*cs
1042  IF( rsvec ) THEN
1043  CALL daxpy( mvl,
1044  $ t*apoaq, v( 1, p ),
1045  $ 1, v( 1, q ), 1 )
1046  CALL daxpy( mvl,
1047  $ -cs*sn*aqoap,
1048  $ v( 1, q ), 1,
1049  $ v( 1, p ), 1 )
1050  END IF
1051  END IF
1052  END IF
1053  END IF
1054  END IF
1055 *
1056  ELSE
1057 * .. have to use modified Gram-Schmidt like transformation
1058  CALL dcopy( m, a( 1, p ), 1,
1059  $ work( n+1 ), 1 )
1060  CALL dlascl( 'G', 0, 0, aapp, one, m,
1061  $ 1, work( n+1 ), lda,
1062  $ ierr )
1063  CALL dlascl( 'G', 0, 0, aaqq, one, m,
1064  $ 1, a( 1, q ), lda, ierr )
1065  temp1 = -aapq*work( p ) / work( q )
1066  CALL daxpy( m, temp1, work( n+1 ), 1,
1067  $ a( 1, q ), 1 )
1068  CALL dlascl( 'G', 0, 0, one, aaqq, m,
1069  $ 1, a( 1, q ), lda, ierr )
1070  sva( q ) = aaqq*dsqrt( max( zero,
1071  $ one-aapq*aapq ) )
1072  mxsinj = max( mxsinj, sfmin )
1073  END IF
1074 * END IF ROTOK THEN ... ELSE
1075 *
1076 * In the case of cancellation in updating SVA(q), SVA(p)
1077 * recompute SVA(q), SVA(p).
1078 *
1079  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1080  $ THEN
1081  IF( ( aaqq.LT.rootbig ) .AND.
1082  $ ( aaqq.GT.rootsfmin ) ) THEN
1083  sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1084  $ work( q )
1085  ELSE
1086  t = zero
1087  aaqq = one
1088  CALL dlassq( m, a( 1, q ), 1, t,
1089  $ aaqq )
1090  sva( q ) = t*dsqrt( aaqq )*work( q )
1091  END IF
1092  END IF
1093  IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1094  IF( ( aapp.LT.rootbig ) .AND.
1095  $ ( aapp.GT.rootsfmin ) ) THEN
1096  aapp = dnrm2( m, a( 1, p ), 1 )*
1097  $ work( p )
1098  ELSE
1099  t = zero
1100  aapp = one
1101  CALL dlassq( m, a( 1, p ), 1, t,
1102  $ aapp )
1103  aapp = t*dsqrt( aapp )*work( p )
1104  END IF
1105  sva( p ) = aapp
1106  END IF
1107 *
1108  ELSE
1109 * A(:,p) and A(:,q) already numerically orthogonal
1110  IF( ir1.EQ.0 )notrot = notrot + 1
1111 *[RTD] SKIPPED = SKIPPED + 1
1112  pskipped = pskipped + 1
1113  END IF
1114  ELSE
1115 * A(:,q) is zero column
1116  IF( ir1.EQ.0 )notrot = notrot + 1
1117  pskipped = pskipped + 1
1118  END IF
1119 *
1120  IF( ( i.LE.swband ) .AND.
1121  $ ( pskipped.GT.rowskip ) ) THEN
1122  IF( ir1.EQ.0 )aapp = -aapp
1123  notrot = 0
1124  GO TO 2103
1125  END IF
1126 *
1127  2002 CONTINUE
1128 * END q-LOOP
1129 *
1130  2103 CONTINUE
1131 * bailed out of q-loop
1132 *
1133  sva( p ) = aapp
1134 *
1135  ELSE
1136  sva( p ) = aapp
1137  IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1138  $ notrot = notrot + min( igl+kbl-1, n ) - p
1139  END IF
1140 *
1141  2001 CONTINUE
1142 * end of the p-loop
1143 * end of doing the block ( ibr, ibr )
1144  1002 CONTINUE
1145 * end of ir1-loop
1146 *
1147 * ... go to the off diagonal blocks
1148 *
1149  igl = ( ibr-1 )*kbl + 1
1150 *
1151  DO 2010 jbc = ibr + 1, nbl
1152 *
1153  jgl = ( jbc-1 )*kbl + 1
1154 *
1155 * doing the block at ( ibr, jbc )
1156 *
1157  ijblsk = 0
1158  DO 2100 p = igl, min( igl+kbl-1, n )
1159 *
1160  aapp = sva( p )
1161  IF( aapp.GT.zero ) THEN
1162 *
1163  pskipped = 0
1164 *
1165  DO 2200 q = jgl, min( jgl+kbl-1, n )
1166 *
1167  aaqq = sva( q )
1168  IF( aaqq.GT.zero ) THEN
1169  aapp0 = aapp
1170 *
1171 * .. M x 2 Jacobi SVD ..
1172 *
1173 * Safe Gram matrix computation
1174 *
1175  IF( aaqq.GE.one ) THEN
1176  IF( aapp.GE.aaqq ) THEN
1177  rotok = ( small*aapp ).LE.aaqq
1178  ELSE
1179  rotok = ( small*aaqq ).LE.aapp
1180  END IF
1181  IF( aapp.LT.( big / aaqq ) ) THEN
1182  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1183  $ q ), 1 )*work( p )*work( q ) /
1184  $ aaqq ) / aapp
1185  ELSE
1186  CALL dcopy( m, a( 1, p ), 1,
1187  $ work( n+1 ), 1 )
1188  CALL dlascl( 'G', 0, 0, aapp,
1189  $ work( p ), m, 1,
1190  $ work( n+1 ), lda, ierr )
1191  aapq = ddot( m, work( n+1 ), 1,
1192  $ a( 1, q ), 1 )*work( q ) / aaqq
1193  END IF
1194  ELSE
1195  IF( aapp.GE.aaqq ) THEN
1196  rotok = aapp.LE.( aaqq / small )
1197  ELSE
1198  rotok = aaqq.LE.( aapp / small )
1199  END IF
1200  IF( aapp.GT.( small / aaqq ) ) THEN
1201  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1202  $ q ), 1 )*work( p )*work( q ) /
1203  $ aaqq ) / aapp
1204  ELSE
1205  CALL dcopy( m, a( 1, q ), 1,
1206  $ work( n+1 ), 1 )
1207  CALL dlascl( 'G', 0, 0, aaqq,
1208  $ work( q ), m, 1,
1209  $ work( n+1 ), lda, ierr )
1210  aapq = ddot( m, work( n+1 ), 1,
1211  $ a( 1, p ), 1 )*work( p ) / aapp
1212  END IF
1213  END IF
1214 *
1215  mxaapq = max( mxaapq, dabs( aapq ) )
1216 *
1217 * TO rotate or NOT to rotate, THAT is the question ...
1218 *
1219  IF( dabs( aapq ).GT.tol ) THEN
1220  notrot = 0
1221 *[RTD] ROTATED = ROTATED + 1
1222  pskipped = 0
1223  iswrot = iswrot + 1
1224 *
1225  IF( rotok ) THEN
1226 *
1227  aqoap = aaqq / aapp
1228  apoaq = aapp / aaqq
1229  theta = -half*dabs(aqoap-apoaq)/aapq
1230  IF( aaqq.GT.aapp0 )theta = -theta
1231 *
1232  IF( dabs( theta ).GT.bigtheta ) THEN
1233  t = half / theta
1234  fastr( 3 ) = t*work( p ) / work( q )
1235  fastr( 4 ) = -t*work( q ) /
1236  $ work( p )
1237  CALL drotm( m, a( 1, p ), 1,
1238  $ a( 1, q ), 1, fastr )
1239  IF( rsvec )CALL drotm( mvl,
1240  $ v( 1, p ), 1,
1241  $ v( 1, q ), 1,
1242  $ fastr )
1243  sva( q ) = aaqq*dsqrt( max( zero,
1244  $ one+t*apoaq*aapq ) )
1245  aapp = aapp*dsqrt( max( zero,
1246  $ one-t*aqoap*aapq ) )
1247  mxsinj = max( mxsinj, dabs( t ) )
1248  ELSE
1249 *
1250 * .. choose correct signum for THETA and rotate
1251 *
1252  thsign = -dsign( one, aapq )
1253  IF( aaqq.GT.aapp0 )thsign = -thsign
1254  t = one / ( theta+thsign*
1255  $ dsqrt( one+theta*theta ) )
1256  cs = dsqrt( one / ( one+t*t ) )
1257  sn = t*cs
1258  mxsinj = max( mxsinj, dabs( sn ) )
1259  sva( q ) = aaqq*dsqrt( max( zero,
1260  $ one+t*apoaq*aapq ) )
1261  aapp = aapp*dsqrt( max( zero,
1262  $ one-t*aqoap*aapq ) )
1263 *
1264  apoaq = work( p ) / work( q )
1265  aqoap = work( q ) / work( p )
1266  IF( work( p ).GE.one ) THEN
1267 *
1268  IF( work( q ).GE.one ) THEN
1269  fastr( 3 ) = t*apoaq
1270  fastr( 4 ) = -t*aqoap
1271  work( p ) = work( p )*cs
1272  work( q ) = work( q )*cs
1273  CALL drotm( m, a( 1, p ), 1,
1274  $ a( 1, q ), 1,
1275  $ fastr )
1276  IF( rsvec )CALL drotm( mvl,
1277  $ v( 1, p ), 1, v( 1, q ),
1278  $ 1, fastr )
1279  ELSE
1280  CALL daxpy( m, -t*aqoap,
1281  $ a( 1, q ), 1,
1282  $ a( 1, p ), 1 )
1283  CALL daxpy( m, cs*sn*apoaq,
1284  $ a( 1, p ), 1,
1285  $ a( 1, q ), 1 )
1286  IF( rsvec ) THEN
1287  CALL daxpy( mvl, -t*aqoap,
1288  $ v( 1, q ), 1,
1289  $ v( 1, p ), 1 )
1290  CALL daxpy( mvl,
1291  $ cs*sn*apoaq,
1292  $ v( 1, p ), 1,
1293  $ v( 1, q ), 1 )
1294  END IF
1295  work( p ) = work( p )*cs
1296  work( q ) = work( q ) / cs
1297  END IF
1298  ELSE
1299  IF( work( q ).GE.one ) THEN
1300  CALL daxpy( m, t*apoaq,
1301  $ a( 1, p ), 1,
1302  $ a( 1, q ), 1 )
1303  CALL daxpy( m, -cs*sn*aqoap,
1304  $ a( 1, q ), 1,
1305  $ a( 1, p ), 1 )
1306  IF( rsvec ) THEN
1307  CALL daxpy( mvl, t*apoaq,
1308  $ v( 1, p ), 1,
1309  $ v( 1, q ), 1 )
1310  CALL daxpy( mvl,
1311  $ -cs*sn*aqoap,
1312  $ v( 1, q ), 1,
1313  $ v( 1, p ), 1 )
1314  END IF
1315  work( p ) = work( p ) / cs
1316  work( q ) = work( q )*cs
1317  ELSE
1318  IF( work( p ).GE.work( q ) )
1319  $ THEN
1320  CALL daxpy( m, -t*aqoap,
1321  $ a( 1, q ), 1,
1322  $ a( 1, p ), 1 )
1323  CALL daxpy( m, cs*sn*apoaq,
1324  $ a( 1, p ), 1,
1325  $ a( 1, q ), 1 )
1326  work( p ) = work( p )*cs
1327  work( q ) = work( q ) / cs
1328  IF( rsvec ) THEN
1329  CALL daxpy( mvl,
1330  $ -t*aqoap,
1331  $ v( 1, q ), 1,
1332  $ v( 1, p ), 1 )
1333  CALL daxpy( mvl,
1334  $ cs*sn*apoaq,
1335  $ v( 1, p ), 1,
1336  $ v( 1, q ), 1 )
1337  END IF
1338  ELSE
1339  CALL daxpy( m, t*apoaq,
1340  $ a( 1, p ), 1,
1341  $ a( 1, q ), 1 )
1342  CALL daxpy( m,
1343  $ -cs*sn*aqoap,
1344  $ a( 1, q ), 1,
1345  $ a( 1, p ), 1 )
1346  work( p ) = work( p ) / cs
1347  work( q ) = work( q )*cs
1348  IF( rsvec ) THEN
1349  CALL daxpy( mvl,
1350  $ t*apoaq, v( 1, p ),
1351  $ 1, v( 1, q ), 1 )
1352  CALL daxpy( mvl,
1353  $ -cs*sn*aqoap,
1354  $ v( 1, q ), 1,
1355  $ v( 1, p ), 1 )
1356  END IF
1357  END IF
1358  END IF
1359  END IF
1360  END IF
1361 *
1362  ELSE
1363  IF( aapp.GT.aaqq ) THEN
1364  CALL dcopy( m, a( 1, p ), 1,
1365  $ work( n+1 ), 1 )
1366  CALL dlascl( 'G', 0, 0, aapp, one,
1367  $ m, 1, work( n+1 ), lda,
1368  $ ierr )
1369  CALL dlascl( 'G', 0, 0, aaqq, one,
1370  $ m, 1, a( 1, q ), lda,
1371  $ ierr )
1372  temp1 = -aapq*work( p ) / work( q )
1373  CALL daxpy( m, temp1, work( n+1 ),
1374  $ 1, a( 1, q ), 1 )
1375  CALL dlascl( 'G', 0, 0, one, aaqq,
1376  $ m, 1, a( 1, q ), lda,
1377  $ ierr )
1378  sva( q ) = aaqq*dsqrt( max( zero,
1379  $ one-aapq*aapq ) )
1380  mxsinj = max( mxsinj, sfmin )
1381  ELSE
1382  CALL dcopy( m, a( 1, q ), 1,
1383  $ work( n+1 ), 1 )
1384  CALL dlascl( 'G', 0, 0, aaqq, one,
1385  $ m, 1, work( n+1 ), lda,
1386  $ ierr )
1387  CALL dlascl( 'G', 0, 0, aapp, one,
1388  $ m, 1, a( 1, p ), lda,
1389  $ ierr )
1390  temp1 = -aapq*work( q ) / work( p )
1391  CALL daxpy( m, temp1, work( n+1 ),
1392  $ 1, a( 1, p ), 1 )
1393  CALL dlascl( 'G', 0, 0, one, aapp,
1394  $ m, 1, a( 1, p ), lda,
1395  $ ierr )
1396  sva( p ) = aapp*dsqrt( max( zero,
1397  $ one-aapq*aapq ) )
1398  mxsinj = max( mxsinj, sfmin )
1399  END IF
1400  END IF
1401 * END IF ROTOK THEN ... ELSE
1402 *
1403 * In the case of cancellation in updating SVA(q)
1404 * .. recompute SVA(q)
1405  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1406  $ THEN
1407  IF( ( aaqq.LT.rootbig ) .AND.
1408  $ ( aaqq.GT.rootsfmin ) ) THEN
1409  sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1410  $ work( q )
1411  ELSE
1412  t = zero
1413  aaqq = one
1414  CALL dlassq( m, a( 1, q ), 1, t,
1415  $ aaqq )
1416  sva( q ) = t*dsqrt( aaqq )*work( q )
1417  END IF
1418  END IF
1419  IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1420  IF( ( aapp.LT.rootbig ) .AND.
1421  $ ( aapp.GT.rootsfmin ) ) THEN
1422  aapp = dnrm2( m, a( 1, p ), 1 )*
1423  $ work( p )
1424  ELSE
1425  t = zero
1426  aapp = one
1427  CALL dlassq( m, a( 1, p ), 1, t,
1428  $ aapp )
1429  aapp = t*dsqrt( aapp )*work( p )
1430  END IF
1431  sva( p ) = aapp
1432  END IF
1433 * end of OK rotation
1434  ELSE
1435  notrot = notrot + 1
1436 *[RTD] SKIPPED = SKIPPED + 1
1437  pskipped = pskipped + 1
1438  ijblsk = ijblsk + 1
1439  END IF
1440  ELSE
1441  notrot = notrot + 1
1442  pskipped = pskipped + 1
1443  ijblsk = ijblsk + 1
1444  END IF
1445 *
1446  IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1447  $ THEN
1448  sva( p ) = aapp
1449  notrot = 0
1450  GO TO 2011
1451  END IF
1452  IF( ( i.LE.swband ) .AND.
1453  $ ( pskipped.GT.rowskip ) ) THEN
1454  aapp = -aapp
1455  notrot = 0
1456  GO TO 2203
1457  END IF
1458 *
1459  2200 CONTINUE
1460 * end of the q-loop
1461  2203 CONTINUE
1462 *
1463  sva( p ) = aapp
1464 *
1465  ELSE
1466 *
1467  IF( aapp.EQ.zero )notrot = notrot +
1468  $ min( jgl+kbl-1, n ) - jgl + 1
1469  IF( aapp.LT.zero )notrot = 0
1470 *
1471  END IF
1472 *
1473  2100 CONTINUE
1474 * end of the p-loop
1475  2010 CONTINUE
1476 * end of the jbc-loop
1477  2011 CONTINUE
1478 *2011 bailed out of the jbc-loop
1479  DO 2012 p = igl, min( igl+kbl-1, n )
1480  sva( p ) = dabs( sva( p ) )
1481  2012 CONTINUE
1482 ***
1483  2000 CONTINUE
1484 *2000 :: end of the ibr-loop
1485 *
1486 * .. update SVA(N)
1487  IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1488  $ THEN
1489  sva( n ) = dnrm2( m, a( 1, n ), 1 )*work( n )
1490  ELSE
1491  t = zero
1492  aapp = one
1493  CALL dlassq( m, a( 1, n ), 1, t, aapp )
1494  sva( n ) = t*dsqrt( aapp )*work( n )
1495  END IF
1496 *
1497 * Additional steering devices
1498 *
1499  IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1500  $ ( iswrot.LE.n ) ) )swband = i
1501 *
1502  IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.dsqrt( dble( n ) )*
1503  $ tol ) .AND. ( dble( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1504  GO TO 1994
1505  END IF
1506 *
1507  IF( notrot.GE.emptsw )GO TO 1994
1508 *
1509  1993 CONTINUE
1510 * end i=1:NSWEEP loop
1511 *
1512 * #:( Reaching this point means that the procedure has not converged.
1513  info = nsweep - 1
1514  GO TO 1995
1515 *
1516  1994 CONTINUE
1517 * #:) Reaching this point means numerical convergence after the i-th
1518 * sweep.
1519 *
1520  info = 0
1521 * #:) INFO = 0 confirms successful iterations.
1522  1995 CONTINUE
1523 *
1524 * Sort the singular values and find how many are above
1525 * the underflow threshold.
1526 *
1527  n2 = 0
1528  n4 = 0
1529  DO 5991 p = 1, n - 1
1530  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
1531  IF( p.NE.q ) THEN
1532  temp1 = sva( p )
1533  sva( p ) = sva( q )
1534  sva( q ) = temp1
1535  temp1 = work( p )
1536  work( p ) = work( q )
1537  work( q ) = temp1
1538  CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1539  IF( rsvec )CALL dswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1540  END IF
1541  IF( sva( p ).NE.zero ) THEN
1542  n4 = n4 + 1
1543  IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1544  END IF
1545  5991 CONTINUE
1546  IF( sva( n ).NE.zero ) THEN
1547  n4 = n4 + 1
1548  IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1549  END IF
1550 *
1551 * Normalize the left singular vectors.
1552 *
1553  IF( lsvec .OR. uctol ) THEN
1554  DO 1998 p = 1, n2
1555  CALL dscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1556  1998 CONTINUE
1557  END IF
1558 *
1559 * Scale the product of Jacobi rotations (assemble the fast rotations).
1560 *
1561  IF( rsvec ) THEN
1562  IF( applv ) THEN
1563  DO 2398 p = 1, n
1564  CALL dscal( mvl, work( p ), v( 1, p ), 1 )
1565  2398 CONTINUE
1566  ELSE
1567  DO 2399 p = 1, n
1568  temp1 = one / dnrm2( mvl, v( 1, p ), 1 )
1569  CALL dscal( mvl, temp1, v( 1, p ), 1 )
1570  2399 CONTINUE
1571  END IF
1572  END IF
1573 *
1574 * Undo scaling, if necessary (and possible).
1575  IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl) ) )
1576  $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1577  $ ( sfmin / skl) ) ) ) THEN
1578  DO 2400 p = 1, n
1579  sva( p ) = skl*sva( p )
1580  2400 CONTINUE
1581  skl= one
1582  END IF
1583 *
1584  work( 1 ) = skl
1585 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1586 * then some of the singular values may overflow or underflow and
1587 * the spectrum is given in this factored representation.
1588 *
1589  work( 2 ) = dble( n4 )
1590 * N4 is the number of computed nonzero singular values of A.
1591 *
1592  work( 3 ) = dble( n2 )
1593 * N2 is the number of singular values of A greater than SFMIN.
1594 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1595 * that may carry some information.
1596 *
1597  work( 4 ) = dble( i )
1598 * i is the index of the last sweep before declaring convergence.
1599 *
1600  work( 5 ) = mxaapq
1601 * MXAAPQ is the largest absolute value of scaled pivots in the
1602 * last sweep
1603 *
1604  work( 6 ) = mxsinj
1605 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1606 * in the last sweep
1607 *
1608  RETURN
1609 * ..
1610 * .. END OF DGESVJ
1611 * ..
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:69
subroutine dlassq(n, x, incx, scl, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f90:137
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:143
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition: dlaset.f:110
integer function idamax(N, DX, INCX)
IDAMAX
Definition: idamax.f:71
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:60
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:53
subroutine drotm(N, DX, INCX, DY, INCY, DPARAM)
DROTM
Definition: drotm.f:96
subroutine dcopy(N, DX, INCX, DY, INCY)
DCOPY
Definition: dcopy.f:82
double precision function ddot(N, DX, INCX, DY, INCY)
DDOT
Definition: ddot.f:82
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:79
subroutine daxpy(N, DA, DX, INCX, DY, INCY)
DAXPY
Definition: daxpy.f:89
subroutine dswap(N, DX, INCX, DY, INCY)
DSWAP
Definition: dswap.f:82
subroutine dgsvj0(JOBV, M, N, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
DGSVJ0 pre-processor for the routine dgesvj.
Definition: dgsvj0.f:218
subroutine dgsvj1(JOBV, M, N, N1, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
DGSVJ1 pre-processor for the routine dgesvj, applies Jacobi rotations targeting only particular pivot...
Definition: dgsvj1.f:236
real(wp) function dnrm2(n, x, incx)
DNRM2
Definition: dnrm2.f90:89
Here is the call graph for this function:
Here is the caller graph for this function: