LAPACK  3.6.1
LAPACK: Linear Algebra PACKage
subroutine sgesvj ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
integer  M,
integer  N,
real, dimension( lda, * )  A,
integer  LDA,
real, dimension( n )  SVA,
integer  MV,
real, dimension( ldv, * )  V,
integer  LDV,
real, dimension( lwork )  WORK,
integer  LWORK,
integer  INFO 
)

SGESVJ

Download SGESVJ + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 SGESVJ computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, where M >= N. The SVD of A is written as
                                    [++]   [xx]   [x0]   [xx]
              A = U * SIGMA * V^t,  [++] = [xx] * [ox] * [xx]
                                    [++]   [xx]
 where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
 matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
 of SIGMA are the singular values of A. The columns of U and V are the
 left and the right singular vectors of A, respectively.
 SGESVJ can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.
Parameters
[in]JOBA
          JOBA is CHARACTER* 1
          Specifies the structure of A.
          = 'L': The input matrix A is lower triangular;
          = 'U': The input matrix A is upper triangular;
          = 'G': The input matrix A is general M-by-N matrix, M >= N.
[in]JOBU
          JOBU is CHARACTER*1
          Specifies whether to compute the left singular vectors
          (columns of U):
          = 'U': The left singular vectors corresponding to the nonzero
                 singular values are computed and returned in the leading
                 columns of A. See more details in the description of A.
                 The default numerical orthogonality threshold is set to
                 approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E').
          = 'C': Analogous to JOBU='U', except that user can control the
                 level of numerical orthogonality of the computed left
                 singular vectors. TOL can be set to TOL = CTOL*EPS, where
                 CTOL is given on input in the array WORK.
                 No CTOL smaller than ONE is allowed. CTOL greater
                 than 1 / EPS is meaningless. The option 'C'
                 can be used if M*EPS is satisfactory orthogonality
                 of the computed left singular vectors, so CTOL=M could
                 save few sweeps of Jacobi rotations.
                 See the descriptions of A and WORK(1).
          = 'N': The matrix U is not computed. However, see the
                 description of A.
[in]JOBV
          JOBV is CHARACTER*1
          Specifies whether to compute the right singular vectors, that
          is, the matrix V:
          = 'V' : the matrix V is computed and returned in the array V
          = 'A' : the Jacobi rotations are applied to the MV-by-N
                  array V. In other words, the right singular vector
                  matrix V is not computed explicitly; instead it is
                  applied to an MV-by-N matrix initially stored in the
                  first MV rows of V.
          = 'N' : the matrix V is not computed and the array V is not
                  referenced
[in]M
          M is INTEGER
          The number of rows of the input matrix A. 1/SLAMCH('E') > M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.
          M >= N >= 0.
[in,out]A
          A is REAL array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit,
          If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C':
                 If INFO .EQ. 0 :
                 RANKA orthonormal columns of U are returned in the
                 leading RANKA columns of the array A. Here RANKA <= N
                 is the number of computed singular values of A that are
                 above the underflow threshold SLAMCH('S'). The singular
                 vectors corresponding to underflowed or zero singular
                 values are not computed. The value of RANKA is returned
                 in the array WORK as RANKA=NINT(WORK(2)). Also see the
                 descriptions of SVA and WORK. The computed columns of U
                 are mutually numerically orthogonal up to approximately
                 TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
                 see the description of JOBU.
                 If INFO .GT. 0,
                 the procedure SGESVJ did not converge in the given number
                 of iterations (sweeps). In that case, the computed
                 columns of U may not be orthogonal up to TOL. The output
                 U (stored in A), SIGMA (given by the computed singular
                 values in SVA(1:N)) and V is still a decomposition of the
                 input matrix A in the sense that the residual
                 ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
          If JOBU .EQ. 'N':
                 If INFO .EQ. 0 :
                 Note that the left singular vectors are 'for free' in the
                 one-sided Jacobi SVD algorithm. However, if only the
                 singular values are needed, the level of numerical
                 orthogonality of U is not an issue and iterations are
                 stopped when the columns of the iterated matrix are
                 numerically orthogonal up to approximately M*EPS. Thus,
                 on exit, A contains the columns of U scaled with the
                 corresponding singular values.
                 If INFO .GT. 0 :
                 the procedure SGESVJ did not converge in the given number
                 of iterations (sweeps).
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is REAL array, dimension (N)
          On exit,
          If INFO .EQ. 0 :
          depending on the value SCALE = WORK(1), we have:
                 If SCALE .EQ. ONE:
                 SVA(1:N) contains the computed singular values of A.
                 During the computation SVA contains the Euclidean column
                 norms of the iterated matrices in the array A.
                 If SCALE .NE. ONE:
                 The singular values of A are SCALE*SVA(1:N), and this
                 factored representation is due to the fact that some of the
                 singular values of A might underflow or overflow.

          If INFO .GT. 0 :
          the procedure SGESVJ did not converge in the given number of
          iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
[in]MV
          MV is INTEGER
          If JOBV .EQ. 'A', then the product of Jacobi rotations in SGESVJ
          is applied to the first MV rows of V. See the description of JOBV.
[in,out]V
          V is REAL array, dimension (LDV,N)
          If JOBV = 'V', then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'A', then V contains the product of the computed right
                         singular vector matrix and the initial matrix in
                         the array V.
          If JOBV = 'N', then V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V, LDV .GE. 1.
          If JOBV .EQ. 'V', then LDV .GE. max(1,N).
          If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
[in,out]WORK
          WORK is REAL array, dimension max(4,M+N).
          On entry,
          If JOBU .EQ. 'C' :
          WORK(1) = CTOL, where CTOL defines the threshold for convergence.
                    The process stops if all columns of A are mutually
                    orthogonal up to CTOL*EPS, EPS=SLAMCH('E').
                    It is required that CTOL >= ONE, i.e. it is not
                    allowed to force the routine to obtain orthogonality
                    below EPSILON.
          On exit,
          WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
                    are the computed singular vcalues of A.
                    (See description of SVA().)
          WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
                    singular values.
          WORK(3) = NINT(WORK(3)) is the number of the computed singular
                    values that are larger than the underflow threshold.
          WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
                    rotations needed for numerical convergence.
          WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
                    This is useful information in cases when SGESVJ did
                    not converge, as it can be used to estimate whether
                    the output is stil useful and for post festum analysis.
          WORK(6) = the largest absolute value over all sines of the
                    Jacobi rotation angles in the last sweep. It can be
                    useful for a post festum analysis.
[in]LWORK
          LWORK is INTEGER
         length of WORK, WORK >= MAX(6,M+N)
[out]INFO
          INFO is INTEGER
          = 0 : successful exit.
          < 0 : if INFO = -i, then the i-th argument had an illegal value
          > 0 : SGESVJ did not converge in the maximal allowed number (30)
                of sweeps. The output may still be useful. See the
                description of WORK.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
November 2015
Further Details:
The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane rotations. The rotations are implemented as fast scaled rotations of Anda and Park [1]. In the case of underflow of the Jacobi angle, a modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses column interchanges of de Rijk [2]. The relative accuracy of the computed singular values and the accuracy of the computed singular vectors (in angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. The condition number that determines the accuracy in the full rank case is essentially min_{D=diag} kappa(A*D), where kappa(.) is the spectral condition number. The best performance of this Jacobi SVD procedure is achieved if used in an accelerated version of Drmac and Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. Some tunning parameters (marked with [TP]) are available for the implementer.
The computational range for the nonzero singular values is the machine number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even denormalized singular values can be computed with the corresponding gradual loss of accurate digits.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
[1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.

[2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer.
SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.

[3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
[4] Z. Drmac: Implementation of Jacobi rotations for accurate singular value computation in floating point arithmetic.
SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.

[5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
LAPACK Working note 169.

[6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
LAPACK Working note 170.

[7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008.
Bugs, Examples and Comments:
Please report all bugs and send interesting test examples and comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 325 of file sgesvj.f.

325 *
326 * -- LAPACK computational routine (version 3.6.0) --
327 * -- LAPACK is a software package provided by Univ. of Tennessee, --
328 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
329 * November 2015
330 *
331 * .. Scalar Arguments ..
332  INTEGER info, lda, ldv, lwork, m, mv, n
333  CHARACTER*1 joba, jobu, jobv
334 * ..
335 * .. Array Arguments ..
336  REAL a( lda, * ), sva( n ), v( ldv, * ),
337  $ work( lwork )
338 * ..
339 *
340 * =====================================================================
341 *
342 * .. Local Parameters ..
343  REAL zero, half, one
344  parameter ( zero = 0.0e0, half = 0.5e0, one = 1.0e0)
345  INTEGER nsweep
346  parameter ( nsweep = 30 )
347 * ..
348 * .. Local Scalars ..
349  REAL aapp, aapp0, aapq, aaqq, apoaq, aqoap, big,
350  $ bigtheta, cs, ctol, epsln, large, mxaapq,
351  $ mxsinj, rootbig, rooteps, rootsfmin, roottol,
352  $ skl, sfmin, small, sn, t, temp1, theta,
353  $ thsign, tol
354  INTEGER blskip, emptsw, i, ibr, ierr, igl, ijblsk, ir1,
355  $ iswrot, jbc, jgl, kbl, lkahead, mvl, n2, n34,
356  $ n4, nbl, notrot, p, pskipped, q, rowskip,
357  $ swband
358  LOGICAL applv, goscale, lower, lsvec, noscale, rotok,
359  $ rsvec, uctol, upper
360 * ..
361 * .. Local Arrays ..
362  REAL fastr( 5 )
363 * ..
364 * .. Intrinsic Functions ..
365  INTRINSIC abs, max, min, float, sign, sqrt
366 * ..
367 * .. External Functions ..
368 * ..
369 * from BLAS
370  REAL sdot, snrm2
371  EXTERNAL sdot, snrm2
372  INTEGER isamax
373  EXTERNAL isamax
374 * from LAPACK
375  REAL slamch
376  EXTERNAL slamch
377  LOGICAL lsame
378  EXTERNAL lsame
379 * ..
380 * .. External Subroutines ..
381 * ..
382 * from BLAS
383  EXTERNAL saxpy, scopy, srotm, sscal, sswap
384 * from LAPACK
385  EXTERNAL slascl, slaset, slassq, xerbla
386 *
387  EXTERNAL sgsvj0, sgsvj1
388 * ..
389 * .. Executable Statements ..
390 *
391 * Test the input arguments
392 *
393  lsvec = lsame( jobu, 'U' )
394  uctol = lsame( jobu, 'C' )
395  rsvec = lsame( jobv, 'V' )
396  applv = lsame( jobv, 'A' )
397  upper = lsame( joba, 'U' )
398  lower = lsame( joba, 'L' )
399 *
400  IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
401  info = -1
402  ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
403  info = -2
404  ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
405  info = -3
406  ELSE IF( m.LT.0 ) THEN
407  info = -4
408  ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
409  info = -5
410  ELSE IF( lda.LT.m ) THEN
411  info = -7
412  ELSE IF( mv.LT.0 ) THEN
413  info = -9
414  ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
415  $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
416  info = -11
417  ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
418  info = -12
419  ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
420  info = -13
421  ELSE
422  info = 0
423  END IF
424 *
425 * #:(
426  IF( info.NE.0 ) THEN
427  CALL xerbla( 'SGESVJ', -info )
428  RETURN
429  END IF
430 *
431 * #:) Quick return for void matrix
432 *
433  IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
434 *
435 * Set numerical parameters
436 * The stopping criterion for Jacobi rotations is
437 *
438 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
439 *
440 * where EPS is the round-off and CTOL is defined as follows:
441 *
442  IF( uctol ) THEN
443 * ... user controlled
444  ctol = work( 1 )
445  ELSE
446 * ... default
447  IF( lsvec .OR. rsvec .OR. applv ) THEN
448  ctol = sqrt( float( m ) )
449  ELSE
450  ctol = float( m )
451  END IF
452  END IF
453 * ... and the machine dependent parameters are
454 *[!] (Make sure that SLAMCH() works properly on the target machine.)
455 *
456  epsln = slamch( 'Epsilon' )
457  rooteps = sqrt( epsln )
458  sfmin = slamch( 'SafeMinimum' )
459  rootsfmin = sqrt( sfmin )
460  small = sfmin / epsln
461  big = slamch( 'Overflow' )
462 * BIG = ONE / SFMIN
463  rootbig = one / rootsfmin
464  large = big / sqrt( float( m*n ) )
465  bigtheta = one / rooteps
466 *
467  tol = ctol*epsln
468  roottol = sqrt( tol )
469 *
470  IF( float( m )*epsln.GE.one ) THEN
471  info = -4
472  CALL xerbla( 'SGESVJ', -info )
473  RETURN
474  END IF
475 *
476 * Initialize the right singular vector matrix.
477 *
478  IF( rsvec ) THEN
479  mvl = n
480  CALL slaset( 'A', mvl, n, zero, one, v, ldv )
481  ELSE IF( applv ) THEN
482  mvl = mv
483  END IF
484  rsvec = rsvec .OR. applv
485 *
486 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
487 *(!) If necessary, scale A to protect the largest singular value
488 * from overflow. It is possible that saving the largest singular
489 * value destroys the information about the small ones.
490 * This initial scaling is almost minimal in the sense that the
491 * goal is to make sure that no column norm overflows, and that
492 * SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
493 * in A are detected, the procedure returns with INFO=-6.
494 *
495  skl = one / sqrt( float( m )*float( n ) )
496  noscale = .true.
497  goscale = .true.
498 *
499  IF( lower ) THEN
500 * the input matrix is M-by-N lower triangular (trapezoidal)
501  DO 1874 p = 1, n
502  aapp = zero
503  aaqq = one
504  CALL slassq( m-p+1, a( p, p ), 1, aapp, aaqq )
505  IF( aapp.GT.big ) THEN
506  info = -6
507  CALL xerbla( 'SGESVJ', -info )
508  RETURN
509  END IF
510  aaqq = sqrt( aaqq )
511  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
512  sva( p ) = aapp*aaqq
513  ELSE
514  noscale = .false.
515  sva( p ) = aapp*( aaqq*skl )
516  IF( goscale ) THEN
517  goscale = .false.
518  DO 1873 q = 1, p - 1
519  sva( q ) = sva( q )*skl
520  1873 CONTINUE
521  END IF
522  END IF
523  1874 CONTINUE
524  ELSE IF( upper ) THEN
525 * the input matrix is M-by-N upper triangular (trapezoidal)
526  DO 2874 p = 1, n
527  aapp = zero
528  aaqq = one
529  CALL slassq( p, a( 1, p ), 1, aapp, aaqq )
530  IF( aapp.GT.big ) THEN
531  info = -6
532  CALL xerbla( 'SGESVJ', -info )
533  RETURN
534  END IF
535  aaqq = sqrt( aaqq )
536  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
537  sva( p ) = aapp*aaqq
538  ELSE
539  noscale = .false.
540  sva( p ) = aapp*( aaqq*skl )
541  IF( goscale ) THEN
542  goscale = .false.
543  DO 2873 q = 1, p - 1
544  sva( q ) = sva( q )*skl
545  2873 CONTINUE
546  END IF
547  END IF
548  2874 CONTINUE
549  ELSE
550 * the input matrix is M-by-N general dense
551  DO 3874 p = 1, n
552  aapp = zero
553  aaqq = one
554  CALL slassq( m, a( 1, p ), 1, aapp, aaqq )
555  IF( aapp.GT.big ) THEN
556  info = -6
557  CALL xerbla( 'SGESVJ', -info )
558  RETURN
559  END IF
560  aaqq = sqrt( aaqq )
561  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
562  sva( p ) = aapp*aaqq
563  ELSE
564  noscale = .false.
565  sva( p ) = aapp*( aaqq*skl )
566  IF( goscale ) THEN
567  goscale = .false.
568  DO 3873 q = 1, p - 1
569  sva( q ) = sva( q )*skl
570  3873 CONTINUE
571  END IF
572  END IF
573  3874 CONTINUE
574  END IF
575 *
576  IF( noscale )skl = one
577 *
578 * Move the smaller part of the spectrum from the underflow threshold
579 *(!) Start by determining the position of the nonzero entries of the
580 * array SVA() relative to ( SFMIN, BIG ).
581 *
582  aapp = zero
583  aaqq = big
584  DO 4781 p = 1, n
585  IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
586  aapp = max( aapp, sva( p ) )
587  4781 CONTINUE
588 *
589 * #:) Quick return for zero matrix
590 *
591  IF( aapp.EQ.zero ) THEN
592  IF( lsvec )CALL slaset( 'G', m, n, zero, one, a, lda )
593  work( 1 ) = one
594  work( 2 ) = zero
595  work( 3 ) = zero
596  work( 4 ) = zero
597  work( 5 ) = zero
598  work( 6 ) = zero
599  RETURN
600  END IF
601 *
602 * #:) Quick return for one-column matrix
603 *
604  IF( n.EQ.1 ) THEN
605  IF( lsvec )CALL slascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
606  $ a( 1, 1 ), lda, ierr )
607  work( 1 ) = one / skl
608  IF( sva( 1 ).GE.sfmin ) THEN
609  work( 2 ) = one
610  ELSE
611  work( 2 ) = zero
612  END IF
613  work( 3 ) = zero
614  work( 4 ) = zero
615  work( 5 ) = zero
616  work( 6 ) = zero
617  RETURN
618  END IF
619 *
620 * Protect small singular values from underflow, and try to
621 * avoid underflows/overflows in computing Jacobi rotations.
622 *
623  sn = sqrt( sfmin / epsln )
624  temp1 = sqrt( big / float( n ) )
625  IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
626  $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
627  temp1 = min( big, temp1 / aapp )
628 * AAQQ = AAQQ*TEMP1
629 * AAPP = AAPP*TEMP1
630  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
631  temp1 = min( sn / aaqq, big / ( aapp*sqrt( float( n ) ) ) )
632 * AAQQ = AAQQ*TEMP1
633 * AAPP = AAPP*TEMP1
634  ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
635  temp1 = max( sn / aaqq, temp1 / aapp )
636 * AAQQ = AAQQ*TEMP1
637 * AAPP = AAPP*TEMP1
638  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
639  temp1 = min( sn / aaqq, big / ( sqrt( float( n ) )*aapp ) )
640 * AAQQ = AAQQ*TEMP1
641 * AAPP = AAPP*TEMP1
642  ELSE
643  temp1 = one
644  END IF
645 *
646 * Scale, if necessary
647 *
648  IF( temp1.NE.one ) THEN
649  CALL slascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
650  END IF
651  skl = temp1*skl
652  IF( skl.NE.one ) THEN
653  CALL slascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
654  skl = one / skl
655  END IF
656 *
657 * Row-cyclic Jacobi SVD algorithm with column pivoting
658 *
659  emptsw = ( n*( n-1 ) ) / 2
660  notrot = 0
661  fastr( 1 ) = zero
662 *
663 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
664 * is initialized to identity. WORK is updated during fast scaled
665 * rotations.
666 *
667  DO 1868 q = 1, n
668  work( q ) = one
669  1868 CONTINUE
670 *
671 *
672  swband = 3
673 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
674 * if SGESVJ is used as a computational routine in the preconditioned
675 * Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure
676 * works on pivots inside a band-like region around the diagonal.
677 * The boundaries are determined dynamically, based on the number of
678 * pivots above a threshold.
679 *
680  kbl = min( 8, n )
681 *[TP] KBL is a tuning parameter that defines the tile size in the
682 * tiling of the p-q loops of pivot pairs. In general, an optimal
683 * value of KBL depends on the matrix dimensions and on the
684 * parameters of the computer's memory.
685 *
686  nbl = n / kbl
687  IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
688 *
689  blskip = kbl**2
690 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
691 *
692  rowskip = min( 5, kbl )
693 *[TP] ROWSKIP is a tuning parameter.
694 *
695  lkahead = 1
696 *[TP] LKAHEAD is a tuning parameter.
697 *
698 * Quasi block transformations, using the lower (upper) triangular
699 * structure of the input matrix. The quasi-block-cycling usually
700 * invokes cubic convergence. Big part of this cycle is done inside
701 * canonical subspaces of dimensions less than M.
702 *
703  IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
704 *[TP] The number of partition levels and the actual partition are
705 * tuning parameters.
706  n4 = n / 4
707  n2 = n / 2
708  n34 = 3*n4
709  IF( applv ) THEN
710  q = 0
711  ELSE
712  q = 1
713  END IF
714 *
715  IF( lower ) THEN
716 *
717 * This works very well on lower triangular matrices, in particular
718 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
719 * The idea is simple:
720 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
721 * [+ + 0 0] [0 0]
722 * [+ + x 0] actually work on [x 0] [x 0]
723 * [+ + x x] [x x]. [x x]
724 *
725  CALL sgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
726  $ work( n34+1 ), sva( n34+1 ), mvl,
727  $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
728  $ 2, work( n+1 ), lwork-n, ierr )
729 *
730  CALL sgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
731  $ work( n2+1 ), sva( n2+1 ), mvl,
732  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
733  $ work( n+1 ), lwork-n, ierr )
734 *
735  CALL sgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
736  $ work( n2+1 ), sva( n2+1 ), mvl,
737  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
738  $ work( n+1 ), lwork-n, ierr )
739 *
740  CALL sgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
741  $ work( n4+1 ), sva( n4+1 ), mvl,
742  $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
743  $ work( n+1 ), lwork-n, ierr )
744 *
745  CALL sgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
746  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
747  $ ierr )
748 *
749  CALL sgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
750  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
751  $ lwork-n, ierr )
752 *
753 *
754  ELSE IF( upper ) THEN
755 *
756 *
757  CALL sgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
758  $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
759  $ ierr )
760 *
761  CALL sgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
762  $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
763  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
764  $ ierr )
765 *
766  CALL sgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
767  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
768  $ lwork-n, ierr )
769 *
770  CALL sgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
771  $ work( n2+1 ), sva( n2+1 ), mvl,
772  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
773  $ work( n+1 ), lwork-n, ierr )
774 
775  END IF
776 *
777  END IF
778 *
779 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
780 *
781  DO 1993 i = 1, nsweep
782 *
783 * .. go go go ...
784 *
785  mxaapq = zero
786  mxsinj = zero
787  iswrot = 0
788 *
789  notrot = 0
790  pskipped = 0
791 *
792 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
793 * 1 <= p < q <= N. This is the first step toward a blocked implementation
794 * of the rotations. New implementation, based on block transformations,
795 * is under development.
796 *
797  DO 2000 ibr = 1, nbl
798 *
799  igl = ( ibr-1 )*kbl + 1
800 *
801  DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
802 *
803  igl = igl + ir1*kbl
804 *
805  DO 2001 p = igl, min( igl+kbl-1, n-1 )
806 *
807 * .. de Rijk's pivoting
808 *
809  q = isamax( n-p+1, sva( p ), 1 ) + p - 1
810  IF( p.NE.q ) THEN
811  CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
812  IF( rsvec )CALL sswap( mvl, v( 1, p ), 1,
813  $ v( 1, q ), 1 )
814  temp1 = sva( p )
815  sva( p ) = sva( q )
816  sva( q ) = temp1
817  temp1 = work( p )
818  work( p ) = work( q )
819  work( q ) = temp1
820  END IF
821 *
822  IF( ir1.EQ.0 ) THEN
823 *
824 * Column norms are periodically updated by explicit
825 * norm computation.
826 * Caveat:
827 * Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1)
828 * as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
829 * overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to
830 * underflow for ||A(:,p)||_2 < SQRT(underflow_threshold).
831 * Hence, SNRM2 cannot be trusted, not even in the case when
832 * the true norm is far from the under(over)flow boundaries.
833 * If properly implemented SNRM2 is available, the IF-THEN-ELSE
834 * below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)".
835 *
836  IF( ( sva( p ).LT.rootbig ) .AND.
837  $ ( sva( p ).GT.rootsfmin ) ) THEN
838  sva( p ) = snrm2( m, a( 1, p ), 1 )*work( p )
839  ELSE
840  temp1 = zero
841  aapp = one
842  CALL slassq( m, a( 1, p ), 1, temp1, aapp )
843  sva( p ) = temp1*sqrt( aapp )*work( p )
844  END IF
845  aapp = sva( p )
846  ELSE
847  aapp = sva( p )
848  END IF
849 *
850  IF( aapp.GT.zero ) THEN
851 *
852  pskipped = 0
853 *
854  DO 2002 q = p + 1, min( igl+kbl-1, n )
855 *
856  aaqq = sva( q )
857 *
858  IF( aaqq.GT.zero ) THEN
859 *
860  aapp0 = aapp
861  IF( aaqq.GE.one ) THEN
862  rotok = ( small*aapp ).LE.aaqq
863  IF( aapp.LT.( big / aaqq ) ) THEN
864  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
865  $ q ), 1 )*work( p )*work( q ) /
866  $ aaqq ) / aapp
867  ELSE
868  CALL scopy( m, a( 1, p ), 1,
869  $ work( n+1 ), 1 )
870  CALL slascl( 'G', 0, 0, aapp,
871  $ work( p ), m, 1,
872  $ work( n+1 ), lda, ierr )
873  aapq = sdot( m, work( n+1 ), 1,
874  $ a( 1, q ), 1 )*work( q ) / aaqq
875  END IF
876  ELSE
877  rotok = aapp.LE.( aaqq / small )
878  IF( aapp.GT.( small / aaqq ) ) THEN
879  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
880  $ q ), 1 )*work( p )*work( q ) /
881  $ aaqq ) / aapp
882  ELSE
883  CALL scopy( m, a( 1, q ), 1,
884  $ work( n+1 ), 1 )
885  CALL slascl( 'G', 0, 0, aaqq,
886  $ work( q ), m, 1,
887  $ work( n+1 ), lda, ierr )
888  aapq = sdot( m, work( n+1 ), 1,
889  $ a( 1, p ), 1 )*work( p ) / aapp
890  END IF
891  END IF
892 *
893  mxaapq = max( mxaapq, abs( aapq ) )
894 *
895 * TO rotate or NOT to rotate, THAT is the question ...
896 *
897  IF( abs( aapq ).GT.tol ) THEN
898 *
899 * .. rotate
900 *[RTD] ROTATED = ROTATED + ONE
901 *
902  IF( ir1.EQ.0 ) THEN
903  notrot = 0
904  pskipped = 0
905  iswrot = iswrot + 1
906  END IF
907 *
908  IF( rotok ) THEN
909 *
910  aqoap = aaqq / aapp
911  apoaq = aapp / aaqq
912  theta = -half*abs( aqoap-apoaq ) / aapq
913 *
914  IF( abs( theta ).GT.bigtheta ) THEN
915 *
916  t = half / theta
917  fastr( 3 ) = t*work( p ) / work( q )
918  fastr( 4 ) = -t*work( q ) /
919  $ work( p )
920  CALL srotm( m, a( 1, p ), 1,
921  $ a( 1, q ), 1, fastr )
922  IF( rsvec )CALL srotm( mvl,
923  $ v( 1, p ), 1,
924  $ v( 1, q ), 1,
925  $ fastr )
926  sva( q ) = aaqq*sqrt( max( zero,
927  $ one+t*apoaq*aapq ) )
928  aapp = aapp*sqrt( max( zero,
929  $ one-t*aqoap*aapq ) )
930  mxsinj = max( mxsinj, abs( t ) )
931 *
932  ELSE
933 *
934 * .. choose correct signum for THETA and rotate
935 *
936  thsign = -sign( one, aapq )
937  t = one / ( theta+thsign*
938  $ sqrt( one+theta*theta ) )
939  cs = sqrt( one / ( one+t*t ) )
940  sn = t*cs
941 *
942  mxsinj = max( mxsinj, abs( sn ) )
943  sva( q ) = aaqq*sqrt( max( zero,
944  $ one+t*apoaq*aapq ) )
945  aapp = aapp*sqrt( max( zero,
946  $ one-t*aqoap*aapq ) )
947 *
948  apoaq = work( p ) / work( q )
949  aqoap = work( q ) / work( p )
950  IF( work( p ).GE.one ) THEN
951  IF( work( q ).GE.one ) THEN
952  fastr( 3 ) = t*apoaq
953  fastr( 4 ) = -t*aqoap
954  work( p ) = work( p )*cs
955  work( q ) = work( q )*cs
956  CALL srotm( m, a( 1, p ), 1,
957  $ a( 1, q ), 1,
958  $ fastr )
959  IF( rsvec )CALL srotm( mvl,
960  $ v( 1, p ), 1, v( 1, q ),
961  $ 1, fastr )
962  ELSE
963  CALL saxpy( m, -t*aqoap,
964  $ a( 1, q ), 1,
965  $ a( 1, p ), 1 )
966  CALL saxpy( m, cs*sn*apoaq,
967  $ a( 1, p ), 1,
968  $ a( 1, q ), 1 )
969  work( p ) = work( p )*cs
970  work( q ) = work( q ) / cs
971  IF( rsvec ) THEN
972  CALL saxpy( mvl, -t*aqoap,
973  $ v( 1, q ), 1,
974  $ v( 1, p ), 1 )
975  CALL saxpy( mvl,
976  $ cs*sn*apoaq,
977  $ v( 1, p ), 1,
978  $ v( 1, q ), 1 )
979  END IF
980  END IF
981  ELSE
982  IF( work( q ).GE.one ) THEN
983  CALL saxpy( m, t*apoaq,
984  $ a( 1, p ), 1,
985  $ a( 1, q ), 1 )
986  CALL saxpy( m, -cs*sn*aqoap,
987  $ a( 1, q ), 1,
988  $ a( 1, p ), 1 )
989  work( p ) = work( p ) / cs
990  work( q ) = work( q )*cs
991  IF( rsvec ) THEN
992  CALL saxpy( mvl, t*apoaq,
993  $ v( 1, p ), 1,
994  $ v( 1, q ), 1 )
995  CALL saxpy( mvl,
996  $ -cs*sn*aqoap,
997  $ v( 1, q ), 1,
998  $ v( 1, p ), 1 )
999  END IF
1000  ELSE
1001  IF( work( p ).GE.work( q ) )
1002  $ THEN
1003  CALL saxpy( m, -t*aqoap,
1004  $ a( 1, q ), 1,
1005  $ a( 1, p ), 1 )
1006  CALL saxpy( m, cs*sn*apoaq,
1007  $ a( 1, p ), 1,
1008  $ a( 1, q ), 1 )
1009  work( p ) = work( p )*cs
1010  work( q ) = work( q ) / cs
1011  IF( rsvec ) THEN
1012  CALL saxpy( mvl,
1013  $ -t*aqoap,
1014  $ v( 1, q ), 1,
1015  $ v( 1, p ), 1 )
1016  CALL saxpy( mvl,
1017  $ cs*sn*apoaq,
1018  $ v( 1, p ), 1,
1019  $ v( 1, q ), 1 )
1020  END IF
1021  ELSE
1022  CALL saxpy( m, t*apoaq,
1023  $ a( 1, p ), 1,
1024  $ a( 1, q ), 1 )
1025  CALL saxpy( m,
1026  $ -cs*sn*aqoap,
1027  $ a( 1, q ), 1,
1028  $ a( 1, p ), 1 )
1029  work( p ) = work( p ) / cs
1030  work( q ) = work( q )*cs
1031  IF( rsvec ) THEN
1032  CALL saxpy( mvl,
1033  $ t*apoaq, v( 1, p ),
1034  $ 1, v( 1, q ), 1 )
1035  CALL saxpy( mvl,
1036  $ -cs*sn*aqoap,
1037  $ v( 1, q ), 1,
1038  $ v( 1, p ), 1 )
1039  END IF
1040  END IF
1041  END IF
1042  END IF
1043  END IF
1044 *
1045  ELSE
1046 * .. have to use modified Gram-Schmidt like transformation
1047  CALL scopy( m, a( 1, p ), 1,
1048  $ work( n+1 ), 1 )
1049  CALL slascl( 'G', 0, 0, aapp, one, m,
1050  $ 1, work( n+1 ), lda,
1051  $ ierr )
1052  CALL slascl( 'G', 0, 0, aaqq, one, m,
1053  $ 1, a( 1, q ), lda, ierr )
1054  temp1 = -aapq*work( p ) / work( q )
1055  CALL saxpy( m, temp1, work( n+1 ), 1,
1056  $ a( 1, q ), 1 )
1057  CALL slascl( 'G', 0, 0, one, aaqq, m,
1058  $ 1, a( 1, q ), lda, ierr )
1059  sva( q ) = aaqq*sqrt( max( zero,
1060  $ one-aapq*aapq ) )
1061  mxsinj = max( mxsinj, sfmin )
1062  END IF
1063 * END IF ROTOK THEN ... ELSE
1064 *
1065 * In the case of cancellation in updating SVA(q), SVA(p)
1066 * recompute SVA(q), SVA(p).
1067 *
1068  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1069  $ THEN
1070  IF( ( aaqq.LT.rootbig ) .AND.
1071  $ ( aaqq.GT.rootsfmin ) ) THEN
1072  sva( q ) = snrm2( m, a( 1, q ), 1 )*
1073  $ work( q )
1074  ELSE
1075  t = zero
1076  aaqq = one
1077  CALL slassq( m, a( 1, q ), 1, t,
1078  $ aaqq )
1079  sva( q ) = t*sqrt( aaqq )*work( q )
1080  END IF
1081  END IF
1082  IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1083  IF( ( aapp.LT.rootbig ) .AND.
1084  $ ( aapp.GT.rootsfmin ) ) THEN
1085  aapp = snrm2( m, a( 1, p ), 1 )*
1086  $ work( p )
1087  ELSE
1088  t = zero
1089  aapp = one
1090  CALL slassq( m, a( 1, p ), 1, t,
1091  $ aapp )
1092  aapp = t*sqrt( aapp )*work( p )
1093  END IF
1094  sva( p ) = aapp
1095  END IF
1096 *
1097  ELSE
1098 * A(:,p) and A(:,q) already numerically orthogonal
1099  IF( ir1.EQ.0 )notrot = notrot + 1
1100 *[RTD] SKIPPED = SKIPPED + 1
1101  pskipped = pskipped + 1
1102  END IF
1103  ELSE
1104 * A(:,q) is zero column
1105  IF( ir1.EQ.0 )notrot = notrot + 1
1106  pskipped = pskipped + 1
1107  END IF
1108 *
1109  IF( ( i.LE.swband ) .AND.
1110  $ ( pskipped.GT.rowskip ) ) THEN
1111  IF( ir1.EQ.0 )aapp = -aapp
1112  notrot = 0
1113  GO TO 2103
1114  END IF
1115 *
1116  2002 CONTINUE
1117 * END q-LOOP
1118 *
1119  2103 CONTINUE
1120 * bailed out of q-loop
1121 *
1122  sva( p ) = aapp
1123 *
1124  ELSE
1125  sva( p ) = aapp
1126  IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1127  $ notrot = notrot + min( igl+kbl-1, n ) - p
1128  END IF
1129 *
1130  2001 CONTINUE
1131 * end of the p-loop
1132 * end of doing the block ( ibr, ibr )
1133  1002 CONTINUE
1134 * end of ir1-loop
1135 *
1136 * ... go to the off diagonal blocks
1137 *
1138  igl = ( ibr-1 )*kbl + 1
1139 *
1140  DO 2010 jbc = ibr + 1, nbl
1141 *
1142  jgl = ( jbc-1 )*kbl + 1
1143 *
1144 * doing the block at ( ibr, jbc )
1145 *
1146  ijblsk = 0
1147  DO 2100 p = igl, min( igl+kbl-1, n )
1148 *
1149  aapp = sva( p )
1150  IF( aapp.GT.zero ) THEN
1151 *
1152  pskipped = 0
1153 *
1154  DO 2200 q = jgl, min( jgl+kbl-1, n )
1155 *
1156  aaqq = sva( q )
1157  IF( aaqq.GT.zero ) THEN
1158  aapp0 = aapp
1159 *
1160 * .. M x 2 Jacobi SVD ..
1161 *
1162 * Safe Gram matrix computation
1163 *
1164  IF( aaqq.GE.one ) THEN
1165  IF( aapp.GE.aaqq ) THEN
1166  rotok = ( small*aapp ).LE.aaqq
1167  ELSE
1168  rotok = ( small*aaqq ).LE.aapp
1169  END IF
1170  IF( aapp.LT.( big / aaqq ) ) THEN
1171  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1172  $ q ), 1 )*work( p )*work( q ) /
1173  $ aaqq ) / aapp
1174  ELSE
1175  CALL scopy( m, a( 1, p ), 1,
1176  $ work( n+1 ), 1 )
1177  CALL slascl( 'G', 0, 0, aapp,
1178  $ work( p ), m, 1,
1179  $ work( n+1 ), lda, ierr )
1180  aapq = sdot( m, work( n+1 ), 1,
1181  $ a( 1, q ), 1 )*work( q ) / aaqq
1182  END IF
1183  ELSE
1184  IF( aapp.GE.aaqq ) THEN
1185  rotok = aapp.LE.( aaqq / small )
1186  ELSE
1187  rotok = aaqq.LE.( aapp / small )
1188  END IF
1189  IF( aapp.GT.( small / aaqq ) ) THEN
1190  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1191  $ q ), 1 )*work( p )*work( q ) /
1192  $ aaqq ) / aapp
1193  ELSE
1194  CALL scopy( m, a( 1, q ), 1,
1195  $ work( n+1 ), 1 )
1196  CALL slascl( 'G', 0, 0, aaqq,
1197  $ work( q ), m, 1,
1198  $ work( n+1 ), lda, ierr )
1199  aapq = sdot( m, work( n+1 ), 1,
1200  $ a( 1, p ), 1 )*work( p ) / aapp
1201  END IF
1202  END IF
1203 *
1204  mxaapq = max( mxaapq, abs( aapq ) )
1205 *
1206 * TO rotate or NOT to rotate, THAT is the question ...
1207 *
1208  IF( abs( aapq ).GT.tol ) THEN
1209  notrot = 0
1210 *[RTD] ROTATED = ROTATED + 1
1211  pskipped = 0
1212  iswrot = iswrot + 1
1213 *
1214  IF( rotok ) THEN
1215 *
1216  aqoap = aaqq / aapp
1217  apoaq = aapp / aaqq
1218  theta = -half*abs( aqoap-apoaq ) / aapq
1219  IF( aaqq.GT.aapp0 )theta = -theta
1220 *
1221  IF( abs( theta ).GT.bigtheta ) THEN
1222  t = half / theta
1223  fastr( 3 ) = t*work( p ) / work( q )
1224  fastr( 4 ) = -t*work( q ) /
1225  $ work( p )
1226  CALL srotm( m, a( 1, p ), 1,
1227  $ a( 1, q ), 1, fastr )
1228  IF( rsvec )CALL srotm( mvl,
1229  $ v( 1, p ), 1,
1230  $ v( 1, q ), 1,
1231  $ fastr )
1232  sva( q ) = aaqq*sqrt( max( zero,
1233  $ one+t*apoaq*aapq ) )
1234  aapp = aapp*sqrt( max( zero,
1235  $ one-t*aqoap*aapq ) )
1236  mxsinj = max( mxsinj, abs( t ) )
1237  ELSE
1238 *
1239 * .. choose correct signum for THETA and rotate
1240 *
1241  thsign = -sign( one, aapq )
1242  IF( aaqq.GT.aapp0 )thsign = -thsign
1243  t = one / ( theta+thsign*
1244  $ sqrt( one+theta*theta ) )
1245  cs = sqrt( one / ( one+t*t ) )
1246  sn = t*cs
1247  mxsinj = max( mxsinj, abs( sn ) )
1248  sva( q ) = aaqq*sqrt( max( zero,
1249  $ one+t*apoaq*aapq ) )
1250  aapp = aapp*sqrt( max( zero,
1251  $ one-t*aqoap*aapq ) )
1252 *
1253  apoaq = work( p ) / work( q )
1254  aqoap = work( q ) / work( p )
1255  IF( work( p ).GE.one ) THEN
1256 *
1257  IF( work( q ).GE.one ) THEN
1258  fastr( 3 ) = t*apoaq
1259  fastr( 4 ) = -t*aqoap
1260  work( p ) = work( p )*cs
1261  work( q ) = work( q )*cs
1262  CALL srotm( m, a( 1, p ), 1,
1263  $ a( 1, q ), 1,
1264  $ fastr )
1265  IF( rsvec )CALL srotm( mvl,
1266  $ v( 1, p ), 1, v( 1, q ),
1267  $ 1, fastr )
1268  ELSE
1269  CALL saxpy( m, -t*aqoap,
1270  $ a( 1, q ), 1,
1271  $ a( 1, p ), 1 )
1272  CALL saxpy( m, cs*sn*apoaq,
1273  $ a( 1, p ), 1,
1274  $ a( 1, q ), 1 )
1275  IF( rsvec ) THEN
1276  CALL saxpy( mvl, -t*aqoap,
1277  $ v( 1, q ), 1,
1278  $ v( 1, p ), 1 )
1279  CALL saxpy( mvl,
1280  $ cs*sn*apoaq,
1281  $ v( 1, p ), 1,
1282  $ v( 1, q ), 1 )
1283  END IF
1284  work( p ) = work( p )*cs
1285  work( q ) = work( q ) / cs
1286  END IF
1287  ELSE
1288  IF( work( q ).GE.one ) THEN
1289  CALL saxpy( m, t*apoaq,
1290  $ a( 1, p ), 1,
1291  $ a( 1, q ), 1 )
1292  CALL saxpy( m, -cs*sn*aqoap,
1293  $ a( 1, q ), 1,
1294  $ a( 1, p ), 1 )
1295  IF( rsvec ) THEN
1296  CALL saxpy( mvl, t*apoaq,
1297  $ v( 1, p ), 1,
1298  $ v( 1, q ), 1 )
1299  CALL saxpy( mvl,
1300  $ -cs*sn*aqoap,
1301  $ v( 1, q ), 1,
1302  $ v( 1, p ), 1 )
1303  END IF
1304  work( p ) = work( p ) / cs
1305  work( q ) = work( q )*cs
1306  ELSE
1307  IF( work( p ).GE.work( q ) )
1308  $ THEN
1309  CALL saxpy( m, -t*aqoap,
1310  $ a( 1, q ), 1,
1311  $ a( 1, p ), 1 )
1312  CALL saxpy( m, cs*sn*apoaq,
1313  $ a( 1, p ), 1,
1314  $ a( 1, q ), 1 )
1315  work( p ) = work( p )*cs
1316  work( q ) = work( q ) / cs
1317  IF( rsvec ) THEN
1318  CALL saxpy( mvl,
1319  $ -t*aqoap,
1320  $ v( 1, q ), 1,
1321  $ v( 1, p ), 1 )
1322  CALL saxpy( mvl,
1323  $ cs*sn*apoaq,
1324  $ v( 1, p ), 1,
1325  $ v( 1, q ), 1 )
1326  END IF
1327  ELSE
1328  CALL saxpy( m, t*apoaq,
1329  $ a( 1, p ), 1,
1330  $ a( 1, q ), 1 )
1331  CALL saxpy( m,
1332  $ -cs*sn*aqoap,
1333  $ a( 1, q ), 1,
1334  $ a( 1, p ), 1 )
1335  work( p ) = work( p ) / cs
1336  work( q ) = work( q )*cs
1337  IF( rsvec ) THEN
1338  CALL saxpy( mvl,
1339  $ t*apoaq, v( 1, p ),
1340  $ 1, v( 1, q ), 1 )
1341  CALL saxpy( mvl,
1342  $ -cs*sn*aqoap,
1343  $ v( 1, q ), 1,
1344  $ v( 1, p ), 1 )
1345  END IF
1346  END IF
1347  END IF
1348  END IF
1349  END IF
1350 *
1351  ELSE
1352  IF( aapp.GT.aaqq ) THEN
1353  CALL scopy( m, a( 1, p ), 1,
1354  $ work( n+1 ), 1 )
1355  CALL slascl( 'G', 0, 0, aapp, one,
1356  $ m, 1, work( n+1 ), lda,
1357  $ ierr )
1358  CALL slascl( 'G', 0, 0, aaqq, one,
1359  $ m, 1, a( 1, q ), lda,
1360  $ ierr )
1361  temp1 = -aapq*work( p ) / work( q )
1362  CALL saxpy( m, temp1, work( n+1 ),
1363  $ 1, a( 1, q ), 1 )
1364  CALL slascl( 'G', 0, 0, one, aaqq,
1365  $ m, 1, a( 1, q ), lda,
1366  $ ierr )
1367  sva( q ) = aaqq*sqrt( max( zero,
1368  $ one-aapq*aapq ) )
1369  mxsinj = max( mxsinj, sfmin )
1370  ELSE
1371  CALL scopy( m, a( 1, q ), 1,
1372  $ work( n+1 ), 1 )
1373  CALL slascl( 'G', 0, 0, aaqq, one,
1374  $ m, 1, work( n+1 ), lda,
1375  $ ierr )
1376  CALL slascl( 'G', 0, 0, aapp, one,
1377  $ m, 1, a( 1, p ), lda,
1378  $ ierr )
1379  temp1 = -aapq*work( q ) / work( p )
1380  CALL saxpy( m, temp1, work( n+1 ),
1381  $ 1, a( 1, p ), 1 )
1382  CALL slascl( 'G', 0, 0, one, aapp,
1383  $ m, 1, a( 1, p ), lda,
1384  $ ierr )
1385  sva( p ) = aapp*sqrt( max( zero,
1386  $ one-aapq*aapq ) )
1387  mxsinj = max( mxsinj, sfmin )
1388  END IF
1389  END IF
1390 * END IF ROTOK THEN ... ELSE
1391 *
1392 * In the case of cancellation in updating SVA(q)
1393 * .. recompute SVA(q)
1394  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1395  $ THEN
1396  IF( ( aaqq.LT.rootbig ) .AND.
1397  $ ( aaqq.GT.rootsfmin ) ) THEN
1398  sva( q ) = snrm2( m, a( 1, q ), 1 )*
1399  $ work( q )
1400  ELSE
1401  t = zero
1402  aaqq = one
1403  CALL slassq( m, a( 1, q ), 1, t,
1404  $ aaqq )
1405  sva( q ) = t*sqrt( aaqq )*work( q )
1406  END IF
1407  END IF
1408  IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1409  IF( ( aapp.LT.rootbig ) .AND.
1410  $ ( aapp.GT.rootsfmin ) ) THEN
1411  aapp = snrm2( m, a( 1, p ), 1 )*
1412  $ work( p )
1413  ELSE
1414  t = zero
1415  aapp = one
1416  CALL slassq( m, a( 1, p ), 1, t,
1417  $ aapp )
1418  aapp = t*sqrt( aapp )*work( p )
1419  END IF
1420  sva( p ) = aapp
1421  END IF
1422 * end of OK rotation
1423  ELSE
1424  notrot = notrot + 1
1425 *[RTD] SKIPPED = SKIPPED + 1
1426  pskipped = pskipped + 1
1427  ijblsk = ijblsk + 1
1428  END IF
1429  ELSE
1430  notrot = notrot + 1
1431  pskipped = pskipped + 1
1432  ijblsk = ijblsk + 1
1433  END IF
1434 *
1435  IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1436  $ THEN
1437  sva( p ) = aapp
1438  notrot = 0
1439  GO TO 2011
1440  END IF
1441  IF( ( i.LE.swband ) .AND.
1442  $ ( pskipped.GT.rowskip ) ) THEN
1443  aapp = -aapp
1444  notrot = 0
1445  GO TO 2203
1446  END IF
1447 *
1448  2200 CONTINUE
1449 * end of the q-loop
1450  2203 CONTINUE
1451 *
1452  sva( p ) = aapp
1453 *
1454  ELSE
1455 *
1456  IF( aapp.EQ.zero )notrot = notrot +
1457  $ min( jgl+kbl-1, n ) - jgl + 1
1458  IF( aapp.LT.zero )notrot = 0
1459 *
1460  END IF
1461 *
1462  2100 CONTINUE
1463 * end of the p-loop
1464  2010 CONTINUE
1465 * end of the jbc-loop
1466  2011 CONTINUE
1467 *2011 bailed out of the jbc-loop
1468  DO 2012 p = igl, min( igl+kbl-1, n )
1469  sva( p ) = abs( sva( p ) )
1470  2012 CONTINUE
1471 ***
1472  2000 CONTINUE
1473 *2000 :: end of the ibr-loop
1474 *
1475 * .. update SVA(N)
1476  IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1477  $ THEN
1478  sva( n ) = snrm2( m, a( 1, n ), 1 )*work( n )
1479  ELSE
1480  t = zero
1481  aapp = one
1482  CALL slassq( m, a( 1, n ), 1, t, aapp )
1483  sva( n ) = t*sqrt( aapp )*work( n )
1484  END IF
1485 *
1486 * Additional steering devices
1487 *
1488  IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1489  $ ( iswrot.LE.n ) ) )swband = i
1490 *
1491  IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.sqrt( float( n ) )*
1492  $ tol ) .AND. ( float( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1493  GO TO 1994
1494  END IF
1495 *
1496  IF( notrot.GE.emptsw )GO TO 1994
1497 *
1498  1993 CONTINUE
1499 * end i=1:NSWEEP loop
1500 *
1501 * #:( Reaching this point means that the procedure has not converged.
1502  info = nsweep - 1
1503  GO TO 1995
1504 *
1505  1994 CONTINUE
1506 * #:) Reaching this point means numerical convergence after the i-th
1507 * sweep.
1508 *
1509  info = 0
1510 * #:) INFO = 0 confirms successful iterations.
1511  1995 CONTINUE
1512 *
1513 * Sort the singular values and find how many are above
1514 * the underflow threshold.
1515 *
1516  n2 = 0
1517  n4 = 0
1518  DO 5991 p = 1, n - 1
1519  q = isamax( n-p+1, sva( p ), 1 ) + p - 1
1520  IF( p.NE.q ) THEN
1521  temp1 = sva( p )
1522  sva( p ) = sva( q )
1523  sva( q ) = temp1
1524  temp1 = work( p )
1525  work( p ) = work( q )
1526  work( q ) = temp1
1527  CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1528  IF( rsvec )CALL sswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1529  END IF
1530  IF( sva( p ).NE.zero ) THEN
1531  n4 = n4 + 1
1532  IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1533  END IF
1534  5991 CONTINUE
1535  IF( sva( n ).NE.zero ) THEN
1536  n4 = n4 + 1
1537  IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1538  END IF
1539 *
1540 * Normalize the left singular vectors.
1541 *
1542  IF( lsvec .OR. uctol ) THEN
1543  DO 1998 p = 1, n2
1544  CALL sscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1545  1998 CONTINUE
1546  END IF
1547 *
1548 * Scale the product of Jacobi rotations (assemble the fast rotations).
1549 *
1550  IF( rsvec ) THEN
1551  IF( applv ) THEN
1552  DO 2398 p = 1, n
1553  CALL sscal( mvl, work( p ), v( 1, p ), 1 )
1554  2398 CONTINUE
1555  ELSE
1556  DO 2399 p = 1, n
1557  temp1 = one / snrm2( mvl, v( 1, p ), 1 )
1558  CALL sscal( mvl, temp1, v( 1, p ), 1 )
1559  2399 CONTINUE
1560  END IF
1561  END IF
1562 *
1563 * Undo scaling, if necessary (and possible).
1564  IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl ) ) )
1565  $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1566  $ ( sfmin / skl ) ) ) ) THEN
1567  DO 2400 p = 1, n
1568  sva( p ) = skl*sva( p )
1569  2400 CONTINUE
1570  skl = one
1571  END IF
1572 *
1573  work( 1 ) = skl
1574 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1575 * then some of the singular values may overflow or underflow and
1576 * the spectrum is given in this factored representation.
1577 *
1578  work( 2 ) = float( n4 )
1579 * N4 is the number of computed nonzero singular values of A.
1580 *
1581  work( 3 ) = float( n2 )
1582 * N2 is the number of singular values of A greater than SFMIN.
1583 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1584 * that may carry some information.
1585 *
1586  work( 4 ) = float( i )
1587 * i is the index of the last sweep before declaring convergence.
1588 *
1589  work( 5 ) = mxaapq
1590 * MXAAPQ is the largest absolute value of scaled pivots in the
1591 * last sweep
1592 *
1593  work( 6 ) = mxsinj
1594 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1595 * in the last sweep
1596 *
1597  RETURN
1598 * ..
1599 * .. END OF SGESVJ
1600 * ..
subroutine slassq(N, X, INCX, SCALE, SUMSQ)
SLASSQ updates a sum of squares represented in scaled form.
Definition: slassq.f:105
subroutine srotm(N, SX, INCX, SY, INCY, SPARAM)
SROTM
Definition: srotm.f:101
real function sdot(N, SX, INCX, SY, INCY)
SDOT
Definition: sdot.f:53
integer function isamax(N, SX, INCX)
ISAMAX
Definition: isamax.f:53
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:145
subroutine sgsvj1(JOBV, M, N, N1, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
SGSVJ1 pre-processor for the routine sgesvj, applies Jacobi rotations targeting only particular pivot...
Definition: sgsvj1.f:238
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine slaset(UPLO, M, N, ALPHA, BETA, A, LDA)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: slaset.f:112
real function snrm2(N, X, INCX)
SNRM2
Definition: snrm2.f:56
subroutine saxpy(N, SA, SX, INCX, SY, INCY)
SAXPY
Definition: saxpy.f:54
subroutine sgsvj0(JOBV, M, N, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
SGSVJ0 pre-processor for the routine sgesvj.
Definition: sgsvj0.f:220
subroutine sscal(N, SA, SX, INCX)
SSCAL
Definition: sscal.f:55
subroutine sswap(N, SX, INCX, SY, INCY)
SSWAP
Definition: sswap.f:53
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:69
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine scopy(N, SX, INCX, SY, INCY)
SCOPY
Definition: scopy.f:53

Here is the call graph for this function:

Here is the caller graph for this function: