LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ sgesvj()

subroutine sgesvj ( character*1  joba,
character*1  jobu,
character*1  jobv,
integer  m,
integer  n,
real, dimension( lda, * )  a,
integer  lda,
real, dimension( n )  sva,
integer  mv,
real, dimension( ldv, * )  v,
integer  ldv,
real, dimension( lwork )  work,
integer  lwork,
integer  info 
)

SGESVJ

Download SGESVJ + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 SGESVJ computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, where M >= N. The SVD of A is written as
                                    [++]   [xx]   [x0]   [xx]
              A = U * SIGMA * V^t,  [++] = [xx] * [ox] * [xx]
                                    [++]   [xx]
 where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
 matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
 of SIGMA are the singular values of A. The columns of U and V are the
 left and the right singular vectors of A, respectively.
 SGESVJ can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.
Parameters
[in]JOBA
          JOBA is CHARACTER*1
          Specifies the structure of A.
          = 'L': The input matrix A is lower triangular;
          = 'U': The input matrix A is upper triangular;
          = 'G': The input matrix A is general M-by-N matrix, M >= N.
[in]JOBU
          JOBU is CHARACTER*1
          Specifies whether to compute the left singular vectors
          (columns of U):
          = 'U': The left singular vectors corresponding to the nonzero
                 singular values are computed and returned in the leading
                 columns of A. See more details in the description of A.
                 The default numerical orthogonality threshold is set to
                 approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E').
          = 'C': Analogous to JOBU='U', except that user can control the
                 level of numerical orthogonality of the computed left
                 singular vectors. TOL can be set to TOL = CTOL*EPS, where
                 CTOL is given on input in the array WORK.
                 No CTOL smaller than ONE is allowed. CTOL greater
                 than 1 / EPS is meaningless. The option 'C'
                 can be used if M*EPS is satisfactory orthogonality
                 of the computed left singular vectors, so CTOL=M could
                 save few sweeps of Jacobi rotations.
                 See the descriptions of A and WORK(1).
          = 'N': The matrix U is not computed. However, see the
                 description of A.
[in]JOBV
          JOBV is CHARACTER*1
          Specifies whether to compute the right singular vectors, that
          is, the matrix V:
          = 'V':  the matrix V is computed and returned in the array V
          = 'A':  the Jacobi rotations are applied to the MV-by-N
                  array V. In other words, the right singular vector
                  matrix V is not computed explicitly; instead it is
                  applied to an MV-by-N matrix initially stored in the
                  first MV rows of V.
          = 'N':  the matrix V is not computed and the array V is not
                  referenced
[in]M
          M is INTEGER
          The number of rows of the input matrix A. 1/SLAMCH('E') > M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.
          M >= N >= 0.
[in,out]A
          A is REAL array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit,
          If JOBU = 'U' .OR. JOBU = 'C':
                 If INFO = 0:
                 RANKA orthonormal columns of U are returned in the
                 leading RANKA columns of the array A. Here RANKA <= N
                 is the number of computed singular values of A that are
                 above the underflow threshold SLAMCH('S'). The singular
                 vectors corresponding to underflowed or zero singular
                 values are not computed. The value of RANKA is returned
                 in the array WORK as RANKA=NINT(WORK(2)). Also see the
                 descriptions of SVA and WORK. The computed columns of U
                 are mutually numerically orthogonal up to approximately
                 TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'),
                 see the description of JOBU.
                 If INFO > 0,
                 the procedure SGESVJ did not converge in the given number
                 of iterations (sweeps). In that case, the computed
                 columns of U may not be orthogonal up to TOL. The output
                 U (stored in A), SIGMA (given by the computed singular
                 values in SVA(1:N)) and V is still a decomposition of the
                 input matrix A in the sense that the residual
                 ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
          If JOBU = 'N':
                 If INFO = 0:
                 Note that the left singular vectors are 'for free' in the
                 one-sided Jacobi SVD algorithm. However, if only the
                 singular values are needed, the level of numerical
                 orthogonality of U is not an issue and iterations are
                 stopped when the columns of the iterated matrix are
                 numerically orthogonal up to approximately M*EPS. Thus,
                 on exit, A contains the columns of U scaled with the
                 corresponding singular values.
                 If INFO > 0:
                 the procedure SGESVJ did not converge in the given number
                 of iterations (sweeps).
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is REAL array, dimension (N)
          On exit,
          If INFO = 0 :
          depending on the value SCALE = WORK(1), we have:
                 If SCALE = ONE:
                 SVA(1:N) contains the computed singular values of A.
                 During the computation SVA contains the Euclidean column
                 norms of the iterated matrices in the array A.
                 If SCALE .NE. ONE:
                 The singular values of A are SCALE*SVA(1:N), and this
                 factored representation is due to the fact that some of the
                 singular values of A might underflow or overflow.

          If INFO > 0 :
          the procedure SGESVJ did not converge in the given number of
          iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
[in]MV
          MV is INTEGER
          If JOBV = 'A', then the product of Jacobi rotations in SGESVJ
          is applied to the first MV rows of V. See the description of JOBV.
[in,out]V
          V is REAL array, dimension (LDV,N)
          If JOBV = 'V', then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'A', then V contains the product of the computed right
                         singular vector matrix and the initial matrix in
                         the array V.
          If JOBV = 'N', then V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V, LDV >= 1.
          If JOBV = 'V', then LDV >= max(1,N).
          If JOBV = 'A', then LDV >= max(1,MV) .
[in,out]WORK
          WORK is REAL array, dimension (LWORK)
          On entry,
          If JOBU = 'C' :
          WORK(1) = CTOL, where CTOL defines the threshold for convergence.
                    The process stops if all columns of A are mutually
                    orthogonal up to CTOL*EPS, EPS=SLAMCH('E').
                    It is required that CTOL >= ONE, i.e. it is not
                    allowed to force the routine to obtain orthogonality
                    below EPSILON.
          On exit,
          WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
                    are the computed singular vcalues of A.
                    (See description of SVA().)
          WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
                    singular values.
          WORK(3) = NINT(WORK(3)) is the number of the computed singular
                    values that are larger than the underflow threshold.
          WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
                    rotations needed for numerical convergence.
          WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
                    This is useful information in cases when SGESVJ did
                    not converge, as it can be used to estimate whether
                    the output is still useful and for post festum analysis.
          WORK(6) = the largest absolute value over all sines of the
                    Jacobi rotation angles in the last sweep. It can be
                    useful for a post festum analysis.
[in]LWORK
          LWORK is INTEGER
         length of WORK, WORK >= MAX(6,M+N)
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, then the i-th argument had an illegal value
          > 0:  SGESVJ did not converge in the maximal allowed number (30)
                of sweeps. The output may still be useful. See the
                description of WORK.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane rotations. The rotations are implemented as fast scaled rotations of Anda and Park [1]. In the case of underflow of the Jacobi angle, a modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses column interchanges of de Rijk [2]. The relative accuracy of the computed singular values and the accuracy of the computed singular vectors (in angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. The condition number that determines the accuracy in the full rank case is essentially min_{D=diag} kappa(A*D), where kappa(.) is the spectral condition number. The best performance of this Jacobi SVD procedure is achieved if used in an accelerated version of Drmac and Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. Some tuning parameters (marked with [TP]) are available for the implementer.
The computational range for the nonzero singular values is the machine number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even denormalized singular values can be computed with the corresponding gradual loss of accurate digits.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
[1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.

[2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer.
SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.

[3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
[4] Z. Drmac: Implementation of Jacobi rotations for accurate singular value computation in floating point arithmetic.
SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.

[5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
LAPACK Working note 169.

[6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
LAPACK Working note 170.

[7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008.
Bugs, Examples and Comments:
Please report all bugs and send interesting test examples and comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 321 of file sgesvj.f.

323*
324* -- LAPACK computational routine --
325* -- LAPACK is a software package provided by Univ. of Tennessee, --
326* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
327*
328* .. Scalar Arguments ..
329 INTEGER INFO, LDA, LDV, LWORK, M, MV, N
330 CHARACTER*1 JOBA, JOBU, JOBV
331* ..
332* .. Array Arguments ..
333 REAL A( LDA, * ), SVA( N ), V( LDV, * ),
334 $ WORK( LWORK )
335* ..
336*
337* =====================================================================
338*
339* .. Local Parameters ..
340 REAL ZERO, HALF, ONE
341 parameter( zero = 0.0e0, half = 0.5e0, one = 1.0e0)
342 INTEGER NSWEEP
343 parameter( nsweep = 30 )
344* ..
345* .. Local Scalars ..
346 REAL AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
347 $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
348 $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
349 $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
350 $ THSIGN, TOL
351 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
352 $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
353 $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
354 $ SWBAND
355 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
356 $ RSVEC, UCTOL, UPPER
357* ..
358* .. Local Arrays ..
359 REAL FASTR( 5 )
360* ..
361* .. Intrinsic Functions ..
362 INTRINSIC abs, max, min, float, sign, sqrt
363* ..
364* .. External Functions ..
365* ..
366* from BLAS
367 REAL SDOT, SNRM2
368 EXTERNAL sdot, snrm2
369 INTEGER ISAMAX
370 EXTERNAL isamax
371* from LAPACK
372 REAL SLAMCH
373 EXTERNAL slamch
374 LOGICAL LSAME
375 EXTERNAL lsame
376* ..
377* .. External Subroutines ..
378* ..
379* from BLAS
380 EXTERNAL saxpy, scopy, srotm, sscal, sswap
381* from LAPACK
382 EXTERNAL slascl, slaset, slassq, xerbla
383*
384 EXTERNAL sgsvj0, sgsvj1
385* ..
386* .. Executable Statements ..
387*
388* Test the input arguments
389*
390 lsvec = lsame( jobu, 'U' )
391 uctol = lsame( jobu, 'C' )
392 rsvec = lsame( jobv, 'V' )
393 applv = lsame( jobv, 'A' )
394 upper = lsame( joba, 'U' )
395 lower = lsame( joba, 'L' )
396*
397 IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
398 info = -1
399 ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
400 info = -2
401 ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
402 info = -3
403 ELSE IF( m.LT.0 ) THEN
404 info = -4
405 ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
406 info = -5
407 ELSE IF( lda.LT.m ) THEN
408 info = -7
409 ELSE IF( mv.LT.0 ) THEN
410 info = -9
411 ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
412 $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
413 info = -11
414 ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
415 info = -12
416 ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
417 info = -13
418 ELSE
419 info = 0
420 END IF
421*
422* #:(
423 IF( info.NE.0 ) THEN
424 CALL xerbla( 'SGESVJ', -info )
425 RETURN
426 END IF
427*
428* #:) Quick return for void matrix
429*
430 IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
431*
432* Set numerical parameters
433* The stopping criterion for Jacobi rotations is
434*
435* max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
436*
437* where EPS is the round-off and CTOL is defined as follows:
438*
439 IF( uctol ) THEN
440* ... user controlled
441 ctol = work( 1 )
442 ELSE
443* ... default
444 IF( lsvec .OR. rsvec .OR. applv ) THEN
445 ctol = sqrt( float( m ) )
446 ELSE
447 ctol = float( m )
448 END IF
449 END IF
450* ... and the machine dependent parameters are
451*[!] (Make sure that SLAMCH() works properly on the target machine.)
452*
453 epsln = slamch( 'Epsilon' )
454 rooteps = sqrt( epsln )
455 sfmin = slamch( 'SafeMinimum' )
456 rootsfmin = sqrt( sfmin )
457 small = sfmin / epsln
458 big = slamch( 'Overflow' )
459* BIG = ONE / SFMIN
460 rootbig = one / rootsfmin
461 large = big / sqrt( float( m*n ) )
462 bigtheta = one / rooteps
463*
464 tol = ctol*epsln
465 roottol = sqrt( tol )
466*
467 IF( float( m )*epsln.GE.one ) THEN
468 info = -4
469 CALL xerbla( 'SGESVJ', -info )
470 RETURN
471 END IF
472*
473* Initialize the right singular vector matrix.
474*
475 IF( rsvec ) THEN
476 mvl = n
477 CALL slaset( 'A', mvl, n, zero, one, v, ldv )
478 ELSE IF( applv ) THEN
479 mvl = mv
480 END IF
481 rsvec = rsvec .OR. applv
482*
483* Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
484*(!) If necessary, scale A to protect the largest singular value
485* from overflow. It is possible that saving the largest singular
486* value destroys the information about the small ones.
487* This initial scaling is almost minimal in the sense that the
488* goal is to make sure that no column norm overflows, and that
489* SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
490* in A are detected, the procedure returns with INFO=-6.
491*
492 skl = one / sqrt( float( m )*float( n ) )
493 noscale = .true.
494 goscale = .true.
495*
496 IF( lower ) THEN
497* the input matrix is M-by-N lower triangular (trapezoidal)
498 DO 1874 p = 1, n
499 aapp = zero
500 aaqq = one
501 CALL slassq( m-p+1, a( p, p ), 1, aapp, aaqq )
502 IF( aapp.GT.big ) THEN
503 info = -6
504 CALL xerbla( 'SGESVJ', -info )
505 RETURN
506 END IF
507 aaqq = sqrt( aaqq )
508 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
509 sva( p ) = aapp*aaqq
510 ELSE
511 noscale = .false.
512 sva( p ) = aapp*( aaqq*skl )
513 IF( goscale ) THEN
514 goscale = .false.
515 DO 1873 q = 1, p - 1
516 sva( q ) = sva( q )*skl
517 1873 CONTINUE
518 END IF
519 END IF
520 1874 CONTINUE
521 ELSE IF( upper ) THEN
522* the input matrix is M-by-N upper triangular (trapezoidal)
523 DO 2874 p = 1, n
524 aapp = zero
525 aaqq = one
526 CALL slassq( p, a( 1, p ), 1, aapp, aaqq )
527 IF( aapp.GT.big ) THEN
528 info = -6
529 CALL xerbla( 'SGESVJ', -info )
530 RETURN
531 END IF
532 aaqq = sqrt( aaqq )
533 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
534 sva( p ) = aapp*aaqq
535 ELSE
536 noscale = .false.
537 sva( p ) = aapp*( aaqq*skl )
538 IF( goscale ) THEN
539 goscale = .false.
540 DO 2873 q = 1, p - 1
541 sva( q ) = sva( q )*skl
542 2873 CONTINUE
543 END IF
544 END IF
545 2874 CONTINUE
546 ELSE
547* the input matrix is M-by-N general dense
548 DO 3874 p = 1, n
549 aapp = zero
550 aaqq = one
551 CALL slassq( m, a( 1, p ), 1, aapp, aaqq )
552 IF( aapp.GT.big ) THEN
553 info = -6
554 CALL xerbla( 'SGESVJ', -info )
555 RETURN
556 END IF
557 aaqq = sqrt( aaqq )
558 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
559 sva( p ) = aapp*aaqq
560 ELSE
561 noscale = .false.
562 sva( p ) = aapp*( aaqq*skl )
563 IF( goscale ) THEN
564 goscale = .false.
565 DO 3873 q = 1, p - 1
566 sva( q ) = sva( q )*skl
567 3873 CONTINUE
568 END IF
569 END IF
570 3874 CONTINUE
571 END IF
572*
573 IF( noscale )skl = one
574*
575* Move the smaller part of the spectrum from the underflow threshold
576*(!) Start by determining the position of the nonzero entries of the
577* array SVA() relative to ( SFMIN, BIG ).
578*
579 aapp = zero
580 aaqq = big
581 DO 4781 p = 1, n
582 IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
583 aapp = max( aapp, sva( p ) )
584 4781 CONTINUE
585*
586* #:) Quick return for zero matrix
587*
588 IF( aapp.EQ.zero ) THEN
589 IF( lsvec )CALL slaset( 'G', m, n, zero, one, a, lda )
590 work( 1 ) = one
591 work( 2 ) = zero
592 work( 3 ) = zero
593 work( 4 ) = zero
594 work( 5 ) = zero
595 work( 6 ) = zero
596 RETURN
597 END IF
598*
599* #:) Quick return for one-column matrix
600*
601 IF( n.EQ.1 ) THEN
602 IF( lsvec )CALL slascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
603 $ a( 1, 1 ), lda, ierr )
604 work( 1 ) = one / skl
605 IF( sva( 1 ).GE.sfmin ) THEN
606 work( 2 ) = one
607 ELSE
608 work( 2 ) = zero
609 END IF
610 work( 3 ) = zero
611 work( 4 ) = zero
612 work( 5 ) = zero
613 work( 6 ) = zero
614 RETURN
615 END IF
616*
617* Protect small singular values from underflow, and try to
618* avoid underflows/overflows in computing Jacobi rotations.
619*
620 sn = sqrt( sfmin / epsln )
621 temp1 = sqrt( big / float( n ) )
622 IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
623 $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
624 temp1 = min( big, temp1 / aapp )
625* AAQQ = AAQQ*TEMP1
626* AAPP = AAPP*TEMP1
627 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
628 temp1 = min( sn / aaqq, big / ( aapp*sqrt( float( n ) ) ) )
629* AAQQ = AAQQ*TEMP1
630* AAPP = AAPP*TEMP1
631 ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
632 temp1 = max( sn / aaqq, temp1 / aapp )
633* AAQQ = AAQQ*TEMP1
634* AAPP = AAPP*TEMP1
635 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
636 temp1 = min( sn / aaqq, big / ( sqrt( float( n ) )*aapp ) )
637* AAQQ = AAQQ*TEMP1
638* AAPP = AAPP*TEMP1
639 ELSE
640 temp1 = one
641 END IF
642*
643* Scale, if necessary
644*
645 IF( temp1.NE.one ) THEN
646 CALL slascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
647 END IF
648 skl = temp1*skl
649 IF( skl.NE.one ) THEN
650 CALL slascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
651 skl = one / skl
652 END IF
653*
654* Row-cyclic Jacobi SVD algorithm with column pivoting
655*
656 emptsw = ( n*( n-1 ) ) / 2
657 notrot = 0
658 fastr( 1 ) = zero
659*
660* A is represented in factored form A = A * diag(WORK), where diag(WORK)
661* is initialized to identity. WORK is updated during fast scaled
662* rotations.
663*
664 DO 1868 q = 1, n
665 work( q ) = one
666 1868 CONTINUE
667*
668*
669 swband = 3
670*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
671* if SGESVJ is used as a computational routine in the preconditioned
672* Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure
673* works on pivots inside a band-like region around the diagonal.
674* The boundaries are determined dynamically, based on the number of
675* pivots above a threshold.
676*
677 kbl = min( 8, n )
678*[TP] KBL is a tuning parameter that defines the tile size in the
679* tiling of the p-q loops of pivot pairs. In general, an optimal
680* value of KBL depends on the matrix dimensions and on the
681* parameters of the computer's memory.
682*
683 nbl = n / kbl
684 IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
685*
686 blskip = kbl**2
687*[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
688*
689 rowskip = min( 5, kbl )
690*[TP] ROWSKIP is a tuning parameter.
691*
692 lkahead = 1
693*[TP] LKAHEAD is a tuning parameter.
694*
695* Quasi block transformations, using the lower (upper) triangular
696* structure of the input matrix. The quasi-block-cycling usually
697* invokes cubic convergence. Big part of this cycle is done inside
698* canonical subspaces of dimensions less than M.
699*
700 IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
701*[TP] The number of partition levels and the actual partition are
702* tuning parameters.
703 n4 = n / 4
704 n2 = n / 2
705 n34 = 3*n4
706 IF( applv ) THEN
707 q = 0
708 ELSE
709 q = 1
710 END IF
711*
712 IF( lower ) THEN
713*
714* This works very well on lower triangular matrices, in particular
715* in the framework of the preconditioned Jacobi SVD (xGEJSV).
716* The idea is simple:
717* [+ 0 0 0] Note that Jacobi transformations of [0 0]
718* [+ + 0 0] [0 0]
719* [+ + x 0] actually work on [x 0] [x 0]
720* [+ + x x] [x x]. [x x]
721*
722 CALL sgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
723 $ work( n34+1 ), sva( n34+1 ), mvl,
724 $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
725 $ 2, work( n+1 ), lwork-n, ierr )
726*
727 CALL sgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
728 $ work( n2+1 ), sva( n2+1 ), mvl,
729 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
730 $ work( n+1 ), lwork-n, ierr )
731*
732 CALL sgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
733 $ work( n2+1 ), sva( n2+1 ), mvl,
734 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
735 $ work( n+1 ), lwork-n, ierr )
736*
737 CALL sgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
738 $ work( n4+1 ), sva( n4+1 ), mvl,
739 $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
740 $ work( n+1 ), lwork-n, ierr )
741*
742 CALL sgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
743 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
744 $ ierr )
745*
746 CALL sgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
747 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
748 $ lwork-n, ierr )
749*
750*
751 ELSE IF( upper ) THEN
752*
753*
754 CALL sgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
755 $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
756 $ ierr )
757*
758 CALL sgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
759 $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
760 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
761 $ ierr )
762*
763 CALL sgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
764 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
765 $ lwork-n, ierr )
766*
767 CALL sgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
768 $ work( n2+1 ), sva( n2+1 ), mvl,
769 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
770 $ work( n+1 ), lwork-n, ierr )
771
772 END IF
773*
774 END IF
775*
776* .. Row-cyclic pivot strategy with de Rijk's pivoting ..
777*
778 DO 1993 i = 1, nsweep
779*
780* .. go go go ...
781*
782 mxaapq = zero
783 mxsinj = zero
784 iswrot = 0
785*
786 notrot = 0
787 pskipped = 0
788*
789* Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
790* 1 <= p < q <= N. This is the first step toward a blocked implementation
791* of the rotations. New implementation, based on block transformations,
792* is under development.
793*
794 DO 2000 ibr = 1, nbl
795*
796 igl = ( ibr-1 )*kbl + 1
797*
798 DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
799*
800 igl = igl + ir1*kbl
801*
802 DO 2001 p = igl, min( igl+kbl-1, n-1 )
803*
804* .. de Rijk's pivoting
805*
806 q = isamax( n-p+1, sva( p ), 1 ) + p - 1
807 IF( p.NE.q ) THEN
808 CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
809 IF( rsvec )CALL sswap( mvl, v( 1, p ), 1,
810 $ v( 1, q ), 1 )
811 temp1 = sva( p )
812 sva( p ) = sva( q )
813 sva( q ) = temp1
814 temp1 = work( p )
815 work( p ) = work( q )
816 work( q ) = temp1
817 END IF
818*
819 IF( ir1.EQ.0 ) THEN
820*
821* Column norms are periodically updated by explicit
822* norm computation.
823* Caveat:
824* Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1)
825* as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
826* overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to
827* underflow for ||A(:,p)||_2 < SQRT(underflow_threshold).
828* Hence, SNRM2 cannot be trusted, not even in the case when
829* the true norm is far from the under(over)flow boundaries.
830* If properly implemented SNRM2 is available, the IF-THEN-ELSE
831* below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)".
832*
833 IF( ( sva( p ).LT.rootbig ) .AND.
834 $ ( sva( p ).GT.rootsfmin ) ) THEN
835 sva( p ) = snrm2( m, a( 1, p ), 1 )*work( p )
836 ELSE
837 temp1 = zero
838 aapp = one
839 CALL slassq( m, a( 1, p ), 1, temp1, aapp )
840 sva( p ) = temp1*sqrt( aapp )*work( p )
841 END IF
842 aapp = sva( p )
843 ELSE
844 aapp = sva( p )
845 END IF
846*
847 IF( aapp.GT.zero ) THEN
848*
849 pskipped = 0
850*
851 DO 2002 q = p + 1, min( igl+kbl-1, n )
852*
853 aaqq = sva( q )
854*
855 IF( aaqq.GT.zero ) THEN
856*
857 aapp0 = aapp
858 IF( aaqq.GE.one ) THEN
859 rotok = ( small*aapp ).LE.aaqq
860 IF( aapp.LT.( big / aaqq ) ) THEN
861 aapq = ( sdot( m, a( 1, p ), 1, a( 1,
862 $ q ), 1 )*work( p )*work( q ) /
863 $ aaqq ) / aapp
864 ELSE
865 CALL scopy( m, a( 1, p ), 1,
866 $ work( n+1 ), 1 )
867 CALL slascl( 'G', 0, 0, aapp,
868 $ work( p ), m, 1,
869 $ work( n+1 ), lda, ierr )
870 aapq = sdot( m, work( n+1 ), 1,
871 $ a( 1, q ), 1 )*work( q ) / aaqq
872 END IF
873 ELSE
874 rotok = aapp.LE.( aaqq / small )
875 IF( aapp.GT.( small / aaqq ) ) THEN
876 aapq = ( sdot( m, a( 1, p ), 1, a( 1,
877 $ q ), 1 )*work( p )*work( q ) /
878 $ aaqq ) / aapp
879 ELSE
880 CALL scopy( m, a( 1, q ), 1,
881 $ work( n+1 ), 1 )
882 CALL slascl( 'G', 0, 0, aaqq,
883 $ work( q ), m, 1,
884 $ work( n+1 ), lda, ierr )
885 aapq = sdot( m, work( n+1 ), 1,
886 $ a( 1, p ), 1 )*work( p ) / aapp
887 END IF
888 END IF
889*
890 mxaapq = max( mxaapq, abs( aapq ) )
891*
892* TO rotate or NOT to rotate, THAT is the question ...
893*
894 IF( abs( aapq ).GT.tol ) THEN
895*
896* .. rotate
897*[RTD] ROTATED = ROTATED + ONE
898*
899 IF( ir1.EQ.0 ) THEN
900 notrot = 0
901 pskipped = 0
902 iswrot = iswrot + 1
903 END IF
904*
905 IF( rotok ) THEN
906*
907 aqoap = aaqq / aapp
908 apoaq = aapp / aaqq
909 theta = -half*abs( aqoap-apoaq ) / aapq
910*
911 IF( abs( theta ).GT.bigtheta ) THEN
912*
913 t = half / theta
914 fastr( 3 ) = t*work( p ) / work( q )
915 fastr( 4 ) = -t*work( q ) /
916 $ work( p )
917 CALL srotm( m, a( 1, p ), 1,
918 $ a( 1, q ), 1, fastr )
919 IF( rsvec )CALL srotm( mvl,
920 $ v( 1, p ), 1,
921 $ v( 1, q ), 1,
922 $ fastr )
923 sva( q ) = aaqq*sqrt( max( zero,
924 $ one+t*apoaq*aapq ) )
925 aapp = aapp*sqrt( max( zero,
926 $ one-t*aqoap*aapq ) )
927 mxsinj = max( mxsinj, abs( t ) )
928*
929 ELSE
930*
931* .. choose correct signum for THETA and rotate
932*
933 thsign = -sign( one, aapq )
934 t = one / ( theta+thsign*
935 $ sqrt( one+theta*theta ) )
936 cs = sqrt( one / ( one+t*t ) )
937 sn = t*cs
938*
939 mxsinj = max( mxsinj, abs( sn ) )
940 sva( q ) = aaqq*sqrt( max( zero,
941 $ one+t*apoaq*aapq ) )
942 aapp = aapp*sqrt( max( zero,
943 $ one-t*aqoap*aapq ) )
944*
945 apoaq = work( p ) / work( q )
946 aqoap = work( q ) / work( p )
947 IF( work( p ).GE.one ) THEN
948 IF( work( q ).GE.one ) THEN
949 fastr( 3 ) = t*apoaq
950 fastr( 4 ) = -t*aqoap
951 work( p ) = work( p )*cs
952 work( q ) = work( q )*cs
953 CALL srotm( m, a( 1, p ), 1,
954 $ a( 1, q ), 1,
955 $ fastr )
956 IF( rsvec )CALL srotm( mvl,
957 $ v( 1, p ), 1, v( 1, q ),
958 $ 1, fastr )
959 ELSE
960 CALL saxpy( m, -t*aqoap,
961 $ a( 1, q ), 1,
962 $ a( 1, p ), 1 )
963 CALL saxpy( m, cs*sn*apoaq,
964 $ a( 1, p ), 1,
965 $ a( 1, q ), 1 )
966 work( p ) = work( p )*cs
967 work( q ) = work( q ) / cs
968 IF( rsvec ) THEN
969 CALL saxpy( mvl, -t*aqoap,
970 $ v( 1, q ), 1,
971 $ v( 1, p ), 1 )
972 CALL saxpy( mvl,
973 $ cs*sn*apoaq,
974 $ v( 1, p ), 1,
975 $ v( 1, q ), 1 )
976 END IF
977 END IF
978 ELSE
979 IF( work( q ).GE.one ) THEN
980 CALL saxpy( m, t*apoaq,
981 $ a( 1, p ), 1,
982 $ a( 1, q ), 1 )
983 CALL saxpy( m, -cs*sn*aqoap,
984 $ a( 1, q ), 1,
985 $ a( 1, p ), 1 )
986 work( p ) = work( p ) / cs
987 work( q ) = work( q )*cs
988 IF( rsvec ) THEN
989 CALL saxpy( mvl, t*apoaq,
990 $ v( 1, p ), 1,
991 $ v( 1, q ), 1 )
992 CALL saxpy( mvl,
993 $ -cs*sn*aqoap,
994 $ v( 1, q ), 1,
995 $ v( 1, p ), 1 )
996 END IF
997 ELSE
998 IF( work( p ).GE.work( q ) )
999 $ THEN
1000 CALL saxpy( m, -t*aqoap,
1001 $ a( 1, q ), 1,
1002 $ a( 1, p ), 1 )
1003 CALL saxpy( m, cs*sn*apoaq,
1004 $ a( 1, p ), 1,
1005 $ a( 1, q ), 1 )
1006 work( p ) = work( p )*cs
1007 work( q ) = work( q ) / cs
1008 IF( rsvec ) THEN
1009 CALL saxpy( mvl,
1010 $ -t*aqoap,
1011 $ v( 1, q ), 1,
1012 $ v( 1, p ), 1 )
1013 CALL saxpy( mvl,
1014 $ cs*sn*apoaq,
1015 $ v( 1, p ), 1,
1016 $ v( 1, q ), 1 )
1017 END IF
1018 ELSE
1019 CALL saxpy( m, t*apoaq,
1020 $ a( 1, p ), 1,
1021 $ a( 1, q ), 1 )
1022 CALL saxpy( m,
1023 $ -cs*sn*aqoap,
1024 $ a( 1, q ), 1,
1025 $ a( 1, p ), 1 )
1026 work( p ) = work( p ) / cs
1027 work( q ) = work( q )*cs
1028 IF( rsvec ) THEN
1029 CALL saxpy( mvl,
1030 $ t*apoaq, v( 1, p ),
1031 $ 1, v( 1, q ), 1 )
1032 CALL saxpy( mvl,
1033 $ -cs*sn*aqoap,
1034 $ v( 1, q ), 1,
1035 $ v( 1, p ), 1 )
1036 END IF
1037 END IF
1038 END IF
1039 END IF
1040 END IF
1041*
1042 ELSE
1043* .. have to use modified Gram-Schmidt like transformation
1044 CALL scopy( m, a( 1, p ), 1,
1045 $ work( n+1 ), 1 )
1046 CALL slascl( 'G', 0, 0, aapp, one, m,
1047 $ 1, work( n+1 ), lda,
1048 $ ierr )
1049 CALL slascl( 'G', 0, 0, aaqq, one, m,
1050 $ 1, a( 1, q ), lda, ierr )
1051 temp1 = -aapq*work( p ) / work( q )
1052 CALL saxpy( m, temp1, work( n+1 ), 1,
1053 $ a( 1, q ), 1 )
1054 CALL slascl( 'G', 0, 0, one, aaqq, m,
1055 $ 1, a( 1, q ), lda, ierr )
1056 sva( q ) = aaqq*sqrt( max( zero,
1057 $ one-aapq*aapq ) )
1058 mxsinj = max( mxsinj, sfmin )
1059 END IF
1060* END IF ROTOK THEN ... ELSE
1061*
1062* In the case of cancellation in updating SVA(q), SVA(p)
1063* recompute SVA(q), SVA(p).
1064*
1065 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1066 $ THEN
1067 IF( ( aaqq.LT.rootbig ) .AND.
1068 $ ( aaqq.GT.rootsfmin ) ) THEN
1069 sva( q ) = snrm2( m, a( 1, q ), 1 )*
1070 $ work( q )
1071 ELSE
1072 t = zero
1073 aaqq = one
1074 CALL slassq( m, a( 1, q ), 1, t,
1075 $ aaqq )
1076 sva( q ) = t*sqrt( aaqq )*work( q )
1077 END IF
1078 END IF
1079 IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1080 IF( ( aapp.LT.rootbig ) .AND.
1081 $ ( aapp.GT.rootsfmin ) ) THEN
1082 aapp = snrm2( m, a( 1, p ), 1 )*
1083 $ work( p )
1084 ELSE
1085 t = zero
1086 aapp = one
1087 CALL slassq( m, a( 1, p ), 1, t,
1088 $ aapp )
1089 aapp = t*sqrt( aapp )*work( p )
1090 END IF
1091 sva( p ) = aapp
1092 END IF
1093*
1094 ELSE
1095* A(:,p) and A(:,q) already numerically orthogonal
1096 IF( ir1.EQ.0 )notrot = notrot + 1
1097*[RTD] SKIPPED = SKIPPED + 1
1098 pskipped = pskipped + 1
1099 END IF
1100 ELSE
1101* A(:,q) is zero column
1102 IF( ir1.EQ.0 )notrot = notrot + 1
1103 pskipped = pskipped + 1
1104 END IF
1105*
1106 IF( ( i.LE.swband ) .AND.
1107 $ ( pskipped.GT.rowskip ) ) THEN
1108 IF( ir1.EQ.0 )aapp = -aapp
1109 notrot = 0
1110 GO TO 2103
1111 END IF
1112*
1113 2002 CONTINUE
1114* END q-LOOP
1115*
1116 2103 CONTINUE
1117* bailed out of q-loop
1118*
1119 sva( p ) = aapp
1120*
1121 ELSE
1122 sva( p ) = aapp
1123 IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1124 $ notrot = notrot + min( igl+kbl-1, n ) - p
1125 END IF
1126*
1127 2001 CONTINUE
1128* end of the p-loop
1129* end of doing the block ( ibr, ibr )
1130 1002 CONTINUE
1131* end of ir1-loop
1132*
1133* ... go to the off diagonal blocks
1134*
1135 igl = ( ibr-1 )*kbl + 1
1136*
1137 DO 2010 jbc = ibr + 1, nbl
1138*
1139 jgl = ( jbc-1 )*kbl + 1
1140*
1141* doing the block at ( ibr, jbc )
1142*
1143 ijblsk = 0
1144 DO 2100 p = igl, min( igl+kbl-1, n )
1145*
1146 aapp = sva( p )
1147 IF( aapp.GT.zero ) THEN
1148*
1149 pskipped = 0
1150*
1151 DO 2200 q = jgl, min( jgl+kbl-1, n )
1152*
1153 aaqq = sva( q )
1154 IF( aaqq.GT.zero ) THEN
1155 aapp0 = aapp
1156*
1157* .. M x 2 Jacobi SVD ..
1158*
1159* Safe Gram matrix computation
1160*
1161 IF( aaqq.GE.one ) THEN
1162 IF( aapp.GE.aaqq ) THEN
1163 rotok = ( small*aapp ).LE.aaqq
1164 ELSE
1165 rotok = ( small*aaqq ).LE.aapp
1166 END IF
1167 IF( aapp.LT.( big / aaqq ) ) THEN
1168 aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1169 $ q ), 1 )*work( p )*work( q ) /
1170 $ aaqq ) / aapp
1171 ELSE
1172 CALL scopy( m, a( 1, p ), 1,
1173 $ work( n+1 ), 1 )
1174 CALL slascl( 'G', 0, 0, aapp,
1175 $ work( p ), m, 1,
1176 $ work( n+1 ), lda, ierr )
1177 aapq = sdot( m, work( n+1 ), 1,
1178 $ a( 1, q ), 1 )*work( q ) / aaqq
1179 END IF
1180 ELSE
1181 IF( aapp.GE.aaqq ) THEN
1182 rotok = aapp.LE.( aaqq / small )
1183 ELSE
1184 rotok = aaqq.LE.( aapp / small )
1185 END IF
1186 IF( aapp.GT.( small / aaqq ) ) THEN
1187 aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1188 $ q ), 1 )*work( p )*work( q ) /
1189 $ aaqq ) / aapp
1190 ELSE
1191 CALL scopy( m, a( 1, q ), 1,
1192 $ work( n+1 ), 1 )
1193 CALL slascl( 'G', 0, 0, aaqq,
1194 $ work( q ), m, 1,
1195 $ work( n+1 ), lda, ierr )
1196 aapq = sdot( m, work( n+1 ), 1,
1197 $ a( 1, p ), 1 )*work( p ) / aapp
1198 END IF
1199 END IF
1200*
1201 mxaapq = max( mxaapq, abs( aapq ) )
1202*
1203* TO rotate or NOT to rotate, THAT is the question ...
1204*
1205 IF( abs( aapq ).GT.tol ) THEN
1206 notrot = 0
1207*[RTD] ROTATED = ROTATED + 1
1208 pskipped = 0
1209 iswrot = iswrot + 1
1210*
1211 IF( rotok ) THEN
1212*
1213 aqoap = aaqq / aapp
1214 apoaq = aapp / aaqq
1215 theta = -half*abs( aqoap-apoaq ) / aapq
1216 IF( aaqq.GT.aapp0 )theta = -theta
1217*
1218 IF( abs( theta ).GT.bigtheta ) THEN
1219 t = half / theta
1220 fastr( 3 ) = t*work( p ) / work( q )
1221 fastr( 4 ) = -t*work( q ) /
1222 $ work( p )
1223 CALL srotm( m, a( 1, p ), 1,
1224 $ a( 1, q ), 1, fastr )
1225 IF( rsvec )CALL srotm( mvl,
1226 $ v( 1, p ), 1,
1227 $ v( 1, q ), 1,
1228 $ fastr )
1229 sva( q ) = aaqq*sqrt( max( zero,
1230 $ one+t*apoaq*aapq ) )
1231 aapp = aapp*sqrt( max( zero,
1232 $ one-t*aqoap*aapq ) )
1233 mxsinj = max( mxsinj, abs( t ) )
1234 ELSE
1235*
1236* .. choose correct signum for THETA and rotate
1237*
1238 thsign = -sign( one, aapq )
1239 IF( aaqq.GT.aapp0 )thsign = -thsign
1240 t = one / ( theta+thsign*
1241 $ sqrt( one+theta*theta ) )
1242 cs = sqrt( one / ( one+t*t ) )
1243 sn = t*cs
1244 mxsinj = max( mxsinj, abs( sn ) )
1245 sva( q ) = aaqq*sqrt( max( zero,
1246 $ one+t*apoaq*aapq ) )
1247 aapp = aapp*sqrt( max( zero,
1248 $ one-t*aqoap*aapq ) )
1249*
1250 apoaq = work( p ) / work( q )
1251 aqoap = work( q ) / work( p )
1252 IF( work( p ).GE.one ) THEN
1253*
1254 IF( work( q ).GE.one ) THEN
1255 fastr( 3 ) = t*apoaq
1256 fastr( 4 ) = -t*aqoap
1257 work( p ) = work( p )*cs
1258 work( q ) = work( q )*cs
1259 CALL srotm( m, a( 1, p ), 1,
1260 $ a( 1, q ), 1,
1261 $ fastr )
1262 IF( rsvec )CALL srotm( mvl,
1263 $ v( 1, p ), 1, v( 1, q ),
1264 $ 1, fastr )
1265 ELSE
1266 CALL saxpy( m, -t*aqoap,
1267 $ a( 1, q ), 1,
1268 $ a( 1, p ), 1 )
1269 CALL saxpy( m, cs*sn*apoaq,
1270 $ a( 1, p ), 1,
1271 $ a( 1, q ), 1 )
1272 IF( rsvec ) THEN
1273 CALL saxpy( mvl, -t*aqoap,
1274 $ v( 1, q ), 1,
1275 $ v( 1, p ), 1 )
1276 CALL saxpy( mvl,
1277 $ cs*sn*apoaq,
1278 $ v( 1, p ), 1,
1279 $ v( 1, q ), 1 )
1280 END IF
1281 work( p ) = work( p )*cs
1282 work( q ) = work( q ) / cs
1283 END IF
1284 ELSE
1285 IF( work( q ).GE.one ) THEN
1286 CALL saxpy( m, t*apoaq,
1287 $ a( 1, p ), 1,
1288 $ a( 1, q ), 1 )
1289 CALL saxpy( m, -cs*sn*aqoap,
1290 $ a( 1, q ), 1,
1291 $ a( 1, p ), 1 )
1292 IF( rsvec ) THEN
1293 CALL saxpy( mvl, t*apoaq,
1294 $ v( 1, p ), 1,
1295 $ v( 1, q ), 1 )
1296 CALL saxpy( mvl,
1297 $ -cs*sn*aqoap,
1298 $ v( 1, q ), 1,
1299 $ v( 1, p ), 1 )
1300 END IF
1301 work( p ) = work( p ) / cs
1302 work( q ) = work( q )*cs
1303 ELSE
1304 IF( work( p ).GE.work( q ) )
1305 $ THEN
1306 CALL saxpy( m, -t*aqoap,
1307 $ a( 1, q ), 1,
1308 $ a( 1, p ), 1 )
1309 CALL saxpy( m, cs*sn*apoaq,
1310 $ a( 1, p ), 1,
1311 $ a( 1, q ), 1 )
1312 work( p ) = work( p )*cs
1313 work( q ) = work( q ) / cs
1314 IF( rsvec ) THEN
1315 CALL saxpy( mvl,
1316 $ -t*aqoap,
1317 $ v( 1, q ), 1,
1318 $ v( 1, p ), 1 )
1319 CALL saxpy( mvl,
1320 $ cs*sn*apoaq,
1321 $ v( 1, p ), 1,
1322 $ v( 1, q ), 1 )
1323 END IF
1324 ELSE
1325 CALL saxpy( m, t*apoaq,
1326 $ a( 1, p ), 1,
1327 $ a( 1, q ), 1 )
1328 CALL saxpy( m,
1329 $ -cs*sn*aqoap,
1330 $ a( 1, q ), 1,
1331 $ a( 1, p ), 1 )
1332 work( p ) = work( p ) / cs
1333 work( q ) = work( q )*cs
1334 IF( rsvec ) THEN
1335 CALL saxpy( mvl,
1336 $ t*apoaq, v( 1, p ),
1337 $ 1, v( 1, q ), 1 )
1338 CALL saxpy( mvl,
1339 $ -cs*sn*aqoap,
1340 $ v( 1, q ), 1,
1341 $ v( 1, p ), 1 )
1342 END IF
1343 END IF
1344 END IF
1345 END IF
1346 END IF
1347*
1348 ELSE
1349 IF( aapp.GT.aaqq ) THEN
1350 CALL scopy( m, a( 1, p ), 1,
1351 $ work( n+1 ), 1 )
1352 CALL slascl( 'G', 0, 0, aapp, one,
1353 $ m, 1, work( n+1 ), lda,
1354 $ ierr )
1355 CALL slascl( 'G', 0, 0, aaqq, one,
1356 $ m, 1, a( 1, q ), lda,
1357 $ ierr )
1358 temp1 = -aapq*work( p ) / work( q )
1359 CALL saxpy( m, temp1, work( n+1 ),
1360 $ 1, a( 1, q ), 1 )
1361 CALL slascl( 'G', 0, 0, one, aaqq,
1362 $ m, 1, a( 1, q ), lda,
1363 $ ierr )
1364 sva( q ) = aaqq*sqrt( max( zero,
1365 $ one-aapq*aapq ) )
1366 mxsinj = max( mxsinj, sfmin )
1367 ELSE
1368 CALL scopy( m, a( 1, q ), 1,
1369 $ work( n+1 ), 1 )
1370 CALL slascl( 'G', 0, 0, aaqq, one,
1371 $ m, 1, work( n+1 ), lda,
1372 $ ierr )
1373 CALL slascl( 'G', 0, 0, aapp, one,
1374 $ m, 1, a( 1, p ), lda,
1375 $ ierr )
1376 temp1 = -aapq*work( q ) / work( p )
1377 CALL saxpy( m, temp1, work( n+1 ),
1378 $ 1, a( 1, p ), 1 )
1379 CALL slascl( 'G', 0, 0, one, aapp,
1380 $ m, 1, a( 1, p ), lda,
1381 $ ierr )
1382 sva( p ) = aapp*sqrt( max( zero,
1383 $ one-aapq*aapq ) )
1384 mxsinj = max( mxsinj, sfmin )
1385 END IF
1386 END IF
1387* END IF ROTOK THEN ... ELSE
1388*
1389* In the case of cancellation in updating SVA(q)
1390* .. recompute SVA(q)
1391 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1392 $ THEN
1393 IF( ( aaqq.LT.rootbig ) .AND.
1394 $ ( aaqq.GT.rootsfmin ) ) THEN
1395 sva( q ) = snrm2( m, a( 1, q ), 1 )*
1396 $ work( q )
1397 ELSE
1398 t = zero
1399 aaqq = one
1400 CALL slassq( m, a( 1, q ), 1, t,
1401 $ aaqq )
1402 sva( q ) = t*sqrt( aaqq )*work( q )
1403 END IF
1404 END IF
1405 IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1406 IF( ( aapp.LT.rootbig ) .AND.
1407 $ ( aapp.GT.rootsfmin ) ) THEN
1408 aapp = snrm2( m, a( 1, p ), 1 )*
1409 $ work( p )
1410 ELSE
1411 t = zero
1412 aapp = one
1413 CALL slassq( m, a( 1, p ), 1, t,
1414 $ aapp )
1415 aapp = t*sqrt( aapp )*work( p )
1416 END IF
1417 sva( p ) = aapp
1418 END IF
1419* end of OK rotation
1420 ELSE
1421 notrot = notrot + 1
1422*[RTD] SKIPPED = SKIPPED + 1
1423 pskipped = pskipped + 1
1424 ijblsk = ijblsk + 1
1425 END IF
1426 ELSE
1427 notrot = notrot + 1
1428 pskipped = pskipped + 1
1429 ijblsk = ijblsk + 1
1430 END IF
1431*
1432 IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1433 $ THEN
1434 sva( p ) = aapp
1435 notrot = 0
1436 GO TO 2011
1437 END IF
1438 IF( ( i.LE.swband ) .AND.
1439 $ ( pskipped.GT.rowskip ) ) THEN
1440 aapp = -aapp
1441 notrot = 0
1442 GO TO 2203
1443 END IF
1444*
1445 2200 CONTINUE
1446* end of the q-loop
1447 2203 CONTINUE
1448*
1449 sva( p ) = aapp
1450*
1451 ELSE
1452*
1453 IF( aapp.EQ.zero )notrot = notrot +
1454 $ min( jgl+kbl-1, n ) - jgl + 1
1455 IF( aapp.LT.zero )notrot = 0
1456*
1457 END IF
1458*
1459 2100 CONTINUE
1460* end of the p-loop
1461 2010 CONTINUE
1462* end of the jbc-loop
1463 2011 CONTINUE
1464*2011 bailed out of the jbc-loop
1465 DO 2012 p = igl, min( igl+kbl-1, n )
1466 sva( p ) = abs( sva( p ) )
1467 2012 CONTINUE
1468***
1469 2000 CONTINUE
1470*2000 :: end of the ibr-loop
1471*
1472* .. update SVA(N)
1473 IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1474 $ THEN
1475 sva( n ) = snrm2( m, a( 1, n ), 1 )*work( n )
1476 ELSE
1477 t = zero
1478 aapp = one
1479 CALL slassq( m, a( 1, n ), 1, t, aapp )
1480 sva( n ) = t*sqrt( aapp )*work( n )
1481 END IF
1482*
1483* Additional steering devices
1484*
1485 IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1486 $ ( iswrot.LE.n ) ) )swband = i
1487*
1488 IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.sqrt( float( n ) )*
1489 $ tol ) .AND. ( float( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1490 GO TO 1994
1491 END IF
1492*
1493 IF( notrot.GE.emptsw )GO TO 1994
1494*
1495 1993 CONTINUE
1496* end i=1:NSWEEP loop
1497*
1498* #:( Reaching this point means that the procedure has not converged.
1499 info = nsweep - 1
1500 GO TO 1995
1501*
1502 1994 CONTINUE
1503* #:) Reaching this point means numerical convergence after the i-th
1504* sweep.
1505*
1506 info = 0
1507* #:) INFO = 0 confirms successful iterations.
1508 1995 CONTINUE
1509*
1510* Sort the singular values and find how many are above
1511* the underflow threshold.
1512*
1513 n2 = 0
1514 n4 = 0
1515 DO 5991 p = 1, n - 1
1516 q = isamax( n-p+1, sva( p ), 1 ) + p - 1
1517 IF( p.NE.q ) THEN
1518 temp1 = sva( p )
1519 sva( p ) = sva( q )
1520 sva( q ) = temp1
1521 temp1 = work( p )
1522 work( p ) = work( q )
1523 work( q ) = temp1
1524 CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1525 IF( rsvec )CALL sswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1526 END IF
1527 IF( sva( p ).NE.zero ) THEN
1528 n4 = n4 + 1
1529 IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1530 END IF
1531 5991 CONTINUE
1532 IF( sva( n ).NE.zero ) THEN
1533 n4 = n4 + 1
1534 IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1535 END IF
1536*
1537* Normalize the left singular vectors.
1538*
1539 IF( lsvec .OR. uctol ) THEN
1540 DO 1998 p = 1, n2
1541 CALL sscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1542 1998 CONTINUE
1543 END IF
1544*
1545* Scale the product of Jacobi rotations (assemble the fast rotations).
1546*
1547 IF( rsvec ) THEN
1548 IF( applv ) THEN
1549 DO 2398 p = 1, n
1550 CALL sscal( mvl, work( p ), v( 1, p ), 1 )
1551 2398 CONTINUE
1552 ELSE
1553 DO 2399 p = 1, n
1554 temp1 = one / snrm2( mvl, v( 1, p ), 1 )
1555 CALL sscal( mvl, temp1, v( 1, p ), 1 )
1556 2399 CONTINUE
1557 END IF
1558 END IF
1559*
1560* Undo scaling, if necessary (and possible).
1561 IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl ) ) )
1562 $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1563 $ ( sfmin / skl ) ) ) ) THEN
1564 DO 2400 p = 1, n
1565 sva( p ) = skl*sva( p )
1566 2400 CONTINUE
1567 skl = one
1568 END IF
1569*
1570 work( 1 ) = skl
1571* The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1572* then some of the singular values may overflow or underflow and
1573* the spectrum is given in this factored representation.
1574*
1575 work( 2 ) = float( n4 )
1576* N4 is the number of computed nonzero singular values of A.
1577*
1578 work( 3 ) = float( n2 )
1579* N2 is the number of singular values of A greater than SFMIN.
1580* If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1581* that may carry some information.
1582*
1583 work( 4 ) = float( i )
1584* i is the index of the last sweep before declaring convergence.
1585*
1586 work( 5 ) = mxaapq
1587* MXAAPQ is the largest absolute value of scaled pivots in the
1588* last sweep
1589*
1590 work( 6 ) = mxsinj
1591* MXSINJ is the largest absolute value of the sines of Jacobi angles
1592* in the last sweep
1593*
1594 RETURN
1595* ..
1596* .. END OF SGESVJ
1597* ..
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine saxpy(n, sa, sx, incx, sy, incy)
SAXPY
Definition saxpy.f:89
subroutine scopy(n, sx, incx, sy, incy)
SCOPY
Definition scopy.f:82
real function sdot(n, sx, incx, sy, incy)
SDOT
Definition sdot.f:82
subroutine sgsvj0(jobv, m, n, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
SGSVJ0 pre-processor for the routine sgesvj.
Definition sgsvj0.f:218
subroutine sgsvj1(jobv, m, n, n1, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
SGSVJ1 pre-processor for the routine sgesvj, applies Jacobi rotations targeting only particular pivot...
Definition sgsvj1.f:236
integer function isamax(n, sx, incx)
ISAMAX
Definition isamax.f:71
real function slamch(cmach)
SLAMCH
Definition slamch.f:68
subroutine slascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition slascl.f:143
subroutine slaset(uplo, m, n, alpha, beta, a, lda)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition slaset.f:110
subroutine slassq(n, x, incx, scale, sumsq)
SLASSQ updates a sum of squares represented in scaled form.
Definition slassq.f90:124
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
real(wp) function snrm2(n, x, incx)
SNRM2
Definition snrm2.f90:89
subroutine srotm(n, sx, incx, sy, incy, sparam)
SROTM
Definition srotm.f:97
subroutine sscal(n, sa, sx, incx)
SSCAL
Definition sscal.f:79
subroutine sswap(n, sx, incx, sy, incy)
SSWAP
Definition sswap.f:82
Here is the call graph for this function:
Here is the caller graph for this function: