LAPACK 3.12.1
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ sgesvj()

subroutine sgesvj ( character*1 joba,
character*1 jobu,
character*1 jobv,
integer m,
integer n,
real, dimension( lda, * ) a,
integer lda,
real, dimension( n ) sva,
integer mv,
real, dimension( ldv, * ) v,
integer ldv,
real, dimension( lwork ) work,
integer lwork,
integer info )

SGESVJ

Download SGESVJ + dependencies [TGZ] [ZIP] [TXT]

Purpose:
!>
!> SGESVJ computes the singular value decomposition (SVD) of a real
!> M-by-N matrix A, where M >= N. The SVD of A is written as
!>                                    [++]   [xx]   [x0]   [xx]
!>              A = U * SIGMA * V^t,  [++] = [xx] * [ox] * [xx]
!>                                    [++]   [xx]
!> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
!> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
!> of SIGMA are the singular values of A. The columns of U and V are the
!> left and the right singular vectors of A, respectively.
!> SGESVJ can sometimes compute tiny singular values and their singular vectors much
!> more accurately than other SVD routines, see below under Further Details.
!> 
Parameters
[in]JOBA
!>          JOBA is CHARACTER*1
!>          Specifies the structure of A.
!>          = 'L': The input matrix A is lower triangular;
!>          = 'U': The input matrix A is upper triangular;
!>          = 'G': The input matrix A is general M-by-N matrix, M >= N.
!> 
[in]JOBU
!>          JOBU is CHARACTER*1
!>          Specifies whether to compute the left singular vectors
!>          (columns of U):
!>          = 'U': The left singular vectors corresponding to the nonzero
!>                 singular values are computed and returned in the leading
!>                 columns of A. See more details in the description of A.
!>                 The default numerical orthogonality threshold is set to
!>                 approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E').
!>          = 'C': Analogous to JOBU='U', except that user can control the
!>                 level of numerical orthogonality of the computed left
!>                 singular vectors. TOL can be set to TOL = CTOL*EPS, where
!>                 CTOL is given on input in the array WORK.
!>                 No CTOL smaller than ONE is allowed. CTOL greater
!>                 than 1 / EPS is meaningless. The option 'C'
!>                 can be used if M*EPS is satisfactory orthogonality
!>                 of the computed left singular vectors, so CTOL=M could
!>                 save few sweeps of Jacobi rotations.
!>                 See the descriptions of A and WORK(1).
!>          = 'N': The matrix U is not computed. However, see the
!>                 description of A.
!> 
[in]JOBV
!>          JOBV is CHARACTER*1
!>          Specifies whether to compute the right singular vectors, that
!>          is, the matrix V:
!>          = 'V':  the matrix V is computed and returned in the array V
!>          = 'A':  the Jacobi rotations are applied to the MV-by-N
!>                  array V. In other words, the right singular vector
!>                  matrix V is not computed explicitly; instead it is
!>                  applied to an MV-by-N matrix initially stored in the
!>                  first MV rows of V.
!>          = 'N':  the matrix V is not computed and the array V is not
!>                  referenced
!> 
[in]M
!>          M is INTEGER
!>          The number of rows of the input matrix A. 1/SLAMCH('E') > M >= 0.
!> 
[in]N
!>          N is INTEGER
!>          The number of columns of the input matrix A.
!>          M >= N >= 0.
!> 
[in,out]A
!>          A is REAL array, dimension (LDA,N)
!>          On entry, the M-by-N matrix A.
!>          On exit,
!>          If JOBU = 'U' .OR. JOBU = 'C':
!>                 If INFO = 0:
!>                 RANKA orthonormal columns of U are returned in the
!>                 leading RANKA columns of the array A. Here RANKA <= N
!>                 is the number of computed singular values of A that are
!>                 above the underflow threshold SLAMCH('S'). The singular
!>                 vectors corresponding to underflowed or zero singular
!>                 values are not computed. The value of RANKA is returned
!>                 in the array WORK as RANKA=NINT(WORK(2)). Also see the
!>                 descriptions of SVA and WORK. The computed columns of U
!>                 are mutually numerically orthogonal up to approximately
!>                 TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'),
!>                 see the description of JOBU.
!>                 If INFO > 0,
!>                 the procedure SGESVJ did not converge in the given number
!>                 of iterations (sweeps). In that case, the computed
!>                 columns of U may not be orthogonal up to TOL. The output
!>                 U (stored in A), SIGMA (given by the computed singular
!>                 values in SVA(1:N)) and V is still a decomposition of the
!>                 input matrix A in the sense that the residual
!>                 ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
!>          If JOBU = 'N':
!>                 If INFO = 0:
!>                 Note that the left singular vectors are 'for free' in the
!>                 one-sided Jacobi SVD algorithm. However, if only the
!>                 singular values are needed, the level of numerical
!>                 orthogonality of U is not an issue and iterations are
!>                 stopped when the columns of the iterated matrix are
!>                 numerically orthogonal up to approximately M*EPS. Thus,
!>                 on exit, A contains the columns of U scaled with the
!>                 corresponding singular values.
!>                 If INFO > 0:
!>                 the procedure SGESVJ did not converge in the given number
!>                 of iterations (sweeps).
!> 
[in]LDA
!>          LDA is INTEGER
!>          The leading dimension of the array A.  LDA >= max(1,M).
!> 
[out]SVA
!>          SVA is REAL array, dimension (N)
!>          On exit,
!>          If INFO = 0 :
!>          depending on the value SCALE = WORK(1), we have:
!>                 If SCALE = ONE:
!>                 SVA(1:N) contains the computed singular values of A.
!>                 During the computation SVA contains the Euclidean column
!>                 norms of the iterated matrices in the array A.
!>                 If SCALE .NE. ONE:
!>                 The singular values of A are SCALE*SVA(1:N), and this
!>                 factored representation is due to the fact that some of the
!>                 singular values of A might underflow or overflow.
!>
!>          If INFO > 0 :
!>          the procedure SGESVJ did not converge in the given number of
!>          iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
!> 
[in]MV
!>          MV is INTEGER
!>          If JOBV = 'A', then the product of Jacobi rotations in SGESVJ
!>          is applied to the first MV rows of V. See the description of JOBV.
!> 
[in,out]V
!>          V is REAL array, dimension (LDV,N)
!>          If JOBV = 'V', then V contains on exit the N-by-N matrix of
!>                         the right singular vectors;
!>          If JOBV = 'A', then V contains the product of the computed right
!>                         singular vector matrix and the initial matrix in
!>                         the array V.
!>          If JOBV = 'N', then V is not referenced.
!> 
[in]LDV
!>          LDV is INTEGER
!>          The leading dimension of the array V, LDV >= 1.
!>          If JOBV = 'V', then LDV >= max(1,N).
!>          If JOBV = 'A', then LDV >= max(1,MV) .
!> 
[in,out]WORK
!>          WORK is REAL array, dimension (MAX(1,LWORK))
!>          On entry,
!>          If JOBU = 'C' :
!>          WORK(1) = CTOL, where CTOL defines the threshold for convergence.
!>                    The process stops if all columns of A are mutually
!>                    orthogonal up to CTOL*EPS, EPS=SLAMCH('E').
!>                    It is required that CTOL >= ONE, i.e. it is not
!>                    allowed to force the routine to obtain orthogonality
!>                    below EPSILON.
!>          On exit,
!>          WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
!>                    are the computed singular vcalues of A.
!>                    (See description of SVA().)
!>          WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
!>                    singular values.
!>          WORK(3) = NINT(WORK(3)) is the number of the computed singular
!>                    values that are larger than the underflow threshold.
!>          WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
!>                    rotations needed for numerical convergence.
!>          WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
!>                    This is useful information in cases when SGESVJ did
!>                    not converge, as it can be used to estimate whether
!>                    the output is still useful and for post festum analysis.
!>          WORK(6) = the largest absolute value over all sines of the
!>                    Jacobi rotation angles in the last sweep. It can be
!>                    useful for a post festum analysis.
!> 
[in]LWORK
!>          LWORK is INTEGER
!>          Length of WORK.
!>          LWORK >= 1, if MIN(M,N) = 0, and LWORK >= MAX(6,M+N), otherwise.
!>
!>          If on entry LWORK = -1, then a workspace query is assumed and
!>          no computation is done; WORK(1) is set to the minial (and optimal)
!>          length of WORK.
!> 
[out]INFO
!>          INFO is INTEGER
!>          = 0:  successful exit.
!>          < 0:  if INFO = -i, then the i-th argument had an illegal value
!>          > 0:  SGESVJ did not converge in the maximal allowed number (30)
!>                of sweeps. The output may still be useful. See the
!>                description of WORK.
!> 
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane rotations. The rotations are implemented as fast scaled rotations of Anda and Park [1]. In the case of underflow of the Jacobi angle, a modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses column interchanges of de Rijk [2]. The relative accuracy of the computed singular values and the accuracy of the computed singular vectors (in angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. The condition number that determines the accuracy in the full rank case is essentially min_{D=diag} kappa(A*D), where kappa(.) is the spectral condition number. The best performance of this Jacobi SVD procedure is achieved if used in an accelerated version of Drmac and Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. Some tuning parameters (marked with [TP]) are available for the implementer.
The computational range for the nonzero singular values is the machine number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even denormalized singular values can be computed with the corresponding gradual loss of accurate digits.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
[1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.

[2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer.
SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.

[3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
[4] Z. Drmac: Implementation of Jacobi rotations for accurate singular value computation in floating point arithmetic.
SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.

[5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
LAPACK Working note 169.

[6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
LAPACK Working note 170.

[7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008.
Bugs, Examples and Comments:
Please report all bugs and send interesting test examples and comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 324 of file sgesvj.f.

326*
327* -- LAPACK computational routine --
328* -- LAPACK is a software package provided by Univ. of Tennessee, --
329* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
330*
331* .. Scalar Arguments ..
332 INTEGER INFO, LDA, LDV, LWORK, M, MV, N
333 CHARACTER*1 JOBA, JOBU, JOBV
334* ..
335* .. Array Arguments ..
336 REAL A( LDA, * ), SVA( N ), V( LDV, * ),
337 $ WORK( LWORK )
338* ..
339*
340* =====================================================================
341*
342* .. Local Parameters ..
343 REAL ZERO, HALF, ONE
344 parameter( zero = 0.0e0, half = 0.5e0, one = 1.0e0)
345 INTEGER NSWEEP
346 parameter( nsweep = 30 )
347* ..
348* .. Local Scalars ..
349 REAL AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
350 $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
351 $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
352 $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
353 $ THSIGN, TOL
354 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
355 $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
356 $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
357 $ SWBAND, MINMN, LWMIN
358 LOGICAL APPLV, GOSCALE, LOWER, LQUERY, LSVEC, NOSCALE,
359 $ ROTOK, RSVEC, UCTOL, UPPER
360* ..
361* .. Local Arrays ..
362 REAL FASTR( 5 )
363* ..
364* .. Intrinsic Functions ..
365 INTRINSIC abs, max, min, float, sign, sqrt
366* ..
367* .. External Functions ..
368* ..
369* from BLAS
370 REAL SDOT, SNRM2
371 EXTERNAL sdot, snrm2
372 INTEGER ISAMAX
373 EXTERNAL isamax
374* from LAPACK
375 REAL SLAMCH, SROUNDUP_LWORK
376 EXTERNAL slamch, sroundup_lwork
377 LOGICAL LSAME
378 EXTERNAL lsame
379* ..
380* .. External Subroutines ..
381* ..
382* from BLAS
383 EXTERNAL saxpy, scopy, srotm, sscal, sswap
384* from LAPACK
385 EXTERNAL slascl, slaset, slassq, xerbla
386*
387 EXTERNAL sgsvj0, sgsvj1
388* ..
389* .. Executable Statements ..
390*
391* Test the input arguments
392*
393 lsvec = lsame( jobu, 'U' )
394 uctol = lsame( jobu, 'C' )
395 rsvec = lsame( jobv, 'V' )
396 applv = lsame( jobv, 'A' )
397 upper = lsame( joba, 'U' )
398 lower = lsame( joba, 'L' )
399*
400 minmn = min( m, n )
401 IF( minmn.EQ.0 ) THEN
402 lwmin = 1
403 ELSE
404 lwmin = max( 6, m+n )
405 END IF
406*
407 lquery = ( lwork.EQ.-1 )
408 IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
409 info = -1
410 ELSE IF( .NOT.( lsvec .OR.
411 $ uctol .OR.
412 $ lsame( jobu, 'N' ) ) ) THEN
413 info = -2
414 ELSE IF( .NOT.( rsvec .OR.
415 $ applv .OR.
416 $ lsame( jobv, 'N' ) ) ) THEN
417 info = -3
418 ELSE IF( m.LT.0 ) THEN
419 info = -4
420 ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
421 info = -5
422 ELSE IF( lda.LT.m ) THEN
423 info = -7
424 ELSE IF( mv.LT.0 ) THEN
425 info = -9
426 ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
427 $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
428 info = -11
429 ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
430 info = -12
431 ELSE IF( lwork.LT.lwmin .AND. ( .NOT.lquery ) ) THEN
432 info = -13
433 ELSE
434 info = 0
435 END IF
436*
437* #:(
438 IF( info.NE.0 ) THEN
439 CALL xerbla( 'SGESVJ', -info )
440 RETURN
441 ELSE IF( lquery ) THEN
442 work( 1 ) = sroundup_lwork( lwmin )
443 RETURN
444 END IF
445*
446* #:) Quick return for void matrix
447*
448 IF( minmn.EQ.0 ) RETURN
449*
450* Set numerical parameters
451* The stopping criterion for Jacobi rotations is
452*
453* max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
454*
455* where EPS is the round-off and CTOL is defined as follows:
456*
457 IF( uctol ) THEN
458* ... user controlled
459 ctol = work( 1 )
460 ELSE
461* ... default
462 IF( lsvec .OR. rsvec .OR. applv ) THEN
463 ctol = sqrt( float( m ) )
464 ELSE
465 ctol = float( m )
466 END IF
467 END IF
468* ... and the machine dependent parameters are
469*[!] (Make sure that SLAMCH() works properly on the target machine.)
470*
471 epsln = slamch( 'Epsilon' )
472 rooteps = sqrt( epsln )
473 sfmin = slamch( 'SafeMinimum' )
474 rootsfmin = sqrt( sfmin )
475 small = sfmin / epsln
476 big = slamch( 'Overflow' )
477* BIG = ONE / SFMIN
478 rootbig = one / rootsfmin
479 large = big / sqrt( float( m*n ) )
480 bigtheta = one / rooteps
481*
482 tol = ctol*epsln
483 roottol = sqrt( tol )
484*
485 IF( float( m )*epsln.GE.one ) THEN
486 info = -4
487 CALL xerbla( 'SGESVJ', -info )
488 RETURN
489 END IF
490*
491* Initialize the right singular vector matrix.
492*
493 IF( rsvec ) THEN
494 mvl = n
495 CALL slaset( 'A', mvl, n, zero, one, v, ldv )
496 ELSE IF( applv ) THEN
497 mvl = mv
498 END IF
499 rsvec = rsvec .OR. applv
500*
501* Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
502*(!) If necessary, scale A to protect the largest singular value
503* from overflow. It is possible that saving the largest singular
504* value destroys the information about the small ones.
505* This initial scaling is almost minimal in the sense that the
506* goal is to make sure that no column norm overflows, and that
507* SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
508* in A are detected, the procedure returns with INFO=-6.
509*
510 skl = one / sqrt( float( m )*float( n ) )
511 noscale = .true.
512 goscale = .true.
513*
514 IF( lower ) THEN
515* the input matrix is M-by-N lower triangular (trapezoidal)
516 DO 1874 p = 1, n
517 aapp = zero
518 aaqq = one
519 CALL slassq( m-p+1, a( p, p ), 1, aapp, aaqq )
520 IF( aapp.GT.big ) THEN
521 info = -6
522 CALL xerbla( 'SGESVJ', -info )
523 RETURN
524 END IF
525 aaqq = sqrt( aaqq )
526 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
527 sva( p ) = aapp*aaqq
528 ELSE
529 noscale = .false.
530 sva( p ) = aapp*( aaqq*skl )
531 IF( goscale ) THEN
532 goscale = .false.
533 DO 1873 q = 1, p - 1
534 sva( q ) = sva( q )*skl
535 1873 CONTINUE
536 END IF
537 END IF
538 1874 CONTINUE
539 ELSE IF( upper ) THEN
540* the input matrix is M-by-N upper triangular (trapezoidal)
541 DO 2874 p = 1, n
542 aapp = zero
543 aaqq = one
544 CALL slassq( p, a( 1, p ), 1, aapp, aaqq )
545 IF( aapp.GT.big ) THEN
546 info = -6
547 CALL xerbla( 'SGESVJ', -info )
548 RETURN
549 END IF
550 aaqq = sqrt( aaqq )
551 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
552 sva( p ) = aapp*aaqq
553 ELSE
554 noscale = .false.
555 sva( p ) = aapp*( aaqq*skl )
556 IF( goscale ) THEN
557 goscale = .false.
558 DO 2873 q = 1, p - 1
559 sva( q ) = sva( q )*skl
560 2873 CONTINUE
561 END IF
562 END IF
563 2874 CONTINUE
564 ELSE
565* the input matrix is M-by-N general dense
566 DO 3874 p = 1, n
567 aapp = zero
568 aaqq = one
569 CALL slassq( m, a( 1, p ), 1, aapp, aaqq )
570 IF( aapp.GT.big ) THEN
571 info = -6
572 CALL xerbla( 'SGESVJ', -info )
573 RETURN
574 END IF
575 aaqq = sqrt( aaqq )
576 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
577 sva( p ) = aapp*aaqq
578 ELSE
579 noscale = .false.
580 sva( p ) = aapp*( aaqq*skl )
581 IF( goscale ) THEN
582 goscale = .false.
583 DO 3873 q = 1, p - 1
584 sva( q ) = sva( q )*skl
585 3873 CONTINUE
586 END IF
587 END IF
588 3874 CONTINUE
589 END IF
590*
591 IF( noscale )skl = one
592*
593* Move the smaller part of the spectrum from the underflow threshold
594*(!) Start by determining the position of the nonzero entries of the
595* array SVA() relative to ( SFMIN, BIG ).
596*
597 aapp = zero
598 aaqq = big
599 DO 4781 p = 1, n
600 IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
601 aapp = max( aapp, sva( p ) )
602 4781 CONTINUE
603*
604* #:) Quick return for zero matrix
605*
606 IF( aapp.EQ.zero ) THEN
607 IF( lsvec )CALL slaset( 'G', m, n, zero, one, a, lda )
608 work( 1 ) = one
609 work( 2 ) = zero
610 work( 3 ) = zero
611 work( 4 ) = zero
612 work( 5 ) = zero
613 work( 6 ) = zero
614 RETURN
615 END IF
616*
617* #:) Quick return for one-column matrix
618*
619 IF( n.EQ.1 ) THEN
620 IF( lsvec )CALL slascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
621 $ a( 1, 1 ), lda, ierr )
622 work( 1 ) = one / skl
623 IF( sva( 1 ).GE.sfmin ) THEN
624 work( 2 ) = one
625 ELSE
626 work( 2 ) = zero
627 END IF
628 work( 3 ) = zero
629 work( 4 ) = zero
630 work( 5 ) = zero
631 work( 6 ) = zero
632 RETURN
633 END IF
634*
635* Protect small singular values from underflow, and try to
636* avoid underflows/overflows in computing Jacobi rotations.
637*
638 sn = sqrt( sfmin / epsln )
639 temp1 = sqrt( big / float( n ) )
640 IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
641 $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
642 temp1 = min( big, temp1 / aapp )
643* AAQQ = AAQQ*TEMP1
644* AAPP = AAPP*TEMP1
645 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
646 temp1 = min( sn / aaqq, big / ( aapp*sqrt( float( n ) ) ) )
647* AAQQ = AAQQ*TEMP1
648* AAPP = AAPP*TEMP1
649 ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
650 temp1 = max( sn / aaqq, temp1 / aapp )
651* AAQQ = AAQQ*TEMP1
652* AAPP = AAPP*TEMP1
653 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
654 temp1 = min( sn / aaqq, big / ( sqrt( float( n ) )*aapp ) )
655* AAQQ = AAQQ*TEMP1
656* AAPP = AAPP*TEMP1
657 ELSE
658 temp1 = one
659 END IF
660*
661* Scale, if necessary
662*
663 IF( temp1.NE.one ) THEN
664 CALL slascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
665 END IF
666 skl = temp1*skl
667 IF( skl.NE.one ) THEN
668 CALL slascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
669 skl = one / skl
670 END IF
671*
672* Row-cyclic Jacobi SVD algorithm with column pivoting
673*
674 emptsw = ( n*( n-1 ) ) / 2
675 notrot = 0
676 fastr( 1 ) = zero
677*
678* A is represented in factored form A = A * diag(WORK), where diag(WORK)
679* is initialized to identity. WORK is updated during fast scaled
680* rotations.
681*
682 DO 1868 q = 1, n
683 work( q ) = one
684 1868 CONTINUE
685*
686*
687 swband = 3
688*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
689* if SGESVJ is used as a computational routine in the preconditioned
690* Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure
691* works on pivots inside a band-like region around the diagonal.
692* The boundaries are determined dynamically, based on the number of
693* pivots above a threshold.
694*
695 kbl = min( 8, n )
696*[TP] KBL is a tuning parameter that defines the tile size in the
697* tiling of the p-q loops of pivot pairs. In general, an optimal
698* value of KBL depends on the matrix dimensions and on the
699* parameters of the computer's memory.
700*
701 nbl = n / kbl
702 IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
703*
704 blskip = kbl**2
705*[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
706*
707 rowskip = min( 5, kbl )
708*[TP] ROWSKIP is a tuning parameter.
709*
710 lkahead = 1
711*[TP] LKAHEAD is a tuning parameter.
712*
713* Quasi block transformations, using the lower (upper) triangular
714* structure of the input matrix. The quasi-block-cycling usually
715* invokes cubic convergence. Big part of this cycle is done inside
716* canonical subspaces of dimensions less than M.
717*
718 IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
719*[TP] The number of partition levels and the actual partition are
720* tuning parameters.
721 n4 = n / 4
722 n2 = n / 2
723 n34 = 3*n4
724 IF( applv ) THEN
725 q = 0
726 ELSE
727 q = 1
728 END IF
729*
730 IF( lower ) THEN
731*
732* This works very well on lower triangular matrices, in particular
733* in the framework of the preconditioned Jacobi SVD (xGEJSV).
734* The idea is simple:
735* [+ 0 0 0] Note that Jacobi transformations of [0 0]
736* [+ + 0 0] [0 0]
737* [+ + x 0] actually work on [x 0] [x 0]
738* [+ + x x] [x x]. [x x]
739*
740 CALL sgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
741 $ work( n34+1 ), sva( n34+1 ), mvl,
742 $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
743 $ 2, work( n+1 ), lwork-n, ierr )
744*
745 CALL sgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
746 $ work( n2+1 ), sva( n2+1 ), mvl,
747 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
748 $ work( n+1 ), lwork-n, ierr )
749*
750 CALL sgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
751 $ work( n2+1 ), sva( n2+1 ), mvl,
752 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
753 $ work( n+1 ), lwork-n, ierr )
754*
755 CALL sgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
756 $ work( n4+1 ), sva( n4+1 ), mvl,
757 $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
758 $ work( n+1 ), lwork-n, ierr )
759*
760 CALL sgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
761 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
762 $ ierr )
763*
764 CALL sgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
765 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
766 $ lwork-n, ierr )
767*
768*
769 ELSE IF( upper ) THEN
770*
771*
772 CALL sgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v,
773 $ ldv,
774 $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
775 $ ierr )
776*
777 CALL sgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda,
778 $ work( n4+1 ),
779 $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
780 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
781 $ ierr )
782*
783 CALL sgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
784 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
785 $ lwork-n, ierr )
786*
787 CALL sgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
788 $ work( n2+1 ), sva( n2+1 ), mvl,
789 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
790 $ work( n+1 ), lwork-n, ierr )
791
792 END IF
793*
794 END IF
795*
796* .. Row-cyclic pivot strategy with de Rijk's pivoting ..
797*
798 DO 1993 i = 1, nsweep
799*
800* .. go go go ...
801*
802 mxaapq = zero
803 mxsinj = zero
804 iswrot = 0
805*
806 notrot = 0
807 pskipped = 0
808*
809* Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
810* 1 <= p < q <= N. This is the first step toward a blocked implementation
811* of the rotations. New implementation, based on block transformations,
812* is under development.
813*
814 DO 2000 ibr = 1, nbl
815*
816 igl = ( ibr-1 )*kbl + 1
817*
818 DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
819*
820 igl = igl + ir1*kbl
821*
822 DO 2001 p = igl, min( igl+kbl-1, n-1 )
823*
824* .. de Rijk's pivoting
825*
826 q = isamax( n-p+1, sva( p ), 1 ) + p - 1
827 IF( p.NE.q ) THEN
828 CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
829 IF( rsvec )CALL sswap( mvl, v( 1, p ), 1,
830 $ v( 1, q ), 1 )
831 temp1 = sva( p )
832 sva( p ) = sva( q )
833 sva( q ) = temp1
834 temp1 = work( p )
835 work( p ) = work( q )
836 work( q ) = temp1
837 END IF
838*
839 IF( ir1.EQ.0 ) THEN
840*
841* Column norms are periodically updated by explicit
842* norm computation.
843* Caveat:
844* Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1)
845* as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
846* overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to
847* underflow for ||A(:,p)||_2 < SQRT(underflow_threshold).
848* Hence, SNRM2 cannot be trusted, not even in the case when
849* the true norm is far from the under(over)flow boundaries.
850* If properly implemented SNRM2 is available, the IF-THEN-ELSE
851* below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)".
852*
853 IF( ( sva( p ).LT.rootbig ) .AND.
854 $ ( sva( p ).GT.rootsfmin ) ) THEN
855 sva( p ) = snrm2( m, a( 1, p ), 1 )*work( p )
856 ELSE
857 temp1 = zero
858 aapp = one
859 CALL slassq( m, a( 1, p ), 1, temp1, aapp )
860 sva( p ) = temp1*sqrt( aapp )*work( p )
861 END IF
862 aapp = sva( p )
863 ELSE
864 aapp = sva( p )
865 END IF
866*
867 IF( aapp.GT.zero ) THEN
868*
869 pskipped = 0
870*
871 DO 2002 q = p + 1, min( igl+kbl-1, n )
872*
873 aaqq = sva( q )
874*
875 IF( aaqq.GT.zero ) THEN
876*
877 aapp0 = aapp
878 IF( aaqq.GE.one ) THEN
879 rotok = ( small*aapp ).LE.aaqq
880 IF( aapp.LT.( big / aaqq ) ) THEN
881 aapq = ( sdot( m, a( 1, p ), 1,
882 $ a( 1,
883 $ q ), 1 )*work( p )*work( q ) /
884 $ aaqq ) / aapp
885 ELSE
886 CALL scopy( m, a( 1, p ), 1,
887 $ work( n+1 ), 1 )
888 CALL slascl( 'G', 0, 0, aapp,
889 $ work( p ), m, 1,
890 $ work( n+1 ), lda, ierr )
891 aapq = sdot( m, work( n+1 ), 1,
892 $ a( 1, q ), 1 )*work( q ) / aaqq
893 END IF
894 ELSE
895 rotok = aapp.LE.( aaqq / small )
896 IF( aapp.GT.( small / aaqq ) ) THEN
897 aapq = ( sdot( m, a( 1, p ), 1,
898 $ a( 1,
899 $ q ), 1 )*work( p )*work( q ) /
900 $ aaqq ) / aapp
901 ELSE
902 CALL scopy( m, a( 1, q ), 1,
903 $ work( n+1 ), 1 )
904 CALL slascl( 'G', 0, 0, aaqq,
905 $ work( q ), m, 1,
906 $ work( n+1 ), lda, ierr )
907 aapq = sdot( m, work( n+1 ), 1,
908 $ a( 1, p ), 1 )*work( p ) / aapp
909 END IF
910 END IF
911*
912 mxaapq = max( mxaapq, abs( aapq ) )
913*
914* TO rotate or NOT to rotate, THAT is the question ...
915*
916 IF( abs( aapq ).GT.tol ) THEN
917*
918* .. rotate
919*[RTD] ROTATED = ROTATED + ONE
920*
921 IF( ir1.EQ.0 ) THEN
922 notrot = 0
923 pskipped = 0
924 iswrot = iswrot + 1
925 END IF
926*
927 IF( rotok ) THEN
928*
929 aqoap = aaqq / aapp
930 apoaq = aapp / aaqq
931 theta = -half*abs( aqoap-apoaq ) / aapq
932*
933 IF( abs( theta ).GT.bigtheta ) THEN
934*
935 t = half / theta
936 fastr( 3 ) = t*work( p ) / work( q )
937 fastr( 4 ) = -t*work( q ) /
938 $ work( p )
939 CALL srotm( m, a( 1, p ), 1,
940 $ a( 1, q ), 1, fastr )
941 IF( rsvec )CALL srotm( mvl,
942 $ v( 1, p ), 1,
943 $ v( 1, q ), 1,
944 $ fastr )
945 sva( q ) = aaqq*sqrt( max( zero,
946 $ one+t*apoaq*aapq ) )
947 aapp = aapp*sqrt( max( zero,
948 $ one-t*aqoap*aapq ) )
949 mxsinj = max( mxsinj, abs( t ) )
950*
951 ELSE
952*
953* .. choose correct signum for THETA and rotate
954*
955 thsign = -sign( one, aapq )
956 t = one / ( theta+thsign*
957 $ sqrt( one+theta*theta ) )
958 cs = sqrt( one / ( one+t*t ) )
959 sn = t*cs
960*
961 mxsinj = max( mxsinj, abs( sn ) )
962 sva( q ) = aaqq*sqrt( max( zero,
963 $ one+t*apoaq*aapq ) )
964 aapp = aapp*sqrt( max( zero,
965 $ one-t*aqoap*aapq ) )
966*
967 apoaq = work( p ) / work( q )
968 aqoap = work( q ) / work( p )
969 IF( work( p ).GE.one ) THEN
970 IF( work( q ).GE.one ) THEN
971 fastr( 3 ) = t*apoaq
972 fastr( 4 ) = -t*aqoap
973 work( p ) = work( p )*cs
974 work( q ) = work( q )*cs
975 CALL srotm( m, a( 1, p ),
976 $ 1,
977 $ a( 1, q ), 1,
978 $ fastr )
979 IF( rsvec )CALL srotm( mvl,
980 $ v( 1, p ), 1, v( 1, q ),
981 $ 1, fastr )
982 ELSE
983 CALL saxpy( m, -t*aqoap,
984 $ a( 1, q ), 1,
985 $ a( 1, p ), 1 )
986 CALL saxpy( m, cs*sn*apoaq,
987 $ a( 1, p ), 1,
988 $ a( 1, q ), 1 )
989 work( p ) = work( p )*cs
990 work( q ) = work( q ) / cs
991 IF( rsvec ) THEN
992 CALL saxpy( mvl,
993 $ -t*aqoap,
994 $ v( 1, q ), 1,
995 $ v( 1, p ), 1 )
996 CALL saxpy( mvl,
997 $ cs*sn*apoaq,
998 $ v( 1, p ), 1,
999 $ v( 1, q ), 1 )
1000 END IF
1001 END IF
1002 ELSE
1003 IF( work( q ).GE.one ) THEN
1004 CALL saxpy( m, t*apoaq,
1005 $ a( 1, p ), 1,
1006 $ a( 1, q ), 1 )
1007 CALL saxpy( m,
1008 $ -cs*sn*aqoap,
1009 $ a( 1, q ), 1,
1010 $ a( 1, p ), 1 )
1011 work( p ) = work( p ) / cs
1012 work( q ) = work( q )*cs
1013 IF( rsvec ) THEN
1014 CALL saxpy( mvl,
1015 $ t*apoaq,
1016 $ v( 1, p ), 1,
1017 $ v( 1, q ), 1 )
1018 CALL saxpy( mvl,
1019 $ -cs*sn*aqoap,
1020 $ v( 1, q ), 1,
1021 $ v( 1, p ), 1 )
1022 END IF
1023 ELSE
1024 IF( work( p ).GE.work( q ) )
1025 $ THEN
1026 CALL saxpy( m, -t*aqoap,
1027 $ a( 1, q ), 1,
1028 $ a( 1, p ), 1 )
1029 CALL saxpy( m,
1030 $ cs*sn*apoaq,
1031 $ a( 1, p ), 1,
1032 $ a( 1, q ), 1 )
1033 work( p ) = work( p )*cs
1034 work( q ) = work( q ) / cs
1035 IF( rsvec ) THEN
1036 CALL saxpy( mvl,
1037 $ -t*aqoap,
1038 $ v( 1, q ), 1,
1039 $ v( 1, p ), 1 )
1040 CALL saxpy( mvl,
1041 $ cs*sn*apoaq,
1042 $ v( 1, p ), 1,
1043 $ v( 1, q ), 1 )
1044 END IF
1045 ELSE
1046 CALL saxpy( m, t*apoaq,
1047 $ a( 1, p ), 1,
1048 $ a( 1, q ), 1 )
1049 CALL saxpy( m,
1050 $ -cs*sn*aqoap,
1051 $ a( 1, q ), 1,
1052 $ a( 1, p ), 1 )
1053 work( p ) = work( p ) / cs
1054 work( q ) = work( q )*cs
1055 IF( rsvec ) THEN
1056 CALL saxpy( mvl,
1057 $ t*apoaq, v( 1, p ),
1058 $ 1, v( 1, q ), 1 )
1059 CALL saxpy( mvl,
1060 $ -cs*sn*aqoap,
1061 $ v( 1, q ), 1,
1062 $ v( 1, p ), 1 )
1063 END IF
1064 END IF
1065 END IF
1066 END IF
1067 END IF
1068*
1069 ELSE
1070* .. have to use modified Gram-Schmidt like transformation
1071 CALL scopy( m, a( 1, p ), 1,
1072 $ work( n+1 ), 1 )
1073 CALL slascl( 'G', 0, 0, aapp, one,
1074 $ m,
1075 $ 1, work( n+1 ), lda,
1076 $ ierr )
1077 CALL slascl( 'G', 0, 0, aaqq, one,
1078 $ m,
1079 $ 1, a( 1, q ), lda, ierr )
1080 temp1 = -aapq*work( p ) / work( q )
1081 CALL saxpy( m, temp1, work( n+1 ),
1082 $ 1,
1083 $ a( 1, q ), 1 )
1084 CALL slascl( 'G', 0, 0, one, aaqq,
1085 $ m,
1086 $ 1, a( 1, q ), lda, ierr )
1087 sva( q ) = aaqq*sqrt( max( zero,
1088 $ one-aapq*aapq ) )
1089 mxsinj = max( mxsinj, sfmin )
1090 END IF
1091* END IF ROTOK THEN ... ELSE
1092*
1093* In the case of cancellation in updating SVA(q), SVA(p)
1094* recompute SVA(q), SVA(p).
1095*
1096 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1097 $ THEN
1098 IF( ( aaqq.LT.rootbig ) .AND.
1099 $ ( aaqq.GT.rootsfmin ) ) THEN
1100 sva( q ) = snrm2( m, a( 1, q ),
1101 $ 1 )*
1102 $ work( q )
1103 ELSE
1104 t = zero
1105 aaqq = one
1106 CALL slassq( m, a( 1, q ), 1, t,
1107 $ aaqq )
1108 sva( q ) = t*sqrt( aaqq )*work( q )
1109 END IF
1110 END IF
1111 IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1112 IF( ( aapp.LT.rootbig ) .AND.
1113 $ ( aapp.GT.rootsfmin ) ) THEN
1114 aapp = snrm2( m, a( 1, p ), 1 )*
1115 $ work( p )
1116 ELSE
1117 t = zero
1118 aapp = one
1119 CALL slassq( m, a( 1, p ), 1, t,
1120 $ aapp )
1121 aapp = t*sqrt( aapp )*work( p )
1122 END IF
1123 sva( p ) = aapp
1124 END IF
1125*
1126 ELSE
1127* A(:,p) and A(:,q) already numerically orthogonal
1128 IF( ir1.EQ.0 )notrot = notrot + 1
1129*[RTD] SKIPPED = SKIPPED + 1
1130 pskipped = pskipped + 1
1131 END IF
1132 ELSE
1133* A(:,q) is zero column
1134 IF( ir1.EQ.0 )notrot = notrot + 1
1135 pskipped = pskipped + 1
1136 END IF
1137*
1138 IF( ( i.LE.swband ) .AND.
1139 $ ( pskipped.GT.rowskip ) ) THEN
1140 IF( ir1.EQ.0 )aapp = -aapp
1141 notrot = 0
1142 GO TO 2103
1143 END IF
1144*
1145 2002 CONTINUE
1146* END q-LOOP
1147*
1148 2103 CONTINUE
1149* bailed out of q-loop
1150*
1151 sva( p ) = aapp
1152*
1153 ELSE
1154 sva( p ) = aapp
1155 IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1156 $ notrot = notrot + min( igl+kbl-1, n ) - p
1157 END IF
1158*
1159 2001 CONTINUE
1160* end of the p-loop
1161* end of doing the block ( ibr, ibr )
1162 1002 CONTINUE
1163* end of ir1-loop
1164*
1165* ... go to the off diagonal blocks
1166*
1167 igl = ( ibr-1 )*kbl + 1
1168*
1169 DO 2010 jbc = ibr + 1, nbl
1170*
1171 jgl = ( jbc-1 )*kbl + 1
1172*
1173* doing the block at ( ibr, jbc )
1174*
1175 ijblsk = 0
1176 DO 2100 p = igl, min( igl+kbl-1, n )
1177*
1178 aapp = sva( p )
1179 IF( aapp.GT.zero ) THEN
1180*
1181 pskipped = 0
1182*
1183 DO 2200 q = jgl, min( jgl+kbl-1, n )
1184*
1185 aaqq = sva( q )
1186 IF( aaqq.GT.zero ) THEN
1187 aapp0 = aapp
1188*
1189* .. M x 2 Jacobi SVD ..
1190*
1191* Safe Gram matrix computation
1192*
1193 IF( aaqq.GE.one ) THEN
1194 IF( aapp.GE.aaqq ) THEN
1195 rotok = ( small*aapp ).LE.aaqq
1196 ELSE
1197 rotok = ( small*aaqq ).LE.aapp
1198 END IF
1199 IF( aapp.LT.( big / aaqq ) ) THEN
1200 aapq = ( sdot( m, a( 1, p ), 1,
1201 $ a( 1,
1202 $ q ), 1 )*work( p )*work( q ) /
1203 $ aaqq ) / aapp
1204 ELSE
1205 CALL scopy( m, a( 1, p ), 1,
1206 $ work( n+1 ), 1 )
1207 CALL slascl( 'G', 0, 0, aapp,
1208 $ work( p ), m, 1,
1209 $ work( n+1 ), lda, ierr )
1210 aapq = sdot( m, work( n+1 ), 1,
1211 $ a( 1, q ), 1 )*work( q ) / aaqq
1212 END IF
1213 ELSE
1214 IF( aapp.GE.aaqq ) THEN
1215 rotok = aapp.LE.( aaqq / small )
1216 ELSE
1217 rotok = aaqq.LE.( aapp / small )
1218 END IF
1219 IF( aapp.GT.( small / aaqq ) ) THEN
1220 aapq = ( sdot( m, a( 1, p ), 1,
1221 $ a( 1,
1222 $ q ), 1 )*work( p )*work( q ) /
1223 $ aaqq ) / aapp
1224 ELSE
1225 CALL scopy( m, a( 1, q ), 1,
1226 $ work( n+1 ), 1 )
1227 CALL slascl( 'G', 0, 0, aaqq,
1228 $ work( q ), m, 1,
1229 $ work( n+1 ), lda, ierr )
1230 aapq = sdot( m, work( n+1 ), 1,
1231 $ a( 1, p ), 1 )*work( p ) / aapp
1232 END IF
1233 END IF
1234*
1235 mxaapq = max( mxaapq, abs( aapq ) )
1236*
1237* TO rotate or NOT to rotate, THAT is the question ...
1238*
1239 IF( abs( aapq ).GT.tol ) THEN
1240 notrot = 0
1241*[RTD] ROTATED = ROTATED + 1
1242 pskipped = 0
1243 iswrot = iswrot + 1
1244*
1245 IF( rotok ) THEN
1246*
1247 aqoap = aaqq / aapp
1248 apoaq = aapp / aaqq
1249 theta = -half*abs( aqoap-apoaq ) / aapq
1250 IF( aaqq.GT.aapp0 )theta = -theta
1251*
1252 IF( abs( theta ).GT.bigtheta ) THEN
1253 t = half / theta
1254 fastr( 3 ) = t*work( p ) / work( q )
1255 fastr( 4 ) = -t*work( q ) /
1256 $ work( p )
1257 CALL srotm( m, a( 1, p ), 1,
1258 $ a( 1, q ), 1, fastr )
1259 IF( rsvec )CALL srotm( mvl,
1260 $ v( 1, p ), 1,
1261 $ v( 1, q ), 1,
1262 $ fastr )
1263 sva( q ) = aaqq*sqrt( max( zero,
1264 $ one+t*apoaq*aapq ) )
1265 aapp = aapp*sqrt( max( zero,
1266 $ one-t*aqoap*aapq ) )
1267 mxsinj = max( mxsinj, abs( t ) )
1268 ELSE
1269*
1270* .. choose correct signum for THETA and rotate
1271*
1272 thsign = -sign( one, aapq )
1273 IF( aaqq.GT.aapp0 )thsign = -thsign
1274 t = one / ( theta+thsign*
1275 $ sqrt( one+theta*theta ) )
1276 cs = sqrt( one / ( one+t*t ) )
1277 sn = t*cs
1278 mxsinj = max( mxsinj, abs( sn ) )
1279 sva( q ) = aaqq*sqrt( max( zero,
1280 $ one+t*apoaq*aapq ) )
1281 aapp = aapp*sqrt( max( zero,
1282 $ one-t*aqoap*aapq ) )
1283*
1284 apoaq = work( p ) / work( q )
1285 aqoap = work( q ) / work( p )
1286 IF( work( p ).GE.one ) THEN
1287*
1288 IF( work( q ).GE.one ) THEN
1289 fastr( 3 ) = t*apoaq
1290 fastr( 4 ) = -t*aqoap
1291 work( p ) = work( p )*cs
1292 work( q ) = work( q )*cs
1293 CALL srotm( m, a( 1, p ),
1294 $ 1,
1295 $ a( 1, q ), 1,
1296 $ fastr )
1297 IF( rsvec )CALL srotm( mvl,
1298 $ v( 1, p ), 1, v( 1, q ),
1299 $ 1, fastr )
1300 ELSE
1301 CALL saxpy( m, -t*aqoap,
1302 $ a( 1, q ), 1,
1303 $ a( 1, p ), 1 )
1304 CALL saxpy( m, cs*sn*apoaq,
1305 $ a( 1, p ), 1,
1306 $ a( 1, q ), 1 )
1307 IF( rsvec ) THEN
1308 CALL saxpy( mvl,
1309 $ -t*aqoap,
1310 $ v( 1, q ), 1,
1311 $ v( 1, p ), 1 )
1312 CALL saxpy( mvl,
1313 $ cs*sn*apoaq,
1314 $ v( 1, p ), 1,
1315 $ v( 1, q ), 1 )
1316 END IF
1317 work( p ) = work( p )*cs
1318 work( q ) = work( q ) / cs
1319 END IF
1320 ELSE
1321 IF( work( q ).GE.one ) THEN
1322 CALL saxpy( m, t*apoaq,
1323 $ a( 1, p ), 1,
1324 $ a( 1, q ), 1 )
1325 CALL saxpy( m,
1326 $ -cs*sn*aqoap,
1327 $ a( 1, q ), 1,
1328 $ a( 1, p ), 1 )
1329 IF( rsvec ) THEN
1330 CALL saxpy( mvl,
1331 $ t*apoaq,
1332 $ v( 1, p ), 1,
1333 $ v( 1, q ), 1 )
1334 CALL saxpy( mvl,
1335 $ -cs*sn*aqoap,
1336 $ v( 1, q ), 1,
1337 $ v( 1, p ), 1 )
1338 END IF
1339 work( p ) = work( p ) / cs
1340 work( q ) = work( q )*cs
1341 ELSE
1342 IF( work( p ).GE.work( q ) )
1343 $ THEN
1344 CALL saxpy( m, -t*aqoap,
1345 $ a( 1, q ), 1,
1346 $ a( 1, p ), 1 )
1347 CALL saxpy( m,
1348 $ cs*sn*apoaq,
1349 $ a( 1, p ), 1,
1350 $ a( 1, q ), 1 )
1351 work( p ) = work( p )*cs
1352 work( q ) = work( q ) / cs
1353 IF( rsvec ) THEN
1354 CALL saxpy( mvl,
1355 $ -t*aqoap,
1356 $ v( 1, q ), 1,
1357 $ v( 1, p ), 1 )
1358 CALL saxpy( mvl,
1359 $ cs*sn*apoaq,
1360 $ v( 1, p ), 1,
1361 $ v( 1, q ), 1 )
1362 END IF
1363 ELSE
1364 CALL saxpy( m, t*apoaq,
1365 $ a( 1, p ), 1,
1366 $ a( 1, q ), 1 )
1367 CALL saxpy( m,
1368 $ -cs*sn*aqoap,
1369 $ a( 1, q ), 1,
1370 $ a( 1, p ), 1 )
1371 work( p ) = work( p ) / cs
1372 work( q ) = work( q )*cs
1373 IF( rsvec ) THEN
1374 CALL saxpy( mvl,
1375 $ t*apoaq, v( 1, p ),
1376 $ 1, v( 1, q ), 1 )
1377 CALL saxpy( mvl,
1378 $ -cs*sn*aqoap,
1379 $ v( 1, q ), 1,
1380 $ v( 1, p ), 1 )
1381 END IF
1382 END IF
1383 END IF
1384 END IF
1385 END IF
1386*
1387 ELSE
1388 IF( aapp.GT.aaqq ) THEN
1389 CALL scopy( m, a( 1, p ), 1,
1390 $ work( n+1 ), 1 )
1391 CALL slascl( 'G', 0, 0, aapp,
1392 $ one,
1393 $ m, 1, work( n+1 ), lda,
1394 $ ierr )
1395 CALL slascl( 'G', 0, 0, aaqq,
1396 $ one,
1397 $ m, 1, a( 1, q ), lda,
1398 $ ierr )
1399 temp1 = -aapq*work( p ) / work( q )
1400 CALL saxpy( m, temp1,
1401 $ work( n+1 ),
1402 $ 1, a( 1, q ), 1 )
1403 CALL slascl( 'G', 0, 0, one,
1404 $ aaqq,
1405 $ m, 1, a( 1, q ), lda,
1406 $ ierr )
1407 sva( q ) = aaqq*sqrt( max( zero,
1408 $ one-aapq*aapq ) )
1409 mxsinj = max( mxsinj, sfmin )
1410 ELSE
1411 CALL scopy( m, a( 1, q ), 1,
1412 $ work( n+1 ), 1 )
1413 CALL slascl( 'G', 0, 0, aaqq,
1414 $ one,
1415 $ m, 1, work( n+1 ), lda,
1416 $ ierr )
1417 CALL slascl( 'G', 0, 0, aapp,
1418 $ one,
1419 $ m, 1, a( 1, p ), lda,
1420 $ ierr )
1421 temp1 = -aapq*work( q ) / work( p )
1422 CALL saxpy( m, temp1,
1423 $ work( n+1 ),
1424 $ 1, a( 1, p ), 1 )
1425 CALL slascl( 'G', 0, 0, one,
1426 $ aapp,
1427 $ m, 1, a( 1, p ), lda,
1428 $ ierr )
1429 sva( p ) = aapp*sqrt( max( zero,
1430 $ one-aapq*aapq ) )
1431 mxsinj = max( mxsinj, sfmin )
1432 END IF
1433 END IF
1434* END IF ROTOK THEN ... ELSE
1435*
1436* In the case of cancellation in updating SVA(q)
1437* .. recompute SVA(q)
1438 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1439 $ THEN
1440 IF( ( aaqq.LT.rootbig ) .AND.
1441 $ ( aaqq.GT.rootsfmin ) ) THEN
1442 sva( q ) = snrm2( m, a( 1, q ),
1443 $ 1 )*
1444 $ work( q )
1445 ELSE
1446 t = zero
1447 aaqq = one
1448 CALL slassq( m, a( 1, q ), 1, t,
1449 $ aaqq )
1450 sva( q ) = t*sqrt( aaqq )*work( q )
1451 END IF
1452 END IF
1453 IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1454 IF( ( aapp.LT.rootbig ) .AND.
1455 $ ( aapp.GT.rootsfmin ) ) THEN
1456 aapp = snrm2( m, a( 1, p ), 1 )*
1457 $ work( p )
1458 ELSE
1459 t = zero
1460 aapp = one
1461 CALL slassq( m, a( 1, p ), 1, t,
1462 $ aapp )
1463 aapp = t*sqrt( aapp )*work( p )
1464 END IF
1465 sva( p ) = aapp
1466 END IF
1467* end of OK rotation
1468 ELSE
1469 notrot = notrot + 1
1470*[RTD] SKIPPED = SKIPPED + 1
1471 pskipped = pskipped + 1
1472 ijblsk = ijblsk + 1
1473 END IF
1474 ELSE
1475 notrot = notrot + 1
1476 pskipped = pskipped + 1
1477 ijblsk = ijblsk + 1
1478 END IF
1479*
1480 IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1481 $ THEN
1482 sva( p ) = aapp
1483 notrot = 0
1484 GO TO 2011
1485 END IF
1486 IF( ( i.LE.swband ) .AND.
1487 $ ( pskipped.GT.rowskip ) ) THEN
1488 aapp = -aapp
1489 notrot = 0
1490 GO TO 2203
1491 END IF
1492*
1493 2200 CONTINUE
1494* end of the q-loop
1495 2203 CONTINUE
1496*
1497 sva( p ) = aapp
1498*
1499 ELSE
1500*
1501 IF( aapp.EQ.zero )notrot = notrot +
1502 $ min( jgl+kbl-1, n ) - jgl + 1
1503 IF( aapp.LT.zero )notrot = 0
1504*
1505 END IF
1506*
1507 2100 CONTINUE
1508* end of the p-loop
1509 2010 CONTINUE
1510* end of the jbc-loop
1511 2011 CONTINUE
1512*2011 bailed out of the jbc-loop
1513 DO 2012 p = igl, min( igl+kbl-1, n )
1514 sva( p ) = abs( sva( p ) )
1515 2012 CONTINUE
1516***
1517 2000 CONTINUE
1518*2000 :: end of the ibr-loop
1519*
1520* .. update SVA(N)
1521 IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1522 $ THEN
1523 sva( n ) = snrm2( m, a( 1, n ), 1 )*work( n )
1524 ELSE
1525 t = zero
1526 aapp = one
1527 CALL slassq( m, a( 1, n ), 1, t, aapp )
1528 sva( n ) = t*sqrt( aapp )*work( n )
1529 END IF
1530*
1531* Additional steering devices
1532*
1533 IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1534 $ ( iswrot.LE.n ) ) )swband = i
1535*
1536 IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.sqrt( float( n ) )*
1537 $ tol ) .AND. ( float( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1538 GO TO 1994
1539 END IF
1540*
1541 IF( notrot.GE.emptsw )GO TO 1994
1542*
1543 1993 CONTINUE
1544* end i=1:NSWEEP loop
1545*
1546* #:( Reaching this point means that the procedure has not converged.
1547 info = nsweep - 1
1548 GO TO 1995
1549*
1550 1994 CONTINUE
1551* #:) Reaching this point means numerical convergence after the i-th
1552* sweep.
1553*
1554 info = 0
1555* #:) INFO = 0 confirms successful iterations.
1556 1995 CONTINUE
1557*
1558* Sort the singular values and find how many are above
1559* the underflow threshold.
1560*
1561 n2 = 0
1562 n4 = 0
1563 DO 5991 p = 1, n - 1
1564 q = isamax( n-p+1, sva( p ), 1 ) + p - 1
1565 IF( p.NE.q ) THEN
1566 temp1 = sva( p )
1567 sva( p ) = sva( q )
1568 sva( q ) = temp1
1569 temp1 = work( p )
1570 work( p ) = work( q )
1571 work( q ) = temp1
1572 CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1573 IF( rsvec )CALL sswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1574 END IF
1575 IF( sva( p ).NE.zero ) THEN
1576 n4 = n4 + 1
1577 IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1578 END IF
1579 5991 CONTINUE
1580 IF( sva( n ).NE.zero ) THEN
1581 n4 = n4 + 1
1582 IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1583 END IF
1584*
1585* Normalize the left singular vectors.
1586*
1587 IF( lsvec .OR. uctol ) THEN
1588 DO 1998 p = 1, n2
1589 CALL sscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1590 1998 CONTINUE
1591 END IF
1592*
1593* Scale the product of Jacobi rotations (assemble the fast rotations).
1594*
1595 IF( rsvec ) THEN
1596 IF( applv ) THEN
1597 DO 2398 p = 1, n
1598 CALL sscal( mvl, work( p ), v( 1, p ), 1 )
1599 2398 CONTINUE
1600 ELSE
1601 DO 2399 p = 1, n
1602 temp1 = one / snrm2( mvl, v( 1, p ), 1 )
1603 CALL sscal( mvl, temp1, v( 1, p ), 1 )
1604 2399 CONTINUE
1605 END IF
1606 END IF
1607*
1608* Undo scaling, if necessary (and possible).
1609 IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl ) ) )
1610 $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1611 $ ( sfmin / skl ) ) ) ) THEN
1612 DO 2400 p = 1, n
1613 sva( p ) = skl*sva( p )
1614 2400 CONTINUE
1615 skl = one
1616 END IF
1617*
1618 work( 1 ) = skl
1619* The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1620* then some of the singular values may overflow or underflow and
1621* the spectrum is given in this factored representation.
1622*
1623 work( 2 ) = float( n4 )
1624* N4 is the number of computed nonzero singular values of A.
1625*
1626 work( 3 ) = float( n2 )
1627* N2 is the number of singular values of A greater than SFMIN.
1628* If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1629* that may carry some information.
1630*
1631 work( 4 ) = float( i )
1632* i is the index of the last sweep before declaring convergence.
1633*
1634 work( 5 ) = mxaapq
1635* MXAAPQ is the largest absolute value of scaled pivots in the
1636* last sweep
1637*
1638 work( 6 ) = mxsinj
1639* MXSINJ is the largest absolute value of the sines of Jacobi angles
1640* in the last sweep
1641*
1642 RETURN
1643* ..
1644* .. END OF SGESVJ
1645* ..
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine saxpy(n, sa, sx, incx, sy, incy)
SAXPY
Definition saxpy.f:89
subroutine scopy(n, sx, incx, sy, incy)
SCOPY
Definition scopy.f:82
real function sdot(n, sx, incx, sy, incy)
SDOT
Definition sdot.f:82
subroutine sgsvj0(jobv, m, n, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
SGSVJ0 pre-processor for the routine sgesvj.
Definition sgsvj0.f:216
subroutine sgsvj1(jobv, m, n, n1, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
SGSVJ1 pre-processor for the routine sgesvj, applies Jacobi rotations targeting only particular pivot...
Definition sgsvj1.f:234
integer function isamax(n, sx, incx)
ISAMAX
Definition isamax.f:71
real function slamch(cmach)
SLAMCH
Definition slamch.f:68
subroutine slascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition slascl.f:142
subroutine slaset(uplo, m, n, alpha, beta, a, lda)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition slaset.f:108
subroutine slassq(n, x, incx, scale, sumsq)
SLASSQ updates a sum of squares represented in scaled form.
Definition slassq.f90:122
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
real(wp) function snrm2(n, x, incx)
SNRM2
Definition snrm2.f90:89
subroutine srotm(n, sx, incx, sy, incy, sparam)
SROTM
Definition srotm.f:97
real function sroundup_lwork(lwork)
SROUNDUP_LWORK
subroutine sscal(n, sa, sx, incx)
SSCAL
Definition sscal.f:79
subroutine sswap(n, sx, incx, sy, incy)
SSWAP
Definition sswap.f:82
Here is the call graph for this function:
Here is the caller graph for this function: