LAPACK  3.6.1 LAPACK: Linear Algebra PACKage
 subroutine cgejsv ( character*1 JOBA, character*1 JOBU, character*1 JOBV, character*1 JOBR, character*1 JOBT, character*1 JOBP, integer M, integer N, complex, dimension( lda, * ) A, integer LDA, real, dimension( n ) SVA, complex, dimension( ldu, * ) U, integer LDU, complex, dimension( ldv, * ) V, integer LDV, complex, dimension( lwork ) CWORK, integer LWORK, real, dimension( * ) RWORK, integer LRWORK, integer, dimension( * ) IWORK, integer INFO )

CGEJSV

Download CGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
CGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
matrix [A], where M >= N. The SVD of [A] is written as

[A] = [U] * [SIGMA] * [V]^*,

where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
[V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
the singular values of [A]. The columns of [U] and [V] are the left and
the right singular vectors of [A], respectively. The matrices [U] and [V]
are computed and stored in the arrays U and V, respectively. The diagonal
of [SIGMA] is computed and stored in the array SVA.

# Arguments:

Parameters
Date
June 2016
Further Details:
CGEJSV implements a preconditioned Jacobi SVD algorithm. It uses CGEQP3,
CGEQRF, and CGELQF as preprocessors and preconditioners. Optionally, an
additional row pivoting can be used as a preprocessor, which in some
cases results in much higher accuracy. An example is matrix A with the
structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
diagonal matrices and C is well-conditioned matrix. In that case, complete
pivoting in the first QR factorizations provides accuracy dependent on the
condition number of C, and independent of D1, D2. Such higher accuracy is
not completely understood theoretically, but it works well in practice.
Further, if A can be written as A = B*D, with well-conditioned B and some
diagonal D, then the high accuracy is guaranteed, both theoretically and
in software, independent of D. For more details see [1], [2].
The computational range for the singular values can be the full range
( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
& LAPACK routines called by CGEJSV are implemented to work in that range.
If that is not the case, then the restriction for safe computation with
the singular values in the range of normalized IEEE numbers is that the
spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
overflow. This code (CGEJSV) is best used in this restricted range,
meaning that singular values of magnitude below ||A||_2 / SLAMCH('O') are
returned as zeros. See JOBR for details on this.
Further, this implementation is somewhat slower than the one described
in [1,2] due to replacement of some non-LAPACK components, and because
the choice of some tuning parameters in the iterative part (CGESVJ) is
left to the implementer on a particular machine.
The rank revealing QR factorization (in this code: CGEQP3) should be
implemented as in [3]. We have a new version of CGEQP3 under development
that is more robust than the current one in LAPACK, with a cleaner cut in
rank defficient cases. It will be available in the SIGMA library [4].
If M is much larger than N, it is obvious that the inital QRF with
column pivoting can be preprocessed by the QRF without pivoting. That
well known trick is not used in CGEJSV because in some cases heavy row
weighting can be treated with complete pivoting. The overhead in cases
M much larger than N is then only due to pivoting, but the benefits in
terms of accuracy have prevailed. The implementer/user can incorporate
this extra QRF step easily. The implementer can also improve data movement
(matrix transpose, matrix copy, matrix transposed copy) - this
implementation of CGEJSV uses only the simplest, naive data movement.  \par Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)  \par References:
@verbatim

[1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
LAPACK Working note 169.
[2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
LAPACK Working note 170.
[3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
factorization software - a case study.
ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
LAPACK Working note 176.
[4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 519 of file cgejsv.f.

519 *
520 * -- LAPACK computational routine (version 3.6.1) --
521 * -- LAPACK is a software package provided by Univ. of Tennessee, --
522 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
523 * June 2016
524 *
525 * .. Scalar Arguments ..
526  IMPLICIT NONE
527  INTEGER info, lda, ldu, ldv, lwork, lrwork, m, n
528 * ..
529 * .. Array Arguments ..
530  COMPLEX a( lda, * ), u( ldu, * ), v( ldv, * ), cwork( lwork )
531  REAL sva( n ), rwork( * )
532  INTEGER iwork( * )
533  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
534 * ..
535 *
536 * ===========================================================================
537 *
538 * .. Local Parameters ..
539  REAL zero, one
540  parameter( zero = 0.0e0, one = 1.0e0 )
541  COMPLEX czero, cone
542  parameter( czero = ( 0.0e0, 0.0e0 ), cone = ( 1.0e0, 0.0e0 ) )
543 * ..
544 * .. Local Scalars ..
545  COMPLEX ctemp
546  REAL aapp, aaqq, aatmax, aatmin, big, big1, cond_ok,
547  \$ condr1, condr2, entra, entrat, epsln, maxprj, scalem,
548  \$ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
549  INTEGER ierr, n1, nr, numrank, p, q, warning
550  LOGICAL almort, defr, errest, goscal, jracc, kill, lsvec,
551  \$ l2aber, l2kill, l2pert, l2rank, l2tran,
552  \$ noscal, rowpiv, rsvec, transp
553 * ..
554 * .. Intrinsic Functions ..
555  INTRINSIC abs, conjg, alog, amax1, amin1, cmplx, float,
556  \$ max0, min0, nint, sign, sqrt
557 * ..
558 * .. External Functions ..
559  REAL slamch, scnrm2
560  INTEGER isamax, icamax
561  LOGICAL lsame
562  EXTERNAL isamax, icamax, lsame, slamch, scnrm2
563 * ..
564 * .. External Subroutines ..
565  EXTERNAL ccopy, cgelqf, cgeqp3, cgeqrf, clacpy, clascl,
568 *
569  EXTERNAL cgesvj
570 * ..
571 *
572 * Test the input arguments
573 *
574  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
575  jracc = lsame( jobv, 'J' )
576  rsvec = lsame( jobv, 'V' ) .OR. jracc
577  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
578  l2rank = lsame( joba, 'R' )
579  l2aber = lsame( joba, 'A' )
580  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
581  l2tran = lsame( jobt, 'T' )
582  l2kill = lsame( jobr, 'R' )
583  defr = lsame( jobr, 'N' )
584  l2pert = lsame( jobp, 'P' )
585 *
586  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
587  \$ errest .OR. lsame( joba, 'C' ) )) THEN
588  info = - 1
589  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
590  \$ lsame( jobu, 'W' )) ) THEN
591  info = - 2
592  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
593  \$ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
594  info = - 3
595  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
596  info = - 4
597  ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
598  info = - 5
599  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
600  info = - 6
601  ELSE IF ( m .LT. 0 ) THEN
602  info = - 7
603  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
604  info = - 8
605  ELSE IF ( lda .LT. m ) THEN
606  info = - 10
607  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
608  info = - 13
609  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
610  info = - 15
611  ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
612  \$ (lwork .LT. 2*n+1)) .OR.
613  \$ (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
614  \$ (lwork .LT. n*n+3*n)) .OR.
615  \$ (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. 3*n))
616  \$ .OR.
617  \$ (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. 3*n))
618  \$ .OR.
619  \$ (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
620  \$ (lwork.LT.5*n+2*n*n))
621  \$ .OR. (lsvec .AND. rsvec .AND. jracc .AND.
622  \$ lwork.LT.4*n+n*n))
623  \$ THEN
624  info = - 17
625  ELSE IF ( lrwork.LT. max0(n+2*m,7)) THEN
626  info = -19
627  ELSE
628 * #:)
629  info = 0
630  END IF
631 *
632  IF ( info .NE. 0 ) THEN
633 * #:(
634  CALL xerbla( 'CGEJSV', - info )
635  RETURN
636  END IF
637 *
638 * Quick return for void matrix (Y3K safe)
639 * #:)
640  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
641  iwork(1:3) = 0
642  rwork(1:7) = 0
643  RETURN
644  ENDIF
645 *
646 * Determine whether the matrix U should be M x N or M x M
647 *
648  IF ( lsvec ) THEN
649  n1 = n
650  IF ( lsame( jobu, 'F' ) ) n1 = m
651  END IF
652 *
653 * Set numerical parameters
654 *
655 *! NOTE: Make sure SLAMCH() does not fail on the target architecture.
656 *
657  epsln = slamch('Epsilon')
658  sfmin = slamch('SafeMinimum')
659  small = sfmin / epsln
660  big = slamch('O')
661 * BIG = ONE / SFMIN
662 *
663 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
664 *
665 *(!) If necessary, scale SVA() to protect the largest norm from
666 * overflow. It is possible that this scaling pushes the smallest
667 * column norm left from the underflow threshold (extreme case).
668 *
669  scalem = one / sqrt(float(m)*float(n))
670  noscal = .true.
671  goscal = .true.
672  DO 1874 p = 1, n
673  aapp = zero
674  aaqq = one
675  CALL classq( m, a(1,p), 1, aapp, aaqq )
676  IF ( aapp .GT. big ) THEN
677  info = - 9
678  CALL xerbla( 'CGEJSV', -info )
679  RETURN
680  END IF
681  aaqq = sqrt(aaqq)
682  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
683  sva(p) = aapp * aaqq
684  ELSE
685  noscal = .false.
686  sva(p) = aapp * ( aaqq * scalem )
687  IF ( goscal ) THEN
688  goscal = .false.
689  CALL sscal( p-1, scalem, sva, 1 )
690  END IF
691  END IF
692  1874 CONTINUE
693 *
694  IF ( noscal ) scalem = one
695 *
696  aapp = zero
697  aaqq = big
698  DO 4781 p = 1, n
699  aapp = amax1( aapp, sva(p) )
700  IF ( sva(p) .NE. zero ) aaqq = amin1( aaqq, sva(p) )
701  4781 CONTINUE
702 *
703 * Quick return for zero M x N matrix
704 * #:)
705  IF ( aapp .EQ. zero ) THEN
706  IF ( lsvec ) CALL claset( 'G', m, n1, czero, cone, u, ldu )
707  IF ( rsvec ) CALL claset( 'G', n, n, czero, cone, v, ldv )
708  rwork(1) = one
709  rwork(2) = one
710  IF ( errest ) rwork(3) = one
711  IF ( lsvec .AND. rsvec ) THEN
712  rwork(4) = one
713  rwork(5) = one
714  END IF
715  IF ( l2tran ) THEN
716  rwork(6) = zero
717  rwork(7) = zero
718  END IF
719  iwork(1) = 0
720  iwork(2) = 0
721  iwork(3) = 0
722  RETURN
723  END IF
724 *
725 * Issue warning if denormalized column norms detected. Override the
726 * high relative accuracy request. Issue licence to kill columns
727 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
728 * #:(
729  warning = 0
730  IF ( aaqq .LE. sfmin ) THEN
731  l2rank = .true.
732  l2kill = .true.
733  warning = 1
734  END IF
735 *
736 * Quick return for one-column matrix
737 * #:)
738  IF ( n .EQ. 1 ) THEN
739 *
740  IF ( lsvec ) THEN
741  CALL clascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
742  CALL clacpy( 'A', m, 1, a, lda, u, ldu )
743 * computing all M left singular vectors of the M x 1 matrix
744  IF ( n1 .NE. n ) THEN
745  CALL cgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
746  CALL cungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
747  CALL ccopy( m, a(1,1), 1, u(1,1), 1 )
748  END IF
749  END IF
750  IF ( rsvec ) THEN
751  v(1,1) = cone
752  END IF
753  IF ( sva(1) .LT. (big*scalem) ) THEN
754  sva(1) = sva(1) / scalem
755  scalem = one
756  END IF
757  rwork(1) = one / scalem
758  rwork(2) = one
759  IF ( sva(1) .NE. zero ) THEN
760  iwork(1) = 1
761  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
762  iwork(2) = 1
763  ELSE
764  iwork(2) = 0
765  END IF
766  ELSE
767  iwork(1) = 0
768  iwork(2) = 0
769  END IF
770  iwork(3) = 0
771  IF ( errest ) rwork(3) = one
772  IF ( lsvec .AND. rsvec ) THEN
773  rwork(4) = one
774  rwork(5) = one
775  END IF
776  IF ( l2tran ) THEN
777  rwork(6) = zero
778  rwork(7) = zero
779  END IF
780  RETURN
781 *
782  END IF
783 *
784  transp = .false.
785  l2tran = l2tran .AND. ( m .EQ. n )
786 *
787  aatmax = -one
788  aatmin = big
789  IF ( rowpiv .OR. l2tran ) THEN
790 *
791 * Compute the row norms, needed to determine row pivoting sequence
792 * (in the case of heavily row weighted A, row pivoting is strongly
793 * advised) and to collect information needed to compare the
794 * structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
795 *
796  IF ( l2tran ) THEN
797  DO 1950 p = 1, m
798  xsc = zero
799  temp1 = one
800  CALL classq( n, a(p,1), lda, xsc, temp1 )
801 * CLASSQ gets both the ell_2 and the ell_infinity norm
802 * in one pass through the vector
803  rwork(m+n+p) = xsc * scalem
804  rwork(n+p) = xsc * (scalem*sqrt(temp1))
805  aatmax = amax1( aatmax, rwork(n+p) )
806  IF (rwork(n+p) .NE. zero)
807  \$ aatmin = amin1(aatmin,rwork(n+p))
808  1950 CONTINUE
809  ELSE
810  DO 1904 p = 1, m
811  rwork(m+n+p) = scalem*abs( a(p,icamax(n,a(p,1),lda)) )
812  aatmax = amax1( aatmax, rwork(m+n+p) )
813  aatmin = amin1( aatmin, rwork(m+n+p) )
814  1904 CONTINUE
815  END IF
816 *
817  END IF
818 *
819 * For square matrix A try to determine whether A^* would be better
820 * input for the preconditioned Jacobi SVD, with faster convergence.
821 * The decision is based on an O(N) function of the vector of column
822 * and row norms of A, based on the Shannon entropy. This should give
823 * the right choice in most cases when the difference actually matters.
824 * It may fail and pick the slower converging side.
825 *
826  entra = zero
827  entrat = zero
828  IF ( l2tran ) THEN
829 *
830  xsc = zero
831  temp1 = one
832  CALL slassq( n, sva, 1, xsc, temp1 )
833  temp1 = one / temp1
834 *
835  entra = zero
836  DO 1113 p = 1, n
837  big1 = ( ( sva(p) / xsc )**2 ) * temp1
838  IF ( big1 .NE. zero ) entra = entra + big1 * alog(big1)
839  1113 CONTINUE
840  entra = - entra / alog(float(n))
841 *
842 * Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
843 * It is derived from the diagonal of A^* * A. Do the same with the
844 * diagonal of A * A^*, compute the entropy of the corresponding
845 * probability distribution. Note that A * A^* and A^* * A have the
846 * same trace.
847 *
848  entrat = zero
849  DO 1114 p = n+1, n+m
850  big1 = ( ( rwork(p) / xsc )**2 ) * temp1
851  IF ( big1 .NE. zero ) entrat = entrat + big1 * alog(big1)
852  1114 CONTINUE
853  entrat = - entrat / alog(float(m))
854 *
855 * Analyze the entropies and decide A or A^*. Smaller entropy
856 * usually means better input for the algorithm.
857 *
858  transp = ( entrat .LT. entra )
859  transp = .true.
860 *
861 * If A^* is better than A, take the adjoint of A.
862 *
863  IF ( transp ) THEN
864 * In an optimal implementation, this trivial transpose
865 * should be replaced with faster transpose.
866  DO 1115 p = 1, n - 1
867  a(p,p) = conjg(a(p,p))
868  DO 1116 q = p + 1, n
869  ctemp = conjg(a(q,p))
870  a(q,p) = conjg(a(p,q))
871  a(p,q) = ctemp
872  1116 CONTINUE
873  1115 CONTINUE
874  a(n,n) = conjg(a(n,n))
875  DO 1117 p = 1, n
876  rwork(m+n+p) = sva(p)
877  sva(p) = rwork(n+p)
878 * previously computed row 2-norms are now column 2-norms
879 * of the transposed matrix
880  1117 CONTINUE
881  temp1 = aapp
882  aapp = aatmax
883  aatmax = temp1
884  temp1 = aaqq
885  aaqq = aatmin
886  aatmin = temp1
887  kill = lsvec
888  lsvec = rsvec
889  rsvec = kill
890  IF ( lsvec ) n1 = n
891 *
892  rowpiv = .true.
893  END IF
894 *
895  END IF
896 * END IF L2TRAN
897 *
898 * Scale the matrix so that its maximal singular value remains less
899 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
900 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
901 * SQRT(BIG) instead of BIG is the fact that CGEJSV uses LAPACK and
902 * BLAS routines that, in some implementations, are not capable of
903 * working in the full interval [SFMIN,BIG] and that they may provoke
904 * overflows in the intermediate results. If the singular values spread
905 * from SFMIN to BIG, then CGESVJ will compute them. So, in that case,
906 * one should use CGESVJ instead of CGEJSV.
907 *
908  big1 = sqrt( big )
909  temp1 = sqrt( big / float(n) )
910 *
911  CALL slascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
912  IF ( aaqq .GT. (aapp * sfmin) ) THEN
913  aaqq = ( aaqq / aapp ) * temp1
914  ELSE
915  aaqq = ( aaqq * temp1 ) / aapp
916  END IF
917  temp1 = temp1 * scalem
918  CALL clascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
919 *
920 * To undo scaling at the end of this procedure, multiply the
921 * computed singular values with USCAL2 / USCAL1.
922 *
923  uscal1 = temp1
924  uscal2 = aapp
925 *
926  IF ( l2kill ) THEN
927 * L2KILL enforces computation of nonzero singular values in
928 * the restricted range of condition number of the initial A,
929 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
930  xsc = sqrt( sfmin )
931  ELSE
932  xsc = small
933 *
934 * Now, if the condition number of A is too big,
935 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
936 * as a precaution measure, the full SVD is computed using CGESVJ
937 * with accumulated Jacobi rotations. This provides numerically
938 * more robust computation, at the cost of slightly increased run
939 * time. Depending on the concrete implementation of BLAS and LAPACK
940 * (i.e. how they behave in presence of extreme ill-conditioning) the
941 * implementor may decide to remove this switch.
942  IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
943  jracc = .true.
944  END IF
945 *
946  END IF
947  IF ( aaqq .LT. xsc ) THEN
948  DO 700 p = 1, n
949  IF ( sva(p) .LT. xsc ) THEN
950  CALL claset( 'A', m, 1, czero, czero, a(1,p), lda )
951  sva(p) = zero
952  END IF
953  700 CONTINUE
954  END IF
955 *
956 * Preconditioning using QR factorization with pivoting
957 *
958  IF ( rowpiv ) THEN
959 * Optional row permutation (Bjoerck row pivoting):
960 * A result by Cox and Higham shows that the Bjoerck's
961 * row pivoting combined with standard column pivoting
962 * has similar effect as Powell-Reid complete pivoting.
963 * The ell-infinity norms of A are made nonincreasing.
964  DO 1952 p = 1, m - 1
965  q = isamax( m-p+1, rwork(m+n+p), 1 ) + p - 1
966  iwork(2*n+p) = q
967  IF ( p .NE. q ) THEN
968  temp1 = rwork(m+n+p)
969  rwork(m+n+p) = rwork(m+n+q)
970  rwork(m+n+q) = temp1
971  END IF
972  1952 CONTINUE
973  CALL claswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
974  END IF
975 *
976 * End of the preparation phase (scaling, optional sorting and
977 * transposing, optional flushing of small columns).
978 *
979 * Preconditioning
980 *
981 * If the full SVD is needed, the right singular vectors are computed
982 * from a matrix equation, and for that we need theoretical analysis
983 * of the Businger-Golub pivoting. So we use CGEQP3 as the first RR QRF.
984 * In all other cases the first RR QRF can be chosen by other criteria
985 * (eg speed by replacing global with restricted window pivoting, such
986 * as in xGEQPX from TOMS # 782). Good results will be obtained using
987 * xGEQPX with properly (!) chosen numerical parameters.
988 * Any improvement of CGEQP3 improves overal performance of CGEJSV.
989 *
990 * A * P1 = Q1 * [ R1^* 0]^*:
991  DO 1963 p = 1, n
992 * .. all columns are free columns
993  iwork(p) = 0
994  1963 CONTINUE
995  CALL cgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
996  \$ rwork, ierr )
997 *
998 * The upper triangular matrix R1 from the first QRF is inspected for
999 * rank deficiency and possibilities for deflation, or possible
1000 * ill-conditioning. Depending on the user specified flag L2RANK,
1001 * the procedure explores possibilities to reduce the numerical
1002 * rank by inspecting the computed upper triangular factor. If
1003 * L2RANK or L2ABER are up, then CGEJSV will compute the SVD of
1004 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
1005 *
1006  nr = 1
1007  IF ( l2aber ) THEN
1008 * Standard absolute error bound suffices. All sigma_i with
1009 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1010 * agressive enforcement of lower numerical rank by introducing a
1011 * backward error of the order of N*EPSLN*||A||.
1012  temp1 = sqrt(float(n))*epsln
1013  DO 3001 p = 2, n
1014  IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
1015  nr = nr + 1
1016  ELSE
1017  GO TO 3002
1018  END IF
1019  3001 CONTINUE
1020  3002 CONTINUE
1021  ELSE IF ( l2rank ) THEN
1022 * .. similarly as above, only slightly more gentle (less agressive).
1023 * Sudden drop on the diagonal of R1 is used as the criterion for
1024 * close-to-rank-defficient.
1025  temp1 = sqrt(sfmin)
1026  DO 3401 p = 2, n
1027  IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1028  \$ ( abs(a(p,p)) .LT. small ) .OR.
1029  \$ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1030  nr = nr + 1
1031  3401 CONTINUE
1032  3402 CONTINUE
1033 *
1034  ELSE
1035 * The goal is high relative accuracy. However, if the matrix
1036 * has high scaled condition number the relative accuracy is in
1037 * general not feasible. Later on, a condition number estimator
1038 * will be deployed to estimate the scaled condition number.
1039 * Here we just remove the underflowed part of the triangular
1040 * factor. This prevents the situation in which the code is
1041 * working hard to get the accuracy not warranted by the data.
1042  temp1 = sqrt(sfmin)
1043  DO 3301 p = 2, n
1044  IF ( ( abs(a(p,p)) .LT. small ) .OR.
1045  \$ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1046  nr = nr + 1
1047  3301 CONTINUE
1048  3302 CONTINUE
1049 *
1050  END IF
1051 *
1052  almort = .false.
1053  IF ( nr .EQ. n ) THEN
1054  maxprj = one
1055  DO 3051 p = 2, n
1056  temp1 = abs(a(p,p)) / sva(iwork(p))
1057  maxprj = amin1( maxprj, temp1 )
1058  3051 CONTINUE
1059  IF ( maxprj**2 .GE. one - float(n)*epsln ) almort = .true.
1060  END IF
1061 *
1062 *
1063  sconda = - one
1064  condr1 = - one
1065  condr2 = - one
1066 *
1067  IF ( errest ) THEN
1068  IF ( n .EQ. nr ) THEN
1069  IF ( rsvec ) THEN
1070 * .. V is available as workspace
1071  CALL clacpy( 'U', n, n, a, lda, v, ldv )
1072  DO 3053 p = 1, n
1073  temp1 = sva(iwork(p))
1074  CALL csscal( p, one/temp1, v(1,p), 1 )
1075  3053 CONTINUE
1076  CALL cpocon( 'U', n, v, ldv, one, temp1,
1077  \$ cwork(n+1), rwork, ierr )
1078 *
1079  ELSE IF ( lsvec ) THEN
1080 * .. U is available as workspace
1081  CALL clacpy( 'U', n, n, a, lda, u, ldu )
1082  DO 3054 p = 1, n
1083  temp1 = sva(iwork(p))
1084  CALL csscal( p, one/temp1, u(1,p), 1 )
1085  3054 CONTINUE
1086  CALL cpocon( 'U', n, u, ldu, one, temp1,
1087  \$ cwork(n+1), rwork, ierr )
1088  ELSE
1089  CALL clacpy( 'U', n, n, a, lda, cwork(n+1), n )
1090  DO 3052 p = 1, n
1091  temp1 = sva(iwork(p))
1092  CALL csscal( p, one/temp1, cwork(n+(p-1)*n+1), 1 )
1093  3052 CONTINUE
1094 * .. the columns of R are scaled to have unit Euclidean lengths.
1095  CALL cpocon( 'U', n, cwork(n+1), n, one, temp1,
1096  \$ cwork(n+n*n+1), rwork, ierr )
1097 *
1098  END IF
1099  sconda = one / sqrt(temp1)
1100 * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1101 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1102  ELSE
1103  sconda = - one
1104  END IF
1105  END IF
1106 *
1107  l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1108 * If there is no violent scaling, artificial perturbation is not needed.
1109 *
1110 * Phase 3:
1111 *
1112  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1113 *
1114 * Singular Values only
1115 *
1116 * .. transpose A(1:NR,1:N)
1117  DO 1946 p = 1, min0( n-1, nr )
1118  CALL ccopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1119  CALL clacgv( n-p+1, a(p,p), 1 )
1120  1946 CONTINUE
1121  IF ( nr .EQ. n ) a(n,n) = conjg(a(n,n))
1122 *
1123 * The following two DO-loops introduce small relative perturbation
1124 * into the strict upper triangle of the lower triangular matrix.
1125 * Small entries below the main diagonal are also changed.
1126 * This modification is useful if the computing environment does not
1127 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1128 * annoying denormalized numbers in case of strongly scaled matrices.
1129 * The perturbation is structured so that it does not introduce any
1130 * new perturbation of the singular values, and it does not destroy
1131 * the job done by the preconditioner.
1132 * The licence for this perturbation is in the variable L2PERT, which
1133 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1134 *
1135  IF ( .NOT. almort ) THEN
1136 *
1137  IF ( l2pert ) THEN
1138 * XSC = SQRT(SMALL)
1139  xsc = epsln / float(n)
1140  DO 4947 q = 1, nr
1141  ctemp = cmplx(xsc*abs(a(q,q)),zero)
1142  DO 4949 p = 1, n
1143  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1144  \$ .OR. ( p .LT. q ) )
1145 * \$ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1146  \$ a(p,q) = ctemp
1147  4949 CONTINUE
1148  4947 CONTINUE
1149  ELSE
1150  CALL claset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1151  END IF
1152 *
1153 * .. second preconditioning using the QR factorization
1154 *
1155  CALL cgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1156 *
1157 * .. and transpose upper to lower triangular
1158  DO 1948 p = 1, nr - 1
1159  CALL ccopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1160  CALL clacgv( nr-p+1, a(p,p), 1 )
1161  1948 CONTINUE
1162 *
1163  END IF
1164 *
1165 * Row-cyclic Jacobi SVD algorithm with column pivoting
1166 *
1167 * .. again some perturbation (a "background noise") is added
1168 * to drown denormals
1169  IF ( l2pert ) THEN
1170 * XSC = SQRT(SMALL)
1171  xsc = epsln / float(n)
1172  DO 1947 q = 1, nr
1173  ctemp = cmplx(xsc*abs(a(q,q)),zero)
1174  DO 1949 p = 1, nr
1175  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1176  \$ .OR. ( p .LT. q ) )
1177 * \$ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1178  \$ a(p,q) = ctemp
1179  1949 CONTINUE
1180  1947 CONTINUE
1181  ELSE
1182  CALL claset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1183  END IF
1184 *
1185 * .. and one-sided Jacobi rotations are started on a lower
1186 * triangular matrix (plus perturbation which is ignored in
1187 * the part which destroys triangular form (confusing?!))
1188 *
1189  CALL cgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1190  \$ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1191 *
1192  scalem = rwork(1)
1193  numrank = nint(rwork(2))
1194 *
1195 *
1196  ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1197 *
1198 * -> Singular Values and Right Singular Vectors <-
1199 *
1200  IF ( almort ) THEN
1201 *
1202 * .. in this case NR equals N
1203  DO 1998 p = 1, nr
1204  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1205  CALL clacgv( n-p+1, v(p,p), 1 )
1206  1998 CONTINUE
1207  CALL claset( 'Upper', nr-1,nr-1, czero, czero, v(1,2), ldv )
1208 *
1209  CALL cgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1210  \$ cwork, lwork, rwork, lrwork, info )
1211  scalem = rwork(1)
1212  numrank = nint(rwork(2))
1213
1214  ELSE
1215 *
1216 * .. two more QR factorizations ( one QRF is not enough, two require
1217 * accumulated product of Jacobi rotations, three are perfect )
1218 *
1219  CALL claset( 'Lower', nr-1,nr-1, czero, czero, a(2,1), lda )
1220  CALL cgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1221  CALL clacpy( 'Lower', nr, nr, a, lda, v, ldv )
1222  CALL claset( 'Upper', nr-1,nr-1, czero, czero, v(1,2), ldv )
1223  CALL cgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1224  \$ lwork-2*n, ierr )
1225  DO 8998 p = 1, nr
1226  CALL ccopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1227  CALL clacgv( nr-p+1, v(p,p), 1 )
1228  8998 CONTINUE
1229  CALL claset('Upper', nr-1, nr-1, czero, czero, v(1,2), ldv)
1230 *
1231  CALL cgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1232  \$ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1233  scalem = rwork(1)
1234  numrank = nint(rwork(2))
1235  IF ( nr .LT. n ) THEN
1236  CALL claset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1237  CALL claset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1238  CALL claset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1239  END IF
1240 *
1241  CALL cunmlq( 'Left', 'C', n, n, nr, a, lda, cwork,
1242  \$ v, ldv, cwork(n+1), lwork-n, ierr )
1243 *
1244  END IF
1245 *
1246  DO 8991 p = 1, n
1247  CALL ccopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1248  8991 CONTINUE
1249  CALL clacpy( 'All', n, n, a, lda, v, ldv )
1250 *
1251  IF ( transp ) THEN
1252  CALL clacpy( 'All', n, n, v, ldv, u, ldu )
1253  END IF
1254 *
1255  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1256 *
1257 * .. Singular Values and Left Singular Vectors ..
1258 *
1259 * .. second preconditioning step to avoid need to accumulate
1260 * Jacobi rotations in the Jacobi iterations.
1261  DO 1965 p = 1, nr
1262  CALL ccopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1263  CALL clacgv( n-p+1, u(p,p), 1 )
1264  1965 CONTINUE
1265  CALL claset( 'Upper', nr-1, nr-1, czero, czero, u(1,2), ldu )
1266 *
1267  CALL cgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1268  \$ lwork-2*n, ierr )
1269 *
1270  DO 1967 p = 1, nr - 1
1271  CALL ccopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1272  CALL clacgv( n-p+1, u(p,p), 1 )
1273  1967 CONTINUE
1274  CALL claset( 'Upper', nr-1, nr-1, czero, czero, u(1,2), ldu )
1275 *
1276  CALL cgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1277  \$ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1278  scalem = rwork(1)
1279  numrank = nint(rwork(2))
1280 *
1281  IF ( nr .LT. m ) THEN
1282  CALL claset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1283  IF ( nr .LT. n1 ) THEN
1284  CALL claset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1285  CALL claset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1286  END IF
1287  END IF
1288 *
1289  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1290  \$ ldu, cwork(n+1), lwork-n, ierr )
1291 *
1292  IF ( rowpiv )
1293  \$ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1294 *
1295  DO 1974 p = 1, n1
1296  xsc = one / scnrm2( m, u(1,p), 1 )
1297  CALL csscal( m, xsc, u(1,p), 1 )
1298  1974 CONTINUE
1299 *
1300  IF ( transp ) THEN
1301  CALL clacpy( 'All', n, n, u, ldu, v, ldv )
1302  END IF
1303 *
1304  ELSE
1305 *
1306 * .. Full SVD ..
1307 *
1308  IF ( .NOT. jracc ) THEN
1309 *
1310  IF ( .NOT. almort ) THEN
1311 *
1312 * Second Preconditioning Step (QRF [with pivoting])
1313 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1314 * equivalent to an LQF CALL. Since in many libraries the QRF
1315 * seems to be better optimized than the LQF, we do explicit
1316 * transpose and use the QRF. This is subject to changes in an
1317 * optimized implementation of CGEJSV.
1318 *
1319  DO 1968 p = 1, nr
1320  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1321  CALL clacgv( n-p+1, v(p,p), 1 )
1322  1968 CONTINUE
1323 *
1324 * .. the following two loops perturb small entries to avoid
1325 * denormals in the second QR factorization, where they are
1326 * as good as zeros. This is done to avoid painfully slow
1327 * computation with denormals. The relative size of the perturbation
1328 * is a parameter that can be changed by the implementer.
1329 * This perturbation device will be obsolete on machines with
1330 * properly implemented arithmetic.
1331 * To switch it off, set L2PERT=.FALSE. To remove it from the
1332 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1333 * The following two loops should be blocked and fused with the
1334 * transposed copy above.
1335 *
1336  IF ( l2pert ) THEN
1337  xsc = sqrt(small)
1338  DO 2969 q = 1, nr
1339  ctemp = cmplx(xsc*abs( v(q,q) ),zero)
1340  DO 2968 p = 1, n
1341  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1342  \$ .OR. ( p .LT. q ) )
1343 * \$ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1344  \$ v(p,q) = ctemp
1345  IF ( p .LT. q ) v(p,q) = - v(p,q)
1346  2968 CONTINUE
1347  2969 CONTINUE
1348  ELSE
1349  CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1350  END IF
1351 *
1352 * Estimate the row scaled condition number of R1
1353 * (If R1 is rectangular, N > NR, then the condition number
1354 * of the leading NR x NR submatrix is estimated.)
1355 *
1356  CALL clacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1357  DO 3950 p = 1, nr
1358  temp1 = scnrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1359  CALL csscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1360  3950 CONTINUE
1361  CALL cpocon('Lower',nr,cwork(2*n+1),nr,one,temp1,
1362  \$ cwork(2*n+nr*nr+1),rwork,ierr)
1363  condr1 = one / sqrt(temp1)
1364 * .. here need a second oppinion on the condition number
1365 * .. then assume worst case scenario
1366 * R1 is OK for inverse <=> CONDR1 .LT. FLOAT(N)
1367 * more conservative <=> CONDR1 .LT. SQRT(FLOAT(N))
1368 *
1369  cond_ok = sqrt(sqrt(float(nr)))
1370 *[TP] COND_OK is a tuning parameter.
1371 *
1372  IF ( condr1 .LT. cond_ok ) THEN
1373 * .. the second QRF without pivoting. Note: in an optimized
1374 * implementation, this QRF should be implemented as the QRF
1375 * of a lower triangular matrix.
1376 * R1^* = Q2 * R2
1377  CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1378  \$ lwork-2*n, ierr )
1379 *
1380  IF ( l2pert ) THEN
1381  xsc = sqrt(small)/epsln
1382  DO 3959 p = 2, nr
1383  DO 3958 q = 1, p - 1
1384  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1385  \$ zero)
1386  IF ( abs(v(q,p)) .LE. temp1 )
1387 * \$ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1388  \$ v(q,p) = ctemp
1389  3958 CONTINUE
1390  3959 CONTINUE
1391  END IF
1392 *
1393  IF ( nr .NE. n )
1394  \$ CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1395 * .. save ...
1396 *
1397 * .. this transposed copy should be better than naive
1398  DO 1969 p = 1, nr - 1
1399  CALL ccopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1400  CALL clacgv(nr-p+1, v(p,p), 1 )
1401  1969 CONTINUE
1402  v(nr,nr)=conjg(v(nr,nr))
1403 *
1404  condr2 = condr1
1405 *
1406  ELSE
1407 *
1408 * .. ill-conditioned case: second QRF with pivoting
1409 * Note that windowed pivoting would be equaly good
1410 * numerically, and more run-time efficient. So, in
1411 * an optimal implementation, the next call to CGEQP3
1412 * should be replaced with eg. CALL CGEQPX (ACM TOMS #782)
1413 * with properly (carefully) chosen parameters.
1414 *
1415 * R1^* * P2 = Q2 * R2
1416  DO 3003 p = 1, nr
1417  iwork(n+p) = 0
1418  3003 CONTINUE
1419  CALL cgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1420  \$ cwork(2*n+1), lwork-2*n, rwork, ierr )
1421 ** CALL CGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1422 ** \$ LWORK-2*N, IERR )
1423  IF ( l2pert ) THEN
1424  xsc = sqrt(small)
1425  DO 3969 p = 2, nr
1426  DO 3968 q = 1, p - 1
1427  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1428  \$ zero)
1429  IF ( abs(v(q,p)) .LE. temp1 )
1430 * \$ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1431  \$ v(q,p) = ctemp
1432  3968 CONTINUE
1433  3969 CONTINUE
1434  END IF
1435 *
1436  CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1437 *
1438  IF ( l2pert ) THEN
1439  xsc = sqrt(small)
1440  DO 8970 p = 2, nr
1441  DO 8971 q = 1, p - 1
1442  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1443  \$ zero)
1444 * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1445  v(p,q) = - ctemp
1446  8971 CONTINUE
1447  8970 CONTINUE
1448  ELSE
1449  CALL claset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1450  END IF
1451 * Now, compute R2 = L3 * Q3, the LQ factorization.
1452  CALL cgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1453  \$ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1454 * .. and estimate the condition number
1455  CALL clacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1456  DO 4950 p = 1, nr
1457  temp1 = scnrm2( p, cwork(2*n+n*nr+nr+p), nr )
1458  CALL csscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1459  4950 CONTINUE
1460  CALL cpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1461  \$ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1462  condr2 = one / sqrt(temp1)
1463 *
1464 *
1465  IF ( condr2 .GE. cond_ok ) THEN
1466 * .. save the Householder vectors used for Q3
1467 * (this overwrittes the copy of R2, as it will not be
1468 * needed in this branch, but it does not overwritte the
1469 * Huseholder vectors of Q2.).
1470  CALL clacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1471 * .. and the rest of the information on Q3 is in
1472 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1473  END IF
1474 *
1475  END IF
1476 *
1477  IF ( l2pert ) THEN
1478  xsc = sqrt(small)
1479  DO 4968 q = 2, nr
1480  ctemp = xsc * v(q,q)
1481  DO 4969 p = 1, q - 1
1482 * V(p,q) = - SIGN( TEMP1, V(q,p) )
1483 * V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1484  v(p,q) = - ctemp
1485  4969 CONTINUE
1486  4968 CONTINUE
1487  ELSE
1488  CALL claset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1489  END IF
1490 *
1491 * Second preconditioning finished; continue with Jacobi SVD
1492 * The input matrix is lower trinagular.
1493 *
1494 * Recover the right singular vectors as solution of a well
1495 * conditioned triangular matrix equation.
1496 *
1497  IF ( condr1 .LT. cond_ok ) THEN
1498 *
1499  CALL cgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1500  \$ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1501  \$ lrwork, info )
1502  scalem = rwork(1)
1503  numrank = nint(rwork(2))
1504  DO 3970 p = 1, nr
1505  CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1506  CALL csscal( nr, sva(p), v(1,p), 1 )
1507  3970 CONTINUE
1508
1509 * .. pick the right matrix equation and solve it
1510 *
1511  IF ( nr .EQ. n ) THEN
1512 * :)) .. best case, R1 is inverted. The solution of this matrix
1513 * equation is Q2*V2 = the product of the Jacobi rotations
1514 * used in CGESVJ, premultiplied with the orthogonal matrix
1515 * from the second QR factorization.
1516  CALL ctrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1517  ELSE
1518 * .. R1 is well conditioned, but non-square. Adjoint of R2
1519 * is inverted to get the product of the Jacobi rotations
1520 * used in CGESVJ. The Q-factor from the second QR
1521 * factorization is then built in explicitly.
1522  CALL ctrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1523  \$ n,v,ldv)
1524  IF ( nr .LT. n ) THEN
1525  CALL claset('A',n-nr,nr,czero,czero,v(nr+1,1),ldv)
1526  CALL claset('A',nr,n-nr,czero,czero,v(1,nr+1),ldv)
1527  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1528  END IF
1529  CALL cunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1530  \$ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1531  END IF
1532 *
1533  ELSE IF ( condr2 .LT. cond_ok ) THEN
1534 *
1535 * The matrix R2 is inverted. The solution of the matrix equation
1536 * is Q3^* * V3 = the product of the Jacobi rotations (appplied to
1537 * the lower triangular L3 from the LQ factorization of
1538 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1539  CALL cgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1540  \$ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1541  \$ rwork, lrwork, info )
1542  scalem = rwork(1)
1543  numrank = nint(rwork(2))
1544  DO 3870 p = 1, nr
1545  CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1546  CALL csscal( nr, sva(p), u(1,p), 1 )
1547  3870 CONTINUE
1548  CALL ctrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1549  \$ u,ldu)
1550 * .. apply the permutation from the second QR factorization
1551  DO 873 q = 1, nr
1552  DO 872 p = 1, nr
1553  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1554  872 CONTINUE
1555  DO 874 p = 1, nr
1556  u(p,q) = cwork(2*n+n*nr+nr+p)
1557  874 CONTINUE
1558  873 CONTINUE
1559  IF ( nr .LT. n ) THEN
1560  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1561  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1562  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1563  END IF
1564  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1565  \$ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1566  ELSE
1567 * Last line of defense.
1568 * #:( This is a rather pathological case: no scaled condition
1569 * improvement after two pivoted QR factorizations. Other
1570 * possibility is that the rank revealing QR factorization
1571 * or the condition estimator has failed, or the COND_OK
1572 * is set very close to ONE (which is unnecessary). Normally,
1573 * this branch should never be executed, but in rare cases of
1574 * failure of the RRQR or condition estimator, the last line of
1575 * defense ensures that CGEJSV completes the task.
1576 * Compute the full SVD of L3 using CGESVJ with explicit
1577 * accumulation of Jacobi rotations.
1578  CALL cgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1579  \$ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1580  \$ rwork, lrwork, info )
1581  scalem = rwork(1)
1582  numrank = nint(rwork(2))
1583  IF ( nr .LT. n ) THEN
1584  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1585  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1586  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1587  END IF
1588  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1589  \$ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1590 *
1591  CALL cunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1592  \$ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1593  \$ lwork-2*n-n*nr-nr, ierr )
1594  DO 773 q = 1, nr
1595  DO 772 p = 1, nr
1596  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1597  772 CONTINUE
1598  DO 774 p = 1, nr
1599  u(p,q) = cwork(2*n+n*nr+nr+p)
1600  774 CONTINUE
1601  773 CONTINUE
1602 *
1603  END IF
1604 *
1605 * Permute the rows of V using the (column) permutation from the
1606 * first QRF. Also, scale the columns to make them unit in
1607 * Euclidean norm. This applies to all cases.
1608 *
1609  temp1 = sqrt(float(n)) * epsln
1610  DO 1972 q = 1, n
1611  DO 972 p = 1, n
1612  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1613  972 CONTINUE
1614  DO 973 p = 1, n
1615  v(p,q) = cwork(2*n+n*nr+nr+p)
1616  973 CONTINUE
1617  xsc = one / scnrm2( n, v(1,q), 1 )
1618  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1619  \$ CALL csscal( n, xsc, v(1,q), 1 )
1620  1972 CONTINUE
1621 * At this moment, V contains the right singular vectors of A.
1622 * Next, assemble the left singular vector matrix U (M x N).
1623  IF ( nr .LT. m ) THEN
1624  CALL claset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1625  IF ( nr .LT. n1 ) THEN
1626  CALL claset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1627  CALL claset('A',m-nr,n1-nr,czero,cone,
1628  \$ u(nr+1,nr+1),ldu)
1629  END IF
1630  END IF
1631 *
1632 * The Q matrix from the first QRF is built into the left singular
1633 * matrix U. This applies to all cases.
1634 *
1635  CALL cunmqr( 'Left', 'No_Tr', m, n1, n, a, lda, cwork, u,
1636  \$ ldu, cwork(n+1), lwork-n, ierr )
1637
1638 * The columns of U are normalized. The cost is O(M*N) flops.
1639  temp1 = sqrt(float(m)) * epsln
1640  DO 1973 p = 1, nr
1641  xsc = one / scnrm2( m, u(1,p), 1 )
1642  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1643  \$ CALL csscal( m, xsc, u(1,p), 1 )
1644  1973 CONTINUE
1645 *
1646 * If the initial QRF is computed with row pivoting, the left
1647 * singular vectors must be adjusted.
1648 *
1649  IF ( rowpiv )
1650  \$ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1651 *
1652  ELSE
1653 *
1654 * .. the initial matrix A has almost orthogonal columns and
1655 * the second QRF is not needed
1656 *
1657  CALL clacpy( 'Upper', n, n, a, lda, cwork(n+1), n )
1658  IF ( l2pert ) THEN
1659  xsc = sqrt(small)
1660  DO 5970 p = 2, n
1661  ctemp = xsc * cwork( n + (p-1)*n + p )
1662  DO 5971 q = 1, p - 1
1663 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
1664 * \$ ABS(CWORK(N+(p-1)*N+q)) )
1665  cwork(n+(q-1)*n+p)=-ctemp
1666  5971 CONTINUE
1667  5970 CONTINUE
1668  ELSE
1669  CALL claset( 'Lower',n-1,n-1,czero,czero,cwork(n+2),n )
1670  END IF
1671 *
1672  CALL cgesvj( 'Upper', 'U', 'N', n, n, cwork(n+1), n, sva,
1673  \$ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
1674  \$ info )
1675 *
1676  scalem = rwork(1)
1677  numrank = nint(rwork(2))
1678  DO 6970 p = 1, n
1679  CALL ccopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
1680  CALL csscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
1681  6970 CONTINUE
1682 *
1683  CALL ctrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1684  \$ cone, a, lda, cwork(n+1), n )
1685  DO 6972 p = 1, n
1686  CALL ccopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
1687  6972 CONTINUE
1688  temp1 = sqrt(float(n))*epsln
1689  DO 6971 p = 1, n
1690  xsc = one / scnrm2( n, v(1,p), 1 )
1691  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1692  \$ CALL csscal( n, xsc, v(1,p), 1 )
1693  6971 CONTINUE
1694 *
1695 * Assemble the left singular vector matrix U (M x N).
1696 *
1697  IF ( n .LT. m ) THEN
1698  CALL claset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
1699  IF ( n .LT. n1 ) THEN
1700  CALL claset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
1701  CALL claset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
1702  END IF
1703  END IF
1704  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1705  \$ ldu, cwork(n+1), lwork-n, ierr )
1706  temp1 = sqrt(float(m))*epsln
1707  DO 6973 p = 1, n1
1708  xsc = one / scnrm2( m, u(1,p), 1 )
1709  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1710  \$ CALL csscal( m, xsc, u(1,p), 1 )
1711  6973 CONTINUE
1712 *
1713  IF ( rowpiv )
1714  \$ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1715 *
1716  END IF
1717 *
1718 * end of the >> almost orthogonal case << in the full SVD
1719 *
1720  ELSE
1721 *
1722 * This branch deploys a preconditioned Jacobi SVD with explicitly
1723 * accumulated rotations. It is included as optional, mainly for
1724 * experimental purposes. It does perfom well, and can also be used.
1725 * In this implementation, this branch will be automatically activated
1726 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1727 * to be greater than the overflow threshold. This is because the
1728 * a posteriori computation of the singular vectors assumes robust
1729 * implementation of BLAS and some LAPACK procedures, capable of working
1730 * in presence of extreme values. Since that is not always the case, ...
1731 *
1732  DO 7968 p = 1, nr
1733  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1734  CALL clacgv( n-p+1, v(p,p), 1 )
1735  7968 CONTINUE
1736 *
1737  IF ( l2pert ) THEN
1738  xsc = sqrt(small/epsln)
1739  DO 5969 q = 1, nr
1740  ctemp = cmplx(xsc*abs( v(q,q) ),zero)
1741  DO 5968 p = 1, n
1742  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1743  \$ .OR. ( p .LT. q ) )
1744 * \$ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1745  \$ v(p,q) = ctemp
1746  IF ( p .LT. q ) v(p,q) = - v(p,q)
1747  5968 CONTINUE
1748  5969 CONTINUE
1749  ELSE
1750  CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1751  END IF
1752
1753  CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1754  \$ lwork-2*n, ierr )
1755  CALL clacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
1756 *
1757  DO 7969 p = 1, nr
1758  CALL ccopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1759  CALL clacgv( nr-p+1, u(p,p), 1 )
1760  7969 CONTINUE
1761
1762  IF ( l2pert ) THEN
1763  xsc = sqrt(small/epsln)
1764  DO 9970 q = 2, nr
1765  DO 9971 p = 1, q - 1
1766  ctemp = cmplx(xsc * amin1(abs(u(p,p)),abs(u(q,q))),
1767  \$ zero)
1768 * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
1769  u(p,q) = - ctemp
1770  9971 CONTINUE
1771  9970 CONTINUE
1772  ELSE
1773  CALL claset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1774  END IF
1775
1776  CALL cgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
1777  \$ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
1778  \$ rwork, lrwork, info )
1779  scalem = rwork(1)
1780  numrank = nint(rwork(2))
1781
1782  IF ( nr .LT. n ) THEN
1783  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1784  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1785  CALL claset( 'A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv )
1786  END IF
1787
1788  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1789  \$ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1790 *
1791 * Permute the rows of V using the (column) permutation from the
1792 * first QRF. Also, scale the columns to make them unit in
1793 * Euclidean norm. This applies to all cases.
1794 *
1795  temp1 = sqrt(float(n)) * epsln
1796  DO 7972 q = 1, n
1797  DO 8972 p = 1, n
1798  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1799  8972 CONTINUE
1800  DO 8973 p = 1, n
1801  v(p,q) = cwork(2*n+n*nr+nr+p)
1802  8973 CONTINUE
1803  xsc = one / scnrm2( n, v(1,q), 1 )
1804  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1805  \$ CALL csscal( n, xsc, v(1,q), 1 )
1806  7972 CONTINUE
1807 *
1808 * At this moment, V contains the right singular vectors of A.
1809 * Next, assemble the left singular vector matrix U (M x N).
1810 *
1811  IF ( nr .LT. m ) THEN
1812  CALL claset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
1813  IF ( nr .LT. n1 ) THEN
1814  CALL claset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
1815  CALL claset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
1816  END IF
1817  END IF
1818 *
1819  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1820  \$ ldu, cwork(n+1), lwork-n, ierr )
1821 *
1822  IF ( rowpiv )
1823  \$ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1824 *
1825 *
1826  END IF
1827  IF ( transp ) THEN
1828 * .. swap U and V because the procedure worked on A^*
1829  DO 6974 p = 1, n
1830  CALL cswap( n, u(1,p), 1, v(1,p), 1 )
1831  6974 CONTINUE
1832  END IF
1833 *
1834  END IF
1835 * end of the full SVD
1836 *
1837 * Undo scaling, if necessary (and possible)
1838 *
1839  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1840  CALL slascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1841  uscal1 = one
1842  uscal2 = one
1843  END IF
1844 *
1845  IF ( nr .LT. n ) THEN
1846  DO 3004 p = nr+1, n
1847  sva(p) = zero
1848  3004 CONTINUE
1849  END IF
1850 *
1851  rwork(1) = uscal2 * scalem
1852  rwork(2) = uscal1
1853  IF ( errest ) rwork(3) = sconda
1854  IF ( lsvec .AND. rsvec ) THEN
1855  rwork(4) = condr1
1856  rwork(5) = condr2
1857  END IF
1858  IF ( l2tran ) THEN
1859  rwork(6) = entra
1860  rwork(7) = entrat
1861  END IF
1862 *
1863  iwork(1) = nr
1864  iwork(2) = numrank
1865  iwork(3) = warning
1866 *
1867  RETURN
1868 * ..
1869 * .. END OF CGEJSV
1870 * ..
real function scnrm2(N, X, INCX)
SCNRM2
Definition: scnrm2.f:56
subroutine slassq(N, X, INCX, SCALE, SUMSQ)
SLASSQ updates a sum of squares represented in scaled form.
Definition: slassq.f:105
integer function isamax(N, SX, INCX)
ISAMAX
Definition: isamax.f:53
subroutine clascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
CLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: clascl.f:145
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:145
subroutine classq(N, X, INCX, SCALE, SUMSQ)
CLASSQ updates a sum of squares represented in scaled form.
Definition: classq.f:108
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine cpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, RWORK, INFO)
CPOCON
Definition: cpocon.f:123
subroutine ctrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
CTRSM
Definition: ctrsm.f:182
subroutine cunmqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMQR
Definition: cunmqr.f:170
subroutine cgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGELQF
Definition: cgelqf.f:137
subroutine claset(UPLO, M, N, ALPHA, BETA, A, LDA)
CLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: claset.f:108
subroutine cgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, RWORK, INFO)
CGEQP3
Definition: cgeqp3.f:161
subroutine cgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGEQRF
Definition: cgeqrf.f:138
integer function icamax(N, CX, INCX)
ICAMAX
Definition: icamax.f:53
subroutine cgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO)
CGESVJ
Definition: cgesvj.f:332
subroutine clacpy(UPLO, M, N, A, LDA, B, LDB)
CLACPY copies all or part of one two-dimensional array to another.
Definition: clacpy.f:105
subroutine sscal(N, SA, SX, INCX)
SSCAL
Definition: sscal.f:55
subroutine cunmlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMLQ
Definition: cunmlq.f:170
subroutine ccopy(N, CX, INCX, CY, INCY)
CCOPY
Definition: ccopy.f:52
subroutine clacgv(N, X, INCX)
CLACGV conjugates a complex vector.
Definition: clacgv.f:76
subroutine cswap(N, CX, INCX, CY, INCY)
CSWAP
Definition: cswap.f:52
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:69
subroutine cungqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
CUNGQR
Definition: cungqr.f:130
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine claswp(N, A, LDA, K1, K2, IPIV, INCX)
CLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: claswp.f:116
subroutine csscal(N, SA, CX, INCX)
CSSCAL
Definition: csscal.f:54

Here is the call graph for this function:

Here is the caller graph for this function: