LAPACK 3.12.1
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches
dgesvj.f
Go to the documentation of this file.
1*> \brief \b DGESVJ
2*
3* =========== DOCUMENTATION ===========
4*
5* Online html documentation available at
6* http://www.netlib.org/lapack/explore-html/
7*
8*> Download DGESVJ + dependencies
9*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgesvj.f">
10*> [TGZ]</a>
11*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgesvj.f">
12*> [ZIP]</a>
13*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgesvj.f">
14*> [TXT]</a>
15*
16* Definition:
17* ===========
18*
19* SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
20* LDV, WORK, LWORK, INFO )
21*
22* .. Scalar Arguments ..
23* INTEGER INFO, LDA, LDV, LWORK, M, MV, N
24* CHARACTER*1 JOBA, JOBU, JOBV
25* ..
26* .. Array Arguments ..
27* DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
28* $ WORK( LWORK )
29* ..
30*
31*
32*> \par Purpose:
33* =============
34*>
35*> \verbatim
36*>
37*> DGESVJ computes the singular value decomposition (SVD) of a real
38*> M-by-N matrix A, where M >= N. The SVD of A is written as
39*> [++] [xx] [x0] [xx]
40*> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
41*> [++] [xx]
42*> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
43*> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
44*> of SIGMA are the singular values of A. The columns of U and V are the
45*> left and the right singular vectors of A, respectively.
46*> DGESVJ can sometimes compute tiny singular values and their singular vectors much
47*> more accurately than other SVD routines, see below under Further Details.
48*> \endverbatim
49*
50* Arguments:
51* ==========
52*
53*> \param[in] JOBA
54*> \verbatim
55*> JOBA is CHARACTER*1
56*> Specifies the structure of A.
57*> = 'L': The input matrix A is lower triangular;
58*> = 'U': The input matrix A is upper triangular;
59*> = 'G': The input matrix A is general M-by-N matrix, M >= N.
60*> \endverbatim
61*>
62*> \param[in] JOBU
63*> \verbatim
64*> JOBU is CHARACTER*1
65*> Specifies whether to compute the left singular vectors
66*> (columns of U):
67*> = 'U': The left singular vectors corresponding to the nonzero
68*> singular values are computed and returned in the leading
69*> columns of A. See more details in the description of A.
70*> The default numerical orthogonality threshold is set to
71*> approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
72*> = 'C': Analogous to JOBU='U', except that user can control the
73*> level of numerical orthogonality of the computed left
74*> singular vectors. TOL can be set to TOL = CTOL*EPS, where
75*> CTOL is given on input in the array WORK.
76*> No CTOL smaller than ONE is allowed. CTOL greater
77*> than 1 / EPS is meaningless. The option 'C'
78*> can be used if M*EPS is satisfactory orthogonality
79*> of the computed left singular vectors, so CTOL=M could
80*> save few sweeps of Jacobi rotations.
81*> See the descriptions of A and WORK(1).
82*> = 'N': The matrix U is not computed. However, see the
83*> description of A.
84*> \endverbatim
85*>
86*> \param[in] JOBV
87*> \verbatim
88*> JOBV is CHARACTER*1
89*> Specifies whether to compute the right singular vectors, that
90*> is, the matrix V:
91*> = 'V': the matrix V is computed and returned in the array V
92*> = 'A': the Jacobi rotations are applied to the MV-by-N
93*> array V. In other words, the right singular vector
94*> matrix V is not computed explicitly, instead it is
95*> applied to an MV-by-N matrix initially stored in the
96*> first MV rows of V.
97*> = 'N': the matrix V is not computed and the array V is not
98*> referenced
99*> \endverbatim
100*>
101*> \param[in] M
102*> \verbatim
103*> M is INTEGER
104*> The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
105*> \endverbatim
106*>
107*> \param[in] N
108*> \verbatim
109*> N is INTEGER
110*> The number of columns of the input matrix A.
111*> M >= N >= 0.
112*> \endverbatim
113*>
114*> \param[in,out] A
115*> \verbatim
116*> A is DOUBLE PRECISION array, dimension (LDA,N)
117*> On entry, the M-by-N matrix A.
118*> On exit :
119*> If JOBU = 'U' .OR. JOBU = 'C' :
120*> If INFO = 0 :
121*> RANKA orthonormal columns of U are returned in the
122*> leading RANKA columns of the array A. Here RANKA <= N
123*> is the number of computed singular values of A that are
124*> above the underflow threshold DLAMCH('S'). The singular
125*> vectors corresponding to underflowed or zero singular
126*> values are not computed. The value of RANKA is returned
127*> in the array WORK as RANKA=NINT(WORK(2)). Also see the
128*> descriptions of SVA and WORK. The computed columns of U
129*> are mutually numerically orthogonal up to approximately
130*> TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'),
131*> see the description of JOBU.
132*> If INFO > 0 :
133*> the procedure DGESVJ did not converge in the given number
134*> of iterations (sweeps). In that case, the computed
135*> columns of U may not be orthogonal up to TOL. The output
136*> U (stored in A), SIGMA (given by the computed singular
137*> values in SVA(1:N)) and V is still a decomposition of the
138*> input matrix A in the sense that the residual
139*> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
140*>
141*> If JOBU = 'N' :
142*> If INFO = 0 :
143*> Note that the left singular vectors are 'for free' in the
144*> one-sided Jacobi SVD algorithm. However, if only the
145*> singular values are needed, the level of numerical
146*> orthogonality of U is not an issue and iterations are
147*> stopped when the columns of the iterated matrix are
148*> numerically orthogonal up to approximately M*EPS. Thus,
149*> on exit, A contains the columns of U scaled with the
150*> corresponding singular values.
151*> If INFO > 0 :
152*> the procedure DGESVJ did not converge in the given number
153*> of iterations (sweeps).
154*> \endverbatim
155*>
156*> \param[in] LDA
157*> \verbatim
158*> LDA is INTEGER
159*> The leading dimension of the array A. LDA >= max(1,M).
160*> \endverbatim
161*>
162*> \param[out] SVA
163*> \verbatim
164*> SVA is DOUBLE PRECISION array, dimension (N)
165*> On exit :
166*> If INFO = 0 :
167*> depending on the value SCALE = WORK(1), we have:
168*> If SCALE = ONE :
169*> SVA(1:N) contains the computed singular values of A.
170*> During the computation SVA contains the Euclidean column
171*> norms of the iterated matrices in the array A.
172*> If SCALE .NE. ONE :
173*> The singular values of A are SCALE*SVA(1:N), and this
174*> factored representation is due to the fact that some of the
175*> singular values of A might underflow or overflow.
176*> If INFO > 0 :
177*> the procedure DGESVJ did not converge in the given number of
178*> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
179*> \endverbatim
180*>
181*> \param[in] MV
182*> \verbatim
183*> MV is INTEGER
184*> If JOBV = 'A', then the product of Jacobi rotations in DGESVJ
185*> is applied to the first MV rows of V. See the description of JOBV.
186*> \endverbatim
187*>
188*> \param[in,out] V
189*> \verbatim
190*> V is DOUBLE PRECISION array, dimension (LDV,N)
191*> If JOBV = 'V', then V contains on exit the N-by-N matrix of
192*> the right singular vectors;
193*> If JOBV = 'A', then V contains the product of the computed right
194*> singular vector matrix and the initial matrix in
195*> the array V.
196*> If JOBV = 'N', then V is not referenced.
197*> \endverbatim
198*>
199*> \param[in] LDV
200*> \verbatim
201*> LDV is INTEGER
202*> The leading dimension of the array V, LDV >= 1.
203*> If JOBV = 'V', then LDV >= max(1,N).
204*> If JOBV = 'A', then LDV >= max(1,MV) .
205*> \endverbatim
206*>
207*> \param[in,out] WORK
208*> \verbatim
209*> WORK is DOUBLE PRECISION array, dimension (MAX(1,LWORK))
210*> On entry :
211*> If JOBU = 'C' :
212*> WORK(1) = CTOL, where CTOL defines the threshold for convergence.
213*> The process stops if all columns of A are mutually
214*> orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
215*> It is required that CTOL >= ONE, i.e. it is not
216*> allowed to force the routine to obtain orthogonality
217*> below EPS.
218*> On exit :
219*> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
220*> are the computed singular values of A.
221*> (See description of SVA().)
222*> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
223*> singular values.
224*> WORK(3) = NINT(WORK(3)) is the number of the computed singular
225*> values that are larger than the underflow threshold.
226*> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
227*> rotations needed for numerical convergence.
228*> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
229*> This is useful information in cases when DGESVJ did
230*> not converge, as it can be used to estimate whether
231*> the output is still useful and for post festum analysis.
232*> WORK(6) = the largest absolute value over all sines of the
233*> Jacobi rotation angles in the last sweep. It can be
234*> useful for a post festum analysis.
235*> \endverbatim
236*>
237*> \param[in] LWORK
238*> \verbatim
239*> LWORK is INTEGER
240*> The length of the array WORK.
241*> LWORK >= 1, if MIN(M,N) = 0, and LWORK >= MAX(6,M+N), otherwise.
242*>
243*> If on entry LWORK = -1, then a workspace query is assumed and
244*> no computation is done; WORK(1) is set to the minial (and optimal)
245*> length of WORK.
246*> \endverbatim
247*>
248*> \param[out] INFO
249*> \verbatim
250*> INFO is INTEGER
251*> = 0: successful exit.
252*> < 0: if INFO = -i, then the i-th argument had an illegal value
253*> > 0: DGESVJ did not converge in the maximal allowed number (30)
254*> of sweeps. The output may still be useful. See the
255*> description of WORK.
256*> \endverbatim
257*
258* Authors:
259* ========
260*
261*> \author Univ. of Tennessee
262*> \author Univ. of California Berkeley
263*> \author Univ. of Colorado Denver
264*> \author NAG Ltd.
265*
266*> \ingroup gesvj
267*
268*> \par Further Details:
269* =====================
270*>
271*> \verbatim
272*>
273*> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
274*> rotations. The rotations are implemented as fast scaled rotations of
275*> Anda and Park [1]. In the case of underflow of the Jacobi angle, a
276*> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
277*> column interchanges of de Rijk [2]. The relative accuracy of the computed
278*> singular values and the accuracy of the computed singular vectors (in
279*> angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
280*> The condition number that determines the accuracy in the full rank case
281*> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
282*> spectral condition number. The best performance of this Jacobi SVD
283*> procedure is achieved if used in an accelerated version of Drmac and
284*> Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
285*> Some tuning parameters (marked with [TP]) are available for the
286*> implementer.
287*> The computational range for the nonzero singular values is the machine
288*> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
289*> denormalized singular values can be computed with the corresponding
290*> gradual loss of accurate digits.
291*> \endverbatim
292*
293*> \par Contributors:
294* ==================
295*>
296*> \verbatim
297*>
298*> ============
299*>
300*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
301*> \endverbatim
302*
303*> \par References:
304* ================
305*>
306*> \verbatim
307*>
308*> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
309*> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
310*> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
311*> singular value decomposition on a vector computer.
312*> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
313*> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
314*> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
315*> value computation in floating point arithmetic.
316*> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
317*> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
318*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
319*> LAPACK Working note 169.
320*> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
321*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
322*> LAPACK Working note 170.
323*> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
324*> QSVD, (H,K)-SVD computations.
325*> Department of Mathematics, University of Zagreb, 2008.
326*> \endverbatim
327*
328*> \par Bugs, examples and comments:
329* =================================
330*>
331*> \verbatim
332*> ===========================
333*> Please report all bugs and send interesting test examples and comments to
334*> drmac@math.hr. Thank you.
335*> \endverbatim
336*>
337* =====================================================================
338 SUBROUTINE dgesvj( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
339 $ LDV, WORK, LWORK, INFO )
340*
341* -- LAPACK computational routine --
342* -- LAPACK is a software package provided by Univ. of Tennessee, --
343* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
344*
345* .. Scalar Arguments ..
346 INTEGER INFO, LDA, LDV, LWORK, M, MV, N
347 CHARACTER*1 JOBA, JOBU, JOBV
348* ..
349* .. Array Arguments ..
350 DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
351 $ work( lwork )
352* ..
353*
354* =====================================================================
355*
356* .. Local Parameters ..
357 DOUBLE PRECISION ZERO, HALF, ONE
358 parameter( zero = 0.0d0, half = 0.5d0, one = 1.0d0)
359 INTEGER NSWEEP
360 parameter( nsweep = 30 )
361* ..
362* .. Local Scalars ..
363 DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
364 $ bigtheta, cs, ctol, epsln, large, mxaapq,
365 $ mxsinj, rootbig, rooteps, rootsfmin, roottol,
366 $ skl, sfmin, small, sn, t, temp1, theta,
367 $ thsign, tol
368 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
369 $ iswrot, jbc, jgl, kbl, lkahead, mvl, n2, n34,
370 $ n4, nbl, notrot, p, pskipped, q, rowskip,
371 $ swband, minmn, lwmin
372 LOGICAL APPLV, GOSCALE, LOWER, LQUERY, LSVEC, NOSCALE,
373 $ rotok, rsvec, uctol, upper
374* ..
375* .. Local Arrays ..
376 DOUBLE PRECISION FASTR( 5 )
377* ..
378* .. Intrinsic Functions ..
379 INTRINSIC dabs, max, min, dble, dsign, dsqrt
380* ..
381* .. External Functions ..
382* ..
383* from BLAS
384 DOUBLE PRECISION DDOT, DNRM2
385 EXTERNAL ddot, dnrm2
386 INTEGER IDAMAX
387 EXTERNAL idamax
388* from LAPACK
389 DOUBLE PRECISION DLAMCH
390 EXTERNAL dlamch
391 LOGICAL LSAME
392 EXTERNAL lsame
393* ..
394* .. External Subroutines ..
395* ..
396* from BLAS
397 EXTERNAL daxpy, dcopy, drotm, dscal, dswap
398* from LAPACK
399 EXTERNAL dlascl, dlaset, dlassq, xerbla
400*
401 EXTERNAL dgsvj0, dgsvj1
402* ..
403* .. Executable Statements ..
404*
405* Test the input arguments
406*
407 lsvec = lsame( jobu, 'U' )
408 uctol = lsame( jobu, 'C' )
409 rsvec = lsame( jobv, 'V' )
410 applv = lsame( jobv, 'A' )
411 upper = lsame( joba, 'U' )
412 lower = lsame( joba, 'L' )
413*
414 minmn = min( m, n )
415 IF( minmn.EQ.0 ) THEN
416 lwmin = 1
417 ELSE
418 lwmin = max( 6, m+n )
419 END IF
420*
421 lquery = ( lwork.EQ.-1 )
422 IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
423 info = -1
424 ELSE IF( .NOT.( lsvec .OR.
425 $ uctol .OR.
426 $ lsame( jobu, 'N' ) ) ) THEN
427 info = -2
428 ELSE IF( .NOT.( rsvec .OR.
429 $ applv .OR.
430 $ lsame( jobv, 'N' ) ) ) THEN
431 info = -3
432 ELSE IF( m.LT.0 ) THEN
433 info = -4
434 ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
435 info = -5
436 ELSE IF( lda.LT.m ) THEN
437 info = -7
438 ELSE IF( mv.LT.0 ) THEN
439 info = -9
440 ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
441 $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
442 info = -11
443 ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
444 info = -12
445 ELSE IF( lwork.LT.lwmin .AND. ( .NOT.lquery ) ) THEN
446 info = -13
447 ELSE
448 info = 0
449 END IF
450*
451* #:(
452 IF( info.NE.0 ) THEN
453 CALL xerbla( 'DGESVJ', -info )
454 RETURN
455 ELSE IF( lquery ) THEN
456 work( 1 ) = lwmin
457 RETURN
458 END IF
459*
460* #:) Quick return for void matrix
461*
462 IF( minmn.EQ.0 ) RETURN
463*
464* Set numerical parameters
465* The stopping criterion for Jacobi rotations is
466*
467* max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
468*
469* where EPS is the round-off and CTOL is defined as follows:
470*
471 IF( uctol ) THEN
472* ... user controlled
473 ctol = work( 1 )
474 ELSE
475* ... default
476 IF( lsvec .OR. rsvec .OR. applv ) THEN
477 ctol = dsqrt( dble( m ) )
478 ELSE
479 ctol = dble( m )
480 END IF
481 END IF
482* ... and the machine dependent parameters are
483*[!] (Make sure that DLAMCH() works properly on the target machine.)
484*
485 epsln = dlamch( 'Epsilon' )
486 rooteps = dsqrt( epsln )
487 sfmin = dlamch( 'SafeMinimum' )
488 rootsfmin = dsqrt( sfmin )
489 small = sfmin / epsln
490 big = dlamch( 'Overflow' )
491* BIG = ONE / SFMIN
492 rootbig = one / rootsfmin
493 large = big / dsqrt( dble( m*n ) )
494 bigtheta = one / rooteps
495*
496 tol = ctol*epsln
497 roottol = dsqrt( tol )
498*
499 IF( dble( m )*epsln.GE.one ) THEN
500 info = -4
501 CALL xerbla( 'DGESVJ', -info )
502 RETURN
503 END IF
504*
505* Initialize the right singular vector matrix.
506*
507 IF( rsvec ) THEN
508 mvl = n
509 CALL dlaset( 'A', mvl, n, zero, one, v, ldv )
510 ELSE IF( applv ) THEN
511 mvl = mv
512 END IF
513 rsvec = rsvec .OR. applv
514*
515* Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
516*(!) If necessary, scale A to protect the largest singular value
517* from overflow. It is possible that saving the largest singular
518* value destroys the information about the small ones.
519* This initial scaling is almost minimal in the sense that the
520* goal is to make sure that no column norm overflows, and that
521* DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
522* in A are detected, the procedure returns with INFO=-6.
523*
524 skl= one / dsqrt( dble( m )*dble( n ) )
525 noscale = .true.
526 goscale = .true.
527*
528 IF( lower ) THEN
529* the input matrix is M-by-N lower triangular (trapezoidal)
530 DO 1874 p = 1, n
531 aapp = zero
532 aaqq = one
533 CALL dlassq( m-p+1, a( p, p ), 1, aapp, aaqq )
534 IF( aapp.GT.big ) THEN
535 info = -6
536 CALL xerbla( 'DGESVJ', -info )
537 RETURN
538 END IF
539 aaqq = dsqrt( aaqq )
540 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
541 sva( p ) = aapp*aaqq
542 ELSE
543 noscale = .false.
544 sva( p ) = aapp*( aaqq*skl)
545 IF( goscale ) THEN
546 goscale = .false.
547 DO 1873 q = 1, p - 1
548 sva( q ) = sva( q )*skl
549 1873 CONTINUE
550 END IF
551 END IF
552 1874 CONTINUE
553 ELSE IF( upper ) THEN
554* the input matrix is M-by-N upper triangular (trapezoidal)
555 DO 2874 p = 1, n
556 aapp = zero
557 aaqq = one
558 CALL dlassq( p, a( 1, p ), 1, aapp, aaqq )
559 IF( aapp.GT.big ) THEN
560 info = -6
561 CALL xerbla( 'DGESVJ', -info )
562 RETURN
563 END IF
564 aaqq = dsqrt( aaqq )
565 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
566 sva( p ) = aapp*aaqq
567 ELSE
568 noscale = .false.
569 sva( p ) = aapp*( aaqq*skl)
570 IF( goscale ) THEN
571 goscale = .false.
572 DO 2873 q = 1, p - 1
573 sva( q ) = sva( q )*skl
574 2873 CONTINUE
575 END IF
576 END IF
577 2874 CONTINUE
578 ELSE
579* the input matrix is M-by-N general dense
580 DO 3874 p = 1, n
581 aapp = zero
582 aaqq = one
583 CALL dlassq( m, a( 1, p ), 1, aapp, aaqq )
584 IF( aapp.GT.big ) THEN
585 info = -6
586 CALL xerbla( 'DGESVJ', -info )
587 RETURN
588 END IF
589 aaqq = dsqrt( aaqq )
590 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
591 sva( p ) = aapp*aaqq
592 ELSE
593 noscale = .false.
594 sva( p ) = aapp*( aaqq*skl)
595 IF( goscale ) THEN
596 goscale = .false.
597 DO 3873 q = 1, p - 1
598 sva( q ) = sva( q )*skl
599 3873 CONTINUE
600 END IF
601 END IF
602 3874 CONTINUE
603 END IF
604*
605 IF( noscale )skl= one
606*
607* Move the smaller part of the spectrum from the underflow threshold
608*(!) Start by determining the position of the nonzero entries of the
609* array SVA() relative to ( SFMIN, BIG ).
610*
611 aapp = zero
612 aaqq = big
613 DO 4781 p = 1, n
614 IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
615 aapp = max( aapp, sva( p ) )
616 4781 CONTINUE
617*
618* #:) Quick return for zero matrix
619*
620 IF( aapp.EQ.zero ) THEN
621 IF( lsvec )CALL dlaset( 'G', m, n, zero, one, a, lda )
622 work( 1 ) = one
623 work( 2 ) = zero
624 work( 3 ) = zero
625 work( 4 ) = zero
626 work( 5 ) = zero
627 work( 6 ) = zero
628 RETURN
629 END IF
630*
631* #:) Quick return for one-column matrix
632*
633 IF( n.EQ.1 ) THEN
634 IF( lsvec )CALL dlascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
635 $ a( 1, 1 ), lda, ierr )
636 work( 1 ) = one / skl
637 IF( sva( 1 ).GE.sfmin ) THEN
638 work( 2 ) = one
639 ELSE
640 work( 2 ) = zero
641 END IF
642 work( 3 ) = zero
643 work( 4 ) = zero
644 work( 5 ) = zero
645 work( 6 ) = zero
646 RETURN
647 END IF
648*
649* Protect small singular values from underflow, and try to
650* avoid underflows/overflows in computing Jacobi rotations.
651*
652 sn = dsqrt( sfmin / epsln )
653 temp1 = dsqrt( big / dble( n ) )
654 IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
655 $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
656 temp1 = min( big, temp1 / aapp )
657* AAQQ = AAQQ*TEMP1
658* AAPP = AAPP*TEMP1
659 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
660 temp1 = min( sn / aaqq, big / ( aapp*dsqrt( dble( n ) ) ) )
661* AAQQ = AAQQ*TEMP1
662* AAPP = AAPP*TEMP1
663 ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
664 temp1 = max( sn / aaqq, temp1 / aapp )
665* AAQQ = AAQQ*TEMP1
666* AAPP = AAPP*TEMP1
667 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
668 temp1 = min( sn / aaqq, big / ( dsqrt( dble( n ) )*aapp ) )
669* AAQQ = AAQQ*TEMP1
670* AAPP = AAPP*TEMP1
671 ELSE
672 temp1 = one
673 END IF
674*
675* Scale, if necessary
676*
677 IF( temp1.NE.one ) THEN
678 CALL dlascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
679 END IF
680 skl= temp1*skl
681 IF( skl.NE.one ) THEN
682 CALL dlascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
683 skl= one / skl
684 END IF
685*
686* Row-cyclic Jacobi SVD algorithm with column pivoting
687*
688 emptsw = ( n*( n-1 ) ) / 2
689 notrot = 0
690 fastr( 1 ) = zero
691*
692* A is represented in factored form A = A * diag(WORK), where diag(WORK)
693* is initialized to identity. WORK is updated during fast scaled
694* rotations.
695*
696 DO 1868 q = 1, n
697 work( q ) = one
698 1868 CONTINUE
699*
700*
701 swband = 3
702*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
703* if DGESVJ is used as a computational routine in the preconditioned
704* Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
705* works on pivots inside a band-like region around the diagonal.
706* The boundaries are determined dynamically, based on the number of
707* pivots above a threshold.
708*
709 kbl = min( 8, n )
710*[TP] KBL is a tuning parameter that defines the tile size in the
711* tiling of the p-q loops of pivot pairs. In general, an optimal
712* value of KBL depends on the matrix dimensions and on the
713* parameters of the computer's memory.
714*
715 nbl = n / kbl
716 IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
717*
718 blskip = kbl**2
719*[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
720*
721 rowskip = min( 5, kbl )
722*[TP] ROWSKIP is a tuning parameter.
723*
724 lkahead = 1
725*[TP] LKAHEAD is a tuning parameter.
726*
727* Quasi block transformations, using the lower (upper) triangular
728* structure of the input matrix. The quasi-block-cycling usually
729* invokes cubic convergence. Big part of this cycle is done inside
730* canonical subspaces of dimensions less than M.
731*
732 IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
733*[TP] The number of partition levels and the actual partition are
734* tuning parameters.
735 n4 = n / 4
736 n2 = n / 2
737 n34 = 3*n4
738 IF( applv ) THEN
739 q = 0
740 ELSE
741 q = 1
742 END IF
743*
744 IF( lower ) THEN
745*
746* This works very well on lower triangular matrices, in particular
747* in the framework of the preconditioned Jacobi SVD (xGEJSV).
748* The idea is simple:
749* [+ 0 0 0] Note that Jacobi transformations of [0 0]
750* [+ + 0 0] [0 0]
751* [+ + x 0] actually work on [x 0] [x 0]
752* [+ + x x] [x x]. [x x]
753*
754 CALL dgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
755 $ work( n34+1 ), sva( n34+1 ), mvl,
756 $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
757 $ 2, work( n+1 ), lwork-n, ierr )
758*
759 CALL dgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
760 $ work( n2+1 ), sva( n2+1 ), mvl,
761 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
762 $ work( n+1 ), lwork-n, ierr )
763*
764 CALL dgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
765 $ work( n2+1 ), sva( n2+1 ), mvl,
766 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
767 $ work( n+1 ), lwork-n, ierr )
768*
769 CALL dgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
770 $ work( n4+1 ), sva( n4+1 ), mvl,
771 $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
772 $ work( n+1 ), lwork-n, ierr )
773*
774 CALL dgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
775 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
776 $ ierr )
777*
778 CALL dgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
779 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
780 $ lwork-n, ierr )
781*
782*
783 ELSE IF( upper ) THEN
784*
785*
786 CALL dgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v,
787 $ ldv,
788 $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
789 $ ierr )
790*
791 CALL dgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda,
792 $ work( n4+1 ),
793 $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
794 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
795 $ ierr )
796*
797 CALL dgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
798 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
799 $ lwork-n, ierr )
800*
801 CALL dgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
802 $ work( n2+1 ), sva( n2+1 ), mvl,
803 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
804 $ work( n+1 ), lwork-n, ierr )
805
806 END IF
807*
808 END IF
809*
810* .. Row-cyclic pivot strategy with de Rijk's pivoting ..
811*
812 DO 1993 i = 1, nsweep
813*
814* .. go go go ...
815*
816 mxaapq = zero
817 mxsinj = zero
818 iswrot = 0
819*
820 notrot = 0
821 pskipped = 0
822*
823* Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
824* 1 <= p < q <= N. This is the first step toward a blocked implementation
825* of the rotations. New implementation, based on block transformations,
826* is under development.
827*
828 DO 2000 ibr = 1, nbl
829*
830 igl = ( ibr-1 )*kbl + 1
831*
832 DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
833*
834 igl = igl + ir1*kbl
835*
836 DO 2001 p = igl, min( igl+kbl-1, n-1 )
837*
838* .. de Rijk's pivoting
839*
840 q = idamax( n-p+1, sva( p ), 1 ) + p - 1
841 IF( p.NE.q ) THEN
842 CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
843 IF( rsvec )CALL dswap( mvl, v( 1, p ), 1,
844 $ v( 1, q ), 1 )
845 temp1 = sva( p )
846 sva( p ) = sva( q )
847 sva( q ) = temp1
848 temp1 = work( p )
849 work( p ) = work( q )
850 work( q ) = temp1
851 END IF
852*
853 IF( ir1.EQ.0 ) THEN
854*
855* Column norms are periodically updated by explicit
856* norm computation.
857* Caveat:
858* Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
859* as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
860* overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
861* underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
862* Hence, DNRM2 cannot be trusted, not even in the case when
863* the true norm is far from the under(over)flow boundaries.
864* If properly implemented DNRM2 is available, the IF-THEN-ELSE
865* below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
866*
867 IF( ( sva( p ).LT.rootbig ) .AND.
868 $ ( sva( p ).GT.rootsfmin ) ) THEN
869 sva( p ) = dnrm2( m, a( 1, p ), 1 )*work( p )
870 ELSE
871 temp1 = zero
872 aapp = one
873 CALL dlassq( m, a( 1, p ), 1, temp1, aapp )
874 sva( p ) = temp1*dsqrt( aapp )*work( p )
875 END IF
876 aapp = sva( p )
877 ELSE
878 aapp = sva( p )
879 END IF
880*
881 IF( aapp.GT.zero ) THEN
882*
883 pskipped = 0
884*
885 DO 2002 q = p + 1, min( igl+kbl-1, n )
886*
887 aaqq = sva( q )
888*
889 IF( aaqq.GT.zero ) THEN
890*
891 aapp0 = aapp
892 IF( aaqq.GE.one ) THEN
893 rotok = ( small*aapp ).LE.aaqq
894 IF( aapp.LT.( big / aaqq ) ) THEN
895 aapq = ( ddot( m, a( 1, p ), 1,
896 $ a( 1,
897 $ q ), 1 )*work( p )*work( q ) /
898 $ aaqq ) / aapp
899 ELSE
900 CALL dcopy( m, a( 1, p ), 1,
901 $ work( n+1 ), 1 )
902 CALL dlascl( 'G', 0, 0, aapp,
903 $ work( p ), m, 1,
904 $ work( n+1 ), lda, ierr )
905 aapq = ddot( m, work( n+1 ), 1,
906 $ a( 1, q ), 1 )*work( q ) / aaqq
907 END IF
908 ELSE
909 rotok = aapp.LE.( aaqq / small )
910 IF( aapp.GT.( small / aaqq ) ) THEN
911 aapq = ( ddot( m, a( 1, p ), 1,
912 $ a( 1,
913 $ q ), 1 )*work( p )*work( q ) /
914 $ aaqq ) / aapp
915 ELSE
916 CALL dcopy( m, a( 1, q ), 1,
917 $ work( n+1 ), 1 )
918 CALL dlascl( 'G', 0, 0, aaqq,
919 $ work( q ), m, 1,
920 $ work( n+1 ), lda, ierr )
921 aapq = ddot( m, work( n+1 ), 1,
922 $ a( 1, p ), 1 )*work( p ) / aapp
923 END IF
924 END IF
925*
926 mxaapq = max( mxaapq, dabs( aapq ) )
927*
928* TO rotate or NOT to rotate, THAT is the question ...
929*
930 IF( dabs( aapq ).GT.tol ) THEN
931*
932* .. rotate
933*[RTD] ROTATED = ROTATED + ONE
934*
935 IF( ir1.EQ.0 ) THEN
936 notrot = 0
937 pskipped = 0
938 iswrot = iswrot + 1
939 END IF
940*
941 IF( rotok ) THEN
942*
943 aqoap = aaqq / aapp
944 apoaq = aapp / aaqq
945 theta = -half*dabs(aqoap-apoaq)/aapq
946*
947 IF( dabs( theta ).GT.bigtheta ) THEN
948*
949 t = half / theta
950 fastr( 3 ) = t*work( p ) / work( q )
951 fastr( 4 ) = -t*work( q ) /
952 $ work( p )
953 CALL drotm( m, a( 1, p ), 1,
954 $ a( 1, q ), 1, fastr )
955 IF( rsvec )CALL drotm( mvl,
956 $ v( 1, p ), 1,
957 $ v( 1, q ), 1,
958 $ fastr )
959 sva( q ) = aaqq*dsqrt( max( zero,
960 $ one+t*apoaq*aapq ) )
961 aapp = aapp*dsqrt( max( zero,
962 $ one-t*aqoap*aapq ) )
963 mxsinj = max( mxsinj, dabs( t ) )
964*
965 ELSE
966*
967* .. choose correct signum for THETA and rotate
968*
969 thsign = -dsign( one, aapq )
970 t = one / ( theta+thsign*
971 $ dsqrt( one+theta*theta ) )
972 cs = dsqrt( one / ( one+t*t ) )
973 sn = t*cs
974*
975 mxsinj = max( mxsinj, dabs( sn ) )
976 sva( q ) = aaqq*dsqrt( max( zero,
977 $ one+t*apoaq*aapq ) )
978 aapp = aapp*dsqrt( max( zero,
979 $ one-t*aqoap*aapq ) )
980*
981 apoaq = work( p ) / work( q )
982 aqoap = work( q ) / work( p )
983 IF( work( p ).GE.one ) THEN
984 IF( work( q ).GE.one ) THEN
985 fastr( 3 ) = t*apoaq
986 fastr( 4 ) = -t*aqoap
987 work( p ) = work( p )*cs
988 work( q ) = work( q )*cs
989 CALL drotm( m, a( 1, p ),
990 $ 1,
991 $ a( 1, q ), 1,
992 $ fastr )
993 IF( rsvec )CALL drotm( mvl,
994 $ v( 1, p ), 1, v( 1, q ),
995 $ 1, fastr )
996 ELSE
997 CALL daxpy( m, -t*aqoap,
998 $ a( 1, q ), 1,
999 $ a( 1, p ), 1 )
1000 CALL daxpy( m, cs*sn*apoaq,
1001 $ a( 1, p ), 1,
1002 $ a( 1, q ), 1 )
1003 work( p ) = work( p )*cs
1004 work( q ) = work( q ) / cs
1005 IF( rsvec ) THEN
1006 CALL daxpy( mvl,
1007 $ -t*aqoap,
1008 $ v( 1, q ), 1,
1009 $ v( 1, p ), 1 )
1010 CALL daxpy( mvl,
1011 $ cs*sn*apoaq,
1012 $ v( 1, p ), 1,
1013 $ v( 1, q ), 1 )
1014 END IF
1015 END IF
1016 ELSE
1017 IF( work( q ).GE.one ) THEN
1018 CALL daxpy( m, t*apoaq,
1019 $ a( 1, p ), 1,
1020 $ a( 1, q ), 1 )
1021 CALL daxpy( m,
1022 $ -cs*sn*aqoap,
1023 $ a( 1, q ), 1,
1024 $ a( 1, p ), 1 )
1025 work( p ) = work( p ) / cs
1026 work( q ) = work( q )*cs
1027 IF( rsvec ) THEN
1028 CALL daxpy( mvl,
1029 $ t*apoaq,
1030 $ v( 1, p ), 1,
1031 $ v( 1, q ), 1 )
1032 CALL daxpy( mvl,
1033 $ -cs*sn*aqoap,
1034 $ v( 1, q ), 1,
1035 $ v( 1, p ), 1 )
1036 END IF
1037 ELSE
1038 IF( work( p ).GE.work( q ) )
1039 $ THEN
1040 CALL daxpy( m, -t*aqoap,
1041 $ a( 1, q ), 1,
1042 $ a( 1, p ), 1 )
1043 CALL daxpy( m,
1044 $ cs*sn*apoaq,
1045 $ a( 1, p ), 1,
1046 $ a( 1, q ), 1 )
1047 work( p ) = work( p )*cs
1048 work( q ) = work( q ) / cs
1049 IF( rsvec ) THEN
1050 CALL daxpy( mvl,
1051 $ -t*aqoap,
1052 $ v( 1, q ), 1,
1053 $ v( 1, p ), 1 )
1054 CALL daxpy( mvl,
1055 $ cs*sn*apoaq,
1056 $ v( 1, p ), 1,
1057 $ v( 1, q ), 1 )
1058 END IF
1059 ELSE
1060 CALL daxpy( m, t*apoaq,
1061 $ a( 1, p ), 1,
1062 $ a( 1, q ), 1 )
1063 CALL daxpy( m,
1064 $ -cs*sn*aqoap,
1065 $ a( 1, q ), 1,
1066 $ a( 1, p ), 1 )
1067 work( p ) = work( p ) / cs
1068 work( q ) = work( q )*cs
1069 IF( rsvec ) THEN
1070 CALL daxpy( mvl,
1071 $ t*apoaq, v( 1, p ),
1072 $ 1, v( 1, q ), 1 )
1073 CALL daxpy( mvl,
1074 $ -cs*sn*aqoap,
1075 $ v( 1, q ), 1,
1076 $ v( 1, p ), 1 )
1077 END IF
1078 END IF
1079 END IF
1080 END IF
1081 END IF
1082*
1083 ELSE
1084* .. have to use modified Gram-Schmidt like transformation
1085 CALL dcopy( m, a( 1, p ), 1,
1086 $ work( n+1 ), 1 )
1087 CALL dlascl( 'G', 0, 0, aapp, one,
1088 $ m,
1089 $ 1, work( n+1 ), lda,
1090 $ ierr )
1091 CALL dlascl( 'G', 0, 0, aaqq, one,
1092 $ m,
1093 $ 1, a( 1, q ), lda, ierr )
1094 temp1 = -aapq*work( p ) / work( q )
1095 CALL daxpy( m, temp1, work( n+1 ),
1096 $ 1,
1097 $ a( 1, q ), 1 )
1098 CALL dlascl( 'G', 0, 0, one, aaqq,
1099 $ m,
1100 $ 1, a( 1, q ), lda, ierr )
1101 sva( q ) = aaqq*dsqrt( max( zero,
1102 $ one-aapq*aapq ) )
1103 mxsinj = max( mxsinj, sfmin )
1104 END IF
1105* END IF ROTOK THEN ... ELSE
1106*
1107* In the case of cancellation in updating SVA(q), SVA(p)
1108* recompute SVA(q), SVA(p).
1109*
1110 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1111 $ THEN
1112 IF( ( aaqq.LT.rootbig ) .AND.
1113 $ ( aaqq.GT.rootsfmin ) ) THEN
1114 sva( q ) = dnrm2( m, a( 1, q ),
1115 $ 1 )*
1116 $ work( q )
1117 ELSE
1118 t = zero
1119 aaqq = one
1120 CALL dlassq( m, a( 1, q ), 1, t,
1121 $ aaqq )
1122 sva( q ) = t*dsqrt( aaqq )*work( q )
1123 END IF
1124 END IF
1125 IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1126 IF( ( aapp.LT.rootbig ) .AND.
1127 $ ( aapp.GT.rootsfmin ) ) THEN
1128 aapp = dnrm2( m, a( 1, p ), 1 )*
1129 $ work( p )
1130 ELSE
1131 t = zero
1132 aapp = one
1133 CALL dlassq( m, a( 1, p ), 1, t,
1134 $ aapp )
1135 aapp = t*dsqrt( aapp )*work( p )
1136 END IF
1137 sva( p ) = aapp
1138 END IF
1139*
1140 ELSE
1141* A(:,p) and A(:,q) already numerically orthogonal
1142 IF( ir1.EQ.0 )notrot = notrot + 1
1143*[RTD] SKIPPED = SKIPPED + 1
1144 pskipped = pskipped + 1
1145 END IF
1146 ELSE
1147* A(:,q) is zero column
1148 IF( ir1.EQ.0 )notrot = notrot + 1
1149 pskipped = pskipped + 1
1150 END IF
1151*
1152 IF( ( i.LE.swband ) .AND.
1153 $ ( pskipped.GT.rowskip ) ) THEN
1154 IF( ir1.EQ.0 )aapp = -aapp
1155 notrot = 0
1156 GO TO 2103
1157 END IF
1158*
1159 2002 CONTINUE
1160* END q-LOOP
1161*
1162 2103 CONTINUE
1163* bailed out of q-loop
1164*
1165 sva( p ) = aapp
1166*
1167 ELSE
1168 sva( p ) = aapp
1169 IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1170 $ notrot = notrot + min( igl+kbl-1, n ) - p
1171 END IF
1172*
1173 2001 CONTINUE
1174* end of the p-loop
1175* end of doing the block ( ibr, ibr )
1176 1002 CONTINUE
1177* end of ir1-loop
1178*
1179* ... go to the off diagonal blocks
1180*
1181 igl = ( ibr-1 )*kbl + 1
1182*
1183 DO 2010 jbc = ibr + 1, nbl
1184*
1185 jgl = ( jbc-1 )*kbl + 1
1186*
1187* doing the block at ( ibr, jbc )
1188*
1189 ijblsk = 0
1190 DO 2100 p = igl, min( igl+kbl-1, n )
1191*
1192 aapp = sva( p )
1193 IF( aapp.GT.zero ) THEN
1194*
1195 pskipped = 0
1196*
1197 DO 2200 q = jgl, min( jgl+kbl-1, n )
1198*
1199 aaqq = sva( q )
1200 IF( aaqq.GT.zero ) THEN
1201 aapp0 = aapp
1202*
1203* .. M x 2 Jacobi SVD ..
1204*
1205* Safe Gram matrix computation
1206*
1207 IF( aaqq.GE.one ) THEN
1208 IF( aapp.GE.aaqq ) THEN
1209 rotok = ( small*aapp ).LE.aaqq
1210 ELSE
1211 rotok = ( small*aaqq ).LE.aapp
1212 END IF
1213 IF( aapp.LT.( big / aaqq ) ) THEN
1214 aapq = ( ddot( m, a( 1, p ), 1,
1215 $ a( 1,
1216 $ q ), 1 )*work( p )*work( q ) /
1217 $ aaqq ) / aapp
1218 ELSE
1219 CALL dcopy( m, a( 1, p ), 1,
1220 $ work( n+1 ), 1 )
1221 CALL dlascl( 'G', 0, 0, aapp,
1222 $ work( p ), m, 1,
1223 $ work( n+1 ), lda, ierr )
1224 aapq = ddot( m, work( n+1 ), 1,
1225 $ a( 1, q ), 1 )*work( q ) / aaqq
1226 END IF
1227 ELSE
1228 IF( aapp.GE.aaqq ) THEN
1229 rotok = aapp.LE.( aaqq / small )
1230 ELSE
1231 rotok = aaqq.LE.( aapp / small )
1232 END IF
1233 IF( aapp.GT.( small / aaqq ) ) THEN
1234 aapq = ( ddot( m, a( 1, p ), 1,
1235 $ a( 1,
1236 $ q ), 1 )*work( p )*work( q ) /
1237 $ aaqq ) / aapp
1238 ELSE
1239 CALL dcopy( m, a( 1, q ), 1,
1240 $ work( n+1 ), 1 )
1241 CALL dlascl( 'G', 0, 0, aaqq,
1242 $ work( q ), m, 1,
1243 $ work( n+1 ), lda, ierr )
1244 aapq = ddot( m, work( n+1 ), 1,
1245 $ a( 1, p ), 1 )*work( p ) / aapp
1246 END IF
1247 END IF
1248*
1249 mxaapq = max( mxaapq, dabs( aapq ) )
1250*
1251* TO rotate or NOT to rotate, THAT is the question ...
1252*
1253 IF( dabs( aapq ).GT.tol ) THEN
1254 notrot = 0
1255*[RTD] ROTATED = ROTATED + 1
1256 pskipped = 0
1257 iswrot = iswrot + 1
1258*
1259 IF( rotok ) THEN
1260*
1261 aqoap = aaqq / aapp
1262 apoaq = aapp / aaqq
1263 theta = -half*dabs(aqoap-apoaq)/aapq
1264 IF( aaqq.GT.aapp0 )theta = -theta
1265*
1266 IF( dabs( theta ).GT.bigtheta ) THEN
1267 t = half / theta
1268 fastr( 3 ) = t*work( p ) / work( q )
1269 fastr( 4 ) = -t*work( q ) /
1270 $ work( p )
1271 CALL drotm( m, a( 1, p ), 1,
1272 $ a( 1, q ), 1, fastr )
1273 IF( rsvec )CALL drotm( mvl,
1274 $ v( 1, p ), 1,
1275 $ v( 1, q ), 1,
1276 $ fastr )
1277 sva( q ) = aaqq*dsqrt( max( zero,
1278 $ one+t*apoaq*aapq ) )
1279 aapp = aapp*dsqrt( max( zero,
1280 $ one-t*aqoap*aapq ) )
1281 mxsinj = max( mxsinj, dabs( t ) )
1282 ELSE
1283*
1284* .. choose correct signum for THETA and rotate
1285*
1286 thsign = -dsign( one, aapq )
1287 IF( aaqq.GT.aapp0 )thsign = -thsign
1288 t = one / ( theta+thsign*
1289 $ dsqrt( one+theta*theta ) )
1290 cs = dsqrt( one / ( one+t*t ) )
1291 sn = t*cs
1292 mxsinj = max( mxsinj, dabs( sn ) )
1293 sva( q ) = aaqq*dsqrt( max( zero,
1294 $ one+t*apoaq*aapq ) )
1295 aapp = aapp*dsqrt( max( zero,
1296 $ one-t*aqoap*aapq ) )
1297*
1298 apoaq = work( p ) / work( q )
1299 aqoap = work( q ) / work( p )
1300 IF( work( p ).GE.one ) THEN
1301*
1302 IF( work( q ).GE.one ) THEN
1303 fastr( 3 ) = t*apoaq
1304 fastr( 4 ) = -t*aqoap
1305 work( p ) = work( p )*cs
1306 work( q ) = work( q )*cs
1307 CALL drotm( m, a( 1, p ),
1308 $ 1,
1309 $ a( 1, q ), 1,
1310 $ fastr )
1311 IF( rsvec )CALL drotm( mvl,
1312 $ v( 1, p ), 1, v( 1, q ),
1313 $ 1, fastr )
1314 ELSE
1315 CALL daxpy( m, -t*aqoap,
1316 $ a( 1, q ), 1,
1317 $ a( 1, p ), 1 )
1318 CALL daxpy( m, cs*sn*apoaq,
1319 $ a( 1, p ), 1,
1320 $ a( 1, q ), 1 )
1321 IF( rsvec ) THEN
1322 CALL daxpy( mvl,
1323 $ -t*aqoap,
1324 $ v( 1, q ), 1,
1325 $ v( 1, p ), 1 )
1326 CALL daxpy( mvl,
1327 $ cs*sn*apoaq,
1328 $ v( 1, p ), 1,
1329 $ v( 1, q ), 1 )
1330 END IF
1331 work( p ) = work( p )*cs
1332 work( q ) = work( q ) / cs
1333 END IF
1334 ELSE
1335 IF( work( q ).GE.one ) THEN
1336 CALL daxpy( m, t*apoaq,
1337 $ a( 1, p ), 1,
1338 $ a( 1, q ), 1 )
1339 CALL daxpy( m,
1340 $ -cs*sn*aqoap,
1341 $ a( 1, q ), 1,
1342 $ a( 1, p ), 1 )
1343 IF( rsvec ) THEN
1344 CALL daxpy( mvl,
1345 $ t*apoaq,
1346 $ v( 1, p ), 1,
1347 $ v( 1, q ), 1 )
1348 CALL daxpy( mvl,
1349 $ -cs*sn*aqoap,
1350 $ v( 1, q ), 1,
1351 $ v( 1, p ), 1 )
1352 END IF
1353 work( p ) = work( p ) / cs
1354 work( q ) = work( q )*cs
1355 ELSE
1356 IF( work( p ).GE.work( q ) )
1357 $ THEN
1358 CALL daxpy( m, -t*aqoap,
1359 $ a( 1, q ), 1,
1360 $ a( 1, p ), 1 )
1361 CALL daxpy( m,
1362 $ cs*sn*apoaq,
1363 $ a( 1, p ), 1,
1364 $ a( 1, q ), 1 )
1365 work( p ) = work( p )*cs
1366 work( q ) = work( q ) / cs
1367 IF( rsvec ) THEN
1368 CALL daxpy( mvl,
1369 $ -t*aqoap,
1370 $ v( 1, q ), 1,
1371 $ v( 1, p ), 1 )
1372 CALL daxpy( mvl,
1373 $ cs*sn*apoaq,
1374 $ v( 1, p ), 1,
1375 $ v( 1, q ), 1 )
1376 END IF
1377 ELSE
1378 CALL daxpy( m, t*apoaq,
1379 $ a( 1, p ), 1,
1380 $ a( 1, q ), 1 )
1381 CALL daxpy( m,
1382 $ -cs*sn*aqoap,
1383 $ a( 1, q ), 1,
1384 $ a( 1, p ), 1 )
1385 work( p ) = work( p ) / cs
1386 work( q ) = work( q )*cs
1387 IF( rsvec ) THEN
1388 CALL daxpy( mvl,
1389 $ t*apoaq, v( 1, p ),
1390 $ 1, v( 1, q ), 1 )
1391 CALL daxpy( mvl,
1392 $ -cs*sn*aqoap,
1393 $ v( 1, q ), 1,
1394 $ v( 1, p ), 1 )
1395 END IF
1396 END IF
1397 END IF
1398 END IF
1399 END IF
1400*
1401 ELSE
1402 IF( aapp.GT.aaqq ) THEN
1403 CALL dcopy( m, a( 1, p ), 1,
1404 $ work( n+1 ), 1 )
1405 CALL dlascl( 'G', 0, 0, aapp,
1406 $ one,
1407 $ m, 1, work( n+1 ), lda,
1408 $ ierr )
1409 CALL dlascl( 'G', 0, 0, aaqq,
1410 $ one,
1411 $ m, 1, a( 1, q ), lda,
1412 $ ierr )
1413 temp1 = -aapq*work( p ) / work( q )
1414 CALL daxpy( m, temp1,
1415 $ work( n+1 ),
1416 $ 1, a( 1, q ), 1 )
1417 CALL dlascl( 'G', 0, 0, one,
1418 $ aaqq,
1419 $ m, 1, a( 1, q ), lda,
1420 $ ierr )
1421 sva( q ) = aaqq*dsqrt( max( zero,
1422 $ one-aapq*aapq ) )
1423 mxsinj = max( mxsinj, sfmin )
1424 ELSE
1425 CALL dcopy( m, a( 1, q ), 1,
1426 $ work( n+1 ), 1 )
1427 CALL dlascl( 'G', 0, 0, aaqq,
1428 $ one,
1429 $ m, 1, work( n+1 ), lda,
1430 $ ierr )
1431 CALL dlascl( 'G', 0, 0, aapp,
1432 $ one,
1433 $ m, 1, a( 1, p ), lda,
1434 $ ierr )
1435 temp1 = -aapq*work( q ) / work( p )
1436 CALL daxpy( m, temp1,
1437 $ work( n+1 ),
1438 $ 1, a( 1, p ), 1 )
1439 CALL dlascl( 'G', 0, 0, one,
1440 $ aapp,
1441 $ m, 1, a( 1, p ), lda,
1442 $ ierr )
1443 sva( p ) = aapp*dsqrt( max( zero,
1444 $ one-aapq*aapq ) )
1445 mxsinj = max( mxsinj, sfmin )
1446 END IF
1447 END IF
1448* END IF ROTOK THEN ... ELSE
1449*
1450* In the case of cancellation in updating SVA(q)
1451* .. recompute SVA(q)
1452 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1453 $ THEN
1454 IF( ( aaqq.LT.rootbig ) .AND.
1455 $ ( aaqq.GT.rootsfmin ) ) THEN
1456 sva( q ) = dnrm2( m, a( 1, q ),
1457 $ 1 )*
1458 $ work( q )
1459 ELSE
1460 t = zero
1461 aaqq = one
1462 CALL dlassq( m, a( 1, q ), 1, t,
1463 $ aaqq )
1464 sva( q ) = t*dsqrt( aaqq )*work( q )
1465 END IF
1466 END IF
1467 IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1468 IF( ( aapp.LT.rootbig ) .AND.
1469 $ ( aapp.GT.rootsfmin ) ) THEN
1470 aapp = dnrm2( m, a( 1, p ), 1 )*
1471 $ work( p )
1472 ELSE
1473 t = zero
1474 aapp = one
1475 CALL dlassq( m, a( 1, p ), 1, t,
1476 $ aapp )
1477 aapp = t*dsqrt( aapp )*work( p )
1478 END IF
1479 sva( p ) = aapp
1480 END IF
1481* end of OK rotation
1482 ELSE
1483 notrot = notrot + 1
1484*[RTD] SKIPPED = SKIPPED + 1
1485 pskipped = pskipped + 1
1486 ijblsk = ijblsk + 1
1487 END IF
1488 ELSE
1489 notrot = notrot + 1
1490 pskipped = pskipped + 1
1491 ijblsk = ijblsk + 1
1492 END IF
1493*
1494 IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1495 $ THEN
1496 sva( p ) = aapp
1497 notrot = 0
1498 GO TO 2011
1499 END IF
1500 IF( ( i.LE.swband ) .AND.
1501 $ ( pskipped.GT.rowskip ) ) THEN
1502 aapp = -aapp
1503 notrot = 0
1504 GO TO 2203
1505 END IF
1506*
1507 2200 CONTINUE
1508* end of the q-loop
1509 2203 CONTINUE
1510*
1511 sva( p ) = aapp
1512*
1513 ELSE
1514*
1515 IF( aapp.EQ.zero )notrot = notrot +
1516 $ min( jgl+kbl-1, n ) - jgl + 1
1517 IF( aapp.LT.zero )notrot = 0
1518*
1519 END IF
1520*
1521 2100 CONTINUE
1522* end of the p-loop
1523 2010 CONTINUE
1524* end of the jbc-loop
1525 2011 CONTINUE
1526*2011 bailed out of the jbc-loop
1527 DO 2012 p = igl, min( igl+kbl-1, n )
1528 sva( p ) = dabs( sva( p ) )
1529 2012 CONTINUE
1530***
1531 2000 CONTINUE
1532*2000 :: end of the ibr-loop
1533*
1534* .. update SVA(N)
1535 IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1536 $ THEN
1537 sva( n ) = dnrm2( m, a( 1, n ), 1 )*work( n )
1538 ELSE
1539 t = zero
1540 aapp = one
1541 CALL dlassq( m, a( 1, n ), 1, t, aapp )
1542 sva( n ) = t*dsqrt( aapp )*work( n )
1543 END IF
1544*
1545* Additional steering devices
1546*
1547 IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1548 $ ( iswrot.LE.n ) ) )swband = i
1549*
1550 IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.dsqrt( dble( n ) )*
1551 $ tol ) .AND. ( dble( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1552 GO TO 1994
1553 END IF
1554*
1555 IF( notrot.GE.emptsw )GO TO 1994
1556*
1557 1993 CONTINUE
1558* end i=1:NSWEEP loop
1559*
1560* #:( Reaching this point means that the procedure has not converged.
1561 info = nsweep - 1
1562 GO TO 1995
1563*
1564 1994 CONTINUE
1565* #:) Reaching this point means numerical convergence after the i-th
1566* sweep.
1567*
1568 info = 0
1569* #:) INFO = 0 confirms successful iterations.
1570 1995 CONTINUE
1571*
1572* Sort the singular values and find how many are above
1573* the underflow threshold.
1574*
1575 n2 = 0
1576 n4 = 0
1577 DO 5991 p = 1, n - 1
1578 q = idamax( n-p+1, sva( p ), 1 ) + p - 1
1579 IF( p.NE.q ) THEN
1580 temp1 = sva( p )
1581 sva( p ) = sva( q )
1582 sva( q ) = temp1
1583 temp1 = work( p )
1584 work( p ) = work( q )
1585 work( q ) = temp1
1586 CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1587 IF( rsvec )CALL dswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1588 END IF
1589 IF( sva( p ).NE.zero ) THEN
1590 n4 = n4 + 1
1591 IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1592 END IF
1593 5991 CONTINUE
1594 IF( sva( n ).NE.zero ) THEN
1595 n4 = n4 + 1
1596 IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1597 END IF
1598*
1599* Normalize the left singular vectors.
1600*
1601 IF( lsvec .OR. uctol ) THEN
1602 DO 1998 p = 1, n2
1603 CALL dscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1604 1998 CONTINUE
1605 END IF
1606*
1607* Scale the product of Jacobi rotations (assemble the fast rotations).
1608*
1609 IF( rsvec ) THEN
1610 IF( applv ) THEN
1611 DO 2398 p = 1, n
1612 CALL dscal( mvl, work( p ), v( 1, p ), 1 )
1613 2398 CONTINUE
1614 ELSE
1615 DO 2399 p = 1, n
1616 temp1 = one / dnrm2( mvl, v( 1, p ), 1 )
1617 CALL dscal( mvl, temp1, v( 1, p ), 1 )
1618 2399 CONTINUE
1619 END IF
1620 END IF
1621*
1622* Undo scaling, if necessary (and possible).
1623 IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl) ) )
1624 $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1625 $ ( sfmin / skl) ) ) ) THEN
1626 DO 2400 p = 1, n
1627 sva( p ) = skl*sva( p )
1628 2400 CONTINUE
1629 skl= one
1630 END IF
1631*
1632 work( 1 ) = skl
1633* The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1634* then some of the singular values may overflow or underflow and
1635* the spectrum is given in this factored representation.
1636*
1637 work( 2 ) = dble( n4 )
1638* N4 is the number of computed nonzero singular values of A.
1639*
1640 work( 3 ) = dble( n2 )
1641* N2 is the number of singular values of A greater than SFMIN.
1642* If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1643* that may carry some information.
1644*
1645 work( 4 ) = dble( i )
1646* i is the index of the last sweep before declaring convergence.
1647*
1648 work( 5 ) = mxaapq
1649* MXAAPQ is the largest absolute value of scaled pivots in the
1650* last sweep
1651*
1652 work( 6 ) = mxsinj
1653* MXSINJ is the largest absolute value of the sines of Jacobi angles
1654* in the last sweep
1655*
1656 RETURN
1657* ..
1658* .. END OF DGESVJ
1659* ..
1660 END
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine daxpy(n, da, dx, incx, dy, incy)
DAXPY
Definition daxpy.f:89
subroutine dcopy(n, dx, incx, dy, incy)
DCOPY
Definition dcopy.f:82
subroutine dgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, work, lwork, info)
DGESVJ
Definition dgesvj.f:340
subroutine dgsvj0(jobv, m, n, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
DGSVJ0 pre-processor for the routine dgesvj.
Definition dgsvj0.f:216
subroutine dgsvj1(jobv, m, n, n1, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
DGSVJ1 pre-processor for the routine dgesvj, applies Jacobi rotations targeting only particular pivot...
Definition dgsvj1.f:234
subroutine dlascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition dlascl.f:142
subroutine dlaset(uplo, m, n, alpha, beta, a, lda)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition dlaset.f:108
subroutine dlassq(n, x, incx, scale, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition dlassq.f90:122
subroutine drotm(n, dx, incx, dy, incy, dparam)
DROTM
Definition drotm.f:96
subroutine dscal(n, da, dx, incx)
DSCAL
Definition dscal.f:79
subroutine dswap(n, dx, incx, dy, incy)
DSWAP
Definition dswap.f:82