LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches
sgesvj.f
Go to the documentation of this file.
1*> \brief \b SGESVJ
2*
3* =========== DOCUMENTATION ===========
4*
5* Online html documentation available at
6* http://www.netlib.org/lapack/explore-html/
7*
8*> \htmlonly
9*> Download SGESVJ + dependencies
10*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/sgesvj.f">
11*> [TGZ]</a>
12*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/sgesvj.f">
13*> [ZIP]</a>
14*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/sgesvj.f">
15*> [TXT]</a>
16*> \endhtmlonly
17*
18* Definition:
19* ===========
20*
21* SUBROUTINE SGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
22* LDV, WORK, LWORK, INFO )
23*
24* .. Scalar Arguments ..
25* INTEGER INFO, LDA, LDV, LWORK, M, MV, N
26* CHARACTER*1 JOBA, JOBU, JOBV
27* ..
28* .. Array Arguments ..
29* REAL A( LDA, * ), SVA( N ), V( LDV, * ),
30* $ WORK( LWORK )
31* ..
32*
33*
34*> \par Purpose:
35* =============
36*>
37*> \verbatim
38*>
39*> SGESVJ computes the singular value decomposition (SVD) of a real
40*> M-by-N matrix A, where M >= N. The SVD of A is written as
41*> [++] [xx] [x0] [xx]
42*> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
43*> [++] [xx]
44*> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
45*> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
46*> of SIGMA are the singular values of A. The columns of U and V are the
47*> left and the right singular vectors of A, respectively.
48*> SGESVJ can sometimes compute tiny singular values and their singular vectors much
49*> more accurately than other SVD routines, see below under Further Details.
50*> \endverbatim
51*
52* Arguments:
53* ==========
54*
55*> \param[in] JOBA
56*> \verbatim
57*> JOBA is CHARACTER*1
58*> Specifies the structure of A.
59*> = 'L': The input matrix A is lower triangular;
60*> = 'U': The input matrix A is upper triangular;
61*> = 'G': The input matrix A is general M-by-N matrix, M >= N.
62*> \endverbatim
63*>
64*> \param[in] JOBU
65*> \verbatim
66*> JOBU is CHARACTER*1
67*> Specifies whether to compute the left singular vectors
68*> (columns of U):
69*> = 'U': The left singular vectors corresponding to the nonzero
70*> singular values are computed and returned in the leading
71*> columns of A. See more details in the description of A.
72*> The default numerical orthogonality threshold is set to
73*> approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E').
74*> = 'C': Analogous to JOBU='U', except that user can control the
75*> level of numerical orthogonality of the computed left
76*> singular vectors. TOL can be set to TOL = CTOL*EPS, where
77*> CTOL is given on input in the array WORK.
78*> No CTOL smaller than ONE is allowed. CTOL greater
79*> than 1 / EPS is meaningless. The option 'C'
80*> can be used if M*EPS is satisfactory orthogonality
81*> of the computed left singular vectors, so CTOL=M could
82*> save few sweeps of Jacobi rotations.
83*> See the descriptions of A and WORK(1).
84*> = 'N': The matrix U is not computed. However, see the
85*> description of A.
86*> \endverbatim
87*>
88*> \param[in] JOBV
89*> \verbatim
90*> JOBV is CHARACTER*1
91*> Specifies whether to compute the right singular vectors, that
92*> is, the matrix V:
93*> = 'V': the matrix V is computed and returned in the array V
94*> = 'A': the Jacobi rotations are applied to the MV-by-N
95*> array V. In other words, the right singular vector
96*> matrix V is not computed explicitly; instead it is
97*> applied to an MV-by-N matrix initially stored in the
98*> first MV rows of V.
99*> = 'N': the matrix V is not computed and the array V is not
100*> referenced
101*> \endverbatim
102*>
103*> \param[in] M
104*> \verbatim
105*> M is INTEGER
106*> The number of rows of the input matrix A. 1/SLAMCH('E') > M >= 0.
107*> \endverbatim
108*>
109*> \param[in] N
110*> \verbatim
111*> N is INTEGER
112*> The number of columns of the input matrix A.
113*> M >= N >= 0.
114*> \endverbatim
115*>
116*> \param[in,out] A
117*> \verbatim
118*> A is REAL array, dimension (LDA,N)
119*> On entry, the M-by-N matrix A.
120*> On exit,
121*> If JOBU = 'U' .OR. JOBU = 'C':
122*> If INFO = 0:
123*> RANKA orthonormal columns of U are returned in the
124*> leading RANKA columns of the array A. Here RANKA <= N
125*> is the number of computed singular values of A that are
126*> above the underflow threshold SLAMCH('S'). The singular
127*> vectors corresponding to underflowed or zero singular
128*> values are not computed. The value of RANKA is returned
129*> in the array WORK as RANKA=NINT(WORK(2)). Also see the
130*> descriptions of SVA and WORK. The computed columns of U
131*> are mutually numerically orthogonal up to approximately
132*> TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'),
133*> see the description of JOBU.
134*> If INFO > 0,
135*> the procedure SGESVJ did not converge in the given number
136*> of iterations (sweeps). In that case, the computed
137*> columns of U may not be orthogonal up to TOL. The output
138*> U (stored in A), SIGMA (given by the computed singular
139*> values in SVA(1:N)) and V is still a decomposition of the
140*> input matrix A in the sense that the residual
141*> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
142*> If JOBU = 'N':
143*> If INFO = 0:
144*> Note that the left singular vectors are 'for free' in the
145*> one-sided Jacobi SVD algorithm. However, if only the
146*> singular values are needed, the level of numerical
147*> orthogonality of U is not an issue and iterations are
148*> stopped when the columns of the iterated matrix are
149*> numerically orthogonal up to approximately M*EPS. Thus,
150*> on exit, A contains the columns of U scaled with the
151*> corresponding singular values.
152*> If INFO > 0:
153*> the procedure SGESVJ did not converge in the given number
154*> of iterations (sweeps).
155*> \endverbatim
156*>
157*> \param[in] LDA
158*> \verbatim
159*> LDA is INTEGER
160*> The leading dimension of the array A. LDA >= max(1,M).
161*> \endverbatim
162*>
163*> \param[out] SVA
164*> \verbatim
165*> SVA is REAL array, dimension (N)
166*> On exit,
167*> If INFO = 0 :
168*> depending on the value SCALE = WORK(1), we have:
169*> If SCALE = ONE:
170*> SVA(1:N) contains the computed singular values of A.
171*> During the computation SVA contains the Euclidean column
172*> norms of the iterated matrices in the array A.
173*> If SCALE .NE. ONE:
174*> The singular values of A are SCALE*SVA(1:N), and this
175*> factored representation is due to the fact that some of the
176*> singular values of A might underflow or overflow.
177*>
178*> If INFO > 0 :
179*> the procedure SGESVJ did not converge in the given number of
180*> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
181*> \endverbatim
182*>
183*> \param[in] MV
184*> \verbatim
185*> MV is INTEGER
186*> If JOBV = 'A', then the product of Jacobi rotations in SGESVJ
187*> is applied to the first MV rows of V. See the description of JOBV.
188*> \endverbatim
189*>
190*> \param[in,out] V
191*> \verbatim
192*> V is REAL array, dimension (LDV,N)
193*> If JOBV = 'V', then V contains on exit the N-by-N matrix of
194*> the right singular vectors;
195*> If JOBV = 'A', then V contains the product of the computed right
196*> singular vector matrix and the initial matrix in
197*> the array V.
198*> If JOBV = 'N', then V is not referenced.
199*> \endverbatim
200*>
201*> \param[in] LDV
202*> \verbatim
203*> LDV is INTEGER
204*> The leading dimension of the array V, LDV >= 1.
205*> If JOBV = 'V', then LDV >= max(1,N).
206*> If JOBV = 'A', then LDV >= max(1,MV) .
207*> \endverbatim
208*>
209*> \param[in,out] WORK
210*> \verbatim
211*> WORK is REAL array, dimension (LWORK)
212*> On entry,
213*> If JOBU = 'C' :
214*> WORK(1) = CTOL, where CTOL defines the threshold for convergence.
215*> The process stops if all columns of A are mutually
216*> orthogonal up to CTOL*EPS, EPS=SLAMCH('E').
217*> It is required that CTOL >= ONE, i.e. it is not
218*> allowed to force the routine to obtain orthogonality
219*> below EPSILON.
220*> On exit,
221*> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
222*> are the computed singular vcalues of A.
223*> (See description of SVA().)
224*> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
225*> singular values.
226*> WORK(3) = NINT(WORK(3)) is the number of the computed singular
227*> values that are larger than the underflow threshold.
228*> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
229*> rotations needed for numerical convergence.
230*> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
231*> This is useful information in cases when SGESVJ did
232*> not converge, as it can be used to estimate whether
233*> the output is still useful and for post festum analysis.
234*> WORK(6) = the largest absolute value over all sines of the
235*> Jacobi rotation angles in the last sweep. It can be
236*> useful for a post festum analysis.
237*> \endverbatim
238*>
239*> \param[in] LWORK
240*> \verbatim
241*> LWORK is INTEGER
242*> length of WORK, WORK >= MAX(6,M+N)
243*> \endverbatim
244*>
245*> \param[out] INFO
246*> \verbatim
247*> INFO is INTEGER
248*> = 0: successful exit.
249*> < 0: if INFO = -i, then the i-th argument had an illegal value
250*> > 0: SGESVJ did not converge in the maximal allowed number (30)
251*> of sweeps. The output may still be useful. See the
252*> description of WORK.
253*> \endverbatim
254*
255* Authors:
256* ========
257*
258*> \author Univ. of Tennessee
259*> \author Univ. of California Berkeley
260*> \author Univ. of Colorado Denver
261*> \author NAG Ltd.
262*
263*> \ingroup gesvj
264*
265*> \par Further Details:
266* =====================
267*>
268*> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
269*> rotations. The rotations are implemented as fast scaled rotations of
270*> Anda and Park [1]. In the case of underflow of the Jacobi angle, a
271*> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
272*> column interchanges of de Rijk [2]. The relative accuracy of the computed
273*> singular values and the accuracy of the computed singular vectors (in
274*> angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
275*> The condition number that determines the accuracy in the full rank case
276*> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
277*> spectral condition number. The best performance of this Jacobi SVD
278*> procedure is achieved if used in an accelerated version of Drmac and
279*> Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
280*> Some tuning parameters (marked with [TP]) are available for the
281*> implementer. \n
282*> The computational range for the nonzero singular values is the machine
283*> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
284*> denormalized singular values can be computed with the corresponding
285*> gradual loss of accurate digits.
286*>
287*> \par Contributors:
288* ==================
289*>
290*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
291*>
292*> \par References:
293* ================
294*>
295*> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling. \n
296*> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174. \n\n
297*> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
298*> singular value decomposition on a vector computer. \n
299*> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371. \n\n
300*> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR. \n
301*> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
302*> value computation in floating point arithmetic. \n
303*> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222. \n\n
304*> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. \n
305*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. \n
306*> LAPACK Working note 169. \n\n
307*> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. \n
308*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. \n
309*> LAPACK Working note 170. \n\n
310*> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
311*> QSVD, (H,K)-SVD computations.\n
312*> Department of Mathematics, University of Zagreb, 2008.
313*>
314*> \par Bugs, Examples and Comments:
315* =================================
316*>
317*> Please report all bugs and send interesting test examples and comments to
318*> drmac@math.hr. Thank you.
319*
320* =====================================================================
321 SUBROUTINE sgesvj( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
322 $ LDV, WORK, LWORK, INFO )
323*
324* -- LAPACK computational routine --
325* -- LAPACK is a software package provided by Univ. of Tennessee, --
326* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
327*
328* .. Scalar Arguments ..
329 INTEGER INFO, LDA, LDV, LWORK, M, MV, N
330 CHARACTER*1 JOBA, JOBU, JOBV
331* ..
332* .. Array Arguments ..
333 REAL A( LDA, * ), SVA( N ), V( LDV, * ),
334 $ work( lwork )
335* ..
336*
337* =====================================================================
338*
339* .. Local Parameters ..
340 REAL ZERO, HALF, ONE
341 parameter( zero = 0.0e0, half = 0.5e0, one = 1.0e0)
342 INTEGER NSWEEP
343 parameter( nsweep = 30 )
344* ..
345* .. Local Scalars ..
346 REAL AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
347 $ bigtheta, cs, ctol, epsln, large, mxaapq,
348 $ mxsinj, rootbig, rooteps, rootsfmin, roottol,
349 $ skl, sfmin, small, sn, t, temp1, theta,
350 $ thsign, tol
351 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
352 $ iswrot, jbc, jgl, kbl, lkahead, mvl, n2, n34,
353 $ n4, nbl, notrot, p, pskipped, q, rowskip,
354 $ swband
355 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
356 $ rsvec, uctol, upper
357* ..
358* .. Local Arrays ..
359 REAL FASTR( 5 )
360* ..
361* .. Intrinsic Functions ..
362 INTRINSIC abs, max, min, float, sign, sqrt
363* ..
364* .. External Functions ..
365* ..
366* from BLAS
367 REAL SDOT, SNRM2
368 EXTERNAL sdot, snrm2
369 INTEGER ISAMAX
370 EXTERNAL isamax
371* from LAPACK
372 REAL SLAMCH
373 EXTERNAL slamch
374 LOGICAL LSAME
375 EXTERNAL lsame
376* ..
377* .. External Subroutines ..
378* ..
379* from BLAS
380 EXTERNAL saxpy, scopy, srotm, sscal, sswap
381* from LAPACK
382 EXTERNAL slascl, slaset, slassq, xerbla
383*
384 EXTERNAL sgsvj0, sgsvj1
385* ..
386* .. Executable Statements ..
387*
388* Test the input arguments
389*
390 lsvec = lsame( jobu, 'U' )
391 uctol = lsame( jobu, 'C' )
392 rsvec = lsame( jobv, 'V' )
393 applv = lsame( jobv, 'A' )
394 upper = lsame( joba, 'U' )
395 lower = lsame( joba, 'L' )
396*
397 IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
398 info = -1
399 ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
400 info = -2
401 ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
402 info = -3
403 ELSE IF( m.LT.0 ) THEN
404 info = -4
405 ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
406 info = -5
407 ELSE IF( lda.LT.m ) THEN
408 info = -7
409 ELSE IF( mv.LT.0 ) THEN
410 info = -9
411 ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
412 $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
413 info = -11
414 ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
415 info = -12
416 ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
417 info = -13
418 ELSE
419 info = 0
420 END IF
421*
422* #:(
423 IF( info.NE.0 ) THEN
424 CALL xerbla( 'SGESVJ', -info )
425 RETURN
426 END IF
427*
428* #:) Quick return for void matrix
429*
430 IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
431*
432* Set numerical parameters
433* The stopping criterion for Jacobi rotations is
434*
435* max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
436*
437* where EPS is the round-off and CTOL is defined as follows:
438*
439 IF( uctol ) THEN
440* ... user controlled
441 ctol = work( 1 )
442 ELSE
443* ... default
444 IF( lsvec .OR. rsvec .OR. applv ) THEN
445 ctol = sqrt( float( m ) )
446 ELSE
447 ctol = float( m )
448 END IF
449 END IF
450* ... and the machine dependent parameters are
451*[!] (Make sure that SLAMCH() works properly on the target machine.)
452*
453 epsln = slamch( 'Epsilon' )
454 rooteps = sqrt( epsln )
455 sfmin = slamch( 'SafeMinimum' )
456 rootsfmin = sqrt( sfmin )
457 small = sfmin / epsln
458 big = slamch( 'Overflow' )
459* BIG = ONE / SFMIN
460 rootbig = one / rootsfmin
461 large = big / sqrt( float( m*n ) )
462 bigtheta = one / rooteps
463*
464 tol = ctol*epsln
465 roottol = sqrt( tol )
466*
467 IF( float( m )*epsln.GE.one ) THEN
468 info = -4
469 CALL xerbla( 'SGESVJ', -info )
470 RETURN
471 END IF
472*
473* Initialize the right singular vector matrix.
474*
475 IF( rsvec ) THEN
476 mvl = n
477 CALL slaset( 'A', mvl, n, zero, one, v, ldv )
478 ELSE IF( applv ) THEN
479 mvl = mv
480 END IF
481 rsvec = rsvec .OR. applv
482*
483* Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
484*(!) If necessary, scale A to protect the largest singular value
485* from overflow. It is possible that saving the largest singular
486* value destroys the information about the small ones.
487* This initial scaling is almost minimal in the sense that the
488* goal is to make sure that no column norm overflows, and that
489* SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
490* in A are detected, the procedure returns with INFO=-6.
491*
492 skl = one / sqrt( float( m )*float( n ) )
493 noscale = .true.
494 goscale = .true.
495*
496 IF( lower ) THEN
497* the input matrix is M-by-N lower triangular (trapezoidal)
498 DO 1874 p = 1, n
499 aapp = zero
500 aaqq = one
501 CALL slassq( m-p+1, a( p, p ), 1, aapp, aaqq )
502 IF( aapp.GT.big ) THEN
503 info = -6
504 CALL xerbla( 'SGESVJ', -info )
505 RETURN
506 END IF
507 aaqq = sqrt( aaqq )
508 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
509 sva( p ) = aapp*aaqq
510 ELSE
511 noscale = .false.
512 sva( p ) = aapp*( aaqq*skl )
513 IF( goscale ) THEN
514 goscale = .false.
515 DO 1873 q = 1, p - 1
516 sva( q ) = sva( q )*skl
517 1873 CONTINUE
518 END IF
519 END IF
520 1874 CONTINUE
521 ELSE IF( upper ) THEN
522* the input matrix is M-by-N upper triangular (trapezoidal)
523 DO 2874 p = 1, n
524 aapp = zero
525 aaqq = one
526 CALL slassq( p, a( 1, p ), 1, aapp, aaqq )
527 IF( aapp.GT.big ) THEN
528 info = -6
529 CALL xerbla( 'SGESVJ', -info )
530 RETURN
531 END IF
532 aaqq = sqrt( aaqq )
533 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
534 sva( p ) = aapp*aaqq
535 ELSE
536 noscale = .false.
537 sva( p ) = aapp*( aaqq*skl )
538 IF( goscale ) THEN
539 goscale = .false.
540 DO 2873 q = 1, p - 1
541 sva( q ) = sva( q )*skl
542 2873 CONTINUE
543 END IF
544 END IF
545 2874 CONTINUE
546 ELSE
547* the input matrix is M-by-N general dense
548 DO 3874 p = 1, n
549 aapp = zero
550 aaqq = one
551 CALL slassq( m, a( 1, p ), 1, aapp, aaqq )
552 IF( aapp.GT.big ) THEN
553 info = -6
554 CALL xerbla( 'SGESVJ', -info )
555 RETURN
556 END IF
557 aaqq = sqrt( aaqq )
558 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
559 sva( p ) = aapp*aaqq
560 ELSE
561 noscale = .false.
562 sva( p ) = aapp*( aaqq*skl )
563 IF( goscale ) THEN
564 goscale = .false.
565 DO 3873 q = 1, p - 1
566 sva( q ) = sva( q )*skl
567 3873 CONTINUE
568 END IF
569 END IF
570 3874 CONTINUE
571 END IF
572*
573 IF( noscale )skl = one
574*
575* Move the smaller part of the spectrum from the underflow threshold
576*(!) Start by determining the position of the nonzero entries of the
577* array SVA() relative to ( SFMIN, BIG ).
578*
579 aapp = zero
580 aaqq = big
581 DO 4781 p = 1, n
582 IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
583 aapp = max( aapp, sva( p ) )
584 4781 CONTINUE
585*
586* #:) Quick return for zero matrix
587*
588 IF( aapp.EQ.zero ) THEN
589 IF( lsvec )CALL slaset( 'G', m, n, zero, one, a, lda )
590 work( 1 ) = one
591 work( 2 ) = zero
592 work( 3 ) = zero
593 work( 4 ) = zero
594 work( 5 ) = zero
595 work( 6 ) = zero
596 RETURN
597 END IF
598*
599* #:) Quick return for one-column matrix
600*
601 IF( n.EQ.1 ) THEN
602 IF( lsvec )CALL slascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
603 $ a( 1, 1 ), lda, ierr )
604 work( 1 ) = one / skl
605 IF( sva( 1 ).GE.sfmin ) THEN
606 work( 2 ) = one
607 ELSE
608 work( 2 ) = zero
609 END IF
610 work( 3 ) = zero
611 work( 4 ) = zero
612 work( 5 ) = zero
613 work( 6 ) = zero
614 RETURN
615 END IF
616*
617* Protect small singular values from underflow, and try to
618* avoid underflows/overflows in computing Jacobi rotations.
619*
620 sn = sqrt( sfmin / epsln )
621 temp1 = sqrt( big / float( n ) )
622 IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
623 $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
624 temp1 = min( big, temp1 / aapp )
625* AAQQ = AAQQ*TEMP1
626* AAPP = AAPP*TEMP1
627 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
628 temp1 = min( sn / aaqq, big / ( aapp*sqrt( float( n ) ) ) )
629* AAQQ = AAQQ*TEMP1
630* AAPP = AAPP*TEMP1
631 ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
632 temp1 = max( sn / aaqq, temp1 / aapp )
633* AAQQ = AAQQ*TEMP1
634* AAPP = AAPP*TEMP1
635 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
636 temp1 = min( sn / aaqq, big / ( sqrt( float( n ) )*aapp ) )
637* AAQQ = AAQQ*TEMP1
638* AAPP = AAPP*TEMP1
639 ELSE
640 temp1 = one
641 END IF
642*
643* Scale, if necessary
644*
645 IF( temp1.NE.one ) THEN
646 CALL slascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
647 END IF
648 skl = temp1*skl
649 IF( skl.NE.one ) THEN
650 CALL slascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
651 skl = one / skl
652 END IF
653*
654* Row-cyclic Jacobi SVD algorithm with column pivoting
655*
656 emptsw = ( n*( n-1 ) ) / 2
657 notrot = 0
658 fastr( 1 ) = zero
659*
660* A is represented in factored form A = A * diag(WORK), where diag(WORK)
661* is initialized to identity. WORK is updated during fast scaled
662* rotations.
663*
664 DO 1868 q = 1, n
665 work( q ) = one
666 1868 CONTINUE
667*
668*
669 swband = 3
670*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
671* if SGESVJ is used as a computational routine in the preconditioned
672* Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure
673* works on pivots inside a band-like region around the diagonal.
674* The boundaries are determined dynamically, based on the number of
675* pivots above a threshold.
676*
677 kbl = min( 8, n )
678*[TP] KBL is a tuning parameter that defines the tile size in the
679* tiling of the p-q loops of pivot pairs. In general, an optimal
680* value of KBL depends on the matrix dimensions and on the
681* parameters of the computer's memory.
682*
683 nbl = n / kbl
684 IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
685*
686 blskip = kbl**2
687*[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
688*
689 rowskip = min( 5, kbl )
690*[TP] ROWSKIP is a tuning parameter.
691*
692 lkahead = 1
693*[TP] LKAHEAD is a tuning parameter.
694*
695* Quasi block transformations, using the lower (upper) triangular
696* structure of the input matrix. The quasi-block-cycling usually
697* invokes cubic convergence. Big part of this cycle is done inside
698* canonical subspaces of dimensions less than M.
699*
700 IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
701*[TP] The number of partition levels and the actual partition are
702* tuning parameters.
703 n4 = n / 4
704 n2 = n / 2
705 n34 = 3*n4
706 IF( applv ) THEN
707 q = 0
708 ELSE
709 q = 1
710 END IF
711*
712 IF( lower ) THEN
713*
714* This works very well on lower triangular matrices, in particular
715* in the framework of the preconditioned Jacobi SVD (xGEJSV).
716* The idea is simple:
717* [+ 0 0 0] Note that Jacobi transformations of [0 0]
718* [+ + 0 0] [0 0]
719* [+ + x 0] actually work on [x 0] [x 0]
720* [+ + x x] [x x]. [x x]
721*
722 CALL sgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
723 $ work( n34+1 ), sva( n34+1 ), mvl,
724 $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
725 $ 2, work( n+1 ), lwork-n, ierr )
726*
727 CALL sgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
728 $ work( n2+1 ), sva( n2+1 ), mvl,
729 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
730 $ work( n+1 ), lwork-n, ierr )
731*
732 CALL sgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
733 $ work( n2+1 ), sva( n2+1 ), mvl,
734 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
735 $ work( n+1 ), lwork-n, ierr )
736*
737 CALL sgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
738 $ work( n4+1 ), sva( n4+1 ), mvl,
739 $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
740 $ work( n+1 ), lwork-n, ierr )
741*
742 CALL sgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
743 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
744 $ ierr )
745*
746 CALL sgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
747 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
748 $ lwork-n, ierr )
749*
750*
751 ELSE IF( upper ) THEN
752*
753*
754 CALL sgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
755 $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
756 $ ierr )
757*
758 CALL sgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
759 $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
760 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
761 $ ierr )
762*
763 CALL sgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
764 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
765 $ lwork-n, ierr )
766*
767 CALL sgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
768 $ work( n2+1 ), sva( n2+1 ), mvl,
769 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
770 $ work( n+1 ), lwork-n, ierr )
771
772 END IF
773*
774 END IF
775*
776* .. Row-cyclic pivot strategy with de Rijk's pivoting ..
777*
778 DO 1993 i = 1, nsweep
779*
780* .. go go go ...
781*
782 mxaapq = zero
783 mxsinj = zero
784 iswrot = 0
785*
786 notrot = 0
787 pskipped = 0
788*
789* Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
790* 1 <= p < q <= N. This is the first step toward a blocked implementation
791* of the rotations. New implementation, based on block transformations,
792* is under development.
793*
794 DO 2000 ibr = 1, nbl
795*
796 igl = ( ibr-1 )*kbl + 1
797*
798 DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
799*
800 igl = igl + ir1*kbl
801*
802 DO 2001 p = igl, min( igl+kbl-1, n-1 )
803*
804* .. de Rijk's pivoting
805*
806 q = isamax( n-p+1, sva( p ), 1 ) + p - 1
807 IF( p.NE.q ) THEN
808 CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
809 IF( rsvec )CALL sswap( mvl, v( 1, p ), 1,
810 $ v( 1, q ), 1 )
811 temp1 = sva( p )
812 sva( p ) = sva( q )
813 sva( q ) = temp1
814 temp1 = work( p )
815 work( p ) = work( q )
816 work( q ) = temp1
817 END IF
818*
819 IF( ir1.EQ.0 ) THEN
820*
821* Column norms are periodically updated by explicit
822* norm computation.
823* Caveat:
824* Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1)
825* as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
826* overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to
827* underflow for ||A(:,p)||_2 < SQRT(underflow_threshold).
828* Hence, SNRM2 cannot be trusted, not even in the case when
829* the true norm is far from the under(over)flow boundaries.
830* If properly implemented SNRM2 is available, the IF-THEN-ELSE
831* below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)".
832*
833 IF( ( sva( p ).LT.rootbig ) .AND.
834 $ ( sva( p ).GT.rootsfmin ) ) THEN
835 sva( p ) = snrm2( m, a( 1, p ), 1 )*work( p )
836 ELSE
837 temp1 = zero
838 aapp = one
839 CALL slassq( m, a( 1, p ), 1, temp1, aapp )
840 sva( p ) = temp1*sqrt( aapp )*work( p )
841 END IF
842 aapp = sva( p )
843 ELSE
844 aapp = sva( p )
845 END IF
846*
847 IF( aapp.GT.zero ) THEN
848*
849 pskipped = 0
850*
851 DO 2002 q = p + 1, min( igl+kbl-1, n )
852*
853 aaqq = sva( q )
854*
855 IF( aaqq.GT.zero ) THEN
856*
857 aapp0 = aapp
858 IF( aaqq.GE.one ) THEN
859 rotok = ( small*aapp ).LE.aaqq
860 IF( aapp.LT.( big / aaqq ) ) THEN
861 aapq = ( sdot( m, a( 1, p ), 1, a( 1,
862 $ q ), 1 )*work( p )*work( q ) /
863 $ aaqq ) / aapp
864 ELSE
865 CALL scopy( m, a( 1, p ), 1,
866 $ work( n+1 ), 1 )
867 CALL slascl( 'G', 0, 0, aapp,
868 $ work( p ), m, 1,
869 $ work( n+1 ), lda, ierr )
870 aapq = sdot( m, work( n+1 ), 1,
871 $ a( 1, q ), 1 )*work( q ) / aaqq
872 END IF
873 ELSE
874 rotok = aapp.LE.( aaqq / small )
875 IF( aapp.GT.( small / aaqq ) ) THEN
876 aapq = ( sdot( m, a( 1, p ), 1, a( 1,
877 $ q ), 1 )*work( p )*work( q ) /
878 $ aaqq ) / aapp
879 ELSE
880 CALL scopy( m, a( 1, q ), 1,
881 $ work( n+1 ), 1 )
882 CALL slascl( 'G', 0, 0, aaqq,
883 $ work( q ), m, 1,
884 $ work( n+1 ), lda, ierr )
885 aapq = sdot( m, work( n+1 ), 1,
886 $ a( 1, p ), 1 )*work( p ) / aapp
887 END IF
888 END IF
889*
890 mxaapq = max( mxaapq, abs( aapq ) )
891*
892* TO rotate or NOT to rotate, THAT is the question ...
893*
894 IF( abs( aapq ).GT.tol ) THEN
895*
896* .. rotate
897*[RTD] ROTATED = ROTATED + ONE
898*
899 IF( ir1.EQ.0 ) THEN
900 notrot = 0
901 pskipped = 0
902 iswrot = iswrot + 1
903 END IF
904*
905 IF( rotok ) THEN
906*
907 aqoap = aaqq / aapp
908 apoaq = aapp / aaqq
909 theta = -half*abs( aqoap-apoaq ) / aapq
910*
911 IF( abs( theta ).GT.bigtheta ) THEN
912*
913 t = half / theta
914 fastr( 3 ) = t*work( p ) / work( q )
915 fastr( 4 ) = -t*work( q ) /
916 $ work( p )
917 CALL srotm( m, a( 1, p ), 1,
918 $ a( 1, q ), 1, fastr )
919 IF( rsvec )CALL srotm( mvl,
920 $ v( 1, p ), 1,
921 $ v( 1, q ), 1,
922 $ fastr )
923 sva( q ) = aaqq*sqrt( max( zero,
924 $ one+t*apoaq*aapq ) )
925 aapp = aapp*sqrt( max( zero,
926 $ one-t*aqoap*aapq ) )
927 mxsinj = max( mxsinj, abs( t ) )
928*
929 ELSE
930*
931* .. choose correct signum for THETA and rotate
932*
933 thsign = -sign( one, aapq )
934 t = one / ( theta+thsign*
935 $ sqrt( one+theta*theta ) )
936 cs = sqrt( one / ( one+t*t ) )
937 sn = t*cs
938*
939 mxsinj = max( mxsinj, abs( sn ) )
940 sva( q ) = aaqq*sqrt( max( zero,
941 $ one+t*apoaq*aapq ) )
942 aapp = aapp*sqrt( max( zero,
943 $ one-t*aqoap*aapq ) )
944*
945 apoaq = work( p ) / work( q )
946 aqoap = work( q ) / work( p )
947 IF( work( p ).GE.one ) THEN
948 IF( work( q ).GE.one ) THEN
949 fastr( 3 ) = t*apoaq
950 fastr( 4 ) = -t*aqoap
951 work( p ) = work( p )*cs
952 work( q ) = work( q )*cs
953 CALL srotm( m, a( 1, p ), 1,
954 $ a( 1, q ), 1,
955 $ fastr )
956 IF( rsvec )CALL srotm( mvl,
957 $ v( 1, p ), 1, v( 1, q ),
958 $ 1, fastr )
959 ELSE
960 CALL saxpy( m, -t*aqoap,
961 $ a( 1, q ), 1,
962 $ a( 1, p ), 1 )
963 CALL saxpy( m, cs*sn*apoaq,
964 $ a( 1, p ), 1,
965 $ a( 1, q ), 1 )
966 work( p ) = work( p )*cs
967 work( q ) = work( q ) / cs
968 IF( rsvec ) THEN
969 CALL saxpy( mvl, -t*aqoap,
970 $ v( 1, q ), 1,
971 $ v( 1, p ), 1 )
972 CALL saxpy( mvl,
973 $ cs*sn*apoaq,
974 $ v( 1, p ), 1,
975 $ v( 1, q ), 1 )
976 END IF
977 END IF
978 ELSE
979 IF( work( q ).GE.one ) THEN
980 CALL saxpy( m, t*apoaq,
981 $ a( 1, p ), 1,
982 $ a( 1, q ), 1 )
983 CALL saxpy( m, -cs*sn*aqoap,
984 $ a( 1, q ), 1,
985 $ a( 1, p ), 1 )
986 work( p ) = work( p ) / cs
987 work( q ) = work( q )*cs
988 IF( rsvec ) THEN
989 CALL saxpy( mvl, t*apoaq,
990 $ v( 1, p ), 1,
991 $ v( 1, q ), 1 )
992 CALL saxpy( mvl,
993 $ -cs*sn*aqoap,
994 $ v( 1, q ), 1,
995 $ v( 1, p ), 1 )
996 END IF
997 ELSE
998 IF( work( p ).GE.work( q ) )
999 $ THEN
1000 CALL saxpy( m, -t*aqoap,
1001 $ a( 1, q ), 1,
1002 $ a( 1, p ), 1 )
1003 CALL saxpy( m, cs*sn*apoaq,
1004 $ a( 1, p ), 1,
1005 $ a( 1, q ), 1 )
1006 work( p ) = work( p )*cs
1007 work( q ) = work( q ) / cs
1008 IF( rsvec ) THEN
1009 CALL saxpy( mvl,
1010 $ -t*aqoap,
1011 $ v( 1, q ), 1,
1012 $ v( 1, p ), 1 )
1013 CALL saxpy( mvl,
1014 $ cs*sn*apoaq,
1015 $ v( 1, p ), 1,
1016 $ v( 1, q ), 1 )
1017 END IF
1018 ELSE
1019 CALL saxpy( m, t*apoaq,
1020 $ a( 1, p ), 1,
1021 $ a( 1, q ), 1 )
1022 CALL saxpy( m,
1023 $ -cs*sn*aqoap,
1024 $ a( 1, q ), 1,
1025 $ a( 1, p ), 1 )
1026 work( p ) = work( p ) / cs
1027 work( q ) = work( q )*cs
1028 IF( rsvec ) THEN
1029 CALL saxpy( mvl,
1030 $ t*apoaq, v( 1, p ),
1031 $ 1, v( 1, q ), 1 )
1032 CALL saxpy( mvl,
1033 $ -cs*sn*aqoap,
1034 $ v( 1, q ), 1,
1035 $ v( 1, p ), 1 )
1036 END IF
1037 END IF
1038 END IF
1039 END IF
1040 END IF
1041*
1042 ELSE
1043* .. have to use modified Gram-Schmidt like transformation
1044 CALL scopy( m, a( 1, p ), 1,
1045 $ work( n+1 ), 1 )
1046 CALL slascl( 'G', 0, 0, aapp, one, m,
1047 $ 1, work( n+1 ), lda,
1048 $ ierr )
1049 CALL slascl( 'G', 0, 0, aaqq, one, m,
1050 $ 1, a( 1, q ), lda, ierr )
1051 temp1 = -aapq*work( p ) / work( q )
1052 CALL saxpy( m, temp1, work( n+1 ), 1,
1053 $ a( 1, q ), 1 )
1054 CALL slascl( 'G', 0, 0, one, aaqq, m,
1055 $ 1, a( 1, q ), lda, ierr )
1056 sva( q ) = aaqq*sqrt( max( zero,
1057 $ one-aapq*aapq ) )
1058 mxsinj = max( mxsinj, sfmin )
1059 END IF
1060* END IF ROTOK THEN ... ELSE
1061*
1062* In the case of cancellation in updating SVA(q), SVA(p)
1063* recompute SVA(q), SVA(p).
1064*
1065 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1066 $ THEN
1067 IF( ( aaqq.LT.rootbig ) .AND.
1068 $ ( aaqq.GT.rootsfmin ) ) THEN
1069 sva( q ) = snrm2( m, a( 1, q ), 1 )*
1070 $ work( q )
1071 ELSE
1072 t = zero
1073 aaqq = one
1074 CALL slassq( m, a( 1, q ), 1, t,
1075 $ aaqq )
1076 sva( q ) = t*sqrt( aaqq )*work( q )
1077 END IF
1078 END IF
1079 IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1080 IF( ( aapp.LT.rootbig ) .AND.
1081 $ ( aapp.GT.rootsfmin ) ) THEN
1082 aapp = snrm2( m, a( 1, p ), 1 )*
1083 $ work( p )
1084 ELSE
1085 t = zero
1086 aapp = one
1087 CALL slassq( m, a( 1, p ), 1, t,
1088 $ aapp )
1089 aapp = t*sqrt( aapp )*work( p )
1090 END IF
1091 sva( p ) = aapp
1092 END IF
1093*
1094 ELSE
1095* A(:,p) and A(:,q) already numerically orthogonal
1096 IF( ir1.EQ.0 )notrot = notrot + 1
1097*[RTD] SKIPPED = SKIPPED + 1
1098 pskipped = pskipped + 1
1099 END IF
1100 ELSE
1101* A(:,q) is zero column
1102 IF( ir1.EQ.0 )notrot = notrot + 1
1103 pskipped = pskipped + 1
1104 END IF
1105*
1106 IF( ( i.LE.swband ) .AND.
1107 $ ( pskipped.GT.rowskip ) ) THEN
1108 IF( ir1.EQ.0 )aapp = -aapp
1109 notrot = 0
1110 GO TO 2103
1111 END IF
1112*
1113 2002 CONTINUE
1114* END q-LOOP
1115*
1116 2103 CONTINUE
1117* bailed out of q-loop
1118*
1119 sva( p ) = aapp
1120*
1121 ELSE
1122 sva( p ) = aapp
1123 IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1124 $ notrot = notrot + min( igl+kbl-1, n ) - p
1125 END IF
1126*
1127 2001 CONTINUE
1128* end of the p-loop
1129* end of doing the block ( ibr, ibr )
1130 1002 CONTINUE
1131* end of ir1-loop
1132*
1133* ... go to the off diagonal blocks
1134*
1135 igl = ( ibr-1 )*kbl + 1
1136*
1137 DO 2010 jbc = ibr + 1, nbl
1138*
1139 jgl = ( jbc-1 )*kbl + 1
1140*
1141* doing the block at ( ibr, jbc )
1142*
1143 ijblsk = 0
1144 DO 2100 p = igl, min( igl+kbl-1, n )
1145*
1146 aapp = sva( p )
1147 IF( aapp.GT.zero ) THEN
1148*
1149 pskipped = 0
1150*
1151 DO 2200 q = jgl, min( jgl+kbl-1, n )
1152*
1153 aaqq = sva( q )
1154 IF( aaqq.GT.zero ) THEN
1155 aapp0 = aapp
1156*
1157* .. M x 2 Jacobi SVD ..
1158*
1159* Safe Gram matrix computation
1160*
1161 IF( aaqq.GE.one ) THEN
1162 IF( aapp.GE.aaqq ) THEN
1163 rotok = ( small*aapp ).LE.aaqq
1164 ELSE
1165 rotok = ( small*aaqq ).LE.aapp
1166 END IF
1167 IF( aapp.LT.( big / aaqq ) ) THEN
1168 aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1169 $ q ), 1 )*work( p )*work( q ) /
1170 $ aaqq ) / aapp
1171 ELSE
1172 CALL scopy( m, a( 1, p ), 1,
1173 $ work( n+1 ), 1 )
1174 CALL slascl( 'G', 0, 0, aapp,
1175 $ work( p ), m, 1,
1176 $ work( n+1 ), lda, ierr )
1177 aapq = sdot( m, work( n+1 ), 1,
1178 $ a( 1, q ), 1 )*work( q ) / aaqq
1179 END IF
1180 ELSE
1181 IF( aapp.GE.aaqq ) THEN
1182 rotok = aapp.LE.( aaqq / small )
1183 ELSE
1184 rotok = aaqq.LE.( aapp / small )
1185 END IF
1186 IF( aapp.GT.( small / aaqq ) ) THEN
1187 aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1188 $ q ), 1 )*work( p )*work( q ) /
1189 $ aaqq ) / aapp
1190 ELSE
1191 CALL scopy( m, a( 1, q ), 1,
1192 $ work( n+1 ), 1 )
1193 CALL slascl( 'G', 0, 0, aaqq,
1194 $ work( q ), m, 1,
1195 $ work( n+1 ), lda, ierr )
1196 aapq = sdot( m, work( n+1 ), 1,
1197 $ a( 1, p ), 1 )*work( p ) / aapp
1198 END IF
1199 END IF
1200*
1201 mxaapq = max( mxaapq, abs( aapq ) )
1202*
1203* TO rotate or NOT to rotate, THAT is the question ...
1204*
1205 IF( abs( aapq ).GT.tol ) THEN
1206 notrot = 0
1207*[RTD] ROTATED = ROTATED + 1
1208 pskipped = 0
1209 iswrot = iswrot + 1
1210*
1211 IF( rotok ) THEN
1212*
1213 aqoap = aaqq / aapp
1214 apoaq = aapp / aaqq
1215 theta = -half*abs( aqoap-apoaq ) / aapq
1216 IF( aaqq.GT.aapp0 )theta = -theta
1217*
1218 IF( abs( theta ).GT.bigtheta ) THEN
1219 t = half / theta
1220 fastr( 3 ) = t*work( p ) / work( q )
1221 fastr( 4 ) = -t*work( q ) /
1222 $ work( p )
1223 CALL srotm( m, a( 1, p ), 1,
1224 $ a( 1, q ), 1, fastr )
1225 IF( rsvec )CALL srotm( mvl,
1226 $ v( 1, p ), 1,
1227 $ v( 1, q ), 1,
1228 $ fastr )
1229 sva( q ) = aaqq*sqrt( max( zero,
1230 $ one+t*apoaq*aapq ) )
1231 aapp = aapp*sqrt( max( zero,
1232 $ one-t*aqoap*aapq ) )
1233 mxsinj = max( mxsinj, abs( t ) )
1234 ELSE
1235*
1236* .. choose correct signum for THETA and rotate
1237*
1238 thsign = -sign( one, aapq )
1239 IF( aaqq.GT.aapp0 )thsign = -thsign
1240 t = one / ( theta+thsign*
1241 $ sqrt( one+theta*theta ) )
1242 cs = sqrt( one / ( one+t*t ) )
1243 sn = t*cs
1244 mxsinj = max( mxsinj, abs( sn ) )
1245 sva( q ) = aaqq*sqrt( max( zero,
1246 $ one+t*apoaq*aapq ) )
1247 aapp = aapp*sqrt( max( zero,
1248 $ one-t*aqoap*aapq ) )
1249*
1250 apoaq = work( p ) / work( q )
1251 aqoap = work( q ) / work( p )
1252 IF( work( p ).GE.one ) THEN
1253*
1254 IF( work( q ).GE.one ) THEN
1255 fastr( 3 ) = t*apoaq
1256 fastr( 4 ) = -t*aqoap
1257 work( p ) = work( p )*cs
1258 work( q ) = work( q )*cs
1259 CALL srotm( m, a( 1, p ), 1,
1260 $ a( 1, q ), 1,
1261 $ fastr )
1262 IF( rsvec )CALL srotm( mvl,
1263 $ v( 1, p ), 1, v( 1, q ),
1264 $ 1, fastr )
1265 ELSE
1266 CALL saxpy( m, -t*aqoap,
1267 $ a( 1, q ), 1,
1268 $ a( 1, p ), 1 )
1269 CALL saxpy( m, cs*sn*apoaq,
1270 $ a( 1, p ), 1,
1271 $ a( 1, q ), 1 )
1272 IF( rsvec ) THEN
1273 CALL saxpy( mvl, -t*aqoap,
1274 $ v( 1, q ), 1,
1275 $ v( 1, p ), 1 )
1276 CALL saxpy( mvl,
1277 $ cs*sn*apoaq,
1278 $ v( 1, p ), 1,
1279 $ v( 1, q ), 1 )
1280 END IF
1281 work( p ) = work( p )*cs
1282 work( q ) = work( q ) / cs
1283 END IF
1284 ELSE
1285 IF( work( q ).GE.one ) THEN
1286 CALL saxpy( m, t*apoaq,
1287 $ a( 1, p ), 1,
1288 $ a( 1, q ), 1 )
1289 CALL saxpy( m, -cs*sn*aqoap,
1290 $ a( 1, q ), 1,
1291 $ a( 1, p ), 1 )
1292 IF( rsvec ) THEN
1293 CALL saxpy( mvl, t*apoaq,
1294 $ v( 1, p ), 1,
1295 $ v( 1, q ), 1 )
1296 CALL saxpy( mvl,
1297 $ -cs*sn*aqoap,
1298 $ v( 1, q ), 1,
1299 $ v( 1, p ), 1 )
1300 END IF
1301 work( p ) = work( p ) / cs
1302 work( q ) = work( q )*cs
1303 ELSE
1304 IF( work( p ).GE.work( q ) )
1305 $ THEN
1306 CALL saxpy( m, -t*aqoap,
1307 $ a( 1, q ), 1,
1308 $ a( 1, p ), 1 )
1309 CALL saxpy( m, cs*sn*apoaq,
1310 $ a( 1, p ), 1,
1311 $ a( 1, q ), 1 )
1312 work( p ) = work( p )*cs
1313 work( q ) = work( q ) / cs
1314 IF( rsvec ) THEN
1315 CALL saxpy( mvl,
1316 $ -t*aqoap,
1317 $ v( 1, q ), 1,
1318 $ v( 1, p ), 1 )
1319 CALL saxpy( mvl,
1320 $ cs*sn*apoaq,
1321 $ v( 1, p ), 1,
1322 $ v( 1, q ), 1 )
1323 END IF
1324 ELSE
1325 CALL saxpy( m, t*apoaq,
1326 $ a( 1, p ), 1,
1327 $ a( 1, q ), 1 )
1328 CALL saxpy( m,
1329 $ -cs*sn*aqoap,
1330 $ a( 1, q ), 1,
1331 $ a( 1, p ), 1 )
1332 work( p ) = work( p ) / cs
1333 work( q ) = work( q )*cs
1334 IF( rsvec ) THEN
1335 CALL saxpy( mvl,
1336 $ t*apoaq, v( 1, p ),
1337 $ 1, v( 1, q ), 1 )
1338 CALL saxpy( mvl,
1339 $ -cs*sn*aqoap,
1340 $ v( 1, q ), 1,
1341 $ v( 1, p ), 1 )
1342 END IF
1343 END IF
1344 END IF
1345 END IF
1346 END IF
1347*
1348 ELSE
1349 IF( aapp.GT.aaqq ) THEN
1350 CALL scopy( m, a( 1, p ), 1,
1351 $ work( n+1 ), 1 )
1352 CALL slascl( 'G', 0, 0, aapp, one,
1353 $ m, 1, work( n+1 ), lda,
1354 $ ierr )
1355 CALL slascl( 'G', 0, 0, aaqq, one,
1356 $ m, 1, a( 1, q ), lda,
1357 $ ierr )
1358 temp1 = -aapq*work( p ) / work( q )
1359 CALL saxpy( m, temp1, work( n+1 ),
1360 $ 1, a( 1, q ), 1 )
1361 CALL slascl( 'G', 0, 0, one, aaqq,
1362 $ m, 1, a( 1, q ), lda,
1363 $ ierr )
1364 sva( q ) = aaqq*sqrt( max( zero,
1365 $ one-aapq*aapq ) )
1366 mxsinj = max( mxsinj, sfmin )
1367 ELSE
1368 CALL scopy( m, a( 1, q ), 1,
1369 $ work( n+1 ), 1 )
1370 CALL slascl( 'G', 0, 0, aaqq, one,
1371 $ m, 1, work( n+1 ), lda,
1372 $ ierr )
1373 CALL slascl( 'G', 0, 0, aapp, one,
1374 $ m, 1, a( 1, p ), lda,
1375 $ ierr )
1376 temp1 = -aapq*work( q ) / work( p )
1377 CALL saxpy( m, temp1, work( n+1 ),
1378 $ 1, a( 1, p ), 1 )
1379 CALL slascl( 'G', 0, 0, one, aapp,
1380 $ m, 1, a( 1, p ), lda,
1381 $ ierr )
1382 sva( p ) = aapp*sqrt( max( zero,
1383 $ one-aapq*aapq ) )
1384 mxsinj = max( mxsinj, sfmin )
1385 END IF
1386 END IF
1387* END IF ROTOK THEN ... ELSE
1388*
1389* In the case of cancellation in updating SVA(q)
1390* .. recompute SVA(q)
1391 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1392 $ THEN
1393 IF( ( aaqq.LT.rootbig ) .AND.
1394 $ ( aaqq.GT.rootsfmin ) ) THEN
1395 sva( q ) = snrm2( m, a( 1, q ), 1 )*
1396 $ work( q )
1397 ELSE
1398 t = zero
1399 aaqq = one
1400 CALL slassq( m, a( 1, q ), 1, t,
1401 $ aaqq )
1402 sva( q ) = t*sqrt( aaqq )*work( q )
1403 END IF
1404 END IF
1405 IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1406 IF( ( aapp.LT.rootbig ) .AND.
1407 $ ( aapp.GT.rootsfmin ) ) THEN
1408 aapp = snrm2( m, a( 1, p ), 1 )*
1409 $ work( p )
1410 ELSE
1411 t = zero
1412 aapp = one
1413 CALL slassq( m, a( 1, p ), 1, t,
1414 $ aapp )
1415 aapp = t*sqrt( aapp )*work( p )
1416 END IF
1417 sva( p ) = aapp
1418 END IF
1419* end of OK rotation
1420 ELSE
1421 notrot = notrot + 1
1422*[RTD] SKIPPED = SKIPPED + 1
1423 pskipped = pskipped + 1
1424 ijblsk = ijblsk + 1
1425 END IF
1426 ELSE
1427 notrot = notrot + 1
1428 pskipped = pskipped + 1
1429 ijblsk = ijblsk + 1
1430 END IF
1431*
1432 IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1433 $ THEN
1434 sva( p ) = aapp
1435 notrot = 0
1436 GO TO 2011
1437 END IF
1438 IF( ( i.LE.swband ) .AND.
1439 $ ( pskipped.GT.rowskip ) ) THEN
1440 aapp = -aapp
1441 notrot = 0
1442 GO TO 2203
1443 END IF
1444*
1445 2200 CONTINUE
1446* end of the q-loop
1447 2203 CONTINUE
1448*
1449 sva( p ) = aapp
1450*
1451 ELSE
1452*
1453 IF( aapp.EQ.zero )notrot = notrot +
1454 $ min( jgl+kbl-1, n ) - jgl + 1
1455 IF( aapp.LT.zero )notrot = 0
1456*
1457 END IF
1458*
1459 2100 CONTINUE
1460* end of the p-loop
1461 2010 CONTINUE
1462* end of the jbc-loop
1463 2011 CONTINUE
1464*2011 bailed out of the jbc-loop
1465 DO 2012 p = igl, min( igl+kbl-1, n )
1466 sva( p ) = abs( sva( p ) )
1467 2012 CONTINUE
1468***
1469 2000 CONTINUE
1470*2000 :: end of the ibr-loop
1471*
1472* .. update SVA(N)
1473 IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1474 $ THEN
1475 sva( n ) = snrm2( m, a( 1, n ), 1 )*work( n )
1476 ELSE
1477 t = zero
1478 aapp = one
1479 CALL slassq( m, a( 1, n ), 1, t, aapp )
1480 sva( n ) = t*sqrt( aapp )*work( n )
1481 END IF
1482*
1483* Additional steering devices
1484*
1485 IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1486 $ ( iswrot.LE.n ) ) )swband = i
1487*
1488 IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.sqrt( float( n ) )*
1489 $ tol ) .AND. ( float( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1490 GO TO 1994
1491 END IF
1492*
1493 IF( notrot.GE.emptsw )GO TO 1994
1494*
1495 1993 CONTINUE
1496* end i=1:NSWEEP loop
1497*
1498* #:( Reaching this point means that the procedure has not converged.
1499 info = nsweep - 1
1500 GO TO 1995
1501*
1502 1994 CONTINUE
1503* #:) Reaching this point means numerical convergence after the i-th
1504* sweep.
1505*
1506 info = 0
1507* #:) INFO = 0 confirms successful iterations.
1508 1995 CONTINUE
1509*
1510* Sort the singular values and find how many are above
1511* the underflow threshold.
1512*
1513 n2 = 0
1514 n4 = 0
1515 DO 5991 p = 1, n - 1
1516 q = isamax( n-p+1, sva( p ), 1 ) + p - 1
1517 IF( p.NE.q ) THEN
1518 temp1 = sva( p )
1519 sva( p ) = sva( q )
1520 sva( q ) = temp1
1521 temp1 = work( p )
1522 work( p ) = work( q )
1523 work( q ) = temp1
1524 CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1525 IF( rsvec )CALL sswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1526 END IF
1527 IF( sva( p ).NE.zero ) THEN
1528 n4 = n4 + 1
1529 IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1530 END IF
1531 5991 CONTINUE
1532 IF( sva( n ).NE.zero ) THEN
1533 n4 = n4 + 1
1534 IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1535 END IF
1536*
1537* Normalize the left singular vectors.
1538*
1539 IF( lsvec .OR. uctol ) THEN
1540 DO 1998 p = 1, n2
1541 CALL sscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1542 1998 CONTINUE
1543 END IF
1544*
1545* Scale the product of Jacobi rotations (assemble the fast rotations).
1546*
1547 IF( rsvec ) THEN
1548 IF( applv ) THEN
1549 DO 2398 p = 1, n
1550 CALL sscal( mvl, work( p ), v( 1, p ), 1 )
1551 2398 CONTINUE
1552 ELSE
1553 DO 2399 p = 1, n
1554 temp1 = one / snrm2( mvl, v( 1, p ), 1 )
1555 CALL sscal( mvl, temp1, v( 1, p ), 1 )
1556 2399 CONTINUE
1557 END IF
1558 END IF
1559*
1560* Undo scaling, if necessary (and possible).
1561 IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl ) ) )
1562 $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1563 $ ( sfmin / skl ) ) ) ) THEN
1564 DO 2400 p = 1, n
1565 sva( p ) = skl*sva( p )
1566 2400 CONTINUE
1567 skl = one
1568 END IF
1569*
1570 work( 1 ) = skl
1571* The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1572* then some of the singular values may overflow or underflow and
1573* the spectrum is given in this factored representation.
1574*
1575 work( 2 ) = float( n4 )
1576* N4 is the number of computed nonzero singular values of A.
1577*
1578 work( 3 ) = float( n2 )
1579* N2 is the number of singular values of A greater than SFMIN.
1580* If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1581* that may carry some information.
1582*
1583 work( 4 ) = float( i )
1584* i is the index of the last sweep before declaring convergence.
1585*
1586 work( 5 ) = mxaapq
1587* MXAAPQ is the largest absolute value of scaled pivots in the
1588* last sweep
1589*
1590 work( 6 ) = mxsinj
1591* MXSINJ is the largest absolute value of the sines of Jacobi angles
1592* in the last sweep
1593*
1594 RETURN
1595* ..
1596* .. END OF SGESVJ
1597* ..
1598 END
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine saxpy(n, sa, sx, incx, sy, incy)
SAXPY
Definition saxpy.f:89
subroutine scopy(n, sx, incx, sy, incy)
SCOPY
Definition scopy.f:82
subroutine sgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, work, lwork, info)
SGESVJ
Definition sgesvj.f:323
subroutine sgsvj0(jobv, m, n, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
SGSVJ0 pre-processor for the routine sgesvj.
Definition sgsvj0.f:218
subroutine sgsvj1(jobv, m, n, n1, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
SGSVJ1 pre-processor for the routine sgesvj, applies Jacobi rotations targeting only particular pivot...
Definition sgsvj1.f:236
subroutine slascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition slascl.f:143
subroutine slaset(uplo, m, n, alpha, beta, a, lda)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition slaset.f:110
subroutine slassq(n, x, incx, scale, sumsq)
SLASSQ updates a sum of squares represented in scaled form.
Definition slassq.f90:124
subroutine srotm(n, sx, incx, sy, incy, sparam)
SROTM
Definition srotm.f:97
subroutine sscal(n, sa, sx, incx)
SSCAL
Definition sscal.f:79
subroutine sswap(n, sx, incx, sy, incy)
SSWAP
Definition sswap.f:82