LAPACK  3.10.1
LAPACK: Linear Algebra PACKage
dgesvj.f
Go to the documentation of this file.
1 *> \brief \b DGESVJ
2 *
3 * =========== DOCUMENTATION ===========
4 *
5 * Online html documentation available at
6 * http://www.netlib.org/lapack/explore-html/
7 *
8 *> \htmlonly
9 *> Download DGESVJ + dependencies
10 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgesvj.f">
11 *> [TGZ]</a>
12 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgesvj.f">
13 *> [ZIP]</a>
14 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgesvj.f">
15 *> [TXT]</a>
16 *> \endhtmlonly
17 *
18 * Definition:
19 * ===========
20 *
21 * SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
22 * LDV, WORK, LWORK, INFO )
23 *
24 * .. Scalar Arguments ..
25 * INTEGER INFO, LDA, LDV, LWORK, M, MV, N
26 * CHARACTER*1 JOBA, JOBU, JOBV
27 * ..
28 * .. Array Arguments ..
29 * DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
30 * $ WORK( LWORK )
31 * ..
32 *
33 *
34 *> \par Purpose:
35 * =============
36 *>
37 *> \verbatim
38 *>
39 *> DGESVJ computes the singular value decomposition (SVD) of a real
40 *> M-by-N matrix A, where M >= N. The SVD of A is written as
41 *> [++] [xx] [x0] [xx]
42 *> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
43 *> [++] [xx]
44 *> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
45 *> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
46 *> of SIGMA are the singular values of A. The columns of U and V are the
47 *> left and the right singular vectors of A, respectively.
48 *> DGESVJ can sometimes compute tiny singular values and their singular vectors much
49 *> more accurately than other SVD routines, see below under Further Details.
50 *> \endverbatim
51 *
52 * Arguments:
53 * ==========
54 *
55 *> \param[in] JOBA
56 *> \verbatim
57 *> JOBA is CHARACTER*1
58 *> Specifies the structure of A.
59 *> = 'L': The input matrix A is lower triangular;
60 *> = 'U': The input matrix A is upper triangular;
61 *> = 'G': The input matrix A is general M-by-N matrix, M >= N.
62 *> \endverbatim
63 *>
64 *> \param[in] JOBU
65 *> \verbatim
66 *> JOBU is CHARACTER*1
67 *> Specifies whether to compute the left singular vectors
68 *> (columns of U):
69 *> = 'U': The left singular vectors corresponding to the nonzero
70 *> singular values are computed and returned in the leading
71 *> columns of A. See more details in the description of A.
72 *> The default numerical orthogonality threshold is set to
73 *> approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
74 *> = 'C': Analogous to JOBU='U', except that user can control the
75 *> level of numerical orthogonality of the computed left
76 *> singular vectors. TOL can be set to TOL = CTOL*EPS, where
77 *> CTOL is given on input in the array WORK.
78 *> No CTOL smaller than ONE is allowed. CTOL greater
79 *> than 1 / EPS is meaningless. The option 'C'
80 *> can be used if M*EPS is satisfactory orthogonality
81 *> of the computed left singular vectors, so CTOL=M could
82 *> save few sweeps of Jacobi rotations.
83 *> See the descriptions of A and WORK(1).
84 *> = 'N': The matrix U is not computed. However, see the
85 *> description of A.
86 *> \endverbatim
87 *>
88 *> \param[in] JOBV
89 *> \verbatim
90 *> JOBV is CHARACTER*1
91 *> Specifies whether to compute the right singular vectors, that
92 *> is, the matrix V:
93 *> = 'V': the matrix V is computed and returned in the array V
94 *> = 'A': the Jacobi rotations are applied to the MV-by-N
95 *> array V. In other words, the right singular vector
96 *> matrix V is not computed explicitly, instead it is
97 *> applied to an MV-by-N matrix initially stored in the
98 *> first MV rows of V.
99 *> = 'N': the matrix V is not computed and the array V is not
100 *> referenced
101 *> \endverbatim
102 *>
103 *> \param[in] M
104 *> \verbatim
105 *> M is INTEGER
106 *> The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
107 *> \endverbatim
108 *>
109 *> \param[in] N
110 *> \verbatim
111 *> N is INTEGER
112 *> The number of columns of the input matrix A.
113 *> M >= N >= 0.
114 *> \endverbatim
115 *>
116 *> \param[in,out] A
117 *> \verbatim
118 *> A is DOUBLE PRECISION array, dimension (LDA,N)
119 *> On entry, the M-by-N matrix A.
120 *> On exit :
121 *> If JOBU = 'U' .OR. JOBU = 'C' :
122 *> If INFO = 0 :
123 *> RANKA orthonormal columns of U are returned in the
124 *> leading RANKA columns of the array A. Here RANKA <= N
125 *> is the number of computed singular values of A that are
126 *> above the underflow threshold DLAMCH('S'). The singular
127 *> vectors corresponding to underflowed or zero singular
128 *> values are not computed. The value of RANKA is returned
129 *> in the array WORK as RANKA=NINT(WORK(2)). Also see the
130 *> descriptions of SVA and WORK. The computed columns of U
131 *> are mutually numerically orthogonal up to approximately
132 *> TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'),
133 *> see the description of JOBU.
134 *> If INFO > 0 :
135 *> the procedure DGESVJ did not converge in the given number
136 *> of iterations (sweeps). In that case, the computed
137 *> columns of U may not be orthogonal up to TOL. The output
138 *> U (stored in A), SIGMA (given by the computed singular
139 *> values in SVA(1:N)) and V is still a decomposition of the
140 *> input matrix A in the sense that the residual
141 *> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
142 *>
143 *> If JOBU = 'N' :
144 *> If INFO = 0 :
145 *> Note that the left singular vectors are 'for free' in the
146 *> one-sided Jacobi SVD algorithm. However, if only the
147 *> singular values are needed, the level of numerical
148 *> orthogonality of U is not an issue and iterations are
149 *> stopped when the columns of the iterated matrix are
150 *> numerically orthogonal up to approximately M*EPS. Thus,
151 *> on exit, A contains the columns of U scaled with the
152 *> corresponding singular values.
153 *> If INFO > 0 :
154 *> the procedure DGESVJ did not converge in the given number
155 *> of iterations (sweeps).
156 *> \endverbatim
157 *>
158 *> \param[in] LDA
159 *> \verbatim
160 *> LDA is INTEGER
161 *> The leading dimension of the array A. LDA >= max(1,M).
162 *> \endverbatim
163 *>
164 *> \param[out] SVA
165 *> \verbatim
166 *> SVA is DOUBLE PRECISION array, dimension (N)
167 *> On exit :
168 *> If INFO = 0 :
169 *> depending on the value SCALE = WORK(1), we have:
170 *> If SCALE = ONE :
171 *> SVA(1:N) contains the computed singular values of A.
172 *> During the computation SVA contains the Euclidean column
173 *> norms of the iterated matrices in the array A.
174 *> If SCALE .NE. ONE :
175 *> The singular values of A are SCALE*SVA(1:N), and this
176 *> factored representation is due to the fact that some of the
177 *> singular values of A might underflow or overflow.
178 *> If INFO > 0 :
179 *> the procedure DGESVJ did not converge in the given number of
180 *> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
181 *> \endverbatim
182 *>
183 *> \param[in] MV
184 *> \verbatim
185 *> MV is INTEGER
186 *> If JOBV = 'A', then the product of Jacobi rotations in DGESVJ
187 *> is applied to the first MV rows of V. See the description of JOBV.
188 *> \endverbatim
189 *>
190 *> \param[in,out] V
191 *> \verbatim
192 *> V is DOUBLE PRECISION array, dimension (LDV,N)
193 *> If JOBV = 'V', then V contains on exit the N-by-N matrix of
194 *> the right singular vectors;
195 *> If JOBV = 'A', then V contains the product of the computed right
196 *> singular vector matrix and the initial matrix in
197 *> the array V.
198 *> If JOBV = 'N', then V is not referenced.
199 *> \endverbatim
200 *>
201 *> \param[in] LDV
202 *> \verbatim
203 *> LDV is INTEGER
204 *> The leading dimension of the array V, LDV >= 1.
205 *> If JOBV = 'V', then LDV >= max(1,N).
206 *> If JOBV = 'A', then LDV >= max(1,MV) .
207 *> \endverbatim
208 *>
209 *> \param[in,out] WORK
210 *> \verbatim
211 *> WORK is DOUBLE PRECISION array, dimension (LWORK)
212 *> On entry :
213 *> If JOBU = 'C' :
214 *> WORK(1) = CTOL, where CTOL defines the threshold for convergence.
215 *> The process stops if all columns of A are mutually
216 *> orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
217 *> It is required that CTOL >= ONE, i.e. it is not
218 *> allowed to force the routine to obtain orthogonality
219 *> below EPS.
220 *> On exit :
221 *> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
222 *> are the computed singular values of A.
223 *> (See description of SVA().)
224 *> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
225 *> singular values.
226 *> WORK(3) = NINT(WORK(3)) is the number of the computed singular
227 *> values that are larger than the underflow threshold.
228 *> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
229 *> rotations needed for numerical convergence.
230 *> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
231 *> This is useful information in cases when DGESVJ did
232 *> not converge, as it can be used to estimate whether
233 *> the output is still useful and for post festum analysis.
234 *> WORK(6) = the largest absolute value over all sines of the
235 *> Jacobi rotation angles in the last sweep. It can be
236 *> useful for a post festum analysis.
237 *> \endverbatim
238 *>
239 *> \param[in] LWORK
240 *> \verbatim
241 *> LWORK is INTEGER
242 *> length of WORK, WORK >= MAX(6,M+N)
243 *> \endverbatim
244 *>
245 *> \param[out] INFO
246 *> \verbatim
247 *> INFO is INTEGER
248 *> = 0: successful exit.
249 *> < 0: if INFO = -i, then the i-th argument had an illegal value
250 *> > 0: DGESVJ did not converge in the maximal allowed number (30)
251 *> of sweeps. The output may still be useful. See the
252 *> description of WORK.
253 *> \endverbatim
254 *
255 * Authors:
256 * ========
257 *
258 *> \author Univ. of Tennessee
259 *> \author Univ. of California Berkeley
260 *> \author Univ. of Colorado Denver
261 *> \author NAG Ltd.
262 *
263 *> \ingroup doubleGEcomputational
264 *
265 *> \par Further Details:
266 * =====================
267 *>
268 *> \verbatim
269 *>
270 *> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
271 *> rotations. The rotations are implemented as fast scaled rotations of
272 *> Anda and Park [1]. In the case of underflow of the Jacobi angle, a
273 *> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
274 *> column interchanges of de Rijk [2]. The relative accuracy of the computed
275 *> singular values and the accuracy of the computed singular vectors (in
276 *> angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
277 *> The condition number that determines the accuracy in the full rank case
278 *> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
279 *> spectral condition number. The best performance of this Jacobi SVD
280 *> procedure is achieved if used in an accelerated version of Drmac and
281 *> Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
282 *> Some tuning parameters (marked with [TP]) are available for the
283 *> implementer.
284 *> The computational range for the nonzero singular values is the machine
285 *> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
286 *> denormalized singular values can be computed with the corresponding
287 *> gradual loss of accurate digits.
288 *> \endverbatim
289 *
290 *> \par Contributors:
291 * ==================
292 *>
293 *> \verbatim
294 *>
295 *> ============
296 *>
297 *> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
298 *> \endverbatim
299 *
300 *> \par References:
301 * ================
302 *>
303 *> \verbatim
304 *>
305 *> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
306 *> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
307 *> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
308 *> singular value decomposition on a vector computer.
309 *> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
310 *> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
311 *> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
312 *> value computation in floating point arithmetic.
313 *> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
314 *> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
315 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
316 *> LAPACK Working note 169.
317 *> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
318 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
319 *> LAPACK Working note 170.
320 *> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
321 *> QSVD, (H,K)-SVD computations.
322 *> Department of Mathematics, University of Zagreb, 2008.
323 *> \endverbatim
324 *
325 *> \par Bugs, examples and comments:
326 * =================================
327 *>
328 *> \verbatim
329 *> ===========================
330 *> Please report all bugs and send interesting test examples and comments to
331 *> drmac@math.hr. Thank you.
332 *> \endverbatim
333 *>
334 * =====================================================================
335  SUBROUTINE dgesvj( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
336  $ LDV, WORK, LWORK, INFO )
337 *
338 * -- LAPACK computational routine --
339 * -- LAPACK is a software package provided by Univ. of Tennessee, --
340 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
341 *
342 * .. Scalar Arguments ..
343  INTEGER INFO, LDA, LDV, LWORK, M, MV, N
344  CHARACTER*1 JOBA, JOBU, JOBV
345 * ..
346 * .. Array Arguments ..
347  DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
348  $ work( lwork )
349 * ..
350 *
351 * =====================================================================
352 *
353 * .. Local Parameters ..
354  DOUBLE PRECISION ZERO, HALF, ONE
355  parameter( zero = 0.0d0, half = 0.5d0, one = 1.0d0)
356  INTEGER NSWEEP
357  parameter( nsweep = 30 )
358 * ..
359 * .. Local Scalars ..
360  DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
361  $ bigtheta, cs, ctol, epsln, large, mxaapq,
362  $ mxsinj, rootbig, rooteps, rootsfmin, roottol,
363  $ skl, sfmin, small, sn, t, temp1, theta,
364  $ thsign, tol
365  INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
366  $ iswrot, jbc, jgl, kbl, lkahead, mvl, n2, n34,
367  $ n4, nbl, notrot, p, pskipped, q, rowskip,
368  $ swband
369  LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
370  $ rsvec, uctol, upper
371 * ..
372 * .. Local Arrays ..
373  DOUBLE PRECISION FASTR( 5 )
374 * ..
375 * .. Intrinsic Functions ..
376  INTRINSIC dabs, max, min, dble, dsign, dsqrt
377 * ..
378 * .. External Functions ..
379 * ..
380 * from BLAS
381  DOUBLE PRECISION DDOT, DNRM2
382  EXTERNAL ddot, dnrm2
383  INTEGER IDAMAX
384  EXTERNAL idamax
385 * from LAPACK
386  DOUBLE PRECISION DLAMCH
387  EXTERNAL dlamch
388  LOGICAL LSAME
389  EXTERNAL lsame
390 * ..
391 * .. External Subroutines ..
392 * ..
393 * from BLAS
394  EXTERNAL daxpy, dcopy, drotm, dscal, dswap
395 * from LAPACK
396  EXTERNAL dlascl, dlaset, dlassq, xerbla
397 *
398  EXTERNAL dgsvj0, dgsvj1
399 * ..
400 * .. Executable Statements ..
401 *
402 * Test the input arguments
403 *
404  lsvec = lsame( jobu, 'U' )
405  uctol = lsame( jobu, 'C' )
406  rsvec = lsame( jobv, 'V' )
407  applv = lsame( jobv, 'A' )
408  upper = lsame( joba, 'U' )
409  lower = lsame( joba, 'L' )
410 *
411  IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
412  info = -1
413  ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
414  info = -2
415  ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
416  info = -3
417  ELSE IF( m.LT.0 ) THEN
418  info = -4
419  ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
420  info = -5
421  ELSE IF( lda.LT.m ) THEN
422  info = -7
423  ELSE IF( mv.LT.0 ) THEN
424  info = -9
425  ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
426  $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
427  info = -11
428  ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
429  info = -12
430  ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
431  info = -13
432  ELSE
433  info = 0
434  END IF
435 *
436 * #:(
437  IF( info.NE.0 ) THEN
438  CALL xerbla( 'DGESVJ', -info )
439  RETURN
440  END IF
441 *
442 * #:) Quick return for void matrix
443 *
444  IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
445 *
446 * Set numerical parameters
447 * The stopping criterion for Jacobi rotations is
448 *
449 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
450 *
451 * where EPS is the round-off and CTOL is defined as follows:
452 *
453  IF( uctol ) THEN
454 * ... user controlled
455  ctol = work( 1 )
456  ELSE
457 * ... default
458  IF( lsvec .OR. rsvec .OR. applv ) THEN
459  ctol = dsqrt( dble( m ) )
460  ELSE
461  ctol = dble( m )
462  END IF
463  END IF
464 * ... and the machine dependent parameters are
465 *[!] (Make sure that DLAMCH() works properly on the target machine.)
466 *
467  epsln = dlamch( 'Epsilon' )
468  rooteps = dsqrt( epsln )
469  sfmin = dlamch( 'SafeMinimum' )
470  rootsfmin = dsqrt( sfmin )
471  small = sfmin / epsln
472  big = dlamch( 'Overflow' )
473 * BIG = ONE / SFMIN
474  rootbig = one / rootsfmin
475  large = big / dsqrt( dble( m*n ) )
476  bigtheta = one / rooteps
477 *
478  tol = ctol*epsln
479  roottol = dsqrt( tol )
480 *
481  IF( dble( m )*epsln.GE.one ) THEN
482  info = -4
483  CALL xerbla( 'DGESVJ', -info )
484  RETURN
485  END IF
486 *
487 * Initialize the right singular vector matrix.
488 *
489  IF( rsvec ) THEN
490  mvl = n
491  CALL dlaset( 'A', mvl, n, zero, one, v, ldv )
492  ELSE IF( applv ) THEN
493  mvl = mv
494  END IF
495  rsvec = rsvec .OR. applv
496 *
497 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
498 *(!) If necessary, scale A to protect the largest singular value
499 * from overflow. It is possible that saving the largest singular
500 * value destroys the information about the small ones.
501 * This initial scaling is almost minimal in the sense that the
502 * goal is to make sure that no column norm overflows, and that
503 * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
504 * in A are detected, the procedure returns with INFO=-6.
505 *
506  skl= one / dsqrt( dble( m )*dble( n ) )
507  noscale = .true.
508  goscale = .true.
509 *
510  IF( lower ) THEN
511 * the input matrix is M-by-N lower triangular (trapezoidal)
512  DO 1874 p = 1, n
513  aapp = zero
514  aaqq = one
515  CALL dlassq( m-p+1, a( p, p ), 1, aapp, aaqq )
516  IF( aapp.GT.big ) THEN
517  info = -6
518  CALL xerbla( 'DGESVJ', -info )
519  RETURN
520  END IF
521  aaqq = dsqrt( aaqq )
522  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
523  sva( p ) = aapp*aaqq
524  ELSE
525  noscale = .false.
526  sva( p ) = aapp*( aaqq*skl)
527  IF( goscale ) THEN
528  goscale = .false.
529  DO 1873 q = 1, p - 1
530  sva( q ) = sva( q )*skl
531  1873 CONTINUE
532  END IF
533  END IF
534  1874 CONTINUE
535  ELSE IF( upper ) THEN
536 * the input matrix is M-by-N upper triangular (trapezoidal)
537  DO 2874 p = 1, n
538  aapp = zero
539  aaqq = one
540  CALL dlassq( p, a( 1, p ), 1, aapp, aaqq )
541  IF( aapp.GT.big ) THEN
542  info = -6
543  CALL xerbla( 'DGESVJ', -info )
544  RETURN
545  END IF
546  aaqq = dsqrt( aaqq )
547  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
548  sva( p ) = aapp*aaqq
549  ELSE
550  noscale = .false.
551  sva( p ) = aapp*( aaqq*skl)
552  IF( goscale ) THEN
553  goscale = .false.
554  DO 2873 q = 1, p - 1
555  sva( q ) = sva( q )*skl
556  2873 CONTINUE
557  END IF
558  END IF
559  2874 CONTINUE
560  ELSE
561 * the input matrix is M-by-N general dense
562  DO 3874 p = 1, n
563  aapp = zero
564  aaqq = one
565  CALL dlassq( m, a( 1, p ), 1, aapp, aaqq )
566  IF( aapp.GT.big ) THEN
567  info = -6
568  CALL xerbla( 'DGESVJ', -info )
569  RETURN
570  END IF
571  aaqq = dsqrt( aaqq )
572  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
573  sva( p ) = aapp*aaqq
574  ELSE
575  noscale = .false.
576  sva( p ) = aapp*( aaqq*skl)
577  IF( goscale ) THEN
578  goscale = .false.
579  DO 3873 q = 1, p - 1
580  sva( q ) = sva( q )*skl
581  3873 CONTINUE
582  END IF
583  END IF
584  3874 CONTINUE
585  END IF
586 *
587  IF( noscale )skl= one
588 *
589 * Move the smaller part of the spectrum from the underflow threshold
590 *(!) Start by determining the position of the nonzero entries of the
591 * array SVA() relative to ( SFMIN, BIG ).
592 *
593  aapp = zero
594  aaqq = big
595  DO 4781 p = 1, n
596  IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
597  aapp = max( aapp, sva( p ) )
598  4781 CONTINUE
599 *
600 * #:) Quick return for zero matrix
601 *
602  IF( aapp.EQ.zero ) THEN
603  IF( lsvec )CALL dlaset( 'G', m, n, zero, one, a, lda )
604  work( 1 ) = one
605  work( 2 ) = zero
606  work( 3 ) = zero
607  work( 4 ) = zero
608  work( 5 ) = zero
609  work( 6 ) = zero
610  RETURN
611  END IF
612 *
613 * #:) Quick return for one-column matrix
614 *
615  IF( n.EQ.1 ) THEN
616  IF( lsvec )CALL dlascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
617  $ a( 1, 1 ), lda, ierr )
618  work( 1 ) = one / skl
619  IF( sva( 1 ).GE.sfmin ) THEN
620  work( 2 ) = one
621  ELSE
622  work( 2 ) = zero
623  END IF
624  work( 3 ) = zero
625  work( 4 ) = zero
626  work( 5 ) = zero
627  work( 6 ) = zero
628  RETURN
629  END IF
630 *
631 * Protect small singular values from underflow, and try to
632 * avoid underflows/overflows in computing Jacobi rotations.
633 *
634  sn = dsqrt( sfmin / epsln )
635  temp1 = dsqrt( big / dble( n ) )
636  IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
637  $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
638  temp1 = min( big, temp1 / aapp )
639 * AAQQ = AAQQ*TEMP1
640 * AAPP = AAPP*TEMP1
641  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
642  temp1 = min( sn / aaqq, big / ( aapp*dsqrt( dble( n ) ) ) )
643 * AAQQ = AAQQ*TEMP1
644 * AAPP = AAPP*TEMP1
645  ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
646  temp1 = max( sn / aaqq, temp1 / aapp )
647 * AAQQ = AAQQ*TEMP1
648 * AAPP = AAPP*TEMP1
649  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
650  temp1 = min( sn / aaqq, big / ( dsqrt( dble( n ) )*aapp ) )
651 * AAQQ = AAQQ*TEMP1
652 * AAPP = AAPP*TEMP1
653  ELSE
654  temp1 = one
655  END IF
656 *
657 * Scale, if necessary
658 *
659  IF( temp1.NE.one ) THEN
660  CALL dlascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
661  END IF
662  skl= temp1*skl
663  IF( skl.NE.one ) THEN
664  CALL dlascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
665  skl= one / skl
666  END IF
667 *
668 * Row-cyclic Jacobi SVD algorithm with column pivoting
669 *
670  emptsw = ( n*( n-1 ) ) / 2
671  notrot = 0
672  fastr( 1 ) = zero
673 *
674 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
675 * is initialized to identity. WORK is updated during fast scaled
676 * rotations.
677 *
678  DO 1868 q = 1, n
679  work( q ) = one
680  1868 CONTINUE
681 *
682 *
683  swband = 3
684 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
685 * if DGESVJ is used as a computational routine in the preconditioned
686 * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
687 * works on pivots inside a band-like region around the diagonal.
688 * The boundaries are determined dynamically, based on the number of
689 * pivots above a threshold.
690 *
691  kbl = min( 8, n )
692 *[TP] KBL is a tuning parameter that defines the tile size in the
693 * tiling of the p-q loops of pivot pairs. In general, an optimal
694 * value of KBL depends on the matrix dimensions and on the
695 * parameters of the computer's memory.
696 *
697  nbl = n / kbl
698  IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
699 *
700  blskip = kbl**2
701 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
702 *
703  rowskip = min( 5, kbl )
704 *[TP] ROWSKIP is a tuning parameter.
705 *
706  lkahead = 1
707 *[TP] LKAHEAD is a tuning parameter.
708 *
709 * Quasi block transformations, using the lower (upper) triangular
710 * structure of the input matrix. The quasi-block-cycling usually
711 * invokes cubic convergence. Big part of this cycle is done inside
712 * canonical subspaces of dimensions less than M.
713 *
714  IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
715 *[TP] The number of partition levels and the actual partition are
716 * tuning parameters.
717  n4 = n / 4
718  n2 = n / 2
719  n34 = 3*n4
720  IF( applv ) THEN
721  q = 0
722  ELSE
723  q = 1
724  END IF
725 *
726  IF( lower ) THEN
727 *
728 * This works very well on lower triangular matrices, in particular
729 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
730 * The idea is simple:
731 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
732 * [+ + 0 0] [0 0]
733 * [+ + x 0] actually work on [x 0] [x 0]
734 * [+ + x x] [x x]. [x x]
735 *
736  CALL dgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
737  $ work( n34+1 ), sva( n34+1 ), mvl,
738  $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
739  $ 2, work( n+1 ), lwork-n, ierr )
740 *
741  CALL dgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
742  $ work( n2+1 ), sva( n2+1 ), mvl,
743  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
744  $ work( n+1 ), lwork-n, ierr )
745 *
746  CALL dgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
747  $ work( n2+1 ), sva( n2+1 ), mvl,
748  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
749  $ work( n+1 ), lwork-n, ierr )
750 *
751  CALL dgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
752  $ work( n4+1 ), sva( n4+1 ), mvl,
753  $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
754  $ work( n+1 ), lwork-n, ierr )
755 *
756  CALL dgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
757  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
758  $ ierr )
759 *
760  CALL dgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
761  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
762  $ lwork-n, ierr )
763 *
764 *
765  ELSE IF( upper ) THEN
766 *
767 *
768  CALL dgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
769  $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
770  $ ierr )
771 *
772  CALL dgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
773  $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
774  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
775  $ ierr )
776 *
777  CALL dgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
778  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
779  $ lwork-n, ierr )
780 *
781  CALL dgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
782  $ work( n2+1 ), sva( n2+1 ), mvl,
783  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
784  $ work( n+1 ), lwork-n, ierr )
785 
786  END IF
787 *
788  END IF
789 *
790 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
791 *
792  DO 1993 i = 1, nsweep
793 *
794 * .. go go go ...
795 *
796  mxaapq = zero
797  mxsinj = zero
798  iswrot = 0
799 *
800  notrot = 0
801  pskipped = 0
802 *
803 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
804 * 1 <= p < q <= N. This is the first step toward a blocked implementation
805 * of the rotations. New implementation, based on block transformations,
806 * is under development.
807 *
808  DO 2000 ibr = 1, nbl
809 *
810  igl = ( ibr-1 )*kbl + 1
811 *
812  DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
813 *
814  igl = igl + ir1*kbl
815 *
816  DO 2001 p = igl, min( igl+kbl-1, n-1 )
817 *
818 * .. de Rijk's pivoting
819 *
820  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
821  IF( p.NE.q ) THEN
822  CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
823  IF( rsvec )CALL dswap( mvl, v( 1, p ), 1,
824  $ v( 1, q ), 1 )
825  temp1 = sva( p )
826  sva( p ) = sva( q )
827  sva( q ) = temp1
828  temp1 = work( p )
829  work( p ) = work( q )
830  work( q ) = temp1
831  END IF
832 *
833  IF( ir1.EQ.0 ) THEN
834 *
835 * Column norms are periodically updated by explicit
836 * norm computation.
837 * Caveat:
838 * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
839 * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
840 * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
841 * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
842 * Hence, DNRM2 cannot be trusted, not even in the case when
843 * the true norm is far from the under(over)flow boundaries.
844 * If properly implemented DNRM2 is available, the IF-THEN-ELSE
845 * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
846 *
847  IF( ( sva( p ).LT.rootbig ) .AND.
848  $ ( sva( p ).GT.rootsfmin ) ) THEN
849  sva( p ) = dnrm2( m, a( 1, p ), 1 )*work( p )
850  ELSE
851  temp1 = zero
852  aapp = one
853  CALL dlassq( m, a( 1, p ), 1, temp1, aapp )
854  sva( p ) = temp1*dsqrt( aapp )*work( p )
855  END IF
856  aapp = sva( p )
857  ELSE
858  aapp = sva( p )
859  END IF
860 *
861  IF( aapp.GT.zero ) THEN
862 *
863  pskipped = 0
864 *
865  DO 2002 q = p + 1, min( igl+kbl-1, n )
866 *
867  aaqq = sva( q )
868 *
869  IF( aaqq.GT.zero ) THEN
870 *
871  aapp0 = aapp
872  IF( aaqq.GE.one ) THEN
873  rotok = ( small*aapp ).LE.aaqq
874  IF( aapp.LT.( big / aaqq ) ) THEN
875  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
876  $ q ), 1 )*work( p )*work( q ) /
877  $ aaqq ) / aapp
878  ELSE
879  CALL dcopy( m, a( 1, p ), 1,
880  $ work( n+1 ), 1 )
881  CALL dlascl( 'G', 0, 0, aapp,
882  $ work( p ), m, 1,
883  $ work( n+1 ), lda, ierr )
884  aapq = ddot( m, work( n+1 ), 1,
885  $ a( 1, q ), 1 )*work( q ) / aaqq
886  END IF
887  ELSE
888  rotok = aapp.LE.( aaqq / small )
889  IF( aapp.GT.( small / aaqq ) ) THEN
890  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
891  $ q ), 1 )*work( p )*work( q ) /
892  $ aaqq ) / aapp
893  ELSE
894  CALL dcopy( m, a( 1, q ), 1,
895  $ work( n+1 ), 1 )
896  CALL dlascl( 'G', 0, 0, aaqq,
897  $ work( q ), m, 1,
898  $ work( n+1 ), lda, ierr )
899  aapq = ddot( m, work( n+1 ), 1,
900  $ a( 1, p ), 1 )*work( p ) / aapp
901  END IF
902  END IF
903 *
904  mxaapq = max( mxaapq, dabs( aapq ) )
905 *
906 * TO rotate or NOT to rotate, THAT is the question ...
907 *
908  IF( dabs( aapq ).GT.tol ) THEN
909 *
910 * .. rotate
911 *[RTD] ROTATED = ROTATED + ONE
912 *
913  IF( ir1.EQ.0 ) THEN
914  notrot = 0
915  pskipped = 0
916  iswrot = iswrot + 1
917  END IF
918 *
919  IF( rotok ) THEN
920 *
921  aqoap = aaqq / aapp
922  apoaq = aapp / aaqq
923  theta = -half*dabs(aqoap-apoaq)/aapq
924 *
925  IF( dabs( theta ).GT.bigtheta ) THEN
926 *
927  t = half / theta
928  fastr( 3 ) = t*work( p ) / work( q )
929  fastr( 4 ) = -t*work( q ) /
930  $ work( p )
931  CALL drotm( m, a( 1, p ), 1,
932  $ a( 1, q ), 1, fastr )
933  IF( rsvec )CALL drotm( mvl,
934  $ v( 1, p ), 1,
935  $ v( 1, q ), 1,
936  $ fastr )
937  sva( q ) = aaqq*dsqrt( max( zero,
938  $ one+t*apoaq*aapq ) )
939  aapp = aapp*dsqrt( max( zero,
940  $ one-t*aqoap*aapq ) )
941  mxsinj = max( mxsinj, dabs( t ) )
942 *
943  ELSE
944 *
945 * .. choose correct signum for THETA and rotate
946 *
947  thsign = -dsign( one, aapq )
948  t = one / ( theta+thsign*
949  $ dsqrt( one+theta*theta ) )
950  cs = dsqrt( one / ( one+t*t ) )
951  sn = t*cs
952 *
953  mxsinj = max( mxsinj, dabs( sn ) )
954  sva( q ) = aaqq*dsqrt( max( zero,
955  $ one+t*apoaq*aapq ) )
956  aapp = aapp*dsqrt( max( zero,
957  $ one-t*aqoap*aapq ) )
958 *
959  apoaq = work( p ) / work( q )
960  aqoap = work( q ) / work( p )
961  IF( work( p ).GE.one ) THEN
962  IF( work( q ).GE.one ) THEN
963  fastr( 3 ) = t*apoaq
964  fastr( 4 ) = -t*aqoap
965  work( p ) = work( p )*cs
966  work( q ) = work( q )*cs
967  CALL drotm( m, a( 1, p ), 1,
968  $ a( 1, q ), 1,
969  $ fastr )
970  IF( rsvec )CALL drotm( mvl,
971  $ v( 1, p ), 1, v( 1, q ),
972  $ 1, fastr )
973  ELSE
974  CALL daxpy( m, -t*aqoap,
975  $ a( 1, q ), 1,
976  $ a( 1, p ), 1 )
977  CALL daxpy( m, cs*sn*apoaq,
978  $ a( 1, p ), 1,
979  $ a( 1, q ), 1 )
980  work( p ) = work( p )*cs
981  work( q ) = work( q ) / cs
982  IF( rsvec ) THEN
983  CALL daxpy( mvl, -t*aqoap,
984  $ v( 1, q ), 1,
985  $ v( 1, p ), 1 )
986  CALL daxpy( mvl,
987  $ cs*sn*apoaq,
988  $ v( 1, p ), 1,
989  $ v( 1, q ), 1 )
990  END IF
991  END IF
992  ELSE
993  IF( work( q ).GE.one ) THEN
994  CALL daxpy( m, t*apoaq,
995  $ a( 1, p ), 1,
996  $ a( 1, q ), 1 )
997  CALL daxpy( m, -cs*sn*aqoap,
998  $ a( 1, q ), 1,
999  $ a( 1, p ), 1 )
1000  work( p ) = work( p ) / cs
1001  work( q ) = work( q )*cs
1002  IF( rsvec ) THEN
1003  CALL daxpy( mvl, t*apoaq,
1004  $ v( 1, p ), 1,
1005  $ v( 1, q ), 1 )
1006  CALL daxpy( mvl,
1007  $ -cs*sn*aqoap,
1008  $ v( 1, q ), 1,
1009  $ v( 1, p ), 1 )
1010  END IF
1011  ELSE
1012  IF( work( p ).GE.work( q ) )
1013  $ THEN
1014  CALL daxpy( m, -t*aqoap,
1015  $ a( 1, q ), 1,
1016  $ a( 1, p ), 1 )
1017  CALL daxpy( m, cs*sn*apoaq,
1018  $ a( 1, p ), 1,
1019  $ a( 1, q ), 1 )
1020  work( p ) = work( p )*cs
1021  work( q ) = work( q ) / cs
1022  IF( rsvec ) THEN
1023  CALL daxpy( mvl,
1024  $ -t*aqoap,
1025  $ v( 1, q ), 1,
1026  $ v( 1, p ), 1 )
1027  CALL daxpy( mvl,
1028  $ cs*sn*apoaq,
1029  $ v( 1, p ), 1,
1030  $ v( 1, q ), 1 )
1031  END IF
1032  ELSE
1033  CALL daxpy( m, t*apoaq,
1034  $ a( 1, p ), 1,
1035  $ a( 1, q ), 1 )
1036  CALL daxpy( m,
1037  $ -cs*sn*aqoap,
1038  $ a( 1, q ), 1,
1039  $ a( 1, p ), 1 )
1040  work( p ) = work( p ) / cs
1041  work( q ) = work( q )*cs
1042  IF( rsvec ) THEN
1043  CALL daxpy( mvl,
1044  $ t*apoaq, v( 1, p ),
1045  $ 1, v( 1, q ), 1 )
1046  CALL daxpy( mvl,
1047  $ -cs*sn*aqoap,
1048  $ v( 1, q ), 1,
1049  $ v( 1, p ), 1 )
1050  END IF
1051  END IF
1052  END IF
1053  END IF
1054  END IF
1055 *
1056  ELSE
1057 * .. have to use modified Gram-Schmidt like transformation
1058  CALL dcopy( m, a( 1, p ), 1,
1059  $ work( n+1 ), 1 )
1060  CALL dlascl( 'G', 0, 0, aapp, one, m,
1061  $ 1, work( n+1 ), lda,
1062  $ ierr )
1063  CALL dlascl( 'G', 0, 0, aaqq, one, m,
1064  $ 1, a( 1, q ), lda, ierr )
1065  temp1 = -aapq*work( p ) / work( q )
1066  CALL daxpy( m, temp1, work( n+1 ), 1,
1067  $ a( 1, q ), 1 )
1068  CALL dlascl( 'G', 0, 0, one, aaqq, m,
1069  $ 1, a( 1, q ), lda, ierr )
1070  sva( q ) = aaqq*dsqrt( max( zero,
1071  $ one-aapq*aapq ) )
1072  mxsinj = max( mxsinj, sfmin )
1073  END IF
1074 * END IF ROTOK THEN ... ELSE
1075 *
1076 * In the case of cancellation in updating SVA(q), SVA(p)
1077 * recompute SVA(q), SVA(p).
1078 *
1079  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1080  $ THEN
1081  IF( ( aaqq.LT.rootbig ) .AND.
1082  $ ( aaqq.GT.rootsfmin ) ) THEN
1083  sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1084  $ work( q )
1085  ELSE
1086  t = zero
1087  aaqq = one
1088  CALL dlassq( m, a( 1, q ), 1, t,
1089  $ aaqq )
1090  sva( q ) = t*dsqrt( aaqq )*work( q )
1091  END IF
1092  END IF
1093  IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1094  IF( ( aapp.LT.rootbig ) .AND.
1095  $ ( aapp.GT.rootsfmin ) ) THEN
1096  aapp = dnrm2( m, a( 1, p ), 1 )*
1097  $ work( p )
1098  ELSE
1099  t = zero
1100  aapp = one
1101  CALL dlassq( m, a( 1, p ), 1, t,
1102  $ aapp )
1103  aapp = t*dsqrt( aapp )*work( p )
1104  END IF
1105  sva( p ) = aapp
1106  END IF
1107 *
1108  ELSE
1109 * A(:,p) and A(:,q) already numerically orthogonal
1110  IF( ir1.EQ.0 )notrot = notrot + 1
1111 *[RTD] SKIPPED = SKIPPED + 1
1112  pskipped = pskipped + 1
1113  END IF
1114  ELSE
1115 * A(:,q) is zero column
1116  IF( ir1.EQ.0 )notrot = notrot + 1
1117  pskipped = pskipped + 1
1118  END IF
1119 *
1120  IF( ( i.LE.swband ) .AND.
1121  $ ( pskipped.GT.rowskip ) ) THEN
1122  IF( ir1.EQ.0 )aapp = -aapp
1123  notrot = 0
1124  GO TO 2103
1125  END IF
1126 *
1127  2002 CONTINUE
1128 * END q-LOOP
1129 *
1130  2103 CONTINUE
1131 * bailed out of q-loop
1132 *
1133  sva( p ) = aapp
1134 *
1135  ELSE
1136  sva( p ) = aapp
1137  IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1138  $ notrot = notrot + min( igl+kbl-1, n ) - p
1139  END IF
1140 *
1141  2001 CONTINUE
1142 * end of the p-loop
1143 * end of doing the block ( ibr, ibr )
1144  1002 CONTINUE
1145 * end of ir1-loop
1146 *
1147 * ... go to the off diagonal blocks
1148 *
1149  igl = ( ibr-1 )*kbl + 1
1150 *
1151  DO 2010 jbc = ibr + 1, nbl
1152 *
1153  jgl = ( jbc-1 )*kbl + 1
1154 *
1155 * doing the block at ( ibr, jbc )
1156 *
1157  ijblsk = 0
1158  DO 2100 p = igl, min( igl+kbl-1, n )
1159 *
1160  aapp = sva( p )
1161  IF( aapp.GT.zero ) THEN
1162 *
1163  pskipped = 0
1164 *
1165  DO 2200 q = jgl, min( jgl+kbl-1, n )
1166 *
1167  aaqq = sva( q )
1168  IF( aaqq.GT.zero ) THEN
1169  aapp0 = aapp
1170 *
1171 * .. M x 2 Jacobi SVD ..
1172 *
1173 * Safe Gram matrix computation
1174 *
1175  IF( aaqq.GE.one ) THEN
1176  IF( aapp.GE.aaqq ) THEN
1177  rotok = ( small*aapp ).LE.aaqq
1178  ELSE
1179  rotok = ( small*aaqq ).LE.aapp
1180  END IF
1181  IF( aapp.LT.( big / aaqq ) ) THEN
1182  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1183  $ q ), 1 )*work( p )*work( q ) /
1184  $ aaqq ) / aapp
1185  ELSE
1186  CALL dcopy( m, a( 1, p ), 1,
1187  $ work( n+1 ), 1 )
1188  CALL dlascl( 'G', 0, 0, aapp,
1189  $ work( p ), m, 1,
1190  $ work( n+1 ), lda, ierr )
1191  aapq = ddot( m, work( n+1 ), 1,
1192  $ a( 1, q ), 1 )*work( q ) / aaqq
1193  END IF
1194  ELSE
1195  IF( aapp.GE.aaqq ) THEN
1196  rotok = aapp.LE.( aaqq / small )
1197  ELSE
1198  rotok = aaqq.LE.( aapp / small )
1199  END IF
1200  IF( aapp.GT.( small / aaqq ) ) THEN
1201  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1202  $ q ), 1 )*work( p )*work( q ) /
1203  $ aaqq ) / aapp
1204  ELSE
1205  CALL dcopy( m, a( 1, q ), 1,
1206  $ work( n+1 ), 1 )
1207  CALL dlascl( 'G', 0, 0, aaqq,
1208  $ work( q ), m, 1,
1209  $ work( n+1 ), lda, ierr )
1210  aapq = ddot( m, work( n+1 ), 1,
1211  $ a( 1, p ), 1 )*work( p ) / aapp
1212  END IF
1213  END IF
1214 *
1215  mxaapq = max( mxaapq, dabs( aapq ) )
1216 *
1217 * TO rotate or NOT to rotate, THAT is the question ...
1218 *
1219  IF( dabs( aapq ).GT.tol ) THEN
1220  notrot = 0
1221 *[RTD] ROTATED = ROTATED + 1
1222  pskipped = 0
1223  iswrot = iswrot + 1
1224 *
1225  IF( rotok ) THEN
1226 *
1227  aqoap = aaqq / aapp
1228  apoaq = aapp / aaqq
1229  theta = -half*dabs(aqoap-apoaq)/aapq
1230  IF( aaqq.GT.aapp0 )theta = -theta
1231 *
1232  IF( dabs( theta ).GT.bigtheta ) THEN
1233  t = half / theta
1234  fastr( 3 ) = t*work( p ) / work( q )
1235  fastr( 4 ) = -t*work( q ) /
1236  $ work( p )
1237  CALL drotm( m, a( 1, p ), 1,
1238  $ a( 1, q ), 1, fastr )
1239  IF( rsvec )CALL drotm( mvl,
1240  $ v( 1, p ), 1,
1241  $ v( 1, q ), 1,
1242  $ fastr )
1243  sva( q ) = aaqq*dsqrt( max( zero,
1244  $ one+t*apoaq*aapq ) )
1245  aapp = aapp*dsqrt( max( zero,
1246  $ one-t*aqoap*aapq ) )
1247  mxsinj = max( mxsinj, dabs( t ) )
1248  ELSE
1249 *
1250 * .. choose correct signum for THETA and rotate
1251 *
1252  thsign = -dsign( one, aapq )
1253  IF( aaqq.GT.aapp0 )thsign = -thsign
1254  t = one / ( theta+thsign*
1255  $ dsqrt( one+theta*theta ) )
1256  cs = dsqrt( one / ( one+t*t ) )
1257  sn = t*cs
1258  mxsinj = max( mxsinj, dabs( sn ) )
1259  sva( q ) = aaqq*dsqrt( max( zero,
1260  $ one+t*apoaq*aapq ) )
1261  aapp = aapp*dsqrt( max( zero,
1262  $ one-t*aqoap*aapq ) )
1263 *
1264  apoaq = work( p ) / work( q )
1265  aqoap = work( q ) / work( p )
1266  IF( work( p ).GE.one ) THEN
1267 *
1268  IF( work( q ).GE.one ) THEN
1269  fastr( 3 ) = t*apoaq
1270  fastr( 4 ) = -t*aqoap
1271  work( p ) = work( p )*cs
1272  work( q ) = work( q )*cs
1273  CALL drotm( m, a( 1, p ), 1,
1274  $ a( 1, q ), 1,
1275  $ fastr )
1276  IF( rsvec )CALL drotm( mvl,
1277  $ v( 1, p ), 1, v( 1, q ),
1278  $ 1, fastr )
1279  ELSE
1280  CALL daxpy( m, -t*aqoap,
1281  $ a( 1, q ), 1,
1282  $ a( 1, p ), 1 )
1283  CALL daxpy( m, cs*sn*apoaq,
1284  $ a( 1, p ), 1,
1285  $ a( 1, q ), 1 )
1286  IF( rsvec ) THEN
1287  CALL daxpy( mvl, -t*aqoap,
1288  $ v( 1, q ), 1,
1289  $ v( 1, p ), 1 )
1290  CALL daxpy( mvl,
1291  $ cs*sn*apoaq,
1292  $ v( 1, p ), 1,
1293  $ v( 1, q ), 1 )
1294  END IF
1295  work( p ) = work( p )*cs
1296  work( q ) = work( q ) / cs
1297  END IF
1298  ELSE
1299  IF( work( q ).GE.one ) THEN
1300  CALL daxpy( m, t*apoaq,
1301  $ a( 1, p ), 1,
1302  $ a( 1, q ), 1 )
1303  CALL daxpy( m, -cs*sn*aqoap,
1304  $ a( 1, q ), 1,
1305  $ a( 1, p ), 1 )
1306  IF( rsvec ) THEN
1307  CALL daxpy( mvl, t*apoaq,
1308  $ v( 1, p ), 1,
1309  $ v( 1, q ), 1 )
1310  CALL daxpy( mvl,
1311  $ -cs*sn*aqoap,
1312  $ v( 1, q ), 1,
1313  $ v( 1, p ), 1 )
1314  END IF
1315  work( p ) = work( p ) / cs
1316  work( q ) = work( q )*cs
1317  ELSE
1318  IF( work( p ).GE.work( q ) )
1319  $ THEN
1320  CALL daxpy( m, -t*aqoap,
1321  $ a( 1, q ), 1,
1322  $ a( 1, p ), 1 )
1323  CALL daxpy( m, cs*sn*apoaq,
1324  $ a( 1, p ), 1,
1325  $ a( 1, q ), 1 )
1326  work( p ) = work( p )*cs
1327  work( q ) = work( q ) / cs
1328  IF( rsvec ) THEN
1329  CALL daxpy( mvl,
1330  $ -t*aqoap,
1331  $ v( 1, q ), 1,
1332  $ v( 1, p ), 1 )
1333  CALL daxpy( mvl,
1334  $ cs*sn*apoaq,
1335  $ v( 1, p ), 1,
1336  $ v( 1, q ), 1 )
1337  END IF
1338  ELSE
1339  CALL daxpy( m, t*apoaq,
1340  $ a( 1, p ), 1,
1341  $ a( 1, q ), 1 )
1342  CALL daxpy( m,
1343  $ -cs*sn*aqoap,
1344  $ a( 1, q ), 1,
1345  $ a( 1, p ), 1 )
1346  work( p ) = work( p ) / cs
1347  work( q ) = work( q )*cs
1348  IF( rsvec ) THEN
1349  CALL daxpy( mvl,
1350  $ t*apoaq, v( 1, p ),
1351  $ 1, v( 1, q ), 1 )
1352  CALL daxpy( mvl,
1353  $ -cs*sn*aqoap,
1354  $ v( 1, q ), 1,
1355  $ v( 1, p ), 1 )
1356  END IF
1357  END IF
1358  END IF
1359  END IF
1360  END IF
1361 *
1362  ELSE
1363  IF( aapp.GT.aaqq ) THEN
1364  CALL dcopy( m, a( 1, p ), 1,
1365  $ work( n+1 ), 1 )
1366  CALL dlascl( 'G', 0, 0, aapp, one,
1367  $ m, 1, work( n+1 ), lda,
1368  $ ierr )
1369  CALL dlascl( 'G', 0, 0, aaqq, one,
1370  $ m, 1, a( 1, q ), lda,
1371  $ ierr )
1372  temp1 = -aapq*work( p ) / work( q )
1373  CALL daxpy( m, temp1, work( n+1 ),
1374  $ 1, a( 1, q ), 1 )
1375  CALL dlascl( 'G', 0, 0, one, aaqq,
1376  $ m, 1, a( 1, q ), lda,
1377  $ ierr )
1378  sva( q ) = aaqq*dsqrt( max( zero,
1379  $ one-aapq*aapq ) )
1380  mxsinj = max( mxsinj, sfmin )
1381  ELSE
1382  CALL dcopy( m, a( 1, q ), 1,
1383  $ work( n+1 ), 1 )
1384  CALL dlascl( 'G', 0, 0, aaqq, one,
1385  $ m, 1, work( n+1 ), lda,
1386  $ ierr )
1387  CALL dlascl( 'G', 0, 0, aapp, one,
1388  $ m, 1, a( 1, p ), lda,
1389  $ ierr )
1390  temp1 = -aapq*work( q ) / work( p )
1391  CALL daxpy( m, temp1, work( n+1 ),
1392  $ 1, a( 1, p ), 1 )
1393  CALL dlascl( 'G', 0, 0, one, aapp,
1394  $ m, 1, a( 1, p ), lda,
1395  $ ierr )
1396  sva( p ) = aapp*dsqrt( max( zero,
1397  $ one-aapq*aapq ) )
1398  mxsinj = max( mxsinj, sfmin )
1399  END IF
1400  END IF
1401 * END IF ROTOK THEN ... ELSE
1402 *
1403 * In the case of cancellation in updating SVA(q)
1404 * .. recompute SVA(q)
1405  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1406  $ THEN
1407  IF( ( aaqq.LT.rootbig ) .AND.
1408  $ ( aaqq.GT.rootsfmin ) ) THEN
1409  sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1410  $ work( q )
1411  ELSE
1412  t = zero
1413  aaqq = one
1414  CALL dlassq( m, a( 1, q ), 1, t,
1415  $ aaqq )
1416  sva( q ) = t*dsqrt( aaqq )*work( q )
1417  END IF
1418  END IF
1419  IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1420  IF( ( aapp.LT.rootbig ) .AND.
1421  $ ( aapp.GT.rootsfmin ) ) THEN
1422  aapp = dnrm2( m, a( 1, p ), 1 )*
1423  $ work( p )
1424  ELSE
1425  t = zero
1426  aapp = one
1427  CALL dlassq( m, a( 1, p ), 1, t,
1428  $ aapp )
1429  aapp = t*dsqrt( aapp )*work( p )
1430  END IF
1431  sva( p ) = aapp
1432  END IF
1433 * end of OK rotation
1434  ELSE
1435  notrot = notrot + 1
1436 *[RTD] SKIPPED = SKIPPED + 1
1437  pskipped = pskipped + 1
1438  ijblsk = ijblsk + 1
1439  END IF
1440  ELSE
1441  notrot = notrot + 1
1442  pskipped = pskipped + 1
1443  ijblsk = ijblsk + 1
1444  END IF
1445 *
1446  IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1447  $ THEN
1448  sva( p ) = aapp
1449  notrot = 0
1450  GO TO 2011
1451  END IF
1452  IF( ( i.LE.swband ) .AND.
1453  $ ( pskipped.GT.rowskip ) ) THEN
1454  aapp = -aapp
1455  notrot = 0
1456  GO TO 2203
1457  END IF
1458 *
1459  2200 CONTINUE
1460 * end of the q-loop
1461  2203 CONTINUE
1462 *
1463  sva( p ) = aapp
1464 *
1465  ELSE
1466 *
1467  IF( aapp.EQ.zero )notrot = notrot +
1468  $ min( jgl+kbl-1, n ) - jgl + 1
1469  IF( aapp.LT.zero )notrot = 0
1470 *
1471  END IF
1472 *
1473  2100 CONTINUE
1474 * end of the p-loop
1475  2010 CONTINUE
1476 * end of the jbc-loop
1477  2011 CONTINUE
1478 *2011 bailed out of the jbc-loop
1479  DO 2012 p = igl, min( igl+kbl-1, n )
1480  sva( p ) = dabs( sva( p ) )
1481  2012 CONTINUE
1482 ***
1483  2000 CONTINUE
1484 *2000 :: end of the ibr-loop
1485 *
1486 * .. update SVA(N)
1487  IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1488  $ THEN
1489  sva( n ) = dnrm2( m, a( 1, n ), 1 )*work( n )
1490  ELSE
1491  t = zero
1492  aapp = one
1493  CALL dlassq( m, a( 1, n ), 1, t, aapp )
1494  sva( n ) = t*dsqrt( aapp )*work( n )
1495  END IF
1496 *
1497 * Additional steering devices
1498 *
1499  IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1500  $ ( iswrot.LE.n ) ) )swband = i
1501 *
1502  IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.dsqrt( dble( n ) )*
1503  $ tol ) .AND. ( dble( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1504  GO TO 1994
1505  END IF
1506 *
1507  IF( notrot.GE.emptsw )GO TO 1994
1508 *
1509  1993 CONTINUE
1510 * end i=1:NSWEEP loop
1511 *
1512 * #:( Reaching this point means that the procedure has not converged.
1513  info = nsweep - 1
1514  GO TO 1995
1515 *
1516  1994 CONTINUE
1517 * #:) Reaching this point means numerical convergence after the i-th
1518 * sweep.
1519 *
1520  info = 0
1521 * #:) INFO = 0 confirms successful iterations.
1522  1995 CONTINUE
1523 *
1524 * Sort the singular values and find how many are above
1525 * the underflow threshold.
1526 *
1527  n2 = 0
1528  n4 = 0
1529  DO 5991 p = 1, n - 1
1530  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
1531  IF( p.NE.q ) THEN
1532  temp1 = sva( p )
1533  sva( p ) = sva( q )
1534  sva( q ) = temp1
1535  temp1 = work( p )
1536  work( p ) = work( q )
1537  work( q ) = temp1
1538  CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1539  IF( rsvec )CALL dswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1540  END IF
1541  IF( sva( p ).NE.zero ) THEN
1542  n4 = n4 + 1
1543  IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1544  END IF
1545  5991 CONTINUE
1546  IF( sva( n ).NE.zero ) THEN
1547  n4 = n4 + 1
1548  IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1549  END IF
1550 *
1551 * Normalize the left singular vectors.
1552 *
1553  IF( lsvec .OR. uctol ) THEN
1554  DO 1998 p = 1, n2
1555  CALL dscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1556  1998 CONTINUE
1557  END IF
1558 *
1559 * Scale the product of Jacobi rotations (assemble the fast rotations).
1560 *
1561  IF( rsvec ) THEN
1562  IF( applv ) THEN
1563  DO 2398 p = 1, n
1564  CALL dscal( mvl, work( p ), v( 1, p ), 1 )
1565  2398 CONTINUE
1566  ELSE
1567  DO 2399 p = 1, n
1568  temp1 = one / dnrm2( mvl, v( 1, p ), 1 )
1569  CALL dscal( mvl, temp1, v( 1, p ), 1 )
1570  2399 CONTINUE
1571  END IF
1572  END IF
1573 *
1574 * Undo scaling, if necessary (and possible).
1575  IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl) ) )
1576  $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1577  $ ( sfmin / skl) ) ) ) THEN
1578  DO 2400 p = 1, n
1579  sva( p ) = skl*sva( p )
1580  2400 CONTINUE
1581  skl= one
1582  END IF
1583 *
1584  work( 1 ) = skl
1585 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1586 * then some of the singular values may overflow or underflow and
1587 * the spectrum is given in this factored representation.
1588 *
1589  work( 2 ) = dble( n4 )
1590 * N4 is the number of computed nonzero singular values of A.
1591 *
1592  work( 3 ) = dble( n2 )
1593 * N2 is the number of singular values of A greater than SFMIN.
1594 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1595 * that may carry some information.
1596 *
1597  work( 4 ) = dble( i )
1598 * i is the index of the last sweep before declaring convergence.
1599 *
1600  work( 5 ) = mxaapq
1601 * MXAAPQ is the largest absolute value of scaled pivots in the
1602 * last sweep
1603 *
1604  work( 6 ) = mxsinj
1605 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1606 * in the last sweep
1607 *
1608  RETURN
1609 * ..
1610 * .. END OF DGESVJ
1611 * ..
1612  END
subroutine dlassq(n, x, incx, scl, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f90:137
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:143
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition: dlaset.f:110
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:60
subroutine drotm(N, DX, INCX, DY, INCY, DPARAM)
DROTM
Definition: drotm.f:96
subroutine dcopy(N, DX, INCX, DY, INCY)
DCOPY
Definition: dcopy.f:82
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:79
subroutine daxpy(N, DA, DX, INCX, DY, INCY)
DAXPY
Definition: daxpy.f:89
subroutine dswap(N, DX, INCX, DY, INCY)
DSWAP
Definition: dswap.f:82
subroutine dgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, WORK, LWORK, INFO)
DGESVJ
Definition: dgesvj.f:337
subroutine dgsvj0(JOBV, M, N, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
DGSVJ0 pre-processor for the routine dgesvj.
Definition: dgsvj0.f:218
subroutine dgsvj1(JOBV, M, N, N1, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
DGSVJ1 pre-processor for the routine dgesvj, applies Jacobi rotations targeting only particular pivot...
Definition: dgsvj1.f:236