LAPACK 3.3.0
|
00001 SUBROUTINE SGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, 00002 + LDV, WORK, LWORK, INFO ) 00003 * 00004 * -- LAPACK routine (version 3.3.0) -- 00005 * 00006 * -- Contributed by Zlatko Drmac of the University of Zagreb and -- 00007 * -- Kresimir Veselic of the Fernuniversitaet Hagen -- 00008 * November 2010 00009 * 00010 * -- LAPACK is a software package provided by Univ. of Tennessee, -- 00011 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- 00012 * 00013 * This routine is also part of SIGMA (version 1.23, October 23. 2008.) 00014 * SIGMA is a library of algorithms for highly accurate algorithms for 00015 * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the 00016 * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0. 00017 * 00018 IMPLICIT NONE 00019 * .. 00020 * .. Scalar Arguments .. 00021 INTEGER INFO, LDA, LDV, LWORK, M, MV, N 00022 CHARACTER*1 JOBA, JOBU, JOBV 00023 * .. 00024 * .. Array Arguments .. 00025 REAL A( LDA, * ), SVA( N ), V( LDV, * ), 00026 + WORK( LWORK ) 00027 * .. 00028 * 00029 * Purpose 00030 * ======= 00031 * 00032 * SGESVJ computes the singular value decomposition (SVD) of a real 00033 * M-by-N matrix A, where M >= N. The SVD of A is written as 00034 * [++] [xx] [x0] [xx] 00035 * A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx] 00036 * [++] [xx] 00037 * where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal 00038 * matrix, and V is an N-by-N orthogonal matrix. The diagonal elements 00039 * of SIGMA are the singular values of A. The columns of U and V are the 00040 * left and the right singular vectors of A, respectively. 00041 * 00042 * Further Details 00043 * ~~~~~~~~~~~~~~~ 00044 * The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane 00045 * rotations. The rotations are implemented as fast scaled rotations of 00046 * Anda and Park [1]. In the case of underflow of the Jacobi angle, a 00047 * modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses 00048 * column interchanges of de Rijk [2]. The relative accuracy of the computed 00049 * singular values and the accuracy of the computed singular vectors (in 00050 * angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. 00051 * The condition number that determines the accuracy in the full rank case 00052 * is essentially min_{D=diag} kappa(A*D), where kappa(.) is the 00053 * spectral condition number. The best performance of this Jacobi SVD 00054 * procedure is achieved if used in an accelerated version of Drmac and 00055 * Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. 00056 * Some tunning parameters (marked with [TP]) are available for the 00057 * implementer. 00058 * The computational range for the nonzero singular values is the machine 00059 * number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even 00060 * denormalized singular values can be computed with the corresponding 00061 * gradual loss of accurate digits. 00062 * 00063 * Contributors 00064 * ~~~~~~~~~~~~ 00065 * Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany) 00066 * 00067 * References 00068 * ~~~~~~~~~~ 00069 * [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling. 00070 * SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174. 00071 * [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the 00072 * singular value decomposition on a vector computer. 00073 * SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371. 00074 * [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR. 00075 * [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular 00076 * value computation in floating point arithmetic. 00077 * SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222. 00078 * [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. 00079 * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. 00080 * LAPACK Working note 169. 00081 * [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. 00082 * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. 00083 * LAPACK Working note 170. 00084 * [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, 00085 * QSVD, (H,K)-SVD computations. 00086 * Department of Mathematics, University of Zagreb, 2008. 00087 * 00088 * Bugs, Examples and Comments 00089 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 00090 * Please report all bugs and send interesting test examples and comments to 00091 * drmac@math.hr. Thank you. 00092 * 00093 * Arguments 00094 * ========= 00095 * 00096 * JOBA (input) CHARACTER* 1 00097 * Specifies the structure of A. 00098 * = 'L': The input matrix A is lower triangular; 00099 * = 'U': The input matrix A is upper triangular; 00100 * = 'G': The input matrix A is general M-by-N matrix, M >= N. 00101 * 00102 * JOBU (input) CHARACTER*1 00103 * Specifies whether to compute the left singular vectors 00104 * (columns of U): 00105 * = 'U': The left singular vectors corresponding to the nonzero 00106 * singular values are computed and returned in the leading 00107 * columns of A. See more details in the description of A. 00108 * The default numerical orthogonality threshold is set to 00109 * approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E'). 00110 * = 'C': Analogous to JOBU='U', except that user can control the 00111 * level of numerical orthogonality of the computed left 00112 * singular vectors. TOL can be set to TOL = CTOL*EPS, where 00113 * CTOL is given on input in the array WORK. 00114 * No CTOL smaller than ONE is allowed. CTOL greater 00115 * than 1 / EPS is meaningless. The option 'C' 00116 * can be used if M*EPS is satisfactory orthogonality 00117 * of the computed left singular vectors, so CTOL=M could 00118 * save few sweeps of Jacobi rotations. 00119 * See the descriptions of A and WORK(1). 00120 * = 'N': The matrix U is not computed. However, see the 00121 * description of A. 00122 * 00123 * JOBV (input) CHARACTER*1 00124 * Specifies whether to compute the right singular vectors, that 00125 * is, the matrix V: 00126 * = 'V' : the matrix V is computed and returned in the array V 00127 * = 'A' : the Jacobi rotations are applied to the MV-by-N 00128 * array V. In other words, the right singular vector 00129 * matrix V is not computed explicitly; instead it is 00130 * applied to an MV-by-N matrix initially stored in the 00131 * first MV rows of V. 00132 * = 'N' : the matrix V is not computed and the array V is not 00133 * referenced 00134 * 00135 * M (input) INTEGER 00136 * The number of rows of the input matrix A. M >= 0. 00137 * 00138 * N (input) INTEGER 00139 * The number of columns of the input matrix A. 00140 * M >= N >= 0. 00141 * 00142 * A (input/output) REAL array, dimension (LDA,N) 00143 * On entry, the M-by-N matrix A. 00144 * On exit, 00145 * If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C': 00146 * If INFO .EQ. 0 : 00147 * RANKA orthonormal columns of U are returned in the 00148 * leading RANKA columns of the array A. Here RANKA <= N 00149 * is the number of computed singular values of A that are 00150 * above the underflow threshold SLAMCH('S'). The singular 00151 * vectors corresponding to underflowed or zero singular 00152 * values are not computed. The value of RANKA is returned 00153 * in the array WORK as RANKA=NINT(WORK(2)). Also see the 00154 * descriptions of SVA and WORK. The computed columns of U 00155 * are mutually numerically orthogonal up to approximately 00156 * TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'), 00157 * see the description of JOBU. 00158 * If INFO .GT. 0, 00159 * the procedure SGESVJ did not converge in the given number 00160 * of iterations (sweeps). In that case, the computed 00161 * columns of U may not be orthogonal up to TOL. The output 00162 * U (stored in A), SIGMA (given by the computed singular 00163 * values in SVA(1:N)) and V is still a decomposition of the 00164 * input matrix A in the sense that the residual 00165 * ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small. 00166 * If JOBU .EQ. 'N': 00167 * If INFO .EQ. 0 : 00168 * Note that the left singular vectors are 'for free' in the 00169 * one-sided Jacobi SVD algorithm. However, if only the 00170 * singular values are needed, the level of numerical 00171 * orthogonality of U is not an issue and iterations are 00172 * stopped when the columns of the iterated matrix are 00173 * numerically orthogonal up to approximately M*EPS. Thus, 00174 * on exit, A contains the columns of U scaled with the 00175 * corresponding singular values. 00176 * If INFO .GT. 0 : 00177 * the procedure SGESVJ did not converge in the given number 00178 * of iterations (sweeps). 00179 * 00180 * LDA (input) INTEGER 00181 * The leading dimension of the array A. LDA >= max(1,M). 00182 * 00183 * SVA (workspace/output) REAL array, dimension (N) 00184 * On exit, 00185 * If INFO .EQ. 0 : 00186 * depending on the value SCALE = WORK(1), we have: 00187 * If SCALE .EQ. ONE: 00188 * SVA(1:N) contains the computed singular values of A. 00189 * During the computation SVA contains the Euclidean column 00190 * norms of the iterated matrices in the array A. 00191 * If SCALE .NE. ONE: 00192 * The singular values of A are SCALE*SVA(1:N), and this 00193 * factored representation is due to the fact that some of the 00194 * singular values of A might underflow or overflow. 00195 * 00196 * If INFO .GT. 0 : 00197 * the procedure SGESVJ did not converge in the given number of 00198 * iterations (sweeps) and SCALE*SVA(1:N) may not be accurate. 00199 * 00200 * MV (input) INTEGER 00201 * If JOBV .EQ. 'A', then the product of Jacobi rotations in SGESVJ 00202 * is applied to the first MV rows of V. See the description of JOBV. 00203 * 00204 * V (input/output) REAL array, dimension (LDV,N) 00205 * If JOBV = 'V', then V contains on exit the N-by-N matrix of 00206 * the right singular vectors; 00207 * If JOBV = 'A', then V contains the product of the computed right 00208 * singular vector matrix and the initial matrix in 00209 * the array V. 00210 * If JOBV = 'N', then V is not referenced. 00211 * 00212 * LDV (input) INTEGER 00213 * The leading dimension of the array V, LDV .GE. 1. 00214 * If JOBV .EQ. 'V', then LDV .GE. max(1,N). 00215 * If JOBV .EQ. 'A', then LDV .GE. max(1,MV) . 00216 * 00217 * WORK (input/workspace/output) REAL array, dimension max(4,M+N). 00218 * On entry, 00219 * If JOBU .EQ. 'C' : 00220 * WORK(1) = CTOL, where CTOL defines the threshold for convergence. 00221 * The process stops if all columns of A are mutually 00222 * orthogonal up to CTOL*EPS, EPS=SLAMCH('E'). 00223 * It is required that CTOL >= ONE, i.e. it is not 00224 * allowed to force the routine to obtain orthogonality 00225 * below EPSILON. 00226 * On exit, 00227 * WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N) 00228 * are the computed singular vcalues of A. 00229 * (See description of SVA().) 00230 * WORK(2) = NINT(WORK(2)) is the number of the computed nonzero 00231 * singular values. 00232 * WORK(3) = NINT(WORK(3)) is the number of the computed singular 00233 * values that are larger than the underflow threshold. 00234 * WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi 00235 * rotations needed for numerical convergence. 00236 * WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep. 00237 * This is useful information in cases when SGESVJ did 00238 * not converge, as it can be used to estimate whether 00239 * the output is stil useful and for post festum analysis. 00240 * WORK(6) = the largest absolute value over all sines of the 00241 * Jacobi rotation angles in the last sweep. It can be 00242 * useful for a post festum analysis. 00243 * 00244 * LWORK length of WORK, WORK >= MAX(6,M+N) 00245 * 00246 * INFO (output) INTEGER 00247 * = 0 : successful exit. 00248 * < 0 : if INFO = -i, then the i-th argument had an illegal value 00249 * > 0 : SGESVJ did not converge in the maximal allowed number (30) 00250 * of sweeps. The output may still be useful. See the 00251 * description of WORK. 00252 * ===================================================================== 00253 * 00254 * .. Local Parameters .. 00255 REAL ZERO, HALF, ONE, TWO 00256 PARAMETER ( ZERO = 0.0E0, HALF = 0.5E0, ONE = 1.0E0, 00257 + TWO = 2.0E0 ) 00258 INTEGER NSWEEP 00259 PARAMETER ( NSWEEP = 30 ) 00260 * .. 00261 * .. Local Scalars .. 00262 REAL AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG, 00263 + BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ, 00264 + MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL, 00265 + SKL, SFMIN, SMALL, SN, T, TEMP1, THETA, 00266 + THSIGN, TOL 00267 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1, 00268 + ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34, 00269 + N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP, 00270 + SWBAND 00271 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK, 00272 + RSVEC, UCTOL, UPPER 00273 * .. 00274 * .. Local Arrays .. 00275 REAL FASTR( 5 ) 00276 * .. 00277 * .. Intrinsic Functions .. 00278 INTRINSIC ABS, AMAX1, AMIN1, FLOAT, MIN0, SIGN, SQRT 00279 * .. 00280 * .. External Functions .. 00281 * from BLAS 00282 REAL SDOT, SNRM2 00283 EXTERNAL SDOT, SNRM2 00284 INTEGER ISAMAX 00285 EXTERNAL ISAMAX 00286 * from LAPACK 00287 REAL SLAMCH 00288 EXTERNAL SLAMCH 00289 LOGICAL LSAME 00290 EXTERNAL LSAME 00291 * .. 00292 * .. External Subroutines .. 00293 * from BLAS 00294 EXTERNAL SAXPY, SCOPY, SROTM, SSCAL, SSWAP 00295 * from LAPACK 00296 EXTERNAL SLASCL, SLASET, SLASSQ, XERBLA 00297 * 00298 EXTERNAL SGSVJ0, SGSVJ1 00299 * .. 00300 * .. Executable Statements .. 00301 * 00302 * Test the input arguments 00303 * 00304 LSVEC = LSAME( JOBU, 'U' ) 00305 UCTOL = LSAME( JOBU, 'C' ) 00306 RSVEC = LSAME( JOBV, 'V' ) 00307 APPLV = LSAME( JOBV, 'A' ) 00308 UPPER = LSAME( JOBA, 'U' ) 00309 LOWER = LSAME( JOBA, 'L' ) 00310 * 00311 IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN 00312 INFO = -1 00313 ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN 00314 INFO = -2 00315 ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN 00316 INFO = -3 00317 ELSE IF( M.LT.0 ) THEN 00318 INFO = -4 00319 ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN 00320 INFO = -5 00321 ELSE IF( LDA.LT.M ) THEN 00322 INFO = -7 00323 ELSE IF( MV.LT.0 ) THEN 00324 INFO = -9 00325 ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR. 00326 + ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN 00327 INFO = -11 00328 ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN 00329 INFO = -12 00330 ELSE IF( LWORK.LT.MAX0( M+N, 6 ) ) THEN 00331 INFO = -13 00332 ELSE 00333 INFO = 0 00334 END IF 00335 * 00336 * #:( 00337 IF( INFO.NE.0 ) THEN 00338 CALL XERBLA( 'SGESVJ', -INFO ) 00339 RETURN 00340 END IF 00341 * 00342 * #:) Quick return for void matrix 00343 * 00344 IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN 00345 * 00346 * Set numerical parameters 00347 * The stopping criterion for Jacobi rotations is 00348 * 00349 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS 00350 * 00351 * where EPS is the round-off and CTOL is defined as follows: 00352 * 00353 IF( UCTOL ) THEN 00354 * ... user controlled 00355 CTOL = WORK( 1 ) 00356 ELSE 00357 * ... default 00358 IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN 00359 CTOL = SQRT( FLOAT( M ) ) 00360 ELSE 00361 CTOL = FLOAT( M ) 00362 END IF 00363 END IF 00364 * ... and the machine dependent parameters are 00365 *[!] (Make sure that SLAMCH() works properly on the target machine.) 00366 * 00367 EPSLN = SLAMCH( 'Epsilon' ) 00368 ROOTEPS = SQRT( EPSLN ) 00369 SFMIN = SLAMCH( 'SafeMinimum' ) 00370 ROOTSFMIN = SQRT( SFMIN ) 00371 SMALL = SFMIN / EPSLN 00372 BIG = SLAMCH( 'Overflow' ) 00373 ROOTBIG = ONE / ROOTSFMIN 00374 LARGE = BIG / SQRT( FLOAT( M*N ) ) 00375 BIGTHETA = ONE / ROOTEPS 00376 * 00377 TOL = CTOL*EPSLN 00378 ROOTTOL = SQRT( TOL ) 00379 * 00380 IF( FLOAT( M )*EPSLN.GE.ONE ) THEN 00381 INFO = -5 00382 CALL XERBLA( 'SGESVJ', -INFO ) 00383 RETURN 00384 END IF 00385 * 00386 * Initialize the right singular vector matrix. 00387 * 00388 IF( RSVEC ) THEN 00389 MVL = N 00390 CALL SLASET( 'A', MVL, N, ZERO, ONE, V, LDV ) 00391 ELSE IF( APPLV ) THEN 00392 MVL = MV 00393 END IF 00394 RSVEC = RSVEC .OR. APPLV 00395 * 00396 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N ) 00397 *(!) If necessary, scale A to protect the largest singular value 00398 * from overflow. It is possible that saving the largest singular 00399 * value destroys the information about the small ones. 00400 * This initial scaling is almost minimal in the sense that the 00401 * goal is to make sure that no column norm overflows, and that 00402 * SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries 00403 * in A are detected, the procedure returns with INFO=-6. 00404 * 00405 SKL = ONE / SQRT( FLOAT( M )*FLOAT( N ) ) 00406 NOSCALE = .TRUE. 00407 GOSCALE = .TRUE. 00408 * 00409 IF( LOWER ) THEN 00410 * the input matrix is M-by-N lower triangular (trapezoidal) 00411 DO 1874 p = 1, N 00412 AAPP = ZERO 00413 AAQQ = ONE 00414 CALL SLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ ) 00415 IF( AAPP.GT.BIG ) THEN 00416 INFO = -6 00417 CALL XERBLA( 'SGESVJ', -INFO ) 00418 RETURN 00419 END IF 00420 AAQQ = SQRT( AAQQ ) 00421 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 00422 SVA( p ) = AAPP*AAQQ 00423 ELSE 00424 NOSCALE = .FALSE. 00425 SVA( p ) = AAPP*( AAQQ*SKL ) 00426 IF( GOSCALE ) THEN 00427 GOSCALE = .FALSE. 00428 DO 1873 q = 1, p - 1 00429 SVA( q ) = SVA( q )*SKL 00430 1873 CONTINUE 00431 END IF 00432 END IF 00433 1874 CONTINUE 00434 ELSE IF( UPPER ) THEN 00435 * the input matrix is M-by-N upper triangular (trapezoidal) 00436 DO 2874 p = 1, N 00437 AAPP = ZERO 00438 AAQQ = ONE 00439 CALL SLASSQ( p, A( 1, p ), 1, AAPP, AAQQ ) 00440 IF( AAPP.GT.BIG ) THEN 00441 INFO = -6 00442 CALL XERBLA( 'SGESVJ', -INFO ) 00443 RETURN 00444 END IF 00445 AAQQ = SQRT( AAQQ ) 00446 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 00447 SVA( p ) = AAPP*AAQQ 00448 ELSE 00449 NOSCALE = .FALSE. 00450 SVA( p ) = AAPP*( AAQQ*SKL ) 00451 IF( GOSCALE ) THEN 00452 GOSCALE = .FALSE. 00453 DO 2873 q = 1, p - 1 00454 SVA( q ) = SVA( q )*SKL 00455 2873 CONTINUE 00456 END IF 00457 END IF 00458 2874 CONTINUE 00459 ELSE 00460 * the input matrix is M-by-N general dense 00461 DO 3874 p = 1, N 00462 AAPP = ZERO 00463 AAQQ = ONE 00464 CALL SLASSQ( M, A( 1, p ), 1, AAPP, AAQQ ) 00465 IF( AAPP.GT.BIG ) THEN 00466 INFO = -6 00467 CALL XERBLA( 'SGESVJ', -INFO ) 00468 RETURN 00469 END IF 00470 AAQQ = SQRT( AAQQ ) 00471 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 00472 SVA( p ) = AAPP*AAQQ 00473 ELSE 00474 NOSCALE = .FALSE. 00475 SVA( p ) = AAPP*( AAQQ*SKL ) 00476 IF( GOSCALE ) THEN 00477 GOSCALE = .FALSE. 00478 DO 3873 q = 1, p - 1 00479 SVA( q ) = SVA( q )*SKL 00480 3873 CONTINUE 00481 END IF 00482 END IF 00483 3874 CONTINUE 00484 END IF 00485 * 00486 IF( NOSCALE )SKL = ONE 00487 * 00488 * Move the smaller part of the spectrum from the underflow threshold 00489 *(!) Start by determining the position of the nonzero entries of the 00490 * array SVA() relative to ( SFMIN, BIG ). 00491 * 00492 AAPP = ZERO 00493 AAQQ = BIG 00494 DO 4781 p = 1, N 00495 IF( SVA( p ).NE.ZERO )AAQQ = AMIN1( AAQQ, SVA( p ) ) 00496 AAPP = AMAX1( AAPP, SVA( p ) ) 00497 4781 CONTINUE 00498 * 00499 * #:) Quick return for zero matrix 00500 * 00501 IF( AAPP.EQ.ZERO ) THEN 00502 IF( LSVEC )CALL SLASET( 'G', M, N, ZERO, ONE, A, LDA ) 00503 WORK( 1 ) = ONE 00504 WORK( 2 ) = ZERO 00505 WORK( 3 ) = ZERO 00506 WORK( 4 ) = ZERO 00507 WORK( 5 ) = ZERO 00508 WORK( 6 ) = ZERO 00509 RETURN 00510 END IF 00511 * 00512 * #:) Quick return for one-column matrix 00513 * 00514 IF( N.EQ.1 ) THEN 00515 IF( LSVEC )CALL SLASCL( 'G', 0, 0, SVA( 1 ), SKL, M, 1, 00516 + A( 1, 1 ), LDA, IERR ) 00517 WORK( 1 ) = ONE / SKL 00518 IF( SVA( 1 ).GE.SFMIN ) THEN 00519 WORK( 2 ) = ONE 00520 ELSE 00521 WORK( 2 ) = ZERO 00522 END IF 00523 WORK( 3 ) = ZERO 00524 WORK( 4 ) = ZERO 00525 WORK( 5 ) = ZERO 00526 WORK( 6 ) = ZERO 00527 RETURN 00528 END IF 00529 * 00530 * Protect small singular values from underflow, and try to 00531 * avoid underflows/overflows in computing Jacobi rotations. 00532 * 00533 SN = SQRT( SFMIN / EPSLN ) 00534 TEMP1 = SQRT( BIG / FLOAT( N ) ) 00535 IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR. 00536 + ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN 00537 TEMP1 = AMIN1( BIG, TEMP1 / AAPP ) 00538 * AAQQ = AAQQ*TEMP1 00539 * AAPP = AAPP*TEMP1 00540 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN 00541 TEMP1 = AMIN1( SN / AAQQ, BIG / ( AAPP*SQRT( FLOAT( N ) ) ) ) 00542 * AAQQ = AAQQ*TEMP1 00543 * AAPP = AAPP*TEMP1 00544 ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 00545 TEMP1 = AMAX1( SN / AAQQ, TEMP1 / AAPP ) 00546 * AAQQ = AAQQ*TEMP1 00547 * AAPP = AAPP*TEMP1 00548 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 00549 TEMP1 = AMIN1( SN / AAQQ, BIG / ( SQRT( FLOAT( N ) )*AAPP ) ) 00550 * AAQQ = AAQQ*TEMP1 00551 * AAPP = AAPP*TEMP1 00552 ELSE 00553 TEMP1 = ONE 00554 END IF 00555 * 00556 * Scale, if necessary 00557 * 00558 IF( TEMP1.NE.ONE ) THEN 00559 CALL SLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR ) 00560 END IF 00561 SKL = TEMP1*SKL 00562 IF( SKL.NE.ONE ) THEN 00563 CALL SLASCL( JOBA, 0, 0, ONE, SKL, M, N, A, LDA, IERR ) 00564 SKL = ONE / SKL 00565 END IF 00566 * 00567 * Row-cyclic Jacobi SVD algorithm with column pivoting 00568 * 00569 EMPTSW = ( N*( N-1 ) ) / 2 00570 NOTROT = 0 00571 FASTR( 1 ) = ZERO 00572 * 00573 * A is represented in factored form A = A * diag(WORK), where diag(WORK) 00574 * is initialized to identity. WORK is updated during fast scaled 00575 * rotations. 00576 * 00577 DO 1868 q = 1, N 00578 WORK( q ) = ONE 00579 1868 CONTINUE 00580 * 00581 * 00582 SWBAND = 3 00583 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective 00584 * if SGESVJ is used as a computational routine in the preconditioned 00585 * Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure 00586 * works on pivots inside a band-like region around the diagonal. 00587 * The boundaries are determined dynamically, based on the number of 00588 * pivots above a threshold. 00589 * 00590 KBL = MIN0( 8, N ) 00591 *[TP] KBL is a tuning parameter that defines the tile size in the 00592 * tiling of the p-q loops of pivot pairs. In general, an optimal 00593 * value of KBL depends on the matrix dimensions and on the 00594 * parameters of the computer's memory. 00595 * 00596 NBL = N / KBL 00597 IF( ( NBL*KBL ).NE.N )NBL = NBL + 1 00598 * 00599 BLSKIP = KBL**2 00600 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL. 00601 * 00602 ROWSKIP = MIN0( 5, KBL ) 00603 *[TP] ROWSKIP is a tuning parameter. 00604 * 00605 LKAHEAD = 1 00606 *[TP] LKAHEAD is a tuning parameter. 00607 * 00608 * Quasi block transformations, using the lower (upper) triangular 00609 * structure of the input matrix. The quasi-block-cycling usually 00610 * invokes cubic convergence. Big part of this cycle is done inside 00611 * canonical subspaces of dimensions less than M. 00612 * 00613 IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX0( 64, 4*KBL ) ) ) THEN 00614 *[TP] The number of partition levels and the actual partition are 00615 * tuning parameters. 00616 N4 = N / 4 00617 N2 = N / 2 00618 N34 = 3*N4 00619 IF( APPLV ) THEN 00620 q = 0 00621 ELSE 00622 q = 1 00623 END IF 00624 * 00625 IF( LOWER ) THEN 00626 * 00627 * This works very well on lower triangular matrices, in particular 00628 * in the framework of the preconditioned Jacobi SVD (xGEJSV). 00629 * The idea is simple: 00630 * [+ 0 0 0] Note that Jacobi transformations of [0 0] 00631 * [+ + 0 0] [0 0] 00632 * [+ + x 0] actually work on [x 0] [x 0] 00633 * [+ + x x] [x x]. [x x] 00634 * 00635 CALL SGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA, 00636 + WORK( N34+1 ), SVA( N34+1 ), MVL, 00637 + V( N34*q+1, N34+1 ), LDV, EPSLN, SFMIN, TOL, 00638 + 2, WORK( N+1 ), LWORK-N, IERR ) 00639 * 00640 CALL SGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA, 00641 + WORK( N2+1 ), SVA( N2+1 ), MVL, 00642 + V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 2, 00643 + WORK( N+1 ), LWORK-N, IERR ) 00644 * 00645 CALL SGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA, 00646 + WORK( N2+1 ), SVA( N2+1 ), MVL, 00647 + V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 00648 + WORK( N+1 ), LWORK-N, IERR ) 00649 * 00650 CALL SGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA, 00651 + WORK( N4+1 ), SVA( N4+1 ), MVL, 00652 + V( N4*q+1, N4+1 ), LDV, EPSLN, SFMIN, TOL, 1, 00653 + WORK( N+1 ), LWORK-N, IERR ) 00654 * 00655 CALL SGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV, 00656 + EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 00657 + IERR ) 00658 * 00659 CALL SGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V, 00660 + LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 00661 + LWORK-N, IERR ) 00662 * 00663 * 00664 ELSE IF( UPPER ) THEN 00665 * 00666 * 00667 CALL SGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV, 00668 + EPSLN, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N, 00669 + IERR ) 00670 * 00671 CALL SGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ), 00672 + SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV, 00673 + EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 00674 + IERR ) 00675 * 00676 CALL SGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V, 00677 + LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 00678 + LWORK-N, IERR ) 00679 * 00680 CALL SGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA, 00681 + WORK( N2+1 ), SVA( N2+1 ), MVL, 00682 + V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 00683 + WORK( N+1 ), LWORK-N, IERR ) 00684 00685 END IF 00686 * 00687 END IF 00688 * 00689 * .. Row-cyclic pivot strategy with de Rijk's pivoting .. 00690 * 00691 DO 1993 i = 1, NSWEEP 00692 * .. go go go ... 00693 * 00694 MXAAPQ = ZERO 00695 MXSINJ = ZERO 00696 ISWROT = 0 00697 * 00698 NOTROT = 0 00699 PSKIPPED = 0 00700 * 00701 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs 00702 * 1 <= p < q <= N. This is the first step toward a blocked implementation 00703 * of the rotations. New implementation, based on block transformations, 00704 * is under development. 00705 * 00706 DO 2000 ibr = 1, NBL 00707 * 00708 igl = ( ibr-1 )*KBL + 1 00709 * 00710 DO 1002 ir1 = 0, MIN0( LKAHEAD, NBL-ibr ) 00711 * 00712 igl = igl + ir1*KBL 00713 * 00714 DO 2001 p = igl, MIN0( igl+KBL-1, N-1 ) 00715 * 00716 * .. de Rijk's pivoting 00717 * 00718 q = ISAMAX( N-p+1, SVA( p ), 1 ) + p - 1 00719 IF( p.NE.q ) THEN 00720 CALL SSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 00721 IF( RSVEC )CALL SSWAP( MVL, V( 1, p ), 1, 00722 + V( 1, q ), 1 ) 00723 TEMP1 = SVA( p ) 00724 SVA( p ) = SVA( q ) 00725 SVA( q ) = TEMP1 00726 TEMP1 = WORK( p ) 00727 WORK( p ) = WORK( q ) 00728 WORK( q ) = TEMP1 00729 END IF 00730 * 00731 IF( ir1.EQ.0 ) THEN 00732 * 00733 * Column norms are periodically updated by explicit 00734 * norm computation. 00735 * Caveat: 00736 * Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1) 00737 * as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to 00738 * overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to 00739 * underflow for ||A(:,p)||_2 < SQRT(underflow_threshold). 00740 * Hence, SNRM2 cannot be trusted, not even in the case when 00741 * the true norm is far from the under(over)flow boundaries. 00742 * If properly implemented SNRM2 is available, the IF-THEN-ELSE 00743 * below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)". 00744 * 00745 IF( ( SVA( p ).LT.ROOTBIG ) .AND. 00746 + ( SVA( p ).GT.ROOTSFMIN ) ) THEN 00747 SVA( p ) = SNRM2( M, A( 1, p ), 1 )*WORK( p ) 00748 ELSE 00749 TEMP1 = ZERO 00750 AAPP = ONE 00751 CALL SLASSQ( M, A( 1, p ), 1, TEMP1, AAPP ) 00752 SVA( p ) = TEMP1*SQRT( AAPP )*WORK( p ) 00753 END IF 00754 AAPP = SVA( p ) 00755 ELSE 00756 AAPP = SVA( p ) 00757 END IF 00758 * 00759 IF( AAPP.GT.ZERO ) THEN 00760 * 00761 PSKIPPED = 0 00762 * 00763 DO 2002 q = p + 1, MIN0( igl+KBL-1, N ) 00764 * 00765 AAQQ = SVA( q ) 00766 * 00767 IF( AAQQ.GT.ZERO ) THEN 00768 * 00769 AAPP0 = AAPP 00770 IF( AAQQ.GE.ONE ) THEN 00771 ROTOK = ( SMALL*AAPP ).LE.AAQQ 00772 IF( AAPP.LT.( BIG / AAQQ ) ) THEN 00773 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 00774 + q ), 1 )*WORK( p )*WORK( q ) / 00775 + AAQQ ) / AAPP 00776 ELSE 00777 CALL SCOPY( M, A( 1, p ), 1, 00778 + WORK( N+1 ), 1 ) 00779 CALL SLASCL( 'G', 0, 0, AAPP, 00780 + WORK( p ), M, 1, 00781 + WORK( N+1 ), LDA, IERR ) 00782 AAPQ = SDOT( M, WORK( N+1 ), 1, 00783 + A( 1, q ), 1 )*WORK( q ) / AAQQ 00784 END IF 00785 ELSE 00786 ROTOK = AAPP.LE.( AAQQ / SMALL ) 00787 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 00788 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 00789 + q ), 1 )*WORK( p )*WORK( q ) / 00790 + AAQQ ) / AAPP 00791 ELSE 00792 CALL SCOPY( M, A( 1, q ), 1, 00793 + WORK( N+1 ), 1 ) 00794 CALL SLASCL( 'G', 0, 0, AAQQ, 00795 + WORK( q ), M, 1, 00796 + WORK( N+1 ), LDA, IERR ) 00797 AAPQ = SDOT( M, WORK( N+1 ), 1, 00798 + A( 1, p ), 1 )*WORK( p ) / AAPP 00799 END IF 00800 END IF 00801 * 00802 MXAAPQ = AMAX1( MXAAPQ, ABS( AAPQ ) ) 00803 * 00804 * TO rotate or NOT to rotate, THAT is the question ... 00805 * 00806 IF( ABS( AAPQ ).GT.TOL ) THEN 00807 * 00808 * .. rotate 00809 *[RTD] ROTATED = ROTATED + ONE 00810 * 00811 IF( ir1.EQ.0 ) THEN 00812 NOTROT = 0 00813 PSKIPPED = 0 00814 ISWROT = ISWROT + 1 00815 END IF 00816 * 00817 IF( ROTOK ) THEN 00818 * 00819 AQOAP = AAQQ / AAPP 00820 APOAQ = AAPP / AAQQ 00821 THETA = -HALF*ABS( AQOAP-APOAQ ) / AAPQ 00822 * 00823 IF( ABS( THETA ).GT.BIGTHETA ) THEN 00824 * 00825 T = HALF / THETA 00826 FASTR( 3 ) = T*WORK( p ) / WORK( q ) 00827 FASTR( 4 ) = -T*WORK( q ) / 00828 + WORK( p ) 00829 CALL SROTM( M, A( 1, p ), 1, 00830 + A( 1, q ), 1, FASTR ) 00831 IF( RSVEC )CALL SROTM( MVL, 00832 + V( 1, p ), 1, 00833 + V( 1, q ), 1, 00834 + FASTR ) 00835 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 00836 + ONE+T*APOAQ*AAPQ ) ) 00837 AAPP = AAPP*SQRT( AMAX1( ZERO, 00838 + ONE-T*AQOAP*AAPQ ) ) 00839 MXSINJ = AMAX1( MXSINJ, ABS( T ) ) 00840 * 00841 ELSE 00842 * 00843 * .. choose correct signum for THETA and rotate 00844 * 00845 THSIGN = -SIGN( ONE, AAPQ ) 00846 T = ONE / ( THETA+THSIGN* 00847 + SQRT( ONE+THETA*THETA ) ) 00848 CS = SQRT( ONE / ( ONE+T*T ) ) 00849 SN = T*CS 00850 * 00851 MXSINJ = AMAX1( MXSINJ, ABS( SN ) ) 00852 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 00853 + ONE+T*APOAQ*AAPQ ) ) 00854 AAPP = AAPP*SQRT( AMAX1( ZERO, 00855 + ONE-T*AQOAP*AAPQ ) ) 00856 * 00857 APOAQ = WORK( p ) / WORK( q ) 00858 AQOAP = WORK( q ) / WORK( p ) 00859 IF( WORK( p ).GE.ONE ) THEN 00860 IF( WORK( q ).GE.ONE ) THEN 00861 FASTR( 3 ) = T*APOAQ 00862 FASTR( 4 ) = -T*AQOAP 00863 WORK( p ) = WORK( p )*CS 00864 WORK( q ) = WORK( q )*CS 00865 CALL SROTM( M, A( 1, p ), 1, 00866 + A( 1, q ), 1, 00867 + FASTR ) 00868 IF( RSVEC )CALL SROTM( MVL, 00869 + V( 1, p ), 1, V( 1, q ), 00870 + 1, FASTR ) 00871 ELSE 00872 CALL SAXPY( M, -T*AQOAP, 00873 + A( 1, q ), 1, 00874 + A( 1, p ), 1 ) 00875 CALL SAXPY( M, CS*SN*APOAQ, 00876 + A( 1, p ), 1, 00877 + A( 1, q ), 1 ) 00878 WORK( p ) = WORK( p )*CS 00879 WORK( q ) = WORK( q ) / CS 00880 IF( RSVEC ) THEN 00881 CALL SAXPY( MVL, -T*AQOAP, 00882 + V( 1, q ), 1, 00883 + V( 1, p ), 1 ) 00884 CALL SAXPY( MVL, 00885 + CS*SN*APOAQ, 00886 + V( 1, p ), 1, 00887 + V( 1, q ), 1 ) 00888 END IF 00889 END IF 00890 ELSE 00891 IF( WORK( q ).GE.ONE ) THEN 00892 CALL SAXPY( M, T*APOAQ, 00893 + A( 1, p ), 1, 00894 + A( 1, q ), 1 ) 00895 CALL SAXPY( M, -CS*SN*AQOAP, 00896 + A( 1, q ), 1, 00897 + A( 1, p ), 1 ) 00898 WORK( p ) = WORK( p ) / CS 00899 WORK( q ) = WORK( q )*CS 00900 IF( RSVEC ) THEN 00901 CALL SAXPY( MVL, T*APOAQ, 00902 + V( 1, p ), 1, 00903 + V( 1, q ), 1 ) 00904 CALL SAXPY( MVL, 00905 + -CS*SN*AQOAP, 00906 + V( 1, q ), 1, 00907 + V( 1, p ), 1 ) 00908 END IF 00909 ELSE 00910 IF( WORK( p ).GE.WORK( q ) ) 00911 + THEN 00912 CALL SAXPY( M, -T*AQOAP, 00913 + A( 1, q ), 1, 00914 + A( 1, p ), 1 ) 00915 CALL SAXPY( M, CS*SN*APOAQ, 00916 + A( 1, p ), 1, 00917 + A( 1, q ), 1 ) 00918 WORK( p ) = WORK( p )*CS 00919 WORK( q ) = WORK( q ) / CS 00920 IF( RSVEC ) THEN 00921 CALL SAXPY( MVL, 00922 + -T*AQOAP, 00923 + V( 1, q ), 1, 00924 + V( 1, p ), 1 ) 00925 CALL SAXPY( MVL, 00926 + CS*SN*APOAQ, 00927 + V( 1, p ), 1, 00928 + V( 1, q ), 1 ) 00929 END IF 00930 ELSE 00931 CALL SAXPY( M, T*APOAQ, 00932 + A( 1, p ), 1, 00933 + A( 1, q ), 1 ) 00934 CALL SAXPY( M, 00935 + -CS*SN*AQOAP, 00936 + A( 1, q ), 1, 00937 + A( 1, p ), 1 ) 00938 WORK( p ) = WORK( p ) / CS 00939 WORK( q ) = WORK( q )*CS 00940 IF( RSVEC ) THEN 00941 CALL SAXPY( MVL, 00942 + T*APOAQ, V( 1, p ), 00943 + 1, V( 1, q ), 1 ) 00944 CALL SAXPY( MVL, 00945 + -CS*SN*AQOAP, 00946 + V( 1, q ), 1, 00947 + V( 1, p ), 1 ) 00948 END IF 00949 END IF 00950 END IF 00951 END IF 00952 END IF 00953 * 00954 ELSE 00955 * .. have to use modified Gram-Schmidt like transformation 00956 CALL SCOPY( M, A( 1, p ), 1, 00957 + WORK( N+1 ), 1 ) 00958 CALL SLASCL( 'G', 0, 0, AAPP, ONE, M, 00959 + 1, WORK( N+1 ), LDA, 00960 + IERR ) 00961 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, M, 00962 + 1, A( 1, q ), LDA, IERR ) 00963 TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 00964 CALL SAXPY( M, TEMP1, WORK( N+1 ), 1, 00965 + A( 1, q ), 1 ) 00966 CALL SLASCL( 'G', 0, 0, ONE, AAQQ, M, 00967 + 1, A( 1, q ), LDA, IERR ) 00968 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 00969 + ONE-AAPQ*AAPQ ) ) 00970 MXSINJ = AMAX1( MXSINJ, SFMIN ) 00971 END IF 00972 * END IF ROTOK THEN ... ELSE 00973 * 00974 * In the case of cancellation in updating SVA(q), SVA(p) 00975 * recompute SVA(q), SVA(p). 00976 * 00977 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 00978 + THEN 00979 IF( ( AAQQ.LT.ROOTBIG ) .AND. 00980 + ( AAQQ.GT.ROOTSFMIN ) ) THEN 00981 SVA( q ) = SNRM2( M, A( 1, q ), 1 )* 00982 + WORK( q ) 00983 ELSE 00984 T = ZERO 00985 AAQQ = ONE 00986 CALL SLASSQ( M, A( 1, q ), 1, T, 00987 + AAQQ ) 00988 SVA( q ) = T*SQRT( AAQQ )*WORK( q ) 00989 END IF 00990 END IF 00991 IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN 00992 IF( ( AAPP.LT.ROOTBIG ) .AND. 00993 + ( AAPP.GT.ROOTSFMIN ) ) THEN 00994 AAPP = SNRM2( M, A( 1, p ), 1 )* 00995 + WORK( p ) 00996 ELSE 00997 T = ZERO 00998 AAPP = ONE 00999 CALL SLASSQ( M, A( 1, p ), 1, T, 01000 + AAPP ) 01001 AAPP = T*SQRT( AAPP )*WORK( p ) 01002 END IF 01003 SVA( p ) = AAPP 01004 END IF 01005 * 01006 ELSE 01007 * A(:,p) and A(:,q) already numerically orthogonal 01008 IF( ir1.EQ.0 )NOTROT = NOTROT + 1 01009 *[RTD] SKIPPED = SKIPPED + 1 01010 PSKIPPED = PSKIPPED + 1 01011 END IF 01012 ELSE 01013 * A(:,q) is zero column 01014 IF( ir1.EQ.0 )NOTROT = NOTROT + 1 01015 PSKIPPED = PSKIPPED + 1 01016 END IF 01017 * 01018 IF( ( i.LE.SWBAND ) .AND. 01019 + ( PSKIPPED.GT.ROWSKIP ) ) THEN 01020 IF( ir1.EQ.0 )AAPP = -AAPP 01021 NOTROT = 0 01022 GO TO 2103 01023 END IF 01024 * 01025 2002 CONTINUE 01026 * END q-LOOP 01027 * 01028 2103 CONTINUE 01029 * bailed out of q-loop 01030 * 01031 SVA( p ) = AAPP 01032 * 01033 ELSE 01034 SVA( p ) = AAPP 01035 IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) ) 01036 + NOTROT = NOTROT + MIN0( igl+KBL-1, N ) - p 01037 END IF 01038 * 01039 2001 CONTINUE 01040 * end of the p-loop 01041 * end of doing the block ( ibr, ibr ) 01042 1002 CONTINUE 01043 * end of ir1-loop 01044 * 01045 * ... go to the off diagonal blocks 01046 * 01047 igl = ( ibr-1 )*KBL + 1 01048 * 01049 DO 2010 jbc = ibr + 1, NBL 01050 * 01051 jgl = ( jbc-1 )*KBL + 1 01052 * 01053 * doing the block at ( ibr, jbc ) 01054 * 01055 IJBLSK = 0 01056 DO 2100 p = igl, MIN0( igl+KBL-1, N ) 01057 * 01058 AAPP = SVA( p ) 01059 IF( AAPP.GT.ZERO ) THEN 01060 * 01061 PSKIPPED = 0 01062 * 01063 DO 2200 q = jgl, MIN0( jgl+KBL-1, N ) 01064 * 01065 AAQQ = SVA( q ) 01066 IF( AAQQ.GT.ZERO ) THEN 01067 AAPP0 = AAPP 01068 * 01069 * .. M x 2 Jacobi SVD .. 01070 * 01071 * Safe Gram matrix computation 01072 * 01073 IF( AAQQ.GE.ONE ) THEN 01074 IF( AAPP.GE.AAQQ ) THEN 01075 ROTOK = ( SMALL*AAPP ).LE.AAQQ 01076 ELSE 01077 ROTOK = ( SMALL*AAQQ ).LE.AAPP 01078 END IF 01079 IF( AAPP.LT.( BIG / AAQQ ) ) THEN 01080 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 01081 + q ), 1 )*WORK( p )*WORK( q ) / 01082 + AAQQ ) / AAPP 01083 ELSE 01084 CALL SCOPY( M, A( 1, p ), 1, 01085 + WORK( N+1 ), 1 ) 01086 CALL SLASCL( 'G', 0, 0, AAPP, 01087 + WORK( p ), M, 1, 01088 + WORK( N+1 ), LDA, IERR ) 01089 AAPQ = SDOT( M, WORK( N+1 ), 1, 01090 + A( 1, q ), 1 )*WORK( q ) / AAQQ 01091 END IF 01092 ELSE 01093 IF( AAPP.GE.AAQQ ) THEN 01094 ROTOK = AAPP.LE.( AAQQ / SMALL ) 01095 ELSE 01096 ROTOK = AAQQ.LE.( AAPP / SMALL ) 01097 END IF 01098 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 01099 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 01100 + q ), 1 )*WORK( p )*WORK( q ) / 01101 + AAQQ ) / AAPP 01102 ELSE 01103 CALL SCOPY( M, A( 1, q ), 1, 01104 + WORK( N+1 ), 1 ) 01105 CALL SLASCL( 'G', 0, 0, AAQQ, 01106 + WORK( q ), M, 1, 01107 + WORK( N+1 ), LDA, IERR ) 01108 AAPQ = SDOT( M, WORK( N+1 ), 1, 01109 + A( 1, p ), 1 )*WORK( p ) / AAPP 01110 END IF 01111 END IF 01112 * 01113 MXAAPQ = AMAX1( MXAAPQ, ABS( AAPQ ) ) 01114 * 01115 * TO rotate or NOT to rotate, THAT is the question ... 01116 * 01117 IF( ABS( AAPQ ).GT.TOL ) THEN 01118 NOTROT = 0 01119 *[RTD] ROTATED = ROTATED + 1 01120 PSKIPPED = 0 01121 ISWROT = ISWROT + 1 01122 * 01123 IF( ROTOK ) THEN 01124 * 01125 AQOAP = AAQQ / AAPP 01126 APOAQ = AAPP / AAQQ 01127 THETA = -HALF*ABS( AQOAP-APOAQ ) / AAPQ 01128 IF( AAQQ.GT.AAPP0 )THETA = -THETA 01129 * 01130 IF( ABS( THETA ).GT.BIGTHETA ) THEN 01131 T = HALF / THETA 01132 FASTR( 3 ) = T*WORK( p ) / WORK( q ) 01133 FASTR( 4 ) = -T*WORK( q ) / 01134 + WORK( p ) 01135 CALL SROTM( M, A( 1, p ), 1, 01136 + A( 1, q ), 1, FASTR ) 01137 IF( RSVEC )CALL SROTM( MVL, 01138 + V( 1, p ), 1, 01139 + V( 1, q ), 1, 01140 + FASTR ) 01141 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 01142 + ONE+T*APOAQ*AAPQ ) ) 01143 AAPP = AAPP*SQRT( AMAX1( ZERO, 01144 + ONE-T*AQOAP*AAPQ ) ) 01145 MXSINJ = AMAX1( MXSINJ, ABS( T ) ) 01146 ELSE 01147 * 01148 * .. choose correct signum for THETA and rotate 01149 * 01150 THSIGN = -SIGN( ONE, AAPQ ) 01151 IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN 01152 T = ONE / ( THETA+THSIGN* 01153 + SQRT( ONE+THETA*THETA ) ) 01154 CS = SQRT( ONE / ( ONE+T*T ) ) 01155 SN = T*CS 01156 MXSINJ = AMAX1( MXSINJ, ABS( SN ) ) 01157 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 01158 + ONE+T*APOAQ*AAPQ ) ) 01159 AAPP = AAPP*SQRT( AMAX1( ZERO, 01160 + ONE-T*AQOAP*AAPQ ) ) 01161 * 01162 APOAQ = WORK( p ) / WORK( q ) 01163 AQOAP = WORK( q ) / WORK( p ) 01164 IF( WORK( p ).GE.ONE ) THEN 01165 * 01166 IF( WORK( q ).GE.ONE ) THEN 01167 FASTR( 3 ) = T*APOAQ 01168 FASTR( 4 ) = -T*AQOAP 01169 WORK( p ) = WORK( p )*CS 01170 WORK( q ) = WORK( q )*CS 01171 CALL SROTM( M, A( 1, p ), 1, 01172 + A( 1, q ), 1, 01173 + FASTR ) 01174 IF( RSVEC )CALL SROTM( MVL, 01175 + V( 1, p ), 1, V( 1, q ), 01176 + 1, FASTR ) 01177 ELSE 01178 CALL SAXPY( M, -T*AQOAP, 01179 + A( 1, q ), 1, 01180 + A( 1, p ), 1 ) 01181 CALL SAXPY( M, CS*SN*APOAQ, 01182 + A( 1, p ), 1, 01183 + A( 1, q ), 1 ) 01184 IF( RSVEC ) THEN 01185 CALL SAXPY( MVL, -T*AQOAP, 01186 + V( 1, q ), 1, 01187 + V( 1, p ), 1 ) 01188 CALL SAXPY( MVL, 01189 + CS*SN*APOAQ, 01190 + V( 1, p ), 1, 01191 + V( 1, q ), 1 ) 01192 END IF 01193 WORK( p ) = WORK( p )*CS 01194 WORK( q ) = WORK( q ) / CS 01195 END IF 01196 ELSE 01197 IF( WORK( q ).GE.ONE ) THEN 01198 CALL SAXPY( M, T*APOAQ, 01199 + A( 1, p ), 1, 01200 + A( 1, q ), 1 ) 01201 CALL SAXPY( M, -CS*SN*AQOAP, 01202 + A( 1, q ), 1, 01203 + A( 1, p ), 1 ) 01204 IF( RSVEC ) THEN 01205 CALL SAXPY( MVL, T*APOAQ, 01206 + V( 1, p ), 1, 01207 + V( 1, q ), 1 ) 01208 CALL SAXPY( MVL, 01209 + -CS*SN*AQOAP, 01210 + V( 1, q ), 1, 01211 + V( 1, p ), 1 ) 01212 END IF 01213 WORK( p ) = WORK( p ) / CS 01214 WORK( q ) = WORK( q )*CS 01215 ELSE 01216 IF( WORK( p ).GE.WORK( q ) ) 01217 + THEN 01218 CALL SAXPY( M, -T*AQOAP, 01219 + A( 1, q ), 1, 01220 + A( 1, p ), 1 ) 01221 CALL SAXPY( M, CS*SN*APOAQ, 01222 + A( 1, p ), 1, 01223 + A( 1, q ), 1 ) 01224 WORK( p ) = WORK( p )*CS 01225 WORK( q ) = WORK( q ) / CS 01226 IF( RSVEC ) THEN 01227 CALL SAXPY( MVL, 01228 + -T*AQOAP, 01229 + V( 1, q ), 1, 01230 + V( 1, p ), 1 ) 01231 CALL SAXPY( MVL, 01232 + CS*SN*APOAQ, 01233 + V( 1, p ), 1, 01234 + V( 1, q ), 1 ) 01235 END IF 01236 ELSE 01237 CALL SAXPY( M, T*APOAQ, 01238 + A( 1, p ), 1, 01239 + A( 1, q ), 1 ) 01240 CALL SAXPY( M, 01241 + -CS*SN*AQOAP, 01242 + A( 1, q ), 1, 01243 + A( 1, p ), 1 ) 01244 WORK( p ) = WORK( p ) / CS 01245 WORK( q ) = WORK( q )*CS 01246 IF( RSVEC ) THEN 01247 CALL SAXPY( MVL, 01248 + T*APOAQ, V( 1, p ), 01249 + 1, V( 1, q ), 1 ) 01250 CALL SAXPY( MVL, 01251 + -CS*SN*AQOAP, 01252 + V( 1, q ), 1, 01253 + V( 1, p ), 1 ) 01254 END IF 01255 END IF 01256 END IF 01257 END IF 01258 END IF 01259 * 01260 ELSE 01261 IF( AAPP.GT.AAQQ ) THEN 01262 CALL SCOPY( M, A( 1, p ), 1, 01263 + WORK( N+1 ), 1 ) 01264 CALL SLASCL( 'G', 0, 0, AAPP, ONE, 01265 + M, 1, WORK( N+1 ), LDA, 01266 + IERR ) 01267 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, 01268 + M, 1, A( 1, q ), LDA, 01269 + IERR ) 01270 TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 01271 CALL SAXPY( M, TEMP1, WORK( N+1 ), 01272 + 1, A( 1, q ), 1 ) 01273 CALL SLASCL( 'G', 0, 0, ONE, AAQQ, 01274 + M, 1, A( 1, q ), LDA, 01275 + IERR ) 01276 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 01277 + ONE-AAPQ*AAPQ ) ) 01278 MXSINJ = AMAX1( MXSINJ, SFMIN ) 01279 ELSE 01280 CALL SCOPY( M, A( 1, q ), 1, 01281 + WORK( N+1 ), 1 ) 01282 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, 01283 + M, 1, WORK( N+1 ), LDA, 01284 + IERR ) 01285 CALL SLASCL( 'G', 0, 0, AAPP, ONE, 01286 + M, 1, A( 1, p ), LDA, 01287 + IERR ) 01288 TEMP1 = -AAPQ*WORK( q ) / WORK( p ) 01289 CALL SAXPY( M, TEMP1, WORK( N+1 ), 01290 + 1, A( 1, p ), 1 ) 01291 CALL SLASCL( 'G', 0, 0, ONE, AAPP, 01292 + M, 1, A( 1, p ), LDA, 01293 + IERR ) 01294 SVA( p ) = AAPP*SQRT( AMAX1( ZERO, 01295 + ONE-AAPQ*AAPQ ) ) 01296 MXSINJ = AMAX1( MXSINJ, SFMIN ) 01297 END IF 01298 END IF 01299 * END IF ROTOK THEN ... ELSE 01300 * 01301 * In the case of cancellation in updating SVA(q) 01302 * .. recompute SVA(q) 01303 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 01304 + THEN 01305 IF( ( AAQQ.LT.ROOTBIG ) .AND. 01306 + ( AAQQ.GT.ROOTSFMIN ) ) THEN 01307 SVA( q ) = SNRM2( M, A( 1, q ), 1 )* 01308 + WORK( q ) 01309 ELSE 01310 T = ZERO 01311 AAQQ = ONE 01312 CALL SLASSQ( M, A( 1, q ), 1, T, 01313 + AAQQ ) 01314 SVA( q ) = T*SQRT( AAQQ )*WORK( q ) 01315 END IF 01316 END IF 01317 IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN 01318 IF( ( AAPP.LT.ROOTBIG ) .AND. 01319 + ( AAPP.GT.ROOTSFMIN ) ) THEN 01320 AAPP = SNRM2( M, A( 1, p ), 1 )* 01321 + WORK( p ) 01322 ELSE 01323 T = ZERO 01324 AAPP = ONE 01325 CALL SLASSQ( M, A( 1, p ), 1, T, 01326 + AAPP ) 01327 AAPP = T*SQRT( AAPP )*WORK( p ) 01328 END IF 01329 SVA( p ) = AAPP 01330 END IF 01331 * end of OK rotation 01332 ELSE 01333 NOTROT = NOTROT + 1 01334 *[RTD] SKIPPED = SKIPPED + 1 01335 PSKIPPED = PSKIPPED + 1 01336 IJBLSK = IJBLSK + 1 01337 END IF 01338 ELSE 01339 NOTROT = NOTROT + 1 01340 PSKIPPED = PSKIPPED + 1 01341 IJBLSK = IJBLSK + 1 01342 END IF 01343 * 01344 IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) ) 01345 + THEN 01346 SVA( p ) = AAPP 01347 NOTROT = 0 01348 GO TO 2011 01349 END IF 01350 IF( ( i.LE.SWBAND ) .AND. 01351 + ( PSKIPPED.GT.ROWSKIP ) ) THEN 01352 AAPP = -AAPP 01353 NOTROT = 0 01354 GO TO 2203 01355 END IF 01356 * 01357 2200 CONTINUE 01358 * end of the q-loop 01359 2203 CONTINUE 01360 * 01361 SVA( p ) = AAPP 01362 * 01363 ELSE 01364 * 01365 IF( AAPP.EQ.ZERO )NOTROT = NOTROT + 01366 + MIN0( jgl+KBL-1, N ) - jgl + 1 01367 IF( AAPP.LT.ZERO )NOTROT = 0 01368 * 01369 END IF 01370 * 01371 2100 CONTINUE 01372 * end of the p-loop 01373 2010 CONTINUE 01374 * end of the jbc-loop 01375 2011 CONTINUE 01376 *2011 bailed out of the jbc-loop 01377 DO 2012 p = igl, MIN0( igl+KBL-1, N ) 01378 SVA( p ) = ABS( SVA( p ) ) 01379 2012 CONTINUE 01380 *** 01381 2000 CONTINUE 01382 *2000 :: end of the ibr-loop 01383 * 01384 * .. update SVA(N) 01385 IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) ) 01386 + THEN 01387 SVA( N ) = SNRM2( M, A( 1, N ), 1 )*WORK( N ) 01388 ELSE 01389 T = ZERO 01390 AAPP = ONE 01391 CALL SLASSQ( M, A( 1, N ), 1, T, AAPP ) 01392 SVA( N ) = T*SQRT( AAPP )*WORK( N ) 01393 END IF 01394 * 01395 * Additional steering devices 01396 * 01397 IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR. 01398 + ( ISWROT.LE.N ) ) )SWBAND = i 01399 * 01400 IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.SQRT( FLOAT( N ) )* 01401 + TOL ) .AND. ( FLOAT( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN 01402 GO TO 1994 01403 END IF 01404 * 01405 IF( NOTROT.GE.EMPTSW )GO TO 1994 01406 * 01407 1993 CONTINUE 01408 * end i=1:NSWEEP loop 01409 * 01410 * #:( Reaching this point means that the procedure has not converged. 01411 INFO = NSWEEP - 1 01412 GO TO 1995 01413 * 01414 1994 CONTINUE 01415 * #:) Reaching this point means numerical convergence after the i-th 01416 * sweep. 01417 * 01418 INFO = 0 01419 * #:) INFO = 0 confirms successful iterations. 01420 1995 CONTINUE 01421 * 01422 * Sort the singular values and find how many are above 01423 * the underflow threshold. 01424 * 01425 N2 = 0 01426 N4 = 0 01427 DO 5991 p = 1, N - 1 01428 q = ISAMAX( N-p+1, SVA( p ), 1 ) + p - 1 01429 IF( p.NE.q ) THEN 01430 TEMP1 = SVA( p ) 01431 SVA( p ) = SVA( q ) 01432 SVA( q ) = TEMP1 01433 TEMP1 = WORK( p ) 01434 WORK( p ) = WORK( q ) 01435 WORK( q ) = TEMP1 01436 CALL SSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 01437 IF( RSVEC )CALL SSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 ) 01438 END IF 01439 IF( SVA( p ).NE.ZERO ) THEN 01440 N4 = N4 + 1 01441 IF( SVA( p )*SKL.GT.SFMIN )N2 = N2 + 1 01442 END IF 01443 5991 CONTINUE 01444 IF( SVA( N ).NE.ZERO ) THEN 01445 N4 = N4 + 1 01446 IF( SVA( N )*SKL.GT.SFMIN )N2 = N2 + 1 01447 END IF 01448 * 01449 * Normalize the left singular vectors. 01450 * 01451 IF( LSVEC .OR. UCTOL ) THEN 01452 DO 1998 p = 1, N2 01453 CALL SSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 ) 01454 1998 CONTINUE 01455 END IF 01456 * 01457 * Scale the product of Jacobi rotations (assemble the fast rotations). 01458 * 01459 IF( RSVEC ) THEN 01460 IF( APPLV ) THEN 01461 DO 2398 p = 1, N 01462 CALL SSCAL( MVL, WORK( p ), V( 1, p ), 1 ) 01463 2398 CONTINUE 01464 ELSE 01465 DO 2399 p = 1, N 01466 TEMP1 = ONE / SNRM2( MVL, V( 1, p ), 1 ) 01467 CALL SSCAL( MVL, TEMP1, V( 1, p ), 1 ) 01468 2399 CONTINUE 01469 END IF 01470 END IF 01471 * 01472 * Undo scaling, if necessary (and possible). 01473 IF( ( ( SKL.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG / 01474 + SKL ) ) ) .OR. ( ( SKL.LT.ONE ) .AND. ( SVA( N2 ).GT. 01475 + ( SFMIN / SKL ) ) ) ) THEN 01476 DO 2400 p = 1, N 01477 SVA( p ) = SKL*SVA( p ) 01478 2400 CONTINUE 01479 SKL = ONE 01480 END IF 01481 * 01482 WORK( 1 ) = SKL 01483 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE 01484 * then some of the singular values may overflow or underflow and 01485 * the spectrum is given in this factored representation. 01486 * 01487 WORK( 2 ) = FLOAT( N4 ) 01488 * N4 is the number of computed nonzero singular values of A. 01489 * 01490 WORK( 3 ) = FLOAT( N2 ) 01491 * N2 is the number of singular values of A greater than SFMIN. 01492 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers 01493 * that may carry some information. 01494 * 01495 WORK( 4 ) = FLOAT( i ) 01496 * i is the index of the last sweep before declaring convergence. 01497 * 01498 WORK( 5 ) = MXAAPQ 01499 * MXAAPQ is the largest absolute value of scaled pivots in the 01500 * last sweep 01501 * 01502 WORK( 6 ) = MXSINJ 01503 * MXSINJ is the largest absolute value of the sines of Jacobi angles 01504 * in the last sweep 01505 * 01506 RETURN 01507 * .. 01508 * .. END OF SGESVJ 01509 * .. 01510 END