LAPACK 3.3.1
Linear Algebra PACKage
|
00001 SUBROUTINE SGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, 00002 $ LDV, WORK, LWORK, INFO ) 00003 * 00004 * -- LAPACK routine (version 3.3.1) -- 00005 * 00006 * -- Contributed by Zlatko Drmac of the University of Zagreb and -- 00007 * -- Kresimir Veselic of the Fernuniversitaet Hagen -- 00008 * -- April 2011 -- 00009 * 00010 * -- LAPACK is a software package provided by Univ. of Tennessee, -- 00011 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- 00012 * 00013 * This routine is also part of SIGMA (version 1.23, October 23. 2008.) 00014 * SIGMA is a library of algorithms for highly accurate algorithms for 00015 * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the 00016 * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0. 00017 * 00018 IMPLICIT NONE 00019 * .. 00020 * .. Scalar Arguments .. 00021 INTEGER INFO, LDA, LDV, LWORK, M, MV, N 00022 CHARACTER*1 JOBA, JOBU, JOBV 00023 * .. 00024 * .. Array Arguments .. 00025 REAL A( LDA, * ), SVA( N ), V( LDV, * ), 00026 $ WORK( LWORK ) 00027 * .. 00028 * 00029 * Purpose 00030 * ======= 00031 * 00032 * SGESVJ computes the singular value decomposition (SVD) of a real 00033 * M-by-N matrix A, where M >= N. The SVD of A is written as 00034 * [++] [xx] [x0] [xx] 00035 * A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx] 00036 * [++] [xx] 00037 * where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal 00038 * matrix, and V is an N-by-N orthogonal matrix. The diagonal elements 00039 * of SIGMA are the singular values of A. The columns of U and V are the 00040 * left and the right singular vectors of A, respectively. 00041 * 00042 * Further Details 00043 * ~~~~~~~~~~~~~~~ 00044 * The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane 00045 * rotations. The rotations are implemented as fast scaled rotations of 00046 * Anda and Park [1]. In the case of underflow of the Jacobi angle, a 00047 * modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses 00048 * column interchanges of de Rijk [2]. The relative accuracy of the computed 00049 * singular values and the accuracy of the computed singular vectors (in 00050 * angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. 00051 * The condition number that determines the accuracy in the full rank case 00052 * is essentially min_{D=diag} kappa(A*D), where kappa(.) is the 00053 * spectral condition number. The best performance of this Jacobi SVD 00054 * procedure is achieved if used in an accelerated version of Drmac and 00055 * Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. 00056 * Some tunning parameters (marked with [TP]) are available for the 00057 * implementer. 00058 * The computational range for the nonzero singular values is the machine 00059 * number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even 00060 * denormalized singular values can be computed with the corresponding 00061 * gradual loss of accurate digits. 00062 * 00063 * Contributors 00064 * ~~~~~~~~~~~~ 00065 * Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany) 00066 * 00067 * References 00068 * ~~~~~~~~~~ 00069 * [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling. 00070 * SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174. 00071 * [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the 00072 * singular value decomposition on a vector computer. 00073 * SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371. 00074 * [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR. 00075 * [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular 00076 * value computation in floating point arithmetic. 00077 * SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222. 00078 * [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. 00079 * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. 00080 * LAPACK Working note 169. 00081 * [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. 00082 * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. 00083 * LAPACK Working note 170. 00084 * [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, 00085 * QSVD, (H,K)-SVD computations. 00086 * Department of Mathematics, University of Zagreb, 2008. 00087 * 00088 * Bugs, Examples and Comments 00089 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 00090 * Please report all bugs and send interesting test examples and comments to 00091 * drmac@math.hr. Thank you. 00092 * 00093 * Arguments 00094 * ========= 00095 * 00096 * JOBA (input) CHARACTER* 1 00097 * Specifies the structure of A. 00098 * = 'L': The input matrix A is lower triangular; 00099 * = 'U': The input matrix A is upper triangular; 00100 * = 'G': The input matrix A is general M-by-N matrix, M >= N. 00101 * 00102 * JOBU (input) CHARACTER*1 00103 * Specifies whether to compute the left singular vectors 00104 * (columns of U): 00105 * = 'U': The left singular vectors corresponding to the nonzero 00106 * singular values are computed and returned in the leading 00107 * columns of A. See more details in the description of A. 00108 * The default numerical orthogonality threshold is set to 00109 * approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E'). 00110 * = 'C': Analogous to JOBU='U', except that user can control the 00111 * level of numerical orthogonality of the computed left 00112 * singular vectors. TOL can be set to TOL = CTOL*EPS, where 00113 * CTOL is given on input in the array WORK. 00114 * No CTOL smaller than ONE is allowed. CTOL greater 00115 * than 1 / EPS is meaningless. The option 'C' 00116 * can be used if M*EPS is satisfactory orthogonality 00117 * of the computed left singular vectors, so CTOL=M could 00118 * save few sweeps of Jacobi rotations. 00119 * See the descriptions of A and WORK(1). 00120 * = 'N': The matrix U is not computed. However, see the 00121 * description of A. 00122 * 00123 * JOBV (input) CHARACTER*1 00124 * Specifies whether to compute the right singular vectors, that 00125 * is, the matrix V: 00126 * = 'V' : the matrix V is computed and returned in the array V 00127 * = 'A' : the Jacobi rotations are applied to the MV-by-N 00128 * array V. In other words, the right singular vector 00129 * matrix V is not computed explicitly; instead it is 00130 * applied to an MV-by-N matrix initially stored in the 00131 * first MV rows of V. 00132 * = 'N' : the matrix V is not computed and the array V is not 00133 * referenced 00134 * 00135 * M (input) INTEGER 00136 * The number of rows of the input matrix A. 1/SLAMCH('E') > M >= 0. 00137 * 00138 * N (input) INTEGER 00139 * The number of columns of the input matrix A. 00140 * M >= N >= 0. 00141 * 00142 * A (input/output) REAL array, dimension (LDA,N) 00143 * On entry, the M-by-N matrix A. 00144 * On exit, 00145 * If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C': 00146 * If INFO .EQ. 0 : 00147 * RANKA orthonormal columns of U are returned in the 00148 * leading RANKA columns of the array A. Here RANKA <= N 00149 * is the number of computed singular values of A that are 00150 * above the underflow threshold SLAMCH('S'). The singular 00151 * vectors corresponding to underflowed or zero singular 00152 * values are not computed. The value of RANKA is returned 00153 * in the array WORK as RANKA=NINT(WORK(2)). Also see the 00154 * descriptions of SVA and WORK. The computed columns of U 00155 * are mutually numerically orthogonal up to approximately 00156 * TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'), 00157 * see the description of JOBU. 00158 * If INFO .GT. 0, 00159 * the procedure SGESVJ did not converge in the given number 00160 * of iterations (sweeps). In that case, the computed 00161 * columns of U may not be orthogonal up to TOL. The output 00162 * U (stored in A), SIGMA (given by the computed singular 00163 * values in SVA(1:N)) and V is still a decomposition of the 00164 * input matrix A in the sense that the residual 00165 * ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small. 00166 * If JOBU .EQ. 'N': 00167 * If INFO .EQ. 0 : 00168 * Note that the left singular vectors are 'for free' in the 00169 * one-sided Jacobi SVD algorithm. However, if only the 00170 * singular values are needed, the level of numerical 00171 * orthogonality of U is not an issue and iterations are 00172 * stopped when the columns of the iterated matrix are 00173 * numerically orthogonal up to approximately M*EPS. Thus, 00174 * on exit, A contains the columns of U scaled with the 00175 * corresponding singular values. 00176 * If INFO .GT. 0 : 00177 * the procedure SGESVJ did not converge in the given number 00178 * of iterations (sweeps). 00179 * 00180 * LDA (input) INTEGER 00181 * The leading dimension of the array A. LDA >= max(1,M). 00182 * 00183 * SVA (workspace/output) REAL array, dimension (N) 00184 * On exit, 00185 * If INFO .EQ. 0 : 00186 * depending on the value SCALE = WORK(1), we have: 00187 * If SCALE .EQ. ONE: 00188 * SVA(1:N) contains the computed singular values of A. 00189 * During the computation SVA contains the Euclidean column 00190 * norms of the iterated matrices in the array A. 00191 * If SCALE .NE. ONE: 00192 * The singular values of A are SCALE*SVA(1:N), and this 00193 * factored representation is due to the fact that some of the 00194 * singular values of A might underflow or overflow. 00195 * 00196 * If INFO .GT. 0 : 00197 * the procedure SGESVJ did not converge in the given number of 00198 * iterations (sweeps) and SCALE*SVA(1:N) may not be accurate. 00199 * 00200 * MV (input) INTEGER 00201 * If JOBV .EQ. 'A', then the product of Jacobi rotations in SGESVJ 00202 * is applied to the first MV rows of V. See the description of JOBV. 00203 * 00204 * V (input/output) REAL array, dimension (LDV,N) 00205 * If JOBV = 'V', then V contains on exit the N-by-N matrix of 00206 * the right singular vectors; 00207 * If JOBV = 'A', then V contains the product of the computed right 00208 * singular vector matrix and the initial matrix in 00209 * the array V. 00210 * If JOBV = 'N', then V is not referenced. 00211 * 00212 * LDV (input) INTEGER 00213 * The leading dimension of the array V, LDV .GE. 1. 00214 * If JOBV .EQ. 'V', then LDV .GE. max(1,N). 00215 * If JOBV .EQ. 'A', then LDV .GE. max(1,MV) . 00216 * 00217 * WORK (input/workspace/output) REAL array, dimension max(4,M+N). 00218 * On entry, 00219 * If JOBU .EQ. 'C' : 00220 * WORK(1) = CTOL, where CTOL defines the threshold for convergence. 00221 * The process stops if all columns of A are mutually 00222 * orthogonal up to CTOL*EPS, EPS=SLAMCH('E'). 00223 * It is required that CTOL >= ONE, i.e. it is not 00224 * allowed to force the routine to obtain orthogonality 00225 * below EPSILON. 00226 * On exit, 00227 * WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N) 00228 * are the computed singular vcalues of A. 00229 * (See description of SVA().) 00230 * WORK(2) = NINT(WORK(2)) is the number of the computed nonzero 00231 * singular values. 00232 * WORK(3) = NINT(WORK(3)) is the number of the computed singular 00233 * values that are larger than the underflow threshold. 00234 * WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi 00235 * rotations needed for numerical convergence. 00236 * WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep. 00237 * This is useful information in cases when SGESVJ did 00238 * not converge, as it can be used to estimate whether 00239 * the output is stil useful and for post festum analysis. 00240 * WORK(6) = the largest absolute value over all sines of the 00241 * Jacobi rotation angles in the last sweep. It can be 00242 * useful for a post festum analysis. 00243 * 00244 * LWORK (input) INTEGER 00245 * length of WORK, WORK >= MAX(6,M+N) 00246 * 00247 * INFO (output) INTEGER 00248 * = 0 : successful exit. 00249 * < 0 : if INFO = -i, then the i-th argument had an illegal value 00250 * > 0 : SGESVJ did not converge in the maximal allowed number (30) 00251 * of sweeps. The output may still be useful. See the 00252 * description of WORK. 00253 * 00254 * ===================================================================== 00255 * 00256 * .. Local Parameters .. 00257 REAL ZERO, HALF, ONE, TWO 00258 PARAMETER ( ZERO = 0.0E0, HALF = 0.5E0, ONE = 1.0E0, 00259 $ TWO = 2.0E0 ) 00260 INTEGER NSWEEP 00261 PARAMETER ( NSWEEP = 30 ) 00262 * .. 00263 * .. Local Scalars .. 00264 REAL AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG, 00265 $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ, 00266 $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL, 00267 $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA, 00268 $ THSIGN, TOL 00269 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1, 00270 $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34, 00271 $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP, 00272 $ SWBAND 00273 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK, 00274 $ RSVEC, UCTOL, UPPER 00275 * .. 00276 * .. Local Arrays .. 00277 REAL FASTR( 5 ) 00278 * .. 00279 * .. Intrinsic Functions .. 00280 INTRINSIC ABS, AMAX1, AMIN1, FLOAT, MIN0, SIGN, SQRT 00281 * .. 00282 * .. External Functions .. 00283 * .. 00284 * from BLAS 00285 REAL SDOT, SNRM2 00286 EXTERNAL SDOT, SNRM2 00287 INTEGER ISAMAX 00288 EXTERNAL ISAMAX 00289 * from LAPACK 00290 REAL SLAMCH 00291 EXTERNAL SLAMCH 00292 LOGICAL LSAME 00293 EXTERNAL LSAME 00294 * .. 00295 * .. External Subroutines .. 00296 * .. 00297 * from BLAS 00298 EXTERNAL SAXPY, SCOPY, SROTM, SSCAL, SSWAP 00299 * from LAPACK 00300 EXTERNAL SLASCL, SLASET, SLASSQ, XERBLA 00301 * 00302 EXTERNAL SGSVJ0, SGSVJ1 00303 * .. 00304 * .. Executable Statements .. 00305 * 00306 * Test the input arguments 00307 * 00308 LSVEC = LSAME( JOBU, 'U' ) 00309 UCTOL = LSAME( JOBU, 'C' ) 00310 RSVEC = LSAME( JOBV, 'V' ) 00311 APPLV = LSAME( JOBV, 'A' ) 00312 UPPER = LSAME( JOBA, 'U' ) 00313 LOWER = LSAME( JOBA, 'L' ) 00314 * 00315 IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN 00316 INFO = -1 00317 ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN 00318 INFO = -2 00319 ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN 00320 INFO = -3 00321 ELSE IF( M.LT.0 ) THEN 00322 INFO = -4 00323 ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN 00324 INFO = -5 00325 ELSE IF( LDA.LT.M ) THEN 00326 INFO = -7 00327 ELSE IF( MV.LT.0 ) THEN 00328 INFO = -9 00329 ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR. 00330 $ ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN 00331 INFO = -11 00332 ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN 00333 INFO = -12 00334 ELSE IF( LWORK.LT.MAX0( M+N, 6 ) ) THEN 00335 INFO = -13 00336 ELSE 00337 INFO = 0 00338 END IF 00339 * 00340 * #:( 00341 IF( INFO.NE.0 ) THEN 00342 CALL XERBLA( 'SGESVJ', -INFO ) 00343 RETURN 00344 END IF 00345 * 00346 * #:) Quick return for void matrix 00347 * 00348 IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN 00349 * 00350 * Set numerical parameters 00351 * The stopping criterion for Jacobi rotations is 00352 * 00353 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS 00354 * 00355 * where EPS is the round-off and CTOL is defined as follows: 00356 * 00357 IF( UCTOL ) THEN 00358 * ... user controlled 00359 CTOL = WORK( 1 ) 00360 ELSE 00361 * ... default 00362 IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN 00363 CTOL = SQRT( FLOAT( M ) ) 00364 ELSE 00365 CTOL = FLOAT( M ) 00366 END IF 00367 END IF 00368 * ... and the machine dependent parameters are 00369 *[!] (Make sure that SLAMCH() works properly on the target machine.) 00370 * 00371 EPSLN = SLAMCH( 'Epsilon' ) 00372 ROOTEPS = SQRT( EPSLN ) 00373 SFMIN = SLAMCH( 'SafeMinimum' ) 00374 ROOTSFMIN = SQRT( SFMIN ) 00375 SMALL = SFMIN / EPSLN 00376 BIG = SLAMCH( 'Overflow' ) 00377 * BIG = ONE / SFMIN 00378 ROOTBIG = ONE / ROOTSFMIN 00379 LARGE = BIG / SQRT( FLOAT( M*N ) ) 00380 BIGTHETA = ONE / ROOTEPS 00381 * 00382 TOL = CTOL*EPSLN 00383 ROOTTOL = SQRT( TOL ) 00384 * 00385 IF( FLOAT( M )*EPSLN.GE.ONE ) THEN 00386 INFO = -4 00387 CALL XERBLA( 'SGESVJ', -INFO ) 00388 RETURN 00389 END IF 00390 * 00391 * Initialize the right singular vector matrix. 00392 * 00393 IF( RSVEC ) THEN 00394 MVL = N 00395 CALL SLASET( 'A', MVL, N, ZERO, ONE, V, LDV ) 00396 ELSE IF( APPLV ) THEN 00397 MVL = MV 00398 END IF 00399 RSVEC = RSVEC .OR. APPLV 00400 * 00401 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N ) 00402 *(!) If necessary, scale A to protect the largest singular value 00403 * from overflow. It is possible that saving the largest singular 00404 * value destroys the information about the small ones. 00405 * This initial scaling is almost minimal in the sense that the 00406 * goal is to make sure that no column norm overflows, and that 00407 * SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries 00408 * in A are detected, the procedure returns with INFO=-6. 00409 * 00410 SKL = ONE / SQRT( FLOAT( M )*FLOAT( N ) ) 00411 NOSCALE = .TRUE. 00412 GOSCALE = .TRUE. 00413 * 00414 IF( LOWER ) THEN 00415 * the input matrix is M-by-N lower triangular (trapezoidal) 00416 DO 1874 p = 1, N 00417 AAPP = ZERO 00418 AAQQ = ONE 00419 CALL SLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ ) 00420 IF( AAPP.GT.BIG ) THEN 00421 INFO = -6 00422 CALL XERBLA( 'SGESVJ', -INFO ) 00423 RETURN 00424 END IF 00425 AAQQ = SQRT( AAQQ ) 00426 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 00427 SVA( p ) = AAPP*AAQQ 00428 ELSE 00429 NOSCALE = .FALSE. 00430 SVA( p ) = AAPP*( AAQQ*SKL ) 00431 IF( GOSCALE ) THEN 00432 GOSCALE = .FALSE. 00433 DO 1873 q = 1, p - 1 00434 SVA( q ) = SVA( q )*SKL 00435 1873 CONTINUE 00436 END IF 00437 END IF 00438 1874 CONTINUE 00439 ELSE IF( UPPER ) THEN 00440 * the input matrix is M-by-N upper triangular (trapezoidal) 00441 DO 2874 p = 1, N 00442 AAPP = ZERO 00443 AAQQ = ONE 00444 CALL SLASSQ( p, A( 1, p ), 1, AAPP, AAQQ ) 00445 IF( AAPP.GT.BIG ) THEN 00446 INFO = -6 00447 CALL XERBLA( 'SGESVJ', -INFO ) 00448 RETURN 00449 END IF 00450 AAQQ = SQRT( AAQQ ) 00451 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 00452 SVA( p ) = AAPP*AAQQ 00453 ELSE 00454 NOSCALE = .FALSE. 00455 SVA( p ) = AAPP*( AAQQ*SKL ) 00456 IF( GOSCALE ) THEN 00457 GOSCALE = .FALSE. 00458 DO 2873 q = 1, p - 1 00459 SVA( q ) = SVA( q )*SKL 00460 2873 CONTINUE 00461 END IF 00462 END IF 00463 2874 CONTINUE 00464 ELSE 00465 * the input matrix is M-by-N general dense 00466 DO 3874 p = 1, N 00467 AAPP = ZERO 00468 AAQQ = ONE 00469 CALL SLASSQ( M, A( 1, p ), 1, AAPP, AAQQ ) 00470 IF( AAPP.GT.BIG ) THEN 00471 INFO = -6 00472 CALL XERBLA( 'SGESVJ', -INFO ) 00473 RETURN 00474 END IF 00475 AAQQ = SQRT( AAQQ ) 00476 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 00477 SVA( p ) = AAPP*AAQQ 00478 ELSE 00479 NOSCALE = .FALSE. 00480 SVA( p ) = AAPP*( AAQQ*SKL ) 00481 IF( GOSCALE ) THEN 00482 GOSCALE = .FALSE. 00483 DO 3873 q = 1, p - 1 00484 SVA( q ) = SVA( q )*SKL 00485 3873 CONTINUE 00486 END IF 00487 END IF 00488 3874 CONTINUE 00489 END IF 00490 * 00491 IF( NOSCALE )SKL = ONE 00492 * 00493 * Move the smaller part of the spectrum from the underflow threshold 00494 *(!) Start by determining the position of the nonzero entries of the 00495 * array SVA() relative to ( SFMIN, BIG ). 00496 * 00497 AAPP = ZERO 00498 AAQQ = BIG 00499 DO 4781 p = 1, N 00500 IF( SVA( p ).NE.ZERO )AAQQ = AMIN1( AAQQ, SVA( p ) ) 00501 AAPP = AMAX1( AAPP, SVA( p ) ) 00502 4781 CONTINUE 00503 * 00504 * #:) Quick return for zero matrix 00505 * 00506 IF( AAPP.EQ.ZERO ) THEN 00507 IF( LSVEC )CALL SLASET( 'G', M, N, ZERO, ONE, A, LDA ) 00508 WORK( 1 ) = ONE 00509 WORK( 2 ) = ZERO 00510 WORK( 3 ) = ZERO 00511 WORK( 4 ) = ZERO 00512 WORK( 5 ) = ZERO 00513 WORK( 6 ) = ZERO 00514 RETURN 00515 END IF 00516 * 00517 * #:) Quick return for one-column matrix 00518 * 00519 IF( N.EQ.1 ) THEN 00520 IF( LSVEC )CALL SLASCL( 'G', 0, 0, SVA( 1 ), SKL, M, 1, 00521 $ A( 1, 1 ), LDA, IERR ) 00522 WORK( 1 ) = ONE / SKL 00523 IF( SVA( 1 ).GE.SFMIN ) THEN 00524 WORK( 2 ) = ONE 00525 ELSE 00526 WORK( 2 ) = ZERO 00527 END IF 00528 WORK( 3 ) = ZERO 00529 WORK( 4 ) = ZERO 00530 WORK( 5 ) = ZERO 00531 WORK( 6 ) = ZERO 00532 RETURN 00533 END IF 00534 * 00535 * Protect small singular values from underflow, and try to 00536 * avoid underflows/overflows in computing Jacobi rotations. 00537 * 00538 SN = SQRT( SFMIN / EPSLN ) 00539 TEMP1 = SQRT( BIG / FLOAT( N ) ) 00540 IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR. 00541 $ ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN 00542 TEMP1 = AMIN1( BIG, TEMP1 / AAPP ) 00543 * AAQQ = AAQQ*TEMP1 00544 * AAPP = AAPP*TEMP1 00545 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN 00546 TEMP1 = AMIN1( SN / AAQQ, BIG / ( AAPP*SQRT( FLOAT( N ) ) ) ) 00547 * AAQQ = AAQQ*TEMP1 00548 * AAPP = AAPP*TEMP1 00549 ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 00550 TEMP1 = AMAX1( SN / AAQQ, TEMP1 / AAPP ) 00551 * AAQQ = AAQQ*TEMP1 00552 * AAPP = AAPP*TEMP1 00553 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 00554 TEMP1 = AMIN1( SN / AAQQ, BIG / ( SQRT( FLOAT( N ) )*AAPP ) ) 00555 * AAQQ = AAQQ*TEMP1 00556 * AAPP = AAPP*TEMP1 00557 ELSE 00558 TEMP1 = ONE 00559 END IF 00560 * 00561 * Scale, if necessary 00562 * 00563 IF( TEMP1.NE.ONE ) THEN 00564 CALL SLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR ) 00565 END IF 00566 SKL = TEMP1*SKL 00567 IF( SKL.NE.ONE ) THEN 00568 CALL SLASCL( JOBA, 0, 0, ONE, SKL, M, N, A, LDA, IERR ) 00569 SKL = ONE / SKL 00570 END IF 00571 * 00572 * Row-cyclic Jacobi SVD algorithm with column pivoting 00573 * 00574 EMPTSW = ( N*( N-1 ) ) / 2 00575 NOTROT = 0 00576 FASTR( 1 ) = ZERO 00577 * 00578 * A is represented in factored form A = A * diag(WORK), where diag(WORK) 00579 * is initialized to identity. WORK is updated during fast scaled 00580 * rotations. 00581 * 00582 DO 1868 q = 1, N 00583 WORK( q ) = ONE 00584 1868 CONTINUE 00585 * 00586 * 00587 SWBAND = 3 00588 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective 00589 * if SGESVJ is used as a computational routine in the preconditioned 00590 * Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure 00591 * works on pivots inside a band-like region around the diagonal. 00592 * The boundaries are determined dynamically, based on the number of 00593 * pivots above a threshold. 00594 * 00595 KBL = MIN0( 8, N ) 00596 *[TP] KBL is a tuning parameter that defines the tile size in the 00597 * tiling of the p-q loops of pivot pairs. In general, an optimal 00598 * value of KBL depends on the matrix dimensions and on the 00599 * parameters of the computer's memory. 00600 * 00601 NBL = N / KBL 00602 IF( ( NBL*KBL ).NE.N )NBL = NBL + 1 00603 * 00604 BLSKIP = KBL**2 00605 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL. 00606 * 00607 ROWSKIP = MIN0( 5, KBL ) 00608 *[TP] ROWSKIP is a tuning parameter. 00609 * 00610 LKAHEAD = 1 00611 *[TP] LKAHEAD is a tuning parameter. 00612 * 00613 * Quasi block transformations, using the lower (upper) triangular 00614 * structure of the input matrix. The quasi-block-cycling usually 00615 * invokes cubic convergence. Big part of this cycle is done inside 00616 * canonical subspaces of dimensions less than M. 00617 * 00618 IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX0( 64, 4*KBL ) ) ) THEN 00619 *[TP] The number of partition levels and the actual partition are 00620 * tuning parameters. 00621 N4 = N / 4 00622 N2 = N / 2 00623 N34 = 3*N4 00624 IF( APPLV ) THEN 00625 q = 0 00626 ELSE 00627 q = 1 00628 END IF 00629 * 00630 IF( LOWER ) THEN 00631 * 00632 * This works very well on lower triangular matrices, in particular 00633 * in the framework of the preconditioned Jacobi SVD (xGEJSV). 00634 * The idea is simple: 00635 * [+ 0 0 0] Note that Jacobi transformations of [0 0] 00636 * [+ + 0 0] [0 0] 00637 * [+ + x 0] actually work on [x 0] [x 0] 00638 * [+ + x x] [x x]. [x x] 00639 * 00640 CALL SGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA, 00641 $ WORK( N34+1 ), SVA( N34+1 ), MVL, 00642 $ V( N34*q+1, N34+1 ), LDV, EPSLN, SFMIN, TOL, 00643 $ 2, WORK( N+1 ), LWORK-N, IERR ) 00644 * 00645 CALL SGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA, 00646 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 00647 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 2, 00648 $ WORK( N+1 ), LWORK-N, IERR ) 00649 * 00650 CALL SGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA, 00651 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 00652 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 00653 $ WORK( N+1 ), LWORK-N, IERR ) 00654 * 00655 CALL SGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA, 00656 $ WORK( N4+1 ), SVA( N4+1 ), MVL, 00657 $ V( N4*q+1, N4+1 ), LDV, EPSLN, SFMIN, TOL, 1, 00658 $ WORK( N+1 ), LWORK-N, IERR ) 00659 * 00660 CALL SGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV, 00661 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 00662 $ IERR ) 00663 * 00664 CALL SGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V, 00665 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 00666 $ LWORK-N, IERR ) 00667 * 00668 * 00669 ELSE IF( UPPER ) THEN 00670 * 00671 * 00672 CALL SGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV, 00673 $ EPSLN, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N, 00674 $ IERR ) 00675 * 00676 CALL SGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ), 00677 $ SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV, 00678 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 00679 $ IERR ) 00680 * 00681 CALL SGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V, 00682 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 00683 $ LWORK-N, IERR ) 00684 * 00685 CALL SGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA, 00686 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 00687 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 00688 $ WORK( N+1 ), LWORK-N, IERR ) 00689 00690 END IF 00691 * 00692 END IF 00693 * 00694 * .. Row-cyclic pivot strategy with de Rijk's pivoting .. 00695 * 00696 DO 1993 i = 1, NSWEEP 00697 * 00698 * .. go go go ... 00699 * 00700 MXAAPQ = ZERO 00701 MXSINJ = ZERO 00702 ISWROT = 0 00703 * 00704 NOTROT = 0 00705 PSKIPPED = 0 00706 * 00707 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs 00708 * 1 <= p < q <= N. This is the first step toward a blocked implementation 00709 * of the rotations. New implementation, based on block transformations, 00710 * is under development. 00711 * 00712 DO 2000 ibr = 1, NBL 00713 * 00714 igl = ( ibr-1 )*KBL + 1 00715 * 00716 DO 1002 ir1 = 0, MIN0( LKAHEAD, NBL-ibr ) 00717 * 00718 igl = igl + ir1*KBL 00719 * 00720 DO 2001 p = igl, MIN0( igl+KBL-1, N-1 ) 00721 * 00722 * .. de Rijk's pivoting 00723 * 00724 q = ISAMAX( N-p+1, SVA( p ), 1 ) + p - 1 00725 IF( p.NE.q ) THEN 00726 CALL SSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 00727 IF( RSVEC )CALL SSWAP( MVL, V( 1, p ), 1, 00728 $ V( 1, q ), 1 ) 00729 TEMP1 = SVA( p ) 00730 SVA( p ) = SVA( q ) 00731 SVA( q ) = TEMP1 00732 TEMP1 = WORK( p ) 00733 WORK( p ) = WORK( q ) 00734 WORK( q ) = TEMP1 00735 END IF 00736 * 00737 IF( ir1.EQ.0 ) THEN 00738 * 00739 * Column norms are periodically updated by explicit 00740 * norm computation. 00741 * Caveat: 00742 * Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1) 00743 * as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to 00744 * overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to 00745 * underflow for ||A(:,p)||_2 < SQRT(underflow_threshold). 00746 * Hence, SNRM2 cannot be trusted, not even in the case when 00747 * the true norm is far from the under(over)flow boundaries. 00748 * If properly implemented SNRM2 is available, the IF-THEN-ELSE 00749 * below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)". 00750 * 00751 IF( ( SVA( p ).LT.ROOTBIG ) .AND. 00752 $ ( SVA( p ).GT.ROOTSFMIN ) ) THEN 00753 SVA( p ) = SNRM2( M, A( 1, p ), 1 )*WORK( p ) 00754 ELSE 00755 TEMP1 = ZERO 00756 AAPP = ONE 00757 CALL SLASSQ( M, A( 1, p ), 1, TEMP1, AAPP ) 00758 SVA( p ) = TEMP1*SQRT( AAPP )*WORK( p ) 00759 END IF 00760 AAPP = SVA( p ) 00761 ELSE 00762 AAPP = SVA( p ) 00763 END IF 00764 * 00765 IF( AAPP.GT.ZERO ) THEN 00766 * 00767 PSKIPPED = 0 00768 * 00769 DO 2002 q = p + 1, MIN0( igl+KBL-1, N ) 00770 * 00771 AAQQ = SVA( q ) 00772 * 00773 IF( AAQQ.GT.ZERO ) THEN 00774 * 00775 AAPP0 = AAPP 00776 IF( AAQQ.GE.ONE ) THEN 00777 ROTOK = ( SMALL*AAPP ).LE.AAQQ 00778 IF( AAPP.LT.( BIG / AAQQ ) ) THEN 00779 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 00780 $ q ), 1 )*WORK( p )*WORK( q ) / 00781 $ AAQQ ) / AAPP 00782 ELSE 00783 CALL SCOPY( M, A( 1, p ), 1, 00784 $ WORK( N+1 ), 1 ) 00785 CALL SLASCL( 'G', 0, 0, AAPP, 00786 $ WORK( p ), M, 1, 00787 $ WORK( N+1 ), LDA, IERR ) 00788 AAPQ = SDOT( M, WORK( N+1 ), 1, 00789 $ A( 1, q ), 1 )*WORK( q ) / AAQQ 00790 END IF 00791 ELSE 00792 ROTOK = AAPP.LE.( AAQQ / SMALL ) 00793 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 00794 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 00795 $ q ), 1 )*WORK( p )*WORK( q ) / 00796 $ AAQQ ) / AAPP 00797 ELSE 00798 CALL SCOPY( M, A( 1, q ), 1, 00799 $ WORK( N+1 ), 1 ) 00800 CALL SLASCL( 'G', 0, 0, AAQQ, 00801 $ WORK( q ), M, 1, 00802 $ WORK( N+1 ), LDA, IERR ) 00803 AAPQ = SDOT( M, WORK( N+1 ), 1, 00804 $ A( 1, p ), 1 )*WORK( p ) / AAPP 00805 END IF 00806 END IF 00807 * 00808 MXAAPQ = AMAX1( MXAAPQ, ABS( AAPQ ) ) 00809 * 00810 * TO rotate or NOT to rotate, THAT is the question ... 00811 * 00812 IF( ABS( AAPQ ).GT.TOL ) THEN 00813 * 00814 * .. rotate 00815 *[RTD] ROTATED = ROTATED + ONE 00816 * 00817 IF( ir1.EQ.0 ) THEN 00818 NOTROT = 0 00819 PSKIPPED = 0 00820 ISWROT = ISWROT + 1 00821 END IF 00822 * 00823 IF( ROTOK ) THEN 00824 * 00825 AQOAP = AAQQ / AAPP 00826 APOAQ = AAPP / AAQQ 00827 THETA = -HALF*ABS( AQOAP-APOAQ ) / AAPQ 00828 * 00829 IF( ABS( THETA ).GT.BIGTHETA ) THEN 00830 * 00831 T = HALF / THETA 00832 FASTR( 3 ) = T*WORK( p ) / WORK( q ) 00833 FASTR( 4 ) = -T*WORK( q ) / 00834 $ WORK( p ) 00835 CALL SROTM( M, A( 1, p ), 1, 00836 $ A( 1, q ), 1, FASTR ) 00837 IF( RSVEC )CALL SROTM( MVL, 00838 $ V( 1, p ), 1, 00839 $ V( 1, q ), 1, 00840 $ FASTR ) 00841 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 00842 $ ONE+T*APOAQ*AAPQ ) ) 00843 AAPP = AAPP*SQRT( AMAX1( ZERO, 00844 $ ONE-T*AQOAP*AAPQ ) ) 00845 MXSINJ = AMAX1( MXSINJ, ABS( T ) ) 00846 * 00847 ELSE 00848 * 00849 * .. choose correct signum for THETA and rotate 00850 * 00851 THSIGN = -SIGN( ONE, AAPQ ) 00852 T = ONE / ( THETA+THSIGN* 00853 $ SQRT( ONE+THETA*THETA ) ) 00854 CS = SQRT( ONE / ( ONE+T*T ) ) 00855 SN = T*CS 00856 * 00857 MXSINJ = AMAX1( MXSINJ, ABS( SN ) ) 00858 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 00859 $ ONE+T*APOAQ*AAPQ ) ) 00860 AAPP = AAPP*SQRT( AMAX1( ZERO, 00861 $ ONE-T*AQOAP*AAPQ ) ) 00862 * 00863 APOAQ = WORK( p ) / WORK( q ) 00864 AQOAP = WORK( q ) / WORK( p ) 00865 IF( WORK( p ).GE.ONE ) THEN 00866 IF( WORK( q ).GE.ONE ) THEN 00867 FASTR( 3 ) = T*APOAQ 00868 FASTR( 4 ) = -T*AQOAP 00869 WORK( p ) = WORK( p )*CS 00870 WORK( q ) = WORK( q )*CS 00871 CALL SROTM( M, A( 1, p ), 1, 00872 $ A( 1, q ), 1, 00873 $ FASTR ) 00874 IF( RSVEC )CALL SROTM( MVL, 00875 $ V( 1, p ), 1, V( 1, q ), 00876 $ 1, FASTR ) 00877 ELSE 00878 CALL SAXPY( M, -T*AQOAP, 00879 $ A( 1, q ), 1, 00880 $ A( 1, p ), 1 ) 00881 CALL SAXPY( M, CS*SN*APOAQ, 00882 $ A( 1, p ), 1, 00883 $ A( 1, q ), 1 ) 00884 WORK( p ) = WORK( p )*CS 00885 WORK( q ) = WORK( q ) / CS 00886 IF( RSVEC ) THEN 00887 CALL SAXPY( MVL, -T*AQOAP, 00888 $ V( 1, q ), 1, 00889 $ V( 1, p ), 1 ) 00890 CALL SAXPY( MVL, 00891 $ CS*SN*APOAQ, 00892 $ V( 1, p ), 1, 00893 $ V( 1, q ), 1 ) 00894 END IF 00895 END IF 00896 ELSE 00897 IF( WORK( q ).GE.ONE ) THEN 00898 CALL SAXPY( M, T*APOAQ, 00899 $ A( 1, p ), 1, 00900 $ A( 1, q ), 1 ) 00901 CALL SAXPY( M, -CS*SN*AQOAP, 00902 $ A( 1, q ), 1, 00903 $ A( 1, p ), 1 ) 00904 WORK( p ) = WORK( p ) / CS 00905 WORK( q ) = WORK( q )*CS 00906 IF( RSVEC ) THEN 00907 CALL SAXPY( MVL, T*APOAQ, 00908 $ V( 1, p ), 1, 00909 $ V( 1, q ), 1 ) 00910 CALL SAXPY( MVL, 00911 $ -CS*SN*AQOAP, 00912 $ V( 1, q ), 1, 00913 $ V( 1, p ), 1 ) 00914 END IF 00915 ELSE 00916 IF( WORK( p ).GE.WORK( q ) ) 00917 $ THEN 00918 CALL SAXPY( M, -T*AQOAP, 00919 $ A( 1, q ), 1, 00920 $ A( 1, p ), 1 ) 00921 CALL SAXPY( M, CS*SN*APOAQ, 00922 $ A( 1, p ), 1, 00923 $ A( 1, q ), 1 ) 00924 WORK( p ) = WORK( p )*CS 00925 WORK( q ) = WORK( q ) / CS 00926 IF( RSVEC ) THEN 00927 CALL SAXPY( MVL, 00928 $ -T*AQOAP, 00929 $ V( 1, q ), 1, 00930 $ V( 1, p ), 1 ) 00931 CALL SAXPY( MVL, 00932 $ CS*SN*APOAQ, 00933 $ V( 1, p ), 1, 00934 $ V( 1, q ), 1 ) 00935 END IF 00936 ELSE 00937 CALL SAXPY( M, T*APOAQ, 00938 $ A( 1, p ), 1, 00939 $ A( 1, q ), 1 ) 00940 CALL SAXPY( M, 00941 $ -CS*SN*AQOAP, 00942 $ A( 1, q ), 1, 00943 $ A( 1, p ), 1 ) 00944 WORK( p ) = WORK( p ) / CS 00945 WORK( q ) = WORK( q )*CS 00946 IF( RSVEC ) THEN 00947 CALL SAXPY( MVL, 00948 $ T*APOAQ, V( 1, p ), 00949 $ 1, V( 1, q ), 1 ) 00950 CALL SAXPY( MVL, 00951 $ -CS*SN*AQOAP, 00952 $ V( 1, q ), 1, 00953 $ V( 1, p ), 1 ) 00954 END IF 00955 END IF 00956 END IF 00957 END IF 00958 END IF 00959 * 00960 ELSE 00961 * .. have to use modified Gram-Schmidt like transformation 00962 CALL SCOPY( M, A( 1, p ), 1, 00963 $ WORK( N+1 ), 1 ) 00964 CALL SLASCL( 'G', 0, 0, AAPP, ONE, M, 00965 $ 1, WORK( N+1 ), LDA, 00966 $ IERR ) 00967 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, M, 00968 $ 1, A( 1, q ), LDA, IERR ) 00969 TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 00970 CALL SAXPY( M, TEMP1, WORK( N+1 ), 1, 00971 $ A( 1, q ), 1 ) 00972 CALL SLASCL( 'G', 0, 0, ONE, AAQQ, M, 00973 $ 1, A( 1, q ), LDA, IERR ) 00974 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 00975 $ ONE-AAPQ*AAPQ ) ) 00976 MXSINJ = AMAX1( MXSINJ, SFMIN ) 00977 END IF 00978 * END IF ROTOK THEN ... ELSE 00979 * 00980 * In the case of cancellation in updating SVA(q), SVA(p) 00981 * recompute SVA(q), SVA(p). 00982 * 00983 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 00984 $ THEN 00985 IF( ( AAQQ.LT.ROOTBIG ) .AND. 00986 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN 00987 SVA( q ) = SNRM2( M, A( 1, q ), 1 )* 00988 $ WORK( q ) 00989 ELSE 00990 T = ZERO 00991 AAQQ = ONE 00992 CALL SLASSQ( M, A( 1, q ), 1, T, 00993 $ AAQQ ) 00994 SVA( q ) = T*SQRT( AAQQ )*WORK( q ) 00995 END IF 00996 END IF 00997 IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN 00998 IF( ( AAPP.LT.ROOTBIG ) .AND. 00999 $ ( AAPP.GT.ROOTSFMIN ) ) THEN 01000 AAPP = SNRM2( M, A( 1, p ), 1 )* 01001 $ WORK( p ) 01002 ELSE 01003 T = ZERO 01004 AAPP = ONE 01005 CALL SLASSQ( M, A( 1, p ), 1, T, 01006 $ AAPP ) 01007 AAPP = T*SQRT( AAPP )*WORK( p ) 01008 END IF 01009 SVA( p ) = AAPP 01010 END IF 01011 * 01012 ELSE 01013 * A(:,p) and A(:,q) already numerically orthogonal 01014 IF( ir1.EQ.0 )NOTROT = NOTROT + 1 01015 *[RTD] SKIPPED = SKIPPED + 1 01016 PSKIPPED = PSKIPPED + 1 01017 END IF 01018 ELSE 01019 * A(:,q) is zero column 01020 IF( ir1.EQ.0 )NOTROT = NOTROT + 1 01021 PSKIPPED = PSKIPPED + 1 01022 END IF 01023 * 01024 IF( ( i.LE.SWBAND ) .AND. 01025 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN 01026 IF( ir1.EQ.0 )AAPP = -AAPP 01027 NOTROT = 0 01028 GO TO 2103 01029 END IF 01030 * 01031 2002 CONTINUE 01032 * END q-LOOP 01033 * 01034 2103 CONTINUE 01035 * bailed out of q-loop 01036 * 01037 SVA( p ) = AAPP 01038 * 01039 ELSE 01040 SVA( p ) = AAPP 01041 IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) ) 01042 $ NOTROT = NOTROT + MIN0( igl+KBL-1, N ) - p 01043 END IF 01044 * 01045 2001 CONTINUE 01046 * end of the p-loop 01047 * end of doing the block ( ibr, ibr ) 01048 1002 CONTINUE 01049 * end of ir1-loop 01050 * 01051 * ... go to the off diagonal blocks 01052 * 01053 igl = ( ibr-1 )*KBL + 1 01054 * 01055 DO 2010 jbc = ibr + 1, NBL 01056 * 01057 jgl = ( jbc-1 )*KBL + 1 01058 * 01059 * doing the block at ( ibr, jbc ) 01060 * 01061 IJBLSK = 0 01062 DO 2100 p = igl, MIN0( igl+KBL-1, N ) 01063 * 01064 AAPP = SVA( p ) 01065 IF( AAPP.GT.ZERO ) THEN 01066 * 01067 PSKIPPED = 0 01068 * 01069 DO 2200 q = jgl, MIN0( jgl+KBL-1, N ) 01070 * 01071 AAQQ = SVA( q ) 01072 IF( AAQQ.GT.ZERO ) THEN 01073 AAPP0 = AAPP 01074 * 01075 * .. M x 2 Jacobi SVD .. 01076 * 01077 * Safe Gram matrix computation 01078 * 01079 IF( AAQQ.GE.ONE ) THEN 01080 IF( AAPP.GE.AAQQ ) THEN 01081 ROTOK = ( SMALL*AAPP ).LE.AAQQ 01082 ELSE 01083 ROTOK = ( SMALL*AAQQ ).LE.AAPP 01084 END IF 01085 IF( AAPP.LT.( BIG / AAQQ ) ) THEN 01086 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 01087 $ q ), 1 )*WORK( p )*WORK( q ) / 01088 $ AAQQ ) / AAPP 01089 ELSE 01090 CALL SCOPY( M, A( 1, p ), 1, 01091 $ WORK( N+1 ), 1 ) 01092 CALL SLASCL( 'G', 0, 0, AAPP, 01093 $ WORK( p ), M, 1, 01094 $ WORK( N+1 ), LDA, IERR ) 01095 AAPQ = SDOT( M, WORK( N+1 ), 1, 01096 $ A( 1, q ), 1 )*WORK( q ) / AAQQ 01097 END IF 01098 ELSE 01099 IF( AAPP.GE.AAQQ ) THEN 01100 ROTOK = AAPP.LE.( AAQQ / SMALL ) 01101 ELSE 01102 ROTOK = AAQQ.LE.( AAPP / SMALL ) 01103 END IF 01104 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 01105 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 01106 $ q ), 1 )*WORK( p )*WORK( q ) / 01107 $ AAQQ ) / AAPP 01108 ELSE 01109 CALL SCOPY( M, A( 1, q ), 1, 01110 $ WORK( N+1 ), 1 ) 01111 CALL SLASCL( 'G', 0, 0, AAQQ, 01112 $ WORK( q ), M, 1, 01113 $ WORK( N+1 ), LDA, IERR ) 01114 AAPQ = SDOT( M, WORK( N+1 ), 1, 01115 $ A( 1, p ), 1 )*WORK( p ) / AAPP 01116 END IF 01117 END IF 01118 * 01119 MXAAPQ = AMAX1( MXAAPQ, ABS( AAPQ ) ) 01120 * 01121 * TO rotate or NOT to rotate, THAT is the question ... 01122 * 01123 IF( ABS( AAPQ ).GT.TOL ) THEN 01124 NOTROT = 0 01125 *[RTD] ROTATED = ROTATED + 1 01126 PSKIPPED = 0 01127 ISWROT = ISWROT + 1 01128 * 01129 IF( ROTOK ) THEN 01130 * 01131 AQOAP = AAQQ / AAPP 01132 APOAQ = AAPP / AAQQ 01133 THETA = -HALF*ABS( AQOAP-APOAQ ) / AAPQ 01134 IF( AAQQ.GT.AAPP0 )THETA = -THETA 01135 * 01136 IF( ABS( THETA ).GT.BIGTHETA ) THEN 01137 T = HALF / THETA 01138 FASTR( 3 ) = T*WORK( p ) / WORK( q ) 01139 FASTR( 4 ) = -T*WORK( q ) / 01140 $ WORK( p ) 01141 CALL SROTM( M, A( 1, p ), 1, 01142 $ A( 1, q ), 1, FASTR ) 01143 IF( RSVEC )CALL SROTM( MVL, 01144 $ V( 1, p ), 1, 01145 $ V( 1, q ), 1, 01146 $ FASTR ) 01147 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 01148 $ ONE+T*APOAQ*AAPQ ) ) 01149 AAPP = AAPP*SQRT( AMAX1( ZERO, 01150 $ ONE-T*AQOAP*AAPQ ) ) 01151 MXSINJ = AMAX1( MXSINJ, ABS( T ) ) 01152 ELSE 01153 * 01154 * .. choose correct signum for THETA and rotate 01155 * 01156 THSIGN = -SIGN( ONE, AAPQ ) 01157 IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN 01158 T = ONE / ( THETA+THSIGN* 01159 $ SQRT( ONE+THETA*THETA ) ) 01160 CS = SQRT( ONE / ( ONE+T*T ) ) 01161 SN = T*CS 01162 MXSINJ = AMAX1( MXSINJ, ABS( SN ) ) 01163 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 01164 $ ONE+T*APOAQ*AAPQ ) ) 01165 AAPP = AAPP*SQRT( AMAX1( ZERO, 01166 $ ONE-T*AQOAP*AAPQ ) ) 01167 * 01168 APOAQ = WORK( p ) / WORK( q ) 01169 AQOAP = WORK( q ) / WORK( p ) 01170 IF( WORK( p ).GE.ONE ) THEN 01171 * 01172 IF( WORK( q ).GE.ONE ) THEN 01173 FASTR( 3 ) = T*APOAQ 01174 FASTR( 4 ) = -T*AQOAP 01175 WORK( p ) = WORK( p )*CS 01176 WORK( q ) = WORK( q )*CS 01177 CALL SROTM( M, A( 1, p ), 1, 01178 $ A( 1, q ), 1, 01179 $ FASTR ) 01180 IF( RSVEC )CALL SROTM( MVL, 01181 $ V( 1, p ), 1, V( 1, q ), 01182 $ 1, FASTR ) 01183 ELSE 01184 CALL SAXPY( M, -T*AQOAP, 01185 $ A( 1, q ), 1, 01186 $ A( 1, p ), 1 ) 01187 CALL SAXPY( M, CS*SN*APOAQ, 01188 $ A( 1, p ), 1, 01189 $ A( 1, q ), 1 ) 01190 IF( RSVEC ) THEN 01191 CALL SAXPY( MVL, -T*AQOAP, 01192 $ V( 1, q ), 1, 01193 $ V( 1, p ), 1 ) 01194 CALL SAXPY( MVL, 01195 $ CS*SN*APOAQ, 01196 $ V( 1, p ), 1, 01197 $ V( 1, q ), 1 ) 01198 END IF 01199 WORK( p ) = WORK( p )*CS 01200 WORK( q ) = WORK( q ) / CS 01201 END IF 01202 ELSE 01203 IF( WORK( q ).GE.ONE ) THEN 01204 CALL SAXPY( M, T*APOAQ, 01205 $ A( 1, p ), 1, 01206 $ A( 1, q ), 1 ) 01207 CALL SAXPY( M, -CS*SN*AQOAP, 01208 $ A( 1, q ), 1, 01209 $ A( 1, p ), 1 ) 01210 IF( RSVEC ) THEN 01211 CALL SAXPY( MVL, T*APOAQ, 01212 $ V( 1, p ), 1, 01213 $ V( 1, q ), 1 ) 01214 CALL SAXPY( MVL, 01215 $ -CS*SN*AQOAP, 01216 $ V( 1, q ), 1, 01217 $ V( 1, p ), 1 ) 01218 END IF 01219 WORK( p ) = WORK( p ) / CS 01220 WORK( q ) = WORK( q )*CS 01221 ELSE 01222 IF( WORK( p ).GE.WORK( q ) ) 01223 $ THEN 01224 CALL SAXPY( M, -T*AQOAP, 01225 $ A( 1, q ), 1, 01226 $ A( 1, p ), 1 ) 01227 CALL SAXPY( M, CS*SN*APOAQ, 01228 $ A( 1, p ), 1, 01229 $ A( 1, q ), 1 ) 01230 WORK( p ) = WORK( p )*CS 01231 WORK( q ) = WORK( q ) / CS 01232 IF( RSVEC ) THEN 01233 CALL SAXPY( MVL, 01234 $ -T*AQOAP, 01235 $ V( 1, q ), 1, 01236 $ V( 1, p ), 1 ) 01237 CALL SAXPY( MVL, 01238 $ CS*SN*APOAQ, 01239 $ V( 1, p ), 1, 01240 $ V( 1, q ), 1 ) 01241 END IF 01242 ELSE 01243 CALL SAXPY( M, T*APOAQ, 01244 $ A( 1, p ), 1, 01245 $ A( 1, q ), 1 ) 01246 CALL SAXPY( M, 01247 $ -CS*SN*AQOAP, 01248 $ A( 1, q ), 1, 01249 $ A( 1, p ), 1 ) 01250 WORK( p ) = WORK( p ) / CS 01251 WORK( q ) = WORK( q )*CS 01252 IF( RSVEC ) THEN 01253 CALL SAXPY( MVL, 01254 $ T*APOAQ, V( 1, p ), 01255 $ 1, V( 1, q ), 1 ) 01256 CALL SAXPY( MVL, 01257 $ -CS*SN*AQOAP, 01258 $ V( 1, q ), 1, 01259 $ V( 1, p ), 1 ) 01260 END IF 01261 END IF 01262 END IF 01263 END IF 01264 END IF 01265 * 01266 ELSE 01267 IF( AAPP.GT.AAQQ ) THEN 01268 CALL SCOPY( M, A( 1, p ), 1, 01269 $ WORK( N+1 ), 1 ) 01270 CALL SLASCL( 'G', 0, 0, AAPP, ONE, 01271 $ M, 1, WORK( N+1 ), LDA, 01272 $ IERR ) 01273 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, 01274 $ M, 1, A( 1, q ), LDA, 01275 $ IERR ) 01276 TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 01277 CALL SAXPY( M, TEMP1, WORK( N+1 ), 01278 $ 1, A( 1, q ), 1 ) 01279 CALL SLASCL( 'G', 0, 0, ONE, AAQQ, 01280 $ M, 1, A( 1, q ), LDA, 01281 $ IERR ) 01282 SVA( q ) = AAQQ*SQRT( AMAX1( ZERO, 01283 $ ONE-AAPQ*AAPQ ) ) 01284 MXSINJ = AMAX1( MXSINJ, SFMIN ) 01285 ELSE 01286 CALL SCOPY( M, A( 1, q ), 1, 01287 $ WORK( N+1 ), 1 ) 01288 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, 01289 $ M, 1, WORK( N+1 ), LDA, 01290 $ IERR ) 01291 CALL SLASCL( 'G', 0, 0, AAPP, ONE, 01292 $ M, 1, A( 1, p ), LDA, 01293 $ IERR ) 01294 TEMP1 = -AAPQ*WORK( q ) / WORK( p ) 01295 CALL SAXPY( M, TEMP1, WORK( N+1 ), 01296 $ 1, A( 1, p ), 1 ) 01297 CALL SLASCL( 'G', 0, 0, ONE, AAPP, 01298 $ M, 1, A( 1, p ), LDA, 01299 $ IERR ) 01300 SVA( p ) = AAPP*SQRT( AMAX1( ZERO, 01301 $ ONE-AAPQ*AAPQ ) ) 01302 MXSINJ = AMAX1( MXSINJ, SFMIN ) 01303 END IF 01304 END IF 01305 * END IF ROTOK THEN ... ELSE 01306 * 01307 * In the case of cancellation in updating SVA(q) 01308 * .. recompute SVA(q) 01309 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 01310 $ THEN 01311 IF( ( AAQQ.LT.ROOTBIG ) .AND. 01312 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN 01313 SVA( q ) = SNRM2( M, A( 1, q ), 1 )* 01314 $ WORK( q ) 01315 ELSE 01316 T = ZERO 01317 AAQQ = ONE 01318 CALL SLASSQ( M, A( 1, q ), 1, T, 01319 $ AAQQ ) 01320 SVA( q ) = T*SQRT( AAQQ )*WORK( q ) 01321 END IF 01322 END IF 01323 IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN 01324 IF( ( AAPP.LT.ROOTBIG ) .AND. 01325 $ ( AAPP.GT.ROOTSFMIN ) ) THEN 01326 AAPP = SNRM2( M, A( 1, p ), 1 )* 01327 $ WORK( p ) 01328 ELSE 01329 T = ZERO 01330 AAPP = ONE 01331 CALL SLASSQ( M, A( 1, p ), 1, T, 01332 $ AAPP ) 01333 AAPP = T*SQRT( AAPP )*WORK( p ) 01334 END IF 01335 SVA( p ) = AAPP 01336 END IF 01337 * end of OK rotation 01338 ELSE 01339 NOTROT = NOTROT + 1 01340 *[RTD] SKIPPED = SKIPPED + 1 01341 PSKIPPED = PSKIPPED + 1 01342 IJBLSK = IJBLSK + 1 01343 END IF 01344 ELSE 01345 NOTROT = NOTROT + 1 01346 PSKIPPED = PSKIPPED + 1 01347 IJBLSK = IJBLSK + 1 01348 END IF 01349 * 01350 IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) ) 01351 $ THEN 01352 SVA( p ) = AAPP 01353 NOTROT = 0 01354 GO TO 2011 01355 END IF 01356 IF( ( i.LE.SWBAND ) .AND. 01357 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN 01358 AAPP = -AAPP 01359 NOTROT = 0 01360 GO TO 2203 01361 END IF 01362 * 01363 2200 CONTINUE 01364 * end of the q-loop 01365 2203 CONTINUE 01366 * 01367 SVA( p ) = AAPP 01368 * 01369 ELSE 01370 * 01371 IF( AAPP.EQ.ZERO )NOTROT = NOTROT + 01372 $ MIN0( jgl+KBL-1, N ) - jgl + 1 01373 IF( AAPP.LT.ZERO )NOTROT = 0 01374 * 01375 END IF 01376 * 01377 2100 CONTINUE 01378 * end of the p-loop 01379 2010 CONTINUE 01380 * end of the jbc-loop 01381 2011 CONTINUE 01382 *2011 bailed out of the jbc-loop 01383 DO 2012 p = igl, MIN0( igl+KBL-1, N ) 01384 SVA( p ) = ABS( SVA( p ) ) 01385 2012 CONTINUE 01386 *** 01387 2000 CONTINUE 01388 *2000 :: end of the ibr-loop 01389 * 01390 * .. update SVA(N) 01391 IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) ) 01392 $ THEN 01393 SVA( N ) = SNRM2( M, A( 1, N ), 1 )*WORK( N ) 01394 ELSE 01395 T = ZERO 01396 AAPP = ONE 01397 CALL SLASSQ( M, A( 1, N ), 1, T, AAPP ) 01398 SVA( N ) = T*SQRT( AAPP )*WORK( N ) 01399 END IF 01400 * 01401 * Additional steering devices 01402 * 01403 IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR. 01404 $ ( ISWROT.LE.N ) ) )SWBAND = i 01405 * 01406 IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.SQRT( FLOAT( N ) )* 01407 $ TOL ) .AND. ( FLOAT( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN 01408 GO TO 1994 01409 END IF 01410 * 01411 IF( NOTROT.GE.EMPTSW )GO TO 1994 01412 * 01413 1993 CONTINUE 01414 * end i=1:NSWEEP loop 01415 * 01416 * #:( Reaching this point means that the procedure has not converged. 01417 INFO = NSWEEP - 1 01418 GO TO 1995 01419 * 01420 1994 CONTINUE 01421 * #:) Reaching this point means numerical convergence after the i-th 01422 * sweep. 01423 * 01424 INFO = 0 01425 * #:) INFO = 0 confirms successful iterations. 01426 1995 CONTINUE 01427 * 01428 * Sort the singular values and find how many are above 01429 * the underflow threshold. 01430 * 01431 N2 = 0 01432 N4 = 0 01433 DO 5991 p = 1, N - 1 01434 q = ISAMAX( N-p+1, SVA( p ), 1 ) + p - 1 01435 IF( p.NE.q ) THEN 01436 TEMP1 = SVA( p ) 01437 SVA( p ) = SVA( q ) 01438 SVA( q ) = TEMP1 01439 TEMP1 = WORK( p ) 01440 WORK( p ) = WORK( q ) 01441 WORK( q ) = TEMP1 01442 CALL SSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 01443 IF( RSVEC )CALL SSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 ) 01444 END IF 01445 IF( SVA( p ).NE.ZERO ) THEN 01446 N4 = N4 + 1 01447 IF( SVA( p )*SKL.GT.SFMIN )N2 = N2 + 1 01448 END IF 01449 5991 CONTINUE 01450 IF( SVA( N ).NE.ZERO ) THEN 01451 N4 = N4 + 1 01452 IF( SVA( N )*SKL.GT.SFMIN )N2 = N2 + 1 01453 END IF 01454 * 01455 * Normalize the left singular vectors. 01456 * 01457 IF( LSVEC .OR. UCTOL ) THEN 01458 DO 1998 p = 1, N2 01459 CALL SSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 ) 01460 1998 CONTINUE 01461 END IF 01462 * 01463 * Scale the product of Jacobi rotations (assemble the fast rotations). 01464 * 01465 IF( RSVEC ) THEN 01466 IF( APPLV ) THEN 01467 DO 2398 p = 1, N 01468 CALL SSCAL( MVL, WORK( p ), V( 1, p ), 1 ) 01469 2398 CONTINUE 01470 ELSE 01471 DO 2399 p = 1, N 01472 TEMP1 = ONE / SNRM2( MVL, V( 1, p ), 1 ) 01473 CALL SSCAL( MVL, TEMP1, V( 1, p ), 1 ) 01474 2399 CONTINUE 01475 END IF 01476 END IF 01477 * 01478 * Undo scaling, if necessary (and possible). 01479 IF( ( ( SKL.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG / 01480 $ SKL ) ) ) .OR. ( ( SKL.LT.ONE ) .AND. ( SVA( N2 ).GT. 01481 $ ( SFMIN / SKL ) ) ) ) THEN 01482 DO 2400 p = 1, N 01483 SVA( p ) = SKL*SVA( p ) 01484 2400 CONTINUE 01485 SKL = ONE 01486 END IF 01487 * 01488 WORK( 1 ) = SKL 01489 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE 01490 * then some of the singular values may overflow or underflow and 01491 * the spectrum is given in this factored representation. 01492 * 01493 WORK( 2 ) = FLOAT( N4 ) 01494 * N4 is the number of computed nonzero singular values of A. 01495 * 01496 WORK( 3 ) = FLOAT( N2 ) 01497 * N2 is the number of singular values of A greater than SFMIN. 01498 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers 01499 * that may carry some information. 01500 * 01501 WORK( 4 ) = FLOAT( i ) 01502 * i is the index of the last sweep before declaring convergence. 01503 * 01504 WORK( 5 ) = MXAAPQ 01505 * MXAAPQ is the largest absolute value of scaled pivots in the 01506 * last sweep 01507 * 01508 WORK( 6 ) = MXSINJ 01509 * MXSINJ is the largest absolute value of the sines of Jacobi angles 01510 * in the last sweep 01511 * 01512 RETURN 01513 * .. 01514 * .. END OF SGESVJ 01515 * .. 01516 END