LAPACK 3.3.0
|
00001 SUBROUTINE ZLAQR5( WANTT, WANTZ, KACC22, N, KTOP, KBOT, NSHFTS, S, 00002 $ H, LDH, ILOZ, IHIZ, Z, LDZ, V, LDV, U, LDU, NV, 00003 $ WV, LDWV, NH, WH, LDWH ) 00004 * 00005 * -- LAPACK auxiliary routine (version 3.3.0) -- 00006 * Univ. of Tennessee, Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd.. 00007 * November 2010 00008 * 00009 * .. Scalar Arguments .. 00010 INTEGER IHIZ, ILOZ, KACC22, KBOT, KTOP, LDH, LDU, LDV, 00011 $ LDWH, LDWV, LDZ, N, NH, NSHFTS, NV 00012 LOGICAL WANTT, WANTZ 00013 * .. 00014 * .. Array Arguments .. 00015 COMPLEX*16 H( LDH, * ), S( * ), U( LDU, * ), V( LDV, * ), 00016 $ WH( LDWH, * ), WV( LDWV, * ), Z( LDZ, * ) 00017 * .. 00018 * 00019 * This auxiliary subroutine called by ZLAQR0 performs a 00020 * single small-bulge multi-shift QR sweep. 00021 * 00022 * WANTT (input) logical scalar 00023 * WANTT = .true. if the triangular Schur factor 00024 * is being computed. WANTT is set to .false. otherwise. 00025 * 00026 * WANTZ (input) logical scalar 00027 * WANTZ = .true. if the unitary Schur factor is being 00028 * computed. WANTZ is set to .false. otherwise. 00029 * 00030 * KACC22 (input) integer with value 0, 1, or 2. 00031 * Specifies the computation mode of far-from-diagonal 00032 * orthogonal updates. 00033 * = 0: ZLAQR5 does not accumulate reflections and does not 00034 * use matrix-matrix multiply to update far-from-diagonal 00035 * matrix entries. 00036 * = 1: ZLAQR5 accumulates reflections and uses matrix-matrix 00037 * multiply to update the far-from-diagonal matrix entries. 00038 * = 2: ZLAQR5 accumulates reflections, uses matrix-matrix 00039 * multiply to update the far-from-diagonal matrix entries, 00040 * and takes advantage of 2-by-2 block structure during 00041 * matrix multiplies. 00042 * 00043 * N (input) integer scalar 00044 * N is the order of the Hessenberg matrix H upon which this 00045 * subroutine operates. 00046 * 00047 * KTOP (input) integer scalar 00048 * KBOT (input) integer scalar 00049 * These are the first and last rows and columns of an 00050 * isolated diagonal block upon which the QR sweep is to be 00051 * applied. It is assumed without a check that 00052 * either KTOP = 1 or H(KTOP,KTOP-1) = 0 00053 * and 00054 * either KBOT = N or H(KBOT+1,KBOT) = 0. 00055 * 00056 * NSHFTS (input) integer scalar 00057 * NSHFTS gives the number of simultaneous shifts. NSHFTS 00058 * must be positive and even. 00059 * 00060 * S (input/output) COMPLEX*16 array of size (NSHFTS) 00061 * S contains the shifts of origin that define the multi- 00062 * shift QR sweep. On output S may be reordered. 00063 * 00064 * H (input/output) COMPLEX*16 array of size (LDH,N) 00065 * On input H contains a Hessenberg matrix. On output a 00066 * multi-shift QR sweep with shifts SR(J)+i*SI(J) is applied 00067 * to the isolated diagonal block in rows and columns KTOP 00068 * through KBOT. 00069 * 00070 * LDH (input) integer scalar 00071 * LDH is the leading dimension of H just as declared in the 00072 * calling procedure. LDH.GE.MAX(1,N). 00073 * 00074 * ILOZ (input) INTEGER 00075 * IHIZ (input) INTEGER 00076 * Specify the rows of Z to which transformations must be 00077 * applied if WANTZ is .TRUE.. 1 .LE. ILOZ .LE. IHIZ .LE. N 00078 * 00079 * Z (input/output) COMPLEX*16 array of size (LDZ,IHI) 00080 * If WANTZ = .TRUE., then the QR Sweep unitary 00081 * similarity transformation is accumulated into 00082 * Z(ILOZ:IHIZ,ILO:IHI) from the right. 00083 * If WANTZ = .FALSE., then Z is unreferenced. 00084 * 00085 * LDZ (input) integer scalar 00086 * LDA is the leading dimension of Z just as declared in 00087 * the calling procedure. LDZ.GE.N. 00088 * 00089 * V (workspace) COMPLEX*16 array of size (LDV,NSHFTS/2) 00090 * 00091 * LDV (input) integer scalar 00092 * LDV is the leading dimension of V as declared in the 00093 * calling procedure. LDV.GE.3. 00094 * 00095 * U (workspace) COMPLEX*16 array of size 00096 * (LDU,3*NSHFTS-3) 00097 * 00098 * LDU (input) integer scalar 00099 * LDU is the leading dimension of U just as declared in the 00100 * in the calling subroutine. LDU.GE.3*NSHFTS-3. 00101 * 00102 * NH (input) integer scalar 00103 * NH is the number of columns in array WH available for 00104 * workspace. NH.GE.1. 00105 * 00106 * WH (workspace) COMPLEX*16 array of size (LDWH,NH) 00107 * 00108 * LDWH (input) integer scalar 00109 * Leading dimension of WH just as declared in the 00110 * calling procedure. LDWH.GE.3*NSHFTS-3. 00111 * 00112 * NV (input) integer scalar 00113 * NV is the number of rows in WV agailable for workspace. 00114 * NV.GE.1. 00115 * 00116 * WV (workspace) COMPLEX*16 array of size 00117 * (LDWV,3*NSHFTS-3) 00118 * 00119 * LDWV (input) integer scalar 00120 * LDWV is the leading dimension of WV as declared in the 00121 * in the calling subroutine. LDWV.GE.NV. 00122 * 00123 * ================================================================ 00124 * Based on contributions by 00125 * Karen Braman and Ralph Byers, Department of Mathematics, 00126 * University of Kansas, USA 00127 * 00128 * ================================================================ 00129 * Reference: 00130 * 00131 * K. Braman, R. Byers and R. Mathias, The Multi-Shift QR 00132 * Algorithm Part I: Maintaining Well Focused Shifts, and 00133 * Level 3 Performance, SIAM Journal of Matrix Analysis, 00134 * volume 23, pages 929--947, 2002. 00135 * 00136 * ================================================================ 00137 * .. Parameters .. 00138 COMPLEX*16 ZERO, ONE 00139 PARAMETER ( ZERO = ( 0.0d0, 0.0d0 ), 00140 $ ONE = ( 1.0d0, 0.0d0 ) ) 00141 DOUBLE PRECISION RZERO, RONE 00142 PARAMETER ( RZERO = 0.0d0, RONE = 1.0d0 ) 00143 * .. 00144 * .. Local Scalars .. 00145 COMPLEX*16 ALPHA, BETA, CDUM, REFSUM 00146 DOUBLE PRECISION H11, H12, H21, H22, SAFMAX, SAFMIN, SCL, 00147 $ SMLNUM, TST1, TST2, ULP 00148 INTEGER I2, I4, INCOL, J, J2, J4, JBOT, JCOL, JLEN, 00149 $ JROW, JTOP, K, K1, KDU, KMS, KNZ, KRCOL, KZS, 00150 $ M, M22, MBOT, MEND, MSTART, MTOP, NBMPS, NDCOL, 00151 $ NS, NU 00152 LOGICAL ACCUM, BLK22, BMP22 00153 * .. 00154 * .. External Functions .. 00155 DOUBLE PRECISION DLAMCH 00156 EXTERNAL DLAMCH 00157 * .. 00158 * .. Intrinsic Functions .. 00159 * 00160 INTRINSIC ABS, DBLE, DCONJG, DIMAG, MAX, MIN, MOD 00161 * .. 00162 * .. Local Arrays .. 00163 COMPLEX*16 VT( 3 ) 00164 * .. 00165 * .. External Subroutines .. 00166 EXTERNAL DLABAD, ZGEMM, ZLACPY, ZLAQR1, ZLARFG, ZLASET, 00167 $ ZTRMM 00168 * .. 00169 * .. Statement Functions .. 00170 DOUBLE PRECISION CABS1 00171 * .. 00172 * .. Statement Function definitions .. 00173 CABS1( CDUM ) = ABS( DBLE( CDUM ) ) + ABS( DIMAG( CDUM ) ) 00174 * .. 00175 * .. Executable Statements .. 00176 * 00177 * ==== If there are no shifts, then there is nothing to do. ==== 00178 * 00179 IF( NSHFTS.LT.2 ) 00180 $ RETURN 00181 * 00182 * ==== If the active block is empty or 1-by-1, then there 00183 * . is nothing to do. ==== 00184 * 00185 IF( KTOP.GE.KBOT ) 00186 $ RETURN 00187 * 00188 * ==== NSHFTS is supposed to be even, but if it is odd, 00189 * . then simply reduce it by one. ==== 00190 * 00191 NS = NSHFTS - MOD( NSHFTS, 2 ) 00192 * 00193 * ==== Machine constants for deflation ==== 00194 * 00195 SAFMIN = DLAMCH( 'SAFE MINIMUM' ) 00196 SAFMAX = RONE / SAFMIN 00197 CALL DLABAD( SAFMIN, SAFMAX ) 00198 ULP = DLAMCH( 'PRECISION' ) 00199 SMLNUM = SAFMIN*( DBLE( N ) / ULP ) 00200 * 00201 * ==== Use accumulated reflections to update far-from-diagonal 00202 * . entries ? ==== 00203 * 00204 ACCUM = ( KACC22.EQ.1 ) .OR. ( KACC22.EQ.2 ) 00205 * 00206 * ==== If so, exploit the 2-by-2 block structure? ==== 00207 * 00208 BLK22 = ( NS.GT.2 ) .AND. ( KACC22.EQ.2 ) 00209 * 00210 * ==== clear trash ==== 00211 * 00212 IF( KTOP+2.LE.KBOT ) 00213 $ H( KTOP+2, KTOP ) = ZERO 00214 * 00215 * ==== NBMPS = number of 2-shift bulges in the chain ==== 00216 * 00217 NBMPS = NS / 2 00218 * 00219 * ==== KDU = width of slab ==== 00220 * 00221 KDU = 6*NBMPS - 3 00222 * 00223 * ==== Create and chase chains of NBMPS bulges ==== 00224 * 00225 DO 210 INCOL = 3*( 1-NBMPS ) + KTOP - 1, KBOT - 2, 3*NBMPS - 2 00226 NDCOL = INCOL + KDU 00227 IF( ACCUM ) 00228 $ CALL ZLASET( 'ALL', KDU, KDU, ZERO, ONE, U, LDU ) 00229 * 00230 * ==== Near-the-diagonal bulge chase. The following loop 00231 * . performs the near-the-diagonal part of a small bulge 00232 * . multi-shift QR sweep. Each 6*NBMPS-2 column diagonal 00233 * . chunk extends from column INCOL to column NDCOL 00234 * . (including both column INCOL and column NDCOL). The 00235 * . following loop chases a 3*NBMPS column long chain of 00236 * . NBMPS bulges 3*NBMPS-2 columns to the right. (INCOL 00237 * . may be less than KTOP and and NDCOL may be greater than 00238 * . KBOT indicating phantom columns from which to chase 00239 * . bulges before they are actually introduced or to which 00240 * . to chase bulges beyond column KBOT.) ==== 00241 * 00242 DO 140 KRCOL = INCOL, MIN( INCOL+3*NBMPS-3, KBOT-2 ) 00243 * 00244 * ==== Bulges number MTOP to MBOT are active double implicit 00245 * . shift bulges. There may or may not also be small 00246 * . 2-by-2 bulge, if there is room. The inactive bulges 00247 * . (if any) must wait until the active bulges have moved 00248 * . down the diagonal to make room. The phantom matrix 00249 * . paradigm described above helps keep track. ==== 00250 * 00251 MTOP = MAX( 1, ( ( KTOP-1 )-KRCOL+2 ) / 3+1 ) 00252 MBOT = MIN( NBMPS, ( KBOT-KRCOL ) / 3 ) 00253 M22 = MBOT + 1 00254 BMP22 = ( MBOT.LT.NBMPS ) .AND. ( KRCOL+3*( M22-1 ) ).EQ. 00255 $ ( KBOT-2 ) 00256 * 00257 * ==== Generate reflections to chase the chain right 00258 * . one column. (The minimum value of K is KTOP-1.) ==== 00259 * 00260 DO 10 M = MTOP, MBOT 00261 K = KRCOL + 3*( M-1 ) 00262 IF( K.EQ.KTOP-1 ) THEN 00263 CALL ZLAQR1( 3, H( KTOP, KTOP ), LDH, S( 2*M-1 ), 00264 $ S( 2*M ), V( 1, M ) ) 00265 ALPHA = V( 1, M ) 00266 CALL ZLARFG( 3, ALPHA, V( 2, M ), 1, V( 1, M ) ) 00267 ELSE 00268 BETA = H( K+1, K ) 00269 V( 2, M ) = H( K+2, K ) 00270 V( 3, M ) = H( K+3, K ) 00271 CALL ZLARFG( 3, BETA, V( 2, M ), 1, V( 1, M ) ) 00272 * 00273 * ==== A Bulge may collapse because of vigilant 00274 * . deflation or destructive underflow. In the 00275 * . underflow case, try the two-small-subdiagonals 00276 * . trick to try to reinflate the bulge. ==== 00277 * 00278 IF( H( K+3, K ).NE.ZERO .OR. H( K+3, K+1 ).NE. 00279 $ ZERO .OR. H( K+3, K+2 ).EQ.ZERO ) THEN 00280 * 00281 * ==== Typical case: not collapsed (yet). ==== 00282 * 00283 H( K+1, K ) = BETA 00284 H( K+2, K ) = ZERO 00285 H( K+3, K ) = ZERO 00286 ELSE 00287 * 00288 * ==== Atypical case: collapsed. Attempt to 00289 * . reintroduce ignoring H(K+1,K) and H(K+2,K). 00290 * . If the fill resulting from the new 00291 * . reflector is too large, then abandon it. 00292 * . Otherwise, use the new one. ==== 00293 * 00294 CALL ZLAQR1( 3, H( K+1, K+1 ), LDH, S( 2*M-1 ), 00295 $ S( 2*M ), VT ) 00296 ALPHA = VT( 1 ) 00297 CALL ZLARFG( 3, ALPHA, VT( 2 ), 1, VT( 1 ) ) 00298 REFSUM = DCONJG( VT( 1 ) )* 00299 $ ( H( K+1, K )+DCONJG( VT( 2 ) )* 00300 $ H( K+2, K ) ) 00301 * 00302 IF( CABS1( H( K+2, K )-REFSUM*VT( 2 ) )+ 00303 $ CABS1( REFSUM*VT( 3 ) ).GT.ULP* 00304 $ ( CABS1( H( K, K ) )+CABS1( H( K+1, 00305 $ K+1 ) )+CABS1( H( K+2, K+2 ) ) ) ) THEN 00306 * 00307 * ==== Starting a new bulge here would 00308 * . create non-negligible fill. Use 00309 * . the old one with trepidation. ==== 00310 * 00311 H( K+1, K ) = BETA 00312 H( K+2, K ) = ZERO 00313 H( K+3, K ) = ZERO 00314 ELSE 00315 * 00316 * ==== Stating a new bulge here would 00317 * . create only negligible fill. 00318 * . Replace the old reflector with 00319 * . the new one. ==== 00320 * 00321 H( K+1, K ) = H( K+1, K ) - REFSUM 00322 H( K+2, K ) = ZERO 00323 H( K+3, K ) = ZERO 00324 V( 1, M ) = VT( 1 ) 00325 V( 2, M ) = VT( 2 ) 00326 V( 3, M ) = VT( 3 ) 00327 END IF 00328 END IF 00329 END IF 00330 10 CONTINUE 00331 * 00332 * ==== Generate a 2-by-2 reflection, if needed. ==== 00333 * 00334 K = KRCOL + 3*( M22-1 ) 00335 IF( BMP22 ) THEN 00336 IF( K.EQ.KTOP-1 ) THEN 00337 CALL ZLAQR1( 2, H( K+1, K+1 ), LDH, S( 2*M22-1 ), 00338 $ S( 2*M22 ), V( 1, M22 ) ) 00339 BETA = V( 1, M22 ) 00340 CALL ZLARFG( 2, BETA, V( 2, M22 ), 1, V( 1, M22 ) ) 00341 ELSE 00342 BETA = H( K+1, K ) 00343 V( 2, M22 ) = H( K+2, K ) 00344 CALL ZLARFG( 2, BETA, V( 2, M22 ), 1, V( 1, M22 ) ) 00345 H( K+1, K ) = BETA 00346 H( K+2, K ) = ZERO 00347 END IF 00348 END IF 00349 * 00350 * ==== Multiply H by reflections from the left ==== 00351 * 00352 IF( ACCUM ) THEN 00353 JBOT = MIN( NDCOL, KBOT ) 00354 ELSE IF( WANTT ) THEN 00355 JBOT = N 00356 ELSE 00357 JBOT = KBOT 00358 END IF 00359 DO 30 J = MAX( KTOP, KRCOL ), JBOT 00360 MEND = MIN( MBOT, ( J-KRCOL+2 ) / 3 ) 00361 DO 20 M = MTOP, MEND 00362 K = KRCOL + 3*( M-1 ) 00363 REFSUM = DCONJG( V( 1, M ) )* 00364 $ ( H( K+1, J )+DCONJG( V( 2, M ) )* 00365 $ H( K+2, J )+DCONJG( V( 3, M ) )*H( K+3, J ) ) 00366 H( K+1, J ) = H( K+1, J ) - REFSUM 00367 H( K+2, J ) = H( K+2, J ) - REFSUM*V( 2, M ) 00368 H( K+3, J ) = H( K+3, J ) - REFSUM*V( 3, M ) 00369 20 CONTINUE 00370 30 CONTINUE 00371 IF( BMP22 ) THEN 00372 K = KRCOL + 3*( M22-1 ) 00373 DO 40 J = MAX( K+1, KTOP ), JBOT 00374 REFSUM = DCONJG( V( 1, M22 ) )* 00375 $ ( H( K+1, J )+DCONJG( V( 2, M22 ) )* 00376 $ H( K+2, J ) ) 00377 H( K+1, J ) = H( K+1, J ) - REFSUM 00378 H( K+2, J ) = H( K+2, J ) - REFSUM*V( 2, M22 ) 00379 40 CONTINUE 00380 END IF 00381 * 00382 * ==== Multiply H by reflections from the right. 00383 * . Delay filling in the last row until the 00384 * . vigilant deflation check is complete. ==== 00385 * 00386 IF( ACCUM ) THEN 00387 JTOP = MAX( KTOP, INCOL ) 00388 ELSE IF( WANTT ) THEN 00389 JTOP = 1 00390 ELSE 00391 JTOP = KTOP 00392 END IF 00393 DO 80 M = MTOP, MBOT 00394 IF( V( 1, M ).NE.ZERO ) THEN 00395 K = KRCOL + 3*( M-1 ) 00396 DO 50 J = JTOP, MIN( KBOT, K+3 ) 00397 REFSUM = V( 1, M )*( H( J, K+1 )+V( 2, M )* 00398 $ H( J, K+2 )+V( 3, M )*H( J, K+3 ) ) 00399 H( J, K+1 ) = H( J, K+1 ) - REFSUM 00400 H( J, K+2 ) = H( J, K+2 ) - 00401 $ REFSUM*DCONJG( V( 2, M ) ) 00402 H( J, K+3 ) = H( J, K+3 ) - 00403 $ REFSUM*DCONJG( V( 3, M ) ) 00404 50 CONTINUE 00405 * 00406 IF( ACCUM ) THEN 00407 * 00408 * ==== Accumulate U. (If necessary, update Z later 00409 * . with with an efficient matrix-matrix 00410 * . multiply.) ==== 00411 * 00412 KMS = K - INCOL 00413 DO 60 J = MAX( 1, KTOP-INCOL ), KDU 00414 REFSUM = V( 1, M )*( U( J, KMS+1 )+V( 2, M )* 00415 $ U( J, KMS+2 )+V( 3, M )*U( J, KMS+3 ) ) 00416 U( J, KMS+1 ) = U( J, KMS+1 ) - REFSUM 00417 U( J, KMS+2 ) = U( J, KMS+2 ) - 00418 $ REFSUM*DCONJG( V( 2, M ) ) 00419 U( J, KMS+3 ) = U( J, KMS+3 ) - 00420 $ REFSUM*DCONJG( V( 3, M ) ) 00421 60 CONTINUE 00422 ELSE IF( WANTZ ) THEN 00423 * 00424 * ==== U is not accumulated, so update Z 00425 * . now by multiplying by reflections 00426 * . from the right. ==== 00427 * 00428 DO 70 J = ILOZ, IHIZ 00429 REFSUM = V( 1, M )*( Z( J, K+1 )+V( 2, M )* 00430 $ Z( J, K+2 )+V( 3, M )*Z( J, K+3 ) ) 00431 Z( J, K+1 ) = Z( J, K+1 ) - REFSUM 00432 Z( J, K+2 ) = Z( J, K+2 ) - 00433 $ REFSUM*DCONJG( V( 2, M ) ) 00434 Z( J, K+3 ) = Z( J, K+3 ) - 00435 $ REFSUM*DCONJG( V( 3, M ) ) 00436 70 CONTINUE 00437 END IF 00438 END IF 00439 80 CONTINUE 00440 * 00441 * ==== Special case: 2-by-2 reflection (if needed) ==== 00442 * 00443 K = KRCOL + 3*( M22-1 ) 00444 IF( BMP22 ) THEN 00445 IF ( V( 1, M22 ).NE.ZERO ) THEN 00446 DO 90 J = JTOP, MIN( KBOT, K+3 ) 00447 REFSUM = V( 1, M22 )*( H( J, K+1 )+V( 2, M22 )* 00448 $ H( J, K+2 ) ) 00449 H( J, K+1 ) = H( J, K+1 ) - REFSUM 00450 H( J, K+2 ) = H( J, K+2 ) - 00451 $ REFSUM*DCONJG( V( 2, M22 ) ) 00452 90 CONTINUE 00453 * 00454 IF( ACCUM ) THEN 00455 KMS = K - INCOL 00456 DO 100 J = MAX( 1, KTOP-INCOL ), KDU 00457 REFSUM = V( 1, M22 )*( U( J, KMS+1 )+ 00458 $ V( 2, M22 )*U( J, KMS+2 ) ) 00459 U( J, KMS+1 ) = U( J, KMS+1 ) - REFSUM 00460 U( J, KMS+2 ) = U( J, KMS+2 ) - 00461 $ REFSUM*DCONJG( V( 2, M22 ) ) 00462 100 CONTINUE 00463 ELSE IF( WANTZ ) THEN 00464 DO 110 J = ILOZ, IHIZ 00465 REFSUM = V( 1, M22 )*( Z( J, K+1 )+V( 2, M22 )* 00466 $ Z( J, K+2 ) ) 00467 Z( J, K+1 ) = Z( J, K+1 ) - REFSUM 00468 Z( J, K+2 ) = Z( J, K+2 ) - 00469 $ REFSUM*DCONJG( V( 2, M22 ) ) 00470 110 CONTINUE 00471 END IF 00472 END IF 00473 END IF 00474 * 00475 * ==== Vigilant deflation check ==== 00476 * 00477 MSTART = MTOP 00478 IF( KRCOL+3*( MSTART-1 ).LT.KTOP ) 00479 $ MSTART = MSTART + 1 00480 MEND = MBOT 00481 IF( BMP22 ) 00482 $ MEND = MEND + 1 00483 IF( KRCOL.EQ.KBOT-2 ) 00484 $ MEND = MEND + 1 00485 DO 120 M = MSTART, MEND 00486 K = MIN( KBOT-1, KRCOL+3*( M-1 ) ) 00487 * 00488 * ==== The following convergence test requires that 00489 * . the tradition small-compared-to-nearby-diagonals 00490 * . criterion and the Ahues & Tisseur (LAWN 122, 1997) 00491 * . criteria both be satisfied. The latter improves 00492 * . accuracy in some examples. Falling back on an 00493 * . alternate convergence criterion when TST1 or TST2 00494 * . is zero (as done here) is traditional but probably 00495 * . unnecessary. ==== 00496 * 00497 IF( H( K+1, K ).NE.ZERO ) THEN 00498 TST1 = CABS1( H( K, K ) ) + CABS1( H( K+1, K+1 ) ) 00499 IF( TST1.EQ.RZERO ) THEN 00500 IF( K.GE.KTOP+1 ) 00501 $ TST1 = TST1 + CABS1( H( K, K-1 ) ) 00502 IF( K.GE.KTOP+2 ) 00503 $ TST1 = TST1 + CABS1( H( K, K-2 ) ) 00504 IF( K.GE.KTOP+3 ) 00505 $ TST1 = TST1 + CABS1( H( K, K-3 ) ) 00506 IF( K.LE.KBOT-2 ) 00507 $ TST1 = TST1 + CABS1( H( K+2, K+1 ) ) 00508 IF( K.LE.KBOT-3 ) 00509 $ TST1 = TST1 + CABS1( H( K+3, K+1 ) ) 00510 IF( K.LE.KBOT-4 ) 00511 $ TST1 = TST1 + CABS1( H( K+4, K+1 ) ) 00512 END IF 00513 IF( CABS1( H( K+1, K ) ).LE.MAX( SMLNUM, ULP*TST1 ) ) 00514 $ THEN 00515 H12 = MAX( CABS1( H( K+1, K ) ), 00516 $ CABS1( H( K, K+1 ) ) ) 00517 H21 = MIN( CABS1( H( K+1, K ) ), 00518 $ CABS1( H( K, K+1 ) ) ) 00519 H11 = MAX( CABS1( H( K+1, K+1 ) ), 00520 $ CABS1( H( K, K )-H( K+1, K+1 ) ) ) 00521 H22 = MIN( CABS1( H( K+1, K+1 ) ), 00522 $ CABS1( H( K, K )-H( K+1, K+1 ) ) ) 00523 SCL = H11 + H12 00524 TST2 = H22*( H11 / SCL ) 00525 * 00526 IF( TST2.EQ.RZERO .OR. H21*( H12 / SCL ).LE. 00527 $ MAX( SMLNUM, ULP*TST2 ) )H( K+1, K ) = ZERO 00528 END IF 00529 END IF 00530 120 CONTINUE 00531 * 00532 * ==== Fill in the last row of each bulge. ==== 00533 * 00534 MEND = MIN( NBMPS, ( KBOT-KRCOL-1 ) / 3 ) 00535 DO 130 M = MTOP, MEND 00536 K = KRCOL + 3*( M-1 ) 00537 REFSUM = V( 1, M )*V( 3, M )*H( K+4, K+3 ) 00538 H( K+4, K+1 ) = -REFSUM 00539 H( K+4, K+2 ) = -REFSUM*DCONJG( V( 2, M ) ) 00540 H( K+4, K+3 ) = H( K+4, K+3 ) - 00541 $ REFSUM*DCONJG( V( 3, M ) ) 00542 130 CONTINUE 00543 * 00544 * ==== End of near-the-diagonal bulge chase. ==== 00545 * 00546 140 CONTINUE 00547 * 00548 * ==== Use U (if accumulated) to update far-from-diagonal 00549 * . entries in H. If required, use U to update Z as 00550 * . well. ==== 00551 * 00552 IF( ACCUM ) THEN 00553 IF( WANTT ) THEN 00554 JTOP = 1 00555 JBOT = N 00556 ELSE 00557 JTOP = KTOP 00558 JBOT = KBOT 00559 END IF 00560 IF( ( .NOT.BLK22 ) .OR. ( INCOL.LT.KTOP ) .OR. 00561 $ ( NDCOL.GT.KBOT ) .OR. ( NS.LE.2 ) ) THEN 00562 * 00563 * ==== Updates not exploiting the 2-by-2 block 00564 * . structure of U. K1 and NU keep track of 00565 * . the location and size of U in the special 00566 * . cases of introducing bulges and chasing 00567 * . bulges off the bottom. In these special 00568 * . cases and in case the number of shifts 00569 * . is NS = 2, there is no 2-by-2 block 00570 * . structure to exploit. ==== 00571 * 00572 K1 = MAX( 1, KTOP-INCOL ) 00573 NU = ( KDU-MAX( 0, NDCOL-KBOT ) ) - K1 + 1 00574 * 00575 * ==== Horizontal Multiply ==== 00576 * 00577 DO 150 JCOL = MIN( NDCOL, KBOT ) + 1, JBOT, NH 00578 JLEN = MIN( NH, JBOT-JCOL+1 ) 00579 CALL ZGEMM( 'C', 'N', NU, JLEN, NU, ONE, U( K1, K1 ), 00580 $ LDU, H( INCOL+K1, JCOL ), LDH, ZERO, WH, 00581 $ LDWH ) 00582 CALL ZLACPY( 'ALL', NU, JLEN, WH, LDWH, 00583 $ H( INCOL+K1, JCOL ), LDH ) 00584 150 CONTINUE 00585 * 00586 * ==== Vertical multiply ==== 00587 * 00588 DO 160 JROW = JTOP, MAX( KTOP, INCOL ) - 1, NV 00589 JLEN = MIN( NV, MAX( KTOP, INCOL )-JROW ) 00590 CALL ZGEMM( 'N', 'N', JLEN, NU, NU, ONE, 00591 $ H( JROW, INCOL+K1 ), LDH, U( K1, K1 ), 00592 $ LDU, ZERO, WV, LDWV ) 00593 CALL ZLACPY( 'ALL', JLEN, NU, WV, LDWV, 00594 $ H( JROW, INCOL+K1 ), LDH ) 00595 160 CONTINUE 00596 * 00597 * ==== Z multiply (also vertical) ==== 00598 * 00599 IF( WANTZ ) THEN 00600 DO 170 JROW = ILOZ, IHIZ, NV 00601 JLEN = MIN( NV, IHIZ-JROW+1 ) 00602 CALL ZGEMM( 'N', 'N', JLEN, NU, NU, ONE, 00603 $ Z( JROW, INCOL+K1 ), LDZ, U( K1, K1 ), 00604 $ LDU, ZERO, WV, LDWV ) 00605 CALL ZLACPY( 'ALL', JLEN, NU, WV, LDWV, 00606 $ Z( JROW, INCOL+K1 ), LDZ ) 00607 170 CONTINUE 00608 END IF 00609 ELSE 00610 * 00611 * ==== Updates exploiting U's 2-by-2 block structure. 00612 * . (I2, I4, J2, J4 are the last rows and columns 00613 * . of the blocks.) ==== 00614 * 00615 I2 = ( KDU+1 ) / 2 00616 I4 = KDU 00617 J2 = I4 - I2 00618 J4 = KDU 00619 * 00620 * ==== KZS and KNZ deal with the band of zeros 00621 * . along the diagonal of one of the triangular 00622 * . blocks. ==== 00623 * 00624 KZS = ( J4-J2 ) - ( NS+1 ) 00625 KNZ = NS + 1 00626 * 00627 * ==== Horizontal multiply ==== 00628 * 00629 DO 180 JCOL = MIN( NDCOL, KBOT ) + 1, JBOT, NH 00630 JLEN = MIN( NH, JBOT-JCOL+1 ) 00631 * 00632 * ==== Copy bottom of H to top+KZS of scratch ==== 00633 * (The first KZS rows get multiplied by zero.) ==== 00634 * 00635 CALL ZLACPY( 'ALL', KNZ, JLEN, H( INCOL+1+J2, JCOL ), 00636 $ LDH, WH( KZS+1, 1 ), LDWH ) 00637 * 00638 * ==== Multiply by U21' ==== 00639 * 00640 CALL ZLASET( 'ALL', KZS, JLEN, ZERO, ZERO, WH, LDWH ) 00641 CALL ZTRMM( 'L', 'U', 'C', 'N', KNZ, JLEN, ONE, 00642 $ U( J2+1, 1+KZS ), LDU, WH( KZS+1, 1 ), 00643 $ LDWH ) 00644 * 00645 * ==== Multiply top of H by U11' ==== 00646 * 00647 CALL ZGEMM( 'C', 'N', I2, JLEN, J2, ONE, U, LDU, 00648 $ H( INCOL+1, JCOL ), LDH, ONE, WH, LDWH ) 00649 * 00650 * ==== Copy top of H to bottom of WH ==== 00651 * 00652 CALL ZLACPY( 'ALL', J2, JLEN, H( INCOL+1, JCOL ), LDH, 00653 $ WH( I2+1, 1 ), LDWH ) 00654 * 00655 * ==== Multiply by U21' ==== 00656 * 00657 CALL ZTRMM( 'L', 'L', 'C', 'N', J2, JLEN, ONE, 00658 $ U( 1, I2+1 ), LDU, WH( I2+1, 1 ), LDWH ) 00659 * 00660 * ==== Multiply by U22 ==== 00661 * 00662 CALL ZGEMM( 'C', 'N', I4-I2, JLEN, J4-J2, ONE, 00663 $ U( J2+1, I2+1 ), LDU, 00664 $ H( INCOL+1+J2, JCOL ), LDH, ONE, 00665 $ WH( I2+1, 1 ), LDWH ) 00666 * 00667 * ==== Copy it back ==== 00668 * 00669 CALL ZLACPY( 'ALL', KDU, JLEN, WH, LDWH, 00670 $ H( INCOL+1, JCOL ), LDH ) 00671 180 CONTINUE 00672 * 00673 * ==== Vertical multiply ==== 00674 * 00675 DO 190 JROW = JTOP, MAX( INCOL, KTOP ) - 1, NV 00676 JLEN = MIN( NV, MAX( INCOL, KTOP )-JROW ) 00677 * 00678 * ==== Copy right of H to scratch (the first KZS 00679 * . columns get multiplied by zero) ==== 00680 * 00681 CALL ZLACPY( 'ALL', JLEN, KNZ, H( JROW, INCOL+1+J2 ), 00682 $ LDH, WV( 1, 1+KZS ), LDWV ) 00683 * 00684 * ==== Multiply by U21 ==== 00685 * 00686 CALL ZLASET( 'ALL', JLEN, KZS, ZERO, ZERO, WV, LDWV ) 00687 CALL ZTRMM( 'R', 'U', 'N', 'N', JLEN, KNZ, ONE, 00688 $ U( J2+1, 1+KZS ), LDU, WV( 1, 1+KZS ), 00689 $ LDWV ) 00690 * 00691 * ==== Multiply by U11 ==== 00692 * 00693 CALL ZGEMM( 'N', 'N', JLEN, I2, J2, ONE, 00694 $ H( JROW, INCOL+1 ), LDH, U, LDU, ONE, WV, 00695 $ LDWV ) 00696 * 00697 * ==== Copy left of H to right of scratch ==== 00698 * 00699 CALL ZLACPY( 'ALL', JLEN, J2, H( JROW, INCOL+1 ), LDH, 00700 $ WV( 1, 1+I2 ), LDWV ) 00701 * 00702 * ==== Multiply by U21 ==== 00703 * 00704 CALL ZTRMM( 'R', 'L', 'N', 'N', JLEN, I4-I2, ONE, 00705 $ U( 1, I2+1 ), LDU, WV( 1, 1+I2 ), LDWV ) 00706 * 00707 * ==== Multiply by U22 ==== 00708 * 00709 CALL ZGEMM( 'N', 'N', JLEN, I4-I2, J4-J2, ONE, 00710 $ H( JROW, INCOL+1+J2 ), LDH, 00711 $ U( J2+1, I2+1 ), LDU, ONE, WV( 1, 1+I2 ), 00712 $ LDWV ) 00713 * 00714 * ==== Copy it back ==== 00715 * 00716 CALL ZLACPY( 'ALL', JLEN, KDU, WV, LDWV, 00717 $ H( JROW, INCOL+1 ), LDH ) 00718 190 CONTINUE 00719 * 00720 * ==== Multiply Z (also vertical) ==== 00721 * 00722 IF( WANTZ ) THEN 00723 DO 200 JROW = ILOZ, IHIZ, NV 00724 JLEN = MIN( NV, IHIZ-JROW+1 ) 00725 * 00726 * ==== Copy right of Z to left of scratch (first 00727 * . KZS columns get multiplied by zero) ==== 00728 * 00729 CALL ZLACPY( 'ALL', JLEN, KNZ, 00730 $ Z( JROW, INCOL+1+J2 ), LDZ, 00731 $ WV( 1, 1+KZS ), LDWV ) 00732 * 00733 * ==== Multiply by U12 ==== 00734 * 00735 CALL ZLASET( 'ALL', JLEN, KZS, ZERO, ZERO, WV, 00736 $ LDWV ) 00737 CALL ZTRMM( 'R', 'U', 'N', 'N', JLEN, KNZ, ONE, 00738 $ U( J2+1, 1+KZS ), LDU, WV( 1, 1+KZS ), 00739 $ LDWV ) 00740 * 00741 * ==== Multiply by U11 ==== 00742 * 00743 CALL ZGEMM( 'N', 'N', JLEN, I2, J2, ONE, 00744 $ Z( JROW, INCOL+1 ), LDZ, U, LDU, ONE, 00745 $ WV, LDWV ) 00746 * 00747 * ==== Copy left of Z to right of scratch ==== 00748 * 00749 CALL ZLACPY( 'ALL', JLEN, J2, Z( JROW, INCOL+1 ), 00750 $ LDZ, WV( 1, 1+I2 ), LDWV ) 00751 * 00752 * ==== Multiply by U21 ==== 00753 * 00754 CALL ZTRMM( 'R', 'L', 'N', 'N', JLEN, I4-I2, ONE, 00755 $ U( 1, I2+1 ), LDU, WV( 1, 1+I2 ), 00756 $ LDWV ) 00757 * 00758 * ==== Multiply by U22 ==== 00759 * 00760 CALL ZGEMM( 'N', 'N', JLEN, I4-I2, J4-J2, ONE, 00761 $ Z( JROW, INCOL+1+J2 ), LDZ, 00762 $ U( J2+1, I2+1 ), LDU, ONE, 00763 $ WV( 1, 1+I2 ), LDWV ) 00764 * 00765 * ==== Copy the result back to Z ==== 00766 * 00767 CALL ZLACPY( 'ALL', JLEN, KDU, WV, LDWV, 00768 $ Z( JROW, INCOL+1 ), LDZ ) 00769 200 CONTINUE 00770 END IF 00771 END IF 00772 END IF 00773 210 CONTINUE 00774 * 00775 * ==== End of ZLAQR5 ==== 00776 * 00777 END