SUBROUTINE PSLAHQR( WANTT, WANTZ, N, ILO, IHI, A, DESCA, WR, WI, $ ILOZ, IHIZ, Z, DESCZ, WORK, LWORK, IWORK, $ ILWORK, INFO ) * * -- ScaLAPACK routine (version 2.0.2) -- * Univ. of Tennessee, Univ. of California Berkeley, Univ. of Colorado Denver * May 1 2012 * * .. Scalar Arguments .. LOGICAL WANTT, WANTZ INTEGER IHI, IHIZ, ILO, ILOZ, ILWORK, INFO, LWORK, N * .. * .. Array Arguments .. INTEGER DESCA( * ), DESCZ( * ), IWORK( * ) REAL A( * ), WI( * ), WORK( * ), WR( * ), Z( * ) * .. * * Purpose * ======= * * PSLAHQR is an auxiliary routine used to find the Schur decomposition * and or eigenvalues of a matrix already in Hessenberg form from * cols ILO to IHI. * * Notes * ===== * * Each global data object is described by an associated description * vector. This vector stores the information required to establish * the mapping between an object element and its corresponding process * and memory location. * * Let A be a generic term for any 2D block cyclicly distributed array. * Such a global array has an associated description vector DESCA. * In the following comments, the character _ should be read as * "of the global array". * * NOTATION STORED IN EXPLANATION * --------------- -------------- -------------------------------------- * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, * DTYPE_A = 1. * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating * the BLACS process grid A is distribu- * ted over. The context itself is glo- * bal, but the handle (the integer * value) may vary. * M_A (global) DESCA( M_ ) The number of rows in the global * array A. * N_A (global) DESCA( N_ ) The number of columns in the global * array A. * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute * the rows of the array. * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute * the columns of the array. * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first * row of the array A is distributed. * CSRC_A (global) DESCA( CSRC_ ) The process column over which the * first column of the array A is * distributed. * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local * array. LLD_A >= MAX(1,LOCr(M_A)). * * Let K be the number of rows or columns of a distributed matrix, * and assume that its process grid has dimension p x q. * LOCr( K ) denotes the number of elements of K that a process * would receive if K were distributed over the p processes of its * process column. * Similarly, LOCc( K ) denotes the number of elements of K that a * process would receive if K were distributed over the q processes of * its process row. * The values of LOCr() and LOCc() may be determined via a call to the * ScaLAPACK tool function, NUMROC: * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). * An upper bound for these quantities may be computed by: * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A * * Arguments * ========= * * WANTT (global input) LOGICAL * = .TRUE. : the full Schur form T is required; * = .FALSE.: only eigenvalues are required. * * WANTZ (global input) LOGICAL * = .TRUE. : the matrix of Schur vectors Z is required; * = .FALSE.: Schur vectors are not required. * * N (global input) INTEGER * The order of the Hessenberg matrix A (and Z if WANTZ). * N >= 0. * * ILO (global input) INTEGER * IHI (global input) INTEGER * It is assumed that A is already upper quasi-triangular in * rows and columns IHI+1:N, and that A(ILO,ILO-1) = 0 (unless * ILO = 1). PSLAHQR works primarily with the Hessenberg * submatrix in rows and columns ILO to IHI, but applies * transformations to all of H if WANTT is .TRUE.. * 1 <= ILO <= max(1,IHI); IHI <= N. * * A (global input/output) REAL array, dimension * (DESCA(LLD_),*) * On entry, the upper Hessenberg matrix A. * On exit, if WANTT is .TRUE., A is upper quasi-triangular in * rows and columns ILO:IHI, with any 2-by-2 or larger diagonal * blocks not yet in standard form. If WANTT is .FALSE., the * contents of A are unspecified on exit. * * DESCA (global and local input) INTEGER array of dimension DLEN_. * The array descriptor for the distributed matrix A. * * WR (global replicated output) REAL array, * dimension (N) * WI (global replicated output) REAL array, * dimension (N) * The real and imaginary parts, respectively, of the computed * eigenvalues ILO to IHI are stored in the corresponding * elements of WR and WI. If two eigenvalues are computed as a * complex conjugate pair, they are stored in consecutive * elements of WR and WI, say the i-th and (i+1)th, with * WI(i) > 0 and WI(i+1) < 0. If WANTT is .TRUE., the * eigenvalues are stored in the same order as on the diagonal * of the Schur form returned in A. A may be returned with * larger diagonal blocks until the next release. * * ILOZ (global input) INTEGER * IHIZ (global input) INTEGER * Specify the rows of Z to which transformations must be * applied if WANTZ is .TRUE.. * 1 <= ILOZ <= ILO; IHI <= IHIZ <= N. * * Z (global input/output) REAL array. * If WANTZ is .TRUE., on entry Z must contain the current * matrix Z of transformations accumulated by PDHSEQR, and on * exit Z has been updated; transformations are applied only to * the submatrix Z(ILOZ:IHIZ,ILO:IHI). * If WANTZ is .FALSE., Z is not referenced. * * DESCZ (global and local input) INTEGER array of dimension DLEN_. * The array descriptor for the distributed matrix Z. * * WORK (local output) REAL array of size LWORK * * LWORK (local input) INTEGER * WORK(LWORK) is a local array and LWORK is assumed big enough * so that LWORK >= 3*N + * MAX( 2*MAX(DESCZ(LLD_),DESCA(LLD_)) + 2*LOCc(N), * 7*Ceil(N/HBL)/LCM(NPROW,NPCOL)) ) * * IWORK (global and local input) INTEGER array of size ILWORK * * ILWORK (local input) INTEGER * This holds the some of the IBLK integer arrays. This is held * as a place holder for the next release. * * INFO (global output) INTEGER * < 0: parameter number -INFO incorrect or inconsistent * = 0: successful exit * > 0: PSLAHQR failed to compute all the eigenvalues ILO to IHI * in a total of 30*(IHI-ILO+1) iterations; if INFO = i, * elements i+1:ihi of WR and WI contain those eigenvalues * which have been successfully computed. * * Logic: * This algorithm is very similar to _LAHQR. Unlike _LAHQR, * instead of sending one double shift through the largest * unreduced submatrix, this algorithm sends multiple double shifts * and spaces them apart so that there can be parallelism across * several processor row/columns. Another critical difference is * that this algorithm aggregrates multiple transforms together in * order to apply them in a block fashion. * * Important Local Variables: * IBLK = The maximum number of bulges that can be computed. * Currently fixed. Future releases this won't be fixed. * HBL = The square block size (HBL=DESCA(MB_)=DESCA(NB_)) * ROTN = The number of transforms to block together * NBULGE = The number of bulges that will be attempted on the * current submatrix. * IBULGE = The current number of bulges started. * K1(*),K2(*) = The current bulge loops from K1(*) to K2(*). * * Subroutines: * This routine calls: * PSLACONSB -> To determine where to start each iteration * PSLAWIL -> Given the shift, get the transformation * SLASORTE -> Pair up eigenvalues so that reals are paired. * PSLACP3 -> Parallel array to local replicated array copy & * back. * SLAREF -> Row/column reflector applier. Core routine * here. * PSLASMSUB -> Finds negligible subdiagonal elements. * * Current Notes and/or Restrictions: * 1.) This code requires the distributed block size to be square * and at least six (6); unlike simpler codes like LU, this * algorithm is extremely sensitive to block size. Unwise * choices of too small a block size can lead to bad * performance. * 2.) This code requires A and Z to be distributed identically * and have identical contxts. * 3.) This release currently does not have a routine for * resolving the Schur blocks into regular 2x2 form after * this code is completed. Because of this, a significant * performance impact is required while the deflation is done * by sometimes a single column of processors. * 4.) This code does not currently block the initial transforms * so that none of the rows or columns for any bulge are * completed until all are started. To offset pipeline * start-up it is recommended that at least 2*LCM(NPROW,NPCOL) * bulges are used (if possible) * 5.) The maximum number of bulges currently supported is fixed at * 32. In future versions this will be limited only by the * incoming WORK array. * 6.) The matrix A must be in upper Hessenberg form. If elements * below the subdiagonal are nonzero, the resulting transforms * may be nonsimilar. This is also true with the LAPACK * routine. * 7.) For this release, it is assumed RSRC_=CSRC_=0 * 8.) Currently, all the eigenvalues are distributed to all the * nodes. Future releases will probably distribute the * eigenvalues by the column partitioning. * 9.) The internals of this routine are subject to change. * * Implemented by: G. Henry, November 17, 1996 * * ===================================================================== * * .. Parameters .. INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, $ LLD_, MB_, M_, NB_, N_, RSRC_ PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) REAL ZERO, ONE, HALF PARAMETER ( ZERO = 0.0, ONE = 1.0, HALF = 0.5 ) REAL CONST PARAMETER ( CONST = 1.50 ) INTEGER IBLK PARAMETER ( IBLK = 32 ) * .. * .. Local Scalars .. INTEGER CONTXT, DOWN, HBL, I, I1, I2, IAFIRST, IBULGE, $ ICBUF, ICOL, ICOL1, ICOL2, IDIA, IERR, II, $ IRBUF, IROW, IROW1, IROW2, ISPEC, ISTART, $ ISTARTCOL, ISTARTROW, ISTOP, ISUB, ISUP, $ ITERMAX, ITMP1, ITMP2, ITN, ITS, J, JAFIRST, $ JBLK, JJ, K, KI, L, LCMRC, LDA, LDZ, LEFT, $ LIHIH, LIHIZ, LILOH, LILOZ, LOCALI1, LOCALI2, $ LOCALK, LOCALM, M, MODKM1, MYCOL, MYROW, $ NBULGE, NH, NODE, NPCOL, NPROW, NR, NUM, NZ, $ RIGHT, ROTN, UP, VECSIDX REAL AVE, DISC, H00, H10, H11, H12, H21, H22, H33, $ H43H34, H44, OVFL, S, SMLNUM, SUM, T1, T1COPY, $ T2, T3, ULP, UNFL, V1SAVE, V2, V2SAVE, V3, $ V3SAVE, CS, SN * .. * .. Local Arrays .. INTEGER ICURCOL( IBLK ), ICURROW( IBLK ), K1( IBLK ), $ K2( IBLK ), KCOL( IBLK ), KP2COL( IBLK ), $ KP2ROW( IBLK ), KROW( IBLK ), LOCALK2( IBLK ) REAL S1( 2*IBLK, 2*IBLK ), SMALLA( 6, 6, IBLK ), $ VCOPY( 3 ) * .. * .. External Functions .. INTEGER ILCM, NUMROC REAL PSLAMCH EXTERNAL ILCM, NUMROC, PSLAMCH * .. * .. External Subroutines .. EXTERNAL BLACS_GRIDINFO, SCOPY, SGEBR2D, SGEBS2D, $ SGERV2D, SGESD2D, SGSUM2D, SLAHQR, SLAREF, $ SLARFG, SLASORTE, IGAMN2D, INFOG1L, INFOG2L, $ PSLABAD, PSLACONSB, PSLACP3, PSLASMSUB, $ PSLAWIL, PXERBLA, SLANV2 * .. * .. Intrinsic Functions .. INTRINSIC ABS, MAX, MIN, MOD, SIGN, SQRT * .. * .. Executable Statements .. * INFO = 0 * ITERMAX = 30*( IHI-ILO+1 ) * ITERMAX = 0 IF( N.EQ.0 ) $ RETURN * * NODE (IAFIRST,JAFIRST) OWNS A(1,1) * HBL = DESCA( MB_ ) CONTXT = DESCA( CTXT_ ) LDA = DESCA( LLD_ ) IAFIRST = DESCA( RSRC_ ) JAFIRST = DESCA( CSRC_ ) LDZ = DESCZ( LLD_ ) CALL BLACS_GRIDINFO( CONTXT, NPROW, NPCOL, MYROW, MYCOL ) NODE = MYROW*NPCOL + MYCOL NUM = NPROW*NPCOL LEFT = MOD( MYCOL+NPCOL-1, NPCOL ) RIGHT = MOD( MYCOL+1, NPCOL ) UP = MOD( MYROW+NPROW-1, NPROW ) DOWN = MOD( MYROW+1, NPROW ) LCMRC = ILCM( NPROW, NPCOL ) * * Determine the number of columns we have so we can check workspace * LOCALK = NUMROC( N, HBL, MYCOL, JAFIRST, NPCOL ) JJ = N / HBL IF( JJ*HBL.LT.N ) $ JJ = JJ + 1 JJ = 7*JJ / LCMRC IF( LWORK.LT.3*N+MAX( 2*MAX( LDA, LDZ )+2*LOCALK, JJ ) ) THEN INFO = -15 END IF IF( DESCZ( CTXT_ ).NE.DESCA( CTXT_ ) ) THEN INFO = -( 1300+CTXT_ ) END IF IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN INFO = -( 700+NB_ ) END IF IF( DESCZ( MB_ ).NE.DESCZ( NB_ ) ) THEN INFO = -( 1300+NB_ ) END IF IF( DESCA( MB_ ).NE.DESCZ( MB_ ) ) THEN INFO = -( 1300+MB_ ) END IF IF( ( DESCA( RSRC_ ).NE.0 ) .OR. ( DESCA( CSRC_ ).NE.0 ) ) THEN INFO = -( 700+RSRC_ ) END IF IF( ( DESCZ( RSRC_ ).NE.0 ) .OR. ( DESCZ( CSRC_ ).NE.0 ) ) THEN INFO = -( 1300+RSRC_ ) END IF IF( ( ILO.GT.N ) .OR. ( ILO.LT.1 ) ) THEN INFO = -4 END IF IF( ( IHI.GT.N ) .OR. ( IHI.LT.1 ) ) THEN INFO = -5 END IF IF( HBL.LT.5 ) THEN INFO = -( 700+MB_ ) END IF CALL IGAMN2D( CONTXT, 'ALL', ' ', 1, 1, INFO, 1, ITMP1, ITMP2, -1, $ -1, -1 ) IF( INFO.LT.0 ) THEN CALL PXERBLA( CONTXT, 'PSLAHQR', -INFO ) RETURN END IF * * Set work array indices * VECSIDX = 0 IDIA = 3*N ISUB = 3*N ISUP = 3*N IRBUF = 3*N ICBUF = 3*N * * Find a value for ROTN * ROTN = HBL / 3 ROTN = MAX( ROTN, HBL-2 ) ROTN = MIN( ROTN, 1 ) * IF( ILO.EQ.IHI ) THEN CALL INFOG2L( ILO, ILO, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW, ICOL, II, JJ ) IF( ( MYROW.EQ.II ) .AND. ( MYCOL.EQ.JJ ) ) THEN WR( ILO ) = A( ( ICOL-1 )*LDA+IROW ) ELSE WR( ILO ) = ZERO END IF WI( ILO ) = ZERO RETURN END IF * NH = IHI - ILO + 1 NZ = IHIZ - ILOZ + 1 * CALL INFOG1L( ILOZ, HBL, NPROW, MYROW, 0, LILOZ, LIHIZ ) LIHIZ = NUMROC( IHIZ, HBL, MYROW, 0, NPROW ) * * Set machine-dependent constants for the stopping criterion. * If NORM(H) <= SQRT(OVFL), overflow should not occur. * UNFL = PSLAMCH( CONTXT, 'SAFE MINIMUM' ) OVFL = ONE / UNFL CALL PSLABAD( CONTXT, UNFL, OVFL ) ULP = PSLAMCH( CONTXT, 'PRECISION' ) SMLNUM = UNFL*( NH / ULP ) * * I1 and I2 are the indices of the first row and last column of H * to which transformations must be applied. If eigenvalues only are * being computed, I1 and I2 are set inside the main loop. * IF( WANTT ) THEN I1 = 1 I2 = N END IF * * ITN is the total number of QR iterations allowed. * ITN = ITERMAX * * The main loop begins here. I is the loop index and decreases from * IHI to ILO in steps of our schur block size (<=2*IBLK). Each * iteration of the loop works with the active submatrix in rows * and columns L to I. Eigenvalues I+1 to IHI have already * converged. Either L = ILO or the global A(L,L-1) is negligible * so that the matrix splits. * I = IHI 10 CONTINUE L = ILO IF( I.LT.ILO ) $ GO TO 450 * * Perform QR iterations on rows and columns ILO to I until a * submatrix of order 1 or 2 splits off at the bottom because a * subdiagonal element has become negligible. * DO 420 ITS = 0, ITN * * Look for a single small subdiagonal element. * CALL PSLASMSUB( A, DESCA, I, L, K, SMLNUM, WORK( IRBUF+1 ), $ LWORK-IRBUF ) L = K * IF( L.GT.ILO ) THEN * * H(L,L-1) is negligible * CALL INFOG2L( L, L-1, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW, ICOL, ITMP1, ITMP2 ) IF( ( MYROW.EQ.ITMP1 ) .AND. ( MYCOL.EQ.ITMP2 ) ) THEN A( ( ICOL-1 )*LDA+IROW ) = ZERO END IF WORK( ISUB+L-1 ) = ZERO END IF * * Exit from loop if a submatrix of order 1 or 2 has split off. * M = L - 10 * IF ( L .GE. I - (2*IBLK-1) ) * IF ( L .GE. I - MAX(2*IBLK-1,HBL) ) IF( L.GE.I-1 ) $ GO TO 430 * * Now the active submatrix is in rows and columns L to I. If * eigenvalues only are being computed, only the active submatrix * need be transformed. * IF( .NOT.WANTT ) THEN I1 = L I2 = I END IF * * Copy submatrix of size 2*JBLK and prepare to do generalized * Wilkinson shift or an exceptional shift * JBLK = MIN( IBLK, ( ( I-L+1 ) / 2 )-1 ) IF( JBLK.GT.LCMRC ) THEN * * Make sure it's divisible by LCM (we want even workloads!) * JBLK = JBLK - MOD( JBLK, LCMRC ) END IF JBLK = MIN( JBLK, 2*LCMRC ) JBLK = MAX( JBLK, 1 ) * CALL PSLACP3( 2*JBLK, I-2*JBLK+1, A, DESCA, S1, 2*IBLK, -1, -1, $ 0 ) IF( ITS.EQ.20 .OR. ITS.EQ.40 ) THEN * * Exceptional shift. * DO 20 II = 2*JBLK, 2, -1 S1( II, II ) = CONST*( ABS( S1( II, II ) )+ $ ABS( S1( II, II-1 ) ) ) S1( II, II-1 ) = ZERO S1( II-1, II ) = ZERO 20 CONTINUE S1( 1, 1 ) = CONST*ABS( S1( 1, 1 ) ) ELSE CALL SLAHQR( .FALSE., .FALSE., 2*JBLK, 1, 2*JBLK, S1, $ 2*IBLK, WORK( IRBUF+1 ), WORK( ICBUF+1 ), 1, $ 2*JBLK, Z, LDZ, IERR ) * * Prepare to use Wilkinson's double shift * H44 = S1( 2*JBLK, 2*JBLK ) H33 = S1( 2*JBLK-1, 2*JBLK-1 ) H43H34 = S1( 2*JBLK-1, 2*JBLK )*S1( 2*JBLK, 2*JBLK-1 ) IF( ( JBLK.GT.1 ) .AND. ( ITS.GT.30 ) ) THEN S = S1( 2*JBLK-1, 2*JBLK-2 ) DISC = ( H33-H44 )*HALF DISC = DISC*DISC + H43H34 IF( DISC.GT.ZERO ) THEN * * Real roots: Use Wilkinson's shift twice * DISC = SQRT( DISC ) AVE = HALF*( H33+H44 ) IF( ABS( H33 )-ABS( H44 ).GT.ZERO ) THEN H33 = H33*H44 - H43H34 H44 = H33 / ( SIGN( DISC, AVE )+AVE ) ELSE H44 = SIGN( DISC, AVE ) + AVE END IF H33 = H44 H43H34 = ZERO END IF END IF END IF * * Look for two consecutive small subdiagonal elements: * PSLACONSB is the routine that does this. * c CALL PSLACONSB( A, DESCA, I, L, M, H44, H33, H43H34, c $ WORK( IRBUF+1 ), LWORK-IRBUF ) * * Skip small submatrices * * IF ( M .GE. I - 5 ) * $ GO TO 80 * * In principle PSLACONSB needs to check all shifts to decide * whether two consecutive small subdiagonal entries are suitable * as the starting position of the bulge chasing phase. It can be * dangerous to check the first pair of shifts only. Moreover it * is quite rare to obtain an M which is much larger than L. This * process is a bit expensive compared with the benefit. * Therefore it is sensible to abandon this routine. Total amount * of communications is saved in average. * M = L * Double-shift QR step * * NBULGE is the number of bulges that will be attempted * ISTOP = MIN( M+ROTN-MOD( M, ROTN ), I-2 ) ISTOP = MIN( ISTOP, M+HBL-3-MOD( M-1, HBL ) ) ISTOP = MIN( ISTOP, I2-2 ) ISTOP = MAX( ISTOP, M ) NBULGE = ( I-1-ISTOP ) / HBL * * Do not exceed maximum determined. * NBULGE = MIN( NBULGE, JBLK ) IF( NBULGE.GT.LCMRC ) THEN * * Make sure it's divisible by LCM (we want even workloads!) * NBULGE = NBULGE - MOD( NBULGE, LCMRC ) END IF NBULGE = MAX( NBULGE, 1 ) * IF( ( ITS.NE.20 ) .AND. ( ITS.NE.40 ) .AND. ( NBULGE.GT.1 ) ) $ THEN * * sort the eigenpairs so that they are in twos for double * shifts. only call if several need sorting * CALL SLASORTE( S1( 2*( JBLK-NBULGE )+1, $ 2*( JBLK-NBULGE )+1 ), 2*IBLK, 2*NBULGE, $ WORK( IRBUF+1 ), IERR ) END IF * * IBULGE is the number of bulges going so far * IBULGE = 1 * * "A" row defs : main row transforms from LOCALK to LOCALI2 * CALL INFOG1L( M, HBL, NPCOL, MYCOL, 0, ITMP1, LOCALK ) LOCALK = NUMROC( N, HBL, MYCOL, 0, NPCOL ) CALL INFOG1L( 1, HBL, NPCOL, MYCOL, 0, ICOL1, LOCALI2 ) LOCALI2 = NUMROC( I2, HBL, MYCOL, 0, NPCOL ) * * "A" col defs : main col transforms from LOCALI1 to LOCALM * CALL INFOG1L( I1, HBL, NPROW, MYROW, 0, LOCALI1, ICOL1 ) ICOL1 = NUMROC( N, HBL, MYROW, 0, NPROW ) CALL INFOG1L( 1, HBL, NPROW, MYROW, 0, LOCALM, ICOL1 ) ICOL1 = NUMROC( MIN( M+3, I ), HBL, MYROW, 0, NPROW ) * * Which row & column will start the bulges * ISTARTROW = MOD( ( M+1 ) / HBL, NPROW ) + IAFIRST ISTARTCOL = MOD( ( M+1 ) / HBL, NPCOL ) + JAFIRST * CALL INFOG1L( M, HBL, NPROW, MYROW, 0, II, ITMP2 ) ITMP2 = NUMROC( N, HBL, MYROW, 0, NPROW ) CALL INFOG1L( M, HBL, NPCOL, MYCOL, 0, JJ, ITMP2 ) ITMP2 = NUMROC( N, HBL, MYCOL, 0, NPCOL ) CALL INFOG1L( 1, HBL, NPROW, MYROW, 0, ISTOP, KP2ROW( 1 ) ) KP2ROW( 1 ) = NUMROC( M+2, HBL, MYROW, 0, NPROW ) CALL INFOG1L( 1, HBL, NPCOL, MYCOL, 0, ISTOP, KP2COL( 1 ) ) KP2COL( 1 ) = NUMROC( M+2, HBL, MYCOL, 0, NPCOL ) * * Set all values for bulges. All bulges are stored in * intermediate steps as loops over KI. Their current "task" * over the global M to I-1 values is always K1(KI) to K2(KI). * However, because there are many bulges, K1(KI) & K2(KI) might * go past that range while later bulges (KI+1,KI+2,etc..) are * finishing up. * * Rules: * If MOD(K1(KI)-1,HBL) < HBL-2 then MOD(K2(KI)-1,HBL)