SUBROUTINE PDGEBAL( JOB, N, A, DESCA, ILO, IHI, SCALE, INFO ) * * Contribution from the Department of Computing Science and HPC2N, * Umea University, Sweden * * -- ScaLAPACK computational routine (version 2.0.1) -- * University of Tennessee, Knoxville, Oak Ridge National Laboratory, * Univ. of Colorado Denver and University of California, Berkeley. * January, 2012 * IMPLICIT NONE * * .. Scalar Arguments .. CHARACTER JOB INTEGER IHI, ILO, INFO, N * .. * .. Array Arguments .. INTEGER DESCA( * ) DOUBLE PRECISION A( * ), SCALE( * ) * .. * * Purpose * ======= * * PDGEBAL balances a general real matrix A. This involves, first, * permuting A by a similarity transformation to isolate eigenvalues * in the first 1 to ILO-1 and last IHI+1 to N elements on the * diagonal; and second, applying a diagonal similarity transformation * to rows and columns ILO to IHI to make the rows and columns as * close in norm as possible. Both steps are optional. * * Balancing may reduce the 1-norm of the matrix, and improve the * accuracy of the computed eigenvalues and/or eigenvectors. * * Notes * ===== * * Each global data object is described by an associated description * vector. This vector stores the information required to establish * the mapping between an object element and its corresponding process * and memory location. * * Let A be a generic term for any 2D block cyclicly distributed array. * Such a global array has an associated description vector DESCA. * In the following comments, the character _ should be read as * "of the global array". * * NOTATION STORED IN EXPLANATION * --------------- -------------- -------------------------------------- * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, * DTYPE_A = 1. * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating * the BLACS process grid A is distribu- * ted over. The context itself is glo- * bal, but the handle (the integer * value) may vary. * M_A (global) DESCA( M_ ) The number of rows in the global * array A. * N_A (global) DESCA( N_ ) The number of columns in the global * array A. * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute * the rows of the array. * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute * the columns of the array. * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first * row of the array A is distributed. * CSRC_A (global) DESCA( CSRC_ ) The process column over which the * first column of the array A is * distributed. * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local * array. LLD_A >= MAX(1,LOCr(M_A)). * * Let K be the number of rows or columns of a distributed matrix, * and assume that its process grid has dimension p x q. * LOCr( K ) denotes the number of elements of K that a process * would receive if K were distributed over the p processes of its * process column. * Similarly, LOCc( K ) denotes the number of elements of K that a * process would receive if K were distributed over the q processes of * its process row. * The values of LOCr() and LOCc() may be determined via a call to the * ScaLAPACK tool function, NUMROC: * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). * An upper bound for these quantities may be computed by: * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A * * * Arguments * ========= * * JOB (global input) CHARACTER*1 * Specifies the operations to be performed on A: * = 'N': none: simply set ILO = 1, IHI = N, SCALE(I) = 1.0 * for i = 1,...,N; * = 'P': permute only; * = 'S': scale only; * = 'B': both permute and scale. * * N (global input) INTEGER * The order of the matrix A. N >= 0. * * A (local input/output) DOUBLE PRECISION array, dimension * (DESCA(LLD_,LOCc(N)) * On entry, the input matrix A. * On exit, A is overwritten by the balanced matrix. * If JOB = 'N', A is not referenced. * See Further Details. * * DESCA (global and local input) INTEGER array of dimension DLEN_. * The array descriptor for the distributed matrix A. * * ILO (global output) INTEGER * IHI (global output) INTEGER * ILO and IHI are set to integers such that on exit * A(i,j) = 0 if i > j and j = 1,...,ILO-1 or I = IHI+1,...,N. * If JOB = 'N' or 'S', ILO = 1 and IHI = N. * * SCALE (global output) DOUBLE PRECISION array, dimension (N) * Details of the permutations and scaling factors applied to * A. If P(j) is the index of the row and column interchanged * with row and column j and D(j) is the scaling factor * applied to row and column j, then * SCALE(j) = P(j) for j = 1,...,ILO-1 * = D(j) for j = ILO,...,IHI * = P(j) for j = IHI+1,...,N. * The order in which the interchanges are made is N to IHI+1, * then 1 to ILO-1. * * INFO (global output) INTEGER * = 0: successful exit. * < 0: if INFO = -i, the i-th argument had an illegal value. * * Further Details * =============== * * The permutations consist of row and column interchanges which put * the matrix in the form * * ( T1 X Y ) * P A P = ( 0 B Z ) * ( 0 0 T2 ) * * where T1 and T2 are upper triangular matrices whose eigenvalues lie * along the diagonal. The column indices ILO and IHI mark the starting * and ending columns of the submatrix B. Balancing consists of applying * a diagonal similarity transformation inv(D) * B * D to make the * 1-norms of each row of B and its corresponding column nearly equal. * The output matrix is * * ( T1 X*D Y ) * ( 0 inv(D)*B*D inv(D)*Z ). * ( 0 0 T2 ) * * Information about the permutations P and the diagonal matrix D is * returned in the vector SCALE. * * This subroutine is based on the EISPACK routine BALANC. In principle, * the parallelism is extracted by using PBLAS and BLACS routines for * the permutation and balancing. * * Modified by Tzu-Yi Chen, Computer Science Division, University of * California at Berkeley, USA * * Parallel version by Robert Granat and Meiyue Shao, Department of * Computing Science and HPC2N, Umea University, Sweden * * ===================================================================== * * .. Parameters .. INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, $ LLD_, MB_, M_, NB_, N_, RSRC_ PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) DOUBLE PRECISION ZERO, ONE PARAMETER ( ZERO = 0.0D+0, ONE = 1.0D+0 ) DOUBLE PRECISION SCLFAC PARAMETER ( SCLFAC = 2.0D+0 ) DOUBLE PRECISION FACTOR PARAMETER ( FACTOR = 0.95D+0 ) * .. * .. Local Scalars .. LOGICAL NOCONV INTEGER I, ICA, IEXC, IRA, J, K, L, M, LLDA, $ ICTXT, NPROW, NPCOL, MYROW, MYCOL, II, JJ, $ ARSRC, ACSRC DOUBLE PRECISION C, CA, F, G, R, RA, S, SFMAX1, SFMAX2, SFMIN1, $ SFMIN2, ELEM * .. * .. Local Arrays .. DOUBLE PRECISION CR( 2 ) * .. * .. External Functions .. LOGICAL DISNAN, LSAME INTEGER IDAMAX DOUBLE PRECISION DLAMCH EXTERNAL DISNAN, LSAME, DLAMCH * .. * .. External Subroutines .. EXTERNAL PDSCAL, PDSWAP, PDAMAX, PXERBLA, $ BLACS_GRIDINFO, CHK1MAT, DGSUM2D, $ INFOG2L, PDELGET * .. * .. Intrinsic Functions .. INTRINSIC ABS, MAX, MIN * .. * .. Executable Statements .. INFO = 0 ICTXT = DESCA( CTXT_ ) CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) * * Test the input parameters. * IF( .NOT.LSAME( JOB, 'N' ) .AND. .NOT.LSAME( JOB, 'P' ) .AND. $ .NOT.LSAME( JOB, 'S' ) .AND. .NOT.LSAME( JOB, 'B' ) ) THEN INFO = -1 ELSE IF( N.LT.0 ) THEN INFO = -2 ELSE CALL CHK1MAT( N, 2, N, 2, 1, 1, DESCA, 4, INFO ) END IF IF( INFO.NE.0 ) THEN CALL PXERBLA( ICTXT, 'PDGEBAL', -INFO ) RETURN END IF * * Extract local leading dimension of A. * LLDA = DESCA( LLD_ ) * K = 1 L = N * IF( N.EQ.0 ) $ GO TO 210 * IF( LSAME( JOB, 'N' ) ) THEN DO 10 I = 1, N SCALE( I ) = ONE 10 CONTINUE GO TO 210 END IF * IF( LSAME( JOB, 'S' ) ) $ GO TO 120 * * Permutation to isolate eigenvalues if possible. * GO TO 50 * * Row and column exchange. * 20 CONTINUE SCALE( M ) = J IF( J.EQ.M ) $ GO TO 30 * CALL PDSWAP( L, A, 1, J, DESCA, 1, A, 1, M, DESCA, 1 ) CALL PDSWAP( N-K+1, A, J, K, DESCA, DESCA(M_), A, M, K, DESCA, $ DESCA(M_) ) * 30 CONTINUE GO TO ( 40, 80 )IEXC * * Search for rows isolating an eigenvalue and push them down. * 40 CONTINUE IF( L.EQ.1 ) $ GO TO 210 L = L - 1 * 50 CONTINUE DO 70 J = L, 1, -1 * DO 60 I = 1, L IF( I.EQ.J ) $ GO TO 60 * * All processors need the information to make correct decisions. * CALL PDELGET( 'All', '1-Tree', ELEM, A, J, I, DESCA ) IF( ELEM.NE.ZERO ) $ GO TO 70 60 CONTINUE * M = L IEXC = 1 GO TO 20 70 CONTINUE * GO TO 90 * * Search for columns isolating an eigenvalue and push them left. * 80 CONTINUE K = K + 1 * 90 CONTINUE DO 110 J = K, L * DO 100 I = K, L IF( I.EQ.J ) $ GO TO 100 * * All processors need the information to make correct decisions. * CALL PDELGET( 'All', '1-Tree', ELEM, A, I, J, DESCA ) IF( ELEM.NE.ZERO ) $ GO TO 110 100 CONTINUE * M = K IEXC = 2 GO TO 20 110 CONTINUE * 120 CONTINUE DO 130 I = K, L SCALE( I ) = ONE 130 CONTINUE * IF( LSAME( JOB, 'P' ) ) $ GO TO 210 * * Balance the submatrix in rows K to L. * * Iterative loop for norm reduction. * SFMIN1 = DLAMCH( 'S' ) / DLAMCH( 'P' ) SFMAX1 = ONE / SFMIN1 SFMIN2 = SFMIN1*SCLFAC SFMAX2 = ONE / SFMIN2 140 CONTINUE NOCONV = .FALSE. * DO 200 I = K, L C = ZERO R = ZERO * * Compute local partial values of R and C in parallel and combine * with a call to the BLACS global summation routine distributing * information to all processors. * DO 150 J = K, L IF( J.EQ.I ) $ GO TO 150 CALL INFOG2L( J, I, DESCA, NPROW, NPCOL, MYROW, $ MYCOL, II, JJ, ARSRC, ACSRC ) IF( MYROW.EQ.ARSRC .AND. MYCOL.EQ.ACSRC ) THEN C = C + ABS( A( II + (JJ-1)*LLDA ) ) END IF CALL INFOG2L( I, J, DESCA, NPROW, NPCOL, MYROW, $ MYCOL, II, JJ, ARSRC, ACSRC ) IF( MYROW.EQ.ARSRC .AND. MYCOL.EQ.ACSRC ) THEN R = R + ABS( A( II + (JJ-1)*LLDA ) ) END IF 150 CONTINUE CR( 1 ) = C CR( 2 ) = R CALL DGSUM2D( ICTXT, 'All', '1-Tree', 2, 1, CR, 2, -1, -1 ) C = CR( 1 ) R = CR( 2 ) * * Find global maximum absolute values and indices in parallel. * CALL PDAMAX( L, CA, ICA, A, 1, I, DESCA, 1 ) CALL PDAMAX( N-K+1, RA, IRA, A, I, K, DESCA, DESCA(M_) ) * * Guard against zero C or R due to underflow. * IF( C.EQ.ZERO .OR. R.EQ.ZERO ) $ GO TO 200 G = R / SCLFAC F = ONE S = C + R 160 CONTINUE IF( C.GE.G .OR. MAX( F, C, CA ).GE.SFMAX2 .OR. $ MIN( R, G, RA ).LE.SFMIN2 )GO TO 170 IF( DISNAN( C+F+CA+R+G+RA ) ) THEN * * Exit if NaN to avoid infinite loop * INFO = -3 CALL PXERBLA( ICTXT, 'PDGEBAL', -INFO ) RETURN END IF F = F*SCLFAC C = C*SCLFAC CA = CA*SCLFAC R = R / SCLFAC G = G / SCLFAC RA = RA / SCLFAC GO TO 160 * 170 CONTINUE G = C / SCLFAC 180 CONTINUE IF( G.LT.R .OR. MAX( R, RA ).GE.SFMAX2 .OR. $ MIN( F, C, G, CA ).LE.SFMIN2 )GO TO 190 F = F / SCLFAC C = C / SCLFAC G = G / SCLFAC CA = CA / SCLFAC R = R*SCLFAC RA = RA*SCLFAC GO TO 180 * * Now balance. * 190 CONTINUE IF( ( C+R ).GE.FACTOR*S ) $ GO TO 200 IF( F.LT.ONE .AND. SCALE( I ).LT.ONE ) THEN IF( F*SCALE( I ).LE.SFMIN1 ) $ GO TO 200 END IF IF( F.GT.ONE .AND. SCALE( I ).GT.ONE ) THEN IF( SCALE( I ).GE.SFMAX1 / F ) $ GO TO 200 END IF G = ONE / F SCALE( I ) = SCALE( I )*F NOCONV = .TRUE. * CALL PDSCAL( N-K+1, G, A, I, K, DESCA, DESCA(M_) ) CALL PDSCAL( L, F, A, 1, I, DESCA, 1 ) * 200 CONTINUE * IF( NOCONV ) $ GO TO 140 * 210 CONTINUE ILO = K IHI = L * RETURN * * End of PDGEBAL * END