SCALAPACK 2.2.2
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches
psgebal.f
Go to the documentation of this file.
1 SUBROUTINE psgebal( JOB, N, A, DESCA, ILO, IHI, SCALE, INFO )
2*
3* Contribution from the Department of Computing Science and HPC2N,
4* Umea University, Sweden
5*
6* -- ScaLAPACK computational routine (version 2.0.1) --
7* University of Tennessee, Knoxville, Oak Ridge National Laboratory,
8* Univ. of Colorado Denver and University of California, Berkeley.
9* January, 2012
10*
11 IMPLICIT NONE
12*
13* .. Scalar Arguments ..
14 CHARACTER JOB
15 INTEGER IHI, ILO, INFO, N
16* ..
17* .. Array Arguments ..
18 INTEGER DESCA( * )
19 REAL A( * ), SCALE( * )
20* ..
21*
22* Purpose
23* =======
24*
25* PSGEBAL balances a general real matrix A. This involves, first,
26* permuting A by a similarity transformation to isolate eigenvalues
27* in the first 1 to ILO-1 and last IHI+1 to N elements on the
28* diagonal; and second, applying a diagonal similarity transformation
29* to rows and columns ILO to IHI to make the rows and columns as
30* close in norm as possible. Both steps are optional.
31*
32* Balancing may reduce the 1-norm of the matrix, and improve the
33* accuracy of the computed eigenvalues and/or eigenvectors.
34*
35* Notes
36* =====
37*
38* Each global data object is described by an associated description
39* vector. This vector stores the information required to establish
40* the mapping between an object element and its corresponding process
41* and memory location.
42*
43* Let A be a generic term for any 2D block cyclicly distributed array.
44* Such a global array has an associated description vector DESCA.
45* In the following comments, the character _ should be read as
46* "of the global array".
47*
48* NOTATION STORED IN EXPLANATION
49* --------------- -------------- --------------------------------------
50* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
51* DTYPE_A = 1.
52* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
53* the BLACS process grid A is distribu-
54* ted over. The context itself is glo-
55* bal, but the handle (the integer
56* value) may vary.
57* M_A (global) DESCA( M_ ) The number of rows in the global
58* array A.
59* N_A (global) DESCA( N_ ) The number of columns in the global
60* array A.
61* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
62* the rows of the array.
63* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
64* the columns of the array.
65* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
66* row of the array A is distributed.
67* CSRC_A (global) DESCA( CSRC_ ) The process column over which the
68* first column of the array A is
69* distributed.
70* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
71* array. LLD_A >= MAX(1,LOCr(M_A)).
72*
73* Let K be the number of rows or columns of a distributed matrix,
74* and assume that its process grid has dimension p x q.
75* LOCr( K ) denotes the number of elements of K that a process
76* would receive if K were distributed over the p processes of its
77* process column.
78* Similarly, LOCc( K ) denotes the number of elements of K that a
79* process would receive if K were distributed over the q processes of
80* its process row.
81* The values of LOCr() and LOCc() may be determined via a call to the
82* ScaLAPACK tool function, NUMROC:
83* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
84* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
85* An upper bound for these quantities may be computed by:
86* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
87* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
88*
89*
90* Arguments
91* =========
92*
93* JOB (global input) CHARACTER*1
94* Specifies the operations to be performed on A:
95* = 'N': none: simply set ILO = 1, IHI = N, SCALE(I) = 1.0
96* for i = 1,...,N;
97* = 'P': permute only;
98* = 'S': scale only;
99* = 'B': both permute and scale.
100*
101* N (global input) INTEGER
102* The order of the matrix A. N >= 0.
103*
104* A (local input/output) REAL array, dimension
105* (DESCA(LLD_,LOCc(N))
106* On entry, the input matrix A.
107* On exit, A is overwritten by the balanced matrix.
108* If JOB = 'N', A is not referenced.
109* See Further Details.
110*
111* DESCA (global and local input) INTEGER array of dimension DLEN_.
112* The array descriptor for the distributed matrix A.
113*
114* ILO (global output) INTEGER
115* IHI (global output) INTEGER
116* ILO and IHI are set to integers such that on exit
117* A(i,j) = 0 if i > j and j = 1,...,ILO-1 or I = IHI+1,...,N.
118* If JOB = 'N' or 'S', ILO = 1 and IHI = N.
119*
120* SCALE (global output) REAL array, dimension (N)
121* Details of the permutations and scaling factors applied to
122* A. If P(j) is the index of the row and column interchanged
123* with row and column j and D(j) is the scaling factor
124* applied to row and column j, then
125* SCALE(j) = P(j) for j = 1,...,ILO-1
126* = D(j) for j = ILO,...,IHI
127* = P(j) for j = IHI+1,...,N.
128* The order in which the interchanges are made is N to IHI+1,
129* then 1 to ILO-1.
130*
131* INFO (global output) INTEGER
132* = 0: successful exit.
133* < 0: if INFO = -i, the i-th argument had an illegal value.
134*
135* Further Details
136* ===============
137*
138* The permutations consist of row and column interchanges which put
139* the matrix in the form
140*
141* ( T1 X Y )
142* P A P = ( 0 B Z )
143* ( 0 0 T2 )
144*
145* where T1 and T2 are upper triangular matrices whose eigenvalues lie
146* along the diagonal. The column indices ILO and IHI mark the starting
147* and ending columns of the submatrix B. Balancing consists of applying
148* a diagonal similarity transformation inv(D) * B * D to make the
149* 1-norms of each row of B and its corresponding column nearly equal.
150* The output matrix is
151*
152* ( T1 X*D Y )
153* ( 0 inv(D)*B*D inv(D)*Z ).
154* ( 0 0 T2 )
155*
156* Information about the permutations P and the diagonal matrix D is
157* returned in the vector SCALE.
158*
159* This subroutine is based on the EISPACK routine BALANC. In principle,
160* the parallelism is extracted by using PBLAS and BLACS routines for
161* the permutation and balancing.
162*
163* Modified by Tzu-Yi Chen, Computer Science Division, University of
164* California at Berkeley, USA
165*
166* Parallel version by Robert Granat and Meiyue Shao, Department of
167* Computing Science and HPC2N, Umea University, Sweden
168*
169* =====================================================================
170*
171* .. Parameters ..
172 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
173 $ LLD_, MB_, M_, NB_, N_, RSRC_
174 parameter( block_cyclic_2d = 1, dlen_ = 9, dtype_ = 1,
175 $ ctxt_ = 2, m_ = 3, n_ = 4, mb_ = 5, nb_ = 6,
176 $ rsrc_ = 7, csrc_ = 8, lld_ = 9 )
177 REAL ZERO, ONE
178 parameter( zero = 0.0e+0, one = 1.0e+0 )
179 REAL SCLFAC
180 parameter( sclfac = 2.0e+0 )
181 REAL FACTOR
182 parameter( factor = 0.95e+0 )
183* ..
184* .. Local Scalars ..
185 LOGICAL NOCONV
186 INTEGER I, ICA, IEXC, IRA, J, K, L, M, LLDA,
187 $ ICTXT, NPROW, NPCOL, MYROW, MYCOL, II, JJ,
188 $ ARSRC, ACSRC
189 REAL C, CA, F, G, R, RA, S, SFMAX1, SFMAX2, SFMIN1,
190 $ SFMIN2, ELEM
191* ..
192* .. Local Arrays ..
193 REAL CR( 2 )
194* ..
195* .. External Functions ..
196 LOGICAL SISNAN, LSAME
197 INTEGER IDAMAX
198 REAL SLAMCH
199 EXTERNAL sisnan, lsame, slamch
200* ..
201* .. External Subroutines ..
202 EXTERNAL psscal, psswap, psamax, pxerbla,
203 $ blacs_gridinfo, chk1mat, sgsum2d,
205* ..
206* .. Intrinsic Functions ..
207 INTRINSIC abs, max, min
208* ..
209* .. Executable Statements ..
210 info = 0
211 ictxt = desca( ctxt_ )
212 CALL blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )
213*
214* Test the input parameters.
215*
216 IF( .NOT.lsame( job, 'N' ) .AND. .NOT.lsame( job, 'P' ) .AND.
217 $ .NOT.lsame( job, 'S' ) .AND. .NOT.lsame( job, 'B' ) ) THEN
218 info = -1
219 ELSE IF( n.LT.0 ) THEN
220 info = -2
221 ELSE
222 CALL chk1mat( n, 2, n, 2, 1, 1, desca, 4, info )
223 END IF
224 IF( info.NE.0 ) THEN
225 CALL pxerbla( ictxt, 'PSGEBAL', -info )
226 RETURN
227 END IF
228*
229* Extract local leading dimension of A.
230*
231 llda = desca( lld_ )
232*
233 k = 1
234 l = n
235*
236 IF( n.EQ.0 )
237 $ GO TO 210
238*
239 IF( lsame( job, 'N' ) ) THEN
240 DO 10 i = 1, n
241 scale( i ) = one
242 10 CONTINUE
243 GO TO 210
244 END IF
245*
246 IF( lsame( job, 'S' ) )
247 $ GO TO 120
248*
249* Permutation to isolate eigenvalues if possible.
250*
251 GO TO 50
252*
253* Row and column exchange.
254*
255 20 CONTINUE
256 scale( m ) = j
257 IF( j.EQ.m )
258 $ GO TO 30
259*
260 CALL psswap( l, a, 1, j, desca, 1, a, 1, m, desca, 1 )
261 CALL psswap( n-k+1, a, j, k, desca, desca(m_), a, m, k, desca,
262 $ desca(m_) )
263*
264 30 CONTINUE
265 GO TO ( 40, 80 )iexc
266*
267* Search for rows isolating an eigenvalue and push them down.
268*
269 40 CONTINUE
270 IF( l.EQ.1 )
271 $ GO TO 210
272 l = l - 1
273*
274 50 CONTINUE
275 DO 70 j = l, 1, -1
276*
277 DO 60 i = 1, l
278 IF( i.EQ.j )
279 $ GO TO 60
280*
281* All processors need the information to make correct decisions.
282*
283 CALL pselget( 'All', '1-Tree', elem, a, j, i, desca )
284 IF( elem.NE.zero )
285 $ GO TO 70
286 60 CONTINUE
287*
288 m = l
289 iexc = 1
290 GO TO 20
291 70 CONTINUE
292*
293 GO TO 90
294*
295* Search for columns isolating an eigenvalue and push them left.
296*
297 80 CONTINUE
298 k = k + 1
299*
300 90 CONTINUE
301 DO 110 j = k, l
302*
303 DO 100 i = k, l
304 IF( i.EQ.j )
305 $ GO TO 100
306*
307* All processors need the information to make correct decisions.
308*
309 CALL pselget( 'All', '1-Tree', elem, a, i, j, desca )
310 IF( elem.NE.zero )
311 $ GO TO 110
312 100 CONTINUE
313*
314 m = k
315 iexc = 2
316 GO TO 20
317 110 CONTINUE
318*
319 120 CONTINUE
320 DO 130 i = k, l
321 scale( i ) = one
322 130 CONTINUE
323*
324 IF( lsame( job, 'P' ) )
325 $ GO TO 210
326*
327* Balance the submatrix in rows K to L.
328*
329* Iterative loop for norm reduction.
330*
331 sfmin1 = slamch( 'S' ) / slamch( 'P' )
332 sfmax1 = one / sfmin1
333 sfmin2 = sfmin1*sclfac
334 sfmax2 = one / sfmin2
335 140 CONTINUE
336 noconv = .false.
337*
338 DO 200 i = k, l
339 c = zero
340 r = zero
341*
342* Compute local partial values of R and C in parallel and combine
343* with a call to the BLACS global summation routine distributing
344* information to all processors.
345*
346 DO 150 j = k, l
347 IF( j.EQ.i )
348 $ GO TO 150
349 CALL infog2l( j, i, desca, nprow, npcol, myrow,
350 $ mycol, ii, jj, arsrc, acsrc )
351 IF( myrow.EQ.arsrc .AND. mycol.EQ.acsrc ) THEN
352 c = c + abs( a( ii + (jj-1)*llda ) )
353 END IF
354 CALL infog2l( i, j, desca, nprow, npcol, myrow,
355 $ mycol, ii, jj, arsrc, acsrc )
356 IF( myrow.EQ.arsrc .AND. mycol.EQ.acsrc ) THEN
357 r = r + abs( a( ii + (jj-1)*llda ) )
358 END IF
359 150 CONTINUE
360 cr( 1 ) = c
361 cr( 2 ) = r
362 CALL sgsum2d( ictxt, 'All', '1-Tree', 2, 1, cr, 2, -1, -1 )
363 c = cr( 1 )
364 r = cr( 2 )
365*
366* Find global maximum absolute values and indices in parallel.
367*
368 CALL psamax( l, ca, ica, a, 1, i, desca, 1 )
369 CALL psamax( n-k+1, ra, ira, a, i, k, desca, desca(m_) )
370*
371* Guard against zero C or R due to underflow.
372*
373 IF( c.EQ.zero .OR. r.EQ.zero )
374 $ GO TO 200
375 g = r / sclfac
376 f = one
377 s = c + r
378 160 CONTINUE
379 IF( c.GE.g .OR. max( f, c, ca ).GE.sfmax2 .OR.
380 $ min( r, g, ra ).LE.sfmin2 )GO TO 170
381 IF( sisnan( c+f+ca+r+g+ra ) ) THEN
382*
383* Exit if NaN to avoid infinite loop
384*
385 info = -3
386 CALL pxerbla( ictxt, 'PDGEBAL', -info )
387 RETURN
388 END IF
389 f = f*sclfac
390 c = c*sclfac
391 ca = ca*sclfac
392 r = r / sclfac
393 g = g / sclfac
394 ra = ra / sclfac
395 GO TO 160
396*
397 170 CONTINUE
398 g = c / sclfac
399 180 CONTINUE
400 IF( g.LT.r .OR. max( r, ra ).GE.sfmax2 .OR.
401 $ min( f, c, g, ca ).LE.sfmin2 )GO TO 190
402 f = f / sclfac
403 c = c / sclfac
404 g = g / sclfac
405 ca = ca / sclfac
406 r = r*sclfac
407 ra = ra*sclfac
408 GO TO 180
409*
410* Now balance.
411*
412 190 CONTINUE
413 IF( ( c+r ).GE.factor*s )
414 $ GO TO 200
415 IF( f.LT.one .AND. scale( i ).LT.one ) THEN
416 IF( f*scale( i ).LE.sfmin1 )
417 $ GO TO 200
418 END IF
419 IF( f.GT.one .AND. scale( i ).GT.one ) THEN
420 IF( scale( i ).GE.sfmax1 / f )
421 $ GO TO 200
422 END IF
423 g = one / f
424 scale( i ) = scale( i )*f
425 noconv = .true.
426*
427 CALL psscal( n-k+1, g, a, i, k, desca, desca(m_) )
428 CALL psscal( l, f, a, 1, i, desca, 1 )
429*
430 200 CONTINUE
431*
432 IF( noconv )
433 $ GO TO 140
434*
435 210 CONTINUE
436 ilo = k
437 ihi = l
438*
439 RETURN
440*
441* End of PSGEBAL
442*
443 END
subroutine chk1mat(ma, mapos0, na, napos0, ia, ja, desca, descapos0, info)
Definition chk1mat.f:3
subroutine infog2l(grindx, gcindx, desc, nprow, npcol, myrow, mycol, lrindx, lcindx, rsrc, csrc)
Definition infog2l.f:3
#define max(A, B)
Definition pcgemr.c:180
#define min(A, B)
Definition pcgemr.c:181
subroutine pselget(scope, top, alpha, a, ia, ja, desca)
Definition pselget.f:2
subroutine psgebal(job, n, a, desca, ilo, ihi, scale, info)
Definition psgebal.f:2
subroutine pxerbla(ictxt, srname, info)
Definition pxerbla.f:2