SCALAPACK 2.2.2
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches
pstrord.f
Go to the documentation of this file.
1 SUBROUTINE pstrord( COMPQ, SELECT, PARA, N, T, IT, JT,
2 $ DESCT, Q, IQ, JQ, DESCQ, WR, WI, M, WORK, LWORK,
3 $ IWORK, LIWORK, INFO )
4*
5* Contribution from the Department of Computing Science and HPC2N,
6* Umea University, Sweden
7*
8* -- ScaLAPACK computational routine (version 2.0.2) --
9* Univ. of Tennessee, Univ. of California Berkeley, Univ. of Colorado Denver
10* May 1 2012
11*
12 IMPLICIT NONE
13*
14* .. Scalar Arguments ..
15 CHARACTER COMPQ
16 INTEGER INFO, LIWORK, LWORK, M, N,
17 $ it, jt, iq, jq
18* ..
19* .. Array Arguments ..
20 INTEGER SELECT( * )
21 INTEGER PARA( 6 ), DESCT( * ), DESCQ( * ), IWORK( * )
22 REAL Q( * ), T( * ), WI( * ), WORK( * ), WR( * )
23* ..
24*
25* Purpose
26* =======
27*
28* PSTRORD reorders the real Schur factorization of a real matrix
29* A = Q*T*Q**T, so that a selected cluster of eigenvalues appears
30* in the leading diagonal blocks of the upper quasi-triangular matrix
31* T, and the leading columns of Q form an orthonormal basis of the
32* corresponding right invariant subspace.
33*
34* T must be in Schur form (as returned by PSLAHQR), that is, block
35* upper triangular with 1-by-1 and 2-by-2 diagonal blocks.
36*
37* This subroutine uses a delay and accumulate procedure for performing
38* the off-diagonal updates (see references for details).
39*
40* Notes
41* =====
42*
43* Each global data object is described by an associated description
44* vector. This vector stores the information required to establish
45* the mapping between an object element and its corresponding process
46* and memory location.
47*
48* Let A be a generic term for any 2D block cyclicly distributed array.
49* Such a global array has an associated description vector DESCA.
50* In the following comments, the character _ should be read as
51* "of the global array".
52*
53* NOTATION STORED IN EXPLANATION
54* --------------- -------------- --------------------------------------
55* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
56* DTYPE_A = 1.
57* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
58* the BLACS process grid A is distribu-
59* ted over. The context itself is glo-
60* bal, but the handle (the integer
61* value) may vary.
62* M_A (global) DESCA( M_ ) The number of rows in the global
63* array A.
64* N_A (global) DESCA( N_ ) The number of columns in the global
65* array A.
66* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
67* the rows of the array.
68* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
69* the columns of the array.
70* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
71* row of the array A is distributed.
72* CSRC_A (global) DESCA( CSRC_ ) The process column over which the
73* first column of the array A is
74* distributed.
75* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
76* array. LLD_A >= MAX(1,LOCr(M_A)).
77*
78* Let K be the number of rows or columns of a distributed matrix,
79* and assume that its process grid has dimension p x q.
80* LOCr( K ) denotes the number of elements of K that a process
81* would receive if K were distributed over the p processes of its
82* process column.
83* Similarly, LOCc( K ) denotes the number of elements of K that a
84* process would receive if K were distributed over the q processes of
85* its process row.
86* The values of LOCr() and LOCc() may be determined via a call to the
87* ScaLAPACK tool function, NUMROC:
88* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
89* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
90* An upper bound for these quantities may be computed by:
91* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
92* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
93*
94* Arguments
95* =========
96*
97*
98* COMPQ (global input) CHARACTER*1
99* = 'V': update the matrix Q of Schur vectors;
100* = 'N': do not update Q.
101*
102* SELECT (global input/output) INTEGER array, dimension (N)
103* SELECT specifies the eigenvalues in the selected cluster. To
104* select a real eigenvalue w(j), SELECT(j) must be set to 1.
105* To select a complex conjugate pair of eigenvalues
106* w(j) and w(j+1), corresponding to a 2-by-2 diagonal block,
107* either SELECT(j) or SELECT(j+1) or both must be set to 1;
108* a complex conjugate pair of eigenvalues must be
109* either both included in the cluster or both excluded.
110* On output, the (partial) reordering is displayed.
111*
112* PARA (global input) INTEGER*6
113* Block parameters (some should be replaced by calls to
114* PILAENV and others by meaningful default values):
115* PARA(1) = maximum number of concurrent computational windows
116* allowed in the algorithm;
117* 0 < PARA(1) <= min(NPROW,NPCOL) must hold;
118* PARA(2) = number of eigenvalues in each window;
119* 0 < PARA(2) < PARA(3) must hold;
120* PARA(3) = window size; PARA(2) < PARA(3) < DESCT(MB_)
121* must hold;
122* PARA(4) = minimal percentage of flops required for
123* performing matrix-matrix multiplications instead
124* of pipelined orthogonal transformations;
125* 0 <= PARA(4) <= 100 must hold;
126* PARA(5) = width of block column slabs for row-wise
127* application of pipelined orthogonal
128* transformations in their factorized form;
129* 0 < PARA(5) <= DESCT(MB_) must hold.
130* PARA(6) = the maximum number of eigenvalues moved together
131* over a process border; in practice, this will be
132* approximately half of the cross border window size
133* 0 < PARA(6) <= PARA(2) must hold;
134*
135* N (global input) INTEGER
136* The order of the globally distributed matrix T. N >= 0.
137*
138* T (local input/output) REAL array,
139* dimension (LLD_T,LOCc(N)).
140* On entry, the local pieces of the global distributed
141* upper quasi-triangular matrix T, in Schur form. On exit, T is
142* overwritten by the local pieces of the reordered matrix T,
143* again in Schur form, with the selected eigenvalues in the
144* globally leading diagonal blocks.
145*
146* IT (global input) INTEGER
147* JT (global input) INTEGER
148* The row and column index in the global array T indicating the
149* first column of sub( T ). IT = JT = 1 must hold.
150*
151* DESCT (global and local input) INTEGER array of dimension DLEN_.
152* The array descriptor for the global distributed matrix T.
153*
154* Q (local input/output) REAL array,
155* dimension (LLD_Q,LOCc(N)).
156* On entry, if COMPQ = 'V', the local pieces of the global
157* distributed matrix Q of Schur vectors.
158* On exit, if COMPQ = 'V', Q has been postmultiplied by the
159* global orthogonal transformation matrix which reorders T; the
160* leading M columns of Q form an orthonormal basis for the
161* specified invariant subspace.
162* If COMPQ = 'N', Q is not referenced.
163*
164* IQ (global input) INTEGER
165* JQ (global input) INTEGER
166* The column index in the global array Q indicating the
167* first column of sub( Q ). IQ = JQ = 1 must hold.
168*
169* DESCQ (global and local input) INTEGER array of dimension DLEN_.
170* The array descriptor for the global distributed matrix Q.
171*
172* WR (global output) REAL array, dimension (N)
173* WI (global output) REAL array, dimension (N)
174* The real and imaginary parts, respectively, of the reordered
175* eigenvalues of T. The eigenvalues are in principle stored in
176* the same order as on the diagonal of T, with WR(i) = T(i,i)
177* and, if T(i:i+1,i:i+1) is a 2-by-2 diagonal block, WI(i) > 0
178* and WI(i+1) = -WI(i).
179* Note also that if a complex eigenvalue is sufficiently
180* ill-conditioned, then its value may differ significantly
181* from its value before reordering.
182*
183* M (global output) INTEGER
184* The dimension of the specified invariant subspace.
185* 0 <= M <= N.
186*
187* WORK (local workspace/output) REAL array,
188* dimension (LWORK)
189* On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
190*
191* LWORK (local input) INTEGER
192* The dimension of the array WORK.
193*
194* If LWORK = -1, then a workspace query is assumed; the routine
195* only calculates the optimal size of the WORK array, returns
196* this value as the first entry of the WORK array, and no error
197* message related to LWORK is issued by PXERBLA.
198*
199* IWORK (local workspace/output) INTEGER array, dimension (LIWORK)
200*
201* LIWORK (local input) INTEGER
202* The dimension of the array IWORK.
203*
204* If LIWORK = -1, then a workspace query is assumed; the
205* routine only calculates the optimal size of the IWORK array,
206* returns this value as the first entry of the IWORK array, and
207* no error message related to LIWORK is issued by PXERBLA.
208*
209* INFO (global output) INTEGER
210* = 0: successful exit
211* < 0: if INFO = -i, the i-th argument had an illegal value.
212* If the i-th argument is an array and the j-entry had
213* an illegal value, then INFO = -(i*1000+j), if the i-th
214* argument is a scalar and had an illegal value, then INFO = -i.
215* > 0: here we have several possibilites
216* *) Reordering of T failed because some eigenvalues are too
217* close to separate (the problem is very ill-conditioned);
218* T may have been partially reordered, and WR and WI
219* contain the eigenvalues in the same order as in T.
220* On exit, INFO = {the index of T where the swap failed}.
221* *) A 2-by-2 block to be reordered split into two 1-by-1
222* blocks and the second block failed to swap with an
223* adjacent block.
224* On exit, INFO = {the index of T where the swap failed}.
225* *) If INFO = N+1, there is no valid BLACS context (see the
226* BLACS documentation for details).
227* In a future release this subroutine may distinguish between
228* the case 1 and 2 above.
229*
230* Additional requirements
231* =======================
232*
233* The following alignment requirements must hold:
234* (a) DESCT( MB_ ) = DESCT( NB_ ) = DESCQ( MB_ ) = DESCQ( NB_ )
235* (b) DESCT( RSRC_ ) = DESCQ( RSRC_ )
236* (c) DESCT( CSRC_ ) = DESCQ( CSRC_ )
237*
238* All matrices must be blocked by a block factor larger than or
239* equal to two (3). This is to simplify reordering across processor
240* borders in the presence of 2-by-2 blocks.
241*
242* Limitations
243* ===========
244*
245* This algorithm cannot work on submatrices of T and Q, i.e.,
246* IT = JT = IQ = JQ = 1 must hold. This is however no limitation
247* since PDLAHQR does not compute Schur forms of submatrices anyway.
248*
249* References
250* ==========
251*
252* [1] Z. Bai and J. W. Demmel; On swapping diagonal blocks in real
253* Schur form, Linear Algebra Appl., 186:73--95, 1993. Also as
254* LAPACK Working Note 54.
255*
256* [2] D. Kressner; Block algorithms for reordering standard and
257* generalized Schur forms, ACM TOMS, 32(4):521-532, 2006.
258* Also LAPACK Working Note 171.
259*
260* [3] R. Granat, B. Kagstrom, and D. Kressner; Parallel eigenvalue
261* reordering in real Schur form, Concurrency and Computations:
262* Practice and Experience, 21(9):1225-1250, 2009. Also as
263* LAPACK Working Note 192.
264*
265* Parallel execution recommendations
266* ==================================
267*
268* Use a square grid, if possible, for maximum performance. The block
269* parameters in PARA should be kept well below the data distribution
270* block size. In particular, see [3] for recommended settings for
271* these parameters.
272*
273* In general, the parallel algorithm strives to perform as much work
274* as possible without crossing the block borders on the main block
275* diagonal.
276*
277* Contributors
278* ============
279*
280* Implemented by Robert Granat, Dept. of Computing Science and HPC2N,
281* Umea University, Sweden, March 2007,
282* in collaboration with Bo Kagstrom and Daniel Kressner.
283* Modified by Meiyue Shao, October 2011.
284*
285* Revisions
286* =========
287*
288* Please send bug-reports to granat@cs.umu.se
289*
290* Keywords
291* ========
292*
293* Real Schur form, eigenvalue reordering
294*
295* =====================================================================
296* ..
297* .. Parameters ..
298 CHARACTER TOP
299 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
300 $ lld_, mb_, m_, nb_, n_, rsrc_
301 REAL ZERO, ONE
302 PARAMETER ( TOP = '1-Tree',
303 $ block_cyclic_2d = 1, dlen_ = 9, dtype_ = 1,
304 $ ctxt_ = 2, m_ = 3, n_ = 4, mb_ = 5, nb_ = 6,
305 $ rsrc_ = 7, csrc_ = 8, lld_ = 9,
306 $ zero = 0.0, one = 1.0 )
307* ..
308* .. Local Scalars ..
309 LOGICAL LQUERY, PAIR, SWAP, WANTQ,
310 $ ISHH, FIRST, SKIP1CR, BORDER, LASTWAIT
311 INTEGER NPROW, NPCOL, MYROW, MYCOL, NB, NPROCS,
312 $ IERR, DIM1, INDX, LLDT, TRSRC, TCSRC, ILOC1,
313 $ jloc1, myierr, ictxt,
314 $ rsrc1, csrc1, iloc3, jloc3, trsrc3,
315 $ tcsrc3, iloc, jloc, trsrc4, tcsrc4,
316 $ flops, i, ilo, ihi, j, k, kk, kks,
317 $ ks, liwmin, lwmin, mmult, n1, n2,
318 $ ncb, ndtraf, nitraf, nwin, numwin, pdtraf,
319 $ pitraf, pdw, wineig, winsiz, lldq,
320 $ rsrc, csrc, ililo, ilihi, ilsel, irsrc,
321 $ icsrc, ipiw, ipw1, ipw2, ipw3, tihi, tilo,
322 $ lihi, window, lilo, lsel, buffer,
323 $ nmwin2, bufflen, lrows, lcols, iloc2, jloc2,
324 $ wneicr, window0, rsrc4, csrc4, lihi4, rsrc3,
325 $ csrc3, rsrc2, csrc2, lihic, lihi1, ilen4,
326 $ seli4, ilen1, dim4, ipw4, qrows, trows,
327 $ tcols, ipw5, ipw6, ipw7, ipw8, jloc4,
328 $ east, west, iloc4, south, north, indxs,
329 $ itt, jtt, ilen, dlen, indxe, trsrc1, tcsrc1,
330 $ trsrc2, tcsrc2, ilos, dir, tlihi, tlilo, tlsel,
331 $ round, last, win0s, win0e, wine
332 REAL ELEM, ELEM1, ELEM2, ELEM3, ELEM4, SN, CS, TMP,
333 $ ELEM5
334* ..
335* .. Local Arrays ..
336 INTEGER IBUFF( 8 ), IDUM1( 1 ), IDUM2( 1 ), MMAX( 1 ),
337 $ MMIN( 1 )
338* ..
339* .. External Functions ..
340 LOGICAL LSAME
341 INTEGER NUMROC, INDXG2P, INDXG2L
342 EXTERNAL lsame, numroc, indxg2p, indxg2l
343* ..
344* .. External Subroutines ..
345 EXTERNAL pslacpy, pxerbla, pchk1mat, pchk2mat,
346 $ sgemm, slamov, ilacpy, chk1mat,
347 $ infog2l, dgsum2d, sgesd2d, sgerv2d, sgebs2d,
348 $ sgebr2d, igsum2d, blacs_gridinfo, igebs2d,
349 $ igebr2d, igamx2d, igamn2d, bslaapp, bdtrexc
350* ..
351* .. Intrinsic Functions ..
352 INTRINSIC abs, max, sqrt, min
353* ..
354* .. Local Functions ..
355 INTEGER ICEIL
356* ..
357* .. Executable Statements ..
358*
359* Get grid parameters.
360*
361 ictxt = desct( ctxt_ )
362 CALL blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )
363 nprocs = nprow*npcol
364*
365* Test if grid is O.K., i.e., the context is valid.
366*
367 info = 0
368 IF( nprow.EQ.-1 ) THEN
369 info = n+1
370 END IF
371*
372* Check if workspace query.
373*
374 lquery = lwork.EQ.-1 .OR. liwork.EQ.-1
375*
376* Test dimensions for local sanity.
377*
378 IF( info.EQ.0 ) THEN
379 CALL chk1mat( n, 5, n, 5, it, jt, desct, 9, info )
380 END IF
381 IF( info.EQ.0 ) THEN
382 CALL chk1mat( n, 5, n, 5, iq, jq, descq, 13, info )
383 END IF
384*
385* Check the blocking sizes for alignment requirements.
386*
387 IF( info.EQ.0 ) THEN
388 IF( desct( mb_ ).NE.desct( nb_ ) ) info = -(1000*9 + mb_)
389 END IF
390 IF( info.EQ.0 ) THEN
391 IF( descq( mb_ ).NE.descq( nb_ ) ) info = -(1000*13 + mb_)
392 END IF
393 IF( info.EQ.0 ) THEN
394 IF( desct( mb_ ).NE.descq( mb_ ) ) info = -(1000*9 + mb_)
395 END IF
396*
397* Check the blocking sizes for minimum sizes.
398*
399 IF( info.EQ.0 ) THEN
400 IF( n.NE.desct( mb_ ) .AND. desct( mb_ ).LT.3 )
401 $ info = -(1000*9 + mb_)
402 IF( n.NE.descq( mb_ ) .AND. descq( mb_ ).LT.3 )
403 $ info = -(1000*13 + mb_)
404 END IF
405*
406* Check parameters in PARA.
407*
408 nb = desct( mb_ )
409 IF( info.EQ.0 ) THEN
410 IF( para(1).LT.1 .OR. para(1).GT.min(nprow,npcol) )
411 $ info = -(1000 * 4 + 1)
412 IF( para(2).LT.1 .OR. para(2).GE.para(3) )
413 $ info = -(1000 * 4 + 2)
414 IF( para(3).LT.1 .OR. para(3).GT.nb )
415 $ info = -(1000 * 4 + 3)
416 IF( para(4).LT.0 .OR. para(4).GT.100 )
417 $ info = -(1000 * 4 + 4)
418 IF( para(5).LT.1 .OR. para(5).GT.nb )
419 $ info = -(1000 * 4 + 5)
420 IF( para(6).LT.1 .OR. para(6).GT.para(2) )
421 $ info = -(1000 * 4 + 6)
422 END IF
423*
424* Check requirements on IT, JT, IQ and JQ.
425*
426 IF( info.EQ.0 ) THEN
427 IF( it.NE.1 ) info = -6
428 IF( jt.NE.it ) info = -7
429 IF( iq.NE.1 ) info = -10
430 IF( jq.NE.iq ) info = -11
431 END IF
432*
433* Test input parameters for global sanity.
434*
435 IF( info.EQ.0 ) THEN
436 CALL pchk1mat( n, 5, n, 5, it, jt, desct, 9, 0, idum1,
437 $ idum2, info )
438 END IF
439 IF( info.EQ.0 ) THEN
440 CALL pchk1mat( n, 5, n, 5, iq, jq, descq, 13, 0, idum1,
441 $ idum2, info )
442 END IF
443 IF( info.EQ.0 ) THEN
444 CALL pchk2mat( n, 5, n, 5, it, jt, desct, 9, n, 5, n, 5,
445 $ iq, jq, descq, 13, 0, idum1, idum2, info )
446 END IF
447*
448* Decode and test the input parameters.
449*
450 IF( info.EQ.0 .OR. lquery ) THEN
451*
452 wantq = lsame( compq, 'V' )
453 IF( n.LT.0 ) THEN
454 info = -4
455 ELSE
456*
457* Extract local leading dimension.
458*
459 lldt = desct( lld_ )
460 lldq = descq( lld_ )
461*
462* Check the SELECT vector for consistency and set M to the
463* dimension of the specified invariant subspace.
464*
465 m = 0
466 DO 10 k = 1, n
467 IF( k.LT.n ) THEN
468 CALL infog2l( k+1, k, desct, nprow, npcol,
469 $ myrow, mycol, itt, jtt, trsrc, tcsrc )
470 IF( myrow.EQ.trsrc .AND. mycol.EQ.tcsrc ) THEN
471 elem = t( (jtt-1)*lldt + itt )
472 IF( elem.NE.zero ) THEN
473 IF( SELECT(k).NE.0 .AND.
474 $ SELECT(k+1).EQ.0 ) THEN
475* INFO = -2
476 SELECT(k+1) = 1
477 ELSEIF( SELECT(k).EQ.0 .AND.
478 $ SELECT(k+1).NE.0 ) THEN
479* INFO = -2
480 SELECT(k) = 1
481 END IF
482 END IF
483 END IF
484 END IF
485 IF( SELECT(k).NE.0 ) m = m + 1
486 10 CONTINUE
487 mmax( 1 ) = m
488 mmin( 1 ) = m
489 IF( nprocs.GT.1 )
490 $ CALL igamx2d( ictxt, 'All', top, 1, 1, mmax( 1 ), 1, -1,
491 $ -1, -1, -1, -1 )
492 IF( nprocs.GT.1 )
493 $ CALL igamn2d( ictxt, 'All', top, 1, 1, mmin( 1 ), 1, -1,
494 $ -1, -1, -1, -1 )
495 IF( mmax( 1 ).GT.mmin( 1 ) ) THEN
496 m = mmax( 1 )
497 IF( nprocs.GT.1 )
498 $ CALL igamx2d( ictxt, 'All', top, n, 1, SELECT, n,
499 $ -1, -1, -1, -1, -1 )
500 END IF
501*
502* Compute needed workspace.
503*
504 n1 = m
505 n2 = n - m
506*
507 trows = numroc( n, nb, myrow, desct(rsrc_), nprow )
508 tcols = numroc( n, nb, mycol, desct(csrc_), npcol )
509 lwmin = n + 7*nb**2 + 2*trows*para( 3 ) + tcols*para( 3 ) +
510 $ max( trows*para( 3 ), tcols*para( 3 ) )
511 liwmin = 5*para( 1 ) + para( 2 )*para( 3 ) -
512 $ para( 2 ) * ( para( 2 ) + 1 ) / 2
513*
514 IF( lwork.LT.lwmin .AND. .NOT.lquery ) THEN
515 info = -17
516 ELSE IF( liwork.LT.liwmin .AND. .NOT.lquery ) THEN
517 info = -19
518 END IF
519 END IF
520 END IF
521*
522* Global maximum on info.
523*
524 IF( nprocs.GT.1 ) THEN
525 CALL igamx2d( ictxt, 'All', top, 1, 1, info, 1, -1, -1,
526 $ -1, -1, -1 )
527 END IF
528*
529* Return if some argument is incorrect.
530*
531 IF( info.NE.0 .AND. .NOT.lquery ) THEN
532 m = 0
533 CALL pxerbla( ictxt, 'PSTRORD', -info )
534 RETURN
535 ELSEIF( lquery ) THEN
536 work( 1 ) = float(lwmin)
537 iwork( 1 ) = liwmin
538 RETURN
539 END IF
540*
541* Quick return if possible.
542*
543 IF( m.EQ.n .OR. m.EQ.0 ) GO TO 545
544*
545* Set parameters.
546*
547 numwin = para( 1 )
548 wineig = max( para( 2 ), 2 )
549 winsiz = min( max( para( 3 ), para( 2 )*2 ), nb )
550 mmult = para( 4 )
551 ncb = para( 5 )
552 wneicr = para( 6 )
553*
554* Insert some pointers into INTEGER workspace.
555*
556* Information about all the active windows is stored
557* in IWORK( 1:5*NUMWIN ). Each processor has a copy.
558* LILO: start position
559* LIHI: stop position
560* LSEL: number of selected eigenvalues
561* RSRC: processor id (row)
562* CSRC: processor id (col)
563* IWORK( IPIW+ ) contain information of orthogonal transformations.
564*
565 ililo = 1
566 ilihi = ililo + numwin
567 ilsel = ilihi + numwin
568 irsrc = ilsel + numwin
569 icsrc = irsrc + numwin
570 ipiw = icsrc + numwin
571*
572* Insert some pointers into REAL workspace - for now we
573* only need two pointers.
574*
575 ipw1 = 1
576 ipw2 = ipw1 + nb
577*
578* Collect the selected blocks at the top-left corner of T.
579*
580* Globally: ignore eigenvalues that are already in order.
581* ILO is a global variable and is kept updated to be consistent
582* throughout the process mesh.
583*
584 ilo = 0
585 40 CONTINUE
586 ilo = ilo + 1
587 IF( ilo.LE.n ) THEN
588 IF( SELECT(ilo).NE.0 ) GO TO 40
589 END IF
590*
591* Globally: start the collection at the top of the matrix. Here,
592* IHI is a global variable and is kept updated to be consistent
593* throughout the process mesh.
594*
595 ihi = n
596*
597* Globally: While ( ILO <= M ) do
598 50 CONTINUE
599*
600 IF( ilo.LE.m ) THEN
601*
602* Depending on the value of ILO, find the diagonal block index J,
603* such that T(1+(J-1)*NB:1+J*NB,1+(J-1)*NB:1+J*NB) contains the
604* first unsorted eigenvalue. Check that J does not point to a
605* block with only one selected eigenvalue in the last position
606* which belongs to a splitted 2-by-2 block.
607*
608 ilos = ilo - 1
609 52 CONTINUE
610 ilos = ilos + 1
611 IF( SELECT(ilos).EQ.0 ) GO TO 52
612 IF( ilos.LT.n ) THEN
613 IF( SELECT(ilos+1).NE.0 .AND. mod(ilos,nb).EQ.0 ) THEN
614 CALL pselget( 'All', top, elem, t, ilos+1, ilos, desct )
615 IF( elem.NE.zero ) GO TO 52
616 END IF
617 END IF
618 j = iceil(ilos,nb)
619*
620* Globally: Set start values of LILO and LIHI for all processes.
621* Choose also the number of selected eigenvalues at top of each
622* diagonal block such that the number of eigenvalues which remain
623* to be reordered is an integer multiple of WINEIG.
624*
625* All the information is saved into the INTEGER workspace such
626* that all processors are aware of each others operations.
627*
628* Compute the number of concurrent windows.
629*
630 nmwin2 = (iceil(ihi,nb)*nb - (ilo-mod(ilo,nb)+1)+1) / nb
631 nmwin2 = min( min( numwin, nmwin2 ), iceil(n,nb) - j + 1 )
632*
633* For all windows, set LSEL = 0 and find a proper start value of
634* LILO such that LILO points at the first non-selected entry in
635* the corresponding diagonal block of T.
636*
637 DO 80 k = 1, nmwin2
638 iwork( ilsel+k-1) = 0
639 iwork( ililo+k-1) = max( ilo, (j-1)*nb+(k-1)*nb+1 )
640 lilo = iwork( ililo+k-1 )
641 82 CONTINUE
642 IF( SELECT(lilo).NE.0 .AND. lilo.LT.(j+k-1)*nb ) THEN
643 lilo = lilo + 1
644 IF( lilo.LE.n ) GO TO 82
645 END IF
646 iwork( ililo+k-1 ) = lilo
647*
648* Fix each LILO to ensure that no 2-by-2 block is cut in top
649* of the submatrix (LILO:LIHI,LILO:LIHI).
650*
651 lilo = iwork(ililo+k-1)
652 IF( lilo.GT.nb ) THEN
653 CALL pselget( 'All', top, elem, t, lilo, lilo-1, desct )
654 IF( elem.NE.zero ) THEN
655 IF( lilo.LT.(j+k-1)*nb ) THEN
656 iwork(ililo+k-1) = iwork(ililo+k-1) + 1
657 ELSE
658 iwork(ililo+k-1) = iwork(ililo+k-1) - 1
659 END IF
660 END IF
661 END IF
662*
663* Set a proper LIHI value for each window. Also find the
664* processors corresponding to the corresponding windows.
665*
666 iwork( ilihi+k-1 ) = iwork( ililo+k-1 )
667 iwork( irsrc+k-1 ) = indxg2p( iwork(ililo+k-1), nb, myrow,
668 $ desct( rsrc_ ), nprow )
669 iwork( icsrc+k-1 ) = indxg2p( iwork(ililo+k-1), nb, mycol,
670 $ desct( csrc_ ), npcol )
671 tilo = iwork(ililo+k-1)
672 tihi = min( n, iceil( tilo, nb ) * nb )
673 DO 90 kk = tihi, tilo, -1
674 IF( SELECT(kk).NE.0 ) THEN
675 iwork(ilihi+k-1) = max(iwork(ilihi+k-1) , kk )
676 iwork(ilsel+k-1) = iwork(ilsel+k-1) + 1
677 IF( iwork(ilsel+k-1).GT.wineig ) THEN
678 iwork(ilihi+k-1) = kk
679 iwork(ilsel+k-1) = 1
680 END IF
681 END IF
682 90 CONTINUE
683*
684* Fix each LIHI to avoid that bottom of window cuts 2-by-2
685* block. We exclude such a block if located on block (process)
686* border and on window border or if an inclusion would cause
687* violation on the maximum number of eigenvalues to reorder
688* inside each window. If only on window border, we include it.
689* The excluded block is included automatically later when a
690* subcluster is reordered into the block from South-East.
691*
692 lihi = iwork(ilihi+k-1)
693 IF( lihi.LT.n ) THEN
694 CALL pselget( 'All', top, elem, t, lihi+1, lihi, desct )
695 IF( elem.NE.zero ) THEN
696 IF( iceil( lihi, nb ) .NE. iceil( lihi+1, nb ) .OR.
697 $ iwork( ilsel+k-1 ).EQ.wineig ) THEN
698 iwork( ilihi+k-1 ) = iwork( ilihi+k-1 ) - 1
699 IF( iwork( ilsel+k-1 ).GT.2 )
700 $ iwork( ilsel+k-1 ) = iwork( ilsel+k-1 ) - 1
701 ELSE
702 iwork( ilihi+k-1 ) = iwork( ilihi+k-1 ) + 1
703 IF( SELECT(lihi+1).NE.0 )
704 $ iwork( ilsel+k-1 ) = iwork( ilsel+k-1 ) + 1
705 END IF
706 END IF
707 END IF
708 80 CONTINUE
709*
710* Fix the special cases of LSEL = 0 and LILO = LIHI for each
711* window by assuring that the stop-condition for local reordering
712* is fulfilled directly. Do this by setting LIHI = startposition
713* for the corresponding block and LILO = LIHI + 1.
714*
715 DO 85 k = 1, nmwin2
716 lilo = iwork( ililo + k - 1 )
717 lihi = iwork( ilihi + k - 1 )
718 lsel = iwork( ilsel + k - 1 )
719 IF( lsel.EQ.0 .OR. lilo.EQ.lihi ) THEN
720 lihi = iwork( ilihi + k - 1 )
721 iwork( ilihi + k - 1 ) = (iceil(lihi,nb)-1)*nb + 1
722 iwork( ililo + k - 1 ) = iwork( ilihi + k - 1 ) + 1
723 END IF
724 85 CONTINUE
725*
726* Associate all processors with the first computational window
727* that should be activated, if possible.
728*
729 lilo = ihi
730 lihi = ilo
731 lsel = m
732 first = .true.
733 DO 95 window = 1, nmwin2
734 rsrc = iwork(irsrc+window-1)
735 csrc = iwork(icsrc+window-1)
736 IF( myrow.EQ.rsrc .OR. mycol.EQ.csrc ) THEN
737 tlilo = iwork( ililo + window - 1 )
738 tlihi = iwork( ilihi + window - 1 )
739 tlsel = iwork( ilsel + window - 1 )
740 IF( (.NOT. ( lihi .GE. lilo + lsel ) ) .AND.
741 $ ( (tlihi .GE. tlilo + tlsel) .OR. first ) ) THEN
742 IF( first ) first = .false.
743 lilo = tlilo
744 lihi = tlihi
745 lsel = tlsel
746 GO TO 97
747 END IF
748 END IF
749 95 CONTINUE
750 97 CONTINUE
751*
752* Exclude all processors that are not involved in any
753* computational window right now.
754*
755 ierr = 0
756 IF( lilo.EQ.ihi .AND. lihi.EQ.ilo .AND. lsel.EQ.m )
757 $ GO TO 114
758*
759* Make sure all processors associated with a compuational window
760* enter the local reordering the first time.
761*
762 first = .true.
763*
764* Globally for all computational windows:
765* While ( LIHI >= LILO + LSEL ) do
766 round = 1
767 130 CONTINUE
768 IF( first .OR. ( lihi .GE. lilo + lsel ) ) THEN
769*
770* Perform computations in parallel: loop through all
771* compuational windows, do local reordering and accumulate
772* transformations, broadcast them in the corresponding block
773* row and columns and compute the corresponding updates.
774*
775 DO 110 window = 1, nmwin2
776 rsrc = iwork(irsrc+window-1)
777 csrc = iwork(icsrc+window-1)
778*
779* The process on the block diagonal computes the
780* reordering.
781*
782 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc ) THEN
783 lilo = iwork(ililo+window-1)
784 lihi = iwork(ilihi+window-1)
785 lsel = iwork(ilsel+window-1)
786*
787* Compute the local value of I -- start position.
788*
789 i = max( lilo, lihi - winsiz + 1 )
790*
791* Fix my I to avoid that top of window cuts a 2-by-2
792* block.
793*
794 IF( i.GT.lilo ) THEN
795 CALL infog2l( i, i-1, desct, nprow, npcol, myrow,
796 $ mycol, iloc, jloc, rsrc, csrc )
797 IF( t( lldt*(jloc-1) + iloc ).NE.zero )
798 $ i = i + 1
799 END IF
800*
801* Compute local indicies for submatrix to operate on.
802*
803 CALL infog2l( i, i, desct, nprow, npcol,
804 $ myrow, mycol, iloc1, jloc1, rsrc, csrc )
805*
806* The active window is ( I:LIHI, I:LIHI ). Reorder
807* eigenvalues within this window and pipeline
808* transformations.
809*
810 nwin = lihi - i + 1
811 ks = 0
812 pitraf = ipiw
813 pdtraf = ipw2
814*
815 pair = .false.
816 DO 140 k = i, lihi
817 IF( pair ) THEN
818 pair = .false.
819 ELSE
820 swap = SELECT( k ).NE.0
821 IF( k.LT.lihi ) THEN
822 CALL infog2l( k+1, k, desct, nprow, npcol,
823 $ myrow, mycol, iloc, jloc, rsrc, csrc )
824 IF( t( lldt*(jloc-1) + iloc ).NE.zero )
825 $ pair = .true.
826 END IF
827 IF( swap ) THEN
828 ks = ks + 1
829*
830* Swap the K-th block to position I+KS-1.
831*
832 ierr = 0
833 kk = k - i + 1
834 kks = ks
835 IF( kk.NE.ks ) THEN
836 nitraf = liwork - pitraf + 1
837 ndtraf = lwork - pdtraf + 1
838 CALL bstrexc( nwin,
839 $ t(lldt*(jloc1-1) + iloc1), lldt, kk,
840 $ kks, nitraf, iwork( pitraf ), ndtraf,
841 $ work( pdtraf ), work(ipw1), ierr )
842 pitraf = pitraf + nitraf
843 pdtraf = pdtraf + ndtraf
844*
845* Update array SELECT.
846*
847 IF ( pair ) THEN
848 DO 150 j = i+kk-1, i+kks, -1
849 SELECT(j+1) = SELECT(j-1)
850 150 CONTINUE
851 SELECT(i+kks-1) = 1
852 SELECT(i+kks) = 1
853 ELSE
854 DO 160 j = i+kk-1, i+kks, -1
855 SELECT(j) = SELECT(j-1)
856 160 CONTINUE
857 SELECT(i+kks-1) = 1
858 END IF
859*
860 IF ( ierr.EQ.1 .OR. ierr.EQ.2 ) THEN
861*
862* Some blocks are too close to swap:
863* prepare to leave in a clean fashion. If
864* IERR.EQ.2, we must update SELECT to
865* account for the fact that the 2 by 2
866* block to be reordered did split and the
867* first part of this block is already
868* reordered.
869*
870 IF ( ierr.EQ.2 ) THEN
871 SELECT( i+kks-3 ) = 1
872 SELECT( i+kks-1 ) = 0
873 kks = kks + 1
874 END IF
875*
876* Update off-diagonal blocks immediately.
877*
878 GO TO 170
879 END IF
880 ks = kks
881 END IF
882 IF( pair )
883 $ ks = ks + 1
884 END IF
885 END IF
886 140 CONTINUE
887 END IF
888 110 CONTINUE
889 170 CONTINUE
890*
891* The on-diagonal processes save their information from the
892* local reordering in the integer buffer. This buffer is
893* broadcasted to updating processors, see below.
894*
895 DO 175 window = 1, nmwin2
896 rsrc = iwork(irsrc+window-1)
897 csrc = iwork(icsrc+window-1)
898 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc ) THEN
899 ibuff( 1 ) = i
900 ibuff( 2 ) = nwin
901 ibuff( 3 ) = pitraf
902 ibuff( 4 ) = ks
903 ibuff( 5 ) = pdtraf
904 ibuff( 6 ) = ndtraf
905 ilen = pitraf - ipiw
906 dlen = pdtraf - ipw2
907 ibuff( 7 ) = ilen
908 ibuff( 8 ) = dlen
909 END IF
910 175 CONTINUE
911*
912* For the updates with respect to the local reordering, we
913* organize the updates in two phases where the update
914* "direction" (controlled by the DIR variable below) is first
915* chosen to be the corresponding rows, then the corresponding
916* columns.
917*
918 DO 1111 dir = 1, 2
919*
920* Broadcast information about the reordering and the
921* accumulated transformations: I, NWIN, PITRAF, NITRAF,
922* PDTRAF, NDTRAF. If no broadcast is performed, use an
923* artificial value of KS to prevent updating indicies for
924* windows already finished (use KS = -1).
925*
926 DO 111 window = 1, nmwin2
927 rsrc = iwork(irsrc+window-1)
928 csrc = iwork(icsrc+window-1)
929 IF( myrow.EQ.rsrc .OR. mycol.EQ.csrc ) THEN
930 lilo = iwork(ililo+window-1)
931 lihi = iwork(ilihi+window-1)
932 lsel = iwork(ilsel+window-1)
933 END IF
934 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc ) THEN
935 IF( npcol.GT.1 .AND. dir.EQ.1 )
936 $ CALL igebs2d( ictxt, 'Row', top, 8, 1, ibuff, 8 )
937 IF( nprow.GT.1 .AND. dir.EQ.2 )
938 $ CALL igebs2d( ictxt, 'Col', top, 8, 1, ibuff, 8 )
939 ELSEIF( myrow.EQ.rsrc .OR. mycol.EQ.csrc ) THEN
940 IF( npcol.GT.1 .AND. dir.EQ.1 .AND. myrow.EQ.rsrc )
941 $ THEN
942 IF( first .OR. (lihi .GE. lilo + lsel) ) THEN
943 CALL igebr2d( ictxt, 'Row', top, 8, 1, ibuff, 8,
944 $ rsrc, csrc )
945 i = ibuff( 1 )
946 nwin = ibuff( 2 )
947 pitraf = ibuff( 3 )
948 ks = ibuff( 4 )
949 pdtraf = ibuff( 5 )
950 ndtraf = ibuff( 6 )
951 ilen = ibuff( 7 )
952 dlen = ibuff( 8 )
953 ELSE
954 ilen = 0
955 dlen = 0
956 ks = -1
957 END IF
958 END IF
959 IF( nprow.GT.1 .AND. dir.EQ.2 .AND. mycol.EQ.csrc )
960 $ THEN
961 IF( first .OR. (lihi .GE. lilo + lsel) ) THEN
962 CALL igebr2d( ictxt, 'Col', top, 8, 1, ibuff, 8,
963 $ rsrc, csrc )
964 i = ibuff( 1 )
965 nwin = ibuff( 2 )
966 pitraf = ibuff( 3 )
967 ks = ibuff( 4 )
968 pdtraf = ibuff( 5 )
969 ndtraf = ibuff( 6 )
970 ilen = ibuff( 7 )
971 dlen = ibuff( 8 )
972 ELSE
973 ilen = 0
974 dlen = 0
975 ks = -1
976 END IF
977 END IF
978 END IF
979*
980* Broadcast the accumulated transformations - copy all
981* information from IWORK(IPIW:PITRAF-1) and
982* WORK(IPW2:PDTRAF-1) to a buffer and broadcast this
983* buffer in the corresponding block row and column. On
984* arrival, copy the information back to the correct part of
985* the workspace. This step is avoided if no computations
986* were performed at the diagonal processor, i.e.,
987* BUFFLEN = 0.
988*
989 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc ) THEN
990 buffer = pdtraf
991 bufflen = dlen + ilen
992 IF( bufflen.NE.0 ) THEN
993 DO 180 indx = 1, ilen
994 work( buffer+indx-1 ) =
995 $ float( iwork(ipiw+indx-1) )
996 180 CONTINUE
997 CALL slamov( 'All', dlen, 1, work( ipw2 ),
998 $ dlen, work(buffer+ilen), dlen )
999 IF( npcol.GT.1 .AND. dir.EQ.1 ) THEN
1000 CALL sgebs2d( ictxt, 'Row', top, bufflen, 1,
1001 $ work(buffer), bufflen )
1002 END IF
1003 IF( nprow.GT.1 .AND. dir.EQ.2 ) THEN
1004 CALL sgebs2d( ictxt, 'Col', top, bufflen, 1,
1005 $ work(buffer), bufflen )
1006 END IF
1007 END IF
1008 ELSEIF( myrow.EQ.rsrc .OR. mycol.EQ.csrc ) THEN
1009 IF( npcol.GT.1 .AND. dir.EQ.1 .AND. myrow.EQ.rsrc )
1010 $ THEN
1011 buffer = pdtraf
1012 bufflen = dlen + ilen
1013 IF( bufflen.NE.0 ) THEN
1014 CALL sgebr2d( ictxt, 'Row', top, bufflen, 1,
1015 $ work(buffer), bufflen, rsrc, csrc )
1016 END IF
1017 END IF
1018 IF( nprow.GT.1 .AND. dir.EQ.2 .AND. mycol.EQ.csrc )
1019 $ THEN
1020 buffer = pdtraf
1021 bufflen = dlen + ilen
1022 IF( bufflen.NE.0 ) THEN
1023 CALL sgebr2d( ictxt, 'Col', top, bufflen, 1,
1024 $ work(buffer), bufflen, rsrc, csrc )
1025 END IF
1026 END IF
1027 IF((npcol.GT.1.AND.dir.EQ.1.AND.myrow.EQ.rsrc).OR.
1028 $ (nprow.GT.1.AND.dir.EQ.2.AND.mycol.EQ.csrc ) )
1029 $ THEN
1030 IF( bufflen.NE.0 ) THEN
1031 DO 190 indx = 1, ilen
1032 iwork(ipiw+indx-1) =
1033 $ int(work( buffer+indx-1 ))
1034 190 CONTINUE
1035 CALL slamov( 'All', dlen, 1,
1036 $ work( buffer+ilen ), dlen,
1037 $ work( ipw2 ), dlen )
1038 END IF
1039 END IF
1040 END IF
1041 111 CONTINUE
1042*
1043* Now really perform the updates by applying the orthogonal
1044* transformations to the out-of-window parts of T and Q. This
1045* step is avoided if no reordering was performed by the on-
1046* diagonal processor from the beginning, i.e., BUFFLEN = 0.
1047*
1048* Count number of operations to decide whether to use
1049* matrix-matrix multiplications for updating off-diagonal
1050* parts or not.
1051*
1052 DO 112 window = 1, nmwin2
1053 rsrc = iwork(irsrc+window-1)
1054 csrc = iwork(icsrc+window-1)
1055*
1056 IF( (myrow.EQ.rsrc .AND. dir.EQ.1 ).OR.
1057 $ (mycol.EQ.csrc .AND. dir.EQ.2 ) ) THEN
1058 lilo = iwork(ililo+window-1)
1059 lihi = iwork(ilihi+window-1)
1060 lsel = iwork(ilsel+window-1)
1061*
1062* Skip update part for current WINDOW if BUFFLEN = 0.
1063*
1064 IF( bufflen.EQ.0 ) GO TO 295
1065*
1066 nitraf = pitraf - ipiw
1067 ishh = .false.
1068 flops = 0
1069 DO 200 k = 1, nitraf
1070 IF( iwork( ipiw + k - 1 ).LE.nwin ) THEN
1071 flops = flops + 6
1072 ELSE
1073 flops = flops + 11
1074 ishh = .true.
1075 END IF
1076 200 CONTINUE
1077*
1078* Compute amount of work space necessary for performing
1079* matrix-matrix multiplications.
1080*
1081 pdw = buffer
1082 ipw3 = pdw + nwin*nwin
1083 ELSE
1084 flops = 0
1085 END IF
1086*
1087 IF( flops.NE.0 .AND.
1088 $ ( flops*100 ) / ( 2*nwin*nwin ) .GE. mmult ) THEN
1089*
1090* Update off-diagonal blocks and Q using matrix-matrix
1091* multiplications; if there are no Householder
1092* reflectors it is preferable to take the triangular
1093* block structure of the transformation matrix into
1094* account.
1095*
1096 CALL slaset( 'All', nwin, nwin, zero, one,
1097 $ work( pdw ), nwin )
1098 CALL bslaapp( 1, nwin, nwin, ncb, work( pdw ), nwin,
1099 $ nitraf, iwork(ipiw), work( ipw2 ), work(ipw3) )
1100*
1101 IF( ishh ) THEN
1102*
1103* Loop through the local blocks of the distributed
1104* matrices T and Q and update them according to the
1105* performed reordering.
1106*
1107* Update the columns of T and Q affected by the
1108* reordering.
1109*
1110 IF( dir.EQ.2 ) THEN
1111 DO 210 indx = 1, i-1, nb
1112 CALL infog2l( indx, i, desct, nprow, npcol,
1113 $ myrow, mycol, iloc, jloc, rsrc1, csrc1 )
1114 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 )
1115 $ THEN
1116 lrows = min(nb,i-indx)
1117 CALL sgemm( 'No transpose',
1118 $ 'No transpose', lrows, nwin, nwin,
1119 $ one, t((jloc-1)*lldt+iloc), lldt,
1120 $ work( pdw ), nwin, zero,
1121 $ work(ipw3), lrows )
1122 CALL slamov( 'All', lrows, nwin,
1123 $ work(ipw3), lrows,
1124 $ t((jloc-1)*lldt+iloc), lldt )
1125 END IF
1126 210 CONTINUE
1127 IF( wantq ) THEN
1128 DO 220 indx = 1, n, nb
1129 CALL infog2l( indx, i, descq, nprow,
1130 $ npcol, myrow, mycol, iloc, jloc,
1131 $ rsrc1, csrc1 )
1132 IF( myrow.EQ.rsrc1.AND.mycol.EQ.csrc1 )
1133 $ THEN
1134 lrows = min(nb,n-indx+1)
1135 CALL sgemm( 'No transpose',
1136 $ 'No transpose', lrows, nwin, nwin,
1137 $ one, q((jloc-1)*lldq+iloc), lldq,
1138 $ work( pdw ), nwin, zero,
1139 $ work(ipw3), lrows )
1140 CALL slamov( 'All', lrows, nwin,
1141 $ work(ipw3), lrows,
1142 $ q((jloc-1)*lldq+iloc), lldq )
1143 END IF
1144 220 CONTINUE
1145 END IF
1146 END IF
1147*
1148* Update the rows of T affected by the reordering
1149*
1150 IF( dir.EQ.1 ) THEN
1151 IF( lihi.LT.n ) THEN
1152 IF( mod(lihi,nb).GT.0 ) THEN
1153 indx = lihi + 1
1154 CALL infog2l( i, indx, desct, nprow,
1155 $ npcol, myrow, mycol, iloc, jloc,
1156 $ rsrc1, csrc1 )
1157 IF( myrow.EQ.rsrc1.AND.mycol.EQ.csrc1 )
1158 $ THEN
1159 lcols = mod( min( nb-mod(lihi,nb),
1160 $ n-lihi ), nb )
1161 CALL sgemm( 'Transpose',
1162 $ 'No Transpose', nwin, lcols, nwin,
1163 $ one, work( pdw ), nwin,
1164 $ t((jloc-1)*lldt+iloc), lldt, zero,
1165 $ work(ipw3), nwin )
1166 CALL slamov( 'All', nwin, lcols,
1167 $ work(ipw3), nwin,
1168 $ t((jloc-1)*lldt+iloc), lldt )
1169 END IF
1170 END IF
1171 indxs = iceil(lihi,nb)*nb + 1
1172 DO 230 indx = indxs, n, nb
1173 CALL infog2l( i, indx, desct, nprow,
1174 $ npcol, myrow, mycol, iloc, jloc,
1175 $ rsrc1, csrc1 )
1176 IF( myrow.EQ.rsrc1.AND.mycol.EQ.csrc1 )
1177 $ THEN
1178 lcols = min( nb, n-indx+1 )
1179 CALL sgemm( 'Transpose',
1180 $ 'No Transpose', nwin, lcols, nwin,
1181 $ one, work( pdw ), nwin,
1182 $ t((jloc-1)*lldt+iloc), lldt, zero,
1183 $ work(ipw3), nwin )
1184 CALL slamov( 'All', nwin, lcols,
1185 $ work(ipw3), nwin,
1186 $ t((jloc-1)*lldt+iloc), lldt )
1187 END IF
1188 230 CONTINUE
1189 END IF
1190 END IF
1191 ELSE
1192*
1193* The NWIN-by-NWIN matrix U containing the
1194* accumulated orthogonal transformations has the
1195* following structure:
1196*
1197* [ U11 U12 ]
1198* U = [ ],
1199* [ U21 U22 ]
1200*
1201* where U21 is KS-by-KS upper triangular and U12 is
1202* (NWIN-KS)-by-(NWIN-KS) lower triangular.
1203*
1204* Update the columns of T and Q affected by the
1205* reordering.
1206*
1207* Compute T2*U21 + T1*U11 in workspace.
1208*
1209 IF( dir.EQ.2 ) THEN
1210 DO 240 indx = 1, i-1, nb
1211 CALL infog2l( indx, i, desct, nprow, npcol,
1212 $ myrow, mycol, iloc, jloc, rsrc1, csrc1 )
1213 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 )
1214 $ THEN
1215 jloc1 = indxg2l( i+nwin-ks, nb, mycol,
1216 $ desct( csrc_ ), npcol )
1217 lrows = min(nb,i-indx)
1218 CALL slamov( 'All', lrows, ks,
1219 $ t((jloc1-1)*lldt+iloc ), lldt,
1220 $ work(ipw3), lrows )
1221 CALL strmm( 'Right', 'Upper',
1222 $ 'No transpose',
1223 $ 'Non-unit', lrows, ks, one,
1224 $ work( pdw+nwin-ks ), nwin,
1225 $ work(ipw3), lrows )
1226 CALL sgemm( 'No transpose',
1227 $ 'No transpose', lrows, ks, nwin-ks,
1228 $ one, t((jloc-1)*lldt+iloc), lldt,
1229 $ work( pdw ), nwin, one, work(ipw3),
1230 $ lrows )
1231*
1232* Compute T1*U12 + T2*U22 in workspace.
1233*
1234 CALL slamov( 'All', lrows, nwin-ks,
1235 $ t((jloc-1)*lldt+iloc), lldt,
1236 $ work( ipw3+ks*lrows ), lrows )
1237 CALL strmm( 'Right', 'Lower',
1238 $ 'No transpose', 'Non-unit',
1239 $ lrows, nwin-ks, one,
1240 $ work( pdw+nwin*ks ), nwin,
1241 $ work( ipw3+ks*lrows ), lrows )
1242 CALL sgemm( 'No transpose',
1243 $ 'No transpose', lrows, nwin-ks, ks,
1244 $ one, t((jloc1-1)*lldt+iloc), lldt,
1245 $ work( pdw+nwin*ks+nwin-ks ), nwin,
1246 $ one, work( ipw3+ks*lrows ), lrows )
1247*
1248* Copy workspace to T.
1249*
1250 CALL slamov( 'All', lrows, nwin,
1251 $ work(ipw3), lrows,
1252 $ t((jloc-1)*lldt+iloc), lldt )
1253 END IF
1254 240 CONTINUE
1255 IF( wantq ) THEN
1256*
1257* Compute Q2*U21 + Q1*U11 in workspace.
1258*
1259 DO 250 indx = 1, n, nb
1260 CALL infog2l( indx, i, descq, nprow,
1261 $ npcol, myrow, mycol, iloc, jloc,
1262 $ rsrc1, csrc1 )
1263 IF( myrow.EQ.rsrc1.AND.mycol.EQ.csrc1 )
1264 $ THEN
1265 jloc1 = indxg2l( i+nwin-ks, nb,
1266 $ mycol, descq( csrc_ ), npcol )
1267 lrows = min(nb,n-indx+1)
1268 CALL slamov( 'All', lrows, ks,
1269 $ q((jloc1-1)*lldq+iloc ), lldq,
1270 $ work(ipw3), lrows )
1271 CALL strmm( 'Right', 'Upper',
1272 $ 'No transpose', 'Non-unit',
1273 $ lrows, ks, one,
1274 $ work( pdw+nwin-ks ), nwin,
1275 $ work(ipw3), lrows )
1276 CALL sgemm( 'No transpose',
1277 $ 'No transpose', lrows, ks,
1278 $ nwin-ks, one,
1279 $ q((jloc-1)*lldq+iloc), lldq,
1280 $ work( pdw ), nwin, one,
1281 $ work(ipw3), lrows )
1282*
1283* Compute Q1*U12 + Q2*U22 in workspace.
1284*
1285 CALL slamov( 'All', lrows, nwin-ks,
1286 $ q((jloc-1)*lldq+iloc), lldq,
1287 $ work( ipw3+ks*lrows ), lrows)
1288 CALL strmm( 'Right', 'Lower',
1289 $ 'No transpose', 'Non-unit',
1290 $ lrows, nwin-ks, one,
1291 $ work( pdw+nwin*ks ), nwin,
1292 $ work( ipw3+ks*lrows ), lrows)
1293 CALL sgemm( 'No transpose',
1294 $ 'No transpose', lrows, nwin-ks,
1295 $ ks, one, q((jloc1-1)*lldq+iloc),
1296 $ lldq, work(pdw+nwin*ks+nwin-ks),
1297 $ nwin, one, work( ipw3+ks*lrows ),
1298 $ lrows )
1299*
1300* Copy workspace to Q.
1301*
1302 CALL slamov( 'All', lrows, nwin,
1303 $ work(ipw3), lrows,
1304 $ q((jloc-1)*lldq+iloc), lldq )
1305 END IF
1306 250 CONTINUE
1307 END IF
1308 END IF
1309*
1310 IF( dir.EQ.1 ) THEN
1311 IF ( lihi.LT.n ) THEN
1312*
1313* Compute U21**T*T2 + U11**T*T1 in workspace.
1314*
1315 IF( mod(lihi,nb).GT.0 ) THEN
1316 indx = lihi + 1
1317 CALL infog2l( i, indx, desct, nprow,
1318 $ npcol, myrow, mycol, iloc, jloc,
1319 $ rsrc1, csrc1 )
1320 IF( myrow.EQ.rsrc1.AND.mycol.EQ.csrc1 )
1321 $ THEN
1322 iloc1 = indxg2l( i+nwin-ks, nb, myrow,
1323 $ desct( rsrc_ ), nprow )
1324 lcols = mod( min( nb-mod(lihi,nb),
1325 $ n-lihi ), nb )
1326 CALL slamov( 'All', ks, lcols,
1327 $ t((jloc-1)*lldt+iloc1), lldt,
1328 $ work(ipw3), nwin )
1329 CALL strmm( 'Left', 'Upper',
1330 $ 'Transpose', 'Non-unit', ks,
1331 $ lcols, one, work( pdw+nwin-ks ),
1332 $ nwin, work(ipw3), nwin )
1333 CALL sgemm( 'Transpose',
1334 $ 'No transpose', ks, lcols,
1335 $ nwin-ks, one, work(pdw), nwin,
1336 $ t((jloc-1)*lldt+iloc), lldt, one,
1337 $ work(ipw3), nwin )
1338*
1339* Compute U12**T*T1 + U22**T*T2 in
1340* workspace.
1341*
1342 CALL slamov( 'All', nwin-ks, lcols,
1343 $ t((jloc-1)*lldt+iloc), lldt,
1344 $ work( ipw3+ks ), nwin )
1345 CALL strmm( 'Left', 'Lower',
1346 $ 'Transpose', 'Non-unit',
1347 $ nwin-ks, lcols, one,
1348 $ work( pdw+nwin*ks ), nwin,
1349 $ work( ipw3+ks ), nwin )
1350 CALL sgemm( 'Transpose',
1351 $ 'No Transpose', nwin-ks, lcols,
1352 $ ks, one,
1353 $ work( pdw+nwin*ks+nwin-ks ),
1354 $ nwin, t((jloc-1)*lldt+iloc1),
1355 $ lldt, one, work( ipw3+ks ),
1356 $ nwin )
1357*
1358* Copy workspace to T.
1359*
1360 CALL slamov( 'All', nwin, lcols,
1361 $ work(ipw3), nwin,
1362 $ t((jloc-1)*lldt+iloc), lldt )
1363 END IF
1364 END IF
1365 indxs = iceil(lihi,nb)*nb + 1
1366 DO 260 indx = indxs, n, nb
1367 CALL infog2l( i, indx, desct, nprow,
1368 $ npcol, myrow, mycol, iloc, jloc,
1369 $ rsrc1, csrc1 )
1370 IF( myrow.EQ.rsrc1.AND.mycol.EQ.csrc1 )
1371 $ THEN
1372*
1373* Compute U21**T*T2 + U11**T*T1 in
1374* workspace.
1375*
1376 iloc1 = indxg2l( i+nwin-ks, nb,
1377 $ myrow, desct( rsrc_ ), nprow )
1378 lcols = min( nb, n-indx+1 )
1379 CALL slamov( 'All', ks, lcols,
1380 $ t((jloc-1)*lldt+iloc1), lldt,
1381 $ work(ipw3), nwin )
1382 CALL strmm( 'Left', 'Upper',
1383 $ 'Transpose', 'Non-unit', ks,
1384 $ lcols, one,
1385 $ work( pdw+nwin-ks ), nwin,
1386 $ work(ipw3), nwin )
1387 CALL sgemm( 'Transpose',
1388 $ 'No transpose', ks, lcols,
1389 $ nwin-ks, one, work(pdw), nwin,
1390 $ t((jloc-1)*lldt+iloc), lldt, one,
1391 $ work(ipw3), nwin )
1392*
1393* Compute U12**T*T1 + U22**T*T2 in
1394* workspace.
1395*
1396 CALL slamov( 'All', nwin-ks, lcols,
1397 $ t((jloc-1)*lldt+iloc), lldt,
1398 $ work( ipw3+ks ), nwin )
1399 CALL strmm( 'Left', 'Lower',
1400 $ 'Transpose', 'Non-unit',
1401 $ nwin-ks, lcols, one,
1402 $ work( pdw+nwin*ks ), nwin,
1403 $ work( ipw3+ks ), nwin )
1404 CALL sgemm( 'Transpose',
1405 $ 'No Transpose', nwin-ks, lcols,
1406 $ ks, one,
1407 $ work( pdw+nwin*ks+nwin-ks ),
1408 $ nwin, t((jloc-1)*lldt+iloc1),
1409 $ lldt, one, work(ipw3+ks), nwin )
1410*
1411* Copy workspace to T.
1412*
1413 CALL slamov( 'All', nwin, lcols,
1414 $ work(ipw3), nwin,
1415 $ t((jloc-1)*lldt+iloc), lldt )
1416 END IF
1417 260 CONTINUE
1418 END IF
1419 END IF
1420 END IF
1421 ELSEIF( flops.NE.0 ) THEN
1422*
1423* Update off-diagonal blocks and Q using the pipelined
1424* elementary transformations.
1425*
1426 IF( dir.EQ.2 ) THEN
1427 DO 270 indx = 1, i-1, nb
1428 CALL infog2l( indx, i, desct, nprow, npcol,
1429 $ myrow, mycol, iloc, jloc, rsrc1, csrc1 )
1430 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
1431 lrows = min(nb,i-indx)
1432 CALL bslaapp( 1, lrows, nwin, ncb,
1433 $ t((jloc-1)*lldt+iloc ), lldt, nitraf,
1434 $ iwork(ipiw), work( ipw2 ),
1435 $ work(ipw3) )
1436 END IF
1437 270 CONTINUE
1438 IF( wantq ) THEN
1439 DO 280 indx = 1, n, nb
1440 CALL infog2l( indx, i, descq, nprow, npcol,
1441 $ myrow, mycol, iloc, jloc, rsrc1, csrc1 )
1442 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 )
1443 $ THEN
1444 lrows = min(nb,n-indx+1)
1445 CALL bslaapp( 1, lrows, nwin, ncb,
1446 $ q((jloc-1)*lldq+iloc), lldq, nitraf,
1447 $ iwork(ipiw), work( ipw2 ),
1448 $ work(ipw3) )
1449 END IF
1450 280 CONTINUE
1451 END IF
1452 END IF
1453 IF( dir.EQ.1 ) THEN
1454 IF( lihi.LT.n ) THEN
1455 IF( mod(lihi,nb).GT.0 ) THEN
1456 indx = lihi + 1
1457 CALL infog2l( i, indx, desct, nprow, npcol,
1458 $ myrow, mycol, iloc, jloc, rsrc1, csrc1 )
1459 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 )
1460 $ THEN
1461 lcols = mod( min( nb-mod(lihi,nb),
1462 $ n-lihi ), nb )
1463 CALL bslaapp( 0, nwin, lcols, ncb,
1464 $ t((jloc-1)*lldt+iloc), lldt, nitraf,
1465 $ iwork(ipiw), work( ipw2 ),
1466 $ work(ipw3) )
1467 END IF
1468 END IF
1469 indxs = iceil(lihi,nb)*nb + 1
1470 DO 290 indx = indxs, n, nb
1471 CALL infog2l( i, indx, desct, nprow, npcol,
1472 $ myrow, mycol, iloc, jloc, rsrc1, csrc1 )
1473 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 )
1474 $ THEN
1475 lcols = min( nb, n-indx+1 )
1476 CALL bslaapp( 0, nwin, lcols, ncb,
1477 $ t((jloc-1)*lldt+iloc), lldt, nitraf,
1478 $ iwork(ipiw), work( ipw2 ),
1479 $ work(ipw3) )
1480 END IF
1481 290 CONTINUE
1482 END IF
1483 END IF
1484 END IF
1485*
1486* If I was not involved in the updates for the current
1487* window or the window was fully processed, I go here and
1488* try again for the next window.
1489*
1490 295 CONTINUE
1491*
1492* Update LIHI and LIHI depending on the number of
1493* eigenvalues really moved - for on-diagonal processes we
1494* do this update only once since each on-diagonal process
1495* is only involved with one window at one time. The
1496* indicies are updated in three cases:
1497* 1) When some reordering was really performed
1498* -- indicated by BUFFLEN > 0.
1499* 2) When no selected eigenvalues was found in the
1500* current window -- indicated by KS = 0.
1501* 3) When some selected eigenvalues was found in the
1502* current window but no one of them was moved
1503* (KS > 0 and BUFFLEN = 0)
1504* False index updating is avoided by sometimes setting
1505* KS = -1. This will affect processors involved in more
1506* than one window and where the first one ends up with
1507* KS = 0 and for the second one is done already.
1508*
1509 IF( myrow.EQ.rsrc.AND.mycol.EQ.csrc ) THEN
1510 IF( dir.EQ.2 ) THEN
1511 IF( bufflen.NE.0 .OR. ks.EQ.0 .OR.
1512 $ ( bufflen.EQ.0 .AND. ks.GT.0 ) )
1513 $ lihi = i + ks - 1
1514 iwork( ilihi+window-1 ) = lihi
1515 IF( .NOT. lihi.GE.lilo+lsel ) THEN
1516 lilo = lilo + lsel
1517 iwork( ililo+window-1 ) = lilo
1518 END IF
1519 END IF
1520 ELSEIF( myrow.EQ.rsrc .AND. dir.EQ.1 ) THEN
1521 IF( bufflen.NE.0 .OR. ks.EQ.0 .OR.
1522 $ ( bufflen.EQ.0 .AND. ks.GT.0 ) )
1523 $ lihi = i + ks - 1
1524 iwork( ilihi+window-1 ) = lihi
1525 IF( .NOT. lihi.GE.lilo+lsel ) THEN
1526 lilo = lilo + lsel
1527 iwork( ililo+window-1 ) = lilo
1528 END IF
1529 ELSEIF( mycol.EQ.csrc .AND. dir.EQ.2 ) THEN
1530 IF( bufflen.NE.0 .OR. ks.EQ.0 .OR.
1531 $ ( bufflen.EQ.0 .AND. ks.GT.0 ) )
1532 $ lihi = i + ks - 1
1533 iwork( ilihi+window-1 ) = lihi
1534 IF( .NOT. lihi.GE.lilo+lsel ) THEN
1535 lilo = lilo + lsel
1536 iwork( ililo+window-1 ) = lilo
1537 END IF
1538 END IF
1539*
1540 112 CONTINUE
1541*
1542* End of direction loop for updates with respect to local
1543* reordering.
1544*
1545 1111 CONTINUE
1546*
1547* Associate each process with one of the corresponding
1548* computational windows such that the test for another round
1549* of local reordering is carried out properly. Since the
1550* column updates were computed after the row updates, it is
1551* sufficient to test for changing the association to the
1552* window in the corresponding process row.
1553*
1554 DO 113 window = 1, nmwin2
1555 rsrc = iwork( irsrc + window - 1 )
1556 IF( myrow.EQ.rsrc .AND. (.NOT. lihi.GE.lilo+lsel ) ) THEN
1557 lilo = iwork( ililo + window - 1 )
1558 lihi = iwork( ilihi + window - 1 )
1559 lsel = iwork( ilsel + window - 1 )
1560 END IF
1561 113 CONTINUE
1562*
1563* End While ( LIHI >= LILO + LSEL )
1564 round = round + 1
1565 IF( first ) first = .false.
1566 GO TO 130
1567 END IF
1568*
1569* All processors excluded from the local reordering go here.
1570*
1571 114 CONTINUE
1572*
1573* Barrier to collect the processes before proceeding.
1574*
1575 CALL blacs_barrier( ictxt, 'All' )
1576*
1577* Compute global maximum of IERR so that we know if some process
1578* experienced a failure in the reordering.
1579*
1580 myierr = ierr
1581 IF( nprocs.GT.1 ) THEN
1582 CALL igamx2d( ictxt, 'All', top, 1, 1, ierr, 1, -1,
1583 $ -1, -1, -1, -1 )
1584 END IF
1585*
1586 IF( ierr.NE.0 ) THEN
1587*
1588* When calling BDTREXC, the block at position I+KKS-1 failed
1589* to swap.
1590*
1591 IF( myierr.NE.0 ) info = max(1,i+kks-1)
1592 IF( nprocs.GT.1 ) THEN
1593 CALL igamx2d( ictxt, 'All', top, 1, 1, info, 1, -1,
1594 $ -1, -1, -1, -1 )
1595 END IF
1596 GO TO 300
1597 END IF
1598*
1599* Now, for each compuational window, move the selected
1600* eigenvalues across the process border. Do this by forming the
1601* processors into groups of four working together to bring the
1602* window over the border. The processes are numbered as follows
1603*
1604* 1 | 2
1605* --+--
1606* 3 | 4
1607*
1608* where '|' and '-' denotes the process (and block) borders.
1609* This implies that the cluster to be reordered over the border
1610* is held by process 4, process 1 will receive the cluster after
1611* the reordering, process 3 holds the local (2,1)th element of a
1612* 2-by-2 diagonal block located on the block border and process 2
1613* holds the closest off-diagonal part of the window that is
1614* affected by the cross-border reordering.
1615*
1616* The active window is now ( I : LIHI[4], I : LIHI[4] ), where
1617* I = MAX( ILO, LIHI - 2*MOD(LIHI,NB) ). If this active window is
1618* too large compared to the value of PARA( 6 ), it will be
1619* truncated in both ends such that a maximum of PARA( 6 )
1620* eigenvalues is reordered across the border this time.
1621*
1622* The active window will be collected and built in workspace at
1623* process 1 and 4, which both compute the reordering and return
1624* the updated parts to the corresponding processes 2-3. Next, the
1625* accumulated transformations are broadcasted for updates in the
1626* block rows and column that corresponds to the process rows and
1627* columns where process 1 and 4 reside.
1628*
1629* The off-diagonal blocks are updated by the processes receiving
1630* from the broadcasts of the orthogonal transformations. Since
1631* the active window is split over the process borders, the
1632* updates of T and Q requires that stripes of block rows of
1633* columns are exchanged between neighboring processes in the
1634* corresponding process rows and columns.
1635*
1636* First, form each group of processors involved in the
1637* crossborder reordering. Do this in two (or three) phases:
1638* 1) Reorder each odd window over the border.
1639* 2) Reorder each even window over the border.
1640* 3) Reorder the last odd window over the border, if it was not
1641* processed in the first phase.
1642*
1643* When reordering the odd windows over the border, we must make
1644* sure that no process row or column is involved in both the
1645* first and the last window at the same time. This happens when
1646* the total number of windows is odd, greater than one and equal
1647* to the minumum process mesh dimension. Therefore the last odd
1648* window may be reordered over the border at last.
1649*
1650 lastwait = nmwin2.GT.1 .AND. mod(nmwin2,2).EQ.1 .AND.
1651 $ nmwin2.EQ.min(nprow,npcol)
1652*
1653 last = 0
1654 308 CONTINUE
1655 IF( lastwait ) THEN
1656 IF( last.EQ.0 ) THEN
1657 win0s = 1
1658 win0e = 2
1659 wine = nmwin2 - 1
1660 ELSE
1661 win0s = nmwin2
1662 win0e = nmwin2
1663 wine = nmwin2
1664 END IF
1665 ELSE
1666 win0s = 1
1667 win0e = 2
1668 wine = nmwin2
1669 END IF
1670 DO 310 window0 = win0s, win0e
1671 DO 320 window = window0, wine, 2
1672*
1673* Define the process holding the down-right part of the
1674* window.
1675*
1676 rsrc4 = iwork(irsrc+window-1)
1677 csrc4 = iwork(icsrc+window-1)
1678*
1679* Define the other processes in the group of four.
1680*
1681 rsrc3 = rsrc4
1682 csrc3 = mod( csrc4 - 1 + npcol, npcol )
1683 rsrc2 = mod( rsrc4 - 1 + nprow, nprow )
1684 csrc2 = csrc4
1685 rsrc1 = rsrc2
1686 csrc1 = csrc3
1687 IF( ( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) .OR.
1688 $ ( myrow.EQ.rsrc2 .AND. mycol.EQ.csrc2 ) .OR.
1689 $ ( myrow.EQ.rsrc3 .AND. mycol.EQ.csrc3 ) .OR.
1690 $ ( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) ) THEN
1691*
1692* Compute the correct active window - for reordering
1693* into a block that has not been active at all before,
1694* we try to reorder as many of our eigenvalues over the
1695* border as possible without knowing of the situation on
1696* the other side - this may cause very few eigenvalues
1697* to be reordered over the border this time (perhaps not
1698* any) but this should be an initial problem. Anyway,
1699* the bottom-right position of the block will be at
1700* position LIHIC.
1701*
1702 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
1703 lihi4 = ( iwork( ililo + window - 1 ) +
1704 $ iwork( ilihi + window - 1 ) ) / 2
1705 lihic = min(lihi4,(iceil(lihi4,nb)-1)*nb+wneicr)
1706*
1707* Fix LIHIC to avoid that bottom of window cuts
1708* 2-by-2 block and make sure all processors in the
1709* group knows about the correct value.
1710*
1711 IF( (.NOT. lihic.LE.nb) .AND. lihic.LT.n ) THEN
1712 iloc = indxg2l( lihic+1, nb, myrow,
1713 $ desct( rsrc_ ), nprow )
1714 jloc = indxg2l( lihic, nb, mycol,
1715 $ desct( csrc_ ), npcol )
1716 IF( t( (jloc-1)*lldt+iloc ).NE.zero ) THEN
1717 IF( mod( lihic, nb ).EQ.1 .OR.
1718 $ ( mod( lihic, nb ).EQ.2 .AND.
1719 $ SELECT(lihic-2).EQ.0 ) )
1720 $ THEN
1721 lihic = lihic + 1
1722 ELSE
1723 lihic = lihic - 1
1724 END IF
1725 END IF
1726 END IF
1727 IF( rsrc4.NE.rsrc1 .OR. csrc4.NE.csrc1 )
1728 $ CALL igesd2d( ictxt, 1, 1, lihic, 1, rsrc1,
1729 $ csrc1 )
1730 IF( rsrc4.NE.rsrc2 .OR. csrc4.NE.csrc2 )
1731 $ CALL igesd2d( ictxt, 1, 1, lihic, 1, rsrc2,
1732 $ csrc2 )
1733 IF( rsrc4.NE.rsrc3 .OR. csrc4.NE.csrc3 )
1734 $ CALL igesd2d( ictxt, 1, 1, lihic, 1, rsrc3,
1735 $ csrc3 )
1736 END IF
1737 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
1738 IF( rsrc4.NE.rsrc1 .OR. csrc4.NE.csrc1 )
1739 $ CALL igerv2d( ictxt, 1, 1, lihic, 1, rsrc4,
1740 $ csrc4 )
1741 END IF
1742 IF( myrow.EQ.rsrc2 .AND. mycol.EQ.csrc2 ) THEN
1743 IF( rsrc4.NE.rsrc2 .OR. csrc4.NE.csrc2 )
1744 $ CALL igerv2d( ictxt, 1, 1, lihic, 1, rsrc4,
1745 $ csrc4 )
1746 END IF
1747 IF( myrow.EQ.rsrc3 .AND. mycol.EQ.csrc3 ) THEN
1748 IF( rsrc4.NE.rsrc3 .OR. csrc4.NE.csrc3 )
1749 $ CALL igerv2d( ictxt, 1, 1, lihic, 1, rsrc4,
1750 $ csrc4 )
1751 END IF
1752*
1753* Avoid going over the border with the first window if
1754* it resides in the block where the last global position
1755* T(ILO,ILO) is or ILO has been updated to point to a
1756* position right of T(LIHIC,LIHIC).
1757*
1758 skip1cr = window.EQ.1 .AND.
1759 $ iceil(lihic,nb).LE.iceil(ilo,nb)
1760*
1761* Decide I, where to put top of window, such that top of
1762* window does not cut 2-by-2 block. Make sure that we do
1763* not end up in a situation where a 2-by-2 block
1764* splitted on the border is left in its original place
1765* -- this can cause infinite loops.
1766* Remedy: make sure that the part of the window that
1767* resides left to the border is at least of dimension
1768* two (2) in case we have 2-by-2 blocks in top of the
1769* cross border window.
1770*
1771* Also make sure all processors in the group knows about
1772* the correct value of I. When skipping the crossborder
1773* reordering, just set I = LIHIC.
1774*
1775 IF( .NOT. skip1cr ) THEN
1776 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
1777 IF( window.EQ.1 ) THEN
1778 lihi1 = ilo
1779 ELSE
1780 lihi1 = iwork( ilihi + window - 2 )
1781 END IF
1782 i = max( lihi1,
1783 $ min( lihic-2*mod(lihic,nb) + 1,
1784 $ (iceil(lihic,nb)-1)*nb - 1 ) )
1785 iloc = indxg2l( i, nb, myrow, desct( rsrc_ ),
1786 $ nprow )
1787 jloc = indxg2l( i-1, nb, mycol, desct( csrc_ ),
1788 $ npcol )
1789 IF( t( (jloc-1)*lldt+iloc ).NE.zero )
1790 $ i = i - 1
1791 IF( rsrc1.NE.rsrc4 .OR. csrc1.NE.csrc4 )
1792 $ CALL igesd2d( ictxt, 1, 1, i, 1, rsrc4,
1793 $ csrc4 )
1794 IF( rsrc1.NE.rsrc2 .OR. csrc1.NE.csrc2 )
1795 $ CALL igesd2d( ictxt, 1, 1, i, 1, rsrc2,
1796 $ csrc2 )
1797 IF( rsrc1.NE.rsrc3 .OR. csrc1.NE.csrc3 )
1798 $ CALL igesd2d( ictxt, 1, 1, i, 1, rsrc3,
1799 $ csrc3 )
1800 END IF
1801 IF( myrow.EQ.rsrc2 .AND. mycol.EQ.csrc2 ) THEN
1802 IF( rsrc1.NE.rsrc2 .OR. csrc1.NE.csrc2 )
1803 $ CALL igerv2d( ictxt, 1, 1, i, 1, rsrc1,
1804 $ csrc1 )
1805 END IF
1806 IF( myrow.EQ.rsrc3 .AND. mycol.EQ.csrc3 ) THEN
1807 IF( rsrc1.NE.rsrc3 .OR. csrc1.NE.csrc3 )
1808 $ CALL igerv2d( ictxt, 1, 1, i, 1, rsrc1,
1809 $ csrc1 )
1810 END IF
1811 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
1812 IF( rsrc1.NE.rsrc4 .OR. csrc1.NE.csrc4 )
1813 $ CALL igerv2d( ictxt, 1, 1, i, 1, rsrc1,
1814 $ csrc1 )
1815 END IF
1816 ELSE
1817 i = lihic
1818 END IF
1819*
1820* Finalize computation of window size: active window is
1821* now (I:LIHIC,I:LIHIC).
1822*
1823 nwin = lihic - i + 1
1824 ks = 0
1825*
1826* Skip rest of this part if appropriate.
1827*
1828 IF( skip1cr ) GO TO 360
1829*
1830* Divide workspace -- put active window in
1831* WORK(IPW2:IPW2+NWIN**2-1) and orthogonal
1832* transformations in WORK(IPW3:...).
1833*
1834 CALL slaset( 'All', nwin, nwin, zero, zero,
1835 $ work( ipw2 ), nwin )
1836*
1837 pitraf = ipiw
1838 ipw3 = ipw2 + nwin*nwin
1839 pdtraf = ipw3
1840*
1841* Exchange the current view of SELECT for the active
1842* window between process 1 and 4 to make sure that
1843* exactly the same job is performed for both processes.
1844*
1845 IF( rsrc1.NE.rsrc4 .OR. csrc1.NE.csrc4 ) THEN
1846 ilen4 = mod(lihic,nb)
1847 seli4 = iceil(i,nb)*nb+1
1848 ilen1 = nwin - ilen4
1849 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
1850 CALL igesd2d( ictxt, ilen1, 1, SELECT(i),
1851 $ ilen1, rsrc4, csrc4 )
1852 CALL igerv2d( ictxt, ilen4, 1, SELECT(seli4),
1853 $ ilen4, rsrc4, csrc4 )
1854 END IF
1855 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
1856 CALL igesd2d( ictxt, ilen4, 1, SELECT(seli4),
1857 $ ilen4, rsrc1, csrc1 )
1858 CALL igerv2d( ictxt, ilen1, 1, SELECT(i),
1859 $ ilen1, rsrc1, csrc1 )
1860 END IF
1861 END IF
1862*
1863* Form the active window by a series of point-to-point
1864* sends and receives.
1865*
1866 dim1 = nb - mod(i-1,nb)
1867 dim4 = nwin - dim1
1868 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
1869 iloc = indxg2l( i, nb, myrow, desct( rsrc_ ),
1870 $ nprow )
1871 jloc = indxg2l( i, nb, mycol, desct( csrc_ ),
1872 $ npcol )
1873 CALL slamov( 'All', dim1, dim1,
1874 $ t((jloc-1)*lldt+iloc), lldt, work(ipw2),
1875 $ nwin )
1876 IF( rsrc1.NE.rsrc4 .OR. csrc1.NE.csrc4 ) THEN
1877 CALL sgesd2d( ictxt, dim1, dim1,
1878 $ work(ipw2), nwin, rsrc4, csrc4 )
1879 CALL sgerv2d( ictxt, dim4, dim4,
1880 $ work(ipw2+dim1*nwin+dim1), nwin, rsrc4,
1881 $ csrc4 )
1882 END IF
1883 END IF
1884 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
1885 iloc = indxg2l( i+dim1, nb, myrow, desct( rsrc_ ),
1886 $ nprow )
1887 jloc = indxg2l( i+dim1, nb, mycol, desct( csrc_ ),
1888 $ npcol )
1889 CALL slamov( 'All', dim4, dim4,
1890 $ t((jloc-1)*lldt+iloc), lldt,
1891 $ work(ipw2+dim1*nwin+dim1), nwin )
1892 IF( rsrc4.NE.rsrc1 .OR. csrc4.NE.csrc1 ) THEN
1893 CALL sgesd2d( ictxt, dim4, dim4,
1894 $ work(ipw2+dim1*nwin+dim1), nwin, rsrc1,
1895 $ csrc1 )
1896 CALL sgerv2d( ictxt, dim1, dim1,
1897 $ work(ipw2), nwin, rsrc1, csrc1 )
1898 END IF
1899 END IF
1900 IF( myrow.EQ.rsrc2 .AND. mycol.EQ.csrc2 ) THEN
1901 iloc = indxg2l( i, nb, myrow, desct( rsrc_ ),
1902 $ nprow )
1903 jloc = indxg2l( i+dim1, nb, mycol, desct( csrc_ ),
1904 $ npcol )
1905 CALL slamov( 'All', dim1, dim4,
1906 $ t((jloc-1)*lldt+iloc), lldt,
1907 $ work(ipw2+dim1*nwin), nwin )
1908 IF( rsrc2.NE.rsrc1 .OR. csrc2.NE.csrc1 ) THEN
1909 CALL sgesd2d( ictxt, dim1, dim4,
1910 $ work(ipw2+dim1*nwin), nwin, rsrc1, csrc1 )
1911 END IF
1912 END IF
1913 IF( myrow.EQ.rsrc2 .AND. mycol.EQ.csrc2 ) THEN
1914 IF( rsrc2.NE.rsrc4 .OR. csrc2.NE.csrc4 ) THEN
1915 CALL sgesd2d( ictxt, dim1, dim4,
1916 $ work(ipw2+dim1*nwin), nwin, rsrc4, csrc4 )
1917 END IF
1918 END IF
1919 IF( myrow.EQ.rsrc3 .AND. mycol.EQ.csrc3 ) THEN
1920 iloc = indxg2l( i+dim1, nb, myrow, desct( rsrc_ ),
1921 $ nprow )
1922 jloc = indxg2l( i+dim1-1, nb, mycol,
1923 $ desct( csrc_ ), npcol )
1924 CALL slamov( 'All', 1, 1,
1925 $ t((jloc-1)*lldt+iloc), lldt,
1926 $ work(ipw2+(dim1-1)*nwin+dim1), nwin )
1927 IF( rsrc3.NE.rsrc1 .OR. csrc3.NE.csrc1 ) THEN
1928 CALL sgesd2d( ictxt, 1, 1,
1929 $ work(ipw2+(dim1-1)*nwin+dim1), nwin,
1930 $ rsrc1, csrc1 )
1931 END IF
1932 END IF
1933 IF( myrow.EQ.rsrc3 .AND. mycol.EQ.csrc3 ) THEN
1934 IF( rsrc3.NE.rsrc4 .OR. csrc3.NE.csrc4 ) THEN
1935 CALL sgesd2d( ictxt, 1, 1,
1936 $ work(ipw2+(dim1-1)*nwin+dim1), nwin,
1937 $ rsrc4, csrc4 )
1938 END IF
1939 END IF
1940 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
1941 IF( rsrc1.NE.rsrc2 .OR. csrc1.NE.csrc2 ) THEN
1942 CALL sgerv2d( ictxt, dim1, dim4,
1943 $ work(ipw2+dim1*nwin), nwin, rsrc2,
1944 $ csrc2 )
1945 END IF
1946 IF( rsrc1.NE.rsrc3 .OR. csrc1.NE.csrc3 ) THEN
1947 CALL sgerv2d( ictxt, 1, 1,
1948 $ work(ipw2+(dim1-1)*nwin+dim1), nwin,
1949 $ rsrc3, csrc3 )
1950 END IF
1951 END IF
1952 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
1953 IF( rsrc4.NE.rsrc2 .OR. csrc4.NE.csrc2 ) THEN
1954 CALL sgerv2d( ictxt, dim1, dim4,
1955 $ work(ipw2+dim1*nwin), nwin, rsrc2,
1956 $ csrc2 )
1957 END IF
1958 IF( rsrc4.NE.rsrc3 .OR. csrc4.NE.csrc3 ) THEN
1959 CALL sgerv2d( ictxt, 1, 1,
1960 $ work(ipw2+(dim1-1)*nwin+dim1), nwin,
1961 $ rsrc3, csrc3 )
1962 END IF
1963 END IF
1964*
1965* Compute the reordering (just as in the total local
1966* case) and accumulate the transformations (ONLY
1967* ON-DIAGONAL PROCESSES).
1968*
1969 IF( ( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) .OR.
1970 $ ( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) ) THEN
1971 pair = .false.
1972 DO 330 k = i, lihic
1973 IF( pair ) THEN
1974 pair = .false.
1975 ELSE
1976 swap = SELECT( k ).NE.0
1977 IF( k.LT.lihic ) THEN
1978 elem = work(ipw2+(k-i)*nwin+k-i+1)
1979 IF( elem.NE.zero )
1980 $ pair = .true.
1981 END IF
1982 IF( swap ) THEN
1983 ks = ks + 1
1984*
1985* Swap the K-th block to position I+KS-1.
1986*
1987 ierr = 0
1988 kk = k - i + 1
1989 kks = ks
1990 IF( kk.NE.ks ) THEN
1991 nitraf = liwork - pitraf + 1
1992 ndtraf = lwork - pdtraf + 1
1993 CALL bstrexc( nwin, work(ipw2), nwin,
1994 $ kk, kks, nitraf, iwork( pitraf ),
1995 $ ndtraf, work( pdtraf ),
1996 $ work(ipw1), ierr )
1997 pitraf = pitraf + nitraf
1998 pdtraf = pdtraf + ndtraf
1999*
2000* Update array SELECT.
2001*
2002 IF ( pair ) THEN
2003 DO 340 j = i+kk-1, i+kks, -1
2004 SELECT(j+1) = SELECT(j-1)
2005 340 CONTINUE
2006 SELECT(i+kks-1) = 1
2007 SELECT(i+kks) = 1
2008 ELSE
2009 DO 350 j = i+kk-1, i+kks, -1
2010 SELECT(j) = SELECT(j-1)
2011 350 CONTINUE
2012 SELECT(i+kks-1) = 1
2013 END IF
2014*
2015 IF ( ierr.EQ.1 .OR. ierr.EQ.2 ) THEN
2016*
2017 IF ( ierr.EQ.2 ) THEN
2018 SELECT( i+kks-3 ) = 1
2019 SELECT( i+kks-1 ) = 0
2020 kks = kks + 1
2021 END IF
2022*
2023 GO TO 360
2024 END IF
2025 ks = kks
2026 END IF
2027 IF( pair )
2028 $ ks = ks + 1
2029 END IF
2030 END IF
2031 330 CONTINUE
2032 END IF
2033 360 CONTINUE
2034*
2035* Save information about the reordering.
2036*
2037 IF( ( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) .OR.
2038 $ ( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) ) THEN
2039 ibuff( 1 ) = i
2040 ibuff( 2 ) = nwin
2041 ibuff( 3 ) = pitraf
2042 ibuff( 4 ) = ks
2043 ibuff( 5 ) = pdtraf
2044 ibuff( 6 ) = ndtraf
2045 ilen = pitraf - ipiw + 1
2046 dlen = pdtraf - ipw3 + 1
2047 ibuff( 7 ) = ilen
2048 ibuff( 8 ) = dlen
2049*
2050* Put reordered data back into global matrix if a
2051* reordering took place.
2052*
2053 IF( .NOT. skip1cr ) THEN
2054 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
2055 iloc = indxg2l( i, nb, myrow, desct( rsrc_ ),
2056 $ nprow )
2057 jloc = indxg2l( i, nb, mycol, desct( csrc_ ),
2058 $ npcol )
2059 CALL slamov( 'All', dim1, dim1, work(ipw2),
2060 $ nwin, t((jloc-1)*lldt+iloc), lldt )
2061 END IF
2062 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
2063 iloc = indxg2l( i+dim1, nb, myrow,
2064 $ desct( rsrc_ ), nprow )
2065 jloc = indxg2l( i+dim1, nb, mycol,
2066 $ desct( csrc_ ), npcol )
2067 CALL slamov( 'All', dim4, dim4,
2068 $ work(ipw2+dim1*nwin+dim1), nwin,
2069 $ t((jloc-1)*lldt+iloc), lldt )
2070 END IF
2071 END IF
2072 END IF
2073*
2074* Break if appropriate -- IBUFF(3:8) may now contain
2075* nonsens, but that's no problem. The processors outside
2076* the cross border group only needs to know about I and
2077* NWIN to get a correct value of SKIP1CR (see below) and
2078* to skip the cross border updates if necessary.
2079*
2080 IF( window.EQ.1 .AND. skip1cr ) GO TO 325
2081*
2082* Return reordered data to process 2 and 3.
2083*
2084 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
2085 IF( rsrc1.NE.rsrc3 .OR. csrc1.NE.csrc3 ) THEN
2086 CALL sgesd2d( ictxt, 1, 1,
2087 $ work( ipw2+(dim1-1)*nwin+dim1 ), nwin,
2088 $ rsrc3, csrc3 )
2089 END IF
2090 END IF
2091 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
2092 IF( rsrc4.NE.rsrc2 .OR. csrc4.NE.csrc2 ) THEN
2093 CALL sgesd2d( ictxt, dim1, dim4,
2094 $ work( ipw2+dim1*nwin), nwin, rsrc2,
2095 $ csrc2 )
2096 END IF
2097 END IF
2098 IF( myrow.EQ.rsrc2 .AND. mycol.EQ.csrc2 ) THEN
2099 iloc = indxg2l( i, nb, myrow, desct( rsrc_ ),
2100 $ nprow )
2101 jloc = indxg2l( i+dim1, nb, mycol,
2102 $ desct( csrc_ ), npcol )
2103 IF( rsrc2.NE.rsrc4 .OR. csrc2.NE.csrc4 ) THEN
2104 CALL sgerv2d( ictxt, dim1, dim4,
2105 $ work(ipw2+dim1*nwin), nwin, rsrc4, csrc4 )
2106 END IF
2107 CALL slamov( 'All', dim1, dim4,
2108 $ work( ipw2+dim1*nwin ), nwin,
2109 $ t((jloc-1)*lldt+iloc), lldt )
2110 END IF
2111 IF( myrow.EQ.rsrc3 .AND. mycol.EQ.csrc3 ) THEN
2112 iloc = indxg2l( i+dim1, nb, myrow,
2113 $ desct( rsrc_ ), nprow )
2114 jloc = indxg2l( i+dim1-1, nb, mycol,
2115 $ desct( csrc_ ), npcol )
2116 IF( rsrc3.NE.rsrc1 .OR. csrc3.NE.csrc1 ) THEN
2117 CALL sgerv2d( ictxt, 1, 1,
2118 $ work( ipw2+(dim1-1)*nwin+dim1 ), nwin,
2119 $ rsrc1, csrc1 )
2120 END IF
2121 t((jloc-1)*lldt+iloc) =
2122 $ work( ipw2+(dim1-1)*nwin+dim1 )
2123 END IF
2124 END IF
2125*
2126 325 CONTINUE
2127*
2128 320 CONTINUE
2129*
2130* For the crossborder updates, we use the same directions as
2131* in the local reordering case above.
2132*
2133 DO 2222 dir = 1, 2
2134*
2135* Broadcast information about the reordering.
2136*
2137 DO 321 window = window0, wine, 2
2138 rsrc4 = iwork(irsrc+window-1)
2139 csrc4 = iwork(icsrc+window-1)
2140 rsrc1 = mod( rsrc4 - 1 + nprow, nprow )
2141 csrc1 = mod( csrc4 - 1 + npcol, npcol )
2142 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
2143 IF( npcol.GT.1 .AND. dir.EQ.1 )
2144 $ CALL igebs2d( ictxt, 'Row', top, 8, 1,
2145 $ ibuff, 8 )
2146 IF( nprow.GT.1 .AND. dir.EQ.2 )
2147 $ CALL igebs2d( ictxt, 'Col', top, 8, 1,
2148 $ ibuff, 8 )
2149 skip1cr = window.EQ.1 .AND.
2150 $ iceil(lihic,nb).LE.iceil(ilo,nb)
2151 ELSEIF( myrow.EQ.rsrc1 .OR. mycol.EQ.csrc1 ) THEN
2152 IF( npcol.GT.1 .AND. dir.EQ.1 .AND.
2153 $ myrow.EQ.rsrc1 ) THEN
2154 CALL igebr2d( ictxt, 'Row', top, 8, 1,
2155 $ ibuff, 8, rsrc1, csrc1 )
2156 i = ibuff( 1 )
2157 nwin = ibuff( 2 )
2158 pitraf = ibuff( 3 )
2159 ks = ibuff( 4 )
2160 pdtraf = ibuff( 5 )
2161 ndtraf = ibuff( 6 )
2162 ilen = ibuff( 7 )
2163 dlen = ibuff( 8 )
2164 bufflen = ilen + dlen
2165 ipw3 = ipw2 + nwin*nwin
2166 dim1 = nb - mod(i-1,nb)
2167 dim4 = nwin - dim1
2168 lihic = nwin + i - 1
2169 skip1cr = window.EQ.1 .AND.
2170 $ iceil(lihic,nb).LE.iceil(ilo,nb)
2171 END IF
2172 IF( nprow.GT.1 .AND. dir.EQ.2 .AND.
2173 $ mycol.EQ.csrc1 ) THEN
2174 CALL igebr2d( ictxt, 'Col', top, 8, 1,
2175 $ ibuff, 8, rsrc1, csrc1 )
2176 i = ibuff( 1 )
2177 nwin = ibuff( 2 )
2178 pitraf = ibuff( 3 )
2179 ks = ibuff( 4 )
2180 pdtraf = ibuff( 5 )
2181 ndtraf = ibuff( 6 )
2182 ilen = ibuff( 7 )
2183 dlen = ibuff( 8 )
2184 bufflen = ilen + dlen
2185 ipw3 = ipw2 + nwin*nwin
2186 dim1 = nb - mod(i-1,nb)
2187 dim4 = nwin - dim1
2188 lihic = nwin + i - 1
2189 skip1cr = window.EQ.1 .AND.
2190 $ iceil(lihic,nb).LE.iceil(ilo,nb)
2191 END IF
2192 END IF
2193 IF( rsrc1.NE.rsrc4 ) THEN
2194 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
2195 IF( npcol.GT.1 .AND. dir.EQ.1 )
2196 $ CALL igebs2d( ictxt, 'Row', top, 8, 1,
2197 $ ibuff, 8 )
2198 skip1cr = window.EQ.1 .AND.
2199 $ iceil(lihic,nb).LE.iceil(ilo,nb)
2200 ELSEIF( myrow.EQ.rsrc4 ) THEN
2201 IF( npcol.GT.1 .AND. dir.EQ.1 ) THEN
2202 CALL igebr2d( ictxt, 'Row', top, 8, 1,
2203 $ ibuff, 8, rsrc4, csrc4 )
2204 i = ibuff( 1 )
2205 nwin = ibuff( 2 )
2206 pitraf = ibuff( 3 )
2207 ks = ibuff( 4 )
2208 pdtraf = ibuff( 5 )
2209 ndtraf = ibuff( 6 )
2210 ilen = ibuff( 7 )
2211 dlen = ibuff( 8 )
2212 bufflen = ilen + dlen
2213 ipw3 = ipw2 + nwin*nwin
2214 dim1 = nb - mod(i-1,nb)
2215 dim4 = nwin - dim1
2216 lihic = nwin + i - 1
2217 skip1cr = window.EQ.1 .AND.
2218 $ iceil(lihic,nb).LE.iceil(ilo,nb)
2219 END IF
2220 END IF
2221 END IF
2222 IF( csrc1.NE.csrc4 ) THEN
2223 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
2224 IF( nprow.GT.1 .AND. dir.EQ.2 )
2225 $ CALL igebs2d( ictxt, 'Col', top, 8, 1,
2226 $ ibuff, 8 )
2227 skip1cr = window.EQ.1 .AND.
2228 $ iceil(lihic,nb).LE.iceil(ilo,nb)
2229 ELSEIF( mycol.EQ.csrc4 ) THEN
2230 IF( nprow.GT.1 .AND. dir.EQ.2 ) THEN
2231 CALL igebr2d( ictxt, 'Col', top, 8, 1,
2232 $ ibuff, 8, rsrc4, csrc4 )
2233 i = ibuff( 1 )
2234 nwin = ibuff( 2 )
2235 pitraf = ibuff( 3 )
2236 ks = ibuff( 4 )
2237 pdtraf = ibuff( 5 )
2238 ndtraf = ibuff( 6 )
2239 ilen = ibuff( 7 )
2240 dlen = ibuff( 8 )
2241 bufflen = ilen + dlen
2242 ipw3 = ipw2 + nwin*nwin
2243 dim1 = nb - mod(i-1,nb)
2244 dim4 = nwin - dim1
2245 lihic = nwin + i - 1
2246 skip1cr = window.EQ.1 .AND.
2247 $ iceil(lihic,nb).LE.iceil(ilo,nb)
2248 END IF
2249 END IF
2250 END IF
2251*
2252* Skip rest of broadcasts and updates if appropriate.
2253*
2254 IF( skip1cr ) GO TO 326
2255*
2256* Broadcast the orthogonal transformations.
2257*
2258 IF( myrow.EQ.rsrc1 .AND. mycol.EQ.csrc1 ) THEN
2259 buffer = pdtraf
2260 bufflen = dlen + ilen
2261 IF( (nprow.GT.1 .AND. dir.EQ.2) .OR.
2262 $ (npcol.GT.1 .AND. dir.EQ.1) ) THEN
2263 DO 370 indx = 1, ilen
2264 work( buffer+indx-1 ) =
2265 $ float( iwork(ipiw+indx-1) )
2266 370 CONTINUE
2267 CALL slamov( 'All', dlen, 1, work( ipw3 ),
2268 $ dlen, work(buffer+ilen), dlen )
2269 END IF
2270 IF( npcol.GT.1 .AND. dir.EQ.1 ) THEN
2271 CALL sgebs2d( ictxt, 'Row', top, bufflen, 1,
2272 $ work(buffer), bufflen )
2273 END IF
2274 IF( nprow.GT.1 .AND. dir.EQ.2 ) THEN
2275 CALL sgebs2d( ictxt, 'Col', top, bufflen, 1,
2276 $ work(buffer), bufflen )
2277 END IF
2278 ELSEIF( myrow.EQ.rsrc1 .OR. mycol.EQ.csrc1 ) THEN
2279 IF( npcol.GT.1 .AND. dir.EQ.1 .AND.
2280 $ myrow.EQ.rsrc1 ) THEN
2281 buffer = pdtraf
2282 bufflen = dlen + ilen
2283 CALL sgebr2d( ictxt, 'Row', top, bufflen, 1,
2284 $ work(buffer), bufflen, rsrc1, csrc1 )
2285 END IF
2286 IF( nprow.GT.1 .AND. dir.EQ.2 .AND.
2287 $ mycol.EQ.csrc1 ) THEN
2288 buffer = pdtraf
2289 bufflen = dlen + ilen
2290 CALL sgebr2d( ictxt, 'Col', top, bufflen, 1,
2291 $ work(buffer), bufflen, rsrc1, csrc1 )
2292 END IF
2293 IF( (npcol.GT.1.AND.dir.EQ.1.AND.myrow.EQ.rsrc1)
2294 $ .OR. (nprow.GT.1.AND.dir.EQ.2.AND.
2295 $ mycol.EQ.csrc1) ) THEN
2296 DO 380 indx = 1, ilen
2297 iwork(ipiw+indx-1) =
2298 $ int( work( buffer+indx-1 ) )
2299 380 CONTINUE
2300 CALL slamov( 'All', dlen, 1,
2301 $ work( buffer+ilen ), dlen,
2302 $ work( ipw3 ), dlen )
2303 END IF
2304 END IF
2305 IF( rsrc1.NE.rsrc4 ) THEN
2306 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
2307 buffer = pdtraf
2308 bufflen = dlen + ilen
2309 IF( npcol.GT.1 .AND. dir.EQ.1 ) THEN
2310 DO 390 indx = 1, ilen
2311 work( buffer+indx-1 ) =
2312 $ float( iwork(ipiw+indx-1) )
2313 390 CONTINUE
2314 CALL slamov( 'All', dlen, 1, work( ipw3 ),
2315 $ dlen, work(buffer+ilen), dlen )
2316 CALL sgebs2d( ictxt, 'Row', top, bufflen,
2317 $ 1, work(buffer), bufflen )
2318 END IF
2319 ELSEIF( myrow.EQ.rsrc4 .AND. dir.EQ.1 .AND.
2320 $ npcol.GT.1 ) THEN
2321 buffer = pdtraf
2322 bufflen = dlen + ilen
2323 CALL sgebr2d( ictxt, 'Row', top, bufflen,
2324 $ 1, work(buffer), bufflen, rsrc4, csrc4 )
2325 DO 400 indx = 1, ilen
2326 iwork(ipiw+indx-1) =
2327 $ int( work( buffer+indx-1 ) )
2328 400 CONTINUE
2329 CALL slamov( 'All', dlen, 1,
2330 $ work( buffer+ilen ), dlen,
2331 $ work( ipw3 ), dlen )
2332 END IF
2333 END IF
2334 IF( csrc1.NE.csrc4 ) THEN
2335 IF( myrow.EQ.rsrc4 .AND. mycol.EQ.csrc4 ) THEN
2336 buffer = pdtraf
2337 bufflen = dlen + ilen
2338 IF( nprow.GT.1 .AND. dir.EQ.2 ) THEN
2339 DO 395 indx = 1, ilen
2340 work( buffer+indx-1 ) =
2341 $ float( iwork(ipiw+indx-1) )
2342 395 CONTINUE
2343 CALL slamov( 'All', dlen, 1, work( ipw3 ),
2344 $ dlen, work(buffer+ilen), dlen )
2345 CALL sgebs2d( ictxt, 'Col', top, bufflen,
2346 $ 1, work(buffer), bufflen )
2347 END IF
2348 ELSEIF( mycol.EQ.csrc4 .AND. dir.EQ.2 .AND.
2349 $ nprow.GT.1 ) THEN
2350 buffer = pdtraf
2351 bufflen = dlen + ilen
2352 CALL sgebr2d( ictxt, 'Col', top, bufflen, 1,
2353 $ work(buffer), bufflen, rsrc4, csrc4 )
2354 DO 402 indx = 1, ilen
2355 iwork(ipiw+indx-1) =
2356 $ int( work( buffer+indx-1 ) )
2357 402 CONTINUE
2358 CALL slamov( 'All', dlen, 1,
2359 $ work( buffer+ilen ), dlen,
2360 $ work( ipw3 ), dlen )
2361 END IF
2362 END IF
2363*
2364 326 CONTINUE
2365*
2366 321 CONTINUE
2367*
2368* Compute crossborder updates.
2369*
2370 DO 322 window = window0, wine, 2
2371 IF( window.EQ.1 .AND. skip1cr ) GO TO 327
2372 rsrc4 = iwork(irsrc+window-1)
2373 csrc4 = iwork(icsrc+window-1)
2374 rsrc1 = mod( rsrc4 - 1 + nprow, nprow )
2375 csrc1 = mod( csrc4 - 1 + npcol, npcol )
2376*
2377* Prepare workspaces for updates:
2378* IPW3 holds now the orthogonal transformations
2379* IPW4 holds the explicit orthogonal matrix, if formed
2380* IPW5 holds the crossborder block column of T
2381* IPW6 holds the crossborder block row of T
2382* IPW7 holds the crossborder block column of Q
2383* (if WANTQ=.TRUE.)
2384* IPW8 points to the leftover workspace used as lhs in
2385* matrix multiplications
2386*
2387 IF( ((mycol.EQ.csrc1.OR.mycol.EQ.csrc4).AND.dir.EQ.2)
2388 $ .OR. ((myrow.EQ.rsrc1.OR.myrow.EQ.rsrc4).AND.
2389 $ dir.EQ.1)) THEN
2390 ipw4 = buffer
2391 IF( dir.EQ.2 ) THEN
2392 IF( wantq ) THEN
2393 qrows = numroc( n, nb, myrow, descq( rsrc_ ),
2394 $ nprow )
2395 ELSE
2396 qrows = 0
2397 END IF
2398 trows = numroc( i-1, nb, myrow, desct( rsrc_ ),
2399 $ nprow )
2400 ELSE
2401 qrows = 0
2402 trows = 0
2403 END IF
2404 IF( dir.EQ.1 ) THEN
2405 tcols = numroc( n - (i+dim1-1), nb, mycol,
2406 $ csrc4, npcol )
2407 IF( mycol.EQ.csrc4 ) tcols = tcols - dim4
2408 ELSE
2409 tcols = 0
2410 END IF
2411 ipw5 = ipw4 + nwin*nwin
2412 ipw6 = ipw5 + trows * nwin
2413 IF( wantq ) THEN
2414 ipw7 = ipw6 + nwin * tcols
2415 ipw8 = ipw7 + qrows * nwin
2416 ELSE
2417 ipw8 = ipw6 + nwin * tcols
2418 END IF
2419 END IF
2420*
2421* Let each process row and column involved in the updates
2422* exchange data in T and Q with their neighbours.
2423*
2424 IF( dir.EQ.2 ) THEN
2425 IF( mycol.EQ.csrc1 .OR. mycol.EQ.csrc4 ) THEN
2426 DO 410 indx = 1, nprow
2427 IF( mycol.EQ.csrc1 ) THEN
2428 CALL infog2l( 1+(indx-1)*nb, i, desct,
2429 $ nprow, npcol, myrow, mycol, iloc,
2430 $ jloc1, rsrc, csrc1 )
2431 IF( myrow.EQ.rsrc ) THEN
2432 CALL slamov( 'All', trows, dim1,
2433 $ t((jloc1-1)*lldt+iloc), lldt,
2434 $ work(ipw5), trows )
2435 IF( npcol.GT.1 ) THEN
2436 east = mod( mycol + 1, npcol )
2437 CALL sgesd2d( ictxt, trows, dim1,
2438 $ work(ipw5), trows, rsrc,
2439 $ east )
2440 CALL sgerv2d( ictxt, trows, dim4,
2441 $ work(ipw5+trows*dim1), trows,
2442 $ rsrc, east )
2443 END IF
2444 END IF
2445 END IF
2446 IF( mycol.EQ.csrc4 ) THEN
2447 CALL infog2l( 1+(indx-1)*nb, i+dim1,
2448 $ desct, nprow, npcol, myrow, mycol,
2449 $ iloc, jloc4, rsrc, csrc4 )
2450 IF( myrow.EQ.rsrc ) THEN
2451 CALL slamov( 'All', trows, dim4,
2452 $ t((jloc4-1)*lldt+iloc), lldt,
2453 $ work(ipw5+trows*dim1), trows )
2454 IF( npcol.GT.1 ) THEN
2455 west = mod( mycol-1+npcol, npcol )
2456 CALL sgesd2d( ictxt, trows, dim4,
2457 $ work(ipw5+trows*dim1), trows,
2458 $ rsrc, west )
2459 CALL sgerv2d( ictxt, trows, dim1,
2460 $ work(ipw5), trows, rsrc,
2461 $ west )
2462 END IF
2463 END IF
2464 END IF
2465 410 CONTINUE
2466 END IF
2467 END IF
2468*
2469 IF( dir.EQ.1 ) THEN
2470 IF( myrow.EQ.rsrc1 .OR. myrow.EQ.rsrc4 ) THEN
2471 DO 420 indx = 1, npcol
2472 IF( myrow.EQ.rsrc1 ) THEN
2473 IF( indx.EQ.1 ) THEN
2474 CALL infog2l( i, lihic+1, desct, nprow,
2475 $ npcol, myrow, mycol, iloc1, jloc,
2476 $ rsrc1, csrc )
2477 ELSE
2478 CALL infog2l( i,
2479 $ (iceil(lihic,nb)+(indx-2))*nb+1,
2480 $ desct, nprow, npcol, myrow, mycol,
2481 $ iloc1, jloc, rsrc1, csrc )
2482 END IF
2483 IF( mycol.EQ.csrc ) THEN
2484 CALL slamov( 'All', dim1, tcols,
2485 $ t((jloc-1)*lldt+iloc1), lldt,
2486 $ work(ipw6), nwin )
2487 IF( nprow.GT.1 ) THEN
2488 south = mod( myrow + 1, nprow )
2489 CALL sgesd2d( ictxt, dim1, tcols,
2490 $ work(ipw6), nwin, south,
2491 $ csrc )
2492 CALL sgerv2d( ictxt, dim4, tcols,
2493 $ work(ipw6+dim1), nwin, south,
2494 $ csrc )
2495 END IF
2496 END IF
2497 END IF
2498 IF( myrow.EQ.rsrc4 ) THEN
2499 IF( indx.EQ.1 ) THEN
2500 CALL infog2l( i+dim1, lihic+1, desct,
2501 $ nprow, npcol, myrow, mycol, iloc4,
2502 $ jloc, rsrc4, csrc )
2503 ELSE
2504 CALL infog2l( i+dim1,
2505 $ (iceil(lihic,nb)+(indx-2))*nb+1,
2506 $ desct, nprow, npcol, myrow, mycol,
2507 $ iloc4, jloc, rsrc4, csrc )
2508 END IF
2509 IF( mycol.EQ.csrc ) THEN
2510 CALL slamov( 'All', dim4, tcols,
2511 $ t((jloc-1)*lldt+iloc4), lldt,
2512 $ work(ipw6+dim1), nwin )
2513 IF( nprow.GT.1 ) THEN
2514 north = mod( myrow-1+nprow, nprow )
2515 CALL sgesd2d( ictxt, dim4, tcols,
2516 $ work(ipw6+dim1), nwin, north,
2517 $ csrc )
2518 CALL sgerv2d( ictxt, dim1, tcols,
2519 $ work(ipw6), nwin, north,
2520 $ csrc )
2521 END IF
2522 END IF
2523 END IF
2524 420 CONTINUE
2525 END IF
2526 END IF
2527*
2528 IF( dir.EQ.2 ) THEN
2529 IF( wantq ) THEN
2530 IF( mycol.EQ.csrc1 .OR. mycol.EQ.csrc4 ) THEN
2531 DO 430 indx = 1, nprow
2532 IF( mycol.EQ.csrc1 ) THEN
2533 CALL infog2l( 1+(indx-1)*nb, i, descq,
2534 $ nprow, npcol, myrow, mycol, iloc,
2535 $ jloc1, rsrc, csrc1 )
2536 IF( myrow.EQ.rsrc ) THEN
2537 CALL slamov( 'All', qrows, dim1,
2538 $ q((jloc1-1)*lldq+iloc), lldq,
2539 $ work(ipw7), qrows )
2540 IF( npcol.GT.1 ) THEN
2541 east = mod( mycol + 1, npcol )
2542 CALL sgesd2d( ictxt, qrows, dim1,
2543 $ work(ipw7), qrows, rsrc,
2544 $ east )
2545 CALL sgerv2d( ictxt, qrows, dim4,
2546 $ work(ipw7+qrows*dim1),
2547 $ qrows, rsrc, east )
2548 END IF
2549 END IF
2550 END IF
2551 IF( mycol.EQ.csrc4 ) THEN
2552 CALL infog2l( 1+(indx-1)*nb, i+dim1,
2553 $ descq, nprow, npcol, myrow, mycol,
2554 $ iloc, jloc4, rsrc, csrc4 )
2555 IF( myrow.EQ.rsrc ) THEN
2556 CALL slamov( 'All', qrows, dim4,
2557 $ q((jloc4-1)*lldq+iloc), lldq,
2558 $ work(ipw7+qrows*dim1), qrows )
2559 IF( npcol.GT.1 ) THEN
2560 west = mod( mycol-1+npcol,
2561 $ npcol )
2562 CALL sgesd2d( ictxt, qrows, dim4,
2563 $ work(ipw7+qrows*dim1),
2564 $ qrows, rsrc, west )
2565 CALL sgerv2d( ictxt, qrows, dim1,
2566 $ work(ipw7), qrows, rsrc,
2567 $ west )
2568 END IF
2569 END IF
2570 END IF
2571 430 CONTINUE
2572 END IF
2573 END IF
2574 END IF
2575*
2576 327 CONTINUE
2577*
2578 322 CONTINUE
2579*
2580 DO 323 window = window0, wine, 2
2581 rsrc4 = iwork(irsrc+window-1)
2582 csrc4 = iwork(icsrc+window-1)
2583 rsrc1 = mod( rsrc4 - 1 + nprow, nprow )
2584 csrc1 = mod( csrc4 - 1 + npcol, npcol )
2585 flops = 0
2586 IF( ((mycol.EQ.csrc1.OR.mycol.EQ.csrc4).AND.dir.EQ.2)
2587 $ .OR. ((myrow.EQ.rsrc1.OR.myrow.EQ.rsrc4).AND.
2588 $ dir.EQ.1) ) THEN
2589*
2590* Skip this part of the updates if appropriate.
2591*
2592 IF( window.EQ.1 .AND. skip1cr ) GO TO 328
2593*
2594* Count number of operations to decide whether to use
2595* matrix-matrix multiplications for updating
2596* off-diagonal parts or not.
2597*
2598 nitraf = pitraf - ipiw
2599 ishh = .false.
2600 DO 405 k = 1, nitraf
2601 IF( iwork( ipiw + k - 1 ).LE.nwin ) THEN
2602 flops = flops + 6
2603 ELSE
2604 flops = flops + 11
2605 ishh = .true.
2606 END IF
2607 405 CONTINUE
2608*
2609* Perform updates in parallel.
2610*
2611 IF( flops.NE.0 .AND.
2612 $ ( 2*flops*100 )/( 2*nwin*nwin ) .GE. mmult )
2613 $ THEN
2614*
2615 CALL slaset( 'All', nwin, nwin, zero, one,
2616 $ work( ipw4 ), nwin )
2617 work(ipw8) = float(myrow)
2618 work(ipw8+1) = float(mycol)
2619 CALL bslaapp( 1, nwin, nwin, ncb, work( ipw4 ),
2620 $ nwin, nitraf, iwork(ipiw), work( ipw3 ),
2621 $ work(ipw8) )
2622*
2623* Test if sparsity structure of orthogonal matrix
2624* can be exploited (see below).
2625*
2626 IF( ishh .OR. dim1.NE.ks .OR. dim4.NE.ks ) THEN
2627*
2628* Update the columns of T and Q affected by the
2629* reordering.
2630*
2631 IF( dir.EQ.2 ) THEN
2632 DO 440 indx = 1, min(i-1,1+(nprow-1)*nb),
2633 $ nb
2634 IF( mycol.EQ.csrc1 ) THEN
2635 CALL infog2l( indx, i, desct, nprow,
2636 $ npcol, myrow, mycol, iloc,
2637 $ jloc, rsrc, csrc1 )
2638 IF( myrow.EQ.rsrc ) THEN
2639 CALL sgemm( 'No transpose',
2640 $ 'No transpose', trows, dim1,
2641 $ nwin, one, work( ipw5 ),
2642 $ trows, work( ipw4 ), nwin,
2643 $ zero, work(ipw8), trows )
2644 CALL slamov( 'All', trows, dim1,
2645 $ work(ipw8), trows,
2646 $ t((jloc-1)*lldt+iloc),
2647 $ lldt )
2648 END IF
2649 END IF
2650 IF( mycol.EQ.csrc4 ) THEN
2651 CALL infog2l( indx, i+dim1, desct,
2652 $ nprow, npcol, myrow, mycol,
2653 $ iloc, jloc, rsrc, csrc4 )
2654 IF( myrow.EQ.rsrc ) THEN
2655 CALL sgemm( 'No transpose',
2656 $ 'No transpose', trows, dim4,
2657 $ nwin, one, work( ipw5 ),
2658 $ trows,
2659 $ work( ipw4+nwin*dim1 ),
2660 $ nwin, zero, work(ipw8),
2661 $ trows )
2662 CALL slamov( 'All', trows, dim4,
2663 $ work(ipw8), trows,
2664 $ t((jloc-1)*lldt+iloc),
2665 $ lldt )
2666 END IF
2667 END IF
2668 440 CONTINUE
2669*
2670 IF( wantq ) THEN
2671 DO 450 indx = 1, min(n,1+(nprow-1)*nb),
2672 $ nb
2673 IF( mycol.EQ.csrc1 ) THEN
2674 CALL infog2l( indx, i, descq,
2675 $ nprow, npcol, myrow, mycol,
2676 $ iloc, jloc, rsrc, csrc1 )
2677 IF( myrow.EQ.rsrc ) THEN
2678 CALL sgemm( 'No transpose',
2679 $ 'No transpose', qrows,
2680 $ dim1, nwin, one,
2681 $ work( ipw7 ), qrows,
2682 $ work( ipw4 ), nwin,
2683 $ zero, work(ipw8),
2684 $ qrows )
2685 CALL slamov( 'All', qrows,
2686 $ dim1, work(ipw8), qrows,
2687 $ q((jloc-1)*lldq+iloc),
2688 $ lldq )
2689 END IF
2690 END IF
2691 IF( mycol.EQ.csrc4 ) THEN
2692 CALL infog2l( indx, i+dim1,
2693 $ descq, nprow, npcol, myrow,
2694 $ mycol, iloc, jloc, rsrc,
2695 $ csrc4 )
2696 IF( myrow.EQ.rsrc ) THEN
2697 CALL sgemm( 'No transpose',
2698 $ 'No transpose', qrows,
2699 $ dim4, nwin, one,
2700 $ work( ipw7 ), qrows,
2701 $ work( ipw4+nwin*dim1 ),
2702 $ nwin, zero, work(ipw8),
2703 $ qrows )
2704 CALL slamov( 'All', qrows,
2705 $ dim4, work(ipw8), qrows,
2706 $ q((jloc-1)*lldq+iloc),
2707 $ lldq )
2708 END IF
2709 END IF
2710 450 CONTINUE
2711 END IF
2712 END IF
2713*
2714* Update the rows of T affected by the
2715* reordering.
2716*
2717 IF( dir.EQ.1 ) THEN
2718 IF ( lihic.LT.n ) THEN
2719 IF( myrow.EQ.rsrc1.AND.mycol.EQ.csrc4
2720 $ .AND.mod(lihic,nb).NE.0 ) THEN
2721 indx = lihic + 1
2722 CALL infog2l( i, indx, desct, nprow,
2723 $ npcol, myrow, mycol, iloc,
2724 $ jloc, rsrc1, csrc4 )
2725 CALL sgemm( 'Transpose',
2726 $ 'No Transpose', dim1, tcols,
2727 $ nwin, one, work(ipw4), nwin,
2728 $ work( ipw6 ), nwin, zero,
2729 $ work(ipw8), dim1 )
2730 CALL slamov( 'All', dim1, tcols,
2731 $ work(ipw8), dim1,
2732 $ t((jloc-1)*lldt+iloc), lldt )
2733 END IF
2734 IF( myrow.EQ.rsrc4.AND.mycol.EQ.csrc4
2735 $ .AND.mod(lihic,nb).NE.0 ) THEN
2736 indx = lihic + 1
2737 CALL infog2l( i+dim1, indx, desct,
2738 $ nprow, npcol, myrow, mycol,
2739 $ iloc, jloc, rsrc4, csrc4 )
2740 CALL sgemm( 'Transpose',
2741 $ 'No Transpose', dim4, tcols,
2742 $ nwin, one,
2743 $ work( ipw4+dim1*nwin ), nwin,
2744 $ work( ipw6), nwin, zero,
2745 $ work(ipw8), dim4 )
2746 CALL slamov( 'All', dim4, tcols,
2747 $ work(ipw8), dim4,
2748 $ t((jloc-1)*lldt+iloc), lldt )
2749 END IF
2750 indxs = iceil(lihic,nb)*nb + 1
2751 indxe = min(n,indxs+(npcol-2)*nb)
2752 DO 460 indx = indxs, indxe, nb
2753 IF( myrow.EQ.rsrc1 ) THEN
2754 CALL infog2l( i, indx, desct,
2755 $ nprow, npcol, myrow, mycol,
2756 $ iloc, jloc, rsrc1, csrc )
2757 IF( mycol.EQ.csrc ) THEN
2758 CALL sgemm( 'Transpose',
2759 $ 'No Transpose', dim1,
2760 $ tcols, nwin, one,
2761 $ work( ipw4 ), nwin,
2762 $ work( ipw6 ), nwin,
2763 $ zero, work(ipw8), dim1 )
2764 CALL slamov( 'All', dim1,
2765 $ tcols, work(ipw8), dim1,
2766 $ t((jloc-1)*lldt+iloc),
2767 $ lldt )
2768 END IF
2769 END IF
2770 IF( myrow.EQ.rsrc4 ) THEN
2771 CALL infog2l( i+dim1, indx,
2772 $ desct, nprow, npcol, myrow,
2773 $ mycol, iloc, jloc, rsrc4,
2774 $ csrc )
2775 IF( mycol.EQ.csrc ) THEN
2776 CALL sgemm( 'Transpose',
2777 $ 'No Transpose', dim4,
2778 $ tcols, nwin, one,
2779 $ work( ipw4+nwin*dim1 ),
2780 $ nwin, work( ipw6 ),
2781 $ nwin, zero, work(ipw8),
2782 $ dim4 )
2783 CALL slamov( 'All', dim4,
2784 $ tcols, work(ipw8), dim4,
2785 $ t((jloc-1)*lldt+iloc),
2786 $ lldt )
2787 END IF
2788 END IF
2789 460 CONTINUE
2790 END IF
2791 END IF
2792 ELSE
2793*
2794* The NWIN-by-NWIN matrix U containing the
2795* accumulated orthogonal transformations has
2796* the following structure:
2797*
2798* [ U11 U12 ]
2799* U = [ ],
2800* [ U21 U22 ]
2801*
2802* where U21 is KS-by-KS upper triangular and
2803* U12 is (NWIN-KS)-by-(NWIN-KS) lower
2804* triangular. For reordering over the border
2805* the structure is only exploited when the
2806* border cuts the columns of U conformally with
2807* the structure itself. This happens exactly
2808* when all eigenvalues in the subcluster was
2809* moved to the other side of the border and
2810* fits perfectly in their new positions, i.e.,
2811* the reordering stops when the last eigenvalue
2812* to cross the border is reordered to the
2813* position closest to the border. Tested by
2814* checking is KS = DIM1 = DIM4 (see above).
2815* This should hold quite often. But this branch
2816* is entered only if all involved eigenvalues
2817* are real.
2818*
2819* Update the columns of T and Q affected by the
2820* reordering.
2821*
2822* Compute T2*U21 + T1*U11 on the left side of
2823* the border.
2824*
2825 IF( dir.EQ.2 ) THEN
2826 indxe = min(i-1,1+(nprow-1)*nb)
2827 DO 470 indx = 1, indxe, nb
2828 IF( mycol.EQ.csrc1 ) THEN
2829 CALL infog2l( indx, i, desct, nprow,
2830 $ npcol, myrow, mycol, iloc,
2831 $ jloc, rsrc, csrc1 )
2832 IF( myrow.EQ.rsrc ) THEN
2833 CALL slamov( 'All', trows, ks,
2834 $ work( ipw5+trows*dim4),
2835 $ trows, work(ipw8), trows )
2836 CALL strmm( 'Right', 'Upper',
2837 $ 'No transpose',
2838 $ 'Non-unit', trows, ks,
2839 $ one, work( ipw4+dim4 ),
2840 $ nwin, work(ipw8), trows )
2841 CALL sgemm( 'No transpose',
2842 $ 'No transpose', trows, ks,
2843 $ dim4, one, work( ipw5 ),
2844 $ trows, work( ipw4 ), nwin,
2845 $ one, work(ipw8), trows )
2846 CALL slamov( 'All', trows, ks,
2847 $ work(ipw8), trows,
2848 $ t((jloc-1)*lldt+iloc),
2849 $ lldt )
2850 END IF
2851 END IF
2852*
2853* Compute T1*U12 + T2*U22 on the right
2854* side of the border.
2855*
2856 IF( mycol.EQ.csrc4 ) THEN
2857 CALL infog2l( indx, i+dim1, desct,
2858 $ nprow, npcol, myrow, mycol,
2859 $ iloc, jloc, rsrc, csrc4 )
2860 IF( myrow.EQ.rsrc ) THEN
2861 CALL slamov( 'All', trows, dim4,
2862 $ work(ipw5), trows,
2863 $ work( ipw8 ), trows )
2864 CALL strmm( 'Right', 'Lower',
2865 $ 'No transpose',
2866 $ 'Non-unit', trows, dim4,
2867 $ one, work( ipw4+nwin*ks ),
2868 $ nwin, work( ipw8 ), trows )
2869 CALL sgemm( 'No transpose',
2870 $ 'No transpose', trows, dim4,
2871 $ ks, one,
2872 $ work( ipw5+trows*dim4),
2873 $ trows,
2874 $ work( ipw4+nwin*ks+dim4 ),
2875 $ nwin, one, work( ipw8 ),
2876 $ trows )
2877 CALL slamov( 'All', trows, dim4,
2878 $ work(ipw8), trows,
2879 $ t((jloc-1)*lldt+iloc),
2880 $ lldt )
2881 END IF
2882 END IF
2883 470 CONTINUE
2884 IF( wantq ) THEN
2885*
2886* Compute Q2*U21 + Q1*U11 on the left
2887* side of border.
2888*
2889 indxe = min(n,1+(nprow-1)*nb)
2890 DO 480 indx = 1, indxe, nb
2891 IF( mycol.EQ.csrc1 ) THEN
2892 CALL infog2l( indx, i, descq,
2893 $ nprow, npcol, myrow, mycol,
2894 $ iloc, jloc, rsrc, csrc1 )
2895 IF( myrow.EQ.rsrc ) THEN
2896 CALL slamov( 'All', qrows, ks,
2897 $ work( ipw7+qrows*dim4),
2898 $ qrows, work(ipw8),
2899 $ qrows )
2900 CALL strmm( 'Right', 'Upper',
2901 $ 'No transpose',
2902 $ 'Non-unit', qrows,
2903 $ ks, one,
2904 $ work( ipw4+dim4 ), nwin,
2905 $ work(ipw8), qrows )
2906 CALL sgemm( 'No transpose',
2907 $ 'No transpose', qrows,
2908 $ ks, dim4, one,
2909 $ work( ipw7 ), qrows,
2910 $ work( ipw4 ), nwin, one,
2911 $ work(ipw8), qrows )
2912 CALL slamov( 'All', qrows, ks,
2913 $ work(ipw8), qrows,
2914 $ q((jloc-1)*lldq+iloc),
2915 $ lldq )
2916 END IF
2917 END IF
2918*
2919* Compute Q1*U12 + Q2*U22 on the right
2920* side of border.
2921*
2922 IF( mycol.EQ.csrc4 ) THEN
2923 CALL infog2l( indx, i+dim1,
2924 $ descq, nprow, npcol, myrow,
2925 $ mycol, iloc, jloc, rsrc,
2926 $ csrc4 )
2927 IF( myrow.EQ.rsrc ) THEN
2928 CALL slamov( 'All', qrows,
2929 $ dim4, work(ipw7), qrows,
2930 $ work( ipw8 ), qrows )
2931 CALL strmm( 'Right', 'Lower',
2932 $ 'No transpose',
2933 $ 'Non-unit', qrows,
2934 $ dim4, one,
2935 $ work( ipw4+nwin*ks ),
2936 $ nwin, work( ipw8 ),
2937 $ qrows )
2938 CALL sgemm( 'No transpose',
2939 $ 'No transpose', qrows,
2940 $ dim4, ks, one,
2941 $ work(ipw7+qrows*(dim4)),
2942 $ qrows,
2943 $ work(ipw4+nwin*ks+dim4),
2944 $ nwin, one, work( ipw8 ),
2945 $ qrows )
2946 CALL slamov( 'All', qrows,
2947 $ dim4, work(ipw8), qrows,
2948 $ q((jloc-1)*lldq+iloc),
2949 $ lldq )
2950 END IF
2951 END IF
2952 480 CONTINUE
2953 END IF
2954 END IF
2955*
2956 IF( dir.EQ.1 ) THEN
2957 IF ( lihic.LT.n ) THEN
2958*
2959* Compute U21**T*T2 + U11**T*T1 on the
2960* upper side of the border.
2961*
2962 IF( myrow.EQ.rsrc1.AND.mycol.EQ.csrc4
2963 $ .AND.mod(lihic,nb).NE.0 ) THEN
2964 indx = lihic + 1
2965 CALL infog2l( i, indx, desct, nprow,
2966 $ npcol, myrow, mycol, iloc,
2967 $ jloc, rsrc1, csrc4 )
2968 CALL slamov( 'All', ks, tcols,
2969 $ work( ipw6+dim4 ), nwin,
2970 $ work(ipw8), ks )
2971 CALL strmm( 'Left', 'Upper',
2972 $ 'Transpose', 'Non-unit',
2973 $ ks, tcols, one,
2974 $ work( ipw4+dim4 ), nwin,
2975 $ work(ipw8), ks )
2976 CALL sgemm( 'Transpose',
2977 $ 'No transpose', ks, tcols,
2978 $ dim4, one, work(ipw4), nwin,
2979 $ work(ipw6), nwin, one,
2980 $ work(ipw8), ks )
2981 CALL slamov( 'All', ks, tcols,
2982 $ work(ipw8), ks,
2983 $ t((jloc-1)*lldt+iloc), lldt )
2984 END IF
2985*
2986* Compute U12**T*T1 + U22**T*T2 on the
2987* lower side of the border.
2988*
2989 IF( myrow.EQ.rsrc4.AND.mycol.EQ.csrc4
2990 $ .AND.mod(lihic,nb).NE.0 ) THEN
2991 indx = lihic + 1
2992 CALL infog2l( i+dim1, indx, desct,
2993 $ nprow, npcol, myrow, mycol,
2994 $ iloc, jloc, rsrc4, csrc4 )
2995 CALL slamov( 'All', dim4, tcols,
2996 $ work( ipw6 ), nwin,
2997 $ work( ipw8 ), dim4 )
2998 CALL strmm( 'Left', 'Lower',
2999 $ 'Transpose', 'Non-unit',
3000 $ dim4, tcols, one,
3001 $ work( ipw4+nwin*ks ), nwin,
3002 $ work( ipw8 ), dim4 )
3003 CALL sgemm( 'Transpose',
3004 $ 'No Transpose', dim4, tcols,
3005 $ ks, one,
3006 $ work( ipw4+nwin*ks+dim4 ),
3007 $ nwin, work( ipw6+dim1 ), nwin,
3008 $ one, work( ipw8), dim4 )
3009 CALL slamov( 'All', dim4, tcols,
3010 $ work(ipw8), dim4,
3011 $ t((jloc-1)*lldt+iloc), lldt )
3012 END IF
3013*
3014* Compute U21**T*T2 + U11**T*T1 on upper
3015* side on border.
3016*
3017 indxs = iceil(lihic,nb)*nb+1
3018 indxe = min(n,indxs+(npcol-2)*nb)
3019 DO 490 indx = indxs, indxe, nb
3020 IF( myrow.EQ.rsrc1 ) THEN
3021 CALL infog2l( i, indx, desct,
3022 $ nprow, npcol, myrow, mycol,
3023 $ iloc, jloc, rsrc1, csrc )
3024 IF( mycol.EQ.csrc ) THEN
3025 CALL slamov( 'All', ks, tcols,
3026 $ work( ipw6+dim4 ), nwin,
3027 $ work(ipw8), ks )
3028 CALL strmm( 'Left', 'Upper',
3029 $ 'Transpose',
3030 $ 'Non-unit', ks,
3031 $ tcols, one,
3032 $ work( ipw4+dim4 ), nwin,
3033 $ work(ipw8), ks )
3034 CALL sgemm( 'Transpose',
3035 $ 'No transpose', ks,
3036 $ tcols, dim4, one,
3037 $ work(ipw4), nwin,
3038 $ work(ipw6), nwin, one,
3039 $ work(ipw8), ks )
3040 CALL slamov( 'All', ks, tcols,
3041 $ work(ipw8), ks,
3042 $ t((jloc-1)*lldt+iloc),
3043 $ lldt )
3044 END IF
3045 END IF
3046*
3047* Compute U12**T*T1 + U22**T*T2 on
3048* lower side of border.
3049*
3050 IF( myrow.EQ.rsrc4 ) THEN
3051 CALL infog2l( i+dim1, indx,
3052 $ desct, nprow, npcol, myrow,
3053 $ mycol, iloc, jloc, rsrc4,
3054 $ csrc )
3055 IF( mycol.EQ.csrc ) THEN
3056 CALL slamov( 'All', dim4,
3057 $ tcols, work( ipw6 ),
3058 $ nwin, work( ipw8 ),
3059 $ dim4 )
3060 CALL strmm( 'Left', 'Lower',
3061 $ 'Transpose',
3062 $ 'Non-unit', dim4,
3063 $ tcols, one,
3064 $ work( ipw4+nwin*ks ),
3065 $ nwin, work( ipw8 ),
3066 $ dim4 )
3067 CALL sgemm( 'Transpose',
3068 $ 'No Transpose', dim4,
3069 $ tcols, ks, one,
3070 $ work(ipw4+nwin*ks+dim4),
3071 $ nwin, work( ipw6+dim1 ),
3072 $ nwin, one, work( ipw8),
3073 $ dim4 )
3074 CALL slamov( 'All', dim4,
3075 $ tcols, work(ipw8), dim4,
3076 $ t((jloc-1)*lldt+iloc),
3077 $ lldt )
3078 END IF
3079 END IF
3080 490 CONTINUE
3081 END IF
3082 END IF
3083 END IF
3084 ELSEIF( flops.NE.0 ) THEN
3085*
3086* Update off-diagonal blocks and Q using the
3087* pipelined elementary transformations. Now we
3088* have a delicate problem: how to do this without
3089* redundant work? For now, we let the processes
3090* involved compute the whole crossborder block
3091* rows and column saving only the part belonging
3092* to the corresponding side of the border. To make
3093* this a realistic alternative, we have modified
3094* the ratio r_flops (see Reference [2] above) to
3095* give more favor to the ordinary matrix
3096* multiplication.
3097*
3098 IF( dir.EQ.2 ) THEN
3099 indxe = min(i-1,1+(nprow-1)*nb)
3100 DO 500 indx = 1, indxe, nb
3101 CALL infog2l( indx, i, desct, nprow,
3102 $ npcol, myrow, mycol, iloc, jloc,
3103 $ rsrc, csrc )
3104 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc )
3105 $ THEN
3106 CALL bslaapp( 1, trows, nwin, ncb,
3107 $ work(ipw5), trows, nitraf,
3108 $ iwork(ipiw), work( ipw3 ),
3109 $ work(ipw8) )
3110 CALL slamov( 'All', trows, dim1,
3111 $ work(ipw5), trows,
3112 $ t((jloc-1)*lldt+iloc ), lldt )
3113 END IF
3114 CALL infog2l( indx, i+dim1, desct, nprow,
3115 $ npcol, myrow, mycol, iloc, jloc,
3116 $ rsrc, csrc )
3117 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc )
3118 $ THEN
3119 IF( npcol.GT.1 )
3120 $ CALL bslaapp( 1, trows, nwin, ncb,
3121 $ work(ipw5), trows, nitraf,
3122 $ iwork(ipiw), work( ipw3 ),
3123 $ work(ipw8) )
3124 CALL slamov( 'All', trows, dim4,
3125 $ work(ipw5+trows*dim1), trows,
3126 $ t((jloc-1)*lldt+iloc ), lldt )
3127 END IF
3128 500 CONTINUE
3129 IF( wantq ) THEN
3130 indxe = min(n,1+(nprow-1)*nb)
3131 DO 510 indx = 1, indxe, nb
3132 CALL infog2l( indx, i, descq, nprow,
3133 $ npcol, myrow, mycol, iloc, jloc,
3134 $ rsrc, csrc )
3135 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc )
3136 $ THEN
3137 CALL bslaapp( 1, qrows, nwin, ncb,
3138 $ work(ipw7), qrows, nitraf,
3139 $ iwork(ipiw), work( ipw3 ),
3140 $ work(ipw8) )
3141 CALL slamov( 'All', qrows, dim1,
3142 $ work(ipw7), qrows,
3143 $ q((jloc-1)*lldq+iloc ), lldq )
3144 END IF
3145 CALL infog2l( indx, i+dim1, descq,
3146 $ nprow, npcol, myrow, mycol, iloc,
3147 $ jloc, rsrc, csrc )
3148 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc )
3149 $ THEN
3150 IF( npcol.GT.1 )
3151 $ CALL bslaapp( 1, qrows, nwin,
3152 $ ncb, work(ipw7), qrows,
3153 $ nitraf, iwork(ipiw),
3154 $ work( ipw3 ), work(ipw8) )
3155 CALL slamov( 'All', qrows, dim4,
3156 $ work(ipw7+qrows*dim1), qrows,
3157 $ q((jloc-1)*lldq+iloc ), lldq )
3158 END IF
3159 510 CONTINUE
3160 END IF
3161 END IF
3162*
3163 IF( dir.EQ.1 ) THEN
3164 IF( lihic.LT.n ) THEN
3165 indx = lihic + 1
3166 CALL infog2l( i, indx, desct, nprow,
3167 $ npcol, myrow, mycol, iloc, jloc,
3168 $ rsrc, csrc )
3169 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc.AND.
3170 $ mod(lihic,nb).NE.0 ) THEN
3171 CALL bslaapp( 0, nwin, tcols, ncb,
3172 $ work( ipw6 ), nwin, nitraf,
3173 $ iwork(ipiw), work( ipw3 ),
3174 $ work(ipw8) )
3175 CALL slamov( 'All', dim1, tcols,
3176 $ work( ipw6 ), nwin,
3177 $ t((jloc-1)*lldt+iloc), lldt )
3178 END IF
3179 CALL infog2l( i+dim1, indx, desct, nprow,
3180 $ npcol, myrow, mycol, iloc, jloc,
3181 $ rsrc, csrc )
3182 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc.AND.
3183 $ mod(lihic,nb).NE.0 ) THEN
3184 IF( nprow.GT.1 )
3185 $ CALL bslaapp( 0, nwin, tcols, ncb,
3186 $ work( ipw6 ), nwin, nitraf,
3187 $ iwork(ipiw), work( ipw3 ),
3188 $ work(ipw8) )
3189 CALL slamov( 'All', dim4, tcols,
3190 $ work( ipw6+dim1 ), nwin,
3191 $ t((jloc-1)*lldt+iloc), lldt )
3192 END IF
3193 indxs = iceil(lihic,nb)*nb + 1
3194 indxe = min(n,indxs+(npcol-2)*nb)
3195 DO 520 indx = indxs, indxe, nb
3196 CALL infog2l( i, indx, desct, nprow,
3197 $ npcol, myrow, mycol, iloc, jloc,
3198 $ rsrc, csrc )
3199 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc )
3200 $ THEN
3201 CALL bslaapp( 0, nwin, tcols, ncb,
3202 $ work(ipw6), nwin, nitraf,
3203 $ iwork(ipiw), work( ipw3 ),
3204 $ work(ipw8) )
3205 CALL slamov( 'All', dim1, tcols,
3206 $ work( ipw6 ), nwin,
3207 $ t((jloc-1)*lldt+iloc), lldt )
3208 END IF
3209 CALL infog2l( i+dim1, indx, desct,
3210 $ nprow, npcol, myrow, mycol, iloc,
3211 $ jloc, rsrc, csrc )
3212 IF( myrow.EQ.rsrc .AND. mycol.EQ.csrc )
3213 $ THEN
3214 IF( nprow.GT.1 )
3215 $ CALL bslaapp( 0, nwin, tcols,
3216 $ ncb, work(ipw6), nwin, nitraf,
3217 $ iwork(ipiw), work( ipw3 ),
3218 $ work(ipw8) )
3219 CALL slamov( 'All', dim4, tcols,
3220 $ work( ipw6+dim1 ), nwin,
3221 $ t((jloc-1)*lldt+iloc), lldt )
3222 END IF
3223 520 CONTINUE
3224 END IF
3225 END IF
3226 END IF
3227 END IF
3228*
3229 328 CONTINUE
3230*
3231 323 CONTINUE
3232*
3233* End of loops over directions (DIR).
3234*
3235 2222 CONTINUE
3236*
3237* End of loops over diagonal blocks for reordering over the
3238* block diagonal.
3239*
3240 310 CONTINUE
3241 last = last + 1
3242 IF( lastwait .AND. last.LT.2 ) GO TO 308
3243*
3244* Barrier to collect the processes before proceeding.
3245*
3246 CALL blacs_barrier( ictxt, 'All' )
3247*
3248* Compute global maximum of IERR so that we know if some process
3249* experienced a failure in the reordering.
3250*
3251 myierr = ierr
3252 IF( nprocs.GT.1 ) THEN
3253 CALL igamx2d( ictxt, 'All', top, 1, 1, ierr, 1, -1,
3254 $ -1, -1, -1, -1 )
3255 END IF
3256*
3257 IF( ierr.NE.0 ) THEN
3258*
3259* When calling BDTREXC, the block at position I+KKS-1 failed
3260* to swap.
3261*
3262 IF( myierr.NE.0 ) info = max(1,i+kks-1)
3263 IF( nprocs.GT.1 ) THEN
3264 CALL igamx2d( ictxt, 'All', top, 1, 1, info, 1, -1,
3265 $ -1, -1, -1, -1 )
3266 END IF
3267 GO TO 300
3268 END IF
3269*
3270* Do a global update of the SELECT vector.
3271*
3272 DO 530 k = 1, n
3273 rsrc = indxg2p( k, nb, myrow, desct( rsrc_ ), nprow )
3274 csrc = indxg2p( k, nb, mycol, desct( csrc_ ), npcol )
3275 IF( myrow.NE.rsrc .OR. mycol.NE.csrc )
3276 $ SELECT( k ) = 0
3277 530 CONTINUE
3278 IF( nprocs.GT.1 )
3279 $ CALL igsum2d( ictxt, 'All', top, n, 1, SELECT, n, -1, -1 )
3280*
3281* Find the global minumum of ILO and IHI.
3282*
3283 ilo = ilo - 1
3284 523 CONTINUE
3285 ilo = ilo + 1
3286 IF( ilo.LE.n ) THEN
3287 IF( SELECT(ilo).NE.0 ) GO TO 523
3288 END IF
3289 ihi = ihi + 1
3290 527 CONTINUE
3291 ihi = ihi - 1
3292 IF( ihi.GE.1 ) THEN
3293 IF( SELECT(ihi).EQ.0 ) GO TO 527
3294 END IF
3295*
3296* End While ( ILO <= M )
3297 GO TO 50
3298 END IF
3299*
3300 300 CONTINUE
3301*
3302* In case an error occured, do an additional global update of
3303* SELECT.
3304*
3305 IF( info.NE.0 ) THEN
3306 DO 540 k = 1, n
3307 rsrc = indxg2p( k, nb, myrow, desct( rsrc_ ), nprow )
3308 csrc = indxg2p( k, nb, mycol, desct( csrc_ ), npcol )
3309 IF( myrow.NE.rsrc .OR. mycol.NE.csrc )
3310 $ SELECT( k ) = 0
3311 540 CONTINUE
3312 IF( nprocs.GT.1 )
3313 $ CALL igsum2d( ictxt, 'All', top, n, 1, SELECT, n, -1, -1 )
3314 END IF
3315*
3316 545 CONTINUE
3317*
3318* Store the output eigenvalues in WR and WI: first let all the
3319* processes compute the eigenvalue inside their diagonal blocks in
3320* parallel, except for the eigenvalue located next to a block
3321* border. After that, compute all eigenvalues located next to the
3322* block borders. Finally, do a global summation over WR and WI so
3323* that all processors receive the result. Notice: real eigenvalues
3324* extracted from a non-canonical 2-by-2 block are not stored in
3325* any particular order.
3326*
3327 DO 550 k = 1, n
3328 wr( k ) = zero
3329 wi( k ) = zero
3330 550 CONTINUE
3331*
3332* Loop 560: extract eigenvalues from the blocks which are not laid
3333* out across a border of the processor mesh, except for those 1x1
3334* blocks on the border.
3335*
3336 pair = .false.
3337 DO 560 k = 1, n
3338 IF( .NOT. pair ) THEN
3339 border = ( k.NE.n .AND. mod( k, nb ).EQ.0 ) .OR.
3340 % ( k.NE.1 .AND. mod( k, nb ).EQ.1 )
3341 IF( .NOT. border ) THEN
3342 CALL infog2l( k, k, desct, nprow, npcol, myrow, mycol,
3343 $ iloc1, jloc1, trsrc1, tcsrc1 )
3344 IF( myrow.EQ.trsrc1 .AND. mycol.EQ.tcsrc1 ) THEN
3345 elem1 = t((jloc1-1)*lldt+iloc1)
3346 IF( k.LT.n ) THEN
3347 elem3 = t((jloc1-1)*lldt+iloc1+1)
3348 ELSE
3349 elem3 = zero
3350 END IF
3351 IF( elem3.NE.zero ) THEN
3352 elem2 = t((jloc1)*lldt+iloc1)
3353 elem4 = t((jloc1)*lldt+iloc1+1)
3354 CALL slanv2( elem1, elem2, elem3, elem4,
3355 $ wr( k ), wi( k ), wr( k+1 ), wi( k+1 ), sn,
3356 $ cs )
3357 pair = .true.
3358 ELSE
3359 IF( k.GT.1 ) THEN
3360 tmp = t((jloc1-2)*lldt+iloc1)
3361 IF( tmp.NE.zero ) THEN
3362 elem1 = t((jloc1-2)*lldt+iloc1-1)
3363 elem2 = t((jloc1-1)*lldt+iloc1-1)
3364 elem3 = t((jloc1-2)*lldt+iloc1)
3365 elem4 = t((jloc1-1)*lldt+iloc1)
3366 CALL slanv2( elem1, elem2, elem3, elem4,
3367 $ wr( k-1 ), wi( k-1 ), wr( k ),
3368 $ wi( k ), sn, cs )
3369 ELSE
3370 wr( k ) = elem1
3371 END IF
3372 ELSE
3373 wr( k ) = elem1
3374 END IF
3375 END IF
3376 END IF
3377 END IF
3378 ELSE
3379 pair = .false.
3380 END IF
3381 560 CONTINUE
3382*
3383* Loop 570: extract eigenvalues from the blocks which are laid
3384* out across a border of the processor mesh. The processors are
3385* numbered as below:
3386*
3387* 1 | 2
3388* --+--
3389* 3 | 4
3390*
3391 DO 570 k = nb, n-1, nb
3392 CALL infog2l( k, k, desct, nprow, npcol, myrow, mycol,
3393 $ iloc1, jloc1, trsrc1, tcsrc1 )
3394 CALL infog2l( k, k+1, desct, nprow, npcol, myrow, mycol,
3395 $ iloc2, jloc2, trsrc2, tcsrc2 )
3396 CALL infog2l( k+1, k, desct, nprow, npcol, myrow, mycol,
3397 $ iloc3, jloc3, trsrc3, tcsrc3 )
3398 CALL infog2l( k+1, k+1, desct, nprow, npcol, myrow, mycol,
3399 $ iloc4, jloc4, trsrc4, tcsrc4 )
3400 IF( myrow.EQ.trsrc2 .AND. mycol.EQ.tcsrc2 ) THEN
3401 elem2 = t((jloc2-1)*lldt+iloc2)
3402 IF( trsrc1.NE.trsrc2 .OR. tcsrc1.NE.tcsrc2 )
3403 $ CALL sgesd2d( ictxt, 1, 1, elem2, 1, trsrc1, tcsrc1 )
3404 END IF
3405 IF( myrow.EQ.trsrc3 .AND. mycol.EQ.tcsrc3 ) THEN
3406 elem3 = t((jloc3-1)*lldt+iloc3)
3407 IF( trsrc1.NE.trsrc3 .OR. tcsrc1.NE.tcsrc3 )
3408 $ CALL sgesd2d( ictxt, 1, 1, elem3, 1, trsrc1, tcsrc1 )
3409 END IF
3410 IF( myrow.EQ.trsrc4 .AND. mycol.EQ.tcsrc4 ) THEN
3411 work(1) = t((jloc4-1)*lldt+iloc4)
3412 IF( k+1.LT.n ) THEN
3413 work(2) = t((jloc4-1)*lldt+iloc4+1)
3414 ELSE
3415 work(2) = zero
3416 END IF
3417 IF( trsrc1.NE.trsrc4 .OR. tcsrc1.NE.tcsrc4 )
3418 $ CALL sgesd2d( ictxt, 2, 1, work, 2, trsrc1, tcsrc1 )
3419 END IF
3420 IF( myrow.EQ.trsrc1 .AND. mycol.EQ.tcsrc1 ) THEN
3421 elem1 = t((jloc1-1)*lldt+iloc1)
3422 IF( trsrc1.NE.trsrc2 .OR. tcsrc1.NE.tcsrc2 )
3423 $ CALL sgerv2d( ictxt, 1, 1, elem2, 1, trsrc2, tcsrc2 )
3424 IF( trsrc1.NE.trsrc3 .OR. tcsrc1.NE.tcsrc3 )
3425 $ CALL sgerv2d( ictxt, 1, 1, elem3, 1, trsrc3, tcsrc3 )
3426 IF( trsrc1.NE.trsrc4 .OR. tcsrc1.NE.tcsrc4 )
3427 $ CALL sgerv2d( ictxt, 2, 1, work, 2, trsrc4, tcsrc4 )
3428 elem4 = work(1)
3429 elem5 = work(2)
3430 IF( elem5.EQ.zero ) THEN
3431 IF( wr( k ).EQ.zero .AND. wi( k ).EQ.zero ) THEN
3432 CALL slanv2( elem1, elem2, elem3, elem4, wr( k ),
3433 $ wi( k ), wr( k+1 ), wi( k+1 ), sn, cs )
3434 ELSEIF( wr( k+1 ).EQ.zero .AND. wi( k+1 ).EQ.zero ) THEN
3435 wr( k+1 ) = elem4
3436 END IF
3437 ELSEIF( wr( k ).EQ.zero .AND. wi( k ).EQ.zero ) THEN
3438 wr( k ) = elem1
3439 END IF
3440 END IF
3441 570 CONTINUE
3442*
3443 IF( nprocs.GT.1 ) THEN
3444 CALL sgsum2d( ictxt, 'All', top, n, 1, wr, n, -1, -1 )
3445 CALL sgsum2d( ictxt, 'All', top, n, 1, wi, n, -1, -1 )
3446 END IF
3447*
3448* Store storage requirements in workspaces.
3449*
3450 work( 1 ) = float(lwmin)
3451 iwork( 1 ) = liwmin
3452*
3453* Return to calling program.
3454*
3455 RETURN
3456*
3457* End of PSTRORD
3458*
3459 END
3460*
subroutine bdtrexc(n, t, ldt, ifst, ilst, nitraf, itraf, ndtraf, dtraf, work, info)
Definition bdtrexc.f:3
subroutine bslaapp(iside, m, n, nb, a, lda, nitraf, itraf, dtraf, work)
Definition bslaapp.f:3
subroutine bstrexc(n, t, ldt, ifst, ilst, nitraf, itraf, ndtraf, dtraf, work, info)
Definition bstrexc.f:3
subroutine chk1mat(ma, mapos0, na, napos0, ia, ja, desca, descapos0, info)
Definition chk1mat.f:3
subroutine ilacpy(uplo, m, n, a, lda, b, ldb)
Definition ilacpy.f:2
subroutine infog2l(grindx, gcindx, desc, nprow, npcol, myrow, mycol, lrindx, lcindx, rsrc, csrc)
Definition infog2l.f:3
#define max(A, B)
Definition pcgemr.c:180
#define min(A, B)
Definition pcgemr.c:181
subroutine pchk1mat(ma, mapos0, na, napos0, ia, ja, desca, descapos0, nextra, ex, expos, info)
Definition pchkxmat.f:3
subroutine pchk2mat(ma, mapos0, na, napos0, ia, ja, desca, descapos0, mb, mbpos0, nb, nbpos0, ib, jb, descb, descbpos0, nextra, ex, expos, info)
Definition pchkxmat.f:175
subroutine pselget(scope, top, alpha, a, ia, ja, desca)
Definition pselget.f:2
subroutine pslacpy(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)
Definition pslacpy.f:3
subroutine pstrord(compq, select, para, n, t, it, jt, desct, q, iq, jq, descq, wr, wi, m, work, lwork, iwork, liwork, info)
Definition pstrord.f:4
subroutine pxerbla(ictxt, srname, info)
Definition pxerbla.f:2