LAPACK 3.12.1
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ zlatrs3()

subroutine zlatrs3 ( character uplo,
character trans,
character diag,
character normin,
integer n,
integer nrhs,
complex*16, dimension( lda, * ) a,
integer lda,
complex*16, dimension( ldx, * ) x,
integer ldx,
double precision, dimension( * ) scale,
double precision, dimension( * ) cnorm,
double precision, dimension( * ) work,
integer lwork,
integer info )

ZLATRS3 solves a triangular system of equations with the scale factors set to prevent overflow.

Purpose:
!>
!> ZLATRS3 solves one of the triangular systems
!>
!>    A * X = B * diag(scale),  A**T * X = B * diag(scale), or
!>    A**H * X = B * diag(scale)
!>
!> with scaling to prevent overflow.  Here A is an upper or lower
!> triangular matrix, A**T denotes the transpose of A, A**H denotes the
!> conjugate transpose of A. X and B are n-by-nrhs matrices and scale
!> is an nrhs-element vector of scaling factors. A scaling factor scale(j)
!> is usually less than or equal to 1, chosen such that X(:,j) is less
!> than the overflow threshold. If the matrix A is singular (A(j,j) = 0
!> for some j), then a non-trivial solution to A*X = 0 is returned. If
!> the system is so badly scaled that the solution cannot be represented
!> as (1/scale(k))*X(:,k), then x(:,k) = 0 and scale(k) is returned.
!>
!> This is a BLAS-3 version of LATRS for solving several right
!> hand sides simultaneously.
!>
!> 
Parameters
[in]UPLO
!>          UPLO is CHARACTER*1
!>          Specifies whether the matrix A is upper or lower triangular.
!>          = 'U':  Upper triangular
!>          = 'L':  Lower triangular
!> 
[in]TRANS
!>          TRANS is CHARACTER*1
!>          Specifies the operation applied to A.
!>          = 'N':  Solve A * x = s*b  (No transpose)
!>          = 'T':  Solve A**T* x = s*b  (Transpose)
!>          = 'C':  Solve A**T* x = s*b  (Conjugate transpose)
!> 
[in]DIAG
!>          DIAG is CHARACTER*1
!>          Specifies whether or not the matrix A is unit triangular.
!>          = 'N':  Non-unit triangular
!>          = 'U':  Unit triangular
!> 
[in]NORMIN
!>          NORMIN is CHARACTER*1
!>          Specifies whether CNORM has been set or not.
!>          = 'Y':  CNORM contains the column norms on entry
!>          = 'N':  CNORM is not set on entry.  On exit, the norms will
!>                  be computed and stored in CNORM.
!> 
[in]N
!>          N is INTEGER
!>          The order of the matrix A.  N >= 0.
!> 
[in]NRHS
!>          NRHS is INTEGER
!>          The number of columns of X.  NRHS >= 0.
!> 
[in]A
!>          A is COMPLEX*16 array, dimension (LDA,N)
!>          The triangular matrix A.  If UPLO = 'U', the leading n by n
!>          upper triangular part of the array A contains the upper
!>          triangular matrix, and the strictly lower triangular part of
!>          A is not referenced.  If UPLO = 'L', the leading n by n lower
!>          triangular part of the array A contains the lower triangular
!>          matrix, and the strictly upper triangular part of A is not
!>          referenced.  If DIAG = 'U', the diagonal elements of A are
!>          also not referenced and are assumed to be 1.
!> 
[in]LDA
!>          LDA is INTEGER
!>          The leading dimension of the array A.  LDA >= max (1,N).
!> 
[in,out]X
!>          X is COMPLEX*16 array, dimension (LDX,NRHS)
!>          On entry, the right hand side B of the triangular system.
!>          On exit, X is overwritten by the solution matrix X.
!> 
[in]LDX
!>          LDX is INTEGER
!>          The leading dimension of the array X.  LDX >= max (1,N).
!> 
[out]SCALE
!>          SCALE is DOUBLE PRECISION array, dimension (NRHS)
!>          The scaling factor s(k) is for the triangular system
!>          A * x(:,k) = s(k)*b(:,k)  or  A**T* x(:,k) = s(k)*b(:,k).
!>          If SCALE = 0, the matrix A is singular or badly scaled.
!>          If A(j,j) = 0 is encountered, a non-trivial vector x(:,k)
!>          that is an exact or approximate solution to A*x(:,k) = 0
!>          is returned. If the system so badly scaled that solution
!>          cannot be presented as x(:,k) * 1/s(k), then x(:,k) = 0
!>          is returned.
!> 
[in,out]CNORM
!>          CNORM is DOUBLE PRECISION array, dimension (N)
!>
!>          If NORMIN = 'Y', CNORM is an input argument and CNORM(j)
!>          contains the norm of the off-diagonal part of the j-th column
!>          of A.  If TRANS = 'N', CNORM(j) must be greater than or equal
!>          to the infinity-norm, and if TRANS = 'T' or 'C', CNORM(j)
!>          must be greater than or equal to the 1-norm.
!>
!>          If NORMIN = 'N', CNORM is an output argument and CNORM(j)
!>          returns the 1-norm of the offdiagonal part of the j-th column
!>          of A.
!> 
[out]WORK
!>          WORK is DOUBLE PRECISION array, dimension (LWORK).
!>          On exit, if INFO = 0, WORK(1) returns the optimal size of
!>          WORK.
!> 
[in]LWORK
!>          LWORK is INTEGER
!>          The dimension of the array WORK.
!>
!>          If MIN(N,NRHS) = 0, LWORK >= 1, else
!>          LWORK >= MAX(1, 2*NBA * MAX(NBA, MIN(NRHS, 32)), where
!>          NBA = (N + NB - 1)/NB and NB is the optimal block size.
!>
!>          If LWORK = -1, then a workspace query is assumed; the routine
!>          only calculates the optimal dimensions of the WORK array, returns
!>          this value as the first entry of the WORK array, and no error
!>          message related to LWORK is issued by XERBLA.
!> 
[out]INFO
!>          INFO is INTEGER
!>          = 0:  successful exit
!>          < 0:  if INFO = -k, the k-th argument had an illegal value
!> 
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:

Definition at line 233 of file zlatrs3.f.

235 IMPLICIT NONE
236*
237* .. Scalar Arguments ..
238 CHARACTER DIAG, TRANS, NORMIN, UPLO
239 INTEGER INFO, LDA, LWORK, LDX, N, NRHS
240* ..
241* .. Array Arguments ..
242 COMPLEX*16 A( LDA, * ), X( LDX, * )
243 DOUBLE PRECISION CNORM( * ), SCALE( * ), WORK( * )
244* ..
245*
246* =====================================================================
247*
248* .. Parameters ..
249 DOUBLE PRECISION ZERO, ONE
250 parameter( zero = 0.0d+0, one = 1.0d+0 )
251 COMPLEX*16 CZERO, CONE
252 parameter( cone = ( 1.0d+0, 0.0d+0 ) )
253 parameter( czero = ( 0.0d+0, 0.0d+0 ) )
254 INTEGER NBMAX, NBMIN, NBRHS, NRHSMIN
255 parameter( nrhsmin = 2, nbrhs = 32 )
256 parameter( nbmin = 8, nbmax = 64 )
257* ..
258* .. Local Arrays ..
259 DOUBLE PRECISION W( NBMAX ), XNRM( NBRHS )
260* ..
261* .. Local Scalars ..
262 LOGICAL LQUERY, NOTRAN, NOUNIT, UPPER
263 INTEGER AWRK, I, IFIRST, IINC, ILAST, II, I1, I2, J,
264 $ JFIRST, JINC, JLAST, J1, J2, K, KK, K1, K2,
265 $ LANRM, LDS, LSCALE, NB, NBA, NBX, RHS, LWMIN
266 DOUBLE PRECISION ANRM, BIGNUM, BNRM, RSCAL, SCAL, SCALOC,
267 $ SCAMIN, SMLNUM, TMAX
268* ..
269* .. External Functions ..
270 LOGICAL LSAME
271 INTEGER ILAENV
272 DOUBLE PRECISION DLAMCH, ZLANGE, DLARMM
273 EXTERNAL ilaenv, lsame, dlamch, zlange,
274 $ dlarmm
275* ..
276* .. External Subroutines ..
277 EXTERNAL zlatrs, zdscal, xerbla
278* ..
279* .. Intrinsic Functions ..
280 INTRINSIC abs, max, min
281* ..
282* .. Executable Statements ..
283*
284 info = 0
285 upper = lsame( uplo, 'U' )
286 notran = lsame( trans, 'N' )
287 nounit = lsame( diag, 'N' )
288 lquery = ( lwork.EQ.-1 )
289*
290* Partition A and X into blocks.
291*
292 nb = max( nbmin, ilaenv( 1, 'ZLATRS', '', n, n, -1, -1 ) )
293 nb = min( nbmax, nb )
294 nba = max( 1, (n + nb - 1) / nb )
295 nbx = max( 1, (nrhs + nbrhs - 1) / nbrhs )
296*
297* Compute the workspace
298*
299* The workspace comprises two parts.
300* The first part stores the local scale factors. Each simultaneously
301* computed right-hand side requires one local scale factor per block
302* row. WORK( I + KK * LDS ) is the scale factor of the vector
303* segment associated with the I-th block row and the KK-th vector
304* in the block column.
305*
306 lscale = nba * max( nba, min( nrhs, nbrhs ) )
307 lds = nba
308*
309* The second part stores upper bounds of the triangular A. There are
310* a total of NBA x NBA blocks, of which only the upper triangular
311* part or the lower triangular part is referenced. The upper bound of
312* the block A( I, J ) is stored as WORK( AWRK + I + J * NBA ).
313*
314 lanrm = nba * nba
315 awrk = lscale
316*
317 IF( min( n, nrhs ).EQ.0 ) THEN
318 lwmin = 1
319 ELSE
320 lwmin = lscale + lanrm
321 END IF
322 work( 1 ) = lwmin
323*
324* Test the input parameters.
325*
326 IF( .NOT.upper .AND. .NOT.lsame( uplo, 'L' ) ) THEN
327 info = -1
328 ELSE IF( .NOT.notran .AND. .NOT.lsame( trans, 'T' ) .AND. .NOT.
329 $ lsame( trans, 'C' ) ) THEN
330 info = -2
331 ELSE IF( .NOT.nounit .AND. .NOT.lsame( diag, 'U' ) ) THEN
332 info = -3
333 ELSE IF( .NOT.lsame( normin, 'Y' ) .AND. .NOT.
334 $ lsame( normin, 'N' ) ) THEN
335 info = -4
336 ELSE IF( n.LT.0 ) THEN
337 info = -5
338 ELSE IF( nrhs.LT.0 ) THEN
339 info = -6
340 ELSE IF( lda.LT.max( 1, n ) ) THEN
341 info = -8
342 ELSE IF( ldx.LT.max( 1, n ) ) THEN
343 info = -10
344 ELSE IF( .NOT.lquery .AND. lwork.LT.lwmin ) THEN
345 info = -14
346 END IF
347 IF( info.NE.0 ) THEN
348 CALL xerbla( 'ZLATRS3', -info )
349 RETURN
350 ELSE IF( lquery ) THEN
351 RETURN
352 END IF
353*
354* Initialize scaling factors
355*
356 DO kk = 1, nrhs
357 scale( kk ) = one
358 END DO
359*
360* Quick return if possible
361*
362 IF( min( n, nrhs ).EQ.0 )
363 $ RETURN
364*
365* Determine machine dependent constant to control overflow.
366*
367 bignum = dlamch( 'Overflow' )
368 smlnum = dlamch( 'Safe Minimum' )
369*
370* Use unblocked code for small problems
371*
372 IF( nrhs.LT.nrhsmin ) THEN
373 CALL zlatrs( uplo, trans, diag, normin, n, a, lda, x( 1, 1),
374 $ scale( 1 ), cnorm, info )
375 DO k = 2, nrhs
376 CALL zlatrs( uplo, trans, diag, 'Y', n, a, lda, x( 1,
377 $ k ),
378 $ scale( k ), cnorm, info )
379 END DO
380 RETURN
381 END IF
382*
383* Compute norms of blocks of A excluding diagonal blocks and find
384* the block with the largest norm TMAX.
385*
386 tmax = zero
387 DO j = 1, nba
388 j1 = (j-1)*nb + 1
389 j2 = min( j*nb, n ) + 1
390 IF ( upper ) THEN
391 ifirst = 1
392 ilast = j - 1
393 ELSE
394 ifirst = j + 1
395 ilast = nba
396 END IF
397 DO i = ifirst, ilast
398 i1 = (i-1)*nb + 1
399 i2 = min( i*nb, n ) + 1
400*
401* Compute upper bound of A( I1:I2-1, J1:J2-1 ).
402*
403 IF( notran ) THEN
404 anrm = zlange( 'I', i2-i1, j2-j1, a( i1, j1 ), lda,
405 $ w )
406 work( awrk + i+(j-1)*nba ) = anrm
407 ELSE
408 anrm = zlange( '1', i2-i1, j2-j1, a( i1, j1 ), lda,
409 $ w )
410 work( awrk + j+(i-1) * nba ) = anrm
411 END IF
412 tmax = max( tmax, anrm )
413 END DO
414 END DO
415*
416 IF( .NOT. tmax.LE.dlamch('Overflow') ) THEN
417*
418* Some matrix entries have huge absolute value. At least one upper
419* bound norm( A(I1:I2-1, J1:J2-1), 'I') is not a valid floating-point
420* number, either due to overflow in LANGE or due to Inf in A.
421* Fall back to LATRS. Set normin = 'N' for every right-hand side to
422* force computation of TSCAL in LATRS to avoid the likely overflow
423* in the computation of the column norms CNORM.
424*
425 DO k = 1, nrhs
426 CALL zlatrs( uplo, trans, diag, 'N', n, a, lda, x( 1,
427 $ k ),
428 $ scale( k ), cnorm, info )
429 END DO
430 RETURN
431 END IF
432*
433* Every right-hand side requires workspace to store NBA local scale
434* factors. To save workspace, X is computed successively in block columns
435* of width NBRHS, requiring a total of NBA x NBRHS space. If sufficient
436* workspace is available, larger values of NBRHS or NBRHS = NRHS are viable.
437 DO k = 1, nbx
438* Loop over block columns (index = K) of X and, for column-wise scalings,
439* over individual columns (index = KK).
440* K1: column index of the first column in X( J, K )
441* K2: column index of the first column in X( J, K+1 )
442* so the K2 - K1 is the column count of the block X( J, K )
443 k1 = (k-1)*nbrhs + 1
444 k2 = min( k*nbrhs, nrhs ) + 1
445*
446* Initialize local scaling factors of current block column X( J, K )
447*
448 DO kk = 1, k2 - k1
449 DO i = 1, nba
450 work( i+kk*lds ) = one
451 END DO
452 END DO
453*
454 IF( notran ) THEN
455*
456* Solve A * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
457*
458 IF( upper ) THEN
459 jfirst = nba
460 jlast = 1
461 jinc = -1
462 ELSE
463 jfirst = 1
464 jlast = nba
465 jinc = 1
466 END IF
467 ELSE
468*
469* Solve op(A) * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
470* where op(A) = A**T or op(A) = A**H
471*
472 IF( upper ) THEN
473 jfirst = 1
474 jlast = nba
475 jinc = 1
476 ELSE
477 jfirst = nba
478 jlast = 1
479 jinc = -1
480 END IF
481 END IF
482
483 DO j = jfirst, jlast, jinc
484* J1: row index of the first row in A( J, J )
485* J2: row index of the first row in A( J+1, J+1 )
486* so that J2 - J1 is the row count of the block A( J, J )
487 j1 = (j-1)*nb + 1
488 j2 = min( j*nb, n ) + 1
489*
490* Solve op(A( J, J )) * X( J, RHS ) = SCALOC * B( J, RHS )
491*
492 DO kk = 1, k2 - k1
493 rhs = k1 + kk - 1
494 IF( kk.EQ.1 ) THEN
495 CALL zlatrs( uplo, trans, diag, 'N', j2-j1,
496 $ a( j1, j1 ), lda, x( j1, rhs ),
497 $ scaloc, cnorm, info )
498 ELSE
499 CALL zlatrs( uplo, trans, diag, 'Y', j2-j1,
500 $ a( j1, j1 ), lda, x( j1, rhs ),
501 $ scaloc, cnorm, info )
502 END IF
503* Find largest absolute value entry in the vector segment
504* X( J1:J2-1, RHS ) as an upper bound for the worst case
505* growth in the linear updates.
506 xnrm( kk ) = zlange( 'I', j2-j1, 1, x( j1, rhs ),
507 $ ldx, w )
508*
509 IF( scaloc .EQ. zero ) THEN
510* LATRS found that A is singular through A(j,j) = 0.
511* Reset the computation x(1:n) = 0, x(j) = 1, SCALE = 0
512* and compute op(A)*x = 0. Note that X(J1:J2-1, KK) is
513* set by LATRS.
514 scale( rhs ) = zero
515 DO ii = 1, j1-1
516 x( ii, kk ) = czero
517 END DO
518 DO ii = j2, n
519 x( ii, kk ) = czero
520 END DO
521* Discard the local scale factors.
522 DO ii = 1, nba
523 work( ii+kk*lds ) = one
524 END DO
525 scaloc = one
526 ELSE IF( scaloc*work( j+kk*lds ) .EQ. zero ) THEN
527* LATRS computed a valid scale factor, but combined with
528* the current scaling the solution does not have a
529* scale factor > 0.
530*
531* Set WORK( J+KK*LDS ) to smallest valid scale
532* factor and increase SCALOC accordingly.
533 scal = work( j+kk*lds ) / smlnum
534 scaloc = scaloc * scal
535 work( j+kk*lds ) = smlnum
536* If LATRS overestimated the growth, x may be
537* rescaled to preserve a valid combined scale
538* factor WORK( J, KK ) > 0.
539 rscal = one / scaloc
540 IF( xnrm( kk )*rscal .LE. bignum ) THEN
541 xnrm( kk ) = xnrm( kk ) * rscal
542 CALL zdscal( j2-j1, rscal, x( j1, rhs ), 1 )
543 scaloc = one
544 ELSE
545* The system op(A) * x = b is badly scaled and its
546* solution cannot be represented as (1/scale) * x.
547* Set x to zero. This approach deviates from LATRS
548* where a completely meaningless non-zero vector
549* is returned that is not a solution to op(A) * x = b.
550 scale( rhs ) = zero
551 DO ii = 1, n
552 x( ii, kk ) = czero
553 END DO
554* Discard the local scale factors.
555 DO ii = 1, nba
556 work( ii+kk*lds ) = one
557 END DO
558 scaloc = one
559 END IF
560 END IF
561 scaloc = scaloc * work( j+kk*lds )
562 work( j+kk*lds ) = scaloc
563 END DO
564*
565* Linear block updates
566*
567 IF( notran ) THEN
568 IF( upper ) THEN
569 ifirst = j - 1
570 ilast = 1
571 iinc = -1
572 ELSE
573 ifirst = j + 1
574 ilast = nba
575 iinc = 1
576 END IF
577 ELSE
578 IF( upper ) THEN
579 ifirst = j + 1
580 ilast = nba
581 iinc = 1
582 ELSE
583 ifirst = j - 1
584 ilast = 1
585 iinc = -1
586 END IF
587 END IF
588*
589 DO i = ifirst, ilast, iinc
590* I1: row index of the first column in X( I, K )
591* I2: row index of the first column in X( I+1, K )
592* so the I2 - I1 is the row count of the block X( I, K )
593 i1 = (i-1)*nb + 1
594 i2 = min( i*nb, n ) + 1
595*
596* Prepare the linear update to be executed with GEMM.
597* For each column, compute a consistent scaling, a
598* scaling factor to survive the linear update, and
599* rescale the column segments, if necessary. Then
600* the linear update is safely executed.
601*
602 DO kk = 1, k2 - k1
603 rhs = k1 + kk - 1
604* Compute consistent scaling
605 scamin = min( work( i+kk*lds), work( j+kk*lds ) )
606*
607* Compute scaling factor to survive the linear update
608* simulating consistent scaling.
609*
610 bnrm = zlange( 'I', i2-i1, 1, x( i1, rhs ), ldx,
611 $ w )
612 bnrm = bnrm*( scamin / work( i+kk*lds ) )
613 xnrm( kk ) = xnrm( kk )*( scamin / work( j+kk*lds) )
614 anrm = work( awrk + i+(j-1)*nba )
615 scaloc = dlarmm( anrm, xnrm( kk ), bnrm )
616*
617* Simultaneously apply the robust update factor and the
618* consistency scaling factor to X( I, KK ) and X( J, KK ).
619*
620 scal = ( scamin / work( i+kk*lds) )*scaloc
621 IF( scal.NE.one ) THEN
622 CALL zdscal( i2-i1, scal, x( i1, rhs ), 1 )
623 work( i+kk*lds ) = scamin*scaloc
624 END IF
625*
626 scal = ( scamin / work( j+kk*lds ) )*scaloc
627 IF( scal.NE.one ) THEN
628 CALL zdscal( j2-j1, scal, x( j1, rhs ), 1 )
629 work( j+kk*lds ) = scamin*scaloc
630 END IF
631 END DO
632*
633 IF( notran ) THEN
634*
635* B( I, K ) := B( I, K ) - A( I, J ) * X( J, K )
636*
637 CALL zgemm( 'N', 'N', i2-i1, k2-k1, j2-j1, -cone,
638 $ a( i1, j1 ), lda, x( j1, k1 ), ldx,
639 $ cone, x( i1, k1 ), ldx )
640 ELSE IF( lsame( trans, 'T' ) ) THEN
641*
642* B( I, K ) := B( I, K ) - A( I, J )**T * X( J, K )
643*
644 CALL zgemm( 'T', 'N', i2-i1, k2-k1, j2-j1, -cone,
645 $ a( j1, i1 ), lda, x( j1, k1 ), ldx,
646 $ cone, x( i1, k1 ), ldx )
647 ELSE
648*
649* B( I, K ) := B( I, K ) - A( I, J )**H * X( J, K )
650*
651 CALL zgemm( 'C', 'N', i2-i1, k2-k1, j2-j1, -cone,
652 $ a( j1, i1 ), lda, x( j1, k1 ), ldx,
653 $ cone, x( i1, k1 ), ldx )
654 END IF
655 END DO
656 END DO
657
658*
659* Reduce local scaling factors
660*
661 DO kk = 1, k2 - k1
662 rhs = k1 + kk - 1
663 DO i = 1, nba
664 scale( rhs ) = min( scale( rhs ), work( i+kk*lds ) )
665 END DO
666 END DO
667*
668* Realize consistent scaling
669*
670 DO kk = 1, k2 - k1
671 rhs = k1 + kk - 1
672 IF( scale( rhs ).NE.one .AND. scale( rhs ).NE. zero ) THEN
673 DO i = 1, nba
674 i1 = (i - 1) * nb + 1
675 i2 = min( i * nb, n ) + 1
676 scal = scale( rhs ) / work( i+kk*lds )
677 IF( scal.NE.one )
678 $ CALL zdscal( i2-i1, scal, x( i1, rhs ), 1 )
679 END DO
680 END IF
681 END DO
682 END DO
683 RETURN
684*
685* End of ZLATRS3
686*
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine zgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
ZGEMM
Definition zgemm.f:188
integer function ilaenv(ispec, name, opts, n1, n2, n3, n4)
ILAENV
Definition ilaenv.f:160
double precision function dlamch(cmach)
DLAMCH
Definition dlamch.f:69
double precision function zlange(norm, m, n, a, lda, work)
ZLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition zlange.f:113
double precision function dlarmm(anorm, bnorm, cnorm)
DLARMM
Definition dlarmm.f:61
subroutine zlatrs(uplo, trans, diag, normin, n, a, lda, x, scale, cnorm, info)
ZLATRS solves a triangular system of equations with the scale factor set to prevent overflow.
Definition zlatrs.f:238
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
subroutine zdscal(n, da, zx, incx)
ZDSCAL
Definition zdscal.f:78
Here is the call graph for this function:
Here is the caller graph for this function: