LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ zlatrs3()

subroutine zlatrs3 ( character  uplo,
character  trans,
character  diag,
character  normin,
integer  n,
integer  nrhs,
complex*16, dimension( lda, * )  a,
integer  lda,
complex*16, dimension( ldx, * )  x,
integer  ldx,
double precision, dimension( * )  scale,
double precision, dimension( * )  cnorm,
double precision, dimension( * )  work,
integer  lwork,
integer  info 
)

ZLATRS3 solves a triangular system of equations with the scale factors set to prevent overflow.

Purpose:
 ZLATRS3 solves one of the triangular systems

    A * X = B * diag(scale),  A**T * X = B * diag(scale), or
    A**H * X = B * diag(scale)

 with scaling to prevent overflow.  Here A is an upper or lower
 triangular matrix, A**T denotes the transpose of A, A**H denotes the
 conjugate transpose of A. X and B are n-by-nrhs matrices and scale
 is an nrhs-element vector of scaling factors. A scaling factor scale(j)
 is usually less than or equal to 1, chosen such that X(:,j) is less
 than the overflow threshold. If the matrix A is singular (A(j,j) = 0
 for some j), then a non-trivial solution to A*X = 0 is returned. If
 the system is so badly scaled that the solution cannot be represented
 as (1/scale(k))*X(:,k), then x(:,k) = 0 and scale(k) is returned.

 This is a BLAS-3 version of LATRS for solving several right
 hand sides simultaneously.
Parameters
[in]UPLO
          UPLO is CHARACTER*1
          Specifies whether the matrix A is upper or lower triangular.
          = 'U':  Upper triangular
          = 'L':  Lower triangular
[in]TRANS
          TRANS is CHARACTER*1
          Specifies the operation applied to A.
          = 'N':  Solve A * x = s*b  (No transpose)
          = 'T':  Solve A**T* x = s*b  (Transpose)
          = 'C':  Solve A**T* x = s*b  (Conjugate transpose)
[in]DIAG
          DIAG is CHARACTER*1
          Specifies whether or not the matrix A is unit triangular.
          = 'N':  Non-unit triangular
          = 'U':  Unit triangular
[in]NORMIN
          NORMIN is CHARACTER*1
          Specifies whether CNORM has been set or not.
          = 'Y':  CNORM contains the column norms on entry
          = 'N':  CNORM is not set on entry.  On exit, the norms will
                  be computed and stored in CNORM.
[in]N
          N is INTEGER
          The order of the matrix A.  N >= 0.
[in]NRHS
          NRHS is INTEGER
          The number of columns of X.  NRHS >= 0.
[in]A
          A is COMPLEX*16 array, dimension (LDA,N)
          The triangular matrix A.  If UPLO = 'U', the leading n by n
          upper triangular part of the array A contains the upper
          triangular matrix, and the strictly lower triangular part of
          A is not referenced.  If UPLO = 'L', the leading n by n lower
          triangular part of the array A contains the lower triangular
          matrix, and the strictly upper triangular part of A is not
          referenced.  If DIAG = 'U', the diagonal elements of A are
          also not referenced and are assumed to be 1.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max (1,N).
[in,out]X
          X is COMPLEX*16 array, dimension (LDX,NRHS)
          On entry, the right hand side B of the triangular system.
          On exit, X is overwritten by the solution matrix X.
[in]LDX
          LDX is INTEGER
          The leading dimension of the array X.  LDX >= max (1,N).
[out]SCALE
          SCALE is DOUBLE PRECISION array, dimension (NRHS)
          The scaling factor s(k) is for the triangular system
          A * x(:,k) = s(k)*b(:,k)  or  A**T* x(:,k) = s(k)*b(:,k).
          If SCALE = 0, the matrix A is singular or badly scaled.
          If A(j,j) = 0 is encountered, a non-trivial vector x(:,k)
          that is an exact or approximate solution to A*x(:,k) = 0
          is returned. If the system so badly scaled that solution
          cannot be presented as x(:,k) * 1/s(k), then x(:,k) = 0
          is returned.
[in,out]CNORM
          CNORM is DOUBLE PRECISION array, dimension (N)

          If NORMIN = 'Y', CNORM is an input argument and CNORM(j)
          contains the norm of the off-diagonal part of the j-th column
          of A.  If TRANS = 'N', CNORM(j) must be greater than or equal
          to the infinity-norm, and if TRANS = 'T' or 'C', CNORM(j)
          must be greater than or equal to the 1-norm.

          If NORMIN = 'N', CNORM is an output argument and CNORM(j)
          returns the 1-norm of the offdiagonal part of the j-th column
          of A.
[out]WORK
          WORK is DOUBLE PRECISION array, dimension (LWORK).
          On exit, if INFO = 0, WORK(1) returns the optimal size of
          WORK.
[in]LWORKLWORK is INTEGER LWORK >= MAX(1, 2*NBA * MAX(NBA, MIN(NRHS, 32)), where NBA = (N + NB - 1)/NB and NB is the optimal block size.

If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal dimensions of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA.

Parameters
[out]INFO
          INFO is INTEGER
          = 0:  successful exit
          < 0:  if INFO = -k, the k-th argument had an illegal value
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:

Definition at line 228 of file zlatrs3.f.

230 IMPLICIT NONE
231*
232* .. Scalar Arguments ..
233 CHARACTER DIAG, TRANS, NORMIN, UPLO
234 INTEGER INFO, LDA, LWORK, LDX, N, NRHS
235* ..
236* .. Array Arguments ..
237 COMPLEX*16 A( LDA, * ), X( LDX, * )
238 DOUBLE PRECISION CNORM( * ), SCALE( * ), WORK( * )
239* ..
240*
241* =====================================================================
242*
243* .. Parameters ..
244 DOUBLE PRECISION ZERO, ONE
245 parameter( zero = 0.0d+0, one = 1.0d+0 )
246 COMPLEX*16 CZERO, CONE
247 parameter( cone = ( 1.0d+0, 0.0d+0 ) )
248 parameter( czero = ( 0.0d+0, 0.0d+0 ) )
249 INTEGER NBMAX, NBMIN, NBRHS, NRHSMIN
250 parameter( nrhsmin = 2, nbrhs = 32 )
251 parameter( nbmin = 8, nbmax = 64 )
252* ..
253* .. Local Arrays ..
254 DOUBLE PRECISION W( NBMAX ), XNRM( NBRHS )
255* ..
256* .. Local Scalars ..
257 LOGICAL LQUERY, NOTRAN, NOUNIT, UPPER
258 INTEGER AWRK, I, IFIRST, IINC, ILAST, II, I1, I2, J,
259 $ JFIRST, JINC, JLAST, J1, J2, K, KK, K1, K2,
260 $ LANRM, LDS, LSCALE, NB, NBA, NBX, RHS
261 DOUBLE PRECISION ANRM, BIGNUM, BNRM, RSCAL, SCAL, SCALOC,
262 $ SCAMIN, SMLNUM, TMAX
263* ..
264* .. External Functions ..
265 LOGICAL LSAME
266 INTEGER ILAENV
267 DOUBLE PRECISION DLAMCH, ZLANGE, DLARMM
268 EXTERNAL ilaenv, lsame, dlamch, zlange, dlarmm
269* ..
270* .. External Subroutines ..
271 EXTERNAL zlatrs, zdscal, xerbla
272* ..
273* .. Intrinsic Functions ..
274 INTRINSIC abs, max, min
275* ..
276* .. Executable Statements ..
277*
278 info = 0
279 upper = lsame( uplo, 'U' )
280 notran = lsame( trans, 'N' )
281 nounit = lsame( diag, 'N' )
282 lquery = ( lwork.EQ.-1 )
283*
284* Partition A and X into blocks.
285*
286 nb = max( nbmin, ilaenv( 1, 'ZLATRS', '', n, n, -1, -1 ) )
287 nb = min( nbmax, nb )
288 nba = max( 1, (n + nb - 1) / nb )
289 nbx = max( 1, (nrhs + nbrhs - 1) / nbrhs )
290*
291* Compute the workspace
292*
293* The workspace comprises two parts.
294* The first part stores the local scale factors. Each simultaneously
295* computed right-hand side requires one local scale factor per block
296* row. WORK( I + KK * LDS ) is the scale factor of the vector
297* segment associated with the I-th block row and the KK-th vector
298* in the block column.
299 lscale = nba * max( nba, min( nrhs, nbrhs ) )
300 lds = nba
301* The second part stores upper bounds of the triangular A. There are
302* a total of NBA x NBA blocks, of which only the upper triangular
303* part or the lower triangular part is referenced. The upper bound of
304* the block A( I, J ) is stored as WORK( AWRK + I + J * NBA ).
305 lanrm = nba * nba
306 awrk = lscale
307 work( 1 ) = lscale + lanrm
308*
309* Test the input parameters.
310*
311 IF( .NOT.upper .AND. .NOT.lsame( uplo, 'L' ) ) THEN
312 info = -1
313 ELSE IF( .NOT.notran .AND. .NOT.lsame( trans, 'T' ) .AND. .NOT.
314 $ lsame( trans, 'C' ) ) THEN
315 info = -2
316 ELSE IF( .NOT.nounit .AND. .NOT.lsame( diag, 'U' ) ) THEN
317 info = -3
318 ELSE IF( .NOT.lsame( normin, 'Y' ) .AND. .NOT.
319 $ lsame( normin, 'N' ) ) THEN
320 info = -4
321 ELSE IF( n.LT.0 ) THEN
322 info = -5
323 ELSE IF( nrhs.LT.0 ) THEN
324 info = -6
325 ELSE IF( lda.LT.max( 1, n ) ) THEN
326 info = -8
327 ELSE IF( ldx.LT.max( 1, n ) ) THEN
328 info = -10
329 ELSE IF( .NOT.lquery .AND. lwork.LT.work( 1 ) ) THEN
330 info = -14
331 END IF
332 IF( info.NE.0 ) THEN
333 CALL xerbla( 'ZLATRS3', -info )
334 RETURN
335 ELSE IF( lquery ) THEN
336 RETURN
337 END IF
338*
339* Initialize scaling factors
340*
341 DO kk = 1, nrhs
342 scale( kk ) = one
343 END DO
344*
345* Quick return if possible
346*
347 IF( min( n, nrhs ).EQ.0 )
348 $ RETURN
349*
350* Determine machine dependent constant to control overflow.
351*
352 bignum = dlamch( 'Overflow' )
353 smlnum = dlamch( 'Safe Minimum' )
354*
355* Use unblocked code for small problems
356*
357 IF( nrhs.LT.nrhsmin ) THEN
358 CALL zlatrs( uplo, trans, diag, normin, n, a, lda, x( 1, 1),
359 $ scale( 1 ), cnorm, info )
360 DO k = 2, nrhs
361 CALL zlatrs( uplo, trans, diag, 'Y', n, a, lda, x( 1, k ),
362 $ scale( k ), cnorm, info )
363 END DO
364 RETURN
365 END IF
366*
367* Compute norms of blocks of A excluding diagonal blocks and find
368* the block with the largest norm TMAX.
369*
370 tmax = zero
371 DO j = 1, nba
372 j1 = (j-1)*nb + 1
373 j2 = min( j*nb, n ) + 1
374 IF ( upper ) THEN
375 ifirst = 1
376 ilast = j - 1
377 ELSE
378 ifirst = j + 1
379 ilast = nba
380 END IF
381 DO i = ifirst, ilast
382 i1 = (i-1)*nb + 1
383 i2 = min( i*nb, n ) + 1
384*
385* Compute upper bound of A( I1:I2-1, J1:J2-1 ).
386*
387 IF( notran ) THEN
388 anrm = zlange( 'I', i2-i1, j2-j1, a( i1, j1 ), lda, w )
389 work( awrk + i+(j-1)*nba ) = anrm
390 ELSE
391 anrm = zlange( '1', i2-i1, j2-j1, a( i1, j1 ), lda, w )
392 work( awrk + j+(i-1) * nba ) = anrm
393 END IF
394 tmax = max( tmax, anrm )
395 END DO
396 END DO
397*
398 IF( .NOT. tmax.LE.dlamch('Overflow') ) THEN
399*
400* Some matrix entries have huge absolute value. At least one upper
401* bound norm( A(I1:I2-1, J1:J2-1), 'I') is not a valid floating-point
402* number, either due to overflow in LANGE or due to Inf in A.
403* Fall back to LATRS. Set normin = 'N' for every right-hand side to
404* force computation of TSCAL in LATRS to avoid the likely overflow
405* in the computation of the column norms CNORM.
406*
407 DO k = 1, nrhs
408 CALL zlatrs( uplo, trans, diag, 'N', n, a, lda, x( 1, k ),
409 $ scale( k ), cnorm, info )
410 END DO
411 RETURN
412 END IF
413*
414* Every right-hand side requires workspace to store NBA local scale
415* factors. To save workspace, X is computed successively in block columns
416* of width NBRHS, requiring a total of NBA x NBRHS space. If sufficient
417* workspace is available, larger values of NBRHS or NBRHS = NRHS are viable.
418 DO k = 1, nbx
419* Loop over block columns (index = K) of X and, for column-wise scalings,
420* over individual columns (index = KK).
421* K1: column index of the first column in X( J, K )
422* K2: column index of the first column in X( J, K+1 )
423* so the K2 - K1 is the column count of the block X( J, K )
424 k1 = (k-1)*nbrhs + 1
425 k2 = min( k*nbrhs, nrhs ) + 1
426*
427* Initialize local scaling factors of current block column X( J, K )
428*
429 DO kk = 1, k2 - k1
430 DO i = 1, nba
431 work( i+kk*lds ) = one
432 END DO
433 END DO
434*
435 IF( notran ) THEN
436*
437* Solve A * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
438*
439 IF( upper ) THEN
440 jfirst = nba
441 jlast = 1
442 jinc = -1
443 ELSE
444 jfirst = 1
445 jlast = nba
446 jinc = 1
447 END IF
448 ELSE
449*
450* Solve op(A) * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
451* where op(A) = A**T or op(A) = A**H
452*
453 IF( upper ) THEN
454 jfirst = 1
455 jlast = nba
456 jinc = 1
457 ELSE
458 jfirst = nba
459 jlast = 1
460 jinc = -1
461 END IF
462 END IF
463
464 DO j = jfirst, jlast, jinc
465* J1: row index of the first row in A( J, J )
466* J2: row index of the first row in A( J+1, J+1 )
467* so that J2 - J1 is the row count of the block A( J, J )
468 j1 = (j-1)*nb + 1
469 j2 = min( j*nb, n ) + 1
470*
471* Solve op(A( J, J )) * X( J, RHS ) = SCALOC * B( J, RHS )
472*
473 DO kk = 1, k2 - k1
474 rhs = k1 + kk - 1
475 IF( kk.EQ.1 ) THEN
476 CALL zlatrs( uplo, trans, diag, 'N', j2-j1,
477 $ a( j1, j1 ), lda, x( j1, rhs ),
478 $ scaloc, cnorm, info )
479 ELSE
480 CALL zlatrs( uplo, trans, diag, 'Y', j2-j1,
481 $ a( j1, j1 ), lda, x( j1, rhs ),
482 $ scaloc, cnorm, info )
483 END IF
484* Find largest absolute value entry in the vector segment
485* X( J1:J2-1, RHS ) as an upper bound for the worst case
486* growth in the linear updates.
487 xnrm( kk ) = zlange( 'I', j2-j1, 1, x( j1, rhs ),
488 $ ldx, w )
489*
490 IF( scaloc .EQ. zero ) THEN
491* LATRS found that A is singular through A(j,j) = 0.
492* Reset the computation x(1:n) = 0, x(j) = 1, SCALE = 0
493* and compute op(A)*x = 0. Note that X(J1:J2-1, KK) is
494* set by LATRS.
495 scale( rhs ) = zero
496 DO ii = 1, j1-1
497 x( ii, kk ) = czero
498 END DO
499 DO ii = j2, n
500 x( ii, kk ) = czero
501 END DO
502* Discard the local scale factors.
503 DO ii = 1, nba
504 work( ii+kk*lds ) = one
505 END DO
506 scaloc = one
507 ELSE IF( scaloc*work( j+kk*lds ) .EQ. zero ) THEN
508* LATRS computed a valid scale factor, but combined with
509* the current scaling the solution does not have a
510* scale factor > 0.
511*
512* Set WORK( J+KK*LDS ) to smallest valid scale
513* factor and increase SCALOC accordingly.
514 scal = work( j+kk*lds ) / smlnum
515 scaloc = scaloc * scal
516 work( j+kk*lds ) = smlnum
517* If LATRS overestimated the growth, x may be
518* rescaled to preserve a valid combined scale
519* factor WORK( J, KK ) > 0.
520 rscal = one / scaloc
521 IF( xnrm( kk )*rscal .LE. bignum ) THEN
522 xnrm( kk ) = xnrm( kk ) * rscal
523 CALL zdscal( j2-j1, rscal, x( j1, rhs ), 1 )
524 scaloc = one
525 ELSE
526* The system op(A) * x = b is badly scaled and its
527* solution cannot be represented as (1/scale) * x.
528* Set x to zero. This approach deviates from LATRS
529* where a completely meaningless non-zero vector
530* is returned that is not a solution to op(A) * x = b.
531 scale( rhs ) = zero
532 DO ii = 1, n
533 x( ii, kk ) = czero
534 END DO
535* Discard the local scale factors.
536 DO ii = 1, nba
537 work( ii+kk*lds ) = one
538 END DO
539 scaloc = one
540 END IF
541 END IF
542 scaloc = scaloc * work( j+kk*lds )
543 work( j+kk*lds ) = scaloc
544 END DO
545*
546* Linear block updates
547*
548 IF( notran ) THEN
549 IF( upper ) THEN
550 ifirst = j - 1
551 ilast = 1
552 iinc = -1
553 ELSE
554 ifirst = j + 1
555 ilast = nba
556 iinc = 1
557 END IF
558 ELSE
559 IF( upper ) THEN
560 ifirst = j + 1
561 ilast = nba
562 iinc = 1
563 ELSE
564 ifirst = j - 1
565 ilast = 1
566 iinc = -1
567 END IF
568 END IF
569*
570 DO i = ifirst, ilast, iinc
571* I1: row index of the first column in X( I, K )
572* I2: row index of the first column in X( I+1, K )
573* so the I2 - I1 is the row count of the block X( I, K )
574 i1 = (i-1)*nb + 1
575 i2 = min( i*nb, n ) + 1
576*
577* Prepare the linear update to be executed with GEMM.
578* For each column, compute a consistent scaling, a
579* scaling factor to survive the linear update, and
580* rescale the column segments, if necessary. Then
581* the linear update is safely executed.
582*
583 DO kk = 1, k2 - k1
584 rhs = k1 + kk - 1
585* Compute consistent scaling
586 scamin = min( work( i+kk*lds), work( j+kk*lds ) )
587*
588* Compute scaling factor to survive the linear update
589* simulating consistent scaling.
590*
591 bnrm = zlange( 'I', i2-i1, 1, x( i1, rhs ), ldx, w )
592 bnrm = bnrm*( scamin / work( i+kk*lds ) )
593 xnrm( kk ) = xnrm( kk )*( scamin / work( j+kk*lds) )
594 anrm = work( awrk + i+(j-1)*nba )
595 scaloc = dlarmm( anrm, xnrm( kk ), bnrm )
596*
597* Simultaneously apply the robust update factor and the
598* consistency scaling factor to X( I, KK ) and X( J, KK ).
599*
600 scal = ( scamin / work( i+kk*lds) )*scaloc
601 IF( scal.NE.one ) THEN
602 CALL zdscal( i2-i1, scal, x( i1, rhs ), 1 )
603 work( i+kk*lds ) = scamin*scaloc
604 END IF
605*
606 scal = ( scamin / work( j+kk*lds ) )*scaloc
607 IF( scal.NE.one ) THEN
608 CALL zdscal( j2-j1, scal, x( j1, rhs ), 1 )
609 work( j+kk*lds ) = scamin*scaloc
610 END IF
611 END DO
612*
613 IF( notran ) THEN
614*
615* B( I, K ) := B( I, K ) - A( I, J ) * X( J, K )
616*
617 CALL zgemm( 'N', 'N', i2-i1, k2-k1, j2-j1, -cone,
618 $ a( i1, j1 ), lda, x( j1, k1 ), ldx,
619 $ cone, x( i1, k1 ), ldx )
620 ELSE IF( lsame( trans, 'T' ) ) THEN
621*
622* B( I, K ) := B( I, K ) - A( I, J )**T * X( J, K )
623*
624 CALL zgemm( 'T', 'N', i2-i1, k2-k1, j2-j1, -cone,
625 $ a( j1, i1 ), lda, x( j1, k1 ), ldx,
626 $ cone, x( i1, k1 ), ldx )
627 ELSE
628*
629* B( I, K ) := B( I, K ) - A( I, J )**H * X( J, K )
630*
631 CALL zgemm( 'C', 'N', i2-i1, k2-k1, j2-j1, -cone,
632 $ a( j1, i1 ), lda, x( j1, k1 ), ldx,
633 $ cone, x( i1, k1 ), ldx )
634 END IF
635 END DO
636 END DO
637
638*
639* Reduce local scaling factors
640*
641 DO kk = 1, k2 - k1
642 rhs = k1 + kk - 1
643 DO i = 1, nba
644 scale( rhs ) = min( scale( rhs ), work( i+kk*lds ) )
645 END DO
646 END DO
647*
648* Realize consistent scaling
649*
650 DO kk = 1, k2 - k1
651 rhs = k1 + kk - 1
652 IF( scale( rhs ).NE.one .AND. scale( rhs ).NE. zero ) THEN
653 DO i = 1, nba
654 i1 = (i - 1) * nb + 1
655 i2 = min( i * nb, n ) + 1
656 scal = scale( rhs ) / work( i+kk*lds )
657 IF( scal.NE.one )
658 $ CALL zdscal( i2-i1, scal, x( i1, rhs ), 1 )
659 END DO
660 END IF
661 END DO
662 END DO
663 RETURN
664*
665* End of ZLATRS3
666*
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine zgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
ZGEMM
Definition zgemm.f:188
integer function ilaenv(ispec, name, opts, n1, n2, n3, n4)
ILAENV
Definition ilaenv.f:162
double precision function dlamch(cmach)
DLAMCH
Definition dlamch.f:69
double precision function zlange(norm, m, n, a, lda, work)
ZLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition zlange.f:115
double precision function dlarmm(anorm, bnorm, cnorm)
DLARMM
Definition dlarmm.f:61
subroutine zlatrs(uplo, trans, diag, normin, n, a, lda, x, scale, cnorm, info)
ZLATRS solves a triangular system of equations with the scale factor set to prevent overflow.
Definition zlatrs.f:239
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
subroutine zdscal(n, da, zx, incx)
ZDSCAL
Definition zdscal.f:78
Here is the call graph for this function:
Here is the caller graph for this function: