LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ dlatrs3()

subroutine dlatrs3 ( character  uplo,
character  trans,
character  diag,
character  normin,
integer  n,
integer  nrhs,
double precision, dimension( lda, * )  a,
integer  lda,
double precision, dimension( ldx, * )  x,
integer  ldx,
double precision, dimension( * )  scale,
double precision, dimension( * )  cnorm,
double precision, dimension( * )  work,
integer  lwork,
integer  info 
)

DLATRS3 solves a triangular system of equations with the scale factors set to prevent overflow.

Purpose:
 DLATRS3 solves one of the triangular systems

    A * X = B * diag(scale)  or  A**T * X = B * diag(scale)

 with scaling to prevent overflow.  Here A is an upper or lower
 triangular matrix, A**T denotes the transpose of A. X and B are
 n by nrhs matrices and scale is an nrhs element vector of scaling
 factors. A scaling factor scale(j) is usually less than or equal
 to 1, chosen such that X(:,j) is less than the overflow threshold.
 If the matrix A is singular (A(j,j) = 0 for some j), then
 a non-trivial solution to A*X = 0 is returned. If the system is
 so badly scaled that the solution cannot be represented as
 (1/scale(k))*X(:,k), then x(:,k) = 0 and scale(k) is returned.

 This is a BLAS-3 version of LATRS for solving several right
 hand sides simultaneously.
Parameters
[in]UPLO
          UPLO is CHARACTER*1
          Specifies whether the matrix A is upper or lower triangular.
          = 'U':  Upper triangular
          = 'L':  Lower triangular
[in]TRANS
          TRANS is CHARACTER*1
          Specifies the operation applied to A.
          = 'N':  Solve A * x = s*b  (No transpose)
          = 'T':  Solve A**T* x = s*b  (Transpose)
          = 'C':  Solve A**T* x = s*b  (Conjugate transpose = Transpose)
[in]DIAG
          DIAG is CHARACTER*1
          Specifies whether or not the matrix A is unit triangular.
          = 'N':  Non-unit triangular
          = 'U':  Unit triangular
[in]NORMIN
          NORMIN is CHARACTER*1
          Specifies whether CNORM has been set or not.
          = 'Y':  CNORM contains the column norms on entry
          = 'N':  CNORM is not set on entry.  On exit, the norms will
                  be computed and stored in CNORM.
[in]N
          N is INTEGER
          The order of the matrix A.  N >= 0.
[in]NRHS
          NRHS is INTEGER
          The number of columns of X.  NRHS >= 0.
[in]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          The triangular matrix A.  If UPLO = 'U', the leading n by n
          upper triangular part of the array A contains the upper
          triangular matrix, and the strictly lower triangular part of
          A is not referenced.  If UPLO = 'L', the leading n by n lower
          triangular part of the array A contains the lower triangular
          matrix, and the strictly upper triangular part of A is not
          referenced.  If DIAG = 'U', the diagonal elements of A are
          also not referenced and are assumed to be 1.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max (1,N).
[in,out]X
          X is DOUBLE PRECISION array, dimension (LDX,NRHS)
          On entry, the right hand side B of the triangular system.
          On exit, X is overwritten by the solution matrix X.
[in]LDX
          LDX is INTEGER
          The leading dimension of the array X.  LDX >= max (1,N).
[out]SCALE
          SCALE is DOUBLE PRECISION array, dimension (NRHS)
          The scaling factor s(k) is for the triangular system
          A * x(:,k) = s(k)*b(:,k)  or  A**T* x(:,k) = s(k)*b(:,k).
          If SCALE = 0, the matrix A is singular or badly scaled.
          If A(j,j) = 0 is encountered, a non-trivial vector x(:,k)
          that is an exact or approximate solution to A*x(:,k) = 0
          is returned. If the system so badly scaled that solution
          cannot be presented as x(:,k) * 1/s(k), then x(:,k) = 0
          is returned.
[in,out]CNORM
          CNORM is DOUBLE PRECISION array, dimension (N)

          If NORMIN = 'Y', CNORM is an input argument and CNORM(j)
          contains the norm of the off-diagonal part of the j-th column
          of A.  If TRANS = 'N', CNORM(j) must be greater than or equal
          to the infinity-norm, and if TRANS = 'T' or 'C', CNORM(j)
          must be greater than or equal to the 1-norm.

          If NORMIN = 'N', CNORM is an output argument and CNORM(j)
          returns the 1-norm of the offdiagonal part of the j-th column
          of A.
[out]WORK
          WORK is DOUBLE PRECISION array, dimension (LWORK).
          On exit, if INFO = 0, WORK(1) returns the optimal size of
          WORK.
[in]LWORKLWORK is INTEGER LWORK >= MAX(1, 2*NBA * MAX(NBA, MIN(NRHS, 32)), where NBA = (N + NB - 1)/NB and NB is the optimal block size.

If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal dimensions of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA.

Parameters
[out]INFO
          INFO is INTEGER
          = 0:  successful exit
          < 0:  if INFO = -k, the k-th argument had an illegal value
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:

Definition at line 227 of file dlatrs3.f.

229 IMPLICIT NONE
230*
231* .. Scalar Arguments ..
232 CHARACTER DIAG, TRANS, NORMIN, UPLO
233 INTEGER INFO, LDA, LWORK, LDX, N, NRHS
234* ..
235* .. Array Arguments ..
236 DOUBLE PRECISION A( LDA, * ), CNORM( * ), X( LDX, * ),
237 $ SCALE( * ), WORK( * )
238* ..
239*
240* =====================================================================
241*
242* .. Parameters ..
243 DOUBLE PRECISION ZERO, ONE
244 parameter( zero = 0.0d+0, one = 1.0d+0 )
245 INTEGER NBMAX, NBMIN, NBRHS, NRHSMIN
246 parameter( nrhsmin = 2, nbrhs = 32 )
247 parameter( nbmin = 8, nbmax = 64 )
248* ..
249* .. Local Arrays ..
250 DOUBLE PRECISION W( NBMAX ), XNRM( NBRHS )
251* ..
252* .. Local Scalars ..
253 LOGICAL LQUERY, NOTRAN, NOUNIT, UPPER
254 INTEGER AWRK, I, IFIRST, IINC, ILAST, II, I1, I2, J,
255 $ JFIRST, JINC, JLAST, J1, J2, K, KK, K1, K2,
256 $ LANRM, LDS, LSCALE, NB, NBA, NBX, RHS
257 DOUBLE PRECISION ANRM, BIGNUM, BNRM, RSCAL, SCAL, SCALOC,
258 $ SCAMIN, SMLNUM, TMAX
259* ..
260* .. External Functions ..
261 LOGICAL LSAME
262 INTEGER ILAENV
263 DOUBLE PRECISION DLAMCH, DLANGE, DLARMM
264 EXTERNAL dlamch, dlange, dlarmm, ilaenv, lsame
265* ..
266* .. External Subroutines ..
267 EXTERNAL dlatrs, dscal, xerbla
268* ..
269* .. Intrinsic Functions ..
270 INTRINSIC abs, max, min
271* ..
272* .. Executable Statements ..
273*
274 info = 0
275 upper = lsame( uplo, 'U' )
276 notran = lsame( trans, 'N' )
277 nounit = lsame( diag, 'N' )
278 lquery = ( lwork.EQ.-1 )
279*
280* Partition A and X into blocks
281*
282 nb = max( 8, ilaenv( 1, 'DLATRS', '', n, n, -1, -1 ) )
283 nb = min( nbmax, nb )
284 nba = max( 1, (n + nb - 1) / nb )
285 nbx = max( 1, (nrhs + nbrhs - 1) / nbrhs )
286*
287* Compute the workspace
288*
289* The workspace comprises two parts.
290* The first part stores the local scale factors. Each simultaneously
291* computed right-hand side requires one local scale factor per block
292* row. WORK( I+KK*LDS ) is the scale factor of the vector
293* segment associated with the I-th block row and the KK-th vector
294* in the block column.
295 lscale = nba * max( nba, min( nrhs, nbrhs ) )
296 lds = nba
297* The second part stores upper bounds of the triangular A. There are
298* a total of NBA x NBA blocks, of which only the upper triangular
299* part or the lower triangular part is referenced. The upper bound of
300* the block A( I, J ) is stored as WORK( AWRK + I + J * NBA ).
301 lanrm = nba * nba
302 awrk = lscale
303 work( 1 ) = lscale + lanrm
304*
305* Test the input parameters
306*
307 IF( .NOT.upper .AND. .NOT.lsame( uplo, 'L' ) ) THEN
308 info = -1
309 ELSE IF( .NOT.notran .AND. .NOT.lsame( trans, 'T' ) .AND. .NOT.
310 $ lsame( trans, 'C' ) ) THEN
311 info = -2
312 ELSE IF( .NOT.nounit .AND. .NOT.lsame( diag, 'U' ) ) THEN
313 info = -3
314 ELSE IF( .NOT.lsame( normin, 'Y' ) .AND. .NOT.
315 $ lsame( normin, 'N' ) ) THEN
316 info = -4
317 ELSE IF( n.LT.0 ) THEN
318 info = -5
319 ELSE IF( nrhs.LT.0 ) THEN
320 info = -6
321 ELSE IF( lda.LT.max( 1, n ) ) THEN
322 info = -8
323 ELSE IF( ldx.LT.max( 1, n ) ) THEN
324 info = -10
325 ELSE IF( .NOT.lquery .AND. lwork.LT.work( 1 ) ) THEN
326 info = -14
327 END IF
328 IF( info.NE.0 ) THEN
329 CALL xerbla( 'DLATRS3', -info )
330 RETURN
331 ELSE IF( lquery ) THEN
332 RETURN
333 END IF
334*
335* Initialize scaling factors
336*
337 DO kk = 1, nrhs
338 scale( kk ) = one
339 END DO
340*
341* Quick return if possible
342*
343 IF( min( n, nrhs ).EQ.0 )
344 $ RETURN
345*
346* Determine machine dependent constant to control overflow.
347*
348 bignum = dlamch( 'Overflow' )
349 smlnum = dlamch( 'Safe Minimum' )
350*
351* Use unblocked code for small problems
352*
353 IF( nrhs.LT.nrhsmin ) THEN
354 CALL dlatrs( uplo, trans, diag, normin, n, a, lda, x( 1, 1),
355 $ scale( 1 ), cnorm, info )
356 DO k = 2, nrhs
357 CALL dlatrs( uplo, trans, diag, 'Y', n, a, lda, x( 1, k ),
358 $ scale( k ), cnorm, info )
359 END DO
360 RETURN
361 END IF
362*
363* Compute norms of blocks of A excluding diagonal blocks and find
364* the block with the largest norm TMAX.
365*
366 tmax = zero
367 DO j = 1, nba
368 j1 = (j-1)*nb + 1
369 j2 = min( j*nb, n ) + 1
370 IF ( upper ) THEN
371 ifirst = 1
372 ilast = j - 1
373 ELSE
374 ifirst = j + 1
375 ilast = nba
376 END IF
377 DO i = ifirst, ilast
378 i1 = (i-1)*nb + 1
379 i2 = min( i*nb, n ) + 1
380*
381* Compute upper bound of A( I1:I2-1, J1:J2-1 ).
382*
383 IF( notran ) THEN
384 anrm = dlange( 'I', i2-i1, j2-j1, a( i1, j1 ), lda, w )
385 work( awrk + i+(j-1)*nba ) = anrm
386 ELSE
387 anrm = dlange( '1', i2-i1, j2-j1, a( i1, j1 ), lda, w )
388 work( awrk + j+(i-1)*nba ) = anrm
389 END IF
390 tmax = max( tmax, anrm )
391 END DO
392 END DO
393*
394 IF( .NOT. tmax.LE.dlamch('Overflow') ) THEN
395*
396* Some matrix entries have huge absolute value. At least one upper
397* bound norm( A(I1:I2-1, J1:J2-1), 'I') is not a valid floating-point
398* number, either due to overflow in LANGE or due to Inf in A.
399* Fall back to LATRS. Set normin = 'N' for every right-hand side to
400* force computation of TSCAL in LATRS to avoid the likely overflow
401* in the computation of the column norms CNORM.
402*
403 DO k = 1, nrhs
404 CALL dlatrs( uplo, trans, diag, 'N', n, a, lda, x( 1, k ),
405 $ scale( k ), cnorm, info )
406 END DO
407 RETURN
408 END IF
409*
410* Every right-hand side requires workspace to store NBA local scale
411* factors. To save workspace, X is computed successively in block columns
412* of width NBRHS, requiring a total of NBA x NBRHS space. If sufficient
413* workspace is available, larger values of NBRHS or NBRHS = NRHS are viable.
414 DO k = 1, nbx
415* Loop over block columns (index = K) of X and, for column-wise scalings,
416* over individual columns (index = KK).
417* K1: column index of the first column in X( J, K )
418* K2: column index of the first column in X( J, K+1 )
419* so the K2 - K1 is the column count of the block X( J, K )
420 k1 = (k-1)*nbrhs + 1
421 k2 = min( k*nbrhs, nrhs ) + 1
422*
423* Initialize local scaling factors of current block column X( J, K )
424*
425 DO kk = 1, k2-k1
426 DO i = 1, nba
427 work( i+kk*lds ) = one
428 END DO
429 END DO
430*
431 IF( notran ) THEN
432*
433* Solve A * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
434*
435 IF( upper ) THEN
436 jfirst = nba
437 jlast = 1
438 jinc = -1
439 ELSE
440 jfirst = 1
441 jlast = nba
442 jinc = 1
443 END IF
444 ELSE
445*
446* Solve A**T * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
447*
448 IF( upper ) THEN
449 jfirst = 1
450 jlast = nba
451 jinc = 1
452 ELSE
453 jfirst = nba
454 jlast = 1
455 jinc = -1
456 END IF
457 END IF
458*
459 DO j = jfirst, jlast, jinc
460* J1: row index of the first row in A( J, J )
461* J2: row index of the first row in A( J+1, J+1 )
462* so that J2 - J1 is the row count of the block A( J, J )
463 j1 = (j-1)*nb + 1
464 j2 = min( j*nb, n ) + 1
465*
466* Solve op(A( J, J )) * X( J, RHS ) = SCALOC * B( J, RHS )
467* for all right-hand sides in the current block column,
468* one RHS at a time.
469*
470 DO kk = 1, k2-k1
471 rhs = k1 + kk - 1
472 IF( kk.EQ.1 ) THEN
473 CALL dlatrs( uplo, trans, diag, 'N', j2-j1,
474 $ a( j1, j1 ), lda, x( j1, rhs ),
475 $ scaloc, cnorm, info )
476 ELSE
477 CALL dlatrs( uplo, trans, diag, 'Y', j2-j1,
478 $ a( j1, j1 ), lda, x( j1, rhs ),
479 $ scaloc, cnorm, info )
480 END IF
481* Find largest absolute value entry in the vector segment
482* X( J1:J2-1, RHS ) as an upper bound for the worst case
483* growth in the linear updates.
484 xnrm( kk ) = dlange( 'I', j2-j1, 1, x( j1, rhs ),
485 $ ldx, w )
486*
487 IF( scaloc .EQ. zero ) THEN
488* LATRS found that A is singular through A(j,j) = 0.
489* Reset the computation x(1:n) = 0, x(j) = 1, SCALE = 0
490* and compute A*x = 0 (or A**T*x = 0). Note that
491* X(J1:J2-1, KK) is set by LATRS.
492 scale( rhs ) = zero
493 DO ii = 1, j1-1
494 x( ii, kk ) = zero
495 END DO
496 DO ii = j2, n
497 x( ii, kk ) = zero
498 END DO
499* Discard the local scale factors.
500 DO ii = 1, nba
501 work( ii+kk*lds ) = one
502 END DO
503 scaloc = one
504 ELSE IF( scaloc * work( j+kk*lds ) .EQ. zero ) THEN
505* LATRS computed a valid scale factor, but combined with
506* the current scaling the solution does not have a
507* scale factor > 0.
508*
509* Set WORK( J+KK*LDS ) to smallest valid scale
510* factor and increase SCALOC accordingly.
511 scal = work( j+kk*lds ) / smlnum
512 scaloc = scaloc * scal
513 work( j+kk*lds ) = smlnum
514* If LATRS overestimated the growth, x may be
515* rescaled to preserve a valid combined scale
516* factor WORK( J, KK ) > 0.
517 rscal = one / scaloc
518 IF( xnrm( kk ) * rscal .LE. bignum ) THEN
519 xnrm( kk ) = xnrm( kk ) * rscal
520 CALL dscal( j2-j1, rscal, x( j1, rhs ), 1 )
521 scaloc = one
522 ELSE
523* The system op(A) * x = b is badly scaled and its
524* solution cannot be represented as (1/scale) * x.
525* Set x to zero. This approach deviates from LATRS
526* where a completely meaningless non-zero vector
527* is returned that is not a solution to op(A) * x = b.
528 scale( rhs ) = zero
529 DO ii = 1, n
530 x( ii, kk ) = zero
531 END DO
532* Discard the local scale factors.
533 DO ii = 1, nba
534 work( ii+kk*lds ) = one
535 END DO
536 scaloc = one
537 END IF
538 END IF
539 scaloc = scaloc * work( j+kk*lds )
540 work( j+kk*lds ) = scaloc
541 END DO
542*
543* Linear block updates
544*
545 IF( notran ) THEN
546 IF( upper ) THEN
547 ifirst = j - 1
548 ilast = 1
549 iinc = -1
550 ELSE
551 ifirst = j + 1
552 ilast = nba
553 iinc = 1
554 END IF
555 ELSE
556 IF( upper ) THEN
557 ifirst = j + 1
558 ilast = nba
559 iinc = 1
560 ELSE
561 ifirst = j - 1
562 ilast = 1
563 iinc = -1
564 END IF
565 END IF
566*
567 DO i = ifirst, ilast, iinc
568* I1: row index of the first column in X( I, K )
569* I2: row index of the first column in X( I+1, K )
570* so the I2 - I1 is the row count of the block X( I, K )
571 i1 = (i-1)*nb + 1
572 i2 = min( i*nb, n ) + 1
573*
574* Prepare the linear update to be executed with GEMM.
575* For each column, compute a consistent scaling, a
576* scaling factor to survive the linear update, and
577* rescale the column segments, if necessary. Then
578* the linear update is safely executed.
579*
580 DO kk = 1, k2-k1
581 rhs = k1 + kk - 1
582* Compute consistent scaling
583 scamin = min( work( i + kk*lds), work( j + kk*lds ) )
584*
585* Compute scaling factor to survive the linear update
586* simulating consistent scaling.
587*
588 bnrm = dlange( 'I', i2-i1, 1, x( i1, rhs ), ldx, w )
589 bnrm = bnrm*( scamin / work( i+kk*lds ) )
590 xnrm( kk ) = xnrm( kk )*(scamin / work( j+kk*lds ))
591 anrm = work( awrk + i+(j-1)*nba )
592 scaloc = dlarmm( anrm, xnrm( kk ), bnrm )
593*
594* Simultaneously apply the robust update factor and the
595* consistency scaling factor to B( I, KK ) and B( J, KK ).
596*
597 scal = ( scamin / work( i+kk*lds) )*scaloc
598 IF( scal.NE.one ) THEN
599 CALL dscal( i2-i1, scal, x( i1, rhs ), 1 )
600 work( i+kk*lds ) = scamin*scaloc
601 END IF
602*
603 scal = ( scamin / work( j+kk*lds ) )*scaloc
604 IF( scal.NE.one ) THEN
605 CALL dscal( j2-j1, scal, x( j1, rhs ), 1 )
606 work( j+kk*lds ) = scamin*scaloc
607 END IF
608 END DO
609*
610 IF( notran ) THEN
611*
612* B( I, K ) := B( I, K ) - A( I, J ) * X( J, K )
613*
614 CALL dgemm( 'N', 'N', i2-i1, k2-k1, j2-j1, -one,
615 $ a( i1, j1 ), lda, x( j1, k1 ), ldx,
616 $ one, x( i1, k1 ), ldx )
617 ELSE
618*
619* B( I, K ) := B( I, K ) - A( J, I )**T * X( J, K )
620*
621 CALL dgemm( 'T', 'N', i2-i1, k2-k1, j2-j1, -one,
622 $ a( j1, i1 ), lda, x( j1, k1 ), ldx,
623 $ one, x( i1, k1 ), ldx )
624 END IF
625 END DO
626 END DO
627*
628* Reduce local scaling factors
629*
630 DO kk = 1, k2-k1
631 rhs = k1 + kk - 1
632 DO i = 1, nba
633 scale( rhs ) = min( scale( rhs ), work( i+kk*lds ) )
634 END DO
635 END DO
636*
637* Realize consistent scaling
638*
639 DO kk = 1, k2-k1
640 rhs = k1 + kk - 1
641 IF( scale( rhs ).NE.one .AND. scale( rhs ).NE. zero ) THEN
642 DO i = 1, nba
643 i1 = (i-1)*nb + 1
644 i2 = min( i*nb, n ) + 1
645 scal = scale( rhs ) / work( i+kk*lds )
646 IF( scal.NE.one )
647 $ CALL dscal( i2-i1, scal, x( i1, rhs ), 1 )
648 END DO
649 END IF
650 END DO
651 END DO
652 RETURN
653*
654* End of DLATRS3
655*
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
DGEMM
Definition dgemm.f:188
integer function ilaenv(ispec, name, opts, n1, n2, n3, n4)
ILAENV
Definition ilaenv.f:162
double precision function dlamch(cmach)
DLAMCH
Definition dlamch.f:69
double precision function dlange(norm, m, n, a, lda, work)
DLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition dlange.f:114
double precision function dlarmm(anorm, bnorm, cnorm)
DLARMM
Definition dlarmm.f:61
subroutine dlatrs(uplo, trans, diag, normin, n, a, lda, x, scale, cnorm, info)
DLATRS solves a triangular system of equations with the scale factor set to prevent overflow.
Definition dlatrs.f:238
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
subroutine dscal(n, da, dx, incx)
DSCAL
Definition dscal.f:79
Here is the call graph for this function:
Here is the caller graph for this function: