LAPACK 3.12.1
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ slatrs3()

subroutine slatrs3 ( character uplo,
character trans,
character diag,
character normin,
integer n,
integer nrhs,
real, dimension( lda, * ) a,
integer lda,
real, dimension( ldx, * ) x,
integer ldx,
real, dimension( * ) scale,
real, dimension( * ) cnorm,
real, dimension( * ) work,
integer lwork,
integer info )

SLATRS3 solves a triangular system of equations with the scale factors set to prevent overflow.

Purpose:
!>
!> SLATRS3 solves one of the triangular systems
!>
!>    A * X = B * diag(scale)  or  A**T * X = B * diag(scale)
!>
!> with scaling to prevent overflow.  Here A is an upper or lower
!> triangular matrix, A**T denotes the transpose of A. X and B are
!> n by nrhs matrices and scale is an nrhs element vector of scaling
!> factors. A scaling factor scale(j) is usually less than or equal
!> to 1, chosen such that X(:,j) is less than the overflow threshold.
!> If the matrix A is singular (A(j,j) = 0 for some j), then
!> a non-trivial solution to A*X = 0 is returned. If the system is
!> so badly scaled that the solution cannot be represented as
!> (1/scale(k))*X(:,k), then x(:,k) = 0 and scale(k) is returned.
!>
!> This is a BLAS-3 version of LATRS for solving several right
!> hand sides simultaneously.
!>
!> 
Parameters
[in]UPLO
!>          UPLO is CHARACTER*1
!>          Specifies whether the matrix A is upper or lower triangular.
!>          = 'U':  Upper triangular
!>          = 'L':  Lower triangular
!> 
[in]TRANS
!>          TRANS is CHARACTER*1
!>          Specifies the operation applied to A.
!>          = 'N':  Solve A * x = s*b  (No transpose)
!>          = 'T':  Solve A**T* x = s*b  (Transpose)
!>          = 'C':  Solve A**T* x = s*b  (Conjugate transpose = Transpose)
!> 
[in]DIAG
!>          DIAG is CHARACTER*1
!>          Specifies whether or not the matrix A is unit triangular.
!>          = 'N':  Non-unit triangular
!>          = 'U':  Unit triangular
!> 
[in]NORMIN
!>          NORMIN is CHARACTER*1
!>          Specifies whether CNORM has been set or not.
!>          = 'Y':  CNORM contains the column norms on entry
!>          = 'N':  CNORM is not set on entry.  On exit, the norms will
!>                  be computed and stored in CNORM.
!> 
[in]N
!>          N is INTEGER
!>          The order of the matrix A.  N >= 0.
!> 
[in]NRHS
!>          NRHS is INTEGER
!>          The number of columns of X.  NRHS >= 0.
!> 
[in]A
!>          A is REAL array, dimension (LDA,N)
!>          The triangular matrix A.  If UPLO = 'U', the leading n by n
!>          upper triangular part of the array A contains the upper
!>          triangular matrix, and the strictly lower triangular part of
!>          A is not referenced.  If UPLO = 'L', the leading n by n lower
!>          triangular part of the array A contains the lower triangular
!>          matrix, and the strictly upper triangular part of A is not
!>          referenced.  If DIAG = 'U', the diagonal elements of A are
!>          also not referenced and are assumed to be 1.
!> 
[in]LDA
!>          LDA is INTEGER
!>          The leading dimension of the array A.  LDA >= max (1,N).
!> 
[in,out]X
!>          X is REAL array, dimension (LDX,NRHS)
!>          On entry, the right hand side B of the triangular system.
!>          On exit, X is overwritten by the solution matrix X.
!> 
[in]LDX
!>          LDX is INTEGER
!>          The leading dimension of the array X.  LDX >= max (1,N).
!> 
[out]SCALE
!>          SCALE is REAL array, dimension (NRHS)
!>          The scaling factor s(k) is for the triangular system
!>          A * x(:,k) = s(k)*b(:,k)  or  A**T* x(:,k) = s(k)*b(:,k).
!>          If SCALE = 0, the matrix A is singular or badly scaled.
!>          If A(j,j) = 0 is encountered, a non-trivial vector x(:,k)
!>          that is an exact or approximate solution to A*x(:,k) = 0
!>          is returned. If the system so badly scaled that solution
!>          cannot be presented as x(:,k) * 1/s(k), then x(:,k) = 0
!>          is returned.
!> 
[in,out]CNORM
!>          CNORM is REAL array, dimension (N)
!>
!>          If NORMIN = 'Y', CNORM is an input argument and CNORM(j)
!>          contains the norm of the off-diagonal part of the j-th column
!>          of A.  If TRANS = 'N', CNORM(j) must be greater than or equal
!>          to the infinity-norm, and if TRANS = 'T' or 'C', CNORM(j)
!>          must be greater than or equal to the 1-norm.
!>
!>          If NORMIN = 'N', CNORM is an output argument and CNORM(j)
!>          returns the 1-norm of the offdiagonal part of the j-th column
!>          of A.
!> 
[out]WORK
!>          WORK is REAL array, dimension (MAX(1,LWORK))
!>          On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
!> 
[in]LWORK
!>          LWORK is INTEGER
!>          The dimension of the array WORK.
!>
!>          If MIN(N,NRHS) = 0, LWORK >= 1, else
!>          LWORK >= MAX(1, 2*NBA * MAX(NBA, MIN(NRHS, 32)), where
!>          NBA = (N + NB - 1)/NB and NB is the optimal block size.
!>
!>          If LWORK = -1, then a workspace query is assumed; the routine
!>          only calculates the optimal dimensions of the WORK array, returns
!>          this value as the first entry of the WORK array, and no error
!>          message related to LWORK is issued by XERBLA.
!> 
[out]INFO
!>          INFO is INTEGER
!>          = 0:  successful exit
!>          < 0:  if INFO = -k, the k-th argument had an illegal value
!> 
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:

Definition at line 231 of file slatrs3.f.

233 IMPLICIT NONE
234*
235* .. Scalar Arguments ..
236 CHARACTER DIAG, TRANS, NORMIN, UPLO
237 INTEGER INFO, LDA, LWORK, LDX, N, NRHS
238* ..
239* .. Array Arguments ..
240 REAL A( LDA, * ), CNORM( * ), X( LDX, * ),
241 $ SCALE( * ), WORK( * )
242* ..
243*
244* =====================================================================
245*
246* .. Parameters ..
247 REAL ZERO, ONE
248 parameter( zero = 0.0e+0, one = 1.0e+0 )
249 INTEGER NBMAX, NBMIN, NBRHS, NRHSMIN
250 parameter( nrhsmin = 2, nbrhs = 32 )
251 parameter( nbmin = 8, nbmax = 64 )
252* ..
253* .. Local Arrays ..
254 REAL W( NBMAX ), XNRM( NBRHS )
255* ..
256* .. Local Scalars ..
257 LOGICAL LQUERY, NOTRAN, NOUNIT, UPPER
258 INTEGER AWRK, I, IFIRST, IINC, ILAST, II, I1, I2, J,
259 $ JFIRST, JINC, JLAST, J1, J2, K, KK, K1, K2,
260 $ LANRM, LDS, LSCALE, NB, NBA, NBX, RHS, LWMIN
261 REAL ANRM, BIGNUM, BNRM, RSCAL, SCAL, SCALOC,
262 $ SCAMIN, SMLNUM, TMAX
263* ..
264* .. External Functions ..
265 LOGICAL LSAME
266 INTEGER ILAENV
267 REAL SLAMCH, SLANGE, SLARMM
268 EXTERNAL ilaenv, lsame, slamch, slange,
269 $ slarmm
270* ..
271* .. External Subroutines ..
272 REAL SROUNDUP_LWORK
274* ..
275* .. Intrinsic Functions ..
276 INTRINSIC abs, max, min
277* ..
278* .. Executable Statements ..
279*
280 info = 0
281 upper = lsame( uplo, 'U' )
282 notran = lsame( trans, 'N' )
283 nounit = lsame( diag, 'N' )
284 lquery = ( lwork.EQ.-1 )
285*
286* Partition A and X into blocks.
287*
288 nb = max( 8, ilaenv( 1, 'SLATRS', '', n, n, -1, -1 ) )
289 nb = min( nbmax, nb )
290 nba = max( 1, (n + nb - 1) / nb )
291 nbx = max( 1, (nrhs + nbrhs - 1) / nbrhs )
292*
293* Compute the workspace
294*
295* The workspace comprises two parts.
296* The first part stores the local scale factors. Each simultaneously
297* computed right-hand side requires one local scale factor per block
298* row. WORK( I + KK * LDS ) is the scale factor of the vector
299* segment associated with the I-th block row and the KK-th vector
300* in the block column.
301*
302 lscale = nba * max( nba, min( nrhs, nbrhs ) )
303 lds = nba
304*
305* The second part stores upper bounds of the triangular A. There are
306* a total of NBA x NBA blocks, of which only the upper triangular
307* part or the lower triangular part is referenced. The upper bound of
308* the block A( I, J ) is stored as WORK( AWRK + I + J * NBA ).
309*
310 lanrm = nba * nba
311 awrk = lscale
312*
313 IF( min( n, nrhs ).EQ.0 ) THEN
314 lwmin = 1
315 ELSE
316 lwmin = lscale + lanrm
317 END IF
318 work( 1 ) = sroundup_lwork( lwmin )
319*
320* Test the input parameters.
321*
322 IF( .NOT.upper .AND. .NOT.lsame( uplo, 'L' ) ) THEN
323 info = -1
324 ELSE IF( .NOT.notran .AND. .NOT.lsame( trans, 'T' ) .AND. .NOT.
325 $ lsame( trans, 'C' ) ) THEN
326 info = -2
327 ELSE IF( .NOT.nounit .AND. .NOT.lsame( diag, 'U' ) ) THEN
328 info = -3
329 ELSE IF( .NOT.lsame( normin, 'Y' ) .AND. .NOT.
330 $ lsame( normin, 'N' ) ) THEN
331 info = -4
332 ELSE IF( n.LT.0 ) THEN
333 info = -5
334 ELSE IF( nrhs.LT.0 ) THEN
335 info = -6
336 ELSE IF( lda.LT.max( 1, n ) ) THEN
337 info = -8
338 ELSE IF( ldx.LT.max( 1, n ) ) THEN
339 info = -10
340 ELSE IF( .NOT.lquery .AND. lwork.LT.lwmin ) THEN
341 info = -14
342 END IF
343 IF( info.NE.0 ) THEN
344 CALL xerbla( 'SLATRS3', -info )
345 RETURN
346 ELSE IF( lquery ) THEN
347 RETURN
348 END IF
349*
350* Initialize scaling factors
351*
352 DO kk = 1, nrhs
353 scale( kk ) = one
354 END DO
355*
356* Quick return if possible
357*
358 IF( min( n, nrhs ).EQ.0 )
359 $ RETURN
360*
361* Determine machine dependent constant to control overflow.
362*
363 bignum = slamch( 'Overflow' )
364 smlnum = slamch( 'Safe Minimum' )
365*
366* Use unblocked code for small problems
367*
368 IF( nrhs.LT.nrhsmin ) THEN
369 CALL slatrs( uplo, trans, diag, normin, n, a, lda, x( 1, 1),
370 $ scale( 1 ), cnorm, info )
371 DO k = 2, nrhs
372 CALL slatrs( uplo, trans, diag, 'Y', n, a, lda, x( 1,
373 $ k ),
374 $ scale( k ), cnorm, info )
375 END DO
376 RETURN
377 END IF
378*
379* Compute norms of blocks of A excluding diagonal blocks and find
380* the block with the largest norm TMAX.
381*
382 tmax = zero
383 DO j = 1, nba
384 j1 = (j-1)*nb + 1
385 j2 = min( j*nb, n ) + 1
386 IF ( upper ) THEN
387 ifirst = 1
388 ilast = j - 1
389 ELSE
390 ifirst = j + 1
391 ilast = nba
392 END IF
393 DO i = ifirst, ilast
394 i1 = (i-1)*nb + 1
395 i2 = min( i*nb, n ) + 1
396*
397* Compute upper bound of A( I1:I2-1, J1:J2-1 ).
398*
399 IF( notran ) THEN
400 anrm = slange( 'I', i2-i1, j2-j1, a( i1, j1 ), lda,
401 $ w )
402 work( awrk + i+(j-1)*nba ) = anrm
403 ELSE
404 anrm = slange( '1', i2-i1, j2-j1, a( i1, j1 ), lda,
405 $ w )
406 work( awrk + j+(i-1)*nba ) = anrm
407 END IF
408 tmax = max( tmax, anrm )
409 END DO
410 END DO
411*
412 IF( .NOT. tmax.LE.slamch('Overflow') ) THEN
413*
414* Some matrix entries have huge absolute value. At least one upper
415* bound norm( A(I1:I2-1, J1:J2-1), 'I') is not a valid floating-point
416* number, either due to overflow in LANGE or due to Inf in A.
417* Fall back to LATRS. Set normin = 'N' for every right-hand side to
418* force computation of TSCAL in LATRS to avoid the likely overflow
419* in the computation of the column norms CNORM.
420*
421 DO k = 1, nrhs
422 CALL slatrs( uplo, trans, diag, 'N', n, a, lda, x( 1,
423 $ k ),
424 $ scale( k ), cnorm, info )
425 END DO
426 RETURN
427 END IF
428*
429* Every right-hand side requires workspace to store NBA local scale
430* factors. To save workspace, X is computed successively in block columns
431* of width NBRHS, requiring a total of NBA x NBRHS space. If sufficient
432* workspace is available, larger values of NBRHS or NBRHS = NRHS are viable.
433 DO k = 1, nbx
434* Loop over block columns (index = K) of X and, for column-wise scalings,
435* over individual columns (index = KK).
436* K1: column index of the first column in X( J, K )
437* K2: column index of the first column in X( J, K+1 )
438* so the K2 - K1 is the column count of the block X( J, K )
439 k1 = (k-1)*nbrhs + 1
440 k2 = min( k*nbrhs, nrhs ) + 1
441*
442* Initialize local scaling factors of current block column X( J, K )
443*
444 DO kk = 1, k2 - k1
445 DO i = 1, nba
446 work( i+kk*lds ) = one
447 END DO
448 END DO
449*
450 IF( notran ) THEN
451*
452* Solve A * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
453*
454 IF( upper ) THEN
455 jfirst = nba
456 jlast = 1
457 jinc = -1
458 ELSE
459 jfirst = 1
460 jlast = nba
461 jinc = 1
462 END IF
463 ELSE
464*
465* Solve A**T * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
466*
467 IF( upper ) THEN
468 jfirst = 1
469 jlast = nba
470 jinc = 1
471 ELSE
472 jfirst = nba
473 jlast = 1
474 jinc = -1
475 END IF
476 END IF
477*
478 DO j = jfirst, jlast, jinc
479* J1: row index of the first row in A( J, J )
480* J2: row index of the first row in A( J+1, J+1 )
481* so that J2 - J1 is the row count of the block A( J, J )
482 j1 = (j-1)*nb + 1
483 j2 = min( j*nb, n ) + 1
484*
485* Solve op(A( J, J )) * X( J, RHS ) = SCALOC * B( J, RHS )
486* for all right-hand sides in the current block column,
487* one RHS at a time.
488*
489 DO kk = 1, k2-k1
490 rhs = k1 + kk - 1
491 IF( kk.EQ.1 ) THEN
492 CALL slatrs( uplo, trans, diag, 'N', j2-j1,
493 $ a( j1, j1 ), lda, x( j1, rhs ),
494 $ scaloc, cnorm, info )
495 ELSE
496 CALL slatrs( uplo, trans, diag, 'Y', j2-j1,
497 $ a( j1, j1 ), lda, x( j1, rhs ),
498 $ scaloc, cnorm, info )
499 END IF
500* Find largest absolute value entry in the vector segment
501* X( J1:J2-1, RHS ) as an upper bound for the worst case
502* growth in the linear updates.
503 xnrm( kk ) = slange( 'I', j2-j1, 1, x( j1, rhs ),
504 $ ldx, w )
505*
506 IF( scaloc .EQ. zero ) THEN
507* LATRS found that A is singular through A(j,j) = 0.
508* Reset the computation x(1:n) = 0, x(j) = 1, SCALE = 0
509* and compute A*x = 0 (or A**T*x = 0). Note that
510* X(J1:J2-1, KK) is set by LATRS.
511 scale( rhs ) = zero
512 DO ii = 1, j1-1
513 x( ii, kk ) = zero
514 END DO
515 DO ii = j2, n
516 x( ii, kk ) = zero
517 END DO
518* Discard the local scale factors.
519 DO ii = 1, nba
520 work( ii+kk*lds ) = one
521 END DO
522 scaloc = one
523 ELSE IF( scaloc*work( j+kk*lds ) .EQ. zero ) THEN
524* LATRS computed a valid scale factor, but combined with
525* the current scaling the solution does not have a
526* scale factor > 0.
527*
528* Set WORK( J+KK*LDS ) to smallest valid scale
529* factor and increase SCALOC accordingly.
530 scal = work( j+kk*lds ) / smlnum
531 scaloc = scaloc * scal
532 work( j+kk*lds ) = smlnum
533* If LATRS overestimated the growth, x may be
534* rescaled to preserve a valid combined scale
535* factor WORK( J, KK ) > 0.
536 rscal = one / scaloc
537 IF( xnrm( kk )*rscal .LE. bignum ) THEN
538 xnrm( kk ) = xnrm( kk ) * rscal
539 CALL sscal( j2-j1, rscal, x( j1, rhs ), 1 )
540 scaloc = one
541 ELSE
542* The system op(A) * x = b is badly scaled and its
543* solution cannot be represented as (1/scale) * x.
544* Set x to zero. This approach deviates from LATRS
545* where a completely meaningless non-zero vector
546* is returned that is not a solution to op(A) * x = b.
547 scale( rhs ) = zero
548 DO ii = 1, n
549 x( ii, kk ) = zero
550 END DO
551* Discard the local scale factors.
552 DO ii = 1, nba
553 work( ii+kk*lds ) = one
554 END DO
555 scaloc = one
556 END IF
557 END IF
558 scaloc = scaloc * work( j+kk*lds )
559 work( j+kk*lds ) = scaloc
560 END DO
561*
562* Linear block updates
563*
564 IF( notran ) THEN
565 IF( upper ) THEN
566 ifirst = j - 1
567 ilast = 1
568 iinc = -1
569 ELSE
570 ifirst = j + 1
571 ilast = nba
572 iinc = 1
573 END IF
574 ELSE
575 IF( upper ) THEN
576 ifirst = j + 1
577 ilast = nba
578 iinc = 1
579 ELSE
580 ifirst = j - 1
581 ilast = 1
582 iinc = -1
583 END IF
584 END IF
585*
586 DO i = ifirst, ilast, iinc
587* I1: row index of the first column in X( I, K )
588* I2: row index of the first column in X( I+1, K )
589* so the I2 - I1 is the row count of the block X( I, K )
590 i1 = (i-1)*nb + 1
591 i2 = min( i*nb, n ) + 1
592*
593* Prepare the linear update to be executed with GEMM.
594* For each column, compute a consistent scaling, a
595* scaling factor to survive the linear update, and
596* rescale the column segments, if necessary. Then
597* the linear update is safely executed.
598*
599 DO kk = 1, k2-k1
600 rhs = k1 + kk - 1
601* Compute consistent scaling
602 scamin = min( work( i+kk*lds), work( j+kk*lds ) )
603*
604* Compute scaling factor to survive the linear update
605* simulating consistent scaling.
606*
607 bnrm = slange( 'I', i2-i1, 1, x( i1, rhs ), ldx,
608 $ w )
609 bnrm = bnrm*( scamin / work( i+kk*lds ) )
610 xnrm( kk ) = xnrm( kk )*(scamin / work( j+kk*lds ))
611 anrm = work( awrk + i+(j-1)*nba )
612 scaloc = slarmm( anrm, xnrm( kk ), bnrm )
613*
614* Simultaneously apply the robust update factor and the
615* consistency scaling factor to B( I, KK ) and B( J, KK ).
616*
617 scal = ( scamin / work( i+kk*lds) )*scaloc
618 IF( scal.NE.one ) THEN
619 CALL sscal( i2-i1, scal, x( i1, rhs ), 1 )
620 work( i+kk*lds ) = scamin*scaloc
621 END IF
622*
623 scal = ( scamin / work( j+kk*lds ) )*scaloc
624 IF( scal.NE.one ) THEN
625 CALL sscal( j2-j1, scal, x( j1, rhs ), 1 )
626 work( j+kk*lds ) = scamin*scaloc
627 END IF
628 END DO
629*
630 IF( notran ) THEN
631*
632* B( I, K ) := B( I, K ) - A( I, J ) * X( J, K )
633*
634 CALL sgemm( 'N', 'N', i2-i1, k2-k1, j2-j1, -one,
635 $ a( i1, j1 ), lda, x( j1, k1 ), ldx,
636 $ one, x( i1, k1 ), ldx )
637 ELSE
638*
639* B( I, K ) := B( I, K ) - A( I, J )**T * X( J, K )
640*
641 CALL sgemm( 'T', 'N', i2-i1, k2-k1, j2-j1, -one,
642 $ a( j1, i1 ), lda, x( j1, k1 ), ldx,
643 $ one, x( i1, k1 ), ldx )
644 END IF
645 END DO
646 END DO
647*
648* Reduce local scaling factors
649*
650 DO kk = 1, k2-k1
651 rhs = k1 + kk - 1
652 DO i = 1, nba
653 scale( rhs ) = min( scale( rhs ), work( i+kk*lds ) )
654 END DO
655 END DO
656*
657* Realize consistent scaling
658*
659 DO kk = 1, k2-k1
660 rhs = k1 + kk - 1
661 IF( scale( rhs ).NE.one .AND. scale( rhs ).NE. zero ) THEN
662 DO i = 1, nba
663 i1 = (i-1)*nb + 1
664 i2 = min( i*nb, n ) + 1
665 scal = scale( rhs ) / work( i+kk*lds )
666 IF( scal.NE.one )
667 $ CALL sscal( i2-i1, scal, x( i1, rhs ), 1 )
668 END DO
669 END IF
670 END DO
671 END DO
672 RETURN
673*
674 work( 1 ) = sroundup_lwork( lwmin )
675*
676* End of SLATRS3
677*
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
SGEMM
Definition sgemm.f:188
integer function ilaenv(ispec, name, opts, n1, n2, n3, n4)
ILAENV
Definition ilaenv.f:160
real function slamch(cmach)
SLAMCH
Definition slamch.f:68
real function slange(norm, m, n, a, lda, work)
SLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition slange.f:112
real function slarmm(anorm, bnorm, cnorm)
SLARMM
Definition slarmm.f:61
subroutine slatrs(uplo, trans, diag, normin, n, a, lda, x, scale, cnorm, info)
SLATRS solves a triangular system of equations with the scale factor set to prevent overflow.
Definition slatrs.f:237
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
real function sroundup_lwork(lwork)
SROUNDUP_LWORK
subroutine sscal(n, sa, sx, incx)
SSCAL
Definition sscal.f:79
Here is the call graph for this function:
Here is the caller graph for this function: