◆ chetrd_he2hb()

subroutine chetrd_he2hb	(	character	uplo,
		integer	n,
		integer	kd,
		complex, dimension( lda, * )	a,
		integer	lda,
		complex, dimension( ldab, * )	ab,
		integer	ldab,
		complex, dimension( * )	tau,
		complex, dimension( * )	work,
		integer	lwork,
		integer	info )

CHETRD_HE2HB


Purpose:

!>
!> CHETRD_HE2HB reduces a complex Hermitian matrix A to complex Hermitian
!> band-diagonal form AB by a unitary similarity transformation:
!> Q**H * A * Q = AB.
!>
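A minimal Fortran sketch of the band layout used for AB, following the mapping given under the AB parameter. It assumes a full N-by-N input whose referenced triangle is copied element by element. The module and subroutine names (band_pack_sketch, pack_band) are hypothetical and not part of LAPACK; CHETRD_HE2HB itself does this copy with CCOPY.

```fortran
! Hypothetical helper (not a LAPACK routine): pack the referenced
! triangle of a full Hermitian matrix into LAPACK band storage.
module band_pack_sketch
  implicit none
contains
  subroutine pack_band(uplo, n, kd, a, ab)
    character, intent(in) :: uplo          ! 'U' or 'L'
    integer, intent(in) :: n, kd
    complex, intent(in) :: a(:, :)         ! n-by-n, one triangle referenced
    complex, intent(out) :: ab(:, :)       ! at least (kd+1)-by-n
    integer :: i, j

    ab = (0.0, 0.0)                        ! unused corners stay zero
    do j = 1, n
      if (uplo == 'U' .or. uplo == 'u') then
        ! Upper: AB(kd+1+i-j, j) = A(i, j) for max(1, j-kd) <= i <= j
        do i = max(1, j - kd), j
          ab(kd + 1 + i - j, j) = a(i, j)
        end do
      else
        ! Lower: AB(1+i-j, j) = A(i, j) for j <= i <= min(n, j+kd)
        do i = j, min(n, j + kd)
          ab(1 + i - j, j) = a(i, j)
        end do
      end if
    end do
  end subroutine pack_band
end module band_pack_sketch
```

For UPLO = 'U' the diagonal lands in row KD+1 of AB; for UPLO = 'L' it lands in row 1. When N <= KD+1 every element of A already lies inside the band, which is why the routine's quick-return path only needs this kind of copy.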

Parameters

[in]	UPLO	UPLO is CHARACTER*1. = 'U': Upper triangle of A is stored; = 'L': Lower triangle of A is stored.
[in]	N	N is INTEGER. The order of the matrix A. N >= 0.
[in]	KD	KD is INTEGER. The number of superdiagonals of the reduced matrix if UPLO = 'U', or the number of subdiagonals if UPLO = 'L'. KD >= 0. The reduced matrix is stored in the array AB.
[in,out]	A	A is COMPLEX array, dimension (LDA,N). On entry, the Hermitian matrix A. If UPLO = 'U', the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = 'L', the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, the reduced band matrix is returned in AB. If UPLO = 'U', the elements of A above the KD-th superdiagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors; if UPLO = 'L', the elements of A below the KD-th subdiagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors. See Further Details.
[in]	LDA	LDA is INTEGER. The leading dimension of the array A. LDA >= max(1,N).
[out]	AB	AB is COMPLEX array, dimension (LDAB,N). On exit, the upper or lower triangle of the reduced Hermitian band matrix, stored in the first KD+1 rows of the array. The j-th column of the band matrix is stored in the j-th column of the array AB as follows, where A(i,j) denotes an element of the band matrix: if UPLO = 'U', AB(kd+1+i-j,j) = A(i,j) for max(1,j-kd)<=i<=j; if UPLO = 'L', AB(1+i-j,j) = A(i,j) for j<=i<=min(n,j+kd).
[in]	LDAB	LDAB is INTEGER. The leading dimension of the array AB. LDAB >= KD+1.
[out]	TAU	TAU is COMPLEX array, dimension (N-KD). The scalar factors of the elementary reflectors (see Further Details).
[out]	WORK	WORK is COMPLEX array, dimension (MAX(1,LWORK)). On exit, if INFO = 0, or if LWORK = -1, WORK(1) returns the required size of LWORK.
[in]	LWORK	LWORK is INTEGER. The dimension of the array WORK, which should be obtained by a workspace query. If N <= KD+1, LWORK >= 1; otherwise LWORK >= MAX(1, LWORK_QUERY). If LWORK = -1, a workspace query is assumed: the routine only calculates the optimal size of the WORK array, returns this value as the first entry of the WORK array, and XERBLA issues no error message related to LWORK. LWORK_QUERY = N*KD + N*max(KD,FACTOPTNB) + 2*KD*KD, where FACTOPTNB is the blocking factor used by the QR or LQ factorization. FACTOPTNB = 128 is usually a good choice; otherwise, calling the routine with LWORK = -1 returns the required size of WORK.
[out]	INFO	INFO is INTEGER. = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value.
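The LWORK formula, and the way the routine splits WORK into regions for the triangular factor T, the panel W and the products S1 and S2, can be sketched as follows. This assumes the caller supplies a value NB in place of FACTOPTNB; the routine itself obtains the minimum size from ILAENV2STAGE, so a workspace query (LWORK = -1) remains the reliable way to size WORK. The module, function and subroutine names are hypothetical.

```fortran
! Hypothetical helpers mirroring the documented LWORK formula and the
! TPOS/WPOS/S1POS/S2POS offsets computed in the routine body.
module he2hb_workspace_sketch
  implicit none
contains
  ! LWORK_QUERY = N*KD + N*MAX(KD,NB) + 2*KD*KD, or 1 if N <= KD+1.
  ! NB stands in for FACTOPTNB (an assumption supplied by the caller).
  integer function he2hb_lwork(n, kd, nb)
    integer, intent(in) :: n, kd, nb
    if (n <= kd + 1) then
      he2hb_lwork = 1
    else
      he2hb_lwork = n*kd + n*max(kd, nb) + 2*kd*kd
    end if
  end function he2hb_lwork

  ! 1-based offsets into WORK, computed the same way as in the routine.
  subroutine he2hb_offsets(n, kd, lwmin, tpos, wpos, s1pos, s2pos, ls2)
    integer, intent(in) :: n, kd, lwmin
    integer, intent(out) :: tpos, wpos, s1pos, s2pos, ls2
    integer :: lt, lw, ls1
    lt  = kd*kd                  ! KD-by-KD triangular factor T
    lw  = n*kd                   ! W panel
    ls1 = kd*kd                  ! KD-by-KD product S1
    ls2 = lwmin - lt - lw - ls1  ! remainder: S2, also QR/LQ workspace
    tpos  = 1
    wpos  = tpos + lt
    s1pos = wpos + lw
    s2pos = s1pos + ls1
  end subroutine he2hb_offsets
end module he2hb_workspace_sketch
```

For example, N = 10, KD = 2 and NB = 128 give 1308 elements: T starts at offset 1, W at 5, S1 at 25, and S2 (1280 elements) at 29, ending exactly at element 1308.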

Author: Univ. of Tennessee; Univ. of California Berkeley; Univ. of Colorado Denver; NAG Ltd.

Further Details:

!>
!>  Implemented by Azzam Haidar.
!>
!>  Full details are available in the technical report and in the SC11 and SC13 papers.
!>
!>  Azzam Haidar, Hatem Ltaief, and Jack Dongarra.
!>  Parallel reduction to condensed forms for symmetric eigenvalue problems
!>  using aggregated fine-grained and memory-aware kernels. In Proceedings
!>  of 2011 International Conference for High Performance Computing,
!>  Networking, Storage and Analysis (SC '11), New York, NY, USA,
!>  Article 8, 11 pages.
!>  http://doi.acm.org/10.1145/2063384.2063394
!>
!>  A. Haidar, J. Kurzak, P. Luszczek, 2013.
!>  An improved parallel singular value algorithm and its implementation 
!>  for multicore hardware, In Proceedings of 2013 International Conference
!>  for High Performance Computing, Networking, Storage and Analysis (SC '13).
!>  Denver, Colorado, USA, 2013.
!>  Article 90, 12 pages.
!>  http://doi.acm.org/10.1145/2503210.2503292
!>
!>  A. Haidar, R. Solca, S. Tomov, T. Schulthess and J. Dongarra.
!>  A novel hybrid CPU-GPU generalized eigensolver for electronic structure 
!>  calculations based on fine-grained memory aware tasks.
!>  International Journal of High Performance Computing Applications.
!>  Volume 28 Issue 2, Pages 196-209, May 2014.
!>  http://hpc.sagepub.com/content/28/2/196 
!>
!>

!>
!>  If UPLO = 'U', the matrix Q is represented as a product of elementary
!>  reflectors
!>
!>     Q = H(k)**H . . . H(2)**H H(1)**H, where k = n-kd.
!>
!>  Each H(i) has the form
!>
!>     H(i) = I - tau * v * v**H
!>
!>  where tau is a complex scalar, and v is a complex vector with
!>  v(1:i+kd-1) = 0 and v(i+kd) = 1; conjg(v(i+kd+1:n)) is stored on exit in
!>  A(i,i+kd+1:n), and tau in TAU(i).
!>
!>  If UPLO = 'L', the matrix Q is represented as a product of elementary
!>  reflectors
!>
!>     Q = H(1) H(2) . . . H(k), where k = n-kd.
!>
!>  Each H(i) has the form
!>
!>     H(i) = I - tau * v * v**H
!>
!>  where tau is a complex scalar, and v is a complex vector with
!>  v(1:i+kd-1) = 0 and v(i+kd) = 1; v(i+kd+1:n) is stored on exit in
!>  A(i+kd+1:n,i), and tau in TAU(i).
!>
!>  The contents of A on exit are illustrated by the following examples
!>  with n = 5 and kd = 1:
!>
!>  if UPLO = 'U':                       if UPLO = 'L':
!>
!>    (  ab  ab/v1  v1      v1     v1    )              (  ab                            )
!>    (      ab     ab/v2   v2     v2    )              (  ab/v1  ab                     )
!>    (             ab      ab/v3  v3    )              (  v1     ab/v2  ab              )
!>    (                     ab     ab/v4 )              (  v1     v2     ab/v3  ab       )
!>    (                            ab    )              (  v1     v2     v3     ab/v4 ab )
!>
!>  where ab denotes an element of the band matrix returned in AB, vi
!>  denotes an element of the vector defining H(i), and ab/vi marks a
!>  band position that also corresponds to the unit element of vi.
!>
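The effect of one elementary reflector can be sketched by applying H = I - tau * v * v**H to a vector without forming H: compute the scalar v**H * x, then subtract tau times that scalar times v. H is unitary exactly when 2*Re(tau) = |tau|**2 * (v**H * v); the values in the accompanying check are chosen to satisfy that condition. The module and subroutine names (reflector_sketch, apply_reflector) are hypothetical.

```fortran
! Hypothetical helper: x := (I - tau * v * v**H) * x, without forming H.
module reflector_sketch
  implicit none
contains
  subroutine apply_reflector(v, tau, x)
    complex, intent(in) :: v(:), tau
    complex, intent(inout) :: x(:)
    complex :: s
    s = dot_product(v, x)   ! conjugates its first argument: s = v**H * x
    x = x - tau * s * v     ! rank-1 correction along v
  end subroutine apply_reflector
end module reflector_sketch
```

With a real tau = 2 / (v**H * v) this is an ordinary Householder reflection, so H maps v to -v; a complex tau such as 1+i with v = e1 gives H(1,1) = -i, which is still unitary.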

Definition at line 241 of file chetrd_he2hb.f.

*
      IMPLICIT NONE
*
*  -- LAPACK computational routine --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*
*     .. Scalar Arguments ..
      CHARACTER          UPLO
      INTEGER            INFO, LDA, LDAB, LWORK, N, KD
*     ..
*     .. Array Arguments ..
      COMPLEX            A( LDA, * ), AB( LDAB, * ), 
     $                   TAU( * ), WORK( * )
*     ..
*
*  =====================================================================
*
*     .. Parameters ..
      REAL               RONE
      COMPLEX            ZERO, ONE, HALF
      parameter( rone = 1.0e+0,
     $                   zero = ( 0.0e+0, 0.0e+0 ),
     $                   one = ( 1.0e+0, 0.0e+0 ),
     $                   half = ( 0.5e+0, 0.0e+0 ) )
*     ..
*     .. Local Scalars ..
      LOGICAL            LQUERY, UPPER
      INTEGER            I, J, IINFO, LWMIN, PN, PK, LK,
     $                   LDT, LDW, LDS2, LDS1, 
     $                   LS2, LS1, LW, LT,
     $                   TPOS, WPOS, S2POS, S1POS
*     ..
*     .. External Subroutines ..
      EXTERNAL           xerbla, cher2k, chemm, cgemm,
     $                   ccopy,
     $                   clarft, cgelqf, cgeqrf, claset
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          min, max
*     ..
*     .. External Functions ..
      LOGICAL            LSAME
      INTEGER            ILAENV2STAGE 
      REAL               SROUNDUP_LWORK
      EXTERNAL           lsame, ilaenv2stage, sroundup_lwork
*     ..
*     .. Executable Statements ..
*
*     Determine the minimal workspace size required 
*     and test the input parameters
*
      info   = 0
      upper  = lsame( uplo, 'U' )
      lquery = ( lwork.EQ.-1 )
      IF( n.LE.kd+1 ) THEN
         lwmin = 1
      ELSE
         lwmin = ilaenv2stage( 4, 'CHETRD_HE2HB', '', n, kd, -1, -1 )
      END IF
*
      IF( .NOT.upper .AND. .NOT.lsame( uplo, 'L' ) ) THEN
         info = -1
      ELSE IF( n.LT.0 ) THEN
         info = -2
      ELSE IF( kd.LT.0 ) THEN
         info = -3
      ELSE IF( lda.LT.max( 1, n ) ) THEN
         info = -5
      ELSE IF( ldab.LT.max( 1, kd+1 ) ) THEN
         info = -7
      ELSE IF( lwork.LT.lwmin .AND. .NOT.lquery ) THEN
         info = -10
      END IF
*
      IF( info.NE.0 ) THEN
         CALL xerbla( 'CHETRD_HE2HB', -info )
         RETURN
      ELSE IF( lquery ) THEN
         work( 1 ) = sroundup_lwork( lwmin )
         RETURN
      END IF
*
*     Quick return if possible        
*     Copy the upper/lower portion of A into AB 
*
      IF( n.LE.kd+1 ) THEN
          IF( upper ) THEN
              DO 100 i = 1, n
                  lk = min( kd+1, i )
                  CALL ccopy( lk, a( i-lk+1, i ), 1, 
     $                            ab( kd+1-lk+1, i ), 1 )
  100         CONTINUE
          ELSE
              DO 110 i = 1, n
                  lk = min( kd+1, n-i+1 )
                  CALL ccopy( lk, a( i, i ), 1, ab( 1, i ), 1 )
  110         CONTINUE
          ENDIF
          work( 1 ) = 1
          RETURN
      END IF
*
*     Determine the pointer position for the workspace
*      
      ldt    = kd
      lds1   = kd
      lt     = ldt*kd
      lw     = n*kd
      ls1    = lds1*kd
      ls2    = lwmin - lt - lw - ls1
*      LS2 = N*MAX(KD,FACTOPTNB) 
      tpos   = 1
      wpos   = tpos  + lt
      s1pos  = wpos  + lw
      s2pos  = s1pos + ls1 
      IF( upper ) THEN
          ldw    = kd
          lds2   = kd
      ELSE
          ldw    = n
          lds2   = n
      ENDIF
*
*
*     Set the workspace for the triangular factor T to zero once, so that
*     the part of T that CLARFT does not write stays zero every time T is generated
*   
      CALL claset( "A", ldt, kd, zero, zero, work( tpos ), ldt )
*
      IF( upper ) THEN
          DO 10 i = 1, n - kd, kd
             pn = n-i-kd+1
             pk = min( n-i-kd+1, kd )
*        
*            Compute the LQ factorization of the current block
*        
             CALL cgelqf( kd, pn, a( i, i+kd ), lda,
     $                    tau( i ), work( s2pos ), ls2, iinfo )
*        
*            Copy the upper portion of A into AB
*        
             DO 20 j = i, i+pk-1
                lk = min( kd, n-j ) + 1
                CALL ccopy( lk, a( j, j ), lda, ab( kd+1, j ),
     $                      ldab-1 )
   20        CONTINUE
*                
             CALL claset( 'Lower', pk, pk, zero, one, 
     $                    a( i, i+kd ), lda )
*        
*            Form the matrix T
*        
             CALL clarft( 'Forward', 'Rowwise', pn, pk,
     $                    a( i, i+kd ), lda, tau( i ), 
     $                    work( tpos ), ldt )
*        
*            Compute W:
*             
             CALL cgemm( 'Conjugate', 'No transpose', pk, pn, pk,
     $                   one,  work( tpos ), ldt,
     $                         a( i, i+kd ), lda,
     $                   zero, work( s2pos ), lds2 )
*        
             CALL chemm( 'Right', uplo, pk, pn,
     $                   one,  a( i+kd, i+kd ), lda,
     $                         work( s2pos ), lds2,
     $                   zero, work( wpos ), ldw )
*        
             CALL cgemm( 'No transpose', 'Conjugate', pk, pk, pn,
     $                   one,  work( wpos ), ldw,
     $                         work( s2pos ), lds2,
     $                   zero, work( s1pos ), lds1 )
*        
             CALL cgemm( 'No transpose', 'No transpose', pk, pn, pk,
     $                   -half, work( s1pos ), lds1, 
     $                          a( i, i+kd ), lda,
     $                   one,   work( wpos ), ldw )
*             
*        
*            Update the unreduced submatrix A(i+kd:n,i+kd:n), using
*            an update of the form:  A := A - V'*W - W'*V
*        
             CALL cher2k( uplo, 'Conjugate', pn, pk,
     $                    -one, a( i, i+kd ), lda,
     $                          work( wpos ), ldw,
     $                    rone, a( i+kd, i+kd ), lda )
   10     CONTINUE
*
*        Copy the upper band to AB which is the band storage matrix
*
         DO 30 j = n-kd+1, n
            lk = min(kd, n-j) + 1
            CALL ccopy( lk, a( j, j ), lda, ab( kd+1, j ), ldab-1 )
   30    CONTINUE
*
      ELSE
*
*         Reduce the lower triangle of A to lower band matrix
*        
          DO 40 i = 1, n - kd, kd
             pn = n-i-kd+1
             pk = min( n-i-kd+1, kd )
*        
*            Compute the QR factorization of the current block
*        
             CALL cgeqrf( pn, kd, a( i+kd, i ), lda,
     $                    tau( i ), work( s2pos ), ls2, iinfo )
*        
*            Copy the lower portion of A into AB
*        
             DO 50 j = i, i+pk-1
                lk = min( kd, n-j ) + 1
                CALL ccopy( lk, a( j, j ), 1, ab( 1, j ), 1 )
   50        CONTINUE
*                
             CALL claset( 'Upper', pk, pk, zero, one, 
     $                    a( i+kd, i ), lda )
*        
*            Form the matrix T
*        
             CALL clarft( 'Forward', 'Columnwise', pn, pk,
     $                    a( i+kd, i ), lda, tau( i ), 
     $                    work( tpos ), ldt )
*        
*            Compute W:
*             
             CALL cgemm( 'No transpose', 'No transpose', pn, pk, pk,
     $                   one, a( i+kd, i ), lda,
     $                         work( tpos ), ldt,
     $                   zero, work( s2pos ), lds2 )
*        
             CALL chemm( 'Left', uplo, pn, pk,
     $                   one, a( i+kd, i+kd ), lda,
     $                         work( s2pos ), lds2,
     $                   zero, work( wpos ), ldw )
*        
             CALL cgemm( 'Conjugate', 'No transpose', pk, pk, pn,
     $                   one, work( s2pos ), lds2,
     $                         work( wpos ), ldw,
     $                   zero, work( s1pos ), lds1 )
*        
             CALL cgemm( 'No transpose', 'No transpose', pn, pk, pk,
     $                   -half, a( i+kd, i ), lda,
     $                         work( s1pos ), lds1,
     $                   one, work( wpos ), ldw )
*             
*        
*            Update the unreduced submatrix A(i+kd:n,i+kd:n), using
*            an update of the form:  A := A - V*W' - W*V'
*        
             CALL cher2k( uplo, 'No transpose', pn, pk,
     $                    -one, a( i+kd, i ), lda,
     $                           work( wpos ), ldw,
     $                    rone, a( i+kd, i+kd ), lda )
*            ==================================================================
*            RESTORE A FOR COMPARISON AND CHECKING TO BE REMOVED
*             DO 45 J = I, I+PK-1
*                LK = MIN( KD, N-J ) + 1
*                CALL CCOPY( LK, AB( 1, J ), 1, A( J, J ), 1 )
*   45        CONTINUE
*            ==================================================================
   40     CONTINUE
*
*        Copy the lower band to AB which is the band storage matrix
*
         DO 60 j = n-kd+1, n
            lk = min(kd, n-j) + 1
            CALL ccopy( lk, a( j, j ), 1, ab( 1, j ), 1 )
   60    CONTINUE
 
      END IF
*
      work( 1 ) = sroundup_lwork( lwmin )
      RETURN
*
*     End of CHETRD_HE2HB
*
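The CGEMM, CHEMM and CHER2K calls in each iteration of loops 10 and 40 apply the panel's block reflector from both sides of the trailing submatrix. The derivation below is our reading of those calls, not text from the original documentation. In the lower case, CLARFT ('Forward', 'Columnwise') represents the panel reflectors as H = I - V T V^H, with V taken from A(i+kd:n, i:i+pk-1) and A standing for the trailing submatrix A(i+kd:n, i+kd:n). Because S is Hermitian, subtracting half of V S from X lets a single rank-2k update (CHER2K) replace the two-sided product. The upper case uses rowwise V ('Rowwise', H = I - V^H T V) with the roles of V and V^H exchanged.

```latex
% Lower case (UPLO = 'L')
\begin{aligned}
H &= I - V T V^H, \qquad X = A V T,\\
S &= T^H V^H X = T^H V^H A V T = S^H,\\
W &= X - \tfrac{1}{2} V S,\\
H^H A H &= A - V X^H - X V^H + V S V^H = A - V W^H - W V^H.
\end{aligned}

% Upper case (UPLO = 'U')
\begin{aligned}
H &= I - V^H T V, \qquad Y = T^H V A,\\
S &= Y V^H T = T^H V A V^H T = S^H,\\
W &= Y - \tfrac{1}{2} S V,\\
H^H A H &= A - V^H Y - Y^H V + V^H S V = A - V^H W - W^H V.
\end{aligned}
```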
