=====================
== ScaLAPACK 1.8.0 ==
=====================

Release date: Th 04/05/2007.

This material is based upon work supported by the National Science Foundation
under Grant No. NSF-0444486.

  * ScaLAPACK 1.8.0: What's new
  * Thanks
  * Developer list
  * More details

=============================================
== ScaLAPACK 1.8.0: What's new since 1.7.0 ==
=============================================

   1) externalisation of the LAPACK routines: starting from 1.8.0, you NEED the
      LAPACK library installed on your machine in order to link/run a ScaLAPACK
      application

   2) add p[cz]gesvd, the complex version of the SVD driver

   3) add p[sdcz]lawrite and p[sdcz]laread, tools for easy I/O

   4) new directory EXAMPLE that contains a ScaLAPACK example in the 4
      precisions

   5) bug fixes

=======================================
== Thanks for bug reports/patches to ==
=======================================

   Ake Sandgren
   HPC2N, Umea University

   Robert Granat
   Umea University

   Greg Henry
   Intel

   Alan Edelman, Sudarshan Raghunathan 
   Interactive Super Computing

   Yasuhiro Nakahara 
   Canon inc.

   Mark Fahey 
   ORNL

   Desheng Wang
   Caltech

===========================
= Principal Investigators =
===========================

    Jim Demmel (University of California at Berkeley, USA)
    Jack Dongarra (University of Tennessee and ORNL, USA)

===================================================
== ScaLAPACK developers involved in this release ==
===================================================

    Peng Du (University of Tennessee, USA)
    Julie Langou (University of Tennessee, USA)
    Julien Langou (University of Colorado at Denver and Health Sciences Center, USA)
    Piotr Luszczek (University of Tennessee, USA)
    Osni Marques (Lawrence Berkeley National Laboratory, USA)

==================
== More details ==
==================

   ----------------------------------------------------------------------------
   1) externalisation of the LAPACK library 
   ----------------------------------------------------------------------------

   Comments:
   =========

   Until 1.7.x, the LAPACK routines needed by ScaLAPACK were bundled with the
   ScaLAPACK sources (in TOOLS/LAPACK). They have been removed starting from
   1.8.0. Consequently, the ScaLAPACK library needs to link with an existing
   LAPACK library in order to work properly.
      
   Changes:
   ========

   Remove all the LAPACK routines from TOOLS/LAPACK

   ----------------------------------------------------------------------------
   2) add the complex version of the SVD driver
   ----------------------------------------------------------------------------

   Comments:
   =========

   Code contributed by Peng Du (Graduate Research Assistant at UTK, Fall
   2005), supervised by Julien Langou.

   Changes:
   ========

   A    SRC/pcgesvd.f
   A    SRC/pzgesvd.f
   M    SRC/Makefile

   ----------------------------------------------------------------------------
   3) add p[sdcz]lawrite and p[sdcz]laread: they have been adapted from the
   ScaEx example by Antoine Petitet.
   ----------------------------------------------------------------------------

   Comments:
   =========

   p[sdcz]lawrite and p[sdcz]laread are in the TOOLS directory.
   They provide an easy way to write/read a matrix to/from a file.

   Changes:
   ========
   M TOOLS/Makefile
   A TOOLS/pclaread.f
   A TOOLS/pclawrite.f
   A TOOLS/pdlaread.f
   A TOOLS/pdlawrite.f
   A TOOLS/pslaread.f
   A TOOLS/pslawrite.f
   A TOOLS/pzlaread.f
   A TOOLS/pzlawrite.f

   ---------------------------------------------------------------------------------
   4) a new directory EXAMPLE that contains a ScaLAPACK example in the 4 precisions.
   ---------------------------------------------------------------------------------

   Comments:
   =========

   The EXAMPLE directory now contains a program (available in the 4
   precisions) that solves a linear system by calling the ScaLAPACK routine
   PDGESV (or its counterpart in the other precisions). The input matrix and
   right-hand side are read from a file. The solution is written to a file.
   To compile and create the example executables (assuming that all libraries
   have previously been built), type ***make example***, or ***make*** if you
   are in the EXAMPLE directory.  This will create the four executables in the
   TESTING directory:
   - xsscaex: for the example using single precision,
   - xdscaex: for the example using double precision,
   - xcscaex: for the example using complex precision,
   - xzscaex: for the example using double complex precision,
   and copy the input files into the TESTING directory. The input files are
   CSCAEXMAT.dat, CSCAEXRHS.dat, DSCAEXMAT.dat, DSCAEXRHS.dat, SCAEX.dat,
   SSCAEXMAT.dat, SSCAEXRHS.dat, ZSCAEXMAT.dat and ZSCAEXRHS.dat.

   To run the example programs using MPI, type 
      mpirun -np <number of processes> xsscaex
   (This is the single precision example.)

   The results will be written in CSCAEXSOL.dat for xcscaex, DSCAEXSOL.dat for
   xdscaex, SSCAEXSOL.dat for xsscaex and ZSCAEXSOL.dat for xzscaex.

   Changes:
   ========
   A EXAMPLE
   A EXAMPLE/CSCAEXMAT.dat
   A EXAMPLE/CSCAEXRHS.dat
   A EXAMPLE/DSCAEXMAT.dat
   A EXAMPLE/DSCAEXRHS.dat
   A EXAMPLE/Makefile
   A EXAMPLE/SCAEX.dat
   A EXAMPLE/SSCAEXMAT.dat
   A EXAMPLE/SSCAEXRHS.dat
   A EXAMPLE/ZSCAEXMAT.dat
   A EXAMPLE/ZSCAEXRHS.dat
   A EXAMPLE/pcscaex.f
   A EXAMPLE/pdscaex.f
   A EXAMPLE/pdscaexinfo.f
   A EXAMPLE/psscaex.f
   A EXAMPLE/pzscaex.f
   M Makefile

   ----------------------------------------------------------------------------
   4) bug fixes 
   ----------------------------------------------------------------------------

   ---------------------------------------------------
   4.1) Add a define for crot and zrot in SRC/pblas.h
   ---------------------------------------------------

      Changes:
      ========

      M    SRC/pblas.h

   -------------------------------------------------------
   4.2) Patches provided by Ake Sandgren and Robert Granat
   -------------------------------------------------------

      All of these were found with the PathScale compiler using -trapuv -O0 -g,
      which initializes everything to NaN and turns FPE traps on.

      Comments:
      =========

      The set of patches does two things.
      1 - reduce the usage of uninitialized variables
      2 - fix a couple of incorrect calls to blacs (bad LDA)

      * gehdrv *

      The gehdrv patch is just the complete patch related to
      https://icl.cs.utk.edu/lapack-forum/viewtopic.php?p=1153#1153

      * pzsepinfo *

      pxsepinfo doesn't initialize THRESH when INFO != 0.

      * pxlahrd and lasorte *

      The lahqr patch and a fix to lasorte needed by lahqr which used to get
      IERR != 0 back from lasorte.

      The T2 = T1*V2 and T3 = T1*V3 moves are needed due to uninitialized
      data.

      The 2 changed IF-statements were introduced to make getting and
      sending SMALLA consistent.

      The ISTOP change at the bottom is a copy of the corresponding statement
      at the top of the loop.

      The initializations of VCOPY and SMALLA are necessary.

      lasorte couldn't handle a situation where the top S(1,1) eigenvalue was
      real.

      This set of patches has been tested, as can be seen at
         https://icl.cs.utk.edu/lapack-forum/viewtopic.php?p=1196#1196

      The current pxlahrd fix might not be the best. Maybe something should be
      done in pxlarfg instead since alpha isn't set in all cases there, like
      myrow != ixrow for row distribution and likewise for column distribution.

      * pxlasmsub *

      pxlasmsub destroys irow1/icol1 in the "find some norm of the local H"
      part.

      * pxrot *

      pxrot used incorrect LDA values for buff in several places, not sure if
      the intention was to have buff Mx1 or 1xM but it shouldn't really matter
      should it?

      * PBLAS/pxscal *

      PBLAS/pxscal must not test ALPHA unless it is really going to be used
      since scalapack routines sometimes call pxscal with ALPHA uninitialized
      when myrow != Xrow/mycol != Xcol.

      * pxstein *

      pxstein must initialize ONENRM since it isn't always initialized in the
      "IF( NBLK.EQ.IBLOCK( NEXT-1 ) .AND. NBLK.NE.OLNBLK ) THEN" case before
      being used in the "IF( TMPFAC.GT.ODM18 ) THEN" case. Maybe setting it to
      ZERO is wrong, but it's not worse than the original code.

      * pxtrevc *

      The pxtrevc and pxevcdriver patches just fix incorrect LDA parameters
      passed to BLACS routines.

      Changes:
      ========

      M    PBLAS/SRC/pcscal_.c
      M    PBLAS/SRC/pdscal_.c
      M    PBLAS/SRC/psscal_.c
      M    PBLAS/SRC/pzscal_.c
      M    SLmake.inc
      M    SRC/dlasorte.f
      M    SRC/pclahqr.f
      M    SRC/pclahrd.f
      M    SRC/pclasmsub.f
      M    SRC/pcrot.c
      M    SRC/pcstein.f
      M    SRC/pctrevc.f
      M    SRC/pdlahqr.f
      M    SRC/pdlahrd.f
      M    SRC/pdlasmsub.f
      M    SRC/pdstein.f
      M    SRC/pslahqr.f
      M    SRC/pslahrd.f
      M    SRC/pslasmsub.f
      M    SRC/psstein.f
      M    SRC/pzlahqr.f
      M    SRC/pzlahrd.f
      M    SRC/pzlasmsub.f
      M    SRC/pzrot.c
      M    SRC/pzstein.f
      M    SRC/pztrevc.f
      M    SRC/slasorte.f
      M    TESTING/EIG/pcevcdriver.f
      M    TESTING/EIG/pcgehdrv.f
      M    TESTING/EIG/pcgsepreq.f
      M    TESTING/EIG/pdgehdrv.f
      M    TESTING/EIG/pdgsepreq.f
      M    TESTING/EIG/psgehdrv.f
      M    TESTING/EIG/psgsepreq.f
      M    TESTING/EIG/pzevcdriver.f
      M    TESTING/EIG/pzgehdrv.f
      M    TESTING/EIG/pzgsepreq.f

   ----------------
   4.3) pxinvdriver
   ----------------

      Comments:
      =========

      Following up on the latest modification (see below), we have increased
      the size of the integer workspace in the rectangular case. The new
      integer workspace size calculation has been carried over to the tester,
      so that the LIWORK given by the tester to PxGETRI is big enough.

      Changes:
      ========

      M    TESTING/LIN/pcinvdriver.f
      M    TESTING/LIN/pdinvdriver.f
      M    TESTING/LIN/psinvdriver.f
      M    TESTING/LIN/pzinvdriver.f

   -----------------------------------------------------------------
   4.4) Correct the integer workspace (IWORK) calculation in PxGETRI
   -----------------------------------------------------------------

      Comments:
      =========

      Bug report sent by Desheng Wang from Caltech to scalapack@cs.utk.edu,
      Monday, May 1st, 2006.

      Fix:
      Replace lines 221-222:

               LIWMIN = NQ + MAX( ICEIL( ICEIL( MP, DESCA( MB_ ) ),
     $                            LCM / NPROW ), DESCA( NB_ ) ) 

      By:

               LIWMIN = NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW
     $                  + MOD ( IA - 1, DESCA( MB_ ) ), DESCA ( NB_ ),
     $                  MYCOL, DESCA( CSRC_ ), NPCOL ) +
     $                  MAX ( DESCA( MB_ ) * ICEIL ( ICEIL(
     $                  NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW,
     $                  DESCA( MB_ ), MYROW, DESCA( RSRC_ ), NPROW ),
     $                  DESCA( MB_ ) ), LCM / NPROW ), DESCA( NB_ ) )

      Yep, slightly more complex...

      The error in the first computation is that it misinterprets the statement
      in PxLAPIV. The formula for the integer workspace calculation in PxLAPIV
      is

          LDW = LOCc( M_P + MOD(IP-1, MB_P) ) +
                 MB_P * CEIL( CEIL(LOCr(M_P)/MB_P) / (LCM/NPROW) )

      where M_P is the global size of IPIV. But IPIV is slightly bigger
      than A; the global size of IPIV is:

           MP = DESCA( M_ ) + DESCA( MB_ ) * NPROW (and not DESCA(M_)).

      The other quantities are given by

      M_P     is the global length of the pivot vector
              MP = DESCA( M_ ) + DESCA( MB_ ) * NPROW
      IP      is IA
              IP = IA
      MB_P    is the block size used for the block cyclic distribution of the
              pivot vector
              MB_P = DESCA (MB_ )
      LOCc ( . ) 
              NUMROC ( . , DESCA ( NB_ ), MYCOL, DESCA ( CSRC_ ), NPCOL )
      LOCr ( . )
              NUMROC ( . , DESCA ( MB_ ), MYROW, DESCA ( RSRC_ ), NPROW )
      CEIL ( X / Y )
              ICEIL( X, Y )
      LCM 
              LCM = ILCM( NPROW, NPCOL )

      and this gives the new formula to compute the integer workspace.
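
      As a cross-check, the new formula can be evaluated outside of Fortran.
      The sketch below is a hypothetical Python transcription and is not part
      of ScaLAPACK: numroc, iceil and ilcm are re-implemented here on the
      assumption that they behave like the TOOLS routines NUMROC, ICEIL and
      ILCM, and the descriptor entries are passed as plain integers.

```python
from math import gcd


def iceil(inum, idenom):
    """Integer ceiling of inum / idenom (assumed to mirror ICEIL)."""
    return (inum + idenom - 1) // idenom


def ilcm(m, n):
    """Least common multiple of two positive integers (assumed to mirror ILCM)."""
    return m * n // gcd(m, n)


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows or columns of a block-cyclic array owned by process iproc
    (assumed to mirror NUMROC)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    local = (nblocks // nprocs) * nb
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        local += nb          # this process owns one extra full block
    elif mydist == extrablks:
        local += n % nb      # this process owns the trailing partial block
    return local


def liwmin_getri(m, mb, nb, ia, rsrc, csrc, myrow, mycol, nprow, npcol):
    """Integer workspace size given by the corrected formula above.

    m, mb, nb, rsrc, csrc stand for DESCA(M_), DESCA(MB_), DESCA(NB_),
    DESCA(RSRC_) and DESCA(CSRC_).
    """
    lcm = ilcm(nprow, npcol)
    m_p = m + mb * nprow  # global length of the pivot vector
    locc = numroc(m_p + (ia - 1) % mb, nb, mycol, csrc, npcol)
    locr = numroc(m_p, mb, myrow, rsrc, nprow)
    return locc + max(mb * iceil(iceil(locr, mb), lcm // nprow), nb)
```

      Under these assumptions, M = 10, MB = NB = 2 and IA = 1 on a 2 x 3
      process grid give liwmin_getri(10, 2, 2, 1, 0, 0, 0, 0, 2, 3) = 10 on
      process (0,0).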

      Changes:
      ========

      M    SRC/pcgetri.f
      M    SRC/pdgetri.f
      M    SRC/psgetri.f
      M    SRC/pzgetri.f

   ----------------------------------------------------------
   4.5) Fix an invalid memory access to S1( 1, 0 ) in PxLAHQR
   ----------------------------------------------------------

      Comments:
      =========

      Bug report from Yasuhiro Nakahara (Canon inc.) on 03/13/2006.
      Patch from Greg Henry (Intel) and Mark Fahey (ORNL).

      Description: the pzlahqr routine aborted with a segmentation fault.
      I found an invalid memory access at line 525 in pzlahqr.f.
      In the DO-loop, with II=1, S1(1, 0) was accessed.

      Greg said:

      > There is an easy fix for this- the idea of exceptional shifts is to
      > just try something outside the norm based on the size of the diagonal
      > elements.  The offending part can be removed from the code without a
      > loss of generality.  I think I may be able to come with an alternate
      > solution.

      Change from

            DO 20 II = 2*JBLK, 1, -1
               S1( II, II ) = CONST*( CABS1( S1( II, II ) )+
     $                        CABS1( S1( II, II-1 ) ) )
               S1( II, II-1 ) = ZERO
               S1( II-1, II ) = ZERO
   20       CONTINUE


      (with problem when II=1 ...) to

            DO 20 II = 2*JBLK, 2, -1
               S1( II, II ) = CONST*( CABS1( S1( II, II ) )+
     $                        CABS1( S1( II, II-1 ) ) )
               S1( II, II-1 ) = ZERO
               S1( II-1, II ) = ZERO
   20       CONTINUE
            S1( 1, 1 ) = CONST*CABS1( S1( 1, 1 ) )

      Note that this part of the code is not exercised by the testing.
      (So the bug was hard to find.)
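
      The effect of the loop bound can be reproduced in a small sketch. The
      Python code below is an illustration only: OneBased is a hypothetical
      helper that mimics a 1-based Fortran array with bounds checking, cabs1
      is assumed to be |Re| + |Im| as in the Fortran statement function, and
      the value of CONST is arbitrary.

```python
class OneBased:
    """Hypothetical n x n array with 1-based indices and bounds checking."""

    def __init__(self, n, fill):
        self.n = n
        self.data = [[fill] * n for _ in range(n)]

    def _check(self, i, j):
        if not (1 <= i <= self.n and 1 <= j <= self.n):
            raise IndexError(f"S1({i},{j}) is outside 1..{self.n}")

    def __getitem__(self, ij):
        i, j = ij
        self._check(i, j)
        return self.data[i - 1][j - 1]

    def __setitem__(self, ij, value):
        i, j = ij
        self._check(i, j)
        self.data[i - 1][j - 1] = value


def cabs1(z):
    """|Re(z)| + |Im(z)|, assumed to match CABS1."""
    z = complex(z)
    return abs(z.real) + abs(z.imag)


def shifts_old(s1, jblk, const):
    """Loop running down to II = 1: reads S1(1,0) on the last pass."""
    for ii in range(2 * jblk, 0, -1):
        s1[ii, ii] = const * (cabs1(s1[ii, ii]) + cabs1(s1[ii, ii - 1]))
        s1[ii, ii - 1] = 0
        s1[ii - 1, ii] = 0


def shifts_new(s1, jblk, const):
    """Loop stopping at II = 2, with S1(1,1) handled separately."""
    for ii in range(2 * jblk, 1, -1):
        s1[ii, ii] = const * (cabs1(s1[ii, ii]) + cabs1(s1[ii, ii - 1]))
        s1[ii, ii - 1] = 0
        s1[ii - 1, ii] = 0
    s1[1, 1] = const * cabs1(s1[1, 1])
```

      With bounds checking, shifts_old raises an IndexError at II = 1, while
      shifts_new completes. In Fortran without bounds checking, the same
      access silently reads memory outside S1, which is consistent with the
      reported segmentation fault.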

      Changes:
      ========

      M    SRC/pclahqr.f
      M    SRC/pdlahqr.f
      M    SRC/pslahqr.f
      M    SRC/pzlahqr.f
      
   ----------------------------------------------------------------------
   4.6) Correct typo in the p[s,d,c,z]gesvd files for the declaration of
        P[S,D,C,Z]ORMBRQLN
   ----------------------------------------------------------------------

      Changes:
      ========

      M    SRC/pcgesvd.f
      M    SRC/pdgesvd.f
      M    SRC/psgesvd.f
      M    SRC/pzgesvd.f

   -----------------------------------------------------------------
   4.7) Fix typo in comment and description of workspace.
   -----------------------------------------------------------------

      Comments:
      =========

      When RANGE='V', work needs to be of dimension 3

      Changes:
      ========

      M    SRC/pcheevx.f
      M    SRC/pchegvx.f
      M    SRC/pdsyevx.f
      M    SRC/pdsygvx.f
      M    SRC/pssyevx.f
      M    SRC/pssygvx.f
      M    SRC/pzheevx.f
      M    SRC/pzhegvx.f

   -----------------------------------------------------------------
   4.8) Correct a typo in the work comment.
   -----------------------------------------------------------------

      Changes:
      ========
      M    SRC/pdsyevx.f

   ----------------------------------------------------------------------
   4.9) modify the workspace size of xBDSQR to follow revision 184 of
        LAPACK
   ----------------------------------------------------------------------

      Comments:
      =========

      The workspace size of xBDSQR is modified to follow revision 184 of
      LAPACK. In LAPACK, the workspace size of xBDSQR has moved from
*          WDBDSQR = MAX(1, 4*SIZE )
      to
*          WDBDSQR = MAX(1, 2*SIZE + (2*SIZE - 4)*MAX(WANTU, WANTVT))
      and is now back to
*          WDBDSQR = MAX(1, 4*SIZE )
      so the SVD of ScaLAPACK follows (at least, let us take the max of both
      until LAPACK settles on its workspace size).
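
      The "max of both" rule can be sketched as follows. This is a
      hypothetical Python illustration, not the Fortran code of p_gesvd: the
      function name is made up, and WANTU and WANTVT are assumed to be 0/1
      integer flags.

```python
def wdbdsqr(size, wantu, wantvt):
    """Workspace for xBDSQR taken as the max of the two LAPACK formulas."""
    first = max(1, 4 * size)
    second = max(1, 2 * size + (2 * size - 4) * max(wantu, wantvt))
    return max(first, second)
```

      For SIZE >= 0 and 0/1 flags, the second formula never exceeds the
      first, so under these assumptions the max is simply MAX(1, 4*SIZE).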

      Changes:
      ========

      M    SRC/psgesvd.f
      M    SRC/pcgesvd.f
      M    SRC/pzgesvd.f
      M    SRC/pdgesvd.f

   -----------------------------------------------------------
   4.10) correct a bug in the workspace utilisation of p_gesvd
   -----------------------------------------------------------

      Comments:
      =========

      [Julien/Osni] correct a bug in the workspace utilisation of p_gesvd. In
      the case jobU='V' and jobVT='V', the routine has good pointers;
      otherwise, the pointers in the workspace were shifted as if matrices U
      and VT existed, which implied out-of-bounds references for the values
      stored at the end of the workspace.  There were also a few problems at
      the end of the code with some sizes in the case of rectangular matrices.

      Changes:
      ========

      M    SRC/psgesvd.f
      M    SRC/pdgesvd.f


   ------------------------------
   4.11) Documentation correction
   ------------------------------

      Comments:
      =========

      * SRC/p[s,d,c,z]gesv.f *
      [Julien]

      correction in the description of the parameter NRHS
      (it's the number of columns of B not A)

      * SRC/p[s,d]lared1d.f *
      * SRC/p[s,d]lared2d.f *
      [Julien]

      The comments in the routines p[s,d]lared2d (where the initial vectors
      are stored by row) were wrong (basically replace BYCOL by BYROW)


      Changes:
      ========
      M SRC/p[s,d,c,z]gesv.f
      M SRC/p[s,d]lared1d.f
      M SRC/p[s,d]lared2d.f

   ------------------------
   4.12) bug in p[s/d]lahrd
   ------------------------

      Comments:
      =========

      Although the Schur form returned by p[s/d]lahqr was correct (as tested by
      the testing routine), the returned eigenvalues were not computed
      correctly.  This bug was reported by Interactive Supercomputing
      (Thanks!). The bug was already found by Greg Henry in March 2002 but the
      patch was never released. Here we go.

      Changes:
      ========
      M SRC/p[s/d]lahrd.f

   -----------------------------------------------------------------
   4.13) Initial import from netlib
   -----------------------------------------------------------------

      Comments:
      =========

      For ScaLAPACK: Scalapack 1.7 + patch

      patch contains:

      PBLAS/SRC/PBtools.h      3/12/2002  Comment out CSYMM reference (line 57)
      PBLAS/SRC/pblas.h        3/15/2002  Added missing crot define
      SRC/psdbtrf.f            3/12/2002  Typo (DLACPY->SLACPY) in EXTERNAL declaration (line 374)
      SRC/pcheevd.f            3/25/2002  Correction to LRWORK (lines 117, 248) and INFO=0 return
      SRC/pzheevd.f            3/25/2002  Correction to LRWORK (lines 117, 248) and INFO=0 return
      TESTING/EIG/pcseptst.f   3/15/2002  Correction to LHEEVDSIZE calculation (line 1064)
      TESTING/EIG/pzseptst.f   3/15/2002  Correction to LHEEVDSIZE calculation (line 1064)

      for more information, please visit:
      http://www.netlib.org/scalapack/errata.html#sourcecode

      Changes:
      ========

      M PBLAS/SRC/PBtools.h   
      M PBLAS/SRC/pblas.h     
      M SRC/psdbtrf.f         
      M SRC/pcheevd.f         
      M SRC/pzheevd.f         
      M TESTING/EIG/pcseptst.f
      M TESTING/EIG/pzseptst.f

   ----------------------------------------
   4.14) Modification of the BLACS tar ball
   ----------------------------------------

      Comments:
      =========

      For BLACS: pvmblacs + mpiblacs + BLACS tester from netlib +
      patch-3 + correction of the Makefile in the INSTALL directory

      For patch details, see:
         http://www.netlib.org/blacs/old_errata.blacs

      ***make clean*** now deletes the following files:
         tc_cCsameF77.o  tc_fCsameF77.o  tc_UseMpich.o

      Changes:
      ========

        INSTALL/Makefile