=====================
== ScaLAPACK 1.8.0 ==
=====================

Release date: Thu 04/05/2007.

This material is based upon work supported by the National Science
Foundation under Grant No. NSF-0444486.

 * ScaLAPACK 1.8.0: What's new
 * Thanks
 * Developer list
 * More details

=============================================
== ScaLAPACK 1.8.0: What's new since 1.7.0 ==
=============================================

1) externalisation of the LAPACK routines: starting from 1.8.0, you NEED the
   LAPACK library installed on your machine in order to link/run a ScaLAPACK
   application

2) add p[cz]gesvd, the complex version of the SVD driver

3) add p[sdcz]lawrite and p[sdcz]laread, tools for easy I/O

4) new directory EXAMPLE that contains a ScaLAPACK example in the 4
   precisions

5) bug fixes

=======================================
== Thanks for bug reports/patches to ==
=======================================

Ake Sandgren                           HPC2N, Umea University
Robert Granat                          Umea University
Greg Henry                             Intel
Alan Edelman, Sudarshan Raghunathan    Interactive Super Computing
Yasuhiro Nakahara                      Canon inc.
Mark Fahey                             ORNL
Desheng Wang                           Caltech

===========================
= Principal Investigators =
===========================

Jim Demmel (University of California at Berkeley, USA)
Jack Dongarra (University of Tennessee and ORNL, USA)

===================================================
== ScaLAPACK developers involved in this release ==
===================================================

Peng Du (University of Tennessee, USA)
Julie Langou (University of Tennessee, USA)
Julien Langou (University of Colorado at Denver and Health Sciences Center, USA)
Piotr Luszczek (University of Tennessee, USA)
Osni Marques (Lawrence Berkeley National Laboratory, USA)

=================
== More details =
=================

----------------------------------------------------------------------------
1) externalisation of the LAPACK library
----------------------------------------------------------------------------

Comments:
=========
Until 1.7.x, the LAPACK routines needed by ScaLAPACK were bundled with it;
they have been removed starting from 1.8.0. Consequently, the ScaLAPACK
library needs to link with an existing LAPACK library in order to work
properly.

Changes:
========
Remove all the LAPACK routines from TOOLS/LAPACK

----------------------------------------------------------------------------
2) add the complex version of the SVD driver
----------------------------------------------------------------------------

Comments:
=========
Code contributed by Peng Du (Graduate Research Assistant at UTK, Fall 2005),
supervised by Julien.

Changes:
========
A      SRC/pcgesvd.f
A      SRC/pzgesvd.f
M      SRC/Makefile

----------------------------------------------------------------------------
3) add p[sdcz]lawrite and p[sdcz]laread: they have been adapted from the
   ScaEx example from Antoine Petitet.
----------------------------------------------------------------------------

Comments:
=========
p[sdcz]lawrite and p[sdcz]laread are in the TOOLS directory. They provide an
easy way to write/read a matrix to/from a file.

Changes:
========
M      TOOLS/Makefile
A      TOOLS/pclaread.f
A      TOOLS/pclawrite.f
A      TOOLS/pdlaread.f
A      TOOLS/pdlawrite.f
A      TOOLS/pslaread.f
A      TOOLS/pslawrite.f
A      TOOLS/pzlaread.f
A      TOOLS/pzlawrite.f

---------------------------------------------------------------------------------
4) a new directory EXAMPLE that contains a ScaLAPACK example in the 4 precisions.
---------------------------------------------------------------------------------

Comments:
=========
In the EXAMPLE directory, you now have a program (provided in the 4
precisions) that solves a linear system by calling the ScaLAPACK routine
PDGESV. The input matrix and right-hand side are read from a file. The
solution is written to a file.

To compile and create the example executables (assuming that all libraries
have previously been built), type ***make example***, or ***make*** if you
are in the EXAMPLE directory.

This will create the four executables in the TESTING directory:
 - xsscaex: for the example using single precision,
 - xdscaex: for the example using double precision,
 - xcscaex: for the example using complex precision,
 - xzscaex: for the example using double complex precision,
and copy the input files into the TESTING directory.

The input files are CSCAEXMAT.dat, CSCAEXRHS.dat, DSCAEXMAT.dat,
DSCAEXRHS.dat, SCAEX.dat, SSCAEXMAT.dat, SSCAEXRHS.dat, ZSCAEXMAT.dat and
ZSCAEXRHS.dat.

To run the example programs using MPI, type
     mpirun -np xsscaex
(This is the single precision example.)

The results will be written in CSCAEXSOL.dat for xcscaex, DSCAEXSOL.dat for
xdscaex, SSCAEXSOL.dat for xsscaex and ZSCAEXSOL.dat for xzscaex.

Changes:
========
A      EXAMPLE
A      EXAMPLE/CSCAEXMAT.dat
A      EXAMPLE/CSCAEXRHS.dat
A      EXAMPLE/DSCAEXMAT.dat
A      EXAMPLE/DSCAEXRHS.dat
A      EXAMPLE/Makefile
A      EXAMPLE/SCAEX.dat
A      EXAMPLE/SSCAEXMAT.dat
A      EXAMPLE/SSCAEXRHS.dat
A      EXAMPLE/ZSCAEXMAT.dat
A      EXAMPLE/ZSCAEXRHS.dat
A      EXAMPLE/pcscaex.f
A      EXAMPLE/pdscaex.f
A      EXAMPLE/pdscaexinfo.f
A      EXAMPLE/psscaex.f
A      EXAMPLE/pzscaex.f
M      Makefile

----------------------------------------------------------------------------
4) bug fixes
----------------------------------------------------------------------------

---------------------------------------------------
4.1) Add a define for crot and zrot in SRC/pblas.h
---------------------------------------------------

Changes:
========
M      SRC/pblas.h

-------------------------------------------------------
4.2) Patches provided by Ake Sandgren and Robert Granat
-------------------------------------------------------

All of these were found with the PathScale compiler using -trapuv -O0 -g,
which initializes everything to NaN and turns FPE traps on.

Comments:
=========
The set of patches does two things.
 1 - reduce the usage of uninitialized variables
 2 - fix a couple of incorrect calls to blacs (bad LDA)

* gehdrv *
  The gehdrv patch is just the complete patch related to
  https://icl.cs.utk.edu/lapack-forum/viewtopic.php?p=1153#1153

* pxsepinfo *
  pxsepinfo doesn't initialize THRESH when INFO != 0.

* pxlahrd and lasorte *
  The lahqr patch and a fix to lasorte needed by lahqr, which used to get
  IERR != 0 back from lasorte.
  The T2 = T1*V2 and T3 = T1*V3 moves are needed due to uninitialized data.
  The 2 changed IF-statements were brought about to make getting and sending
  SMALLA consistent.
  The ISTOP change at the bottom is a copy of the corresponding statement at
  the top of the loop.
  The inits of VCOPY and SMALLA are necessary.
  lasorte couldn't handle a situation where the top S(1,1) eigenvalue was
  real.
  This set of patches has been tested, as can be seen on
  https://icl.cs.utk.edu/lapack-forum/viewtopic.php?p=1196#1196
  The current pxlahrd fix might not be the best. Maybe something should be
  done in pxlarfg instead, since alpha isn't set in all cases there, like
  myrow != ixrow for row distribution and likewise for column distribution.

* pxlasmsub *
  pxlasmsub destroys irow1/icol1 in the "find some norm of the local H" part.

* pxrot *
  pxrot used incorrect LDA values for buff in several places. It is not clear
  whether the intention was to have buff Mx1 or 1xM, but it shouldn't really
  matter.

* PBLAS/pxscal *
  PBLAS/pxscal must not test ALPHA unless it is really going to be used,
  since scalapack routines sometimes call pxscal with ALPHA uninitialized
  when myrow != Xrow/mycol != Xcol.

* pxstein *
  pxstein must initialize ONENRM since it isn't always initialized in the
  "IF( NBLK.EQ.IBLOCK( NEXT-1 ) .AND. NBLK.NE.OLNBLK ) THEN" case before
  being used in the "IF( TMPFAC.GT.ODM18 ) THEN" case.
  Maybe setting it to ZERO is wrong, but it's no worse than the original
  code.

* pxtrevc *
  pxtrevc and pxevcdriver just had incorrect LDA parameters passed to blacs
  routines.

Changes:
========
M      PBLAS/SRC/pcscal_.c
M      PBLAS/SRC/pdscal_.c
M      PBLAS/SRC/psscal_.c
M      PBLAS/SRC/pzscal_.c
M      SLmake.inc
M      SRC/dlasorte.f
M      SRC/pclahqr.f
M      SRC/pclahrd.f
M      SRC/pclasmsub.f
M      SRC/pcrot.c
M      SRC/pcstein.f
M      SRC/pctrevc.f
M      SRC/pdlahqr.f
M      SRC/pdlahrd.f
M      SRC/pdlasmsub.f
M      SRC/pdstein.f
M      SRC/pslahqr.f
M      SRC/pslahrd.f
M      SRC/pslasmsub.f
M      SRC/psstein.f
M      SRC/pzlahqr.f
M      SRC/pzlahrd.f
M      SRC/pzlasmsub.f
M      SRC/pzrot.c
M      SRC/pzstein.f
M      SRC/pztrevc.f
M      SRC/slasorte.f
M      TESTING/EIG/pcevcdriver.f
M      TESTING/EIG/pcgehdrv.f
M      TESTING/EIG/pcgsepreq.f
M      TESTING/EIG/pdgehdrv.f
M      TESTING/EIG/pdgsepreq.f
M      TESTING/EIG/psgehdrv.f
M      TESTING/EIG/psgsepreq.f
M      TESTING/EIG/pzevcdriver.f
M      TESTING/EIG/pzgehdrv.f
M      TESTING/EIG/pzgsepreq.f

----------------
4.3) pxinvdriver
----------------

Comments:
=========
Following up on the latest modification (see below), we have increased the
size of the integer workspace in the rectangular case. The new integer
workspace size calculation is now also used in the tester, so that the
LIWORK given by the tester to PxGETRI is big enough.

Changes:
========
M      TESTING/LIN/pcinvdriver.f
M      TESTING/LIN/pdinvdriver.f
M      TESTING/LIN/psinvdriver.f
M      TESTING/LIN/pzinvdriver.f

-----------------------------------------------------------------
4.4) Correct the integer workspace (IWORK) calculation in PxGETRI
-----------------------------------------------------------------

Comments:
=========
Bug report sent by Desheng Wang from Caltech to scalapack@cs.utk.edu on
Mon. May 1st, 2006.

Fix: Replace lines 221-222:

         LIWMIN = NQ + MAX( ICEIL( ICEIL( MP, DESCA( MB_ ) ),
     $                 LCM / NPROW ), DESCA( NB_ ) )

By:

         LIWMIN = NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW
     $         + MOD ( IA - 1, DESCA( MB_ ) ), DESCA ( NB_ ),
     $         MYCOL, DESCA( CSRC_ ), NPCOL ) +
     $         MAX ( DESCA( MB_ ) * ICEIL ( ICEIL(
     $         NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW,
     $         DESCA( MB_ ), MYROW, DESCA( RSRC_ ), NPROW ),
     $         DESCA( MB_ ) ), LCM / NPROW ), DESCA( NB_ ) )

Yep, slightly more complex...
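
As a rough illustration, the replacement LIWMIN expression can be evaluated
off-line, without a running process grid. The sketch below is not part of the
release: it is a Python transcription under the assumption that NUMROC, ICEIL
and ILCM behave like the ScaLAPACK TOOLS routines of the same names (local
extent of a block-cyclic distribution, ceiling of an integer division, least
common multiple). The function name liwmin_getri and its keyword arguments are
hypothetical; the descriptor entries DESCA( M_ ), DESCA( MB_ ), DESCA( NB_ ),
DESCA( RSRC_ ) and DESCA( CSRC_ ) are passed as plain integers.

```python
from math import gcd


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Local number of rows/columns owned by process iproc (assumed NUMROC semantics)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    local = (nblocks // nprocs) * nb
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        local += nb          # this process owns one extra full block
    elif mydist == extrablks:
        local += n % nb      # this process owns the last, partial block
    return local


def iceil(inum, idenom):
    """Ceiling of inum / idenom for positive integers (assumed ICEIL semantics)."""
    return (inum + idenom - 1) // idenom


def ilcm(m, n):
    """Least common multiple (assumed ILCM semantics)."""
    return m * n // gcd(m, n)


def liwmin_getri(m, mb, nb, ia, myrow, mycol, rsrc, csrc, nprow, npcol):
    """Hypothetical helper: transcription of the replacement LIWMIN expression."""
    lcm = ilcm(nprow, npcol)
    mp_global = m + mb * nprow   # DESCA( M_ ) + DESCA( MB_ ) * NPROW
    locc = numroc(mp_global + (ia - 1) % mb, nb, mycol, csrc, npcol)
    locr = numroc(mp_global, mb, myrow, rsrc, nprow)
    return locc + max(mb * iceil(iceil(locr, mb), lcm // nprow), nb)


if __name__ == "__main__":
    # Illustrative values only: an 8 x 8 matrix, 2 x 2 blocks, 2 x 2 grid.
    for myrow in range(2):
        for mycol in range(2):
            size = liwmin_getri(m=8, mb=2, nb=2, ia=1, myrow=myrow,
                                mycol=mycol, rsrc=0, csrc=0, nprow=2, npcol=2)
            print(myrow, mycol, size)
```

Each process evaluates the expression with its own MYROW and MYCOL, so the
result may differ from one process to another.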

The error in the first computation is that it misinterprets the statement in
PxLAPIV. The formula for the integer workspace calculation in PxLAPIV is

   LDW = LOCc( M_P + MOD(IP-1, MB_P) ) +
         MB_P * CEIL( CEIL(LOCr(M_P)/MB_P) / (LCM/NPROW) )

where M_P is the local size of the IPIV.

But the IPIV is slightly bigger than A; the global size of IPIV is
   MP = DESCA( M_ ) + DESCA( MB_ ) * NPROW
(and not DESCA( M_ )).

The other quantities are given by

   M_P              is the global length of the pivot vector
                    MP = DESCA( M_ ) + DESCA( MB_ ) * NPROW
   I_P              is IA
                    I_P = IA
   MB_P             is the block size used for the block cyclic
                    distribution of the pivot vector
                    MB_P = DESCA( MB_ )
   LOCc ( . )       NUMROC ( . , DESCA ( NB_ ), MYCOL, DESCA ( CSRC_ ), NPCOL )
   LOCr ( . )       NUMROC ( . , DESCA ( MB_ ), MYROW, DESCA ( RSRC_ ), NPROW )
   CEIL ( X / Y )   ICEIL( X, Y )
   LCM              LCM = ILCM( NPROW, NPCOL )

and this gives the new formula to compute the integer workspace.

Changes:
========
M      SRC/pcgetri.f
M      SRC/pdgetri.f
M      SRC/psgetri.f
M      SRC/pzgetri.f

-----------------------------------------------------------------
4.5) Fix an out-of-bounds array access in PxLAHQR
-----------------------------------------------------------------

Comments:
=========
Bug report from Yasuhiro Nakahara (Canon inc.) on 03/13/2006.
Patch from Greg Henry (Intel) and Mark Fahey (ORNL).

Description: the pzlahqr routine was aborted due to a segmentation fault.
I found an invalid memory access at line 525 in pzlahqr.f. In the DO-loop,
with II=1, S1(1, 0) was accessed.

Greg said:
> There is an easy fix for this- the idea of exceptional shifts is to
> just try something outside the norm based on the size of the diagonal
> elements. The offending part can be removed from the code without a
> loss of generality. I think I may be able to come with an alternate
> solution.

The fix is to move from

            DO 20 II = 2*JBLK, 1, -1
               S1( II, II ) = CONST*( CABS1( S1( II, II ) )+
     $                        CABS1( S1( II, II-1 ) ) )
               S1( II, II-1 ) = ZERO
               S1( II-1, II ) = ZERO
   20       CONTINUE

(with a problem when II=1 ...) to

            DO 20 II = 2*JBLK, 2, -1
               S1( II, II ) = CONST*( CABS1( S1( II, II ) )+
     $                        CABS1( S1( II, II-1 ) ) )
               S1( II, II-1 ) = ZERO
               S1( II-1, II ) = ZERO
   20       CONTINUE
            S1( 1, 1 ) = CONST*CABS1( S1( 1, 1 ) )

Note that this part of the code is not exercised by the testing. (So the bug
was hard to find.)

Changes:
========
M      SRC/pclahqr.f
M      SRC/pdlahqr.f
M      SRC/pslahqr.f
M      SRC/pzlahqr.f

----------------------------------------------------------------------
4.6) Correct typo in the [S,D,C,Z]gesvd files for the declaration
     of P[S,D,C,Z]ORMBRQLN
----------------------------------------------------------------------

Changes:
========
M      SRC/pcgesvd.f
M      SRC/pdgesvd.f
M      SRC/psgesvd.f
M      SRC/pzgesvd.f

-----------------------------------------------------------------
4.7) Fix typo in comment + description of workspace.
-----------------------------------------------------------------

Comments:
=========
When RANGE='V', work needs to be of dimension 3

Changes:
========
M      SRC/pcheevx.f
M      SRC/pchegvx.f
M      SRC/pdsyevx.f
M      SRC/pdsygvx.f
M      SRC/pssyevx.f
M      SRC/pssygvx.f
M      SRC/pzheevx.f
M      SRC/pzhegvx.f

-----------------------------------------------------------------
4.8) Correction of a typo in the work comment.
-----------------------------------------------------------------

Changes:
========
M      SRC/pdsyevx.f

----------------------------------------------------------------------
4.9) modify the workspace size of xBDSQR to follow the revision 184
     of LAPACK
----------------------------------------------------------------------

Comments:
=========
Modify the workspace size of xBDSQR to follow the revision 184 of LAPACK.
The workspace size of xBDSQR has moved from
*       WDBDSQR = MAX(1, 4*SIZE )
to
*       WDBDSQR = MAX(1, 2*SIZE + (2*SIZE - 4)*MAX(WANTU, WANTVT))
and is now back to
*       WDBDSQR = MAX(1, 4*SIZE )
so the SVD of ScaLAPACK is following (at least let us take the max of both
until LAPACK is fixed on its workspace size).

Changes:
========
M      SRC/psgesvd.f
M      SRC/pcgesvd.f
M      SRC/pzgesvd.f
M      SRC/pdgesvd.f

-----------------------------------------------------------
4.10) correct a bug in the workspace utilisation of p_gesvd
-----------------------------------------------------------

Comments:
=========
[Julien/Osni] correct a bug in the workspace utilisation of p_gesvd. In the
case jobU='V' and jobVT='V', the routine has good pointers; otherwise the
pointers in the workspace were shifted as if matrices U and VT existed, which
implied an out-of-bound reference for the value stored at the end of the
workspace.
There were also a few problems at the end of the code with some sizes in the
case of rectangular matrices.

Changes:
========
M      SRC/psgesvd.f
M      SRC/pdgesvd.f

------------------------------
4.11) Documentation correction
------------------------------

Comments:
=========
* SRC/p[s,d,c,z]gesv.f *
  [Julien] correction in the description of the parameter NRHS (it's the
  number of columns of B not A)

* SRC/p[s,d]lared1d.f *
* SRC/p[s,d]lared2d.f *
  [Julien] The comments in the routines p[s,d]lared2d (where the initial
  vectors are stored by row) were wrong (basically replace BYCOL by BYROW)

Changes:
========
M      SRC/p[s,d,c,z]gesv.f
M      SRC/p[s,d]lared1d.f
M      SRC/p[s,d]lared2d.f

------------------------
4.12) bug in p[s/d]lahrd
------------------------

Comments:
=========
Although the Schur form returned by p[s/d]lahqr was correct (as tested by
the testing routine), the returned eigenvalues were not computed correctly.
This bug was reported by Interactive Supercomputing (Thanks!).
The bug was already found by Greg Henry in March 2002 but the patch has never
been released. Here we go.

Changes:
========
M      SRC/p[s/d]lahrd.f

-----------------------------------------------------------------
4.13) Initial import from netlib
-----------------------------------------------------------------

Comments:
=========
For ScaLAPACK: Scalapack 1.7 + patch

The patch contains:

PBLAS/SRC/PBtools.h     3/12/2002  Comment out CSYMM reference (line 57)
PBLAS/SRC/pblas.h       3/15/2002  Added missing crot define
SRC/psdbtrf.f           3/12/2002  Typo (DLACPY->SLACPY) in EXTERNAL
                                   declaration (line 374)
SRC/pcheevd.f           3/25/2002  Correction to LRWORK (lines 117, 248)
                                   and INFO=0 return
SRC/pzheevd.f           3/25/2002  Correction to LRWORK (lines 117, 248)
                                   and INFO=0 return
TESTING/EIG/pcseptst.f  3/15/2002  Correction to LHEEVDSIZE calculation
                                   (line 1064)
TESTING/EIG/pzseptst.f  3/15/2002  Correction to LHEEVDSIZE calculation
                                   (line 1064)

For more information, please visit:
http://www.netlib.org/scalapack/errata.html#sourcecode

Changes:
========
M      PBLAS/SRC/PBtools.h
M      PBLAS/SRC/pblas.h
M      SRC/psdbtrf.f
M      SRC/pcheevd.f
M      SRC/pzheevd.f
M      TESTING/EIG/pcseptst.f
M      TESTING/EIG/pzseptst.f

----------------------------------------
4.14) Modification on the BLACS tar ball
----------------------------------------

Comments:
=========
For BLACS:
Blacs : pvmblacs + mpiblacs + blacs tester from netlib + patch-3
        + correction on the Makefile from the INSTALL directory

For patch details, see: http://www.netlib.org/blacs/old_errata.blacs

The ***make clean*** now deletes the following files:
   tc_cCsameF77.o tc_fCsameF77.o tc_UseMpich.o

Changes:
========
       INSTALL/Makefile