Release date: 06/28/21.
This material is based upon work supported by the National Science Foundation and the Department of Energy.
LAPACK is a software package provided by Univ. of Tennessee, Univ. of California, Berkeley, Univ. of Colorado Denver and NAG Ltd.
1. Support and questions:
2. LAPACK 3.10.0: What’s new
- Download: lapack-3.10.0.tar.gz
This is a major release that also includes multiple bug fixes.
2.1. Safe scaling algorithms
Contribution by Ed Anderson
Added SRC/la_constants.f90
Added SRC/la_xisnan.F90
Deleted BLAS/SRC/dnrm2.f Added BLAS/SRC/dnrm2.f90
Deleted BLAS/SRC/dznrm2.f Added BLAS/SRC/dznrm2.f90
Deleted BLAS/SRC/scnrm2.f Added BLAS/SRC/scnrm2.f90
Deleted BLAS/SRC/snrm2.f Added BLAS/SRC/snrm2.f90
Deleted BLAS/SRC/drotg.f Added BLAS/SRC/drotg.f90
Deleted BLAS/SRC/zrotg.f Added BLAS/SRC/zrotg.f90
Deleted BLAS/SRC/crotg.f Added BLAS/SRC/crotg.f90
Deleted BLAS/SRC/srotg.f Added BLAS/SRC/srotg.f90
Deleted SRC/dlassq.f Added SRC/dlassq.f90
Deleted SRC/zlassq.f Added SRC/zlassq.f90
Deleted SRC/classq.f Added SRC/classq.f90
Deleted SRC/slassq.f Added SRC/slassq.f90
Deleted SRC/dlartg.f Added SRC/dlartg.f90
Deleted SRC/zlartg.f Added SRC/zlartg.f90
Deleted SRC/clartg.f Added SRC/clartg.f90
Deleted SRC/slartg.f Added SRC/slartg.f90
Edward Anderson stated in his paper https://doi.org/10.1145/3061665:
The square root of a sum of squares is well known to be prone to overflow and underflow. Ad hoc scaling of intermediate results, as has been done in numerical software such as the BLAS and LAPACK, mostly avoids the problem, but it can still occur at extreme values in the range of representable numbers. More careful scaling, as has been implemented in recent versions of the standard algorithms, may come at the expense of performance or clarity. This work reimplements the vector 2-norm and the generation of Givens rotations from the Level 1 BLAS to improve their performance and design. In addition, support for negative increments is extended to the Level 1 BLAS operations on a single vector, and a comprehensive test suite for all the Level 1 BLAS is included.
This contribution replaces the original xNRM2, xROTG, xLARTG, xLASSQ routines with Edward’s safe scaling codes plus some minor modifications.
Note: this code follows the Fortran 90 standard.
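To illustrate the overflow and underflow problem described in the quoted abstract, here is a minimal Python sketch that compares a direct sum of squares with a version that first scales by the largest magnitude. It is not the algorithm used in the new xNRM2 routines; the function names `naive_norm2` and `scaled_norm2` are hypothetical and chosen for illustration only, and finite inputs are assumed.

```python
import math

def naive_norm2(x):
    """Direct formula sqrt(sum(x_i^2)); the squares can overflow or underflow."""
    total = 0.0
    for v in x:
        total += v * v
    return math.sqrt(total)

def scaled_norm2(x):
    """Illustrative scaling by the largest magnitude (assumes finite entries).

    This is a simple textbook approach, not the LAPACK implementation.
    """
    scale = 0.0
    for v in x:
        scale = max(scale, abs(v))
    if scale == 0.0:
        return 0.0
    total = 0.0
    for v in x:
        r = v / scale          # |r| <= 1, so r * r cannot overflow
        total += r * r
    return scale * math.sqrt(total)

big = [3e200, 4e200]      # squares exceed the largest double
tiny = [3e-200, 4e-200]   # squares underflow to zero

print(naive_norm2(big), scaled_norm2(big))     # naive overflows to inf
print(naive_norm2(tiny), scaled_norm2(tiny))   # naive underflows to 0.0
```

The scaled version needs an extra pass over the data and a division per element, which is the kind of performance cost the abstract alludes to.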
- The new Fortran90 module SRC/la_constants.f90 expands the functionality of INSTALL/xLAMCH.f by adding safe scaling constants.
- The new Fortran90 module SRC/la_xisnan.F90 expands the functionality of SRC/xISNAN.f and SRC/xLAISNAN.f by adding the option of using the ieee_is_nan routine.
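The difference between a comparison-based NaN test and an intrinsic one can be sketched in Python. This is an analogy rather than the Fortran code: the self-comparison relies on NaN being unequal to itself, while `math.isnan` stands in for a language-provided test such as ieee_is_nan. The function names are hypothetical.

```python
import math

def isnan_by_comparison(x):
    # NaN is the only floating-point value that compares unequal to itself.
    return x != x

def isnan_by_intrinsic(x):
    # Library-provided test, analogous in spirit to Fortran's ieee_is_nan.
    return math.isnan(x)

# Both tests should agree on ordinary values, infinities, and NaN.
for value in (0.0, 1.5, float("inf"), float("nan")):
    assert isnan_by_comparison(value) == isnan_by_intrinsic(value)
```

One possible reason to prefer the intrinsic form is that aggressive floating-point optimization settings in some compilers may simplify a self-comparison away.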
2.2. Implementation of the multishift QZ algorithm with AED
Contribution by thijssteel
It is loosely based on the contributor's implementation of the rational QZ algorithm (https://github.com/thijssteel/multishift-multipole-rqz).
It features:
- Aggressive early deflation
- Multishift QZ sweeps using optimal packing of the bulges
- A new heuristic to select the number of positions in the sweep windows
It does not feature:
- A windowed deflation of infinite eigenvalues (that is only useful if many infinite eigenvalues are to be deflated and in that case you should probably do some preprocessing anyway).
This contribution also includes two accuracy improvements to xtgex2:
- Daan Camps has proven in his dissertation that replacing the condition in xtgex2 when swapping 1x1 blocks can improve the accuracy. This condition and the respective error tolerances have been changed to reflect it.
- A paper also by Daan Camps, "SWAPPING 2 × 2 BLOCKS IN THE SCHUR AND GENERALIZED SCHUR FORM", shows that the accuracy of swapping 2x2 blocks can be improved by iterative refinement and by replacing the QR factorisation with an SVD-based method. Only the first improvement was implemented, as the paper does not mention how to avoid overflow.
2.3. Householder Reconstruction
new file: SRC/sorhr_col.f
new file: SRC/dorhr_col.f
new file: SRC/cunhr_col.f
new file: SRC/zunhr_col.f
new file: SRC/dorgtsqr.f
new file: SRC/sorgtsqr.f
new file: SRC/dgetsqrhrt.f
new file: SRC/dlarfb_gett.f
new file: SRC/dorgtsqr_row.f
new file: SRC/sgetsqrhrt.f
new file: SRC/slarfb_gett.f
new file: SRC/sorgtsqr_row.f
new file: SRC/cgetsqrhrt.f
new file: SRC/clarfb_gett.f
new file: SRC/cungtsqr_row.f
new file: SRC/zgetsqrhrt.f
new file: SRC/zlarfb_gett.f
new file: SRC/zungtsqr_row.f
GETSQRHRT is a QR factorization routine for tall-skinny matrices. It is based on LATSQR, but returns the Q and R factors in the same format as GEQRT, i.e. using the compact WY representation of Q.
This is the same format used by other LAPACK routines that depend on QR factorizations. GETSQRHRT constructs an orthonormal matrix from a TSQR factorization using ORGTSQR_ROW, a more efficient version of ORGTSQR that in turn calls LARFB_GETT.
2.4. Change in the return behavior of GESDD
GESDD now returns with INFO = -4 if A has a NaN entry.
In version 3.9.0 and master, GESDD would call exit(0) instead of returning an error code on some matrices. See Issue #469.
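A caller can tell the new error return apart from other outcomes by inspecting INFO. The sketch below is a hypothetical Python helper (`describe_gesdd_info` is not part of LAPACK) that follows the usual LAPACK convention: INFO = 0 means success, INFO = -i flags the i-th argument, and a positive value reports a failure inside the routine. The argument list is assumed to be that of the real-valued GESDD interface, in which A is the fourth argument.

```python
# Assumed argument order of the real-valued GESDD interface (INFO excluded).
GESDD_ARGS = ("JOBZ", "M", "N", "A", "LDA", "S", "U", "LDU",
              "VT", "LDVT", "WORK", "LWORK", "IWORK")

def describe_gesdd_info(info):
    """Turn a GESDD INFO value into a short message (illustrative helper)."""
    if info == 0:
        return "success"
    if info == -4:
        # As of 3.10.0, a NaN entry in A is reported through argument 4.
        return "argument 4 (A) had an illegal value, such as a NaN entry"
    if info < 0:
        index = -info
        name = GESDD_ARGS[index - 1] if index <= len(GESDD_ARGS) else "unknown"
        return f"argument {index} ({name}) had an illegal value"
    return "the computation did not converge"

for code in (0, -4, -5, 1):
    print(code, "->", describe_gesdd_info(code))
```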
2.5. Bug fixes
For details, please see our GitHub repository.
2.6. Notes about compiler dependency
Some LAPACK routines rely on trustworthy complex division and ABS routines in the FORTRAN compiler. This [link](https://github.com/Reference-LAPACK/lapack/files/6672436/complexDivisionFound.txt) lists the LAPACK COMPLEX*16 algorithms that contain compiler-dependent complex divisions of the form
REAL / COMPLEX or COMPLEX / COMPLEX
See Issue #575 and Issue #577 for a more complete discussion on this topic.
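To see why the result of a complex division can depend on how it is implemented, compare the textbook formula with a scaled variant. The Python sketch below works on (real, imaginary) pairs so that Python's built-in complex division is not involved. The names `naive_div` and `smith_div` are hypothetical, and Smith's method is shown only as one well-known robust alternative, not as what any particular Fortran compiler does. A nonzero divisor is assumed.

```python
def naive_div(a, b, c, d):
    """(a + bi) / (c + di) by the textbook formula; c*c + d*d may overflow."""
    den = c * c + d * d
    return ((a * c + b * d) / den, (b * c - a * d) / den)

def smith_div(a, b, c, d):
    """Smith's method: scale by the larger of |c| and |d| before dividing."""
    if abs(c) >= abs(d):
        r = d / c                      # |r| <= 1
        den = c + d * r
        return ((a + b * r) / den, (b - a * r) / den)
    r = c / d                          # |r| < 1
    den = c * r + d
    return ((a * r + b) / den, (b * r - a) / den)

# Moderate values: both formulas agree up to rounding.
print(naive_div(1.0, 2.0, 3.0, 4.0), smith_div(1.0, 2.0, 3.0, 4.0))

# Large values: the textbook formula overflows internally and yields NaN,
# while the scaled variant returns the expected quotient of 1.
print(naive_div(1e200, 1e200, 1e200, 1e200))
print(smith_div(1e200, 1e200, 1e200, 1e200))
```

Two implementations that are both mathematically correct can therefore return different answers near the limits of the floating-point range, which is the dependency the listed routines are exposed to.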
3. Developer list
- Weslley da Silva Pereira (University of Colorado Denver, USA)
- Julie Langou (University of Tennessee, USA)
- Igor Kozachenko (University of California, Berkeley, USA)
- Jim Demmel (University of California, Berkeley, USA)
- Jack Dongarra (University of Tennessee and ORNL, USA)
- Julien Langou (University of Colorado Denver, USA)
4. Thanks
- MathWorks: Penny Anderson, Amanda Barry, Mary Ann Freeman, Bobby Cheng, Pat Quillen, Christine Tobler.
- Ed Anderson
- GitHub Users: thijssteel, jip, eshpc, martin-frbg, Matthew-Badin, msk, ZTaylor39, sergey-v-kuznetsov, 5tefan
Github contribution details here