Release date: 06/28/21.

This material is based upon work supported by the National Science Foundation and the Department of Energy.

LAPACK is a software package provided by Univ. of Tennessee, Univ. of California, Berkeley, Univ. of Colorado Denver and NAG Ltd..

1. Support and questions:

2. LAPACK 3.10.0: What’s new

This is a major release and also addressing multiple bug fixes.

2.1. Safe scaling algorithms

Contribution by Ed Anderson

Added SRC/la_constants.f90
Added SRC/la_xisnan.F90
Deleted BLAS/SRC/dnrm2.f      Added BLAS/SRC/dnrm2.f90
Deleted BLAS/SRC/dznrm2.f     Added BLAS/SRC/dznrm2.f90
Deleted BLAS/SRC/scnrm2.f     Added BLAS/SRC/scnrm2.f90
Deleted BLAS/SRC/snrm2.f      Added BLAS/SRC/snrm2.f90
Deleted BLAS/SRC/drotg.f      Added BLAS/SRC/drotg.f90
Deleted BLAS/SRC/zrotg.f      Added BLAS/SRC/zrotg.f90
Deleted BLAS/SRC/crotg.f      Added BLAS/SRC/crotg.f90
Deleted BLAS/SRC/srotg.f      Added BLAS/SRC/srotg.f90
Deleted SRC/dlassq.f          Added SRC/dlassq.f90
Deleted SRC/zlassq.f          Added SRC/zlassq.f90
Deleted SRC/classq.f          Added SRC/classq.f90
Deleted SRC/slassq.f          Added SRC/slassq.f90
Deleted SRC/dlartg.f          Added SRC/dlartg.f90
Deleted SRC/zlartg.f          Added SRC/zlartg.f90
Deleted SRC/clartg.f          Added SRC/clartg.f90
Deleted SRC/slartg.f          Added SRC/slartg.f90

Edward Anderson stated in his paper https://doi.org/10.1145/3061665:

The square root of a sum of squares is well known to be prone to overflow and underflow. Ad hoc scaling of intermediate results, as has been done in numerical software such as the BLAS and LAPACK, mostly avoids the problem, but it can still occur at extreme values in the range of representable numbers. More careful scaling, as has been implemented in recent versions of the standard algorithms, may come at the expense of performance or clarity. This work reimplements the vector 2-norm and the generation of Givens rotations from the Level 1 BLAS to improve their performance and design. In addition, support for negative increments is extended to the Level 1 BLAS operations on a single vector, and a comprehensive test suite for all the Level 1 BLAS is included.

This contribution replaces the original xNRM2, xROTG, xLARTG, xLASSQ routines with Edward’s safe scaling codes plus some minor modifications.

Note: this code follows Fortran90 standard.

  • The new Fortran90 module SRC/la_constants.f90 expands the functionality of INSTALL/xLAMCH.f by adding safe scaling constants.

  • The new Fortran90 module SRC/la_xisnan.F90 expands the functionality of SRC/xISNAN.f and SRC/xLAISNAN.f by adding the option of using the ieee_is_nan routine.

2.2. Implementation of the multishift QZ algorithm with AED

Contribution by thijssteel

It is loosely based on my implementation of the rational QZ algorithm (https://github.com/thijssteel/multishift-multipole-rqz).

It features:

  • Agressive early deflation

  • Multishift QZ sweeps using optimal packing of the bulges

  • A new heuristic to select the number of positions in the sweep windows

It does not feature:

  • A windowed deflation of infinite eigenvalues (that is only useful if many infinite eigenvalues are to be deflated and in that case you should probably do some preprocessing anyway).

Also two accuracy improvements to xtgex2:

  • Daan Camps has proven in his dissertation that replacing the condition in xtgex2 when swapping 1x1 blocks can improve the accuracy. This condition and the respective error tolerances have been changed to reflect it.

  • Also by Daan Camps, "SWAPPING 2 × 2 BLOCKS IN THE SCHUR AND GENERALIZED SCHUR FORM", shows that the accuracy of swapping 2x2 blocks can be improved by iterative refinement and by replacing the QR factorisation with an SVD based method. Only the first improvement was implemented as the paper does not mention how to avoid overflow.

2.3 Householder Reconstruction

new file:   SRC/sorhr_col.f
new file:   SRC/dorhr_col.f
new file:   SRC/cunhr_col.f
new file:   SRC/zunhr_col.f
new file:   SRC/dorgtsqr.f
new file:   SRC/sorgtsqr.f
new file:   SRC/dgetsqrhrt.f
new file:   SRC/dlarfb_gett.f
new file:   SRC/dorgtsqr_row.f
new file:   SRC/sgetsqrhrt.f
new file:   SRC/slarfb_gett.f
new file:   SRC/sorgtsqr_row.f
new file:   SRC/cgetsqrhrt.f
new file:   SRC/clarfb_gett.f
new file:   SRC/cungtsqr_row.f
new file:   SRC/zgetsqrhrt.f
new file:   SRC/zlarfb_gett.f
new file:   SRC/zungtsqr_row.f

GETSQRHRT is a QR factorization routine for tall-skinny matrices. It is based on LATSQR, but returns the Q and R factors in the same format as GEQRT, i.e. using the compact WY representation of Q.

This is the same format used by other LAPACK routines that depend on QR factorizations. GETSQRHRT uses ORGTSQR_ROW (which in turn calls LARFB_GETT) to construct an orthonormal matrix from a TSQR factorization, which is a more efficient version of ORGTSQR.

2.4 Change in the return behavior of GESDD

GESDD now returns with INFO = -4 if A has a NaN entry.

In version 3.9.0 and master, GESDD would call exit(0) instead of returning an error code on some matrices. See Issue #469

2.5 Bug fixes

For details please see our Github repository

2.6 Notes about compiler dependency

Some LAPACK routines rely on trustworthy complex division and ABS routines in the FORTRAN compiler. This [link](https://github.com/Reference-LAPACK/lapack/files/6672436/complexDivisionFound.txt) lists the LAPACK COMPLEX*16 algorithms that contain compiler dependent complex divisions of the form

REAL / COMPLEX   or   COMPLEX / COMPLEX

See Issue #575 and Issue #577 for a more complete discussion on this topic.

3. Developer list

LAPACK developers involved in this release
  • Weslley da Silva Pereira (University of Colorado Denver, USA)

  • Julie Langou (University of Tennessee, USA)

  • Igor Kozachenko (University of California, Berkeley, USA)

Principal Investigators
  • Jim Demmel (University of California, Berkeley, USA)

  • Jack Dongarra (University of Tennessee and ORNL, USA)

  • Julien Langou (University of Colorado Denver, USA)

4. Thanks

Github contribution details here

5. Bug Fix