Summary of improvements since LAPACK 3.0

1. Improvements in LAPACK 3.1.0

See Release Notes for 3.1.0 for more details

Hessenberg QR algorithm with the small bulge multi-shift QR algorithm together with aggressive early deflation. This is an implementation of the 2003 SIAM SIAG LA Prize winning algorithm of Braman, Byers and Mathias, that significantly speeds up the nonsymmetric eigenproblem.
Improvements of the Hessenberg reduction subroutines. These accelerate the first phase of the nonsymmetric eigenvalue problem.
New MRRR eigenvalue algorithms that also support subset computations. These implementations of the 2006 SIAM SIAG LA Prize winning algorithm of Dhillon and Parlett are also significantly more accurate than the version in LAPACK 3.0.
Mixed precision iterative refinement subroutines for exploiting fast single precision hardware. On platforms like the Cell processor that do single precision much faster than double, linear systems can be solved many times faster. Even on commodity processors there is a factor of 2 in speed between single and double precision. These are prototype routines in the sense that their interfaces might changed based on user feedback.
New partial column norm updating strategy for QR factorization with column pivoting. This fixes a subtle numerical bug dating back to LINPACK that can give completely wrong results.
Thread safety: Removed all the SAVE and DATA statements (or provided alternate routines without those statements), increasing reliability on SMPs.
Additional support for matrices with NaN/subnormal elements, optimization of the balancing subroutines, improving reliability.

2. Improvements in LAPACK 3.1.1

See Release Notes for 3.1.1 for more details

Add BLAS routines so that the BLAS provided is complete
Provide 5 flavours of SECOND and DSECND and ddd the TIMER variable in the make.inc to chose which routine to use

2. Improvements in LAPACK 3.2.0

See Release Notes for 3.2 for more details

Extra Precise Iterative Refinement: New linear solvers that “guarantee” fully accurate answers (or give a warning that the answer cannot be trusted). The matrix types supported in this release are: GE (general), SY (symmetric), PO (positive definite), HE (Hermitian), and GB (general band) in all the relevant precisions.
XBLAS, or portable “extra precise BLAS”: Our new linear solvers in (1) depend on these to perform iterative refinement. The XBLAS will be released in a separate package.
Non-Negative Diagonals from Householder QR: The QR factorization routines now guarantee that the diagonal is both real and non-negative. Factoring a uniformly random matrix now correctly generates an orthogonal Q from the Haar distribution.
High Performance QR and Householder Reflections on Low-Profile Matrices: The auxiliary routines to apply Householder reflections (e.g. DLARFB) automatically reduce the cost of QR from O(n³) to O(n²+nb²) for matrices stored in a dense format for band matrices with bandwidth b with no user interface changes. Other users of these routines can see similar benefits in particular on “narrow profile” matrices.
New fast and accurate Jacobi SVD: High accuracy SVD routine for dense matrices, which can compute tiny singular values to many more correct digits than xGESVD when the matrix has columns differing widely in norm, and usually runs faster than xGESVD too.
Routines for Rectangular Full Packed format: The RFP format (SF, HF, PF, TF) enables efficient routines with optimal storage for symmetric, Hermitian or triangular matrices. Since these routines utilize the Level 3 BLAS, they are generally much more efficient than the existing packed storage routines (SP, HP, PP, TP).
Pivoted Cholesky: The Cholesky factorization with diagonal pivoting for symmetric positive semi-definite matrices. Pivoting is required for reliable rank detection.
Mixed precision iterative refinement routines for exploiting fast single precision hardware: On platforms like the Cell processor that do single precision much faster than double, linear systems can be solved many times faster. Even on commodity processors there is a factor of 2 in speed between single and double precision. The matrix types supported in this release are: GE (general), PO (positive definite).
Some new variants added for the one sided factorization: LU gets right-looking, left-looking, Crout and Recursive), QR gets right-looking and left-looking, Cholesky gets left-looking, right-looking and top-looking. Depending on the computer architecture (or speed of the underlying BLAS), one of these variants may be faster than the original LAPACK implementation.
More robust DQDS algorithm: Fixed some rare convergence failures for the bidiagonal DQDS SVD routine.
Better documentation for the multishift Hessenberg QR algorithm with early aggressive deflation, and various improvements of the code.

3. Improvements in LAPACK 3.2.1

See Release Notes for 3.2.1 for more details

change xLARRD (MRRR routines) to deal with some matrices from Godunov

4. Improvements in LAPACK 3.3.0

See Release Notes for 3.3.0 for more details

Standard C language APIs for LAPACK: The standard interfaces include support for Fortran and C data formats (column-major and row-major) to ease interoperability with, and migration of, Fortran code. Full LAPACK functionality is now accessible in a C-friendly manner. These new interfaces are a key feature of the new LAPACK 3.3 release and the just released Intel MKL 10.3 product. PLASMA 2.3 relies on it.
Computing the complete CS decomposition: Compute the complete CS decomposition of a partitioned unitary matrix. The algorithm is designed for numerical stability. See: Brian D. Sutton. Computing the complete CS decomposition. Numer. Algorithms. 50 (2009), no. 1, 33â€“65.
Level-3 BLAS symmetric indefinite solve and symmetric indefinite inversion: This enables better performance for xSYTRF and xSYTRI.
Thread safe xLAMCH: SLAMCH and DLAMCH were the only two routines not thread-safe in LAPACK-3.2. This is fixed: all routines in LAPACK-3.3 are now thread-safe.

4. Improvements in LAPACK 3.3.1

See Release Notes for 3.3.1 for more details

Improve the CMAKE build system.

Added a FindBLAS module to easily search for and link against various optimized BLAS implementations
Added proper default compiler flags for various commercial Fortran compilers
Check for and disable SIGFPE handlers (LAPACK code handles these internally) on known compilers
Remove testing drivers and BLAS source from nightly code coverage to get a more accurate representation of the LAPACK code getting exercised
Suppress harmless warning messages for various platforms and compilers
Added nightly builds (http://my.cdash.org/index.php?project=LAPACK)
Standardized binary output locations to play nicely when building Windows DLLs
Cleaned up CMake packaging to allow find_package(LAPACK) in applications
Added missing link dependencies for test libraries

5. Improvements in LAPACK 3.4.0

See Release Notes for 3.4.0 for more details

xGEQRT: QR factorization (improved interface). Contribution by Rodney James (UC Denver). xGEQRT is analogous to xGEQRF with a modified interface which enables better performance when the blocked reflectors need to be reused. The companion subroutines xGEMQRT apply the reflectors.
xGEQRT3: recursive QR factorization. Contribution by Rodney James (UC Denver). The recursive QR factorization enables cache-oblivious and enables high performance on tall and skinny matrices.
xTPQRT: Communication-Avoiding QR sequential kernels. Contribution by Rodney James (UC Denver). These subroutines are useful for updating a QR factorization and are used in sequential and parallel Communication Avoiding QR. These subroutines support the general case Triangle on top of Pentagon which includes as special cases so-called Triangle on top of Triangle and Triangle on top of Square. This is the right-looking version of the subroutines and the subroutines are blocked. The T matrices and the block size are part of the interface. The companion subroutines xTPMQRT apply the reflectors.
CMAKE build system. We are striving to help our users install our libraries seamlessly on their machines. The CMAKE team contributed to our effort to port LAPACK and ScaLAPACK under the CMAKE build system. Building under Windows has never been easier. This also allows us to release dll for Windows, so users no longer need a Fortran compiler to use LAPACK under Windows.
Doxygen documentation. LAPACK routine documentation has never been more accessible. See http://www.netlib.org/lapack/explore-html/.
New website allowing for easier navigation.
LAPACKE - Standard C language APIs for LAPACK. Since LAPACK 3.3.0, LAPACK includes new C interfaces. With the LAPACK 3.4.0 release, LAPACKE is directly integrated within the LAPACK library and has been enriched by the full set of LAPACK subroutines. See here.

6. Improvements in LAPACK 3.4.1

See Release Notes for 3.4.1 for more details

BLAS test codes now using F90 machine epsilon, optimization now works
Fix Thread safety issue. xLARFT routines modified so that V in now [in] only instead of [in/out], now thread safe
LAPACK test suite modified so that all tests will pass with gfortran (and ifort with -O0, and hopefully others)

7. Improvements in LAPACK 3.4.2

See Release Notes for 3.4.2 for more details

Modified all of the matrix norm routines. Now, if a matrix contains a NaN, a NaN will be returned for all choices of norm. Previously, if a matrix contained a NaN, the norm routines would not return necessarily a NaN. (They would return regular floating-point numbers in some cases.)

8. Improvements in LAPACK 3.5.0

See Release Notes for 3.5.0 for more details

xSYSV_ROOK: LDL^T with rook pivoting. Contribution by Craig Lucas (University of Manchester and NAG) and Sven Hammarling (NAG). These subroutines enable better stability than the Bunch-Kaufman pivoting scheme currently used in LAPACK xSYSV solvers; also, the elements of L are bounded, which is important in some applications. The computational time of xSYSV_ROOK is slightly higher than the one of xSYSV.
2-by-1 CSD to be used for tall and skinny matrix with orthonormal columns (in LAPCK 3.4.0, we already integrated CSD of a full square orthogonal matrix) More info here: http://faculty.rmc.edu/bsutton/simultaneousbidiagonalization.html
New stopping criteria for balancing.
New complex division algorithm. Following the algorithm of Michael Baudin, Robert L. Smith (see arxiv.1210.4539 )
Other improvements:
- slasyf.f, clasyf.f, clahef.f, dlasyf.f, zlahef.f, zlasyf.f :: various improvements by Igor Kozachenko
- dhgeqz.f, shgeqz.f :: improvements by Duncan Po (MathWorks), committed by Rodney James
- dlasd4.f, slasd4.f :: improvements by Sergey Kuznetsov (Intel), committed by Rodney James
- clarfb.f, dlarfb.f, slarfb.f, zlarfb.f :: improvements by Rodney James

9. Improvements in LAPACK 3.6.0

See Release Notes for 3.6.0 for more details

10. Improvements in LAPACK 3.7.0

See Release Notes for 3.7.0 for more details

11. Improvements in LAPACK 3.8.0

See Release Notes for 3.8.0 for more details