================
== LAPACK 3.2 ==
================

Release date: Su 11/16/2008.

This material is based upon work supported by the National Science
Foundation, the Department of Energy and the MathWorks under Grant No.
NSF-CCF-00444486, NSF-CNS-0325873, NSF-EIA 0122599, NSF-ACI-0090127,
DOE-DE-FC02-01ER25478, DOE-DE-FC02-06ER25768.

 * LAPACK 3.2: What's new
 * References
 * Contributor list
 * Developer list
 * Interface changes
 * More details
 * Expected additions and improvements for the future

============================
== LAPACK 3.2: What's new ==
============================

(1) Extra Precise Iterative Refinement: New linear solvers that
    "guarantee" fully accurate answers (or give a warning that the answer
    cannot be trusted). The matrix types supported in this release are:
    GE (general), SY (symmetric), PO (positive definite), HE (Hermitian),
    and GB (general band) in all the relevant precisions.
    See reference [3] below.

(2) XBLAS, or portable "extra precise BLAS": our new linear solvers in (1)
    depend on these to perform iterative refinement. See reference [3]
    below. The XBLAS will be released in a separate package.
    See "More details".

(3) Non-Negative Diagonals from Householder QR: The QR factorization
    routines now guarantee that the diagonal of R is both real and
    non-negative. Factoring a random matrix with independent Gaussian
    entries now correctly generates an orthogonal Q from the Haar
    distribution. See reference [4] below.

(4) High Performance QR and Householder Reflections on Low-Profile
    Matrices: The auxiliary routines that apply Householder reflections
    (e.g. DLARFB) automatically reduce the cost of QR from O(n^3) to
    O(n^2) for matrices stored in a dense format but with a "narrow
    profile" (including but not limited to band matrices), with no user
    interface changes. Other users of these routines can see similar
    benefits. See reference [4] below.

(5) New fast and accurate Jacobi SVD: A high-accuracy SVD routine for
    dense matrices, which can compute tiny singular values to many more
    correct digits than xGESVD when the matrix has columns differing
    widely in norm, and usually runs faster than xGESVD too.
    See references [5,6,7] below.

(6) Routines for Rectangular Full Packed format: The RFP format (SF, HF,
    PF, TF) enables efficient routines with optimal storage for
    symmetric, Hermitian or triangular matrices. Since these routines
    utilise the Level 3 BLAS, they are generally much more efficient than
    the existing packed storage routines (SP, HP, PP, TP).
    See reference [8] below.

(7) Pivoted Cholesky: The Cholesky factorization with diagonal pivoting
    for symmetric positive semi-definite matrices. Pivoting is required
    for reliable rank detection. See reference [9] below.

(8) Mixed precision iterative refinement routines for exploiting fast
    single precision hardware. On platforms like the Cell processor that
    do single precision much faster than double, linear systems can be
    solved many times faster. Even on commodity processors there is a
    factor of 2 in speed between single and double precision. The matrix
    types supported in this release are: GE (general), PO (positive
    definite). See reference [1] below.

(9) Some new variants added for the one-sided factorizations: LU gets
    Right-Looking, Left-Looking, Crout and Recursive; QR gets
    Right-Looking and Left-Looking; Cholesky gets Left-Looking,
    Right-Looking and Top-Looking. Depending on the computer architecture
    (or speed of the underlying BLAS), one of these variants may be
    faster than the original LAPACK implementation.

(10) More robust DQDS: Fixed some rare convergence failures for the
     bidiagonal DQDS SVD routine.

(11) Better documentation for the multishift Hessenberg QR algorithm with
     aggressive early deflation, and various improvements of the code.
     See reference [2] below.
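
The idea behind item (8) can be illustrated with a small sketch. This is a
hedged, pure-Python illustration, not the LAPACK code: it assumes IEEE
single precision can be emulated by rounding every stored value through
Python's `struct` module ("f" format), and the helper names `to_single`,
`lu_single`, `solve_single` and `mixed_precision_solve` are hypothetical.
The O(n^3) LU factorization is done entirely in (emulated) single
precision; only the O(n^2) residual b - A*x and the solution update are
carried in double precision.

```python
import struct


def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_single(a):
    """LU with partial pivoting; every stored entry is rounded to single."""
    n = len(a)
    lu = [[to_single(v) for v in row] for row in a]
    perm = list(range(n))  # row i of P*A is row perm[i] of A
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = to_single(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_single(lu[i][j] - to_single(lu[i][k] * lu[k][j]))
    return lu, perm


def solve_single(lu, perm, b):
    """Solve A x = b using the single precision factors P*A = L*U."""
    n = len(lu)
    y = [to_single(b[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution with unit lower triangular L
        for j in range(i):
            y[i] = to_single(y[i] - to_single(lu[i][j] * y[j]))
    for i in reversed(range(n)):  # back substitution with upper triangular U
        for j in range(i + 1, n):
            y[i] = to_single(y[i] - to_single(lu[i][j] * y[j]))
        y[i] = to_single(y[i] / lu[i][i])
    return y


def mixed_precision_solve(a, b, iterations=5):
    """Factor once in single precision, then refine in double precision."""
    n = len(a)
    lu, perm = lu_single(a)        # the O(n^3) work, done in single
    x = solve_single(lu, perm, b)  # initial single precision answer
    for _ in range(iterations):
        # residual r = b - A x computed in double precision (Python floats)
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = solve_single(lu, perm, r)          # correction reuses the factors
        x = [xi + di for xi, di in zip(x, d)]  # update kept in double
    return x
```

For example, `mixed_precision_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])`
should agree with the exact solution (1/11, 7/11) to about double
precision, even though every factor entry was rounded to single. A real
solver would stop once the correction is small and fall back to a full
double precision solve if refinement fails to converge; this sketch just
runs a fixed number of sweeps.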

================
== References ==
================

[1] Alfredo Buttari, Jack Dongarra, Julie Langou, Julien Langou, Piotr
    Luszczek, and Jakub Kurzak. Mixed Precision Iterative Refinement
    Techniques for the Solution of Dense Linear Systems. International
    Journal of High Performance Computing Applications, 21(4):457-466,
    2007.

[2] Ralph Byers. LAPACK 3.1 xHSEQR: Tuning and Implementation Notes on the
    Small Bulge Multi-shift QR Algorithm with Aggressive Early Deflation.
    LAPACK Working Note 187, May 2007.

[3] James Demmel, Yozo Hida, William Kahan, Xiaoye S. Li, Sonil Mukherjee,
    and E. Jason Riedy. Error Bounds from Extra Precise Iterative
    Refinement. ACM Transactions on Mathematical Software (TOMS),
    32(2):325-351, 2006. (Also LAWN-165).

[4] James W. Demmel, Mark Hoemmen, Yozo Hida, and E. Jason Riedy.
    Non-Negative Diagonals and High Performance on Low-Profile Matrices
    from Householder QR. LAPACK Working Note 203, May 2008.

[5] Zlatko Drmac. A global convergence proof of cyclic Jacobi methods with
    block rotations. LAPACK Working Note 196, December 2007.

[6] Zlatko Drmac and Kresimir Veselic. New fast and accurate Jacobi SVD
    algorithm: I. SIAM Journal on Matrix Analysis and Applications,
    29(4):1322-1342, 2007. (Also LAWN-169).

[7] Zlatko Drmac and Kresimir Veselic. New fast and accurate Jacobi SVD
    algorithm: II. SIAM Journal on Matrix Analysis and Applications,
    29(4):1343-1362, 2007. (Also LAWN-170).

[8] Fred G. Gustavson, Jerzy Wasniewski, and Jack J. Dongarra. Rectangular
    Full Packed Format for Cholesky's Algorithm: Factorization, Solution
    and Inversion. LAPACK Working Note 199, April 2008.

[9] Craig Lucas. LAPACK-Style Codes for Level 2 and 3 Pivoted Cholesky
    Factorizations. LAPACK Working Note 161, February 2004.

==================
== Contributors ==
==================

Ralph Byers (University of Kansas, USA)
Zlatko Drmac (University of Zagreb, Croatia)
Peng Du (University of Tennessee, Knoxville, USA)
Fred Gustavson (IBM Watson Research Center, NY, USA)
Craig Lucas (University of Manchester / NAG Ltd., UK)
Kresimir Veselic (Fernuniversitaet Hagen, Hagen, Germany)
Jerzy Wasniewski (Technical University of Denmark, Lyngby, Copenhagen, Denmark)

======================================
== Thanks for bug-report/patches to ==
======================================

Fernando Guevara (Dept. of Mathematics, University of Utah)

=============================
== Principal Investigators ==
=============================

Jim Demmel (University of California at Berkeley, USA)
Jack Dongarra (University of Tennessee and ORNL, USA)

================================================
== LAPACK developers involved in this release ==
================================================

Deaglan Halligan (University of California at Berkeley, USA)
Sven Hammarling (NAG Ltd., UK)
Yozo Hida (University of California at Berkeley, USA)
Daniel Kressner (ETH Zurich, Switzerland)
Julie Langou (University of Tennessee, USA)
Julien Langou (University of Colorado Denver, USA)
Osni Marques (Lawrence Berkeley Laboratory, USA)
E. Jason Riedy (University of California at Berkeley, USA)
Edward Smyth (NAG Ltd., UK)

===============================================
== XBLAS developers involved in this release ==
===============================================

David Bailey (Lawrence Berkeley Laboratory, USA)
Deaglan Halligan (University of California at Berkeley, USA)
Greg Henry (Intel)
Yozo Hida (University of California at Berkeley, USA)
Jimmy Iskandar (University of California at Berkeley, USA)
William Kahan (University of California at Berkeley, USA)
Anil Kapur (University of California at Berkeley, USA)
Suh Y. Kang (University of California at Berkeley, USA)
Xiaoye Li (Lawrence Berkeley Laboratory, USA)
Sonil Mukherjee (University of California at Berkeley, USA)
Jason Riedy (University of California at Berkeley, USA)
Michael Martin (University of California at Berkeley, USA)
Brandon Thompson (University of California at Berkeley, USA)
Teresa Tung (University of California at Berkeley, USA)
Daniel Yoo (University of California at Berkeley, USA)

=======================
== Install Procedure ==
=======================

 * YOU NEED F90 !!!
 * XBLAS and iterref integration
 * VARIANTS integration

=======================
== Interface changes ==
=======================

There are interface changes from LAPACK version 3.1 to 3.2 for the
following routines:
 * DSGESV
 * ZCGESV

==================
== More details ==
==================

-----------------------------------------------------------------------
(1) Extra Precise Iterative Refinement
-----------------------------------------------------------------------

The matrix types supported in this release are
  1. GE (general)
  2. SY (symmetric)
  3. PO (positive definite)
  4. HE (Hermitian)
  5. GB (general band)
in all the relevant precisions.

-----------------------------------------------------------------------
(2) XBLAS, or portable "extra precise BLAS"
-----------------------------------------------------------------------

-----------------------------------------------------------------------
(3) Non-Negative Diagonals and High Performance on Low-Profile Matrices
    from Householder QR
-----------------------------------------------------------------------

 * contributors: James W. Demmel, Mark Hoemmen, Yozo Hida, and
   E. Jason Riedy.
 * lapacker: Jason Riedy.
 * see: James W. Demmel, Mark Hoemmen, Yozo Hida, and E. Jason Riedy.
   "Non-Negative Diagonals and High Performance on Low-Profile Matrices
   from Householder QR", LAPACK Working Note 203, UCB/EECS-2008-76,
   May 30, 2008.
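
The key ingredient of item (3) is a Householder reflector whose target
entry is non-negative. The following is a minimal pure-Python sketch of
that idea, not the LAPACK routine (which also scales to guard against
overflow and underflow); the names `make_reflector` and `apply_reflector`
are hypothetical. It builds H = I - tau*v*v^T with v[0] = 1 so that
H*x = beta*e1 with beta = +||x||. When x[0] > 0, the difference
x[0] - beta is rewritten as -||x[1:]||^2 / (x[0] + beta) to avoid
cancellation.

```python
import math


def make_reflector(x):
    """Return (v, tau, beta) with v[0] == 1 and beta >= 0 such that
    (I - tau * v * v^T) x == beta * e1."""
    alpha, rest = x[0], x[1:]
    sigma = sum(t * t for t in rest)  # squared norm of x[1:]
    if sigma == 0.0:
        e1 = [1.0] + [0.0] * len(rest)
        if alpha >= 0.0:
            return e1, 0.0, alpha  # already beta * e1 with beta >= 0: H = I
        return e1, 2.0, -alpha     # H = I - 2 e1 e1^T flips the sign of x[0]
    beta = math.sqrt(alpha * alpha + sigma)  # beta = +||x||, never negative
    if alpha <= 0.0:
        v0 = alpha - beta                    # both terms <= 0: no cancellation
    else:
        v0 = -sigma / (alpha + beta)         # equals alpha - beta, computed stably
    tau = -v0 / beta                         # tau = (beta - alpha) / beta
    v = [1.0] + [t / v0 for t in rest]       # scale so that v[0] == 1
    return v, tau, beta


def apply_reflector(v, tau, y):
    """Compute (I - tau * v * v^T) y."""
    s = sum(vi * yi for vi, yi in zip(v, y))
    return [yi - tau * vi * s for vi, yi in zip(v, y)]
```

Applying `make_reflector` to each column of a matrix in turn, as in
ordinary Householder QR, then yields an R whose diagonal entries are the
non-negative values beta.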

-----------------------------------------------------------------------
(4) New fast and accurate Jacobi SVD
-----------------------------------------------------------------------

 * contributors: Zlatko Drmac and Kresimir Veselic.
 * lapacker: Julien Langou.

-----------------------------------------------------------------------
(5) Rectangular Full Packed format
-----------------------------------------------------------------------

 * contributors: Fred Gustavson and Jerzy Wasniewski.
 * lapacker: Julien Langou.

-----------------------------------------------------------------------
(6) Pivoted Cholesky
-----------------------------------------------------------------------

 * contributor: Craig Lucas.
 * lapacker: Jason Riedy.

-----------------------------------------------------------------------
(7) Mixed precision iterative refinement subroutines for exploiting fast
    single precision hardware
-----------------------------------------------------------------------

 * contributor: Julie Langou.
 * lapacker: Julie Langou.

-----------------------------------------------------------------------
(8) Add some variants for the one-sided factorizations
-----------------------------------------------------------------------

 * contributors: Peng Du and Jason Riedy.
 * lapackers: Julie Langou and Jason Riedy.
 * see:
   LAPACK QR blocked factorization (xGEQRF) is Right-Looking,
     - add the Left-Looking variant. (Peng)
   LAPACK Cholesky blocked factorization (xPOTRF) is Left-Looking,
     - add the Right-Looking variant. (Peng)
     - add the Top-Looking variant. (Peng)
   LAPACK LU blocked factorization (xGETRF) is Right-Looking,
     - add the Left-Looking variant. (Peng)
     - add the Crout variant. (Peng)
     - add the recursive variant, in F77. (Jason)

-----------------------------------------------------------------------
(9) Bug fixes for the bidiagonal SVD routine that fix some rare
    convergence failures.
-----------------------------------------------------------------------

 * contributors: Osni Marques and Beresford Parlett.
 * lapackers: Osni Marques, Jim Demmel, and Julien Langou.

-----------------------------------------------------------------------
(10) New TTQRE from Ralph Byers.
-----------------------------------------------------------------------

 * contributor: Ralph Byers.
 * lapackers: Edward Smyth and Daniel Kressner.

Most of the revisions fix typographical errors, but a few have a small
effect on how the program works. Even these are relatively minor:

 o revised the choice of the size of the deflation window slightly to
   make the code a little more robust against convergence failures.
 o revised the section of code that tries to reintroduce bulges after
   they have collapsed due to underflow. The new version is cleaner and
   more robust.
 o revised xLAQR1 so that it does not assume that H(2,1) is real. A code
   ought to do what it claims to do, and in the complex case this small
   subroutine didn't quite do it.

========================================================
== Expected additions and improvements for the future ==
========================================================

 * Have a new QZ, see: Bo Kagstrom and Daniel Kressner. Multishift
   variants of the QZ algorithm with aggressive early deflation. SIAM J.
   Matrix Anal. Appl., 29(1):199-227, 2006.
 * Have a new block reordering algorithm: Daniel Kressner. Block
   algorithms for reordering standard and generalized Schur forms. ACM
   Trans. Math. Software, 32(4):521-532, 2006.
 * Add the accurate and efficient Givens rotations from David Bindel,
   Jim Demmel, W. Kahan, and Osni Marques. See
   http://www.cs.berkeley.edu/~demmel/Givens/ and: David Bindel, James
   Demmel, William Kahan, and Osni Marques. On computing Givens rotations
   reliably and efficiently. ACM Transactions on Mathematical Software
   (TOMS), 28(2):206-238, 2002.
 * Change the default Cholesky factorization in SRC from right-looking to
   left-looking, and move left-looking to the VARIANTS directory.
 * Add some recursive variants for QR and Cholesky.
 * Remove IEEE=.FALSE. in DQDS (DLASQ3, SLASQ3 of Osni).
 * Look at the Matlab laundry list (sent by Penny Anderson).
 * Support more matrix types for extra-precise iterative refinement:
   SB (symmetric band), PB (positive definite band), HB (Hermitian band),
   and packed storage. Tridiagonal types such as GT (general tridiagonal)
   are also on the wish list, but first we need to derive adequate test
   cases.