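This section contains performance numbers for selected LAPACK driver routines. These routines provide complete solutions for the most common problems of numerical linear algebra, and are the routines users are most likely to call: SGESV/DGESV (a system of linear equations with an n-by-n coefficient matrix), SGELS/DGELS (a linear least squares problem), SGEEV/DGEEV (eigenvalues, and optionally right eigenvectors, of a nonsymmetric matrix), SSYEVD/DSYEVD (eigenvalues, and optionally eigenvectors, of a symmetric matrix), and SGESVD/DGESVD (singular values, and optionally left and right singular vectors, of a general matrix).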
Data are provided for a variety of vector computers, shared memory parallel computers, and high-performance workstations. All timings were obtained using the machine-specific optimized BLAS available on each machine; for the IBM RISC Sys/6000-550 and IBM POWER2 model 590, the ESSL BLAS were used. In all cases the data consisted of 64-bit floating-point numbers (single precision on the CRAY C90 and double precision on the other machines). For each machine and each driver, a small problem (N = 100 with LDA = 101) and a large problem (N = 1000 with LDA = 1001) were run. Block sizes NB = 1, 16, 32, and 64 were tried, and only data for the fastest run are reported in the tables below. Similarly, both UPLO = 'L' and UPLO = 'U' were timed for SSYEVD/DSYEVD, but only the times for UPLO = 'U' are reported. For SGEEV/DGEEV, ILO = 1 and IHI = N. The test matrices were generated with randomly distributed entries. All run times are reported in seconds, and block size is denoted by nb. The reported value of nb was chosen to be optimal for N = 1000; it is not necessarily the best choice for N = 100. See Section 6.2 for details.
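The LDA values above give the leading dimension of the array holding each matrix: LAPACK stores matrices column by column, so an N-by-N matrix with LDA > N occupies LDA*N slots, with unused padding below each column. The sketch below illustrates that layout for the small problem. The helper names `make_test_matrix` and `element` are hypothetical, and the uniform distribution on [-1, 1] is an assumption, since the text says only that the entries were randomly distributed.

```python
import random


def make_test_matrix(n, lda, seed=0):
    """Hypothetical helper: build an n-by-n matrix with random entries in
    column-major (Fortran) order inside a flat array of length lda*n.

    Entry A(i, j) (0-based) lives at index i + j*lda; rows n..lda-1 of
    each column are unused padding and stay 0.0.
    """
    if lda < n:
        raise ValueError("LDA must be at least N")
    rng = random.Random(seed)
    a = [0.0] * (lda * n)
    for j in range(n):        # column index
        for i in range(n):    # row index
            # Assumption: uniform entries on [-1, 1]; the source only says
            # the entries were randomly distributed.
            a[i + j * lda] = rng.uniform(-1.0, 1.0)
    return a


def element(a, i, j, lda):
    """Return A(i, j) (0-based) from column-major storage with leading dimension lda."""
    return a[i + j * lda]


small = make_test_matrix(100, 101)   # the "small problem": N = 100, LDA = 101
print(len(small))                    # 10100 storage slots for 10000 entries
print(small[100])                    # padding slot below column 0: 0.0
```

The same indexing rule, i + j*LDA, is what a Fortran routine receiving the array and LDA uses to locate each entry.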
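The performance data are reported using three or four statistics. First, the run time in seconds is given. The second statistic measures how our performance compares with the speed of the BLAS, specifically SGEMM/DGEMM. This ``equivalent matrix multiplies'' statistic is calculated as

$$ \frac{\mbox{run time (LAPACK driver on an } n\mbox{-by-}n\mbox{ matrix)}}{\mbox{run time (SGEMM/DGEMM on } n\mbox{-by-}n\mbox{ matrices)}} $$

and labeled as $\frac{\mbox{Time}}{\mbox{Time(GEMM)}}$ in the tables.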
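The ``equivalent matrix multiplies'' statistic is simply the driver's run time divided by the time SGEMM/DGEMM takes to multiply n-by-n matrices, so a value of 2.5 means the driver took as long as two and a half matrix multiplies. A minimal sketch, assuming the usual 2n^3 flop count for an n-by-n product to convert a GEMM megaflop rate into a run time; the function names and the timing values are hypothetical and are not taken from the tables.

```python
def equivalent_matmuls(t_driver, t_gemm):
    """Driver run time on an n-by-n problem divided by the
    SGEMM/DGEMM run time on n-by-n matrices."""
    if t_gemm <= 0:
        raise ValueError("GEMM run time must be positive")
    return t_driver / t_gemm


def gemm_seconds(n, gemm_mflops):
    """Estimate the GEMM run time from its megaflop rate, assuming the usual
    2*n**3 flop count for an n-by-n matrix product."""
    return 2.0 * n**3 / (gemm_mflops * 1e6)


# Hypothetical numbers, not taken from the tables:
t_gemm = gemm_seconds(1000, 200.0)       # 10.0 seconds
print(equivalent_matmuls(25.0, t_gemm))  # 2.5
```

Expressing cost in units of GEMM time makes results comparable across machines whose absolute speeds differ widely.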
The performance of the BLAS routines SGEMV/DGEMV (TRANS='N') and SGEMM/DGEMM (TRANSA='N', TRANSB='N') is given in Table 3.8, and the clock speed of each machine in Table 3.2. The third statistic is the true megaflop rating. For the eigenvalue and singular value drivers, a fourth ``synthetic megaflop'' statistic is also presented. We provide this statistic because the number of floating-point operations needed to find eigenvalues and singular values depends on the input data, unlike the number needed to solve linear equations, or linear least squares problems with SGELS/DGELS. The synthetic megaflop rating is defined as the ``standard'' number of flops required to solve the problem, divided by the run time in microseconds. This ``standard'' number of flops is taken to be the average for a standard algorithm over a variety of problems, as given in Table 3.9 (we ignore terms of order $n^2$).
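Both megaflop ratings use the same formula, a flop count divided by the run time in microseconds; they differ only in which count goes in the numerator. The sketch below is illustrative only: `leading_order_flops`, the coefficient 9.0, the run time, and the "actual" flop count are all hypothetical values, not the entries of Table 3.9 or of any measured run.

```python
def megaflops(flops, seconds):
    """Megaflop rate: a flop count divided by the run time in microseconds."""
    return flops / (seconds * 1e6)


def leading_order_flops(coeff, n):
    """Hypothetical helper: a 'standard' count of the form coeff * n**3,
    dropping terms of order n**2."""
    return coeff * n**3


# Hypothetical inputs for one eigenvalue driver run with n = 1000:
seconds = 20.0
actual_flops = 3.0e9                             # flops actually performed (assumed measured)
standard_flops = leading_order_flops(9.0, 1000)  # assumed coefficient 9, not from Table 3.9

print(megaflops(actual_flops, seconds))    # true rating: 150.0
print(megaflops(standard_flops, seconds))  # synthetic rating: 450.0
```

Because the synthetic rating always uses the standard count, it rewards a driver for finishing sooner, regardless of whether it got there by running faster or by doing less work.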
Table 3.8: Execution time and megaflop rates for SGEMV/DGEMV and SGEMM/DGEMM
Note that the synthetic megaflop rating is much higher than the true megaflop rating for SSYEVD/DSYEVD in Table 3.15; this is because SSYEVD/DSYEVD performs far fewer floating-point operations than the standard algorithm, SSYEV/DSYEV.
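A one-line derivation makes the gap explicit. Write $F_{\rm std}$ for the ``standard'' flop count of Table 3.9, $F_{\rm act}$ for the number of flops the driver actually performs, and $t$ for the run time in seconds (these symbols are introduced here for illustration only):

$$ \frac{\mbox{synthetic megaflops}}{\mbox{true megaflops}} = \frac{F_{\rm std}/(10^6\, t)}{F_{\rm act}/(10^6\, t)} = \frac{F_{\rm std}}{F_{\rm act}} $$

The run time cancels, so the synthetic rating exceeds the true rating exactly when the driver performs fewer operations than the standard algorithm, as SSYEVD/DSYEVD does relative to SSYEV/DSYEV.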
Table 3.9: ``Standard'' floating point operation counts for LAPACK drivers for n-by-n matrices
Table 3.10: Performance of SGESV/DGESV for n-by-n matrices
Table 3.11: Performance of SGELS/DGELS for n-by-n matrices
Table 3.12: Performance of SGEEV/DGEEV, eigenvalues only
Table 3.13: Performance of SGEEV/DGEEV, eigenvalues and right eigenvectors
Table 3.14: Performance of SSYEVD/DSYEVD, eigenvalues only, UPLO='U'
Table 3.15: Performance of SSYEVD/DSYEVD, eigenvalues and eigenvectors, UPLO='U'
Table 3.16: Performance of SGESVD/DGESVD, singular values only
Table 3.17: Performance of SGESVD/DGESVD, singular values and left and right singular vectors