

Performance Tables


Table 4.1: Computers used for the performance timings
Computer  Processor                OS                   Compiler                   Options                               BLAS library
COMPAQ    Alpha EV6 @ 500 MHz      OSF1 - Tru64 v. 4.0  Compaq Fortran f90 v. 5.3  -O3                                   CXML v. 3.5
IBM       PowerPC 604e @ 332 MHz   AIX v. 4.3.3         IBM xlf95 v. 6.1           -O3 -qstrict                          ESSL v. 3.1.1
IBM       Power2 @ 67 MHz          AIX v. 4.3.3         IBM xlf95 v. 6.1           -O3 -qstrict -qarch=pwr2              ESSL v. 3.1.1
SUN       UltraSparc II @ 400 MHz  SunOS v. 5.7         Sun WorkShop f90 v. 5.0    -fast -xtarget=ultra2 -xarch=v8plusa  SUNPERF v. 2.0
INTEL     Pentium III @ 500 MHz    Linux RedHat v. 6.1  NAG f95 v. 1.0             -O3                                   ATLAS v. 3.0
SGI       R12000 @ 300 MHz         IRIX64 v. 6.5        MIPSpro f90 v. 7.30        -O3                                   ATLAS v. 3.0

In this section we give performance results, in megaflops (millions of floating-point operations per second), for some basic computations on a variety of computers. Table 4.1 lists the computers used, their processors, and the versions of the operating system, the compiler (together with the compiler options used) and the BLAS library. Regarding the BLAS libraries, ATLAS refers to BLAS generated by the ATLAS (Automatically Tuned Linear Algebra Software) system; see Section 1.5.3. ESSL is the Engineering and Scientific Subroutine Library [23,22] for IBM computers; it also contains BLAS optimized by IBM. SUNPERF is the Sun WorkShop(TM) 6 Performance Library [41]; it also contains BLAS optimized by Sun. CXML is the Compaq Extended Math Library [7]; it also contains BLAS optimized for the Compaq Alpha.
Each of the performance tables below gives the performance of a specific LAPACK95 driver routine and, for comparison, the performance obtained by calling LAPACK directly, i.e., without the LAPACK95 interface. Each table is arranged as follows: column 1 identifies the computer and its processor; column 2 gives the optimal block size (Section 1.5.1.1); column 3 specifies the data type (D, real double precision; S, real single precision; Z, complex double precision; C, complex single precision); columns 4 and 5 give the megaflop rates achieved by LAPACK, without the Fortran 95 interface, for problems of order 100 and 1000, respectively; and columns 6 and 7 give the megaflop rates for the same problems when the LAPACK95 driver routine named in the table caption is used.
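The one-letter codes in column 3 are the same type prefixes used in LAPACK 77 routine names (SGESV, DGESV, CGESV, ZGESV, and so on), so each row of a table presumably compares the generic LAPACK95 name against the correspondingly prefixed LAPACK routine. The Python sketch below decodes the codes; the dictionary, the helper `lapack77_name`, and the NumPy-style dtype names listed for comparison are illustrative assumptions, not part of LAPACK95.

```python
# Illustrative sketch: decode the one-letter data-type codes of column 3.
# DATA_TYPES and lapack77_name are hypothetical helpers, not LAPACK95 API.
DATA_TYPES = {
    # code: (meaning, comparable NumPy dtype name)
    "S": ("real single precision", "float32"),
    "D": ("real double precision", "float64"),
    "C": ("complex single precision", "complex64"),
    "Z": ("complex double precision", "complex128"),
}


def lapack77_name(driver: str, code: str) -> str:
    """Map a LAPACK95 generic name and a type code to a LAPACK 77 name.

    LAPACK 77 names carry the data type as a one-letter prefix (e.g. DGESV),
    whereas the LAPACK95 generic name uses the prefix LA_ (e.g. LA_GESV).
    """
    if not driver.startswith("LA_"):
        raise ValueError(f"expected a name starting with 'LA_', got {driver!r}")
    if code not in DATA_TYPES:
        raise ValueError(f"unknown data-type code {code!r}")
    return code + driver[len("LA_"):]


# Print one line per data type for the LA_GESV tables.
for code, (meaning, dtype) in DATA_TYPES.items():
    print(f"{code}  {meaning:<26} {dtype:<11} {lapack77_name('LA_GESV', code)}")
```

The same helper applies to the other drivers timed here, e.g. `lapack77_name("LA_GESDD", "Z")` gives ZGESDD.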
Each megaflop rate in the tables was obtained as follows. The problem was run 10 times, and each run was timed with the Fortran 95 intrinsic subroutine CPU_TIME, which measures processor time. The megaflop rate was then computed from the formula

\begin{displaymath}\mathrm{Mflops} = \frac{\alpha \times n^3}{t \times 10^6}\end{displaymath}

where $n$ is the order of the matrix, $t$ is the average of the 10 measured times in seconds, and $\alpha$ is the operation-count coefficient given in Table 4.2.
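The timing procedure and the formula can be sketched in Python as follows. This is an illustration under stated assumptions, not the harness used to produce the tables: `time.process_time()` stands in for CPU_TIME because both report processor time, and the function names `time_runs` and `mflop_rate` are hypothetical.

```python
import time
from typing import Callable, List, Sequence


def time_runs(run: Callable[[], None], repeats: int = 10) -> List[float]:
    """Run `run` `repeats` times and return the processor time of each run.

    time.process_time() plays the role of the Fortran 95 CPU_TIME intrinsic.
    """
    times = []
    for _ in range(repeats):
        start = time.process_time()
        run()
        times.append(time.process_time() - start)
    return times


def mflop_rate(alpha: float, n: int, times: Sequence[float]) -> float:
    """Megaflop rate alpha * n**3 / (t * 1e6), with t the mean of `times`."""
    if not times:
        raise ValueError("need at least one timing")
    t = sum(times) / len(times)
    if t <= 0.0:
        # Very short runs can fall below the resolution of the timer.
        raise ValueError("mean time is zero; use a larger problem")
    return alpha * n**3 / (t * 1e6)


# Usage with made-up timings (seconds) for an n = 1000 solve, alpha = 0.67:
print(round(mflop_rate(0.67, 1000, [0.9, 1.0, 1.1]), 1))
```

In a real measurement, `run` would wrap the call being timed (for example a linear solve of order n), and its timings would be passed to `mflop_rate` together with the matching $\alpha$ from Table 4.2.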

Table 4.2: Coefficients $\alpha$ of the floating-point operation counts for the LAPACK drivers on $n\times n$ matrices (see also Table 3.13 of [1]). The number of operations is $\alpha \times n^3$.
Driver          Options                                              $\alpha$
LA_GESV         1 right-hand side                                     0.67
LA_GEEV         eigenvalues only                                     10.00
LA_GEEV         eigenvalues and right eigenvectors                   26.33
LA_GES{VD,DD}   singular values only                                  2.67
LA_GES{VD,DD}   singular values and left and right singular vectors   6.67
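As a worked illustration, the formula for the megaflop rate can be inverted, $t = \alpha n^3 / (\mathrm{Mflops} \times 10^6)$, to recover the average run time implied by a reported rate. The figures below use the COMPAQ Alpha EV6, double precision (D), LAPACK column for $n = 1000$: 732 megaflops from the LA_GESV table and 268 megaflops from the LA_GEEV table with right eigenvectors. The resulting times are derived from the formula; they are not measurements reported in this guide.

\begin{displaymath}
\mbox{LA\_GESV:}\quad 0.67 \times 1000^3 = 6.7 \times 10^{8}\ \mbox{flops},\qquad
t \approx \frac{6.7 \times 10^{8}}{732 \times 10^{6}} \approx 0.92\ \mbox{s}
\end{displaymath}

\begin{displaymath}
\mbox{LA\_GEEV:}\quad 26.33 \times 1000^3 = 2.633 \times 10^{10}\ \mbox{flops},\qquad
t \approx \frac{2.633 \times 10^{10}}{268 \times 10^{6}} \approx 98\ \mbox{s}
\end{displaymath}

Megaflop rates of different drivers are therefore not a direct comparison of run times: here the operation counts differ by a factor of about 39 ($26.33/0.67$).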


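The driver routines timed are LA_GESV, LA_GEEV (eigenvalues only, and eigenvalues with right eigenvectors), LA_GESVD and LA_GESDD; their results are given in the following tables.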

Table: Performance of LA_GESV in megaflops; $n = 100$ and $1000$.
Computer /          Block  Data      LAPACK      LAPACK95
Processor           size   type   100  1000     100  1000
COMPAQ Alpha EV6    28     D      402   732     402   679
@ 500 MHz                  S      402   789     402   755
                           Z       80   152      80   151
                           C       81   174      81   171
IBM PowerPC 604e    32     D      104   271     101   271
@ 332 MHz                  S      145   333     145   333
                           Z       57   243      57   226
                           C       67   112      67   111
IBM Power2          32     D       67   236      67   235
@ 67 MHz                   S       67   218      67   218
                           Z       33    58      33    58
                           C       33    60      33    59
SUN UltraSparc II   64     D      109   177     109   172
@ 400 MHz                  S      130   247     155   249
                           Z       40    35      37    35
                           C       40    46      39    46
INTEL Pentium III   40     D       67   251      67   251
@ 500 MHz                  S       67   314      67   314
                           Z       34    71      34    71
                           C       66    88      66    88
SGI R12000          64     D      182   442     190   445
@ 300 MHz                  S      242   340     242   344
                           Z       59   113      60   114
                           C       65   127      66   127
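To read the LAPACK and LAPACK95 columns against each other, it helps to express their difference as a percentage. The Python sketch below does this for the double-precision (D), $n = 1000$ rows of the LA_GESV table above; the rates are copied from that table, while the dictionary and the function `relative_change` are illustrative names.

```python
# Megaflop rates for LA_GESV, data type D, n = 1000: (LAPACK, LAPACK95),
# copied from the LA_GESV performance table.
GESV_D_N1000 = {
    "COMPAQ Alpha EV6": (732, 679),
    "IBM PowerPC 604e": (271, 271),
    "IBM Power2": (236, 235),
    "SUN UltraSparc II": (177, 172),
    "INTEL Pentium III": (251, 251),
    "SGI R12000": (442, 445),
}


def relative_change(lapack: float, lapack95: float) -> float:
    """Percent change of the LAPACK95 rate relative to plain LAPACK.

    Negative values mean the LAPACK95 interface ran at a lower rate.
    """
    return 100.0 * (lapack95 - lapack) / lapack


for machine, (lapack, lapack95) in GESV_D_N1000.items():
    print(f"{machine:<18} {relative_change(lapack, lapack95):+5.1f} %")
```

Because $\alpha$ and $n$ are the same in both columns, a rate ratio is the inverse of a time ratio: on the Alpha, $732/679 \approx 1.08$, so the LAPACK95 call took roughly 8% longer, while on the other machines listed the difference is within about 3%.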


Table: Performance of LA_GEEV in megaflops (eigenvalues only); $n = 100$ and $1000$.
Computer /          Block  Data      LAPACK      LAPACK95
Processor           size   type   100  1000     100  1000
COMPAQ Alpha EV6    28     D      272   262     270   263
@ 500 MHz                  S      300   369     300   372
IBM PowerPC 604e    32     D      109    87     109    99
@ 332 MHz                  S      117   128     117   161
IBM Power2          32     D       49    76      50    76
@ 67 MHz                   S       51    94      51    98
SUN UltraSparc II   64     D      105    95     118    97
@ 400 MHz                  S      152   176     171   173
INTEL Pentium III   40     D       95    97      91    93
@ 500 MHz                  S      142   175     142   175
SGI R12000          64     D      171   236     177   233
@ 300 MHz                  S      230   407     230   411


Table: Performance of LA_GEEV in megaflops (eigenvalues and right eigenvectors); $n = 100$ and $1000$.
Computer /          Block  Data      LAPACK      LAPACK95
Processor           size   type   100  1000     100  1000
COMPAQ Alpha EV6    28     D      267   268     267   268
@ 500 MHz                  S      376   437     351   437
IBM PowerPC 604e    32     D      141    81     141    85
@ 332 MHz                  S      148   138     147   171
IBM Power2          32     D       69    70      68    70
@ 67 MHz                   S       63    95      63   108
SUN UltraSparc II   64     D      104    92     100    99
@ 400 MHz                  S      197   181     199   183
INTEL Pentium III   40     D      112   105     112   111
@ 500 MHz                  S      181   201     181   207
SGI R12000          64     D      246   241     249   257
@ 300 MHz                  S      325   500     325   479


Table: Performance of LA_GESVD in megaflops (singular values and left and right singular vectors); $n = 100$ and $1000$.
Computer /          Block  Data      LAPACK      LAPACK95
Processor           size   type   100  1000     100  1000
COMPAQ Alpha EV6    28     D      130    60     129    60
@ 500 MHz                  S      174   105     181   104
IBM PowerPC 604e    32     D       43    22      43    22
@ 332 MHz                  S       56    29      56    29
IBM Power2          32     D       32    11      32    11
@ 67 MHz                   S       32    17      32    16
SUN UltraSparc II   64     D       52    15      52    13
@ 400 MHz                  S       65    33      64    32
INTEL Pentium III   40     D       49    31      49    31
@ 500 MHz                  S       66    51      66    47
SGI R12000          64     D       90    42      90    42
@ 300 MHz                  S      129   100     129    97


Table: Performance of LA_GESDD in megaflops (singular values only); $n = 100$ and $1000$.
Computer /          Block  Data      LAPACK      LAPACK95
Processor           size   type   100  1000     100  1000
COMPAQ Alpha EV6    28     D      267   300     267   293
@ 500 MHz                  S      285   459     236   456
IBM PowerPC 604e    32     D       78    83      78    83
@ 332 MHz                  S      110   119     110   119
IBM Power2          32     D       53   136      53   136
@ 67 MHz                   S       56   134      66   138
SUN UltraSparc II   64     D       85    87      85    87
@ 400 MHz                  S      140   150     119   144
INTEL Pentium III   40     D       89   121      89   121
@ 500 MHz                  S      133   180     133   179
SGI R12000          64     D      134   280     134   280
@ 300 MHz                  S      201   369     202   369


Table: Performance of LA_GESDD in megaflops (singular values and left and right singular vectors); $n = 100$ and $1000$.
Computer /          Block  Data      LAPACK      LAPACK95
Processor           size   type   100  1000     100  1000
COMPAQ Alpha EV6    28     D      210   372     200   355
@ 500 MHz                  S      285   486     235   485
                           C       88   108      88   107
                           Z      100   135      80   132
IBM PowerPC 604e    32     D       74   118      74   118
@ 332 MHz                  S       96   161      96   161
                           Z       45   135      46   132
                           C       50    91      50    92
IBM Power2          32     D       41   121      41   121
@ 67 MHz                   S       43   124      43   124
                           Z       23    45      23    45
                           C       24    47      24    47
SUN UltraSparc II   64     D       81    86      81    77
@ 400 MHz                  S      112   146     112   146
                           Z       32    18      30    18
                           C       35    37      34    36
INTEL Pentium III   40     D       78   146      74   146
@ 500 MHz                  S       95   191      95   194
                           Z       30    51      30    51
                           C       39    65      39    65
SGI R12000          64     D      111   272     111   272
@ 300 MHz                  S      169   311     170   311
                           Z       50    77      52    78
                           C       59   110      59   109
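Table 4.2 assigns LA_GESVD and LA_GESDD the same coefficient, $\alpha = 6.67$, when singular values and both sets of singular vectors are computed. Since $t = \alpha n^3 / (\mathrm{Mflops} \times 10^6)$, for a fixed $n$ the ratio of their megaflop rates is the inverse ratio of their run times. The Python sketch below applies this to the double-precision (D), $n = 1000$, LAPACK columns of the two singular-vector tables; the rates are copied from those tables and the names are illustrative. Because the operation count is nominal and shared, the result compares run times, not the work each algorithm actually performs.

```python
# Megaflop rates, data type D, n = 1000, LAPACK column (singular values plus
# left and right singular vectors): (LA_GESVD table, LA_GESDD table).
SVD_D_N1000 = {
    "COMPAQ Alpha EV6": (60, 372),
    "IBM PowerPC 604e": (22, 118),
    "IBM Power2": (11, 121),
    "SUN UltraSparc II": (15, 86),
    "INTEL Pentium III": (31, 146),
    "SGI R12000": (42, 272),
}


def time_ratio(rate_gesvd: float, rate_gesdd: float) -> float:
    """t_GESVD / t_GESDD for equal alpha and n, from the two megaflop rates.

    With rate = alpha * n**3 / (t * 1e6), t is proportional to 1 / rate.
    """
    return rate_gesdd / rate_gesvd


for machine, (gesvd, gesdd) in SVD_D_N1000.items():
    print(f"{machine:<18} GESDD about {time_ratio(gesvd, gesdd):4.1f}x faster")
```

By this measure the divide-and-conquer driver LA_GESDD ran about 4.7 to 11 times faster than LA_GESVD for $n = 1000$ on the machines listed.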


Susan Blackford 2001-08-19