Poor Performance
Next: Index of Driver
Up: Troubleshooting
Previous: Wrong Results
We have tried to make the performance of LAPACK "transportable" by
performing most of the computation within the Level 1, 2, and 3 BLAS,
and by isolating all of the machine-dependent tuning parameters
in a single integer function, ILAENV.
To avoid poor performance from LAPACK routines, note the
following recommendations:
- BLAS:
- One should use BLAS that have been optimized for the machine being used,
if they are available.
Many manufacturers and research institutions have developed, or are
developing, efficient versions of the BLAS for particular machines.
A portable set of Fortran BLAS is supplied with LAPACK
and can always be used if no other BLAS are available or if
there is a suspected problem in the local BLAS library, but
no attempt has been made to structure the Fortran BLAS for
high performance.
- ILAENV:
- For best performance, the LAPACK routine ILAENV
should be set with optimal tuning parameters for the machine being used.
The version of ILAENV provided with LAPACK supplies default values
for these parameters that give good, but not optimal, average
case performance on a range of existing machines.
In particular, the performance of xHSEQR is sensitive to
the choice of block parameters; the same applies to the driver
routines that call xHSEQR, namely xGEES, xGEESX, xGEEV and xGEEVX.
Further details on setting parameters in ILAENV are found in
Section 6.
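To see which tuning parameters the installed ILAENV actually returns, one can call it directly. The following is a minimal sketch, assuming the routine name DGEQRF and an arbitrary 500-by-500 problem size chosen only for illustration; the program name SHOWNB is hypothetical. The argument pattern mirrors the way LAPACK routines themselves query ILAENV.

```fortran
      PROGRAM SHOWNB
*     Sketch: print the tuning parameters ILAENV reports for DGEQRF.
*     M and N are arbitrary illustrative sizes, not recommendations.
      INTEGER            M, N
      PARAMETER          ( M = 500, N = 500 )
      INTEGER            NB, NBMIN, NX
      INTEGER            ILAENV
      EXTERNAL           ILAENV
*
*     ISPEC = 1: optimal block size
*     ISPEC = 2: minimum block size for which blocked code is used
*     ISPEC = 3: crossover point below which unblocked code is used
      NB    = ILAENV( 1, 'DGEQRF', ' ', M, N, -1, -1 )
      NBMIN = ILAENV( 2, 'DGEQRF', ' ', M, N, -1, -1 )
      NX    = ILAENV( 3, 'DGEQRF', ' ', M, N, -1, -1 )
*
      WRITE( *, * ) 'NB    = ', NB
      WRITE( *, * ) 'NBMIN = ', NBMIN
      WRITE( *, * ) 'NX    = ', NX
      END
```

The values printed depend entirely on the ILAENV linked into the program, so running this before and after editing ILAENV is a simple way to confirm that a change took effect.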
- LWORK >= WORK(1):
- The performance of some routines depends on the amount of workspace
supplied. In such cases, an argument, usually called WORK, is
provided, accompanied by an integer argument LWORK specifying its
length as a linear array.
On exit, WORK(1) returns the amount of workspace required to use
the optimal tuning parameters.
If LWORK < WORK(1), then insufficient workspace was provided
to use the optimal parameters, and performance may be lower
than what is attainable.
One should check that LWORK >= WORK(1) on return from
an LAPACK routine requiring user-supplied workspace to see if
enough workspace has been provided.
Note that the computation is performed correctly, even if the amount of
workspace is less than optimal, unless LWORK is reported as an
invalid value by a call to XERBLA as described in Section 7.2.
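The workspace check described above can be sketched as follows. This is an illustration only, assuming DGEQRF as the routine under test; the program name CHKWRK, the matrix dimensions, the matrix entries, and the deliberately minimal LWORK = N are all arbitrary choices made for the example.

```fortran
      PROGRAM CHKWRK
*     Sketch: compare LWORK with WORK(1) after a call to DGEQRF.
*     M, N, LDA and LWORK are illustrative choices only; LWORK = N
*     is the minimum DGEQRF accepts, not the optimal amount.
      INTEGER            M, N, LDA, LWORK
      PARAMETER          ( M = 200, N = 100, LDA = M, LWORK = N )
      INTEGER            I, J, INFO
      DOUBLE PRECISION   A( LDA, N ), TAU( N ), WORK( LWORK )
      EXTERNAL           DGEQRF
*
*     Fill A with arbitrary values.
      DO 20 J = 1, N
         DO 10 I = 1, M
            A( I, J ) = 1.0D0 / DBLE( I + J )
   10    CONTINUE
   20 CONTINUE
*
      CALL DGEQRF( M, N, A, LDA, TAU, WORK, LWORK, INFO )
*
*     On exit, WORK( 1 ) holds the workspace needed for the
*     optimal tuning parameters.
      IF( INFO.NE.0 ) THEN
         WRITE( *, * ) 'DGEQRF returned INFO = ', INFO
      ELSE IF( DBLE( LWORK ).LT.WORK( 1 ) ) THEN
         WRITE( *, * ) 'LWORK = ', LWORK, ' is below the optimal ',
     $      INT( WORK( 1 ) )
      ELSE
         WRITE( *, * ) 'LWORK is sufficient for optimal parameters'
      END IF
      END
```

If the first message about LWORK appears, the factorization is still correct; enlarging WORK to the reported size simply allows the routine to use its optimal parameters.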
- xLAMCH:
- Users should beware of the high cost of the first
call to the LAPACK auxiliary routine xLAMCH, which computes
machine characteristics such as epsilon and the
smallest invertible number (the safe minimum, whose reciprocal
does not overflow).
The first call dynamically determines a set of parameters defining
the machine's arithmetic, but these values are saved and subsequent
calls incur only a trivial cost.
For performance testing, the initial cost can be hidden by
including a call to xLAMCH in the main program, before any calls to
LAPACK routines that will be timed. A sample use of SLAMCH is:
XXXXXX = SLAMCH( 'P' )
or in double precision:
XXXXXX = DLAMCH( 'P' )
A cleaner but less portable solution is for the installer to
save the values computed by xLAMCH for a specific machine
and create a new version of xLAMCH with these constants set in
DATA statements, taking care that no accuracy is lost in the
translation.
Tue Nov 29 14:03:33 EST 1994