This section contains performance numbers for selected driver routines. These routines provide complete solutions for the most common problems of numerical linear algebra and are the routines users are most likely to call.
Data is provided for a variety of distributed-memory concurrent computers. All timings were obtained by using the machine-specific optimized BLAS available on each machine. For the IBM Scalable POWERparallel 2, the ESSL BLAS were used. In all cases the data consisted of 64-bit floating-point numbers. For each machine and each driver, a range of problem sizes was run on different numbers of processors. Different physical distribution block sizes were tried, and data for the fastest run is reported in the tables below. Similarly, whenever applicable, both UPLO=`L' and UPLO=`U' were timed, but times are reported only for UPLO=`U'. The test matrices were generated with randomly distributed entries. All run times are reported in seconds, and the block size is denoted by nb. The physical distribution block size and the process grid shape were chosen to give the best performance for N=2000. They are not necessarily the best choices for the entire range of problem sizes.
Table 4 presents ``standard'' floating-point operation counts for ScaLAPACK drivers.
Table 4: ``Standard'' Floating-Point Operation Counts for Some ScaLAPACK Drivers for N by N Matrices
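As an illustration of how a standard operation count can be combined with a reported run time, the following minimal sketch converts a time in seconds into a rate in Mflop/s. It assumes only the well-known leading-order counts of (2/3)N^3 operations for an LU-based solve and (1/3)N^3 for a Cholesky-based solve; the entries in Table 4 may include additional lower-order terms. The function names and the run time used here are hypothetical and are not taken from the tables.

```python
# Sketch: convert a measured run time into Mflop/s using a leading-order
# operation count. All names and the sample timing below are illustrative
# assumptions, not values from the performance tables.


def lu_solve_flops(n):
    """Leading-order operation count for an LU-based solve: (2/3) n^3."""
    return 2.0 * n**3 / 3.0


def cholesky_solve_flops(n):
    """Leading-order operation count for a Cholesky-based solve: (1/3) n^3."""
    return n**3 / 3.0


def mflops(flop_count, seconds):
    """Rate in millions of floating-point operations per second."""
    return flop_count / seconds / 1.0e6


n = 2000                 # problem size
elapsed_seconds = 10.0   # hypothetical run time, not a measured value
rate = mflops(lu_solve_flops(n), elapsed_seconds)
print(f"N={n}: {rate:.1f} Mflop/s")
```

Dividing the resulting rate by the number of processes gives a per-process figure, which makes it easier to compare runs on process grids of different sizes.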