Frequently
Asked Questions on the Linpack Benchmark and Top500
What is the Linpack Benchmark?
What is the Linpack Benchmark report?
What is the reference for the Linpack
Benchmark Report?
What are the three benchmarks in the
Linpack Benchmark report?
What is the Linpack Fortran n = 100
benchmark?
What exactly does the Linpack Fortran
n=100 benchmark time?
What is the Linpack n = 1000 benchmark
(TPP, Best Effort)?
What is the Linpack’s “Highly Parallel
Computing” benchmark?
What are the ground rules for the first
benchmark?
What are the ground rules for the second
benchmark?
What are the ground rules for the third
benchmark?
To what accuracy must the solution conform?
Can I get a more personalized list of
machine and performance results?
How can I get the Linpack Benchmark
program?
Is there a Java version of the Linpack
Benchmark?
What do I do to run the Linpack Benchmark
Program?
How does the Linpack Benchmark performance
relate to my application?
Are there errors in the Linpack Benchmark
report?
How can I get the complete Linpack
software collection?
Is Linpack the most efficient way to solve
systems of equations?
How can I get the whole LAPACK software
collection?
What is the history behind the Linpack
Benchmark?
How can I add my computer's result to the
table?
Should I run the single and double
precision versions of the benchmarks?
What about a list of clusters?
How can I interpret the results from the
benchmark?
What matrix is used to run the benchmark?
Where can I get a copy of the Top500
report?
How can I interpret the results from the
Linpack 100x100 benchmark?
Do you have an archive of previous Linpack
Benchmark reports or results?
Is there a benchmark for sparse matrices?
Where can I get additional information on
benchmarks?
The Linpack Benchmark is a measure of a
computer’s floating-point rate of execution. It is determined by running a
computer program that solves a dense system of linear equations. Over the years
the characteristics of the benchmark have changed a bit. In fact, there are
three benchmarks included in the Linpack Benchmark report.
The Linpack Benchmark is something that grew out
of the Linpack software project. It was originally intended to give users of
the package a feeling for how long it would take to solve certain matrix
problems. The benchmark started as an appendix to the Linpack Users' Guide and
has grown since the Linpack User’s Guide was published in 1979.
The Linpack Benchmark report is entitled
“Performance of Various Computers Using Standard Linear Equations Software”.
The report lists the performance in Mflop/s of a number of computer systems. A
copy of the report is available at http://www.netlib.org/benchmark/performance.ps.
The Linpack Benchmark report
should be referenced in the following way:
“Performance of Various Computers Using Standard
Linear Equations Software”, Jack Dongarra, University of Tennessee, Knoxville
TN, 37996, Computer Science Technical Report Number CS-89-85, today’s date,
url:http://www.netlib.org/benchmark/performance.ps.
Mflop/s is a rate of execution, millions of
floating point operations per second. Whenever this term is used it will refer
to 64 bit floating point operations and the operations will be either addition
or multiplication. Gflop/s refers to billions of floating point operations per
second and Tflop/s refers to trillions of floating point operations per second.
The three benchmarks in the Linpack Benchmark
report are the Linpack Fortran n = 100 benchmark (see Table 1 of the report),
the Linpack n = 1000 benchmark (see Table 1 of the report), and Linpack’s Highly
Parallel Computing benchmark (see Table 3 of the report).
The first benchmark is for a matrix of order 100
using the Linpack software in Fortran. The results can be found in Table 1 of
the benchmark report. In order to run this benchmark download the file from http://www.netlib.org/benchmark/Linpackd,
this is a Fortran program. In order to run the program you will need to supply
a timing function called SECOND which should report the CPU time that has
elapsed. The ground rules for running this benchmark are that you can make no
changes to the Fortran code, not even to the comments. Only compiler
optimization can be used to enhance performance.
The Linpack benchmark measures the performance
of two routines from the Linpack collection of software. These routines are
DGEFA and DGESL (these are double-precision versions; SGEFA and SGESL are their
single-precision counterparts). DGEFA performs the LU decomposition with
partial pivoting, and DGESL uses that decomposition to solve the given system
of linear equations.
Most of the time is spent in DGEFA. Once the
matrix has been decomposed, DGESL is used to find the solution; this process
requires O(n^2) floating-point operations, as opposed to the O(n^3) floating-point operations
of DGEFA. The results for this
benchmark can be found in Table 1 second column under “LINPACK Benchmark n =
100” of the Linpack Benchmark Report.
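The division of labor between the two routines can be sketched in plain Python. This is only an illustration of the algorithm, not the actual Linpack Fortran (which stores its multipliers and applies its pivots slightly differently):

```python
# A plain-Python sketch of the two-routine split: the factorization does
# the O(n^3) work, the solve only O(n^2).

def dgefa_like(a):
    """LU-factor the n x n matrix a in place with partial pivoting.

    Returns the pivot indices. Assumes a is nonsingular.
    """
    n = len(a)
    ipvt = []
    for k in range(n):
        # partial pivoting: bring the largest entry in column k to the diagonal
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        ipvt.append(p)
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]   # multiplier, stored where the zero would go
            a[i][k] = m
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
    return ipvt

def dgesl_like(a, ipvt, b):
    """Solve A x = b using the factorization produced by dgefa_like."""
    n = len(a)
    x = b[:]
    for k in range(n):                    # apply the recorded row interchanges
        x[k], x[ipvt[k]] = x[ipvt[k]], x[k]
    for k in range(n):                    # forward substitution (unit lower L)
        for i in range(k + 1, n):
            x[i] -= a[i][k] * x[k]
    for k in range(n - 1, -1, -1):        # back substitution (upper U)
        x[k] /= a[k][k]
        for i in range(k):
            x[i] -= a[i][k] * x[k]
    return x
```

The factorization carries essentially all the arithmetic, which is why the benchmark's runtime is dominated by DGEFA.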
The second benchmark is for a matrix of size
1000 and can be found in Table 1 of the benchmark report. In order to run this
benchmark download the file from http://www.netlib.org/benchmark/1000d,
this is a Fortran driver. The ground rules for running this benchmark are a bit
more relaxed in that you can specify any linear equation solver you wish,
implemented in any language. A requirement is that your method must compute a
solution that meets the prescribed accuracy. TPP
stands for Toward Peak Performance; this is the title of the column in the
benchmark report that lists the results.
The third benchmark is called the Highly
Parallel Computing Benchmark and can be found in Table 3 of the Benchmark
Report. (This is the benchmark used for the Top500 report.) This benchmark
attempts to measure the best performance of a machine in solving a system of
equations. The problem size and software can be chosen to produce the best
performance.
An implementation of this benchmark, High Performance Linpack (HPL), is available at http://www.netlib.org/benchmark/hpl/
The “ground rules” for running the first
benchmark in the report, n=100 case, are that the program is run as is with no
changes to the source code, not even changes to the comments are allowed. The
compiler, through compiler switches, can perform optimization at compile time.
The user must supply a timing function called SECOND. SECOND returns the
running CPU time for the process. The matrix generated by the benchmark program
must be used to run this case.
The “ground rules” for running the second
benchmark in the report, the n=1000 case, allow for a complete user replacement of
the LU factorization and solver steps. The calling sequence should be the same
as for the original routines, and the problem size should be of order 1000. The
accuracy of the solution must satisfy the following bound:

||Ax - b|| / (||A|| ||x|| n eps) = O(1)

where eps is the relative machine precision (on IEEE machines this is 2^-53) and n is the size
of the problem. The matrix used must be the same matrix used in the driver
program available from netlib.
The “ground rules” for running the third
benchmark in the report, the Highly Parallel case, allow for a complete user
replacement of the LU factorization and solver steps. The accuracy of the
solution must satisfy the following bound:

||Ax - b|| / (||A|| ||x|| n eps) = O(1)

where eps is the relative machine precision (on IEEE machines this is 2^-53) and n is the size
of the problem. The matrix used must be the same matrix used in the driver
program available from netlib. There is no restriction on the problem size.
The solution to all three benchmarks must
satisfy the following bound:

||Ax - b|| / (||A|| ||x|| n eps) = O(1)

where eps is the relative machine precision (on IEEE machines this is 2^-53) and n is the size
of the problem. This implies the computation must be done in 64 bit floating
point arithmetic.
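The check above can be sketched as follows. The exact norms used by the official drivers may differ; treat this only as an illustration of the form of the test:

```python
# A sketch of the accuracy check: the normalized residual
# ||Ax - b|| / (||A|| ||x|| n eps) must be O(1), with eps = 2**-53
# (relative machine precision for IEEE double). Infinity norms assumed here.

def normalized_residual(a, x, b):
    n = len(a)
    eps = 2.0 ** -53
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    r = max(abs(ax[i] - b[i]) for i in range(n))          # ||Ax - b||
    norm_a = max(sum(abs(v) for v in row) for row in a)   # ||A||
    norm_x = max(abs(v) for v in x)                       # ||x||
    return r / (norm_a * norm_x * n * eps)
```

A correctly computed 64-bit solution yields a value of order 1; a value in the hundreds or more signals a wrong answer.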
In order to have an entry included in the
Linpack Benchmark report the results must be computed using full precision. By
full precision we generally mean 64 bit floating point arithmetic or higher.
Note that this is not an issue of single or double precision, as some systems
have 64-bit floating point arithmetic as single precision. It is a function of
the arithmetic used.
You can get a more personalized listing of
machines by using the interface at http://performance.netlib.org/performance/html/PDSbrowse.html
You can download the program used to generate
the Linpack benchmark results from http://www.netlib.org/benchmark/Linpack.
This is a Fortran program. There is a C version of the benchmark located at: http://www.netlib.org/benchmark/Linpackc.
There is also a Java version of the benchmark, which can be run as an applet, at
http://www.netlib.org/benchmark/linpackjava/
There is a Java version of the benchmark that
can be run as an applet at http://www.netlib.org/benchmark/linpackjava/
For the 100x100 based Fortran version, you need
to supply a timing function called SECOND. SECOND is a timing function
that is called from Fortran and is expected to return the running CPU time
in seconds. In the program, two calls to SECOND are made and the difference
is taken to measure the time.
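In Python (rather than Fortran) the same two-call pattern looks like this; `time.process_time` plays the role of SECOND here:

```python
# Sketch of the benchmark's timing scheme: SECOND returns accumulated CPU
# seconds for the process, and the cost of a region is the difference of
# two calls.
import time

def second():
    """CPU time accumulated by this process, in seconds."""
    return time.process_time()

def timed(work, *args):
    """Run work(*args) and return (result, elapsed CPU seconds)."""
    t0 = second()
    result = work(*args)
    t1 = second()
    return result, t1 - t0
```

If the difference comes back zero or negative, the clock's resolution is too coarse for the region being timed, which is exactly the failure mode discussed later in this FAQ.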
The performance of the Linpack benchmark is
typical for applications where the basic operation is based on vector
primitives, such as adding a scalar multiple of one vector to another vector. Many
applications exhibit the same performance as the Linpack Benchmark. However,
results should not be taken too seriously. In order to measure the performance
of any computer it’s critical to probe for the performance of your
applications. The Linpack Benchmark can only give one point of reference. In addition, in multiprogramming
environments it is often difficult to reliably measure the execution time of a
single program. We trust that anyone actually evaluating machines and operating
systems will gather more reliable and more representative data.
While we make every attempt to verify the
results obtained from users and vendors, errors are bound to exist and should
be brought to our attention. We encourage users to obtain the programs and run
the routines on their machines, reporting any discrepancies with the numbers
listed here.
The Linpack package is a collection of Fortran
subroutines for solving various systems of linear equations.
(http://www.netlib.org/Linpack/) The software in Linpack is based on a
decompositional approach to numerical linear algebra. The general idea is the
following. Given a problem involving a matrix, one factors or decomposes the
matrix into a product of simple, well-structured matrices which can be easily
manipulated to solve the original problem. The package has the capability of
handling many different matrix types and different data types, and provides a
range of options. Linpack itself is built on another package called the BLAS.
Linpack was designed in the late 70's and has been superseded by a package
called LAPACK.
The Linpack software library is available from
netlib. See http://www.netlib.org/Linpack/
The BLAS or Basic Linear Algebra Subroutines are
a set of basic operations that are used over and over again in matrix
computations. The BLAS address simple vector operations, such as adding a
multiple of a vector to another vector (SAXPY) or forming an inner product
(SDOT). Most of the floating-point work within the Linpack algorithms is
carried out by the BLAS, which makes it possible to take advantage of special
computer hardware without having to modify the underlying algorithm. This
approach thus achieves transportability and clarity of software without
sacrificing reliability.
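The two kernels named above can be sketched in plain Python. The real BLAS are optimized Fortran or assembly; these only show the operations themselves:

```python
# Level 1 BLAS operations as used by Linpack, sketched in Python.

def saxpy(alpha, x, y):
    """Return alpha*x + y, elementwise (the BLAS "axpy" operation)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def sdot(x, y):
    """Return the inner product of x and y (the BLAS "dot" operation)."""
    return sum(xi * yi for xi, yi in zip(x, y))
```

Because almost all of Linpack's floating-point work funnels through kernels like these, a machine-tuned BLAS speeds up the whole package without touching the higher-level algorithm.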
Linpack is not the most efficient software for
solving matrix problems. This is mainly due to the way the algorithm and
resulting software accesses memory. The
memory access patterns of the algorithm disregard the multi-layered
memory hierarchies of RISC architectures and vector computers, so the software spends too
much time moving data instead of doing useful floating-point operations. LAPACK
addresses this problem by reorganizing the algorithms to use block matrix
operations, such as matrix multiplication in the innermost loops. For each
computer architecture block operations can be optimized to account for memory
hierarchies, providing a transportable way to achieve high efficiency on
diverse modern machines. We use the term “Transportable” instead of “portable”
because, for fastest possible performance, LAPACK requires that highly
optimized block matrix operations be already implemented on each machine. These
operations are performed by the Level 3 BLAS in most cases.
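The blocking idea can be illustrated with a matrix product computed panel by panel. This sketch only shows the loop structure; it is not a tuned implementation, and LAPACK itself delegates this work to an optimized Level 3 BLAS:

```python
# Blocked matrix multiply: the product is accumulated in nb x nb panels,
# so each panel of A, B, and C can stay resident in fast memory while it
# is reused. The arithmetic is identical to the naive triple loop.

def blocked_matmul(a, b, nb=2):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, m, nb):
            for jj in range(0, p, nb):
                # multiply one block pair into the corresponding C block
                for i in range(ii, min(ii + nb, n)):
                    for k in range(kk, min(kk + nb, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + nb, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

The block size nb is the tuning knob: it is chosen per machine so that the working set of three blocks fits in cache.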
LAPACK is a software collection for solving various
matrix problems in linear algebra: in particular, systems of linear equations,
least squares problems, eigenvalue problems, and the singular value decomposition.
The software is based on the use of block partitioned matrix techniques that
aid in achieving high performance on RISC based systems, vector computers, and
shared memory parallel processors.
LAPACK can be obtained from netlib, see
(http://www.netlib.org/lapack/)
The Linpack Benchmark is, in some sense, an
accident. It was originally designed to assist users of the Linpack package by
providing information on execution times required to solve a system of linear
equations. The first ``Linpack Benchmark'' report appeared as an appendix in
the Linpack Users' Guide in 1979. The appendix comprised data for one commonly
used path in Linpack for a matrix problem of size 100, on a collection of
widely used computers (23 in all), so users could estimate the time required to
solve their matrix problem.
Over the years other data was added, more as a
hobby than anything else, and today the collection includes hundreds of
different computer systems.
You can contact Jack Dongarra and send him the
output from the benchmark program. When sending results please include the
specific information on the computer on which the test was run, the compiler,
the optimization that was used, and the site it was run on. You can contact
Dongarra by sending email to dongarra@cs.utk.edu.
In order to run the benchmark program you will
have to supply a function to gather the execution time on your computer. The
execution time is requested by a call to the Fortran function SECOND. It is
expected that the routine returns the accumulated execution time of your
program. Two calls to SECOND are made and the difference taken to compute the
execution time.
The results reported in the benchmark report
reflect performance for 64 bit floating point arithmetic. On some machines this
may be DOUBLE PRECISION, such as computers that have IEEE floating point
arithmetic and on other computers this may be single precision, (declared REAL
in Fortran), such as Cray’s vector computers.
When and how often are the results updated in
the benchmark report?
The benchmark report is updated continuously as
new results arrive. They are posted to the web as they are updated.
The matrices are generated using a pseudo-random
number generator. The matrices are designed to force partial pivoting to be
performed in Gaussian Elimination.
The Top500 lists the 500 fastest computer systems
in use today. The collection was started in 1993 and has been updated every
six months since then. The report lists the sites that have the 500 most powerful
computer systems installed. The best Linpack benchmark performance achieved is
used as a performance measure in ranking the computers.
The Top500 reports are maintained at http://www.top500.org/.
We are starting a new list on clusters; for more information see http://clusters.top500.org/.
When the Linpack Fortran n = 100 benchmark is
run it produces the following kind of results:
Please send the results of this run to:
Jack J. Dongarra
Computer Science Department
University of Tennessee
Knoxville, Tennessee 37996-1300
Fax: 865-974-8296
Internet: dongarra@cs.utk.edu
     norm. resid      resid          machep          x(1)            x(n)
  1.67005097E+00  7.41628980E-14  2.22044605E-16  1.00000000E+00  1.00000000E+00

    times are reported for matrices of order   100
      dgefa      dgesl      total     mflops       unit      ratio
  times for array with leading dimension of  201
  1.540E-03  6.888E-05  1.609E-03  4.268E+02  4.686E-03  2.873E-02
  1.509E-03  7.084E-05  1.579E-03  4.348E+02  4.600E-03  2.820E-02
  1.509E-03  7.003E-05  1.579E-03  4.348E+02  4.600E-03  2.820E-02
  1.502E-03  6.593E-05  1.568E-03  4.380E+02  4.567E-03  2.800E-02
  times for array with leading dimension of  200
  1.431E-03  6.716E-05  1.498E-03  4.584E+02  4.363E-03  2.675E-02
  1.424E-03  6.694E-05  1.491E-03  4.605E+02  4.343E-03  2.663E-02
  1.431E-03  6.699E-05  1.498E-03  4.583E+02  4.364E-03  2.676E-02
  1.432E-03  6.439E-05  1.497E-03  4.588E+02  4.360E-03  2.673E-02
The norm. resid is a measure of the accuracy of
the computation. The value should be O(1). If the value is much greater than
O(100) it suggests that the results are not correct.
The resid is the unnormalized quantity.
The term machep measures the precision used to
carry out the computation. On an IEEE floating point computer the value should
be 2.22044605e-16.
The values of x(1) and x(n) are the first and
last components of the solution. The problem is constructed so that the
solution should be all ones.
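One standard way to build a problem with a known solution, sketched here, is to pick a matrix A and set b to A times the vector of all ones; the benchmark's actual pseudo-random generator and seed differ, so this is only an illustration:

```python
# Constructing a test problem whose exact solution is the all-ones vector:
# choose b = A @ ones, so the computed x(1) and x(n) should both be ~1.0.
import random

def make_problem(n, seed=0):
    rng = random.Random(seed)            # seed value is illustrative only
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]          # row sum = row dotted with all ones
    return a, b
```

Checking how far the computed x(1) and x(n) stray from 1.0 then gives an immediate, human-readable sanity check on the solver.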
There are two sets of timings, both performed on
matrices of order 100. The first is for a 2-dimensional array containing the
matrix with a leading dimension of 201, and the second for a leading dimension
of 200. This is done to see what effect, if any, the placement
of the arrays in memory has on the performance.
Times for dgefa and dgesl are reported. dgefa
factors the matrix using Gaussian elimination with partial pivoting and dgesl
solves a system based on that factorization. dgefa requires about (2/3)n^3
operations and dgesl about 2n^2 operations. The value of total is the sum of
the two times, and mflops is the execution rate, in millions of floating point
operations per second. Here a floating point operation is taken to be a
floating point addition or multiplication. Unit and ratio are obsolete and
should be ignored.
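From those operation counts the mflops figure follows directly; a sketch, using the conventional Linpack count of (2/3)n^3 + 2n^2 operations:

```python
# Mflop/s as reported by the benchmark (sketch): total floating point
# operations for factor + solve, divided by the measured time, in millions.

def mflops(n, total_seconds):
    ops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2   # dgefa + dgesl operation count
    return ops / total_seconds / 1.0e6
```

For n = 100 and a total time of 1.609E-03 seconds this gives about 426.8, matching the mflops column in the sample output.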
If the time reported is negative or zero then
the clock resolution is not accurate enough for the granularity of the work. In
this case a different timing routine should be used that has better resolution.
No archive is maintained of previous results.
However, here is some information to provide a historical perspective. The numbers in the following tables have
been extracted from old Linpack Benchmark Reports. It took a bit of ``file archaeology'' to put the list together
since I don't have the complete set of reports.
Top Computers Over Time for the Linpack n=100
Benchmark
(Entries for this table began in 1979.)
Year | Computer          | Number of Processors | Cycle time in nsecs | Mflop/s
-----|-------------------|----------------------|---------------------|--------
2000 | Fujitsu VPP5000/1 |          1           |        3.33         |  1156
1999 | CRAY T916         |          4           |        2.2          |  1129
1995 | CRAY T916         |          1           |        2.2          |   522
1994 | CRAY C90          |         16           |        4.2          |   479
1993 | CRAY C90          |         16           |        4.2          |   479
1992 | CRAY C90          |         16           |        4.2          |   479
1991 | CRAY C90          |         16           |        4.2          |   403
1990 | CRAY Y-MP         |          8           |        6.0          |   275
1989 | CRAY Y-MP         |          8           |        6.0          |   275
1988 | CRAY Y-MP         |          1           |        6.0          |    74
1987 | ETA 10-E          |          1           |       10.5          |    52
1986 | NEC SX-2          |          1           |        6.0          |    46
1985 | NEC SX-2          |          1           |        6.0          |    46
1984 | CRAY X-MP         |          1           |        9.5          |    21
1983 | CRAY 1            |          1           |       12.5          |    12
 ... |                   |                      |                     |
1979 | CRAY 1            |          1           |       12.5          |   3.4
These numbers come from the Linpack Benchmark
Report Table 1.
=====================================================================
Top Computers Over Time for the Linpack n=1000
Benchmark
(Entries for this table began in 1986.)
Year | Computer           | Number of Processors | Cycle time in nsec | Measured Mflop/s | Peak Mflop/s
-----|--------------------|----------------------|--------------------|------------------|-------------
2000 | NEC SX-5/16        |         16           |        4.0         |       45030      |     64000
1995 | CRAY T916          |         16           |        2.2         |        1940      |     28800
1994 | Hitachi S-3800/480 |          4           |        2           |       16170      |     32000
1993 | NEC SX-3/44R       |          4           |        2.5         |       15120      |     25600
1992 | NEC SX-3/44        |          4           |        2.9         |       13420      |     22000
1991 | Fujitsu VP2600/10  |          1           |        3.2         |        4009      |      5000
1990 | Fujitsu VP2600/10  |          1           |        3.2         |        2919      |      5000
1989 | CRAY Y-MP/832      |          8           |        6           |        2144      |      2667
1988 | CRAY Y-MP/832      |          8           |        6           |        2144      |      2667
1987 | NEC SX-2           |          1           |        6           |         885      |      1300
1986 | CRAY X-MP-4        |          4           |        9.5         |         713      |       840
These numbers come from the Linpack Benchmark
Report Table 1.
(Full precision; matrix size 1000; best effort
programming, maximum optimization permitted.)
Top Computers Over Time for the Highly-Parallel
Linpack Benchmark
(Entries for this table began in 1991.)
Year | Computer                                    | Number of Processors | Measured Gflop/s | Size of Problem | Size of 1/2 Perf | Theoretical Peak Gflop/s
-----|---------------------------------------------|----------------------|------------------|-----------------|------------------|-------------------------
2000 | ASCI White-Pacific, IBM SP Power 3          |        7424          |       4938       |      430000     |                  |         11136
1999 | ASCI Red Intel Pentium II Xeon core         |        9632          |       2379       |      362880     |      75400       |          3207
1998 | ASCI Blue-Pacific SST, IBM SP 604E          |        5808          |       2144       |      431344     |                  |          3868
1997 | Intel ASCI Option Red (200 MHz Pentium Pro) |        9152          |       1338       |      235000     |      63000       |          1830
1996 | Hitachi CP-PACS                             |        2048          |      368.2       |      103680     |      30720       |           614
1995 | Intel Paragon XP/S MP                       |        6768          |      281.1       |      128600     |      25700       |           338
1994 | Intel Paragon XP/S MP                       |        6768          |      281.1       |      128600     |      25700       |           338
1993 | Fujitsu NWT                                 |         140          |      124.5       |       31920     |      11950       |           236
1992 | NEC SX-3/44                                 |           4          |       20.0       |        6144     |        832       |            22
1991 | Fujitsu VP2600/10                           |           1          |        4.0       |        1000     |        200       |             5
These numbers come from the Linpack Benchmark
Report Table 3.
(Full precision; the manufacturer is allowed to
solve as large a problem as desired; maximum optimization permitted.)
Measured Gflop/s is the measured peak rate of
execution for running the benchmark in billions of floating point operations
per second.
Size of Problem is the matrix size at which the
measured performance was observed.
Size of ½ Perf is the size of problem needed to
achieve ½ the measured peak performance.
Theoretical Peak Gflop/s is the theoretical peak
performance for the computer.
The Linpack Benchmark suite is built around
software for dense matrix problems. In May 2000 we started to put together a
benchmark for sparse iterative matrix problems. For additional information see:
http://www.netlib.org/benchmark/sparsebench/
For additional information on benchmarks see: http://www.netlib.org/benchweb/
Please send your comments to Jack Dongarra at dongarra@cs.utk.edu.