Answers to Common Questions About SPEC95 Benchmark Suites
Date: Mon, 21 Aug 1995
Kaivalya Dixit, IBM; and
Jeff Reilly, Intel Corporation
Q1: What is SPEC95?
A1: SPEC95 is a software benchmark product produced by
the Standard Performance Evaluation Corp. (SPEC), a non-profit group of
computer vendors, systems integrators, universities, research
organizations, publishers and consultants throughout the world. It was
designed to provide measures of performance for comparing compute-
intensive workloads on different computer systems. SPEC95
contains two suites of benchmarks: CINT95 for measuring and
comparing compute-intensive integer performance, and CFP95 for
measuring and comparing compute-intensive floating point performance.
Q2: What is a benchmark?
A2: The definition from Webster's II Dictionary states: "A
standard of measurement or evaluation." SPEC is a non-profit
corporation formed to establish and maintain computer benchmarks for
measuring component- and system-level computer performance.
Q3: What does the "C" in CINT95 and CFP95 stand for?
A3: In its product line, SPEC uses "C" to denote a
"component-level" benchmark and "S" to denote a "system-level"
benchmark. CINT95 and CFP95 are component-level
benchmarks.
Q4: What components do CINT95 and CFP95 measure?
A4: Being compute-intensive benchmarks, these benchmarks
emphasize the performance of the computer's processor, the memory
architecture and the compiler. It is important to remember the
contribution of the latter two components; performance is more than
just the processor.
Q5: What component performance is not measured by CINT95
and CFP95?
A5: The CINT95 and CFP95 benchmarks do not stress
other computer components such as I/O (disk drives), networking or
graphics. It might be possible to configure a system in such a way
that one or more of these components impact the performance of
CINT95 and CFP95, but that is not the intent of the
suites.
Q6: What is included in the SPEC95 package?
A6: SPEC provides the following in its SPEC95 package:
- SPEC95 tools for compiling, running and validating
the benchmarks, compiled for a variety of operating systems;
- source code for the SPEC95 tools, to allow the tools
to be built for systems not covered by the pre-compiled tools;
- source code for the benchmarks;
- tools for generating performance reports;
- run and reporting rules defining how the benchmarks
should be used to produce standard results; and
- SPEC95 documentation.
The initial offering of SPEC95 will have tools for most UNIX
operating systems. Additional products for other operating systems
(Windows NT, VMS, etc.) will be released as later products if SPEC
detects enough demand. All of this will be shipped on a single CD-ROM
disk.
Q7: What does the user of SPEC95 have to provide?
A7: The user must have a computer system running a
UNIX environment with a compiler installed and a CD-ROM drive.
Approximately 300MB will be needed on a hard drive to install and run
SPEC95. It is also assumed that the system has at least 64MB of
RAM to ensure that the benchmarks remain compute-intensive (SPEC is
assuming this will be the standard amount of desktop memory during the
life of this suite).
Q8: What are the basic steps in running the
benchmarks?
A8: Installation and use are covered in detail
in the SPEC95 User Documentation. The basic steps are as
follows:
- Install SPEC95 from media.
- Run the installation scripts specifying your operating
system.
- Compile the tools, if executables are not provided in
SPEC95.
- Determine what metric you wish to run.
- Create a configuration file for that metric. In this file, you
specify compiler flags and other system-dependent information.
- Run the SPEC tools to build (compile), run and validate the benchmarks.
- If the above steps are successful, generate a report
based on the run times and metric equations.
Q9: What source code is provided? What exactly makes up
these suites?
A9: CINT95 and CFP95 are based on compute-intensive
applications provided as source code. CINT95 contains eight
applications written in C that are used as benchmarks:
Name          Ref Time (secs)  Remarks
099.go                   4600  Artificial intelligence; plays the game of "Go"
124.m88ksim              1900  Motorola 88K chip simulator; runs test program
126.gcc                  1700  New version of GCC; builds SPARC code
129.compress             1800  Compresses and decompresses file in memory
130.li                   1900  LISP interpreter
132.ijpeg                2400  Graphic compression and decompression
134.perl                 1900  Manipulates strings (anagrams) and prime
                               numbers in Perl
147.vortex               2700  A database program
CFP95 contains 10 applications written in FORTRAN that are
used as benchmarks:
Name          Ref Time (secs)  Remarks
101.tomcatv              3700  A mesh-generation program
102.swim                 8600  Shallow water model with 1024 x 1024 grid
103.su2cor               1400  Quantum physics; Monte Carlo simulation
104.hydro2d              2400  Astrophysics; hydrodynamical Navier-Stokes
                               equations
107.mgrid                2500  Multi-grid solver in 3D potential field
110.applu                2200  Parabolic/elliptic partial differential
                               equations
125.turb3d               4100  Simulates isotropic, homogeneous turbulence
                               in a cube
141.apsi                 2100  Solves problems regarding temperature, wind,
                               velocity and distribution of pollutants
145.fpppp                9600  Quantum chemistry
146.wave5                3000  Plasma physics; electromagnetic particle
                               simulation
Q10: What metrics can be measured?
A10: The CINT95 and CFP95 suites can be used to
measure and calculate the following metrics:
- CINT95:
  - SPECint95: The geometric mean of eight normalized
    ratios (one for each integer benchmark) when compiled
    with aggressive optimization for each benchmark.
  - SPECint_base95: The geometric mean of eight normalized
    ratios when compiled with conservative optimization for
    each benchmark.
  - SPECint_rate95: The geometric mean of eight normalized
    throughput ratios when compiled with aggressive
    optimization for each benchmark.
  - SPECint_rate_base95: The geometric mean of eight
    normalized throughput ratios when compiled with
    conservative optimization for each benchmark.
- CFP95:
  - SPECfp95: The geometric mean of ten normalized ratios
    (one for each floating point benchmark) when compiled
    with aggressive optimization for each benchmark.
  - SPECfp_base95: The geometric mean of ten normalized
    ratios when compiled with conservative optimization for
    each benchmark.
  - SPECfp_rate95: The geometric mean of ten normalized
    throughput ratios when compiled with aggressive
    optimization for each benchmark.
  - SPECfp_rate_base95: The geometric mean of ten
    normalized throughput ratios when compiled with
    conservative optimization for each benchmark.
The ratio for each of the benchmarks is calculated using a
SPEC-determined reference time and the run time of the
benchmark.
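The arithmetic is straightforward: each benchmark's ratio is its SPEC reference time divided by its measured run time, and the suite metric is the geometric mean of those ratios. The Python sketch below illustrates the calculation using the CINT95 reference times from A9; the measured run times are invented purely for illustration.

```python
from math import prod

def spec_ratio(ref_time, run_time):
    """Normalized ratio: greater than 1 means faster than the reference machine."""
    return ref_time / run_time

def geometric_mean(values):
    """Nth root of the product of N values."""
    return prod(values) ** (1.0 / len(values))

# CINT95 reference times (seconds); the run times below are made up
# solely to show the arithmetic.
ref_times = {"099.go": 4600, "124.m88ksim": 1900, "126.gcc": 1700,
             "129.compress": 1800, "130.li": 1900, "132.ijpeg": 2400,
             "134.perl": 1900, "147.vortex": 2700}
run_times = {"099.go": 460, "124.m88ksim": 190, "126.gcc": 170,
             "129.compress": 180, "130.li": 190, "132.ijpeg": 240,
             "134.perl": 190, "147.vortex": 270}

ratios = [spec_ratio(ref_times[b], run_times[b]) for b in ref_times]
specint95 = geometric_mean(ratios)
print(round(specint95, 2))  # 10.0: every invented run is 10x the reference
```

Because the geometric mean is used, a large improvement on one benchmark cannot mask regressions on the others the way an arithmetic mean would.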
Q11: What is the difference between a "base"
metric and a "non-base" metric?
A11: In order to provide comparisons across
different computer hardware, SPEC had to provide the
benchmarks as source code. Thus, in order to run the
benchmarks, they must be compiled. There was agreement that
the benchmarks should be compiled the way users compile
programs. But how do users compile programs? On one side,
people might experiment with many different compilers and
compiler flags to achieve the best performance. On the other
side, people might just compile with the basic options
suggested by the compiler vendor. SPEC recognizes that it
cannot exactly match how everyone uses compilers, but two
reference points are possible. The base metrics (e.g.,
SPECint_base95) are required for all reported results and
have set guidelines for compilation (e.g., the same flags
must be used in the same order for all benchmarks). The
non-base metrics (e.g., SPECint95) are optional and have
less strict requirements (e.g., different compiler options
may be used on each benchmark).
A full description of the distinctions can be found in the
SPEC95 Run and Reporting Rules available with SPEC95.
Q12: What is the difference between a "rate" and
a "non-rate" metric?
A12: There are several different ways to measure
computer performance. One way is to measure how fast the
computer completes a single task; this is a speed measure.
Another way is to measure how many tasks a computer can
accomplish in a certain amount of time; this is called a
throughput, capacity or rate measure.
The SPEC speed metrics (e.g., SPECint95) are used for
comparing the ability of a computer to complete single
tasks. The SPEC rate metrics (e.g., SPECint_rate95) measure
the throughput or rate of a machine carrying out a number of
tasks.
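The distinction can be made concrete with a small sketch. The numbers and the simplified throughput formula below are illustrative only; SPEC's published rate definition may differ in its scaling, so consult the Run and Reporting Rules for the authoritative formula.

```python
# A speed metric asks: how fast does ONE copy of the task finish?
# A rate metric asks: how much total work gets done per unit of
# wall-clock time when several copies run at once?

def speed_ratio(ref_time, run_time):
    # One copy of the benchmark, compared against the reference machine.
    return ref_time / run_time

def throughput_ratio(ref_time, n_copies, elapsed):
    # n_copies run concurrently; total reference work divided by the
    # wall-clock time needed to finish all of them (simplified form).
    return n_copies * ref_time / elapsed

REF = 1800.0  # hypothetical reference time, in seconds

# A uniprocessor finishes one copy in 200 seconds:
print(speed_ratio(REF, 200.0))          # 9.0
# A four-CPU server finishes four concurrent copies in 210 seconds:
print(throughput_ratio(REF, 4, 210.0))  # about 34.3
```

Note that the four-CPU machine scores far higher on throughput even though each individual copy takes slightly longer than on the uniprocessor; that is exactly the behavior the rate metrics are designed to capture.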
Q13: Why and when should I use SPEC95?
A13: Typically, the best measure of a system is your own
application with your own workload. Unfortunately, it is
often very difficult and expensive to get a wide base of
reliable, repeatable and comparable measurements on
different systems for your own application with your own
workload. This might be due to time, money or other
constraints.
Benchmarks exist to act as a reference point for comparison.
It's the same reason that EPA gas mileage exists, although
probably no driver in America gets exactly the EPA gas
mileage. If you understand what benchmarks measure, they're
useful. It's important to know that CINT95 and CFP95 are
CPU-focused and not system-focused benchmarks. These CPU
benchmarks focus on only one portion of those factors that
contribute to applications performance. A graphics or
network performance bottleneck within an application, for
example, will not be reflected in these benchmarks.
Understanding your own needs helps determine the relevance
of the benchmarks.
Q14: Which SPEC95 metric should be used to determine
performance?
A14: It depends on your needs. SPEC provides the benchmarks
and results as tools for you to use. You need to determine
how you use a computer or what your performance requirements
are and then choose the appropriate SPEC benchmark or
metrics.
A single user running a compute-intensive integer program,
for example, might only be interested in SPECint95 or
SPECint_base95. On the other hand, a person who maintains a
machine used by multiple scientists running floating point
simulations might be more concerned with SPECfp_rate95 or
SPECfp_rate_base95.
Q15: SPEC92 is already available. Why create SPEC95, and
will it show anything different from SPEC92?
A15: Technology is always improving. As the technology
improves, the benchmarks need to improve as well. SPEC
needed to address the following issues:
- Run-time -- Several of the SPEC92 benchmarks were running in less than a minute on leading-
edge processors/systems. Given the SPEC measurement tools, small changes
or fluctuations in the measurements were having significant impacts on the
percentage improvements being seen. SPEC chose to make the SPEC95
benchmarks longer to take into account future performance and prevent this
from being an issue for the life of the suite.
- Application size --
Many comments received by SPEC indicated that applications had grown in
complexity and size and that SPEC92 was becoming less representative of
what runs on current systems. For SPEC95, SPEC selected programs
with larger resource requirements to provide a mix with some of the
smaller programs.
- Application type --
SPEC felt that there were additional application areas that should be
included in SPEC95 to increase variety and representation within the
suites. Areas such as imaging and database have been added.
- Portability -- SPEC
found that compute-intensive performance was important beyond the UNIX
workstation arena where SPEC was founded. It was important, therefore,
that the benchmarks and the tools running the benchmarks be as independent
of the operating system as possible. While the first release of SPEC95
will be geared toward UNIX, SPEC has consciously chosen programs and tools
that are dependent only upon POSIX or ANSI standard development
environments. SPEC will produce additional releases for other operating
systems (such as Microsoft Windows/NT) based on demand.
- Moving target -- The
initial hope for benchmarks is that improvements in the benchmark
performance will be generally applicable to other situations. As
competition develops, however, improvements in the test performance can
become specific to that test only. SPEC95 provides updated benchmarks so
that general improvements will be encouraged and test-specific
optimizations become less effective.
- Education -- As the
computer industry grows, benchmark results are being quoted more often.
With the release of new benchmark suites, SPEC has a fresh opportunity to
discuss and clarify how and why the suite was developed.
Q16: What happens to SPEC92 after SPEC95 is released?
A16: SPEC will begin the process of making SPEC92 obsolete.
The results published by SPEC will be marked as obsolete and
by June 1996, SPEC will stop publishing SPEC92 results and
stop selling the SPEC92 suites.
Q17: Is there a way to translate SPEC92 results to SPEC95
results or vice versa?
A17: There is no formula for converting from SPEC92 results
to SPEC95 results; they are different products. There might
be a high correlation between SPEC92 and SPEC95 results
(i.e., machines with higher SPEC92 results might have higher
SPEC95 results), but there is no universal formula for all
systems.
SPEC is strongly encouraging SPEC licensees to publish
SPEC95 numbers on older platforms to provide a historical
perspective.
Q18: What criteria were used to select the benchmarks?
A18: In the process of selecting applications to use as
benchmarks, SPEC considered the following criteria:
- portability to all SPEC hardware architectures (32- and 64-bit, including Alpha, Intel Architecture, PA-RISC, Rxx00, SPARC, etc.);
- portability to various operating systems,
particularly UNIX, NT and VMS;
- benchmarks should not include measurable I/O;
- benchmarks should not include networking or graphics;
- benchmarks should run in 64MB RAM without swapping
(SPEC is assuming this will be a minimal memory
requirement for the life of SPEC95 and the emphasis
is on compute-intensive performance and not disk
activity);
- benchmarks should run at least five minutes on a
Digital Equipment Corp. 200MHz Alpha system; and
- no more than five percent of benchmarking time should
be spent processing code not provided by SPEC.
Q19: Weren't some of the SPEC95 benchmarks in SPEC92? How
are they different?
A19: Although some of the benchmarks from SPEC92 are
included in SPEC95, they all have been given different
workloads or modified to improve their coding style or use
of resources. The revised benchmarks have been assigned
different identifying numbers to distinguish them from
versions in previous suites and to indicate they are not
comparable with their predecessors.
Q20: Why were some of the benchmarks not carried over from
SPEC92?
A20: Some benchmarks were not carried over because it was
not possible to create a longer running workload or to
create a more robust workload, or the benchmarks were too
susceptible to benchmark-specific compiler optimization.
Q21: Why does SPEC use a reference machine for determining
performance metrics? What machine is used for the SPEC95
benchmark suites?
A21: SPEC uses a reference machine to normalize the
performance metrics used in the SPEC95 suites. Each
benchmark is run and measured on this machine to establish a
reference time for that benchmark. These times are then used
in the SPEC calculations. SPEC uses the SPARCstation 10/40
(40MHz SuperSPARC with no L2 cache) as the reference
machine. It takes approximately 48 hours to run a SPEC-conforming execution of CINT95 and CFP95 on this machine.
Q22: How long does it take to run the
SPEC95 benchmark suites?
A22: This depends on the suite and the
machine that is running the benchmarks. As mentioned above,
on the reference machine it takes two days for a SPEC-conforming run (at least three iterations of each benchmark
to ensure that results can be reproduced).
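The repetition requirement can be sketched as follows. This assumes the reported time for each benchmark is the median of the repeated runs; consult the SPEC95 Run and Reporting Rules for the authoritative selection rule.

```python
from statistics import median

def reported_time(run_times):
    """Collapse repeated runs of one benchmark into a single time.

    SPEC95 requires at least three iterations of each benchmark;
    this sketch assumes the median run time is the one reported.
    """
    if len(run_times) < 3:
        raise ValueError("a SPEC-conforming run needs at least 3 iterations")
    return median(run_times)

# Hypothetical timings (seconds) for one benchmark:
print(reported_time([201.3, 198.7, 205.1]))  # 201.3
```

Taking a middle value rather than the best run keeps a single lucky (or unlucky) iteration from distorting the reported result.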
Q23: What if the tools cannot be run or
built on a system? Can they be run manually?
A23: To generate SPEC-compliant results,
the tools used must be approved by SPEC. If several attempts
at using the SPEC tools are not successful for the operating
system for which you purchased SPEC95, you should contact
SPEC for technical support. SPEC will work with you to
correct the problem and/or investigate SPEC-compliant
alternatives.
Q24: What if I don't want to run the
benchmarks? Is there any place that results will be
available?
A24: There are several current alternatives:
- Every quarter, SPEC publishes the
SPEC Newsletter, which contains results submitted to
SPEC by SPEC members and licensees. Subscription
information is available from SPEC.
- SPEC provides information to the
Performance Database Server found at:
http://performance.netlib.org/performance/html/spec.html
This typically lags three months behind the SPEC
Newsletter.
- SPEC is working on establishing its
own Internet presence, although details are not yet
available.
Q25: How do I contact SPEC?
A25: Here is the contact information for
SPEC:
Dianne Rice
SPEC
c/o NCGA
2722 Merrilee Drive, Ste. 300
Fairfax, VA 22031
Tel: 703-698-9604
Fax: 703-560-2752
E-mail: drice@uspro.fairfax.va.us
Questions and answers were prepared by Kaivalya Dixit of IBM
and Jeff Reilly of Intel Corp. Dixit is president of SPEC
and Reilly is release manager for SPEC95.