Next: Description of Machine
Up: Short Description of Architectures
Previous: Short Description of Architectures
For many years the taxonomy of Flynn [2] has proven to be useful for the
classification of high-performance computers. This classification is based on
the way instruction and data streams are manipulated and comprises four main
architectural classes. We will first briefly sketch these classes and
afterwards fill in some details when each of the classes is described
separately.
- SISD machines: These are the conventional systems that contain
one CPU and hence can accommodate one instruction stream that is executed
serially. Nowadays many large mainframes have more than one CPU, but each of
these executes instruction streams that are unrelated. Therefore, such
systems should still be regarded as (a collection of) SISD machines acting on
different data spaces. Examples of SISD machines are the Bull
DPX 5000 series, the Control Data 4000 series, and most workstations, such as
those of DEC, Hewlett-Packard, and Sun Microsystems. The definition of SISD
machines is given here for completeness' sake; we will not discuss this type
of machine in this report.
- SIMD machines: Such systems often have a large number of
processing units, ranging from 1,024 to 16,384, that all execute the same
instruction on different data in lock-step. So, a single instruction
manipulates many data items in parallel. Examples of machines in this
class are the CPP DAP Gamma and the MasPar MP-2.
- Another subclass of the SIMD systems is formed by the vector processors.
Vector processors act on arrays of similar data rather than on single data
items, using specially structured CPUs. When data can be manipulated by these
vector units, results can be delivered at a rate of one, two and --- in special
cases --- three per clock cycle (a clock cycle being defined as the basic
internal unit of time for the system). So, vector processors operate on their
data in an almost parallel way, but only when executing in vector mode. In this
case they are several times faster than when executing in conventional scalar
mode. For practical purposes vector processors are therefore mostly regarded as
SIMD machines. Examples of such systems are the Convex C410 and
the NEC SX-3/11.
- MISD machines: Theoretically, in this type of machine multiple
instructions should act on a single stream of data. As yet no
practical machine in this class has been constructed, nor are such systems
easy to conceive. We will disregard them in the following discussions.
- MIMD machines: These machines execute several instruction streams in
parallel on different data. The difference with the multi-processor SISD
machines mentioned above lies in the fact that the instructions and data are
related because they represent different parts of the same task to be executed.
So, MIMD systems may run many sub-tasks in parallel in order to shorten the
time-to-solution for the main task. There is a large variety of
MIMD systems, and especially in this class the Flynn taxonomy proves not to be
fully adequate for the classification of systems. Systems that behave very
differently, like a two-processor Cray Y-MP C92 and a thousand-processor
nCUBE 3, both fall into this class. In the following we therefore make another
important distinction between classes of systems.
- Shared memory systems: Shared memory systems have multiple CPUs all
of which share the same address space. This means that the knowledge of
where data is stored is of no concern to the user as there is only one
memory accessed by all CPUs on an equal basis. Shared memory systems can be
either SIMD or MIMD. Single-CPU vector processors can be regarded as an
example of the former, while the multi-CPU models of these machines
are examples of the latter. We will sometimes use the abbreviations SM-SIMD and
SM-MIMD for the two subclasses.
- Distributed memory systems: In this case each CPU has its own
associated memory. The CPUs are connected by an internal network in some way
and may exchange data between their respective memories when required. In
contrast to shared memory machines the user must be aware of the location of
the data in the local memories and will have to move or distribute these data
explicitly when needed. Again, distributed memory systems may be either SIMD or
MIMD. The first class of SIMD systems mentioned above, which operate in
lock-step, all have distributed memories associated with the processors. For the distributed
memory MIMD systems again a subdivision is possible: those in which the
processors are connected in a fixed topology and those in which the topology is
flexible and may vary from task to task. For the distributed memory systems we
will sometimes use DM-SIMD and DM-MIMD to indicate the two subclasses.
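To make the contrast concrete, the following is a minimal sketch in Python (not a language these machines would actually run; the two-way split and the summation task are our own illustrative choices). In the shared-memory style, every worker thread reads the same array directly in one address space; in the distributed-memory style, each worker process owns only its chunk of the data and must send its partial result back explicitly.

```python
# Illustrative sketch only: shared-memory vs distributed-memory style.
import threading
import multiprocessing as mp

DATA = list(range(1000))

# --- Shared memory: every thread sees the same DATA in one address space ---
def shared_sum():
    partial = [0, 0]
    def worker(tid, lo, hi):
        partial[tid] = sum(DATA[lo:hi])   # direct access, no data movement
    threads = [threading.Thread(target=worker, args=(0, 0, 500)),
               threading.Thread(target=worker, args=(1, 500, 1000))]
    for t in threads: t.start()
    for t in threads: t.join()
    return partial[0] + partial[1]

# --- Distributed memory: each process owns only its chunk of the data ---
def dm_worker(chunk, queue):
    queue.put(sum(chunk))                 # explicit message back to the master

def distributed_sum():
    queue = mp.Queue()
    procs = [mp.Process(target=dm_worker, args=(DATA[i:i + 500], queue))
             for i in (0, 500)]
    for p in procs: p.start()
    total = queue.get() + queue.get()     # collect the two partial results
    for p in procs: p.join()
    return total

if __name__ == "__main__":
    print(shared_sum(), distributed_sum())
```

Both functions compute the same result; the difference the text describes is visible in who must move the data: nobody in the shared case, the programmer (via the queue) in the distributed case.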
Although the difference between shared- and distributed memory machines seems
clear cut, this is not always entirely the case from the user's point of view. For
instance, the late Kendall Square Research systems employed the idea of
``virtual shared memory'' on a hardware level. Virtual shared memory can also
be simulated at the programming level: the first draft proposal for High
Performance Fortran (HPF), which distributes the data over the available
processors by means of compiler directives, was published in November 1992 [3];
the proposal was fixed by May 1993. A system on which HPF is implemented will
therefore act as a shared memory machine to the user.
Other vendors of Massively Parallel Processing systems (the buzz-word
MPP systems is fashionable here), like Convex and Cray, also support
proprietary virtual shared-memory programming models which means that these
physically distributed memory systems, by virtue of the programming model,
logically will behave as shared memory systems.
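As an illustration of what such a data-distribution directive expresses, the following Python sketch mimics the effect of a BLOCK distribution over a number of processors (the function names and the four-processor example are our own; in HPF itself, a Fortran dialect, the compiler rather than user code performs this mapping):

```python
# Illustrative sketch of a BLOCK data distribution: contiguous blocks of a
# global array are assigned one per processor, while the programmer keeps
# a single global view of the array.
def block_distribute(array, nprocs):
    """Return the per-processor blocks for a BLOCK distribution."""
    n = len(array)
    size = -(-n // nprocs)          # ceiling division gives the block size
    return [array[p * size:(p + 1) * size] for p in range(nprocs)]

def owner(index, n, nprocs):
    """Which processor owns global index `index`?"""
    size = -(-n // nprocs)
    return index // size

blocks = block_distribute(list(range(10)), 4)
print(blocks)            # four blocks: [0..2], [3..5], [6..8], [9]
print(owner(7, 10, 4))   # processor 2 owns global index 7
```

The point of the virtual-shared-memory programming models discussed above is exactly that the `owner` computation, and any resulting data movement, is hidden from the user.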
A very large majority of the systems appearing in the TOP500 are either of the
shared memory MIMD class (large vector processors) or of the distributed
memory MIMD class with a large number of processors. The reason is that the
individual processors of the latter class are mostly off-the-shelf RISC
processors, which have a lower single-node performance than the vector
processors in the former class. Still, by their sheer number they may perform at the same
level as the large vector processors on certain applications. This is certainly
true for the LINPACK benchmark used in this report.
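For reference, the LINPACK benchmark times the solution of a dense linear system Ax = b. A minimal sketch of that computation is given below (plain Gaussian elimination with partial pivoting in Python; the benchmark itself uses highly tuned, vectorised or parallel library routines, not code like this):

```python
# Illustrative sketch of the LINPACK computation: solve the dense system
# Ax = b by Gaussian elimination with partial pivoting, then back substitution.
def solve(A, b):
    n = len(A)
    A = [row[:] for row in A]       # work on copies, leave inputs intact
    b = b[:]
    for k in range(n):
        # partial pivoting: bring the largest remaining pivot into row k
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        # eliminate column k from all rows below the pivot row
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    # back substitution on the resulting upper-triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# solves 2x + y = 3, x + 3y = 5, giving x ~ 0.8, y ~ 1.4
print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

The floating-point operation count of this elimination (roughly 2n^3/3 for an n x n system) is what the benchmark converts into the performance figures reported here.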
We will not discuss in full the various machine properties for all relevant
systems (such an overview can for instance be found in [4]), but
we will give a short characterisation of each machine of interest that might
help in understanding how the performance of these systems came about.
top500@rz.uni-mannheim.de
Tue Nov 14 15:39:09 PST 1995