Before going on to the descriptions of the machines themselves, it is
important to consider some mechanisms that are, or have been, used to
increase performance. The hardware structure or
architecture determines to a large extent what is and is not possible
in speeding up a computer system beyond the
performance of a single CPU. Another important factor, to be
considered in combination with the hardware, is the capability of
compilers to generate efficient code for the given
hardware platform. In many cases it is hard to distinguish between
hardware and software influences, and one has to be careful when
interpreting results and ascribing certain effects to hardware
peculiarities, software peculiarities, or both. In this chapter we will place most
emphasis on the hardware architecture. For a description of machines
that can be classified as "high-performance" the reader is
referred to [20] and, for more recently
available systems, to [19].
For many years the taxonomy of Flynn
[6] has proven useful for the
classification of high-performance computers. This classification is
based on the way instruction streams and data streams are manipulated,
and comprises four main architectural classes. We will first briefly sketch
these classes and afterwards fill in some details when each of the
classes is described.
- SISD machines: These are the conventional systems that
contain one CPU and hence can accommodate one instruction stream that
is executed serially. Nowadays many large mainframes may have more than
one CPU, but each of these executes instruction streams that are
unrelated. Therefore, such systems still should be regarded as (a
collection of) SISD machines acting on different data spaces. Examples of
SISD machines are most workstations, like those of DEC,
Hewlett-Packard, and Sun Microsystems. The definition of SISD machines
is given here for completeness' sake; we will not discuss this type of
machine in this report.
- SIMD machines: Such systems often have a large number of
processing units, ranging from 1,024 to 16,384, that all may execute the same
instruction on different data in lock-step. So, a single instruction
manipulates many data items in parallel. Examples of SIMD machines in this
class are the CPP DAP Gamma II and the MasPar MP-2. The first code
sketch after this list illustrates this data-parallel style of processing.
- Another subclass of the SIMD systems are the vector processors.
Vector processors act on arrays of similar data rather than on single data
items, using specially structured CPUs. When data can be manipulated by these
vector units, results can be delivered at a rate of one, two, or, in special
cases, three per clock cycle (a clock cycle being defined as the basic
internal unit of time for the system). So, vector processors execute on their
data in an almost parallel way, but only when in vector mode; in that
case they are several times faster than when executing in conventional scalar
mode. For practical purposes vector processors are therefore mostly regarded as
SIMD machines. Examples of such systems are the Convex C4610 and
the Hitachi S3600.
- MISD machines: Theoretically, in this type of machine multiple
instructions should act on a single stream of data. As yet no
practical machine in this class has been constructed, nor are such systems
easy to conceive. We will disregard them in the following discussions.
- MIMD machines: These machines execute several instruction
streams in parallel on different data. The difference with the
multi-processor SISD machines mentioned above lies in the fact that the
instructions and data are related because they represent different
parts of the same task to be executed. So, MIMD systems may run many
sub-tasks in parallel in order to shorten the time-to-solution for the
main task to be executed. There is a large variety of MIMD systems, and
especially in this class the Flynn taxonomy proves not to be fully
adequate for the classification of systems. Systems that behave very
differently, like a four-processor Cray Y-MP T94 and a thousand-processor
nCUBE 2S, both fall into this class. In the following we will therefore
make another important distinction between classes of systems and treat
them accordingly.
- Shared memory systems: Shared memory systems have multiple CPUs all
of which share the same address space. This means that the knowledge of
where data is stored is of no concern to the user as there is only one
memory accessed by all CPUs on an equal basis. Shared memory systems can be
either SIMD or MIMD. Single-CPU vector processors can be regarded as an
example of the former, while the multi-CPU models of these machines
are examples of the latter. We will sometimes use the abbreviations SM-SIMD and
SM-MIMD for the two subclasses; the second code sketch after this list
illustrates the shared-memory programming view.
- Distributed memory systems: In this case each CPU has its
own associated memory. The CPUs are connected by some network and may
exchange data between their respective memories when required. In
contrast to shared memory machines the user must be aware of the
location of the data in the local memories and will have to move or
distribute these data explicitly when needed. Again, distributed memory
systems may be either SIMD or MIMD. The first class of SIMD systems
mentioned above, those that operate in lock-step, all have distributed
memories associated with the processors. For the distributed memory MIMD
systems a further subdivision is possible: those in which the processors are
connected in a fixed topology, and those in which the topology is
flexible and may vary from task to task. For the distributed memory
systems we will sometimes use DM-SIMD and DM-MIMD to indicate the two
subclasses; the message-passing sketch near the end of this section
illustrates the explicit data movement that is characteristic of them.
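To give a flavour of the SIMD/vector style of processing referred to
above, the fragment below shows a simple data-parallel operation in C.
This is only an illustrative sketch: the array names and the length N are
assumptions, not taken from any particular machine. The point is that the
second loop expresses one and the same operation on all element pairs,
which a SIMD array executes in lock-step and a vector processor, after
vectorisation by the compiler, delivers at (ideally) one result per clock
cycle.

    #include <stdio.h>

    #define N 1024                  /* illustrative vector length */

    int main(void)
    {
        float a[N], b[N], c[N];
        int   i;

        for (i = 0; i < N; i++) {   /* set up the operands */
            b[i] = (float) i;
            c[i] = (float) (N - i);
        }

        /* One data-parallel operation: the same add is applied to
           all N element pairs, instead of N unrelated scalar
           instructions.  On a SIMD array every processing element
           handles one pair; on a vector processor the loop maps
           onto vector instructions.                               */
        for (i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        printf("a[0] = %g, a[%d] = %g\n", a[0], N - 1, a[N - 1]);
        return 0;
    }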
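The shared-memory programming view of the SM-MIMD class can be sketched
in the same hedged way. The fragment below uses POSIX threads; the array,
the partitioning, and the thread count are again only illustrative
assumptions. The essential point is that every thread addresses the same
array directly, so no explicit data movement is needed.

    #include <pthread.h>
    #include <stdio.h>

    #define N        1024
    #define NTHREADS 4

    static double data[N];          /* one array, visible to every thread */
    static double partial[NTHREADS];

    /* Each thread sums its own slice of the same shared array: because
       the address space is shared, no data has to be moved explicitly. */
    static void *slice_sum(void *arg)
    {
        int    t     = *(int *) arg;
        int    chunk = N / NTHREADS;
        double s     = 0.0;
        int    i;

        for (i = t * chunk; i < (t + 1) * chunk; i++)
            s += data[i];
        partial[t] = s;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        int       id[NTHREADS];
        double    sum = 0.0;
        int       i, t;

        for (i = 0; i < N; i++)
            data[i] = 1.0;

        for (t = 0; t < NTHREADS; t++) {
            id[t] = t;
            pthread_create(&tid[t], NULL, slice_sum, &id[t]);
        }
        for (t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            sum += partial[t];
        }
        printf("sum = %g\n", sum);  /* 1024.0 expected */
        return 0;
    }

On a distributed memory machine the same computation requires the partial
sums to be collected from the local memories by explicit messages; the
MPI sketch near the end of this section shows that style.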
Although the difference between shared- and distributed memory machines
seems clear-cut, this is not always entirely the case from the user's point
of view. For instance, the late Kendall Square Research systems
employed the idea of "virtual shared memory" at the hardware level.
Virtual shared memory can also be simulated at the programming level:
a specification of High Performance Fortran (HPF) was published in 1993
[11]
which, by means of compiler directives, distributes the data over the
available processors. A system on which HPF is implemented will
therefore look like a shared memory machine to the user. Other
vendors of Massively Parallel Processing systems (the buzz-word "MPP
systems" is fashionable here), like Convex and Cray, also support
proprietary virtual shared-memory programming models, which means that
these physically distributed memory systems, by virtue of the
programming model, will logically behave as shared memory systems. In
addition, packages like TreadMarks
([1]) provide a virtual shared memory
environment for networks of workstations.
Another trend that has come up in the last few years is distributed
processing. This takes the DM-MIMD concept one step further:
instead of many integrated processors in one or several boxes,
workstations, mainframes, etc., are connected by Ethernet, FDDI, or
otherwise and set to work concurrently on tasks in the same program.
Conceptually, this is not different from DM-MIMD computing, but the
communication between processors is often orders of magnitude slower.
Many packages to realise distributed computing are available. Examples
of these are PVM (standing for Parallel Virtual
Machine) [7], and MPI
(Message Passing Interface,
[8,15]). PVM and MPI have been adopted
by, for instance, HP/Convex, SGI/Cray, IBM, and Intel for the transition
stage between distributed computing and MPP on clusters of their
favorite processors. They are available on a large number of
distributed memory MIMD systems, and even on shared memory MIMD systems
for compatibility reasons; a minimal message-passing example is sketched
below. In addition, there is a tendency to cluster
shared memory systems, for instance by HIPPI channels, to obtain
systems with a very high computational power. E.g., the Intel Paragon
with the MP (Multi Processor) nodes, the NEC SX-4, and
the Convex Exemplar SPP-2000X have this structure. Moreover, the
latter system has a software environment that allows for virtual shared
memory addressing.
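As a hedged illustration of the message-passing style that PVM and MPI
support, the sketch below computes a global sum with MPI in C. The MPI
calls themselves (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Reduce,
MPI_Finalize) are part of the standard interface; the local array and its
contents are merely illustrative assumptions.

    #include <stdio.h>
    #include <mpi.h>

    #define NLOCAL 256              /* illustrative local array size */

    /* Each process owns its local data; results are combined by
       explicit communication, here a single MPI_Reduce to rank 0. */
    int main(int argc, char **argv)
    {
        int    rank, nprocs, i;
        double local[NLOCAL], lsum = 0.0, gsum = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        for (i = 0; i < NLOCAL; i++) /* data lives in local memory only */
            local[i] = 1.0;
        for (i = 0; i < NLOCAL; i++)
            lsum += local[i];

        /* The partial sums are moved between the memories explicitly: */
        MPI_Reduce(&lsum, &gsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum over %d processes: %g\n", nprocs, gsum);

        MPI_Finalize();
        return 0;
    }

Such a program would typically be compiled with a wrapper like mpicc and
started on several processors with mpirun; the same source runs unchanged
on a distributed memory MPP and, for compatibility, on shared memory
systems.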