
The HP/Convex Exemplar SPP-1200.

Machine type:          RISC-based distributed-memory multi-processor
Models:                SPP-1200
Operating system:      SPP-UX, based on OSF/1 AD microkernel
Connection structure:  Ring
Compilers:             Fortran, C
Vendors information:   Web page

System parameters:

Model                          SPP-1200
Clock cycle                    8.3 ns
Theor. peak performance
  Per proc. (64-bit)           240 Mflop/s
  Maximal (64-bit)             30.7 Gflop/s
Main memory                    <=32 GB
Memory/node                    <=256 MB
Communication bandwidth
  aggregate (see remarks)      16 GB/s, 4 GB/s
No. of processors              4-128


The SPP-1200 is the second generation in the Exemplar SPP series. In almost every respect the system is identical to its predecessor, the SPP-1000, except for the clock cycle (8.3 instead of 10 ns) and the use of PA/RISC 7200 instead of 7100 processors. Because of the prefetch and poststore capabilities of the 7200 processors, the number of floating-point operations per cycle should be somewhat higher than in the 7100 processor, which increases the floating-point performance beyond the gain that results from the shorter clock cycle alone. Up to 8 HP PA/RISC 7200 processors can be placed in what Convex calls a hypernode. A maximal system consists of 16 hypernodes, i.e., 128 processors.
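The peak figures in the system parameters can be checked with a few lines of arithmetic. The sketch below is only an illustration: the value of 2 floating-point operations per cycle is an assumption inferred from the table (240 Mflop/s at a clock cycle of 8.3 ns), and the function names are hypothetical.

```python
# Reproduce the theoretical peak figures from the system parameters.
CLOCK_CYCLE_NS = 8.3        # from the table
FLOPS_PER_CYCLE = 2         # assumption: inferred from 240 Mflop/s at ~120 MHz
PROCS_PER_HYPERNODE = 8     # from the text
MAX_HYPERNODES = 16         # from the text


def peak_mflops_per_proc(cycle_ns=CLOCK_CYCLE_NS, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak rate of one processor in Mflop/s."""
    clock_mhz = 1000.0 / cycle_ns          # 1 / (8.3 ns) is roughly 120 MHz
    return clock_mhz * flops_per_cycle


def peak_gflops_system(n_procs, per_proc_mflops=240.0):
    """Peak rate of n_procs processors in Gflop/s, using the quoted 240 Mflop/s."""
    return n_procs * per_proc_mflops / 1000.0


max_procs = PROCS_PER_HYPERNODE * MAX_HYPERNODES
print("processors in a maximal system:", max_procs)
print("per-processor peak (Mflop/s):", peak_mflops_per_proc())
print("system peak (Gflop/s):", peak_gflops_system(max_procs))
```

Because 8.3 ns is a rounded value, the computed per-processor rate comes out slightly above the quoted 240 Mflop/s. The system peak of 128 processors at 240 Mflop/s each is 30.72 Gflop/s, which the table rounds to 30.7 Gflop/s.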

Within each hypernode up to 2 GB of memory can be accommodated, which the local processors reach via a crossbar with an aggregate bandwidth of 16 GB/s. The hypernodes in turn are connected to each other by a ring interconnect with an aggregate bandwidth of 4 GB/s. The system concept is therefore somewhat hybrid: within a hypernode the machine is effectively a shared-memory system, while between hypernodes it is a distributed-memory system. Each hypernode supports local I/O, while external global I/O can be done at an aggregate rate of 4 GB/s.
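The memory and bandwidth figures above can be related to the entries in the system parameters with a short calculation. This is a rough sketch: dividing the aggregate bandwidths evenly over processors or hypernodes is an assumption made only for illustration, and the name `derived_figures` is hypothetical.

```python
# Capacity and bandwidth figures quoted in the text and the table.
HYPERNODES_MAX = 16
PROCS_PER_HYPERNODE = 8
MEM_PER_HYPERNODE_GB = 2.0
BW_INTRA_GBS = 16.0   # aggregate bandwidth within a hypernode
BW_INTER_GBS = 4.0    # aggregate bandwidth between hypernodes


def derived_figures():
    """Derive totals and naive even shares from the quoted figures."""
    return {
        "total_memory_gb": HYPERNODES_MAX * MEM_PER_HYPERNODE_GB,
        "memory_per_proc_mb": MEM_PER_HYPERNODE_GB * 1024 / PROCS_PER_HYPERNODE,
        # Assumption: aggregate bandwidth shared evenly (illustrative only).
        "intra_bw_per_proc_gbs": BW_INTRA_GBS / PROCS_PER_HYPERNODE,
        "inter_bw_per_hypernode_gbs": BW_INTER_GBS / HYPERNODES_MAX,
    }


for name, value in derived_figures().items():
    print(f"{name}: {value}")
```

The total of 32 GB matches the "Main memory" entry. The 256 MB in the "Memory/node" entry coincides with 2 GB divided over the 8 processors of a hypernode, which suggests that the entry is a per-processor figure, although the source does not say so explicitly.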

The Exemplar programming environment complements the SPP-1200 on the software side. This environment includes a message-passing programming model (PVM) and a virtual shared-memory model which allows the user to have a shared-memory view of the system. The underlying communication is hidden from the user, which enables the execution of standard Fortran 77, C, or C++ programs. The efficiency of this mode of operation is determined by the extent to which the original code is parallelisable. In many cases the efficiency might be improved by using another (possibly message-passing) implementation. The application compiler included in the Exemplar environment may help in parallelising the original program and in generating the necessary parallel code.
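The dependence of efficiency on the parallelisable part of a code can be made concrete with Amdahl's law. The source does not mention this model; it is used here only as a common first approximation, and the parallel fractions below are made-up illustrative values, not measurements.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Ideal speedup when only `parallel_fraction` of the run time is parallelisable."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


# Hypothetical parallel fractions on one hypernode (8 processors).
for fraction in (0.50, 0.90, 0.99):
    s = amdahl_speedup(fraction, 8)
    print(f"parallel fraction {fraction:.2f}: speedup {s:.2f}, efficiency {s / 8:.2f}")
```

Under this model even a code that is 90% parallelisable reaches a speedup of less than 5 on 8 processors, which is why restructuring the code can pay off.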

Measured Performances: First results for the solution of a linear system of order N = 1000 are 123, 213, 383, and 656 Mflop/s for 1, 2, 4, and 8 processors (within one hypernode), respectively. In [linpackbm] a speed of 3.72 Gflop/s is also reported for the solution of a dense system of order 25,100.
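The measured rates for N = 1000 can be turned into speedup, parallel efficiency, and fraction of theoretical peak. The sketch below uses only the numbers quoted above and the 240 Mflop/s per-processor peak; the name `scaling_table` is hypothetical. The 3.72 Gflop/s result is left out because the number of processors used is not stated here.

```python
# Measured Mflop/s for a linear system of order N = 1000, keyed by processor count.
MEASURED_MFLOPS = {1: 123.0, 2: 213.0, 4: 383.0, 8: 656.0}
PEAK_PER_PROC_MFLOPS = 240.0   # theoretical peak per processor, from the table


def scaling_table(measured=MEASURED_MFLOPS, peak=PEAK_PER_PROC_MFLOPS):
    """Return (procs, rate, speedup, efficiency, fraction_of_peak) tuples."""
    base = measured[1]
    rows = []
    for p in sorted(measured):
        rate = measured[p]
        speedup = rate / base                 # relative to the 1-processor rate
        rows.append((p, rate, speedup, speedup / p, rate / (p * peak)))
    return rows


for p, rate, s, e, f in scaling_table():
    print(f"{p:2d} procs: {rate:6.1f} Mflop/s  speedup {s:4.2f}  "
          f"efficiency {e:4.2f}  fraction of peak {f:4.2f}")
```

On 8 processors the speedup is about 5.3, a parallel efficiency of roughly 67%, and the measured rate is about 34% of the theoretical peak of one hypernode.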


Jack Dongarra
Sat Feb 10 15:12:38 EST 1996