Machine type | RISC-based distributed-memory multi-processor |
---|---|
Models | SPP-1200 |
Operating system | SPP-UX, based on OSF/1 AD microkernel |
Connection structure | Ring |
Compilers | Fortran, C |
Vendor's information Web page | http://www.convex.com/prod_serv/exemplar/exemplar.html |
System parameters:
Model | SPP-1200 |
---|---|
Clock cycle | 8.3 ns |
Theor. peak performance | |
Per proc. (64-bit) | 240 Mflop/s |
Maximal (64-bit) | 30.7 Gflop/s |
Main memory | <=32 GB |
Memory/node | <=256 MB |
Communication bandwidth | |
aggregate (see remarks) | 16 GB/s within a hypernode, 4 GB/s between hypernodes |
No. of processors | 4-128 |
Remarks:
The SPP-1200 is the second generation in the Exemplar SPP series. In almost every respect the system is identical to its predecessor, the SPP-1000, except for the clock cycle (8.3 instead of 10 ns) and the use of PA/RISC 7200 instead of 7100 processors. Because of the prefetch and poststore capabilities of the 7200 processors, the number of floating-point operations per cycle should be somewhat higher than in the 7100 processor, thus increasing the floating-point performance beyond the gain due to the shorter clock cycle alone. Up to 8 HP PA/RISC 7200 processors can be placed in what Convex calls a hypernode. A maximal system consists of 16 hypernodes, i.e., 128 processors.
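The peak figures in the table follow from simple arithmetic, sketched below in Python. The constants are taken from the table and the text above; the function names are illustrative only, and the conclusion that each processor delivers roughly two floating-point results per cycle is inferred from those numbers rather than stated by the source.

```python
# Figures taken from the system parameters table and the remarks.
CLOCK_CYCLE_NS = 8.3           # clock cycle of the SPP-1200
PEAK_PER_PROC_MFLOPS = 240.0   # theoretical peak per processor (64-bit)
PROCS_PER_HYPERNODE = 8        # maximum processors per hypernode
MAX_HYPERNODES = 16            # maximum hypernodes per system


def clock_rate_mhz(cycle_ns: float) -> float:
    """Convert a clock cycle in nanoseconds to a clock rate in MHz."""
    return 1000.0 / cycle_ns


def flops_per_cycle(peak_mflops: float, cycle_ns: float) -> float:
    """Floating-point results per cycle implied by a peak rate (an inference)."""
    return peak_mflops / clock_rate_mhz(cycle_ns)


def system_peak_gflops(peak_mflops: float, procs: int, hypernodes: int) -> float:
    """Theoretical peak of a full configuration in Gflop/s."""
    return peak_mflops * procs * hypernodes / 1000.0


print(f"clock rate : {clock_rate_mhz(CLOCK_CYCLE_NS):.1f} MHz")
print(f"flops/cycle: {flops_per_cycle(PEAK_PER_PROC_MFLOPS, CLOCK_CYCLE_NS):.2f}")
print(f"system peak: "
      f"{system_peak_gflops(PEAK_PER_PROC_MFLOPS, PROCS_PER_HYPERNODE, MAX_HYPERNODES):.2f} Gflop/s")
```

With 128 processors at 240 Mflop/s each, the last line reproduces the 30.7 Gflop/s maximum quoted in the table (30.72 before rounding).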
Each hypernode can accommodate up to 2 GB of memory, which the local processors reach via a crossbar with an aggregate bandwidth of 16 GB/s. The hypernodes in turn are connected to each other by a crossbar with an aggregate bandwidth of 4 GB/s. The system concept is therefore somewhat hybrid: within a hypernode the machine is effectively a shared-memory system, while between hypernodes it is a distributed-memory system. Each hypernode supports local I/O, while external global I/O can be done at an aggregate rate of 4 GB/s.
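The hybrid organisation can be pictured with a small Python model. This is only a sketch: the consecutive numbering of processors within hypernodes and all function names are assumptions made for illustration, not something the source specifies. The sizes and bandwidths are the ones quoted in the text.

```python
PROCS_PER_HYPERNODE = 8     # from the text
MAX_HYPERNODES = 16         # from the text
MEM_PER_HYPERNODE_GB = 2    # from the text
INTRA_BW_GBS = 16           # aggregate bandwidth within a hypernode
INTER_BW_GBS = 4            # aggregate bandwidth between hypernodes


def hypernode_of(proc: int) -> int:
    """Hypernode index of a processor.

    ASSUMPTION: processors 0..7 sit in hypernode 0, 8..15 in hypernode 1, etc.
    """
    if not 0 <= proc < PROCS_PER_HYPERNODE * MAX_HYPERNODES:
        raise ValueError(f"processor index out of range: {proc}")
    return proc // PROCS_PER_HYPERNODE


def access_path(p: int, q: int) -> str:
    """Describe how processor p reaches memory local to processor q."""
    if hypernode_of(p) == hypernode_of(q):
        return f"intra-hypernode, shared memory ({INTRA_BW_GBS} GB/s aggregate)"
    return f"inter-hypernode, distributed memory ({INTER_BW_GBS} GB/s aggregate)"


def max_memory_gb(hypernodes: int = MAX_HYPERNODES) -> int:
    """Total main memory of a configuration with the given number of hypernodes."""
    return hypernodes * MEM_PER_HYPERNODE_GB


print(access_path(0, 7))
print(access_path(7, 8))
print(max_memory_gb(), "GB in a maximal system")
```

Sixteen hypernodes of 2 GB each give the 32 GB main-memory limit listed in the table.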
The Exemplar programming environment complements the SPP-1200 on the software side. It includes a message-passing programming model (PVM) and a virtual shared-memory model that gives the user a shared-memory view of the system. The underlying communication is hidden from the user, thus enabling the execution of standard Fortran 77, C, or C++ programs. The efficiency of this mode of operation is determined by the extent to which the original code is parallelisable. In many cases it might be improved by using another (possibly message-passing) implementation. The application compiler included in the Exemplar environment may help in parallelising the original program and in generating the necessary parallel code.
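How strongly the parallelisable share of a code limits the attainable gain can be illustrated with Amdahl's law. The source does not name this model; it is brought in here only as a rough sketch, and the parallel fractions used below are made-up illustrative values, not measurements of any Exemplar program.

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the run time that can be parallelised (0..1).
    n_procs: number of processors used.
    Communication costs are ignored, so real speedups will be lower.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


# Illustrative fractions only; 8 = one hypernode, 128 = a maximal system.
for fraction in (0.50, 0.90, 0.99):
    speedups = [round(amdahl_speedup(fraction, p), 2) for p in (8, 128)]
    print(f"parallel fraction {fraction:.2f}: speedup on 8 and 128 procs = {speedups}")
```

Under this model even a small serial remainder caps the benefit of adding hypernodes, which is one reason a restructured implementation may pay off.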
Measured Performances:
First results for the solution of a linear system of order N = 1000 are 123, 213, 383, and 656 Mflop/s for 1, 2, 4, and 8 processors (within one hypernode), respectively. In [#linpackbm#
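The measured rates quoted above can be turned into speedup, parallel efficiency, and fraction of theoretical peak. The following Python sketch uses only the four measurements and the 240 Mflop/s per-processor peak from the table; the function and variable names are illustrative.

```python
# Measured Mflop/s for N = 1000, keyed by processor count (from the text).
MEASURED_MFLOPS = {1: 123.0, 2: 213.0, 4: 383.0, 8: 656.0}
PEAK_PER_PROC_MFLOPS = 240.0  # theoretical peak per processor (from the table)


def scaling_table(measured: dict, peak_per_proc: float) -> list:
    """Return (procs, speedup, efficiency, fraction_of_peak) for each measurement."""
    base = measured[1]  # single-processor rate is the reference
    rows = []
    for procs in sorted(measured):
        rate = measured[procs]
        speedup = rate / base
        efficiency = speedup / procs
        peak_fraction = rate / (procs * peak_per_proc)
        rows.append((procs, speedup, efficiency, peak_fraction))
    return rows


print("procs  speedup  efficiency  fraction of peak")
for procs, speedup, efficiency, peak_fraction in scaling_table(
        MEASURED_MFLOPS, PEAK_PER_PROC_MFLOPS):
    print(f"{procs:5d}  {speedup:7.2f}  {efficiency:10.2f}  {peak_fraction:16.2f}")
```

On these figures a single processor reaches about half of its theoretical peak, and eight processors give a speedup of roughly 5.3, i.e. a parallel efficiency of about two thirds within one hypernode.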
Jack Dongarra
Sat Feb 10 15:12:38 EST 1996