Machine type | RISC-based distributed-memory multi-processor |
---|---|
Models | SPP-2000 |
Operating system | SPP-UX, based on OSF/1 AD microkernel |
Connection structure | Ring |
Compilers | Fortran, C |
Vendor's information Web page | http://www.hp.com/go/techservers |
System parameters:
Model | SPP-2000K | SPP-2000S | SPP-2000X |
---|---|---|---|
Clock cycle | 5.55 ns | 5.55 ns | 5.55 ns |
Theor. peak performance | | | |
Per proc. (64-bit) | 720 Mflop/s | 720 Mflop/s | 720 Mflop/s |
Maximal (64-bit) | 2.9 Gflop/s | 11.5 Gflop/s | 46.1 Gflop/s |
Memory/node | <=1 GB | <=1 GB | <=1 GB |
Main memory | <=4 GB | <=16 GB | <=64 GB |
Communication bandwidth | | | |
aggregate (see remarks) | 3.84 GB/s | 15.4 GB/s | 15.4/3.84 GB/s |
No. of processors | 1-4 | 4-16 | 16-64 |
Remarks:
The SPP-2000 systems are the successors of the SPP-1200/1600 family, and they differ significantly from that preceding generation. The SPP-2000K and S are shared-memory machines that connect up to 4 and 16 PA-RISC 8000 processors, respectively, by a crossbar. Each processor has a peak performance of 720 Mflop/s, and because the processors feature out-of-order execution of instructions, memory latency effects can be expected to be hidden or reduced in many cases. This should make the impact of cache misses much less severe. The data and instruction caches are large (1 MB each), which also helps to minimise cache misses.
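The peak figures in the system parameters table can be cross-checked with a little arithmetic. The Python sketch below is an illustration only: the function names are hypothetical, and the figure of about 4 floating-point results per cycle is derived here from the quoted clock cycle and per-processor peak rather than stated in the text.

```python
# Values taken from the system parameters table.
CLOCK_CYCLE_NS = 5.55
PEAK_PER_PROC_MFLOPS = 720.0
MAX_PROCESSORS = {"SPP-2000K": 4, "SPP-2000S": 16, "SPP-2000X": 64}


def clock_mhz(cycle_ns: float) -> float:
    """Convert a clock cycle time in nanoseconds to a frequency in MHz."""
    return 1000.0 / cycle_ns


def flops_per_cycle(peak_mflops: float, cycle_ns: float) -> float:
    """Floating-point results per cycle implied by the quoted peak (derived)."""
    return peak_mflops / clock_mhz(cycle_ns)


def peak_gflops(n_procs: int, peak_mflops: float = PEAK_PER_PROC_MFLOPS) -> float:
    """Theoretical peak of n_procs processors, in Gflop/s."""
    return n_procs * peak_mflops / 1000.0


print(f"clock: {clock_mhz(CLOCK_CYCLE_NS):.1f} MHz")
print(f"results per cycle: {flops_per_cycle(PEAK_PER_PROC_MFLOPS, CLOCK_CYCLE_NS):.2f}")
for model, n_procs in MAX_PROCESSORS.items():
    print(f"{model}: {peak_gflops(n_procs):.2f} Gflop/s")
```

Under these inputs the maximal configurations come out at 2.88, 11.52 and 46.08 Gflop/s for 4, 16 and 64 processors.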
One SPP-2000S can be viewed as the successor of a hypernode in the earlier SPP-1200/SPP-1600 systems. As such the number of processors within a hypernode has doubled. Also the amount of memory per system has increased 8-fold from 8×256 MB to 16×1 GB. The internal aggregate bandwidth is 15.36 GB/s for the 2000S and 3.84 GB/s for the 2000K. I/O can be done at an aggregate rate of 960 MB/s.
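The memory and bandwidth ratios quoted above can be verified in a few lines. This is a sketch under one assumption that the text does not state: 1 GB is taken as 1024 MB, which is what makes the growth factor exactly 8. The function names are hypothetical.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def memory_growth(old_units: int, old_mb: int, new_units: int, new_gb: int) -> float:
    """Ratio of new to old total memory, both expressed in MB."""
    return (new_units * new_gb * MB_PER_GB) / (old_units * old_mb)


def bandwidth_per_processor(aggregate_gb_s: float, n_procs: int) -> float:
    """Aggregate internal bandwidth divided evenly over the processors."""
    return aggregate_gb_s / n_procs


# 8 x 256 MB in the earlier hypernode versus 16 x 1 GB in the SPP-2000S.
print(memory_growth(8, 256, 16, 1))
# Per-processor share of the internal bandwidth for the 2000S and 2000K.
print(bandwidth_per_processor(15.36, 16), bandwidth_per_processor(3.84, 4))
```

Both models work out to the same per-processor share of about 0.96 GB/s, so the aggregate bandwidth scales with the processor count.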
As in the earlier SPP-1200/1600 systems, the hypernodes are connected by uni-directional SCI rings with an aggregate bandwidth of 3.84 GB/s. This makes the SPP-2000X a NUMA machine when it is operated in a shared-memory fashion.
The Exemplar programming environment that was available for the SPP-1200/1600 carries over to the SPP-2000K/S/X without changes. This environment includes a message passing programming model (PVM) and a virtual shared memory model which allows the user to have a shared-memory view of the system. Of course the shared memory model is not surprising for a symmetric multiprocessor machine like the SPP-2000S, but it is still valid in the SPP-2000X systems, which effectively cluster four SPP-2000S systems.
Measured Performances: In [4] a speed of 7.8 Gflop/s is reported for a 16-processor system when solving a dense linear system of order 13,320. For the EuroBen mod2a matrix-vector multiplication benchmark a speed of 417 Mflop/s is found on 16 processors. This is, however, for straight Fortran 77 code with PVM and without the use of library routines.
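To put the measured figures in perspective, they can be related to the theoretical peak of a 16-processor system. The following is a minimal sketch, assuming the 720 Mflop/s per-processor peak from the system parameters table; the helper name `fraction_of_peak` is hypothetical.

```python
PEAK_PER_PROC_MFLOPS = 720.0  # per-processor peak from the parameters table


def fraction_of_peak(measured_mflops: float, n_procs: int,
                     peak_per_proc_mflops: float = PEAK_PER_PROC_MFLOPS) -> float:
    """Measured speed as a fraction of the theoretical peak of n_procs processors."""
    return measured_mflops / (n_procs * peak_per_proc_mflops)


# Dense linear solve: 7.8 Gflop/s on 16 processors.
print(f"linear solve: {fraction_of_peak(7800.0, 16):.1%}")
# EuroBen mod2a matrix-vector multiplication: 417 Mflop/s on 16 processors.
print(f"mod2a:        {fraction_of_peak(417.0, 16):.1%}")
```

With these inputs the dense linear solve reaches roughly two thirds of peak, while the mod2a result stays below 4% of peak.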