The Hitachi SR2201 series.

Machine type	RISC-based distributed memory multi-processor
Models	SR2201
Operating system	HI-UX/MPP (Micro kernel Mach 3.0)
Connection structure	Hyper crossbar
Compilers	Fortran 77, Fortran 90, Parallel Fortran, HPF, C, C++

System parameters:

Model	SR2201
Clock cycle	6.7 ns
Theor. peak performance
Per proc. (64-bit)	300 Mflop/s
Maximal (64-bit)	307 Gflop/s
Main memory	<=256 GB
Memory/node	<=256 MB
Communication bandwidth	300 MB/s
No. of processors	32-1024
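
The peak figures in the table follow from the clock cycle and the processor count. A minimal Python sketch of that arithmetic is given here. It assumes two floating-point results per cycle per processor, which is an assumption made only to reproduce the 300 Mflop/s figure and is not stated in the table. The function names are illustrative.

```python
# Reproduce the theoretical peak figures from the system parameters.
CLOCK_CYCLE_NS = 6.7        # from the table
FLOPS_PER_CYCLE = 2         # ASSUMPTION: not stated in the table
MAX_PROCESSORS = 1024       # from the table
MEMORY_PER_NODE_MB = 256    # from the table


def peak_per_processor_mflops(cycle_ns=CLOCK_CYCLE_NS,
                              flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak rate of one processor in Mflop/s."""
    clock_mhz = 1000.0 / cycle_ns   # a 6.7 ns cycle is about 149 MHz
    return clock_mhz * flops_per_cycle


def peak_system_gflops(n_proc=MAX_PROCESSORS, per_proc_mflops=300.0):
    """Aggregate peak in Gflop/s, using the rounded per-processor figure."""
    return n_proc * per_proc_mflops / 1000.0


def max_memory_gb(n_proc=MAX_PROCESSORS, mem_mb=MEMORY_PER_NODE_MB):
    """Total main memory in GB when every node has the maximum memory."""
    return n_proc * mem_mb / 1024.0
```

Under this assumption the per-processor rate comes out just under 300 Mflop/s, which the table rounds to 300 Mflop/s; 1024 processors at 300 Mflop/s give 307.2 Gflop/s, and 1024 nodes at 256 MB give 256 GB.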

Remarks:

The SR2201 is Hitachi's second generation of distributed-memory parallel systems. The basic node processor is again a Hitachi implementation of HP's PA-RISC architecture, running at a clock cycle of 6.7 ns. However, in contrast with its predecessor, the SR2001, the node processors of the SR2201 are somewhat modified to allow for "pseudo vector processing" (in both hardware and instructions). This means that for operations on long vectors one need not worry about the detrimental effects of cache misses, which often ruin the performance of RISC processors unless code is carefully blocked and unrolled. First experiments have shown that this idea seems to work quite well. The system supports distributed I/O, with the possibility of connecting disks to every node.
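
To make "carefully blocked" concrete, the following sketch shows the kind of manual loop restructuring that cache-based RISC processors usually require and that pseudo vector processing is meant to make less necessary. It is not Hitachi code. The function names and the block size are illustrative assumptions, and Python is used only for readability, so the cache effect itself will not be visible here.

```python
def matmul_naive(a, b):
    """Straightforward triple loop: C = A * B for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile.

    The three outer loops walk over tiles of size `block`; the three
    inner loops work inside one tile, so the data touched by the inner
    loops stays small enough to remain in cache on a real machine.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + block, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

Both functions compute the same product; the blocked version only changes the order in which the elements are visited.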

As in the earlier SR2001, the connection structure is a hyper (3-D) crossbar that connects all nodes directly at high speed (300 MB/s point-to-point). In February 1996 two 1024-node systems will be installed, at the Universities of Tokyo and Tsukuba respectively.
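
One way to picture a 3-D hyper crossbar is sketched below. It rests on an assumption that the text does not spell out: nodes are placed on an (x, y, z) grid, each node is attached to one crossbar per dimension, and a message crosses one crossbar for every coordinate in which source and destination differ. The grid dimensions and function names are hypothetical.

```python
def node_coordinates(node_id, dims):
    """Map a linear node number to (x, y, z) on a grid of size dims."""
    x_dim, y_dim, z_dim = dims
    if not 0 <= node_id < x_dim * y_dim * z_dim:
        raise ValueError("node_id outside the grid")
    x = node_id % x_dim
    y = (node_id // x_dim) % y_dim
    z = node_id // (x_dim * y_dim)
    return (x, y, z)


def crossbar_traversals(src, dst, dims):
    """Number of crossbars a message passes under the model above.

    ASSUMPTION: one crossbar per differing coordinate, so the result
    is between 0 (same node) and 3 (all coordinates differ).
    """
    a = node_coordinates(src, dims)
    b = node_coordinates(dst, dims)
    return sum(1 for p, q in zip(a, b) if p != q)
```

Under this model the number of crossbars between any two nodes never exceeds three, however many nodes the grid contains.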

As in some other systems, such as the Cray T3E (3.4.4), the Meiko CS-2 (3.4.12), and the NEC Cenju-3 (3.4.14), one can directly access the memories of remote processors. Together with the very fast hardware-based barrier synchronisation, this should allow distributed programs to be written with very low parallelisation overhead.
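
The programming pattern this enables (write into a neighbour's memory, then synchronise at a barrier) can be sketched as follows. This is only a shared-memory analogy using Python threads. It does not use any HI-UX/MPP interface, and `ring_exchange` and its variables are hypothetical names.

```python
import threading


def ring_exchange(n_nodes=4):
    """Each 'node' writes a value into its right neighbour's memory.

    The barrier guarantees that every remote write has completed
    before any node reads its own memory.
    """
    memories = [[None] for _ in range(n_nodes)]  # one cell per node
    received = [None] * n_nodes
    barrier = threading.Barrier(n_nodes)

    def node(rank):
        right = (rank + 1) % n_nodes
        memories[right][0] = rank * 10       # "remote" write
        barrier.wait()                       # wait until all writes are done
        received[rank] = memories[rank][0]   # local read

    threads = [threading.Thread(target=node, args=(r,))
               for r in range(n_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

No receive call appears on the target side: the only synchronisation is the barrier, which is why a fast hardware barrier matters for this style of program.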

The following software products will be supported in addition to those already mentioned above: PVM, MPI, PARMACS, Linda, Express, FORGE90, and PARALLELWARE. In addition, numerical libraries (MATRIX/MPP, MATRIX/MPP/SSS) will be offered. These libraries support basic linear algebra operations on dense and band matrices, Fast Fourier Transforms, and skyline solvers.

Measured Performances: Some preliminary (not yet officially certified) results for the class A NAS parallel benchmarks show that the SR2201 runs at about 1.3 Gflop/s on 16 processors for the MG benchmark and at about 700 Mflop/s for the CG benchmark, also on 16 processors ([kawabe]).
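
To put these figures in context, they can be set against the theoretical peak of 16 processors at 300 Mflop/s each. The short sketch below does that arithmetic. The function name is illustrative, and the inputs are the approximate values quoted above, so the results are rough as well.

```python
PEAK_PER_PROC_MFLOPS = 300.0   # theoretical peak per processor


def fraction_of_peak(measured_mflops, n_proc, peak=PEAK_PER_PROC_MFLOPS):
    """Measured rate as a fraction of the aggregate theoretical peak."""
    return measured_mflops / (n_proc * peak)


mg = fraction_of_peak(1300.0, 16)   # MG: about 1.3 Gflop/s on 16 processors
cg = fraction_of_peak(700.0, 16)    # CG: about 700 Mflop/s on 16 processors
```

With these inputs MG reaches roughly 27% and CG roughly 15% of the 4.8 Gflop/s peak of 16 processors.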

Jack Dongarra
Sat Feb 10 15:12:38 EST 1996