Machine type | RISC-based distributed-memory multi-processor |
---|---|
Models | Cenju-3S, Cenju-3 |
Operating system | Cenju-3 OS |
Connection structure | Multi-stage crossbar |
Compilers | Fortran 77, HPF (subset), ANSI C |
Vendor's information Web page | www.ccrl-nece.technopark.gmd.de/Cenju.html |
Year of introduction | 1994 |
System parameters:
Model | Cenju-3S | Cenju-3 |
---|---|---|
Clock cycle | 20 ns | 13.3 ns |
Theor. peak performance | ||
Per Proc. (64 bits) | 33 Mflop/s | 50 Mflop/s |
Maximal (64 bits) | 533 Mflop/s | 12.8 Gflop/s |
Main memory | ||
Memory/node | < 64 MB | < 64 MB |
Memory/maximal | < 1 GB | < 16 GB |
Communication bandwidth | 40 MB/s | 40 MB/s |
No. of processors | 8--16 | 16--256 |
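
The maximal figures in the table follow from the per-processor values and the largest configuration of each model. A minimal sketch of that arithmetic is given below. It assumes that the quoted 33 Mflop/s is the rounded form of 100/3 Mflop/s and that every node carries the full 64 MB; the names `MODELS`, `max_peak_mflops`, and `max_memory_mb` are illustrative only.

```python
# Per-processor peak and maximum processor count, taken from the table above.
# Assumption: the quoted 33 Mflop/s is the rounded form of 100/3 Mflop/s.
MODELS = {
    "Cenju-3S": {"peak_mflops": 100.0 / 3.0, "max_procs": 16},
    "Cenju-3": {"peak_mflops": 50.0, "max_procs": 256},
}

# Upper bound on memory per node from the table (assumed fully populated).
MEM_PER_NODE_MB = 64


def max_peak_mflops(model):
    """Aggregate theoretical peak: per-processor peak times processor count."""
    m = MODELS[model]
    return m["peak_mflops"] * m["max_procs"]


def max_memory_mb(model):
    """Aggregate memory: memory per node times processor count."""
    return MEM_PER_NODE_MB * MODELS[model]["max_procs"]


if __name__ == "__main__":
    for name in MODELS:
        peak = max_peak_mflops(name)
        mem_gb = max_memory_mb(name) // 1024
        print(f"{name}: {peak:.0f} Mflop/s peak, {mem_gb} GB memory")
```

Under these assumptions the results agree with the table: roughly 533 Mflop/s and 1 GB for the Cenju-3S, and 12.8 Gflop/s and 16 GB for the Cenju-3.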
Remarks:
The name Cenju-3 suggests that there have been predecessors, Cenju-1 and Cenju-2. This is indeed the case, but these systems were only used internally by NEC for research purposes and were never officially marketed. The Cenju-3 is based on the MIPS R4400 RISC processor. Apart from its on-chip primary cache, each processor has a secondary cache of 1 MB to mitigate the problems that arise from the high data demand of the CPU.
The interconnection type used in the Cenju is a multistage crossbar built from pipelined 4×4 modules. So, in a full configuration of 256 processors, the maximal number of levels in the crossbar to be traversed is four. The peak transfer rate of the crossbar is quoted as 40 MB/s, irrespective of the data placement.
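
The figure of four levels can be checked with a short calculation. The sketch below assumes that each level of 4×4 modules multiplies the number of reachable processors by four, so the number of levels is the smallest s with 4^s ≥ N. The function name `crossbar_levels` is hypothetical and the model ignores any details of the actual Cenju-3 wiring.

```python
def crossbar_levels(n_procs, radix=4):
    """Smallest number of switch levels needed to reach n_procs endpoints,
    assuming each level of radix x radix modules multiplies the reach by radix."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    levels = 0
    reach = 1
    # Integer arithmetic avoids rounding problems with floating-point logarithms.
    while reach < n_procs:
        reach *= radix
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (8, 16, 64, 256):
        print(n, "processors:", crossbar_levels(n), "levels")
```

With this model, the largest Cenju-3S configuration (16 processors) needs two levels and the full 256-processor Cenju-3 needs four.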
The system needs a front-end processor of the EWS4800 type (functionally equivalent to Silicon Graphics workstations). The I/O requirements have to be fulfilled by the front-end system as the Cenju does not have local (distributed) I/O capabilities.
There is some software support that should make the programmer's life somewhat easier. The library PARALIB/CJ contains proprietary functions for forking processes, barrier synchronisation, remote procedure calls, and block transfer of data. As on the Cray T3E, the Hitachi SR2201, and the Meiko CS-2, the programmer can write to and read from non-local memories directly, which avoids much message passing overhead.
Measured Performances: Delivery of the systems started in the second quarter of 1994, but no performance figures have ever been published for the Cenju-3. A comparative performance study has been done for various sorting algorithms on the Cenju-3 ([13]) and the SP2, as well as a performance evaluation of MPP LS-DYNA3D ([19]).