The NEC Cenju-3.

Machine type	RISC-based distributed-memory multi-processor
Models	Cenju-3S, Cenju-3
Operating system	Cenju-3 OS
Connection structure	Multi-stage crossbar
Compilers	Fortran 77, HPF (subset), ANSI C
Vendor's information Web page	www.ccrl-nece.technopark.gmd.de/Cenju.html
Year of introduction	1994.

System parameters:

Model	Cenju-3S	Cenju-3
Clock cycle	20 ns	13.3 ns
Theor. peak performance
Per Proc. (64 bits)	33 Mflop/s	50 Mflop/s
Maximal (64 bits)	533 Mflop/s	12.8 Gflop/s
Main memory
Memory/node	< 64 MB	< 64 MB
Memory/maximal	< 1 GB	< 16 GB
Communication bandwidth	40 MB/s	40 MB/s
No. of processors	8--16	16--256
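
The aggregate figures in the table follow from the per-node figures and the maximum processor count. The short Python sketch below reproduces that arithmetic. The clock cycles, processor counts, and memory sizes are taken from the table; the factor of two-thirds of a floating-point result per quoted clock cycle is an assumption inferred from the table values (33 Mflop/s at 20 ns, 50 Mflop/s at 13.3 ns) and is not stated in the text.

```python
# Figures taken from the system parameters table.
MODELS = {
    "Cenju-3S": {"clock_ns": 20.0, "max_procs": 16, "mem_per_node_mb": 64},
    "Cenju-3": {"clock_ns": 13.3, "max_procs": 256, "mem_per_node_mb": 64},
}

# Assumption: ratio inferred from the table, not given in the text.
FLOPS_PER_CYCLE = 2.0 / 3.0


def clock_mhz(clock_ns: float) -> float:
    """Convert a clock cycle time in nanoseconds to a frequency in MHz."""
    return 1000.0 / clock_ns


def peak_per_proc_mflops(clock_ns: float) -> float:
    """Theoretical peak of one processor in Mflop/s."""
    return clock_mhz(clock_ns) * FLOPS_PER_CYCLE


def peak_max_mflops(clock_ns: float, procs: int) -> float:
    """Theoretical peak of a configuration with `procs` processors."""
    return peak_per_proc_mflops(clock_ns) * procs


def max_memory_gb(mem_per_node_mb: int, procs: int) -> float:
    """Upper bound on total memory in GB (1 GB = 1024 MB)."""
    return mem_per_node_mb * procs / 1024.0


for name, m in MODELS.items():
    per_proc = peak_per_proc_mflops(m["clock_ns"])
    total = peak_max_mflops(m["clock_ns"], m["max_procs"])
    memory = max_memory_gb(m["mem_per_node_mb"], m["max_procs"])
    print(f"{name}: {per_proc:.0f} Mflop/s per proc, "
          f"{total:.0f} Mflop/s maximal, {memory:.0f} GB maximal memory")
```

Rounded, the results agree with the table: 533 Mflop/s and 1 GB for a 16-processor Cenju-3S, and about 12.8 Gflop/s and 16 GB for a 256-processor Cenju-3.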

Remarks:

The name Cenju-3 suggests that there have been predecessors, Cenju-1 and Cenju-2. This is indeed the case, but these systems were only used internally by NEC for research purposes and were never officially marketed. The Cenju-3 is based on the MIPS R4400 RISC processor. Apart from its on-chip primary cache, each processor has a secondary cache of 1 MB to mitigate the problems caused by the high data demand of the CPU.

The interconnection network used in the Cenju is a multistage crossbar built from pipelined 4×4 modules. In a full configuration of 256 processors, the maximal number of crossbar levels to be traversed is therefore four, since 4^4 = 256. The peak transfer rate of the crossbar is quoted as 40 MB/s irrespective of the data placement.
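
The relation between the number of processors and the number of crossbar levels can be sketched as follows. This is a minimal illustration, assuming only that each level is built from 4×4 modules and that every additional level multiplies the number of reachable endpoints by four; the function name `crossbar_stages` is hypothetical and does not refer to any Cenju software.

```python
def crossbar_stages(num_procs: int, radix: int = 4) -> int:
    """Smallest number of radix x radix levels that can reach num_procs endpoints."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    stages = 0
    reach = 1  # endpoints reachable through `stages` levels
    while reach < num_procs:
        reach *= radix
        stages += 1
    return stages


# Processor counts within the ranges listed for the Cenju-3S and Cenju-3.
for n in (8, 16, 64, 256):
    print(f"{n:3d} processors: {crossbar_stages(n)} level(s)")
```

Under this assumption a 16-processor system needs two levels and the full 256-processor configuration needs four, which matches the figure given above.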

The system needs a front-end processor of the EWS4800 type (functionally equivalent to Silicon Graphics workstations). The I/O requirements have to be fulfilled by the front-end system as the Cenju does not have local (distributed) I/O capabilities.

There is some software support that should make the programmer's life somewhat easier. The library PARALIB/CJ contains proprietary functions for forking processes, barrier synchronisation, remote procedure calls, and block transfer of data. As on the Cray T3E, the Hitachi SR2201, and the Meiko CS-2, the programmer can write to and read from non-local memories directly, which avoids much message-passing overhead.

Measured Performances: Delivery of the systems started in the second quarter of 1994, but no performance figures have been published for the Cenju-3. A comparative performance study has been done for various sorting algorithms on the Cenju-3 ([13]) and the SP2, as well as a performance evaluation of MPP LS-DYNA3D ([19]).

Aad van der Steen
Mon Feb 16 11:36:56 MET 1998