
The NEC Cenju-3.

Machine type            RISC-based distributed-memory multi-processor
Models                  Cenju-3S, Cenju-3
Operating system        EWS-UX/V (Unix variant based on Unix System V.4)
Connection structure    Multi-stage crossbar
Compilers               Fortran 77, ANSI C

System parameters:

Model                      Cenju-3S       Cenju-3
Clock cycle                20 ns          13.3 ns
Theor. peak performance
  Per proc. (64 bits)      33 Mflop/s     50 Mflop/s
  Maximal (64 bits)        533 Mflop/s    12.8 Gflop/s
Main memory                < 1 GB         < 16 GB
Memory/node                < 64 MB        < 64 MB
Communication bandwidth    40 MB/s        40 MB/s
No. of processors          8-16           16-256
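
The maximal figures in the table follow from the per-processor figures and the largest processor count of each model. The following Python sketch is a quick consistency check, not vendor material. The dictionary layout and function names are illustrative only. The per-processor peak of the Cenju-3S is assumed to be 100/3 Mflop/s (the table's 33 Mflop/s before rounding), since that value reproduces the quoted 533 Mflop/s.

```python
# Values taken from the system-parameters table; all names are illustrative.
MODELS = {
    # Assumption: the table's 33 Mflop/s is 100/3 Mflop/s, rounded down.
    "Cenju-3S": {"peak_per_proc_mflops": 100.0 / 3.0,
                 "max_procs": 16,
                 "mem_per_node_mb": 64},
    "Cenju-3": {"peak_per_proc_mflops": 50.0,
                "max_procs": 256,
                "mem_per_node_mb": 64},
}


def max_peak_mflops(model):
    """Aggregate theoretical peak (Mflop/s) of a fully configured system."""
    m = MODELS[model]
    return m["peak_per_proc_mflops"] * m["max_procs"]


def max_memory_mb(model):
    """Upper bound on main memory (MB) of a fully configured system."""
    m = MODELS[model]
    return m["mem_per_node_mb"] * m["max_procs"]


if __name__ == "__main__":
    for name in MODELS:
        print(name,
              round(max_peak_mflops(name)), "Mflop/s,",
              max_memory_mb(name) // 1024, "GB")
```

With these inputs the aggregate peaks come out at about 533 Mflop/s and 12.8 Gflop/s, and the memory bounds at 1 GB and 16 GB, in agreement with the table.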

Remarks:

The name Cenju-3 suggests that there were predecessors, Cenju-1 and Cenju-2. This is indeed the case, but these systems were only used internally by NEC for research purposes and were never officially marketed. The Cenju-3 is based on the same RISC processor as the Silicon Graphics Challenge, the MIPS R4400 (see 3.3.8). Confusingly, Silicon Graphics and NEC rate the peak performance of this processor differently: NEC quotes 33 Mflop/s where Silicon Graphics quotes 50 Mflop/s, and 50 Mflop/s where Silicon Graphics quotes 75 Mflop/s. The lower estimates quoted by NEC seem to be the more realistic. Apart from its on-chip primary cache, each processor has a secondary cache of 1 MB to help satisfy the high data demand of the CPU.
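
The two sets of ratings can be related to the clock cycles in the system-parameters table by plain arithmetic. The sketch below converts a clock cycle to a frequency and divides a quoted peak by it. It assumes that each pair of ratings refers to the same clock (20 ns for 33 vs. 50 Mflop/s, 13.3 ns for 50 vs. 75 Mflop/s). The function names are illustrative, and the source does not state which instruction mix underlies either rating.

```python
def clock_mhz(cycle_ns):
    """Clock frequency in MHz for a clock cycle given in nanoseconds."""
    return 1000.0 / cycle_ns


def flops_per_cycle(peak_mflops, cycle_ns):
    """Floating-point results per clock cycle implied by a quoted peak."""
    return peak_mflops / clock_mhz(cycle_ns)


if __name__ == "__main__":
    # (clock cycle in ns, NEC rating, Silicon Graphics rating), both in Mflop/s
    for cycle_ns, nec, sgi in [(20.0, 33.0, 50.0), (13.3, 50.0, 75.0)]:
        print(cycle_ns, "ns:",
              "NEC", round(flops_per_cycle(nec, cycle_ns), 2),
              "SGI", round(flops_per_cycle(sgi, cycle_ns), 2))
```

Under that assumption, the NEC figures correspond to roughly two floating-point results every three cycles, and the Silicon Graphics figures to roughly one result per cycle.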

The interconnection network used in the Cenju-3 is a multistage crossbar built from pipelined 4 × 4 modules. In a full configuration of 256 processors, a message therefore traverses at most four levels of the crossbar. The peak transfer rate of the crossbar is quoted as 40 MB/s, irrespective of the data placement.
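
The figure of four levels can be checked with a short calculation. The sketch below assumes a network in which every stage of 4 × 4 switch modules multiplies the number of reachable endpoints by four, so that the stage count is the smallest k with 4^k at least the number of processors. The exact wiring of the Cenju-3 network is not given here, and the function name is illustrative.

```python
def crossbar_stages(n_procs, radix=4):
    """Smallest number of stages k such that radix**k >= n_procs.

    Assumes each stage of radix x radix switches multiplies the number
    of reachable endpoints by `radix` (an assumption about the topology).
    """
    if n_procs < 1 or radix < 2:
        raise ValueError("need n_procs >= 1 and radix >= 2")
    stages = 0
    reach = 1
    while reach < n_procs:
        reach *= radix
        stages += 1
    return stages


if __name__ == "__main__":
    for n in (8, 16, 64, 256):
        print(n, "processors:", crossbar_stages(n), "stages")
```

For the full 256-processor configuration this gives four stages, since 4^4 = 256. Integer multiplication is used instead of a floating-point logarithm to avoid rounding errors at exact powers of four.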

The system needs a front-end processor of the EWS4800 type (functionally equivalent to Silicon Graphics workstations). All I/O has to be handled by the front-end system, as the Cenju-3 does not have local (distributed) I/O capabilities.

There is some software support that should make the programmer's life somewhat easier. The library PARALIB/CJ contains proprietary functions for forking processes, barrier synchronisation, remote procedure calls, and block transfer of data. As on the Cray T3D (3.4.4) and the Meiko CS-2 (3.4.13), the programmer can write to and read from non-local memories directly, which avoids much message-passing overhead.

Measured Performances: Delivery of the systems started in the second quarter of 1994, but no performance figures have been published for the Cenju-3.






Jack Dongarra
Sat Feb 10 15:12:38 EST 1996