Machine type | RISC-based distributed-memory multi-processor |
---|---|
Models | Cenju-3S, Cenju-3 |
Operating system | EWS-UX/V (Unix variant based on Unix System V.4) |
Connection structure | Multi-stage crossbar |
Compilers | Fortran 77, ANSI C |
System parameters:
Model | Cenju-3S | Cenju-3 |
---|---|---|
Clock cycle | 20 ns | 13.3 ns |
Theor. peak performance | ||
Per Proc. (64 bits) | 33 Mflop/s | 50 Mflop/s |
Maximal (64 bits) | 533 Mflop/s | 12.8 Gflop/s |
Main memory | < 1 GB | < 16 GB |
Memory/node | < 64 MB | < 64 MB |
Communication bandwidth | 40 MB/s | 40 MB/s |
No. of processors | 8-16 | 16-256 |
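
The aggregate figures in the table follow from the per-processor figures and the maximal processor counts. The sketch below checks that arithmetic. The names `MODELS` and `summarize` are illustrative only, and it assumes that the quoted 33 Mflop/s is a rounding of 100/3 Mflop/s, which is what the quoted maximum of 533 Mflop/s implies.

```python
# Figures copied from the system parameter table.
# Assumption: 33 Mflop/s is a rounded 100/3 Mflop/s (implied by the 533 Mflop/s maximum).
MODELS = {
    "Cenju-3S": {"clock_ns": 20.0, "peak_mflops": 100.0 / 3.0,
                 "max_procs": 16, "mem_per_node_mb": 64},
    "Cenju-3": {"clock_ns": 13.3, "peak_mflops": 50.0,
                "max_procs": 256, "mem_per_node_mb": 64},
}


def summarize(spec):
    """Derive the aggregate figures from the per-processor figures."""
    clock_mhz = 1000.0 / spec["clock_ns"]  # a cycle time in ns gives a rate in MHz
    return {
        "clock_mhz": clock_mhz,
        "max_peak_mflops": spec["peak_mflops"] * spec["max_procs"],
        "max_memory_gb": spec["mem_per_node_mb"] * spec["max_procs"] / 1024.0,
        "flops_per_cycle": spec["peak_mflops"] / clock_mhz,
    }


for name, spec in MODELS.items():
    s = summarize(spec)
    print(f"{name}: {s['clock_mhz']:.0f} MHz, "
          f"max peak {s['max_peak_mflops']:.0f} Mflop/s, "
          f"max memory {s['max_memory_gb']:.0f} GB, "
          f"{s['flops_per_cycle']:.2f} flop/cycle")
```

Under these assumptions both models come out at roughly two floating-point results per three clock cycles, and the memory bounds of 1 GB and 16 GB correspond to 64 MB on each of 16 and 256 nodes.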
Remarks:
The name Cenju-3 suggests that there were predecessors, Cenju-1 and Cenju-2. This is indeed the case, but these systems were only used internally by NEC for research purposes and were never officially marketed. The Cenju-3 is based on the same RISC processor as the Silicon Graphics Challenge, the MIPS R4400 (see 3.3.8). Confusingly, Silicon Graphics and NEC rate the peak performance of this processor differently: NEC quotes 33 Mflop/s where Silicon Graphics quotes 50 Mflop/s, and 50 Mflop/s where Silicon Graphics quotes 75 Mflop/s. The lower estimates quoted by NEC seem to be more realistic. Apart from its on-chip primary cache, each processor has a secondary cache of 1 MB to mitigate the problems that arise from the CPU's high data demand.
The interconnect used in the Cenju-3 is a multistage crossbar built from pipelined 4×4 modules. In a full configuration of 256 processors, the maximal number of crossbar levels to be traversed is therefore four, since 4^4 = 256. The peak transfer rate of the crossbar is quoted as 40 MB/s irrespective of the data placement.
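
The relation between the number of processors and the number of crossbar levels can be sketched as follows. The function `crossbar_stages` is a hypothetical helper, not part of any NEC software. It assumes a network in which every level is built from switches of the same radix, and it assumes that smaller configurations need correspondingly fewer levels, which the text above does not state explicitly.

```python
def crossbar_stages(num_ports: int, radix: int = 4) -> int:
    """Smallest number of levels s such that radix**s >= num_ports."""
    if num_ports < 1 or radix < 2:
        raise ValueError("need num_ports >= 1 and radix >= 2")
    stages, reach = 0, 1
    # Each additional level multiplies the number of reachable ports by the radix.
    while reach < num_ports:
        reach *= radix
        stages += 1
    return stages


for procs in (8, 16, 64, 256):
    print(procs, "processors ->", crossbar_stages(procs), "levels")
# 8 processors -> 2 levels
# 16 processors -> 2 levels
# 64 processors -> 3 levels
# 256 processors -> 4 levels
```

Integer multiplication is used instead of a floating-point logarithm to avoid rounding errors at exact powers of the radix.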
The system needs a front-end processor of the EWS4800 type (functionally equivalent to Silicon Graphics workstations). All I/O has to be handled by the front-end system, as the Cenju-3 has no local (distributed) I/O capabilities.
There is some software support that should make the programmer's life somewhat easier. The library PARALIB/CJ contains proprietary functions for forking processes, barrier synchronisation, remote procedure calls, and block transfer of data. As on the Cray T3D (3.4.4) and the Meiko CS-2 (3.4.13), the programmer can read from and write to non-local memories directly, which avoids much message-passing overhead.
Measured Performances: Delivery of the systems started in the second quarter of 1994, but no performance figures were ever published for the Cenju-3.