|Machine type||Distributed-memory multi-vector processor|
|Operating system||EWS-UX/V (Unix variant based on Unix System V.4)|
|Connection structure||Multi-stage crossbar (see Remarks)|
|Compilers||Fortran 77, Fortran 90, HPF, ANSI C, C++|
|Vendor's information Web page||http://www.nec.co.jp/english/product/computer/sx/|
|Clock cycle||8 ns||8 ns||8 ns|
|Theor. peak performance|
|Per Proc. (64 bits)||1 Gflop/s||2 Gflop/s||2 Gflop/s|
|Single frame: Maximal (64 bits)||1 Gflop/s||8 Gflop/s||64 Gflop/s|
|Multi frame: Maximal (64 bits)||--||--||1 Tflop/s|
|Main memory||< 2 GB||< 2 GB||< 128 GB|
|No. of processors||1||1-4||4-512|
The SX-4 series comprises a wide range of machine sizes. The smallest is the SX-4Ce, which has one CPU housing 4 vector pipe sets. As the clock cycle is 8 ns and each pipe set can deliver 2 floating-point results per cycle, the theoretical peak performance of this system is 1 Gflop/s. In all other systems the replication factor of the pipe sets is 8, which doubles the speed per CPU to a maximum of 2 Gflop/s. The bandwidth from memory to the CPUs is 16 64-bit words per cycle per CPU. With a replication factor of 8 this is enough to provide two operands per pipe set per cycle, but it is not sufficient to transport the results back to memory at the same time. So, to attain the peak performance, operands have to be re-used to some extent rather than loaded from memory for every operation.
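The per-CPU figures quoted above follow from simple arithmetic, which the following minimal Python sketch reproduces. The function names are hypothetical and chosen only for illustration; the inputs (8 ns clock, 2 results per pipe set per cycle, 16 64-bit words per cycle) are the values given in the text, and 1 GB is assumed here to mean 10^9 bytes.

```python
def peak_gflops_per_cpu(pipe_sets, results_per_pipe_set=2, clock_ns=8.0):
    """Theoretical peak of one CPU in Gflop/s.

    Results per nanosecond equal Gflop/s, so dividing the results
    produced per cycle by the cycle time in ns gives the peak directly.
    """
    return pipe_sets * results_per_pipe_set / clock_ns


def memory_bandwidth_gb_per_s(words_per_cycle=16, bytes_per_word=8, clock_ns=8.0):
    """Memory-to-CPU bandwidth in GB/s (assuming 1 GB = 10**9 bytes)."""
    return words_per_cycle * bytes_per_word / clock_ns


def operands_per_pipe_set(words_per_cycle=16, pipe_sets=8):
    """Words that memory can supply to each pipe set in one cycle."""
    return words_per_cycle / pipe_sets


print(peak_gflops_per_cpu(4))        # SX-4Ce: 4 pipe sets
print(peak_gflops_per_cpu(8))        # other models: 8 pipe sets
print(memory_bandwidth_gb_per_s())   # per-CPU memory bandwidth
print(operands_per_pipe_set())       # operands available per pipe set per cycle
```

With 8 pipe sets, the 16 words per cycle are entirely consumed by the two input operands per pipe set, which is why no bandwidth is left over for storing results in the same cycle.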
The technology used is CMOS. This lowers the fabrication costs and the power consumption appreciably (the same approach is used in the Fujitsu VPP300, see 3.4.6), and all models are air cooled. This enables the placement of up to 32 CPUs in one frame (for the SX-4 model). Beyond this maximum single-frame system, up to 16 frames can be coupled to form a distributed-memory system. This is equivalent to the PowerChallenge Array idea (see 3.3.6). There are two ways to couple the SX-4 frames. NEC provides a full crossbar, the so-called IXS crossbar, which connects the frames at a speed of 16 GB/s for point-to-point out-of-frame communication (128 GB/s bisection bandwidth for a maximum configuration). In addition, a HiPPI interface is available for inter-frame communication at lower cost and speed.
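As a quick check on how the single-frame and multi-frame peak figures in the table arise from the frame limits mentioned above, here is a small Python sketch. The helper name `system_peak_gflops` is hypothetical; the inputs (2 Gflop/s per CPU, at most 32 CPUs per frame, at most 16 frames) are taken from the text.

```python
def system_peak_gflops(cpus_per_frame, frames=1, per_cpu_gflops=2.0):
    """Aggregate theoretical peak in Gflop/s for a given configuration.

    This simply multiplies the CPU count by the per-CPU peak; it says
    nothing about performance actually attainable across frames.
    """
    return cpus_per_frame * frames * per_cpu_gflops


single_frame = system_peak_gflops(cpus_per_frame=32)            # one full frame
multi_frame = system_peak_gflops(cpus_per_frame=32, frames=16)  # maximum configuration
total_cpus = 32 * 16

print(single_frame, "Gflop/s in one frame")
print(multi_frame, "Gflop/s in", total_cpus, "CPUs, i.e. about 1 Tflop/s")
```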
For distributed computing there is an HPF compiler and for message passing an optimised MPI (MPI/SX) is available. The SX-4 is the only system that supports three floating-point number systems: IBM-compatible, Cray-compatible, and the IEEE 754 standard.
Measured Performances: The SX-4 will be available from the first quarter of 1996. Therefore, at this moment no performance results are available.