Next: The Tera MTA. Up: Shared-memory MIMD systems Previous: The HP/Convex C4600.

The NEC SX-4.

Machine type: Shared-memory multi-vector processor
Models: SX-4Ce, SX-4C, SX-4
Operating system: EWS-UX/V (Unix variant based on Unix System V.4)
Connection structure: Multi-stage crossbar (see Remarks)
Compilers: Fortran 77, Fortran 90, HPF, ANSI C, C++
Vendor's information: Web page http://www.nec.co.jp/english/product/computer/sx/

System parameters:

Model                                SX-4Ce      SX-4C       SX-4
Clock cycle                          8 ns        8 ns        8 ns
Theor. peak performance
  Per proc. (64 bits)                1 Gflop/s   2 Gflop/s   2 Gflop/s
  Single frame, maximal (64 bits)    1 Gflop/s   8 Gflop/s   64 Gflop/s
  Multi frame, maximal (64 bits)     --          --          1 Tflop/s
Main memory                          < 2 GB      < 2 GB      < 128 GB
Communication bandwidth
  (see Remarks)                      --          --          --
No. of processors                    1           1-4         4-512

Remarks:

The SX-4 series comprises a large range of machine sizes. The smallest of these is the SX-4Ce. This machine has one CPU housing 4 vector pipe sets. As the clock cycle is 8 ns and each pipe set is able to deliver 2 floating-point results per cycle, the maximum performance of this system is 1 Gflop/s. In all other systems the replication factor of the pipe sets is 8, which doubles the speed per CPU to a maximum of 2 Gflop/s. The bandwidth from memory to the CPUs is 16 64-bit words per cycle per CPU. With a replication factor of 8 this is enough to provide two operands per pipe set per cycle, but it is not sufficient to transport the results back to memory at the same time. So, operands have to be re-used to some extent to attain the peak performance.
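The peak-performance arithmetic above can be checked with a small sketch. The figures (8 ns clock, 2 results per pipe set per cycle, 16 words per cycle) come from the text; the function names are illustrative, and the bandwidth in GB/s is a derived value that assumes 1 GB = 10^9 bytes.

```python
# Figures from the remarks above; names are illustrative, not NEC terminology.
CLOCK_NS = 8                # clock cycle in nanoseconds
RESULTS_PER_PIPE_SET = 2    # floating-point results per pipe set per cycle
WORDS_PER_CYCLE = 16        # 64-bit words from memory per cycle per CPU
BYTES_PER_WORD = 8


def peak_gflops(pipe_sets):
    """Theoretical peak per CPU: results per cycle / cycle time in ns = Gflop/s."""
    return pipe_sets * RESULTS_PER_PIPE_SET / CLOCK_NS


def memory_bandwidth_gb_per_s():
    """Memory-to-CPU bandwidth: bytes per cycle / cycle time in ns = GB/s
    (assumes 1 GB = 10**9 bytes)."""
    return WORDS_PER_CYCLE * BYTES_PER_WORD / CLOCK_NS


def operands_per_pipe_set(pipe_sets):
    """64-bit operands that memory can deliver to each pipe set per cycle."""
    return WORDS_PER_CYCLE / pipe_sets


if __name__ == "__main__":
    print("SX-4Ce peak per CPU (Gflop/s):", peak_gflops(4))
    print("SX-4C/SX-4 peak per CPU (Gflop/s):", peak_gflops(8))
    print("Memory bandwidth per CPU (GB/s):", memory_bandwidth_gb_per_s())
    print("Operands per pipe set per cycle:", operands_per_pipe_set(8))
```

With 8 pipe sets the sketch gives exactly two operands per pipe set per cycle, which leaves no bandwidth for storing results in the same cycle.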

The technology used is CMOS. This lowers the fabrication costs and the power consumption appreciably (the same approach is being used in the Fujitsu VPP700), and all models are air cooled. This enables the placement of up to 32 CPUs in one frame (for the SX-4 model). Beyond this maximum single-frame system, it is possible to couple up to 16 frames together to form a distributed-memory system. This is similar to the AlphaServer cluster idea. There are two ways to couple the SX-4 frames. NEC provides a full crossbar, the so-called IXS crossbar, to connect the frames at a speed of 16 GB/s for point-to-point out-of-frame communication (128 GB/s bisection bandwidth for a maximum configuration). In addition, a HiPPI interface is available for interframe communication at lower cost and speed.
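The scaling figures for single-frame and multi-frame systems follow from the numbers quoted in this section. The sketch below reproduces them. It assumes that the bisection bandwidth of the IXS crossbar is obtained by letting each frame in one half of the system drive its 16 GB/s link to the other half; this is an interpretation that happens to match the quoted 128 GB/s, not a statement from NEC. All names are illustrative.

```python
# Figures from the text; the bisection model is an assumption (see lead-in).
CPU_PEAK_GFLOPS = 2       # peak per CPU for the SX-4 model
CPUS_PER_FRAME = 32       # maximum CPUs in one frame
MAX_FRAMES = 16           # maximum number of coupled frames
IXS_LINK_GB_PER_S = 16    # point-to-point out-of-frame speed


def system_peak_gflops(frames, cpus_per_frame=CPUS_PER_FRAME):
    """Theoretical peak of a system with the given number of full frames."""
    return frames * cpus_per_frame * CPU_PEAK_GFLOPS


def bisection_bandwidth_gb_per_s(frames):
    """Assumed model: half of the frames each send at full link speed
    to the other half across the full crossbar."""
    return (frames // 2) * IXS_LINK_GB_PER_S


if __name__ == "__main__":
    print("Single frame peak (Gflop/s):", system_peak_gflops(1))
    print("Maximum system peak (Gflop/s):", system_peak_gflops(MAX_FRAMES))
    print("Maximum CPU count:", MAX_FRAMES * CPUS_PER_FRAME)
    print("Bisection bandwidth (GB/s):", bisection_bandwidth_gb_per_s(MAX_FRAMES))
```

The maximum configuration comes out at 1024 Gflop/s, which the table rounds to 1 Tflop/s.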

For distributed computing an HPF compiler is available, and for message passing there is an optimised MPI implementation (MPI/SX). The SX-4 is the only system that supports three floating-point formats: IBM-compatible, Cray-compatible, and the IEEE 754 standard.

Measured Performances: In [4] a speed of 60.7 Gflop/s was reported for the solution of a full linear system of order 10000 on a 32-processor configuration.
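To put the measured figure in context, it can be compared with the theoretical peak of a 32-processor system. The sketch below assumes the 2 Gflop/s per-processor peak from the system parameters; the function name is illustrative.

```python
# Measured speed from [4]; per-processor peak from the system parameters.
CPU_PEAK_GFLOPS = 2.0


def efficiency(measured_gflops, processors, cpu_peak=CPU_PEAK_GFLOPS):
    """Fraction of the theoretical peak that was actually attained."""
    return measured_gflops / (processors * cpu_peak)


if __name__ == "__main__":
    frac = efficiency(60.7, 32)
    print(f"Fraction of peak attained: {frac:.1%}")
```

Under these assumptions the reported run attains roughly 95% of the 64 Gflop/s peak.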






Aad van der Steen
Mon Mar 3 08:24:51 MET 1997