Machine type: Shared-memory multi-vectorprocessor.
Models: SX-3/11R, SX-3/12R, SX-3/14R, SX-3/22R, SX-3/24R, SX-3/42R, SX-3/44R.
Operating system: SXOS, SUPER-UX (NEC's Unix variant).
Compilers: Fortran, C.
System parameters:
(x=1,2,4)
Performance:
Note: Only and for a fully configured SX-3/44R are quoted.
The SX-3R series is the second generation of NEC's vector supercomputers, the first being the SX-2. Strictly speaking, the SX-3R series is an upgrade of the SX-3 series; the upgrade, however, consists simply of reducing the clock cycle from 2.9 to 2.5 ns.
The many model numbers deserve some explanation: in the suffix /xy, x is the number of processors and y the number of vector pipe sets per processor. In this terminology the largest SX-2 would have been an SX-2/14R, because it has a single processor with four pipe sets. Multiple processors were introduced only with the SX-3 generation. Because a pipe set can, under favourable circumstances, produce four results per clock cycle, a cycle time of 2.5 ns gives a theoretical peak speed of 4/2.5 ns = 1.6 Gflop/s per pipe set. Multiplying by the total number of pipe sets (x × y) yields the quoted performances in the system parameters list, up to 4 × 4 × 1.6 = 25.6 Gflop/s for the SX-3/44R. Although the scalar processor has the same clock cycle as the vector processing units (VPUs), it can issue an instruction only every second cycle. Effectively, the situation is therefore not very different from that of the other Japanese vector processors, where the scalar unit has a clock cycle twice as long as that of the VPUs.
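The naming scheme and the peak-speed arithmetic above can be checked with a small sketch. The helpers `parse_model` and `peak_gflops` are hypothetical names introduced only for illustration; they assume the suffix is always two digits (processors, then pipe sets per processor) optionally followed by "R", and they use the four results per cycle per pipe set stated in the text.

```python
def parse_model(model: str) -> tuple[int, int]:
    """Split a name such as 'SX-3/44R' into (processors, pipe sets per processor).

    Hypothetical helper: assumes a two-digit suffix after '/', optionally
    followed by 'R'.
    """
    suffix = model.split("/")[1].rstrip("R")
    return int(suffix[0]), int(suffix[1])


def peak_gflops(model: str, cycle_ns: float = 2.5, results_per_cycle: int = 4) -> float:
    """Theoretical peak in Gflop/s: total pipe sets x results per cycle / cycle time."""
    processors, pipe_sets = parse_model(model)
    per_pipe_set = results_per_cycle / cycle_ns  # results per ns == Gflop/s
    return processors * pipe_sets * per_pipe_set


MODELS = ["SX-3/11R", "SX-3/12R", "SX-3/14R", "SX-3/22R",
          "SX-3/24R", "SX-3/42R", "SX-3/44R"]

for m in MODELS:
    print(f"{m}: {peak_gflops(m):.1f} Gflop/s")
```

Passing `cycle_ns=2.9` instead gives the corresponding peak for the original SX-3 clock, about 22.1 Gflop/s for a 4-processor, 4-pipe-set system, which shows that the R upgrade raises peak speed purely through the shorter cycle.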
Apart from the arithmetic processors (APs), which contain the VPUs and a scalar processor, each system can be configured with one or two Control Processors (CPs). The CPs are dedicated to system tasks, interactive processing, and the control of I/O requests (which are handled by separate I/O processors). The APs are thus freed from such menial tasks and can concentrate on computational work.
As in the Cray and Convex machines, synchronisation of parallel processes is performed via special communication registers in order to minimise synchronisation overhead.
It is not easy to derive the CPU-to-memory bandwidth of the SX-3 from the NEC documentation. One is left with the impression that, up to the SX-3/24R, the bandwidth suffices to sustain a stream of 2 operands/pipe-set/cycle from memory while shipping one result/pipe-set/cycle back to memory.
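To put that impression into numbers, the sketch below computes the memory bandwidth a model would need to sustain 2 loads and 1 store per pipe set per cycle. This is not a figure from NEC: the 8-byte (64-bit) operand size is an assumption, and `required_bandwidth_gbs` is a hypothetical helper for illustration only.

```python
CYCLE_NS = 2.5   # SX-3R clock cycle, from the text
WORD_BYTES = 8   # assumption: 64-bit operands


def required_bandwidth_gbs(processors: int, pipe_sets: int,
                           loads: int = 2, stores: int = 1,
                           cycle_ns: float = CYCLE_NS,
                           word_bytes: int = WORD_BYTES) -> float:
    """Bandwidth (GB/s) needed for `loads` + `stores` words per pipe set per cycle."""
    words_per_cycle = processors * pipe_sets * (loads + stores)
    return words_per_cycle * word_bytes / cycle_ns  # bytes per ns == GB/s


# SX-3/24R: 2 processors x 4 pipe sets; SX-3/44R: 4 processors x 4 pipe sets.
for name, x, y in [("SX-3/24R", 2, 4), ("SX-3/44R", 4, 4)]:
    print(f"{name}: {required_bandwidth_gbs(x, y):.1f} GB/s")
```

Under these assumptions the SX-3/24R would need 76.8 GB/s and a fully configured SX-3/44R twice that, which illustrates why a bandwidth that is adequate up to the SX-3/24R need not be adequate for the larger models.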