In 1993 Fujitsu surprised the world by announcing a Vector Parallel Processor (VPP) series designed to reach well into the range of hundreds of Gflop/s. The first implementations of this vector parallel architecture are the Fujitsu VPP500 and its predecessor, the Numerical Wind Tunnel at Japan's National Aerospace Laboratory. The VPP500 system is based on a GaAs/BiCMOS implementation of Fujitsu's VP vector processor, again with a number of enhancements. A single processor has a peak performance of 1.6 Gflop/s. In contrast to the NEC SX-4 system, the VPP500 does not pursue a shared memory concept in the sense of a uniform memory access speed. Instead, each vector processor has its own local memory, constructed from synchronous SRAM with a bandwidth of 6.4 GByte/s for load plus the same for store. These local memories are connected to an interprocessor crossbar network through a data path with a bandwidth of 400 MByte/s for write and 400 MByte/s for read.
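To make the gap between local and remote memory access concrete, the figures above can be turned into bytes moved per floating-point operation at peak speed. The following is only a back-of-the-envelope sketch: it assumes 1 GByte/s equals 1000 MByte/s, and the constant and function names are illustrative, not Fujitsu terminology.

```python
# Figures quoted in the text, expressed in MByte/s and Mflop/s.
# Assumption: 1 GByte/s = 1000 MByte/s and 1 Gflop/s = 1000 Mflop/s.
PEAK_MFLOPS = 1600            # 1.6 Gflop/s per processor
LOCAL_LOAD_MBYTE_S = 6400     # 6.4 GByte/s local memory load
NET_READ_MBYTE_S = 400        # crossbar read path per processor


def bytes_per_flop(bandwidth_mbyte_s, peak_mflops=PEAK_MFLOPS):
    """Bytes that can be moved per floating-point operation at peak speed."""
    return bandwidth_mbyte_s / peak_mflops


local_balance = bytes_per_flop(LOCAL_LOAD_MBYTE_S)    # local load path
network_balance = bytes_per_flop(NET_READ_MBYTE_S)    # remote read path
local_to_network = LOCAL_LOAD_MBYTE_S / NET_READ_MBYTE_S

print(f"local load:   {local_balance} bytes/flop")
print(f"network read: {network_balance} bytes/flop")
print(f"local memory is {local_to_network}x faster than the network path")
```

Under these assumptions the local load path supplies 4 bytes per flop, the network read path 0.25 bytes per flop, and local memory is 16 times faster than the path to the crossbar, which is why memory access speed on this machine is not uniform.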
The most striking feature of the VPP500 is its ability to interconnect up to 222 processors through a crossbar network, using two independent connections (one for read, one for write), each operating at 400 MByte/s. The total memory can be addressed via virtual shared memory primitives. The crossbar network offers global addressing hardware to support the transfer of data packets between local memories. This hardware allows the programmer to view the physically distributed memories as a single flat address space.
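The following sketch illustrates two points from this paragraph: the aggregate peak of a fully configured system, and what a single flat address space over distributed memories means in practice. The block layout and the local memory size are assumptions made purely for illustration; the text does not specify how the VPP500 hardware actually maps global addresses to processors.

```python
from typing import Tuple

# Figures quoted in the text.
MAX_PROCESSORS = 222
PEAK_MFLOPS_PER_PE = 1600     # 1.6 Gflop/s per processor

# Hypothetical value: the text gives no local memory size.
LOCAL_WORDS_PER_PE = 1 << 20


def aggregate_peak_gflops(n_pe: int = MAX_PROCESSORS) -> float:
    """Peak performance of n_pe processors in Gflop/s."""
    return n_pe * PEAK_MFLOPS_PER_PE / 1000


def locate(global_index: int,
           words_per_pe: int = LOCAL_WORDS_PER_PE) -> Tuple[int, int]:
    """Map a flat word index to (processor, local offset).

    Assumes a simple block layout in which processor p owns the words
    [p * words_per_pe, (p + 1) * words_per_pe). This is an illustrative
    model, not the documented VPP500 addressing scheme.
    """
    if not 0 <= global_index < MAX_PROCESSORS * words_per_pe:
        raise IndexError("global index outside the flat address space")
    return divmod(global_index, words_per_pe)


print(aggregate_peak_gflops())            # full 222-processor system
print(locate(3 * LOCAL_WORDS_PER_PE + 7))  # word owned by processor 3
```

With all 222 processors the aggregate peak is 355.2 Gflop/s, consistent with the "hundreds of Gflop/s" target. In the `locate` model, the programmer works only with the flat index, and translating it to an owning processor and local offset is the kind of job the global addressing hardware takes over.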
The system requires a VP vector processor as a front end, which handles input/output, permanent file storage, and job queue logistics. While the VPP500 deserves much credit as the first commercially viable vector parallel system, its scalability is limited, at least in principle, because it requires a front-end system and because the single-stage crossbar architecture would seem difficult to extend to very large numbers of processors. Fujitsu has just issued a statement of direction that promises CMOS-based (probably VPP500-like) systems for next year, which presumably would be fully scalable.
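One way to see why a single-stage crossbar is hard to extend is to count switching elements. The sketch below assumes the textbook model of a full crossbar, in which every one of N inputs can be connected to every one of N outputs and therefore N × N crosspoints are needed. The internal design of the VPP500 switch is not described in the text, so this is a generic model rather than a description of Fujitsu's hardware.

```python
def crosspoints(ports: int) -> int:
    """Crosspoints in a full single-stage N x N crossbar (textbook model)."""
    return ports * ports


# Compare a few system sizes, including the VPP500 maximum of 222.
for n in (16, 64, 222, 1024):
    print(f"{n:5d} processors -> {crosspoints(n):8d} crosspoints")
```

In this model, the 222-processor maximum already implies 49,284 crosspoints, and each doubling of the processor count quadruples that number.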