In 1977 Fujitsu produced the first supercomputer prototype, the F230-75 APU, a pipelined vector processor attached to a scalar processor. This attached processor was installed at the Japan Atomic Energy Research Institute (JAERI) and the National Aerospace Laboratory (NAL).
In 1983 the company came out with the VP-200 and VP-100 systems, which later spun off the low-end VP-50 and VP-30 systems. In 1986 came the VP-400 (with twice as many pipelines as the VP-200), and as of mid-1987 the whole family became the E-series with the addition of an extra (multiply-add) pipelined floating point unit that boosted the performance potential by 50%. Thanks to the flexible range of systems in this generation (VP-30E to VP-400E), and other reasons such as good marketing and a broad range of applications, Fujitsu has become the largest domestic supplier with over 80 systems installed, many of which are well below the cut-off limit of the TOP500.
Available since 1990, the VP-2000 family can offer a peak performance of 5 Gflop/s thanks to a vector cycle time of 3.2 ns. The family was initially announced with four vector performance levels (models 2100, 2200, 2400, and 2600), where each level could have either one or two scalar processors, but the VP-2400/40 doubled this limit, offering a peak vector performance similar to that of the VP-2600. Most of these models are now represented in the Japanese TOP500.
Previous machines had been heavily criticised for their lack of memory throughput. The VP-400 series had only one load/store path to memory, which peaked at 4.57 GB/s. This was improved in the VP-2000 series by doubling the paths so that each pipeline set can do two load/store operations per cycle, giving a total transfer rate of 20 GB/s. Fujitsu recently decided to use the label VPX-2x0 for the VP-2x00 systems adapted to its Unix system. Keio University (Keio Daigaku) now runs such a system.
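The VP-2000 figures quoted above can be related to each other with a little arithmetic. The following is only a rough sketch in Python: it assumes that the 5 Gflop/s peak, the 3.2 ns cycle time, and the 20 GB/s transfer rate all refer to the same top-end configuration, which the text does not state explicitly. The variable names are illustrative.

```python
# Figures quoted in the text for the VP-2000 family (assumed to describe
# the same top-end model; the text does not confirm this).
peak_gflops = 5.0        # peak vector performance, Gflop/s
cycle_ns = 3.2           # vector cycle time, ns
bandwidth_gb_s = 20.0    # total memory transfer rate, GB/s

# Gflop/s times ns gives flops per cycle; GB/s times ns gives bytes per cycle.
flops_per_cycle = peak_gflops * cycle_ns
bytes_per_cycle = bandwidth_gb_s * cycle_ns

# Memory balance: bytes that can be moved per peak floating-point operation.
bytes_per_flop = bandwidth_gb_s / peak_gflops

print(f"flops per cycle: {flops_per_cycle:.0f}")
print(f"bytes per cycle: {bytes_per_cycle:.0f}")
print(f"bytes per flop:  {bytes_per_flop:.1f}")
```

Under these assumptions the machine can move about 4 bytes, or half of a 64-bit operand, for every floating-point operation at peak rate.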
The VPP-500 series

In 1993 Fujitsu surprised the world by announcing a Vector Parallel Processor (VPP) series designed to reach well into the range of hundreds of Gflop/s. At the core of the system is a combined Ga-As/Bi-CMOS processor, based largely on the original design of the VP-200. By using the most advanced hardware technology available, the processor chips achieve a gate delay as low as 60 ps in the Ga-As chips. The resulting cycle time is 9.5 ns. The processor has four independent pipelines, each capable of executing two Multiply-Add instructions in parallel, resulting in a peak speed of 1.7 Gflop/s per processor. Each processor board is equipped with 256 Megabytes of central memory.
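The quoted per-processor peak can be reproduced from the pipeline count and the cycle time. The sketch below is a minimal Python illustration, assuming that each Multiply-Add counts as two floating-point operations and that every pipeline delivers its results once per cycle (neither is stated in the text). The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(pipelines, madds_per_pipeline, cycle_ns, flops_per_madd=2):
    """Theoretical peak of one vector processor in Gflop/s.

    flops_per_madd=2 is an assumption: one multiply plus one add.
    """
    flops_per_cycle = pipelines * madds_per_pipeline * flops_per_madd
    # Flops per nanosecond is numerically the same as Gflop/s.
    return flops_per_cycle / cycle_ns

# Figures from the text: four pipelines, two Multiply-Adds each, 9.5 ns cycle.
vpp500_peak = peak_gflops(pipelines=4, madds_per_pipeline=2, cycle_ns=9.5)
print(f"{vpp500_peak:.2f} Gflop/s")  # prints "1.68 Gflop/s"
```

The result, about 1.68 Gflop/s, rounds to the 1.7 Gflop/s given above.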
The most striking feature of the VPP-500 is the capability to interconnect up to 222 processors through a crossbar network using two independent (read/write) connections, each operating at 400 MB/s. The total memory can be addressed via virtual shared memory primitives. The system is meant to be front-ended by a VP-2x00 system that handles input/output, permanent file storage, and job queue logistics.
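To see why the series reaches "well into the range of hundreds of Gflop/s", the per-processor figures can be scaled to the maximum configuration. This is a back-of-the-envelope Python sketch that simply multiplies the numbers quoted in the text. It assumes one 256 MB processor board per processor and ignores any interconnect or front-end overhead.

```python
# Figures quoted in the text.
MAX_PROCESSORS = 222
PEAK_PER_PROCESSOR_GFLOPS = 1.7
MEMORY_PER_BOARD_MB = 256   # assumption: one board per processor

# Theoretical aggregate peak of a fully configured system.
aggregate_peak_gflops = MAX_PROCESSORS * PEAK_PER_PROCESSOR_GFLOPS

# Total central memory, converted from MB to GB (1 GB = 1024 MB).
total_memory_gb = MAX_PROCESSORS * MEMORY_PER_BOARD_MB / 1024

print(f"aggregate peak: {aggregate_peak_gflops:.1f} Gflop/s")
print(f"total memory:   {total_memory_gb:.1f} GB")
```

Under these assumptions a full 222-processor system would peak at roughly 377 Gflop/s with about 55.5 GB of central memory.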
As mentioned in the introduction, an early version of this system, called the Numerical Wind Tunnel, was developed together with NAL. This early version of the VPP-500 (with 140 processors) is today the fastest supercomputer in the world and stands out at the top of the TOP500 due to an Rmax value that is twice that of the TMC CM-5/1024 installed at Los Alamos.