
The Fujitsu AP1000.

Machine type: RISC-based distributed-memory multi-processor
Models: AP1000
Operating system: Cell OS (transparent to the user) and SunOS (Sun's Unix variant) on the front-end system
Connection structure: T-net (2-D torus), B-net (common bus + hierarchical ring), S-net (tree) (see remarks)
Compilers: Fortran 77 and C with extensions

System parameters:

Model: AP1000
Clock cycle: 40 ns
Theor. peak performance:
  Per proc. (64-bit): 12.5 Mflop/s
  Maximal (64-bit): 12.8 Gflop/s
Main memory: <= 16 GB
Memory/node: 16 MB
Communication bandwidth:
  B-net: 50 MB/s
  T-net: 25 MB/s
No. of processors: 8-1024

Remarks:

The AP1000 is built from computing cells, each of which contains a 25 MHz SPARC processor (IU) and an additional floating-point processor (FPU). The processor cells are complemented by routing and message controllers, a B-net interface (see below), cell memory, and a 128 KB cache. The peak performance of the FPU is estimated at 12.5 Mflop/s, which brings the aggregate peak rate to 12.8 Gflop/s for a full 1024-cell system. The system is front-ended by a Sun-4 machine.
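The aggregate figure follows directly from the per-cell rate; a minimal check of the arithmetic:

```python
# Aggregate peak rate of a full AP1000: per-cell FPU peak times cell count.
CLOCK_MHZ = 25                 # SPARC IU clock, i.e. a 40 ns cycle
PEAK_PER_CELL_MFLOPS = 12.5    # estimated FPU peak per cell
CELLS = 1024                   # maximum configuration

flops_per_cycle = PEAK_PER_CELL_MFLOPS / CLOCK_MHZ   # 0.5 flop per cycle
aggregate_gflops = PEAK_PER_CELL_MFLOPS * CELLS / 1000

print(flops_per_cycle)    # 0.5
print(aggregate_gflops)   # 12.8
```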

Fujitsu has attempted to diminish the communication problems that are inherent to DM-MIMD machines by implementing different networks for broadcasting and collection of data (the B-net), for synchronisation (the S-net), and for communication on the processor grid (the T-net). As broadcasting or multicasting (i.e., broadcasting to a selected subset) of data often constitutes a bottleneck in the execution of a computational task, the B-net has twice the bandwidth of the interprocessor T-net (50 vs. 25 MB/s). Because the gather and scatter of data over the processors is generally less structured, a combination of a common bus and a hierarchical ring structure is used. The B-net interface has FIFO buffers and scatter-gather controllers to allow for sending/receiving data independently of the other active components in the cell. The message controller seeks to minimise the overhead for data transfer setup and relieves the IU from doing the message passing proper.
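The effect of the setup overhead that the message controller tries to minimise can be illustrated with the usual linear transfer-time model, t = t_setup + bytes/bandwidth. The setup time used below is a placeholder assumption, not a measured AP1000 figure; only the two bandwidths come from the table above:

```python
# Simple linear model of message transfer time: startup cost plus the
# bandwidth-limited term. t_setup_us is an assumed value for illustration.
def transfer_time(nbytes, bandwidth_mb_s, t_setup_us=10.0):
    return t_setup_us * 1e-6 + nbytes / (bandwidth_mb_s * 1e6)

msg = 1 << 20  # a 1 MB message
print(transfer_time(msg, 50.0))  # over the 50 MB/s B-net
print(transfer_time(msg, 25.0))  # over the 25 MB/s T-net
```

For long messages the bandwidth term dominates, so the B-net moves bulk data roughly twice as fast; for short messages the setup term dominates, which is why a low-overhead message controller matters.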

For the T-net, which connects the cells in a 2-D grid, the transfer speed is half that of the B-net, but as data movement will often be more regular, it is expected to give good throughput, especially as a new conflict-free wormhole routing scheme has been implemented by allocating routed messages to alternating buffer pairs in the intermediate cells. Experiments have shown relatively low message overhead for this system [ishihata].
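Each cell on such a 2-D torus has four neighbours, with wrap-around at the edges. A minimal sketch of the neighbour addressing (the grid dimensions are illustrative):

```python
# Neighbours of cell (x, y) on an nx-by-ny 2-D torus: one step in each
# direction, with modular wrap-around in both dimensions.
def torus_neighbours(x, y, nx, ny):
    return [((x - 1) % nx, y), ((x + 1) % nx, y),
            (x, (y - 1) % ny), (x, (y + 1) % ny)]

print(torus_neighbours(0, 0, 32, 32))
```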

There is a tree-structured S-net for barrier synchronisation of processes, again with quite low overheads (a maximum of 5.2 μs for a full configuration).
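A tree-structured barrier needs only a logarithmic number of combining stages, which is why the overhead stays low even at 1024 cells. The fan-out below is an assumption (the report only states that the S-net is tree-structured):

```python
# Number of combining stages in a tree barrier over p cells with the given
# fan-out: the tree depth ceil(log_fanout(p)), computed iteratively.
def barrier_depth(p, fanout=2):
    depth, reach = 0, 1
    while reach < p:
        reach *= fanout
        depth += 1
    return depth

print(barrier_depth(1024))  # 10 stages for a full 1024-cell machine
```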

Recently an entry-level model of the AP1000, the AP1000C, has been offered. The AP1000C starts at a configuration of 8 processor cells instead of the original 64. The housing has also been made more compact for this model, saving a factor of 3 in space.

Measured Performances: In [Ikesaka91] the performance on the solution of a full linear system on a 256-cell machine is given. A system of order 100 performed at about 40 Mflop/s, an order-300 system attained 180 Mflop/s, while a 1000 × 1000 system reached more than 300 Mflop/s. In [linpackbm] a speed of 2.3 Gflop/s is reported for a dense system of order 25,600 on 512 cells.
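These rates follow from the conventional operation count for dense LU factorisation with triangular solves, 2n³/3 + 2n². A small sketch of the accounting (the timings derived from it are implied by the quoted rates, not independent measurements):

```python
# Conventional flop count for solving a dense linear system of order n
# by LU factorisation (2/3 n^3) plus the two triangular solves (2 n^2).
def lu_flops(n):
    return 2.0 * n**3 / 3.0 + 2.0 * n**2

def mflops(n, seconds):
    return lu_flops(n) / seconds / 1e6

# Elapsed time implied by the order-25,600 run at 2.3 Gflop/s:
print(lu_flops(25_600) / 2.3e9, "seconds")
```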






Jack Dongarra
Sat Feb 10 15:12:38 EST 1996