Machine type | RISC-based distributed-memory multi-processor |
---|---|
Models | AP1000 |
Operating system | Cell OS (transparent to the user) and SunOS (Sun's Unix variant) on the front-end system |
Connection structure | T-net (2-D torus), B-net (common bus + hierarchical ring), S-net (tree) (see remarks) |
Compilers | Fortran 77 and C with extensions |
System parameters:
Model | AP1000 |
---|---|
Clock cycle | 40 ns |
Theor. peak performance | |
Per proc. (64-bit) | 12.5 Mflop/s |
Maximal (64-bit) | 12.8 Gflop/s |
Main memory | <=16 GB |
Memory/node | 16 MB |
Communication bandwidth | |
B-net | 50 MB/s |
T-net | 25 MB/s |
No. of processors | 8-1024 |
Remarks:
The AP1000 is built from computing cells, each of which contains a 25 MHz SPARC processor (IU) and an additional floating-point processor (FPU). Each cell also holds routing and message controllers, a B-net interface (see below), cell memory, and 128 KB of cache memory. The peak performance of the FPU is estimated to be 12.5 Mflop/s, which brings the aggregate peak rate to 12.8 Gflop/s for a full 1024-cell system. The system is front-ended by a Sun 4 machine.
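The aggregate figures quoted above follow directly from the per-cell values in the table. The sketch below only reproduces that arithmetic; the function names are illustrative, and the flops-per-cycle value is merely the ratio of the quoted peak rate to the clock rate, not a statement about the FPU design.

```python
# Per-cell figures taken from the system parameters table.
CLOCK_CYCLE_NS = 40          # clock cycle in nanoseconds
PEAK_PER_CELL_MFLOPS = 12.5  # 64-bit peak per cell
MEMORY_PER_CELL_MB = 16      # memory per node
MAX_CELLS = 1024             # largest configuration


def clock_mhz(cycle_ns):
    """Clock frequency in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_ns


def aggregate_peak_gflops(cells, per_cell_mflops=PEAK_PER_CELL_MFLOPS):
    """Aggregate peak rate in Gflop/s (1 Gflop/s = 1000 Mflop/s)."""
    return cells * per_cell_mflops / 1000.0


def total_memory_gb(cells, per_cell_mb=MEMORY_PER_CELL_MB):
    """Total main memory in GB (1 GB = 1024 MB)."""
    return cells * per_cell_mb / 1024.0


def flops_per_cycle(per_cell_mflops=PEAK_PER_CELL_MFLOPS, cycle_ns=CLOCK_CYCLE_NS):
    """Ratio of quoted peak rate to clock rate (derived, not from the source)."""
    return per_cell_mflops / clock_mhz(cycle_ns)


if __name__ == "__main__":
    print(clock_mhz(CLOCK_CYCLE_NS), "MHz")
    print(aggregate_peak_gflops(MAX_CELLS), "Gflop/s peak")
    print(total_memory_gb(MAX_CELLS), "GB main memory")
    print(flops_per_cycle(), "flop per cycle")
```

For the smallest configuration of 8 cells the same functions give a peak of 0.1 Gflop/s and 0.125 GB of memory.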
Fujitsu has attempted to diminish the communication problems that are inherent to DM-MIMD machines by implementing different networks for broadcasting and collection of data (the B-net), for synchronisation (the S-net), and for communication on the processor grid (the T-net). As the broadcasting or multicasting (i.e., broadcasting to a selected subset) of data often constitutes a bottleneck in the execution of a computational task, the B-net has twice the bandwidth of the interprocessor T-net (50 vs. 25 MB/s). Because the gather and scatter of data over the processors is generally less structured, a combination of a common bus and a hierarchical ring structure is used. The B-net interface has FIFO buffers and scatter-gather controllers to allow for sending/receiving data independently of the other active components in the cell. The message controller seeks to minimise the overhead for data transfer setup and relieves the IU from doing the message passing proper.
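To put the two bandwidth figures side by side, the following minimal sketch estimates the ideal time to move a block of data at each quoted rate. It assumes 1 MB = 10^6 bytes and ignores setup overhead, contention, and routing, so the results are lower bounds only; the function name is illustrative.

```python
# Quoted bandwidths from the system parameters table, in MB/s.
B_NET_MB_S = 50.0
T_NET_MB_S = 25.0


def ideal_transfer_time_s(size_bytes, bandwidth_mb_s):
    """Bandwidth-limited transfer time in seconds.

    Assumes 1 MB = 1e6 bytes and zero startup overhead.
    """
    return size_bytes / (bandwidth_mb_s * 1e6)


if __name__ == "__main__":
    size = 1_000_000  # a 1 MB block, chosen only for illustration
    print("B-net:", ideal_transfer_time_s(size, B_NET_MB_S), "s")
    print("T-net:", ideal_transfer_time_s(size, T_NET_MB_S), "s")
```

Under these assumptions a block takes twice as long on a single T-net link as on the B-net, which is simply the 2:1 bandwidth ratio restated.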
For the T-net, which connects the cells in a 2-D torus, the transfer speed is
half that of the B-net, but as data movement on the grid will often be more
regular, it is expected to give good throughput, especially as a new
conflict-free wormhole routing scheme has been implemented by allocating
routed messages to alternating buffer pairs in the intermediate cells.
Experiments have shown relatively low message overhead for this system [#ishihata#].
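As an illustration of the 2-D torus topology of the T-net, the sketch below computes a cell's four neighbours and the minimal hop count between two cells with wrap-around links. The 32 x 32 grid shape for a 1024-cell system is an assumption made for the example, and the functions describe the topology only, not Fujitsu's routing scheme.

```python
def torus_neighbours(x, y, width, height):
    """Return the four neighbours of cell (x, y) on a width x height torus."""
    return [
        ((x + 1) % width, y),   # east
        ((x - 1) % width, y),   # west (wraps around at the edge)
        (x, (y + 1) % height),  # north
        (x, (y - 1) % height),  # south (wraps around at the edge)
    ]


def torus_hops(a, b, width, height):
    """Minimal number of hops between cells a and b on the torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # In each dimension, take the shorter way around the ring.
    return min(dx, width - dx) + min(dy, height - dy)


if __name__ == "__main__":
    W = H = 32  # assumed grid shape for 1024 cells
    print(torus_neighbours(0, 0, W, H))
    print(torus_hops((0, 0), (31, 31), W, H))
    print(torus_hops((0, 0), (16, 16), W, H))
```

On the assumed 32 x 32 grid, opposite corners are only 2 hops apart thanks to the wrap-around links, while the most distant pair of cells is 32 hops apart.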
There is a tree-structured S-net for barrier synchronisation of processes with
again quite low overheads (a maximum of 5.2 µs for a full configuration).
An entry model of the AP1000, the AP1000C, has recently become available. The
AP1000C starts at a configuration of 8 processor cells instead of the original
64. The housing has also been made more compact for this model, saving a factor
of 3 in space.
Measured Performances; In [#Ikesaka91#
Jack Dongarra
Sat Feb 10 15:12:38 EST 1996