Machine type | RISC-based distributed-memory multi-processor |
---|---|
Models | Avalon A12 |
Operating system | AVALON micro kernel based Unix (Image compatible with Digital Unix) |
Connection structure | Multistage variable (see remarks) |
Compilers | Fortran 77, Fortran 90 extensions, HPF, ANSI C |
Vendor's information Web page | www.teraflop.com/ |
Year of introduction | 1996. |
System parameters:
Model | A12 |
---|---|
Clock cycle | 2.5 ns |
Theor. peak performance | |
Per proc. (64-bit) | 800 Mflop/s |
Maximal (64-bit) | 1.3 Tflop/s |
Memory/node | <= 1 GB |
Memory (maximal) | 1.7 TB |
Communication bandwidth | |
Point-to-point | 128-400 MB/s |
Bisectional (maximal) | 10 GB/s |
No. of processors | 12-1680 |
Remarks:
The A12 is based on the DEC Alpha 21164 RISC processor. The processor used in the system has a clock cycle of 2.5 ns. However, most of the information given at the vendor's Web page still uses the data for a node processor with a 3.3 ns clock to describe the configuration properties. The Web information is therefore somewhat inconsistent internally. Because the Alpha 21164 has dual floating-point arithmetic pipes, it delivers a theoretical peak performance of 800 Mflop/s at a 2.5 ns clock cycle. The maximum configuration of the system is given as 1680 processors. In addition to the first and second level caches that reside on chip, a 1 MB third level cache is provided on each A12 CPU card. The bandwidth to/from the first level cache is sufficient to transport two operands to the CPU and to ship one result back in one cycle. The second level cache has two-thirds of this bandwidth, while the third level cache can provide a 64-bit word every two cycles. The bandwidth to/from memory is 400 MB/s, or one 64-bit word every 8 cycles. The memory is organised in two-way interleaved banks and can be up to 1 GB/node.
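The peak and bandwidth figures quoted above follow from the clock cycle by simple arithmetic. The sketch below reproduces them; it is only an illustration, not vendor code. The function names are hypothetical, and two readings of the text are assumptions: that each of the two floating-point pipes delivers one result per cycle, and that "two operands in, one result out" means three 64-bit words per cycle for the first level cache.

```python
# Figures taken from the text and the parameter table; nothing here is measured.
CLOCK_CYCLE_NS = 2.5        # clock cycle of the node processor
FLOPS_PER_CYCLE = 2         # assumption: one result per cycle from each of the two FP pipes
WORD_BYTES = 8              # one 64-bit word
MAX_PROCESSORS = 1680
MAX_MEMORY_PER_NODE_GB = 1


def peak_mflops(cycle_ns=CLOCK_CYCLE_NS):
    """Theoretical peak per processor in Mflop/s for a given clock cycle."""
    return FLOPS_PER_CYCLE * 1000.0 / cycle_ns


def bandwidth_mb_s(words_per_cycle, cycle_ns=CLOCK_CYCLE_NS):
    """Bandwidth in MB/s when `words_per_cycle` 64-bit words move per cycle."""
    return words_per_cycle * WORD_BYTES * 1000.0 / cycle_ns


if __name__ == "__main__":
    print("peak per processor :", peak_mflops(), "Mflop/s")             # 800.0
    print("peak, full system  :", MAX_PROCESSORS * peak_mflops() / 1e6, "Tflop/s")
    print("memory, full system:", MAX_PROCESSORS * MAX_MEMORY_PER_NODE_GB / 1000.0, "TB")
    print("memory bandwidth   :", bandwidth_mb_s(1 / 8), "MB/s")        # 400.0
    print("3rd level cache    :", bandwidth_mb_s(1 / 2), "MB/s")
    # Assumption: 3 words per cycle for the first level cache, two-thirds of that for the second.
    l1 = bandwidth_mb_s(3)
    print("1st level cache    :", l1, "MB/s")
    print("2nd level cache    :", 2 * l1 / 3, "MB/s")
```

Calling `peak_mflops(3.3)` gives roughly 606 Mflop/s, which shows how far configuration data based on the 3.3 ns clock would differ from the 800 Mflop/s in the table.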
Each CPU card contains an Alpha 21164 processor, the third level or B cache, and the local memory for that node. Twelve CPU cards can be housed in one crate, which has a full crossbar backplane. This yields an internode bandwidth of slightly under 400 MB/s between the cards within one crate. Apart from the 12 slots for CPU cards, there are two extra dual channel slots that can accommodate communication cards that provide the connections with other crates. For the in-crate crossbar CMOS technology is used. For the intercrate connections, however, ECL logic is employed. The actual connections between crates are made by coaxial cables. This way of connecting gives great flexibility in the overall interconnection topology: one could build trees or toruses or a secondary level crossbar (in the last case one crate should be filled entirely with communication cards to build a 144 processor system). The communication speed between crates is lower (but still respectable): 128 MB/s. Various configurations are described at the Web address given above.
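As a rough illustration of the crate structure described above, the following sketch computes how many crates a configuration needs and which point-to-point bandwidth applies between two processors. It is a simplification under stated assumptions: processors are numbered consecutively and crates are filled one after another (a hypothetical numbering scheme), the in-crate figure is rounded up to 400 MB/s although the text says "slightly under", and crates holding only communication cards are not counted.

```python
CPUS_PER_CRATE = 12       # CPU card slots per crate, from the text
IN_CRATE_MB_S = 400       # upper bound; the text says "slightly under 400 MB/s"
INTER_CRATE_MB_S = 128    # bandwidth between crates, from the text


def crates_needed(n_processors):
    """Number of CPU crates for n_processors (ceiling division)."""
    return -(-n_processors // CPUS_PER_CRATE)


def crate_of(cpu_index):
    """Crate that holds a CPU, assuming consecutive numbering from 0."""
    return cpu_index // CPUS_PER_CRATE


def point_to_point_mb_s(cpu_a, cpu_b):
    """Nominal bandwidth between two CPUs: in-crate crossbar or inter-crate link."""
    if crate_of(cpu_a) == crate_of(cpu_b):
        return IN_CRATE_MB_S
    return INTER_CRATE_MB_S


if __name__ == "__main__":
    print(crates_needed(144))           # 12 CPU crates for the 144 processor example
    print(crates_needed(1680))          # 140 CPU crates for the maximum configuration
    print(point_to_point_mb_s(0, 11))   # same crate: 400
    print(point_to_point_mb_s(11, 12))  # neighbouring crates: 128
```

This accounts for the 128-400 MB/s point-to-point range in the parameter table: the upper value applies within a crate, the lower one between crates.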
I/O can be configured in various ways. It is possible to put 32-bit or 64-bit PCI expansion cards on each CPU card to obtain what Avalon calls "Type 1 I/O nodes". Also, a direct switch connection to the outside world can be made via a variant of the communication card. Depending on the number of cards, the bandwidth is 400 or 800 MB/s for this Type 3 I/O node. The Type 2 I/O node is in fact a dedicated TCP/IP connection, as required for the control workstation of the system.
Measured Performances: An A12 was said to be sold by the end of 1996, although Avalon was not willing to reveal its customer, nor were any performance figures available.