|Machine type||Processor array|
|Models||Quadrics Qx, QHx, x = 1, ..., 16|
|Front-end||Almost any Unix workstation|
|Operating system||Internal OS transparent to the user, Unix on front-end|
|Connection structure||3-D mesh (see remarks)|
|Compilers||TAO: a Fortran 77 compiler with some Fortran 90 and some proprietary array extensions|
|Model||Qx||QHx|
|Clock cycle||40 ns||40 ns|
|Theor. peak performance|
|Per Proc. (32-bits)||50 Mflop/s||50 Mflop/s|
|Maximal (32-bits)||6.4 Gflop/s||100 Gflop/s|
|Memory||<= 2 GB||<= 32 GB|
|No. of processors||8-128||128-2048|
|Bandwidth per proc.||50 MB/s||50 MB/s|
|Aggregate local bandwidth||<= 6 GB/s||<= 96 GB/s|
|Aggregate non-local bandwidth||<= 1.5 GB/s||<= 24 GB/s|
The Quadrics is a commercial spin-off of the APE-100 project of the Italian National Institute for Nuclear Physics. In the Q-model, systems are available in multiples of 8 processor nodes, and up to 16 boards can be fitted into one crate. In the QH-model, systems are available in multiples of 128 nodes, obtained by adding up to 15 crates to the minimal 1-crate system. The interconnection topology of the Quadrics is a 3-D grid with interconnections to the opposite sides (so, in effect, a 3-D torus). The 8-node floating-point boards (FPBs) are plugged into the crate backplane, which provides point-to-point communication and global control distribution. The FPBs are configured as cubes that are connected to the other boards in such a way that the 3-D grid structure is obtained.
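The wrap-around connections of such a 3-D torus can be illustrated with a small Python sketch. The function name `torus_neighbours` and the 8 x 4 x 4 grid shape used for a 128-node crate are assumptions made for illustration only; the text does not specify how the nodes of a crate are arranged along the three axes.

```python
def torus_neighbours(node, dims):
    """Return the six nearest neighbours of `node` on a 3-D torus.

    `node` is an (x, y, z) coordinate and `dims` the grid size along each
    axis. The modulo operation models the links to the opposite sides.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x - 1) % nx, y, z), ((x + 1) % nx, y, z),
        (x, (y - 1) % ny, z), (x, (y + 1) % ny, z),
        (x, y, (z - 1) % nz), (x, y, (z + 1) % nz),
    ]


# Assumed shape for a 128-node system (8 * 4 * 4 = 128); purely illustrative.
DIMS = (8, 4, 4)
corner_neighbours = torus_neighbours((0, 0, 0), DIMS)
```

A node on the edge of the grid, such as (0, 0, 0), still has six neighbours because each axis wraps around to the opposite side.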
The basic floating-point processor, the so-called MAD chip, contains a register file of 128 registers. The first two of these registers permanently hold the values 0 and 1, so that any addition or multiplication can be expressed as a "normal operation", i.e., a combined multiply-add operation $a \times b + c$, where an addition is of the form $a \times 1 + c$ and a multiplication is of the form $a \times b + 0$. In favourable circumstances the processor can therefore deliver two floating-point operations per cycle. Instructions are centrally issued by the controller at a rate of one instruction every two clock cycles.
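As a minimal sketch of this idea, the Python fragment below models a register file whose first two entries hold the constants 0 and 1 and derives addition and multiplication from a single multiply-add. All names (`normal_op`, `add`, `multiply`, `make_register_file`) are hypothetical and do not reflect the actual MAD instruction set. The last lines recompute the peak figures of the table from the 40 ns clock cycle, assuming two floating-point operations per cycle.

```python
ZERO, ONE = 0, 1  # register indices assumed to hold the constants 0.0 and 1.0


def make_register_file(size=128):
    """Create a register file with the two constant registers preset."""
    regs = [0.0] * size
    regs[ZERO] = 0.0
    regs[ONE] = 1.0
    return regs


def normal_op(regs, a, b, c):
    """Combined multiply-add on register indices: regs[a] * regs[b] + regs[c]."""
    return regs[a] * regs[b] + regs[c]


def add(regs, a, c):
    """Addition expressed as a * 1 + c."""
    return normal_op(regs, a, ONE, c)


def multiply(regs, a, b):
    """Multiplication expressed as a * b + 0."""
    return normal_op(regs, a, b, ZERO)


CLOCK_NS = 40        # clock cycle from the table
FLOPS_PER_CYCLE = 2  # one multiply plus one add per cycle, at best

# 1000 / CLOCK_NS is the clock frequency in MHz (25 MHz for 40 ns).
peak_mflops_per_proc = FLOPS_PER_CYCLE * 1000 / CLOCK_NS
peak_gflops_q = 128 * peak_mflops_per_proc / 1000    # largest Q model
peak_gflops_qh = 2048 * peak_mflops_per_proc / 1000  # largest QH model
```

With these assumptions the per-processor peak comes out at 50 Mflop/s, and the system peaks at 6.4 and 102.4 Gflop/s, the latter consistent with the rounded 100 Gflop/s in the table.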
Communication is controlled by the Memory Controller and the Communication Controller, which are both housed on the backplane of a crate. When the Memory Controller generates an address, it is decoded by the Communication Controller. If non-local access is required, the Communication Controller provides the necessary data transmission. The memory bandwidth per processor is 50 MB/s, which means that every 2 cycles an operand can be shipped into or out of a processor. The bandwidth for non-local communication turns out to be only four times smaller than that of local memory access.
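The bandwidth figures can be checked with a few lines of arithmetic. The sketch below assumes 32-bit (4-byte) operands and 1 MB = 10^6 bytes; the function names are illustrative only.

```python
CLOCK_NS = 40        # clock cycle from the table
BANDWIDTH_MB_S = 50  # memory bandwidth per processor
WORD_BYTES = 4       # assumed 32-bit operands
NONLOCAL_FACTOR = 4  # non-local bandwidth is four times smaller


def cycles_per_operand(bandwidth_mb_s=BANDWIDTH_MB_S,
                       word_bytes=WORD_BYTES,
                       clock_ns=CLOCK_NS):
    """Clock cycles needed to move one operand at the given bandwidth."""
    # 1 MB/s equals one byte per 1000 ns.
    ns_per_operand = word_bytes * 1000 / bandwidth_mb_s
    return ns_per_operand / clock_ns


def aggregate_gb_s(n_procs, bandwidth_mb_s=BANDWIDTH_MB_S):
    """Aggregate (local, non-local) bandwidth in GB/s for n_procs processors."""
    local = n_procs * bandwidth_mb_s / 1000
    return local, local / NONLOCAL_FACTOR


cycles = cycles_per_operand()
local_q, nonlocal_q = aggregate_gb_s(128)
```

Under these assumptions one operand takes 80 ns, i.e. 2 cycles, and a 128-processor system reaches 6.4 GB/s local and 1.6 GB/s non-local bandwidth, close to the rounded upper bounds listed in the table.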
The Quadrics communicates with the front-end system via a T805 transputer-based interface system, called the Local Asynchronous Interface (LAI). The interface can write and read the memories of the nodes and the Controller. Presently, the bandwidth of the interface to the front-end processor is not very large (1 MB/s). It is expected that this can be improved by about a factor of 30 in the near future. I/O has to be performed via the front-end system and will therefore be relatively slow.
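To give a feeling for what the interface bandwidth implies, the following sketch estimates how long it would take to move data through the front-end. It assumes 1 MB = 10^6 bytes and ignores any protocol overhead; `transfer_seconds` is a hypothetical helper, and the 30 MB/s figure is simply the quoted 1 MB/s multiplied by the expected factor of 30.

```python
CURRENT_MB_S = 1.0                 # interface bandwidth quoted in the text
EXPECTED_MB_S = 30 * CURRENT_MB_S  # after the expected improvement


def transfer_seconds(size_mb, bandwidth_mb_s):
    """Idealised transfer time in seconds, without any overhead."""
    return size_mb / bandwidth_mb_s


# Filling the full 2 GB memory of the largest Q model through the interface.
q_memory_mb = 2000
now_s = transfer_seconds(q_memory_mb, CURRENT_MB_S)
later_s = transfer_seconds(q_memory_mb, EXPECTED_MB_S)
```

At 1 MB/s, loading 2 GB takes about 2000 s (roughly half an hour); a thirtyfold improvement would bring this down to a little over a minute.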
The TAO language has several extensions to exploit the SIMD features of the Quadrics. Firstly, floating-point variables are assumed to be local to the processor that owns them, while integer variables are assumed to be global. Local variables can be promoted to global variables. Other extensions are the ANY, ALL, and WHERE/END WHERE keywords, which can be used for global testing and control. Processors that do not meet a global condition effectively skip the operation(s) associated with it. For easy reference to nearest-neighbour locations, the special constants LEFT, RIGHT, UP, DOWN, FRONT, and BACK are provided. In addition, new data types and operators on these data types are supported, together with operator overloading. This enables very concise code for certain types of calculations.
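The masking behaviour of WHERE/END WHERE and the global tests ANY and ALL can be emulated in plain Python. This is not TAO syntax; it is only a sketch of the semantics described above, with one list element standing for the local value on each processor and with the hypothetical names `where`, `any_true`, and `all_true`.

```python
def where(mask, op, values):
    """Apply `op` only where the mask holds; other processors skip it."""
    return [op(v) if m else v for m, v in zip(mask, values)]


def any_true(mask):
    """Global test: does at least one processor meet the condition?"""
    return any(mask)


def all_true(mask):
    """Global test: do all processors meet the condition?"""
    return all(mask)


# One local floating-point value per processor (four processors here).
local_x = [-1.0, 2.0, -3.0, 4.0]

# Each processor evaluates the condition on its own data.
negative = [x < 0.0 for x in local_x]

# Only processors with a negative value flip the sign; the rest are idle.
abs_x = where(negative, lambda x: -x, local_x)
```

Every processor steps through the same instruction stream; the mask only decides whether the result of the operation is actually applied, which is the essence of SIMD control flow.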