Machine type | RISC-based distributed-memory multi-processor
---|---
Models | ADENART64, ADENART256
Operating system | Internal OS transparent to the user; SunOS (Sun's Unix variant) on the front-end system
Connection structure | HX-net (see remarks)
Compilers | ADETRAN, an extended Fortran 77
System parameters:
Model | ADENART64 | ADENART256
---|---|---
Clock cycle | 50 ns | 50 ns
Theor. peak performance | |
Per proc. (64 bits) | 10 Mflop/s | 10 Mflop/s
Maximal (64 bits) | 0.64 Gflop/s | 2.56 Gflop/s
Main memory | 0.5 GB | 0.5 GB
Memory/node | 8 MB | 2 MB
Communication bandwidth | 20 MB/s | 20 MB/s
No. of processors | 64 | 256
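The aggregate figures in the table follow directly from the per-node values; a minimal sketch (names are illustrative, not from Matsushita):

```python
# Derive each model's aggregate figures from the quoted per-node values.
models = {
    "ADENART64":  {"procs": 64,  "mflops_per_proc": 10, "mem_per_node_mb": 8},
    "ADENART256": {"procs": 256, "mflops_per_proc": 10, "mem_per_node_mb": 2},
}

for name, m in models.items():
    peak_gflops = m["procs"] * m["mflops_per_proc"] / 1000   # Gflop/s
    total_mem_gb = m["procs"] * m["mem_per_node_mb"] / 1024  # GB
    print(f"{name}: peak {peak_gflops} Gflop/s, memory {total_mem_gb} GB")
# -> ADENART64: peak 0.64 Gflop/s, memory 0.5 GB
# -> ADENART256: peak 2.56 Gflop/s, memory 0.5 GB
```

Note that both models end up with the same 0.5 GB total memory: the 256-processor model has four times as many nodes but a quarter of the memory per node.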
Remarks:
The ADENART has an interesting interconnection structure that lies somewhere between a crossbar and a grid. The processors are organised in planes, and within each plane all processors are connected by a crossbar. Between planes there is a connection structure that links each crossbar node in a plane directly with its corresponding counterpart in all other planes. So, for a processor (i,j) in a plane, data required by processor (k,j) in the same plane can be transported by simply shifting them through the in-plane crossbar, which is accomplished in one step. For processors in different planes the number of steps is at most two: in the first step the data are routed to the right crossbar node within their plane, and after being sent to the plane where the target processor resides, they are sent from the corresponding crossbar node to the processor that requires them. Matsushita calls this connection structure the HX-net. Because of the connection structure the number of processors is constrained to be of the form 2^(2n); in the two models presently available n is 3 or 4 (a machine with 1024 processors, n=5, is being considered). As remarked, the complexity of the network is lower than that of a crossbar: O(p^(3/2)) instead of O(p^2), with p the number of processors, while the efficiency is half that of a crossbar: a maximum of two steps instead of one.
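The two-step routing described above can be sketched as follows. This is an illustrative model only, assuming processors are labelled (plane, position) with the inter-plane links joining corresponding crossbar nodes; the function name and labelling scheme are not from Matsushita's documentation:

```python
# Hypothetical sketch of HX-net routing: nodes are (plane, position) pairs.
def hx_route(src, dst):
    """Return the sequence of hops moving data from src to dst."""
    if src == dst:
        return []
    s_plane, s_pos = src
    d_plane, d_pos = dst
    if s_plane == d_plane:
        # Same plane: one shift through the in-plane crossbar.
        return [("crossbar", src, dst)]
    # Different planes: first shift to the crossbar node matching the
    # target's position, then cross to the target plane via the direct
    # inter-plane link and deliver to the target processor.
    via = (s_plane, d_pos)
    return [("crossbar", src, via), ("inter-plane", via, dst)]

print(len(hx_route((0, 3), (0, 5))))  # same plane: 1 step
print(len(hx_route((0, 3), (2, 5))))  # different planes: 2 steps
```

The p^(3/2) complexity can be seen from this layout: 2^n planes, each containing a 2^n x 2^n crossbar, give 2^n * 2^(2n) = 2^(3n) = p^(3/2) crosspoints for p = 2^(2n) processors.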
Each node contains a proprietary RISC processor with a peak speed of 20 Mflop/s in perfect pipeline mode; however, Matsushita quotes a "sustained speed" of 10 Mflop/s to arrive at the peak performance given in the system parameters list above. The inter-processor bandwidth is 20 MB/s, which is quite reasonable with respect to the processor speed; at this moment, however, nothing is known about the message setup overhead. Curiously enough, the amount of memory per node is four times larger for the ADENART64 than for the 256-processor model (8 MB against 2 MB per node). The latter memory size seems fairly small for a processor node that is meant to process large amounts of data. The front-end machine that hosts the ADENART is a Solbourne (Sun-4 compatible) workstation.
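To put "quite reasonable" in perspective, a back-of-the-envelope balance ratio (my calculation, not a figure from Matsushita) relates the link bandwidth to the sustained node speed:

```python
# Communication/computation balance per node, using the quoted figures.
bandwidth_bytes = 20e6       # 20 MB/s per inter-processor link
flops = 10e6                 # 10 Mflop/s sustained per processor
words = bandwidth_bytes / 8  # 64-bit words transferable per second
print(words / flops)         # -> 0.25 words communicated per flop
```

That is, a node can communicate one 64-bit operand for every four floating-point operations it performs, ignoring any message setup overhead.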
Measured performances: In [13] a speed of 475 Mflop/s was reported for a PDE solver using a splitting-up Conjugate Gradient algorithm on an ADENART256. Results for some Livermore kernels were also given, of which the highest reported speed was 520.1 Mflop/s. The article contains some complaints about the rigidity of existing benchmark codes, which is said to be a disadvantage for massively parallel computers. It could of course also be argued that massively parallel machines are too rigid to run general codes well. In [14] some class A results of the NAS Parallel Benchmarks for the ADENART256 are quoted: the EP, FT, IS, SP, and BT times are 32.9, 72.7, 46.6, 209.9, and 314.1 seconds, respectively.