
The Matsushita ADENART.

Machine type: RISC-based distributed-memory multi-processor
Models: ADENART64, ADENART256
Operating system: internal OS transparent to the user; SunOS (Sun's Unix variant) on the front-end system
Connection structure: HX-net (see remarks)
Compilers: ADETRAN, an extended Fortran 77

System parameters:

Model                        ADENART64      ADENART256
Clock cycle                  50 ns          50 ns
Theor. peak performance:
  Per proc. (64-bit)         10 Mflop/s     10 Mflop/s
  Maximal (64-bit)           0.64 Gflop/s   2.56 Gflop/s
Main memory                  0.5 GB         0.5 GB
Memory/node                  8 MB           2 MB
Communication bandwidth      20 MB/s        20 MB/s
No. of processors            64             256

Remarks:

The ADENART has an interesting interconnection structure that lies somewhere halfway between a crossbar and a grid. The processors are organised in planes; within each plane all processors are connected by a crossbar. Between planes there is a connection structure that links each crossbar node in a plane directly to its corresponding counterpart on all other planes. So, for a processor (i,j) in plane j, data that are required by processor (k,j) in the same plane can be transported by simply shifting them through the in-plane crossbar, which is accomplished in one step. For processors in different planes the number of steps is at most two: in the first step the data are routed to the appropriate crossbar node within their own plane, and after being sent to the plane where the target processor resides, they are sent from the corresponding crossbar node to the processor that requires them. Matsushita calls this connection structure the HX-net. Because of the connection structure the number of processors is constrained to be of the form $4^n$; in the two models presently available n is 3 or 4 (a machine with 1,024 processors, n = 5, is being considered). As remarked, the complexity of the network is lower than that of a crossbar, $O(p\sqrt{p})$ instead of $O(p^2)$ for p processors, while the efficiency is half that of a crossbar: a maximum of two steps instead of one.
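The two-step routing rule is easy to make concrete. The sketch below is only an illustration of the scheme as described above (the function name hx_route and the labelling of processors as (node, plane) pairs are assumptions made for this example; it is in no way Matsushita's actual routing code):

#include <stdio.h>

/* Sketch of routing on an HX-net-style topology, following the textual
 * description only. A processor is labelled (i, j): crossbar node i
 * within plane j. */
static int hx_route(int si, int sj, int di, int dj)
{
    int steps = 0;
    if (si != di) {
        /* Step 1: shift through the in-plane crossbar of plane sj to
         * the crossbar node that lines up with the destination. */
        printf("in-plane:    (%d,%d) -> (%d,%d)\n", si, sj, di, sj);
        steps++;
    }
    if (sj != dj) {
        /* Step 2: use the direct link between corresponding crossbar
         * nodes to hop from plane sj to plane dj. */
        printf("inter-plane: (%d,%d) -> (%d,%d)\n", di, sj, di, dj);
        steps++;
    }
    return steps;  /* 0, 1, or at most 2 steps */
}

int main(void)
{
    /* ADENART64: 4^3 = 64 processors, viewed as 8 planes of 8 nodes. */
    printf("steps: %d\n\n", hx_route(3, 1, 5, 1)); /* same plane: 1 step  */
    printf("steps: %d\n",   hx_route(3, 1, 5, 6)); /* other plane: 2 steps */
    return 0;
}

Note that when source and destination share a plane, or happen to sit at corresponding crossbar nodes of different planes, a single step suffices; two steps are needed only when both coordinates differ.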

The processors are built around a proprietary RISC processor with a peak speed of 20 Mflop/s in perfect pipeline mode; however, Matsushita quotes a ``sustained speed'' of 10 Mflop/s, and this figure is used to arrive at the peak performance given in the system parameters list above. The inter-processor bandwidth is 20 MB/s, i.e., 2 bytes per flop at the quoted sustained speed, which is quite reasonable with respect to the processor speed. Nothing is known at this moment about the message setup overhead, however. Curiously enough, the amount of memory per node is 4 times larger for the ADENART64 than for the 256-processor model (8 MB against 2 MB per node). The latter memory size seems fairly small for a processor node that is meant to process large amounts of data. The front-end machine that hosts the ADENART is a Solbourne (Sun-4 compatible) workstation.
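As a quick consistency check (simple arithmetic on the figures quoted above), the aggregate numbers in the system parameters list follow directly from the per-node ones: $64 \times 10~\mathrm{Mflop/s} = 0.64~\mathrm{Gflop/s}$ and $256 \times 10~\mathrm{Mflop/s} = 2.56~\mathrm{Gflop/s}$ for the peak performance, and $64 \times 8~\mathrm{MB} = 256 \times 2~\mathrm{MB} = 512~\mathrm{MB} = 0.5~\mathrm{GB}$ for the main memory.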

Measured Performances: In [kadota] a speed of 475 Mflop/s was reported for an ADENART256 on a PDE solver using a splitting-up Conjugate Gradient algorithm. Results for some Livermore kernels were also given, of which the highest reported speed was 520.1 Mflop/s. The article contains some complaints about the rigidity of existing benchmark codes, which is said to put massively parallel computers at a disadvantage; it could of course also be argued that massively parallel machines are too rigid to run general codes well. In [nasbm] some class A results for the ADENART256 are quoted: the EP, FT, IS, SP, and BT times are 32.9, 72.7, 46.6, 209.9, and 314.1 seconds, respectively.





