
The Meiko Computing Surface 2.

Machine type: Distributed-memory multi-vectorprocessor
Models: Computing Surface 2
Operating system: Internal OS transparent to the user, SunOS (Sun's Unix variant) on the front-end system
Connection structure: Multistage crossbar
Compilers: Extended Fortran 77, ANSI C
Vendor information: Web page http://www.meiko.com/

System parameters:

Model: Computing Surface 2
Clock cycle: 20 ns
Theor. peak performance:
  Per proc. (64 bits): 200, 40 Mflop/s
  Maximal (64 bits): 204.8 Gflop/s
Main memory: <= 128 GB
Memory/node: 32-128, 32-512 MB
Communication bandwidth: --
No. of processors: 8-1,024 PEs

Remarks:

The CS-2 features 8-1,024 processor elements (PEs), which can be either scalar or vector nodes. Apart from a separate communications module, a PE contains either a SuperSparc processor or a SuperSparc plus two μVP vector processors. The speed of a scalar PE is estimated at 40 Mflop/s (at a 20 ns clock) and that of a vector PE at 200 Mflop/s, both at 64-bit precision. The μVP modules are manufactured by Fujitsu. Their speed at 32-bit precision is double that at 64-bit precision and, unlike the earlier Fujitsu VP products, they use the IEEE 754 floating-point format. The memory has 16 banks and, to avoid memory bank conflicts, the CS-2 has the interesting option of scrambled address allocation, thus guaranteeing good access at potentially problematic strides such as 2, 4, etc.
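To illustrate why strides of 2, 4, etc. are problematic with 16 banks and how scrambling can help, here is a minimal sketch. Only the bank count comes from the text. The plain low-order interleaving and the XOR-folding `scrambled_bank` function are assumptions chosen for illustration; the actual scrambling scheme used in the CS-2 is not described here and may well differ.

```python
from collections import Counter

NUM_BANKS = 16  # from the text: the CS-2 memory has 16 banks


def interleaved_bank(addr: int) -> int:
    """Plain low-order interleaving: word address modulo the bank count."""
    return addr % NUM_BANKS


def scrambled_bank(addr: int) -> int:
    """Hypothetical scrambling: XOR together all 4-bit fields of the address.

    This is an illustrative hash, not Meiko's documented scheme.
    """
    bank = 0
    while addr:
        bank ^= addr & (NUM_BANKS - 1)
        addr >>= 4
    return bank


def max_bank_load(bank_fn, stride: int, accesses: int = 256) -> int:
    """Largest number of accesses that hit a single bank for a strided sweep."""
    counts = Counter(bank_fn(i * stride) for i in range(accesses))
    return max(counts.values())


if __name__ == "__main__":
    # An even spread of 256 accesses over 16 banks would give a load of 16.
    for stride in (1, 2, 4, 8, 16):
        print(stride,
              max_bank_load(interleaved_bank, stride),
              max_bank_load(scrambled_bank, stride))
```

With plain interleaving, power-of-two strides concentrate the accesses on ever fewer banks, whereas the folded mapping spreads these particular strides evenly. A simple fold like this one has unfavourable strides of its own (the first 16 accesses at stride 17 all map to bank 0), so it should be read as a demonstration of the idea rather than as a recommended mapping.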

The point-to-point communication speed is 100 MB/s (50 MB/s in each direction). Because communication takes place through multi-level crossbars, called "layers" by Meiko, the aggregate bandwidth of the system scales with the number of PEs, with a very respectable latency of 200 ns per layer. As the maximum configuration of the machine contains 1,024 PEs, the theoretical peak performance at 64-bit precision is 204.8 Gflop/s (1,024 vector PEs at 200 Mflop/s each). It is possible to connect each PE to its own I/O devices, so that parallel I/O scales along with the other resources.
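The arithmetic behind these figures can be sketched as follows. The per-PE speeds and the 200 ns per-layer latency are taken from the text. The switch `radix` is a hypothetical parameter: the number of layers a given configuration needs is not stated here, so the layer count below is only an estimate under that assumption, and each layer is counted once.

```python
VECTOR_PE_MFLOPS = 200   # from the text, 64-bit precision
SCALAR_PE_MFLOPS = 40    # from the text, 64-bit precision
LAYER_LATENCY_NS = 200   # from the text, latency per crossbar layer


def peak_gflops(n_vector: int, n_scalar: int = 0) -> float:
    """Theoretical 64-bit peak of a mix of vector and scalar PEs, in Gflop/s."""
    return (n_vector * VECTOR_PE_MFLOPS + n_scalar * SCALAR_PE_MFLOPS) / 1000


def layers_needed(n_pes: int, radix: int) -> int:
    """Layers required to reach n_pes if each layer multiplies reach by radix.

    `radix` is an assumed switch fan-out, not a documented CS-2 value.
    """
    layers, reach = 1, radix
    while reach < n_pes:
        reach *= radix
        layers += 1
    return layers


def switch_latency_ns(n_pes: int, radix: int) -> int:
    """Estimated switch latency: one 200 ns hop per layer."""
    return layers_needed(n_pes, radix) * LAYER_LATENCY_NS


if __name__ == "__main__":
    print(peak_gflops(1024))                 # all-vector maximum configuration
    print(peak_gflops(0, 1024))              # all-scalar configuration
    print(switch_latency_ns(1024, radix=4))  # radix 4 is purely illustrative
```

For the full 1,024-PE vector configuration `peak_gflops` reproduces the 204.8 Gflop/s listed under the system parameters.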

The Portland Group, which has won some renown for its excellent i860 compilers, has developed the compilers for the CS-2. These include Fortran 77 and ANSI C, but also Fortran 90. The current compiler already offers data distribution directives as proposed in [#HPFspec##1#].

In the USA the machine will be marketed by Meiko; in Europe and the rest of the world, however, marketing is done by Parallel Computing Industries, a consortium of Meiko, Parsys, and Telmat.

Measured Performances: In [#linpackbm##1#] a speed of 5.0 Gflop/s on a 64-processor CS-2 is reported for the solution of a dense linear system of order 18,688. From the NAS parallel benchmarks some results on a 128-processor machine are given for class B problems: EP took 21.16 seconds, while 6.52 seconds was measured for the MG problem.
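To put the Linpack figure in perspective, the following sketch relates it to the theoretical peak. It assumes that all 64 processors were vector PEs at 200 Mflop/s each, which the report quoted above does not state here, and it uses the conventional Linpack operation count of 2/3 n^3 + 2 n^2 to estimate the implied run time.

```python
VECTOR_PE_MFLOPS = 200  # per-PE 64-bit peak from the text


def linpack_flops(n: int) -> float:
    """Conventional Linpack operation count for a dense system of order n."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def efficiency(measured_gflops: float, n_pes: int) -> float:
    """Measured speed as a fraction of peak, ASSUMING all PEs are vector PEs."""
    peak_gflops = n_pes * VECTOR_PE_MFLOPS / 1000
    return measured_gflops / peak_gflops


def implied_seconds(n: int, measured_gflops: float) -> float:
    """Run time implied by the operation count and the reported speed."""
    return linpack_flops(n) / (measured_gflops * 1e9)


if __name__ == "__main__":
    print(efficiency(5.0, 64))           # fraction of the assumed 12.8 Gflop/s peak
    print(implied_seconds(18688, 5.0))   # estimated solution time in seconds
```

Under these assumptions the reported 5.0 Gflop/s corresponds to roughly 39% of peak. If some of the 64 PEs were scalar nodes, the peak would be lower and the efficiency correspondingly higher.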






Jack Dongarra
Sat Feb 10 15:12:38 EST 1996