Machine type | Distributed-memory multi-vectorprocessor |
---|---|
Models | Computing Surface 2 |
Operating system | Internal OS transparent to the user, Solaris (Sun's Unix variant) on the front-end system |
Connection structure | Multistage crossbar |
Compilers | Extended Fortran 77, ANSI C |
Vendor's information Web page | www.meiko.com/info/ |
Year of introduction | 1994 |
System parameters:
Model | Computing Surface 2 |
---|---|
Clock cycle | 20 ns |
Theor. peak performance | |
Per Proc. (64 bits) | 200 (vector PE), 40 (scalar PE) Mflop/s |
Maximal (64 bits) | 204.8 Gflop/s |
Main memory | |
Memory/node | 32--128, 32--512 MB |
Memory/maximal | <= 128 GB |
Communication bandwidth | |
Point-to-point, bi-directional | 100 MB/s (50 MB/s in each direction) |
No. of processors | 8--1024 PEs |
Remarks:
The CS-2 features 8 to 1,024 processor elements (PEs), which can be either scalar or vector nodes. Apart from a separate communications module, a PE contains either a SuperSparc processor or a SuperSparc plus 2 µVP vectorprocessors. At a 20 ns clock, the speed at 64-bit precision is estimated to be 40 Mflop/s for a scalar PE and 200 Mflop/s for a vector PE. The µVP modules are manufactured by Fujitsu and, unlike the earlier Fujitsu VP products, use the IEEE 754 floating-point format; their speed at 32-bit precision is double that at 64-bit precision. The memory has 16 banks, and to avoid memory bank conflicts the CS-2 has the interesting option of scrambled address allocation, thus guaranteeing good access at potentially problematic strides such as 2, 4, etc.
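The effect of strides on a 16-bank memory can be illustrated with a small sketch. The text does not describe Meiko's actual scrambling function, so `scrambled_bank` below is a hypothetical XOR-based mapping of the kind found in textbooks, used only to show the idea; `linear_bank` and `banks_touched` are likewise illustrative names.

```python
NUM_BANKS = 16  # the CS-2 memory has 16 banks (from the text)


def linear_bank(addr: int) -> int:
    """Plain interleaving: consecutive words go to consecutive banks."""
    return addr % NUM_BANKS


def scrambled_bank(addr: int) -> int:
    """Hypothetical scramble: XOR the low 4 address bits with the next 4.

    This is NOT Meiko's documented mapping, only an assumed example.
    """
    return (addr ^ (addr >> 4)) % NUM_BANKS


def banks_touched(bank_of, stride: int, n_accesses: int = NUM_BANKS) -> int:
    """Count the distinct banks hit by n_accesses words spaced `stride` apart."""
    return len({bank_of(i * stride) for i in range(n_accesses)})


if __name__ == "__main__":
    print("stride  linear  scrambled")
    for stride in (1, 2, 4, 8, 16):
        print(f"{stride:6d}  {banks_touched(linear_bank, stride):6d}"
              f"  {banks_touched(scrambled_bank, stride):9d}")
```

With plain interleaving, 16 consecutive accesses at stride 2 reach only 8 banks and at stride 16 only one, whereas this assumed scramble spreads each of the power-of-two strides shown over all 16 banks. A mapping like this moves the bad strides elsewhere rather than removing them: stride 17 collapses onto a single bank in the sketch.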
The point-to-point communication speed is 100 MB/s (50 MB/s in each direction). Because communication takes place through multi-level crossbars, called "layers" by Meiko, the aggregate bandwidth of the system scales with the number of PEs, with a very respectable latency of 200 ns per layer. As the maximum configuration of the machine contains 1,024 PEs, the theoretical peak performance at 64-bit precision is 204.8 Gflop/s (1,024 vector PEs at 200 Mflop/s each). Each PE can be connected to its own I/O devices, so that parallel I/O scales along with the other resources.
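The arithmetic behind these figures can be written out as a short sketch. The per-PE speeds, link bandwidth, and per-layer latency are taken from the text; the number of crossbar layers a message crosses is not given, so `n_layers` is a free parameter here, and simply adding the per-layer latencies is an assumption.

```python
VECTOR_PE_MFLOPS = 200        # 64-bit peak of a vector PE (from the text)
SCALAR_PE_MFLOPS = 40         # 64-bit peak of a scalar PE (from the text)
MAX_PES = 1024                # largest configuration (from the text)
LINK_MB_S_PER_DIRECTION = 50  # point-to-point bandwidth, each direction
LATENCY_NS_PER_LAYER = 200    # latency of one crossbar layer


def peak_gflops(n_pes: int, mflops_per_pe: float) -> float:
    """Theoretical peak in Gflop/s: number of PEs times per-PE speed."""
    return n_pes * mflops_per_pe / 1000.0


def bidirectional_mb_s() -> int:
    """Point-to-point bandwidth counting both directions."""
    return 2 * LINK_MB_S_PER_DIRECTION


def network_latency_ns(n_layers: int) -> int:
    """Assumed model: latencies of the traversed layers simply add up."""
    return n_layers * LATENCY_NS_PER_LAYER


if __name__ == "__main__":
    print("all-vector peak:", peak_gflops(MAX_PES, VECTOR_PE_MFLOPS), "Gflop/s")
    print("all-scalar peak:", peak_gflops(MAX_PES, SCALAR_PE_MFLOPS), "Gflop/s")
    print("link, both directions:", bidirectional_mb_s(), "MB/s")
```

For a full machine of 1,024 vector PEs this reproduces the 204.8 Gflop/s maximum listed in the system parameters.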
The Portland Group, which has won some renown for its excellent i860 compilers, has developed the compilers for the CS-2. These include Fortran 77 and ANSI C, and also Fortran 90. The current compiler already offers data distribution directives as proposed in [9].
In the USA the machine will be marketed by Meiko. In 1996 Meiko merged with Alenia, the firm that also markets the Alenia Quadrics. Although the new marketing policy has not been made clear, it may be assumed that Alenia will market the system in Europe and the rest of the world.
Measured Performances: In [2] a speed of 5.0 Gflop/s on a 64-processor CS-2 is reported for the solution of a dense linear system of order 18688. From the NAS parallel benchmarks [11], some results on a 128-processor machine are given for class B problems: EP took 21.16 seconds, while 6.52 seconds was measured for the MG problem.
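To put the linear-system result in perspective, the following sketch compares it with the theoretical peak. It assumes that all 64 processors were vector PEs at 200 Mflop/s and that the conventional operation count for a dense solve, 2n^3/3 + 2n^2, applies; neither assumption is stated in the text, and the function names are illustrative.

```python
VECTOR_PE_MFLOPS = 200   # 64-bit peak per vector PE (from the text)


def lu_solve_flops(n: int) -> float:
    """Conventional flop count for solving a dense system of order n."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def efficiency(measured_gflops: float, n_pes: int) -> float:
    """Fraction of theoretical peak, assuming every PE is a vector PE."""
    peak_gflops = n_pes * VECTOR_PE_MFLOPS / 1000.0
    return measured_gflops / peak_gflops


def implied_runtime_s(n: int, measured_gflops: float) -> float:
    """Run time implied by the operation count and the measured speed."""
    return lu_solve_flops(n) / (measured_gflops * 1e9)


if __name__ == "__main__":
    print(f"fraction of peak: {efficiency(5.0, 64):.2f}")
    print(f"implied run time: {implied_runtime_s(18688, 5.0):.0f} s")
```

Under these assumptions the reported 5.0 Gflop/s is somewhat under 40% of the 12.8 Gflop/s peak of 64 vector PEs, and the solve would have taken roughly a quarter of an hour.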