Machine type: Shared-memory multi-processor.
Models: PowerChallenge XL Array.
Operating system: IRIX (SGI's Unix variant).
Compilers: Fortran 77, C, C++, Pascal.
System parameters:
Model | Model XL Array |
Clock cycle | 13.3 or 11.1 ns |
Theor. peak performance: | |
Per proc. (64-bit) | 0.30 or 0.36 Gflop/s |
Maximal (64-bit) | -- Gflop/s |
Main memory | -- GB |
Memory bandwidth: | |
Proc. to cache/proc. | 1.2 GB/s |
Main memory/cache | 1.2 GB/s |
Communication bandwidth | 100 MB/s |
No. of processors | -- |
Performance:
Measured r_max | 26.7 Gflop/s |
r_peak (64-bit) | 46.1 Gflop/s |
Note: The value of r_max is obtained with 128 processors.
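The figures in the tables can be cross-checked with a little arithmetic. The sketch below assumes that an R8000 completes 4 floating-point operations per clock cycle (an assumption not stated in the tables, although it reproduces the listed per-processor peaks) and that the 46.1 Gflop/s entry is the aggregate peak of 128 processors at the faster clock. The function names are illustrative only.

```c
#include <assert.h>

/* Peak speed of one processor in Gflop/s.
 * clock_ns: clock cycle time in nanoseconds.
 * flops_per_cycle: results per cycle (ASSUMPTION: 4 for the R8000).
 * One flop per nanosecond equals 1 Gflop/s, so no unit scaling is needed. */
double peak_gflops_per_proc(double clock_ns, int flops_per_cycle)
{
    assert(clock_ns > 0.0 && flops_per_cycle > 0);
    return (double)flops_per_cycle / clock_ns;
}

/* Aggregate peak of nproc identical processors in Gflop/s. */
double aggregate_peak_gflops(double per_proc_gflops, int nproc)
{
    assert(per_proc_gflops > 0.0 && nproc > 0);
    return per_proc_gflops * (double)nproc;
}

/* Fraction of the peak that a measured speed represents. */
double efficiency(double measured_gflops, double peak_gflops)
{
    assert(peak_gflops > 0.0);
    return measured_gflops / peak_gflops;
}
```

Under these assumptions, `peak_gflops_per_proc(11.1, 4)` gives about 0.36 Gflop/s and `peak_gflops_per_proc(13.3, 4)` about 0.30 Gflop/s. `aggregate_peak_gflops(0.36, 128)` gives 46.08 Gflop/s, which matches the 46.1 Gflop/s entry, and `efficiency(26.7, 46.1)` is roughly 0.58.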
The PowerChallenge XL system (see [3]) is used as a computational ``node'' in the PowerChallenge Array system. An XL system can house up to 18 MIPS R8000 RISC processors, each with a peak speed of 300 or 360 Mflop/s, depending on the clock cycle. Internally, data is transported from the main memory to the CPUs by the so-called POWERpath-2 bus, which is 256 bits wide and has a bandwidth of 1.2 GB/s. This is very fast as buses go, but even so the data rates required by the CPUs could not possibly be met without special provisions. These provisions take the form of large data and instruction caches for each of the CPUs. The off-chip cache for floating-point data is very large, 16 MB, and is meant to reduce the bus traffic as much as possible. All floating-point operations are done by streaming the operands from this large off-chip cache to the floating-point registers.
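A rough calculation shows why the bus alone cannot feed the processors. The sketch below is a back-of-the-envelope model, not a description of the actual bus arbitration: it assumes that all 18 CPUs stream data at the same time, that the 1.2 GB/s (taken as 1.2e9 bytes/s) is divided evenly among them, and that operands are 8-byte words. The function names are hypothetical.

```c
#include <assert.h>

/* Bus bandwidth available to each CPU, in 64-bit words per second,
 * ASSUMING the bus is shared evenly by ncpu processors that all
 * stream data at the same time. */
double words_per_sec_per_cpu(double bus_bytes_per_sec, int ncpu)
{
    assert(bus_bytes_per_sec > 0.0 && ncpu > 0);
    return bus_bytes_per_sec / (double)ncpu / 8.0;  /* 8 bytes per word */
}

/* Number of floating-point operations a CPU must perform on every word
 * fetched over the bus in order to keep running at its peak speed. */
double flops_needed_per_word(double peak_flops_per_sec, double words_per_sec)
{
    assert(peak_flops_per_sec > 0.0 && words_per_sec > 0.0);
    return peak_flops_per_sec / words_per_sec;
}
```

With `words_per_sec_per_cpu(1.2e9, 18)` each CPU gets about 8.3 million words per second, and `flops_needed_per_word(360e6, ...)` then comes out at about 43: under this model, every operand fetched from main memory would have to be reused in some 43 operations to sustain the peak speed, which is the kind of reuse the large caches are meant to provide.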
In the PowerChallenge Array the XL systems are coupled via a HiPPI channel to form a cluster for the solution of extremely large application problems, with PVM used for communication between the systems. SGI provides a ``shared-memory'' PVM which makes the message passing model homogeneous for the user both within a PowerChallenge XL and between the ``nodes''. The same trend can be seen with other vendors (e.g. Convex SPP1200, Fujitsu VPP300, and NEC SX-4).
Parallelisation is done either automatically by the (Fortran or C) compiler or explicitly by the user, mainly through the use of directives. Because synchronisation and similar coordination have to be done via memory, the parallelisation overhead is fairly large.
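To give an impression of what directive-based parallelisation looks like, the sketch below parallelises a dot product with a single directive. It uses OpenMP as a modern stand-in; the directive syntax of the SGI compilers themselves is not reproduced here, and the function name is illustrative. The reduction clause makes the compiler generate the combining step at the end of the loop, which is one place where the synchronisation cost mentioned above would be paid.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of x and y, both of length n.
 * The directive asks the compiler to split the loop iterations over the
 * available threads; each thread accumulates a private partial sum and
 * the partial sums are combined when the loop ends.
 * A compiler without OpenMP support ignores the directive and runs the
 * loop sequentially, giving the same result. */
double dot_product(const double *x, const double *y, int n)
{
    assert(n >= 0 && (n == 0 || (x != NULL && y != NULL)));
    double sum = 0.0;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

For short loops the cost of starting the threads and combining the partial sums can outweigh the gain, so a directive like this pays off only when `n` is large.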