Machine type | RISC-based distributed-memory multi-processor cluster |
---|---|
Models | IBM eServer p690. |
Operating system | AIX (IBM's Unix variant), Linux. |
Connection structure | Ω-switch |
Compilers | XL Fortran (Fortran 90), (HPF), XL C, C++ |
Vendors information Web page | www-1.ibm.com/servers/eserver/pseries/hardware/highend/p690.html |
Year of introduction | 2002 (16/32-CPU POWER4+ SMP). |
System parameters:
Model | eServer p690 |
---|---|
Clock cycle | 1.9 GHz |
Theor. peak performance | |
Per Proc. (64-bits) | 7.6 Gflop/s |
Per 16-proc. HPC node | 121.6 Gflop/s |
Per 32-proc. Turbo node | 243.2 Gflop/s |
Maximal | 124.5 Tflop/s |
Main memory | |
Memory/node | ≤ 1 TB |
Memory/maximal | 512 TB |
No. of processors | 8–16,384 |
Communication bandwidth | |
Node-to-node (bidirectional) | 2 GB/s |
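The peak entries in the table follow from the clock frequency times the number of floating-point operations per cycle. A minimal sketch, assuming 4 flop/cycle per processor (two fused multiply-add units, which is what 7.6 Gflop/s at 1.9 GHz implies); the function name `peak_gflops` is hypothetical:

```python
# Sketch: reproduce the theoretical peak entries of the table.
# Assumption: each POWER4+ processor completes 4 flop per cycle
# (2 fused multiply-add units x 2 flop each), i.e. 7.6 / 1.9.
FLOPS_PER_CYCLE = 4


def peak_gflops(clock_ghz: float, n_procs: int) -> float:
    """Theoretical peak in Gflop/s for n_procs processors at clock_ghz."""
    return clock_ghz * FLOPS_PER_CYCLE * n_procs


CLOCK_GHZ = 1.9
per_proc = peak_gflops(CLOCK_GHZ, 1)     # single processor
hpc_node = peak_gflops(CLOCK_GHZ, 16)    # 16-processor HPC node
turbo_node = peak_gflops(CLOCK_GHZ, 32)  # 32-processor Turbo node

print(f"per processor: {per_proc:.1f} Gflop/s")
print(f"HPC node:      {hpc_node:.1f} Gflop/s")
print(f"Turbo node:    {turbo_node:.1f} Gflop/s")
```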
Remarks:
The eServer p690 is the successor of the RS/6000 SP. It retains much of the
macro structure of this system: multi-CPU nodes are connected within a frame
either by a dedicated switch or by other means, such as switched Ethernet. The
structure of the nodes, however, has changed considerably; see the section on the POWER4+ processor.
The so-called Federation switch is the fourth generation of the
high-performance interconnects made for the p690 series. Like its predecessors,
the Federation switch is an Ω-switch as described in the section on SM-MIMD systems. It has a bi-directional link speed of 2
GB/s and an MPI latency of 5–7 µs. Although we mention only the fastest
communication option, the high-performance switch, a wide range of other
interconnects can be chosen instead; Gbit Ethernet, for instance, is also possible.
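How latency and bandwidth combine can be estimated with the common first-order (Hockney) model t(n) = t₀ + n/B. The sketch below assumes t₀ between 5 and 7 µs and treats the quoted 2 GB/s as the usable bandwidth B of a single transfer; whether the bidirectional figure applies per direction is an assumption. The helper names are hypothetical.

```python
# Sketch: first-order (Hockney) model of MPI message time on the Federation switch.
# Assumptions: latency t0 = 5-7 microseconds and an effective bandwidth of
# 2 GB/s (2e9 bytes/s) for one transfer, taken from the quoted link speed.
BANDWIDTH_BYTES_PER_S = 2e9


def transfer_time_us(n_bytes: int, latency_us: float) -> float:
    """Estimated time in microseconds to send n_bytes: t0 + n / B."""
    return latency_us + n_bytes / BANDWIDTH_BYTES_PER_S * 1e6


def half_performance_length(latency_us: float) -> float:
    """Message size in bytes at which half the asymptotic bandwidth is reached: t0 * B."""
    return latency_us * 1e-6 * BANDWIDTH_BYTES_PER_S


for t0 in (5.0, 7.0):
    print(f"t0 = {t0} us: n_1/2 = {half_performance_length(t0):.0f} bytes, "
          f"1 MB takes {transfer_time_us(1_000_000, t0):.1f} us")
```

Under these assumptions, messages much shorter than n₁/₂ (roughly 10–14 kB) are dominated by latency, which is why fewer, larger messages pay off on such an interconnect.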
Applications can be run using PVM or MPI. IBM used to support High Performance
Fortran, both a proprietary version and a compiler from the Portland Group. It
is not clear whether this is still the case. IBM uses its own PVM version from
which the data format converter XDR has been stripped. This lowers the overhead
at the cost of generality, as messages are no longer converted to a
machine-independent format. The MPI implementation, MPI-F, is likewise
optimised for p690-based systems. As the nodes are in effect SMP systems,
OpenMP can be employed within the nodes for shared-memory parallelism, and it
can be freely mixed with MPI if needed. In addition to its own AIX OS, IBM also
supports some Linux distributions: the professional versions of both Red Hat and
SuSE Linux are available for the p690 series.
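When OpenMP is mixed with MPI, the processors of a node are divided between MPI processes and the OpenMP threads that each process spawns. A minimal sketch, assuming 32-CPU Turbo nodes, one thread per CPU and the same split on every node, that lists the possible per-node splits; the function `hybrid_layouts` and the job size are hypothetical and not part of any IBM tool:

```python
# Sketch: enumerate MPI-process x OpenMP-thread splits for a hybrid run.
# Assumptions: every CPU runs exactly one thread, and every node uses the
# same split (ranks_per_node * threads_per_rank == cpus_per_node).
def hybrid_layouts(cpus_per_node: int) -> list[tuple[int, int]]:
    """Return (MPI ranks per node, OpenMP threads per rank) pairs."""
    return [(ranks, cpus_per_node // ranks)
            for ranks in range(1, cpus_per_node + 1)
            if cpus_per_node % ranks == 0]


n_nodes = 4  # hypothetical job size
for ranks, threads in hybrid_layouts(32):
    print(f"{ranks:2d} ranks/node x {threads:2d} threads/rank "
          f"-> {ranks * n_nodes} MPI ranks in total")
```

The two extremes are pure MPI (32 single-threaded ranks per node) and a single MPI process per node whose 32 OpenMP threads share the node's memory.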
The standard commercial models contain up to 128 nodes. On special request, however, systems with up to 512 nodes can be built. This largest configuration is used in the table above, although no system with more than 128 nodes has been sold yet. A POWER5-based p690 system might come onto the market soon, but no definite plans in this direction are known.
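The gap between the standard and the special-request configurations can be made concrete by scaling the per-node figures. A sketch, assuming 32-CPU Turbo nodes at 1.9 GHz with 4 flop/cycle and the 1 TB per-node memory ceiling from the table; `system_config` is a hypothetical helper:

```python
# Sketch: scale the per-node figures to whole systems.
# Assumptions: 32-CPU Turbo nodes, 4 flop/cycle at 1.9 GHz,
# and the 1 TB per-node memory ceiling from the table.
CLOCK_GHZ = 1.9
FLOPS_PER_CYCLE = 4
CPUS_PER_NODE = 32
MEM_PER_NODE_TB = 1


def system_config(n_nodes: int) -> dict:
    """Processor count, peak (Tflop/s) and maximum memory (TB) for n_nodes Turbo nodes."""
    n_procs = n_nodes * CPUS_PER_NODE
    peak_tflops = n_procs * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000
    return {"procs": n_procs, "peak_tflops": peak_tflops,
            "memory_tb": n_nodes * MEM_PER_NODE_TB}


standard = system_config(128)  # largest standard commercial model
maximal = system_config(512)   # special-request configuration used in the table

for name, cfg in (("standard", standard), ("maximal", maximal)):
    print(f"{name}: {cfg['procs']} procs, "
          f"{cfg['peak_tflops']:.1f} Tflop/s, {cfg['memory_tb']} TB")
```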
Measured Performances:
In [42] a performance of 6188 Gflop/s is reported for a 1600-processor system
with the slightly slower 1.7 GHz variant of the processor, solving a dense
linear system of order N = 355,000; this corresponds to an efficiency of 57%. A
system with 8 Turbo nodes was reported to reach 737 Gflop/s out of a peak of
1331 Gflop/s on a linear system of order 285,000, an efficiency of 55%. As this
type of application operates primarily from the L1 cache, the roughly similar
efficiencies are to be expected.
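The quoted efficiencies are measured speed divided by theoretical peak, with peak = processors × clock × flop per cycle. A sketch checking both figures, assuming 4 flop/cycle also for the slower processor variants; for the 8-Turbo-node system (256 processors) the quoted peak of 1331 Gflop/s implies a clock of about 1.3 GHz, which is an inference from the numbers, not a figure stated in the report:

```python
# Sketch: Linpack efficiency = measured Gflop/s / theoretical peak Gflop/s.
# Assumption: 4 flop per cycle per processor, as for the 1.9 GHz part.
FLOPS_PER_CYCLE = 4


def peak_gflops(clock_ghz: float, n_procs: int) -> float:
    """Theoretical peak in Gflop/s."""
    return clock_ghz * FLOPS_PER_CYCLE * n_procs


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of theoretical peak actually attained."""
    return measured_gflops / peak


# 1600 processors at 1.7 GHz, N = 355,000
peak_1600 = peak_gflops(1.7, 1600)  # 10,880 Gflop/s
eff_1600 = efficiency(6188, peak_1600)

# 8 Turbo nodes = 256 processors; the quoted peak is 1331 Gflop/s
eff_turbo = efficiency(737, 1331)
implied_clock_ghz = 1331 / (256 * FLOPS_PER_CYCLE)  # about 1.3 GHz (inferred)

print(f"1600-proc system: {eff_1600:.0%} of {peak_1600:.0f} Gflop/s")
print(f"8 Turbo nodes:    {eff_turbo:.0%}, implied clock {implied_clock_ghz:.2f} GHz")
```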