Next: 4 The Parallel IFS Up: No Title Previous: 2 Cluster Efficiency

3 Cluster computing assumptions

We consider an architecture consisting of Cray C90 systems that are connected by one or more Hippi channels. A C90 system contains up to 16 computing nodes with a shared memory. The Hippi channel provides a raw data rate of at most 100 Mbyte/s. For realistic measurements, we considered the TCP/IP protocol and an appropriate C environment. Using a single Hippi channel, a sustained data rate of 50 Mbyte/s for very long messages and a start-up time of 0.46 ms have been measured on existing C90/Hippi configurations for inter-system transfer in the case of saturation (bidirectional ping-pong between 8 pairs of communicating processors [5, 7]). Since these systems were not configured to show maximum performance on the Hippi, there might be some margin for improvement. Nevertheless, our considerations are based on the measured values. Note that a single pair of communicating processes is not able to exhaust the Hippi transfer capacity; it takes 3 or 4 pairs of communicating processes to achieve the above performance.
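The two measured values can be combined into a rough estimate of inter-system transfer time. The following is a minimal sketch, assuming a linear cost model (start-up time plus message length divided by sustained rate) and 1 Mbyte = 10^6 bytes; neither assumption is stated in the text, and the function names are hypothetical. Since the figures were measured under saturation, the estimate describes the channel as a whole, not a single pair of processes.

```python
# Sketch of a linear cost model for one Hippi transfer (an assumption,
# not a model given in the text): time = start-up + bytes / sustained rate.

STARTUP_S = 0.46e-3            # start-up time from the text: 0.46 ms
SUSTAINED_BYTES_PER_S = 50e6   # 50 Mbyte/s, assuming 1 Mbyte = 10**6 bytes


def hippi_transfer_time(message_bytes: float) -> float:
    """Estimated time in seconds to move one message between two systems."""
    return STARTUP_S + message_bytes / SUSTAINED_BYTES_PER_S


def effective_rate(message_bytes: float) -> float:
    """Achieved data rate in bytes/s for a message of the given size."""
    return message_bytes / hippi_transfer_time(message_bytes)


def half_performance_length() -> float:
    """Message size in bytes at which start-up and transmission time are equal."""
    return STARTUP_S * SUSTAINED_BYTES_PER_S
```

Under these assumptions, a message of about 23 kbyte reaches only half of the sustained rate, which illustrates why the 50 Mbyte/s figure applies to very long messages only.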

The intra-system communication within a C90 system is very fast. For this case, PARMACS version 5 was used. The measured performance was more than one order of magnitude faster than that of the inter-system communication via the Hippi.

To simplify our case study, we considered fully connected systems only, i.e. each pair of C90 systems is connected by a separate Hippi channel. Two systems, for example, are connected by a single Hippi channel, whereas 4 systems are connected by 6 Hippi channels.
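The channel counts quoted above follow from counting the pairs of systems. A minimal sketch, with a hypothetical function name:

```python
def hippi_channels(num_systems: int) -> int:
    """Number of Hippi channels in a fully connected cluster.

    Each unordered pair of systems gets its own channel, so the
    count is n * (n - 1) / 2 for n systems.
    """
    if num_systems < 1:
        raise ValueError("a cluster needs at least one system")
    return num_systems * (num_systems - 1) // 2
```

For 2 systems this gives 1 channel and for 4 systems it gives 6, matching the examples in the text.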

Though the Hippi channel is at present the fastest available means of connecting supercomputers, it is clearly the weakest part of the interconnection system for the C90 computing nodes. Therefore, it seems pointless to investigate incomplete C90 systems, and we consider complete systems only. As usual, we use C916 to denote a complete system consisting of 16 computing nodes. To minimise the workload of the Hippi, the structure of a cluster has to be considered when an application problem is partitioned and mapped onto the computing nodes. Subsets of application processes should be mapped onto the same system if they communicate very frequently.
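The effect of the mapping on the Hippi workload can be illustrated by summing the traffic that crosses system boundaries. This is only a sketch: the data layout, the function name and the example numbers are hypothetical and are not taken from the measurements.

```python
def inter_system_volume(traffic: dict, system_of: dict) -> int:
    """Total bytes that have to travel over Hippi channels.

    traffic   maps a pair of process ids (p, q) to the bytes they exchange
    system_of maps a process id to the index of the C90 system it runs on
    Traffic between processes on the same system stays in shared memory
    and is therefore not counted.
    """
    return sum(
        volume
        for (p, q), volume in traffic.items()
        if system_of[p] != system_of[q]
    )


# Hypothetical example: processes 0/1 and 2/3 communicate heavily,
# while the pairs 0/2 and 1/3 exchange very little.
example_traffic = {(0, 1): 100, (2, 3): 100, (0, 2): 1, (1, 3): 1}
frequent_pairs_together = {0: 0, 1: 0, 2: 1, 3: 1}
frequent_pairs_split = {0: 0, 2: 0, 1: 1, 3: 1}
```

With the first mapping only the two light exchanges cross the Hippi; with the second, both heavy exchanges do.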



top500@rz.uni-mannheim.de
Tue May 28 14:38:25 PST 1996