Next: 5 Main result Up: No Title Previous: 3 Cluster computing assumptions

4 The Parallel IFS - Partitioning Concepts

Each time step of the IFS consists of computations in three different spaces (grid point space, Fourier space and spectral space) and has the following order:

computations in grid point space,
transition to Fourier space by Fast Fourier Transforms,
computations in Fourier space,
transition to spectral space by Legendre transforms,
computations in spectral space (phase I),
computations in spectral space (phase II),
transition to Fourier space by inverse Legendre transforms,
computations in Fourier space and
transition to grid point space by inverse Fourier transforms.

In each of these computational steps complex data dependencies exist in exactly one dimension while the other two dimensions are freely available for parallelisation and vectorisation. Unfortunately, the independent directions vary among the different computational steps. In grid point space, for example, these data dependencies are in the vertical direction without any influence of longitudes and latitudes, while in Fourier space, the data dependencies refer to the longitudinal dimension.

So, in each computational step a partitioning of the data and the computations to processes is possible with respect to two dimensions, providing a high degree of parallelism. The data transposition strategy [1, 4] allows to utilise this approach efficiently on parallel systems. Its idea is to re-distribute the complete data to the processes (data transposition) at some stages of the algorithm such that the arithmetic computations between two consecutive transpositions can be performed without any interprocess communication: during the corresponding computational step the direction with data dependencies is not partitioned among processes.

Figure 1 shows the structure of the parallel algorithm: here, the data are partitioned among 3 X 4 processes. Transpositions are performed in each data space in order to allow a fully parallelised computational phase without any interprocess communication. The dashed cells in Fourier and spectral space illustrate the data partitioning before and after the Legendre transforms; in Fourier space each process is responsible for two of the indicated process columns (which are in general not adjacent) in order to provide load balanced Legendre transforms.

The usual expression for the size of the IFS data structure is TM Lz, where M is the wave number and z is the number of levels. We investigate TM L31 with M=63, 106, 213. The standard relation between M and the number of grid points in direction of longitudes or latitudes is tex2html_wrap_inline1908 and tex2html_wrap_inline1910 (cf. Figure 1). For the present paper we consider the RAPS version 2.0 with full grid and Eulerian treatment of advection. The finest grid size corresponds to a 60 km network on the equator.

For the mapping of the process structure onto the computing nodes we used a simple mapping function called column-mapping. Here the columns of the process structure are mapped onto the columns of the cluster structure one after the other. Depending on the size of these columns, more than one process column might be mapped onto one cluster column or one process column might be split to several cluster columns. Let tex2html_wrap_inline1924 and tex2html_wrap_inline1926 denote the elements of the tex2html_wrap_inline1870 -process structure or the tex2html_wrap_inline1930 -cluster structure respectively. The -cluster structure represents a cluster of tex2html_wrap_inline1934 C90 systems each of which consists of tex2html_wrap_inline1936 computing nodes. Then the column-mapping is defined by tex2html_wrap_inline1938 . We refer to Figure 2 as an example of this mapping.

Next: 5 Main result Up: No Title Previous: 3 Cluster computing assumptions

top500@rz.uni-mannheim.de
Tue May 28 14:38:25 PST 1996