==================================================================
===                                                            ===
===           GENESIS Distributed Memory Benchmarks            ===
===                                                            ===
===                            PDE1                            ===
===                                                            ===
===  3-Dimensional Poisson Solver Using Red-Black Relaxation   ===
===                                                            ===
===            Versions: Std F77, PARMACS, PVM 3.1,            ===
===                      Subset HPF                            ===
===                                                            ===
===            Original authors: J. Klose, M. Lemke            ===
===            Subset HPF: Vladimir Getov                      ===
===            SOR version by: Ivan Wolton                     ===
===                                                            ===
===            Inquiries: HPC Centre                           ===
===                       Computing Services                   ===
===                       University of Southampton            ===
===                       Southampton SO17 4BJ, U.K.           ===
===                                                            ===
===   Fax: +44 703 593939   E-mail: support@par.soton.ac.uk    ===
===                                                            ===
===            Last update: June 1994; Release: 3.0            ===
===                                                            ===
==================================================================


1. Description
--------------

The benchmark solves the Poisson equation on a 3-dimensional grid.
The PARMACS and PVM versions use parallel red-black successive
over-relaxation (SOR) with Chebyshev acceleration. The sequential and
HPF versions do not yet implement the SOR algorithm, but will be
upgraded when time permits.

Many problems in the area of scientific computing are formulated in
terms of partial differential equations. Typical application areas
include Computational Fluid Dynamics, Meteorology, Climate Research
and Oil Reservoir simulation. The resulting PDEs are discretized on
some grid structure. The PDE is then represented by a large set of
(non)linear equations, each of which couples values at neighbouring
grid points. For time-dependent problems, this set of equations has
to be determined (integration phase) and solved (solution phase) at
each time step.

This benchmark is an extreme example of the class of PDE solvers:
because the discretization of Poisson's equation is so simple, the
number of floating point operations per grid point is quite small
relative to more complex PDEs. The ratio of computation to
communication is thus rather low.

The parallelization is performed by grid splitting.
A part of the computational grid is assigned to each processor. After
each computational step, values at the boundary of the subgrids are
exchanged with nearest neighbours.


2. Operating Instructions
-------------------------

A. Sequential Version

The sequential version automatically produces results for a range of
problem sizes. The problem size is determined by the grid size, which
is related to the parameter N. The number of internal grid points in
each direction is 2**N, giving 2**(3*N) internal points in 3
dimensions. There are also 2 boundary points in each direction, but
no work is done at these points, so the problem size depends only on
the number of internal grid points.

The parameter N is varied from 3 to MMAX within the benchmark, and the
benchmark performance is calculated for each resulting problem size.
The value of MMAX can be changed by editing the PARAMETER statement
in the file pde1.inc. Choose the largest value of MMAX that is
consistent with the available processor memory. The memory required
for array storage is approximately { 2 * (2**MMAX + 2)**3 } * 8 bytes,
which gives the following table:

        MMAX    Approx Memory required (Mbyte)
          5           0.6
          6           4.6
          7          35.
          8         274.

For the largest problem size, the relative sizes of the grid in each
dimension are varied whilst keeping the overall problem size constant.
This variation in the shape of the grid for the same problem size can
increase performance by allowing more efficient vectorization.

To achieve a given accuracy in the timing measurements, the number of
relaxations timed by the benchmark is specified by the input
parameter NITER. This should be chosen so that the benchmarked time
for NITER cycles on the smallest problem size is at least 100 times
the clock resolution. (If the clock resolution is unknown, it can be
determined using the TICK1 benchmark.)
For larger problem sizes, the value of NITER is automatically reduced
(subject to a minimum value of 5) to keep the overall benchmarked
time constant for each problem size.

Compiling and running the sequential benchmark:

1) Change the value of MMAX in file pde1.inc, if appropriate, to give
   the maximum problem size compatible with the available memory
   (see above).

2) To compile and link the benchmark type: `make' for the distributed
   version or `make slave' for the single-node version.

3) To run the benchmark type: pde1

4) Input NITER, the number of relaxations (suggested value 320).

Output from the benchmark is written to the file "result".


B. Distributed Version

In the distributed version of the program the problem size, the
number of processors, and the number of relaxations are read from
standard input on channel 5.

The problem size is proportional to the total grid size, which is
determined by the input parameter NN. The number of internal grid
points in each direction is 2**NN, giving 2**(3*NN) points in 3
dimensions.

The number of processors over which the lattice is distributed is
determined by the input parameter LOGP, which is the log to base 2 of
the required number of processors, i.e. number of processors =
2**LOGP. The specified number of processors is configured as a 3D
grid internally within the program.

The recommended minimum number of iterations, NITER, supplied as
input is 10. For the smaller problem sizes this may need to be
increased if the total runtime is small compared with the clock
resolution; on the larger problem sizes, however, increasing NITER
will result in unnecessarily long run times.

The size of the local lattice determines the size of the workspace
required in the node program.
The size of this workspace is determined by a PARAMETER statement in
the file node.u of the form:

        PARAMETER (NWORKD = 160000)

The size of NWORKD should be changed if necessary to ensure that it
is greater than or equal to (2**NN + 4)**3/(2**LOGP). If this
condition is not satisfied, a warning message is printed and the
program halts. The maximum size of NWORKD, and hence of the local
lattice size, is constrained by the available node memory. The node
memory required is approximately 3 * NWORKD * 8 bytes.

Suggested Problem Sizes:

It is recommended that the benchmark is run with four standard
problem sizes, given by NN = 6, 7, 8 and 9.

The optimum SOR relaxation parameter for each problem size is
calculated within the program. After a certain number of iterations,
the rate of convergence of the solution theoretically depends only
on the problem size. The radius of convergence is printed in the
results file and can be used to check correct operation of the code.
A little more verification is needed on this feature, but details
should be available shortly. Contact the benchmark distributor for
more information.

Note that it may not be possible to run the largest problem size on
all machines because of restrictions on the available memory. The
approximate total memory required for array storage is given by the
following table:

        NN   Approx value of (2**NN+4)**3   Approx Memory required (Mbyte)
         6        .32 * 10**6                        8
         7        2.3 * 10**6                       56
         8         18 * 10**6                      430
         9        138 * 10**6                     3300

To find the minimum node memory required to run each problem size,
divide the total memory required by the number of processors on
which the benchmark is run.

The number of processors to be used will depend on the system
available. The most important measurement is likely to be for the
largest power-of-two number of processors that will fit into the
machine.
If time permits, the variation of performance with number of
processors should be investigated by reducing the number of
processors by successive factors of two or four.

Compiling and running the distributed benchmark:

1) Change the value of NWORKD in file node.u, if appropriate, to give
   the maximum work space compatible with the available memory
   (see above).

2) To compile and link the benchmark type: make

3) To run the benchmark type: pde1

4) Input parameters NN, LOGP, NITER on standard input.

Output from the benchmark is written to the file "result".

$Id: ReadMe,v 1.5 1994/06/28 11:38:21 igl Exp igl $
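
Worked sizing example (illustrative only):

The memory formulas quoted in sections A and B can be evaluated with
a short script before editing pde1.inc or node.u. The sketch below is
in Python and is not part of the benchmark distribution. The function
names are hypothetical, and 1 Mbyte is assumed to mean 10**6 bytes,
which appears to match the tables above.

```python
def sequential_bytes(mmax):
    """Array storage for the sequential version:
    { 2 * (2**MMAX + 2)**3 } * 8 bytes (formula from section A)."""
    return 2 * (2 ** mmax + 2) ** 3 * 8


def min_nworkd(nn, logp):
    """Smallest NWORKD satisfying
    NWORKD >= (2**NN + 4)**3 / (2**LOGP) (condition from section B).
    The quotient is rounded up so the inequality always holds."""
    total_points = (2 ** nn + 4) ** 3
    processors = 2 ** logp
    return -(-total_points // processors)  # ceiling division


def node_bytes(nworkd):
    """Approximate node memory: 3 * NWORKD * 8 bytes (section B)."""
    return 3 * nworkd * 8


if __name__ == "__main__":
    for mmax in (5, 6, 7, 8):
        print("MMAX =", mmax, "->", sequential_bytes(mmax) / 1e6, "Mbyte")
    for nn, logp in ((6, 3), (7, 4), (8, 6)):
        nworkd = min_nworkd(nn, logp)
        print("NN =", nn, "LOGP =", logp,
              "-> NWORKD >=", nworkd,
              "(", node_bytes(nworkd) / 1e6, "Mbyte per node )")
```

For instance, min_nworkd(7, 4) evaluates to 143748, so under these
formulas the default NWORKD = 160000 would be large enough for NN = 7
on 16 processors.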
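
Red-black relaxation sketch (illustrative only):

The red-black scheme named in the Description can be sketched as
follows. This is a generic textbook red-black SOR sweep for the
7-point discretization of Poisson's equation, written in Python for
readability. It is not the benchmark's Fortran source, it uses a fixed
relaxation parameter omega rather than Chebyshev acceleration, and
all names in it are hypothetical.

```python
def make_grid(n, boundary, interior):
    """Cubic grid with n interior points per direction plus one
    boundary layer on each side, stored as nested lists."""
    size = n + 2
    grid = [[[boundary] * size for _ in range(size)] for _ in range(size)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                grid[i][j][k] = interior
    return grid


def red_black_sweep(u, f, h, omega):
    """One SOR sweep, in place: first all points with (i+j+k) even,
    then all points with (i+j+k) odd. Every neighbour of a point has
    the opposite colour, so the updates within one colour are
    independent of each other; only boundary values of a subgrid need
    exchanging between the two half-sweeps."""
    n = len(u) - 2
    for colour in (0, 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                for k in range(1, n + 1):
                    if (i + j + k) % 2 != colour:
                        continue
                    neighbours = (u[i - 1][j][k] + u[i + 1][j][k] +
                                  u[i][j - 1][k] + u[i][j + 1][k] +
                                  u[i][j][k - 1] + u[i][j][k + 1])
                    # Gauss-Seidel value for (sum - 6u) / h**2 = f
                    gauss_seidel = (neighbours - h * h * f[i][j][k]) / 6.0
                    # Over-relax towards the Gauss-Seidel value
                    u[i][j][k] += omega * (gauss_seidel - u[i][j][k])


def solve(n, boundary, sweeps, omega):
    """Relax the homogeneous problem (f = 0) with a constant boundary."""
    h = 1.0 / (n + 1)
    u = make_grid(n, boundary, 0.0)
    f = make_grid(n, 0.0, 0.0)
    for _ in range(sweeps):
        red_black_sweep(u, f, h, omega)
    return u
```

With f = 0 and a constant boundary value the exact discrete solution
is that constant everywhere, which gives a simple correctness check:
solve(4, 1.0, 60, 1.2) should return interior values very close to 1.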

Submitted by Mark Papiani,

last updated on 10 Jan 1995.