==================================================================
===                                                            ===
===           GENESIS Distributed Memory Benchmarks            ===
===                                                            ===
===                            PDE2                            ===
===                                                            ===
===          2-Dimensional Multi-grid Poisson Solver           ===
===                                                            ===
===      Versions: Std F77, PARMACS, Subset HPF,               ===
===                PVM 3.1                                     ===
===                                                            ===
===      Original authors: R. Hempel, A. Schueller,            ===
===                        M. Lemke                            ===
===      Modified by:      J. Klose                            ===
===      PARMACS macros:   Clemens-August Thole                ===
===      Subset HPF:       Bryan Carpenter                     ===
===      PVM:              Ian Glendinning                     ===
===                                                            ===
===      Inquiries:        HPC Centre                          ===
===                        Computing Services                  ===
===                        University of Southampton           ===
===                        Southampton SO17 4BJ, U.K.          ===
===                                                            ===
===   Fax: +44 703 593939   E-mail: support@par.soton.ac.uk    ===
===                                                            ===
===      Last update: Jun 1994; Release: 3.0                   ===
===                                                            ===
==================================================================


1. Description
--------------

This benchmark solves a 2D Poisson equation using a multigrid method.
A mixture of fine and coarse grids is used to accelerate the solution
process. Multigrid methods are important as they are among the fastest
methods for the solution of systems of equations originating from the
discretization of partial differential equations (PDEs).

The multigrid kernel is an important solver for the solution phase in
the numerical treatment of partial differential equations. Many
problems in the area of scientific computing are formulated in terms
of partial differential equations. Typical application areas are
Computational Fluid Dynamics, Meteorology, Climate Research, Oil
Reservoir simulation and others. The resulting PDEs are discretized on
some grid structure. The PDE is then represented by a large set of
(non)linear equations, each of which couples values at neighbouring
grid points with each other. For time-dependent problems, this set of
equations has to be determined (integration phase) and solved
(solution phase) at each time step.

The parallelization is performed by grid splitting. A part of the
computational grid is assigned to each processor.
After each computational step, values at the boundary of the subgrids
are exchanged with nearest neighbours.

For simplicity, in this benchmark coarse grids are only used up to the
level where each node contains only one interior gridpoint. At this
level 10 relaxation steps are used to solve the two-dimensional
problem on the coarsest grid.


2. Operating Instructions
-------------------------

A) Sequential Version

The sequential version automatically produces results for a range of
problem sizes. The problem size is determined by the grid size, which
is related to the parameter N. The number of grid points in each
direction is 2**N + 1, giving (2**N + 1)**2 points in 2 dimensions.
The parameter N is varied from 3 to MMAX within the benchmark and the
benchmark performance is calculated for each resulting problem size.

The value of MMAX can be changed by editing the PARAMETER statement in
the file pde2.inc. The maximum value of MMAX which is consistent with
the available processor memory should be chosen. The memory required
for array storage is approximately

        { 3 * (2**MMAX + 1)**2 } * 8 bytes,

which gives the following table:

        MMAX      Approx Memory required (Mbyte)
          8                  1.6
          9                  6.4
         10                 27.0
         11                101.0

For the largest problem size the relative sizes of the grid in each
dimension are varied whilst keeping the overall problem size constant.
This variation in the shape of the grid for the same problem size can
increase performance by allowing more efficient vectorization.

To achieve a given accuracy in the timing measurements the number of
multigrid cycles timed by the benchmark is specified by the input
parameter NITER. This should be chosen so that the benchmarked time
for NITER cycles on the smallest problem size is at least 100 times
the clock resolution. (If the clock resolution is unknown it can be
determined using the TICK1 benchmark.)
For larger problem sizes, the value of NITER is automatically reduced
(subject to a minimum value of 10) to keep the overall benchmarked
time constant for each problem size.

Compiling and running the sequential benchmark:

1) Change the value of MMAX in file pde2.inc, if appropriate, to give
   the maximum problem size compatible with the available memory
   (see above).

2) To compile and link the benchmark type:   make slave

3) To run the benchmark type:   pde2

4) Input NITER, the number of multigrid cycles (suggested value 160).

Output from the benchmark is written to the file "result".


B) Distributed Version

In the distributed version of the program the problem size and the
number of processors are read from the standard input on channel 5.
The problem size is proportional to the total grid size, which is
determined by the input parameter NN. The number of grid points in
each direction is 2**NN + 1, giving (2**NN + 1)**2 points in 2
dimensions.

The number of processors over which the lattice is distributed is
determined by the input parameter LOGP, which is the log to base 2 of
the required number of processors, i.e.

        number of processors = 2**LOGP.

The specified number of processors is configured as a 2D grid
internally within the program.

The size of the local lattice determines the size of the workspace
required in the node program. The size of this workspace is set by a
PARAMETER statement in the file node.u of the form:

        PARAMETER (NWORKD = 300000)

The value of NWORKD should be changed if necessary to ensure that it
is greater than or equal to

        4 * (2**NN + 4)**2 / (2**LOGP)

The maximum size of NWORKD, and hence of the local lattice size, is
constrained by the available node memory. The node memory required is
approximately NWORKD * 8 bytes.

Suggested Problem Sizes:

It is recommended that the benchmark is run with four standard problem
sizes, given by NN = 8, 10, 12 and 13.
Note that it may not be possible to run the largest problem size on
all machines because of restrictions on the available memory. The
approximate total memory required for array storage is given by the
following table:

        NN      Approx value of       Approx Memory required
                4*(4+2**NN)**2               (Mbyte)
         8       0.3 * 10**6                    2.4
        10       4.3 * 10**6                     35
        12        68 * 10**6                    544
        13       270 * 10**6                   2160

To find the minimum node memory required to run each problem size, the
total memory required should be divided by the number of processors on
which the benchmark is run.

The number of processors to be used will depend on the system
available. The most important measurement is likely to be the one
using the largest power-of-two number of processors that fits into the
machine. If time permits, the variation of performance with number of
processors should be investigated by reducing the number of processors
by successive factors of two or four.

As for the sequential version, the accuracy of timing may be adjusted
through the parameter NITER.

Compiling and running the distributed benchmark:

1) Change the value of NWORKD in file node.u, if appropriate, to give
   the maximum work space compatible with the available memory
   (see above).

2) To compile and link the benchmark type:   make

3) To run the benchmark type:   pde2

4) Input parameters NN, LOGP, NITER on standard input.

Output from the benchmark is written to the file "pde2.res".

$Id: ReadMe,v 1.4 1994/06/28 11:35:26 igl Exp igl $
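
Note on checking NWORKD before a run:

The workspace condition for the distributed version,
NWORKD >= 4 * (2**NN + 4)**2 / (2**LOGP), and the node memory estimate
of NWORKD * 8 bytes can be evaluated ahead of time for a planned run.
The following is a minimal sketch in Python, not part of the benchmark
distribution. The function names are hypothetical and chosen only for
illustration. It assumes the two formulas quoted in this file and
rounds the workspace size up to a whole number of array elements.

```python
def min_nworkd(nn, logp):
    """Smallest integer NWORKD with NWORKD >= 4*(2**NN + 4)**2 / 2**LOGP."""
    total = 4 * (2**nn + 4) ** 2      # total workspace over all processors
    procs = 2**logp                   # number of processors
    return -(-total // procs)         # integer ceiling division


def node_memory_bytes(nworkd):
    """Approximate node memory for the workspace: NWORKD * 8 bytes."""
    return nworkd * 8


if __name__ == "__main__":
    # Illustrative run: the four suggested problem sizes on 2**4 = 16 nodes.
    logp = 4
    for nn in (8, 10, 12, 13):
        nworkd = min_nworkd(nn, logp)
        mbyte = node_memory_bytes(nworkd) / 1.0e6
        print("NN=%2d LOGP=%d  NWORKD >= %9d  (~%.1f Mbyte per node)"
              % (nn, logp, nworkd, mbyte))
```

If the value printed for a given NN and LOGP exceeds the NWORKD set in
node.u, the PARAMETER statement needs to be raised before compiling.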

Submitted by Mark Papiani, last updated on 10 Jan 1995.