==================================================================
===                                                            ===
===           GENESIS Distributed Memory Benchmarks            ===
===                                                            ===
===                           TRANS1                           ===
===                                                            ===
===                  Matrix Transpose (Grid)                   ===
===                                                            ===
===          Versions: Std F77, PARMACS, Subset HPF,           ===
===                    PVM 3.1                                 ===
===                                                            ===
===          Original author: James Allwright                  ===
===          PARMACS + Subset HPF: Vladimir Getov              ===
===          PVM: Ian Glendinning                              ===
===                                                            ===
===          Inquiries: HPC Centre                             ===
===                     Computing Services                     ===
===                     University of Southampton              ===
===                     Southampton SO17 4BJ, U.K.             ===
===                                                            ===
===   Fax: +44 703 593939   E-mail: support@par.soton.ac.uk    ===
===                                                            ===
===          Last update: Jun 1994; Release: 3.0               ===
===                                                            ===
==================================================================


1. Description
--------------

The benchmark assumes that the matrix to be transposed is square.
The matrix is decomposed into a grid of sub-matrices. Hence this
program must be run on a number of processes which is a perfect
square.


2. Operating Instructions
-------------------------

Changing problem size and numbers of processes:

The following parameters are specified by PARAMETER statements in
the file "trans1.inc": number of processes (N), maximum matrix side
on each process (MAXD), and the number of times the transpose is to
be repeated (LOOPS).

Suggested Problem Sizes:

The experience accumulated so far shows that for small numbers of
processes (up to 36) the values in the table below are suitable.
In every row N = SIDE * SIDE and SIDE * D = 960.

        --------------------------------------
        | SIDE | No. of processes - N |  D   |
        --------------------------------------
        |  1   |          1           | 960  |
        |  2   |          4           | 480  |
        |  3   |          9           | 320  |
        |  4   |         16           | 240  |
        |  5   |         25           | 192  |
        |  6   |         36           | 160  |
        --------------------------------------

For larger numbers of processes the problem size should also be made
larger by increasing the parameter MAXD.

Compiling and Running The Benchmark:

1) Choose problem size and number of processes by editing the
   PARAMETER statements in the file `trans1.inc'.
2) To compile and link the benchmark type: `make' for the
   distributed version or `make slave' for the single-node version.
3) If any of the parameters in the include files are changed, the
   code has to be recompiled. The make-file will automatically
   recompile only the affected files.
4) On some systems it may be necessary to allocate the appropriate
   resources before running the benchmark, e.g. on the iPSC/860 to
   reserve a cube of 8 processors, type: getcube -t8
5) To run either the sequential or the distributed version of the
   benchmark, type: trans1

   A permanent copy of the benchmark output is written to a file
   called `trans1.res'.

$Id: ReadMe,v 1.3 1994/07/01 17:19:36 igl Exp igl $
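
Illustrative sketch (not part of the benchmark sources): the grid
decomposition described in section 1 can be mimicked in a few lines
of Python. This is a single-process simulation under stated
assumptions, not the benchmark's Fortran code. All function names
are hypothetical, and the message passing that a distributed version
would need is replaced by plain list indexing. It only shows why N
must be a perfect square and how block (i, j) of the result comes
from block (j, i) of the input.

```python
import math


def grid_side(n_procs):
    """Return SIDE with SIDE * SIDE == n_procs, else raise ValueError."""
    side = math.isqrt(n_procs)
    if side * side != n_procs:
        raise ValueError("number of processes must be a perfect square")
    return side


def split_into_blocks(matrix, side):
    """Cut a square matrix into a side x side grid of d x d sub-matrices."""
    if len(matrix) % side != 0:
        raise ValueError("matrix side must be divisible by the grid side")
    d = len(matrix) // side
    return [[[row[j * d:(j + 1) * d] for row in matrix[i * d:(i + 1) * d]]
             for j in range(side)]
            for i in range(side)]


def transpose_block(block):
    """Transpose one sub-matrix (the purely local part of the work)."""
    return [list(col) for col in zip(*block)]


def grid_transpose(blocks):
    """Result block (i, j) is the local transpose of input block (j, i)."""
    side = len(blocks)
    return [[transpose_block(blocks[j][i]) for j in range(side)]
            for i in range(side)]


def join_blocks(blocks):
    """Reassemble a grid of sub-matrices into one full matrix."""
    rows = []
    for block_row in blocks:
        d = len(block_row[0])
        for r in range(d):
            rows.append([x for block in block_row for x in block[r]])
    return rows


if __name__ == "__main__":
    # Hypothetical small case: 9 "processes", 6 x 6 matrix, so D = 2.
    matrix = [[r * 6 + c for c in range(6)] for r in range(6)]
    side = grid_side(9)
    result = join_blocks(grid_transpose(split_into_blocks(matrix, side)))
    print(result == [list(col) for col in zip(*matrix)])
```

Calling grid_side with a value such as 8 raises ValueError, which
mirrors the perfect-square requirement above. The rows of the
suggested problem size table follow the same arithmetic: N is
SIDE * SIDE and D is 960 divided by SIDE.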
Submitted by Mark Papiani,
last updated on 10 Jan 1995.