==================================================================
===                                                            ===
===     GENESIS / PARKBENCH Distributed Memory Benchmarks      ===
===                                                            ===
===                           COMMS2                           ===
===                                                            ===
===                      Message Exchange                      ===
===                                                            ===
===                      Version: PARMACS                      ===
===                                                            ===
===      Original Author: Roger Hockney                        ===
===                       Department of Electronics and        ===
===                       Computer Science                     ===
===                       University of Southampton            ===
===                       Southampton SO17 1BJ, UK             ===
===                                                            ===
===     Modifications by: Ian Glendinning                      ===
===                       Southampton HPC Centre               ===
===                       Computing Services                   ===
===                       University of Southampton            ===
===                       Southampton SO17 1BJ, UK             ===
===                                                            ===
===  fax.:+44-703-593939   e-mail:support@uk.ac.soton.par      ===
===                                                            ===
===        Last update: Jun 1994; GENESIS Release: 3.0         ===
===                                                            ===
==================================================================


1. Description
   -----------

This benchmark measures the message exchange properties of a
computer network, and makes use of bidirectional links if they are
present. A pair of nodes send a message of varying length, n, to
each other and then wait to receive the message from the other of
the pair. One quarter of the time for this exchange is recorded as
the time, t, to send a message, because four messages are sent
during the exchange. This time is fitted by least-squares to the
straight-line relation:

        t = (n + nhalf) / rinf                              (1)

where   rinf  = the asymptotic stream rate (Byte/s), and
        nhalf = the message length (Byte) giving half the
                asymptotic performance

This corresponds to an average performance, r, as a function of
message length, n,

                 rinf
        r = -------------                                   (2)
            (1 + nhalf/n)

In the above formula rinf is the asymptotic stream rate to use with
the value of nhalf in order to calculate the average bandwidth. For
short messages the values of rinf may be high but they will not be
achieved because of the effect of nhalf via equation (2).

The benchmark is restricted to asynchronous communication.
Asynchronous, here, means that a send returns to the calling program
when the user data array being sent may be safely reused. This,
however, may be before the message has been received by the
receiving node. The receiving node program stops (i.e. blocks) until
the data is available for use by the user's program.


2. Operating Instructions
   ----------------------

To expand the PARMACS macros, compile and link the code with the
appropriate libraries, enter the directory d77 and type:

        make

To compile and link the code with the appropriate libraries for PVM,
enter the directory pvm3 and type:

        make

On some systems it may be necessary to allocate the appropriate
resources before running the benchmark, e.g. on the iPSC/860 to
reserve a cube of 2 processors, type:

        getcube -t2

The message length of each test is defined by a file called
'msglen.def'. If you wish to obtain a benchmark result for
comparison with results from other machines, you should use the
standard version of msglen.def provided with this release.
Alternatively, if you wish to investigate the detailed variation of
communication speed with message length for your particular machine,
you can edit the file before running the benchmark. The format is
one integer value per line, each defining the message length of a
test case. The values should be in ascending order. You can specify
any number of values, up to a compile-time limit specified by the
parameter MAXTST, which is defined in the file 'comms2.inc'.

To run the benchmark executable, type:

        comms2

This will automatically load both host and node programs. The host
program will then prompt you for answers to various questions,
before the node code actually performs the benchmarking.

First you must say how many node processors you wish to allocate.
The default is 2, but you may allocate any number up to a maximum
defined by the compile-time parameter MAXNOD, declared in the file
'comms2.inc'.
If you choose more than 2 processors, you are asked which slave node
you want the master (node 1) to communicate with. This option can be
used to study the time variation with separation within the network.

Many message-passing computers have different timing for short and
long messages, and you are next asked to enter the number of bytes
in the longest short message, or zero if there is no difference
between short and long messages. If you specify a non-zero value,
the program will automatically add test cases for the longest short
message and the shortest long message, if they are not already
defined in msglen.def.

Next you must say whether or not you wish timings for zero length
messages to be used in computing least squares fits to the data.
Excluding them can be useful, since such timings can be anomalous.

Finally you must specify the approximate measurement time that you
want for each test case (with different message lengths). The actual
number of times a message is ping-ponged for each case is calculated
to give approximately that execution time. This means that, for any
particular system, you can ensure each test is run for long enough
to average out disturbances caused by spurious operating system
effects. It also means that you have direct control over the total
time the benchmark will take to run.

When you have answered the above questions, the program proceeds to
make estimates of the loop overhead and communication parameters. It
uses these to calculate the number of ping-pongs needed for each
test case, to obtain the requested execution time per test. It
should be noted that the loop overhead is re-measured for each test,
and that the measurement takes approximately the same time as the
ping-pong part of the test, so the total elapsed time for each test
case is actually about twice the specified execution time.

Once the timing parameters have been estimated, the benchmark test
cases are executed.
To enable their progress to be followed, a line is written to the
standard output, showing the test number and message length, when
each test starts. When it finishes, the measurement of the time
taken to send one message is written out, together with the number
of iterations the test used.

A permanent copy of the full benchmark results is written to a file
called 'comms2.res'. If the run is successful and a permanent record
is required, this file should be copied to another file before the
next run overwrites it.

A file called 'plot.dat' is also written, which contains the raw
timing data in the format required by Gnuplot to plot a graph of
time per message versus message length. Each line in the file
contains the message length for one test case, and the time it took
to send one message of that length.


3. Modifications
   -------------

The latest version of this benchmark differs from the one in GENESIS
release 2.3 in that it uses separate arrays as arguments to the send
and receive operations. Previously a single array had been used for
both, meaning there was a potential memory bottleneck. This means
that the new version makes a purer measurement of the machine's
communication capabilities, but that measurements taken with it will
not necessarily agree with results obtained using the old version.
The bandwidth obtained with the new version should be greater than
or equal to that of the old one.

The major improvement to this benchmark introduced in release 2.3
was to turn it into a fixed-time benchmark. The code was modified to
calculate the number of iterations for each test case, so as to make
the run time of each test approximately the same. The benchmarker
can specify the approximate execution time of each test
interactively.

The program was also modified to take its message lengths for each
test case from the file 'msglen.def' rather than the lengths being
hard coded.
If there is a difference between short and long messages, tests are
automatically run for the cases of the longest short message and the
shortest long message, if the user has not explicitly asked for them
in msglen.def.

Extra output is now produced. Message length and timing data is
written to the file 'plot.dat' in the format expected by Gnuplot,
and summary information is printed on the terminal during execution
as each test case starts and finishes.

The benchmarker now has the option of ignoring zero length messages
in least squares fits.

In addition to the above changes in functionality, there has been
considerable tidying up and restructuring of the code.

$Id: ReadMe,v 1.4 1994/06/24 11:13:51 igl Exp igl $
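
Illustrative note on equations (1) and (2) of Section 1. The sketch
below is not part of the GENESIS distribution and is not the
benchmark's own fitting code. It assumes an ordinary unweighted
least-squares line through (n, t) pairs such as those written to
'plot.dat', and recovers rinf and nhalf from the slope (1/rinf) and
intercept (nhalf/rinf) of equation (1). The function names and the
timing values are hypothetical: the timings are synthetic, generated
from assumed values of rinf and nhalf, and are not measurements from
any machine.

```python
def fit_rinf_nhalf(lengths, times):
    """Unweighted least-squares fit of t = (n + nhalf) / rinf.

    Rewriting equation (1) as t = n/rinf + nhalf/rinf gives a straight
    line in n with slope 1/rinf and intercept nhalf/rinf.
    Returns the pair (rinf, nhalf).
    """
    m = len(lengths)
    if m < 2 or m != len(times):
        raise ValueError("need two or more matching (n, t) pairs")
    mean_n = sum(lengths) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = sxy / sxx                     # 1 / rinf
    intercept = mean_t - slope * mean_n   # nhalf / rinf
    return 1.0 / slope, intercept / slope


def average_rate(n, rinf, nhalf):
    """Average performance r(n) from equation (2); n must be non-zero."""
    return rinf / (1.0 + nhalf / n)


# Synthetic data only: assumed rinf (Byte/s) and nhalf (Byte).
assumed_rinf, assumed_nhalf = 2.0e6, 500.0
lengths = [0, 100, 1000, 10000, 100000]
times = [(n + assumed_nhalf) / assumed_rinf for n in lengths]

rinf, nhalf = fit_rinf_nhalf(lengths, times)
print("rinf  (Byte/s):", rinf)
print("nhalf (Byte)  :", nhalf)
for n in lengths[1:]:
    print(n, average_rate(n, rinf, nhalf))
```

At n = nhalf, equation (2) gives r = rinf/2, which is the sense in
which nhalf is the message length giving half the asymptotic
performance. To mimic the option of ignoring zero length messages,
drop the n = 0 pair before calling the fit.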
Submitted by Mark Papiani,
last updated on 10 Jan 1995.