==================================================================
===                                                            ===
===          GENESIS / PARKBENCH Parallel Benchmarks           ===
===                                                            ===
===                           RINF1                            ===
===                                                            ===
===                   R-infinity and N-half                    ===
===                                                            ===
===                     Versions: Std F77                      ===
===                                                            ===
===                   Author : Roger Hockney                   ===
===       Department of Electronics and Computer Science       ===
===                 University of Southampton                  ===
===                 Southampton SO9 5NH, U.K.                  ===
=== fax.:+44-703-593045 e-mail:firstname.lastname@example.org  ===
===                            email@example.com               ===
===                                                            ===
===                 Last update: November 1993                 ===
===                                                            ===
==================================================================


1. Description
--------------

The performance of vector operations on a processor can be
characterised by two parameters: the asymptotic performance,
R-infinity (RINF), and the half-performance length, N-half (N1/2).
R-infinity is the performance approached as the vector length tends
to infinity. For finite vector lengths this maximum performance is
not realised because of the start-up time associated with vector
operations. A useful way of parameterizing this start-up time is
N-half, the vector length that gives exactly half of the asymptotic
performance. Vectors shorter than N-half incur a significant loss in
performance.

The performance, R, for a vector of length N is given by:

        R = R-infinity / [ 1 + (N-half/N) ]                  (1)

The execution time, T, for a vector of length N is:

        T = (N + N-half) / R-infinity                        (2)

In this benchmark N-half and R-infinity are derived from a
least-squares fit of time against vector length.

The value of N-half varies with the vector operation. Seventeen
tests are included, each a different expression that could
potentially be vectorized by a compiler. The examples are selected
to be useful in the assessment of both architectures and compilers.

The values of R-infinity and N-half depend on the operations being
performed and also on the size of the cache memory.
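
Equations (1) and (2), and the least-squares fit of time against
vector length, can be sketched as follows. This is an illustrative
Python fragment and not the benchmark's Fortran source: the function
names, the synthetic timings and the choice of operations per second
as the unit of R-infinity are assumptions made for the example.

```python
def exec_time(n, rinf, nhalf):
    """Equation (2): execution time for a vector of length n."""
    return (n + nhalf) / rinf


def performance(n, rinf, nhalf):
    """Equation (1): performance for a vector of length n."""
    return rinf / (1.0 + nhalf / n)


def fit_rinf_nhalf(lengths, times):
    """Least-squares straight line T = a + b*N through the data.

    From equation (2), b = 1/RINF and a = N1/2 / RINF, so
    RINF = 1/b and N1/2 = a/b.
    """
    m = len(lengths)
    mean_n = sum(lengths) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_n
    return 1.0 / slope, intercept / slope


# Synthetic, noise-free timings for a hypothetical machine with
# RINF = 50e6 operations/sec and N1/2 = 25 (assumed values).
lengths = [10, 20, 50, 100, 200, 500, 1000]
times = [exec_time(n, 50.0e6, 25.0) for n in lengths]

rinf, nhalf = fit_rinf_nhalf(lengths, times)
print("RINF =", rinf, " N1/2 =", nhalf)

# At N = N1/2 the performance is half of RINF.
print("R(N1/2) =", performance(25, 50.0e6, 25.0))
```

With real timings the points do not lie exactly on a line, which is
why the benchmark reports a fit error alongside the two parameters.
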
The summary of best values, which appears in the benchmark output,
gives values of the parameter pair (RINF,N1/2) for vector lengths
that fit into the cache memory and for those that exceed the cache
memory.


2. Operating Instructions
-------------------------

This benchmark assumes by default that the maximum vector length is
100,000. Change the parameter NNMAX if this is not suitable. It is
also advisable to check the number of iterations, NITER, and to
adjust it if necessary in accordance with the clock tick:

        NITER =   1000   if tick is 1.0E-5 sec
        NITER = 100000   if tick is 1.0E-3 sec

All parameters are to be found in the include file `rinf1.inc'.

To compile and link the benchmark type: `make' .

If you set 'XDIR=.' in the Makefile to put the executable in the
current directory, you will get a Fatal error: failed to target
'rinf1'. Ignore this; the executable rinf1 is created and can be
used.

On some systems it may be necessary to allocate the appropriate
resources before running the benchmark, e.g. on the iPSC/860 to
reserve a single processor, type: getcube -t1.

To run the benchmark type: rinf1

Output from the benchmark is written to the file "rinf1.res". Copy
this to another file to save it.

If NITER=10000 RINF1 will take about 2 minutes to run on a typical
workstation. For accurate results with NITER=100,000 allow 15 to 20
minutes.


3. Interpretation of Results
----------------------------

Low-level benchmarks like RINF1 try to represent, for each kernel,
some 50 data sets (the vector lengths) by two performance parameters
(R-infinity and N-half). The times to be measured are also very
short, and if the repeat number NITER is not large enough for the
timer being used, nonsense values for the time of execution will
give nonsense values for the parameters. It really is a case of
garbage in, garbage out. Interference from other users can also give
a large scatter to the input times and give unsatisfactory results.
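
The dependence of NITER on the timer can be illustrated with a short
Python sketch. It is not part of the benchmark. The helper names are
hypothetical, the linear scaling of NITER with the tick is only an
assumption drawn through the two values quoted in Section 2, and the
factor of 100 is the one this document recommends when checking the
TOTAL TIME column.

```python
import time


def measure_tick(clock=time.perf_counter, samples=20):
    """Estimate the smallest observable increment of a clock.

    Reads the clock repeatedly until its value changes and keeps the
    smallest change seen. On a fine-grained clock the result also
    includes the cost of the calls themselves.
    """
    smallest = float("inf")
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:
            t1 = clock()
        smallest = min(smallest, t1 - t0)
    return smallest


def suggest_niter(tick):
    """Assumed rule: NITER proportional to the tick, passing through
    NITER=1000 at 1.0E-5 sec and NITER=100000 at 1.0E-3 sec."""
    return int(round(1.0e8 * tick))


def total_time_ok(total_time, tick, factor=100):
    """True if a measured total time is at least `factor` ticks."""
    return total_time >= factor * tick


tick = measure_tick()
print("estimated tick (sec):", tick)
print("NITER for a 1.0E-5 sec tick:", suggest_niter(1.0e-5))
print("0.05 sec total with a 1.0E-3 sec tick acceptable?",
      total_time_ok(0.05, 1.0e-3))
```

In the benchmark itself the tick and NITER belong to the Fortran
timer and to rinf1.inc; the sketch only shows the arithmetic.
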
Compared with an application benchmark, which only requires the
measurement of a long time interval comparable to a second or
minute, for perhaps only three input data sets, without any effort
to fit the time to a model, the interpretation of data from
low-level benchmarks is incomparably more difficult. Good results
are not to be expected from such benchmarks unless they are carried
out with care and interpreted with good sense.

The summary table of results at the end of the output is an attempt
to pick automatically from the mass of measurements the best value
of the parameter pair (RINF,N1/2) for in-cache values (reported
first) and out-of-cache values (reported second). The summary line
states the vector lengths that have been used to obtain these
values. If the summary looks silly, and perhaps in any case, one
should also examine the detailed output, because the automatic
selection cannot be expected always to work satisfactorily.

These are our recommendations for interpreting the detailed results.
For each of the 17 kernels (DO loops):

(1) Examine the TOTAL TIME column of the output and ensure that this
    is at least 100 times the measured tick of your timer. If not,
    increase NITER by a factor of 10. The run will now take longer,
    but the timing results should show much less scatter.

(2) Examine the values in the time column TI; this is the time per
    vector operation as a function of the vector length in the
    column headed NI. If TI is not a monotonically increasing
    function of NI that is roughly linear, then the (RINF,N1/2)
    parameters are not appropriate, and this benchmark will not make
    sense. Therefore plot TI against NI and see what it looks like.
    If there is a lot of scatter, then increase NITER and rerun. If
    it is reasonably smooth but not at all linear, the two-parameter
    model does not describe this kernel and some other analysis is
    needed. If it is approximately linear then the columns headed
    RINF and N1/2 should have stable values that do not change much
    as the vector length increases.
    This is what one is looking for, and such stable values are the
    ones to be reported. The column headed PCT ERROR gives the
    root-mean-square deviation of the measured points from the
    fitted line, expressed as a percentage of the last value of TI.
    Values up to a few percent indicate that the straight-line fit
    is good and that the (RINF,N1/2) values are reliable. Values
    greater than, say, 20 percent indicate that the approximation is
    poor and the parameters should be used with caution, if at all.
    Bear in mind also that values of N1/2 are added to N in equation
    (2), and divided by N in equation (1); thus large values of, and
    variations in, N1/2 may in fact be insignificant and unimportant
    when the value of N itself is large. They do not necessarily
    indicate an unsatisfactory result.

(3) It is important to understand the meaning of the values in the
    columns RINF and N1/2. The vector lengths are run through in the
    order printed and, as a new time of execution is obtained for
    the next vector length, updated values of RINF and N1/2 are
    computed. That means that the values printed on one line are the
    best least-squares fit of a straight line to all data computed
    up to this time (i.e. all NI, TI pairs appearing on this and all
    previous lines, but not of course from any later lines). The
    first line (SI=1) provides only one point and does not define a
    straight line, so RINF=N1/2=0 is printed, meaning not enough
    information to compute values. By SI=2 there are two points and
    a straight line is defined together with the values of RINF and
    N1/2. The fit is exact and the ERROR column correctly records
    zero. As each new point is computed, RINF and N1/2 are updated
    with the best least-squares straight line. For small vector
    lengths, and perhaps an inaccurate timer, values of RINF and
    N1/2 may wander around and even become negative. This does not
    matter provided the values stabilise for longer vector lengths.
    It probably means that NITER was taken too small. Apart from the
    effects of cache, discussed next, the best values of RINF and
    N1/2 should be the last ones recorded for the longest vector,
    because this straight line uses all the previous data values.

(4) The presence of a data cache complicates the picture
    considerably by increasing the execution times significantly
    once the vector length exceeds the cache or paging size, when
    references to off-chip memory are required. This shows up by
    driving the value of N1/2 negative, which is correct and only
    means that the best straight line intercepts the positive
    x-axis. In the sample results shown in the StdRes directory,
    RINF and N1/2 have stabilised before this point, and these
    values are the in-cache measurement. The automatic selection
    procedure tries to pick these values and prints them in the
    summary table. This trip point where N1/2 goes negative is
    marked in the detailed output by 'PCT ERROR' being set to 222.2.
    The selected value is taken three measurements before this
    point. The least-squares fit is then reset, and a separate best
    straight line is obtained for longer vectors exceeding the cache
    size. This provides a second pair of (RINF,N1/2) values for
    vectors longer than some stated value in the summary table. This
    value is 4 measurement points past the trip point, in order to
    avoid using points in the transition region.

It must be obvious from the above that sensible results will only be
obtained from RINF1 if the benchmark is run sensibly (using e.g. a
correct value for NITER), and the results are interpreted with care
and understanding. It is easy to misuse this benchmark and produce
rubbish results. It is therefore easy to "rubbish" the benchmark if
one wishes to do so; however, it delivers good understanding of the
behaviour of the basic hardware (and the software through which it
is used) when it is used properly.


4. Negative values of RINF and N1/2
-----------------------------------

It is often supposed that negative values for RINF and N1/2 are
meaningless and therefore bring the benchmark into disrepute. This
shows a misunderstanding of the parameters: RINF and N1/2 should be
thought of as two parameters that determine, respectively, the
inverse slope and the negative intercept on the x-axis, of a
straight line. They are used in equations (1) and (2) to determine
the performance, R, or time, T, as a function of vector length, N.
Whereas none of R, T and N can by their very nature be negative,
there is no reason why in certain circumstances RINF and N1/2 cannot
be negative. Such negative values can appear for small values of N
with an inaccurate timer, and should generally be ignored, provided
later values stabilise.

Negative values of N1/2 are quite usual and correct for out-of-cache
measurements. Negative RINF would imply that larger problems execute
in less time, and this would not be expected, but there may be such
cases. In fact the benchmark traps negative RINF with negative N1/2
as indicating poor input data, rejects such data and restarts the
least-squares fit. This action is signalled in the output by a value
of PCT ERROR being 111.1. The only statement that we can make with
certainty is that R, T and N computed from equations (1) and (2)
cannot be negative.

$Id: ReadMe,v 1.4 1994/05/27 15:32:57 igl Exp igl $
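
The running fit of Section 3 and the negative N1/2 values of Section
4 can be illustrated with synthetic data. The following Python
sketch is not the benchmark's code: the cache size, the two speeds
and the helper names are invented for the example, and the
out-of-cache fit simply uses every point beyond the assumed cache
size rather than the benchmark's own selection of points around the
trip point.

```python
def fit_line(points):
    """Least-squares line T = a + b*N through (N, T) points.

    Returns (RINF, N1/2) = (1/b, a/b), following equation (2).
    """
    m = len(points)
    mean_n = sum(n for n, _ in points) / m
    mean_t = sum(t for _, t in points) / m
    sxx = sum((n - mean_n) ** 2 for n, _ in points)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in points)
    slope = sxy / sxx
    intercept = mean_t - slope * mean_n
    return 1.0 / slope, intercept / slope


CACHE = 1000  # assumed cache capacity in vector elements


def synthetic_time(n):
    """Hypothetical machine: 100e6 ops/sec with N1/2 = 20 in cache,
    and 25e6 ops/sec for every element beyond the cache size."""
    in_cache = (min(n, CACHE) + 20.0) / 100.0e6
    beyond_cache = max(n - CACHE, 0) / 25.0e6
    return in_cache + beyond_cache


lengths = [10, 20, 50, 100, 200, 500, 1000,
           2000, 5000, 10000, 20000, 50000]
points = [(n, synthetic_time(n)) for n in lengths]

# Running fit: each line uses all points up to and including itself.
# A single point defines no line, so RINF = N1/2 = 0 is recorded.
running = [(0.0, 0.0)]
for k in range(2, len(points) + 1):
    running.append(fit_line(points[:k]))

for (n, _), (rinf, nhalf) in zip(points, running):
    print(n, rinf, nhalf)

# Separate fit restricted to the out-of-cache points.
rinf_out, nhalf_out = fit_line([p for p in points if p[0] > CACHE])
print("out of cache: RINF =", rinf_out, " N1/2 =", nhalf_out)

# N1/2 is negative here, yet equation (2) still gives a positive
# time for every length the fit applies to.
print("T(2000) =", (2000 + nhalf_out) / rinf_out)
```

In this example the running N1/2 stays at 20 while the vectors fit
in the assumed cache and turns negative at the first longer vector,
and the out-of-cache line has a negative N1/2 while the time it
predicts remains positive.
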
Submitted by Mark Papiani,
last updated on 10 Jan 1995.