Table 6.1: Timing Results in Seconds for a 512-Processor and a 1-Processor nCUBE-1. Problem sizes are given as the numbers of grid points per processor in the x and y directions. The concurrent efficiency, overhead, and speedup are denoted by $\epsilon$, $f$, and $S$.
The code was timed for the Kelvin-Helmholtz problem for hypercubes with dimension ranging from zero to nine. The results for the 512-processor case are presented in Table 6.1, and show a speedup of 429 for the largest problem size considered. Subsequently, a group at Sandia National Laboratories, using a modified version of the code, attained a speedup of 1009 on a 1024-processor nCUBE-1 for a similar type of problem [Gustafson:88a]. The definitions of concurrent speedup, overhead, and efficiency are given in Section 3.5.
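As a quick check on the quoted speedups, the sketch below converts them to efficiency and overhead. It assumes the usual definitions, efficiency = S/N and overhead f = 1/efficiency - 1 for N processors; these are taken to match Section 3.5 but are not restated in this section. The function names are illustrative only.

```python
def concurrent_efficiency(speedup, num_procs):
    """Efficiency, assuming the usual definition S / N."""
    return speedup / num_procs


def concurrent_overhead(speedup, num_procs):
    """Overhead, assuming f = 1/efficiency - 1 = N/S - 1."""
    return num_procs / speedup - 1.0


if __name__ == "__main__":
    # Speedups and processor counts quoted in the text above.
    cases = [
        ("512-processor nCUBE-1", 429, 512),
        ("1024-processor nCUBE-1 (Sandia)", 1009, 1024),
    ]
    for label, speedup, num_procs in cases:
        eff = concurrent_efficiency(speedup, num_procs)
        f = concurrent_overhead(speedup, num_procs)
        print(f"{label}: efficiency = {eff:.3f}, overhead = {f:.3f}")
```

Under these assumed definitions, a speedup of 429 on 512 processors corresponds to an efficiency of about 0.84, and 1009 on 1024 processors to about 0.985.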
An analytic model of the performance of the concurrent algorithm was developed. Ignoring communication latency, the model gives a concurrent overhead proportional to $1/\sqrt{n}$, where $n$ is the number of grid points per processor. This is in approximate agreement with the results plotted in Figure 6.3, which shows the concurrent overhead for a number of different hypercube dimensions and grain sizes.
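The scaling of the overhead with grain size can be illustrated with a simple perimeter-to-area argument. This is a minimal sketch, not the analytic model referred to above. It assumes a square two-dimensional subdomain of n grid points per processor, computation proportional to n, and communication proportional to the subdomain boundary (about 4 * sqrt(n) points), with latency ignored. The parameter names and the default constants are hypothetical.

```python
import math


def modeled_overhead(n, comm_to_calc_ratio=1.0, boundary_constant=4.0):
    """Illustrative overhead estimate for a square subdomain of n grid points.

    Assumed model: communication cost ~ boundary_constant * sqrt(n) * t_comm,
    computation cost ~ n * t_calc, so their ratio scales as 1 / sqrt(n).
    comm_to_calc_ratio stands for t_comm / t_calc (hypothetical value).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of grid points")
    return boundary_constant * comm_to_calc_ratio / math.sqrt(n)


if __name__ == "__main__":
    # Quadrupling the grain size halves the modeled overhead.
    for n in (64, 256, 1024, 4096):
        print(f"n = {n:5d}: modeled overhead = {modeled_overhead(n):.4f}")
```

In this sketch the overhead depends only on the grain size n and not on the number of processors, so curves for different hypercube dimensions would collapse onto one line when plotted against 1/sqrt(n).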