In this method we again use the fastest sequential algorithm to identify the clusters in the sublattice on every processor. Each processor then looks at the labels of sites along the edges of the neighboring processors in the positive directions, and works out which ones are connected and should be matched up. These lists of ``equivalences'' are all passed to one of the processors, which uses an algorithm for finding equivalence classes [Knuth:68a], [Press:86a] (which, in this case, are the global cluster labels) to match up the connected clusters. This processor then broadcasts the results back to all the other processors.
Figure 12.28: Speedups for Global Equivalencing Algorithm
This part of the algorithm is purely sequential, and is thus a potentially disastrous bottleneck for large numbers of processors. It also requires this processor to have a large amount of memory in which to store all the labels from every other processor. The amount of work involved in doing the global matchup is proportional to P times the perimeter of the sublattice on each processor, or so that the efficiency should be less than for self-labelling; although, we might still expect reasonable speedups if the number of processors is not extremely large. The speedups obtained on the Symult 2010 for a variety of lattice sizes are shown in Figure 12.28. The figure for on 128 processors is missing due to memory constraints. Global equivalencing gives about the same speedups as self-labelling for small numbers of processors, but as expected self-labelling does much better as the number of processors increases.