Next: 11.3 Physical Optimization Up: 11 Load Balancing and Previous: 11.1.9 Conclusions

Applications and Extensions of the Physical Analogy

In [Fox:86a;92c;92h;93a], we point out some interesting features of the physical analogy and energy function introduced in Section 11.1.4.

Suppose that we are using the simulating annealing method of Section 11.1.4 on a dynamically varying system. Assume that this annealing algorithm is running in parallel in the same machine on which the problem executes. Suppose that we use a (reasonably) optimal annealing strategy. Even in this case, the ``heatbath'' formed by load balancer and operating system can only ``cool'' the problem to a minimum temperature . At this temperature, any further gains from improved decomposition by lowering the temperature will be outweighed by time taken to perform the annealing. This temperature is independent of performance of computer; it is a property of the system being simulated. Thus, we can consider this temperature as a new property of a dynamical complex system. High values of imply that the system is rapidly varying; low values that it is slowly varying.

Now we want to show that decompositions can lead to phase transitions between different states of the physical system defined by analogy of Section 11.1.2. In the language of Chapter 3, we can say that the complex system representing this problem exhibits a phase transition. We illustrate this with a trivial particle dynamics problem shown in Figure 11.19. Typically, we use on such problems the domain decomposition of Figure 11.19(a), where each node of the parallel machine contains a single connected region (compare Section 12.4). Alternatively, we can use the scattered decomposition-described for matrices in Section 8.1 and illustrated in Figure 11.19(b). One assigns to each processor several small regions of the space scattered uniformly throughout the domain. Each processor gets ``a piece of the action'' and shares those parts of the domain where the particle density and hence computational work is either large or small. This was explored for partial differential equations in [Morison:86a]. The scattered decomposition is a local minimum-there is an optimal size for the scattered blocks of space assigned to each processor. Both in this example and generally, the scattered decomposition is not as good as domain decomposition. This is shown in Figure 11.19(c), which sketches the energy H as a function of the chosen decomposition. Now, suppose that the particles move in time from t to as shown in Figure 11.19(d). The scattered decomposition minimum is unchanged, but as shown in Figure 11.19(c),(d) the domain decomposition minimum moves with time.

Figure 11.19: Particle Dynamics Problem on a Four-node System with Two Decompositions (a) Domain, Time t, (b) Scattered Times t and , (c) Instantaneous Energies, (d) Domain Decomposition Changing from Time t to

Now, one would often be interested not in the instantaneous energy H, but rather in the average

For this new objective function , the scattered decomposition can be the global minimum as illustrated in Figure 11.20. The domain decomposition is smeared with time and so its minimum is raised in value; the value of H at the scattered decomposition minimum is unchanged. We can study as a function of , and the hardware ratio used in Equation 3.10. As increases or decreases, we move from the situation of Figure 11.19(c) to that of Figure 11.20. In physics language, and are order parameters which control the phase transition between the two states scattered and domain. Rapidly varying systems (high ), rather than those with lower , are more likely to see the transition as increases. This agrees with physical intuition, as we now describe. When is small (slowly varying system), domain decomposition is the global minimum and this switches to a scattered decomposition as increases. In Figure 11.19(a),(b), we can associate with each particle in the simulation a spin value which indicates the label of the processor to which it is assigned. Then we see the direct analogy to physical spin systems. At high temperatures, we have spin waves (scattered decomposition); at low temperatures, (magnetic) domains (domain decomposition).

Figure: The Average Energy of Equation 11.13

We end by noting that in the analogy there is a class of problems which we call microscopically dynamic. These are explored in [Fox:88f], [Fox:88kk;88uu]. In this problem class, the fundamental entities (particles in above analogy) move between nodes of parallel machine on a microscopic time scale. The previous discussion had only considered the adiabatic loosely synchronous problems where one can assume that the data elements (particles in the analogy) can be treated as fixed in a particular processor at each time instant. We will not give a general discussion here, but rather just illustrate the ideas with one example-the global sum calculation written in Fortran as

DO l I=l, LIMIT1 A(I)=0 DO 1 J=l, LIMIT2 1 A(I)=A(I) + B(I,J)

This is illustrated in Figure 11.21 (Color Plate) for the case LIMIT1=4 decomposed onto a four-node machine. The value of LIMIT1 is important for performance considerations but irrelevant for the discussion here. The optimal scheduling of communication and calculation is tricky and is discussed as the fold algorithm in [Fox:88a]. The four tasks of calculating the four A(I) cannot be viewed as particles as they move from node to node and we cannot represent this movement in the formalism used up to now. Rather, we now represent the tasks by ``space-time'' strings or world lines and one replaces Equation 11.9 by a Hamiltonian which describes interacting strings rather than interacting particles. This can be applied to event-driven simulations, message routing, and other microscopically dynamic problems. The strings need to be draped over the space-time grid formed by the complex computer as it evolves in time. Figure 11.21 (Color Plate) shows this compact ``draping'' for the fold algorithm.

Figure 11.21: The Fold Algorithm. Four global sums interleaved optimally on four processors.

We have successfully applied similar ideas to multivehicle and multiarm robot path planning and routing problems [Chiu:88f], [Fox:90e;90k;92c], [Gandhi:90a]. Comparison of the vehicle navigation in Figure 11.22 (Color Plate) with the computational routing problem in Figure 11.21 (Color Plate) illustrates the analogy.

Figure 11.22: Two- and four-vechcle navigation problems. in each case, vehicles have been given initial and final target positions. The black squares are impassable and define a narrow pass. Physical optimisation methods[Fox:88ii;90e] were used to find solutions.

Next: 11.3 Physical Optimization Up: 11 Load Balancing and Previous: 11.1.9 Conclusions

Guy Robinson
Wed Mar 1 10:19:35 EST 1995