Simulated annealing does not take advantage of the local opt heuristics. This means that instead of sampling local opt tours as does L-K repeated from random starts, the chain samples all tours. It would be a great advantage to be able to restrict the sampling of a Markov chain to the local opt tours only. Then the bias which the Markov chain provides would enable one to sample the shortest local opt tours more efficiently than local opt repeated from random starts. This is what our new algorithm does.
To do this, one has to find a way to go from one local opt tour to another, and this is the heart of our procedure. We propose to do a change on the current local opt tour, which we call a ``kick.'' This can be a random k-change, for instance, but we will choose something smarter than that. Follow this kick by the local opt tour improvement heuristic until reaching a new local opt tour. Then accept or reject this new tour depending on the increase or decrease in tour length compared to the starting tour. This is illustrated in Figure 11.24. Since there are many changes in going from one local opt tour to the next, we call this method a ``Large-Step Markov Chain.'' It can
also be called ``Iterated Local Opt,'' but it should be realized that it is
precisely finding a way to iterate which is the difficulty! The algorithm is
far better than the small-step Markov chain methods (conventional simulated
annealing) because the accept/reject procedure is not implemented on the
intermediate tours which are almost always of longer length. Instead, the
accept/reject does not happen until the system has returned to a local
minimum. The method directly steps from one local minimum to another. It is
thus much easier to escape from local minima.
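In outline, one large-step iteration is: kick the current local opt tour, re-optimize with the local search, then accept or reject the resulting local optimum. The following is a minimal sketch, not the authors' code: `local_opt`, `kick`, and `tour_length` stand in for whichever local search, kick move, and objective one chooses, and a Metropolis-style accept rule with bias parameter `T` is assumed as one standard choice.

```python
import math
import random

def large_step_markov_chain(tour, tour_length, local_opt, kick, T, n_steps):
    """Sample local opt tours: kick, re-optimize, then accept or reject.

    `local_opt`, `kick`, and `tour_length` are placeholders for the chosen
    local search (e.g., three-opt or Lin-Kernighan), the kick move, and the
    tour-length objective; T is the bias ("temperature") parameter.
    """
    current = local_opt(tour)              # start from a local optimum
    best = current
    for _ in range(n_steps):
        # kick, then descend to a new local optimum before accept/reject
        trial = local_opt(kick(current))
        delta = tour_length(trial) - tour_length(current)
        # accept downhill moves always; uphill with Metropolis probability
        if delta <= 0 or random.random() < math.exp(-delta / T):
            current = trial
        if tour_length(current) < tour_length(best):
            best = current
    return best
```

Note that the accept/reject test sees only local optima; the intermediate tours produced by the kick and the subsequent descent are never examined.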
Figure 11.24: Schematic Representation of the Objective Function and of the
Tour Modification Procedure Used in the Large-step Markov Chain
At this point, let us mention that this method is no longer a true simulated annealing algorithm. That is, the algorithm does NOT correspond to the simulation of any physical system undergoing annealing. The reason is that a certain symmetry property, termed ``detailed balance'' in the physics community, is not satisfied by the large-step algorithm. [Martin:91a] says a bit more about this. One consequence of this is that the ``temperature'' parameter with which one anneals no longer plays the role of a true, physical temperature; instead, it is merely a parameter which controls the bias towards the optimum. The lack of a physical analogy may be the reason that this algorithm has not been tried before, even though much more exotic algorithms (such as appealing to quantum mechanical analogies!) have been proposed.
We have found that in practice, this methodology provides an efficient sampling of the local opt tours. There are a number of criteria which need to be met for the biased sampling of the Markov chain to be more efficient than plain random sampling. These conditions are satisfied for the TSP, and more generally whenever local search heuristics are useful. Let us stress before proceeding to specifics that this large-step Markov chain approach is extremely general, being applicable to any optimization problem where one has local search heuristics. It enables one to get a performance which is at least as good as local search, with substantial improvements over that if the sampling can be biased effectively. Finally, although the method is general, it can be adapted to match the problem of interest through the choice of the kick. We will now discuss how to choose the kick for the TSP.
Consider, for instance, the case where the local search is three-opt. If we
used a kick consisting of a three-change, the three-opt would very often
simply bring us back to the previous tour with no gain. Thus, it is probably
a good idea to go to a four-change for the kick when the local search is
three-opt. For more general local search algorithms, a good choice for the
kick would be a k-change which does not occur in the local search.
Surprisingly, it turns out that two-opt, three-opt, and especially L-K
are structured so that there is one kick choice which is natural for all of
them. To see this, it is useful to go back to the paper by Lin and
Kernighan. In that paper, they define ``sequential'' changes and they also show
that if the tour is to be improved, one can force all the partial gains during
the k-change to be positive. A consequence of this is that the
checkout time for sequential k-changes can be completed in O(N^2) steps.
It is easy to see that all two- and three-changes are sequential, and that the
first nonsequential change occurs at k=4 (Figure 2 of their paper). We
call this graph a ``double-bridge'' change because of what it does to the tour.
It can be constructed by first doing a two-change which disconnects the tour;
the second two-change must then reconnect the two parts, thereby creating a
bridge. Note that both of the two-changes are bridges in their own way, and
that the double-bridge change is the only nonsequential four-change which
cannot be obtained by composing changes which are both sequential and leave
the tour connected. If we included this double-bridge change in the
definition of the neighborhood for a local search, checkout time would
require
steps (a factor N for each bridge essentially). Rather
than doing this change as part of the local search, we include such changes
stochastically as our kick. The double-bridge kick is the most natural
choice for any local search method which considers only sequential changes.
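On a tour stored as a list of cities, the double-bridge kick is simple to carry out. The sketch below is an illustration rather than the authors' code: it cuts the tour at three random interior points into segments A, B, C, D and reconnects them as A, C, B, D.

```python
import random

def double_bridge(tour):
    """Apply a double-bridge kick to a tour given as a list of cities.

    The tour is cut into four segments A, B, C, D and reconnected as
    A, C, B, D. This four-change is nonsequential, so sequential local
    searches (two-opt, three-opt, Lin-Kernighan) cannot simply undo it.
    """
    n = len(tour)
    # choose three distinct interior cut points, in increasing order
    i, j, k = sorted(random.sample(range(1, n), 3))
    a, b, c, d = tour[:i], tour[i:j], tour[j:k], tour[k:]
    return a + c + b + d
```

Because the two middle segments are always nonempty, the kicked tour is never identical to the input, so the local search is guaranteed a genuinely new starting point.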
Because L-K performs so many changes for k greater than three, yet misses
double-bridges, one can expect that much of the excess length remaining
after L-K might be removed by our extension. The results below
indicate that this is the case.