Neural nets are ``by definition'' parallel computing systems of many densely interconnected units. Parallel computation is the basic method used by our brain to achieve response times of hundreds of milliseconds, using sloppy biological hardware with computing times of a few milliseconds per basic operation.

Our implementation of the learning algorithm is based on the use of MIMD
machines with large grain size. An efficient mapping strategy consists of
assigning a subset of the examples (input-output pairs) and the
*entire* network structure to each processor. To obtain proper
generalization, the number of example patterns has to be much larger
than the number of parameters defining the architecture
(i.e., the number of connection weights). For this reason, the amount
of memory needed to replicate the weights on each processor remains
modest even for significant problems.
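The mapping strategy above can be sketched as follows. This is a hypothetical illustration (the helper name and toy data are not from the text): each "processor" receives the entire weight vector and a disjoint subset of the input-output pairs.

```python
# Hypothetical sketch of the large-grain MIMD mapping: every processor
# holds a full copy of the weights, and the example patterns are split
# into disjoint, nearly equal subsets, one per processor.

def partition_patterns(patterns, num_procs):
    """Deal the example set round-robin into num_procs subsets."""
    return [patterns[p::num_procs] for p in range(num_procs)]

patterns = [(x, 2 * x + 1) for x in range(10)]   # toy input-output pairs
subsets = partition_patterns(patterns, 3)

# The subsets are disjoint and cover all patterns.
assert sum(len(s) for s in subsets) == len(patterns)
```

Round-robin dealing is just one convenient way to keep the subsets balanced; any disjoint split works, since each processor only needs *some* patterns plus the full network.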

Function and gradient evaluation is executed in parallel. Each processor
calculates the contribution of the assigned patterns (with no communication),
and a global combining-distributing step (see the *ADDVEC* routine in
[Fox:88a]) computes the total energy and gradient (recall that
the energy is a *sum* of the patterns' contributions) and communicates
the result to all processors.

Then the one-dimensional minimization along the search direction is performed and the weights are updated.
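A minimal sketch of this update step, assuming a simple backtracking search (the text does not specify which one-dimensional minimizer is used; the function and parameter names here are illustrative):

```python
# Hypothetical backtracking line search along a descent direction d:
# shrink the trial step until the energy decreases, then update the
# weights.  A real implementation might use a more careful 1-D
# minimizer, but the structure of the step is the same.

def line_minimize(energy, w, d, step=1.0, shrink=0.5, max_tries=30):
    """Return weights updated along d so that energy decreases."""
    E0 = energy(w)
    for _ in range(max_tries):
        w_new = [wi + step * di for wi, di in zip(w, d)]
        if energy(w_new) < E0:
            return w_new
        step *= shrink                # trial step too long: halve it
    return w                          # no decrease found: keep old weights

# Usage: minimize E(w) = sum of squares along the negative gradient.
E = lambda w: sum(wi * wi for wi in w)
w = [1.0, -2.0]
d = [-2.0 * wi for wi in w]           # -grad E
w = line_minimize(E, w, d)
```

The key point is that, since all processors hold the same total energy and gradient after the combine, they can each perform this update redundantly and stay in lockstep with identical weights.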

**Figure 9.25:** ``Healthy Food'' Has to Be Distinguished from ``Junk Food'' Using
Taste and Smell Information.

This simple parallelization approach is promising: It adapts easily to different network representations and learning strategies, and it will remain a fierce competitor to analog implementations of neural networks when these become available for significant applications (recall that airplanes do not flap their wings).
