Neural nets are ``by definition'' parallel computing systems of many densely interconnected units. Parallel computation is the basic method used by our brain to achieve response times of hundreds of milliseconds, using sloppy biological hardware with computing times of a few milliseconds per basic operation.
Our implementation of the learning algorithm is based on the use of MIMD machines with large grain size. An efficient mapping strategy assigns to each processor a subset of the examples (input-output pairs) together with a complete copy of the network structure. To obtain proper generalization, the number of example patterns has to be much larger than the number of parameters defining the architecture (i.e., the number of connection weights). For this reason, the memory needed to store the weights remains modest even for significant problems.
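The mapping strategy above can be sketched as follows; this is an illustrative serial emulation, not the actual implementation, and the function name is an assumption:

```python
# Hypothetical sketch of the mapping strategy: every "processor"
# (here just a list slice) receives a subset of the training
# patterns, while the full weight vector is replicated everywhere.

def partition_patterns(patterns, num_procs):
    """Split the example set into num_procs nearly equal subsets."""
    return [patterns[p::num_procs] for p in range(num_procs)]

patterns = [(x, x % 2) for x in range(10)]   # toy input-output pairs
subsets = partition_patterns(patterns, 3)
```

Each subset would then be processed independently, since every processor already holds all the weights it needs.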
Function and gradient evaluation is executed in parallel: each processor computes the contribution of its assigned patterns with no communication, and a global combining-and-distributing step (see the ADDVEC routine in [Fox:88a]) computes the total energy and gradient (recall that the energy is a sum of per-pattern contributions) and communicates the result to all processors.
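A minimal serial emulation of this step may help: each "processor" computes energy and gradient over its own patterns, and a combining step sums the contributions (the role played by ADDVEC on the real machine). The one-weight quadratic "network" is only a stand-in, and all names are illustrative:

```python
# Per-processor step: energy and gradient over the local patterns,
# for a toy one-weight model with E = 1/2 * sum (w*x - y)^2.
def local_energy_grad(w, patterns):
    e, g = 0.0, 0.0
    for x, y in patterns:
        r = w * x - y
        e += 0.5 * r * r
        g += r * x
    return e, g

# Combining step: the global energy and gradient are simply the
# sums of the per-processor contributions (the ADDVEC analogue).
def combine(contribs):
    total_e = sum(c[0] for c in contribs)
    total_g = sum(c[1] for c in contribs)
    return total_e, total_g

subsets = [[(1.0, 2.0)], [(2.0, 3.0)], [(3.0, 5.0)]]
w = 0.5
e, g = combine([local_energy_grad(w, s) for s in subsets])
```

On a real MIMD machine the combine would be a collective sum followed by a broadcast, so every processor ends up with the same global energy and gradient.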
Then the one-dimensional minimization along the search direction is completed and the weights are updated.
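The line-minimization step can be sketched as a simple backtracking search for a step length that lowers the energy, followed by the weight update; the real code would use the globally combined energy, and every name here is an assumption:

```python
# Toy stand-in for the globally combined energy, minimum at w = 3.
def energy(w):
    return (w - 3.0) ** 2

# One-dimensional minimization along the search direction:
# shrink the trial step until the energy decreases, then update.
def line_minimize(w, direction, step=1.0, shrink=0.5, tries=20):
    e0 = energy(w)
    for _ in range(tries):
        if energy(w + step * direction) < e0:
            return w + step * direction   # updated weights
        step *= shrink
    return w                               # no improving step found

w_new = line_minimize(0.0, 1.0)
```

Since every processor holds the same weights and the same global gradient after the combine, this update can be performed redundantly on all processors with no further communication.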
Figure 9.25: ``Healthy Food'' Has to Be Distinguished from ``Junk Food'' Using Taste and Smell Information.
This simple parallelization approach is promising: it adapts easily to different network representations and learning strategies, and it will remain a fierce competitor to analog implementations of neural networks once these become available for significant applications (recall that airplanes do not flap their wings).