Up to this point, our primary concern has been communication between neighboring processors. Applications, however, tended to exhibit two fundamental types of communication: local exchange of boundary-condition data, and global operations connected with control or with the extraction of physical observables.
As seen from the examples in this book, these two types of communication are generally believed to be fundamental to all scientific problems: the modelled application usually has some structure that can be mapped onto the nodes of the parallel computer, and this structure induces a regular communication pattern. A major breakthrough, therefore, was the development of what have since been called the ``collective'' communication routines, which perform some action across all the nodes of the machine.
The simplest example is that of ``broadcast'': a function that enabled node 0 to communicate one or more packets to all the other nodes in the machine. The routine ``concat'' enabled each node to accumulate data from every other node, and ``combine'' performed an operation, such as addition, on distributed data sets. For this reason, combine is often called a reduction operator.
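The original CrOS calling conventions are not reproduced here; the following Python sketch merely illustrates the semantics of the three routines on per-node data. The function names and the list-based representation (entry $i$ holds node $i$'s data) are illustrative assumptions, not the CrOS interface.

```python
# Semantics of the three collective routines, simulated on a list whose
# i-th entry is the datum held by node i. Names are illustrative only.

def broadcast(node_data, root=0):
    # After a broadcast, every node holds the root's packet.
    return [node_data[root] for _ in node_data]

def concat(node_data):
    # After concat, every node has accumulated the data of all nodes.
    everything = list(node_data)
    return [list(everything) for _ in node_data]

def combine(node_data, op):
    # combine applies a reduction operator (e.g. addition) across the
    # distributed data set and leaves the result on every node.
    result = node_data[0]
    for x in node_data[1:]:
        result = op(result, x)
    return [result for _ in node_data]
```

For example, with `values = [1, 2, 3, 4]` held one element per node, `broadcast(values)` leaves every node with `1`, while `combine(values, lambda a, b: a + b)` leaves every node with the global sum `10`.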
The development of these functions, and the natural way in which they could be mapped onto the hypercube topology of the machines, led to substantial gains in both programmer productivity and the efficiency with which the algorithms executed. CrOS quickly grew to a dozen or more routines.
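The efficiency of the hypercube mapping comes from pairing nodes along one cube dimension at a time: a combine over $p = 2^d$ nodes completes in only $d$ exchange steps, since after each round every node's partial result covers twice as many nodes. The sketch below is a plain-Python simulation of this recursive-doubling pattern, not the CrOS implementation itself.

```python
def hypercube_combine(values, op):
    """Simulate combine on a hypercube of p = 2**d nodes.

    values[i] is node i's local datum. In each of the d rounds, every
    node exchanges its partial result with the neighbor whose node
    number differs in one bit; after d rounds, every node holds op
    applied over all p values.
    """
    p = len(values)
    assert p > 0 and p & (p - 1) == 0, "node count must be a power of two"
    state = list(values)
    d = p.bit_length() - 1
    for bit in range(d):                      # one round per cube dimension
        nxt = list(state)
        for node in range(p):
            partner = node ^ (1 << bit)       # neighbor across this dimension
            nxt[node] = op(state[node], state[partner])
        state = nxt
    return state
```

With eight nodes holding the values 0 through 7, three rounds of pairwise exchange leave every node with the total 28, whereas gathering the same sum through a single node would take seven sequential receives.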