The most productive and efficient run-time environment interior to each subcube was that of CrOS-described above in Chapter 5-since the applications typically hosted were of the synchronous type. But the intermodule communications needed were definitely asynchronous. For them, the communications primitives needed were those that had already been developed in the Mercury OS [Burns:88a] and similar to those described in Chapter 16. The system-level view was one of needing ``loosely coupled sets of tightly coupled multiprocessors.'' That is, a single node needed to be tightly coupled, using CrOS, to nearest neighbors in its local subcube, yet loosely coupled, using Mercury or other asynchronous protocol, to other subcubes working on other tasks. Of course, it would have been possible to use Mercury for communications of both types, but on the Mark III level of hardware implementation, the performance penalty for using the asynchronous protocol where a synchronous protocol would suffice was factor of nearly five.
The CrOS latency of nearest-neighbor messaging on the Mark III is ; for Mercury it is -both adequate numbers for the 2-Mip 68020 data processors used on the Mark III, but often strained by the Weitek 16-MFLOPS floating-point accelerator. Messaging latency still is a key problem, even on the most recent machines. For example, on the Delta machine, NX delivers a nearest-neighbor message in ; Express takes . About the same, but now supporting a 120-Mip, 60-MFLOPS data processor.
To meet these hybrid needs and preserve maximum performance, a dual-protocol messaging system-called Centaur for its evocation of duality-was developed [Lee:89a]. To implement the disparate protocols involved-Mercury is interrupt driven whereas CrOS uses polled communications-it was determined that all messages would initially be assumed to be asynchronous and first handled as Mercury messages. A message that was actually synchronous contained a signal to that effect in its first packet header. Upon reading this signal, Mercury would disable interrupts and yield to the much faster CrOS machinery for the duration of the CrOS message. This scheme yielded synchronous and asynchronous performance only about 30% degraded from their counterparts in a nonmixed context.