The definition of a heterogeneous computing environment depends to some extent on the application. Here we attempt a definition that is relevant to numerical software. The three main issues determining the classification are the hardware, the communication layer, and the software (operating system, compiler, compiler options). Any differences in these areas can potentially affect the behavior of the application. Specifically, the following conditions must be satisfied before a system can be considered homogeneous:
We regard a homogeneous machine as one which satisfies condition (1.); a homogeneous network as a collection of homogeneous machines which additionally satisfies condition (2.); and finally, a homogeneous computing environment as a homogeneous network which satisfies condition (3.). We can then make the obvious definition that a heterogeneous computing environment is one that is not homogeneous. The requirements for a homogeneous computing environment are quite stringent and are frequently not met in networks of workstations or PCs, even when each computer in the network is the same model.
Some areas of distinction are quite obvious, such as a difference in the architecture of two machines, or the type of communication layer implemented. Communication issues are discussed in more detail in Section 6. Some hardware and software issues, however, can potentially affect the behavior of the application and be difficult to diagnose. For example, the determination of machine parameters such as machine precision, overflow, and underflow; or the implementation of complex arithmetic such as complex division; or the handling of NaNs and subnormal numbers could differ. Some of these subtleties may only become apparent when the arithmetic operations occur on the edge of the range of representable numbers. Section 4 discusses arithmetic issues in more detail.
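As a rough illustration of the machine parameters mentioned above, the following Python sketch collects the values one machine reports for its double-precision type and lists the entries on which two such reports disagree. The function names `machine_parameters` and `differing_parameters` are hypothetical and chosen only for this example. Agreement of these values is at best a necessary condition for homogeneity, not a sufficient one.

```python
import math
import sys


def machine_parameters():
    """Collect floating-point parameters of the double type on this machine."""
    info = sys.float_info
    return {
        "epsilon": info.epsilon,         # gap between 1.0 and the next double
        "overflow": info.max,            # largest finite double
        "underflow": info.min,           # smallest positive normalized double
        "radix": info.radix,             # base of the exponent
        "mantissa_digits": info.mant_dig,
        # Halving the smallest normalized number gives a subnormal value,
        # unless the platform flushes subnormals to zero.
        "subnormals_kept": info.min / 2.0 > 0.0,
        # A NaN compares unequal to everything, including itself.
        "nan_unordered": math.nan != math.nan,
    }


def differing_parameters(a, b):
    """Return the sorted names of the parameters on which two reports disagree."""
    return sorted(key for key in a if a[key] != b.get(key))


if __name__ == "__main__":
    for name, value in machine_parameters().items():
        print(f"{name:16} {value!r}")
```

In practice each machine would run `machine_parameters()` locally and the reports would be compared in one place. A non-empty result from `differing_parameters` signals heterogeneity, while an empty one says nothing about edge-case behavior such as complex division near the overflow threshold.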
The difficult question that remains unanswered for developers of library software is: when can we guarantee that heterogeneous computing is safe? There is also the question of just how much additional programming effort we should expend to gain the additional robustness. Unless we can incorporate a reliable test for homogeneity, we are also in danger of imposing a considerable additional performance penalty on homogeneous systems in order to perform safely on heterogeneous systems.
To illustrate the potential problems, consider the iterative solution of a system of linear equations where the stopping criterion depends upon the value of some function, f, of the relative machine precision, ε. The test for convergence might well include a test of the form:
In a heterogeneous setting the value of f may differ between processors and may depend upon data of different accuracies, so one or more processes may converge in fewer iterations than the others. Indeed, the stopping criterion used by the most accurate processor may never be satisfied if it depends on data computed less accurately by other processors. If the code contains communication between processors within an iteration, it may not complete if one processor converges, and so stops communicating, before the others. In a heterogeneous environment, the only way to guarantee termination is to have one processor make the convergence decision and broadcast that decision.
This is a strategy we shall see again in later sections.
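The difference between local and broadcast convergence decisions can be sketched in a small Python simulation. This is not a real message-passing program: the processes are simply entries in a list, and the names `f`, `local_decisions`, and `broadcast_decision`, the tolerance `10 * eps`, and the sample error estimate are all assumptions made for illustration. In an actual code the broadcast would be performed by the communication layer.

```python
def f(eps):
    """Hypothetical tolerance as a function of the relative machine precision."""
    return 10.0 * eps


def local_decisions(error_estimate, eps_by_process):
    """Each process tests convergence against its own tolerance."""
    return [error_estimate < f(eps) for eps in eps_by_process]


def broadcast_decision(error_estimate, eps_by_process, root=0):
    """Only the root tests convergence; every process receives that answer."""
    decision = error_estimate < f(eps_by_process[root])
    return [decision for _ in eps_by_process]


if __name__ == "__main__":
    # Process 0 has double-precision-like epsilon, process 1 single-precision-like.
    eps_by_process = [2.0 ** -52, 2.0 ** -23]
    error_estimate = 1e-10

    # Local tests disagree: process 1 would stop while process 0 keeps iterating.
    print(local_decisions(error_estimate, eps_by_process))

    # With a single decision maker, all processes agree by construction.
    print(broadcast_decision(error_estimate, eps_by_process))
```

With these sample values the local tests disagree, which is exactly the situation in which an iteration containing communication can hang. The broadcast version trades one extra communication per convergence check for a decision that is identical on every process.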