
Heterogeneous Computing Environments

In principle, both ScaLAPACK and the NAG parallel library can be run on networks of heterogeneous machines. In this final section, however, we point out the special challenges of writing and testing numerical software that is to be executed on networks containing heterogeneous processors, that is, processors that perform floating-point arithmetic differently. This includes not only machines with different floating-point formats and semantics, such as Cray vector computers alongside workstations performing IEEE standard floating-point arithmetic, but even supposedly identical machines running different compilers, different compiler options, or different runtime environments.
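
As a minimal illustration of the last point, the Python sketch below evaluates the same three-term sum in two orders. It assumes that the compiler option in question permits reassociation of floating-point additions (as GCC's -ffast-math does, for example); the two function names are hypothetical stand-ins for two compiled versions of one loop, not code from ScaLAPACK or the NAG library.

```python
# Two evaluation orders of the same sum. They stand in (hypothetically) for
# one loop compiled with and without an option that allows reassociation.

def sum_left_to_right(xs):
    """Accumulate from the first element to the last."""
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_right_to_left(xs):
    """Accumulate from the last element to the first."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total

data = [0.1, 0.2, 0.3]
print(sum_left_to_right(data))  # 0.6000000000000001
print(sum_right_to_left(data))  # 0.6
```

Both results are correctly rounded at every step; they differ only because the additions are performed in a different order.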

Moreover, on such networks, transferring floating-point data between two processes may require a data conversion phase, and hence may lose accuracy. Comparing supposedly identical computed values on heterogeneous networks is therefore difficult, error-prone, and often impractical. As a consequence, the validity and correctness of the tests we perform can currently be guaranteed only for networks of processors with identical floating-point formats.
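
The following is a hypothetical sketch of why exact comparison breaks down after a conversion. It models the conversion phase by rounding an IEEE double to IEEE single precision and back with Python's struct module; a real conversion, for instance between Cray and IEEE formats, behaves differently, and the names convert_via_single and agree are invented for this example. A comparison with a relative tolerance tied to the precision of the narrower format still succeeds.

```python
import struct

# Unit roundoff of IEEE single precision. Single precision is only a stand-in
# for a lossy format conversion; real Cray/IEEE conversion is not modelled.
SINGLE_UNIT_ROUNDOFF = 2.0 ** -24

def convert_via_single(x: float) -> float:
    """Hypothetical 'transfer': round a double to IEEE single and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def agree(a: float, b: float, tol: float) -> bool:
    """Relative comparison: |a - b| <= tol * max(|a|, |b|)."""
    return abs(a - b) <= tol * max(abs(a), abs(b))

sent = 0.1
received = convert_via_single(sent)
print(sent == received)                             # False: the bits differ
print(agree(sent, received, SINGLE_UNIT_ROUNDOFF))  # True
```

The tolerance has to reflect the least precise format a value passes through: a tolerance based on double precision alone (2**-53) would reject the received value.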

Identical floating-point representation across all processors of a parallel computer is not enough on its own; the way arithmetic is performed must also agree to some extent. For example, a processor in the network that neither produces nor recognizes denormalized numbers can cause problems when it receives such a number from a processor that generates denormalized numbers correctly. We have not yet tried to make the testing programs generate input data at the edges of the floating-point range in order to identify and trap these problems. Whilst this is highly desirable, we have not yet investigated the generation of such test problems sufficiently to be confident that they would expose the difficulties.
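
The Python sketch below suggests what such edge-of-range test data might look like; it is not taken from the ScaLAPACK or NAG test programs. All function names are hypothetical, and flush_to_zero merely models, in software, a receiving processor that replaces denormalized numbers by zero (keeping the sign of zero). The last lines show one computation giving different answers on the sender and on such a receiver.

```python
import math
import sys

TINY = sys.float_info.min   # smallest positive normal double, 2**-1022
TRUE_MIN = 5e-324           # smallest positive denormalized double, 2**-1074
HUGE = sys.float_info.max   # largest finite double

def edge_of_range_values() -> list[float]:
    """Hypothetical test inputs at the edges of the IEEE double range."""
    return [TINY, TINY / 2, TRUE_MIN, -TRUE_MIN, HUGE, -HUGE]

def is_denormal(x: float) -> bool:
    """True for nonzero values smaller in magnitude than the smallest normal."""
    return x != 0.0 and abs(x) < TINY

def flush_to_zero(x: float) -> float:
    """Model of a receiver without denormal support: denormals become signed zero."""
    return math.copysign(0.0, x) if is_denormal(x) else x

def values_changed_by_receiver(values: list[float]) -> list[float]:
    """Inputs that a flush-to-zero receiver would not see unchanged."""
    return [x for x in values if flush_to_zero(x) != x]

# The three denormalized inputs: TINY / 2, TRUE_MIN and -TRUE_MIN.
print(values_changed_by_receiver(edge_of_range_values()))

# The same computation now disagrees between sender and receiver.
x = TINY / 2
print(x / TINY)                 # 0.5 on the sender
print(flush_to_zero(x) / TINY)  # 0.0 on the receiver
```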

Further discussion of the dangers of heterogeneous computing can be found in [DDHOS:IEEEC:96] and [BCDDDHPRSW:UTK-cs:96].



Jack Dongarra
Tue Sep 3 09:41:41 EDT 1996