Writing reliable numerical software for networks of heterogeneous processors poses special challenges. By heterogeneous we mean processors that may perform floating point arithmetic differently. This includes not just machines with completely different floating point formats and semantics (e.g. Cray versus workstations running IEEE standard floating point arithmetic), but even supposedly identical machines running different compilers, or even just different compiler options or runtime environments. The basic problem arises when different processors make data-dependent branches. The flow of an algorithm is usually data dependent, so slight variations in the data may lead different processors to execute completely different sections of code.
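As a small illustration of a data-dependent branch that goes different ways under different arithmetic, the following sketch evaluates the same comparison once in IEEE double precision and once with every value rounded to IEEE single precision. The helper `to_single` and the two branch functions are hypothetical names introduced only for this example; single precision is simulated by rounding Python floats, not by running on a second machine.

```python
import struct


def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def branch_double():
    """Evaluate the test 0.1 + 0.2 == 0.3 entirely in double precision."""
    return "equal" if 0.1 + 0.2 == 0.3 else "not equal"


def branch_single():
    """Evaluate the same test with operands and sum rounded to single."""
    a, b, c = to_single(0.1), to_single(0.2), to_single(0.3)
    return "equal" if to_single(a + b) == c else "not equal"


if __name__ == "__main__":
    print("double precision branch:", branch_double())
    print("single precision branch:", branch_single())
```

The two functions take opposite branches on what is nominally the same data, which is the kind of divergence that can send cooperating processes into different sections of code.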
A simple example of an algorithm that might not work correctly is an iteration whose stopping criterion depends on the value of the machine precision. If the precision varies from process to process, different processes may have significantly different stopping criteria. In particular, the stopping criterion used by the most accurate process may never be satisfied if it depends on data computed less accurately by other processes.
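The failure mode described above can be sketched on a single machine. In the example below, a Newton iteration for the square root of 2 has its iterates rounded to single precision, standing in for data supplied by a less accurate process, while the stopping test is evaluated in double precision. The function name, the choice of iteration, and the factor 8 in the tolerances are assumptions made for illustration only.

```python
import struct
import sys

EPS_DOUBLE = sys.float_info.epsilon  # 2**-52 for IEEE double
EPS_SINGLE = 2.0 ** -23              # spacing of IEEE single values at 1.0


def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def newton_sqrt2(tol, max_iter=50):
    """Newton iteration for sqrt(2) with iterates rounded to single precision.

    Returns (converged, iterations_used). The residual is evaluated in
    double precision, as the more accurate process would do.
    """
    x = to_single(1.0)
    for k in range(1, max_iter + 1):
        x = to_single(0.5 * (x + 2.0 / x))
        if abs(x * x - 2.0) <= tol:
            return True, k
    return False, max_iter


if __name__ == "__main__":
    # Tolerance matched to the precision of the data: the test is satisfied.
    print(newton_sqrt2(8 * EPS_SINGLE))
    # Tolerance taken from the more accurate process: never satisfied.
    print(newton_sqrt2(8 * EPS_DOUBLE))
```

With the double precision tolerance the loop runs until `max_iter`, because no single precision iterate can bring the residual that close to zero.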
Many such problems can be eliminated by using the largest machine precision among all participating processes. In LAPACK, the routine DLAMCH returns the (double precision) machine precision, as well as other machine parameters. In ScaLAPACK, DLAMCH is replaced by PDLAMCH, which runs DLAMCH on each process and returns the largest machine precision over all the processes in place of the uniprocessor value. Similarly, one should use the smallest overflow threshold and the largest underflow threshold over the processes being used; in a non-homogeneous environment PDLAMCH computes the relevant maximum or minimum value for each such parameter. We refer to these machine parameters as the multiprocessor machine parameters.
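The reduction behind the multiprocessor machine parameters can be sketched as follows. This is not the PDLAMCH interface: `MachineParams` and `multiprocessor_params` are hypothetical names, the per-process values are supplied as a plain list rather than gathered over a network, and the single precision constants are the usual IEEE values written out by hand. In a real message-passing code the `max` and `min` would be global reductions across the processes.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineParams:
    eps: float        # machine precision
    underflow: float  # underflow threshold (smallest normalized number)
    overflow: float   # overflow threshold (largest finite number)


# Local parameters as two kinds of process might report them.
IEEE_DOUBLE = MachineParams(sys.float_info.epsilon,
                            sys.float_info.min,
                            sys.float_info.max)
IEEE_SINGLE = MachineParams(2.0 ** -23,
                            2.0 ** -126,
                            (2.0 - 2.0 ** -23) * 2.0 ** 127)


def multiprocessor_params(local_params):
    """Combine per-process parameters into values safe for every process."""
    return MachineParams(
        eps=max(p.eps for p in local_params),              # largest precision
        underflow=max(p.underflow for p in local_params),  # largest underflow
        overflow=min(p.overflow for p in local_params),    # smallest overflow
    )


if __name__ == "__main__":
    mixed = [IEEE_DOUBLE, IEEE_SINGLE, IEEE_DOUBLE]
    print(multiprocessor_params(mixed))
```

If every process reports the same parameters, the combined values equal the local ones, so the same code serves homogeneous and heterogeneous runs.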
Note that if the code communicates between processes within an iteration, it will not complete if one process converges before the others, since the remaining processes wait for messages that are never sent. In a heterogeneous environment, the only way to guarantee termination is to have one process make the convergence decision and broadcast that decision to the others. Further problems and suggested solutions are discussed in [12, 6].
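The difference between local and broadcast convergence decisions can be sketched without any message-passing library. Below, each list entry stands for one process; the function names, the residual values, and the tolerance are all made up for illustration, and the "broadcast" is simulated by copying the root's verdict to every entry.

```python
def local_decisions(residuals, tolerances):
    """Each process tests its own residual against its own tolerance."""
    return [r <= t for r, t in zip(residuals, tolerances)]


def broadcast_decision(residuals, tolerances, root=0):
    """Only process `root` makes the test; all processes adopt its verdict."""
    verdict = residuals[root] <= tolerances[root]
    return [verdict] * len(residuals)


if __name__ == "__main__":
    # Slightly different residuals, as different arithmetic might produce.
    residuals = [1.0e-7, 1.3e-7]
    tolerances = [1.2e-7, 1.2e-7]

    # Process 0 would stop while process 1 continues and waits on it.
    print(local_decisions(residuals, tolerances))
    # Every process stops (or continues) together.
    print(broadcast_decision(residuals, tolerances))
```

The cost of this choice is one extra broadcast per convergence test, in exchange for all processes leaving the loop in the same iteration.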