PBLAS test programs have been designed, developed, and included with the PBLAS code along lines similar to those of the BLAS test programs. This test package consists of several main programs and a set of subprograms that generate test data and compare the results with those obtained by element-wise computations of the sequential BLAS. These test programs assume the correctness of the BLAS and the BLACS routines [Whaley:UTK-cs:95]; it is therefore highly recommended that one run the testing programs provided with both of these packages before performing any PBLAS tests.
After each call to a subprogram being tested, its operation is checked in two ways. First, each of its input arguments, including all elements of the distributed operands, is checked to see whether it has been altered by the subprogram. If any argument other than the specified elements of the result scalar, vector, or matrix has been modified, an error is reported. This check includes the supposedly unreferenced elements of the distributed matrices. Second, the resulting scalar, vector, or matrix computed by the subprogram is compared with the corresponding result obtained by the sequential BLAS or by simple Fortran code. We do not expect exact agreement, because the two results are not necessarily computed by the same sequences of floating-point operations. We do, however, expect the differences to be small relative to the working precision. The error bounds are the same as those used in the BLAS testers; a more detailed description of these tests can be found in [DDHH:TOMS:88] and [DDDH:TOMS:90].
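The following is a minimal, hypothetical sketch (plain Fortran, not the testers' actual code) of these two checks applied to a local copy of one operand and one result vector; the multiplication by two merely stands in for the routine under test, and the threshold is an illustrative value rather than the bound actually used by the testers.

    program chk_sketch
      implicit none
      integer, parameter :: n = 100
      double precision :: x(n), xsave(n), y(n), yref(n)
      double precision :: eps, err

      call random_number( x )
      xsave = x                     ! copy of the input operand taken beforehand
      y = 2.0d0 * x                 ! stands in for the routine under test
      yref = x + x                  ! reference computed by simple Fortran code

      ! Check 1: the input operand must not have been altered (bit-for-bit).
      if( any( x /= xsave ) ) print *, 'input operand was overwritten'

      ! Check 2: the result must agree with the reference to within a small
      ! multiple of the working precision (the real testers reuse the BLAS
      ! testers' error bounds; the threshold 10 here is only illustrative).
      eps = epsilon( 1.0d0 )
      err = maxval( abs( y - yref ) ) / ( eps * maxval( abs( yref ) ) + eps )
      if( err > 10.0d0 ) then
        print *, 'test failed, scaled error = ', err
      else
        print *, 'test passed, scaled error = ', err
      end if
    end program chk_sketch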
The PBLAS testing programs are thus very similar to those for the BLAS. However, it was necessary to depart slightly from the way the BLAS testing programs operate, because of the difficulties inherent in testing programs written for distributed-memory computers. In the following paragraphs, some essential features of the design of the PBLAS testing programs are presented in greater detail, together with a discussion of the problems encountered: those we were able to solve, as well as those that remain open questions.
Very little distributed-memory parallel programming experience is required to realize that a program running correctly on, say, 2 processors does not necessarily run successfully on p > 2 processors. Further increasing the number of potential test cases is the fact that parallel dense linear algebra kernels ordinarily assume a processor grid, typically a two-dimensional one. Furthermore, a general software library such as the PBLAS has to behave correctly even in degenerate cases, such as when a distributed matrix does not span all processors in one or both dimensions of the grid. Finally, it should also be possible to vary the size and location of the submatrices to operate on, the data decomposition parameters such as the block sizes used for matrix partitioning and distribution, and even the local leading dimension of the arrays that store the local pieces of the distributed matrices. None of these remarks apply to the sequential testing problem.
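As a hedged illustration, the sketch below (assuming a linked BLACS and ScaLAPACK TOOLS library providing BLACS_GRIDINIT, NUMROC, and DESCINIT; all numeric values are hypothetical choices, not values taken from the test input files) shows the layout parameters that a single test case must pin down: the shape of the processor grid, the blocking factors, the coordinates of the process owning the first block, and the local leading dimension.

    ! Intended to be run on exactly nprow * npcol = 6 processes.
    program layout_sketch
      implicit none
      integer :: ictxt, nprow, npcol, myrow, mycol
      integer :: m, n, mb, nb, rsrc, csrc, lld, np, nq, info
      integer :: desca(9)
      integer, external :: numroc

      ! Values that a test input file would supply (hypothetical here).
      nprow = 2;  npcol = 3          ! processor grid shape
      m = 500;    n = 200            ! global matrix dimensions
      mb = 32;    nb = 17            ! distribution block sizes
      rsrc = 0;   csrc = 1           ! grid coordinates owning the first block

      ! Set up the processor grid.
      call blacs_get( -1, 0, ictxt )
      call blacs_gridinit( ictxt, 'Row-major', nprow, npcol )
      call blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )

      ! Local extents of the distributed matrix; the local leading dimension
      ! may be chosen larger than the minimum, which is one of the parameters
      ! the testers allow the user to vary.
      np  = numroc( m, mb, myrow, rsrc, nprow )
      nq  = numroc( n, nb, mycol, csrc, npcol )
      lld = max( 1, np ) + 10

      ! Build the array descriptor for the distributed matrix A.
      call descinit( desca, m, n, mb, nb, rsrc, csrc, ictxt, lld, info )
      if( info /= 0 ) print *, 'DESCINIT returned INFO = ', info

      call blacs_gridexit( ictxt )
      call blacs_exit( 0 )
    end program layout_sketch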
These remarks suggest that it is in practice impossible to test even a very small portion of all the possible different test cases. However, it is important to be able to generate any possible case, so that the tester can also be used to check a given operation for a particular data distribution.
These facts motivated the decision to permit a user-configurable set of tests for every PBLAS routine. Concretely, the input testing files allow for the precise specification of a limited number of tests. For each test, the input files contain a complete description of the data layout of each operand, making it possible to mimic exactly a given call to a PBLAS subroutine. Consequently, one can test the PBLAS with any possible machine configuration as well as any data layout. The obvious drawback of such generality is that the input testing file is slightly longer and more complex than the input files used for the sequential BLAS testers.
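To make this concrete, here is a hedged, self-contained sketch (assuming a linked BLACS and PBLAS providing PDGEMM; all numeric values are hypothetical and are not taken from the actual input files) of the information one such test case must specify so that a particular call can be reproduced exactly: the processor grid shape, each operand's global size, block sizes and global starting indices, and the scalars.

    ! Intended to be run on exactly nprow * npcol = 4 processes.
    program case_sketch
      implicit none
      integer, parameter :: dp = kind( 1.0d0 )
      integer :: ictxt, nprow, npcol, myrow, mycol, info
      integer :: m, n, k, ia, ja, ib, jb, ic, jc, mb, nb
      integer :: desca(9), descb(9), descc(9)
      integer :: npa, nqa, npb, nqb, npc, nqc
      real(dp) :: alpha, beta
      real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
      integer, external :: numroc

      ! One hypothetical test case:
      !   C(ic:ic+m-1,jc:jc+n-1) := alpha * A(ia:ia+m-1,ja:ja+k-1)
      !                                   * B(ib:ib+k-1,jb:jb+n-1) + beta * C
      ! on a 2 x 2 processor grid with 8 x 8 distribution blocks.
      nprow = 2;  npcol = 2
      m = 60;  n = 40;  k = 50
      ia = 9;  ja = 17;  ib = 17;  jb = 25;  ic = 9;  jc = 25
      mb = 8;  nb = 8
      alpha = 1.0_dp;  beta = -1.0_dp

      call blacs_get( -1, 0, ictxt )
      call blacs_gridinit( ictxt, 'Row-major', nprow, npcol )
      call blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )

      ! Local storage for global matrices just large enough to contain the
      ! submatrices referenced by the call below.
      npa = numroc( ia+m-1, mb, myrow, 0, nprow )
      nqa = numroc( ja+k-1, nb, mycol, 0, npcol )
      npb = numroc( ib+k-1, mb, myrow, 0, nprow )
      nqb = numroc( jb+n-1, nb, mycol, 0, npcol )
      npc = numroc( ic+m-1, mb, myrow, 0, nprow )
      nqc = numroc( jc+n-1, nb, mycol, 0, npcol )
      allocate( a(max(1,npa), max(1,nqa)) )
      allocate( b(max(1,npb), max(1,nqb)) )
      allocate( c(max(1,npc), max(1,nqc)) )
      call random_number( a );  call random_number( b );  call random_number( c )

      call descinit( desca, ia+m-1, ja+k-1, mb, nb, 0, 0, ictxt, max(1,npa), info )
      call descinit( descb, ib+k-1, jb+n-1, mb, nb, 0, 0, ictxt, max(1,npb), info )
      call descinit( descc, ic+m-1, jc+n-1, mb, nb, 0, 0, ictxt, max(1,npc), info )

      ! The call this test case is meant to reproduce exactly.
      call pdgemm( 'No transpose', 'No transpose', m, n, k, alpha, &
                   a, ia, ja, desca, b, ib, jb, descb, beta,       &
                   c, ic, jc, descc )

      deallocate( a, b, c )
      call blacs_gridexit( ictxt )
      call blacs_exit( 0 )
    end program case_sketch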
The PBLAS software follows an SPMD or data-parallel programming model. If a PBLAS routine is called with an invalid value for any of its arguments, then it must report the fact and terminate the execution of the program. In the model implementation, each routine, on detecting an error, calls a common error-handling routine.
This input error checking aspect of the software is also tested. It is straightforward to plug in an erroneous combination of input arguments and check that the error handler behaves correctly. It is, however, interesting to note that a PBLAS routine cannot ensure that every process does indeed call this subroutine.
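As a hedged sketch of such a check (assuming a linked BLACS and PBLAS, and assuming, as described above, that the common error handler reports the error and terminates the program), a deliberately invalid argument can be passed to a PBLAS routine on a single-process grid:

    ! Intended to be run on a single process.
    program err_sketch
      implicit none
      integer, parameter :: dp = kind( 1.0d0 )
      integer :: ictxt, nprow, npcol, myrow, mycol, info
      integer :: desca(9), descb(9), descc(9)
      real(dp) :: a(8,8), b(8,8), c(8,8), alpha, beta

      nprow = 1;  npcol = 1
      call blacs_get( -1, 0, ictxt )
      call blacs_gridinit( ictxt, 'Row-major', nprow, npcol )
      call blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )

      call descinit( desca, 8, 8, 8, 8, 0, 0, ictxt, 8, info )
      descb = desca
      descc = desca
      a = 1.0_dp;  b = 1.0_dp;  c = 0.0_dp
      alpha = 1.0_dp;  beta = 0.0_dp

      ! Deliberately invalid: M must be non-negative, so the error handler
      ! should report the illegal value and stop the program here.
      call pdgemm( 'No transpose', 'No transpose', -1, 8, 8, alpha, &
                   a, 1, 1, desca, b, 1, 1, descb, beta,            &
                   c, 1, 1, descc )

      ! Not reached if the error handler terminates the program as expected.
      print *, 'error handler did not stop the program'
      call blacs_gridexit( ictxt )
      call blacs_exit( 0 )
    end program err_sketch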
Since checking arguments in a global fashion would add a global synchronization step, the PBLAS routines, for efficiency purposes, perform only a local validity check of their argument list. If a value is invalid in at least one process of the current context, program execution is stopped. As a result, different processes may have different values of an argument that should be the same, causing unpredictable results. We comment further on the problems of networks of heterogeneous computers in Section 8 below.
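For illustration, the following hedged sketch (assuming a linked BLACS library; the grid shape, the argument m, and the consistency trick itself are our own illustration, not the library's actual checking code) shows what such a global consistency check would look like if it were performed: a BLACS combine operation (IGAMX2D) detects whether every process holds the same value of an argument, but the combine is itself a synchronization point, which is precisely the cost the PBLAS routines avoid by checking only locally.

    ! Intended to be run on exactly nprow * npcol = 2 processes.
    program globchk_sketch
      implicit none
      integer :: ictxt, nprow, npcol, myrow, mycol
      integer :: m, vals(2), idum1(1), idum2(1)

      nprow = 1;  npcol = 2          ! hypothetical 1 x 2 grid
      call blacs_get( -1, 0, ictxt )
      call blacs_gridinit( ictxt, 'Row-major', nprow, npcol )
      call blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )

      ! A globally significant argument; making it depend on the process
      ! coordinates simulates the inconsistent-call scenario described above.
      m = 100 + mycol

      ! Combine max(m) and max(-m) over the whole grid; they disagree in
      ! magnitude exactly when the processes hold different values of m.
      ! The combine synchronizes all processes of the grid.
      vals(1) = m
      vals(2) = -m
      call igamx2d( ictxt, 'All', ' ', 2, 1, vals, 2, idum1, idum2, -1, -1, -1 )
      if( vals(1) /= -vals(2) ) then
        if( myrow == 0 .and. mycol == 0 ) &
          print *, 'argument m is not the same on every process'
      end if

      call blacs_gridexit( ictxt )
      call blacs_exit( 0 )
    end program globchk_sketch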