Just as we selected high-performance mathematical software for evaluation because of its broad applicability to users, we have given priority to three mathematical software target domains for the same reason.
Several issues must be considered when establishing evaluation criteria for mathematical software. One observation is that, in contrast to the evaluation of parallel tools, the evaluation of mathematical software is inherently more quantitative: assessing packages by assigning scores, as was done for the parallel tools evaluation, would be inappropriate here.
Another consideration is that mathematical software packages often have different aims and different target applications. We must ensure that systematically and consistently checking the same criteria across all packages does not lead to comparing apples and oranges.
Another important observation is that some goals of evaluation are inherently conflicting. Satisfying a wish list of ideal goals is impossible, and tradeoffs will be necessary. Consider the following desirable and reasonable evaluation goals:
Now consider the following scenario. Package A is well established, widely known to be thoroughly tested, and written by authors known to the reviewer. In contrast, everything about Package B is unknown to the reviewer. It would clearly be appropriate to run Package B through a battery of simple tests to ensure that it meets at least some minimal standard. Running the same tests on Package A might seem inappropriate: that package has clearly survived far more rigorous testing, so repeating the tests appears to add little value for the user and is not the best use of the reviewer's time. However, not running the same tests on both packages could create a double standard, or at least the appearance of one. A satisfactory resolution of this scenario requires tradeoffs among the conflicting goals.
Our basic approach to meeting these conflicting goals is to test the packages on a relatively small set of standard test problems. The problem set will span a wide range of difficulty, from easy problems that any package should be able to solve to extremely difficult problems that test the packages' limits. Problems will also be selected to test special claims made by package authors. Problem sets will necessarily vary somewhat from package to package, but we aim to keep a small common core of test problems across similar packages so that users have a basis for side-by-side comparison. Any further tests tailored to particular packages will be extra and optional.
Evaluation results will be presented in a reconfigurable, Web-accessible package/problem table, with each cell containing the results of that particular test. We expect the problem set used in our evaluations to evolve over time; when it changes, we plan to update the tests and the results table so that a common basis for package comparison is maintained.
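As a rough, purely illustrative sketch of how such a package/problem table might be populated, the Python fragment below runs each candidate solver on each test problem and writes one row per table cell to a CSV file that a Web front end could render. The common wrapper interface, the recorded quantities, and the file name are our own assumptions and are not part of the HPC-Netlib infrastructure.

```python
import csv
import time

def run_case(solve, A, b):
    """Run one solver on one test problem; record outcome, wall time, and residual."""
    start = time.perf_counter()
    try:
        x = solve(A, b)
        status = "solved"
    except Exception as exc:              # a failed run is still a result worth tabulating
        x, status = None, "failed: " + type(exc).__name__
    elapsed = time.perf_counter() - start
    residual = None if x is None else float(abs(A @ x - b).max())
    return {"status": status, "seconds": round(elapsed, 3), "residual": residual}

def build_table(solvers, problems, path="results.csv"):
    """Write one row per (package, problem) pair; a Web table can pivot and filter this."""
    fields = ["package", "problem", "status", "seconds", "residual"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for pkg, solve in solvers.items():          # solvers: {name: callable(A, b) -> x}
            for prob, (A, b) in problems.items():   # problems: {name: (matrix, rhs)}
                writer.writerow({"package": pkg, "problem": prob, **run_case(solve, A, b)})
```

With a layout like this, adding a new package or a new test problem only adds rows; the table's structure, and hence the basis for side-by-side comparison, is unchanged.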
Characteristics of mathematical software can be divided into two categories: those that can be assessed by inspecting the code and documentation, and those that can be assessed only through actual testing.
Ideally, software testing examines the following characteristics.
For testing sparse linear system solvers, several useful resources are available. The Harwell/Boeing collection [6] of sparse test matrices will be the source of many of our test problems. SPARSKIT [7] contains another useful collection of test problems and also provides matrix generation and matrix format conversion utilities. Both the Harwell/Boeing and SPARSKIT collections are available through the Matrix Market [8].
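As a concrete illustration of how a test matrix from these collections might be exercised, the sketch below (in Python, with SciPy's solvers standing in for the packages under evaluation, and an assumed local copy of a Matrix Market file) loads the matrix, builds a right-hand side with a known solution, and reports the relative residual of a direct and an iterative solve.

```python
import numpy as np
from scipy.io import mmread
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import gmres, spsolve

# BCSSTK14 is a Harwell/Boeing stiffness matrix served by the Matrix Market;
# the path assumes the file has been downloaded and uncompressed locally.
A = csr_matrix(mmread("bcsstk14.mtx"))
b = A @ np.ones(A.shape[0])               # right-hand side whose exact solution is all ones

x_direct = spsolve(A.tocsc(), b)          # sparse direct solve as a reference
x_iter, info = gmres(A, b)                # unpreconditioned GMRES; info == 0 means it converged

for name, x in (("direct", x_direct), ("iterative", x_iter)):
    rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    print(f"{name:9s} relative residual = {rel_res:.2e}")
```

For an ill-conditioned matrix such as this one, the unpreconditioned iterative solve may not converge; exposing that kind of behavior across problems of varying difficulty is exactly what the test set is intended to do.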
The characteristics of sparse solvers that can be assessed largely by inspection of the code and its documentation include the following.
Our current emphasis in the HPC-Netlib evaluation is on sparse linear system solvers, although many of the sparse packages also fall into the PDE category. We are currently evaluating the iterative packages Aztec, PETSc, and PIM. We also plan to evaluate the iterative packages BlockSolve95, BPKIT, Elegant, IML++, ITPACK, LASPack, PARPRE, PCG, P-SPARSLIB, and Templates, and the direct packages CAPSS, SPARSE, SuperLU, and UMFPACK. The evaluations are available through the HPC-Netlib homepage at http://www.nhse.org/hpc-netlib/. See http://www.nhse.org/sw_catalog/ for descriptions of the HPC-Netlib software packages.