
Evaluation of HPC-Netlib

 

Just as we selected high-performance mathematical software for evaluation because of its broad applicability to users, we have given priority to three mathematical software target domains for the same reason.

Several issues need to be considered when establishing evaluation criteria for mathematical software. One observation is that, in contrast to the evaluation of parallel tools, the evaluation of mathematical software is inherently more quantitative: accuracy and performance can be measured directly. Assessing software by assigning scores, as was done for the evaluation of parallel tools, would therefore be inappropriate for mathematical software.

Another consideration is that mathematical software packages often have different aims and different target applications. We must ensure that systematically and consistently checking the same criteria across all packages does not lead to comparing apples and oranges.

Another important observation is that some goals of evaluation are inherently conflicting. Satisfying a wish list of ideal goals is impossible, and tradeoffs will be necessary. Among the desirable and reasonable goals are testing all packages consistently, making good use of the reviewer's time, and providing information of real value to users.

Now consider the following scenario. Package A is well established, widely known to be thoroughly tested, and written by authors known to the reviewer. In contrast, everything about Package B is unknown to the reviewer. It clearly would be appropriate to run Package B through a battery of simple tests to ensure that it meets at least some minimal standards. Running the same tests on Package A might seem inappropriate because the package has clearly survived far more rigorous testing; doing so does not appear to offer much added value to the user or to be the best use of the reviewer's time. However, not running the same tests on both packages could create a double standard, or at least the appearance of one. A satisfactory resolution of this scenario will require some tradeoffs among the conflicting goals.

Our basic approach to reconciling these conflicting goals is to test the packages on a relatively small set of standard test problems. The problem set will span a wide range of difficulty levels, from easy problems that any package should be able to solve to extremely difficult problems that test a package's limits. Problems will also be selected to test special claims made by package authors. Problem sets will necessarily vary somewhat from package to package, but our aim is to maintain a small common core of test problems across similar packages so that users have a basis for side-by-side comparison. Any other tests tailored to particular packages would be extra and optional.
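The common-core testing just described could be organized as a simple harness that runs every package on the same problem set and records the outcome of each run. The following sketch is purely illustrative: the solver callables and problem data are hypothetical placeholders, and it is not part of the actual HPC-Netlib evaluation infrastructure.

    # Illustrative harness: run each solver on the same common core of problems
    # and record accuracy, time, and failures for later side-by-side comparison.
    # Solver callables and problem data here are hypothetical placeholders.
    import time
    import numpy as np

    def run_common_core(solvers, problems, tol=1e-8):
        """solvers: {name: callable(A, b) -> x}; problems: {name: (A, b)}."""
        results = {}
        for solver_name, solve in solvers.items():
            for prob_name, (A, b) in problems.items():
                start = time.perf_counter()
                try:
                    x = solve(A, b)
                    resid = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
                    status = "ok" if resid < tol else "inaccurate"
                except Exception as err:   # record failures rather than aborting the run
                    resid, status = None, "failed: %s" % err
                results[(solver_name, prob_name)] = {
                    "status": status,
                    "residual": resid,
                    "seconds": time.perf_counter() - start,
                }
        return results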

Evaluation results will be presented in a reconfigurable, Web-accessible package-by-problem table, with each cell of the table containing the results of that particular test. We expect the problem set used in our evaluations to evolve over time. When the problem set changes, we plan to update the tests and the results table so that a common basis for package comparison is maintained.
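As a rough illustration only (the actual NHSE table is reconfigurable and served through the Web, which is not reproduced here), results gathered by a harness like the one above could be rendered as a static package-by-problem HTML table:

    # Render a {(package, problem): result} mapping as a static package-by-problem
    # HTML table, one cell per test.  A stand-in for the reconfigurable NHSE table.
    def results_to_html(results):
        packages = sorted({pkg for pkg, _ in results})
        problems = sorted({prob for _, prob in results})
        rows = ["<table border=\"1\">",
                "<tr><th>Package</th>" + "".join("<th>%s</th>" % p for p in problems) + "</tr>"]
        for pkg in packages:
            cells = "".join(
                "<td>%s</td>" % results[(pkg, prob)]["status"] if (pkg, prob) in results
                else "<td>n/a</td>"
                for prob in problems)
            rows.append("<tr><td>%s</td>%s</tr>" % (pkg, cells))
        rows.append("</table>")
        return "\n".join(rows)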

Characteristics of mathematical software can be divided into two categories: those that can be assessed by inspection of the code and documentation, and those that can be assessed only through actual testing.

Ideally, software testing examines the following characteristics; a small illustrative check is sketched after the list.

Correctness
The code works correctly on the intended problems.
Efficiency
The code is efficient with respect to both speed and storage.
Stability
The code is stable, performing as efficiently and as accurately as the problem's conditioning allows.
Robustness
The code handles error conditions reasonably. The ability to estimate a problem's condition, or otherwise to provide a check on the reliability of the computed answer, is also desirable.
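For a concrete, if simplified, illustration of how these characteristics translate into measurable quantities, the sketch below assesses a single dense linear solve. It uses NumPy purely for illustration; the library and the quantities reported are assumptions of the sketch, not part of the packages under evaluation.

    # Illustrative checks for one dense solve: the relative residual addresses
    # correctness, elapsed time addresses efficiency, and the condition number
    # indicates how much accuracy the problem's conditioning allows (stability),
    # while also providing a reliability check on the answer (robustness).
    import time
    import numpy as np

    def assess_solve(A, b):
        start = time.perf_counter()
        x = np.linalg.solve(A, b)
        seconds = time.perf_counter() - start
        rel_residual = np.linalg.norm(b - A @ x) / (np.linalg.norm(A) * np.linalg.norm(x))
        cond = np.linalg.cond(A)
        return {
            "seconds": seconds,
            "relative_residual": rel_residual,                  # near machine epsilon for a stable solver
            "condition_number": cond,
            "forward_error_bound": cond * np.finfo(float).eps,  # rough accuracy limit set by conditioning
        }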

Full examination of each characteristic for each package is clearly unrealistic. In addition, absolute quantitative assessments of the characteristics may mean little to a typical package user. Our approach of doing side-by-side comparisons on common standard problems provides relative assessments that are both more practical to obtain and more helpful to the user.

For testing sparse linear system solvers, several useful resources are available. The Harwell/Boeing [1] collection of sparse test matrices will be the source for many of our test problems. SPARSKIT [2] also contains a useful collection of test problems and in addition provides matrix generation and matrix format conversion utilities. The Harwell/Boeing and SPARSKIT collections are available through the Matrix Market [3].
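By way of illustration, a test matrix downloaded from the Matrix Market in its exchange format can be read and fed to an iterative solver in a few lines. The sketch below uses SciPy, chosen only for illustration and not one of the packages under evaluation, and a placeholder file name.

    # Load a Matrix Market test matrix and run an iterative solve on it.
    # "example_matrix.mtx" is a placeholder for a sparse matrix file
    # downloaded from the collection.
    import numpy as np
    from scipy.io import mmread
    from scipy.sparse.linalg import gmres

    A = mmread("example_matrix.mtx").tocsr()
    b = A @ np.ones(A.shape[0])          # right-hand side with known solution x = 1
    x, info = gmres(A, b)                # info == 0 indicates convergence
    rel_residual = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    print("gmres info:", info, " relative residual:", rel_residual)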

The evaluation characteristics of sparse solvers that can be assessed largely from inspection of the code and its documentation include the following; a sketch of a simple checklist for recording them appears after the list.

Capabilities
Includes methods, formats
Methods
Identify which methods and preconditioners are used in the package.
Formats
Identify which matrix formats are supported. Packages that use non-standard matrix formats may be harder to test and to use, and will tend to have a relatively small base of users.

Portability
Includes standards, architectures
Standards
Identify which standards (e.g. MPI, BLAS) are used.
Architectures
Identify on which architectures the package has been tested and is supported.

Versatility
Includes methods, interfaces
Methods
Identify the extent to which a user can design or specify the method or preconditioner to be used.
Interfaces
Identify how well the package interfaces with other packages, and whether it has multi-language support.

Ease of use
Identify adequacy of documentation, examples, and support.
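The inspection-based items above lend themselves to a simple structured checklist, one record per package. The layout below is a hypothetical sketch, and the example entry is illustrative only, not an actual evaluation finding.

    # Hypothetical checklist record for the inspection-based criteria above.
    from dataclasses import dataclass, field

    @dataclass
    class PackageChecklist:
        name: str
        methods: list = field(default_factory=list)           # capabilities: solvers/preconditioners
        matrix_formats: list = field(default_factory=list)    # capabilities: supported formats
        standards: list = field(default_factory=list)         # portability: e.g. MPI, BLAS
        architectures: list = field(default_factory=list)     # portability: tested/supported platforms
        user_defined_methods: bool = False                    # versatility: user-specified method/preconditioner
        interfaces: list = field(default_factory=list)        # versatility: other packages, languages
        ease_of_use_notes: str = ""                           # documentation, examples, support

    # Illustrative entry only -- not an actual evaluation result.
    example = PackageChecklist(name="ExampleSolver",
                               methods=["CG", "GMRES"],
                               standards=["MPI", "BLAS"],
                               architectures=["workstation cluster"])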

Our current emphasis in the HPC-Netlib evaluation is on sparse linear system solvers, although many of the sparse packages also fall into the PDE category. We are currently evaluating the iterative packages Aztec, PETSc, and PIM. We also plan to evaluate the iterative packages BlockSolve95, BPKIT, Elegant, IML++, ITPACK, LASPack, PARPRE, PCG, P-SPARSLIB, and Templates, and the direct packages CAPSS, SPARSE, SuperLU, and UMFPACK. Like the PTLIB evaluations, the HPC-Netlib evaluations will be available through the NHSE homepage at http://www.nhse.org/.



Jack Dongarra
Fri Nov 15 09:09:21 EST 1996