These are the basic operations used in most of the iterative algorithms
for solving sparse linear equations and eigensystems.
The Level 2 and 3 sparse BLAS interfaces support nine sparse matrix
storage formats. The nine formats are divided into two categories:
the point entry formats such as compressed row and the block entry
formats such as block compressed row. In the block entry formats, the
sparsity structure is represented as a series of small dense blocks.
Note that the nine formats cover the formats surveyed in §10.1,
with the exception of JDS and SKS.
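To make the distinction between the two categories concrete, the following sketch in C contrasts a point entry format (compressed sparse row) with a block entry format (block compressed sparse row). The struct and field names are purely illustrative and are not part of any sparse BLAS binding.

\begin{verbatim}
/* Point entry: compressed sparse row (CSR).
   Entry k of row i is val[k], in column col_ind[k],
   for row_ptr[i] <= k < row_ptr[i+1]. */
typedef struct {
    int     m, n, nnz;   /* dimensions and number of nonzeros */
    double *val;         /* nnz nonzero values                */
    int    *col_ind;     /* nnz column indices                */
    int    *row_ptr;     /* m+1 offsets into val and col_ind  */
} csr_matrix;

/* Block entry: block compressed sparse row with lb-by-lb blocks.
   The sparsity pattern is stored per block, and each stored
   block is a small dense lb*lb array inside val. */
typedef struct {
    int     mb, nb, nnzb; /* block dimensions, number of blocks */
    int     lb;           /* block size                         */
    double *val;          /* nnzb dense blocks, lb*lb each      */
    int    *bcol_ind;     /* nnzb block column indices          */
    int    *brow_ptr;     /* mb+1 offsets into the block arrays */
} bcsr_matrix;
\end{verbatim}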
The interface design of the sparse BLAS is fundamentally different from that of the dense BLAS. Since no single storage format is best for all sparse matrices and operations, the sparse matrix arguments to the Level 2 and 3 sparse BLAS are not tied to one particular storage format. Instead, a generic interface is defined for these routines, in which a sparse matrix argument is a handle (an integer) representing the matrix. Before calling a sparse BLAS routine, one calls a creation routine to build the sparse matrix handle from data in one of the nine formats. After the last sparse BLAS call involving the matrix, one calls a cleanup routine to release the resources associated with the handle. This handle-based approach allows the sparse BLAS to be used independently of the sparse matrix storage format; the internal representation is implementation dependent and can be chosen for best performance.
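The calling sequence below is a minimal sketch of this create/use/destroy pattern, using the C bindings proposed by the BLAS Technical Forum; the routine names (BLAS_duscr_begin, BLAS_duscr_insert_entry, BLAS_dusmv, BLAS_usds) and the header name follow the reference implementation and may differ in a particular vendor's library.

\begin{verbatim}
#include <blas_sparse.h>  /* header name from the reference
                             implementation; may vary */

/* Compute y <- alpha*A*x + y, where A is built entry by entry
   from coordinate data (val, indx, jndx).  Error checking of
   the integer return codes is omitted for brevity. */
void sparse_matvec_example(int m, int n, int nnz,
                           const double *val, const int *indx,
                           const int *jndx, double alpha,
                           const double *x, double *y)
{
    blas_sparse_matrix A;
    int k;

    /* Step 1: create a handle and insert the entries. */
    A = BLAS_duscr_begin(m, n);
    for (k = 0; k < nnz; k++)
        BLAS_duscr_insert_entry(A, val[k], indx[k], jndx[k]);
    BLAS_duscr_end(A);   /* internal representation is now fixed */

    /* Step 2: use the handle in a Level 2 operation. */
    BLAS_dusmv(blas_no_trans, alpha, A, x, 1, y, 1);

    /* Step 3: release the resources behind the handle. */
    BLAS_usds(A);
}
\end{verbatim}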
Since matrix-vector products often account for most of the time spent in iterative methods, several research efforts have tried to optimize their performance [438,439,240,239]. In matrix-vector multiplication, each matrix entry participates in only one operation, but each vector entry may participate in more than one operation. A primary goal of optimization is therefore to reduce the movement of the source vector between different levels of memory. The optimization techniques include matrix reordering, cache blocking, and register blocking. A recently developed toolbox, called SPARSITY, combines all of these techniques [240,239]. Depending on the matrix structure, the authors have observed speedups of up to a factor of three, even on uniprocessors. SPARSITY also contains mechanisms to choose the best block sizes automatically, based on matrix and machine characteristics.
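As a point of reference for these optimizations, here is an unoptimized CSR matrix-vector product (a sketch; the array names follow the hypothetical CSR layout above). Each nonzero value is read exactly once, while an entry of the source vector $x$ may be reloaded once per nonzero in its column; keeping pieces of $x$ in cache or registers, via reordering and blocking, is what the techniques above exploit.

\begin{verbatim}
/* Unblocked CSR product y <- A*x: every val[k] is touched exactly
   once, but x[col_ind[k]] is reloaded once per nonzero in its
   column, so locality of the accesses to x dominates performance. */
void csr_matvec(int m, const double *val, const int *col_ind,
                const int *row_ptr, const double *x, double *y)
{
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_ind[k]];
        y[i] = sum;
    }
}
\end{verbatim}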
In many of the iterative methods discussed earlier, both the product
of a matrix with a vector and the product of its transpose with a
vector are needed; that is, given an input vector $x$ we want to
compute the products $Ax$ and $A^Tx$.
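With CSR storage, both products can be formed from the same arrays in a single sweep over the nonzeros: row $i$ contributes a dot product to $(Ax)_i$ (a gather), and scatters its input entry into the transpose product along the column indices of its nonzeros. A minimal sketch, reusing the CSR array names above:

\begin{verbatim}
/* Compute y <- A*x and z <- A^T*w in one sweep over a CSR matrix.
   Row i contributes a dot product to y[i] (gather) and scatters
   w[i] into z at the column positions of its nonzeros. */
void csr_matvec_and_transpose(int m, int n,
                              const double *val, const int *col_ind,
                              const int *row_ptr,
                              const double *x, const double *w,
                              double *y, double *z)
{
    for (int j = 0; j < n; j++)
        z[j] = 0.0;                      /* z accumulates scatters */
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        double wi = w[i];
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            sum += val[k] * x[col_ind[k]];  /* gather for A*x    */
            z[col_ind[k]] += val[k] * wi;   /* scatter for A^T*w */
        }
        y[i] = sum;
    }
}
\end{verbatim}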