These are the basic operations used in most of the iterative algorithms
for solving sparse linear equations and eigensystems.
The Level 2 and 3 sparse BLAS interfaces support nine sparse matrix
storage formats. The nine formats are divided into two categories:
the point entry formats such as compressed row and the block entry
formats such as block compressed row. In the block entry formats, the
sparsity structure is represented as a series of small dense blocks.
Note that the nine formats cover the formats surveyed in §10.1,
with the exception of JDS and SKS.
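To make the distinction between the two categories concrete, the following sketch in C contrasts a point entry format (compressed sparse row) with a block entry format (block compressed sparse row). The struct and field names are purely illustrative and are not part of any sparse BLAS binding.

\begin{verbatim}
/* Point entry: compressed sparse row (CSR).
   Entry k of row i is val[k], in column col_ind[k],
   for row_ptr[i] <= k < row_ptr[i+1]. */
typedef struct {
    int     m, n, nnz;   /* dimensions and number of nonzeros */
    double *val;         /* nnz nonzero values                */
    int    *col_ind;     /* nnz column indices                */
    int    *row_ptr;     /* m+1 offsets into val and col_ind  */
} csr_matrix;

/* Block entry: block compressed sparse row with lb-by-lb blocks.
   The sparsity pattern is stored per block, and each stored
   block is a small dense lb*lb array inside val. */
typedef struct {
    int     mb, nb, nnzb; /* block dimensions, number of blocks */
    int     lb;           /* block size                         */
    double *val;          /* nnzb dense blocks, lb*lb each      */
    int    *bcol_ind;     /* nnzb block column indices          */
    int    *brow_ptr;     /* mb+1 offsets into the block arrays */
} bcsr_matrix;
\end{verbatim}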
The interface design of the sparse BLAS is fundamentally different from that of the dense BLAS. Since no single storage format is best for all sparse matrices and operations, the sparse matrix arguments to the Level 2 and 3 sparse BLAS are not tied to one particular storage format. Instead, a generic interface is defined for these routines, in which a sparse matrix argument is a handle (an integer) representing the matrix. Before calling a sparse BLAS routine, one calls a creation routine to build the sparse matrix handle from data in one of the nine formats. After the last sparse BLAS call involving the matrix, one calls a cleanup routine to release the resources associated with the handle. This handle-based approach allows the sparse BLAS to be used independently of the sparse matrix storage format; the internal representation is implementation dependent and can be chosen for best performance.
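The calling sequence below is a minimal sketch of this create/use/destroy pattern, using the C bindings proposed by the BLAS Technical Forum; the routine names (BLAS_duscr_begin, BLAS_duscr_insert_entry, BLAS_dusmv, BLAS_usds) and the header name follow the reference implementation and may differ in a particular vendor's library.

\begin{verbatim}
#include <blas_sparse.h>  /* header name from the reference
                             implementation; may vary */

/* Compute y <- alpha*A*x + y, where A is built entry by entry
   from coordinate data (val, indx, jndx).  Error checking of
   the integer return codes is omitted for brevity. */
void sparse_matvec_example(int m, int n, int nnz,
                           const double *val, const int *indx,
                           const int *jndx, double alpha,
                           const double *x, double *y)
{
    blas_sparse_matrix A;
    int k;

    /* Step 1: create a handle and insert the entries. */
    A = BLAS_duscr_begin(m, n);
    for (k = 0; k < nnz; k++)
        BLAS_duscr_insert_entry(A, val[k], indx[k], jndx[k]);
    BLAS_duscr_end(A);   /* internal representation is now fixed */

    /* Step 2: use the handle in a Level 2 operation. */
    BLAS_dusmv(blas_no_trans, alpha, A, x, 1, y, 1);

    /* Step 3: release the resources behind the handle. */
    BLAS_usds(A);
}
\end{verbatim}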
Since matrix-vector products often account for most of the time spent in iterative methods, several research efforts have tried to optimize their performance [438,439,240,239]. In matrix-vector multiplication, each matrix entry participates in only one operation, but each vector entry may participate in more than one operation. A primary goal of optimization is therefore to reduce the movement of the source vector between different levels of memory. The optimization techniques include matrix reordering, cache blocking, and register blocking. A recently developed toolbox, called SPARSITY, combines all of these techniques [240,239]. Depending on the matrix structure, the authors have observed speedups of up to a factor of three, even on uniprocessors. SPARSITY also contains mechanisms to choose the best block sizes automatically, based on matrix and machine characteristics.
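As a point of reference for these optimizations, here is an unoptimized CSR matrix-vector product (a sketch; the array names follow the hypothetical CSR layout above). Each nonzero value is read exactly once, while an entry of the source vector $x$ may be reloaded once per nonzero in its column; keeping pieces of $x$ in cache or registers, via reordering and blocking, is what the techniques above exploit.

\begin{verbatim}
/* Unblocked CSR product y <- A*x: every val[k] is touched exactly
   once, but x[col_ind[k]] is reloaded once per nonzero in its
   column, so locality of the accesses to x dominates performance. */
void csr_matvec(int m, const double *val, const int *col_ind,
                const int *row_ptr, const double *x, double *y)
{
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_ind[k]];
        y[i] = sum;
    }
}
\end{verbatim}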
In many of the iterative methods discussed earlier, both the product
of a matrix with a vector and the product of its transpose with a
vector are needed; that is, given an input vector $x$ we want to
compute the products $Ax$ and $A^Tx$.
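With CSR storage, both products can be formed from the same arrays in a single sweep over the nonzeros: row $i$ contributes a dot product to $(Ax)_i$ (a gather), and scatters its input entry into the transpose product along the column indices of its nonzeros. A minimal sketch, reusing the CSR array names above:

\begin{verbatim}
/* Compute y <- A*x and z <- A^T*w in one sweep over a CSR matrix.
   Row i contributes a dot product to y[i] (gather) and scatters
   w[i] into z at the column positions of its nonzeros. */
void csr_matvec_and_transpose(int m, int n,
                              const double *val, const int *col_ind,
                              const int *row_ptr,
                              const double *x, const double *w,
                              double *y, double *z)
{
    for (int j = 0; j < n; j++)
        z[j] = 0.0;                      /* z accumulates scatters */
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        double wi = w[i];
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            sum += val[k] * x[col_ind[k]];  /* gather for A*x    */
            z[col_ind[k]] += val[k] * wi;   /* scatter for A^T*w */
        }
        y[i] = sum;
    }
}
\end{verbatim}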