** Next:** The LU Factorization
**Up:** Single Integration Step
** Previous:** Jacobian Computation

This approach has
been considered in the Concurrent DASSL framework. We currently have
experimental versions of two mechanisms, both of which are designed to
work with the sparse-matrix structures associated with direct, sparse
LU factorization ([Skjellum:90d]). The first is called
``bandlike'' Jacobian evaluation. For a banded Jacobian matrix of
bandwidth **b**, only **b** residuals are needed to
evaluate the Jacobian. This feature is incorporated into the original
DASSL, along with a LINPACK banded solver. In Concurrent DASSL,
collections of Jacobian columns are placed in each process column,
according to the column data distribution, which thus far is picked
solely to balance LU factorization and triangular-solve performance
[Skjellum:90d]. In each process column, there will be
``compatible'' columns that can be evaluated using a single, composite
perturbation. Identification of these compatible columns is
accomplished by checks on the bandwidth overlap
condition. Columns that possess off-band structure are stricken from
the list and evaluated separately. Presumably, a heuristic algorithm
could be employed further to increase the size of the compatible sets,
but this is yet to be implemented. The same ``greedy'' algorithm of
Curtis et al. used for the sequential reduction of Jacobian computation
effort would be applied independently to each process column (see
comments by [Duff:86a, Section 12.3]). Then, clearly, the column
distribution affects the performance of the Jacobian computation, and
the linear-algebra performance can no longer be viewed so readily in
isolation.
We have also devised a ``blocklike'' format, which will be applied to block
n-diagonal matrices that include some off-block entries as well. Optimally,
fewer residual computations will be needed than for the banded case. The same
column-by-column compatible sets will be created, and the Curtis algorithm
can also be applied. Hopefully, because of the less restrictive
compatibility requirement, the blocklike case will produce higher
concurrent speedups than those attained using the conservative bandlike
assumption for Jacobians possessing blocklike structure. Comparative results
will be presented in a future paper.

** Next:** The LU Factorization
**Up:** Single Integration Step
** Previous:** Jacobian Computation

* Guy Robinson *

Wed Mar 1 10:19:35 EST 1995