next up previous contents index
Next: The LU Factorization Up: Single Integration Step Previous: Jacobian Computation

Exploitation of Latency

This approach has been considered in the Concurrent DASSL framework. We currently have experimental versions of two mechanisms, both of which are designed to work with the sparse-matrix structures associated with direct, sparse LU factorization ([Skjellum:90d]). The first is called ``bandlike'' Jacobian evaluation. For a banded Jacobian matrix of bandwidth  b, only b residuals are needed to evaluate the Jacobian. This feature is incorporated into the original DASSL, along with a LINPACK banded solver. In Concurrent DASSL, collections of Jacobian columns are placed in each process column, according to the column data distribution, which thus far is picked solely to balance LU factorization and triangular-solve performance [Skjellum:90d]. In each process column, there will be ``compatible'' columns that can be evaluated using a single, composite perturbation. Identification of these compatible columns is accomplished by checks on the bandwidth  overlap condition. Columns that possess off-band structure are stricken from the list and evaluated separately. Presumably, a heuristic algorithm could be employed further to increase the size of the compatible sets, but this is yet to be implemented. The same ``greedy'' algorithm of Curtis et al. used for the sequential reduction of Jacobian computation effort would be applied independently to each process column (see comments by [Duff:86a, Section 12.3]). Then, clearly, the column distribution affects the performance of the Jacobian computation, and the linear-algebra performance can no longer be viewed so readily in isolation.

We have also devised a ``blocklike'' format, which will be applied to block n-diagonal matrices that include some off-block entries as well. Optimally, fewer residual computations will be needed than for the banded case. The same column-by-column compatible sets will be created, and the Curtis algorithm can also be applied. Hopefully, because of the less restrictive compatibility requirement, the blocklike case will produce higher concurrent speedups than those attained using the conservative bandlike assumption for Jacobians possessing blocklike structure. Comparative results will be presented in a future paper.



next up previous contents index
Next: The LU Factorization Up: Single Integration Step Previous: Jacobian Computation



Guy Robinson
Wed Mar 1 10:19:35 EST 1995