In general, we'd like to consider the Jacobian computation on a rectangular
grid. For this, we can consider using to accomplish the
calculation. With a general grid shape, we exploit some concurrency in
*both* the column evaluations and the residual computations, with
the time for this step,
the corresponding speedup, the
residual evaluation time with **P** row processes, and the
apparent speedup compared to one row process:

assuming no shortcuts are available as a result of latency. This timing is exemplified in the example below, which does not take advantage of latency.
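
As a rough illustration of the grid trade-off described above (separate from the worked example the text refers to), the following sketch uses a deliberately simple cost model. Everything in it is an assumption made for illustration, not the timing formula of this section: the function names, the grid width `q_cols`, the even division of residual work over `p_rows` row processes, the fixed `overhead` per residual evaluation, and the premise that each column evaluation costs one residual evaluation.

```python
import math


def residual_time(serial_time, p_rows, overhead=0.0):
    """Assumed model: residual work divides evenly over p_rows row
    processes, plus a fixed (hypothetical) overhead per evaluation."""
    return serial_time / p_rows + overhead


def jacobian_time(n_columns, p_rows, q_cols, serial_time, overhead=0.0):
    """Assumed model: the column evaluations are shared among q_cols
    process columns, each costing one residual evaluation."""
    columns_per_process_column = math.ceil(n_columns / q_cols)
    return columns_per_process_column * residual_time(serial_time, p_rows, overhead)


def jacobian_speedup(n_columns, p_rows, q_cols, serial_time, overhead=0.0):
    """Speedup relative to evaluating every column on a single process."""
    sequential = n_columns * serial_time
    return sequential / jacobian_time(n_columns, p_rows, q_cols, serial_time, overhead)


if __name__ == "__main__":
    # 100 columns on a 4 x 5 grid, unit residual cost, no overhead.
    print(jacobian_time(100, 4, 5, 1.0))
    print(jacobian_speedup(100, 4, 5, 1.0))
```

Under these assumptions the two sources of concurrency multiply: with zero overhead the speedup is the product of the row and column counts whenever the column count divides the number of Jacobian columns evenly. A nonzero `overhead` reduces the benefit of adding row processes.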

There is additional work whenever the Jacobian structure is rebuilt for better numerical stability in the subsequent LU factorization (A-mode). In that case, each process incurs extra work in filling the initial Jacobian. In the normal case, each process incurs work proportional to the number of local nonzeros plus fill elements when refilling the sparse Jacobian structure.
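
The per-process refill cost can be pictured with a small counting sketch. It assumes a cyclic (scatter) mapping of rows and columns onto a hypothetical `p_rows` by `q_cols` process grid. The mapping, the function name, and the sample sparsity pattern are illustrative assumptions, not the data distribution used in this work.

```python
from collections import Counter


def local_refill_counts(entries, p_rows, q_cols):
    """Count the entries (nonzeros plus fill) owned by each process.

    `entries` is an iterable of (row, column) index pairs. Ownership
    follows an assumed cyclic mapping: entry (i, j) belongs to process
    (i mod p_rows, j mod q_cols).
    """
    counts = Counter()
    for i, j in entries:
        counts[(i % p_rows, j % q_cols)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 4 x 4 tridiagonal pattern plus one fill element at (3, 0).
    nonzeros = [(0, 0), (1, 1), (2, 2), (3, 3),
                (0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
    fill = [(3, 0)]
    for process, count in sorted(local_refill_counts(nonzeros + fill, 2, 2).items()):
        print(process, count)
```

The count for each process is the quantity to which its refill work is proportional in the normal case. The largest count across the grid bounds the time for the step, since the processes refill their local structures concurrently.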

Wed Mar 1 10:19:35 EST 1995