In general, we'd like to consider the Jacobian computation on a rectangular
grid. For this, we can consider using a grid of row and column processes to accomplish the
calculation. With a general grid shape, we exploit some concurrency in
both the column evaluations and the residual computations, with
the time for this step, the corresponding speedup, the residual
evaluation time with P row processes, and the apparent speedup
compared to one row process:
assuming no shortcuts are available as a result of latency. This timing is exemplified in the example below, which does not take advantage of latency.
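
Separately from the timing example the text refers to, the division of labor on a rectangular grid can be sketched as follows. This is only an illustration under stated assumptions: it assumes the Jacobian is formed by forward differences, which this passage does not specify, and it simulates the P x Q grid serially. The names `block_ranges`, `grid_fd_jacobian`, and `example_residual_rows` are hypothetical. Each simulated process evaluates only its own block of residual rows, for its own block of perturbed columns, and takes no structural shortcuts.

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous blocks of near-equal size."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for k in range(parts):
        stop = start + base + (1 if k < extra else 0)
        ranges.append(range(start, stop))
        start = stop
    return ranges


def grid_fd_jacobian(residual_rows, x, P, Q, h=1e-6):
    """Forward-difference Jacobian, split over a simulated P x Q grid.

    residual_rows(x, rows) must return the residual components in `rows`.
    Returns the dense Jacobian and, per process (p, q), the number of
    residual components that process evaluated.
    """
    n = len(x)
    row_blocks = block_ranges(n, P)   # residual rows per process row
    col_blocks = block_ranges(n, Q)   # Jacobian columns per process column
    J = [[0.0] * n for _ in range(n)]
    evals = {}
    for p, rows in enumerate(row_blocks):
        for q, cols in enumerate(col_blocks):
            base = residual_rows(x, rows)      # unperturbed local residual
            count = len(rows)
            for j in cols:                     # one perturbation per column
                xp = list(x)
                xp[j] += h
                pert = residual_rows(xp, rows)
                count += len(rows)
                for i, fb, fp in zip(rows, base, pert):
                    J[i][j] = (fp - fb) / h
            evals[(p, q)] = count
    return J, evals


def example_residual_rows(x, rows):
    """Hypothetical test residual: f_i = x_i**2 + x_{(i+1) mod n}."""
    n = len(x)
    return [x[i] ** 2 + x[(i + 1) % n] for i in rows]


J, evals = grid_fd_jacobian(example_residual_rows, [1.0, 2.0, 3.0, 4.0], P=2, Q=2)
```

In this sketch, a process holding r rows and c columns evaluates r(1 + c) residual components, so both grid dimensions reduce the per-process work. The unperturbed residual is recomputed in every process column, which is redundant work that a real implementation might avoid or share.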
There is additional work whenever the Jacobian structure is rebuilt for
better numerical stability in the subsequent LU factorization (A-mode).
In that case, each process does work in filling the
initial Jacobian. In the normal case, each process incurs work
proportional to the number of its local nonzeroes plus fill elements in
refilling the sparse Jacobian structure.
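
To make the "local nonzeroes" part of the refill cost concrete, the sketch below counts how many structural nonzeros each process of a P x Q grid would own. It rests on assumptions not taken from the text: rows and columns are dealt out in contiguous blocks, fill elements from the LU factorization are ignored, and the name `local_nonzero_counts` is hypothetical.

```python
from collections import Counter


def local_nonzero_counts(pattern, n, P, Q):
    """Count structural nonzeros owned by each process of a P x Q grid.

    pattern: iterable of (i, j) index pairs of an n x n sparse matrix.
    Assumed mapping: index i belongs to process row i * P // n, and
    index j to process column j * Q // n (contiguous blocks).
    """
    counts = Counter()
    for i, j in pattern:
        counts[(i * P // n, j * Q // n)] += 1
    return counts


# Tridiagonal 4 x 4 pattern as a small example.
n = 4
tridiagonal = [(i, j) for i in range(n) for j in range(n) if abs(i - j) <= 1]
counts = local_nonzero_counts(tridiagonal, n, P=2, Q=2)
```

For this pattern the two diagonal processes own four entries each and the two off-diagonal processes own one each, so the refill work is unevenly spread even before fill is considered.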