In general, we'd like to consider the Jacobian computation on a rectangular grid. For this, we can consider using to accomplish the calculation. With a general grid shape, we exploit some concurrency in both the column evaluations and the residual computations, with the time for this step, the corresponding speedup, the residual evaluation time with P row processes, and the apparent speedup compared to one row process:
assuming no shortcuts are available as a result of latency. This timing is exemplified in the example below, which does not take advantage of latency.
There is additional work whenever the Jacobian structure is rebuilt for better numerical stability in the subsequent LU factorization (A-mode). Then, work is involved in each process in the filling of the initial Jacobian. In the normal case, work proportional to the number of local nonzeroes plus fill elements is incurred in each process for refilling the sparse Jacobian structure.