Most linear algebra operations in the iterative solution of sparse systems are easily parallelised. The one exception concerns the inner products that appear in all iterative methods (with the exception of the Chebyshev iteration). Since the collection and redistribution of data in an inner product will inevitably have some processors waiting, the number of inner products should be kept low.
The conjugate gradients and bi-conjugate gradients algorithms have two interdependent inner products, and various people have suggested ways to reduce this to two independent ones, which can then combine their communication stages. See [3] and the references cited therein. This approach has been taken in the PIM package; section 3.11.
The second source of inner products is the orthogonalisation stage in the GMRES algorithm. Using Gram-Schmidt orthogonalisation, the inner products are independent and can be combined; however, this makes the resulting algorithm unstable. Modified Gram-Schmidt gives a more stable algorithm, but the inner products are interdependent, and have to be processed in sequence. A middle ground is to apply (unmodified) Gram-Schmidt twice.