The Conjugate Gradient method involves one matrix-vector product, three vector updates, and two inner products per iteration. Some slight computational variants exist that have the same structure (see Reid). Variants that cluster the inner products, a favorable property on parallel machines, are discussed in §.
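The per-iteration operation count can be made concrete with a small sketch. The code below is an illustrative, unpreconditioned Conjugate Gradient loop in plain Python, assuming a dense symmetric positive definite matrix stored as a list of rows, a zero initial guess, and an absolute tolerance on the residual norm. The helper names (`dot`, `axpy`, `matvec`, `conjugate_gradient`) and the `counts` dictionary are hypothetical and chosen only for this example; it is not the formulation of any particular reference.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Vector update: return y + alpha * x."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Unpreconditioned CG for symmetric positive definite A (assumed).

    Returns the approximate solution, the number of iterations, and a
    tally of the kernel operations performed inside the loop.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # first search direction
    rho = dot(r, r)      # setup inner product, not counted per iteration
    counts = {"matvec": 0, "dot": 0, "axpy": 0}
    iterations = 0
    for _ in range(max_iter):
        if rho ** 0.5 <= tol:
            break
        q = matvec(A, p)                 # the one matrix-vector product
        counts["matvec"] += 1
        alpha = rho / dot(p, q)          # inner product 1
        counts["dot"] += 1
        x = axpy(alpha, p, x)            # vector update 1: x = x + alpha p
        r = axpy(-alpha, q, r)           # vector update 2: r = r - alpha q
        counts["axpy"] += 2
        rho_new = dot(r, r)              # inner product 2
        counts["dot"] += 1
        p = axpy(rho_new / rho, p, r)    # vector update 3: p = r + beta p
        counts["axpy"] += 1
        rho = rho_new
        iterations += 1
    return x, iterations, counts
```

In this sketch the two inner products are separated by vector updates, so on a distributed-memory machine each would typically require its own global reduction; this is the synchronization cost that the clustered variants aim to reduce.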
For a discussion of the Conjugate Gradient method on vector and shared-memory computers, see Dongarra et al. For discussions of the method for more general parallel architectures, see Demmel, Heath and Van der Vorst, as well as Ortega, and the references therein.