Methods based on orthogonalization were developed by a number of authors in the early 1950s. Lanczos' method was based on two mutually orthogonal vector sequences, and his motivation came from eigenvalue problems. In that context, the most prominent feature of the method is that it reduces the original matrix to tridiagonal form. Lanczos later applied his method to solving linear systems, in particular symmetric ones. An important property for proving convergence of the method when solving linear systems is that the iterates are related to the initial residual through a polynomial in the coefficient matrix; in particular, the k-th residual is obtained by applying a polynomial of degree at most k in that matrix to the initial residual.
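To make the tridiagonal reduction concrete, here is a minimal pure-Python sketch of the two-sided (biorthogonal) Lanczos process. It assumes a small dense matrix stored as nested lists, starting vectors with w1·v1 = 1, and no breakdown. The function and helper names (`lanczos_biorth`, `dot`, `matvec`, `rmatvec`) are hypothetical, and the convention of splitting w'·v' between the sub- and superdiagonal follows common textbook presentations rather than necessarily Lanczos' original formulation.

```python
import math

# Illustrative sketch: helper names and scaling convention are assumptions.


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    # A x for a dense matrix stored as a list of rows.
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    # A^T y, computed without forming the transpose.
    return [sum(A[k][i] * y[k] for k in range(len(A))) for i in range(len(A[0]))]


def lanczos_biorth(A, v1, w1, m):
    """Two-sided Lanczos: returns V, W (m vectors each) and the m x m tridiagonal T.

    Assumes dot(w1, v1) == 1. In exact arithmetic W^T V = I and W^T A V = T.
    """
    n = len(v1)
    T = [[0.0] * m for _ in range(m)]
    V, W = [list(v1)], [list(w1)]
    v_prev, w_prev = [0.0] * n, [0.0] * n
    beta = delta = 0.0
    for j in range(m):
        v, w = V[j], W[j]
        Av = matvec(A, v)
        alpha = dot(w, Av)
        T[j][j] = alpha
        if j == m - 1:
            break
        # Three-term recurrences, one for each of the two sequences.
        v_new = [a - alpha * x - beta * y for a, x, y in zip(Av, v, v_prev)]
        w_new = [a - alpha * x - delta * y
                 for a, x, y in zip(rmatvec(A, w), w, w_prev)]
        s = dot(w_new, v_new)
        if s == 0.0:
            raise ArithmeticError("Lanczos breakdown: w'.v' == 0")
        # Split s between sub- and superdiagonal so that w_{j+1}.v_{j+1} = 1.
        delta = math.sqrt(abs(s))
        beta = s / delta
        T[j + 1][j], T[j][j + 1] = delta, beta
        v_prev, w_prev = v, w
        V.append([x / delta for x in v_new])
        W.append([x / beta for x in w_new])
    return V, W, T


# Demo on a small nonsymmetric matrix.
A_demo = [[2, 1, 0, 0], [0, 3, 1, 0], [1, 0, 4, 1], [0, 1, 0, 5]]
V, W, T = lanczos_biorth(A_demo, [0.5] * 4, [0.5] * 4, 3)
```

Practical codes must also handle (near-)breakdown, where w'·v' vanishes even though neither vector is zero; this sketch only raises an error in the exact-zero case.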
The joint paper by Hestenes and Stiefel, written after their independent discovery of the same method, is the classical description of the conjugate gradient method for solving linear systems. Although error-reduction properties are proved there, and experiments showing convergence in fewer steps than the order of the matrix are reported, the conjugate gradient method is presented as a direct method rather than an iterative one.
The Hestenes/Stiefel method is closely related to the Lanczos method specialized to symmetric matrices, in which the two mutually orthogonal sequences reduce to a single orthogonal sequence. There is, however, an important algorithmic difference: whereas Lanczos used three-term recurrences, Hestenes and Stiefel use coupled two-term recurrences. Combining the two two-term recurrences (eliminating the ``search directions'') yields the Lanczos method.
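The coupled two-term recurrences can be sketched as follows: one recurrence updates the residual r, the other the search direction p. This is a minimal pure-Python version for a small dense symmetric positive definite matrix; the starting guess x0 = 0, the relative-residual stopping test, and the names `conjugate_gradient`, `dot` and `matvec` are our own illustrative assumptions.

```python
import math

# Illustrative sketch: names, x0 = 0 and the stopping test are assumptions.


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """CG for symmetric positive definite A, starting from x0 = 0.

    Returns (x, k) where k is the number of iterations performed.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # r0 = b - A x0 = b
    p = list(r)          # first search direction
    rr = dot(r, r)
    stop = tol * math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(rr) <= stop:
            return x, k
        q = matvec(A, p)
        alpha = rr / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        # Two-term recurrence for the residual ...
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        # ... coupled with a two-term recurrence for the search direction.
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = conjugate_gradient(A, b)
```

For this 3x3 example the iteration stops within n = 3 steps up to rounding, which illustrates the finite termination property; substituting the p-update into the r-update eliminates p and gives a three-term recurrence for the residuals.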
A paper by Arnoldi further discusses the Lanczos biorthogonalization method, but it also presents a new method that combines features of the Lanczos and Hestenes/Stiefel methods. Like the Lanczos method, it is applied to nonsymmetric systems and does not use search directions. Like the Hestenes/Stiefel method, it generates only one, self-orthogonal sequence. This last fact, combined with the asymmetry of the coefficient matrix, means that the method no longer reduces the matrix to tridiagonal form, but to upper Hessenberg form instead. Presented as ``minimized iterations in the Galerkin method'', this algorithm has become known as the Arnoldi algorithm.
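A minimal sketch of the Arnoldi process shows why the reduction is to upper Hessenberg rather than tridiagonal form: each new vector must be orthogonalized against all previous ones, so column j of H can have up to j+2 nonzeros. It uses modified Gram-Schmidt, a common later implementation choice and not necessarily Arnoldi's own formulation; the names `arnoldi`, `dot` and `matvec` are hypothetical.

```python
import math

# Illustrative sketch: modified Gram-Schmidt and all names are assumptions.


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def arnoldi(A, b, m):
    """Arnoldi process with modified Gram-Schmidt.

    Returns (V, H): up to m+1 orthonormal vectors V and an (m+1) x m upper
    Hessenberg H satisfying A v_j = sum_i H[i][j] v_i for each completed j.
    """
    H = [[0.0] * m for _ in range(m + 1)]
    norm_b = math.sqrt(dot(b, b))
    V = [[bi / norm_b for bi in b]]
    for j in range(m):
        w = matvec(A, V[j])
        # A is not symmetric, so w is orthogonalized against *all*
        # previous vectors, not just the last two.
        for i in range(j + 1):
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] == 0.0:
            break  # invariant Krylov subspace found; V holds j+1 vectors
        V.append([wk / H[j + 1][j] for wk in w])
    return V, H


A = [[2.0, 1.0, 0.0, 0.0], [0.0, 3.0, 1.0, 0.0], [1.0, 0.0, 4.0, 1.0], [0.0, 1.0, 0.0, 5.0]]
V, H = arnoldi(A, [1.0, 2.0, 3.0, 4.0], 3)
```

If A were symmetric, the entries above the first superdiagonal would vanish in exact arithmetic and H would become tridiagonal, recovering the symmetric Lanczos reduction.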
For some time the conjugate gradient method received little attention as a practical method, partly because too much importance was attached to its finite termination property. Reid pointed out that its most important application area lay in sparse definite systems, and this renewed interest in the method.
Several methods developed in later years employ, most often implicitly, the upper Hessenberg matrix of the Arnoldi method. For an overview and characterization of these orthogonal projection methods for nonsymmetric systems, see Ashby, Manteuffel and Saylor, Saad and Schultz, and Jea and Young.
Fletcher proposed an implementation of the Lanczos method that, like the conjugate gradient method, uses two coupled two-term recurrences; he named it the bi-conjugate gradient method (BiCG).
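The BiCG recurrences can be sketched as follows: two-term recurrences update the residual and a "shadow" residual (here `rt`), coupled with two-term recurrences for the corresponding search directions (`p`, `pt`). This is a minimal, unpreconditioned pure-Python version assuming x0 = 0, the common choice rt0 = r0, and a small dense nonsymmetric matrix; the names are hypothetical and no breakdown handling beyond an exact-zero check is attempted.

```python
import math

# Illustrative sketch: x0 = 0, rt0 = r0 and all names are assumptions.


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    # A^T y without forming the transpose.
    return [sum(A[k][i] * y[k] for k in range(len(A))) for i in range(len(A[0]))]


def bicg(A, b, tol=1e-10, max_iter=None):
    """Unpreconditioned BiCG from x0 = 0 with shadow residual rt0 = r0."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r, rt = list(b), list(b)      # residual and shadow residual
    p, pt = list(r), list(rt)     # search directions for A and A^T
    rho = dot(rt, r)
    stop = tol * math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= stop:
            return x, k
        q, qt = matvec(A, p), rmatvec(A, pt)
        sigma = dot(pt, q)
        if rho == 0.0 or sigma == 0.0:
            raise ArithmeticError("BiCG breakdown")
        alpha = rho / sigma
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        # Coupled two-term recurrences: residuals first ...
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rt = [ri - alpha * qi for ri, qi in zip(rt, qt)]
        rho_new = dot(rt, r)
        beta = rho_new / rho
        # ... then search directions.
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x, max_iter


A = [[2.0, 1.0, 0.0, 0.0], [0.0, 3.0, 1.0, 0.0], [1.0, 0.0, 4.0, 1.0], [0.0, 1.0, 0.0, 5.0]]
b = [1.0, 2.0, 3.0, 4.0]
x, iterations = bicg(A, b)
```

When A is symmetric and rt0 = r0, the shadow sequences coincide with the primary ones and the iteration reduces to the conjugate gradient method, at the cost of a redundant second matrix-vector product.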