Latent Semantic Indexing (LSI) is a method for automatic indexing and retrieval that tries to take advantage of the semantic, or conceptual, content of documents. The particular LSI technique used in Xnetlib at UT/ORNL employs singular-value decomposition (SVD) to take a large matrix of term-document association data (in the case of Xnetlib, the documents are the netlib index files) and construct a ``semantic'' space in which closely associated terms and documents are placed near one another. LSI tries to tackle the problems of synonymy (many ways to refer to the same object) and polysemy (more than one meaning for a term), so as to improve both the recall and the precision of retrieval. In fact, a term that does not actually appear in a document may still end up close to that document, if that is consistent with the major patterns of association in the data. Retrieval is carried out by using the terms in a query to identify a point in the semantic space and returning the documents in the neighborhood of that point. At UT/ORNL, the SVD is computed periodically on a matrix constructed from the netlib index files to produce a semantic space for the netlib repository; retrieval itself is carried out by the keyword-lsi service, invoked from nlrexecd. For more information about LSI, see [3].
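The retrieval pipeline described above (SVD of a term-document matrix, mapping the query to a point in the reduced space, returning nearby documents) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the Xnetlib or Bellcore code: it uses raw term counts with no term weighting, approximates the top-k left singular vectors U_k by power iteration with Gram-Schmidt deflation on A A^T instead of a proper sparse SVD routine, and places both documents and the query in the reduced space by projecting them onto U_k, which is one common variant of LSI query folding. The function names and the toy index entries are hypothetical.

```python
import math

# Hypothetical toy "index entries"; the real netlib index files are not used here.
DOCS = [
    "sparse matrix solver",            # 0
    "dense matrix eigenvalue solver",  # 1
    "eigenvalue matrix",               # 2
    "fft fourier transform",           # 3
    "fourier transform",               # 4
    "fft transform",                   # 5
]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def build_matrix(docs):
    """Term-document matrix A of raw counts (rows = terms, columns = documents)."""
    terms = sorted({w for d in docs for w in d.split()})
    index = {t: i for i, t in enumerate(terms)}
    A = [[0.0] * len(docs) for _ in terms]
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w]][j] += 1.0
    return index, A


def left_singular_vectors(A, k, iters=500):
    """Approximate the top-k left singular vectors of A (the columns of U_k).

    They are the leading eigenvectors of C = A A^T, found one at a time by
    power iteration; each candidate is re-orthogonalized against the vectors
    already found (Gram-Schmidt deflation).
    """
    m = len(A)
    C = [[dot(A[i], A[j]) for j in range(m)] for i in range(m)]
    U = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(m)]  # deterministic starting vector
        for _ in range(iters):
            w = [dot(row, v) for row in C]  # w = C v
            for u in U:  # remove directions already found
                d = dot(w, u)
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # numerically zero: A has rank < k
                return U
            v = [x / norm for x in w]
        U.append(v)
    return U


def lsi_search(docs, query, k=2):
    """Rank docs by cosine similarity to the query in the k-dim LSI space."""
    index, A = build_matrix(docs)
    U = left_singular_vectors(A, k)
    m, n = len(A), len(docs)
    # Coordinates of document j: U_k^T a_j (= Sigma_k times row j of V_k).
    doc_coords = [[dot(u, [A[i][j] for i in range(m)]) for u in U] for j in range(n)]
    # Map the query the same way; words not in the index are ignored.
    q = [0.0] * m
    for w in query.split():
        if w in index:
            q[index[w]] += 1.0
    q_hat = [dot(u, q) for u in U]

    def cosine(x, y):
        nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
        return dot(x, y) / (nx * ny) if nx > 0 and ny > 0 else 0.0

    scores = [cosine(q_hat, d) for d in doc_coords]
    ranking = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    ranking, scores = lsi_search(DOCS, "fourier")
    for j in ranking:
        print(f"{scores[j]:6.3f}  {DOCS[j]}")
```

Running it ranks the three signal-processing entries first with scores near 1.0, including "fft transform", which never mentions "fourier", while the linear-algebra entries score near 0: a small instance of a term landing close to a document it does not appear in. With only k = 2 dimensions each topic collapses onto a single axis; a realistic collection would use a much larger k, weighted counts, and an iterative sparse SVD method (Lanczos-type, for example) rather than dense power iteration.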
The particular LSI technique currently used in Xnetlib is patented and proprietary and can be used only with the written permission of Bell Communications Research.