Latent Semantic Indexing (LSI)
is a method for automatic indexing
and retrieval that tries to take advantage of the semantic,
or conceptual, content of documents. The particular LSI
technique used in Xnetlib at UT/ORNL employs singular-value decomposition (SVD)
to take a large matrix of term-document association data
(in the case of Xnetlib, the documents are the netlib index files)
and construct a ``semantic'' space wherein terms and
documents that are closely associated are placed near one another.
LSI tries to tackle the problems of synonymy (many ways to
refer to the same object) and
polysemy (more than one meaning for a term),
so as to improve the recall and precision of retrieval.
In fact, terms that do not actually appear in a document
may still end up close to the document, if that is consistent
with the major patterns of association in the data.
Retrieval is carried out by using the terms in a query
to identify a point in the semantic space and by returning
the documents in the neighborhood of that point.
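
The retrieval steps just described can be illustrated with a small sketch. It uses a generic, textbook truncated SVD in NumPy and is not the proprietary technique used in Xnetlib. The toy term-document matrix, the choice of k = 2 dimensions, the cosine measure of nearness, and the helper names build_space and query_scores are all assumptions made for illustration. The query contains only the term "eigenvalue", which does not occur in document d2, yet d2 ends up close to the query because it shares the dominant pattern of association with d0 and d1.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are documents.
terms = ["matrix", "eigenvalue", "linear", "sort", "search"]
docs = ["d0", "d1", "d2", "d3", "d4"]
A = np.array([
    [1, 1, 1, 0, 0],   # matrix
    [1, 1, 0, 0, 0],   # eigenvalue (absent from d2)
    [0, 1, 1, 0, 0],   # linear
    [0, 0, 0, 1, 1],   # sort
    [0, 0, 0, 1, 0],   # search
], dtype=float)


def build_space(A, k):
    """Truncated SVD: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of V_k are the document coordinates in the k-dimensional space.
    return U[:, :k], s[:k], Vt[:k, :].T


def query_scores(query_terms, terms, U_k, s_k, V_k):
    """Place a query in the space and return its cosine to every document."""
    q = np.zeros(len(terms))
    for t in query_terms:
        if t in terms:
            q[terms.index(t)] += 1.0
    # Fold the query in as a pseudo-document: q_hat = q^T U_k S_k^{-1}.
    q_hat = (q @ U_k) / s_k
    num = V_k @ q_hat
    den = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    # Guard against division by zero for queries with no known terms.
    return num / np.maximum(den, 1e-12)


U_k, s_k, V_k = build_space(A, k=2)
scores = query_scores(["eigenvalue"], terms, U_k, s_k, V_k)
ranking = [docs[i] for i in np.argsort(-scores)]
print(ranking)
print(np.round(scores, 3))
```

In this sketch the three linear-algebra documents score near 1 and the two sorting documents near 0, so d2 is returned for a term it never contains. A production system would typically weight the matrix entries and use far more dimensions; those choices are outside what this example shows.
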
At UT/ORNL, the SVD is done periodically on a matrix constructed
from the netlib index files to produce a semantic space
for the netlib repository. The keyword-lsi service
invoked from nlrexecd carries out retrieval.
For more information about LSI, see [3].
The particular LSI technique currently used in Xnetlib is patented and proprietary and can be used only with the written permission of Bell Communications Research.