Thursday, May 8, 9:30 am in MTH 3206, University of Maryland,
College Park
Information retrieval via limited-memory matrix methods
Dr. Tamara G. Kolda
Department of Mathematics,
University of Maryland,
College Park
With ever larger collections of documents available electronically, a
need has arisen for fast and efficient search engines. Latent
Semantic Indexing (LSI) approximates a matrix representing a document
collection using the truncated SVD; this allows automatic recognition
of latent relationships between words and leads to a more efficient
search engine. We propose replacing the SVD with what we call the
semi-discrete decomposition (SDD). The resulting SDD-based LSI
performs as well as the SVD-based method, requires substantially less
storage, and processes queries faster. Furthermore, the SDD is easy
to update when new documents are added to the collection. This is
joint work with Dianne P. O'Leary.
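
As a sketch for readers, and not taken from the talk itself, the two rank-k
approximations of an m-by-n term-document matrix A can be written as follows.
The symbols A, U_k, Sigma_k, V_k, X_k, D_k, Y_k and the dimensions m, n, k are
notation assumed here for illustration; the abstract does not fix any notation.

```latex
% Truncated SVD used by standard LSI (assumed notation):
A \approx A_k^{\mathrm{SVD}} = U_k \Sigma_k V_k^{T},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k} \text{ diagonal},\;
V_k \in \mathbb{R}^{n \times k}

% Semi-discrete decomposition (assumed notation):
A \approx A_k^{\mathrm{SDD}} = X_k D_k Y_k^{T},
\qquad X_k \in \{-1,0,1\}^{m \times k},\;
D_k \in \mathbb{R}^{k \times k} \text{ diagonal, positive},\;
Y_k \in \{-1,0,1\}^{n \times k}
```

Under this notation, the storage saving is plausible because each entry of
X_k and Y_k takes one of only three values and so fits in two bits, whereas
the SVD factors hold floating-point numbers.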