Published:
May 2004
Proceedings:
Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004)
Volume
Issue:
Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004)
Track:
All Papers
Downloads:
Abstract:
In recent years, Latent Semantic Indexing (LSI) has been recognized as an effective tool for Information Retrieval in text documents. The level of “granularity” in LSI (i.e. whether LSI is performed on documents, paragraphs, sentences, phrases, etc.) is somewhat of a limiting factor, in that LSI comparisons can only be made at the level of granularity chosen. Here we argue that, as long as a record of the document structure is maintained, the level of granularity may be arbitrarily fine while still allowing for comparison at any coarser granularity. It is shown that the reduced-dimension vector for any particular section of a document is a function of the vectors of its constituent subsections. Using this information, we illustrate how LSI can be used to compare documents at multiple structural levels. One possible application (automated plagiarism detection) is discussed as an example of how this method of multilevel comparison may be used to improve query time in fine-granularity LSI applications.
FLAIRS
Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004)
ISBN 978-1-57735-201-3
Published by The AAAI Press, Menlo Park, California.