Indexing Documents by Discourse and Semantic Contents from Automatic Annotations of Texts

Brahim DJIOUA, Jean-Pierre Desclès

The model proposed here aims to automatically build a semantic metatext structure for texts, so that discourse and semantic information can be searched for and extracted from texts indexed in this way. The model is built on two engines. The first, EXCOM, is an XML-based system that automatically annotates texts according to discourse and semantic categories. The second, MOCXE, uses the semantic annotations generated by EXCOM to create a semantic inverted index that can find relevant documents for queries associated with discourse and semantic categories such as definition, quotation, causality, and relations between concepts. We illustrate the approach with an example of a relation of "connection" between concepts in French. The proposed model is general enough to be carried over to other languages.
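
To make the idea of a category-aware inverted index concrete, here is a minimal Python sketch. It is not the MOCXE implementation, and the abstract does not describe MOCXE's data structures. The sketch assumes annotations arrive as (document, category, text) triples and that postings are keyed by (category, term) pairs. The names ANNOTATIONS, build_index and search, and the toy data, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical annotated segments: (doc_id, category, text).
# In the model described above, annotations of this kind would be
# produced by EXCOM; here they are hand-written toy data.
ANNOTATIONS = [
    ("doc1", "definition", "an inverted index is a mapping from terms to documents"),
    ("doc2", "causality", "noise in the annotations causes retrieval errors"),
    ("doc2", "definition", "a metatext is a layer of annotations over a text"),
]


def build_index(annotations):
    """Map each (category, term) pair to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, category, text in annotations:
        for term in text.lower().split():
            index[(category, term)].add(doc_id)
    return index


def search(index, category, query):
    """Return documents whose segments of the given category contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # .get() avoids creating empty entries in the defaultdict on lookup.
    postings = [index.get((category, term), set()) for term in terms]
    return set.intersection(*postings)


INDEX = build_index(ANNOTATIONS)
```

With this toy data, search(INDEX, "definition", "index") finds only doc1, while the same term under the "causality" category finds nothing. Keying postings by category is what lets a query ask for a term specifically inside a definition or a causal statement rather than anywhere in the text.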

Subjects: 1.10 Information Retrieval; 13. Natural Language Processing

Submitted: Feb 11, 2007
