Published:
May 2003
Proceedings:
Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2003)
Track:
All Papers
Abstract:
One challenge in text processing is the treatment of case-insensitive documents such as speech recognition results. The traditional approach is to re-train a language model excluding case-related features. This paper presents an alternative two-step approach whereby a preprocessing module (Step 1) is designed to restore the case-sensitive form of the text, which then feeds the core system (Step 2). Step 1 is implemented as a Hidden Markov Model trained on a large raw corpus of case-sensitive documents. It is demonstrated that this approach (i) outperforms the feature exclusion approach for Named Entity tagging, (ii) leads to limited degradation for semantic parsing and relationship extraction, (iii) reduces system complexity, and (iv) has wide applicability: the restored text can feed both statistical models and rule-based systems.
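The case restoration step (Step 1) can be illustrated with a minimal sketch. The abstract does not give the model's details, so everything here is an assumption made for illustration: the hidden states are cased word forms, each state emits its lowercased form with probability 1, transitions are add-one smoothed bigram counts taken from a case-sensitive corpus, and decoding uses Viterbi search. The names `train` and `restore_case` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


def train(sentences):
    """Collect cased-word unigram and bigram counts from a case-sensitive corpus."""
    unigrams, bigrams = Counter(), Counter()
    variants = defaultdict(set)  # lowercased token -> cased forms seen in training
    for sent in sentences:
        prev = "<s>"  # sentence-start state
        unigrams[prev] += 1
        for word in sent:
            unigrams[word] += 1
            bigrams[(prev, word)] += 1
            variants[word.lower()].add(word)
            prev = word
    return unigrams, bigrams, variants


def restore_case(tokens, model):
    """Viterbi decoding over cased candidates for each caseless token."""
    unigrams, bigrams, variants = model
    vocab = len(unigrams)

    def log_p(prev, word):
        # Add-one smoothed bigram transition probability (an assumed smoothing choice).
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))

    # best maps each current state to (log-probability, best path ending in it).
    best = {"<s>": (0.0, [])}
    for tok in tokens:
        # Tokens never seen in training are passed through unchanged.
        candidates = sorted(variants.get(tok.lower(), {tok}))
        new_best = {}
        for cand in candidates:
            score, path = max(
                ((s + log_p(prev, cand), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[cand] = (score, path + [cand])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy corpus, invented for this sketch.
corpus = [
    ["Apple", "hired", "John", "Smith", "."],
    ["I", "ate", "an", "apple", "."],
    ["John", "Smith", "ate", "an", "apple", "."],
]
model = train(corpus)
restored = restore_case(["john", "smith", "ate", "an", "apple", "."], model)
```

Because the emission is deterministic in this sketch, only the transition probabilities decide between candidates such as "Apple" and "apple". The restored token list can then be passed unchanged to a case-sensitive downstream system (Step 2).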
ISBN 978-1-57735-177-1
Published by The AAAI Press, Menlo Park, California.