Proceedings:
Book One
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 17
Track:
Machine Learning and Data Mining
Downloads:
Abstract:
Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX, that combines IE and data mining methodologies to perform text mining as well as improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 17