Using Regulatory Instructions for Information Extraction

Thomas Y. Lee

In this paper, we describe a novel approach for learning to extract content from the text segments of regulatory filings for competitive analysis and regulatory audit. Existing strategies that rely on an explicit schema or a training set of representative documents are less well suited to managing thousands of idiosyncratic submissions by independent filers. We introduce a technique that learns instead from regulatory instructions. Knowledge about document structure is drawn from the policy documents to initialize a set of extraction patterns. The patterns are then relaxed to account for single insertion, deletion, and substitution errors within individual filings. Preliminary results are reported on several sets of filings submitted to the SEC in 2004 and 2005.
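The relaxation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that patterns are section headings taken from the instructions, that errors are counted at the token level, and that a filing line matches when it is within one token edit of a heading. The function names (`tokenize`, `within_one_edit`, `find_headings`) and the sample headings are hypothetical and chosen only for illustration.

```python
import re


def tokenize(text):
    """Lowercase a line and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within_one_edit(pattern, candidate):
    """Return True if the token lists are equal or differ by exactly one
    token insertion, deletion, or substitution."""
    if abs(len(pattern) - len(candidate)) > 1:
        return False
    # Skip the prefix that both token lists share.
    i = 0
    while i < len(pattern) and i < len(candidate) and pattern[i] == candidate[i]:
        i += 1
    if i == len(pattern) and i == len(candidate):
        return True  # exact match
    if len(pattern) == len(candidate):
        # Substitution: everything after the mismatch must agree.
        return pattern[i + 1:] == candidate[i + 1:]
    if len(pattern) > len(candidate):
        # The filing dropped one token of the pattern.
        return pattern[i + 1:] == candidate[i:]
    # The filing inserted one extra token.
    return pattern[i:] == candidate[i + 1:]


def find_headings(filing_lines, instruction_headings):
    """Return (line_index, heading) pairs for filing lines that match an
    instruction heading after relaxation."""
    patterns = [(h, tokenize(h)) for h in instruction_headings]
    matches = []
    for index, line in enumerate(filing_lines):
        tokens = tokenize(line)
        for heading, pattern in patterns:
            if within_one_edit(pattern, tokens):
                matches.append((index, heading))
                break  # assumption: at most one heading per line
    return matches


# Illustrative headings and filing text (assumed, not taken from the paper).
headings = ["Item 1. Business", "Item 3. Legal Proceedings"]
filing = [
    "ITEM 1. BUSINESS",
    "We make widgets.",
    "Item 3 - Legal Proceeding",
    "Item 1. Properties and Business",
]
matches = find_headings(filing, headings)
```

In this sketch the third line matches despite the misspelled final token, while the fourth line is rejected because it differs from the heading by two inserted tokens. Working at the token level is a design assumption; a character-level edit distance would tolerate typos inside a word but could also confuse headings that differ only in their item number.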

Subjects: 10. Knowledge Acquisition; 8. Enabling Technologies

Submitted: May 10, 2007
