Putting Semantic Information Extraction on the Map: Noisy Label Models for Fact Extraction

Chris Pal, Gideon Mann, Richard Minerich

Geographic indexing is a powerful and effective way to organize information on the web, but standardized location tags are not widely used. There is therefore considerable interest in using machine learning to automatically extract semantic associations involving geographic locations from unstructured natural language text. While training labels are often impractical or expensive to obtain, noisy labels are frequently available. We present a novel discriminative approach using a hidden variable model suitable for learning with noisy labels and apply it to extracting location relationships from natural language. We examine the problem of associating events with locations, where simple keyword matching yields a small number of true positives among many false positives. Compared to a state-of-the-art baseline, our method doubles the precision of extracting semantic information while maintaining the same recall.

Subjects: 12. Machine Learning and Discovery; 13. Natural Language Processing

Submitted: May 15, 2007
