Turning Web Text and Search Queries into Factual Knowledge: Hierarchical Class Attribute Extraction

Marius Pasca

A seed-based framework for textual information extraction enables weakly supervised acquisition of open-domain class attributes over conceptual hierarchies, from a combination of Web documents and query logs. Automatically extracted labeled classes, each consisting of a label (e.g., painkillers) and an associated set of instances (e.g., vicodin, oxycontin), are linked under existing conceptual hierarchies (e.g., brain disorders and skin diseases are linked under the concepts BrainDisorder and SkinDisease, respectively). Attributes extracted for the labeled classes are then propagated upwards in the hierarchy, to determine the attributes of hierarchy concepts (e.g., Disease) from the attributes of their subconcepts (e.g., BrainDisorder and SkinDisease).
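
The upward propagation step can be illustrated with a minimal sketch. The abstract does not say how attributes of subconcepts are combined, so the summing of scores below is an assumption made only for illustration, not the paper's method. The function name propagate_attributes, the attribute names, and the numeric scores are all hypothetical; only the concept names Disease, BrainDisorder and SkinDisease come from the abstract.

```python
from collections import defaultdict


def propagate_attributes(children, class_attrs, root):
    """Aggregate attribute scores bottom-up over a concept hierarchy.

    children:    {concept: [subconcept, ...]}
    class_attrs: {concept: {attribute: score}} for concepts that have
                 labeled classes linked directly under them
    Returns {concept: {attribute: score}} for every concept reachable
    from root. Assumption: scores of subconcepts are simply summed.
    """
    result = {}

    def visit(concept):
        merged = defaultdict(float)
        # Attributes of labeled classes linked directly to this concept.
        for attr, score in class_attrs.get(concept, {}).items():
            merged[attr] += score
        # Attributes inherited from subconcepts, one level at a time.
        for child in children.get(concept, []):
            for attr, score in visit(child).items():
                merged[attr] += score
        result[concept] = dict(merged)
        return result[concept]

    visit(root)
    return result


# Hypothetical toy data: attribute names and scores are made up.
children = {"Disease": ["BrainDisorder", "SkinDisease"]}
class_attrs = {
    "BrainDisorder": {"symptoms": 2.0, "treatment": 1.0},
    "SkinDisease": {"symptoms": 1.0, "causes": 1.0},
}

attrs = propagate_attributes(children, class_attrs, "Disease")
# Rank the attributes of the parent concept by aggregated score.
ranked = sorted(attrs["Disease"].items(), key=lambda kv: (-kv[1], kv[0]))
```

In this toy setup, an attribute supported by both subconcepts ranks above attributes supported by only one, which is the intuition behind deriving the attributes of Disease from those of BrainDisorder and SkinDisease.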

Subjects: 13. Natural Language Processing; 10. Knowledge Acquisition

Submitted: Apr 10, 2008
