A seed-based framework for textual information extraction allows for weakly supervised acquisition of open-domain class attributes over conceptual hierarchies, from a combination of Web documents and query logs. Automatically-extracted labeled classes, consisting of a label (e.g., painkillers) and an associated set of instances (e.g., vicodin, oxycontin), are linked under existing conceptual hierarchies (e.g., brain disorders and skin diseases are linked under the concepts BrainDisorder and SkinDisease respectively). Attributes extracted for the labeled classes are propagated upwards in the hierarchy, to determine the attributes of hierarchy concepts (e.g., Disease) from the attributes of their subconcepts (e.g., BrainDisorder and SkinDisease).
Subjects: 13. Natural Language Processing; 10. Knowledge Acquisition
Submitted: Apr 10, 2008