Knowledge Representation, Learning, and Reasoning in WebDoc -- A Web Document Classification System

Bo Tang and Julia Hodges

This paper describes a novel approach to knowledge representation, learning, and reasoning in WebDoc, a system that classifies Web documents according to the Library of Congress classification system. We argue that an automatically constructed, domain-independent knowledge base is indispensable to this task. The WebDoc system builds a knowledge base, represented as a semantic network, that contains the Library of Congress subject headings and the relationships among them. By training on human-indexed, NLP-parsed Web documents, WebDoc modifies the semantic network and generates rules for future index generation tasks.
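
To make the idea of a semantic network over subject headings concrete, the following is a minimal sketch in Python. It is not WebDoc's actual representation, which this abstract does not specify. The class name `SemanticNetwork`, the relation label `"broader"`, and the three example headings are all assumptions chosen for illustration and are not taken from Library of Congress data.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy semantic network: nodes are subject headings, edges are labeled relations.

    Hypothetical structure for illustration only; not WebDoc's implementation.
    """

    def __init__(self):
        # heading -> relation label -> set of linked headings
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_relation(self, source, relation, target):
        """Record a directed, labeled link from source to target."""
        self._edges[source][relation].add(target)

    def related(self, heading, relation):
        """Return the headings directly linked from `heading` by `relation`."""
        by_relation = self._edges.get(heading, {})
        return set(by_relation.get(relation, set()))

    def ancestors(self, heading, relation="broader"):
        """Follow `relation` links transitively and collect every heading reached."""
        seen = set()
        stack = [heading]
        while stack:
            current = stack.pop()
            for parent in self.related(current, relation):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative headings and links (assumed, not real catalog data).
net = SemanticNetwork()
net.add_relation("Machine learning", "broader", "Artificial intelligence")
net.add_relation("Artificial intelligence", "broader", "Computer science")
```

In a sketch like this, calling `net.ancestors("Machine learning")` walks the "broader" links to collect more general headings, which is one simple way a classifier could move from a specific heading toward a broader class.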

This page is copyrighted by AAAI. All rights reserved.