Abstract:
We believe that domain-specific knowledge of how information is structurally organized is central to the human ability to deal with large quantities of data efficiently. A better understanding of the computational nature of this ability may lead to solutions to information retrieval problems of practical significance. The paper outlines a Theory of Document Presentation (DPT), which addresses two problems: how information can be structurally organized in the documents of a given domain, and how the standards for such organization emerge. It also describes FAQ Minder, a document processing system whose implementation was guided by DPT. FAQ Minder processes FAQs, files of "Frequently Asked Questions" associated with USENET newsgroups [1,2]. The system identifies and tags the logical components of FAQs: network headers, tables of contents, sections, glossaries, questions, answers, and bibliographies.