Toward Structured Retrieval in Semi-structured Information Spaces

Authors

Scott B. Huffman, Catherine Baudin, and Robert A. Nado

Track:

Contents


Abstract:

A semistructured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because each collection has its own schema, and there are no enforced keys or formats for data items across collections. Thus, structured methods like SQL cannot be easily employed, and users often must make do with only full-text search. In this paper, we describe an intermediate approach that provides structured querying for particular types of entities, such as companies, people, and skills. Entity-based retrieval is enabled by normalizing entity references in a heuristic, type-dependent manner. To organize and filter search results, entities are categorized as playing particular roles (e.g., company as client, as vendor, etc.) in particular collection types (directories, client engagement records, etc.). The approach can be used to retrieve documents and also to construct entity profiles, which are summaries of commonly sought information about an entity based on the documents' content. The approach requires only a modest amount of meta-information about the source collections, much of which is derived automatically. On a set of typical user queries in a large corporate information space, the approach produces a dramatic improvement in retrieval quality over knowledge-free methods like full-text search.
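
The abstract does not spell out the normalization heuristics, so the following is only a minimal sketch of what type-dependent normalization and role-tagged indexing could look like. Every name here (`COMPANY_SUFFIXES`, `normalize_company`, `normalize_person`, `build_index`), the suffix list, and the record layout are assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical list of corporate suffixes to strip; not taken from the paper.
COMPANY_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                    "co", "company", "ltd", "llc"}


def _tokens(text):
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def normalize_company(ref):
    """Assumed heuristic: drop trailing corporate suffixes such as 'Inc.'."""
    tokens = _tokens(ref)
    while len(tokens) > 1 and tokens[-1] in COMPANY_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_person(ref):
    """Assumed heuristic: reorder 'Last, First' and drop middle initials."""
    ref = ref.strip()
    if "," in ref:
        last, first = [part.strip() for part in ref.split(",", 1)]
        ref = f"{first} {last}"
    tokens = _tokens(ref)
    if len(tokens) > 2:
        # Keep first and last tokens; drop single-letter tokens in between.
        middle = [t for t in tokens[1:-1] if len(t) > 1]
        tokens = [tokens[0]] + middle + [tokens[-1]]
    return " ".join(tokens)


# One normalizer per entity type: the "type-dependent" part of the sketch.
NORMALIZERS = {"company": normalize_company, "person": normalize_person}


def normalize(entity_type, ref):
    return NORMALIZERS[entity_type](ref)


def build_index(records):
    """Map (entity_type, normalized_name) to (doc_id, collection_type, role).

    `records` is an iterable of hypothetical tuples:
    (doc_id, collection_type, role, entity_type, raw_reference).
    """
    index = {}
    for doc_id, collection_type, role, entity_type, raw in records:
        key = (entity_type, normalize(entity_type, raw))
        index.setdefault(key, []).append((doc_id, collection_type, role))
    return index
```

With an index of this shape, differently formatted references to the same entity share one key, and the stored collection type and role can be used to group or filter the matching documents.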
