Alon Y. Levy, Anand Rajaraman, Joann J. Ordille
We describe the architecture and query-answering algorithms used in the Information Manifold, an implemented information gathering system that provides uniform access to structured information sources on the World-Wide Web. Our architecture provides an expressive language for describing information sources, which makes it easy to add new sources and to model the fine-grained distinctions between their contents. The query-answering algorithm guarantees that the descriptions of the sources are exploited to access only sources that are relevant to a given query. Accessing only relevant sources is crucial to scale up such a system to large numbers of sources. In addition, our algorithm can exploit run-time information to further prune information sources and to reduce the cost of query planning.