Vasilis Vassalos and Yannis Papakonstantinou
Autonomous and heterogeneous information sources are becoming increasingly available to users through the Internet, especially through the World Wide Web. To make their information available in a consolidated, uniform, and efficient manner, the different sources must be integrated. Several important challenges must be addressed for an integration system to work efficiently. In this paper we discuss how to deal with redundancy and overlap of information across autonomous, heterogeneous sources. We formulate two optimization problems that arise in the presence of redundant or overlapping information and present preliminary work towards approximation algorithms for these problems.