Jafar Adibi, Hans Chalupsky, Eric Melz, and Andre Valente
Link discovery is a new challenge in data mining whose primary concerns are to identify strong links and discover hidden relationships among entities and organizations based on low-level, incomplete and noisy evidence data. To address this challenge, we are developing a hybrid link discovery system called KOJAK that combines state-of-the art knowledge representation and reasoning (KR&R) technology with statistical clustering and analysis techniques from the area of data mining. In this paper we report on the architecture and technology of its first fully completed module called the KOJAK Group Finder. The Group Finder is capable of finding hidden groups and group members in large evidence databases. Our group finding approach addresses a variety of important LD challenges, such as being able to exploit heterogeneous and structurally rich evidence, handling the connectivity curse, noise and corruption as well as the capability to scale up to very large, realistic data sets. The first version of the KOJAK Group Finder has been successfully tested and evaluated on a variety of synthetic datasets.