Christopher P. Diehl, Galileo Mark Namata, Lise Getoor
In recent years, informal, online communication has transformed the ways in which we connect and collaborate with friends and colleagues. With millions of individuals communicating online each day, we have a unique opportunity to observe the formation and evolution of roles and relationships in networked groups and organizations. Yet a number of challenges arise when attempting to infer the underlying social network from data that is often ambiguous, incomplete and context-dependent. In this paper, we consider the problem of collaborative network discovery in domains such as intelligence analysis and litigation support, where the analyst is attempting to construct a validated representation of the social network. We specifically address the challenge of relationship identification, where the objective is to identify relevant communications that substantiate a given social relationship type. We propose a supervised ranking approach to the problem and assess its performance on a manager-subordinate relationship identification task using the Enron email corpus. By exploiting message content, the ranker routinely cues the analyst to relevant communications relationships and message traffic that are indicative of the social relationship.
Subjects: 12. Machine Learning and Discovery; 1.10 Information Retrieval
Submitted: May 5, 2008