Nowadays, it is very common for one person to be in different social networks. Linking identical users across different social networks, also known as the User Identity Linkage (UIL) problem, is fundamental for many applications. There are two major challenges in the UIL problem. First, it's extremely expensive to collect manually linked user pairs as training data. Second, the user attributes in different networks are usually defined and formatted very differently which makes attribute alignment very hard. In this paper we propose CoLink, a general unsupervised framework for the UIL problem. CoLink employs a co-training algorithm, which manipulates two independent models, the attribute-based model and the relationship-based model, and makes them reinforce each other iteratively in an unsupervised way. We also propose the sequence-to-sequence learning as a very effective implementation of the attribute-based model, which can well handle the challenge of the attribute alignment by treating it as a machine translation problem. We apply CoLink to a UIL task of mapping the employees in an enterprise network to their LinkedIn profiles. The experiment results show that CoLink generally outperforms the state-of-the-art unsupervised approaches by an F1 increase over 20%.
Published Date: 2018-02-08
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print)
Copyright: Published by AAAI Press, Palo Alto, California USA Copyright © 2018, Association for the Advancement of Artificial Intelligence All Rights Reserved.