Social Media Relevance Filtering Using Perplexity-Based Positive-Unlabelled Learning

Authors

Sunghwan Mac Kim, Stephen Wan, Cécile Paris, Andreas Duenser

Lorica Health, CSIRO Data61, CSIRO Data61, CSIRO Data61

Proceedings:

Vol. 14 (2020): Fourteenth International AAAI Conference on Web and Social Media

Track:

Full Papers

Abstract:

Internet user-generated data, such as posts on Twitter, offers data scientists a public, real-time data source that can provide insights supplementing traditional data. However, identifying relevant data for such analyses can be time-consuming. In this paper, we introduce PPUL, a Perplexity variant of the Positive-Unlabelled (PU) Learning framework, as a means to perform social media relevance filtering. We note that this task is particularly well suited to a PU Learning approach. We demonstrate how perplexity, computed with language models, can identify candidate examples of the negative class. To learn such models, we experiment with both statistical methods and a Variational Autoencoder. Our PPUL method generally outperforms strong PU Learning baselines, as we demonstrate on five data sets: the Hazardous Product Review data set, two well-known social media data sets, and two real case studies in relevance filtering. All data sets are manually annotated for evaluation, and in each case PPUL attains state-of-the-art performance, with improvements of 4 to 17% over competitive baselines. We show that the PPUL framework is effective when the amount of positive annotated data is small, and that it is appropriate both for content triggered by an event and for content about a general topic of interest.
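The core idea of using perplexity to find candidate negatives can be illustrated with a small sketch. It assumes a bigram language model with add-one (Laplace) smoothing trained only on the positive (relevant) posts; unlabelled posts that this model finds most surprising, i.e. with the highest perplexity, are taken as candidate negatives. This is not the authors' implementation: the paper's own statistical models and its Variational Autoencoder are not reproduced, and the names `BigramLM`, `reliable_negatives`, and the cutoff `k` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real tweets need better handling).
    return text.lower().split()


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, docs):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for doc in docs:
            tokens = ["<s>"] + tokenize(doc) + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Vocabulary size, plus one slot reserved for unseen words.
        self.vocab_size = len(self.unigrams) + 1

    def log_prob(self, prev, word):
        # Smoothed estimate of log P(word | prev).
        num = self.bigrams[(prev, word)] + 1
        den = self.unigrams[prev] + self.vocab_size
        return math.log(num / den)

    def perplexity(self, doc):
        # exp of the average negative log-probability per predicted token.
        tokens = ["<s>"] + tokenize(doc) + ["</s>"]
        pairs = list(zip(tokens, tokens[1:]))
        total = sum(self.log_prob(p, w) for p, w in pairs)
        return math.exp(-total / len(pairs))


def reliable_negatives(positives, unlabelled, k):
    """Return the k unlabelled docs the positive-class LM finds least likely."""
    lm = BigramLM(positives)
    ranked = sorted(unlabelled, key=lm.perplexity, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    positives = ["product recall fire hazard", "battery fire hazard recall"]
    unlabelled = [
        "battery recall fire hazard",
        "great pizza tonight with friends",
        "fire hazard recall notice",
    ]
    print(reliable_negatives(positives, unlabelled, 1))
```

In a typical two-step PU Learning setup, the selected candidate negatives would then be used together with the positive examples to train an ordinary binary classifier for relevance filtering; how many candidates to keep (`k` here) is a tuning choice this sketch leaves open.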

DOI:

10.1609/icwsm.v14i1.7307
