Proceedings:
Proceedings of the International AAAI Conference on Web and Social Media, Volume 6
Issue:
Vol. 6 No. 1 (2012): Sixth International AAAI Conference on Weblogs and Social Media
Track:
Poster Papers
Abstract:
In this paper we present several methods for collecting textual content from the Web and filtering out noisy data. We show that knowing which user publishes which content can help detect noise. We begin by collecting data from two forums and from Twitter. For the forums, we extract the meaningful information from each discussion (the texts of the question and answers, user IDs, and dates). For the Twitter dataset, we first detect tweets with very similar texts, which helps avoid redundancy in further analysis. This step also yields clusters of tweets that can be used in the same way as the forum discussions: they can be modeled as bipartite graphs. An analysis of the nodes of the resulting graphs shows that network structure and content type (noisy or relevant) are not independent, so studying the network can help filter noise.
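The pipeline outlined in the abstract (group near-identical tweets, then link users to the resulting clusters) can be illustrated with a minimal sketch. The abstract does not say which similarity measure or graph construction the authors use, so everything here is an assumption: Jaccard similarity over word sets, the 0.8 threshold, the greedy clustering, the choice of users and clusters as the two node sets, and the function names `tokens`, `jaccard`, `cluster_tweets`, and `bipartite_graph`. The sample tweets are invented for illustration only.

```python
from collections import defaultdict
import re


def tokens(text):
    """Lowercase word set, with URLs and @mentions dropped (an assumed normalization)."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return frozenset(re.findall(r"[a-z0-9']+", text))


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tweets(tweets, threshold=0.8):
    """Greedy single-pass clustering (an assumption, not the paper's method):
    each tweet joins the first cluster whose representative, i.e. its first
    tweet, is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []  # list of (representative token set, [tweet indices])
    for i, (_user, text) in enumerate(tweets):
        toks = tokens(text)
        for rep, members in clusters:
            if jaccard(rep, toks) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((toks, [i]))
    return [members for _rep, members in clusters]


def bipartite_graph(tweets, clusters):
    """Build both sides of a user-cluster bipartite graph as adjacency sets."""
    user_to_clusters = defaultdict(set)
    cluster_to_users = defaultdict(set)
    for cid, members in enumerate(clusters):
        for i in members:
            user = tweets[i][0]
            user_to_clusters[user].add(cid)
            cluster_to_users[cid].add(user)
    return dict(user_to_clusters), dict(cluster_to_users)


# Invented (user, text) pairs, for illustration only.
tweets = [
    ("alice", "Win a free phone now http://spam.example/a"),
    ("bob", "win a FREE phone now http://spam.example/b"),
    ("carol", "Any advice on treating a sprained ankle?"),
    ("alice", "Does ice help with a sprained ankle"),
]
clusters = cluster_tweets(tweets)
user_to_clusters, cluster_to_users = bipartite_graph(tweets, clusters)
```

In this toy input the first two tweets differ only in case and URL, so they fall into one cluster shared by two users, while the other two tweets stay separate. Node-level properties of such a graph, for example how many users share a cluster, are the kind of structural signal the abstract suggests may correlate with noisy content.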
DOI:
10.1609/icwsm.v6i1.14295