Proceedings:
Vol. 12 No. 1 (2018): Twelfth International AAAI Conference on Web and Social Media
Track:
Dataset Papers
Abstract:
Veracity assessment of news and social bot detection have become two of the most pressing issues for social media platforms, yet current gold-standard data are limited. This paper presents a leap forward in the development of a sizeable, feature-rich gold-standard dataset. The dataset was built from a collection of news items posted to Facebook by nine news outlets during September 2016, which BuzzFeed annotated for veracity. The annotations go beyond a binary true/false label to four categories: mostly true, mostly false, mixture of true and false, and no factual content. Our contribution integrates data on Facebook comments and reactions that are publicly available through the platform's Graph API, and provides tailored tools for accessing the web content of the news articles. The features of the accessed articles include body text, images, links, Facebook plugin comments, Disqus plugin comments, and embedded tweets. Embedded tweets offer a promising avenue for extending the dataset across social media platforms. Once developed, these tools yielded over 1.6 million text items, making the dataset over 400 times larger than the current gold standard. The resulting dataset, BuzzFace, is presently the most extensive of its kind and allows for more robust machine learning applications to news veracity assessment and social bot detection than ever before.
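To make the described contents concrete, here is a minimal sketch of how one news item and its attached features might be modeled in Python. The four veracity categories and the feature list come from the abstract; the class names (Veracity, NewsItem), every field name, and the text_item_count counting rule are hypothetical illustrations, not the released dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Veracity(Enum):
    """The four veracity categories named in the abstract."""
    MOSTLY_TRUE = "mostly true"
    MOSTLY_FALSE = "mostly false"
    MIXTURE = "mixture of true and false"
    NO_FACTUAL_CONTENT = "no factual content"


@dataclass
class NewsItem:
    """Hypothetical record for one Facebook news post and its linked article.

    Field names are illustrative assumptions, not the dataset's real schema.
    """
    outlet: str
    label: Veracity
    # Article web content accessed from the outlet's page.
    body_text: str = ""
    images: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)
    fb_plugin_comments: List[str] = field(default_factory=list)
    disqus_comments: List[str] = field(default_factory=list)
    embedded_tweets: List[str] = field(default_factory=list)
    # Data attached to the Facebook post itself (via the Graph API).
    facebook_comments: List[str] = field(default_factory=list)
    reactions: Dict[str, int] = field(default_factory=dict)

    def text_item_count(self) -> int:
        """Count text items: the body (if any) plus every comment and tweet.

        This counting rule is an assumption made for illustration only.
        """
        body = 1 if self.body_text else 0
        return (body
                + len(self.facebook_comments)
                + len(self.fb_plugin_comments)
                + len(self.disqus_comments)
                + len(self.embedded_tweets))
```

For example, an item with a body, two Facebook comments, one Disqus comment, and one embedded tweet would count five text items under this assumed rule. Keeping the label as an enum rather than a free-form string guards against misspelled categories when aggregating items across outlets.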
DOI:
10.1609/icwsm.v12i1.14985