Proceedings:
Vol. 11 No. 1 (2017): Eleventh International AAAI Conference on Web and Social Media
Track:
Poster Papers
Abstract:
In this paper, we address the problem of identifying spam users on Wikipedia and present our preliminary results. We formulate the problem as a binary classification task and propose a set of features based on user editing behavior to separate spammers from benign users. We tested our system on a new dataset that we built, consisting of 4.2K users (half spammers and half benign) and 75.6K edits. Experimental results show that our approach reaches 80.8% classification accuracy and 0.88 mean average precision. We compared our system against ORES, the most recent tool developed by Wikimedia, which assigns a damaging score to each edit, and we show that our system outperforms ORES in spam user detection. Moreover, combining our features with ORES increases classification accuracy to 82.1%. We also show that our system performs well in a more realistic, unbalanced setting, in which spammers are greatly outnumbered by benign users, achieving an AUROC of 0.84 (0.86 when combined with ORES).
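The abstract reports accuracy for the balanced dataset and AUROC for the unbalanced setting. The following is a minimal sketch of why that distinction matters, not the authors' implementation: the function names, the 0.5 threshold, and the toy labels and scores are all hypothetical and are not drawn from the paper's dataset. It shows that a classifier labeling every user as benign can match the accuracy of a useful scorer when spammers are rare, while AUROC separates the two.

```python
# Illustrative sketch only: toy data, hypothetical names, not the paper's system.

def accuracy(labels, scores, threshold=0.5):
    """Fraction of users whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auroc(labels, scores):
    """Probability that a random spammer is scored above a random benign user.

    Computed by comparing every (spammer, benign) pair; ties count as half.
    """
    spam = [s for y, s in zip(labels, scores) if y == 1]
    benign = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in spam:
        for n in benign:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(spam) * len(benign))


# Hypothetical unbalanced sample: 2 spammers (label 1) and 8 benign users (label 0).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical spam scores from a scorer that ranks most spammers highly.
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]

# A degenerate baseline that gives every user the same "benign" score.
all_benign = [0.0] * len(labels)

print(accuracy(labels, scores), auroc(labels, scores))          # 0.8 0.9375
print(accuracy(labels, all_benign), auroc(labels, all_benign))  # 0.8 0.5
```

On this toy sample both scorers reach the same accuracy, but only the first ranks spammers above benign users, which is what AUROC measures.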
DOI:
10.1609/icwsm.v11i1.14962
ICWSM