Detecting Spam Blogs: A Machine Learning Approach

Pranam Kolari, Akshay Java, Tim Finin, Tim Oates, Anupam Joshi

Weblogs, or blogs, are an important new way to publish information, engage in discussions, and form communities on the Internet. Unfortunately, the blogosphere has been infected by several varieties of spam-like content. Blog search engines, for example, are inundated by posts from splogs, false blogs with machine-generated or hijacked content whose sole purpose is to host ads or raise the PageRank of target sites. We discuss how SVM models based on local and link-based features can be used to detect splogs. We present an evaluation of the learned models and their utility to blog search engines, systems that employ techniques differing from those of conventional web search engines.
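The abstract does not say which features or which SVM training procedure are used. The following is a minimal sketch under stated assumptions: "local" features are modeled as binary bag-of-words indicators, the "link-based" side is reduced to one hypothetical feature (`ad_link_fraction`, the fraction of a post's outlinks pointing to ad or affiliate domains), and the linear SVM (no bias term) is trained with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss in plain Python. This is not the authors' implementation, and the toy posts are invented for illustration, not data from the paper.

```python
import random
import re

def tokenize(text):
    # Lowercase alphabetic tokens: a stand-in for "local" bag-of-words features.
    return re.findall(r"[a-z]+", text.lower())

def featurize(text, ad_link_fraction):
    # Hypothetical feature map: binary word-presence features plus one
    # hypothetical link-based feature (fraction of outlinks to ad/affiliate domains).
    feats = {"w=" + tok: 1.0 for tok in set(tokenize(text))}
    feats["link:ad_fraction"] = float(ad_link_fraction)
    return feats

def dot(w, x):
    # Sparse dot product between a weight dict and a feature dict.
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Linear SVM (no bias term) trained with Pegasos-style stochastic
    subgradient descent on the L2-regularized hinge loss.
    data: list of (feature_dict, label) with label in {-1, +1}."""
    rng = random.Random(seed)
    w = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)       # Pegasos step size
            x, y = data[i]
            margin = y * dot(w, x)      # margin under the current weights
            for k in w:                 # regularizer step: shrink all weights
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:            # hinge loss active: step toward y * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

def predict(w, x):
    # +1 = predicted splog, -1 = predicted authentic blog post.
    return 1 if dot(w, x) > 0 else -1

# Toy, made-up training posts: (text, ad-link fraction, label).
TRAIN = [
    ("cheap casino bonus buy cheap pills online now", 0.8, +1),
    ("payday loans casino bonus buy now cheap", 0.6, +1),
    ("discount pills online pharmacy buy now", 0.9, +1),
    ("my dog and i went hiking this weekend", 0.0, -1),
    ("my notes on the new python release", 0.0, -1),
    ("my favorite apple pie recipe from my grandmother", 0.0, -1),
]

data = [(featurize(text, frac), label) for text, frac, label in TRAIN]
w = train_linear_svm(data)
```

Usage: `predict(w, featurize("buy cheap pills now", 0.5))` returns `1` (splog), and `predict(w, featurize("my trip to the lake", 0.0))` returns `-1`. Keeping word and link features in one sparse vector lets a single linear model weigh content evidence against link evidence; a real system would use far larger vocabularies, richer link features, and a tuned regularization constant.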

Subjects: 12. Machine Learning and Discovery; 1.10 Information Retrieval


This page is copyrighted by AAAI. All rights reserved.