Proceedings:
Computational Approaches to Analyzing Weblogs
Issue:
Papers from the 2006 AAAI Spring Symposium
Track:
Contents
Abstract:
Blogs are difficult for humans and machines alike to categorize because they are written in a capricious style. In the early days of the web, directories maintained by humans could not keep up with the millions of websites; likewise, blog directories cannot keep up with the explosive growth of the blogosphere. This paper investigates the efficacy of using machine learning to categorize blogs. We design a text classification experiment to categorize one hundred and twenty blogs into four topics: personal diary, news, political, and sports. The baseline feature is unigrams weighted by TF-IDF, which yielded 84% accuracy. We analyze the corpus, features, and result data. Our analysis leads us to believe that blog taxonomies need to support polyhierarchy: a given blog may be correctly classified under more than one category.
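For readers unfamiliar with the baseline feature, the following is a minimal sketch of unigram TF-IDF weighting. The abstract does not state which TF-IDF variant, tokenizer, or classifier the authors used, so everything here is an assumption: whitespace tokenization, term frequency as a relative count, and idf = log(N / df). The function name `tfidf_vectors` and the toy posts are hypothetical and are not drawn from the paper's corpus.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {unigram: tf-idf weight} dict per document.

    Assumed variant (not confirmed by the paper):
      tf  = count of term in doc / number of tokens in doc
      idf = log(number of docs / number of docs containing term)
    """
    # Naive lowercase whitespace tokenization, purely for illustration.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each unigram.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Hypothetical toy posts standing in for blog text.
toy_posts = [
    "the team won the match",
    "the senate passed the bill",
    "the team lost the match",
]
vectors = tfidf_vectors(toy_posts)
```

Under this weighting, a unigram that occurs in every post (here "the") receives a weight of zero, while a unigram confined to one post (such as "senate") is weighted most heavily. Vectors of this kind would then be passed to a classifier of one's choosing.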