Proceedings:
Computational Approaches to Analyzing Weblogs
Volume
Issue:
Papers from the 2006 AAAI Spring Symposium
Track:
Contents
Abstract:
To analyze weblogs, we must first identify the unit of a weblog that corresponds to a document. We argue that, for weblogs, the correct unit is the weblog post. A weblog post is a structured document with the following fields: date, timestamp, title, content, permalink, and author. We present our approach for segmenting weblogs into posts, which breaks down into three steps: (1) automatic feed discovery; (2) feed-guided segmentation, using the weblog feed and HTML; and (3) model-based weblog segmentation.
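As an illustration only, the post structure and step (1) might be sketched as follows. The abstract does not say how feed discovery is done; this sketch assumes the common convention of advertising feeds through `<link rel="alternate">` elements in the page head. The names `WeblogPost`, `FeedLinkFinder`, and `discover_feeds` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional

# MIME types commonly used to advertise RSS and Atom feeds (an assumption,
# not a list taken from the paper).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


@dataclass
class WeblogPost:
    """Hypothetical record holding the six fields named in the abstract."""
    date: str
    timestamp: str
    title: str
    content: str
    permalink: str
    author: Optional[str] = None  # assumed optional here for convenience


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags with a feed type."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and a.get("href"):
            self.feeds.append(a["href"])


def discover_feeds(html: str) -> List[str]:
    """Return the feed URLs advertised in an HTML page, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/rss+xml" '
        'href="http://example.org/feed.xml">'
        '<link rel="stylesheet" href="style.css">'
        '</head><body></body></html>'
    )
    print(discover_feeds(page))
```

A discovered feed would then supply the per-post fields (title, permalink, date) used to guide segmentation of the HTML in step (2); that step and the model-based step (3) are not sketched here because the abstract gives no detail on them.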