Finding your Way through Blogspace: Using Semantics for Cross-domain Blog Analysis

Bettina Berendt, Roberto Navigli

Blogspace is one of the most dynamic areas of today's Internet, and it is increasingly recognised that blogs are much more than "meaningless chatter." Many syntax-based approaches exist for analysing the text of blogs and the network structure between them. While these approaches are very helpful for purposes such as detecting discussion bursts concerning uniquely named topics (e.g., a book, product, or person), they are insufficient for understanding blogs that discuss new phenomena in different wordings, for finding and explaining relationships between new discourse topics, or for establishing the context of a new topic in a larger domain of discourse. In this paper, we propose two methods for semantics-enhanced blog analysis that allow the analyst to integrate domain-specific as well as general background knowledge. The methods rely on the Term Extractor for identifying keyphrases, SSI (Structural Semantic Interconnections) for disambiguating terms, and the taxonomy of domain labels by Magnini and Cavaglià. Applications include topic detection and grouping, the proposal of blog tags, the formation of blog directories, and blog recommender systems. To illustrate the usefulness of our approach, we present a detailed experimental analysis of a sample of four sets of blogs with different thematic foci (food, health, law, and weblogs about blogging).

