Haizheng Zhang, C. Lee Giles, Henry C. Foley, John Yen.
Complex networks exist in a wide array of diverse domains, ranging from biology, sociology, and computer science. These real-world networks, while disparate in nature, often comprise of a set of loose clusters(a.k.a communities), whose members are better connected to each other than to the rest of the network. Discovering such inherent community structures can lead to deeper understanding about the networks and therefore has raised increasing interests among researchers from various disciplines. This paper describes GWN-LDA(Generic weighted network-Latent Dirichlet Allocation) model, a hierarchical Bayesian model derived from the widely-received LDA model, for discovering probabilistic community profiles in social networks. In this model, communities are modeled as latent variables and defined as distributions over the social actor space. In addition, each social actor belongs to every community with different probability. This paper also proposes two different network encoding approaches and explores the impact of these two approaches to the community discovery performance. This model is evaluated on two research collaborative networks: CiteSeer and NanoSCI. The experimental results demonstrate that this approach is promising for discovering community structures in large-scale networks.
Subjects: 12. Machine Learning and Discovery; 3.4 Probabilistic Reasoning
Submitted: Apr 24, 2007