Abstract:
In this paper, we model the pair-wise similarities of a setof documents as a weighted network with a single cutoffparameter. Such a network can be thought of an ensemble of unweighted graphs, each consisting of edges withweights greater than the cutoff value. We look at this network ensemble as a complex system with a temperature parameter, and refer to it as a Latent Network. Ourexperiments on a number of datasets from two different domains show that certain properties of latent networks like clustering coefficient, average shortest path,and connected components exhibit patterns that are significantly divergent from randomized networks. We explain that these patterns reflect the network phase transition as well as the existence of a community structure in document collections. Using numerical analysis,we show that we can use the aforementioned networkproperties to predicts the clustering Normalized MutualInformation (NMI) with high correlation (rho > 0.9). Finally we show that our clustering method significantlyoutperforms other baseline methods (NMI > 0.5)
DOI:
10.1609/aaai.v25i1.7972