AAAI Publications, Twenty-Second International FLAIRS Conference

Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps
Jeremy R. Millar, Gilbert L. Peterson, Michael J. Mendenhall

Last modified: 2009-03-23


Clustering and visualization of large text document collections aid browsing, navigation, and information retrieval. We present a document clustering and visualization method based on Latent Dirichlet Allocation and self-organizing maps (LDA-SOM). LDA-SOM clusters documents by topical content and renders the clusters in an intuitive two-dimensional format. Document topics are first inferred using a probabilistic topic model. Then, because self-organizing maps preserve topology, document clusters with similar topic distributions are placed near one another in the visualization. This gives the user an intuitive means of browsing from one cluster to another based on the topics they have in common. The effectiveness of LDA-SOM is evaluated on the 20 Newsgroups and NIPS data sets.
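
The second stage of the method described above, placing documents on a two-dimensional map according to their topic distributions, can be illustrated with a small self-organizing map. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the LDA step is not shown, the `doc_topics` values are made-up numbers standing in for inferred document-topic distributions, and the grid size, learning rate, and linear decay schedule are arbitrary choices rather than settings from the paper. The function names are hypothetical.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vec):
    """Return the (row, col) grid cell whose weight vector is closest to vec."""
    return min(weights, key=lambda cell: squared_distance(weights[cell], vec))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0  # initial neighborhood radius (assumed)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, floor of 0.5
            br, bc = best_matching_unit(weights, data[i])
            for (r, c), w in weights.items():
                # Gaussian neighborhood on the grid: cells near the winning
                # cell are pulled toward the document more strongly.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (data[i][k] - w[k])
            step += 1
    return weights


# Hypothetical document-topic distributions (three topics, rows sum to 1).
doc_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.10, 0.85],
]

som = train_som(doc_topics)
cells = [best_matching_unit(som, d) for d in doc_topics]
for doc_id, cell in enumerate(cells):
    print(f"document {doc_id} -> map cell {cell}")
```

Each document is assigned to the grid cell whose weight vector best matches its topic distribution. Because every update also moves the neighbors of the winning cell, nearby cells tend to end up with similar weight vectors, which is the topology-preserving property the abstract relies on.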


Keywords: document clustering; latent Dirichlet allocation (LDA); self-organizing maps (SOM)
