AAAI Publications, The Thirty-Third International FLAIRS Conference

Theory Interpretations for Topic Models
Felix Kuhr, Özgür L. Özcep

Last modified: 2020-05-05


Many machine learning models must incorporate latent variables to learn target concepts from training data. These variables are understood only statistically: they optimize a statistical property such as likelihood, but they usually have no interpretation in human-understandable semantic terms. One example is the topics of latent Dirichlet allocation (LDA), a generative Bayesian model that represents each topic as a distribution over the words in the vocabulary of the documents. This paper proposes a framework of classifications and theory interpretations as a construction and analysis tool for exactly such situations. As a proof of concept, it considers an algorithm that uses the LDA topics induced by a corpus to enrich the given set of RDF annotations on each text of the corpus. The general framework of classifications is then used to discuss the role of this algorithm in finding representations of topics by RDF triples.
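To make the two ingredients of the abstract concrete, the following toy sketch shows a topic as a word distribution and one possible way topics could drive the enrichment of RDF annotations. Everything in it is an assumption made for illustration: the topic probabilities are invented rather than learned by latent Dirichlet allocation, the scoring in `dominant_topic` is a crude stand-in for topic inference, and the rule in `enrich` (documents sharing a dominant topic share their triples) is hypothetical and should not be read as the algorithm considered in the paper.

```python
from typing import Dict, List, Set, Tuple

# An RDF triple as (subject, predicate, object); the ex: names are hypothetical.
Triple = Tuple[str, str, str]

# Each topic is a probability distribution over the vocabulary.
# These values are invented for illustration, not learned from a corpus.
topics: List[Dict[str, float]] = [
    {"neuron": 0.5, "brain": 0.4, "market": 0.1},
    {"market": 0.6, "stock": 0.3, "brain": 0.1},
]


def dominant_topic(words: List[str]) -> int:
    """Index of the topic with the highest summed word probability.

    A crude stand-in for real topic inference (assumption).
    """
    scores = [sum(topic.get(w, 0.0) for w in words) for topic in topics]
    return scores.index(max(scores))


def enrich(
    docs: Dict[str, List[str]],
    annotations: Dict[str, Set[Triple]],
) -> Dict[str, Set[Triple]]:
    """Hypothetical enrichment rule: a document receives the triples of
    every document that shares its dominant topic."""
    by_topic: Dict[int, Set[Triple]] = {}
    doc_topic = {name: dominant_topic(words) for name, words in docs.items()}
    for name, topic_id in doc_topic.items():
        by_topic.setdefault(topic_id, set()).update(annotations.get(name, set()))
    return {name: set(by_topic[topic_id]) for name, topic_id in doc_topic.items()}


docs = {
    "d1": ["neuron", "brain"],
    "d2": ["brain", "neuron", "brain"],
    "d3": ["market", "stock"],
}
annotations: Dict[str, Set[Triple]] = {
    "d1": {("ex:Neuron", "ex:partOf", "ex:Brain")},
    "d2": set(),
    "d3": {("ex:Stock", "ex:tradedOn", "ex:Market")},
}

enriched = enrich(docs, annotations)
```

In this toy setting the unannotated document `d2` shares its dominant topic with `d1` and therefore inherits the triple attached to `d1`, while `d3` keeps only its own annotation. In that loose sense a set of RDF triples becomes associated with a topic.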
