Combining Probabilistic Search, Latent Variable Analysis and Classification Models

Ian Davidson

The EM and K-Means algorithms are two popular search techniques that converge to a local optimum of their respective loss functions. The EM algorithm uses partial (probabilistic) assignment of instances to components, while the K-Means algorithm uses exclusive assignment. We show that an exclusive random assignment (ERA) algorithm, which assigns each instance exclusively to a single component via a random experiment, can outperform both EM and K-Means for mixture modeling. The ERA algorithm obtains better maximum likelihood estimates on three real-world data sets, and on an artificial data set it produces parameter estimates that are more likely to be closer to those of the generating mechanism. To illustrate the practical benefits of the ERA algorithm, we test it in a classification context. We propose the Latent Variable Classifier (LVC), which combines latent variable analysis, such as mixture models, with classification models, such as naive Bayes classifiers: for each mixture component (cluster), a classification model is built from the observations assigned to that component. Our experiments on three UCI data sets show that LVCs obtain greater cross-validated accuracy than a single classifier built from the entire data set, and that probabilistic search outperforms the EM algorithm.
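The abstract does not give the ERA update equations, but the idea it describes can be sketched as follows for a one-dimensional Gaussian mixture: compute posterior responsibilities as in EM's E-step, then, instead of EM's partial assignment or K-Means' argmax assignment, draw each instance's component exclusively from its posterior (the "random experiment"), and re-estimate parameters from those exclusive assignments. All function and variable names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def era_gmm(x, k, iters=50, seed=0):
    """Hedged sketch of exclusive random assignment (ERA) for a 1-D
    Gaussian mixture. Like EM, compute posterior responsibilities; unlike
    EM (partial assignment) or K-Means (argmax assignment), sample each
    instance's component exclusively from its posterior distribution."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize with a random exclusive assignment.
    z = rng.integers(0, k, size=n)
    mu = np.array([x[z == j].mean() if np.any(z == j) else x.mean()
                   for j in range(k)])
    sigma = np.full(k, x.std() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibilities, exactly as in EM.
        dens = (pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2.0 * np.pi)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # ERA step: one exclusive component per instance, drawn at random
        # according to its responsibilities.
        z = np.array([rng.choice(k, p=r) for r in resp])
        # M-step: re-estimate each component from its exclusive members.
        for j in range(k):
            members = x[z == j]
            if len(members) > 0:
                mu[j] = members.mean()
                sigma[j] = members.std() + 1e-6
                pi[j] = len(members) / n
        pi = pi / pi.sum()
    return mu, sigma, pi
```

Because the assignment is a draw rather than a deterministic argmax, the search can escape some of the shallow local optima that trap K-Means' hard assignment, which is one plausible reading of why the paper reports better maximum likelihood estimates.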

This page is copyrighted by AAAI. All rights reserved.