Abstract:
The EM and K-Means algorithms are two popular search techniques that converge to a local minimum of their respective loss functions. The EM algorithm uses partial assignment of instances, while the K-Means algorithm uses exclusive assignment. We show that an exclusive random assignment (ERA) algorithm, which performs exclusive assignment based on a random experiment, can outperform both EM and K-Means for mixture modeling. We show that the ERA algorithm can obtain better maximum likelihood estimates on three real-world data sets, and that on an artificial data set it can produce parameter estimates that are more likely to be close to the generating mechanism. To illustrate the practical benefits of the ERA algorithm, we test it in a classification context. We propose the Latent Variable Classifier (LVC), which combines latent variable analysis, such as mixture models, with classification models, such as Naive Bayes classifiers. For each mixture component (cluster), a classification model is built from the observations assigned to that component. Our experiments on three UCI data sets show that LVCs obtain greater cross-validated accuracy than a single classifier built from the entire data set, and that probabilistic search outperforms the EM algorithm.
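
As a rough illustration of the assignment rule described in the abstract, the sketch below shows how an ERA-style assignment step might look for a Gaussian mixture: each instance is committed to exactly one component, but the component is drawn at random from the posterior responsibilities rather than taken as the arg-max (K-Means-style hard assignment) or used as fractional weights (EM-style partial assignment). The function names (`era_assign`, `m_step`) and the choice of Gaussian components are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def era_assign(X, weights, means, covs, rng=np.random.default_rng()):
    """Sample one component index per instance (ERA-style exclusive random assignment)."""
    n, k = X.shape[0], len(weights)
    resp = np.empty((n, k))
    for j in range(k):
        resp[:, j] = weights[j] * multivariate_normal.pdf(X, means[j], covs[j])
    resp /= resp.sum(axis=1, keepdims=True)  # posterior responsibilities per instance
    # Random experiment: draw each instance's component from its posterior distribution.
    return np.array([rng.choice(k, p=resp[i]) for i in range(n)])

def m_step(X, z, k):
    """Re-estimate mixture parameters from the exclusive assignments z (empty clusters not handled in this sketch)."""
    weights = np.array([(z == j).mean() for j in range(k)])
    means = [X[z == j].mean(axis=0) for j in range(k)]
    covs = [np.cov(X[z == j], rowvar=False) + 1e-6 * np.eye(X.shape[1])
            for j in range(k)]
    return weights, means, covs
```

Iterating `era_assign` and `m_step` gives a stochastic counterpart to the EM/K-Means loop: the randomized assignment can move the search out of the local optima that the deterministic updates converge to.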