Feature selection has been shown to be beneficial for many data mining and machine learning tasks, especially for big data analytics. Mutual Information (MI) is a well-known information-theoretic approach used to evaluate the relevance of feature subsets and class labels. However, estimating high-dimensional MI poses significant challenges. Consequently, a great deal of research has focused on using low-order MI approximations or computing a lower bound on MI called Variational Information (VI). These methods often require certain assumptions made on the probability distributions of features such that these distributions are realistic yet tractable to compute. In this paper, we reveal two sets of distribution assumptions underlying many MI and VI based methods: Feature Independence Distribution and Geometric Mean Distribution. We systematically analyze their strengths and weaknesses and propose a logical extension called Arithmetic Mean Distribution, which leads to an unbiased and normalised estimation of probability densities. We conduct detailed empirical studies across a suite of 29 real-world classification problems and illustrate improved prediction accuracy of our methods based on the identification of more informative features, thus providing support for our theoretical findings.
Published Date: 2020-06-02
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print) ISBN 978-1-57735-835-0 (10 issue set)
Copyright: Published by AAAI Press, Palo Alto, California USA Copyright © 2020, Association for the Advancement of Artificial Intelligence All Rights Reserved