Genetic Algorithms for Selection and Partitioning of Attributes in Large-Scale Data Mining Problems

William H. Hsu, Michael Welge, Jie Wu, and Ting-Hao Yang

This paper proposes and surveys genetic implementations of algorithms for selection and partitioning of attributes in large-scale concept learning problems. Algorithms of this type apply relevance determination criteria to choose attributes from those specified for the original data set. The selected attributes are used to define new data clusters, which serve as intermediate training targets. The purpose of this change-of-representation step is to improve the accuracy of supervised learning using the reformulated data. Domain knowledge about these operators has been shown to reduce the number of fitness evaluations for candidate attributes. This paper examines the genetic encoding of attribute selection and partitioning specifications, and the encoding of domain knowledge about operators in a fitness function. The purpose of this approach is to improve upon existing search-based algorithms (or wrappers) in terms of training sample efficiency. Several GA implementations of alternative (search-based and knowledge-based) attribute synthesis algorithms are surveyed, and their application to large-scale concept learning problems is addressed.
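As an illustration of the genetic encoding of attribute selection described above, the following is a minimal sketch rather than the authors' implementation. It assumes a bit-mask chromosome (one bit per attribute), one-point crossover, bit-flip mutation, and truncation selection with elitism. The fitness function is a hypothetical stand-in for a wrapper's validation accuracy: it rewards selected attributes that belong to an assumed set of relevant indices and charges a small penalty per selected attribute. All names and parameter values are assumptions made for illustration.

```python
import random

def fitness(mask, relevant, penalty=0.1):
    # Hypothetical stand-in for wrapper accuracy: count selected attributes
    # that are in the assumed relevant set, minus a cost per selected attribute.
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return hits - penalty * sum(mask)

def crossover(a, b, rng):
    # One-point crossover of two equal-length bit masks.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(mask, rng, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]

def select_attributes(n_attrs, relevant, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism), refill with offspring.
        pop.sort(key=lambda m: fitness(m, relevant), reverse=True)
        elite = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, relevant))

if __name__ == "__main__":
    best = select_attributes(n_attrs=10, relevant={1, 4, 7})
    print(best, fitness(best, {1, 4, 7}))
```

In a wrapper setting, the body of `fitness` would instead train and validate a learner on the attributes selected by the mask, which is why reducing the number of fitness evaluations matters for training sample efficiency.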

This page is copyrighted by AAAI. All rights reserved.