Trace Ratio Criterion for Feature Selection

Feiping Nie, Feiping Nie, Shiming Xiang, Yangqing Jia, Changshui Zhang, Shuicheng Yan

Fisher score and Laplacian score are two popular feature selection algorithms, both of which belong to the general graph-based feature selection framework. In this framework, a feature subset is selected based on the corresponding score (subset-level score), which is calculated in a trace ratio form. Since the number of all possible feature subsets is very huge, it is often prohibitively expensive in computational cost to search in a brute force manner for the feature subset with the maximum subset-level score. Instead of calculating the scores of all the feature subsets, traditional methods calculate the score for each feature, and then select the leading features based on the rank of these feature-level scores. However, selecting the feature subset based on the feature-level score cannot guarantee the optimum of the subset-level score. In this paper, we directly optimize the subset-level score, and propose a novel algorithm to efficiently find the global optimal feature subset such that the subset-level score is maximized. Extensive experiments demonstrate the effectiveness of our proposed algorithm in comparison with the traditional methods for feature selection.

Subjects: 12. Machine Learning and Discovery; 9. Foundational Issues

Submitted: Apr 15, 2008

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.