Learning Feature Relevance: Empirical and Forman Analyses
Indexing of cases is an important topic for Memory- Based Reasoning(MBR). One key problem is how assign weights to attributes of cases. Although several weighting methods have been proposed, some methods cannot handle numeric attributes directly, so it is necessary to discretize numeric values by classification. Furthermore, existing methods have no theoretical background, so little can be said about optimality. We propose a new weighting method based on a statistical technique called Quantification Method II. It can handle both numeric and symbolic attributes in the same framework. Generated attribute weights are optimal in the sense that they maximize the ratio of variance between classes to variance of all cases. Experiments on several benchmark tests show that in many cases, our method obtains higher accuracies than some other weighting methods. The results also indicate that it can distinguish relevant attributes from irrelevant ones, and can tolerate noisy data.