Beyond Accuracy, F-score and ROC: A Family of Discriminant Measures for Performance Evaluation

Marina Sokolova, Nathalie Japkowicz, Stan Szpakowicz

Different evaluation measures assess different characteristics of machine learning algorithms. The empirical evaluation of algorithms and classifiers is a matter of ongoing debate among researchers. Although most measures in use today focus on a classifier's ability to identify classes correctly, we suggest that, in certain cases, other properties, such as failure avoidance or class discrimination, may also be useful. We suggest applying measures that evaluate such properties. These measures (Youden's index, likelihood, and discriminant power) are used in medical diagnosis. We show that these measures are interrelated, and we apply them to a case study from the field of electronic negotiations. We also list other learning problems that may benefit from the application of the proposed measures.
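
As a rough illustration of the three measures named above, the following sketch computes them from the counts of a binary confusion matrix. It is not taken from the paper: it assumes the definitions commonly used in the medical-diagnosis literature (Youden's index as sensitivity + specificity - 1, positive and negative likelihood ratios, and discriminant power with natural logarithms). The function name `discriminant_measures` and the example counts are hypothetical.

```python
import math


def discriminant_measures(tp, fn, fp, tn):
    """Compute Youden's index, likelihood ratios, and discriminant power.

    Assumes the standard medical-diagnosis definitions and natural logs;
    raises ZeroDivisionError or ValueError when sensitivity or specificity
    is exactly 0 or 1, because the ratios are undefined there.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Youden's index: ability to avoid failure on both classes.
    youden = sensitivity + specificity - 1

    # Positive and negative likelihood ratios.
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity

    # Discriminant power: scaled sum of the log-odds of each class.
    x = sensitivity / (1 - sensitivity)
    y = specificity / (1 - specificity)
    dp = (math.sqrt(3) / math.pi) * (math.log(x) + math.log(y))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": youden,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dp": dp,
    }


# Hypothetical counts: 80 true positives, 20 false negatives,
# 10 false positives, 90 true negatives.
measures = discriminant_measures(tp=80, fn=20, fp=10, tn=90)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed definitions, one way the measures are interrelated is that discriminant power equals (sqrt(3) / pi) times the logarithm of the ratio of the positive to the negative likelihood ratio, since that ratio is the product of the two odds terms used above.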

Subjects: 12. Machine Learning and Discovery

Submitted: May 10, 2006


This page is copyrighted by AAAI. All rights reserved.