Abstract:
We present a novel algorithm for extracting a high-quality case base from raw data while preserving, and sometimes improving, the competence of case-based reasoning. We extend the framework of Smyth and Keane’s case-deletion policy with two additional features. First, we build the case base using a statistical distribution mined from the input data, so that case-base competence can be preserved or even increased for future problems. Second, we introduce a nonlinear transformation of the data set so that the case-base size can be further reduced while competence is preserved or even increased. We show that Smyth and Keane’s deletion-based algorithm is sensitive to noisy cases, and that our approach addresses this problem more effectively. We present the theoretical foundation of the algorithm and an empirical evaluation on several data sets.