Statistical and Probabilistic Models
One of the most important problems in rule induction methods is how to estimate the reliability of the induced rules, which is a semantic part of knowledge to be estimated from finite training samples. To estimate the errors of induced results, resampling methods such as cross-validation and the bootstrap method have been introduced. However, cross-validation obtains better results in some domains while the bootstrap method gives better estimates in others, and it is very difficult to choose between the two. To reduce these disadvantages, we introduce recursive iteration of resampling methods (RECITE). RECITE consists of the following four procedures. First, it randomly splits the training samples (S0) into two equal parts, one for new training samples (S1) and the other for new test samples (T1). Second, rules are induced from S1, and several estimation methods, given by users, are executed using S1. Third, the rules are tested on T1, and the resulting test estimators are compared with the estimator given by each method. The second and third procedures are repeated a certain number of times given by users. Then the estimation method that gives the best estimator is selected as the most suitable estimation method. Finally, we use this estimation method on S0 and derive the estimators of statistical measures from the original training samples. We apply the RECITE method to three original medical databases and seven UCI databases. The results show that this method gives the best selection of estimation methods in almost all cases.
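The four procedures can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: `induce` is a stand-in nearest-mean rule on one numeric feature rather than a real rule induction method, the bootstrap variant is an out-of-bag error estimate, "best" is taken to mean the smallest mean absolute gap between an estimator computed on S1 and the test error on T1, and the random split is redrawn in every repetition. All function and parameter names are hypothetical.

```python
import random
from statistics import mean

def induce(train):
    """Stand-in rule inducer (assumption): nearest class mean on one feature."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:  # only one class seen: always predict it
        label = 1 if pos else 0
        return lambda x: label
    m1, m0 = mean(pos), mean(neg)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0

def error_rate(rule, data):
    """Fraction of samples in `data` that `rule` misclassifies."""
    return sum(rule(x) != y for x, y in data) / len(data)

def cross_validation(data, rng, k=5):
    """k-fold cross-validation error estimate (needs len(data) >= k)."""
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        train = [d for j, f in enumerate(folds) if j != i for d in f]
        errs.append(error_rate(induce(train), folds[i]))
    return mean(errs)

def bootstrap(data, rng, b=20):
    """Out-of-bag bootstrap error estimate (one of several bootstrap variants)."""
    n, errs = len(data), []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if oob:  # skip the rare replicate that leaves nothing out
            errs.append(error_rate(induce([data[i] for i in idx]), oob))
    return mean(errs) if errs else 0.0

def recite(s0, estimators, repeats=20, seed=0):
    """Select the estimation method whose value on S1 best matches the T1 error."""
    rng = random.Random(seed)
    gaps = {name: [] for name in estimators}
    for _ in range(repeats):
        # Step 1: split S0 into equal halves S1 and T1
        # (redrawn each repetition; this is an assumption of the sketch).
        shuffled = s0[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        s1, t1 = shuffled[:half], shuffled[half:]
        # Step 3 reference value: error on T1 of the rule induced from S1.
        test_err = error_rate(induce(s1), t1)
        # Step 2: run every user-supplied estimation method on S1,
        # then record how far each one is from the test error.
        for name, estimate in estimators.items():
            gaps[name].append(abs(estimate(s1, rng) - test_err))
    # Selection: smallest mean absolute gap (assumed criterion for "best").
    best = min(gaps, key=lambda name: mean(gaps[name]))
    # Step 4: apply the selected method to the original samples S0.
    return best, estimators[best](s0, rng)

if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic two-class data: feature drawn around the class label.
    samples = [(gen.gauss(y, 1.0), y) for y in [0, 1] * 50]
    methods = {"cross-validation": cross_validation, "bootstrap": bootstrap}
    print(recite(samples, methods))
```

Passing the estimation methods as a dictionary mirrors the description that they are "given by users": any function that takes the samples and a random generator and returns an error estimate can be added without changing `recite`.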