Jerzy W. Grzymala-Busse, Linda K. Goodwin, Witold J. Grzymala-Busse, and Xinqun Zheng
This paper describes experiments with a challenging data set describing preterm births. The data set, collected at the Duke University Medical Center, was large and had many missing attribute values. The main problem, however, was that only 20.7% of the cases belonged to the important preterm birth class, so the data set was imbalanced. For comparison, we include results of experiments on another imbalanced data set, the well-known breast cancer data set. Our approach to this imbalance was to induce a rule set using our standard procedure, the LEM2 algorithm of the LERS rule induction system, and then to increase the strength of every rule describing preterm births by multiplying it by the same number, called a strength multiplier. The strength of any rule describing the majority class, full-term birth, remained unchanged. The optimal strength multiplier was determined experimentally using our optimality criterion: the maximum of the sum of sensitivity and specificity.
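The selection of the strength multiplier can be illustrated with a minimal sketch. It is not the LERS implementation and rests on simplifying assumptions: each case is reduced to two hypothetical numbers, the total strength of the matching preterm rules and the total strength of the matching majority-class rules, and a case is classified as preterm when the multiplied preterm support exceeds the majority support. The names `Case`, `sensitivity_specificity`, and `best_multiplier` are introduced here for illustration only.

```python
from typing import Iterable, List, Tuple

# Hypothetical case representation (an assumption of this sketch):
# (preterm_support, majority_support, is_preterm)
Case = Tuple[float, float, bool]


def sensitivity_specificity(cases: List[Case], multiplier: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when preterm support is scaled by `multiplier`."""
    tp = fn = tn = fp = 0
    for preterm_support, majority_support, is_preterm in cases:
        # Only the preterm (minority) support is multiplied;
        # the majority-class support is left unchanged.
        predicted_preterm = preterm_support * multiplier > majority_support
        if is_preterm:
            if predicted_preterm:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_preterm:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def best_multiplier(cases: List[Case], candidates: Iterable[float]) -> float:
    """Pick the candidate that maximizes sensitivity + specificity.

    Ties are resolved in favor of the earliest candidate, a choice made
    for this sketch only.
    """
    return max(candidates, key=lambda m: sum(sensitivity_specificity(cases, m)))
```

Searching a grid of candidate multipliers mirrors the experimental determination described above; the grid itself and the handling of ties are choices of this sketch and would need to be set for a real experiment.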