Efficient Specific-to-General Rule Induction

Pedro Domingos

RISE (Domingos 1995; in press) is a rule induction algorithm that proceeds by gradually generalizing rules, starting with one rule per example. This has several advantages compared to the more common strategy of gradually specializing initially null rules, and has been shown to lead to significant accuracy gains over algorithms like C4.5RULES and CN2 in a large number of application domains. However, RISE’s running time (like that of other rule induction algorithms) is quadratic in the number of examples, making it unsuitable for processing very large databases. This paper introduces a method for reducing RISE’s running time based on partitioning the training set, evaluating rules from one partition on examples from another, and combining the final results at classification time. Partitioning guarantees a learning time that is linear in the number of examples, even in the presence of numeric attributes and high noise. Windowing, a well-known speedup method, is also studied as applied to RISE. In low-noise conditions, both methods are successful in reducing running time whilst maintaining accuracy (partitioning sometimes improves it significantly). In noisy conditions, the performance of windowing deteriorates, while that of partitioning remains stable.

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.