Ashwin Srinivasan and Ross D. King
The existence and rapid growth of chemical databases have brought into focus the utility of methods that can assist in discovering predictive patterns in data and in communicating them in a manner designed to provoke insight. This has turned attention to machine learning techniques capable of extracting "symbolic" descriptions from data. At the cutting edge of such techniques is Inductive Logic Programming (ILP). Given a set of observations and background knowledge encoded as a set of logical descriptions, an ILP system attempts to construct explanations for the observations. The explanations are in the same language as the observations and background knowledge -- usually a subset of first-order logic. This contrasts with algorithms like decision trees and neural networks, which employ simple propositional logic representations. This expressiveness, along with the flexibility to include background knowledge -- which can even include other propositional algorithms -- allows a form of data analysis and decision support that is, in principle, unmatched by first-generation methods. Bio/chemical applications of ILP have largely been concerned with determining "structure-activity" relationships (SARs). The task here is to obtain rules that predict the activity of a compound, such as its toxicity, from its chemical structure. The representation language adopted by ILP systems allows the development of compact, chemist-friendly "theories", and ILP systems have progressively been shown to be capable of handling 1-, 2-, and 3-dimensional descriptions of chemical structure. Empirical results in predicting mutagenicity and carcinogenicity suggest that structure-activity relations found by ILP systems achieve at least the same predictive power as traditional SAR techniques, with fewer limitations (like the need for alignment, pre-determination of structural features, etc.).
In some cases, ILP systems have found novel structural features that significantly improve the predictive capabilities of traditional 1- and 2-dimensional SAR methods. Here we summarise the progress achieved so far in the use of ILP in these areas, including ideas emerging from a recent toxicology prediction challenge, which suggest that a combination of ILP and established prediction methods could provide a powerful way of relating chemical activity to structure.
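
To make the ILP setting described above concrete, the following is a minimal sketch, not the method of any particular ILP system. It assumes a tiny, invented set of molecules (`m1` to `m4`), a hypothetical atom/bond encoding of background knowledge, and a single fixed clause template of the form `active(M) :- atom(M, A, E1), bond(M, A, B, T), atom(M, B, E2)`. Real ILP systems search far richer hypothesis spaces; here the search is a plain exhaustive enumeration that keeps the clause covering the most active compounds and no inactive ones.

```python
from itertools import product

# Toy background knowledge, invented purely for illustration (not real data).
# atoms[mol] maps atom id -> element; bonds[mol] holds (atom1, atom2, bond_type).
atoms = {
    "m1": {1: "c", 2: "n", 3: "o"},
    "m2": {1: "c", 2: "n"},
    "m3": {1: "c", 2: "o"},
    "m4": {1: "c", 2: "c", 3: "n"},
}
bonds = {
    "m1": {(1, 2, "double"), (2, 3, "single")},
    "m2": {(1, 2, "double")},
    "m3": {(1, 2, "double")},
    "m4": {(1, 2, "double"), (2, 3, "single")},
}

# Observations: which compounds are labelled active or inactive (hypothetical).
positives = {"m1", "m2"}
negatives = {"m3", "m4"}


def covers(clause, mol):
    """True if mol has a bond of the clause's type joining the two elements."""
    e1, bond_type, e2 = clause
    for a, b, t in bonds[mol]:
        if t != bond_type:
            continue
        pair = (atoms[mol][a], atoms[mol][b])
        if pair == (e1, e2) or pair == (e2, e1):  # bonds are undirected
            return True
    return False


def induce(pos, neg):
    """Return the clause covering the most positives and no negatives, or None."""
    elements = sorted({e for mol in atoms.values() for e in mol.values()})
    bond_types = sorted({t for bs in bonds.values() for _, _, t in bs})
    best, best_score = None, 0
    for e1, t, e2 in product(elements, bond_types, elements):
        if e1 > e2:  # skip symmetric duplicates of the same clause
            continue
        clause = (e1, t, e2)
        if any(covers(clause, m) for m in neg):
            continue  # inconsistent with the inactive compounds
        score = sum(covers(clause, m) for m in pos)
        if score > best_score:
            best, best_score = clause, score
    return best


def show(clause):
    """Render a clause in a Prolog-like, human-readable form."""
    e1, t, e2 = clause
    return f"active(M) :- atom(M, A, {e1}), bond(M, A, B, {t}), atom(M, B, {e2})."


if __name__ == "__main__":
    print(show(induce(positives, negatives)))
```

On this toy data the search selects the clause describing a carbon-nitrogen double bond, since it is the only candidate that covers both active compounds while excluding the inactive ones. The rule is stated in the same relational vocabulary as the background facts, which is the property the text contrasts with propositional learners.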