Cen Li, Gautam Biswas
A framework for knowledge-based scientific discovery in geological databases has been developed. The discovery process consists of two main steps: context definition and equation derivation. Context definition identifies and formulates homogeneous regions, each of which is likely to produce a unique and meaningful analytic formula for the goal variable. Clustering techniques and a suite of visualization and interpretation routines make up a toolbox that assists the context definition task. Within each context, multivariable regression analysis is conducted to derive analytic equations relating the goal variable to a set of relevant independent variables, starting with one or more initial base models. Domain knowledge and a heuristic search technique based on component-plus-residual plots dynamically guide the equation refinement process. The methodology has been applied to derive porosity equations for data collected from oil fields in the Alaska Basin. Preliminary results demonstrate the effectiveness of this methodology.
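
As an illustration of the component-plus-residual idea mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes a plain least-squares base model and synthetic data; the function names `fit_linear` and `partial_residuals` and all variable values are hypothetical. For predictor j, the partial residual is the model residual plus the fitted linear component b_j * x_j. Curvature in these values against x_j suggests that x_j should enter the equation through a transformed term.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def partial_residuals(X, y, j):
    """Component-plus-residual values for predictor j: residual + b_j * x_j."""
    intercept, coefs = fit_linear(X, y)
    residuals = y - (intercept + X @ coefs)
    return residuals + coefs[j] * X[:, j]


# Synthetic data (an assumption for illustration only): the goal variable
# depends quadratically on the first predictor and linearly on the second.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(200, 2))
y = 1.0 + X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, 200)

pr0 = partial_residuals(X, y, 0)
x0 = X[:, 0]

# Compare a straight-line and a quadratic fit of the partial residuals on x0.
# A much smaller quadratic error hints that the base model needs an x0**2 term.
lin_sse = np.sum((pr0 - np.polyval(np.polyfit(x0, pr0, 1), x0)) ** 2)
quad_sse = np.sum((pr0 - np.polyval(np.polyfit(x0, pr0, 2), x0)) ** 2)
print(f"linear SSE = {lin_sse:.3f}, quadratic SSE = {quad_sse:.3f}")
```

In practice the partial residuals would be plotted against each predictor and inspected visually; the error comparison here stands in for that inspection so the sketch runs without a plotting library.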