Automatic Synthesis of Statistical Data Analysis Programs

Bernd Fischer

Statistical data analysis is a core activity in the experimental sciences. It encompasses a wide variety of tasks, ranging from simple linear regression to fitting complex dynamical models. Developing statistical data analysis programs, however, is an arduous and error-prone process that requires profound expertise in several areas: statistics, numerics, software engineering, and of course the scientific application domain. We believe that statistical data analysis is a very promising domain for the application of program synthesis, despite (or even because of) these difficulties. Statistics provides a unifying and concise domain-specific notation. Graphical models (Buntine 1994) provide a structuring mechanism that can be exploited during the synthesis process, e.g., to decompose a problem into independent subproblems. Statistical algorithms like EM (Dempster, Laird, and Rubin 1977) are often applicable to a wide range of problems; their generic formulations allow a "plug'n'play"-style combination of algorithms. Recently developed sophisticated data structures such as kd-trees (Pelleg and Moore 1999) offer orders-of-magnitude speed-ups for certain problems but are rarely employed because of the increased programming complexity they cause. Finally, data analysis is characterized by an iterative development style: an initial model is hypothesized, implemented, evaluated on real data, and, if necessary, refined. However, each iteration typically involves substantial programming effort, as prototyping is often not sufficient to work with real data sets and even small model modifications may require radically different algorithms. Program synthesis encapsulates many of the statistical, numerical, and software engineering aspects of each iteration and thus allows users to concentrate on their scientific application. Its fast turn-around times make model refinement and design-space exploration feasible.
