Minimal Length Encoding Methods in Molecular Biology

Aleksandar Milosavljevic

Minimal encoding length (Kolmogorov complexity) is a measure of the information content (randomness) of the observed data. Algorithmic information theory studies this quantity in great detail and provides a framework for a most general formulation of the principle of parsimony (Occam's razor). The principle of parsimony has been widely and explicitly used in taxonomy (Sober 1988). Applications have expanded with the appearance of macromolecular sequence data. The minimal edit distance criterion, a special case of the parsimony criterion, has been used for pairwise sequence alignment (e.g., Waterman 1989). The general principle was often falsely identified with narrow formulations (e.g., the parsimony principle in taxonomy) and then criticized; the possibility of changing the specific formulation while still applying the general principle was often ignored. In addition to providing a formal framework for a unified theory of inductive inference, the concepts from algorithmic information theory have been employed to obtain a mathematical definition of life (Chaitin 1979). The definition highlights a fundamental difference in the structure of knowledge about the living and non-living worlds, and has enabled a statistical test for discriminating observations that come from a living world from those that come from the non-living world (Milosavljević). In the following we provide a brief outline of the theory of minimal length encoding and then sketch a few recent applications.
