Proceedings:
Proceedings of the Twentieth International Conference on Machine Learning, 2000
Track:
Contents
Abstract:
Identifying sequence motifs is a fundamental method for suggesting good candidates for biologically functional regions such as promoters, splice sites, and binding sites. We investigate the following approach to identifying motifs: given a collection of orthologous sequences from multiple species related by a known phylogenetic tree, search for motifs that are well conserved (according to a parsimony measure) across those species. We present an exact algorithm for solving this problem. We then discuss experimental results on finding promoters of the rbcS gene for a family of 10 plants, on finding promoters of the adh gene for 12 Drosophila species, and on finding promoters of several chloroplast-encoded genes.
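To make the idea of a parsimony measure on a known tree concrete, the following is a minimal sketch, not necessarily the exact algorithm the paper presents. It assumes a fixed motif length k, a tree given as nested tuples of leaf names, and a cost of one per nucleotide substitution along each tree edge (Hamming distance). It chooses one k-mer from each sequence, plus labels for the internal nodes, so that the total number of substitutions over the tree is as small as possible. The names `best_parsimony_score` and `hamming` and the input layout are hypothetical, chosen only for this illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_parsimony_score(tree, sequences, k):
    """Return (root_label, score) minimizing total substitutions over the tree.

    tree      -- nested tuples whose tips are leaf names (assumed layout)
    sequences -- dict mapping each leaf name to its DNA string
    k         -- motif length (assumed fixed and small, since 4**k labels are tried)
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]

    def cost_table(node):
        # Leaf: a k-mer costs 0 if it occurs in that sequence, else it is forbidden.
        if isinstance(node, str):
            seq = sequences[node]
            present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
            return {s: (0 if s in present else float("inf")) for s in kmers}
        # Internal node: for each label s, every child independently picks the
        # label c that minimizes (cost of child's subtree) + (substitutions s -> c).
        tables = [cost_table(child) for child in node]
        return {
            s: sum(min(t[c] + hamming(s, c) for c in kmers) for t in tables)
            for s in kmers
        }

    root = cost_table(tree)
    best = min(root, key=root.get)
    return best, root[best]


# Toy input: three made-up sequences on the tree ((a, b), c).
toy_tree = (("a", "b"), "c")
toy_seqs = {"a": "TTACGTT", "b": "GGACGGG", "c": "CCACTCC"}
motif, score = best_parsimony_score(toy_tree, toy_seqs, 3)
```

This sketch enumerates all 4^k candidate labels at every node, so it is practical only for short motifs. A lower score means the chosen substrings are better conserved with respect to the tree.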
ISMB