Proceedings:
Proceedings of the Twentieth International Conference on Machine Learning, 2000
Abstract:
Position-specific scoring matrices have been used extensively to recognize highly conserved protein regions. We present a method for accelerating these searches using a suffix tree data structure computed from the sequences to be searched. Building on earlier work that allows evaluation of a scoring matrix to be stopped early, the suffix tree-based method excludes many protein segments from consideration at once by pruning entire subtrees. Although suffix trees are usually expensive in space, the fact that scoring matrix evaluation requires an in-order traversal allows nodes to be stored more compactly without loss of speed, and our implementation requires only 17 bytes of primary memory per input symbol. Searches are accelerated by up to a factor of ten.
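The pruning idea in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: a plain depth-limited suffix trie built from Python dictionaries stands in for the compact suffix tree, the scoring matrix is a list of per-column dictionaries, and the names `build_trie` and `search` are hypothetical. A subtree is abandoned as soon as the score accumulated so far, plus the best score still obtainable from the remaining columns, cannot reach the threshold, so every segment sharing that prefix is excluded in one step.

```python
def build_trie(text, depth):
    """Build a depth-limited suffix trie over `text` (illustrative stand-in
    for a suffix tree). Leaves at level `depth` record segment start offsets."""
    root = {"children": {}, "starts": []}
    for start in range(len(text) - depth + 1):
        node = root
        for ch in text[start:start + depth]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
        node["starts"].append(start)
    return root


def search(root, matrix, threshold):
    """Return sorted (start, score) pairs for segments scoring >= threshold.

    `matrix[i]` maps each symbol to its score in column i; a symbol missing
    from a column is treated as scoring minus infinity (an assumption).
    """
    width = len(matrix)
    # best_rest[i] is the maximum score obtainable from columns i..width-1.
    best_rest = [0.0] * (width + 1)
    for i in range(width - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(matrix[i].values())

    hits = []

    def walk(node, col, score):
        # Early stop: no completion of this prefix can reach the threshold,
        # so the whole subtree below this node is skipped.
        if score + best_rest[col] < threshold:
            return
        if col == width:
            hits.extend((start, score) for start in node["starts"])
            return
        for ch, child in node["children"].items():
            walk(child, col + 1, score + matrix[col].get(ch, float("-inf")))

    walk(root, 0, 0.0)
    return sorted(hits)


# Usage: a toy 3-column matrix that favours the segment "ACG".
toy_matrix = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 2, "T": 0},
]
toy_hits = search(build_trie("ACGTACGA", 3), toy_matrix, 5)
```

In this toy run, identical segments share one path in the trie, so each distinct prefix is scored once no matter how often it occurs in the input. The sketch does not attempt the compact node layout or the memory figures described in the abstract.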
ISMB