Proceedings:
Proceedings of the Twentieth International Conference on Machine Learning, 1995
Volume
Issue:
Proceedings of the Twentieth International Conference on Machine Learning, 1995
Track:
Contents
Downloads:
Abstract:
A system for aligning nucleotide or amino acid biosequences is described. The system, called Neweyes, employs a novel string matching algorithm, Running Karp-Rabin Greedy String Tiling (RKR-GST), which involves "tiling" one string with matching substrings of a second string. In practice, RKR- GST has a computational complexity that appears close to linear. With RKR-GST, Neweyes is able to detect transposed substrings or substrings of one biosequence that appears rearranged in a second sequence. Repeated substrings can also be detected. Neweyes also supports a form of matching-by-group that gives the effect of different amino acid mutation matrices. Neweyes can be used in a macro mode (searching a database for a list of biosequences that are similar to a given biosequence) or in a micro mode, where two biosequences are compared and more detailed output formats are available. Currently, work is underway aimed at improving the display facilities. A second project is looking at improving the scoring of tiles based on combining log-odds values.
ISMB
Proceedings of the Twentieth International Conference on Machine Learning, 1995