Proceedings:
Proceedings of the Twentieth International Conference on Machine Learning, 1995
Volume
Issue:
Proceedings of the Twentieth International Conference on Machine Learning, 1995
Track:
Contents
Downloads:
Abstract:
We present a report on work in progress on a divide and conquer approach to multiple alignment The algorithm makes use of the costs calculated from applying the standard dynamic programming scheme to all pairs of sequences. The resulting cost matrices for pairwise alignment give rise to secondary matrices containing the additional costs imposed by fixing the path through the dynamic programming graph at a particular vertex. Such a constraint corresponds to a division of the problem obtained by slicing both sequences between two particular positions, and aligning the two sequences on the left and the two sequences on the right, charging for gaps introduced at the slicing point. To obtain an estimate for the additional cost imposed by forcing the multiple alignment through a particular vertex in the whole hypercube, we will take a (weighted) sum of secondary costs over all pairwise projections of the division of the problem, as defined by this vertex, that is, by slicing all sequences at the points suggested by the vertex. We then use that partition of every single sequence under consideration into two "halfs" which imposes a minimal (weighted) sum of pairwise additional costs, making sure that one of the sequences is divided somewhere close to its midpoint. Hence, each iteration can cut the problem size in half. As the enumeration of all possible partitions may restrict this approach to small-size problems, we eliminate futile partitions, and organize their enumeration in a way that starts with the most promising ones. Comparing our approach for the case of 3 sequences with a) structurally verified alignments and b) alignments from literature, indicates high quality alignments, with roughly the same number of errors as the ``optimal'' (in the dynamic programming framework) solution in a), and being as close as the "optimal" to a maximum weight trace done by Kececioglu, using 6 sequences altogether.
ISMB
Proceedings of the Twentieth International Conference on Machine Learning, 1995