Proceedings of the AAAI Conference on Artificial Intelligence, 21
Constraint Satisfaction and Satisfiability
One of the main topics of research in genomics is determining the relevance of mutations, described in haplotype data, as causes of some genetic diseases. However, due to technological limitations, genotype data rather than haplotype data is usually obtained. The haplotype inference by pure parsimony (HIPP)problem consists in inferring haplotypes from genotypes s.t.~the number of required haplotypes is minimum. Previous approaches to the HIPP problem have focused on integer programming models and branch-and-bound algorithms. In contrast, this paper proposes the utilization of Boolean Satisfiability (SAT). The proposed solution entails a SAT model, a number of key pruning techniques, and an iterative algorithm that enumerates the possible solution values for the target optimization problem. Experimental results, obtained on a wide range of instances, demonstrate that the SAT-based approach can be several orders of magnitude faster than existing solutions. Besides being more efficient, the SAT-based approach is also the only capable of computing the solution for a large number of instances.