Communications in Information and Systems
Volume 10 (2010)
Theory and Algorithms for the Haplotype Assembly Problem
Pages: 23 – 38
Genome sequencing studies to date have generally sought to assemble consensus genomes by merging sequence contributions from multiple homologous copies of each chromosome. With growing interest in genetic variation, however, there is a need for methods to separate these distinct contributions and assess how individual homologous chromosome copies differ from one another. One approach to this problem uses small sequence fragments derived from shotgun sequencing studies to determine the patterns of variation that co-occur on individual chromosomes. This has become known as the "haplotype assembly" problem. This review paper surveys results on the theory and algorithms for haplotype assembly. It first describes common abstractions of the problem. It then discusses some notable intractability results for different problem variants. It next examines a variety of combinatorial, statistical, and heuristic methods for assembling fragment data sets in practice. The review concludes with a discussion of recent directions in diploid genome sequencing and their implications for haplotype assembly in the future.