Abstract
Haplotype phase represents the collective genetic variation between homologous chromosomes and is an essential feature of non-haploid genomes. Determining the haplotype phase requires knowledge of both the genotypes at variant sites and their linkage across each chromosome. Haplotype linkage can either be inferred statistically from a genotyped population or be determined by long-range sequencing of an individual genome. However, extending haplotype inference to the whole-chromosome scale remains challenging and usually requires special experimental techniques. Here we describe a general computational strategy to determine complete chromosomal haplotypes using a combination of bulk long-range sequencing and Hi-C sequencing. We demonstrate that this strategy can resolve the haplotypes of parental chromosomes in diploid human genomes with high precision (99%) and completeness (98%), and can further assemble the syntenic organization of aneuploid genomes (a “digital karyotype”).
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
We have added new results related to the karyotype of K-562 cells and revised the text to improve clarity.