Abstract
In this paper we extend multi-locus iterative peeling into a computationally efficient method for calling, phasing, and imputing sequence data of any coverage in small or large pedigrees. Our method, called hybrid peeling, uses multi-locus iterative peeling to estimate shared chromosome segments between parents and their offspring, and then uses single-locus iterative peeling to aggregate genomic information across multiple generations. Using a synthetic dataset, we first analysed the performance of hybrid peeling for calling and phasing alleles in disconnected families, that is, families containing only a focal individual, its parents, and its grandparents. Second, we analysed the performance of hybrid peeling for calling and phasing alleles in the context of the full pedigree. Third, we analysed the performance of hybrid peeling for imputing whole-genome sequence data to the remaining individuals in the population. We found that hybrid peeling substantially increased the number of genotypes that were called and phased by leveraging sequence information on related individuals. The calling rate and accuracy increased when the full pedigree was used compared to a reduced pedigree of just parents and grandparents. Finally, hybrid peeling accurately imputed whole-genome sequence information to non-sequenced individuals. We believe that this algorithm will enable the generation of low-cost, high-accuracy whole-genome sequence data in many pedigreed populations. We are making this algorithm available as a standalone program called AlphaPeel.