PT - JOURNAL ARTICLE
AU - Mara Battagin
AU - Serap Gonen
AU - Roger Ros-Freixedes
AU - Andrew Whalen
AU - Gregor Gorjanc
AU - John M Hickey
TI - A family-based phasing algorithm for sequence data
AID - 10.1101/504480
DP - 2018 Jan 01
TA - bioRxiv
PG - 504480
4099 - http://biorxiv.org/content/early/2018/12/21/504480.short
4100 - http://biorxiv.org/content/early/2018/12/21/504480.full
AB - This paper describes a family-based phasing algorithm for variable-coverage sequence data that first minimises phasing errors and then maximises the proportion of alleles phased. This algorithm is one of the essential tools that underpin an overall strategy for generating highly accurate sequence data on whole populations at low cost. The algorithm is called AlphaFamSeq. It uses sequence data on the focal individual and at least two generations of ancestors to phase alleles. In the first step, AlphaFamSeq calculates allele probabilities using iterative peeling. In subsequent steps, the alleles are phased using heuristics that derive information from the sequence data of parents, grandparents and progeny and, if available, from other families in the pedigree. AlphaFamSeq was tested on a range of simulated data sets. AlphaFamSeq gives low phasing error rates and, if there is sufficient sequence information and haplotype sharing amongst individuals, it can give a high yield of correctly phased alleles. The allele threshold had a large effect and window size had a small effect on performance. When all individuals in a single family were sequenced at different coverages, the percentage of correctly phased alleles reached 90% of the possible maximum (98.9%) at ~1/6 of the maximum aggregate coverage. Adding sequence information from other related individuals increased the percentage of correctly phased alleles.
Imputation performance was high across all allele frequencies (average correlation by marker of 0.94), except for a slight decrease at very low frequencies (≤0.01 MAF). Within an overall strategy for generating highly accurate sequence data on whole populations at low cost, the role of AlphaFamSeq is to provide very accurately phased haplotypes on focal individuals, that is, individuals whose haplotypes are very common in the population.