RT Journal Article
SR Electronic
T1 GPhase: Greedy Approach for Accurate Haplotype Inferencing
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 073379
DO 10.1101/073379
A1 Kshitij Tayal
A1 Naveen Sivadasan
A1 Rajgopal Srinivasan
YR 2016
UL http://biorxiv.org/content/early/2016/09/04/073379.abstract
AB We consider the computational problem of phasing an individual genotype sample given a collection of known haplotypes in the population. We give a fast and accurate algorithm, GPhase, for reconstructing a haplotype pair consistent with the input genotype. It uses the coalescent-based mutation model of Stephens and Donnelly (2000). Computing an optimal solution under this model is expensive, so our algorithm uses a greedy approximation for fast and accurate estimation. The algorithm is simple and efficient, with linear time and space complexity. Experiments on real datasets showed improved gene-level phasing accuracy for GPhase compared with other widely used tools such as SHAPEIT, Beagle, MaCH and Impute2. On simulated data, GPhase phased samples each containing more than 1700 markers with high accuracy. GPhase can be used for gene-level phasing of individual samples using publicly available haplotype datasets such as HapMap or 1000 Genomes data. This has applications in studies of recessive Mendelian disorders where parental data are lacking. GPhase is freely available for download and use from https://github.com/kshitijtayal/GPhase/.