RT Journal Article
SR Electronic
T1 High-depth whole genome sequencing of a large population-specific reference panel: Enhancing sensitivity, accuracy, and imputation
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 167924
DO 10.1101/167924
A1 Todd Lencz
A1 Jin Yu
A1 Cameron Palmer
A1 Shai Carmi
A1 Danny Ben-Avraham
A1 Nir Barzilai
A1 Susan Bressman
A1 Ariel Darvasi
A1 Judy H. Cho
A1 Lorraine N. Clark
A1 Zeynep H. Gümüş
A1 Vijai Joseph
A1 Robert Klein
A1 Steven Lipkin
A1 Kenneth Offit
A1 Harry Ostrer
A1 Laurie J. Ozelius
A1 Inga Peter
A1 Gil Atzmon
A1 Itsik Pe’er
YR 2017
UL http://biorxiv.org/content/early/2017/07/24/167924.abstract
AB Background While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. In the present study, we sequenced at full-depth (≥30x) a moderately large (n=738) cohort of samples drawn from the Ashkenazi Jewish population across two platforms (Illumina X Ten and Complete Genomics, Inc.). We developed and refined a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population. Results For samples sequenced on the Illumina X Ten platform, quality thresholds were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. The Complete Genomics, Inc. platform was more conservative (fewer variants called) compared to the Illumina platform, but also demonstrated relatively greater numbers of false positives that needed to be filtered. Quality control procedures also permitted detection of novel genome reads that are not mapped to current reference or alternate assemblies.
After stringent quality control, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes. Conclusions Our primary results demonstrate enhanced accuracy of a population-specific imputation panel relative to cosmopolitan panels, especially in the range of infrequent (<5% non-reference allele frequency) and rare (<1% non-reference allele frequency) variants that may be most critical to further progress in mapping of complex phenotypes.