RT Journal Article
SR Electronic
T1 Low-pass sequencing increases the power of GWAS and decreases measurement error of polygenic risk scores compared to genotyping arrays
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.04.29.068452
DO 10.1101/2020.04.29.068452
A1 Jeremiah H. Li
A1 Chase A. Mazur
A1 Tomaz Berisa
A1 Joseph K. Pickrell
YR 2020
UL http://biorxiv.org/content/early/2020/05/05/2020.04.29.068452.abstract
AB Low-pass sequencing (sequencing a genome to an average depth of less than 1x coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and calculation of polygenic scores; however, the current literature is largely limited to simulation- and downsampling-based approaches. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5x and 1x) and array genotyping (using the Illumina Global Screening Array) on 120 DNA samples derived from African- and European-ancestry individuals who are part of the 1000 Genomes Project. We then imputed both the sequencing data and the genotyping array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. First, we evaluated overall imputation accuracy from these different assays as measured by genotype concordance; we introduce the concept of effective coverage, which accounts for evenness of sequencing, and show that this metric is a better predictor of imputation accuracy than nominal mapped coverage for low-pass sequencing data. Next, we evaluated overall power for genome-wide association studies (GWAS) as measured by the squared correlation between imputed and true genotypes. In the African individuals, at common variants (> 5% minor allele frequency), imputation r² averaged 0.83 for the array data and ranged from 0.89 to 0.95 for the low-pass sequencing data, corresponding to an effective 7–15% increase in GWAS discovery power. For the same variants in the European individuals, imputation r² averaged 0.91 for the array data and ranged from 0.92 to 0.96 for the low-pass sequencing data, corresponding to an effective 1–6% increase in GWAS discovery power. Finally, we computed polygenic risk scores for breast cancer and coronary artery disease from the different assays.
We observed consistently lower measurement error for risk scores computed from low-pass sequencing data above an effective coverage of ∼0.5x. The mean squared error of the array-based estimates was three to four times that of the estimates from samples sequenced at an effective coverage of ∼1.2x for coronary artery disease, with qualitatively similar results for breast cancer. We conclude that low-pass sequencing plus imputation, in addition to providing a substantial increase in statistical power for genome-wide association studies, provides increased accuracy for polygenic risk prediction at effective coverages of ∼0.5x and higher. Competing Interest Statement: J.H.L., C.A.M., T.B., and J.K.P. were employees of Gencove, Inc. at the time of writing.
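The abstract's mapping from imputation r² to "effective" GWAS power gain can be reproduced with simple arithmetic. The sketch below assumes the common approximation that a variant imputed with squared correlation r² to the true genotype carries about as much association signal as a directly genotyped variant in r² × N samples, so the sample size N cancels and the gain of one assay over another is the ratio of their r² values minus one. This is an illustrative assumption, not necessarily the authors' exact calculation; the function name `relative_power_gain` is hypothetical, and the r² values are the ones quoted in the abstract.

```python
# Minimal sketch (assumption): effective sample size for an imputed variant
# is approximately r2 * N, so comparing two assays on the same N samples
# reduces to a ratio of their imputation r^2 values.


def relative_power_gain(r2_new: float, r2_ref: float) -> float:
    """Fractional increase in effective sample size of assay 'new' over 'ref'.

    Hypothetical helper: (r2_new * N) / (r2_ref * N) - 1, with N cancelling.
    """
    if not (0.0 < r2_ref <= 1.0 and 0.0 <= r2_new <= 1.0):
        raise ValueError("r^2 values must lie in [0, 1] and r2_ref must be > 0")
    return r2_new / r2_ref - 1.0


# Mean imputation r^2 at common variants (> 5% MAF), as quoted in the abstract.
r2_by_ancestry = {
    "African": {"array": 0.83, "lowpass_min": 0.89, "lowpass_max": 0.95},
    "European": {"array": 0.91, "lowpass_min": 0.92, "lowpass_max": 0.96},
}

for ancestry, r2 in r2_by_ancestry.items():
    low = relative_power_gain(r2["lowpass_min"], r2["array"])
    high = relative_power_gain(r2["lowpass_max"], r2["array"])
    # Print the range of effective power gains of low-pass sequencing over the array.
    print(f"{ancestry}: {low:.1%} to {high:.1%}")
```

Under this approximation the printed ranges come out at roughly 7–15% for the African samples and 1–6% for the European samples, consistent with the figures in the abstract.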