Improved ancestry estimation for both genotyping and sequencing data using projection Procrustes analysis and genotype imputation

Am J Hum Genet. 2015 Jun 4;96(6):926-37. doi: 10.1016/j.ajhg.2015.04.018. Epub 2015 May 28.

Abstract

Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
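The core idea of placing a study sample in a reference ancestry map can be illustrated with a minimal sketch. It is not the LASER 2.0 implementation: it uses ordinary Procrustes analysis (a similarity transform between two coordinate sets of the same dimension) rather than the projection variant described in the abstract, which maps high-dimensional principal components into a lower-dimensional reference space. The function names `fit_procrustes` and `place_in_reference_map`, and the toy coordinates, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_procrustes(X, Y):
    """Fit scale s, orthogonal matrix R and shift t minimizing ||s * X @ R + t - Y||_F.

    X: reference individuals' coordinates in a sample-specific PCA space (N x K).
    Y: the same individuals' coordinates in the reference ancestry map (N x K).
    Simplification: both spaces have the same dimension K here.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # The SVD of the cross-product matrix gives the optimal orthogonal matrix.
    U, sigma, Vt = np.linalg.svd(Xc.T @ Yc)
    R = U @ Vt
    # Optimal isotropic scale, then the translation that aligns the centroids.
    s = sigma.sum() / (Xc ** 2).sum()
    t = y_mean - s * x_mean @ R
    return s, R, t


def place_in_reference_map(x_new, s, R, t):
    """Apply the fitted transform to a study sample's coordinates."""
    return s * x_new @ R + t


# Toy data (an assumption for illustration): the reference map Y, and a
# sample-specific space X that is a rotated, rescaled, shifted copy of it.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
t_true = np.array([1.0, -2.0])
X = (Y - t_true) @ R_true.T / 3.0

s, R, t = fit_procrustes(X, Y)

# A study sample observed only in the sample-specific space.
study_true = np.array([0.5, 0.25])
study_x = (study_true - t_true) @ R_true.T / 3.0
study_mapped = place_in_reference_map(study_x, s, R, t)
```

In this noise-free toy case the fitted transform recovers the study sample's position in the reference map. With real data the sample-specific space is estimated from far fewer loci, so the fit is approximate, which is the setting where the abstract reports that additional principal components and imputed reference genotypes help.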

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Computer Simulation
  • Data Interpretation, Statistical*
  • Europe
  • Genetic Association Studies / methods*
  • Genetic Association Studies / standards
  • Genotyping Techniques / methods*
  • Humans
  • Models, Genetic*
  • Pedigree*
  • Principal Component Analysis
  • Sequence Analysis, DNA / methods*
  • Software*