New Results

Robust Genome-Wide Ancestry Inference for Heterogeneous Datasets and Ancestry Facial Imaging based on the 1000 Genomes Project

Jairui Li, Tomas Gonzalez, Julie D. White, Karlijne Indencleef, Hanne Hoskens, Alejandra Ortega Castrillon, Nele Nauwelaers, Arslan Zaidi, Ryan J. Eller, Torsten Günther, Emma M. Svensson, Mattias Jakobsson, Susan Walsh, Kristel Van Steen, Mark D. Shriver, Peter Claes
doi: https://doi.org/10.1101/549881
Jairui Li
1Medical Imaging Research Center, MIRC, University Hospitals Leuven, Leuven, Belgium
2Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
  • For correspondence: Jiarui.li@kuleuven.be peter.claes@kuleuven.be
Tomas Gonzalez
3Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania, US
Julie D. White
3Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania, US
Karlijne Indencleef
1Medical Imaging Research Center, MIRC, University Hospitals Leuven, Leuven, Belgium
4Department of Neurosciences, Experimental Otorhinolaryngology, KU Leuven, Leuven, Belgium
Hanne Hoskens
1Medical Imaging Research Center, MIRC, University Hospitals Leuven, Leuven, Belgium
5Department of Human Genetics, KU Leuven, Leuven, Belgium
Alejandra Ortega Castrillon
1Medical Imaging Research Center, MIRC, University Hospitals Leuven, Leuven, Belgium
2Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
Nele Nauwelaers
1Medical Imaging Research Center, MIRC, University Hospitals Leuven, Leuven, Belgium
2Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
Arslan Zaidi
3Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania, US
Ryan J. Eller
6Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, US
Torsten Günther
7Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236, Uppsala, Sweden
Emma M. Svensson
7Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236, Uppsala, Sweden
Mattias Jakobsson
7Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236, Uppsala, Sweden
Susan Walsh
6Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, US
Kristel Van Steen
5Department of Human Genetics, KU Leuven, Leuven, Belgium
8Medical Genomics Research Unit, GIGA-R, University of Liège, Belgium
9Walloon Excellence in Life sciences and Biotechnology (WELBIO), Belgium
Mark D. Shriver
3Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania, US
Peter Claes
1Medical Imaging Research Center, MIRC, University Hospitals Leuven, Leuven, Belgium
2Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
5Department of Human Genetics, KU Leuven, Leuven, Belgium
10Murdoch Childrens Research Institute, Melbourne, Victoria, Australia
11Department of Biomedical Engineering, University of Oxford, United Kingdom
  • For correspondence: Jiarui.li@kuleuven.be peter.claes@kuleuven.be

Abstract

Accurate inference of genomic ancestry is critically important in human genetics, epidemiology, and related fields. Geneticists today have access to multiple heterogeneous population-based datasets from studies collected under different protocols. Joint analyses of these datasets therefore require robust and consistent inference of ancestry, and a common strategy is to use an ancestry space generated from a reference dataset. However, such a strategy is sensitive to batch artifacts introduced by different protocols. In this work, we propose a novel robust genome-wide ancestry inference method, referred to as SUGIBS, based on an unnormalized genomic (UG) relationship matrix whose spectral (S) decomposition is generalized by an Identity-by-State (IBS) similarity degree matrix. SUGIBS robustly constructs an ancestry space from a single reference dataset and provides a robust projection of new samples from different studies. In experiments and simulations, we show that SUGIBS is robust against individual outliers and batch artifacts introduced by different genotyping protocols. The performance of SUGIBS is equivalent to that of the widely used principal component analysis (PCA) on normalized genotype data in revealing the underlying structure of an admixed population and in adjusting for false positive findings in a case-control admixed GWAS. We applied SUGIBS to the 1000 Genomes Project, as a reference, in combination with a large heterogeneous dataset containing auxiliary 3D facial images, to predict population-stratified average faces, or ancestry faces. In addition, we projected eight ancient DNA profiles into the 1000 Genomes ancestry space and reconstructed their ancestry faces. Based on the visually strong and recognizable human facial phenotype, comprehensive facial illustrations of the populations embedded in the 1000 Genomes Project are provided. Furthermore, ancestry facial imaging has important applications in personalized and precision medicine along with forensic and archeological DNA phenotyping.
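
The abstract names the ingredients of SUGIBS but gives no formulas. The following is a minimal sketch of one way to read that description, not the authors' implementation. It assumes genotypes coded as 0/1/2 allele counts, takes the unnormalized genomic relationship matrix as G = X Xᵀ, takes IBS similarity as the number of alleles shared summed over SNPs, and takes the degree matrix D as the diagonal matrix of row sums of the IBS similarity matrix. The function names `ibs_similarity` and `sugibs_like_embedding` are hypothetical, and the projection of new samples is not covered.

```python
import numpy as np


def ibs_similarity(X):
    """Pairwise IBS similarity, assumed here to be the number of shared
    alleles (0, 1 or 2 per SNP) summed over all SNPs."""
    X = np.asarray(X, dtype=float)
    # |x_i - x_j| counts the alleles that differ at each SNP.
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return (2.0 - diff).sum(axis=2)


def sugibs_like_embedding(X, k=2):
    """Solve the generalized eigenproblem G v = lambda D v (assumed form).

    X is an (individuals x SNPs) matrix of 0/1/2 allele counts.
    Returns the top-k eigenvalues, the matching generalized eigenvectors,
    and the IBS degree of each individual.
    """
    X = np.asarray(X, dtype=float)
    G = X @ X.T                       # unnormalized genomic relationship matrix
    d = ibs_similarity(X).sum(axis=1) # IBS degree of each individual (> 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Equivalent symmetric problem: (D^-1/2 G D^-1/2) u = lambda u
    M = d_inv_sqrt[:, None] * G * d_inv_sqrt[None, :]
    evals, U = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:k]
    V = d_inv_sqrt[:, None] * U[:, order]  # v = D^-1/2 u
    return evals[order], V, d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genotypes = rng.integers(0, 3, size=(6, 20))  # toy data, not real genotypes
    evals, V, d = sugibs_like_embedding(genotypes, k=2)
    print(V.shape)
```

Scaling by the IBS degree rather than normalizing each SNP by its allele frequency is the design choice this sketch is meant to illustrate: the scaling depends only on pairwise allele sharing, not on per-SNP frequency estimates.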

Author Summary

Estimates of individual-level genomic ancestry are routinely used in human genetics, epidemiology, and related fields. The analysis of population structure and genomic ancestry can yield significant insights into modern and ancient population dynamics, allowing us to address questions regarding the timing of admixture events and the numbers and identities of the parental source populations. Unrecognized or cryptic population structure is also an important confounder to correct for in genome-wide association studies (GWAS). However, to date, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. This work presents a new approach and an accompanying open-source software toolbox that facilitates a robust integrative analysis of population structure and genomic ancestry estimates for heterogeneous datasets. Given that visually evident and easily recognizable patterns of human facial characteristics covary with genomic ancestry, we can generate predicted ancestry faces on both the population and individual levels, as we illustrate for the 26 populations of the 1000 Genomes Project and for eight eminent ancient-DNA profiles, respectively.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted February 14, 2019.

Subject Area

  • Genetics