bioRxiv
New Results

A likelihood method for estimating present-day human contamination in ancient DNA samples using low-depth haploid chromosome data

J. Víctor Moreno-Mayar, Thorfinn Sand Korneliussen, Anders Albrechtsen, Jyoti Dalal, Gabriel Renaud, Rasmus Nielsen, Anna-Sapfo Malaspinas
doi: https://doi.org/10.1101/594481
J. Víctor Moreno-Mayar
1Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
2Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  • For correspondence: morenomayar@gmail.com annasapfo.malaspinas@unil.ch
Thorfinn Sand Korneliussen
3Centre for Geogenetics, University of Copenhagen, 1350 Copenhagen, Denmark
Anders Albrechtsen
4The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
Jyoti Dalal
1Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
2Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Gabriel Renaud
3Centre for Geogenetics, University of Copenhagen, 1350 Copenhagen, Denmark
Rasmus Nielsen
5Department of Statistics, University of California, Berkeley, CA 94720, USA
6Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
Anna-Sapfo Malaspinas
1Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
2Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  • For correspondence: morenomayar@gmail.com annasapfo.malaspinas@unil.ch

1 Abstract

Motivation: The presence of present-day human contaminating DNA fragments is one of the defining challenges of ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field, where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested, while few are aimed at low-depth data, a common feature of aDNA datasets.

Results: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e., when the contaminant and the target come from closely related populations or when error rates are increased. With a running time below five minutes, our method is applicable to large-scale aDNA genomic studies.
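The abstract does not spell out the likelihood, so the following Python sketch is only a toy illustration of the general idea behind maximum likelihood contamination estimation on a haploid chromosome, not the authors' model or the contaminationX implementation. It assumes (hypothetically) that the endogenous individual carries a single known allele at each site, that a contaminant read carries the alternative allele with a known population frequency `f`, and that sequencing errors flip the observed allele at a fixed rate `err`. All function names, parameters and simulated values are made up for this example.

```python
import math
import random


def mismatch_prob(c, f, err):
    """Toy model: probability that one read shows the non-endogenous allele.

    c   -- contamination fraction (parameter to estimate)
    f   -- assumed frequency of the alternative allele among contaminants
    err -- assumed symmetric per-read error rate (must be > 0)
    """
    return c * f * (1.0 - err) + (1.0 - c * f) * err


def log_likelihood(c, sites, err):
    """Binomial log-likelihood summed over sites given as (f, k, n) tuples,
    where k of the n reads at a site show the non-endogenous allele."""
    ll = 0.0
    for f, k, n in sites:
        p = mismatch_prob(c, f, err)
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll


def estimate_contamination(sites, err, step=0.005):
    """Grid-search maximum likelihood estimate of c on [0, 1]."""
    grid = [i * step for i in range(int(round(1.0 / step)) + 1)]
    return max(grid, key=lambda c: log_likelihood(c, sites, err))


def simulate_sites(c, err, n_sites, depth, seed):
    """Simulate low-depth read counts under the same toy model."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        f = rng.uniform(0.05, 0.5)  # hypothetical allele frequency
        k = 0
        for _ in range(depth):
            # A read differs if it is a contaminant carrying the other allele.
            differs = rng.random() < c and rng.random() < f
            # A sequencing error flips the observed allele.
            flipped = rng.random() < err
            if differs != flipped:
                k += 1
        sites.append((f, k, depth))
    return sites


if __name__ == "__main__":
    sites = simulate_sites(c=0.10, err=0.005, n_sites=4000, depth=3, seed=1)
    print("estimated contamination:", estimate_contamination(sites, err=0.005))
```

A grid search is used here only because it keeps the sketch dependency-free; since the mismatch probability is linear in `c`, the log-likelihood is concave and any one-dimensional optimizer would do.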

Availability and implementation: The method is implemented in C++ and R and is freely available at https://github.com/sapfo/contaminationX.

Contact: morenomayar@gmail.com, annasapfo.malaspinas@unil.ch

Footnotes

  • Funding acknowledgements have been corrected in this version.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted May 19, 2019.
bioRxiv 594481; doi: https://doi.org/10.1101/594481
Subject Area

  • Genomics