Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia

Daniel R. Schrider, Julien Ayroles, Daniel R. Matute, Andrew D. Kern
doi: https://doi.org/10.1101/170670
Daniel R. Schrider
*Department of Genetics, and †Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
‡Biology Department, University of North Carolina, Chapel Hill, NC 27510.
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: drs@unc.edu
Julien Ayroles
§Ecology and Evolutionary Biology Department; Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, 08540
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Daniel R. Matute
‡Biology Department, University of North Carolina, Chapel Hill, NC 27510.
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Andrew D. Kern
*Department of Genetics, and †Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
‡Biology Department, University of North Carolina, Chapel Hill, NC 27510.
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Preview PDF
Loading

ABSTRACT

Hybridization and gene flow between species appears to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees) capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.

AUTHOR SUMMARY Understanding the extent to which species or diverged populations hybridize in nature is crucially important if we are to understand the speciation process. Accordingly numerous research groups have developed methodology for finding the genetic evidence of such introgression. In this report we develop a supervised machine learning approach for uncovering loci which have introgressed across species boundaries. We show that our method, FILET, has greater accuracy and power than competing methods in discovering introgression, and in addition can detect the directionality associated with the gene flow between species. Using whole genome sequences from Drosophila simulans and Drosophila sechellia we show that FILET discovers quite extensive introgression between these species that has occurred mostly from D. simulans to D. sechellia. Our work highlights the complex process of speciation even within a well-studied system and points to the growing importance of supervised machine learning in population genetics.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Back to top
PreviousNext
Posted March 20, 2018.
Download PDF

Supplementary Material

Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia
Daniel R. Schrider, Julien Ayroles, Daniel R. Matute, Andrew D. Kern
bioRxiv 170670; doi: https://doi.org/10.1101/170670
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia
Daniel R. Schrider, Julien Ayroles, Daniel R. Matute, Andrew D. Kern
bioRxiv 170670; doi: https://doi.org/10.1101/170670

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Evolutionary Biology
Subject Areas
All Articles
  • Animal Behavior and Cognition (3701)
  • Biochemistry (7820)
  • Bioengineering (5695)
  • Bioinformatics (21343)
  • Biophysics (10603)
  • Cancer Biology (8206)
  • Cell Biology (11974)
  • Clinical Trials (138)
  • Developmental Biology (6786)
  • Ecology (10425)
  • Epidemiology (2065)
  • Evolutionary Biology (13908)
  • Genetics (9731)
  • Genomics (13109)
  • Immunology (8171)
  • Microbiology (20064)
  • Molecular Biology (7875)
  • Neuroscience (43171)
  • Paleontology (321)
  • Pathology (1282)
  • Pharmacology and Toxicology (2267)
  • Physiology (3363)
  • Plant Biology (7254)
  • Scientific Communication and Education (1316)
  • Synthetic Biology (2012)
  • Systems Biology (5550)
  • Zoology (1133)