bioRxiv
New Results

Evolutionary shortcuts via multi-nucleotide substitutions and their impact on natural selection analyses

Alexander G Lucaci, Jordan D Zehr, Sergei L. Kosakovsky Pond
doi: https://doi.org/10.1101/2022.12.02.518889
Alexander G Lucaci, Jordan D Zehr, Sergei L. Kosakovsky Pond: Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
For correspondence: spond@temple.edu

Abstract

Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into the statistical models used for such analyses. If certain aspects of the substitution process (even those not of direct interest) are presumed absent or are modeled with too crude a simplification, estimates of key model parameters can become biased, often systematically, and statistical performance can suffer. Here, we performed a detailed characterization of how modeling instantaneous multi-nucleotide (or multi-hit, MH) substitutions impacts dN/dS-based inference of episodic diversifying selection at the level of the entire alignment. In an analysis of N = 9,861 empirical datasets, including MH reduces the rate at which positive selection is called (1.37-fold, or 26.8%), while offering significantly better statistical fit to sequence data in 8.37% of cases. Through additional simulation studies, we show that this reduction is not simply due to a loss of power caused by the additional model complexity. After a detailed examination of 21 benchmark alignments and a new high-resolution analysis showing which parts of the alignment provide support for positive selection, we reveal that MH substitutions occurring along shorter branches in the tree are largely responsible for discrepant results in selection detection. Our results add to the growing body of literature that examines decades-old modeling assumptions and finds them to be problematic for biological data analysis. Because multi-nucleotide substitutions have a significant impact on natural selection detection even at the level of an entire gene, we recommend that routine selection analyses of this type consider their inclusion.
To facilitate this procedure, we developed a simple model-testing framework for selection detection that can screen an alignment for positive selection while accounting for two biologically important confounding processes: synonymous rate variation and multi-nucleotide instantaneous substitutions.
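
The abstract quotes the reduction in the positive-selection call rate in two forms, "1.37-fold" and "26.8%". The short Python sketch below shows how the two can be related, assuming both describe the same change and that an N-fold reduction means new rate = old rate / N. The abstract does not state how either figure was computed, so this reading and the helper names are illustrative assumptions.

```python
def fold_to_percent_reduction(fold: float) -> float:
    """Percent reduction implied by an N-fold decrease (new = old / fold).

    Assumption: "N-fold reduction" means the new rate is the old rate
    divided by N; the abstract does not define the term explicitly.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return (1.0 - 1.0 / fold) * 100.0


def percent_reduction_to_fold(percent: float) -> float:
    """Fold decrease implied by a percent reduction (inverse of the above)."""
    if not 0.0 <= percent < 100.0:
        raise ValueError("percent must be in [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)


# Figures quoted in the abstract.
print(round(percent_reduction_to_fold(26.8), 2))   # 1.37
print(round(fold_to_percent_reduction(1.37), 1))   # 27.0
```

Under this reading, a 26.8% reduction corresponds to a fold change of about 1.366, which rounds to the quoted 1.37. Going the other way from the rounded 1.37 gives 27.0%, so the small mismatch is consistent with rounding of the fold value.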

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted December 03, 2022.
Evolutionary shortcuts via multi-nucleotide substitutions and their impact on natural selection analyses
Alexander G Lucaci, Jordan D Zehr, Sergei L. Kosakovsky Pond
bioRxiv 2022.12.02.518889; doi: https://doi.org/10.1101/2022.12.02.518889

Subject Area

  • Bioinformatics