bioRxiv
New Results

Maximum likelihood pandemic-scale phylogenetics

Nicola De Maio, Prabhav Kalaghatgi, Yatish Turakhia, Russell Corbett-Detig, Bui Quang Minh, Nick Goldman
doi: https://doi.org/10.1101/2022.03.22.485312
Nicola De Maio
(1) European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
For correspondence: demaio@ebi.ac.uk
Prabhav Kalaghatgi
(2) Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
Yatish Turakhia
(3) Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
Russell Corbett-Detig
(4) Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
(5) Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
Bui Quang Minh
(6) School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, ACT 2600, Australia
Nick Goldman
(1) European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK

Summary

Phylogenetics plays a crucial role in the interpretation of genomic data [1]. Phylogenetic analyses of SARS-CoV-2 genomes have allowed the detailed study of the virus’s origins [2], of its international [3,4] and local [4–9] spread, and of the emergence [10] and reproductive success [11] of new variants, among many other applications. These analyses have been enabled by the unparalleled volumes of genome sequence data generated and employed to study and help contain the pandemic [12]. However, the preferred model-based phylogenetic approaches, including maximum likelihood and Bayesian methods, are mostly based on Felsenstein’s ‘pruning’ algorithm [13,14] and cannot scale to the size of the datasets from the current pandemic [4,15], hampering our understanding of the virus’s evolution and transmission [16]. We present new approaches, based on reworking Felsenstein’s algorithm, for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. We exploit the near-certainty regarding ancestral genomes, and the similarities between closely related and densely sampled genomes, to greatly reduce the memory and time required. Combined with new methods for searching amongst candidate evolutionary trees, this results in our MAPLE (‘MAximum Parsimonious Likelihood Estimation’) software giving better results than popular approaches such as FastTree 2 [17], IQ-TREE 2 [18], RAxML-NG [19] and UShER [15]. Our approach therefore allows complex and accurate probabilistic phylogenetic analyses of millions of microbial genomes, extending the reach of genomic epidemiology. Future epidemiological datasets are likely to be even larger than those currently associated with COVID-19, and other disciplines such as metagenomics and biodiversity science are also generating huge numbers of genome sequences [20–22]. Our methods will permit continued use of the preferred likelihood-based phylogenetic analyses.
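The contrast drawn above, between Felsenstein's pruning algorithm and exploiting the similarity of closely related genomes, can be illustrated with a minimal Python sketch. It is not MAPLE's implementation. It assumes the Jukes-Cantor substitution model, a made-up three-taxon tree with invented branch lengths, and toy 10-base sequences. The helpers `diff_from_reference` and `apply_diff` are hypothetical and only show the general idea of storing a genome as its differences from a reference.

```python
import math

BASES = "ACGT"


def jc_prob(same: bool, t: float) -> float:
    """Jukes-Cantor probability of ending in a given base after branch length t.

    `same` is True when the end base equals the start base; t is in
    expected substitutions per site.
    """
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def partial_likelihood(node, site: int) -> list[float]:
    """Felsenstein pruning for one column: P(data below node | node state) for A, C, G, T.

    A leaf is a sequence string; an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        # Leaf: probability 1 for the observed base, 0 otherwise.
        return [1.0 if node[site] == base else 0.0 for base in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_partial = partial_likelihood(child, site)
        for i in range(4):
            # Sum over the unknown child state j, weighting by P(i -> j over t).
            result[i] *= sum(jc_prob(i == j, t) * child_partial[j] for j in range(4))
    return result


def site_log_likelihood(tree, site: int) -> float:
    """Log-likelihood of one column, using JC's uniform base frequencies at the root."""
    root_partial = partial_likelihood(tree, site)
    return math.log(sum(0.25 * p for p in root_partial))


def diff_from_reference(seq: str, ref: str) -> list[tuple[int, str]]:
    """Hypothetical sparse encoding: keep only (position, base) where seq differs from ref."""
    return [(i, base) for i, (base, r) in enumerate(zip(seq, ref)) if base != r]


def apply_diff(diff: list[tuple[int, str]], ref: str) -> str:
    """Rebuild the full sequence from the reference plus its differences."""
    seq = list(ref)
    for i, base in diff:
        seq[i] = base
    return "".join(seq)


# Made-up example: a reference and three near-identical genomes.
ref = "ACGTACGTAC"
s1 = "ACGTACGTAC"
s2 = "ACGTACTTAC"
s3 = "ACGAACGTAC"
# Tree ((s1:0.01, s2:0.01):0.005, s3:0.02), rooted at the top-level list.
tree = [([(s1, 0.01), (s2, 0.01)], 0.005), (s3, 0.02)]

total = sum(site_log_likelihood(tree, i) for i in range(len(ref)))
print(f"total log-likelihood: {total:.4f}")
for name, seq in [("s1", s1), ("s2", s2), ("s3", s3)]:
    print(name, diff_from_reference(seq, ref))
```

The pruning loop visits every column at every node, so its cost grows with genome length times the number of nodes, even though only two of the ten columns here are variable. The sparse encoding, by contrast, stores s2 as the single entry [(6, 'T')]. How MAPLE actually combines and extends these ideas is not reproduced by this sketch.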

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • We have now improved our methods and software, achieving a further ~3-fold reduction in runtime, a ~3-fold reduction in memory usage, and increased accuracy.

  • https://github.com/NicolaDM/MAPLE

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted July 18, 2022.

Subject Area

  • Bioinformatics