bioRxiv
New Results

Fast and accurate large multiple sequence alignments using root-to-leave regressive computation

Edgar Garriga, Paolo Di Tommaso, Cedrik Magis, Ionas Erb, Hafid Laayouni, Fyodor Kondrashov, Evan Floden, Cedric Notredame
doi: https://doi.org/10.1101/490235
Edgar Garriga
1Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
2Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
Paolo Di Tommaso
1Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
Cedrik Magis
1Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
2Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
Ionas Erb
1Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
Hafid Laayouni
3Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
4Bioinformatics Studies, ESCI-UPF, Barcelona, Spain
Fyodor Kondrashov
5Institute of Science and Technology, Klosterneuburg, Austria
Evan Floden
1Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  • For correspondence: evan.floden@crg.eu cedric.notredame@crg.eu
Cedric Notredame
1Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
2Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
  • For correspondence: evan.floden@crg.eu cedric.notredame@crg.eu

Abstract

Inferences derived from large multiple alignments of biological sequences are critical to many areas of biology, including evolution, genomics, biochemistry, and structural biology. However, the complexity of the alignment problem makes approximate solutions necessary. The most common is the progressive algorithm, which starts by aligning the most similar sequences and then incorporates the remaining ones in the order imposed by a guide tree. We developed a regressive algorithm, validated on protein sequences, that works the other way around and aligns the most dissimilar sequences first. Our algorithm produces more accurate alignments than non-regressive methods, especially on datasets larger than 10,000 sequences. By design, it can run any existing alignment method in linear time, thus allowing the scale-up required for extremely large genomic analyses.
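
The contrast between the two orderings can be illustrated with a toy guide tree. The sketch below is only an illustration of traversal order and is not the authors' implementation: the `Node` class, the function names, and the four-leaf tree are all hypothetical, and no actual alignment is performed. It assumes a binary guide tree in which each internal node stands for one alignment step, so that a leaves-to-root (post-order) walk mimics the progressive order and a root-to-leaves (pre-order) walk mimics the regressive order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical guide-tree node; leaves are sequences, internal nodes are merge steps."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves_to_root(node: Node) -> List[str]:
    """Post-order walk: children are handled before their parent (progressive-style order)."""
    if node.is_leaf():
        return []
    return leaves_to_root(node.left) + leaves_to_root(node.right) + [node.name]


def root_to_leaves(node: Node) -> List[str]:
    """Pre-order walk: the parent is handled before its children (regressive-style order)."""
    if node.is_leaf():
        return []
    return [node.name] + root_to_leaves(node.left) + root_to_leaves(node.right)


# Toy tree: A and B are similar, C and D are similar, and the root joins the
# two most dissimilar groups.
tree = Node(
    "root",
    Node("AB", Node("A"), Node("B")),
    Node("CD", Node("C"), Node("D")),
)

progressive_order = leaves_to_root(tree)  # ['AB', 'CD', 'root']
regressive_order = root_to_leaves(tree)   # ['root', 'AB', 'CD']
print(progressive_order)
print(regressive_order)
```

In this toy case the root, which joins the most dissimilar groups, is visited last in the progressive-style walk and first in the regressive-style walk. How sequences are selected and combined at each node is not described in the abstract and is deliberately left out of the sketch.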

One Sentence Summary: Initiating alignments with the most dissimilar sequences allows slow and accurate methods to be used on large datasets.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted December 07, 2018.

Subject Area

  • Bioinformatics