bioRxiv
New Results

Benchmarking of alignment-free sequence comparison methods

Andrzej Zielezinski, Hani Z. Girgis, Guillaume Bernard, Chris-Andre Leimeister, Kujin Tang, Thomas Dencker, Anna K. Lau, Sophie Röhling, JaeJin Choi, Michael S. Waterman, Matteo Comin, Sung-Hou Kim, Susana Vinga, Jonas S. Almeida, Cheong Xin Chan, Benjamin T. James, Fengzhu Sun, Burkhard Morgenstern, Wojciech M. Karlowski
doi: https://doi.org/10.1101/611137
Andrzej Zielezinski
Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614 Poznan, Poland
Hani Z. Girgis
Tandy School of Computer Science, The University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104, USA
Guillaume Bernard
Sorbonne Université, UMR 7205 ISYEB, Paris 75005 France
Chris-Andre Leimeister
University of Göttingen, Institute of Microbiology and Genetics, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
Kujin Tang
Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
Thomas Dencker
University of Göttingen, Institute of Microbiology and Genetics, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
Anna K. Lau
University of Göttingen, Institute of Microbiology and Genetics, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
Sophie Röhling
University of Göttingen, Institute of Microbiology and Genetics, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
JaeJin Choi
Department of Chemistry, University of California, Berkeley, CA 94720; Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720; Department of Integrated Omics for Biomedical Sciences, Yonsei University, Seoul 03722, Republic of Korea; Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
Michael S. Waterman
Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA; Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
Matteo Comin
Department of Information Engineering, University of Padova, Padova, Italy
Sung-Hou Kim
Department of Chemistry, University of California, Berkeley, CA 94720; Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720; Department of Integrated Omics for Biomedical Sciences, Yonsei University, Seoul 03722, Republic of Korea; Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
Susana Vinga
INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisbon, Portugal; IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisbon, Portugal
Jonas S. Almeida
National Cancer Institute (NIH/NCI), Division of Epidemiology and Genetics (DCEG)
Cheong Xin Chan
Institute for Molecular Bioscience, and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
Benjamin T. James
Tandy School of Computer Science, The University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104, USA
Fengzhu Sun
Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA; Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
Burkhard Morgenstern
University of Göttingen, Institute of Microbiology and Genetics, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
Wojciech M. Karlowski
Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614 Poznan, Poland
  • For correspondence: wmk@amu.edu.pl

Abstract

Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but the lack of a clearly defined benchmarking consensus hampers their performance assessment. Here, we present a community resource (http://afproject.org) to establish standards for comparing AF methods across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference and reconstruction of species trees under horizontal gene transfer and recombination events. The interactive web service allows researchers to explore the performance of AF tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with the current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted April 16, 2019.
Subject Area

  • Bioinformatics