New Results

GraphChainer: Co-linear Chaining for Accurate Alignment of Long Reads to Variation Graphs

Jun Ma, Manuel Cáceres, Leena Salmela, Veli Mäkinen, Alexandru I. Tomescu
doi: https://doi.org/10.1101/2022.01.07.475257
1 Department of Computer Science, University of Helsinki, Finland (all authors)
For correspondence: Manuel Cáceres, el.ariel.cl@gmail.com

Abstract

Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improved variant calling. While the vg toolkit (Garrison et al., Nature Biotechnology, 2018) is a popular aligner of short reads, GraphAligner (Rautiainen and Marschall, Genome Biology, 2020) is the state-of-the-art aligner of long reads. GraphAligner finds candidate read occurrences by individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds. We present a new algorithm to co-linearly chain a set of seeds in an acyclic variation graph, together with the first efficient implementation of such a co-linear chaining algorithm in a new aligner of long reads to variation graphs, GraphChainer. Compared to GraphAligner, at a normalized edit distance threshold of 40%, GraphChainer aligns 9% to 12% more reads, and 15% to 19% more total read length, on real PacBio reads from human chromosomes 1 and 22. On both simulated and real data, GraphChainer aligns between 97% and 99% of all reads, and of total read length. At the more stringent normalized edit distance threshold of 30%, GraphChainer aligns up to 29% more total real read length than GraphAligner.
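
To illustrate what co-linearly chaining seeds means, the following is a minimal sketch for the simpler case of a linear reference sequence, not the acyclic-graph algorithm implemented in GraphChainer. It assumes each seed (anchor) is a pair of half-open intervals of positive length, one in the read and one in the reference, and it picks a chain of non-overlapping anchors that appear in the same order in both, maximizing the read length covered. The names `Anchor` and `chain_anchors`, the scoring, and the quadratic-time dynamic program are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical anchor layout: (read_start, read_end, ref_start, ref_end),
# half-open intervals of positive length.
Anchor = Tuple[int, int, int, int]


def chain_anchors(anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """Return (covered read length, chain) for a best co-linear chain.

    Anchor i may precede anchor j only if i ends before j starts in both
    the read and the reference (no overlaps allowed in this sketch).
    """
    if not anchors:
        return 0, []

    # Any valid predecessor ends earlier in the read, so sorting by read
    # end guarantees predecessors are processed first.
    order = sorted(anchors, key=lambda a: (a[1], a[3]))
    n = len(order)
    best = [a[1] - a[0] for a in order]  # best chain score ending at each anchor
    prev = [-1] * n                      # back-pointers for chain recovery

    for j in range(n):
        for i in range(j):
            precedes = order[i][1] <= order[j][0] and order[i][3] <= order[j][2]
            if precedes:
                candidate = best[i] + (order[j][1] - order[j][0])
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i

    # Walk back from the best-scoring end anchor.
    k = max(range(n), key=best.__getitem__)
    score = best[k]
    chain: List[Anchor] = []
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return score, chain


if __name__ == "__main__":
    seeds = [
        (0, 10, 100, 110),
        (12, 20, 115, 123),
        (5, 15, 300, 310),   # overlaps the first seed in the read
        (22, 30, 50, 58),    # out of order in the reference
    ]
    print(chain_anchors(seeds))
```

In this toy input the first two seeds form the chain, while the other two are rejected because they either overlap in the read or break the reference order. Chaining in a variation graph replaces the "ends before" test on reference coordinates with reachability between graph nodes, which is where the difficulty addressed by the paper lies.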

GraphChainer is freely available at https://github.com/algbio/GraphChainer

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • jun.ma{at}helsinki.fi, manuel.caceresreyes{at}helsinki.fi, leena.salmela{at}helsinki.fi, veli.makinen{at}helsinki.fi, alexandru.tomescu{at}helsinki.fi

  • * This work was partially funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 851093, SAFEBIO) and partially by the Academy of Finland (grants No. 322595, 328877, 308030).

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted January 07, 2022.
bioRxiv 2022.01.07.475257; doi: https://doi.org/10.1101/2022.01.07.475257
Subject Area

  • Bioinformatics