bioRxiv
New Results

Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment

Yilei Fu, Medhat Mahmoud, Viginesh Vaibhav Muraliraman, Fritz J. Sedlazeck, Todd J. Treangen
doi: https://doi.org/10.1101/2021.05.29.446291
Yilei Fu
1Department of Computer Science, Rice University, Houston, TX 77005, USA
Medhat Mahmoud
2Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, United States of America
3Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, United States of America
Viginesh Vaibhav Muraliraman
1Department of Computer Science, Rice University, Houston, TX 77005, USA
Fritz J. Sedlazeck
2Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, United States of America
  • For correspondence: Fritz.Sedlazeck@bcm.edu treangen@rice.edu
Todd J. Treangen
1Department of Computer Science, Rice University, Houston, TX 77005, USA

Abstract

Background Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that focus primarily on either speed or accuracy. Widely used read mappers (minimap2 and NGMLR) implement various heuristics and scoring schemes to optimize for speed or accuracy, and their performance varies across genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to a single gap penalty across distinct mutational hotspots reduces read alignment accuracy and impedes structural variant detection.

Findings We tested our hypothesis by implementing a read mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan first maps reads with a fast mapper (e.g. minimap2), uses the normalized edit distance of the mapped reads to identify poorly aligned reads, and realigns those reads with a more accurate yet computationally more expensive long-read mapper (NGMLR). In support of our hypothesis, we show that Vulcan improves the alignments of Oxford Nanopore Technologies (ONT) long reads for both simulated and real datasets. These improvements, in turn, lead to more accurate structural variant calling on human genome datasets compared to either read mapping method alone.
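The read-selection step described above can be illustrated with a minimal Python sketch. It is not Vulcan's implementation: the function names, the input layout (read ID mapped to an edit distance and an alignment length) and the fixed cutoff of 0.2 are assumptions made for illustration only. The sketch computes the normalized edit distance E = e/l for each first-pass alignment and marks reads above the cutoff for realignment with the second mapper.

```python
from typing import Dict, List, Tuple


def normalized_edit_distance(edit_distance: int, alignment_length: int) -> float:
    """Return E = e / l for one alignment."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return edit_distance / alignment_length


def split_reads(
    alignments: Dict[str, Tuple[int, int]],
    cutoff: float,
) -> Tuple[List[str], List[str]]:
    """Partition reads into (keep, realign) by normalized edit distance.

    `alignments` maps a read ID to (edit distance, alignment length).
    `cutoff` is a hypothetical threshold, not a value from the preprint.
    """
    keep: List[str] = []
    realign: List[str] = []
    for read_id, (e, l) in alignments.items():
        if normalized_edit_distance(e, l) > cutoff:
            realign.append(read_id)  # poorly aligned: send to the slower mapper
        else:
            keep.append(read_id)     # first-pass alignment is kept as is
    return keep, realign


# Toy first-pass results (invented numbers, for illustration only).
first_pass = {
    "read_a": (50, 1000),
    "read_b": (300, 1000),
    "read_c": (120, 1000),
}
keep, realign = split_reads(first_pass, cutoff=0.2)
print("keep:", keep)
print("realign:", realign)
```

In a real pipeline the edit distance and alignment length would be read from the first mapper's output rather than supplied by hand, and the cutoff would be chosen from the observed distribution of E rather than fixed in advance.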

Conclusions Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes, resulting in improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • † These authors share senior authorship.

  • https://gitlab.com/treangenlab/vulcan

  • List of abbreviations

    Normalized edit distance: expressed as E = e/l, where e is the edit distance and l is the alignment length
    SV: structural variants
    ONT: Oxford Nanopore Technologies
    PacBio CLR: PacBio continuous long read
    PacBio HiFi: PacBio circular consensus sequencing
    SNV: single nucleotide variation
    INS: insertions
    DEL: deletions
    TRA: translocations
    DUP: duplications
    INV: inversions
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
    Posted May 30, 2021.
    Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment
    Yilei Fu, Medhat Mahmoud, Viginesh Vaibhav Muraliraman, Fritz J. Sedlazeck, Todd J. Treangen
    bioRxiv 2021.05.29.446291; doi: https://doi.org/10.1101/2021.05.29.446291

    Subject Area

    • Bioinformatics