SeQuiLa-cov: A fast and scalable library for depth of coverage calculations

Marek Wiewiórka, Agnieszka Szmurło, Wiktor Kuśmirek, Tomasz Gambin
bioRxiv 494468; doi: https://doi.org/10.1101/494468
Affiliation (all authors): Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
Abstract

Background Depth of coverage calculation is an important and computationally intensive preprocessing step in a variety of next generation sequencing (NGS) pipelines, including the analysis of RNA-seq data, the detection of copy number variants, and quality control procedures.

Results Building upon big data technologies, we have developed SeQuiLa-cov, an extension to the recently released SeQuiLa platform. SeQuiLa-cov provides efficient depth of coverage calculations, reaching a speedup of more than 100x over state-of-the-art tools. The performance and scalability of our solution allow exome- and genome-wide calculations to run locally or on a cluster, while a Structured Query Language (SQL) Application Programming Interface (API) hides the complexity of distributed computing.

Conclusions SeQuiLa-cov provides a significant performance gain in depth of coverage calculations, streamlining widely used bioinformatic processing pipelines.
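Depth of coverage at a genomic position is the number of aligned reads that overlap it. A minimal Python sketch of a per-base depth calculation for a single contig follows, using a generic difference-array (start/end event) sweep. It is an illustration under simplifying assumptions, not SeQuiLa-cov's implementation or API: reads are supplied as hypothetical 0-based, end-exclusive `(start, end)` pairs, and the CIGAR operations, mapping-quality filters and duplicate flags that real tools handle are ignored.

```python
from typing import Iterable, List, Tuple


def depth_of_coverage(reads: Iterable[Tuple[int, int]], contig_length: int) -> List[int]:
    """Return per-base read depth for one contig.

    Assumption (illustrative only): each read is a 0-based, end-exclusive
    (start, end) interval; CIGAR gaps, MAPQ and duplicate flags are ignored.
    """
    # Difference array: +1 where a read starts, -1 at the first base past its end.
    events = [0] * (contig_length + 1)
    for start, end in reads:
        # Clip intervals to the contig boundaries.
        start = max(start, 0)
        end = min(end, contig_length)
        if start >= end:
            continue  # empty or fully out-of-range interval contributes nothing
        events[start] += 1
        events[end] -= 1

    # Prefix sum over the events yields the depth at every position.
    depth: List[int] = []
    running = 0
    for pos in range(contig_length):
        running += events[pos]
        depth.append(running)
    return depth


if __name__ == "__main__":
    print(depth_of_coverage([(0, 5), (2, 7), (4, 6)], 8))
```

For example, `depth_of_coverage([(0, 5), (2, 7), (4, 6)], 8)` returns `[1, 1, 2, 2, 3, 2, 1, 0]`. The sweep runs in time linear in the number of reads plus the contig length, which is why event-based formulations of this kind suit large inputs; a distributed tool would additionally need to partition reads and reconcile depths at partition boundaries, which this sketch does not attempt.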

List of Abbreviations

API – Application Programming Interface
BAM – Binary Alignment Map
GKL – Genomics Kernel Library
NGS – Next Generation Sequencing
SQL – Structured Query Language
WES – Whole Exome Sequencing
WGS – Whole Genome Sequencing
YARN – Yet Another Resource Negotiator
Copyright

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted December 13, 2018.
Subject Area: Bioinformatics