Abstract
Background Depth of coverage calculation is an important and computationally intensive preprocessing step in a variety of next-generation sequencing pipelines, including the analysis of RNA-seq data, detection of copy number variants, and quality control procedures.
Results Building upon big data technologies, we have developed SeQuiLa-cov, an extension to the recently released SeQuiLa platform, which provides efficient depth of coverage calculations, reaching more than 100x speedup over state-of-the-art tools. The performance and scalability of our solution allow for exome and genome-wide calculations running locally or on a cluster, while hiding the complexity of distributed computing behind a Structured Query Language (SQL) Application Programming Interface (API).
Conclusions SeQuiLa-cov provides a significant performance gain in depth of coverage calculations, streamlining widely used bioinformatic processing pipelines.
List of Abbreviations
- API – Application Programming Interface
- BAM – Binary Alignment Map
- GKL – Genomics Kernel Library
- NGS – Next Generation Sequencing
- SQL – Structured Query Language
- YARN – Yet Another Resource Negotiator
- WES – Whole Exome Sequencing
- WGS – Whole Genome Sequencing