New Results

RabbitVar: ultra-fast and accurate somatic small-variant calling on multi-core architectures

Hao Zhang, Honglei Song, Zekun Yin, Qixin Chang, Yanjie Wei, Beifang Niu, Bertil Schmidt, Weiguo Liu
doi: https://doi.org/10.1101/2023.01.06.522980
Hao Zhang
1School of Software, Shandong University, Jinan, 250100, Shandong Province, China
Honglei Song
1School of Software, Shandong University, Jinan, 250100, Shandong Province, China
Zekun Yin
1School of Software, Shandong University, Jinan, 250100, Shandong Province, China
  • For correspondence: zekun.yin@sdu.edu.cn weiguo.liu@sdu.edu.cn
Qixin Chang
1School of Software, Shandong University, Jinan, 250100, Shandong Province, China
Yanjie Wei
2Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong Province, China
Beifang Niu
3Computer Network Information Center, Chinese Academy of Sciences, 100083, Beijing, China
Bertil Schmidt
4Institute for Computer Science, Johannes Gutenberg University, Mainz, 55128, Germany
Weiguo Liu
1School of Software, Shandong University, Jinan, 250100, Shandong Province, China
  • For correspondence: zekun.yin@sdu.edu.cn weiguo.liu@sdu.edu.cn

Abstract

The continuous development of next-generation sequencing (NGS) technology has led to extensive and frequent use of genomic analysis in cancer research. The resulting large-scale NGS datasets call for high-precision somatic variant calling methods that are highly optimized for commonly used hardware platforms. We present RabbitVar (https://github.com/LeiHaoa/RabbitVar), a scalable variant caller that detects small somatic variants from paired tumor/normal NGS data on modern multi-core CPUs. Our approach combines candidate-finding and machine-learning-based filtering strategies with optimized data structures and multi-threading to achieve both high accuracy and efficiency. We compared the performance of RabbitVar to state-of-the-art callers (Strelka2, Mutect2, NeuSomatic, VarDict, VarScan2) on real-world HCC1395 breast cancer datasets under different sequencing conditions and contamination rates. The evaluation results show that RabbitVar achieves highly competitive F1-scores when calling SNVs. Moreover, when calling the more challenging indel variants, it consistently achieves the highest F1-scores. RabbitVar can process a paired tumor and normal whole human genome sequencing dataset at 80x depth in less than 20 minutes on a 48-core workstation, outperforming all other tested variant callers in terms of efficiency.
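
For readers unfamiliar with the F1-score used above to compare callers, the following is a minimal sketch of how it can be computed from a call set and a truth set. It is not taken from the RabbitVar evaluation: the exact-match rule on (chromosome, position, reference allele, alternate allele) and the names `Variant`, `Scores`, and `score_calls` are assumptions made for illustration, and the abstract does not describe the matching procedure actually used.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_calls(called: Set[Variant], truth: Set[Variant]) -> Scores:
    """Compare a call set to a truth set by exact match (an assumption)."""
    tp = len(called & truth)   # calls that are in the truth set
    fp = len(called - truth)   # calls that are not in the truth set
    fn = len(truth - called)   # truth variants that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    truth = {("chr1", 100, "A", "T"), ("chr1", 250, "G", "GA"),
             ("chr2", 40, "C", "G"), ("chr2", 900, "TAC", "T")}
    called = {("chr1", 100, "A", "T"), ("chr1", 250, "G", "GA"),
              ("chr2", 40, "C", "G"), ("chr3", 7, "G", "A")}
    print(score_calls(called, truth))
```

Because F1 is the harmonic mean of precision and recall, a caller scores highly only when it both avoids false positives and misses few true variants, which is why it is a common single-number summary when comparing callers.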

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted January 06, 2023.

Subject Area

  • Bioinformatics