New Results

smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers

Chang Xu, Xiujing Gu, Raghavendra Padmanabhan, Zhong Wu, Quan Peng, John DiCarlo, Yexun Wang
doi: https://doi.org/10.1101/281659
Author affiliation (all authors): Life Science Research and Foundation, QIAGEN Sciences, Inc., 6951 Executive Way, Frederick, Maryland 21703, USA
For correspondence: chang.xu@qiagen.com

Abstract

Motivation: Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. Unique molecular identifiers (UMIs) allow most sequencing errors to be corrected. However, errors introduced before UMI tagging, such as DNA polymerase errors during end-repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose a fundamental limit on UMI-based variant calling.
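To make the limitation concrete, the following is a minimal sketch of per-UMI consensus calling at a single genomic position. It is an illustration only, not the smCounter2 implementation: the function name `umi_consensus`, the `(umi, base)` read representation, and the 0.8 agreement cutoff are all assumptions introduced here.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_agreement=0.8):
    """Collapse reads into one consensus base per UMI at a single position.

    reads: iterable of (umi, base) pairs (hypothetical representation).
    Returns a dict mapping each UMI to its consensus base, or None when
    the most common base is seen in fewer than min_agreement of the reads.
    """
    bases_by_umi = defaultdict(list)
    for umi, base in reads:
        bases_by_umi[umi].append(base)

    consensus = {}
    for umi, bases in bases_by_umi.items():
        base, count = Counter(bases).most_common(1)[0]
        consensus[umi] = base if count / len(bases) >= min_agreement else None
    return consensus


# Molecule "AAC": one read out of ten carries a sequencing error (T),
# which the consensus step outvotes.
# Molecule "GGT": every read carries T. This pattern is identical whether
# T is a true variant or a polymerase error made before UMI tagging.
example_reads = [("AAC", "A")] * 9 + [("AAC", "T")] + [("GGT", "T")] * 5
example_consensus = umi_consensus(example_reads)
```

In this toy input the isolated sequencing error in molecule `AAC` disappears after consensus, while the `T` in molecule `GGT` survives because all of its reads agree. Consensus alone cannot say whether that `T` is real, which is why a further statistical step is needed.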

Results: We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade of the current version of smCounter. Compared to smCounter, smCounter2 features a lower detection limit of 0.5%, better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods on multiple datasets and demonstrated smCounter2's superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test that determines whether the allele frequency of a putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was achieved mainly through novel repetitive-region filters designed specifically for UMI data.
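The idea of testing an allele frequency against a background error rate can be sketched as follows. The abstract does not specify the form of the test or the error model, so this example substitutes a plain one-sided binomial test with a fixed, known error rate. The function names, the significance level `alpha`, and the example counts are all hypothetical and are not taken from smCounter2.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Terms are computed in log space so that large n does not overflow.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_term)
    return min(1.0, total)


def exceeds_background(variant_umis, total_umis, error_rate, alpha=1e-6):
    """One-sided test: is the variant UMI count above what errors explain?

    Null hypothesis (an assumption of this sketch): each UMI shows the
    variant allele independently with probability error_rate.
    Returns (is_significant, p_value).
    """
    p_value = binomial_upper_tail(variant_umis, total_umis, error_rate)
    return p_value < alpha, p_value


# Hypothetical site: 2000 UMIs, assumed background error rate of 0.1%.
# 20 variant UMIs (1% allele frequency) versus 3 variant UMIs (0.15%).
strong_call, strong_p = exceeds_background(20, 2000, 1e-3)
weak_call, weak_p = exceeds_background(3, 2000, 1e-3)
```

Under these assumed numbers about 2 variant UMIs are expected from errors alone, so 20 observed UMIs are far in the tail while 3 are unremarkable. Because the p-value accounts for the total UMI count, a single `alpha` can be applied at different depths, which is in the spirit of the consistent threshold described above.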

Availability: The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under the MIT license.

Footnotes

  • † yexun.wang@qiagen.com

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted March 14, 2018.
Subject Area

  • Bioinformatics