FastUniq: a fast de novo duplicates removal tool for paired short reads

PLoS One. 2012;7(12):e52249. doi: 10.1371/journal.pone.0052249. Epub 2012 Dec 20.

Abstract

The presence of duplicates introduced by PCR amplification is a major issue in paired short reads from next-generation sequencing platforms. These duplicates can seriously affect research applications such as scaffolding in whole-genome sequencing and the discovery of large-scale genome variations, so they are usually removed. We present FastUniq, a fast de novo tool for removing duplicates from paired short reads. FastUniq identifies duplicates by comparing sequences between read pairs and does not require complete genome sequences as a prerequisite. FastUniq can handle reads of different lengths simultaneously, and its running time increases linearly with the number of reads, at an average speed of 87 million reads per 10 minutes. FastUniq is freely available at http://sourceforge.net/projects/fastuniq/.
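The idea of reference-free duplicate removal can be illustrated with a minimal Python sketch. It is not FastUniq's implementation: the abstract does not describe the comparison algorithm, so this sketch assumes that two pairs are duplicates only when both read sequences match exactly (case-insensitively), and it uses a hash set to detect them. The names `remove_duplicate_pairs` and `estimate_runtime_minutes` are hypothetical, and the runtime estimate simply extrapolates the average speed reported above under the stated linear scaling.

```python
from typing import Iterable, Iterator, Tuple

# A read pair is modelled as (read 1 sequence, read 2 sequence).
ReadPair = Tuple[str, str]

# Average speed reported in the abstract: 87 million reads per 10 minutes.
READS_PER_TEN_MINUTES = 87_000_000


def remove_duplicate_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield the first occurrence of each distinct read pair.

    Assumption (not taken from the paper): a pair is a duplicate only if
    BOTH of its sequences exactly match those of an earlier pair.
    No reference genome is consulted; only the reads are compared.
    """
    seen = set()
    for read1, read2 in pairs:
        key = (read1.upper(), read2.upper())
        if key not in seen:
            seen.add(key)
            yield (read1, read2)


def estimate_runtime_minutes(n_reads: int) -> float:
    """Linear extrapolation of running time from the reported average speed."""
    return n_reads / READS_PER_TEN_MINUTES * 10


if __name__ == "__main__":
    example = [
        ("ACGT", "TTGA"),
        ("ACGT", "TTGA"),  # exact duplicate of the first pair
        ("ACGT", "TTGC"),  # same read 1, different read 2: kept
        ("ACG", "TTGA"),   # different length, so not an exact match: kept
    ]
    for pair in remove_duplicate_pairs(example):
        print(pair)
    print(estimate_runtime_minutes(174_000_000))
```

Because the key is built from both mates, pairs that share only one read are kept, and reads of different lengths can be mixed in the same input. A set keeps every distinct pair in memory, which is a simplification chosen for brevity rather than a claim about how the tool manages large datasets.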

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Sequence Analysis, DNA / methods*
  • Software*

Grants and funding

This work was supported by the Key National Natural Science Foundation of China [Grant number 81130069]; the Program for Changjiang Scholars and Innovative Research Team in University of Ministry of Education of China [Grant number IRT1150]; and the National Science Foundation of China [Grant numbers 30901156, 31170619]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.