TY  - JOUR
T1  - Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing
JF  - bioRxiv
DO  - 10.1101/114603
SP  - 114603
AU  - Jungeui Hong
AU  - David Gresham
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/03/07/114603.abstract
N2  - Quantitative analysis of next-generation sequencing data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are defined as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. The false positive rate of coordinate-based deduplication has not been well characterized and may introduce unforeseen biases during analyses. We developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed sequencing. Incorporation of UMIs enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TruSeq adapters containing UMIs (TrUMIseq adapters), we find that accurate removal of PCR duplicates results in enhanced data quality for quantitative analysis of allele frequencies in heterogeneous populations and gene expression. Method Summary: TrUMIseq adapters incorporate unique molecular identifiers in TruSeq adapters while maintaining the capacity to multiplex sequencing libraries using existing workflows. The use of UMIs increases the accuracy of quantitative sequencing assays, including RNAseq and allele frequency estimation, by enabling accurate detection of PCR duplicates.
ER  -
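
The distinction the abstract draws between coordinate-based deduplication and UMI-aware deduplication can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Read` record, its field names, the example UMI sequences, and the use of exact UMI matching (with no tolerance for sequencing errors in the UMI) are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for an aligned read (illustrative only)."""
    chrom: str
    pos: int
    strand: str
    umi: str  # UMI sequence read from the adapter; length here is arbitrary


def dedup_by_coordinate(reads: List[Read]) -> List[Read]:
    """Keep the first read seen at each (chrom, pos, strand).

    Every later read at the same coordinates is treated as a PCR duplicate,
    even if it came from an independent molecule.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_by_coordinate_and_umi(reads: List[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, pos, strand, umi).

    A read counts as a PCR duplicate only if both its mapping coordinates
    and its UMI match a read already kept (exact UMI match assumed).
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Made-up example data, not taken from the study.
reads = [
    Read("chrI", 100, "+", "ACGTAC"),
    Read("chrI", 100, "+", "ACGTAC"),  # same coordinates and UMI: PCR duplicate
    Read("chrI", 100, "+", "TTGCAA"),  # same coordinates, different UMI
    Read("chrI", 250, "-", "ACGTAC"),
]

print(len(dedup_by_coordinate(reads)))          # 2
print(len(dedup_by_coordinate_and_umi(reads)))  # 3
```

In this toy input, coordinate-only deduplication discards the third read as a false positive, whereas the UMI-aware version retains it as an independent molecule.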