RT Journal Article
SR Electronic
T1 UMI-Reducer: Collapsing duplicate sequencing reads via Unique Molecular Identifiers
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 103267
DO 10.1101/103267
A1 Mangul, Serghei
A1 Driesche, Sarah Van
A1 Martin, Lana S.
A1 Martin, Kelsey C.
A1 Eskin, Eleazar
YR 2017
UL http://biorxiv.org/content/early/2017/01/25/103267.abstract
AB Summary: Every sequencing library contains duplicate reads. While many duplicates arise during polymerase chain reaction (PCR), some derive from multiple identical fragments of mRNA present in the original lysate (termed “biological duplicates”). Unique Molecular Identifiers (UMIs) are random oligonucleotide sequences that allow differentiation between technical and biological duplicates. Here we report the development of UMI-Reducer, a new computational tool that distinguishes PCR duplicates from biological duplicates. UMI-Reducer uses the UMI and the mapping position of each read to identify and collapse reads that are technical duplicates. The remaining reads, which represent true biological molecules, are then used for a bias-free estimate of mRNA abundance in the original lysate. This strategy is of particular use for libraries made from low amounts of starting material, which typically require additional cycles of PCR and are therefore most prone to PCR duplicate bias. Availability and Implementation: UMI-Reducer is open-source Python software, freely available for non-commercial use (GPL-3.0), at https://sergheimangul.wordpress.com/umi-reducer/. Documentation and tutorials are available at https://github.com/smangul1/UMI-Reducer/wiki/. Contact: smangul{at}ucla.edu, SVanDriesche{at}mednet.ucla.edu. Supplementary information: Flowchart of Library Construction
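
The collapsing strategy summarized in the abstract (reads that share both a UMI and a mapping position are treated as technical duplicates) can be illustrated with a minimal Python sketch. This is not UMI-Reducer's actual code or interface: the `Read` record, the `collapse_duplicates` function, and the choice to include chromosome and strand in the position key are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not UMI-Reducer's data model)."""
    read_id: str
    umi: str     # random oligonucleotide attached before PCR
    chrom: str   # reference sequence the read mapped to
    pos: int     # mapping start position
    strand: str  # "+" or "-"


Key = Tuple[str, int, str, str]


def collapse_duplicates(reads: Iterable[Read]) -> Tuple[List[Read], Counter]:
    """Keep one read per (chrom, pos, strand, UMI) key.

    Reads sharing the full key are assumed to be PCR (technical)
    duplicates. Reads at the same position with different UMIs are
    kept as distinct molecules (biological duplicates).
    """
    kept: Dict[Key, Read] = {}
    copies: Counter = Counter()
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        copies[key] += 1
        kept.setdefault(key, read)  # the first read seen represents the group
    return list(kept.values()), copies


reads = [
    Read("r1", "ACGT", "chr1", 100, "+"),
    Read("r2", "ACGT", "chr1", 100, "+"),  # same UMI and position as r1
    Read("r3", "TTAG", "chr1", 100, "+"),  # same position, different UMI
    Read("r4", "ACGT", "chr2", 500, "-"),  # same UMI, different position
]
unique_reads, copies = collapse_duplicates(reads)
kept_ids = [r.read_id for r in unique_reads]
```

In this toy input, `r2` is collapsed into `r1`, while `r3` and `r4` are retained, so three reads would contribute to the abundance estimate. A real tool would also have to decide how to handle sequencing errors within the UMI itself, which this sketch ignores.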