TY - JOUR
T1 - UMI-Reducer: Collapsing duplicate sequencing reads via Unique Molecular Identifiers
JF - bioRxiv
DO - 10.1101/103267
SP - 103267
AU - Serghei Mangul
AU - Sarah Van Driesche
AU - Lana S. Martin
AU - Kelsey C. Martin
AU - Eleazar Eskin
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/01/25/103267.abstract
N2 - Summary: Every sequencing library contains duplicate reads. While many duplicates arise during polymerase chain reaction (PCR), some derive from multiple identical fragments of mRNA present in the original lysate (termed “biological duplicates”). Unique Molecular Identifiers (UMIs) are random oligonucleotide sequences that allow differentiation between technical and biological duplicates. Here we report the development of UMI-Reducer, a new computational tool for distinguishing PCR duplicates from biological duplicates. UMI-Reducer uses the UMI and the mapping position of each read to identify and collapse reads that are technical duplicates. The remaining true biological reads are then used for a bias-free estimate of mRNA abundance in the original lysate. This strategy is of particular use for libraries made from low amounts of starting material, which typically require additional cycles of PCR and are therefore most prone to PCR duplicate bias. Availability and Implementation: UMI-Reducer is open-source Python software and is freely available for non-commercial use (GPL-3.0) at https://sergheimangul.wordpress.com/umi-reducer/. Documentation and tutorials are available at https://github.com/smangul1/UMI-Reducer/wiki/. Contact: smangul{at}ucla.edu, SVanDriesche{at}mednet.ucla.edu. Supplementary information: Flowchart of Library Construction.
ER -