RT Journal Article
SR Electronic
T1 Effects of duplicated mapped read PCR artifacts on RNA-seq differential expression analysis based on qRNA-seq
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 301259
DO 10.1101/301259
A1 Anna C. Salzberg
A1 Jiafen Hu
A1 Elizabeth J. Conroy
A1 Nancy M. Cladel
A1 Robert M. Brucklacher
A1 Georgina V. Bixler
A1 Yuka Imamura Kawasawa, Ph.D
YR 2018
UL http://biorxiv.org/content/early/2018/04/13/301259.abstract
AB Best practices for handling duplicated mapped reads in RNA-seq analyses have long been discussed, but a gold-standard method has yet to be established, because such duplicates could originate from valid biological transcripts or could be PCR-related artifacts. Here we used the NEXTflex™ qRNA-Seq™ (also known as Molecular Indexing™) technology to identify PCR duplicates via the random attachment of unique molecular labels to each cDNA molecule prior to PCR amplification. We found that up to 64.3% of the single-end and 19.3% of the mouse paired-end duplicates originated from valid biological transcripts rather than PCR artifacts. For single-end reads, either removing or retaining all duplicates resulted in a substantial number of false positives (up to 47.0%) and false negatives (up to 12.1%) in the sets of significantly differentially expressed genes. For paired-end reads, only the alignment retaining all duplicates resulted in a substantial number of false positives. This is the first effort to evaluate the performance of qRNA-seq using 'real-world' biomedical samples, and we found that PCR duplicate identification provided minor benefits for paired-end reads but greatly improved the sensitivity and specificity in determining the significantly differentially expressed genes for single-end reads.