TRUmiCount: Correctly counting absolute numbers of molecules using unique molecular identifiers

Florian G. Pflug; Arndt von Haeseler

doi:10.1101/217778

Abstract

Counting DNA or RNA molecules using next-generation sequencing (NGS) suffers from amplification biases. Counting unique molecular identifiers (UMIs) instead of reads is still prone to over-estimation, due to amplification and sequencing artifacts, and to under-estimation, due to lost molecules. We present an algorithm that corrects for these errors. It is based on a mechanistic model of the PCR and sequencing process whose parameters have an immediate physical interpretation and are easily estimated. We demonstrate that our algorithm outputs essentially unbiased counts with substantially improved accuracy.
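The two error sources named above can be illustrated with a toy simulation. The sketch below is not the algorithm presented in this paper; it only shows, under assumed settings, how artifact UMIs inflate a naive UMI count and how a read-count threshold that removes them also discards true molecules. The function `simulate_umi_counts`, all parameter values, and the simplifications (each PCR copy duplicates with a fixed probability per cycle, each copy is sequenced independently, each erroneous read creates one single-read phantom UMI) are assumptions made for illustration.

```python
import random


def simulate_umi_counts(n_molecules=500, cycles=6, efficiency=0.5,
                        sampling_prob=0.1, error_rate=0.05,
                        threshold=2, seed=1):
    """Toy model of UMI counting; every parameter is an assumed value."""
    rng = random.Random(seed)
    true_reads = []    # error-free reads observed per true molecule
    phantom_umis = 0   # artifact UMIs, each assumed to carry exactly one read

    for _ in range(n_molecules):
        # PCR: in every cycle, each copy is duplicated with probability `efficiency`.
        copies = 1
        for _ in range(cycles):
            copies += sum(rng.random() < efficiency for _ in range(copies))
        # Sequencing: each copy yields a read with probability `sampling_prob`.
        reads = sum(rng.random() < sampling_prob for _ in range(copies))
        # Assumption: an erroneous read shows up as a new phantom UMI.
        errors = sum(rng.random() < error_rate for _ in range(reads))
        phantom_umis += errors
        true_reads.append(reads - errors)

    # Naive count: every UMI with at least one read, phantoms included.
    naive = sum(r >= 1 for r in true_reads) + phantom_umis
    # Thresholded count: keep only UMIs with at least `threshold` reads.
    filtered = sum(r >= threshold for r in true_reads)
    if threshold <= 1:
        filtered += phantom_umis  # single-read phantoms survive a threshold of 1
    return {
        "true": n_molecules,
        "naive": naive,
        "filtered": filtered,
        "phantoms": phantom_umis,
        "lost": n_molecules - sum(r >= threshold for r in true_reads),
    }


if __name__ == "__main__":
    print(simulate_umi_counts())
```

In this sketch the thresholded count can never exceed the true number of molecules, so any molecule with fewer reads than the threshold is simply lost. That residual under-estimation is the kind of error a model-based correction would have to account for.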