Abstract
Counting DNA or RNA molecules using next-generation sequencing (NGS) suffers from amplification biases. Counting unique molecular identifiers (UMIs) instead of reads is still prone to over-estimation, caused by amplification and sequencing artifacts, and to under-estimation, caused by lost molecules. We present an algorithm that corrects for these errors, based on a mechanistic model of the PCR and sequencing process whose parameters have an immediate physical interpretation and are easily estimated. We demonstrate that our algorithm outputs essentially unbiased counts with substantially improved accuracy.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.