Polee: RNA-Seq analysis using approximate likelihood

Daniel C. Jones; Walter L. Ruzzo

doi:10.1101/2020.09.09.290411

Abstract

The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but analyses often fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all the sequencing reads, which quickly becomes intractable, as experiments can consist of billions of reads. To overcome these limitations, we propose a new method of approximating the likelihood function of a sparse mixture model, using a technique we call the Pólya tree transformation. We demonstrate that substituting this approximation for the exact likelihood achieves most of the benefits at a fraction of the computational cost, leading to more accurate detection of differential transcript expression.

Availability: The method is implemented in a Julia package available from https://github.com/dcjones/polee

Contact: dcjones@cs.washington.edu

Competing Interest Statement

The authors have declared no competing interest.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.