%0 Journal Article
%A Tuerk, Andreas
%A Wiktorin, Gregor
%A Güler, Serhat
%T Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates
%D 2014
%R 10.1101/011767
%J bioRxiv
%P 011767
%X Quantification of RNA transcripts with RNA-Seq is inaccurate due to positional fragment bias, which is not represented appropriately by current statistical models of RNA-Seq data. This article introduces the Mix2 (read "mixquare") model, which uses a mixture of probability distributions to model the transcript-specific positional fragment bias. The parameters of the Mix2 model can be trained efficiently with the Expectation Maximization (EM) algorithm, resulting in simultaneous estimates of the transcript abundances and the transcript-specific positional biases. Experiments are conducted on synthetic data and on the Universal Human Reference (UHR) and Human Brain Reference (HBR) samples from the MicroArray Quality Control (MAQC) data set. Comparing the correlation between qPCR and FPKM values with that of the state-of-the-art methods Cufflinks and PennSeq, we obtain an increase in R² value from 0.44 to 0.6 and from 0.34 to 0.54. In the detection of differential expression between UHR and HBR, the true positive rate increases from 0.44 to 0.71 at a false positive rate of 0.1. Finally, the Mix2 model is used to investigate biases present in the MAQC data. This reveals 5 dominant biases which deviate from the common assumption of a uniform fragment distribution. The Mix2 software is available at http://www.lexogen.com/fileadmin/uploads/bioinfo/mix2model.tgz.
%U https://www.biorxiv.org/content/biorxiv/early/2014/11/24/011767.full.pdf