ABSTRACT
Motivation Transcriptomic profiles can improve our understanding of the molecular basis of phenotypes, and many statistical methods have been proposed to identify genes that are differentially expressed between two or more conditions using RNA-seq data. However, statistical analyses of RNA-seq data often suffer from small sample sizes, and existing methods use global variance estimates of RNA expression levels as prior distributions for gene-specific variance estimates, which makes them difficult to generalize to more complicated settings. We herein propose a Bartlett-Adjusted Likelihood-based LInear mixed model approach (BALLI) to analyze more complicated RNA-seq data. The proposed method estimates the technical and biological variances with a linear mixed effect model, with and without adjusting for small-sample bias using Bartlett’s correction.
Results We conducted extensive simulations to compare the performance of BALLI with that of existing approaches (edgeR, DESeq2, and voom). The simulation results showed that BALLI correctly controlled the type-1 error rate at various nominal significance levels and achieved better statistical power and precision than the competing methods in various scenarios. Furthermore, BALLI was robust to variation in library size. It was also successfully applied to Holstein milk yield data, illustrating its practical value.
Availability and Implementation BALLI is implemented as an R package and is freely available at http://healthstat.snu.ac.kr/software/balli/.
Contact won1{at}snu.ac.kr