Abstract
Ribosome profiling (Riboseq) is a powerful technique for measuring protein translation, however, sampling errors and biological biases are prevalent and poorly understand. Addressing these issues, we present Scikit-ribo (https://github.com/hanfang/scikit-ribo), the first open-source software for accurate genome-wide A-site prediction and translation efficiency (TE) estimation from Riboseq and RNAseq data. Scikit-ribo accurately identifies A-site locations and reproduces codon elongation rates using several digestion protocols (r = 0.99). Next we show commonly used RPKM-derived TE estimation is prone to biases, especially for low-abundance genes. Scikit-ribo introduces a codon-level generalized linear model with ridge penalty that correctly estimates TE while accommodating variable codon elongation rates and mRNA secondary structure. This corrects the TE errors for over 2000 genes in S. cerevisiae, which we validate using mass spectrometry of protein abundances (r = 0.81). From this, we determine the Kozak-like sequence directly from Riboseq and discover novel roles of the DEAD-box protein Dhh1p, deepening our understanding of translation control.