Abstract
Ribosome profiling, an application of nucleic acid sequencing for monitoring ribosome activity, has revolutionized our understanding of protein translation dynamics. This technique has been available for a decade, yet the current state and standardization of publicly available computational tools for these data remain bleak. We introduce XPRESSyourself, an analytical toolkit that eliminates barriers and bottlenecks associated with this specialized data type by filling gaps in the computational toolset for both experts and non-experts in ribosome profiling. XPRESSyourself automates and standardizes analysis procedures, decreasing time-to-discovery and increasing reproducibility. This toolkit acts as a reference implementation of current best practices in ribosome profiling analysis. We demonstrate this toolkit’s performance on publicly available ribosome profiling data by rapidly identifying hypothetical mechanisms related to neurodegenerative phenotypes and neuroprotective mechanisms of the small-molecule ISRIB during acute cellular stress. XPRESSyourself brings robust, rapid analysis of ribosome profiling data to a broad and ever-expanding audience and will lead to more reproducible and accessible measurements of translation regulation. XPRESSyourself software is perpetually open-source under the GPL-3.0 license and is hosted at https://github.com/XPRESSyourself, where users can access additional documentation and report software issues.
Footnotes
Updated several figures; added table summarizing other ribosome profiling packages; added Venn diagrams comparing original results with ours; expanded performance metrics table; updated ISRIB dataset insights based on improved thresholding
List of Abbreviations
- AWS: Amazon Web Services
- BAM: Binary Sequence Alignment Map
- BED: Browser Extensible Data
- cDNA: complementary DNA
- CDS: coding sequence of gene
- ChIP-seq: chromatin immunoprecipitation sequencing
- CPU: central processing unit
- dbGaP: Database of Genotypes and Phenotypes
- DNA: deoxyribonucleic acid
- FDR: false discovery rate
- FPKM: fragments per kilobase of transcript per million
- GEO: Gene Expression Omnibus
- GTF: General Transfer Format
- IGV: Integrative Genomics Viewer
- ISR: integrated stress response
- ISRIB: ISR inhibitor
- mRNA: messenger RNA
- nt: nucleotide
- PCA: principal component analysis
- PCR: polymerase chain reaction
- RAM: random access memory
- RNA: ribonucleic acid
- RNA-Seq: RNA sequencing
- RPKM: reads per kilobase of transcript per million
- RPM: reads per million
- rRNA: ribosomal RNA
- TCGA: The Cancer Genome Atlas
- TE: translation efficiency
- TPM: transcripts per million
- UMI: unique molecular identifier
- UTR: untranslated region
- VCF: Variant Call Format