Abstract
Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigated mis-splicing using RNA-sequencing data from ~14K control samples and 42 human body sites, focusing on split reads partially mapping to known transcripts in annotation. We show that mis-splicing occurs at different rates across introns and tissues and that these splicing inaccuracies are primarily affected by the abundance of core components of the spliceosome assembly and its regulators. Using publicly available data on short-hairpin RNA-knockdowns of numerous spliceosomal components and related regulators, we found support for the importance of RNA-binding proteins in mis-splicing. We also demonstrated that age is positively correlated with mis-splicing, and it affects genes implicated in neurodegenerative diseases. This in-depth characterisation of mis-splicing can have important implications for our understanding of the role of splicing inaccuracies in human disease and the interpretation of long-read RNA-sequencing data.
Competing Interest Statement
S.G. is a current employee of Verge Genomics, a venture-backed startup company. The other authors declare no competing interests.
Footnotes
Abbreviations
- 5’ss
- Donor splice site
- 3’ss
- Acceptor splice site
- AGEZ
- AG Exclusion Zone
- bp
- Base pair
- BP
- Branch Point
- CDTS
- Context-dependent tolerance score. It represents a measure of DNA sequence constraint in humans (ref. 66)
- effsize
- Probability of superior outcome between two compared groups (ref. 101). It represents the probability that a randomly selected observation from group A will have a higher score than a randomly selected observation from group B. Unlike p-values, effect sizes are independent of the sample size
- FCTX
- Frontal cortex brain tissue
- GO
- Gene Ontology enrichment analysis
- GTEx v8
- Genotype-Tissue Expression (GTEx) v8 project (ref. 44) (https://gtexportal.org/home/tissueSummaryPage)
- IQR
- Interquartile range
- KEGG
- Kyoto Encyclopedia of Genes and Genomes
- MES
- Maximum Entropy Scan score, http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html
- mod3
- Modulo 3 of a distance in base pairs, i.e. the remainder obtained after dividing the distance by 3. mod3=0 means that the distance is an exact multiple of 3; mod3=1 means that the division leaves a remainder of 1; mod3=2 means that the division leaves a remainder of 2
- mRNA
- Messenger RNA
- MSRD
- Mis-splicing Ratio calculated at the 5’ss (donor splice site) of a given annotated intron
- MSRA
- Mis-splicing Ratio calculated at the 3’ss (acceptor splice site) of a given annotated intron
- NMD
- Nonsense-mediated decay pathway
- phastCons20
- The mean interspecies conservation score across 20 alignments (human, 16 primates, dog, mouse and tree shrew) to the human genome, calculated for the proximal intronic sequences tested (−5/+35bp, −35/+5bp, ‘/’ denoting the exon-intron junction). http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phastCons20way/
- PPT
- Polypyrimidine Tract
- pre-mRNA
- Messenger RNA precursor
- q
- FDR-adjusted p-value. The False Discovery Rate (FDR) multiple testing adjustment method (ref. 102), known as the Benjamini-Hochberg method, was formally described by Yoav Benjamini and Yosef Hochberg in 1995
- RBP
- RNA-binding Protein
- shRNA
- Short Hairpin RNA
- TPM
- Transcripts Per Million
- V
- V-statistic produced by a paired Wilcoxon signed-rank test with continuity correction.
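
As an illustration only, the arithmetic behind three of the definitions above (mod3, effsize and q) could be sketched in Python as follows. The function names are hypothetical, distances are treated as non-negative magnitudes, and ties in effsize are counted as one half; these are assumptions of the sketch, not a description of the authors' implementation.

```python
from itertools import product


def mod3(distance_bp):
    """Remainder of a distance in base pairs after division by 3.

    Assumption: the distance is treated as a non-negative magnitude.
    """
    return abs(distance_bp) % 3


def prob_superiority(group_a, group_b):
    """Probability that a random observation from A exceeds one from B.

    Assumption: ties contribute one half, a common convention.
    """
    score = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            score += 1.0
        elif a == b:
            score += 0.5
    return score / (len(group_a) * len(group_b))


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (q), returned in input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

For example, `mod3(7)` returns 1, and `bh_adjust` applied to a list of raw p-values returns one q per input value in the same order.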