Abstract
Advances in next-generation sequencing have led to the discovery of many alternative splice isoforms at the transcript level, but the protein-level existence of most of these isoforms remains unknown. To survey the landscape of protein alternative isoform expression in the human proteome, we developed a proteotranscriptomics tool and workflow, which filters RNA sequencing data by junction reads before translating splice junctions into amino acid sequences. We further limit in silico sequence translation strictly to a single phase to reduce false positives in splice junction identification at the protein level. In total, we re-analyzed public RNA sequencing datasets and constructed custom FASTA databases from 10 human tissue types (heart, lung, liver, pancreas, ovary, testis, colon, prostate, adrenal gland, and esophagus). We used the custom database to identify splice junction peptides in proteomics datasets from the same 10 human tissues as well as 19 cardiac anatomical regions and cell types. We identified a total of 1,984 protein isoforms including 345 unique splice-specific peptides not currently documented in common proteomics databases. The proteotranscriptomics approach using restricted sequence databases described here may help reveal previously unidentified alternative protein isoforms, and aid in the study of alternative splicing at the proteome level.
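The two filtering ideas summarized above (keeping only splice junctions supported by junction reads, then translating each junction in one reading frame rather than all three) can be illustrated with a minimal sketch. This is not the tool described in this work. The `Junction` fields, the function names, and the `min_reads` threshold are hypothetical and chosen only for illustration. The sketch also assumes that the phase of the upstream exon is already known, for example from an annotated coding sequence.

```python
from dataclasses import dataclass

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


@dataclass
class Junction:
    """Hypothetical record for one splice junction (illustration only)."""
    name: str
    upstream_exon: str    # nucleotide sequence 5' of the junction
    downstream_exon: str  # nucleotide sequence 3' of the junction
    phase: int            # bases to skip before the first complete codon (0, 1, or 2)
    junction_reads: int   # RNA-seq reads spanning the junction


def translate_single_frame(seq: str, phase: int) -> str:
    """Translate seq in one frame only, stopping at the first stop codon."""
    peptide = []
    for i in range(phase, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3].upper(), "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def build_entries(junctions, min_reads):
    """Keep junctions with enough read support and translate each in its known frame."""
    entries = []
    for j in junctions:
        if j.junction_reads < min_reads:
            continue  # junction not sufficiently supported by junction reads
        spliced = j.upstream_exon + j.downstream_exon
        peptide = translate_single_frame(spliced, j.phase)
        if peptide:
            entries.append((j.name, peptide))
    return entries


def to_fasta(entries):
    """Format (name, peptide) pairs as FASTA text."""
    return "".join(f">{name}\n{pep}\n" for name, pep in entries)
```

Restricting translation to a single frame yields at most one candidate peptide per junction instead of three, which is one way to keep the search database small.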
Abbreviations
- A3SS: Alternative 3′ splice site
- A5SS: Alternative 5′ splice site
- MXE: Mutually exclusive exons
- PKA: Protein kinase A
- PTC: Premature termination codon
- PTM: Post-translational modification
- RI: Retained introns
- SE: Skipped exons