ABSTRACT
Aberrant splicing is a hallmark of leukemias with mutations in splicing factor (SF)-encoding genes. Here we investigated its prevalence in pediatric B-cell acute lymphoblastic leukemias (B-ALL), where SFs are not mutated. By comparing them to normal pro-B cells, we found thousands of aberrant local splice variations (LSVs) per sample, with 279 LSVs in 241 genes present in every comparison. These genes were enriched in RNA processing pathways and encoded ~100 SFs, e.g. hnRNPA1. The hnRNPA1 3’UTR was pervasively misspliced, yielding a transcript subject to nonsense-mediated decay. Thus, we knocked it down in B-lymphoblastoid cells, identified 213 hnRNPA1-dependent splicing events, and defined the hnRNPA1 splicing signature in pediatric leukemias. One of its elements was DICER1, a known tumor suppressor gene; its LSVs were consistent with reduced translation of DICER1 mRNA. Additionally, we searched for LSVs in other leukemia and lymphoma drivers and discovered 81 LSVs in 41 genes. 77 LSVs were confirmed using two large independent B-ALL RNA-seq datasets. In fact, the twenty most common B-ALL drivers showed higher prevalence of aberrant splicing than of somatic mutations. Thus, post-transcriptional deregulation of SFs can drive widespread changes in B-ALL splicing and likely contributes to disease pathogenesis.
INTRODUCTION
Despite advances in the treatment of pediatric B-ALL, children with relapsed or refractory disease account for a substantial number of childhood cancer-related deaths. Adults with B-ALL experience even higher relapse rates and long-term event-free survival of less than 50% (Roberts and Mullighan, 2015). Recently, significant gains in the treatment of B-ALL have been achieved through the use of immunotherapies directed against CD19, a protein expressed on the surface of most B-cell neoplasms (Scheuermann and Racila, 1995; Sikaria et al., 2016). These gains culminated in the recent FDA approval of tisagenlecleucel and axicabtagene ciloleucel, CD19-redirected chimeric antigen receptor (CAR) T-cell immunotherapies, for patients with refractory/relapsed B-cell malignancies. However, relapses occur in 10-20% of patients with B-ALL treated with CD19 CAR T cells, often due to epitope loss and/or B-cell de-differentiation into other lineages (Gardner et al., 2016; Jacoby et al., 2016; Maude et al., 2014; Topp et al., 2015). Other targets for immunotherapy include CD20 and CD22 (Fry et al., 2017; Haso et al., 2013; Maino et al., 2016; Raetz et al., 2008). However, neither antigen is uniformly expressed in B-ALL, and factors accounting for this mosaicism are poorly understood (Sikaria et al., 2016).
We previously reported a new mechanism of pediatric B-ALL resistance to CD19-directed immunotherapy. We discovered that in some cases, resistance to CD19 CAR T cells was generated through alternative splicing of CD19 transcripts. This post-transcriptional event was mediated by a specific splicing factor (SF) SRSF3 and generated a CD19 protein isoform invisible to the immunotherapeutic agent via skipping of exon 2 ((Sotillo et al., 2015), reviewed by (Alderton, 2015; Behjati, 2015)).
Our discovery of a resistance mechanism based on alternative splicing prompted us to investigate the extent of this phenomenon in additional B-ALL cases. While driver mutations in splicing factors such as SRSF2, SF3B1, and U2AF1 have recently been discovered in myelodysplastic syndrome/acute myelogenous leukemia (Graubert et al., 2012; Papaemmanuil et al., 2011; Yoshida et al., 2011) and chronic lymphocytic leukemia (Quesada et al., 2012; Wang et al., 2011), SF mutations have not been reported in B-ALL. Nevertheless, our prior work suggested the possibility that SRSF3 (and by inference other SFs) could be deregulated in B-ALL (Sotillo et al., 2015), bringing about wide-spread splicing aberration.
This model would be particularly attractive because B-ALL is a chromosome translocation-driven disease where the prevalence of somatic mutations and copy number variations is relatively low. For example, the commonly mutated IKZF1 gene (which encodes the Ikaros transcription factor) is affected by missense mutations in just ~20% of B-ALL cases. Similarly, mutations in the key tumor suppressor gene (TSG) TP53 are found in only ~7% of B-ALLs (per COSMIC database) (Forbes et al., 2015; Futreal et al., 2004). In addition, both genes are robustly transcribed across individual B-ALLs and thus are not epigenetically silenced. This raises the possibility that they and other TSGs are dysregulated by post-transcriptional events, such as alternative splicing.
MATERIALS AND METHODS
Bone Marrow Fractionation
Isolated mononuclear cells and whole bone marrow aspirates were obtained, respectively, from the University of Pennsylvania Stem Cell and Xenograft Core facility and CHOP Hematopathology Laboratory. For pediatric bone marrow samples, mononuclear cells were isolated by spinning over Ficoll gradient, as described earlier (Tasian et al., 2012). Residual red blood cells were lysed with Ammonium Chloride Lysis buffer with gentle rocking at room temperature for 10 min. Cells were pelleted by spinning at 250 x g for 10 min at 4° C and washed once with cold PBS/2%FBS. Cells were resuspended in 1mL PBS/2%FBS and incubated with 500uL FC Block on ice for 10 min. Cells were stained with 1mL CD34-PE, 500uL CD19-APC, and 500uL IgM-FITC for 30 min on ice. Cells were pelleted at 1300RPM for 6 min at 4° C and washed twice in cold PBS. Cells were resuspended in 3mL PBS/2%FBS containing 1uL/mL of 0.1mg/mL DAPI. Cells were sorted 4-ways using MoFlo ASTRIOS directly into RLT Lysis buffer (Qiagen) at a ratio of 1:3.
Primary B-ALL Samples Acquisition
24 primary pediatric B-ALL samples were obtained from the CHOP Center for Childhood Cancer Research leukemia biorepository. Mononuclear cells from fresh bone marrow or peripheral blood specimens were purified via Ficoll gradient separation (Tasian et al., 2012) and cryopreserved for downstream experimental use.
RNA-seq of bone marrow fractions, primary samples and cell lines
RNA was isolated using Qiagen RNeasy Mini Kit. RNA integrity and concentration were determined using the Eukaryote Total RNA Nano assay on a BioAnalyzer (CHOP NAPCore). RNA-seq was performed on 10ng-1ug of total RNA according to the GeneWiz protocol: PolyA selection, Illumina HiSeq, 2×150bp paired-end, 350M raw reads per lane.
RNA-Seq alignment, quantification, and differential expression
Fastq files of RNA-seq obtained from GeneWiz were mapped using STAR aligner (Dobin et al., 2013). STAR was run with the option "alignSJoverhangMin 8". We generated STAR genome reference based on the hg19 build. Alignments were then quantified for each mRNA transcript using HTSeq with the Ensembl-based GFF file and with the option "-m intersection-strict". Normalization of the raw reads was performed using the trimmed mean of M-values (TMM). Principal component analysis (PCA) was done on normalized count values of the samples using a correlation matrix and a calculated score for each principal component. Differential expression between wild-type and knock-down or pro-B and B-ALL RNA-Seq datasets was assessed based on a model using the negative binomial distribution, a method employed by the R package DESeq2 (Love et al., 2014). Subsequent bar charts were generated using the R package ggplot2. Differential genes with a p-value of <0.05 were deemed significantly up- or down-regulated.
Splicing analysis
To detect LSVs we used the MAJIQ (version 1.03) tool (Vaquero-Garcia et al., 2016). We ran MAJIQ on the Ensembl-based GFF annotations, disallowing de novo calls (Zerbino et al., 2017). We chose for further analysis LSVs that had at least a 20% change at a 95% confidence interval between two given conditions. Using in-house customized Perl and Ruby scripts, we filtered for events that corresponded to exon inclusion events only, forcing one event per LSV. For the genes containing differential LSVs in all 18 comparisons, we identified enriched gene ontologies (GO) using the gene functional classification tool DAVID (v. 6.7), reporting the most significant hits based on p-value and false discovery rate. Heatmaps were generated for each ∆PSI value of each LSV that passed the 20% change threshold between Pro-B and B-ALL and additionally filtered for a frequency of n>=2 in our B-ALL samples. These were generated using the R package gplots. LSVs detected by MAJIQ were also tested using LeafCutter (version 0.2.7) (Li et al., 2018) for validation. We used LeafCutter for qualitative validation of LSVs because, like MAJIQ, it offers detection of complex and de-novo splice variants. However, LeafCutter’s model uses intron clusters that do not correspond directly to inclusion levels and is unable to model intron inclusion; hence, assessment of exact inclusion levels or comparison to RT-PCR is generally not possible. In addition, to validate the splicing of the hnRNPA1 3’UTR, we performed RT-qPCR using a forward primer spanning exons 9-10 and reverse primers in exon 11. We normalized to actin and to sample PHL156, as it most closely resembled Pro-B cells in its splicing pattern.
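The thresholding step described above (at least a 20% change, called at 95% confidence) can be sketched as follows. This is a minimal illustration and not the in-house Perl and Ruby scripts: the record layout, the field names `dpsi` and `confidence`, and the example identifiers are all hypothetical, and real MAJIQ output would first need to be parsed into this shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LsvCall:
    lsv_id: str        # hypothetical identifier for one local splice variation
    gene: str
    dpsi: float        # change in inclusion between two conditions, from -1 to 1
    confidence: float  # assumed probability that the change meets the threshold


def filter_lsvs(calls, min_abs_dpsi=0.20, min_confidence=0.95):
    """Keep calls whose splicing change is both large and confidently estimated."""
    return [
        c for c in calls
        if abs(c.dpsi) >= min_abs_dpsi and c.confidence >= min_confidence
    ]


# Made-up calls, used only to exercise the filter.
calls = [
    LsvCall("GENE_A:lsv1", "GENE_A", 0.35, 0.99),
    LsvCall("GENE_A:lsv2", "GENE_A", -0.25, 0.97),
    LsvCall("GENE_B:lsv1", "GENE_B", 0.10, 0.99),  # change too small
    LsvCall("GENE_C:lsv1", "GENE_C", 0.40, 0.80),  # not confident enough
]
kept = filter_lsvs(calls)
print([c.lsv_id for c in kept])
```

Taking the absolute value of `dpsi` keeps both increased inclusion and increased skipping, which matches the use of a single 20% change threshold in either direction.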
Spearman’s Correlations
Correlations and their significance were computed using the nonparametric Spearman’s rank-order correlation implemented in the R function cor.test(). Correlation analysis was performed between 1) RQ values of hnRNPA1 exon 11 inclusion as determined by RT-qPCR and PSI values of hnRNPA1 exon 11 inclusion as determined by MAJIQ, 2) RQ values of hnRNPA1 constitutive exons 2-3 and RQ values of hnRNPA1 exon 11 inclusion as determined by RT-qPCR (normalized to actin and PHL156), and 3) normalized counts of hnRNPA1 constitutive exon 9 and normalized counts of hnRNPA1 exon 11 as determined by RNA-seq.
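For readers unfamiliar with the statistic, Spearman’s rho is the Pearson correlation of the ranks of the two variables, with tied values given the average of their ranks. The sketch below is a plain Python illustration of that definition, not the R cor.test() call used in the analysis; it does not compute a p-value, and the input numbers are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values standing in for per-sample PSI and RT-qPCR RQ measurements.
psi_values = [0.2, 0.5, 0.9, 0.4]
rq_values = [1.0, 2.1, 3.0, 2.5]
rho = spearman_rho(psi_values, rq_values)
print(round(rho, 3))
```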
Nonsense Mediated Decay Inhibition
To inhibit NMD we treated 2 million Nalm6 B-ALL cells in duplicate with either DMSO or 30ug/mL cycloheximide for 6 hours, followed by RNA extraction. To confirm inhibition of NMD we performed gel-based PCR with primers spanning SRSF3 poison exon 4 to detect accumulation of a canonical NMD target. To determine whether either hnRNPA1 3’UTR isoform was an NMD substrate we performed RT-qPCR with a forward primer spanning exons 9-10 and reverse primers in exon 11 or spanning exons 12-13. Data were normalized to hnRNPA1 constitutive exons 2-3 and DMSO.
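The double normalization described here (to a reference amplicon and to the DMSO control) has the same shape as the widely used 2^-ΔΔCt calculation. The sketch below assumes that method for illustration; the text does not state which formula was used, and the Ct values are invented.

```python
def relative_quantity(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Relative quantity by the 2^-ddCt method (assumed here, not stated in the text).

    ct_target / ct_reference: Ct values in the treated sample.
    ct_target_ctrl / ct_reference_ctrl: Ct values in the control sample.
    """
    delta_treated = ct_target - ct_reference
    delta_control = ct_target_ctrl - ct_reference_ctrl
    return 2.0 ** -(delta_treated - delta_control)


# Invented Ct values: an NMD-sensitive amplicon measured against a
# constitutive reference amplicon, in treated versus DMSO control cells.
rq = relative_quantity(
    ct_target=24.0, ct_reference=20.0,            # treated
    ct_target_ctrl=26.0, ct_reference_ctrl=20.0,  # DMSO control
)
print(rq)
```

With these numbers the target amplicon appears two cycles earlier after treatment while the reference is unchanged, which corresponds to a four-fold accumulation.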
siRNA knock-down of hnRNPA1
Biological duplicate experiments were performed on 2 million P493-6 B-lymphoblastoid cells electroporated using Amaxa Program O-006 with either 133nM non-targeting siRNA (Dharmacon) or 300nM ON-TARGET Plus Human hnRNPA1 SMARTpool siRNA (Dharmacon). Cells were plated in 2mL warm tetracycline-free RPMI for 24 hr. RNA isolation and RNA-seq were performed as described above. Knockdown was validated through RT-qPCR and Western blot with anti-hnRNPA1 antibody (Abcam#ab5832). Pathway analysis was performed as described above. Our motif detection pipeline utilized RBPMap (v. 1.1) with a medium stringency level (p-value<=0.005) and conservation filter (Paz et al., 2014). We performed this analysis on affected exons and +/-200nts into adjacent intronic regions. We then identified hnRNPA1 motifs (Ray et al., 2013) and calculated the average z-score and p-value for all significant hits.
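The idea of scanning an exon plus 200 nt of flanking intron for an hnRNPA1 site can be illustrated with a plain string search. This is a deliberately simplified stand-in for RBPMap, which additionally scores motif clusters and conservation; the exact-match UAGGG pattern is the motif named in the Results, and the function names, coordinate convention (0-based, half-open) and test sequence are assumptions made for the example.

```python
import re

# Exact-match hnRNPA1 motif; a simplification of RBPMap's weighted scoring.
MOTIF = re.compile("UAGGG")


def to_rna(seq):
    """Upper-case a sequence and convert DNA letters to RNA."""
    return seq.upper().replace("T", "U")


def find_motif(seq):
    """Return 0-based start positions of every motif match in seq."""
    return [m.start() for m in MOTIF.finditer(to_rna(seq))]


def scan_window(transcribed_seq, exon_start, exon_end, flank=200):
    """Scan an exon (0-based, half-open) plus `flank` nt on each side.

    Returned positions are relative to the start of transcribed_seq.
    """
    lo = max(0, exon_start - flank)
    hi = min(len(transcribed_seq), exon_end + flank)
    return [lo + p for p in find_motif(transcribed_seq[lo:hi])]


# Toy sequence with one site near the "exon" and one far downstream.
seq = "A" * 10 + "TAGGG" + "C" * 10 + "UAGGG"
print(find_motif(seq))
print(scan_window(seq, exon_start=12, exon_end=14, flank=5))
```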
Datasets
Cancer gene symbols and annotations were downloaded from the COSMIC database (Forbes et al., 2015; Futreal et al., 2004). Known splice factors were annotated and obtained from published studies or Ensembl-annotated databases. Pediatric B-ALL samples from the TARGET consortium (phs000218.v19.p7) were accessed via the NCBI dbGaP (Database of Genotypes and Phenotypes) Authorized Access system. Pediatric B-ALL samples from the St Jude Children’s Research Hospital (EGAD00001002704 and EGAD00001002692) were accessed by permission from the Computational Biology Committee through the European Bioinformatics Institute (EMBL-EBI).
RESULTS
RNA-seq analysis of bone marrow-derived human B-cells
To determine if patterns of splicing dysregulation occur in B-ALL, we first generated normal B-lymphocyte datasets corresponding to potential cells of origin. To this end, we obtained from the University of Pennsylvania Stem Cell and Xenograft Core facility two healthy adult bone marrow biopsies and from the Children’s Hospital of Philadelphia (CHOP) Hematopathology Laboratory two bone marrow biopsies from children without leukemia. We enriched for mononuclear cells using Ficoll gradient separation, stained cells for combinations of stage-specific surface markers, and sorted B-cell subsets using flow cytometry. Specifically, we fractionated bone marrow progenitors into early progenitors (CD34+/CD19-/IgM-), pro-B (CD34+/CD19+/IgM-), pre-B (CD34-/CD19+/IgM-), and immature B (CD34-/CD19+/IgM+) populations (Fig. 1a top and Supp. Fig. 1a). We then extracted RNA, performed RNA-seq, and quantified transcript levels. Concordant with flow cytometric profiles, CD19 mRNA levels increased throughout differentiation stages, CD34 mRNA was confined to early progenitors and pro-B fractions, and IgM transcript was expressed only in the immature fraction (Fig. 1a, bottom). Furthermore, we performed principal component analysis (PCA) on all expression datasets and found tight clustering of the four fractions from different donors, suggesting that at the level of mRNA expression B-cell differentiation supersedes individual variations (Fig. 1b). To ensure that the clustering of fractions was not driven solely by CD19 and CD34 expression, we repeated the PCA after removing CD19 and CD34 expression contributions and observed very similar clustering of fractions (Supp. Fig. 1b).
RNA-Seq analysis of primary B-ALL samples
We obtained 24 primary pediatric B-ALL samples from the CHOP Center for Childhood Cancer Research leukemia biorepository. Mononuclear cells from fresh bone marrow or peripheral blood specimens were purified via Ficoll gradient separation and cryopreserved for downstream experimental use. We extracted total RNA and analyzed sample integrity. Based on the presence of intact RNA (evident by 28S and 18S bands on gels), an RNA integrity score >5.2, and RNA concentration of >40ng/uL, we successfully extracted high quality RNA for sequencing from 18 out of 24 frozen samples (Supp. Fig. 1c, red asterisks indicate samples that did not pass QC and were not sequenced). These samples comprised several different phenotypic and genetic B-ALL subtypes (Mullighan et al., 2007) (Supplementary Table 1 and Supp. Fig. 1d). We performed RNA-seq of these leukemia samples and first compared them to adult and pediatric normal bone marrow fractions with respect to CD19 and CD34 expression. We found that B-ALL samples closely resembled the pro-B fractions in that CD19 and CD34 transcripts were readily detectable in the low-to-medium range (Fig. 1c). To extend this analysis to the entire transcriptome, we then performed PCA on B-ALL samples and four sets of normal bone marrow fractions (two pediatric and two adult). Once again, the 18 B-ALL samples clustered most closely with the pro-B fractions (Fig. 1d). This similarity was reflected in the shortest Euclidean distance between PC1 coordinates for the centroids of the B-ALL and pro-B cell samples (Supp. Fig. 1e). Therefore, we chose the pro-B fractions as cell-of-origin controls for B-ALL splicing analysis.
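The centroid comparison used to pick the closest normal fraction can be sketched as below. The coordinates are invented one-dimensional PC1 values chosen only to make the example readable, and the function names are hypothetical; the same code accepts points with more than one principal component.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def nearest_group(query_points, reference_groups):
    """Return the reference group whose centroid is closest to the query centroid."""
    q = centroid(query_points)
    distances = {
        name: math.dist(q, centroid(pts)) for name, pts in reference_groups.items()
    }
    return min(distances, key=distances.get), distances


# Invented PC1 coordinates (one value per sample), not taken from the study.
fractions = {
    "early progenitors": [(-30.0,), (-28.0,)],
    "pro-B": [(-5.0,), (-3.0,)],
    "pre-B": [(10.0,), (14.0,)],
    "immature B": [(25.0,), (27.0,)],
}
b_all = [(-1.0,), (1.0,), (3.0,)]

closest, distances = nearest_group(b_all, fractions)
print(closest, distances[closest])
```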
Global patterns of aberrant splicing in pediatric B-ALL
To detect patterns of alternative splicing in B-ALL, we utilized the MAJIQ (Modeling Alternative Junction Inclusion Quantification) algorithm (Vaquero-Garcia et al., 2016). MAJIQ offers the ability to detect, quantify, and visualize complex splicing variations, including de-novo variations, while comparing favorably in reproducibility and false discovery rate to common alternatives (Norton et al., 2017; Vaquero-Garcia et al., 2016). Using MAJIQ, we performed 18 independent pairwise comparisons between the average of pediatric pro-B cell fractions and leukemia samples. To assess heterogeneity in splicing across samples, we measured the number of differential LSVs (minimum of 20% change in splicing, 95% CI) in each sample and compared their identities. We observed that each B-ALL specimen had >3000 LSVs but most LSVs were either unique (first bar in Fig. 2a) or shared by only a small number of B-ALL patient samples (subsequent bars in Fig. 2a). However, we also found a total of 279 aberrant LSVs in 241 genes that were detected in all of our 18 pairwise comparisons (last bar in Fig. 2a). Provocatively, when this 241-gene set was analyzed using DAVID (Database for Annotation, Visualization, and Integrated Discovery) (Dennis et al., 2003), the top gene ontology (GO) categories (Blake et al., 2013) were related to RNA splicing (Fig. 1a, inset). When we investigated individual SF transcripts, we found that of the 167 well-characterized SF genes (Supplementary Table 2, from (Sebestyen et al., 2016)), 101 were alternatively spliced in B-ALL compared to pro-B cells (Fig 2b, bottom left, proB- vs B-ALL bar chart). Moreover, these changes were highly specific to the malignant phenotype and not to states of normal B-cell differentiation (Fig 2b, upper and middle panels). For example, only 6 SF transcripts were alternatively spliced during the early progenitors-to-proB transition and only 9 during the proB-to-preB transition (Fig 2b middle).
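The sharing analysis summarized in Fig. 2a amounts to counting, for each LSV, the number of pairwise comparisons in which it was called, and then tabulating those counts. A minimal sketch follows, using three made-up samples instead of 18 and hypothetical LSV identifiers.

```python
from collections import Counter


def sharing_histogram(lsvs_per_sample):
    """Return (histogram, occurrences).

    occurrences maps each LSV to the number of samples it was called in;
    histogram maps k to the number of LSVs called in exactly k samples.
    """
    occurrences = Counter()
    for lsv_ids in lsvs_per_sample.values():
        occurrences.update(set(lsv_ids))  # count each LSV once per sample
    return Counter(occurrences.values()), occurrences


def shared_by_all(lsvs_per_sample):
    """LSVs present in every sample (the last bar of such a histogram)."""
    sets = [set(v) for v in lsvs_per_sample.values()]
    return set.intersection(*sets) if sets else set()


# Made-up calls for three samples.
samples = {
    "sample_1": ["lsv_a", "lsv_b", "lsv_c"],
    "sample_2": ["lsv_a", "lsv_b", "lsv_d"],
    "sample_3": ["lsv_a", "lsv_e"],
}
histogram, occurrences = sharing_histogram(samples)
print(sorted(histogram.items()))
print(shared_by_all(samples))
```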
For each LSV, MAJIQ provides a ∆PSI (percent-spliced-in) value to indicate changes in isoform abundance. We observed that many members of the hnRNP and SRSF families, which play key roles in exon inclusion or skipping (Matera and Wang, 2014), exhibit profound changes in ∆PSI values (Fig 2c). These events included increases in the so-called ‘poison exons’ with in-frame stop codons in several SRSF proteins (Jumaa and Nielsen, 1997). Of note, some transcripts contain multiple aberrant LSVs. For example, SRSF11 has three distinct LSVs with different corresponding start/end genomic coordinates.
To increase confidence in the detection of splice factor related LSVs in B-ALL compared to normal pediatric pro-B samples we also compared B-ALL to normal adult pro-B cells and to publicly available RNA-seq data corresponding to the CD19-positive normal bone marrow cells (Casero et al., 2015). We then looked for overlap between each of these comparisons and detected 21 out of 25 LSVs consistently altered in B-ALL when compared to normal pediatric pro-B, adult pro-B or bone marrow samples (Fig 2d).
To decouple observed changes in splicing from changes in gene expression we performed differential expression analysis on genes with frequently detected LSVs. We found that most genes containing LSVs did not have associated changes in expression, with the notable exception of hnRNPA1, which was observed to be downregulated (two-fold difference) in approximately half of the B-ALL samples (Fig 2e). This and the pervasive nature of hnRNPA1 splicing alterations (Fig 2d, right) prompted us to investigate its regulation further.
Dysregulated splicing of hnRNPA1 in B-ALL
According to the Ensembl database (Zerbino et al., 2017), there is evidence for hnRNPA1 transcripts with alternatively spliced 3’ UTRs, notably the canonical long proximal exon 11 and two shorter, distal exons (exons 12/13, ENST00000547566) (Fig. 3a, top). We observed that the exon 11-containing transcript predominated in normal pro-B cells (Fig 3b, red), but in a typical B-ALL sample (representative sample PHL50) the predominant event was the skipping of exon 11 to exons 12 and 13 (Fig. 3b, blue). To ensure validity of this event, we also ran the LeafCutter splicing algorithm to detect splice events present in the averaged RNA-seq datasets from 18 B-ALL samples and 4 pro-B samples. Confirming our MAJIQ analysis, we were able to detect alternative splicing of the hnRNPA1 3’UTR with this independent analysis (Supp. Fig. 2a). While exon 11 percent-spliced-in (PSI) values varied across leukemia samples, most leukemias had increased skipping of exon 11 compared to pro-B samples (Fig. 3c, blue). We also observed intron 10 inclusion in all pro-B and B-ALL samples, although this event was not significantly different between normal and malignant samples (Fig 3c, grey stacks).
We then validated and quantitated exon 11 inclusion by RT-qPCR using a forward primer spanning exons 9-10 and a reverse primer in exon 11 (Fig 3a,d). Using Spearman’s rho statistic we observed a strong positive association between MAJIQ predictions and RT-qPCR validations (0.61, p-value=0.00737) (Fig. 3e). We next applied the same correlation analysis to hnRNPA1 mRNA expression levels versus exon 11 inclusion and found a positive correlation between the two measurements when measured by RT-qPCR (0.6, p-value=0.00878) or RNA-seq (0.82, p-value=0.00002) (Fig. 3f). This suggested that preferential splicing from exon 10 to the distal 3’ UTR exons 12/13 (skipping exon 11) may decrease hnRNPA1 RNA steady state levels. Because the exon junction between exons 12 and 13 lies downstream of the stop codon, we speculated that this transcript might be a target of NMD. To test this, we treated Nalm6 B-ALL cells with the translation inhibitor cycloheximide to transiently inhibit NMD (Amrani et al., 2004; Carter et al., 1996; Huang et al., 2011). To show effective inhibition of NMD, we performed gel-based PCR with primers spanning SRSF3 exon 4, a canonical target of NMD (Supp Fig 2b). We then performed RT-qPCR for hnRNPA1 3’ UTR exons and observed accumulation of the transcript that skips to exon 12/13, but not of actin or the transcript including exon 11 (Fig 3g). This indicated that transcripts that skip exon 11 to exons 12/13 may indeed be targeted by NMD.
To model this event and identify potentially affected transcripts, we knocked down hnRNPA1 with siRNA in the Epstein-Barr Virus (EBV) transformed human B-lymphoblastoid P493-6 cell line (Pajic et al., 2001) and performed RNA-seq analysis. Knockdown of hnRNPA1 was confirmed by RT-qPCR (Fig 4a) and western blot (Fig 4b). Differential expression analysis showed that levels of hnRNPA1 mRNA were robustly decreased by the hnRNPA1 siRNA compared to the non-targeting siRNA control (Supp. Fig. 2c). Using MAJIQ we identified 213 LSVs in 184 genes associated with knockdown of hnRNPA1 (minimum of 20% ∆PSI, 95% CI). Of these hnRNPA1-dependent LSVs, 74 (or more than 30%) were present in at least one B-ALL sample (Fig. 4c), with more than half of these 74 LSVs altered in 10 or more samples. DAVID analysis of the genes with overlapping LSVs revealed enrichment for macromolecular metabolic pathways (Fig 4d), supporting the well-established role of hnRNP proteins in RNA metabolism (Weighardt et al., 1996). We next searched for the hnRNPA1 binding motif (Burd and Dreyfuss, 1994) in the exons and flanking intronic sequences (+/-200nt) involved in overlapping LSVs. We identified the UAGGG motif in 68 out of 74 LSVs in question. Among the genes characterized by overlapping LSVs, involvement in macromolecular metabolism, and presence of the hnRNPA1 binding motif was DICER1. Interestingly, one of the LSVs in the DICER1 gene was altered in B-ALL samples in the same manner as in the hnRNPA1 KD cells (Fig. 4f). We also confirmed the presence of this LSV using the LeafCutter algorithm (Supp. Fig. 2d). This LSV maps to the 5’UTR of the DICER1 transcript (Supp. Fig. 2e), potentially diminishing its translation efficiency. This type of deregulation would be consistent with the loss-of-function mutations and copy number variations in DICER1 reported in several types of cancer, including leukemias (Foulkes et al., 2014).
Aberrant splicing of leukemia drivers in pediatric B-ALL
We next aimed to identify alternative splicing in other genes that contribute to leukemogenesis. We retrieved all genes (141) with known somatic mutations in hematologic malignancies from the COSMIC database (v.82) (Forbes et al., 2015; Futreal et al., 2004) (Supplementary Table 3) and searched for aberrant LSVs affecting these genes in B-ALL samples using the very stringent ΔPSI>50% cutoff for exon inclusion/skipping. We identified 81 aberrant LSVs in 41 genes that were present in at least two B-ALL samples. They accurately separated leukemia samples from the non-transformed pro-B cell counterparts following hierarchical clustering using Euclidean distances (Fig. 5a). Of note, these LSVs affected roughly a third of B-ALL driver genes. Some (in genes such as FBXW7) were present in almost all B-ALL samples, attesting to their potential significance in leukemia pathogenesis. Again, to decouple changes in splicing from changes in expression we performed differential expression analysis on the COSMIC genes containing LSVs. Consistent with SF data analysis, the majority of genes in the majority of samples did not have changes in expression associated with changes in splicing (Supp. Fig. 3a).
To confirm our findings in independent datasets, we searched for aberrant LSVs in the COSMIC genes in the TARGET (Hunger et al., 2013) and St Jude Children’s Research Hospital (SJCRH) (Gu et al., 2016) B-ALL datasets using pediatric pro-B cells for comparison. We found 229 LSVs in 80 genes in the TARGET data and 362 LSVs in 95 genes in the SJCRH data (Fig 5b). Importantly, 64 of 81 LSVs identified in CHOP datasets significantly overlapped with TARGET (Fisher’s Exact Test, p=7e-124) and SJCRH datasets (Fisher’s Exact Test, p=1e-139); 13 other LSVs were present in both CHOP and SJCRH datasets, and another 2 LSVs were present in CHOP and TARGET datasets (Fig 5b). We then narrowed our analysis to LSVs in the top 20 B-ALL tumor suppressors and oncogenes as defined in the COSMIC database. In both datasets, we discovered largely overlapping LSVs corresponding to 15 out of 20 genes, such as IL7R, FLT3, and TP53. For example, we detected increased inclusion of TP53 exon 9i in B-ALL compared to pro-B (Fig. 5c, red), which would promote the expression of the SRSF3-regulated p53-β isoform (Supp. Fig. 3b) involved in cell cycle and cell death (Bourdon et al., 2005; Fujita et al., 2009; Marcel et al., 2014; Tang et al., 2013). Of note, LSV frequencies were much greater than frequencies of somatic mutation (Fig. 5d), suggesting that in B-ALL these driver genes are preferentially affected by post-transcriptional mechanisms, such as alternative splicing.
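A one-sided Fisher’s exact test for set overlap is equivalent to the upper tail of a hypergeometric distribution, which can be computed directly from binomial coefficients. The sketch below shows that calculation on small invented numbers; the size of the background universe used for the reported p-values is not stated in the text, so it appears here only as a hypothetical parameter.

```python
from math import comb


def overlap_p_value(overlap, size_a, size_b, universe):
    """One-sided Fisher's exact p-value for the overlap of two sets.

    Probability of seeing at least `overlap` shared items when `size_b`
    items are drawn without replacement from `universe` items, of which
    `size_a` belong to the first set.
    """
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented example: 4 events in set A, 3 in set B, 2 shared, 10 candidate events.
p = overlap_p_value(overlap=2, size_a=4, size_b=3, universe=10)
print(p)
```

Because Python integers have arbitrary precision, the binomial coefficients stay exact even for large universes, and only the final ratio is converted to a floating-point number.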
DISCUSSION
Personalized cancer diagnostics traditionally employ selected oncogene panels, which can identify mutations in specific genes known or suspected to be drivers in human malignancies. Hematologic malignancy sequencing panels typically include dominant oncogenes (e.g., FLT3 and IL7R), recessive tumor suppressors (e.g., TP53 and FBXW7), and haploinsufficient DNA/RNA caretakers (e.g., splicing factor SRSF2). Results from such genetic profiling of diagnostic cancer specimens can identify prognostic mutations and inform treatment selection for patients. Our data demonstrate that gene deregulation can occur in B-ALL at the level of splicing in the absence of genetic mutations. For example, our analyses of B-ALL transcriptomes demonstrated that several SRSF genes controlling exon inclusion (Matera and Wang, 2014) show widespread variations in their own splicing patterns, some of which are known to decrease protein levels (Jumaa and Nielsen, 1997). Consequently, we observed aberrant splicing of some of their known target transcripts such as TP53 (Tang et al., 2013), which encodes a key tumor suppressor. Thus, our current data show that clinical genetic testing panels may be inadequate to identify all potential therapeutic vulnerabilities within B-ALL cells. Of note, the majority of aberrant LSVs were highly concordant across different datasets. This reproducibility supports our conclusions and alleviates the potential concern that sample preparation conditions (e.g., storage at ambient temperature) could be affecting RNA surveillance and thus impacting analysis of alternative splicing (Dvinge et al., 2014).
One of the most consistent changes in exon usage we observed was non-canonical selection of 3’ UTRs of hnRNPA1, a splice factor implicated in cancer progression (David et al., 2010; Michlewski and Caceres, 2010). While it is overexpressed in some cancers, in the B-ALL model, the hnRNPA1 LSV correlated with a decrease in hnRNPA1 mRNA abundance. This is in agreement with our data suggesting that this multi-exon 3’UTR would trigger nonsense-mediated decay of the transcript. It is also possible that new 3’UTR sequences could create additional sites for targeting by microRNAs, which are known to play a key role in hnRNPA1 downregulation in chemotherapy-resistant ovarian cancer cells (Rodriguez-Aguayo et al., 2017). Interestingly, knockdown of hnRNPA1 in an Epstein-Barr Virus (EBV) transformed human B-cell lymphoblastoid cell line resulted in aberrant splicing of Dicer, a key enzyme in microRNA biogenesis (reviewed in (Sotillo and Thomas-Tikhonenko, 2011)). Beyond the hnRNPA1-Dicer axis, we identified strong (ΔPSI>50%) LSVs in ~30% of genes included in the COSMIC database because of their documented involvement in hematologic malignancies (Forbes et al., 2015; Futreal et al., 2004). Interestingly, four of these LSVs affect Drosha, another key enzyme in the microRNA biogenesis pathway (Sotillo and Thomas-Tikhonenko, 2011), suggesting a strong link between splicing and miRNA machineries.
Of even greater importance is the fact that B-ALL-specific LSVs affect 15 out of 20 top leukemia driver genes (including the aforementioned FLT3, IL7R, and TP53), with frequencies far exceeding those of somatic mutations. This discovery could explain why the prevalence of somatic mutations and copy number variations in B-ALL is low compared to other human cancers. It remains to be determined whether splicing alterations in oncogenes and tumor suppressor genes are functionally equivalent to gain-of-function and loss-of-function mutations, respectively. If so, interfering with splicing using RNA-based therapeutics and/or available small molecule inhibitors could be used to inhibit oncogenes such as FLT3 and to activate dormant tumor suppressor genes such as TP53. Such strategies could yield tangible therapeutic benefits across a broad spectrum of childhood B-ALL subtypes.
CONFLICT OF INTEREST
The authors declare that they have no competing interests.
ACKNOWLEDGMENTS
This work was supported by grants from the NIH (T32 HL007439 to KLB, T32 CA 115299 to EG, K08 CA184418 to SKT, R01 AG046544 to YB), Stand Up To Cancer - St. Baldrick’s Pediatric Dream Team Translational Research Grant (SU2C-AACR-DT1113 to ATT), William Lawrence and Blanche Hughes Foundation (2016 Research Grant ATT), St. Baldrick’s Foundation (RG 527717 to ATT), Alex’s Lemonade Stand Foundation (Innovation Award to ATT), and CURE Childhood Cancer (Basic Research Award to KLB).
B-ALL samples used in this submission were passed to investigators with a coded identifier and no protected health information (PHI). Basic demographic, treatment, relapse, and survival outcome data for delivered specimens were provided through an online honest broker system. Specimens were obtained via informed consent on institutional research protocols in accordance with the Declaration of Helsinki.
The original RNA-Seq data sets will be deposited in the NCBI GEO database (accession number pending).