Abstract
The heptad repeating sequence of the C-terminal domain (CTD) of the largest subunit of RNA polymerase II is highly conserved in eukaryotes. In yeast, a CTD code consisting of pairs of heptad repeats is essential for viability. However, the strict requirement of diheptad repeats for the CTD function in transcription and splicing is unexplained. Here we show that CoAA (gene symbol RBM14), an oncoprotein and mammalian transcriptional coactivator, possesses diheptad repeats and directly interacts with the CTD. CoAA comprises 27 copies of tyrosine-rich repeats and regulates pre-mRNA synthesis and alternative splicing. Tyrosine substitutions in either the CoAA repeats or the CTD repeats diminish their interactions. Ser2- or Ser5-phosphorylated CTD peptides exhibit higher binding affinity to CoAA than the corresponding non-phosphorylated CTD peptide. CoAA dynamically interacts with both the CTD and hnRNP M, which is an alternative splicing regulator also comprising diheptad repeats. Arginine methylation of CoAA switches its interaction from the hnRNP M repeats to the CTD repeats. This study provides a mechanism for CoAA at the interface of transcription and alternative splicing, and explains the functional requirement of diheptad repeats in the CTD. In the human genome, tyrosine-rich repeats similar to the CoAA repeats were only found in six oncoproteins including EWS and SYT. We suggest that the diheptad sequence is one of the signature features for the CTD interaction among oncoproteins involved in transcription and alternative splicing. We anticipate that direct RNA Pol II interaction is a mechanism in oncogenesis.
INTRODUCTION
The CTD of the largest subunit of RNA polymerase II (RNAP II) is required for pre-mRNA synthesis and RNA processing (1–4). This coordination requires the binding of an array of protein factors to the heptad sequence repeats (YSPTSPS) present in the CTD. Crystal structures have been determined for a number of CTD-modifying enzymes and factors involved in mRNA capping or polyadenylation (1). However, the puzzling requirement for multiple repeats in the CTD has yet to be explained. Extensive evidence supports a requirement for tyrosine or serine phosphorylation and dephosphorylation of the CTD during the transcription cycle (5–12). In addition, genetic studies in budding yeast demonstrate that pairs of heptad CTD repeats constitute the essential functional unit (13,14). Yeast survival relies on the presence of undisrupted diheptad repeats, in which a dual requirement of tyrosine and serine residues confers essential CTD activities. These observations predict that certain mammalian CTD-binding proteins will recognize its tyrosines in a diheptad pattern and its phosphorylated serines. Such CTD-binding proteins should be critical in transcription and alternative splicing.
RESULTS AND DISCUSSION
Transcriptional coactivators stimulate gene activation and recruit the RNAP II complex during transcription (15). We find here that the transcriptional coactivator CoAA (coactivator activator, gene symbol RBM14) is a direct CTD-interacting protein with tandem diheptad repeats. CoAA was initially identified as a coactivator (16) and regulates transcription-coupled pre-mRNA alternative splicing (17). The human CoAA gene is amplified in cancers with recurrent deletion of its enhancer sequence (18). This deletion causes a defect in CoAA alternative splicing and blocks stem cell differentiation in tumorigenesis (19,20). CoAA gene amplification is found in mutant immune cells in the tumor microenvironment of solid cancers associated with chronic inflammation (21).
CoAA contains two N-terminal RNA recognition motifs (RRMs) and a C-terminal activation domain containing 27 copies of a tyrosine- and glutamine-rich repetitive sequence (Fig. 1a), previously termed the YxxQ repeats (16,18). The transcriptional activity of CoAA requires its YxxQ repeats. In the mammalian genome, repetitive sequences homologous to the YxxQ repeats of CoAA are present in only six oncoproteins, including SYT and the EWS family members, based on database analysis with ScanProsite (18,22) (Supplementary Fig. S1). The sequence repeats in these oncoproteins are transcriptionally active and have been shown to be necessary for tumor initiation (21,23).
To seek proteins that interact with the intriguing YxxQ repeats of CoAA, we carried out mass spectrometry using recombinant YxxQ repeats (307-545) as bait. The most efficiently bound proteins were identified as heterogeneous ribonucleoprotein hnRNP M isoforms (Supplementary Table S1, in a separate PDF file), and immunoblotting of bound proteins also detected RNAP II (Supplementary Fig. S2). hnRNP M is known to interact with pre-mRNA and to switch pre-mRNA splicing on and off during heat shock (24–26). hnRNP M contains two N-terminal RRM domains, 27 copies of methionine-rich IERM repeats (24), and a C-terminal RRM domain (Fig. 1a and Supplementary Fig. S3). CoAA has a secondary structure related to that of hnRNP M and also belongs to the hnRNP family (16). Owing to its lower abundance, CoAA protein was not initially identified in the hnRNP protein complex by 2D gel analysis. The presence of the imperfect 27-copy repeats in both CoAA and hnRNP M is a striking shared feature in addition to their RRM domains. Importantly, pairs of heptads similar to those of the CTD were identified in both the YxxQ and the IERM repeats (Fig. 1a and Supplementary Fig. S4).
Both CoAA and hnRNP M are RRM proteins involved in transcription and alternative splicing. We next investigated their RNA-dependent in vivo associations with transcription and splicing complexes using RNAP II and the splicing regulator PSF as markers, respectively. In coimmunoprecipitation (co-IP) of nuclear lysates from cells, CoAA interacted with hnRNP M; however, their association with transcription and splicing complexes appeared to be dynamic. Specifically, CoAA is closely associated with RNAP II but not PSF, independently of RNA. hnRNP M is closely associated with PSF but not RNAP II, in an RNA-dependent manner (Fig. 1b). Consistently, CoAA but not hnRNP M is found in the complex containing phosphorylated RNAP II (Supplementary Fig. S5), and only a fraction of CoAA is colocalized with RNAP II in the nucleus (Supplementary Fig. S6). The interaction of CoAA and hnRNP M may reflect a dynamic association of transcription and splicing complexes. These results nonetheless showed that different pools of CoAA and hnRNP M exist in vivo.
To test the potential direct interactions among repeat-containing CoAA, hnRNP M, and RNAP II, we carried out in vitro pull-down studies with mutagenesis. CoAA YxxQ repeats directly interacted with hnRNP M IERM repeats in vitro (Fig. 1c). In addition, consistent with the co-IP observations, CoAA but not hnRNP M interacted with serine-phosphorylated recombinant CTD (Fig. 1d). When the CTD was mutated with alanine insertions (CTD-YA) to disrupt its diheptad feature (Fig. 1a), a lethal phenotype in yeast (14), the CTD binding to CoAA was diminished (Fig. 1d). When the CTD was mutated in a way that preserved the diheptad repeats (CTD-AA), a viable phenotype in yeast, the CTD binding to CoAA was retained but reduced. When mutations were introduced at the 27 tyrosine residues in the CoAA YxxQ repeats through gene synthesis, the alanine mutations (AxxQ) abolished the CTD binding but the phenylalanine mutations (FxxQ) did not (Fig. 1e). Together, these data indicate that the tyrosine residues in CoAA are required for the diheptad CTD interaction.
We further confirmed the direct interaction between CoAA and the CTD through peptide binding studies. Recombinant GST fusion CoAA YxxQ repeats (307-545) were coated on a 96-well plate and incubated with synthetic biotin-labeled CTD diheptad peptides. Bound peptides were detected via streptavidin conjugates, and median effective concentration (EC50) values were determined from the concentration-dependent binding curves. We compared unphosphorylated, Ser2- or Ser5-phosphorylated, and tyrosine-to-alanine mutated CTD peptides. The Ser2- and Ser5-phosphorylated diheptad peptides bound to the YxxQ repeats with EC50 values of 135 nM and 101 nM, respectively (Fig. 2a). The unphosphorylated CTD peptide bound with a much lower affinity, with an EC50 of 1066 nM. The tyrosine-to-alanine mutation of the CTD peptide essentially abolished the interaction with the YxxQ repeats; binding was not detectable (ND). As a control, the IERM peptide bound to the YxxQ repeats with an EC50 of 83 nM. The recombinant CoAA AxxQ mutant failed to bind to any of these peptides (Fig. 2a and Supplementary Fig. S7). A reciprocal binding assay using the YxxQ diheptad peptide and recombinant mutated CTD-AA and CTD-YA proteins indicated that the diheptad feature of the CTD is required for the interaction (Fig. 2b).
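To make the reported EC50 differences concrete, the following minimal sketch converts them into fractional binding at a single peptide concentration. It assumes a simple one-site hyperbolic (Hill) model with a Hill coefficient of 1, which the text does not state; the function name `fraction_bound` and the 100 nM test concentration are illustrative choices, not part of the original analysis.

```python
# Illustrative only: assumes a one-site Hill model (Hill coefficient = 1),
# which is an assumption and not the curve model reported in the study.

def fraction_bound(conc_nM, ec50_nM, hill=1.0):
    """Fraction of maximal binding at a given ligand concentration."""
    return conc_nM ** hill / (conc_nM ** hill + ec50_nM ** hill)

# EC50 values (nM) for binding to the CoAA YxxQ repeats, as reported in the text.
EC50_NM = {
    "pSer2 CTD": 135.0,
    "pSer5 CTD": 101.0,
    "unphosphorylated CTD": 1066.0,
    "IERM": 83.0,
}

TEST_CONC_NM = 100.0  # arbitrary concentration chosen for illustration

for name, ec50 in EC50_NM.items():
    print(f"{name}: fraction bound at {TEST_CONC_NM:.0f} nM = "
          f"{fraction_bound(TEST_CONC_NM, ec50):.2f}")

# Fold difference in EC50 between unphosphorylated and pSer5 peptides.
fold_pser5 = EC50_NM["unphosphorylated CTD"] / EC50_NM["pSer5 CTD"]
print(f"unphosphorylated / pSer5 EC50 ratio = {fold_pser5:.1f}")
```

Under this assumed model, a peptide at its own EC50 gives half-maximal binding, so the roughly tenfold lower EC50 of the phosphorylated peptides corresponds to substantially higher occupancy at any sub-saturating concentration.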
Furthermore, a competition assay showed that the IERM peptide was able to compete with the Ser5-phosphorylated CTD, but not with the alanine-mutated CTD, for binding to CoAA (Fig. 2c). These data collectively indicate that serine phosphorylation of the CTD is critical for high binding affinity. The tyrosine residues in the CTD, like those in CoAA, are essential for the interaction. The diheptad peptides from both repeats are sufficient as minimal binding units, consistent with the previous finding in yeast (13). In vivo, however, higher binding efficiency would be expected with longer repeats. It might also be physiologically meaningful that CoAA binds to the CTD and hnRNP M with comparable affinity, so that their binding could be interchangeable upon regulation.
The transcriptional and splicing activities of CoAA and hnRNP M were further analyzed to gain more insight into their interrelationship. CoAA is transcriptionally active and hnRNP M is not (Fig. 3a). The tyrosine-to-alanine (AxxQ) but not the tyrosine-to-phenylalanine (FxxQ) mutations in the YxxQ repeats abolished CoAA activity, which can be explained by the requirement of tyrosine residues for the CTD interaction (Fig. 1e). When individual domains were tested using the Gal4-fusion system, only the YxxQ or FxxQ repeats had potent transcriptional activity, reflecting their direct interactions with the CTD (Fig. 3b).
Using a splicing cassette of the CD44 minigene, we found that CoAA and hnRNP M counterregulate alternative splicing choices (Fig. 3c). In addition, chromatin immunoprecipitation (ChIP) analysis of the CoAA gene, known to be regulated by CoAA itself (18), showed non-overlapping chromatin binding profiles for CoAA and hnRNP M (Fig. 3d). While the chromatin occupancy of CoAA parallels that of RNAP II at the CoAA exon regions, hnRNP M showed increased chromatin binding following the decrease of CoAA. CoAA and hnRNP M potentially balance exon inclusion and skipping through their heterodimerization. Their preferential association with the RNAP II complex or the splicing complex might provide a bridging connection for transcription-coupled splicing (Supplementary Fig. S4d), in which a higher transcription rate is often linked to one splicing choice and a lower rate to another (27).
Although CoAA and hnRNP M are likely regulated at multiple levels during transcription, we examined their arginine methylation for two reasons. First, the hnRNP family proteins are extensively methylated during transcription (28–30). Second, arginine dimethylation has been shown in EWS, an RRM protein with homology to the CoAA YxxQ repeats (31) (Supplementary Fig. S1). Our results indicated that CoAA but not hnRNP M is arginine methylated by CARM1, predominantly in the regions surrounding the YxxQ repeats (Fig. 4a and Supplementary Fig. S8a). When fractionated CoAA from cell nuclei was tested for the CTD interaction upon CARM1 methylation, CoAA that co-purified with hnRNP M showed increased CTD binding, preferentially to phosphorylated CTD. In contrast, CoAA free from hnRNP M failed to bind to the CTD (Fig. 4b-c). During dose-dependent CARM1 treatment, CoAA switched binding from hnRNP M to the CTD upon arginine methylation (Fig. 4d). Our results do not exclude additional modes of regulation, including methylation by other arginine methyltransferases, but nonetheless demonstrate that CoAA binding is regulated, switching from hnRNP M to the CTD upon methylation (Fig. 4e).
The CTD heptad sequence is strongly conserved in eukaryotes. This study provides the first example to our knowledge of a CTD-binding protein with diheptad repeats. The CTD is a compact β-spiral structure and becomes relaxed or exposed upon serine phosphorylation (32). Consistent with this, CoAA recognizes the minimal CTD diheptad peptide through a pair of its tyrosine residues whose conformation is optimized by nearby Ser5 or Ser2 phosphorylation. On the other hand, the CoAA YxxQ repeats appear to require regulation, such as surrounding arginine methylations, in order to be accessible to the CTD. CoAA becomes more proteolytically sensitive upon arginine methylation as well as hnRNP M interaction (Supplementary Fig. S8b). Since all three molecules possess diheptad peptides, it remains to be determined whether hnRNP M interaction is required before CoAA binds to the CTD. In conclusion, the data together indicate that the CoAA diheptad repeats directly interact with the RNAP II CTD repeats.
Multiple tyrosine- and glutamine-rich repeats in mammalian protein databases are restricted to only a few oncoproteins, including CoAA (18). This implies a fundamentally important role for the CTD interaction in these oncoproteins, defects in which impact transcription-coupled alternative splicing. CoAA was previously shown to control stem cell differentiation at an initial stage, and amplification of the CoAA gene disrupts its own alternative splicing and blocks stem cell differentiation (18,19,33). Therefore, a defect in both stem cell regulation and RNA polymerase II interaction in oncoproteins is a conceivable mechanism in oncogenesis (21).
CONFLICT OF INTERESTS
LK is an inventor of CoAA patent. BWO receives research support and equity in Coactigon, Inc., a company designed to produce future anti-cancer drugs.
METHODS
Plasmids and Antibodies
CoAA, MMTV-luciferase and glucocorticoid receptor plasmids were previously described (16,34). hnRNP M and hnRNP M1 were isolated by RT-PCR from HeLa cells and inserted into pcDNA3. CoAA and hnRNP M fragments were inserted in-frame into the pM vector (Clontech) containing the Gal4 DNA-binding domain to produce Gal4-fusion proteins. The 5XGAL-luciferase reporter was from Stratagene. pETv5 was from Dr. Harald König (35). GST-CTD-YA and GST-CTD-AA were from Dr. John W. Stiller (13). pSG5-CARM1 was from Dr. Michael R. Stallcup. GST-CTD was from Dr. James L. Manley. The AxxQ and FxxQ mutants were generated by the substitution of 27 tyrosines with 27 alanines or phenylalanines using gene synthesis (MCLab) and verified by sequencing and Western blots. Rabbit polyclonal anti-CoAA was previously generated against the RRM domains (1-156). Commercial antibodies are as follows: anti-FLAG M2 (F-3156, Sigma); anti-PSF (P-2860, Sigma); anti-Myc (46-0603, Invitrogen); anti-hnRNP M (2A6) and anti-hnRNP M/M1 (1D8) (sc-20001, sc-20002, Santa Cruz); and anti-RNAP II CTD (8WG16, MMS-126R; H5, MMS-129R; H14, MMS-134R; Covance).
Mass Spectrometry
GST-YxxQ (10 μg) was incubated with HeLa nuclear extracts in the binding buffer (20 mM Tris-HCl [pH 7.6], 50 mM NaCl, 75 mM KCl, 1 mM EDTA, 10% glycerol, 0.1% Triton X-100, 1 mM DTT and protease inhibitors). Bound proteins excised from preparative SDS-PAGE were washed with 50% acetonitrile and subjected to mass spectrometry analysis at the Harvard Microchemistry Facility. Sequence analysis included proteolytic digestion, microcapillary HPLC nano-electrospray tandem ion trap mass spectrometry, and MS/MS peptide sequence determination. Identifications were made with the Sequest algorithm.
Recombinant Protein Binding Assays
In vitro binding assays were performed by incubating GST-YxxQ resin (20 μl, 2 μg) with 35S-methionine-labeled, in vitro translated hnRNP M protein fragments (5 μl) produced in rabbit reticulocyte lysate using the TNT Quick Coupled Transcription/Translation System (Promega). Proteins were incubated at 4°C for 2 hours in the binding buffer above. Bound proteins were washed 3 times with the binding buffer and subjected to SDS-PAGE and autoradiography. For CTD phosphorylation in vitro, GST fusion proteins of the CTD and its mutants CTD-AA and CTD-YA were phosphorylated by Cdc2 kinase (NEB) using 32P-γ-ATP for 30 minutes at 30°C before incubation with immunoprecipitated CoAA or hnRNP M. Partially degraded GST-CTD protein fragments could not be phosphorylated but remained present in the system. Bound proteins were detected by autoradiography.
Western Blotting and Co-immunoprecipitation (Co-IP)
Nuclear extracts were isolated by incubating cells in buffer A (20 mM HEPES, pH 7.4, 10 mM KCl, 1 mM EDTA, 1 mM EGTA, 0.1% Triton X-100, 1 mM DTT) with addition of leupeptin, aprotinin and trypsin inhibitor at 10 mg/ml for 15 min on ice. The cell pellets were then extracted in buffer B (20 mM HEPES, pH 7.4, 420 mM NaCl, 10 mM KCl, 1 mM EDTA, 1 mM EGTA, 0.5 mM MgCl2, 1 mM DTT) with protease inhibitors for 30 min. Nuclear extracts were further filtered using a 0.65 μm pore size spin column (Millipore) to completely remove insoluble cellular debris. For coimmunoprecipitation, 5 μl of antibodies were captured by Protein A/G agarose (Santa Cruz) for IgG or by Immobilized Protein L agarose (Pierce) for IgM. The immune complexes were washed before incubation with 1:10 diluted nuclear extracts in the binding buffer. The precipitates were washed and subjected to Western blotting analysis using appropriate primary antibodies. The blots were detected with the ECL system (Amersham Pharmacia).
Peptide Binding Assays
White Microlite 2+ 96-well plates were coated overnight at 37°C with 500 ng/well of GST-YxxQ protein (307-545), GST-CTD-AA, or GST-CTD-YA (13) in the presence of 5 μg/well of BSA. The coated plates were blocked with 5% BSA in 0.01 M Tris-HCl/0.15 M NaCl for 1 hour. The wells were then incubated overnight in duplicate with biotin-labeled CTD, IERM or YxxQ peptides at increasing concentrations of 6.4, 32, 160, 800, 4000, and 20000 ng/ml. Competition assays were performed using increasing concentrations of unlabeled IERM peptide at 12.8, 64, 320, 1600, 8000, and 40000 ng/ml and a constant concentration of biotin-labeled CTD peptides at 104 ng/ml. The plates were quickly washed 4 times with the binding buffer (20 mM HEPES, pH 7.4, 50 mM NaCl, 75 mM KCl, 1 mM EDTA, 0.05% Triton X-100, 10% glycerol, 1 mM DTT), and dried again overnight to prevent disruption of the binding equilibrium during later washing steps. Bound biotin peptides were then detected by incubating with 1 U/ml of HRP-conjugated streptavidin (Roche) on ice for 45 min in PBS/0.1% Triton X-100. ECL detection reagents (50 μl/well) were added to the washed plates, which were read on a Dynex luminometer for 3 seconds. EC50 and IC50 values were determined by sigmoidal non-linear regression curve fitting in Schild plots using the KaleidaGraph program. Peptide sequences: CTD, SYSPTSPSYSPTSPS; pSer2 CTD, SYpSPTSPSYpSPTSPS; pSer5 CTD, SYSPTpSPSYSPTpSPS; Tyr/Ala CTD, SASPTSPSASPTSPS; IERM, TIERMGSGVERMGPA; YxxQ, SYGAQAASYGAQSAA. In biotin-labeled peptides, two lysine residues and a C6 spacer (aminocaproic acid) are inserted between biotin and the peptide in the format: [Biotin]-KK-C6-peptide.
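Both concentration series listed above are six-point, five-fold dilution series. As a minimal sketch of how such a plate layout could be generated and checked, the helper below reproduces the two series; the function name `dilution_series` is illustrative, and rounding to one decimal place is only there to suppress floating-point noise.

```python
def dilution_series(start, factor, steps):
    """Return `steps` concentrations starting at `start`,
    each `factor` times the previous one."""
    return [round(start * factor ** i, 1) for i in range(steps)]

# Labeled-peptide binding series (ng/ml), starting at 6.4 ng/ml.
binding_ng_ml = dilution_series(6.4, 5, 6)

# Unlabeled IERM competitor series (ng/ml), starting at 12.8 ng/ml.
competition_ng_ml = dilution_series(12.8, 5, 6)

print("binding:", binding_ng_ml)
print("competition:", competition_ng_ml)
```

Each competitor concentration is exactly twice the corresponding labeled-peptide concentration, which can be confirmed by comparing the two lists element by element.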
Cell Culture and Transient Transfection
CV1 cells were maintained in DMEM supplemented with 10% fetal bovine serum and 5 μg/μl penicillin/streptomycin in 5% CO2 at 37 °C. In GAL4 reporter assays, cells in 24-well plates were transfected in triplicate with the 5XGAL-luciferase reporter (100 ng) and GAL4 fusion plasmids (200 ng) using Lipofectin (Life Technologies, Inc.). In MMTV reporter assays, cells were transfected with MMTV-luciferase (100 ng), glucocorticoid receptor (10 ng), and CoAA (200 ng) or hnRNP M (200 ng) plasmids per well and induced with dexamethasone (Dex) (100 nM) for an additional 16 hours after transfection. Luciferase activities were measured with a Dynex luminometer. Data are shown as means of triplicate transfections ± standard errors.
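The summary statistic described above (mean of triplicate transfections ± standard error) can be sketched as follows. The luciferase readings are hypothetical placeholder values, not data from this study, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of replicates.

```python
from math import sqrt
from statistics import mean, stdev

def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate readings.
    SEM is taken as sample standard deviation / sqrt(n)."""
    n = len(values)
    return mean(values), stdev(values) / sqrt(n)

# Hypothetical luciferase readings from one triplicate transfection.
readings = [1200.0, 1350.0, 1275.0]

avg, sem = mean_sem(readings)
print(f"{avg:.1f} ± {sem:.1f} relative light units")
```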
Alternative Splicing Analysis
The pETv5 alternative splicing reporter contains the CD44 variable exon 5 and its adjacent intron sequences inserted between pre-proinsulin exons 2 and 3 (36). The detecting primers on the pre-proinsulin exons distinguish the minigene from the endogenous transcripts. 293 cells were transiently transfected with the pETv5 minigene driven by the RSV promoter, together with CoAA or hnRNP M expression vectors or with their siRNA (100 nM) as indicated. Total RNA from cells was prepared with Trizol reagent (Invitrogen), treated with DNase I, and analyzed by RT-PCR. RT-PCR primers are as follows: sense, AGTGGATCCGCTTCCTGCCCC; antisense, CTGCCGGGCCACCTCCAGTGCC. The target sequence of siRNA (Dharmacon) is 5’-AGAUUAUCCAUGCAUUACA-3’ for hnRNP M, 5’-GUAACCAGCCAUCCUCUUA-3’ for CoAA, and 5’-UAGCGACUAAACACAUCAA-3’ for the control.
Chromatin Immunoprecipitation (ChIP)
HeLa cells were incubated with 1% formaldehyde for 10 min to crosslink proteins and DNA, and the reaction was stopped by 125 mM glycine. Cells were lysed and sonicated in buffer containing 20 mM Tris pH 8.0, 75 mM NaCl, 75 mM KCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 10% glycerol, 1 mM DTT, and protease inhibitors. Immunoprecipitation was carried out using salmon sperm DNA-blocked protein A/G resin (Upstate) and individual antibodies. The crosslinking was reversed by eluting with 0.1 M NaHCO3, 1% SDS, 0.3 M NaCl at 65°C for 4 hours. Purified DNA (Qiagen kit) was subjected to real-time PCR analysis. Primer pairs used on the CoAA gene are listed in Supplementary Table S2.
Protein Chromatography
RNAP II was partially purified from HeLa nuclear extracts using Sephacryl S-200 HR 10/30 gel filtration column with buffer A (20 mM Tris pH 8.0, 1 mM EDTA, and 1 mM DTT) containing 150 mM NaCl (Supplementary Fig. S8c). CoAA or desalted RNAP II fractions were further purified by a Mono Q column (5 ml) equilibrated with buffer A containing 10 mM NaCl and step eluted with buffer A containing 0.1-0.8 M NaCl. Fractions were analyzed by Western blotting using anti-CTD 8WG16 (1:200), H5 (1:200), CoAA (1:200) or hnRNP M (1:500) antibodies. Mono Q 0.4-0.5 M NaCl fractions containing RNAP II were concentrated before use. CoAA was purified using Mono Q column only.
Arginine Methylation by CARM1
GST fusion proteins of CoAA and hnRNP M fragments were arginine methylated in vitro by baculovirus-expressed CARM1 (0.1 μg) in a 20 μl methylation reaction (20 mM Tris pH 8.0, 1 mM EDTA, and 200 mM NaCl) containing 1 μCi S-adenosyl-L-[methyl-3H] methionine (AdoMet, Amersham Bioscience) at 30°C for 60 min (37). The SDS gel containing labeled protein was stained with Coomassie blue and soaked in Amplify solution for 15 min before fluorography (Supplementary Fig. S8a). In methylation-induced binding assays, CoAA from partially purified fractions was immunoprecipitated and washed three times with the binding buffer before methylation by CARM1. Methylation of immunoprecipitated CoAA was performed under the same methylation conditions as above, except that cold AdoMet (0.25 mM, Sigma) was used for 20 min. Methylated CoAA captured on beads was further incubated with purified RNAP II and analyzed by coimmunoprecipitation.
Sequence Repeat Analysis
The uniqueness of the YxxQ repeats of CoAA and the IERM repeats of hnRNP M in the human genome was analyzed with ScanProsite at the ExPASy Proteomics Server (http://ca.expasy.org), using the Swiss-Prot (release 44.3; 156998 entries) and TrEMBL (release 27.3; 1379120 entries) protein databases. The diheptad sequence patterns in CoAA and hnRNP M were revealed by the RADAR program (Rapid Automatic Detection and Alignment of Repeats; www.ebi.ac.uk/Tools/Radar).
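As a simple illustration of the diheptad periodicity, and not a reimplementation of RADAR or ScanProsite, the sketch below measures the spacing between recurring signature residues in the synthetic diheptad peptides listed under Peptide Binding Assays. Tracking tyrosine for the CTD and YxxQ peptides and methionine for the IERM peptide is an assumption made here for illustration; the function name `residue_spacings` is hypothetical.

```python
def residue_spacings(seq, residue):
    """Distances between consecutive occurrences of `residue` in `seq`."""
    positions = [i for i, aa in enumerate(seq) if aa == residue]
    return [b - a for a, b in zip(positions, positions[1:])]

# Peptide sequences from the Methods; the tracked residue is an assumption.
PEPTIDES = {
    "CTD": ("SYSPTSPSYSPTSPS", "Y"),
    "Tyr/Ala CTD": ("SASPTSPSASPTSPS", "Y"),
    "IERM": ("TIERMGSGVERMGPA", "M"),
    "YxxQ": ("SYGAQAASYGAQSAA", "Y"),
}

for name, (seq, residue) in PEPTIDES.items():
    spacings = residue_spacings(seq, residue)
    print(f"{name}: {residue} spacings = {spacings}")
```

In the wild-type CTD, IERM and YxxQ peptides the tracked residue recurs at a seven-residue interval, whereas the Tyr/Ala CTD peptide contains no tyrosine and therefore yields no spacing.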
ACKNOWLEDGEMENTS
We thank Dr. Dorothy Tuan for critical suggestions, and Dr. Yun Kyoung Kang for help. We thank Dr. John W. Stiller and Dr. Pengda Liu for providing CTD-AA and CTD-YA, Dr. Harald König for providing pET-v5 minigene, Dr. Michael Stallcup for providing CARM1, and Dr. James L. Manley for providing the GST-CTD construct. This work was supported in part by the Georgia Cancer Coalition (L.K.), the NIH (W.X.) and the NIH NIDDK (B.W.O.).