Oncoprotein CoAA repeats interact with RNA polymerase II CTD repeats

The heptad repeating sequence of the C-terminal domain (CTD) of the largest subunit of RNA polymerase II is highly conserved in eukaryotes. In yeast, a CTD code consisting of pairs of heptad repeats is essential for viability. However, the strict requirement of diheptad repeats for the CTD function in transcription and splicing is unexplained. Here we show that CoAA (gene symbol RBM14), an oncoprotein and mammalian transcriptional coactivator, possesses diheptad repeats and directly interacts with the CTD. CoAA comprises 27 copies of tyrosine-rich repeats and regulates pre-mRNA synthesis and alternative splicing. Tyrosine substitutions in either the CoAA repeats or the CTD repeats diminish their interactions. Ser2- or Ser5-phosphorylated CTD peptides exhibit higher binding affinity to CoAA than the corresponding non-phosphorylated CTD peptide. CoAA dynamically interacts with both the CTD and hnRNP M, which is an alternative splicing regulator also comprising diheptad repeats. Arginine methylation of CoAA switches its interaction from the hnRNP M repeats to the CTD repeats. This study provides a mechanism for CoAA at the interface of transcription and alternative splicing, and explains the functional requirement of diheptad repeats in the CTD. In the human genome, tyrosine-rich repeats similar to the CoAA repeats were only found in six oncoproteins including EWS and SYT. We suggest that the diheptad sequence is one of the signature features for the CTD interaction among oncoproteins involved in transcription and alternative splicing. We anticipate that direct RNA Pol II interaction is a mechanism in oncogenesis.


INTRODUCTION
The CTD of the largest subunit of RNA polymerase II (RNAP II) is required for pre-mRNA synthesis and RNA processing (1)(2)(3)(4). This coordination requires the binding of an array of protein factors to the heptad sequence repeats (YSPTSPS) present in the CTD. Crystal structures have been determined for a number of CTD-modifying enzymes and factors involved in mRNA capping or polyadenylation (1). However, the puzzling requirement for multiple repeats in the CTD has yet to be explained. Extensive evidence supports a requirement for tyrosine or serine phosphorylation and dephosphorylation of the CTD during the transcription cycle (5)(6)(7)(8)(9)(10)(11)(12). In addition, genetic studies in budding yeast demonstrate that pairs of heptad CTD repeats constitute the essential functional unit (13,14). Yeast survival relies on the presence of undisrupted diheptad repeats, in which a dual requirement of tyrosine and serine residues confers essential CTD activities. These observations predict that certain mammalian CTD-binding proteins will recognize the CTD tyrosines in a diheptad pattern together with its phosphorylated serines. Such CTD-binding proteins are expected to be critical in transcription and alternative splicing.

RESULTS AND DISCUSSION
Transcriptional coactivators stimulate gene activation and recruit the RNAP II complex during transcription (15). We find here that the transcriptional coactivator CoAA (coactivator activator, gene symbol RBM14) is a direct CTD-interacting protein with tandem diheptad repeats. CoAA was initially identified as a coactivator (16) and regulates transcription-coupled pre-mRNA alternative splicing (17). The human CoAA gene is amplified in cancers with recurrent deletion of its enhancer sequence (18). This deletion causes a defect in CoAA alternative splicing and blocks stem cell differentiation in tumorigenesis (19,20). CoAA gene amplification is found in mutant immune cells in the tumor microenvironment of solid cancers associated with chronic inflammation (21).
CoAA contains two N-terminal RNA recognition motifs (RRMs) and a C-terminal activation domain containing 27 copies of a tyrosine- and glutamine-rich repetitive sequence (Fig. 1a), previously termed the YxxQ repeats (16,18). The transcriptional activity of CoAA requires its YxxQ repeats. In the mammalian genome, repetitive sequences homologous to the YxxQ repeats of CoAA are present in only six oncoproteins, including SYT and the EWS family members, based on database analysis at ScanProsite (18,22) (Supplementary Fig. S1). The sequence repeats in these oncoproteins are transcriptionally active and have been shown to be necessary for tumor initiation (21,23).
To seek proteins that interact with the intriguing YxxQ repeats of CoAA, we carried out mass spectrometry using recombinant YxxQ repeats (307-545) as bait. The most efficiently bound proteins were identified as heterogeneous ribonucleoprotein hnRNP M isoforms (Supplementary Table S1, in a separate PDF file), and immunoblotting of bound proteins also detected RNAP II (Supplementary Fig. S2). hnRNP M is known to interact with pre-mRNA and to switch pre-mRNA splicing on and off during heat shock (24)(25)(26). hnRNP M contains two N-terminal RRM domains, 27 copies of methionine-rich IERM repeats (24), and a C-terminal RRM domain (Fig. 1a and Supplementary Fig. S3). CoAA shares a related secondary structure with hnRNP M and also belongs to the hnRNP family (16). Because of its lower abundance, CoAA was not initially identified in the hnRNP protein complex by 2D gel analysis. The presence of imperfect 27-copy repeats in both CoAA and hnRNP M is a striking shared feature in addition to their RRM domains. Importantly, pairs of heptads similar to those of the CTD were identified in both the YxxQ and the IERM repeats (Fig. 1a and Supplementary Fig. S4).
Both CoAA and hnRNP M are RRM proteins involved in transcription and alternative splicing. We next investigated their RNA-dependent in vivo associations with transcription and splicing complexes using RNAP II and the splicing regulator PSF as markers, respectively. In coimmunoprecipitation (co-IP) of nuclear lysates, CoAA interacted with hnRNP M; however, their association with transcription and splicing complexes appeared to be dynamic. Specifically, CoAA is closely associated with RNAP II but not PSF, independent of RNA, whereas hnRNP M is closely associated with PSF but not RNAP II, dependent on RNA (Fig. 1b). Consistently, CoAA but not hnRNP M is found in the complex containing phosphorylated RNAP II (Supplementary Fig. S5), and only a fraction of CoAA is colocalized with RNAP II in the nucleus (Supplementary Fig. S6). The interaction of CoAA and hnRNP M may reflect a dynamic association of transcription and splicing complexes. These results nonetheless show that different pools of CoAA and hnRNP M exist in vivo.
To test the potential direct interactions among repeat-containing CoAA, hnRNP M, and RNAP II, we carried out in vitro pull-down studies with mutagenesis. CoAA YxxQ repeats directly interacted with hnRNP M IERM repeats in vitro (Fig. 1c). In addition, consistent with the co-IP observations, CoAA but not hnRNP M interacted with serine-phosphorylated recombinant CTD (Fig. 1d). When the CTD was mutated with alanine insertions that disrupt its diheptad feature (CTD-YA; Fig. 1a), a lethal phenotype in yeast (14), CTD binding to CoAA was diminished (Fig. 1d). When the CTD was mutated in a way that preserves the diheptad repeats (CTD-AA), a viable phenotype in yeast, CTD binding to CoAA was retained but reduced. When mutations were introduced at the 27 tyrosine residues in the CoAA YxxQ repeats through gene synthesis, alanine mutations (AxxQ) abolished CTD binding but phenylalanine mutations (FxxQ) did not (Fig. 1e). Together, these data indicate that the tyrosine residues in CoAA are required for the interaction with the diheptad CTD.
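The logic of disrupting versus preserving the diheptad feature can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a description of the actual constructs: the consensus heptad YSPTSPS is taken from the Introduction, whereas the number of heptads (8), the insertion positions (one alanine after every heptad versus one after every second heptad), and all function names are hypothetical choices made here.

```python
HEPTAD = "YSPTSPS"  # consensus CTD heptad given in the Introduction


def build_ctd(n_heptads, insert_every=None):
    """Concatenate consensus heptads, optionally inserting one alanine
    after every `insert_every` heptads (hypothetical insertion scheme)."""
    parts = []
    for i in range(1, n_heptads + 1):
        parts.append(HEPTAD)
        if insert_every and i % insert_every == 0 and i < n_heptads:
            parts.append("A")
    return "".join(parts)


def count_intact_diheptads(seq):
    """Count positions where two consensus heptads occur back to back."""
    unit = HEPTAD * 2
    return sum(
        1 for i in range(len(seq) - len(unit) + 1) if seq.startswith(unit, i)
    )


# Hypothetical 8-heptad model sequences.
wild_type = build_ctd(8)
disrupted = build_ctd(8, insert_every=1)  # alanine between all heptads
preserved = build_ctd(8, insert_every=2)  # alanine between heptad pairs

for label, seq in [
    ("wild type", wild_type),
    ("insertion after every heptad", disrupted),
    ("insertion after every heptad pair", preserved),
]:
    print(f"{label}: {count_intact_diheptads(seq)} intact diheptads")
```

In this toy model, an insertion after every heptad leaves no intact diheptad, while an insertion after every second heptad keeps each pair intact, which mirrors the distinction drawn in the text between a disrupted and a preserved diheptad feature.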
We further confirmed the direct interaction between CoAA and the CTD through peptide binding studies. Recombinant GST fusion CoAA YxxQ repeats (307-545) were coated on a 96-well plate and incubated with synthetic biotin-labeled CTD diheptad peptides. Bound peptides were detected via streptavidin conjugates, and the median effective concentration (EC50) was derived from the resulting binding curves. We compared unphosphorylated, Ser2- or Ser5-phosphorylated, and tyrosine-to-alanine mutated CTD peptides. The Ser2- and Ser5-phosphorylated diheptad peptides bound to the YxxQ repeats with EC50 values of 135 nM and 101 nM, respectively (Fig. 2a). The unphosphorylated CTD peptide bound with a much lower affinity, with an EC50 of 1066 nM. The tyrosine-to-alanine mutant CTD peptide essentially abolished the interaction with the YxxQ repeats; binding was not detectable (ND). As a control, the IERM peptide bound to the YxxQ repeats with an EC50 of 83 nM. The recombinant CoAA AxxQ mutant failed to bind to any of these peptides (Fig. 2a and Supplementary Fig. S7). A reciprocal binding assay using the YxxQ diheptad peptide and recombinant mutated CTD-AA and CTD-YA proteins indicated that the diheptad feature of the CTD is required for the interaction (Fig. 2b).
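As a rough illustration of how an EC50 can be extracted from a plate-binding curve, the sketch below fits a one-site saturation model by least squares. This is a minimal sketch under assumptions, not the analysis used in the study: the one-site model, the golden-section search, the synthetic concentration series, and all function names are hypothetical; only the EC50 values in `REPORTED_EC50_NM` come from the text.

```python
import math


def fraction_bound(conc_nm, ec50_nm):
    """One-site model: fraction of maximal signal at a peptide concentration."""
    return conc_nm / (conc_nm + ec50_nm)


def fit_ec50(concs_nm, fractions, lo_nm=1.0, hi_nm=1e5, iterations=200):
    """Least-squares EC50 via golden-section search over log10(EC50)."""

    def sse(log_ec50):
        ec50 = 10.0 ** log_ec50
        return sum(
            (f - fraction_bound(c, ec50)) ** 2
            for c, f in zip(concs_nm, fractions)
        )

    a, b = math.log10(lo_nm), math.log10(hi_nm)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


# EC50 values reported in the text (nM).
REPORTED_EC50_NM = {
    "Ser2-P CTD": 135.0,
    "Ser5-P CTD": 101.0,
    "unphosphorylated CTD": 1066.0,
    "IERM": 83.0,
}

# Synthetic, noise-free curve generated from the model (hypothetical doses).
doses_nm = [10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0]
synthetic = [fraction_bound(c, REPORTED_EC50_NM["Ser5-P CTD"]) for c in doses_nm]
print(f"recovered EC50: {fit_ec50(doses_nm, synthetic):.1f} nM")

# Fold difference in EC50 relative to the unphosphorylated peptide.
for name in ("Ser2-P CTD", "Ser5-P CTD"):
    fold = REPORTED_EC50_NM["unphosphorylated CTD"] / REPORTED_EC50_NM[name]
    print(f"{name}: {fold:.1f}-fold lower EC50 than unphosphorylated CTD")
```

With real plate data, the measured signal would first need background subtraction and normalization to the plateau, and a fitted maximum or Hill coefficient may be more appropriate than the fixed one-site form assumed here.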
Furthermore, a competition assay showed that the IERM peptide was able to compete with the Ser5-phosphorylated CTD peptide, but not with the alanine-mutated CTD peptide, for binding to CoAA (Fig. 2c). These data collectively indicate that serine phosphorylation of the CTD is critical for high binding affinity. The tyrosine residues in the CTD, like those in CoAA, are essential for the interaction. The diheptad peptides from both repeats are sufficient as minimal binding units, consistent with the previous finding in yeast (13). In vivo, however, higher binding efficiency would be expected with longer repeats. It might also be physiologically meaningful that CoAA binds to the CTD and hnRNP M with comparable affinity, so that the two interactions could be interchangeable upon regulation.
The transcriptional and splicing activities of CoAA and hnRNP M were further analyzed to gain more insight into their interrelationship. CoAA is transcriptionally active and hnRNP M is not (Fig. 3a). Tyrosine-to-alanine (AxxQ) but not tyrosine-to-phenylalanine (FxxQ) mutations in the YxxQ repeats abolished CoAA activity, which can be explained by the requirement for tyrosine residues in the CTD interaction (Fig. 1e). When individual domains were tested in a Gal4-fusion system, only the YxxQ or FxxQ repeats had potent transcriptional activity, reflecting their direct interactions with the CTD (Fig. 3b).
Using a splicing cassette of the CD44 minigene, we found that CoAA and hnRNP M counter-regulate alternative splicing choices (Fig. 3c). In addition, chromatin immunoprecipitation (ChIP) analysis of the CoAA gene, known to be regulated by CoAA itself (18), showed non-overlapping chromatin binding profiles for CoAA and hnRNP M (Fig. 3d). While the chromatin occupancy of CoAA parallels that of RNAP II at the CoAA exon regions, hnRNP M showed increased chromatin binding following the decrease of CoAA. CoAA and hnRNP M potentially balance exon inclusion and skipping through their heterodimerization. Their preferential association with the RNAP II complex or the splicing complex might provide a bridging connection for transcription-coupled splicing (Supplementary Fig. S4d), in which a higher transcription rate is often linked to one splicing choice and a lower rate to another (27).
Although CoAA and hnRNP M are likely regulated at multiple levels during transcription, we examined their arginine methylation for two reasons. First, the hnRNP family proteins are extensively methylated during transcription (28)(29)(30). Second, arginine dimethylation has been shown in EWS, an RRM protein with homology to the CoAA YxxQ repeats (31) (Supplementary Fig. S1). Our results indicated that CoAA but not hnRNP M is arginine methylated by CARM1, predominantly at the regions surrounding the YxxQ repeats (Fig. 4a and Supplementary Fig. S8a). When fractionated CoAA from cell nuclei was tested for the CTD interaction upon CARM1 methylation, CoAA that co-purified with hnRNP M showed increased CTD binding, preferentially to phosphorylated CTD. In contrast, CoAA free from hnRNP M failed to bind to the CTD (Fig. 4b-c). During dose-dependent CARM1 treatment, CoAA switched binding from hnRNP M to the CTD upon arginine methylation (Fig. 4d). Our results do not exclude additional modes of regulation, including methylation by other arginine methyltransferases, but nonetheless demonstrate that CoAA binding is regulated and switches from hnRNP M to the CTD upon methylation (Fig. 4e).
The CTD heptad sequence is strongly conserved in eukaryotes. This study provides the first example, to our knowledge, of a CTD-binding protein with diheptad repeats. The CTD is a compact β-spiral structure and becomes relaxed or exposed upon serine phosphorylation (32). Consistent with this, CoAA recognizes the minimal CTD diheptad peptide through a pair of its tyrosine residues whose conformation is optimized by nearby Ser5 or Ser2 phosphorylation. On the other hand, the CoAA YxxQ repeats appear to require regulation, such as surrounding arginine methylation, in order to be accessible to the CTD. CoAA becomes more proteolytically sensitive upon arginine methylation as well as upon hnRNP M interaction (Supplementary Fig. S8b). Since all three molecules possess diheptad peptides, it remains to be determined whether interaction with hnRNP M is required before CoAA binds to the CTD. In conclusion, this study indicates that the CoAA diheptad repeats directly interact with the RNAP II CTD repeats.
The multiple tyrosine- and glutamine-rich repeats in mammalian protein databases are restricted to only a few oncoproteins, including CoAA (18). This suggests that CTD interaction is a fundamentally important function of these oncoproteins and that its disruption affects transcription-coupled alternative splicing. CoAA was previously shown to control stem cell differentiation at the initial stage, and amplification of the CoAA gene disrupts its own alternative splicing and blocks stem cell differentiation (18,19,33). Therefore, a defect in both stem cell regulation and RNA polymerase II interaction in oncoproteins is a conceivable mechanism in oncogenesis (21).

CTD-AA and CTD-YA, Dr. Harald König for providing pET-v5 minigene, Dr. Michael Stallcup for providing CARM1, and Dr. James L. Manley for providing the GST-CTD construct. This work was supported in part by the Georgia Cancer Coalition (L.K.), the NIH (W.X.) and the NIH NIDDK (B.W.O.).

CONFLICT OF INTERESTS
LK is an inventor of CoAA patent. BWO receives research support and equity in Coactigon, Inc., a company designed to produce future anti-cancer drugs.

Supplementary Figure S7. In vitro binding of diheptad peptides. a-c, The data sets in Fig. 2 were analyzed for the percentage of peptide bound as a function of peptide concentration. d, Diheptad peptide sequences, peptide molecular weight, and determined EC50 and IC50 values are listed with color corresponding to graphs. ND indicates the binding affinity below detectable level. In biotin-labeled peptides, two lysine residues and a C6 spacer (aminocaproic acid) are inserted between biotin and peptides in the format: [Biotin]-KK-C6-peptide. e, Coomassie blue staining of recombinant GST fusion proteins used in the assays.
Supplementary Figure S8. Arginine methylation of CoAA by CARM1. a, CoAA but not hnRNP M is methylated by CARM1. GST fusion proteins of CoAA or hnRNP M were tested for arginine methylation using CARM1 methyltransferase and labeled with [3H]-AdoMet in vitro using Histone H3 as a positive control. Auto-methylated CARM1 is indicated by asterisks. Same gel was fluorographed for methylation and Coomassie blue stained for protein loading. Exposure time was either 1 (shown in Fig. 4a) or 8 days as indicated. Positive bands were indicated by open circles. Degraded GST fragments were not methylated. b, Arginine methylation by CARM1 and hnRNP M interaction increase proteolytic sensitivity of CoAA. CoAA from the Mono Q fraction of 0.3 M NaCl (0.1 mg/ml) was either CARM1 methylated or incubated with His-tagged recombinant hnRNP M (0.02 mg/ml) before being subjected to proteolysis with increasing amounts of trypsin (1, 0.8, 4, 20, 100, 500 ng/ml). Proteolytic products were analyzed by anti-CoAA Western blot. c, Partial purification of RNAP II. HeLa nuclear extract was fractionated by gel filtration using a Sephacryl S-200 HR 10/30 column. Eluted fractions at 10-12 ml were collected for further Mono Q ion exchange purification. Each fraction was blotted with combined anti-CTD antibodies 8WG16 (1:200) and H5 (1:200).

Mass Spectrometry
GST-YxxQ (10 μg) was incubated with HeLa nuclear extracts in the binding buffer (20 mM Tris-HCl [pH 7.6], 50 mM NaCl, 75 mM KCl, 1 mM EDTA, 10% glycerol, 0.1% Triton X-100, 1 mM DTT and protease inhibitors). Bound proteins excised from preparative SDS-PAGE were washed with 50% acetonitrile and subjected to mass spectrometry analysis at the Harvard Microchemistry Facility. Sequence analysis included proteolytic digestion, microcapillary HPLC nano-electrospray tandem ion trap mass spectrometry, and MS/MS peptide sequence determination. Identifications were made with the Sequest algorithm.

Recombinant Protein Binding Assays
In vitro binding assays were performed by incubating GST-YxxQ resin (20 μl, 2 μg) with 35S-methionine-labeled, in vitro translated hnRNP M protein fragments (5 μl) produced in rabbit reticulocyte lysate using TNT Quick Coupled Transcription/Translation Systems (Promega). Proteins were incubated at 4 °C for 2 hours in the binding buffer above. Bound proteins were washed 3 times with the binding buffer and subjected to SDS-PAGE and autoradiography. For CTD phosphorylation in vitro, GST fusion proteins of the CTD and its mutants CTD-AA and CTD-YA were phosphorylated by Cdc2 kinase (NEB) using [γ-32P]ATP for 30 minutes at 30 °C before incubation with immunoprecipitated CoAA or hnRNP M. Partially degraded GST-CTD protein fragments could not be phosphorylated but remained present in the system. Bound proteins were detected by autoradiography.

Western Blotting and Co-immunoprecipitation (Co-IP)
Nuclear extracts were isolated by incubating cells in buffer A (20 mM HEPES, pH 7.4, 10 mM KCl, 1 mM EDTA, 1 mM EGTA, 0.1% Triton X-100, 1 mM DTT) with addition of leupeptin, aprotinin and trypsin inhibitor at 10 mg/ml for 15 min on ice. The cell pellets were then extracted in buffer B (20 mM HEPES, pH 7.4, 420 mM NaCl, 10 mM KCl, 1 mM EDTA, 1 mM EGTA, 0.5 mM MgCl2, 1 mM DTT) with protease inhibitors for 30 min. Nuclear extracts were further filtered using a 0.65 μm pore size spin column (Millipore) to completely remove insoluble cellular debris. For coimmunoprecipitation, 5 μl of antibodies were captured by Protein A/G agarose (Santa Cruz) for IgG or by Immobilized Protein L agarose (Pierce) for IgM. The immune complexes were washed before incubation with 1:10 diluted nuclear extracts in the binding buffer. The precipitates were washed and subjected to Western blotting analysis using appropriate primary antibodies. The blots were detected with the ECL system (Amersham Pharmacia).
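For readers preparing the extraction buffer, the following Python sketch converts the final concentrations of buffer B listed above into stock volumes using C1·V1 = C2·V2. It is a convenience sketch only: the final concentrations are those stated in the text, whereas the stock concentrations, the 10 ml batch size, and the function name are hypothetical and should be replaced with the values of the stocks actually on hand.

```python
# Final concentrations of buffer B as stated in the text (mM).
BUFFER_B_FINAL_MM = {
    "HEPES pH 7.4": 20.0,
    "NaCl": 420.0,
    "KCl": 10.0,
    "EDTA": 1.0,
    "EGTA": 1.0,
    "MgCl2": 0.5,
    "DTT": 1.0,
}

# Hypothetical stock concentrations (mM); not specified in the source.
ASSUMED_STOCK_MM = {
    "HEPES pH 7.4": 1000.0,
    "NaCl": 5000.0,
    "KCl": 1000.0,
    "EDTA": 500.0,
    "EGTA": 500.0,
    "MgCl2": 1000.0,
    "DTT": 1000.0,
}


def stock_volume_ul(final_mm, stock_mm, total_ml):
    """Volume of stock (in microliters) needed, from C1*V1 = C2*V2."""
    if final_mm > stock_mm:
        raise ValueError("final concentration exceeds stock concentration")
    return final_mm / stock_mm * total_ml * 1000.0


total_ml = 10.0  # hypothetical batch size
used_ul = 0.0
for component, final_mm in BUFFER_B_FINAL_MM.items():
    vol = stock_volume_ul(final_mm, ASSUMED_STOCK_MM[component], total_ml)
    used_ul += vol
    print(f"{component}: {vol:.1f} ul")
print(f"water to volume: {total_ml * 1000.0 - used_ul:.1f} ul")
```

Protease inhibitors are omitted because the text does not give their final concentrations for buffer B.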

Cell Culture and Transient Transfection
CV1 cells were maintained in DMEM supplemented with 10% fetal bovine serum and 5 µg/µl penicillin/streptomycin in 5% CO2 at 37 °C. In GAL4 reporter assays, cells in 24-well plates were transfected in triplicate with the 5XGAL-luciferase reporter (100 ng) and GAL4 fusion plasmids (200 ng) using Lipofectin (Life Technologies, Inc.). In MMTV reporter assays, cells were transfected with MMTV-luciferase (100 ng), glucocorticoid receptor (10 ng), CoAA (200 ng), or hnRNP M (200 ng) plasmids per well and induced by dexamethasone (Dex) (100 nM) for an additional 16 hours after transfection. Luciferase activities were measured with a Dynex luminometer. Data are shown as means of triplicate transfections ± standard errors.
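The summary statistic described above (mean of triplicate transfections ± standard error) can be computed as in the following minimal Python sketch. The luciferase readings and condition names are invented placeholders for illustration, not data from this study, and the function name is hypothetical.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate readings."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    # SEM = sample standard deviation / sqrt(number of replicates)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Placeholder triplicate luciferase readings (arbitrary units, not real data).
example_readings = {
    "condition A": [1200.0, 1350.0, 1275.0],
    "condition B": [5400.0, 5100.0, 5650.0],
}

for condition, readings in example_readings.items():
    mean, sem = mean_and_sem(readings)
    print(f"{condition}: {mean:.0f} ± {sem:.0f}")
```

`statistics.stdev` uses the sample (n − 1) standard deviation, which is the usual choice for a small number of replicate wells.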

Alternative Splicing Analysis
The pETv5 alternative splicing reporter contains the CD44 variable exon 5 and its adjacent intron sequences inserted between pre-proinsulin exons 2 and 3 (36). The detection primers on the pre-proinsulin exons distinguish the minigene from the endogenous transcripts. 293 cells were transiently transfected with the pETv5 minigene driven by the RSV promoter together with CoAA or hnRNP M expression vectors or with their siRNA (100 nM) as indicated. Total RNA from cells was prepared with Trizol reagent (Invitrogen), treated with DNase I, and analyzed by RT-PCR. RT-PCR primers are as follows: sense, AGTGGATCCGCTTCCTGCCCC; antisense, CTGCCGGGCCACCTCCAGTGCC. The target sequence of siRNA (Dharmacon) is 5'-AGAUUAUCCAUGCAUUACA-3' for hnRNP M, 5'-GUAACCAGCCAUCCUCUUA-3' for CoAA, and 5'-UAGCGACUAAACACAUCAA-3' for the control.
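When re-ordering or checking these oligonucleotides, simple sequence properties are useful to verify. The Python sketch below reports length and GC fraction for the RT-PCR primers and converts the siRNA target sequences to their DNA equivalents. The sequences are copied from the text; the helper function names are hypothetical, and nothing here reproduces a design tool used by the authors.

```python
# Sequences copied from the text above.
PRIMERS = {
    "sense": "AGTGGATCCGCTTCCTGCCCC",
    "antisense": "CTGCCGGGCCACCTCCAGTGCC",
}
SIRNA_TARGETS = {
    "hnRNP M": "AGAUUAUCCAUGCAUUACA",
    "CoAA": "GUAACCAGCCAUCCUCUUA",
    "control": "UAGCGACUAAACACAUCAA",
}

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA sequence."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA or RNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def rna_to_dna(rna):
    """DNA equivalent of an RNA sequence (U replaced by T)."""
    return rna.replace("U", "T")


for name, seq in PRIMERS.items():
    print(f"{name} primer: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"reverse complement {reverse_complement(seq)}")

for name, seq in SIRNA_TARGETS.items():
    print(f"{name} siRNA target: {len(seq)} nt, DNA form {rna_to_dna(seq)}")
```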

Chromatin Immunoprecipitation (ChIP)
HeLa cells were incubated with 1% formaldehyde for 10 min to crosslink proteins and DNA, and the reaction was stopped by 125 mM glycine. Cells were lysed and sonicated in buffer containing 20 mM Tris pH 8.0, 75 mM NaCl, 75 mM KCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 10% glycerol, 1 mM DTT, and protease inhibitors. Immunoprecipitation was carried out using salmon sperm DNA-blocked protein A/G resin (Upstate) and individual antibodies. The crosslinking was reversed by eluting with 0.1 M NaHCO3, 1% SDS, 0.3 M NaCl at 65°C for 4 hours. Purified DNA (Qiagen kit) was subjected to real-time PCR analysis. Primer pairs used on the CoAA gene are listed in Supplementary Table S2.

Protein Chromatography
RNAP II was partially purified from HeLa nuclear extracts using Sephacryl S-200 HR 10/30 gel filtration column with buffer A (20 mM Tris pH 8.0, 1 mM EDTA, and 1 mM DTT) containing 150 mM NaCl (Supplementary Fig. S8c). CoAA or desalted RNAP II fractions were further purified by a Mono Q column (5 ml) equilibrated with buffer A containing 10 mM NaCl and step eluted with buffer A containing 0.