Abstract
Transcription of the ribosomal RNA precursor by RNA polymerase (Pol) I is a major determinant of cellular growth, and its dysregulation is observed in many cancer types. Here, we present the purification of human Pol I from cells carrying a genomic GFP fusion on the largest subunit, enabling structural and functional comparison of the enzyme across species. In contrast to yeast, human Pol I carries a single-subunit stalk, and in vitro transcription indicates reduced proofreading activity. The cryo-EM reconstruction of human Pol I in a close-to-native state rationalizes the effects of disease-associated mutations and uncovers an additional domain built into the sequence of Pol I subunit RPA1. This ‘dock II’ domain resembles a truncated HMG box incapable of DNA binding and may serve as a binding platform for downstream transcription factors in metazoans. Biochemical analysis and ChIP data indicate that Topoisomerase 2a can be recruited to Pol I via this domain and cooperates with the HMG-box-containing factor UBF. These adaptations of the metazoan Pol I transcription system may allow efficient release of positive DNA supercoils accumulating downstream of the transcription bubble.
Introduction
Transcription of DNA into RNA is carried out by three nuclear polymerases (Pols) in most higher eukaryotes1. These multi-subunit Pols diverge in target loci, structure and regulation2. Understanding the underlying molecular mechanisms is a central goal of molecular biology. However, these mechanisms have mostly been studied in lower model organisms due to experimental limitations. In higher eukaryotes, regulatory variation dependent on tissue type, developmental state and cell-cycle stage adds further layers of complexity. The structure-function analysis of human Pol II3 and Pol III4–7 revealed both similarities in the catalytic mechanisms and divergence in regulatory elements among organisms.
Human RNA polymerase (hPol) I has a single target gene, the 47S ribosomal RNA precursor (pre-rRNA), from which the 5.8S, 18S and 28S rRNAs are processed8. Together with the 5S rRNA synthesized by Pol III9, these processed RNAs contribute to ribosome formation. rRNA accounts for up to 80% of total cellular RNA10, and its synthesis must therefore be tightly regulated. Hence, dysregulation of hPol I is associated with pathologies such as cancer and developmental diseases, for example Treacher Collins Syndrome11. Accordingly, inhibition of hPol I has been explored as a therapeutic strategy in cancer treatment, with some success and future potential12. The molecular action of rRNA synthesis inhibitors is not entirely understood and may range from the activation of DNA-damage responses upon interference with replication13 to a specific reduction of Pol I transcription by preventing promoter escape during initiation14 or by inhibiting elongation15.
The composition of hPol I is similar to that of yeast Pol I16, for which detailed crystal structures are known17,18. A catalytic core of ten subunits is complemented by a protruding stalk subcomplex and a heterodimeric RPA49/RPA34 subcomplex. The latter is related to the Pol II initiation factors TFIIF and TFIIE19 and has homologues in Pol III20. The stalk was proposed to be divergent between yeast and human, as DNA- and protein-sequence-based searches have not identified a homologue of subunit A14 in human cells16. Table 1 summarizes the subunit nomenclature for yeast and mammalian Pol I and relates it to the corresponding human Pol II and Pol III subunits. Regulation of Pol I is diverse21 and can be achieved by post-translational modification (PTM) of Pol I subunits or transcription factors. Nutrient availability22 and growth factor signal transduction23 activate Pol I initiation by phosphorylation of the initiation factor Rrn3. Rrn3 is largely conserved among species24–26 and primes Pol I for initiation by interacting with the stalk subcomplex27–30. Furthermore, dephosphorylation of the stalk is required for efficient Pol I function in yeast31, and hyper-acetylation of RPA49 reduces Pol I activity under stress32.
Functionally, hPol I transcription has been studied in extracts or partially purified systems33. In contrast, yeast Pol I transcription could be studied in detail using purified and recombinantly expressed components, allowing a clear definition of subunit functionalities in transcription initiation28,34, elongation35,36, cleavage37, backtracking38,39 and termination40,41. Such studies allowed a detailed dissection of (sub-)domain and transcription factor functions.
Due to the lack of a well-defined in vitro system consisting of purified components, it is unclear whether the results of these structure-function studies can be transferred to higher organisms. Many factors are apparently conserved functionally but diverge in composition42. In addition to RRN3, hPol I transcription requires the initiation factors ‘Selectivity Factor 1’ (SL1) and ‘upstream binding factor’ (UBF). SL1 comprises the subunits TAF1A, TAF1B and TAF1C (homologues of yeast Core Factor), the two additional factors TAF1D43 and TAF1244, and the TATA-binding protein (TBP). UBF consists of six consecutive HMG boxes, is part of initiation complexes45 and binds to the body of actively transcribed rDNA genes46, apparently preventing re-association of nucleosomes.
It remains poorly understood how Pol I has structurally and functionally adapted to the increased regulatory demands in human cells. Here, we show how hPol I can be specifically purified in its natural form from a modified human cell line, and we determine its structure by single-particle electron cryo-microscopy (cryo-EM). The structure reveals a previously unknown, built-in platform that may allow docking of transcription factors on the downstream face of the polymerase. Phylogenetic analysis allows us to trace the evolution of Pol I through the loss of a subunit and the gain of additional domains in higher organisms. We present in vitro transcription assays demonstrating a limited proofreading ability of the human enzyme and map known mutations onto the structure to understand Pol I-related pathologies.
Results
Specific tagging and purification of human RNA polymerase I
To study the structure and function of hPol I in vitro, we first created a cell line that allows specific enrichment of the complete enzyme in its native state without contamination by hPol III. Using CRISPR/Cas9 in a dual-nicking approach, a cleavable sfGFP tag was fused to the genomic sequence of the largest Pol I subunit, RPA1, in the HeLa P2 cell line47. After identification of positive clones by single-cell FACS based on GFP fluorescence intensity, correct insertion was confirmed by site-specific PCR. Homozygous insertion was verified by western blot against subunit RPA1 (Fig. 1). The approach we previously reported for generating an RPAC1-tagged cell line4 can hence be applied generally for reliable homozygous knock-in of C-terminal fusion tags.
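This section does not describe how the guides for the dual-nicking knock-in were chosen. The sketch below is a hypothetical illustration only. It scans a target region for SpCas9 protospacers (20 nt followed by an NGG PAM) on both strands and reports guide pairs in a 'PAM-out' orientation, so that the two nicks fall on opposite strands. The accepted offset window of -4 to +20 bp between the protospacers is an assumption taken from commonly cited double-nicking design guidelines. All function names are illustrative.

```python
"""Hypothetical sketch: find PAM-out sgRNA pairs for a Cas9 double-nicking design.

Assumptions (not taken from the study): SpCas9 NGG PAM, 20-nt protospacers and
an accepted offset window of -4 to +20 bp between the two protospacers.
"""

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def plus_strand_sites(seq: str, length: int = 20) -> list[int]:
    """Start indices i where seq[i:i+length] is followed by an NGG PAM."""
    return [i for i in range(len(seq) - length - 2)
            if seq[i + length + 1:i + length + 3] == "GG"]


def minus_strand_sites(seq: str, length: int = 20) -> list[int]:
    """Indices j where CCN (an NGG PAM on the bottom strand) precedes seq[j+3:j+3+length]."""
    return [j for j in range(len(seq) - length - 2) if seq[j:j + 2] == "CC"]


def pam_out_pairs(seq: str, length: int = 20, min_offset: int = -4,
                  max_offset: int = 20) -> list[tuple[str, str, int]]:
    """Return (minus_guide, plus_guide, offset) tuples for PAM-out guide pairs.

    The bottom-strand guide lies upstream with its PAM pointing left. The
    top-strand guide lies downstream with its PAM pointing right. The offset is
    the number of bases between the two protospacers (negative if they overlap).
    """
    seq = seq.upper()
    pairs = []
    for j in minus_strand_sites(seq, length):
        minus_end = j + 3 + length  # first base after the bottom-strand protospacer
        for i in plus_strand_sites(seq, length):
            offset = i - minus_end
            if min_offset <= offset <= max_offset:
                minus_guide = revcomp(seq[j + 3:minus_end])  # guide is written 5'->3'
                pairs.append((minus_guide, seq[i:i + length], offset))
    return pairs
```

For the 51-bp toy sequence `"CCT" + "AT" * 10 + "TTTTT" + "AT" * 10 + "AGG"`, the function returns a single pair with an offset of 5 bp. A practical design would also score off-target sites and check that both nicks flank the intended tag insertion site at the RPA1 C-terminus.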
hPol I purification from lysates of the RPA1-sfGFP cell line relies on a single affinity purification step followed by site-specific tag cleavage, resulting in a highly enriched sample (Fig. 1C; Sup. Fig. 1). As judged by mass spectrometry (Sup. Fig. 2), hPol I partially co-purifies with the initiation factor RRN3, and the sample contains stoichiometric amounts of hPol I subunits, including the RPA49/RPA34 subcomplex, which is sub-stoichiometric in rat Pol I purifications48. An optional subsequent ion-exchange chromatography step resulted in the loss of RRN3 and of the RPA49/34 subcomplex from most polymerases (Sup. Fig. 1B).
Human Pol I shows reduced proofreading in vitro
Equipped with a cell line that allows specific enrichment of hPol I, we next aimed at a detailed structural and functional characterization of this enzyme in vitro. To assess functional conservation, we first compared the activity of purified hPol I with that of its counterparts from S. cerevisiae and S. pombe in an in vitro elongation and cleavage assay. In this assay, a fluorescently labeled RNA primer is either extended by Pol I in the presence of nucleoside triphosphates (NTPs) or cleaved through the action of the TFIIS-related subunit RPA12 (Fig. 2A). While yeast Pol I specifically incorporates the correctly base-paired substrate, hPol I generates transcripts containing misincorporated nucleotides under identical experimental conditions (Fig. 2B).
Furthermore, the cleavage patterns of yeast and human Pol I in the absence of NTPs diverge. While Sc and Sp Pol I can cleave the 3’-end of the perfectly base-paired RNA primer by up to three nucleotides, the main hPol I cleavage product is at position -1, indicating a reduced backtracking ability. To exclude effects from potential sub-stoichiometry of the RPA49/34 complex, we added recombinantly co-expressed human RPA49/34, but observed neither increased backtracking/cleavage nor reduced generation of mismatched transcripts (Sup. Fig. 3B). Similarly, the addition of recombinant Rrn3 to Sc Pol I does not hamper its functionality (Sup. Fig. 3D), suggesting that the observed effects do not originate from RRN3 present in the sample.
To test the influence of the substrate scaffold, we added a non-template (nt) strand forming a mismatched bubble and tested a range of different template sequences (Sup. Fig. 3D-I). On a mismatched bubble template, backtracking is impaired even further, while misincorporation generally persists, with some sequence-specific variation in intensity. Earlier analysis of misincorporation rates indicated a similar difference between Sc Pol I and Sc Pol II49. This is in line with our observations and may originate from the flexibility between the Pol I core and shelf modules, as discussed50,51. To understand the evolution of Pol I and to rationalize the functional differences between the enzymes of different species, we determined the structure of human Pol I by cryo-EM.
Structure determination of hPol I
Whereas the structure of yeast Pol I has been extensively studied by X-ray crystallography17,18,52 and single-particle cryo-EM53–55, the human enzyme has so far eluded structural characterization. In a first step, negative stain EM screening revealed intact particles (Sup. Fig. 1C-E), and a 3D reconstructed negative stain envelope indicated an architecture comparable to S. cerevisiae Pol I. However, many particles show flexibility in the clamp/stalk region, which originates from either heterogeneity or functional flexibility. High-resolution structure determination by single-particle cryo-EM was hampered by this intrinsic flexibility and by a strong bias in the orientation distribution of hPol I particles. After extensive screening of preparation conditions, data collected from self-made graphene oxide-covered grids finally showed reduced orientational bias of non-crosslinked particles (Pilsl et al., Methods Mol Biol, in press). We collected a total of 9,709 micrograph movies on a CryoARM 200 (JEOL) electron microscope equipped with a K2 direct electron detector (Gatan) at a pixel size of 0.968 Å. Preprocessing and particle picking in Warp56 were followed by binning and 2D classification in RELION 4.057, yielding 145,554 particles that were subsequently subjected to sequential 3D classification (Sup. Fig. 4; Sup. Table 1). A 3D reconstruction with an overall resolution of 4.09 Å was obtained, revealing secondary structure for most regions of the molecule. Models for the common subunits RPABC1-ABC5 and the RPAC1/2 assembly were transferred from a hPol III reconstruction5. Homology models of the hPol I subunits RPA1, RPA2, RPA49, RPA34, RPA12 and RPA43 were generated with the MODELLER software package58, based on sequence and secondary structure alignments with the crystal structures of their S. cerevisiae counterparts (Sup. Data 1). Model fitting and rigid body refinement allowed interpretation of both the negative stain and cryo-EM densities, later aided by AlphaFold predictions59.
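Binning speeds up processing but lowers the achievable resolution. The finest spacing a dataset can represent, the Nyquist limit, is twice the effective pixel size. The binning factor used here is not stated, so the following is only an illustrative calculation with hypothetical helper names. At the reported pixel size of 0.968 Å, the unbinned Nyquist limit is 1.936 Å. A binning factor of 2 would still represent features down to about 3.9 Å, which is finer than the final 4.09 Å map.

```python
import math


def nyquist_limit(pixel_size_A: float, binning: int = 1) -> float:
    """Finest resolvable spacing (in Å) for a given pixel size and binning factor."""
    if pixel_size_A <= 0 or binning < 1:
        raise ValueError("pixel size must be positive and binning >= 1")
    return 2.0 * pixel_size_A * binning


def max_binning(pixel_size_A: float, target_resolution_A: float) -> int:
    """Largest integer binning whose Nyquist limit does not exceed the target resolution.

    Returns 1 if even unbinned data cannot reach the target.
    """
    factor = math.floor(target_resolution_A / nyquist_limit(pixel_size_A))
    return max(factor, 1)


if __name__ == "__main__":
    print(f"{nyquist_limit(0.968):.3f}")     # 1.936
    print(f"{nyquist_limit(0.968, 2):.3f}")  # 3.872
    print(max_binning(0.968, 4.09))          # 2
```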
To the knowledge of the authors, this is the first de novo reconstruction of a previously unknown, non-symmetric macromolecule obtained with a CryoARM 200 electron microscope. Details of data collection and handling strategies are similar to recent reports60–62 and are described in the Methods section.
Insights into hPol I architecture
The negative stain density shows both the stalk and the RPA49/RPA34 heterodimer (Sup. Fig. 1E). However, some 3D classes lack density for the clamp core and clamp head domains of subunit RPA1 and for the stalk, indicating high flexibility of this sub-assembly.
The cryo-EM reconstruction of hPol I (Fig. 2D) shows connected density for the common hPol subunits RPABC1-5, the RPAC1/2 dimer, the N-terminal domain of subunit RPA12 and most parts of subunit RPA2, with the exception of the C-terminal clamp and anchor domains (residues 1010-1134). Furthermore, density for the jaw, funnel, foot and most parts of the cleft domain of subunit RPA1 (residues 630-1661, excluding loops) and for the RPA49/34 heterodimer allowed unambiguous fitting of homology models. In our reconstruction, weak density for the stalk subcomplex and for the clamp and dock domains of subunit RPA1 indicates increased shelf module flexibility. As in yeast Pol I crystal structures, the linker and tWH domains of subunit RPA49 and the C-terminal extension of subunit RPA34 are flexible in human Pol I.
The RPAC1/2 assembly adopts the conformation known from hPol III and tightly interacts with subunit RPA2. The N-terminus of subunit RPA12 can be placed on the lobe of subunit RPA2, demonstrating stable association of this subunit. Global contraction of Pol I modules upon activation has been observed in the enzymes of S. cerevisiae17,18 and S. pombe63 and may be a regulatory feature of Pol I50,51. Because hPol I complexes were prepared for negative stain EM and cryo-EM without crosslinking reagents that would artificially stabilize particular conformations, the observed conformation may represent a close-to-native state of functional importance in human Pol I. Overall, the architecture of hPol I reflects that of its yeast counterparts. It also provides insights into the effects of Pol I-related mutations identified in human disease and reveals two major adaptations that accumulated during evolution: the stalk subcomplex (flexible in our density) and the RPA1 foot domain.
Mapping of disease-associated mutations to Pol I subunit structures rationalizes enzyme deficiencies
Four disease phenotypes were linked to mutation of Pol I subunits in humans: Acrofacial Dysostosis (Cincinnati type)64,65, Treacher-Collins syndrome (TCS)66–69, Hypomyelinating Leukodystrophy (HL)68,70, and a juvenile neurodegenerative phenotype akin to the HL-phenotype71. With the structural model of hPol I determined (Fig. 2), we mapped these known mutations to gain insight into the underlying molecular pathologies (Sup. Fig. 6).
Acrofacial Dysostosis, Cincinnati type, leads to craniofacial abnormalities during development and is caused by mutations E593Q and V1299F in subunit RPA164,65. Mutation E593Q is located close to the catalytic center and may directly affect nucleotide addition (Sup. Fig. 6c). In contrast, V1299F is situated at the interface of RPA1 with RPA12 and may destabilize the association of this subunit with the hPol I core (Sup. Fig. 6d).
Treacher-Collins syndrome (TCS) is a craniofacial developmental disease caused by various mutations in the genes TCOF1, POLR1B, POLR1C or POLR1D. Serine 682 of RPA2 directly contacts the bridge helix (likely H967 of RPA1); mutation of this residue may disturb the contact and partially hinder translocation (Sup. Fig. 6f). In contrast, R1003 of subunit RPA2 is situated in the DNA/RNA binding cleft and may be required to stabilize folding of the hybrid-binding domain within RPA2 (Sup. Fig. 6g). Hence, the mutations R1003C and R1003S69 may destabilize RPA2 and thus the active center. Other TCS-associated mutations66–68 within subunit RPAC2 (E47K, T50I, L51R, G52E, L55V, R56C, L82S, G99S) cluster at intra-subunit contacts and at contacts with RPAC1 (Sup. Fig. 6h). Structural alignment with the human Pol III structure reveals a similar fold and suggests destabilizing effects of these mutations, similar to R279Q/W in subunit RPAC1. Therefore, polymerase-associated TCS mutations can be functionally classified according to their effects: (1) impaired Pol I transcription activity (RPA2 mutations) and (2) effects on both Pol I and Pol III transcription (mutations in the shared subunits RPAC1 and RPAC2).
Similar to TCS, Hypomyelinating Leukodystrophy (HL) is a neurodegenerative disease that cannot be classified as a Pol I- or Pol III-associated disease per se. HL mutations are found in subunit RPAC1, which is shared between both polymerases, or in the Pol III subunits RPC1 and RPC268,70. Comparing the structures of hPol I and hPol III shows that mutations of the RPAC1 N-terminus (T26I, T27A, P30S, N32I) are likely to have a Pol III-specific effect, as this region appears flexible in hPol I but mediates interactions with the polymerase core in Pol III (Sup. Fig. 6i). The nearby mutation N74S (like N32I) affects Pol III assembly but apparently does not impair Pol I biogenesis or nuclear import68. Additionally, the RPAC1 mutations I105F, H108Y and R109H were found to impair the interaction with RPC2 in hPol III but not with RPA2 in hPol I, again suggesting Pol III specificity (Sup. Fig. 6j). The additional RPAC1 mutations M65V, V94A, A117P, G132D, C146R, R191Q, I262T, T313M and E324K are involved in intra-subunit contacts and likely affect the folding of RPAC1 itself (Sup. Fig. 6e).
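Deciding whether a mutated residue sits at an inter-subunit contact is, at its simplest, a distance test on the atomic model. The criterion used for Sup. Fig. 6 is not given here. The sketch below therefore assumes a 4 Å atom-atom cutoff and uses a plain list of (chain, residue number, coordinates) tuples instead of a real structure parser. The chain labels are hypothetical. Only single substitutions written in the form R279Q are handled.

```python
import math
import re

# Hypothetical minimal atom record: (chain_id, residue_number, (x, y, z)) in Å.
Atom = tuple[str, int, tuple[float, float, float]]

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a single substitution such as 'R279Q' into (wild type, position, mutant)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a single amino-acid substitution: {label!r}")
    wt, pos, mut = match.groups()
    return wt, int(pos), mut


def contact_residues(atoms: list[Atom], chain: str, partner: str,
                     cutoff: float = 4.0) -> set[int]:
    """Residues of `chain` with any atom within `cutoff` Å of an atom of `partner`."""
    partner_xyz = [xyz for ch, _, xyz in atoms if ch == partner]
    hits: set[int] = set()
    for ch, res, xyz in atoms:
        if ch != chain or res in hits:
            continue
        if any(math.dist(xyz, other) <= cutoff for other in partner_xyz):
            hits.add(res)
    return hits


def contacted_partners(label: str, chain: str, atoms: list[Atom],
                       partners: list[str], cutoff: float = 4.0) -> list[str]:
    """Partner chains contacted by the mutated residue (empty list: no inter-subunit contact)."""
    _, pos, _ = parse_mutation(label)
    return [p for p in partners if pos in contact_residues(atoms, chain, p, cutoff)]
```

A residue that contacts no partner chain is only a candidate for the intra-subunit class. Telling buried positions apart from exposed surface positions would require an additional solvent-accessibility calculation.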
Finally, the mutation S934L in RPA1 is associated with a juvenile neurodegenerative phenotype akin to the HL phenotype associated with Pol III disruption71. This mutation occurs in a small loop of RPA1 that contacts RPA2 near the N-terminus of the bridge helix (Sup. Fig. 6b) and may therefore destabilize the Pol I core to some extent.
A single-subunit stalk is the predominant configuration for Pol I
One of the major differences between Pol I enzymes of different organisms lies within the stalk subcomplex. DNA- and protein-sequence-based searches identified human homologues for all yeast Pol I subunits except the stalk subunit A14, that is, for 13 of 14 subunits16. Divergence of the stalk subunits among DNA-dependent RNA polymerases is well documented. Compared with the yeast Pol II stalk subunits Rpb4 and Rpb7, a domain swap was observed for the yeast Pol I stalk subunits A14 and A43 in the crystal structure of the Pol I subcomplex37,72. Given this swap, subunit A14 appears to be of limited functional importance. Deletion of the subunit in S. cerevisiae is not lethal but results in conditional growth defects indicating regulatory deficiencies73,74, similar to observations in S. pombe75.
To analyze whether hPol I indeed carries a single-subunit stalk, we performed mass spectrometric analysis of all protein bands in our purification. The 13 subunits identified in situ and the initiation factor RRN3 were detected with sequence coverages above 25% (Sup. Fig. 2). No additional proteins were identified with similar confidence. We next carried out a bioinformatic analysis to clarify whether the absence of a second Pol I stalk subunit is specific to human cells, and to understand how the composition of the enzyme changed during evolution. First, we generated a phylogenetic tree based on sequence similarity of the Pol I subunits RPA1, RPA34 and RPA43, covering the polymerase core and the peripheral subcomplexes (Fig. 3). Generating a Pol I-specific conservation tree removes bias that may originate from the influence of unrelated genes on global alignments in standard phylogenetic analysis. We find that only organisms of the Saccharomycotina within the Dikarya clade carry sequences for subunit A14, indicating that a single-subunit stalk is the standard Pol I configuration.
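This section does not specify the tree-building method. The code below is a minimal sketch of how sequence similarity can be turned into a tree. It computes pairwise p-distances from a pre-computed multiple sequence alignment and clusters them with UPGMA (average linkage), returning a Newick string. UPGMA assumes a constant rate of evolution across lineages and is used here only because it is short. Published phylogenies usually rely on maximum-likelihood or Bayesian methods. Sequence names and inputs are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched columns, ignoring columns with a gap ('-') in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no shared ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(alignment: dict[str, str]) -> str:
    """Cluster aligned sequences by UPGMA and return a Newick tree string."""
    # Each cluster: id -> (newick subtree, number of leaves, node height).
    clusters = {name: (name, 1, 0.0) for name in alignment}
    dist = {frozenset((a, b)): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}
    counter = 0
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        d = dist.pop(pair)
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        height = d / 2
        new_id = f"_node{counter}"
        counter += 1
        # Branch length = height of the new node minus height of the child.
        newick = f"({na}:{height - ha:.3f},{nb}:{height - hb:.3f})"
        for k in clusters:
            # Size-weighted average of the distances to the two merged clusters.
            dak = dist.pop(frozenset((a, k)))
            dbk = dist.pop(frozenset((b, k)))
            dist[frozenset((new_id, k))] = (sa * dak + sb * dbk) / (sa + sb)
        clusters[new_id] = (newick, sa + sb, height)
    (newick, _, _), = clusters.values()
    return newick + ";"
```

For a toy alignment `{"A": "AAAA", "B": "AAAT", "C": "TTTT"}`, sequences A and B are joined first at a height of 0.125, and C joins afterwards.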
Built-in transcription factors differ among organisms
Phylogenetic analysis also showed that the ‘expander’ (DNA-mimicking) element is present in all analyzed organisms. This flexible insertion in the jaw domain of the largest subunit mimics DNA bound to inactive Pol I dimers17,18 or monomers63.
The RPA49/34 heterodimer resembles the yeast A49/A34.5 sub-complex with functions in initiation and elongation28,35,76 and is present in cryo-EM reconstructions. The subcomplex is related to the Pol II initiation factors TFIIF and TFIIE19 and stays attached to the Pol I core throughout its transcription cycle in vivo77, but may be lost under some conditions in vitro37,55,63. The TFIIE-related, C-terminal tWH domain of subunit RPA49 is flexible in our reconstructions as expected for Pol I monomers and most elongation states. Similarly, we do not observe density for the mammalian-specific C-terminal extension of subunit RPA34 (compare Fig. 2/3). This is also the case for a C-terminal extension of the hPol III subunit RPC5 that contributes to enzyme stability despite being flexibly linked4.
The C-terminal domain of RPA34 is enlarged in humans, resulting in a 55 kDa protein compared with the 27 kDa yeast protein (Fig. 2C; Supplementary Data 1). The C-terminal extension is present in higher organism classes such as Mammalia and Amphibia, but shows no clear conservation in sequence, predicted secondary structure or length (Fig. 3), and is flexible in our cryo-EM reconstruction. To assess functional similarity with the yeast counterparts, we tested binding of recombinant human RPA49/34 to the S. cerevisiae enzyme purified from an A49 deletion strain, which yields a 12-subunit Pol I (Pol IΔ). Direct cross-species binding of the RPA49/34 heterodimer to Sc Pol I was not observed in vitro. This is likely due to divergence of the charged tail region (‘ARM’) of RPA34 and of its binding site on the ‘external’ domain of the second-largest subunit RPA2.
Although direct binding was not observed, functional cross-species complementation by recombinant yeast and human subcomplexes was possible (Sup. Fig. 3C). Recombinant Sc A49/34.5 and Hs RPA49/34 both restored the activity of hPol IΔ in elongation and cleavage. Hence, the interaction interfaces apparently co-evolved, while subcomplex function was retained from yeast to human. Both Sc and Hs RPA49/34 can bind to DNA independently of core Pol I (Sup. Fig. 5C). While the main DNA interface apparently lies within the TFIIE-related tWH domain of RPA49, the flexible and divergent RPA34 tail is capable of independent DNA interaction. Notably, the elongation and cleavage patterns showed no major differences depending on which heterodimer was added (Sc or Hs). Therefore, the reduced proofreading of hPol I appears to be an intrinsic feature of the core enzyme rather than an effect of divergent heterodimer subunits or their sub-stoichiometric co-purification.
A previously undescribed domain is built into the largest subunit of human Pol I
The second major difference between yeast and human Pol I is an insertion in the ‘foot’ domain of the largest subunit RPA1 (Fig. 2C; Supplemental Data 1). The Pol II foot domain serves as a transient interaction platform for the regulatory co-activator complex ‘mediator’78 and is enlarged compared with that of yeast Pol I17,18. This invites speculation about a comparable regulatory role for the foot insertion, required in humans but not in yeast. We found well-defined cryo-EM density on the downstream face (front) of hPol I subunit RPABC1 (Rpb5) that is closely connected to the foot insertion site. Domain prediction using HHpred79 indicated clear homology to a High Mobility Group (‘HMG’) box domain, with the closest match to the structure of HMG box 5 of the hPol I transcription factor UBF80. We therefore constructed a homology model of the foot insertion and fitted it into the observed cryo-EM density. The model could be placed unambiguously without adjustment, indicating that the hPol I foot insertion indeed resembles a built-in HMG box (Fig. 4).
The HMG box-containing ‘dock II’ domain may serve as interface for Topoisomerase 2a
Canonical HMG box domains can bind the minor groove of a DNA duplex in a sequence-specific or non-specific manner, with a preference for non-B-form conformations81. Overlay with a model HMG box (box 2 of the human HMGB1 protein) shows that the DNA-binding site of the hPol I foot insertion is completely occluded by the common Pol subunit RPABC1 (Fig. 4C/D), indicating a divergent function. Furthermore, structure-based sequence alignment of the RPA1 foot HMG box shows that the so-called ‘minor wing’ is absent. This minor wing consists of an N-terminal motif and the C-terminal extension of HMG helix three (Fig. 4E). Both regions cooperate in DNA binding by canonical HMG boxes, but both are absent in RPA1. In addition, a loop between HMG box helices one and two directly interacts with DNA and contributes to sequence specificity82. In RPA1, we observed an insertion between the corresponding helices α27d and α27e that contacts loop T56-V60 of subunit RPABC1 (Fig. 4), whereas a basic surface patch is found on the opposite face (Sup. Fig. 7). To test whether DNA interaction is possible, we recombinantly expressed MBP-tagged versions of the domain (full-length and minimal) and tested their ability to bind a non-specific dsDNA fragment. Significant DNA binding was not observed, although the full-length fragment may retain a very low affinity in vitro. We conclude that the RPA1 foot insertion represents a truncated HMG box ‘major wing’ unable to bind DNA.
Apart from binding DNA, HMG boxes can promote protein-protein interactions. This appears to be the most likely function of the RPA1 foot HMG box, which we therefore termed ‘dock II’. The human HMGB1 protein was found to interact with Topoisomerase (Top) 2a independently of DNA while promoting the activity of this enzyme83. In fact, active Top2a co-purifies with the hPol I-RRN3 complex84 and was described as part of the hPol I transcription initiation machinery85. We therefore asked whether recombinant human Top2a lacking the unstructured C-terminal domain86 can interact with the RPA1 dock II domain. Indeed, we observe a shift in native PAGE for full-length dock II, but not for minimal dock II or the MBP tag alone, indicating that a transient interaction is possible (Sup. Fig. 7E).
Consequently, we asked whether Top2a binding maps to the rDNA gene in cells and whether its profile resembles that of a typical initiation factor85. To this end, we re-analyzed previously published Top2a ChIP-Seq data from mouse cells87. We mapped the initiation factor TAF1B (part of SL1 and homologous to TFIIB88,89), UBF, Pol I46 and Top2a to the rDNA gene as described90. As shown in Fig. 5A, TAF1B yields clear peaks at the spacer promoter and the main rDNA promoter, defining the transcription start site (TSS). Pol I is distributed over the gene body and the spacer promoter, as expected in growing cells. Strikingly, Top2a maps to the rDNA locus but does not show the profile of a classical initiation factor such as RRN3, which peaks at the promoter and tails off in the 5’ region of the rDNA gene46. Instead, Top2a is present over the entire gene, with some peaks in the 3’ region that apparently overlap with UBF-binding sites.
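The re-analysis follows a previously described mapping procedure90 that is not detailed here. The sketch below shows only the basic idea behind such profiles. It assumes that reads have already been aligned to a single-copy rDNA reference and that an input (non-immunoprecipitated) control is available. Coverage is summed in fixed-size bins along the repeat and expressed as log2 enrichment over input after normalizing each track to its total signal. Function names, bin size, pseudocount and threshold are illustrative assumptions.

```python
import math


def binned_coverage(reads: list[tuple[int, int]], ref_length: int,
                    bin_size: int) -> list[int]:
    """Sum of aligned read bases per bin; reads are 0-based half-open (start, end) intervals."""
    n_bins = -(-ref_length // bin_size)  # ceiling division
    coverage = [0] * n_bins
    for start, end in reads:
        pos, end = max(0, start), min(ref_length, end)  # clip to the reference
        while pos < end:
            b = pos // bin_size
            bin_end = min((b + 1) * bin_size, end)
            coverage[b] += bin_end - pos
            pos = bin_end
    return coverage


def log2_enrichment(ip: list[int], control: list[int],
                    pseudocount: float = 1.0) -> list[float]:
    """Per-bin log2 ratio of IP to input after scaling each track by its total coverage."""
    ip_total, control_total = sum(ip), sum(control)
    if ip_total == 0 or control_total == 0:
        raise ValueError("both tracks need non-zero total coverage")
    return [math.log2(((x + pseudocount) / ip_total) /
                      ((y + pseudocount) / control_total))
            for x, y in zip(ip, control)]


def enriched_bins(enrichment: list[float], threshold: float = 1.0) -> list[int]:
    """Indices of bins whose log2 enrichment reaches the threshold (2-fold by default)."""
    return [i for i, value in enumerate(enrichment) if value >= threshold]
```

Because the rDNA is present in many copies, reads from all repeats collapse onto the single reference. Such profiles therefore report an average over active and inactive copies.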
Physical interaction with UBF indicates functional cooperativity of Top2a and HMG-box containing proteins
The ChIP-Seq reanalysis does not exclude the possibility that Top2a is also part of some initiation complexes. However, it indicates Pol I-independent association with the rDNA gene, elongation factor-like behavior in cooperation with Pol I, and/or DNA-binding cooperativity with UBF. To test whether UBF and Top2a physically interact, as suggested by the co-localization of ChIP peaks, we performed immunoprecipitation assays from cell lysates using anti-Top2a antibodies. Western blot analysis of the pull-downs confirms the interaction between Top2a and hPol I, detected via its subunit RPA49. Furthermore, the signals observed for UBF are in line with an interaction in cells (Fig. 5B).
To clarify whether the UBF-Top2a interaction is direct, we tested binding of recombinant FLAG-tagged UBF (fUBF) to Top2a. Incubation of both proteins in vitro followed by pull-down with anti-FLAG antibodies shows a clear band for Top2a in western blots (Fig. 5C, lane 2). Increasing salt concentration weakened (lane 3, 100 mM KCl) and finally abolished (lane 4, 200 mM KCl) the co-IP. We conclude that Top2a can interact with both Pol I and UBF in human cells and in vitro.
Discussion
The cryo-EM reconstruction of human Pol I demonstrates the overall conserved architecture of multi-subunit, DNA-dependent RNA polymerases in eukaryotes and completes the archive of yeast17,18,20,91 and mammalian3–6 nuclear Pol structures (Sup. Fig. 8). We find that human Pol I, like that of most organisms, carries a single-subunit stalk and built-in transcription factors show structural and functional similarities to TFIIF, TFIIE and TFIIS. Mapping of known hPol I mutations associated with human disease to the structural model (Sup. Fig. 6) rationalizes their effects on the enzyme.
We show that functional cross-species complementation of RPA49/34 subcomplexes is possible. This is in line with a conserved role in supporting the initiation and elongation stages of the transcription cycle, while divergent regulatory properties have accumulated92–94. The cryo-EM reconstruction (Fig. 2) indicates increased flexibility of the clamp/stalk module in hPol I. This may explain the increased rate of incorrect nucleotide addition we observe in vitro compared with the yeast enzymes (Fig. 2B). The effect could reflect either impaired proofreading due to the reduced backtracking ability of hPol I, or a generally higher substrate promiscuity. In yeast Pol I, module contraction is a feature of activation95. Especially during DNA melting upon transcription initiation96,97, contraction is required to stably bind the melted template and non-template strands. Notably, the catalytic center, including the active site magnesium ion, is among the flexible parts of the hPol I cryo-EM reconstruction. The pronounced shelf module flexibility may indicate the importance of such a mechanism in higher eukaryotes, or simply reflect a lack of defined intermediate conformations under close-to-native conditions in human cells.
While we do not observe cryo-EM density for bound human RRN3, its binding to hPol I can be assumed to resemble that of the S. cerevisiae counterpart27–29. This assumption rests on the sequence conservation of the factor24 and of its binding sites in Pol I subunit RPA43 and the dock domain of subunit RPA1 (Sup. Data 1). Yeast Pol I subunit A14 is not involved in Rrn3 contacts27–29, so its absence from the human enzyme is consistent with this model. Notably, purification by ion exchange chromatography leads to dissociation of human RRN3 and the RPA49/34 heterodimer (Sup. Fig. 1B). This indicates reduced affinity and hence the possibility that PTMs, such as RRN3 phosphorylation98 and RPA49 acetylation32, efficiently regulate their interaction with the core enzyme.
Most strikingly, our study identifies a previously unknown built-in transcription factor-like domain that resembles the fold of a truncated HMG box (Fig. 4). This ‘dock II’ domain is found only in higher organisms (Fig. 3) and resembles HMG box 5 of UBF. Although its function remains to be studied in detail, we find evidence that it may serve as an interaction platform for human Topoisomerase 2a. Three possible roles for this interaction can be envisaged (Fig. 6).

(1) Top2a could be part of Pol I initiation complexes in human cells85, whereas it does not appear to be involved in yeast PIC formation. Recruitment of Top2a to the downstream edge of human Pol I PICs via the built-in HMG box may be an attractive way to release DNA tension that accumulates upon spontaneous melting. In Pol II initiation systems, the XPB translocase of TFIIH occupies a similar position and carries out a comparable, though not identical, function in yeast99 and human PICs100,101. Deletion of the Top2a C-terminus leads to a six-fold reduction in RRN3 co-purification but only a two-fold reduction in hPol I co-purification85. This suggests that Top2a recruitment via the foot HMG box domain may be co-dependent on RRN3.

(2) Positive supercoiling accumulates in the direction of transcription102, especially in Pol I-transcribed rDNA genes103, owing to the higher loading rate104 and speed of Pol I compared with other polymerases39. To release this supercoiling, Top2a may be recruited to the downstream face of elongating hPol I via the built-in HMG box. This may be reflected in elongation factor-like behavior of Top2a and could be restricted to the first round of transcription of a previously inactive rDNA gene.
Following Top2a-supported opening of the gene by initial hPol I transcription, including nucleosome removal assisted by FACT105, association of UBF over the gene body46 may prevent closing of the gene and strong accumulation of positive supercoiling during subsequent rounds of Pol I transcription. (3) Co-dependent association of UBF and Top2a over the rDNA gene may create periodic hubs that allow transient recruitment of Top2a to hPol I on active genes to release positive supercoiling. The three C-terminal HMG boxes of UBF may hand Top2a over to the dock II HMG box and recover it after its transient interaction with Pol I. In addition, UBF association with DNA itself introduces supercoiling106. In actively transcribed genes, high on/off rates of UBF can be expected, creating a local requirement for Top2a that could be met by association of the enzyme with UBF.
Options (2) and (3) are supported by the fact that the Top2a signal is detected over the entire gene and that co-localization of Top2a with UBF is observed in some regions in ChIP-Seq data (Fig. 5A). An initiation factor-like profile for Top2a, which would point towards option (1), is not detected. Though possibly coincidental, further evidence for hypothesis (3) comes from phylogenetic analysis showing that UBF versions first appear in the same organisms in which we detect the dock II domain (Sup. Fig. 9), and from the recent finding that Top2 localization to the nucleolus depends on Pol I activity in human cells107. In line with this, we demonstrate that UBF and Top2a can physically interact.
Nevertheless, additional functions of the HMG box-containing dock II domain that are independent of Top2a are conceivable. The domain clashes with the ‘trestle’ helix of the CTR9 subunit of the PAF complex, a Pol II elongation factor108. This may prevent PAF action in human Pol I transcription, even though an effect on yeast Pol I elongation was reported109. Furthermore, the HMG box-containing SOX factors assist DNA detachment from nucleosomes110, and SSRP1, which also contains a single HMG box, is a component of FACT, which is required for hPol I transcription through nucleosomes105. In fact, single HMG box-containing proteins were described to functionally support human FACT111. Together with the positioning of dock II close to the incoming (downstream) DNA duplex, this raises the speculative possibility that dock II contributes to efficient nucleosome encounter by hPol I. Most of these factors, however, require a direct DNA interaction of their HMG box, which appears unlikely for dock II because its DNA interface is occluded by RPABC1 and its DNA-binding site is mutated (Fig. 4).
During initial peer review of this work, two groups also reported cryo-EM reconstructions of hPol I 112,113. One study focuses on the structural basis of backtracking and cleavage112, while the other also reports a co-structure with RRN3113. Our colleagues present reconstructions at higher overall resolution but do not comment on the role of the novel dock II domain or the involvement of Top2a in rDNA transcription. The three studies therefore support and complement each other.
While the function(s) of dock II remain to be analyzed in detail, it is not surprising that another transcription factor-related domain is built into metazoan Pol I. In addition to the TFIIF- and TFIIE-related elements within subunits RPA49/34, the TFIIS-related elements in RPA12 and a DNA-mimicking element in RPA1, the integration of an HMG box element appears to contribute to the progressive specialization of Pol I during evolution.
Author Contributions
JLD planned, carried out and evaluated all experiments including cell line generation, hPol I purification, functional biochemistry, structure determination and microscopy. MP collected cryo-EM data and contributed to functional biochemistry and cryo-EM processing. KS carried out phylogenetic analyses. ABl performed and evaluated fluorescence and confocal microscopy. MH and FBH purified proteins and contributed to functional biochemistry. ER, GA-P and AV contributed to data evaluation. VL supplied recombinant human Top2a. KP purified fUBF and carried out UBF-Top2 co-IPs. JLD, KT, CB and MP prepared and screened cryo-EM grids. ABr carried out mass spectrometry analysis. JCM and TM evaluated ChIP data. CE designed and supervised research and wrote the manuscript with input from all authors.
Data Availability
The cryo-EM density of human Pol I was deposited in the Electron Microscopy Data Bank. Model coordinates were deposited with the Protein Data Bank. Further material can be obtained from the corresponding author upon reasonable request.
Competing interests
The authors declare no competing interests.
Methods
CRISPR/Cas9 genome editing
HeLa cells were cultivated in DMEM medium (21885, Gibco) supplemented with 10 % FBS (10270, Gibco) and 1 % Penicillin/Streptomycin (P0781, Sigma Aldrich) at 37°C in a 5 % CO2 atmosphere. The sfGFP ORF was genomically integrated at the C-terminus of RPA1 by CRISPR/Cas9 following a published protocol (Ran et al, 2013b) with some modifications, identical to the procedure previously published for RPAC1-sfGFP 4.
Design of the guide RNAs (gRNAs) was done with a web-based tool (https://www.benchling.com/crispr/) and annealed oligonucleotides (gRNA1 = GCTCCAAGGACCCTTGGTGA; gRNA2 = CGGGGTAGCTGCTATCTCAG) were cloned via BbsI as described in the manual into the Cas9n expression vector pSpCas9n(BB)-2A-Puro (PX462) V2.0, which was a gift from Feng Zhang (Addgene plasmid #62987; https://www.addgene.org/62987/; RRID: Addgene_62987). A donor plasmid carried a short GS-linker sequence with an embedded HRV 3C protease cleavage site and the sfGFP ORF surrounded by two large sequence segments homologous to the insertion locus in the genome.
HeLa cells were transfected with a 1:1:1 molar ratio of gRNA1 and gRNA2 vectors together with the donor plasmid using FuGENE HD Transfection Reagent (E2311, Promega) according to the manufacturer’s instructions. Several days later the GFP-expressing cells were enriched by flow cytometry using a BD FACSAria™ IIu cell sorter at the Central FACS Facility of the RCI Regensburg (Center for Interventional Immunology). GFP-positive cells were seeded as single cells on 96-well plates. After 2-3 weeks, colonies were expanded. These monoclonal populations were validated for the tag insertion by PCR on extracted genomic DNA (gDNA), sequencing and western blot.
About 1 × 10⁶ cells were resuspended in proteinase K buffer (20 mM Tris pH 7.5, 300 mM NaCl, 25 mM EDTA, 2 % (w/v) SDS, 0.2 mg/ml proteinase K) and incubated overnight at 50°C before isopropanol precipitation. The resuspended gDNA was used as PCR template to validate homozygous introduction of the GS-linker and sfGFP ORF into the POLR1A genomic locus (Primers: POLR1A-fwd1: 5’-TTGGGATCCGGTCAAACTC-3’, POLR1A-rev1: 5’-CAGCAAAGCATGGCTTCC-3’, POLR1A-fwd2: 5’-CAGTGGGATCTTGGGATCTG-3’, POLR1A-rev2: 5’-TGCTACGCTGTACTTGACTC-3’). To further validate the result, the PCR product was gel-extracted (QIAquick Gel Extraction Kit, 28706, QIAGEN) and sequenced (Microsynth Seqlab). The selected homozygous cell line was further characterized by Western blot. Cells from a confluent 6 cm plate (about 2.7 × 10⁶ cells, 83.3901.300, Sarstedt) were harvested with 300 µl of boiling 1x SDS loading dye (3 % (w/v) glycerol, 1.68 % (v/v) β-mercaptoethanol, 0.03 % (w/v) bromophenol blue, 26 mM Tris pH 6.8, 0.42 % (w/v) SDS) and vigorously shaken at 95°C for 15 min. Prestained marker (7719S, NEB) and 10 µl of sample from the parental and the newly generated cell line were loaded on an SDS gel (NP0223BOX, Thermo Fisher Scientific) and proteins were separated by electrophoresis. After the proteins were blotted (Trans-Turbo Blot, Bio-Rad) onto a PVDF membrane (1704275, Bio-Rad), Ponceau S staining confirmed equal loading. Tagged RPA1 was detected with a primary antibody (sc-48385, Santa Cruz Biotechnology) and a fluorescently labeled secondary antibody (926-32210, Li-COR). Prestained marker and secondary antibody were detected at different wavelengths (Odyssey Infrared Imager Model 9120, Li-COR).
The selected RPA1-sfGFP cell line was cultivated adherently and adapted to suspension growth as follows: cells from 8 flasks (about 7 × 10⁷ cells in total; 83.3912.302, Sarstedt) were detached by incubation with trypsin (25300, Gibco) at 37°C for 5 min, transferred to a spinner flask (250 mL total volume; 4500, Corning) and cultured in suspension in high-glucose DMEM (11965, Gibco) supplemented with 1 % FBS (10270, Gibco) and 1 % Penicillin/Streptomycin (P0781, Sigma Aldrich) under moderate stirring at 37°C in a 5 % CO2 atmosphere. To expand the culture, one culture volume of fresh medium including all supplements was added whenever cells reached a density of ∼7 × 10⁵ cells/ml, and the culture was transferred to larger spinner flasks as required. Cells were harvested by centrifugation and washed with PBS before the pellet was flash-frozen.
Purification of human Pol I
Human Pol I was purified as described previously4 with some modifications. The RPA1-sfGFP cell pellet was resuspended in 2 ml of lysis buffer (20 mM Hepes pH 7.8, 420 mM NaCl, 1 mM MgCl2, 10 µM ZnCl2, 0.5 % (v/v) NP-40, 4 mM β-mercaptoethanol, 1x protease inhibitor mix (Benzamidine & PMSF)) per g of cell pellet, supplemented with 7 U/ml DNase I (M610A, Promega), and lysed by Dounce homogenization followed by incubation on ice for 30 min. After centrifugation at 20,000 × g and 4°C for 15 min, the whole-cell lysate was incubated with pre-equilibrated GFP-Trap Dynabeads (gtd, Chromotek) for binding. The beads were washed once with four and once with two slurry volumes of wash buffer (20 mM Hepes pH 7.8, 420 mM NaCl, 1 mM MgCl2, 10 µM ZnCl2, 2 % (v/v) glycerol, 4 mM β-mercaptoethanol), and bound protein was eluted for 4 h at 4°C in one slurry volume of wash buffer supplemented with 3C protease (10 µg per 1 g of cell pellet). If anion-exchange chromatography was performed, the GFP-Trap eluate was diluted with buffer A (20 mM Hepes pH 7.8, 1 mM MgCl2, 10 µM ZnCl2, 2 % (v/v) glycerol, 5 mM DTT) to a final NaCl concentration of 140 mM. The sample was loaded onto a MonoQ 1.6/5 PC column (Pharmacia Biotech) in the presence of 60 mM ammonium sulfate and eluted in buffer A with increasing ammonium sulfate concentrations up to 1 M: a linear gradient over five column volumes to 200 mM was followed by steps of five column volumes each at 200 mM, 350 mM, 600 mM and 1 M ammonium sulfate. hPol I eluted at 350 mM ammonium sulfate. hPol I was used immediately or flash-frozen in liquid nitrogen and stored at -80°C for further experiments.
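The dilution before anion exchange follows from a simple salt mass balance. The sketch below is a minimal illustration, assuming that the GFP-Trap eluate carries the 420 mM NaCl of the wash buffer, that buffer A contains no NaCl (as listed) and that the added protease volume is negligible; the function names and the example eluate volume and pellet mass are hypothetical.

```python
def diluent_volume(sample_ml: float, start_mm: float, target_mm: float,
                   diluent_mm: float = 0.0) -> float:
    """Volume of diluent (ml) that brings a salt from start_mm to target_mm.

    Mass balance: start*V + diluent*x = target*(V + x)
    =>  x = V * (start - target) / (target - diluent)
    """
    if not diluent_mm < target_mm < start_mm:
        raise ValueError("target must lie between diluent and start concentrations")
    return sample_ml * (start_mm - target_mm) / (target_mm - diluent_mm)


def protease_ug(pellet_g: float, ug_per_g: float = 10.0) -> float:
    """3C protease (µg) for on-bead elution, scaled to the cell-pellet mass."""
    return pellet_g * ug_per_g


if __name__ == "__main__":
    eluate_ml = 1.0  # hypothetical eluate volume
    extra = diluent_volume(eluate_ml, start_mm=420.0, target_mm=140.0)
    print(f"add {extra:.1f} ml buffer A -> {eluate_ml + extra:.1f} ml at 140 mM NaCl")
    print(f"3C protease for a 5 g pellet: {protease_ug(5.0):.0f} µg")
```

For 420 mM to 140 mM NaCl this amounts to two volumes of buffer A per volume of eluate, i.e. a three-fold dilution.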
RNA Elongation and Cleavage Assay
The RNA elongation and cleavage assay was performed as described 4 with small modifications. 0.5 pmol of Pol I from S. cerevisiae, S. pombe or H. sapiens were preincubated with 0.25 pmol of different pre-annealed minimal or bubble nucleic acid scaffolds (sequences are summarized in Sup. Table 2 and shown schematically next to each gel) in transcription buffer (20 mM Hepes pH 7.8, 40 mM (NH4)2SO4, 28 mM NaCl, 8 mM MgSO4, 10 µM ZnCl2, 10 % (v/v) glycerol, 10 mM DTT) for 1 h at 20°C in a 45 µl reaction. Where purified RPA49/RPA34 heterodimer was added, a 1x, 5x or 10x molar excess of heterodimer over polymerase was included during preincubation. For RNA elongation, 10 µmol of each desired NTP (indicated for each lane in the figure) were added and the reaction was incubated for 1 h at 28°C. To examine cleavage activity, the preincubated reaction was incubated for 1 h at 28°C without NTPs. Nucleic acids were then purified by ethanol precipitation: 5 M NaCl was added to a final concentration of 0.5 M, followed by 800 µl of 100 % ethanol. After precipitation for at least 1 h at -20°C, the sample was centrifuged for 30 min at 20,000 × g and 4°C. The pellet was washed with 80 % ethanol, dried and resuspended in 1x RNA loading dye (4 M urea, 1x TBE, 0.01 % bromophenol blue and 0.01 % xylene cyanol only for FAM-labeled constructs). The sample was heated to 95°C for 5 min. As a control, 0.25 pmol of scaffold were treated identically without addition of polymerase and NTPs. 0.125 pmol of FAM-labeled RNA product were separated by gel electrophoresis (20 % polyacrylamide gel containing 7 M urea) and visualized with a Typhoon FLA9500 (GE Healthcare).
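The amounts above translate into final concentrations and additive volumes as follows. This is a minimal sketch, assuming the reaction volume stays at 45 µl (the volume of added NTPs is ignored) and that the 5 M NaCl stock volume counts towards the final 0.5 M; the helper names are hypothetical.

```python
def nanomolar(pmol: float, volume_ul: float) -> float:
    """Final concentration in nM: pmol per µl equals µM, times 1000 gives nM."""
    return pmol / volume_ul * 1000.0


def stock_volume_ul(sample_ul: float, stock_m: float, final_m: float) -> float:
    """µl of stock to add so that the mixture (stock volume included) reaches final_m."""
    if not 0 < final_m < stock_m:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_ul * final_m / (stock_m - final_m)


REACTION_UL = 45.0
POL_PMOL = 0.5
SCAFFOLD_PMOL = 0.25

if __name__ == "__main__":
    print(f"Pol I: {nanomolar(POL_PMOL, REACTION_UL):.1f} nM")
    print(f"scaffold: {nanomolar(SCAFFOLD_PMOL, REACTION_UL):.2f} nM")
    for excess in (1, 5, 10):  # molar excess of RPA49/34 over polymerase
        print(f"{excess}x RPA49/34: {excess * POL_PMOL:.1f} pmol")
    print(f"5 M NaCl for 0.5 M final: {stock_volume_ul(REACTION_UL, 5.0, 0.5):.1f} µl")
```

Under these assumptions the polymerase is present at a two-fold molar excess over the scaffold (about 11 nM versus 5.6 nM), and about 5 µl of 5 M NaCl bring a 45 µl reaction to 0.5 M.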
Purification of RPA49/RPA34 variants
The S. cerevisiae full-length heterodimer was purified as described19. Sc A49 with a C-terminal hexa-histidine tag and Sc A34 were co-expressed in E. coli BL21 (DE3) RIL in LB medium with 0.2 mM IPTG for 18 h at 18°C. The cells were resuspended in lysis buffer (50 mM Tris pH 7.5, 300 mM NaCl, 10 mM β-mercaptoethanol, 1x protease inhibitor (PI) mix (Benzamidine & PMSF)) and sonicated. After centrifugation, the lysate was loaded onto pre-equilibrated Ni-NTA beads (30230, Qiagen) by gravity flow, washed with six bed volumes of buffer Wash I (50 mM Tris pH 7.5, 1 M NaCl, 10 mM β-mercaptoethanol, 1x PI) and six bed volumes of Wash II (50 mM Tris pH 7.5, 300 mM NaCl, 30 mM imidazole, 10 mM β-mercaptoethanol, 1x PI) before elution (50 mM Tris pH 7.5, 300 mM NaCl, 100 mM imidazole, 10 mM β-mercaptoethanol, 1x PI). The sample was diluted 3-fold with dilution buffer (50 mM Tris pH 7.5, 10 mM β-mercaptoethanol) before loading onto a MonoS 5/50 GL column (GE Healthcare) in buffer A (50 mM Tris pH 7.5, 100 mM NaCl, 5 mM DTT). Elution was performed with a linear NaCl gradient up to 1 M. Sc A49/34 eluted at around 280 mM NaCl. The corresponding fractions were pooled, concentrated with a 10 kDa cut-off concentrator (UFC801024, Millipore) and applied to a Superdex 200 Increase 10/300 GL column (GE Healthcare) equilibrated with buffer A. Pooled peak fractions were concentrated and flash-frozen for storage at -80°C.
The different variants of the human heterodimer (RPA49(FL)/RPA34(FL), RPA49(FL)/RPA34(1-343), RPA34(131-510)) were cloned with an N-terminal 6xHis-tag on RPA49 and untagged RPA34, except for RPA34(131-510), which itself carries an N-terminal 6xHis-tag. The proteins were co-expressed in E. coli BL21 (DE3) RIL in LB medium with 0.2 mM IPTG overnight at 18°C. Cells were resuspended in lysis buffer (50 mM MES pH 6.3, 300 mM NaCl, 10 mM β-mercaptoethanol, 1x protease inhibitor (PI) mix (Benzamidine & PMSF)) and lysed by sonication. After centrifugation, the lysate was loaded onto pre-equilibrated Ni-NTA beads (30230, Qiagen) by gravity flow and washed successively with six bed volumes of buffer Wash I (50 mM MES pH 6.3, 1 M NaCl, 10 mM β-mercaptoethanol, 1x PI), ATP-Wash (50 mM MES pH 6.3, 1 M NaCl, 10 mM β-mercaptoethanol, 1x PI supplemented with 2 mg/ml denatured proteins and 0.5 mM ATP), a second ATP-Wash after 10 min of incubation, and Wash II (50 mM MES pH 6.3, 300 mM NaCl, 10 mM imidazole, 10 mM β-mercaptoethanol, 1x PI) before elution (50 mM MES pH 6.3, 300 mM NaCl, 200 mM imidazole, 10 mM β-mercaptoethanol, 1x PI). The ATP-Wash steps were performed at room temperature. The sample was diluted 5-fold with buffer A (50 mM Tris pH 7.5, 10 mM β-mercaptoethanol) before loading onto a MonoS 5/50 GL column (GE Healthcare) in buffer A supplemented with 100 mM NaCl. Elution was performed with a linear NaCl gradient up to 2 M. The corresponding fractions were pooled, concentrated with a 10 kDa cut-off concentrator (UFC801024, Millipore) and applied to a Superdex 200 Increase 10/300 GL column (GE Healthcare) equilibrated with SEC buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM DTT). Pooled peak fractions were concentrated and flash-frozen for storage at -80°C.
Purification of recombinant dock II domain
Two variants of the human dock II domain (RPA1(1060-1155), full-length; RPA1(1081-1146), minimal) were cloned with a C-terminal His-MBP tag. The proteins, as well as the tag alone, were expressed overnight at 20°C in E. coli BL21 (DE3) RIL in LB medium with 0.2 mM IPTG. Cells were resuspended in lysis buffer (50 mM MES pH 6.3, 300 mM NaCl, 10 mM β-mercaptoethanol, 1x protease inhibitor (PI) mix (Benzamidine & PMSF)) and lysed by sonication. After centrifugation, the lysate was loaded onto pre-equilibrated Ni-NTA beads (30230, Qiagen) by gravity flow and washed successively with six bed volumes of buffer Wash I (50 mM MES pH 6.3, 1 M NaCl, 10 mM β-mercaptoethanol, 1x PI) and Wash II (50 mM MES pH 6.3, 300 mM NaCl, 10 mM imidazole, 10 mM β-mercaptoethanol, 1x PI) before elution (50 mM MES pH 6.3, 300 mM NaCl, 200 mM imidazole, 10 mM β-mercaptoethanol, 1x PI). The eluate was buffer-exchanged into SEC buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM DTT) with a PD10 column (17-0850-01, GE Healthcare) and applied to a Superdex 75 Increase 10/300 GL column (GE Healthcare) equilibrated with SEC buffer. Pooled peak fractions were concentrated and flash-frozen for storage at -80°C.
Electrophoretic mobility shift assay
A total of 100 fmol of pre-annealed 40 bp DNA (EMSA-DNA-strand1: 5’-Cy5-CTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGA-3’; EMSA-DNA-strand2: 5’-TCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAG-3’) was mixed with up to a 50-fold molar excess of purified protein (as labeled in the figure) in EMSA buffer 1 or 2 (EMSA-buffer-1: 10 mM Tris pH 7.5, 50 mM NaCl, 1 mM MgCl2, 4 % glycerol, 0.5 mM EDTA, 0.5 mM DTT; EMSA-buffer-2: 20 mM Hepes pH 7.8, 150 mM NaCl, 2 % glycerol, 0.2 % Triton X-100, 0.2 % Tween-20, 5 mM DTT) and incubated at room temperature for 30 min. Afterwards, 6x loading dye (10 mM Tris pH 7.6, 60 mM EDTA, 60 % glycerol, 0.03 % Orange G) was added to a final concentration of 1x. 10 % polyacrylamide gels in 0.4x TBE were pre-run at 110 V for 30 min before the reactions were separated at 110 V for 1 h 45 min at 4°C. The Cy5-labeled DNA was detected with a Typhoon FLA9500 (GE Healthcare).
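Because the EMSA probe is formed by annealing the two listed oligonucleotides, a quick sanity check is that strand 2 is the exact reverse complement of strand 1 over its full length. The sketch below does this check and the arithmetic for the highest protein amount; the helper names are hypothetical and the 5’-Cy5 label is omitted from the sequence strings.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_fully(top: str, bottom: str) -> bool:
    """True if bottom (5'->3') is the exact, full-length reverse complement of top (5'->3')."""
    return len(top) == len(bottom) and reverse_complement(top).upper() == bottom.upper()


def max_protein_fmol(dna_fmol: float = 100.0, fold_excess: float = 50.0) -> float:
    """Protein amount at the highest molar excess used over the DNA probe."""
    return dna_fmol * fold_excess


# EMSA strands as listed in the Methods, without the 5'-Cy5 label
STRAND1 = "CTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGA"
STRAND2 = "TCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAG"

if __name__ == "__main__":
    print(len(STRAND1), anneals_fully(STRAND1, STRAND2))
    print(f"highest protein amount: {max_protein_fmol() / 1000:.1f} pmol")
```

The two strands form a blunt 40 bp duplex, and a 50-fold excess over 100 fmol of DNA corresponds to 5 pmol of protein per reaction.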
Confocal microscopy
For fluorescence imaging, cells were grown adherently on glass cover slips to 50 % confluency. After washing with pre-warmed (37°C) PBS, cells were fixed with 3.7 % paraformaldehyde in PBS for 10 min at 37°C. Fixation was stopped by replacing the solution with 100 mM glycine in PBS for 5 min at 37°C. Cells were then washed twice with PBS, mounted on a specimen slide with a drop of ProLong Gold Antifade Mountant with DAPI (P36941, Thermo Fisher Scientific), and dried in the dark at least overnight.
The fluorescent specimens were imaged with a Plan-Apochromat 63x/1.4 Oil DIC objective on a Zeiss LSM980/Airyscan 2 confocal microscope. sfGFP was excited with a 488 nm diode laser and emission was detected using a 300-720 nm band-pass filter. Separately, DAPI was excited with a 405 nm diode laser and emission was detected using a 300-720 nm band-pass filter. For the 3D model, a Z-stack was imaged using the internal GaAsP-PMT detectors (490-668 nm for sfGFP and 410-473 nm for DAPI) in a two-track mode. Image processing was done using the Zeiss AxioVision software. 3D volume images were created in Imaris 9.6.
Analysis of Pol I subunits RPA1, RPA34, RPA43 and A14
Data sets for the Pol I subunits were generated from their corresponding InterPro114 entries (RPA1: IPR015699, RPA34: IPR013240, RPA43: IPR041901 and IPR041178, A14: IPR013239; downloaded on 07.06.2021). A common dataset of RPA1, RPA34 and RPA43 was generated by identifying the species shared by the three InterPro families. For each species obtained, the sequences of RPA1, RPA34 and RPA43 were concatenated.
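The construction of the common dataset amounts to a set intersection followed by concatenation. The sketch below assumes that each family has already been reduced to one sequence per species (a dictionary from species name to sequence) and that the two RPA43 InterPro entries have been merged beforehand; the function name and toy sequences are hypothetical and not taken from the InterPro downloads.

```python
from __future__ import annotations


def concatenate_shared_species(rpa1: dict[str, str], rpa34: dict[str, str],
                               rpa43: dict[str, str]) -> dict[str, str]:
    """Keep species present in all three families and concatenate RPA1 + RPA34 + RPA43."""
    shared = rpa1.keys() & rpa34.keys() & rpa43.keys()  # set-like view intersection
    return {sp: rpa1[sp] + rpa34[sp] + rpa43[sp] for sp in sorted(shared)}


if __name__ == "__main__":
    # Toy sequences only; Danio rerio is missing from the RPA34 set and is therefore dropped
    rpa1 = {"Homo sapiens": "MA", "Mus musculus": "MB", "Danio rerio": "MC"}
    rpa34 = {"Homo sapiens": "KA", "Mus musculus": "KB"}
    rpa43 = {"Homo sapiens": "RA", "Mus musculus": "RB", "Danio rerio": "RC"}
    print(concatenate_shared_species(rpa1, rpa34, rpa43))
```

Keeping the concatenation order fixed (RPA1, then RPA34, then RPA43) ensures that homologous positions line up when the concatenated sequences are aligned.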
Phylogenetic analysis
The sequence alignment was generated with MAFFT 115 using default options and a gap open penalty of 70. The resulting alignment was manually filtered to remove highly divergent sequences. To improve the quality of the phylogenetic analysis without losing information, only one sequence was retained per genus. Gblocks 116 (options: b3=5000, b4=2, b5=a) was applied to the resulting data set of 513 sequences to remove uninformative columns. Using RAxML 117 with the option -f a and the substitution model PROTGAMMAAUTO, 100 trees were generated and a consensus tree was derived. The root was placed between the supergroups Sar and Haptophyta and the supergroup Amorphea 118. The resulting phylogenetic tree was analyzed with respect to taxonomic distribution, and sequences were grouped according to branching points in the tree (Fig. 3). To determine the taxonomic groups in which the A14 subunit is present, the species of the A14 InterPro entry were compared with the species in the phylogenetic tree.
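The two set-based steps in this paragraph, reducing the data to one sequence per genus and asking which tree groups contain species with an A14 entry, can be sketched as below. The Methods do not state which sequence was kept per genus, so the rule used here (keep the longest, ties keep the first) is a hypothetical placeholder, as are the function names, group names and toy data; the genus is taken as the first word of the species name.

```python
from __future__ import annotations


def one_per_genus(seqs: dict[str, str]) -> dict[str, str]:
    """Keep a single sequence per genus (hypothetical rule: the longest; ties keep the first)."""
    chosen: dict[str, str] = {}  # genus -> species name
    for species, seq in seqs.items():
        genus = species.split()[0]
        if genus not in chosen or len(seq) > len(seqs[chosen[genus]]):
            chosen[genus] = species
    return {sp: seqs[sp] for sp in chosen.values()}


def groups_with_subunit(groups: dict[str, set[str]], subunit_species: set[str]) -> set[str]:
    """Tree groups containing at least one species listed for the subunit (e.g. A14)."""
    return {name for name, members in groups.items() if members & subunit_species}


if __name__ == "__main__":
    seqs = {"Homo sapiens": "MAAAA", "Homo neanderthalensis": "MA",
            "Saccharomyces cerevisiae": "MKK"}  # toy data
    print(sorted(one_per_genus(seqs)))
    groups = {"Fungi": {"Saccharomyces cerevisiae"}, "Metazoa": {"Homo sapiens"}}
    print(groups_with_subunit(groups, {"Saccharomyces cerevisiae"}))
```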
Sequence analysis of RPA34 and RPA1
Sequence alignments of each subunit were generated with MAFFT using adjusted gap open penalties (RPA34: 50, RPA1: 20). Because of the higher sequence diversity among RPA34 sequences, the BLOSUM30 matrix was used instead of the default. To account for the divergence between the taxonomic groups defined by the phylogenetic tree, the alignment was split into these groups, and each group was analyzed separately for the presence or absence of the RPA34 C-terminal extension, the RPA1 foot domain and the RPA1 expander domain. Homo sapiens sequences were used as reference to define the regions of interest (residues 399-510, 1074-1139 and 1365-1488, respectively). The median length and standard deviation of the regions of interest were calculated for each group. To assess the sequence and structural conservation of the regions of interest, the Jalview 119 conservation score was extracted after removing all gap-only columns. The mean conservation score was calculated as the sum of all column scores divided by the number of columns. Scores were grouped into 5 categories: not conserved (0-3), weakly conserved (3-5), medium conserved (5-7), conserved (7-9) and strongly conserved (9-11). Secondary structures were predicted using Ali2D 79,120. A secondary structure element was assigned when more than 5 amino acids had medium to high probability in more than 90 % of the sequences within a group. Two secondary structure elements separated by fewer than 5 differently annotated amino acids were counted as one element. Columns with gaps in more than 90 % of the sequences were ignored.
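The scoring and element-counting rules above can be expressed compactly. This is a minimal sketch under stated assumptions: Jalview column scores are already numeric (gap-only columns given as None), category boundaries that fall on a limit (e.g. exactly 3) go to the higher bin, and the per-group consensus from Ali2D has already been reduced to one character per column ('H' helix, 'E' strand, anything else no consensus) after applying the 90 % thresholds. Whether bridging happens before or after the length filter is not specified; here bridging comes first and the merged span, including the bridge, must exceed 5 columns. All function names are hypothetical.

```python
from __future__ import annotations

import re


def mean_conservation(column_scores: list[float | None]) -> float:
    """Mean Jalview conservation score; None marks a gap-only column, which is dropped."""
    kept = [s for s in column_scores if s is not None]
    if not kept:
        raise ValueError("no non-gap columns")
    return sum(kept) / len(kept)


def conservation_category(score: float) -> str:
    """Bin a mean score (0-11); values on a boundary go to the higher bin (assumption)."""
    if not 0 <= score <= 11:
        raise ValueError("Jalview conservation scores range from 0 to 11")
    if score < 3:
        return "not conserved"
    if score < 5:
        return "weakly conserved"
    if score < 7:
        return "medium conserved"
    if score < 9:
        return "conserved"
    return "strongly conserved"


def secondary_structure_elements(consensus: str, min_len: int = 6, max_bridge: int = 4):
    """Return (type, start, end) elements from a per-column consensus string.

    Runs of the same type separated by at most max_bridge (fewer than 5) other columns
    are merged; merged elements longer than 5 columns (>= min_len) are kept.
    """
    merged: list[list] = []
    for m in re.finditer(r"H+|E+", consensus):
        kind, start, end = m.group()[0], m.start(), m.end()
        if merged and merged[-1][0] == kind and start - merged[-1][2] <= max_bridge:
            merged[-1][2] = end  # bridge over the short differently annotated stretch
        else:
            merged.append([kind, start, end])
    return [(k, s, e) for k, s, e in merged if e - s >= min_len]
```

For example, `secondary_structure_elements("HHHHHH--HHHHHH")` yields a single helix spanning both runs, whereas a 3-residue helix followed by five unannotated columns and a 7-residue strand yields only the strand.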
Mass Spectrometry
Protein bands were excised from the gel, washed with 50 mM NH4HCO3, 50 mM NH4HCO3/acetonitrile (3/1) and 50 mM NH4HCO3/acetonitrile (1/1), and lyophilized. After reduction/alkylation and additional washing steps, proteins were digested in-gel with trypsin (Trypsin Gold, mass spectrometry grade, Promega) overnight at 37 °C. The resulting peptides were sequentially extracted with 50 mM NH4HCO3 and 50 mM NH4HCO3 in 50 % acetonitrile. After lyophilization, peptides were reconstituted in 20 µl 1 % TFA and separated by reversed-phase chromatography. An UltiMate 3000 RSLCnano System (Thermo Fisher Scientific, Dreieich) equipped with a C18 Acclaim Pepmap100 preconcentration column (100 µm i.d. x 20 mm, Thermo Fisher Scientific) and an Acclaim Pepmap100 C18 nano column (75 µm i.d. x 250 mm, Thermo Fisher Scientific) was operated at a flow rate of 300 nl/min with a 60 min linear gradient of 4 % to 40 % acetonitrile in 0.1 % formic acid. The LC was coupled online to a maXis plus UHR-QTOF System (Bruker Daltonics) via a CaptiveSpray nanoflow electrospray source. MS/MS spectra after CID fragmentation were acquired in data-dependent mode at a resolution of 60,000. The precursor scan rate was 2 Hz over a mass range of m/z 175 to m/z 2000. A dynamic method with a fixed cycle time of 3 s was applied via the Compass 1.7 acquisition and processing software (Bruker Daltonics). Prior to database searching with Protein Scape 3.1.3 (Bruker Daltonics) connected to Mascot 2.5.1 (Matrix Science), raw data were processed in Data Analysis 4.2 (Bruker Daltonics). The Swiss-Prot Homo sapiens database (release-2020_01, 220420 entries) was searched with the following parameters: enzyme specificity trypsin with one missed cleavage allowed, precursor tolerance 0.02 Da, MS/MS tolerance 0.04 Da, Mascot peptide ion-score cut-off 25. Deamidation of asparagine and glutamine, oxidation of methionine, and carbamidomethylation or propionamide modification of cysteine were set as variable modifications.
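The search parameter "trypsin with one missed cleavage allowed" defines which candidate peptides are considered. The sketch below illustrates that rule with an in-silico digest using the common trypsin definition (cleavage C-terminal to K or R, but not before P) and a simple absolute tolerance check matching the 0.02 Da precursor window. It is an illustration of the concept, not Mascot's implementation; the function names are hypothetical and peptide mass calculation is left out.

```python
from __future__ import annotations

import re

# Common trypsin rule: cleave after K or R unless the next residue is P (zero-width match).
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein: str, missed_cleavages: int = 1, min_len: int = 1) -> list[str]:
    """Fully cleaved peptides plus those spanning up to `missed_cleavages` uncleaved sites,
    listed from the N- to the C-terminus."""
    fragments = [f for f in _TRYPSIN_SITE.split(protein.upper()) if f]
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


def within_tolerance(observed_da: float, theoretical_da: float, tol_da: float = 0.02) -> bool:
    """Absolute tolerance check in Da, as used for the precursor window."""
    return abs(observed_da - theoretical_da) <= tol_da
```

For the toy sequence "AKPRGK", the K followed by P is not cleaved, so the fully cleaved peptides are "AKPR" and "GK", and allowing one missed cleavage adds "AKPRGK".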
Native PAGE
To investigate protein-protein interactions, blue native PAGE was performed. A five-fold molar excess of MBP alone or MBP-tagged human dock II domain was incubated with recombinant Top2a ΔC (1-1217) in binding buffer (20 mM Hepes pH 8.0, 150 mM NaCl, 50 mM KCl, 1 mM MgCl2, 2 % glycerol, 2 mM β-mercaptoethanol) for 30 min at room temperature. After addition of NativePAGE sample buffer, the samples were separated on a NativePAGE 3-12 % gradient gel at 150 V for 90 min with light blue cathode buffer and anode buffer (NativePAGE™ Novex® Bis-Tris Gel System, BN1003BOX, Novex) and stained with Coomassie.
Top2a co-immunoprecipitation
To investigate Top2a interaction partners, co-immunoprecipitation was performed from U2OS nuclear extract (NE; 15 mg/ml total protein). Top2a was immunoprecipitated using an anti-Top2a antibody (ab12318, Abcam) immobilized on Dynabeads Protein A magnetic beads (Thermofisher, c/n 10001D) according to the manufacturer’s instructions. Antibodies were cross-linked to the beads using DMP (Thermofisher, c/n 21666) as recommended by the manufacturer. Beads were blocked with BSA in PBS overnight. 100 µl NE was diluted with dilution buffer (25 mM Tris-HCl pH 7.9, 12.5 mM MgCl2, 10 % glycerol, 0.03 % NP40) to a final KCl concentration of 150 mM and treated with 500 U of benzonase (Sigma, E1014) for 30 min at 4 °C. 25 µl of the beads were added and the suspension was incubated on a rotating wheel for 1 h at 4°C. Beads were washed three times with 100 µl wash buffer (25 mM Tris-HCl pH 7.9, 150 mM KCl, 12.5 mM MgCl2, 10 % glycerol, 0.03 % NP40) and proteins were eluted by incubation in 1x LDS sample buffer (Thermofisher, c/n NP0007) at 65°C for 10 min. Immunoprecipitated proteins were analyzed by Western blot using anti-UBF, anti-RPA49 and anti-Top2a antibodies (sc-9131, Santa Cruz; 611413, BD Transduction; and ab12318, Abcam).
UBF-Top2a pulldown
To investigate protein-protein interactions, a pulldown assay with purified recombinant Flag-tagged UBF (fUBF) and purified Top2a was performed. fUBF was expressed in insect cells and purified as described earlier121. Top2a was obtained from Inspiralis (c/n HT210). The proteins were incubated together in pulldown buffer (25 mM Tris-HCl pH 7.9, 12.5 mM MgCl2, 10 % glycerol, 0.03 % NP40, supplemented with 50, 100 or 200 mM KCl as indicated in Fig. 5C) for 20 min at 4°C. 20 µl of anti-FLAG M2 magnetic beads (Sigma, M8823) were added to each sample and the suspension was incubated on a rotating wheel for 30 min at 4°C. Beads were washed three times with wash buffer (25 mM Tris-HCl pH 7.9, 12.5 mM MgCl2, 10 % glycerol, 0.03 % NP40, supplemented with 50, 100 or 200 mM KCl) and proteins were eluted by incubation in 1x LDS sample buffer (Thermofisher, c/n NP0007) at 65°C for 10 min. Proteins were analyzed by Western blot using anti-UBF and anti-Top2a antibodies (sc-9131, Santa Cruz; ab12318, Abcam).
Reanalysis of previously published ChIP datasets
Raw data were handled, mapping coordinates extracted and the data displayed as previously published 90. The following data set was used for Top2A: GSE99197_SRR5585950_TOP2A-MEF 87. ArrayExpress E-MTAB-5839 data sets were: ChIP-seq_UBF_MEFs_UBFfl_Rep1; ChIP-seq_RPI_MEFs_UBFfl_Rep1; ChIP-seq_Rrn3_MEFs_UBFfl_Rep1; ChIP-seq_TBP_MEFs_UBFfl_Rep1; ChIP-seq_TAF68_MEFs_UBFfl_Rep1 46. Taf1c is not included in the figure since its mapping is identical to that of Taf1b, but the data are also available in E-MTAB-5839 as ChIP-seq_TAF95_MEFs_UBFfl_Rep1.
Negative stain EM
hPol I samples were centrifuged for 5 min (4°C; 15,000 rpm; Eppendorf tabletop centrifuge). Five µl of sample were then applied to glow-discharged 400-mesh copper grids (G2400C; Plano) with a self-made carbon film of ∼7 nm thickness (Pilsl, et al, Methods Mol Biol, in press). After 1 min, grids were washed in ddH2O for 30 s and stained three times with 5 µl saturated uranyl formate solution (2x 20 s, 1x 30 s). After each step, excess liquid was removed with filter paper. Images were collected on a JEOL 2100-F transmission electron microscope operated at 200 kV and equipped with a TVIPS-F416 (4k × 4k) CMOS detector at 40,000x magnification (pixel size 2.7 Å) with alternating defocus (−1 to −3 µm).
The images were processed using RELION 3.1 57 as shown in Sup. Fig. 1. A total of 76 micrographs were analyzed, yielding 46,196 particles auto-picked with the Laplacian-of-Gaussian (LoG) routine. Following reference-free 2D classification, 3D classification (reference PDB: 5M3M, low-pass filtered to 60 Å) yielded three reconstructions with different clamp/stalk flexibilities (Sup. Fig. 1).
Cryo-EM grid preparation and data collection
Reconstructions suffered from poor Fourier completeness. Screening for suitable conditions using crosslinking, gradient fixation122 and detergents, or varying the grid support (graphene, graphene oxide, ultrathin carbon or gold foil; Pilsl et al., Methods Mol Biol, in press), had limited success in removing the orientational bias. Tilted data collection partially reduced the bias, although 3D reconstruction remained hampered. The best results were obtained with GFP-Trap-eluted sample applied directly to graphene oxide-supported grids. However, this strategy leaves some 3C protease in the sample (Fig. 1C; Sup. Fig. 1A), which may lower the signal-to-noise ratio.
Graphene oxide grids were prepared using the surface assembly method on Quantifoil R1.2/1.3 grids123. Three microliters of sample were applied and incubated for 30 s at 100 % humidity and 4°C in a Vitrobot Mark IV, blotted for 3 s with blot force 8 and plunged into liquid ethane. A total of 9,709 micrograph movies were collected on a CryoArm200 cryo-electron microscope (JEOL) equipped with a K2 direct electron detector (Gatan), an in-column energy filter and a Cold-Field Emission Gun (low-flash interval 4 h). A total dose of 40 e⁻/Å² was fractionated over 40 frames at a defocus range of -1.2 to -2.7 µm using SerialEM124 in a 5×5 multi-hole strategy as described60.
Cryo-EM image processing and model building
Pre-processing, including motion correction, CTF estimation and particle picking, was carried out using WARP56, followed by 2D and 3D classification and auto-refinement using RELION 4.0 57. Data were binned to a pixel size of 1.50846 Å and particles were extracted with a box size of 190 pixels. Rough 2D classification followed by 3D classification, using as reference an hPol I map obtained after stringent 2D classification and 3D refinement, yielded a reconstruction at an overall resolution of 4.09 Å. Further 3D classification was performed to investigate the occupancy and flexibility of the RPA49/34 dimerization domain and the clamp/stalk region. Models for the common subunits RPABC1-5 and the RPAC1/2 assembly were transferred from an hPol III reconstruction5. Homology models of the hPol I subunits RPA1, RPA2, RPA49, RPA34, RPA12 and RPA43 were generated based on sequence and secondary structure alignments with the crystal structures of their S. cerevisiae counterparts (Sup. Data 1) using the MODELLER software package58. The models were adjusted in COOT125 and real-space refined using Phenix126. At later stages, released AlphaFold59 models were used to guide chain tracing in poorly resolved areas. A model of the stalk subunit RPA43 is included in some figures but was not deposited because of poor or absent cryo-EM density resulting from flexibility.
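The extraction parameters fix the physical box size and the best resolution the sampling can represent. The short sketch below works this out for the stated values; the helper names are hypothetical, and the Nyquist limit is taken as twice the pixel size.

```python
def box_size_angstrom(box_px: int, pixel_a: float) -> float:
    """Physical edge length of the extraction box in Å."""
    return box_px * pixel_a


def nyquist_angstrom(pixel_a: float) -> float:
    """Finest resolution representable at this sampling (2 x pixel size)."""
    return 2.0 * pixel_a


def fourier_shell(resolution_a: float, box_px: int, pixel_a: float) -> float:
    """Fourier shell index that corresponds to a given resolution for this box."""
    return box_px * pixel_a / resolution_a


PIXEL_A = 1.50846
BOX_PX = 190

if __name__ == "__main__":
    print(f"box edge: {box_size_angstrom(BOX_PX, PIXEL_A):.1f} Å")
    print(f"Nyquist: {nyquist_angstrom(PIXEL_A):.2f} Å")
    shell = fourier_shell(4.09, BOX_PX, PIXEL_A)
    print(f"4.09 Å corresponds to shell {shell:.1f} of {BOX_PX // 2}")
```

The 190-pixel box spans about 287 Å, and the Nyquist limit of about 3.0 Å lies well beyond the reported 4.09 Å, so the binning does not limit the reconstruction at this resolution.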
Acknowledgements
The authors especially thank Philip Gunkel for his contribution. We thank all past and present members of the Engel lab, Achim Griesenbeck, Colyn Crane-Robinson, Christophe Lotz, Marlene Vayssieres, Klaus Grasser, Herbert Tschochner, and Philipp Milkereit for help and discussion, Gerhard Lehmann and Norbert Eichner for IT support, Joost Zomerdijk for UBF constructs, Volker Cordes for the HeLa P2 cell line, Remco Sprangers for shared cell culture, Dina Grohmann and the Archaea Center for fermentation, and Thomas Dresselhaus for access to fluorescence microscopes. This work was in part supported by the Emmy-Noether Programm (DFG grant no. EN 1204/1-1 to CE) of the German Research Council and Collaborative Research Center 960 (TP-A8 to CE).
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.
- 114.
- 115.
- 116.
- 117.
- 118.
- 119.
- 120.
- 121.
- 122.
- 123.
- 124.
- 125.
- 126.