RNA sequence and structure determinants of Pol III transcriptional termination in human cells

The precise mechanism of transcription termination of the eukaryotic RNA polymerase III (Pol III) has been a subject of considerable debate. Although previous studies have clearly shown that tracts of multiple uracils at the end of RNA transcripts are required for Pol III termination, whether upstream RNA secondary structure in the nascent transcript is necessary for robust transcriptional termination remains unresolved. We sought to address this directly through the development of an in cellulo Pol III transcription termination assay using a synthetic biology approach. Specifically, we utilized the recently developed Tornado expression system and a stabilized Corn RNA aptamer to create a Pol III-transcribed RNA that produces a detectable fluorescent signal when transcribed in human cells. To study the effects of RNA sequence and structure on Pol III termination, we systematically varied the sequence context upstream of the aptamer and identified sequence characteristics that enhance or diminish termination. We found that in the absence of predicted secondary structure, only poly-U tracts longer than the average length found in the human genome (4–5 nucleotides) efficiently terminate Pol III transcription. We found that shorter poly-U tracts could induce termination when placed in proximity to secondary structural elements, while secondary structure by itself was not sufficient to induce termination. These findings demonstrate a key role for sequence and structural elements within Pol III-transcribed nascent RNA for efficient transcription termination, and demonstrate a generalizable assay for characterizing Pol III transcription in human cells.


Introduction
Cells require mechanisms for precisely terminating transcribed RNAs at the appropriate genetic loci in order to maintain proper genetic regulation and minimize undesired expression of downstream genomic regions [1-3]. The process of transcription termination in prokaryotes is well understood [4], but there remain questions regarding the mechanisms of termination in eukaryotes [1,5-8]. This knowledge gap is particularly substantial for transcription mediated by eukaryotic RNA polymerase III (Pol III), which transcribes non-coding RNAs such as the 5S ribosomal RNA, tRNAs, snRNA, and a variety of miRNA [9,10]. Given the important roles played by these classes of RNA in health and disease processes [11,12], elucidating the mechanisms of transcription termination could provide insights into these important components of cellular regulation.
Pol III transcriptional termination occurs when the transcribing polymerase reaches a stretch of adenosines which is encoded into the nascent RNA as a poly-uracil (poly-U) tract [4]. The average lengths of these genomic tracts vary across eukaryotic species, with an average of 5-7 uracil nucleotides (nt) within the genome of Schizosaccharomyces pombe (S. pombe), 6-9 nt in Saccharomyces cerevisiae (S. cerevisiae) and 4-5 nt in humans [13]. In this regard, eukaryotic Pol III termination signals are similar to those employed in the bacterial intrinsic termination mechanism, which also occurs at a poly-U stretch [14]. In bacterial transcriptional termination, weak interactions between the A-U bases within the nascent RNA-DNA template hybrid signal the elongating RNA polymerase (RNAP) to transition into a pause conformation, and the resulting structural rearrangement contributes to the dissociation of the transcription complex from the template.
In addition to the poly-U tract, RNA secondary structure has been shown to play a role in transcription termination mechanisms [14]. For example, bacterial intrinsic transcriptional terminators require RNA secondary structures immediately adjacent to the poly-U tract, with more stable secondary structures contributing to greater termination efficiency [15,16]. However, the necessity of RNA secondary structure for eukaryotic Pol III termination is currently debated [6,7,17-20]. Some reports indicate that Pol III efficiently terminates only when RNA structural elements are proximal to a poly-U tract [6,17,20], while other reports indicate that secondary structure is dispensable and Pol III termination is not enhanced in its presence [18,21]. Most studies have utilized in vitro transcription assays using reassembled purified components of the S. cerevisiae Pol III transcription complex on DNA templates designed to contain various RNA sequence and structure contexts. In these assays, transcription components were first assembled into the full complex by loading on DNA templates, and then tested for their ability to read through poly-U tracts in the presence or absence of upstream RNA secondary structure [22]. The resulting transcript lengths were read out using radiolabeling and gel electrophoresis to determine whether or not upstream sequence and structural contexts were sufficient to terminate transcription at loci of interest. Even while using similar experimental procedures, several different groups observed different results as to whether RNA secondary structural elements adjacent to the poly-U tract enhance termination efficiency [6,7,17,18]. It is not yet clear why these reports reached seemingly disparate conclusions. Possible explanations include subtle differences in enzyme preparations leading to the presence or absence of currently unknown termination determinants [18]. It is also possible that mechanisms of transcription termination vary significantly across eukaryotic species, and there indeed exist species-specific variations within the Pol III enzyme's subunits that are known to influence transcription termination [21,23]. To provide additional insight into this important question, we decided to address a gap in these observations by systematically studying the role of RNA secondary structure in transcription termination in human cell lines.
Here we present an assay for interrogating RNA sequence and structure determinants required for efficient Pol III termination in human cells. Specifically, we adapted an in cellulo Pol III transcriptional reporter system based on fluorescent RNA aptamers that are active in human cell lines [24]. RNA aptamers are structural motifs capable of binding to target ligands with high specificity [25]. Here we utilized the Tornado system which contains the Corn aptamer embedded within RNA transcripts for direct reporting of Pol III transcriptional activity (Fig. 1A). Placing specific RNA sequences upstream of the Tornado reporter system allowed us to assess the impact of these sequences on transcription efficiency. This system enabled us to determine how various transcript characteristics, including poly-U tract length and the presence and location of predicted RNA structural elements, influence Pol III termination efficiency in human cells. We anticipate that this work will further clarify the mechanism of Pol III transcription termination and enable the forward design of synthetic variants for precise control of Pol III expression in human cells.

An Assay for Quantifying Pol III Transcription Termination in Human Cells
To investigate the Pol III termination mechanism in cellulo, we first sought to develop a method that could quantitatively characterize the abundance of Pol III-generated transcripts within a human cell line. We started with previous work which used the fluorescent RNA aptamer, Corn, to study the subcellular localization of Pol III transcripts [24]. When transcribed, Corn forms a secondary structure that binds the ligand 3,5-difluoro-4-hydroxybenzylidene-imidazolinone-2-oxime (DFHO) with nanomolar affinity. This binding event then activates fluorescence of DFHO, which when excited with light at a wavelength of 505 nm emits fluorescence at 545 nm. To enhance its stability and enable detection, the Corn aptamer is included within the middle of a tRNA scaffold sequence, which folds in such a way as to reduce RNA degradation [24]. Importantly for our purposes, the Corn aptamer system is both sufficiently photostable and transcribed at sufficient levels from the human U6 (hU6) Pol III promoter to enable the transcripts to be detected in cells using flow cytometry [24,26,27].
To employ these parts for the current study, we refined methods for quantifying RNA transcript levels in human cells [24]. We first sought to detect Pol III-driven transcription by expressing Corn aptamer-containing transcripts in the human embryonic kidney (HEK293FT) cell line (Supplementary Fig. 1A) [24]. Specifically, we transfected HEK293FT cells with pAV-U6+27-tCORN, a plasmid construct containing, in order, an hU6 promoter, a 27 bp U6 leader sequence commonly included for optimal expression [27], the Corn aptamer fused to a tRNA scaffold, and an SV40 termination site [24]. In our experiments, this initial design did not yield a signal that was significantly different from the background signal (Supplementary Fig. 1B).
We hypothesized that this lack of signal may arise from inadequate transcript stability. Therefore, to boost the observable signal, we adapted the recently developed Tornado system, which was designed to enhance the detectable signal from the Corn aptamer [26]. In the Tornado system, the Corn aptamer-tRNA scaffold is flanked by two twister ribozyme sequences (Fig. 1A). Following transcription, these self-cleaving ribozymes cleave the RNA in two locations, which allows the nuclear protein RtcB to ligate the free ends together, producing a circularized RNA containing the Corn aptamer. This circular RNA is protected from endogenous exonucleases, allowing Corn aptamer transcripts to accumulate to higher concentrations and thus conferring an enhancement in fluorescence (Fig. 1A) [26]. To utilize the Tornado system, we introduced a Reporter module into our constructs, consisting of an hU6 promoter, the same U6 leader sequence, a twister ribozyme, the Corn aptamer fused to a tRNA scaffold, a second twister ribozyme, and an SV40 termination site. When transfected into HEK293FT cells, this Tornado construct enabled robust detection of Pol III-driven transcription (Fig. 1B). We concluded that this Tornado-based system is well-suited for quantifying Pol III-driven expression and termination in cellulo.

Poly-U Sequence Length Modulates Pol III Termination
We next sought to investigate how Pol III termination efficiency varies with the length of the poly-U sequence tract. To this end, we modified the Tornado reporter construct to include an additional Terminator module downstream of the U6 leader sequence and upstream of the Reporter module sequence elements (Fig. 2A). These Terminator modules incorporated a varying number of U nucleotides in the transcribed RNA. To study the effect of poly-U sequence length on termination independent of RNA structure, we included a computationally designed 'linear' sequence within the Terminator module, upstream of the poly-U sequence, which is predicted to lack any intramolecular RNA structures. Candidate linear sequences were designed using the Nucleic Acids PACKage (NUPACK) [28], and verified to be predicted as single-stranded when included in a transcript alongside the entire Tornado Reporter module. Using this Terminator module approach, we evaluated the effect of poly-U tract lengths ranging from 1 to 8 nt on Pol III termination in cellulo (Fig. 2B). For both linear sequence contexts, we observed that past a certain length, increasing the poly-U tract length decreased reporter signal, indicating more efficient termination (Fig. 2C). Interestingly, within this assay, we actually observed a small increase in signal when comparing constructs containing 4 nt poly-U tracts with those containing 1 nt tracts (Supplementary Table 1). This was surprising, as a tract length of 4 uracils is the average length of all poly-U tracts in human Pol III-expressed genes [13]. We observed a trend of decreasing signal output only after poly-U tracts reached a size of 7 nt or greater. When comparing against the background signal from our vector-only control, only poly-U tracts of 7 or 8 uracils demonstrated no significant difference in observed signal (Supplementary Table 2). We speculated that if our model transcripts require longer poly-U tracts to achieve efficient termination than do endogenous Pol III-driven transcripts [13], perhaps other model transcript features could confer efficient termination with shorter poly-U tracts.

RNA Structure Adjacent to the Poly-U Tract Enhances Pol III Termination
We next sought to investigate how upstream RNA structure might influence Pol III termination at poly-U tracts. We started by adapting our expression constructs to include a sequence that introduces a well-known secondary structural element by encoding a portion of the 5S ribosomal RNA (rRNA) hairpin. This 5S rRNA hairpin sequence was previously employed to investigate the impact of RNA secondary structure on Pol III termination using in vitro transcription assays [6]. In our investigation, we placed this sequence, predicted to fold into a 9 bp RNA hairpin structure with a 5 nt loop, immediately upstream of the poly-U tract (Fig. 3A). NUPACK analysis was then used to confirm that (i) the upstream linear region was still predicted to assume a single-stranded conformation, and (ii) no other competing RNA structures were predicted as a consequence of introducing the 5S rRNA hairpin (Supplementary Fig. 2). We first evaluated how adding this hairpin influences termination in the context of a 1 nt U tract. By itself, the single U did not cause significant termination (Fig. 2B), and no increase in termination (loss of reporter fluorescence) was observed due to the addition of the hairpin (Fig. 3B). Interestingly, the addition of the hairpin with the 1 nt U tract caused an increase in observed fluorescence in the Linear-2 sequence context. We next extended the poly-U tract to 4 nt, corresponding to the average poly-U length of Pol III transcripts in the human genome [13]. For these constructs, upon introduction of the hairpin, we observed significant decreases in fluorescence in comparison to all three of the previously tested conditions of (i) a poly-U tract of 1 uracil and no hairpin, (ii) a poly-U tract of 4 uracils and no hairpin, and (iii) a poly-U tract of 1 uracil and an adjacent hairpin. We observed this pattern for both choices of linear sequence. We conclude that within our setup, the presence of secondary structure in a nascent transcript enhances Pol III termination at a 4 nt poly-U tract.

The Distance of the RNA Secondary Structure Impacts its Ability to Enhance Pol III Termination
We next investigated whether the position of the secondary structural element within the RNA transcript impacts its enhancement of termination efficiency. In the constructs analyzed in Figure 3, the hairpin was placed immediately adjacent to the poly-U tract (i.e., a distance of 0 nt upstream). For comparison, we generated constructs in which the hairpin was instead placed on the other side of the linear sequence, at a distance of 10 nt upstream of the poly-U tract (Supplementary Figure 3). We again employed NUPACK analysis to confirm that new transcripts were predicted to assume the expected conformations (Supplementary Fig. 2). For constructs with a 1 nt U tract, the addition of a hairpin at 10 nt upstream did not increase termination (Supplementary Fig. 3B). Rather, the addition of a hairpin in this position appeared to increase fluorescence (Supplementary Table 1). When the poly-U tract was extended to 4 nt for constructs with a hairpin 10 nt upstream of the poly-U tract, fluorescence decreased significantly in comparison to the other conditions tested (Supplementary Fig. 3B), indicating that even at this distance, secondary structure enhances Pol III termination. Similar trends were observed for both linear sequences. These data demonstrated that inclusion of a secondary structural element 10 nt upstream from the poly-U tract enhances Pol III termination in a similar fashion to that which occurs when the hairpin is directly adjacent to the poly-U tract.
Interestingly, the above observation contrasts with the prokaryotic intrinsic termination mechanism, where the RNA hairpin must be placed immediately adjacent to the poly-U tract in order to confer effective termination [14]. We therefore sought to investigate how far upstream RNA structure can be placed and still enhance Pol III termination. To do so, we used NUPACK to design a new transcript with a longer linear sequence (Linear-3) that enables insertion of the RNA hairpin up to 20 nt upstream from the poly-U tract. First, we confirmed that Linear-3-based constructs exhibited the same patterns observed for Linear-1- and Linear-2-based constructs when the hairpin was placed 0 nt or 10 nt upstream from the poly-U tract (Fig. 4B). We then created a series of constructs by incrementally increasing the distance between the hairpin and the poly-U tract. Overall, we observed a general trend of increasing fluorescence (decreasing termination efficiency) with increasing distance between the hairpin and the poly-U tract; the hairpin had no observable impact on termination efficiency once this distance reached 20 nt (Fig. 4C).
Overall, these data suggest that within the context of our assay, the location of RNA secondary structure and the length of the poly-U tract interact to modulate Pol III transcription termination efficiency in human cells.
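The construct series described in this section (a linear sequence, an optional hairpin placed X nt upstream of the poly-U tract, and a poly-U tract of variable length) can be sketched in code. This is a minimal illustration under stated assumptions, not the design procedure used in the study: the function name `terminator_module` is hypothetical, the layout assumes that the hairpin is inserted inside the linear sequence so that exactly X nt of linear sequence separate it from the poly-U tract, and both sequences below are made-up placeholders rather than the Linear-3 or 5S rRNA sequences.

```python
def terminator_module(linear: str, n_u: int, hairpin: str = "", distance: int = 0) -> str:
    """Assemble a Terminator module RNA sequence (hypothetical layout).

    The hairpin is inserted into the linear sequence so that `distance` nt of
    linear sequence lie between the hairpin and the poly-U tract.
    """
    if n_u < 1:
        raise ValueError("poly-U tract must contain at least one U")
    if not 0 <= distance <= len(linear):
        raise ValueError("distance must fit within the linear sequence")
    split = len(linear) - distance
    return linear[:split] + hairpin + linear[split:] + "U" * n_u


# Placeholder sequences (assumptions for illustration only).
LINEAR = "ACAACAAACAACAACAAACA"           # 20 nt, stands in for a designed linear region
HAIRPIN = "GGCACGUCCGCAAUGGACGUGCC"       # 23 nt: 9 bp stem with a 5 nt loop

# A distance series at a fixed 4 nt poly-U tract, as in the hairpin-distance experiment.
series = {x: terminator_module(LINEAR, 4, HAIRPIN, x) for x in (0, 10, 20)}
```

Each entry of `series` would still need to be checked with a structure predictor in the context of the full transcript, as described in the Methods, before being treated as a valid design.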

Discussion:
In this study, we developed a means for quantitatively interrogating Pol III termination in human cells. We found that poly-U sequence length and the presence and position of an RNA hairpin structure can both influence Pol III termination. Specifically, we found that poly-U tracts alone can enhance transcription termination if they are at least 7 or 8 nt in length (Fig. 2). In addition, an RNA hairpin structure can enhance termination when used in conjunction with shorter poly-U lengths (Fig. 3), although this effect diminishes as the hairpin structure is moved further from the poly-U tract (Fig. 4). Notably, RNA structure by itself did not appear to cause termination (Fig. 3). This is an important advance in understanding, since previous studies evaluating the impact of multiple RNA sequence and structure elements on Pol III termination offered conflicting findings about the importance of these features [6,7,17,18]. This study thus offers a potential resolution in supporting the interpretation that both poly-U sequence and RNA structure are important for Pol III termination.
It is important to note the differences between this study and previous work.
Notably, the previous studies utilized assays in yeast as well as in vitro transcription reactions with purified components. Therefore, the cellular context of these assays and our current work differ, and it is possible that our conclusions only specifically apply to human Pol III transcription termination.
Interestingly, the finding that both poly-U tract length and RNA secondary structure can enhance Pol III transcription termination is similar to the case of prokaryotic intrinsic termination [4]. Prokaryotic intrinsic termination occurs when the RNAP encounters a poly-U tract and changes from an elongation to a paused state. During this pause, secondary structure encoded within the nascent RNA forms and acts to further destabilize the transcription complex, resulting in transcription termination [16,29]. In addition, the RNA secondary structure is often immediately adjacent to the poly-U tract in prokaryotes [14]. Notably, our findings for eukaryotic Pol III termination differ from the prokaryotic case in that the RNA structure still has an influence when not placed immediately adjacent to the poly-U tract. Potentially, this relaxed spatial requirement may be due to the ability of Pol III to undergo extensive backtracking following interaction with the poly-U tract [6]. This backtracking of the polymerase may result in the repositioning of RNA secondary structure to be adjacent to the transcription complex, enabling termination. It is possible that the weaker hybridization forces between average-length poly-U tracts and template result in a conformational shift of the polymerase in a similar manner to what is seen during prokaryotic transcription termination [30,31]. This shift may make the polymerase more sensitive to further destabilization forces, resulting in termination either from an upstream secondary structure or larger poly-U tracts. Further work will be needed to uncover the exact biomolecular interactions that are occurring during Pol III termination.
We also note that our synthetic biology approach for studying transcription termination using fluorescent RNA aptamers could be used to study other features of Pol III termination. It would be of interest to assay the impacts of some notable factors such as the degree of upstream inter-nucleotide base stacking and the minimum free energy (MFE) of secondary structure [16]. Perhaps this work could also be utilized to perform functional enzymatic assays in cellulo to study how changes to the Pol III subunits involved in termination coordinate termination events with various nascent RNA sequence and structural elements [31]. We can envision testing an orthogonal Pol III mutant within our system following depletion of wild type Pol III that has been tagged with an inducible degradation system [32]. As we currently do not expect any issues with adapting this assay for all culturable eukaryotic species, further testing may provide a complete model for Pol III termination across the eukaryotic domain.
This study adds to the growing body of knowledge of the RNA sequence and structural determinants of Pol III termination. Altogether, our system enables one to characterize termination within a context that may be most relevant for understanding the natural regulation of cellular processes which are known to impact both human development and disease [11,33]. This could also be important from a biotechnology standpoint, as an increased understanding of Pol III termination may lead to forward design of novel termination sequences that possess desired levels of termination in different genetic contexts, which could help define the expression of genes used in a range of biotechnology applications [15].

Methods:
Design of RNA sequences: RNA sequences were designed, and structure prediction analysis performed, using the Nucleic Acids PACKage [28]. RNA secondary structures were predicted from sequence utilizing the NUPACK online web portal in analysis mode. All folding queries were run under the RNA setting at 37 °C using default parameters. Novel linear regions (i.e., RNA sequences predicted to not fold into secondary structures) were designed using the NUPACK web server in design mode by utilizing dot-bracket notation of the desired length with the design feature. For example, to produce a hairpin with 5 base pairs and a loop of 4 nt, we input the notation (((((....))))), which NUPACK used to generate an RNA sequence predicted to fold into that structure. RNA sequence outputs were then inserted into the complete sequence construct to ascertain whether they were predicted to fold as designed in that context.
Only sequences that were predicted to fold as designed were used in the study.
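The dot-bracket targets described above can be generated programmatically. The following is a minimal sketch, not part of the NUPACK software; the helper names are hypothetical, and the strings it produces are only the structure targets that would be supplied to a design tool.

```python
def hairpin_dot_bracket(stem_bp: int, loop_nt: int) -> str:
    """Dot-bracket target for a simple hairpin: stem_bp base pairs closing a loop of loop_nt."""
    if stem_bp < 1 or loop_nt < 1:
        raise ValueError("stem and loop lengths must be positive")
    return "(" * stem_bp + "." * loop_nt + ")" * stem_bp


def linear_dot_bracket(length: int) -> str:
    """Dot-bracket target for a fully unpaired ('linear') region."""
    return "." * length


def is_balanced(structure: str) -> bool:
    """Check that every '(' has a matching ')' and only valid symbols are used."""
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                return False
        elif symbol != ".":
            return False
    return depth == 0


example = hairpin_dot_bracket(5, 4)   # the 5 bp, 4 nt loop example from the text
target = linear_dot_bracket(10) + hairpin_dot_bracket(9, 5)  # linear region followed by a hairpin
```

Concatenating targets in this way mirrors the check described in the text, in which each designed element is re-evaluated in the context of the complete construct.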
Cloning: All cloning was done utilizing the Gibson assembly protocol or inverse PCR [34,35]. All geneblocks and oligo primers were ordered from Integrated DNA Technologies. Assembled plasmids were transformed into and stored within NEB® Turbo Competent Cells (Supplementary Table 3). All constructs were sequence-verified by Quintara Biosciences. All construct variants utilized in the main text were based on the construct pAV-U6+27-Tornado-Corn. The construct pAV-U6+27-tCORN, also from the Jaffrey lab, was obtained from Addgene (Addgene plasmid #106233) (Supplementary Fig. 1). A table of Addgene accession numbers for constructs utilized in this study, except for pAV-U6+27-Tornado-Corn, can be found in Supplementary Table 4. The construct pAV-U6+27-Tornado-Corn was received as a gift from Dr. Samie Jaffrey.
Construct Preparation: Plasmids were transformed into NEB® Turbo Competent Cells.   (Supplementary Fig. 6). The bead population was identified by FSC-A vs. SSC-A gating, and 9 bead subpopulations were identified through two fluorescent channels. MEFL values corresponding to each subpopulation were supplied by the manufacturer and a calibration curve was generated for the experimentally determined MFI vs. the manufacturer specified MEFLs. A linear regression was performed with the constraint that 0 MFI equals 0 MEFL, and the slope from the regression was used to convert MFI to MEFL for each cellular population.
Finally, we confirmed that reporter output did not vary significantly across the course of a representative 1.5 h flow cytometry data collection experiment ( Supplementary Fig. 7), ruling out potential artifacts due to the order in which samples were analyzed.
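The MFI-to-MEFL conversion described above, a linear regression constrained so that 0 MFI equals 0 MEFL, can be sketched as follows. This is a minimal illustration, not the analysis code used in the study. For a line through the origin, the least-squares slope is the sum of MFI × MEFL divided by the sum of MFI squared. The function names are hypothetical and the bead values are made-up placeholders, not manufacturer data.

```python
def slope_through_origin(mfi, mefl):
    """Least-squares slope of MEFL = slope * MFI with the intercept fixed at zero."""
    if len(mfi) != len(mefl) or not mfi:
        raise ValueError("need equal-length, non-empty calibration lists")
    sum_xx = sum(x * x for x in mfi)
    if sum_xx == 0:
        raise ValueError("all MFI values are zero")
    return sum(x * y for x, y in zip(mfi, mefl)) / sum_xx


def mfi_to_mefl(mfi_value, slope):
    """Convert a cell population's MFI to MEFL using the calibration slope."""
    return slope * mfi_value


# Placeholder bead subpopulations (measured MFI, assigned MEFL); values are illustrative only.
bead_mfi = [100.0, 400.0, 1600.0, 6400.0]
bead_mefl = [250.0, 1000.0, 4000.0, 16000.0]

slope = slope_through_origin(bead_mfi, bead_mefl)
sample_mefl = mfi_to_mefl(820.0, slope)
```

Fixing the intercept at zero means a single multiplicative factor converts every sample, so ratios between samples are the same in MFI and MEFL units.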
Data Analysis: The mean fluorescence intensity (MFI) described above of the single-cell, transfected population was calculated and exported for further analysis. To calculate signal, the FITC channel MFI was averaged across three biological replicates.
A vector-only control sample transfected with the BFP transfection control and empty vector (pcDNA) was treated with 5 µM DFHO dye, averaged across three biological replicates, and used to measure background signal. Statistical significance of measured fluorescence differences between specified cell populations was measured by utilizing a one-tailed heteroscedastic Welch's t-test either alone, or followed by the Benjamini-Hochberg procedure [36]. Data and statistical analysis were performed using Excel

Illustration of predicted RNA conformation which terminates at poly-U tracts of varying lengths. The U6 leader sequence is predicted to fold into a 5' hairpin structure. C) Termination efficiency was experimentally quantified for constructs varying in poly-U length for two different linear sequences, with outputs compared to the fluorescence control.

... as reported in Fig. 2C. Statistical significance of the indicated comparisons (brackets) was measured using one-tailed heteroscedastic Welch's t-tests followed by the Benjamini-Hochberg procedure with a false discovery rate cutoff of 0.05 (Supplementary Table 2). * = p < 0.05, ** = p < 0.01, *** = p < 0.001.

... of the poly-U tract (where Hairpin distance = X nt). The secondary structure utilized is a 23 nt portion of the 5S ribosomal RNA (rRNA) predicted to fold into a 9 bp hairpin. In these constructs, the Linear-3 sequence was used to enable X to be up to 20 nt. B) Constructs based upon Linear-3 exhibit a pattern which is similar to those based upon Linear-1 and -2 for hairpin distances of 0 or 10 nt. Significance was measured using a one-tailed heteroscedastic Welch's t-test followed by the Benjamini-Hochberg procedure with a false discovery rate cutoff of 0.05 (Supplementary Table 2). ** = p < 0.01, *** = p < 0.001. C) Moving the hairpin further upstream from the poly-U tract reduces termination efficiency. Observed fluorescence for hairpin distances 10, 14, 15, 16, 17, 18, and 19 differed significantly from the no terminator control using the same statistical test (Supplementary Table 2
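The statistical procedure named under Data Analysis and in the figure legends (Welch's heteroscedastic t-test followed by the Benjamini-Hochberg procedure at a false discovery rate of 0.05) can be sketched as follows. This is a minimal illustration, not the spreadsheet analysis performed in the study. The function names and replicate values are hypothetical. The sketch computes only the Welch t statistic and the Welch-Satterthwaite degrees of freedom; converting these to a one-tailed p-value requires a Student's t distribution function, which is assumed to be supplied by a statistics package or spreadsheet.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    va_n = variance(a) / len(a)   # sample variance divided by sample size
    vb_n = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va_n + vb_n)
    df = (va_n + vb_n) ** 2 / (va_n ** 2 / (len(a) - 1) + vb_n ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans (in input order): True where the null is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its step-up threshold.
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Placeholder triplicate measurements (illustrative values only).
t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Placeholder p-values, assumed to come from one-tailed Welch tests.
calls = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
```

With three biological replicates per condition the degrees of freedom are small, which is one reason a correction for multiple comparisons can change which differences are called significant.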