Genome-scale chromatin interaction dynamic measurements for key components of the RNA Pol II general transcription machinery

Background A great deal of work has revealed in structural detail the components of the machinery responsible for mRNA gene transcription initiation. These include the general transcription factors (GTFs), which assemble at promoters along with RNA Polymerase II (Pol II) to form a preinitiation complex (PIC) aided by the activities of cofactors and site-specific transcription factors (TFs). However, less well understood are the in vivo PIC assembly pathways and their kinetics, an understanding of which is vital for determining on a mechanistic level how rates of in vivo RNA synthesis are established and how cofactors and TFs impact them. Results We used competition ChIP to obtain genome-scale estimates of the residence times for five GTFs: TBP, TFIIA, TFIIB, TFIIE and TFIIF in budding yeast. While many GTF-chromatin interactions were short-lived (< 1 min), there were numerous interactions with residence times in the several minutes range. Sets of genes with a shared function also shared similar patterns of GTF kinetic behavior. TFIIE, a GTF that enters the PIC late in the assembly process, had residence times correlated with RNA synthesis rates. Conclusions The datasets and results reported here provide kinetic information for most of the Pol II-driven genes in this organism and therefore offer a rich resource for exploring the mechanistic relationships between PIC assembly, gene regulation, and transcription. The relationships between gene function and GTF dynamics suggest that shared sets of TFs tune PIC assembly kinetics to ensure appropriate levels of expression.


Background
Transcription is a highly complex biochemical process whose exquisite regulation is of fundamental importance in determining cell function and fate. A tremendous amount of information is available on the structure, biochemical functions, and relationships of various transcription factors (TFs), co-factors, and subunits of the general transcription machinery (1)(2)(3)(4)(5). This includes structures of the transcription preinitiation complex (PIC), which assembles at promoters and consists of the general transcription factors (GTFs) TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, as well as RNA polymerase II (Pol II) (1)(2)(3)(6)(7)(8)(9). In addition, genome-wide analyses have provided global snapshots of many factors along the eukaryotic DNA template (10)(11)(12)(13). These combined studies have led to a conceptual framework in which PICs are assembled stepwise at promoters. This process begins with nucleation by TFIID, a multisubunit complex that contains the DNA-binding subunit TATA-binding protein (TBP) (14,15), and can be further facilitated by binding of TFs and coactivators that physically contact GTFs (16). In vitro, following the binding of TBP/TFIID to a TATA-containing promoter, TFIIA and TFIIB can then associate with the complex, followed by Pol II in association with TFIIF, and then TFIIE (17). This multisubunit complex provides the substrate for recruitment of TFIIH (18), whose activities are required in vivo but may be dispensable in vitro using naked DNA substrates (19). A key factor contributing to PIC assembly in vivo is Mediator, which physically contacts multiple GTFs and modulates the activities of TFIIH (19)(20)(21). Live-cell imaging has documented the dynamic behavior of these factors and is generally consistent with such an assembly pathway, albeit occurring via highly dynamic and short-lived complexes (22). 
Importantly, the understanding of PIC assembly has emerged mainly from studies that have focused on the analysis of stable complexes formed in vitro or identified in vivo, lacking information about the locus-specific dynamics of the process. Furthermore, some evidence suggests that the canonical in vitro assembly pathway may not apply to PICs at all promoters in vivo (23)(24)(25). In addition to unexplored assembly pathway complexity, it has become apparent that in vivo transcription is a highly dynamic and stochastic process, with RNA synthesis often occurring from individual genes in bursts, and with variability occurring among genetically identical cells (26,27). Most models of RNA expression based on these types of observations do not posit particular features of protein-DNA complex behavior as the explanation, and relatively few genes have been analyzed in depth (28)(29)(30)(31)(32). Indeed, live cell imaging approaches have revealed that while TFs in general display very dynamic interactions with chromatin, the functional consequences of their interaction kinetics are only beginning to be explored on a mechanistic level (22,33,34).
The premise of this study is that PIC assembly dynamics are variable across the genome and that identification of kinetic pathways in PIC assembly will shed light on mechanisms of regulation that operate at the level of transcription initiation. To better understand PIC assembly in vivo, we have used an approach called competition chromatin immunoprecipitation (competition ChIP, ref. (35)) to measure the site-specific, genome-scale chromatin binding dynamics of five GTFs (TBP, TFIIA, TFIIB, TFIIE, and TFIIF) in the budding yeast S. cerevisiae. In addition, we compared promoter binding dynamics of these factors with RNA synthesis rates to determine how chromatin binding of key PIC components relates to the production of RNA. To our knowledge, this represents the first comprehensive analysis of PIC dynamics, provides a global picture of PIC assembly, and highlights promoter-specific variation.

Results
Competition ChIP (CC) is an approach in which cells harbor two isoforms of a transcription factor of interest with distinguishable epitope tags (Fig. 1a). We engineered diploid yeast cells to constitutively express one Myc-tagged isoform under control of the endogenous promoter and a second, HA-tagged isoform under inducible GAL promoter control. In the CC experiments, cells were shifted to galactose at time zero to induce expression of the HA-tagged competitor isoform, followed by cell culture sample collection at various time points (Fig. 1b). We then measured the relative occupancies of the Myc- and HA-tagged species genome-wide at each time point (Fig. 1c) and used the relative occupancies as input to a model that describes the competition for chromatin binding to each site, yielding the site-specific residence time (Fig. 1d). The principle of the assay is outlined in Fig. 1e,f, which illustrate how the occupancy ratios of the two isoforms of a particular factor would change if the factor has a short or long residence time at a particular site. Notably, TFIIA, TFIIE, and TFIIF are biochemically composed of more than one subunit; for these factors we therefore epitope tagged one subunit and placed one copy of each subunit under GAL control in order to induce balanced expression when cells were grown in galactose (see Methods).
[Fig. 1 caption fragment] In (e,f) the icons in the top row indicate relative levels of constitutive (red) and competitor (green) isoforms.
For each factor, we first measured the levels of both isoforms by Western blotting (Fig. 2a-c; Additional file 1: Fig. S1, Additional file 2: Table S1). The time-dependent accumulation of competitor isoforms could be fit to the Hill equation with induction half-times of ~43 min and Hill coefficients of ~4.5 on average (Fig. 2b,c). We estimated residence times by fitting the normalized time-dependent turnover ratios to a turnover model (36) (see Methods), and compared the fits to the HA-tagged competitor's synthesis rate. In this way, we were able to assign residence times to binding interactions whose turnover was significantly slower (> 1 min) than the rate of competitor synthesis; for reliable fits that were not significantly different from the rate of competitor induction, we classified the chromatin binding residence times as < 1 min (see Methods). Overall, we were able to estimate residence times for each GTF binding to ~3000 or more promoters (Fig. 2d; Additional file 3: Table S2). This represents roughly half of the Pol II promoters in the S. cerevisiae genome. Representative fits are shown in Fig. 2e; Additional file 1: Fig. S2. Note that the HA/Myc ratios at sites with rapid turnover closely mimic the time course of competitor induction, whereas more long-lived complexes have turnover ratios that are notably displaced to the right of the competitor induction curves. The distributions of turnover times are shown in Fig. 2f. We identified different numbers of sites for each TF for which we were able to assign residence times; this is indicative of differences in the number of sites for which we were able to obtain reliable fits of the kinetic data, as well as likely differences in the efficiency of formaldehyde capture of short-lived complexes. It is notable that the majority of TBP, TFIIA, TFIIB, and TFIIF chromatin interactions were short-lived (i.e. < 1-2 min) whereas the majority of TFIIE complexes displayed residence times in the several minutes range. It was also notable that TFIIF residence times were bimodal, with most estimates being short-lived (~2 min or less) and the rest in the 5-10 min range (Fig. 2f; discussed below).
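The rightward displacement of the turnover ratios at long-lived sites can be illustrated with a minimal sketch. It uses the Hill-type induction curve with the average parameters quoted above (half-time ~43 min, Hill coefficient ~4.5) and, as a simplifying assumption, treats the normalized HA/Myc occupancy ratio as a first-order lag of the induction curve with time constant equal to the residence time. This is not the turnover model of ref. (36); the function names and the 0.5 min and 10 min residence times are hypothetical and chosen only for illustration.

```python
def hill(t, t_half=43.0, n=4.5):
    """Normalized competitor level at time t (min) for a Hill-type induction."""
    if t <= 0:
        return 0.0
    return t**n / (t_half**n + t**n)


def simulate_ratio(tau, t_end=240.0, dt=0.01):
    """Euler integration of dR/dt = (hill(t) - R) / tau.

    R is the normalized occupancy ratio at one site and tau (min) stands in
    for the residence time. ASSUMPTION: simple first-order lag, used here
    only to illustrate the principle of the assay.
    """
    times, ratios = [0.0], [0.0]
    r = 0.0
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        r += dt * (hill(t) - r) / tau
        times.append((i + 1) * dt)
        ratios.append(r)
    return times, ratios


def half_time(times, values):
    """First time at which the curve reaches half of its maximal value (0.5)."""
    for t, v in zip(times, values):
        if v >= 0.5:
            return t
    return None


# Hypothetical short-lived and long-lived sites.
t_short, r_short = simulate_ratio(tau=0.5)
t_long, r_long = simulate_ratio(tau=10.0)
short_half = half_time(t_short, r_short)
long_half = half_time(t_long, r_long)
print(f"half-time, tau = 0.5 min: {short_half:.1f} min")
print(f"half-time, tau = 10 min:  {long_half:.1f} min")
```

Under these assumptions the short-lived site tracks the induction curve almost exactly, whereas the long-lived site reaches half-maximal ratio several minutes later, which is the qualitative signature used to distinguish the two classes.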
To determine the relationship between GTF promoter residence time and the rate of RNA synthesis from the corresponding genes, we measured newly synthesized RNA under these same conditions (Additional file 1: Fig. S3a; Additional file 4: Table S3). Replicate samples (n=2) were acquired at 20 and 60 minutes post galactose induction. There was excellent agreement between the replicates and between the two time points (Fig. 3a; Additional file 1: Fig. S3b-d). Dynamic transcriptome analysis (DTA, ref. (37)) was applied to estimate RNA synthesis rates (Additional file 1: Fig. S3e), which were in reasonable concordance with earlier data from cells grown in galactose (Additional file 1: Fig. S3f, ref. (38)). We divided the mRNA synthesis rates into quartiles and compared them to GTF residence times (Fig. 3b,c). Residence times for TFIIA and TFIIB were on average modestly shorter for highly expressed genes compared to genes with lower expression levels, which may suggest a kinetic bottleneck in PIC assembly for poorly expressed genes that occurs after the binding of these two factors (see Discussion). Strikingly, the average TFIIE residence time increased with gene expression level across these four groups of genes (Fig. 3c), suggesting that the TFIIE residence time is an indicator of gene expression level. To relate residence time to RNA synthesis more directly, we calculated the ratio of mRNA molecules made per GTF binding event, which we previously defined as transcription efficiency (TE, ref. (36)). TE was on average < 1 mRNA synthesized per binding event for TBP, TFIIA, TFIIB, and TFIIF (Fig. 3d), suggesting that binding events by these factors do not efficiently give rise to the synthesis of mRNA. 
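As a rough illustration of the transcription efficiency ratio, the sketch below multiplies an mRNA synthesis rate by a residence time to give mRNAs per binding event. This rests on the assumption that the promoter is occupied most of the time, so that binding events occur at a frequency of roughly 1/residence time; the exact definition used in ref. (36) may differ. The function name and the numerical values are hypothetical and are not taken from the datasets reported here.

```python
def transcription_efficiency(synthesis_rate_per_min, residence_time_min):
    """mRNA molecules made per binding event (illustrative definition).

    ASSUMPTION: binding events occur at ~1/residence_time per minute, so
    mRNAs per event = synthesis rate (mRNA/min) x residence time (min).
    """
    if synthesis_rate_per_min < 0 or residence_time_min <= 0:
        raise ValueError("rate must be >= 0 and residence time must be > 0")
    return synthesis_rate_per_min * residence_time_min


# Hypothetical gene transcribed at 0.2 mRNA/min.
te_short = transcription_efficiency(0.2, 0.5)  # short-lived factor binding
te_long = transcription_efficiency(0.2, 5.0)   # long-lived factor binding
print(f"TE with 0.5 min residence time: {te_short:.2f}")
print(f"TE with 5 min residence time:   {te_long:.2f}")
```

With these made-up numbers, a factor with a sub-minute residence time yields well under one mRNA per binding event, while a factor bound for several minutes yields about one, which is the kind of contrast the TE comparison is meant to capture.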
In addition, the TE values increased gradually and progressively for these factors with TBP having the lowest TEs and TFIIE the highest, in line with the in vitro assembly pathway in which TBP binds to promoters first, followed by TFIIA and TFIIB, which provide a platform for binding of TFIIF in association with Pol II (17). Notably, the median TE for TFIIE was close to one, suggesting that binding of TFIIE to promoters was associated with the production of one mRNA molecule on average. The results suggest that PIC formation is an increasingly efficient process along a pathway from TBP to TFIIE, and that the assembly of a TFIIE-containing PIC is associated with the production of a single molecule of mRNA. Principal Component Analysis (PCA) using all of the GTF residence time data revealed a correlation between GTF binding dynamics and RNA synthesis along the first principal component, PC1 (Fig. 3e; Additional file 1: Fig. S4), which was driven mainly by the positive correlations between TFIIE/TFIIF and RNA synthesis rate (Fig. 3f,g). This conclusion was further supported by linear modeling of the GTF residence time contributions to transcription rates (Additional file 1: Fig. S5a,b).
bioRxiv preprint doi: https://doi.org/10.1101/2023.07.25.550532; this version posted July 26, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
We next looked for pairwise relationships between the chromatin binding residence times of each GTF, and highlighted each gene by transcription rate (Fig. 4). TBP was less informative as most TBP binding events measured were short-lived and not well correlated with transcription rate (Additional file 1: Fig. S5c). In fact, the residence times of TFIIA, TFIIB, and TFIIF were not correlated with transcription rate either. This was in contrast to the positive correlation that was observed between TFIIE residence time and transcription rate (Additional file 1: Fig. S5). Interestingly, we observed a cluster of highly expressed genes whose promoters had long-lived TFIIE along with long-lived TFIIF.
Next, we clustered all the genes for which we obtained residence time measurements for four factors (TFIIA, TFIIB, TFIIE, and TFIIF (n = 1417)). We omitted TBP from this analysis due to the reduced number of sites with reliable estimated residence times > 1 min. We identified ten clusters spanning the full range of transcription rates (Fig. 5a,b; Additional file 5: Table S4). Consistent with the results presented above, the most highly expressed genes had longer-lived TFIIE and/or TFIIF, whereas poorly expressed genes had promoters with longer-lived TFIIA. Longer residence times of TFIIB were associated with genes in several clusters, including in particular genes that were poorly expressed (cluster 8; Fig. 5a,b). Notably, the relatively long residence times of both TFIIE and TFIIF at cluster 1 promoters were associated with the production of multiple mRNAs, suggesting the formation of stable (sub)complexes that promote transcriptional bursting.
In support of the biological significance of the observed residence time differences, genes within clusters 1-7 were functionally related (Fig. 5c). Cluster 1 genes include ribosomal protein genes and genes involved in RNA binding and translation. Additionally, cluster 2 genes are involved in biosynthetic processes; cluster 3 genes include those involved in Golgi organization; cluster 4 genes are involved in localization, transport, and the proteasome; cluster 5 genes are involved in proteasome degradation; cluster 6 genes have roles in nucleocytoplasmic transport; and cluster 7 genes are involved in proteasome and protein-lipid complex organization. The longer GTF residence times (as well as higher gene expression rates) at ribosomal protein genes in cluster 1 compared to the GTF residence times at other genes are statistically highly significant (Additional file 1: Fig.  S6a-c). Moreover, expression of genes in most of these clusters is controlled by particular TFs (or sets of TFs; Fig. 5d,e, Additional file 1: Fig. S7), suggesting a mechanistic relationship between particular TFs and PIC assembly dynamics. Modest but significant increases in TFIIA and TFIIE residence times were observed at promoters with strong TATA elements versus those without such an element (39); these changes were consistent with a significant increase in RNA synthesis rate driven by TATA-containing promoters versus those without strong TATA elements (Additional file 1: Fig. S8a-c).
An unexpected observation was the above-mentioned bimodal distribution of TFIIF residence times (Fig. 2f, Fig. 4). We observed functional enrichment of the genes in each of these two classes, with promoters in both classes associated with different subsets of genes involved in translation/ribosome. Genes with short-lived TFIIF were further associated with other biosynthetic processes (Additional file 1: Fig. S9a,b). Consistent with this, the genes in the long-lived and short-lived TFIIF classes were associated with particular enriched TFs, some of which were shared (Additional file 1: Fig. S9c-f). Among the TFs associated with genes in the long- and short-lived TFIIF classes, Rap1 was of particular interest as competition ChIP data were available for Rap1 from a prior study (40). Although Rap1 residence times were not correlated with residence times for TBP or TFIIB, there was a moderate correlation between Rap1 residence times and the residence times for TFIIA and TFIIE (Pearson's correlation coefficients ~ 0.37 and 0.3, respectively), and Rap1 residence times were significantly longer at genes with long-lived TFIIF compared to genes with short-lived TFIIF (Additional file 1: Fig. S10a,b).

Discussion
The computational approach employed here for extraction of kinetic parameters from CC data is well supported by comparison with previous work. The TBP residence times obtained by analysis of CC data in this study were correlated with the residence times obtained from an older study using microarray data (Additional file 1: Fig. S10c; (41)) and are also broadly consistent with kinetic results for TBP in human cells (42). This includes the rank ordering in which tRNA genes had much longer residence times than mRNA genes. Previously, we used a formaldehyde crosslinking kinetic approach, called CLK, to measure chromatin binding dynamics (43). While the CLK method is technically challenging as well as locus-specific (44), we observed a rough agreement between the kinetic parameters obtained by the two methods for the handful of loci for which complementary measurements are available (Additional file 1: Fig. S10d).
Live cell imaging has revealed that the majority of TF-chromatin interactions studied are short-lived, with residence times on the order of seconds (22,(45)(46)(47)(48)(49)(50)(51). This includes TFIIB (22,52,53), for which CC results are reported here. The observation of highly dynamic binding by TFs has led to the view that such dynamics enable temporally responsive regulation of gene expression, and that TF residence times are associated with the duration of bursts in which more than one RNA molecule is synthesized during the TF period of occupancy on the promoter (30,(54)(55)(56). Consistent with the observation of frequent short-lived chromatin interactions for TFs, we observed that the majority of the interactions between TBP, TFIIA, TFIIB, or TFIIF and chromatin had residence times of less than one minute (Fig. 2f). It was not possible to reliably estimate the residence times of these short-lived interactions using CC, but they must last long enough to be captured by crosslinking. It is likely that other very short-lived interactions were not detectable by our method because of their inability to be crosslinked. Conversely, it is possible that long-lived chromatin interactions such as those we report here would be difficult to detect with live cell imaging particularly if they occur infrequently, although evidence is emerging for TF-chromatin binding residence times on the minutes time scale using live cell imaging (57).
The biological significance of the residence times reported here is supported by the functional enrichment of genes in each of the clusters (Fig. 5c). This argues strongly that GTF residence time dynamics are tuned to facilitate expression levels that ensure that cells function and respond in physiologically appropriate ways. Since these gene sets are controlled by specific sets of TFs (Fig. 5d,e), it is reasonable to suggest that GTF dynamics are influenced in predictable ways by the TFs that control expression of the associated genes. It is understood that TFs exert context-specific effects on gene expression, and such effects have been generally described in terms of effects mediated by co-regulatory interactions with other TFs as well as epigenetic control, including DNA methylation (58)(59)(60). In future work, it could be interesting to explore how GTF residence times are impacted by manipulation of such regulators. We suggest that RNA output resulting from the interplay of these variables is at least partly a consequence of the capacity to catalyze the formation of functional PICs by overcoming kinetic bottlenecks in PIC assembly that are also related to the underlying DNA sequence and chromatin environment.
A striking observation from the results of this study is that the residence time of TFIIE is correlated with the mRNA synthesis rate, and the ratio of mRNA molecules produced to the TFIIE residence time suggests that one TFIIE binding event is associated with the production of one mRNA molecule (Fig. 3d). This is in contrast to the other GTFs for which one binding event was associated with less than one mRNA molecule produced. We were not able to measure Pol II directly using CC because we do not have a system for inducing the expression of all of the Pol II subunits to generate a competitor isoform of Pol II. However, TFIIF can serve as a proxy for Pol II itself as biochemical and structural data support a model in which TFIIF enters the PIC in association with Pol II (3,(61)(62)(63)(64). The combined results suggest that the formation of a PIC is an inefficient process in vivo, with most interactions of GTFs leading to subcomplexes that decay rather than leading to formation of a PIC capable of producing mRNA. This general view of transcription initiation inefficiency is consistent with live cell imaging data obtained by analysis of a gene array in a mouse cell line (65). Moreover, this pattern is broadly consistent with a PIC assembly pathway derived from in vitro studies in which TBP/TFIID initially interacts with DNA directly, followed by the binding of TFIIA and TFIIB, which provide a platform for the binding of Pol II and TFIIF, and subsequently TFIIE ( Fig. 6; (24)). We infer the existence of stable TFIIB complexes on the basis of slow turnover at a relatively small number of genes; it appears that most TFIIB-containing complexes are unstable and that assembly of TFIIB in the PIC requires Pol II (22). 
Despite the dispensability of some GTFs in vitro under certain conditions, our results are also consistent with depletion experiments showing that all of the GTFs are required for all Pol II-mediated transcription in vivo, and that stable, partially assembled PICs are not detectable (66). Of note, however, we did observe a small number of relatively long-lived complexes containing TFIIA or TFIIB (Fig. 2d,f; Table S1). Such long-lived complexes could be consistent with the formation of a subcomplex of GTFs that is durably bound to promoters and promotes reinitiation (67). The formation of long-lived scaffolds of GTFs at some promoters is also suggested by the residence times of TFIIE and TFIIF at Cluster 1 genes, which were associated with the production of multiple mRNAs (Fig. 5b). Lastly, our analysis includes the minimal set of GTFs required for in vitro transcription using a naked DNA template (68)(69)(70). In future work and using methods suitable for analysis of multi-subunit complexes, it will be interesting to investigate the dynamics of TFIIH (71,72), Mediator and Pol II itself (73,74). Other important questions that could be addressed by performing kinetic measurements in suitably perturbed cells include probing the roles of promoter chromatin structure, and particularly the function of the first nucleosome (66). Taken together, we feel that the results presented here provide a foundation for future work to understand how TFs, co-factors, and the native chromatin environment contribute mechanistically to the establishment of the rates of transcription initiation observed in vivo.
Fig. 6 caption (partial): The results presented here suggest that for the majority of genes, PICs are unstable until TFIIE binding, which leads to functional PIC assembly, the initiation of RNA synthesis, release of Pol II, and PIC disassembly. At a relatively small subset of genes (e.g., genes coding for ribosomal subunits), relatively stable PICs are formed upon TFIIF binding (note lighter color for disassociation) and further stabilized by TFIIE binding, followed by the initiation of RNA synthesis. Upon Pol II release, stable PICs may be disassembled or at certain promoters may be stable and lead to transcriptional bursting. The formation of more stable PICs is likely associated with promoter-specific features and cofactors. The figure is meant to be illustrative and does not represent accurate sizes or molecular shapes of the factors of interest.

Conclusions
The results reported here provide a wealth of kinetic information describing the chromatin binding dynamics of five key GTFs at the majority of promoters in budding yeast. In general agreement with live cell imaging results, we find that many interactions are too short-lived to be measured by CC. However, there are many interactions with residence times in the several minutes range, and importantly, promoters with shared GTF kinetics are functionally related. This supports a model in which the rates of RNA synthesis in vivo are influenced or perhaps controlled by rates of PIC assembly, which themselves result from the combination of promoter sequence, chromatin environment and the TFs and cofactors that impact them. Overall, the kinetic behavior is consistent with the stepwise PIC assembly pathway established using purified components in vitro and in which the RNA synthesis rate is closely correlated with the residence time of TFIIE. These results suggest that at most promoters, relatively unstable GTF subcomplexes give rise to more stable fully assembled PICs and that the initiation of RNA synthesis is accompanied by PIC dissolution. At certain promoters, GTF binding events are associated with the production of multiple mRNAs, suggesting the formation of stable PIC subcomplexes that facilitate transcription reinitiation.

Yeast strains
The parental diploid strain W303 (75) was used to generate all of the competition ChIP strains. For each GTF, one allele was N-terminally tagged with 3xHA and placed under the control of an inducible GAL1 promoter. The other allele was N-terminally tagged with 9xMyc and remained under the control of the endogenous promoter (76).
For construction of the GAL1-induced alleles, the plasmid pFA6-His3MX6-PGAL1-3HA (RRID:Addgene_41610, ref. (76)) was used to obtain the His3MX6-PGAL1-3HA cassette by PCR amplification (see Additional file 6: Table S5 for primers) and was integrated into the genome using standard yeast molecular biology techniques. For the GTFs TFIIA, TFIIE, and TFIIF, which consist of two subunits, one copy of each subunit was placed under GAL1 control to ensure balanced expression of the competitor isoform. Following integration of the HIS3-GAL1-3HA cassette at one gene subunit, the strain was transformed with the TRP1-GAL1 cassette from pFA6-TRP1-PGAL1 (RRID:Addgene_41606, ref. (76)), placing the second subunit under GAL1 control but without an epitope tag. The 9xMyc tag was integrated into the genome of another isolate of W303 using the integration and Cre-recombinase knockout method and reagents developed by Gauss et al. (77). The 9xMyc tag and loxP-flanked KanMX6 marker were PCR amplified from pOM20 and integrated into the yeast genome using standard methods as above. The KanMX6 marker was then deleted using the GAL-inducible Cre recombinase carried on the plasmid pSH47 (78). The Myc-tagged strains were then transformed with pRS319 (RRID:Addgene_35459, ref. (79)) to introduce a LEU3 marker for selection. In subsequent steps, diploid strains with HA- or Myc-tagged alleles were sporulated and haploid segregants were mated to yield the competition ChIP (CC) strains with different tags on each of the alleles for the GTF of interest.
Proper integration and function of the targeted alleles were confirmed for all strains by PCR (Additional file 6: Table S5 for primers), Western blotting using anti-HA or anti-Myc antibodies, and targeted DNA sequencing of the modified loci.

Western blotting
To measure the time course of synthesis of the GAL1-induced alleles, CC strains were grown in 175 ml YEP+2% raffinose. At an OD600 of 0.6, a 20 ml aliquot of the culture was collected for the 0 min time point and 11 ml of 30% galactose was added to the remaining culture. 20 ml aliquots were removed at 10, 20, 25, 30, 40, 60, 90 and 120 minutes after galactose addition, and whole cell extracts were prepared from them as described previously (44). Whole cell extract protein was resolved on 10-12% SDS-PAGE gels (depending on the size of the tagged protein). The protein was transferred overnight to 0.22 µm PVDF membranes and probed using either anti-HA (Abcam Cat# ab9110, RRID:AB_307019) or anti-Myc (Abcam Cat# ab32, RRID:AB_303599) antibodies, followed by detection using either the HRP-conjugated goat anti-mouse secondary antibody (for Myc; Thermo Fisher Scientific Cat# 31430, RRID:AB_228307) or goat anti-rabbit secondary antibody (for HA; Thermo Fisher Scientific Cat# 31460, RRID:AB_228341) and ECL substrate (Thermo Fisher Scientific Cat# 32106).
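The galactose additions in these Methods can be checked with simple dilution arithmetic. The sketch below computes the final galactose concentration from the stock concentration and the volumes stated in the text for the Western blotting culture (11 ml of 30% galactose added to the 155 ml remaining after the 0 min aliquot), the ChIP culture (142.8 ml into 2,000 ml), and the nascent RNA culture (57 ml into 800 ml). The function name is hypothetical, and the calculation assumes the volumes are additive.

```python
def final_percent(stock_percent, added_ml, culture_ml):
    """Final concentration (%) after adding stock to a culture.

    ASSUMPTION: volumes are additive and the culture initially contains
    none of the added sugar.
    """
    return stock_percent * added_ml / (added_ml + culture_ml)


# Volumes as stated in the Methods.
western = final_percent(30.0, 11.0, 175.0 - 20.0)  # Western blotting culture
chip = final_percent(30.0, 142.8, 2000.0)          # competition ChIP culture
nascent = final_percent(30.0, 57.0, 800.0)         # nascent RNA culture

for label, value in [("Western", western), ("ChIP", chip), ("nascent RNA", nascent)]:
    print(f"{label}: {value:.2f}% galactose")
```

All three additions come out at approximately 2% galactose, consistent with the induction condition described for the competition ChIP time course.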

CC time course experiments and ChIP-seq library preparation
Each CC strain was inoculated in 100 ml YEP+2% raffinose at 30 °C and incubated overnight. These starter cultures were then used the next day to inoculate 2,250 ml cultures of YEP+2% raffinose at an initial OD600 of 0.05. When the culture reached an OD600 of 0.6, 250 ml was removed for the 0 minute time point and crosslinked by adding 6.75 ml formaldehyde (Thermo Fisher Scientific Cat# F79-500) to a final concentration of 1% for 20 minutes. The reaction was then quenched by adding 15 ml of 2.5 M glycine for 5 minutes and the cells were collected by centrifugation. To the remaining 2,000 ml of culture, 142.8 ml of 30% galactose was added to yield a final concentration of 2%. At the 10, 20, 25, 30, 40, 60, 90, and 120 minute time points, 250 ml of the culture was collected, crosslinked, and quenched in the same way as for the 0 min time point. Cell pellets were washed 3 times with TBS buffer (40 mM Tris-HCl, pH 7.5 plus 300 mM NaCl) and ChIP was performed as described (80). The HA and Myc antibodies used for ChIP were the same as those used for Western blotting described above. Successful ChIP was confirmed by RT-PCR using primers to detect binding to the URA3 promoter (5'-AAGATGCCCATCACCAAAA-3' and 5'-AAGAATACCGGTTCCCGATG-3'). ChIP-seq libraries were prepared following the manufacturer's instructions using the Illumina TruSeq ChIP library prep kit set A and B (Cat# IP-202-1012 and IP-202-1024). Successful amplification was confirmed by RT-PCR using the URA3 promoter primers. Library quality was assessed using an Agilent Bioanalyzer 2100 and the Agilent-1000 DNA kit (Agilent Cat# 5067-1504), and libraries were quantified using the Qubit dsDNA Quantitation, High Sensitivity kit (Cat# Q32851). A 5 nM pool of each library was sequenced by the UVA Genome Analysis and Technology Core (RRID:SCR_018883) using Illumina NextSeq500 and NextSeq2000 instruments.

Nascent RNA labelling
Nascent RNA labelling was performed as previously described (81).
Briefly, W303 cells were grown as for competition ChIP and induced with 2% galactose for 20 or 60 minutes. An 800 ml culture in YEP+2% raffinose was grown at 30° C to an OD600 of 0.6, then 57 ml of 30% galactose was added. Twenty minutes after galactose addition, 400 ml of the culture was divided into 200 ml aliquots, and 500 µl of 2 M 4-thiouracil (4-sU, Sigma-Aldrich Cat# 440736-1G) was added to one of the flasks with vigorous mixing before it was returned to the shaking incubator for 6 minutes. Cells with and without 4-sU were pelleted and washed with TBS. At the 60 minute time point the remaining 400 ml culture was split and treated as described for the 20 minute time point culture. Two biological replicates were obtained for each condition.
S. pombe strain SY78 cells were used as a spike-in normalization control. 100 ml of S. pombe cells were grown in YE media (0.5% yeast extract plus 3% glucose) to an OD600 of 0.6 and labelled by adding 125 µl of 2M 4-sU for 6 minutes and collected by centrifugation.
The S. cerevisiae W303 cells and S. pombe SY78 cells were mixed in an 8:1 ratio for each condition and RNA was isolated using the Ribopure Yeast Kit (Ambion Cat# AM1924). 40 µg of RNA was biotinylated with 4 µg of MTSEA Biotin XX (Biotium Cat# 90066). The biotinylated RNA was isolated by binding to 80 µl of a Dynabeads MyOne Streptavidin C1 bead suspension (Invitrogen Cat# 65001) by rotating the tube for 15 minutes, and the unbound supernatant was saved. The bound RNA was eluted in 50 µl of streptavidin elution buffer. The eluted RNA and the RNA in the flowthrough were purified and concentrated using RNeasy columns (Qiagen Cat# 74104).

RNA-seq
Ribosomal RNA was depleted using the Ribo Minus Yeast module (Thermo Fisher Scientific Cat# 45-7013) and libraries were constructed using the Ultra Directional RNA Library Prep Kit (NEBNext Cat# E74205) and Multiplex Oligos (NEBNext Cat# E73355). Sequencing was performed by Novogene using the Illumina NovaSeq 6000 platform.

Preprocessing of high throughput DNA sequencing data
Libraries prepared from each time point for a given GTF and for either HA- or Myc-tagged samples were sequenced in a single multiplexed run. Raw read quality was assessed using FASTQC (v0.11.5) (82). Fastq files from individual flow cells were merged and reads were mapped to the sacSer3 reference genome using Bowtie2 (v2.2.6) (83) with default settings. Overall mapping rates were typically above 90%, yielding ~20-30M reads per time point on average. The resulting SAM files were converted to BAM format, unmapped reads were removed, and the BAM files were sorted and indexed using SAMtools (v0.1.19-44428cd) (84). The landscape of read mapping was inspected using the Integrative Genomics Viewer (IGV) (85), and peaks of enrichment were identified using MACS2 (v2.1.0.20151222) (86) applied to each of several early time point Myc datasets with an input dataset as control and options --nomodel --extsize 147. Peaks from individual MACS2 runs were browsed in IGV, then concatenated and merged using the bedtools (v2.18.2) merge function (87). Count tables were then generated by associating reads with the peak intervals using bedtools multicov. Read counts were normalized in a three-step process. First, read counts in each peak and for each time point were normalized to the overall read depth. Next, read counts for the HA samples were normalized to the average relative levels of the factor of interest using the average values obtained from three independent western blots. Lastly, the normalized HA read count matrix was divided by the normalized Myc count matrix to yield the ratio count tables for mathematical modeling as described below. Importantly, this normalization approach was validated by comparison with earlier results: residence times derived from normalized TBP CC data correlated well with TBP CC data obtained many years earlier using arrays rather than sequencing (Additional file 1: Fig. S10c).
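The three normalization steps can be sketched in Python for a single time point. This is only an illustration and not the authors' scripts: the function name and arguments are hypothetical, and the sketch assumes that the Western blot value enters as a multiplier on the depth-normalized HA counts, in line with the scale factor described in the "Visual inspection with genome browser" section.

```python
def normalize_ratio(ha_counts, myc_counts, ha_depth, myc_depth, ha_protein_level):
    """Return the normalized HA/Myc ratio for each peak at one time point.

    ha_counts, myc_counts : raw read counts per peak (lists of equal length)
    ha_depth, myc_depth   : total mapped reads in each library
    ha_protein_level      : mean relative level of the HA-tagged factor from
                            Western blots (ASSUMED here to act as a multiplier)
    """
    ratios = []
    for ha, myc in zip(ha_counts, myc_counts):
        # Step 1: normalize each library to its read depth (counts per million).
        ha_cpm = ha / ha_depth * 1e6
        myc_cpm = myc / myc_depth * 1e6
        # Step 2: scale the HA signal by the relative protein level.
        ha_scaled = ha_cpm * ha_protein_level
        # Step 3: HA/Myc ratio; peaks without Myc signal are reported as None.
        ratios.append(ha_scaled / myc_cpm if myc_cpm > 0 else None)
    return ratios
```

Peaks with no Myc reads are returned as None rather than dropped, so that the output stays aligned with the peak list.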

Deriving residence times from competition ChIP-seq ratio data using a mass action kinetics turnover model.
We adapted the approach of Zaidi et al. (36), originally developed for TBP competition ChIP-chip data, to fit a differential equation-based turnover model at every GTF site using normalized competition ChIP-seq data from multiple GTFs. We used normalized count tables (see previous section of Methods) with HA/Myc ratios for every GTF site, $R(t)$, for every time point, $t$. We ultimately estimate the ratio of fractional occupancies of HA- over Myc-tagged GTF, $\theta_A(t)/\theta_B(t)$, with $A$ and $B$ representing HA- and Myc-tagged proteins, respectively, from $R(t)$ at every time point. We then fit a mass action kinetic turnover model to the estimated ratio of fractional occupancies at every promoter site where a peak was identified. More specifically, we first fit the normalized ratio of HA- over Myc-tagged relative protein levels as estimated by Western blotting versus induction time, which we denote $C_A(t)/C_B$ with $A$ and $B$ representing HA- and Myc-tagged protein, respectively, to a Hill model

$$\frac{C_A(t)}{C_B} = \frac{S\,t^{n}}{t_{1/2,\mathrm{ind}}^{\,n} + t^{n}} \qquad (1)$$

where $S$ is the saturation level and $t_{1/2,\mathrm{ind}}$ is the induction half-time. In Additional file 7: Table S6, we show the resulting fitted parameters ($S$, $t_{1/2,\mathrm{ind}}$) and statistics associated with the significance of each parameter's contribution to the fit for every GTF. In this case, we fixed the Hill coefficient, $n$, to be an integer and selected the value that maximized the adjusted $R^2$. In order to satisfy the $t = 0$ and $t \to \infty$ boundary conditions of the mass action kinetic turnover model shown below in Eqs. (2) and (3), the competition ChIP-seq data were background subtracted and scaled, a transformation built on the difference $(R(t) - B)$, where $B$ denotes the background level. We then effectively solve the following coupled differential equations, which model each GTF's turnover at every site under the assumption of mass action kinetics, where $k_a$ and $k_d$ are the molecular on- and off-rate, respectively:

$$\frac{d\theta_A}{dt} = k_a\,C_A(t)\,\bigl(1 - \theta_A - \theta_B\bigr) - k_d\,\theta_A \qquad (2)$$

$$\frac{d\theta_B}{dt} = k_a\,C_B\,\bigl(1 - \theta_A - \theta_B\bigr) - k_d\,\theta_B \qquad (3)$$

We assume that these rates are the same for both HA- and Myc-tagged GTFs.
These coupled equations cannot be solved analytically. Thus, we effectively solve them and fit the resulting ratio of occupancies, $\theta_A(t)/\theta_B(t)$, to the background subtracted, scaled competition ChIP-seq data using Mathematica. Briefly, we use the function ParametricNDSolveValue twice to return an effective, numerical solution of Eqs. (2) and (3) as a function of the parameters $k_a C_B$ and $k_d$: $\theta_A(t; k_a C_B, k_d)$ and $\theta_B(t; k_a C_B, k_d)$. We then take the ratio of the outputs of ParametricNDSolveValue, $\theta_A(t; k_a C_B, k_d)/\theta_B(t; k_a C_B, k_d)$, and input it into NonlinearModelFit, which then fits this ratio to the background subtracted, scaled competition ChIP-seq data. We and others have formally shown that the ratio of fractional occupancies is relatively insensitive to the on-rate, $k_a C_B$, while being highly sensitive to the off-rate, $k_d$. We derive the physical residence time for every GTF at every site using $t_{1/2} = \ln 2 / k_d$. Finally, we make use of an observation made in (36) to make precise starting estimates of the residence time for non-linear model fitting using NonlinearModelFit. Specifically, the residence time is well approximated by a relatively simple linear or quadratic function of […]. Table S7 shows the initialization formulas used for the final turnover model fit to the competition ChIP-seq data used to derive the final estimates of residence times for every GTF. Finally, NonlinearModelFit returns a number of statistics associated with the fit at every site. These include an error estimate of the off-rate, $\Delta k_d$, and the adjusted $R^2$. Sites that yielded a relative error $\Delta k_d / k_d < 3$ and adjusted $R^2 > 0.7$ were used in downstream analysis involving residence time estimates.
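As an illustration of the numerical step, the following Python sketch integrates a two-species competitive mass-action turnover model with a hand-written fourth-order Runge-Kutta scheme and returns the HA/Myc occupancy ratio. It is not the Mathematica workflow described above. The function names, the equation forms written in the comments (the standard competitive-binding form, assumed here), the initial condition (Myc-tagged protein bound at equilibrium, no HA-tagged protein bound), and all parameter values are assumptions chosen for illustration.

```python
import math


def hill(t, s_max, t_half, n):
    """Hill induction curve: relative HA/Myc protein level at time t (min)."""
    return s_max * t**n / (t_half**n + t**n)


def occupancy_ratio(t_end, ka_cb, kd, s_max, t_half, n, dt=0.01):
    """Integrate the turnover model with classical RK4; return theta_A/theta_B.

    Assumed model (A = HA-tagged, B = Myc-tagged):
        d(theta_A)/dt = ka_cb * hill(t) * (1 - theta_A - theta_B) - kd * theta_A
        d(theta_B)/dt = ka_cb *           (1 - theta_A - theta_B) - kd * theta_B
    ka_cb : on-rate times Myc-tagged protein concentration (1/min)
    kd    : off-rate (1/min)
    """
    def deriv(t, a, b):
        free = 1.0 - a - b
        da = ka_cb * hill(t, s_max, t_half, n) * free - kd * a
        db = ka_cb * free - kd * b
        return da, db

    # Assumed start: B at its equilibrium occupancy, A not yet expressed.
    a, b = 0.0, ka_cb / (ka_cb + kd)
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        k1a, k1b = deriv(t, a, b)
        k2a, k2b = deriv(t + dt / 2, a + dt / 2 * k1a, b + dt / 2 * k1b)
        k3a, k3b = deriv(t + dt / 2, a + dt / 2 * k2a, b + dt / 2 * k2b)
        k4a, k4b = deriv(t + dt, a + dt * k3a, b + dt * k3b)
        a += dt / 6 * (k1a + 2 * k2a + 2 * k3a + k4a)
        b += dt / 6 * (k1b + 2 * k2b + 2 * k3b + k4b)
        t += dt
    return a / b


def residence_time(kd):
    """Residence time (half-life of the bound state) from the off-rate."""
    return math.log(2) / kd


# Illustrative values only: a short-lived and a long-lived site at 120 min.
fast = occupancy_ratio(120.0, ka_cb=1.0, kd=1.0, s_max=2.0, t_half=30.0, n=3)
slow = occupancy_ratio(120.0, ka_cb=1.0, kd=0.02, s_max=2.0, t_half=30.0, n=3)
```

With these illustrative values, the short-lived site (off-rate of 1 per minute) tracks the induction curve closely, whereas the long-lived site (off-rate of 0.02 per minute, a residence time of about 35 minutes) lags behind it. This lag is the signal from which the fit extracts the off-rate.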

Fitting additional reliably fast sites
After initial fitting, additional reliably fast sites were added to the estimated residence times. These were identified through fitting Hill equation Eq. (1) with the R nls function to the normalized HA/Myc count ratios which were further normalized to range between zero and one. Hill coefficients were provided from protein induction curve fits (Fig. 2c, Additional file 1: Fig. S1e).
Initial estimates for fitting the Hill model using the nls function were set with parameter start = list(t1/2CC=40, XCC = 1), and parameter control was set to nlc. For each GTF, sites without estimated residence times from the turnover model whose […] (Fig. 1d) were less than 2 min were classified as reliably fast (< 1 min). All residence time estimates are available in Additional file 3: Table S2.
For plotting purposes, the residence times for the reliably fast sites were generated with the R runif function with min=0, max=1. At the beginning of each script, the function set.seed was used with parameter 42 for reproducibility. In each plot, the randomly generated values are highlighted either by their separation by dashed line or shaded area.
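A minimal Python sketch of the placeholder step follows. It assumes that reliably fast sites are represented as None in a list of residence times, and the function name is hypothetical. Python's random number generator differs from R's, so the values will not match those produced by runif after set.seed(42); only the principle of a seeded, reproducible draw is shown.

```python
import random


def assign_plot_values(residence_times, seed=42):
    """Replace reliably fast placeholders (None) with uniform draws on [0, 1].

    Measured residence times are passed through unchanged. A dedicated,
    seeded generator keeps the plotted values reproducible between runs.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) if rt is None else rt for rt in residence_times]
```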

Gene assignment and filtering
Individual regions were assigned to the nearest genes with calcFeatureDist_aY function (available from https://github.com/AubleLab/annotateYeast) with default parameters. Only regions within -250 to 100 bp from transcription start sites (TSSs) were kept. If multiple regions were assigned to one gene, only the closest one was kept. Regions assigned to tRNAs were removed from the analysis.
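The filtering logic can be sketched in Python as follows. This is not the calcFeatureDist_aY implementation: the function name and data layout are hypothetical, distances are measured from the region midpoint (an assumption), and tRNA removal is not shown.

```python
def assign_regions(regions, genes, upstream=250, downstream=100):
    """Assign each region to its nearest gene and keep promoter-proximal hits.

    regions : list of (chrom, start, end)
    genes   : list of (name, chrom, tss, strand) with strand "+" or "-"
    Returns {gene_name: (region, signed_distance)}; negative distances are
    upstream of the TSS in the orientation of the transcript.
    """
    best = {}
    for chrom, start, end in regions:
        mid = (start + end) // 2
        nearest = None
        for name, g_chrom, tss, strand in genes:
            if g_chrom != chrom:
                continue
            dist = (mid - tss) if strand == "+" else (tss - mid)
            if nearest is None or abs(dist) < abs(nearest[1]):
                nearest = (name, dist)
        if nearest is None:
            continue
        name, dist = nearest
        # Keep only regions within -250 to +100 bp of the TSS.
        if not (-upstream <= dist <= downstream):
            continue
        # If several regions map to one gene, keep the closest one.
        if name not in best or abs(dist) < abs(best[name][1]):
            best[name] = ((chrom, start, end), dist)
    return best
```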
To create alignment indexes for S. pombe (used for normalization), the S. pombe FASTA file (ASM294v2) was obtained from Ensembl (89) and converted to index files with the hisat2-build function with default parameters. The paired-end FASTQ files were then mapped against the created index files and further processed analogously to S. cerevisiae.
The aligned reads were quantified over S. cerevisiae genes using Rsubread (2.4.3) (91) featureCounts function with parameters GTF.featureType="gene", GTF.attrType="gene_id", countMultiMappingReads=TRUE, strandSpecific=2, isPairedEnd=TRUE. The GTF and FASTA files provided to the function were obtained from Ensembl (89), genome assembly R64-1-1. To normalize the data, normalization factors for each sample were calculated as the total number of reads mapped to S. pombe divided by 2,000,000. The normalized counts were obtained by dividing the raw counts by each sample's corresponding normalization factor. Genes with 0 counts in more than half of the samples were filtered out.
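The spike-in normalization and zero-count filter described above amount to the following arithmetic, shown here as a Python sketch with a hypothetical function name and data layout rather than the authors' R code.

```python
def spikein_normalize(raw_counts, pombe_reads, reference=2_000_000):
    """Normalize S. cerevisiae gene counts by S. pombe spike-in read totals.

    raw_counts  : {gene: [raw count per sample]}
    pombe_reads : [reads mapped to S. pombe per sample]
    Returns (factors, normalized), where each factor is the spike-in read
    total divided by 2,000,000 and normalized counts are raw / factor.
    """
    factors = [n / reference for n in pombe_reads]
    n_samples = len(pombe_reads)
    normalized = {}
    for gene, counts in raw_counts.items():
        zeros = sum(1 for c in counts if c == 0)
        # Drop genes with 0 counts in more than half of the samples.
        if zeros > n_samples / 2:
            continue
        normalized[gene] = [c / f for c, f in zip(counts, factors)]
    return factors, normalized
```

A sample with twice as many spike-in reads as the reference therefore has its counts halved, which puts all samples on a common per-spike-in scale.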
Principal component analysis (PCA) was performed by first creating a DESeq object from the raw count […]. DESeq2 was used to identify any differences in gene expression between samples grown for 20 or 60 minutes in galactose. Raw counts from samples with thiouracil addition were passed to the DESeqDataSetFromMatrix function with the design parameter set to time in galactose. S. pombe normalization factors were set with normalizationFactors. Genes with adjusted p-value (padj) < 0.05 were considered differentially expressed between the two conditions.

Synthesis rates were estimated with the DTA (2.36.0) (93) DTA.estimate function. S. pombe-normalized counts from samples with thiouracil addition were used for the analysis. All genes with 0 counts in any of the samples were filtered out and the final matrix was passed to the function. All genes from the final filtered matrix were passed to the parameter reliable. Further parameters were set to: tnumber=Sc.tnumber, check=TRUE, ccl = 150, mRNAs=60000, condition="real_data", ratiomethod="bias", and time in the phenomat object was set to 6. Final synthesis rates in mRNA per cell per minute were obtained by dividing the synthesis rates output from the DTA.estimate function by 150 (length of the cell cycle in minutes). The final synthesis rates are available in Additional file 4: Table S3. Comparison of synthesis rates between samples grown for 20 vs. 60 minutes in galactose was performed using the DTA.dynamic.estimate function as described above, with additional columns timeframe and timecourse in the phenomat object specifying the 20 vs. 60 minute conditions.
The correlation between the synthesis rates of the two time courses was calculated using the R cor function with method="pearson". […] (38), and Rap1 residence times from Lickwar et al, 2012 (40). Correlations were calculated with the R cor function. For residence time correlations, where we do not have exact time estimates for fast sites, Spearman's rank correlation was used, while for synthesis rates, Pearson's correlation was used.

Model plotting
Examples of model fits were obtained by extracting Hill equation coefficients, as described in the "Fitting additional reliably fast sites" section of Methods. To create comparable plots, the output model values and the measured competition ChIP (CC) values were both scaled to range between zero and one by dividing them by the estimated XCC parameter.

Visual inspection with genome browser
To view the normalized HA/Myc ratios in the genome browser, BAM alignment files were first converted to bigWig files using the deepTools (3.3.1) (94) bamCoverage function. The parameter scaleFactor was set to per million mapped reads scaling factor for the Myc samples and to per million mapped reads multiplied by HA/Myc protein induction ratio for the HA samples. The final log2 transformed ratios of HA/Myc were obtained by passing the generated bigWig files to the deepTools bigwigCompare function with parameter operation set to log2.

Residence time vs. synthesis rate
To explore the residence times of each analyzed GTF in relationship to synthesis rates, synthesis rates were first divided into quartiles using the R ntile function with the parameter ngroups set to 4. Residence times within each synthesis quartile were plotted as boxplots with ggplot2 (3.3.6) (95) geom_boxplot function, where the middle line represents the median, the lower and upper hinge represent the first and third quartiles, and the whiskers represent 1.5 * interquartile range of the values.
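The quartile assignment can be illustrated with a short Python sketch. It is a rank-based binning written for illustration with a hypothetical function name, and it is not guaranteed to reproduce the bucket boundaries of the R ntile function when the number of values is not divisible by four.

```python
def quartile_labels(values, n_groups=4):
    """Assign each value to one of n_groups rank-based bins (1 = lowest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        # Integer arithmetic maps ranks 0..N-1 onto bins 1..n_groups.
        labels[idx] = rank * n_groups // len(values) + 1
    return labels
```

Residence times can then be grouped by the label of the corresponding synthesis rate before plotting.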
The correlations between synthesis rates and residence times were calculated with R cor function with method set to "pearson".
Linear models of synthesis rates were built with the R lm function, either with the residence times of an individual GTF as the predictor or with a linear combination of the residence times of all factors in one model.

Transcription efficiency
Transcription efficiency (TE) was obtained by multiplying the synthesis rate by the residence time of a given GTF. The log2 transformed values were plotted with the ggplot2 geom_violin function to better represent the efficiency of a binding event in producing an RNA molecule (values below zero indicate that multiple binding events occur per RNA molecule synthesized). Medians of the log2 transformed TE values for each GTF were added to the violin plots with the tidyverse (1.3.1) (96) stat_summary function with parameter fun=median.
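As a worked example of the TE calculation, the Python sketch below multiplies a synthesis rate (mRNA per cell per minute) by a residence time (minutes) and takes the log2. The function name and the numerical values are illustrative only.

```python
import math


def log2_transcription_efficiency(synthesis_rate, residence_time):
    """log2 of TE, where TE = synthesis rate (mRNA/cell/min) x residence time (min)."""
    te = synthesis_rate * residence_time
    return math.log2(te)


# 0.5 mRNA/min with a 4 min residence time: TE = 2, log2(TE) = 1.
high = log2_transcription_efficiency(0.5, 4.0)
# 0.1 mRNA/min with a 2.5 min residence time: TE = 0.25, log2(TE) = -2.
low = log2_transcription_efficiency(0.1, 2.5)
```

Under the interpretation given above, the second case (TE of 0.25) corresponds to roughly four binding events per RNA molecule synthesized.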

PCA
To represent genes or GTFs using their corresponding high dimensional data in low dimensional space, we performed PCA on the residence times with or without exclusion of the reliably fast sites. Because residence time estimates were not available for every GTF at every gene, the missing values were imputed with the missMDA (1.18) package (97). The table containing the reliable residence times was first passed to the estim_ncpPCA function with parameter method.cv set to "Kfold". The residence time table was then passed to the imputePCA function along with the ncp object output by the estim_ncpPCA function. The completeObs object from the output list was then passed to the prcomp function with parameter scale.=TRUE to obtain the principal components. Depending on the orientation of the input matrix passed to the prcomp function, principal components representing genes or GTFs were obtained. To color-code the PCA plot with mean synthesis rates, the tidyverse (1.3.1) (96) function stat_summary_2d was used with parameter z set to the synthesis rates and parameter color set to "transparent". Viridis (0.6.2) (98) color scale "B" was used for coloring. The first two principal components from the "gene-oriented" PCA matrices were then correlated with the residence times of each GTF and with the synthesis rates using the R function cor with method="pearson".

Residence time and synthesis rate comparison between gene classes
The list of genes with TATA-containing promoters was obtained from Rhee and Pugh, 2012 (99). Genes were classified as ribosomal subunit if their systematic name started with "RPL". To compare the residence times and synthesis rates between classes, a two-tailed t-test was carried out with results plotted using the ggpubr (0.4.0) (100) stat_compare_means function with parameters set to method = "t.test", label = "p.signif". The symbols indicate the following: n.s. p > 0.05, * p <= 0.05, ** p <= 0.01, *** p <= 0.001, and **** p <= 0.0001. To compare residence times across synthesis quartiles, synthesis rates were separated into the four quartiles based on synthesis rates within each group (e.g. TATA-containing and TATA-less). Box plots were created using the ggplot2 (3.3.6) (95) geom_boxplot function, where the middle line represents the median, the lower and upper hinges represent the first and third quartiles, and the whiskers represent 1.5 * interquartile range of the values.
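The mapping from p-values to the significance symbols listed above can be written out explicitly. The Python helper below is a hypothetical illustration of those thresholds and is not part of ggpubr.

```python
def significance_symbol(p):
    """Map a p-value to the significance symbol used in the plots."""
    thresholds = ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*"))
    # Check the strictest cutoff first so the strongest symbol wins.
    for cutoff, symbol in thresholds:
        if p <= cutoff:
            return symbol
    return "n.s."
```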

Heatmap
Only genes for which residence times were available across all GTFs (except for excluded TBP, whose residence times are mostly <1 minute and would therefore present mostly randomly generated values) were included in the heatmap (n = 1417). Reliable fast residence times were replaced by randomly generated values between zero and one (function runif: min=0, max=1; set.seed (42)). Prior to plotting, residence times for each factor were z-score normalized using the R function scale with default settings. A final heatmap was created with the ComplexHeatmap (2.6.2) (101) function Heatmap with parameters set to clustering_method_rows="ward.D", row_split=10. Genes belonging to each of the 10 clusters (Additional file 5: Table S4) were extracted from the heatmap object and mean synthesis rates for each cluster were calculated.
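For reference, the z-score normalization applied to each factor corresponds to centering on the mean and dividing by the sample standard deviation (n - 1 in the denominator), which is what the R scale function does with default settings. A minimal Python equivalent for one column, with a hypothetical function name:

```python
import statistics


def zscore(values):
    """Center on the mean and divide by the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]
```

Each GTF column is scaled separately, so that factors with long residence times do not dominate the clustering.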

Functional annotation
Genes belonging to each heatmap cluster were passed to g:Profiler (102) for pathway enrichment. In g:Profiler, S. cerevisiae S288C was selected as organism and data sources were set to GO molecular function (GO:MF), GO biological process (GO:BP), KEGG, WikiPathways (WP), and TRANSFAC. Additionally, genes from the clusters were tested for enrichment within genes associated with DNA-binding factors (DBFs) from Rossi et al, 2021 (13), here referred to as the Yeast Epigenome database (see section "Yeast DBF database (Yeast Epigenome)" of the Methods for information about data accessions and curation). Enrichment was established by performing Fisher's exact test (R function fisher.test, parameter alternative="greater"), where the universe was set to the union of all genes involved in the heatmap and all genes associated with a given factor. Final p-values were corrected for multiple testing with false discovery rate (FDR, R function p.adjust: method="fdr"). Results with FDR padj < 0.05 or p < 0.05 were considered significant.
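The enrichment test and multiple-testing correction can be sketched in plain Python. The one-sided Fisher's exact test is computed as the upper tail of the hypergeometric distribution, and the correction follows the Benjamini-Hochberg procedure, which is what method="fdr" selects in p.adjust. Function names and the example numbers are hypothetical.

```python
from math import comb


def fisher_greater(k, cluster_size, factor_size, universe_size):
    """One-sided Fisher's exact test (alternative = "greater").

    k is the number of genes shared by the cluster and the factor-associated
    set. Returns P(overlap >= k) under the hypergeometric null.
    """
    total = comb(universe_size, cluster_size)
    upper = min(cluster_size, factor_size)
    tail = sum(
        comb(factor_size, i) * comb(universe_size - factor_size, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, with a universe of 10 genes, 4 genes bound by a factor, and a cluster of 3 genes that are all bound, `fisher_greater(3, 3, 4, 10)` gives 4/120, or about 0.033.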

Yeast DBF database (Yeast Epigenome)
BED files from Rossi et al, 2021 (13) were obtained from Gene Expression Omnibus under accession number GSE147927. Replicates were merged with the bedtools (v2.29.2) (87) merge function after they were sorted with the base Linux sort function with parameters -k1,1 -k2,2n. Regions were then assigned to genes analogously to assignment of the CC regions (see "Gene assignment and filtering" section of Methods). The output consists of gene lists for individual DBFs within promoter regions.
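The sort-then-merge step can be illustrated with a few lines of Python. The sketch mimics the default behavior of bedtools merge, in which overlapping and book-ended intervals are combined, but it is a simplified stand-in with a hypothetical function name and ignores any additional BED columns.

```python
def merge_intervals(intervals):
    """Sort by chromosome and start, then merge overlapping or adjacent intervals.

    intervals : iterable of (chrom, start, end)
    Sorting mirrors `sort -k1,1 -k2,2n`; merging mirrors `bedtools merge`
    with its default distance of 0.
    """
    merged = []
    for chrom, start, end in sorted(intervals, key=lambda r: (r[0], r[1])):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the current interval if the new one overlaps or abuts it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(r) for r in merged]
```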

CC: Competition ChIP
DBF: DNA binding factor
DTA: dynamic transcriptome analysis

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets supporting the conclusions of this article are available in the GEO repository, GSE235002. The CC samples are available from GSE235000, and nascent RNA-seq samples from GSE235001. The scripts are available from https://github.com/AubleLab/PIC_competition_ChIP_scripts (103).

Competing interests
The authors declare that they have no competing interests.

Funding
Funding for this work was provided by National Institutes of Health (Grant R01 GM55763 to DTA) and the Biomedical Sciences Graduate Program, University of Virginia (Wagner fellowship to KK). The funders did not have any role in the design of the study and collection, analysis, and interpretation of the data nor in the writing of the manuscript.

Authors' contributions
EAH and SJS performed the yeast molecular biological and molecular genetic manipulations. Samples and libraries for CC were obtained by SJS. Preprocessing of the data and initial analyses were performed by DTA. SB developed the kinetic model and performed kinetic analysis of the CC data using the mass-action model. KK analyzed the residence time results, analyzed the nascent RNA-seq data, and generated all of the figures. All authors contributed to writing the manuscript.

List of additional files
Additional file 1.