SUMMARY
Accurate chromosomal DNA replication is essential to maintain genomic stability. Genetic evidence suggests that certain repetitive sequences impair replication, yet the underlying mechanism is poorly defined. Replication could be directly inhibited by the DNA template or indirectly, for example by DNA-bound proteins. Here, we reconstituted replication of mono-, di- and trinucleotide repeats in vitro using eukaryotic replisomes assembled from purified proteins. We found that structure-prone repeats are sufficient to impair replication. Whilst template unwinding was unaffected, leading strand synthesis was inhibited, leading to fork uncoupling. Synthesis through hairpin-forming repeats relied on replisome-intrinsic mechanisms, whereas synthesis of quadruplex-forming repeats required an extrinsic accessory helicase. DNA-induced fork stalling was mechanistically similar to that induced by leading strand DNA lesions, highlighting structure-prone repeats as an important potential source of replication stress. Thus, we propose that our understanding of the cellular response to replication stress also applies to stalling induced by repetitive sequences.
INTRODUCTION
Faithful and accurate chromosomal DNA replication is a fundamental process that is required to maintain genome stability and is performed by a multi-protein complex termed the replisome (Bell and Labib, 2016). The replisome encounters various types of challenges, including DNA damage, DNA-bound proteins, collisions with the transcriptional machinery, RNA-DNA hybrids (R-loops), topological stress and limiting dNTPs (Gaillard et al., 2015). Under unperturbed conditions, leading strand synthesis is coupled to unwinding, and this contributes to maximal fork rates (Yeeles et al., 2017). However, when synthesis is stalled, CMG can continue to unwind at a reduced rate, a scenario termed helicase-polymerase uncoupling (Devbhandari and Remus, 2020; Taylor and Yeeles, 2018, 2019).
In addition to exogenous factors, certain DNA sequences can intrinsically pose a challenge to the replisome, in terms of both fidelity and dynamics. Most of our understanding of how DNA affects its own replication stems from studies of expansion-prone repeats which drive nearly 50 different neurodegenerative diseases (Lopez Castel et al., 2010; McMurray, 2010; Mirkin, 2007). Roughly half of these conditions are caused by expansion of just three repeat classes – (CGG/CCG)n, (GAA/TTC)n and (CTG/CAG)n (hereafter referred to as (CGG)n, (GAA)n and (CTG)n). In these diseases, the number of repeat units is highly variable within the general population. When repeats expand to an intermediate range, individuals may exhibit partial phenotypes. Further expansion, usually within one generation, leads to a full mutation. For example, Fragile X syndrome is caused by expansion of (CGG)n repeats within the 5’ UTR of the FMR1 locus. Unaffected individuals harbour 6-52 (CGG)n repeats, an intermediate range is 53-250, whereas patients harbour 250 to 2,000 repeat units.
One of the earliest proposed mechanisms for contractions or expansions of repeats was replication slippage (Hartenstine et al., 2000; Petruska et al., 1998), a process by which the template and nascent strands reanneal out of register due to the repetitive nature of the template (Kunkel, 1990). However, large scale contractions and expansions cannot easily be explained by slippage. Furthermore, slippage can occur in any repetitive sequence, yet only some repeats undergo large scale expansions. Replication, transcription and various repair pathways have been implicated in large scale expansions (Khristich and Mirkin, 2020), but the exact underlying mechanisms are not yet fully understood. Current models are based on the finding that repeat expansions correlate with the propensity of sequences to fold into unusual DNA secondary structures.
Several types of non-B DNA secondary structures have been characterised, including (i) hairpins, (ii) G-quadruplexes (G4s), (iii) intercalated motifs (i-motifs) and (iv) triplexes. Hairpins are simple intramolecular fold-back structures that rely on classic Watson-Crick base pairing. Inverted repeats and palindromes can form perfectly annealed hairpins while (CNG)n repeats can form mismatch-containing hairpins. G4s are four-stranded intra- or intermolecular structures formed by Hoogsteen base pairing between guanine residues (Gellert et al., 1962). Four guanines can form a planar arrangement termed a G-quartet and stacking of multiple G-quartets yields a G4. While G-rich sequences can form G4s, C-rich sequences can form a four-stranded structure called an i-motif, where pairs of hemi-protonated cytosines form Hoogsteen base pairs in a criss-cross pattern (Gehring et al., 1993). Hairpins, G4s and i-motifs can all form locally within a stretch of single-stranded DNA (ssDNA). In contrast, triplex DNA requires a donor duplex DNA, with a third strand annealing via Hoogsteen base pairing (Frank-Kamenetskii and Mirkin, 1995). Triplexes can arise from homopurine-homopyrimidine mirror repeats, such as (GAA)n repeats, and their formation is favoured by negative supercoiling (Wells, 2008).
The first evidence that repeats can stall fork progression in vivo was the detection of replication intermediates of plasmids containing (CGG)n repeats in bacteria by two-dimensional (2D) gel electrophoresis (Samadashwily et al., 1997). Stalling was observed in both orientations and was later also detected in budding yeast and mammalian cells (Pelletier et al., 2003; Voineagu et al., 2009). In contrast, stalling by (GAA)n repeats in yeast only occurs when they are on the lagging strand template (Kim et al., 2008; Krasilnikova and Mirkin, 2004; Shishkin et al., 2009), whereas stalling by (CTG)n repeats is significantly weaker and is orientation-independent (Pelletier et al., 2003; Viterbo et al., 2016). More recent evidence of perturbed replication in patient-derived cells seems inconsistent with findings in yeast. Fiber labelling of individual replication forks in the CGG-expanded FMR1 locus from Fragile X syndrome cells revealed very little stalling (Gerhardt et al., 2014). Interestingly, replication forks progressed in either direction in cells from unaffected individuals, whereas almost all forks in patient cells replicated (CGG)n as the leading strand template. Similar experiments with cells from Friedreich’s ataxia patients showed pronounced stalling in the GAA-expanded FXN locus. Fork directionality was also altered, positioning the (GAA)n repeats on the leading strand template (Gerhardt et al., 2016), which is the exact opposite of the orientation that generates stalls in budding yeast (Kim et al., 2008; Krasilnikova and Mirkin, 2004; Shishkin et al., 2009). The reasons for these discrepancies are unclear. Furthermore, the underlying mechanism of repeat-induced stalling is poorly defined. Stalling could be induced indirectly, for example by DNA-bound proteins or R-loops. In the case of (CTG)n repeats, stalling was suggested to be driven by binding of mismatch repair factors to mismatched hairpins (Viterbo et al., 2016).
This raises the question of whether the DNA template by itself is sufficient to stall the replisome. If so, which sequences stall and what is the underlying mechanism? Finally, how does the replisome recover from such blocks?
Studies of repeat replication in vitro have thus far been limited to primer extension assays and have shown that polymerases are impeded by (CGG)n, (CTG)n and (GAA)n repeats (Delagoutte et al., 2008; Gacy et al., 1998; Kang et al., 1995; Murat et al., 2020; Ohshima and Wells, 1997; Usdin and Woodford, 1995). Most studies employed bacterial or viral polymerases, with very little work done with all three eukaryotic replicative polymerases. One study compared yeast pol δ with human pols α and ε, all of which were stalled by (CGG)7 (Kamath-Loeb et al., 2001). One limitation of such assays is the use of ssDNA templates that are pre-folded into structures. Whether sufficient ssDNA could be exposed for structures to form during unperturbed coupled leading strand synthesis is unknown. Another caveat is the lack of any additional replisome components. Reconstituted E. coli replisomes are not affected by (CTG)n repeats but are stalled by short (CCG)n repeats and inverted repeats (Lai et al., 2016; Le et al., 2015). To date, studies of repeat replication with reconstituted eukaryotic replisomes are lacking.
In this study we set out to determine the molecular events that transpire when the eukaryotic replisome encounters repetitive templates. Using reconstituted replisomes assembled from purified budding yeast proteins, we found that certain repeats induce leading strand stalling. Since these experiments lack components from other pathways, they indicate that DNA alone can cause replication fork stalling. We tested a wide range of mono-, di- and trinucleotide repeats and found that stalling correlated most with structure-forming capacity. Mechanistically, the CMG helicase was able to continue unwinding but synthesis was inhibited, resulting in helicase-polymerase uncoupling, thereby resembling events induced by a leading strand DNA lesion. We found that the two major replicative polymerases, pols δ and ε, exhibit different inherent capacities to synthesise through hairpin-forming repeats and uncovered a role for pol δ in rescuing DNA-induced leading strand stalling. Moreover, fork recovery mechanisms differed by the type of secondary structure that repeats can form. Leading strand synthesis through hairpin-forming repeats was modulated by various replisome-intrinsic aspects, including the presence of pol δ, synthesis rate by pol ε and levels of dNTPs. In contrast, quadruplex-forming repeats were not affected by any of these factors, but instead required the extrinsic accessory helicase Pif1 for efficient replication. Altogether, these results provide a mechanistic understanding of how the eukaryotic replisome copes with challenging repetitive templates and highlight certain sequences as an important potential source of endogenous replication stress.
RESULTS
(CGG)n repeats induce leading strand stalling
To investigate the effect of repeats on the eukaryotic replisome, we constructed a set of substrates for in vitro replication assays, whereby eukaryotic replisomes are assembled using purified budding yeast proteins (Yeeles et al., 2015; Yeeles et al., 2017). (CTG)n, (GAA)n and (CGG)n repeats were cloned 3 kb downstream of the replication origin (Fig. 1A) of a 9.8 kb substrate that supports origin-specific replication initiation (Taylor and Yeeles, 2018). Short oligonucleotides were used for initial cloning, followed by a PCR-free approach which involved iterative steps of controlled expansion of repeats to yield substrates with up to 161 uninterrupted repeats (Scior et al., 2011). Given the potentially unstable nature of certain repeats during propagation in bacteria, we validated that our final preparations contained the correct insert size and sequence (Fig. S1). Since replication initiates from a defined position, we can assign which sequences serve as the leading and lagging strand templates. When describing insert sequences throughout this manuscript, we refer to sequences that reside on the leading strand template. For example, the (CGG)61 substrate contains 61 CGG repeats on the leading strand template, and therefore 61 CCG repeats on the lagging strand template.
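The naming convention above follows directly from strand complementarity: the lagging strand template is the reverse complement of the leading strand template. The following minimal Python sketch illustrates this; the helper name `reverse_complement` is introduced here for illustration only and is not part of the study's methods.

```python
# Illustrative helper (hypothetical name): the lagging strand template is the
# reverse complement of the leading strand template, both read 5'->3'.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Repeat units named by their leading strand template sequence.
for unit in ("CGG", "CTG", "GAA"):
    leading = unit * 61
    lagging = reverse_complement(leading)
    # The lagging strand template is again a pure repeat of the
    # reverse-complemented unit, e.g. (CGG)61 pairs with (CCG)61.
    print(unit, "->", lagging[:3], lagging == reverse_complement(unit) * 61)
```

The same relationship gives the (CAG)n, (TTC)n and (CCG)n templates obtained when the repeat orientation is reversed.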
To avoid the confounding effects of two replication forks converging on a circular template, we first performed reactions on linear templates. Plasmids were linearised with a restriction enzyme (AhdI) such that the replication origin was positioned 1.5 kb from one end, and 8.2 kb from the other, with the repeats located within the 8.2 kb fragment. Enzymes required for Okazaki fragment maturation were omitted to simplify analysis. As expected, analysis of the control replication reaction by denaturing alkaline gel electrophoresis produced three main products: the leftward moving 1.5 kb leading strand, the rightward moving 8.2 kb leading strand, and a heterogeneous population of smaller unligated lagging strand Okazaki fragments (Fig. 1B, lane 1). Replication of substrates containing (CTG)161 or (GAA)161 did not differ from the empty vector control (Fig. 1B, lanes 1-3). However, a very faint 3 kb stall band was reproducibly detected with (CGG)161 (Fig. 1B, lane 4). The intensity of this stall was increased when reactions were performed without pol δ (Fig. 1C), suggesting a role for pol δ in preventing or rescuing leading strand stalls induced by (CGG)161. Since these experiments lack components from other pathways, we conclude that the DNA template itself can induce fork stalling, and that this is modulated by polymerase usage.
Stalling threshold is 17 (CGG)n repeats and is orientation-dependent
To establish the threshold for (CGG)n stalling we replicated a set of substrates with increasing repeat units in the absence of pol δ. This revealed that as few as 17 repeats were sufficient to induce some stalling, which was further enhanced with 21 and 41 repeats, and saturated with 61 repeats or more (Fig. 1D). Similar results were obtained with circular plasmids in the presence of topoisomerase I (Fig. S2A, B, C), indicating that stalling is neither promoted nor prevented by a topologically closed template or by topoisomerase activity. When compared to a stall driven by a site-specific leading strand DNA lesion (a cyclobutane pyrimidine dimer; CPD), even the longest (CGG)n inserts produced a partial stall, also evident by the larger proportion of full length 8.2 kb products (Fig. 1D, compare lanes 9 and 10). Consistent with the accumulation of stalled forks, large replication intermediates were observed by native gel electrophoresis (Fig. S2D), mirroring the pattern seen by alkaline denaturing analysis. We note that (CGG)n inserts containing 81 repeat units or more were not completely stable in bacteria (Fig. S1B, lanes 7-10, seen as smearing below the main band). We therefore chose to use (CGG)61 in all subsequent experiments as it drove maximal stalling but was genetically stable.
If, as suggested by genetic evidence, the orientation of repeats relative to replication origins plays a role, one might expect to observe a difference in stalling as a function of orientation. To test this idea, we reversed the orientation of these repeats to yield (CAG)n, (TTC)n and (CCG)n templates. While we were able to clone (CAG)161 and (TTC)161, we were only able to obtain stable clones of up to 61 CCG repeat units, as longer CCG repeats are unstable in this orientation in bacteria (Hirst and White, 1998; Shimizu et al., 1996). Nonetheless, in contrast to (CGG)n templates, replication of all (CCG)n substrates produced no detectable stalls (Fig. 1E), even when compared side-by-side (Fig. S2E). Replication of (CAG)161 and (TTC)161 produced no stalling with either linear or circular templates (Fig. S2F, G). In summary, as many as 161 (CTG)n or (GAA)n repeats do not induce stalling in either orientation, whereas 17 (CGG)n repeats or more stall replication forks, but only when positioned on the leading strand template.
Short (CG)n repeats also induce leading strand stalling
The fact that (CGG)n produced a stall, yet other trinucleotide repeats did not, suggested that it is not simply their repetitive nature that causes a stall. We considered the possibility that stalling is caused by DNA secondary structures. While all (CNG)n repeats can fold into hairpins, the thermal stability of (CGG)n hairpins is significantly higher (Gacy and McMurray, 1998), possibly explaining the stalling observed only with (CGG)n. This raises the prediction that other G-rich hairpin-forming repeats may also stall the replisome. To test this, we cloned and replicated a range of dinucleotide repeats. Of these, stalling was only observed with (CG)n repeats (Fig. 2A and Fig. S3A), which are indeed G-rich and form hairpins in solution (Murat et al., 2020). Relative to (CGG)n, much shorter stretches of (CG)n dinucleotides produced a strong stall (Fig. 2A), with a lower threshold of only 10 repeat units. Similar to that observed with (CGG)n templates, analysis of (CG)n replication products on a native gel revealed accumulation of replication intermediates (Fig. S3B) and the stalling threshold was similar with circular templates (Fig. S3C). Therefore, these results provide further evidence that hairpin-forming repeats can stall the replisome. To further support this interpretation, we generated scrambled sequences with the same length, base pair composition and strand bias as (CGG)21 or (CG)24. For each repeat type we chose two randomly generated sequences which contain minimal stretches of consecutive CG or CGG repeats, thereby interrupting continuous base-pairing within the predicted hairpins. All of the scrambled sequences were replicated without any stalling (Fig. 2B, C). Altogether, these results indicate that the nucleotide composition and strand bias of (CGG)n and (CG)n repeats do not account for their ability to stall leading strand synthesis. Rather, stalling is most consistent with their structure-forming potential.
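The logic of the scrambled controls can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the function names, the number of shuffles and the random seed are hypothetical. Shuffling the bases of one strand preserves length, base composition and strand bias; candidates are then ranked by their longest uninterrupted run of CG or CGG units.

```python
import random
import re


def longest_run_bp(seq: str, unit: str) -> int:
    """Length in bases of the longest uninterrupted run of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) for run in runs), default=0)


def scrambled_control(unit: str, n: int, avoid=("CG", "CGG"),
                      tries: int = 1000, seed: int = 0) -> str:
    """Shuffle the bases of (unit)n and keep the candidate whose longest
    run of any unit in `avoid` is shortest. Hypothetical sketch only."""
    rng = random.Random(seed)      # fixed seed so the result is reproducible
    bases = list(unit * n)         # same length and base composition
    best_score, best_seq = None, None
    for _ in range(tries):
        rng.shuffle(bases)
        candidate = "".join(bases)
        score = max(longest_run_bp(candidate, u) for u in avoid)
        if best_score is None or score < best_score:
            best_score, best_seq = score, candidate
    return best_seq


control = scrambled_control("CGG", 21)
print(len(control), control.count("C"), control.count("G"))
```

Because only one strand is shuffled, the G/C asymmetry between the leading and lagging strand templates is retained, which is what allows composition and strand bias to be separated from repeat structure.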
The replisome is affected by quadruplex-forming homopolymers
The leading strand stalling we observed correlated with the ability of sequences to fold into hairpin structures. We reasoned that repeats that form other types of DNA secondary structures may also impede replication. We therefore tested the effect of guanine and cytosine homopolymers, which can fold into a G4 or i-motif, respectively (Murat et al., 2020). Leading strand stalls were indeed observed, with a threshold of 20 and 30 repeat units for (G)n and (C)n, respectively (Fig. 2D, E). This difference in threshold was also seen when compared side-by-side within the same experiment (Fig. S3D) and was maintained on circular templates (Fig. S3E, F). In contrast, stretches of over 200 consecutive adenine or thymine residues, which are not predicted to form stable secondary structures, did not cause a significant stall (Fig. 2F). Altogether, we conclude that hairpin- and quadruplex-forming repeats can stall the replisome.
Pol δ drives recovery from hairpin-forming, but not quadruplex-forming repeats
Our results thus far highlight four different types of repeats that induce leading strand stalling – (CGG)n, (CG)n, (C)n and (G)n. Given our initial observation that pol δ can assist replication through (CGG)161 (Fig. 1B, C), we next asked whether this holds true for the other sequences. While replication of (CGG)61 and (CG)24 was improved by the presence of pol δ, stalling by (C)50 and (G)50 was essentially unaffected (Fig. 3A and Fig. S3G). Thus, the ability of pol δ to synthesise past these sequences correlates with the type of secondary structure that they can form.
To assess if stalling is terminal or transient we performed pulse-chase experiments, in which nascent DNA was labelled with dATP for the first 10 minutes and chased with excess unlabelled dATP, allowing us to follow the fate of forks labelled within the pulse without detection of new initiation events. In the absence of pol δ, stalling by (CGG)61 was persistent for at least two hours, indicating that pols α and ε are unable to resolve this stall (Fig. 3B). In contrast, in the presence of pol δ stalling at the earliest time point was weaker, gradually resolved over time, and was barely discernible by 40 minutes (Fig. 3C). A similar pattern was observed with (CG)24 (Fig. 3D, E). These results indicate that pol δ does not prevent the formation of stalls induced by (CGG)n and (CG)n but rather resolves them. Pulse-chase experiments with (C)50 and (G)50 revealed persistent stalling regardless of the presence of pol δ (Fig. S4), further supporting our earlier observation that pol δ cannot support replication through these two sequences (Fig. 3A). In summary, hairpin-forming sequences induce persistent stalls in the absence of pol δ, but these are resolved over time when pol δ is present. In contrast, G4- and i-motif-forming sequences generate persistent stalls that cannot be resolved by pol δ.
The ability of pol δ to rescue certain leading strand stalls could either require its continued presence within the replisome during stalling or could occur behind the fork. To test whether pol δ could rescue pre-existing stalls we carried out pulse-chase experiments in which stalls were pre-formed during the pulse, and pol δ was only introduced in the chase. Stalling at (CGG)61 and (CG)24 was evident after 10 min and remained unaltered in the absence of pol δ (Fig. 3F, lanes 4 vs 5 and 7 vs 8). However, pol δ was able to resolve most of these stalls within 10 min (Fig. 3F, compare lanes 5 vs 6 and 8 vs 9). We note that we have observed some degree of variation in pol δ-dependent rescue efficiency, with synthesis through (CG)24 typically being less efficient than (CGG)61. Leading strand rescue by pol δ was largely dependent on RFC/PCNA (Fig. S5), suggesting that PCNA is either retained or reloaded on the leading strand template after stalling occurs. We conclude that pol δ can rescue pre-existing leading strand stalls in a PCNA-dependent fashion.
DNA-induced stalls trigger helicase-polymerase uncoupling
Replication forks could either stall due to impaired unwinding by the CMG helicase or inhibition of synthesis by pol ε, which would trigger uncoupled unwinding downstream of the stall. The fact that pol δ could rescue pre-existing stalls (Fig. 3F) supports the latter, as it strongly argues for the presence of a free primer-template junction and an available exposed template downstream. Previous work revealed that repriming past a leading strand CPD by pol α is inefficient, and that an exogenously added primer allows resumption of leading strand synthesis (Taylor and Yeeles, 2018). Primer annealing only occurs if ssDNA is exposed, thereby serving as an indirect measure of uncoupled CMG unwinding. We therefore asked whether a primer that anneals 265 nt downstream of the insert would promote the formation of a restart product. Indeed, addition of this primer, but not a scrambled control primer, led to the appearance of a 5 kb restart product for all four stall-forming repeats, to an extent similar to that seen with a leading strand CPD template (Fig. 4A). This result strongly suggests that stalling is not a consequence of CMG arrest, but is rather due to lack of synthesis by pol ε. Interestingly, while pol δ resolved the 3 kb stall products induced by (CGG)61 and (CG)24, 5 kb restart products were still evident (Fig. 4B, lanes 8 and 9). Therefore, CMG continued to unwind at least 265 nt beyond the repeats in both cases. Thus, although pol δ can resolve certain leading strand stalls, it cannot completely prevent uncoupling.
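The approximate sizes of the stall and restart products follow from the template geometry described earlier. The Python sketch below works through that arithmetic under explicit assumptions: the stall is taken to occur at the start of the insert, the restart product is taken to run from the primer to the end of the template, and the quoted distances (8.2 kb arm, insert 3 kb from the origin) are rounded values. The function name is hypothetical.

```python
def expected_products_bp(arm_bp: int, insert_start_bp: int,
                         insert_len_bp: int, primer_offset_bp: int):
    """Approximate (stall, restart) leading strand product lengths in bp.

    Assumptions (not taken from the source): synthesis stalls at the start
    of the insert, and the restart product extends from the annealed primer
    to the end of the linear template.
    """
    stall = insert_start_bp
    restart = arm_bp - (insert_start_bp + insert_len_bp + primer_offset_bp)
    return stall, restart


# (CGG)61 insert: 61 units x 3 nt = 183 nt; primer anneals 265 nt downstream.
stall, restart = expected_products_bp(8200, 3000, 3 * 61, 265)
print(stall, restart)  # 3000 4752, i.e. roughly the 3 kb and 5 kb products
```

With these rounded inputs the restart product comes out slightly under 5 kb, consistent with a band that resolves at approximately 5 kb on a denaturing gel.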
Additional evidence for helicase-polymerase uncoupling was seen upon closer inspection of replication products analysed on native gels, whereby faster migrating species accumulated. These species were previously shown to correspond to uncoupled products, in which CMG has unwound to the end of the template but without any synthesis (Taylor and Yeeles, 2018). This was especially clear with the (CG)n templates, where uncoupled products accumulated at levels similar to those observed with a CPD-containing template (Fig. S3B). To increase the fraction of uncoupled products, we truncated substrates with EcoRV so that CMG has to unwind only 1.6 kb beyond the insert rather than 5 kb. When analysed on a native gel, uncoupled products were observed for all four classes of sequences (Fig. 4C), but were not observed for (CGG)61 and (CG)24 when pol δ was present (Fig. 4D), which is consistent with resumption of synthesis. Altogether, these results show that structure-forming repeats can trigger helicase-polymerase uncoupling and that pol δ limits the extent of uncoupling by rescuing leading strand synthesis at (CGG)61 and (CG)24, but not at (C)50 or (G)50.
Read-through of (CGG)n and (CG)n is facilitated by pol ε variants or elevated dNTPs
The observation that pol ε could not synthesise past (CGG)n or (CG)n, yet pol δ could, may be explained by their different enzymatic properties. More specifically, the weak strand displacement activity of pol ε relative to pol δ might preclude it from coping with hairpin-forming repeats. This activity can be mildly enhanced by inactivating the exonuclease domain of pol ε (Ganai et al., 2016). In addition, modelling of the most frequent cancer-associated pol ε mutation (P286R) in budding yeast (P301R) revealed a hyperactive enzyme in which DNA entry into the exonuclease domain is blocked, allowing it to synthesise past a hairpin structure more efficiently than an exonuclease-dead mutant (Parkash et al., 2019; Xing et al., 2019). We therefore wondered whether these pol ε variants might be able to resolve leading strand stalls even in the absence of pol δ. Leading strand stalls induced by either (CGG)61 or (CG)24 were significantly weaker in reactions carried out with pol ε P301R (Fig. 5A, compare lanes 1,2 vs 3,4 and 9,10 vs 11,12), while an exonuclease-dead pol ε produced an intermediate phenotype. Neither of these pol ε variants was able to replicate past (G)50 or (C)50 (Fig. S6A) and similar results were obtained when pol δ was present (Fig. S6B). Pol ε P301R was able to rescue pre-existing stalls produced by WT pol ε (Fig. S6C), and this was largely dependent on RFC/PCNA (Fig. S6D). These observations are almost identical to those obtained with pol δ (Fig. 3F and S5), suggesting that pol ε P301R and pol δ employ a similar mechanism to rescue leading strand stalls.
Inactivation of the exonuclease domain of pol ε shifts the balance from proofreading to synthesis, leading to an overall increase in synthesis rate. Other factors that enhance synthesis rate could also play a role. We therefore asked whether increased dNTPs could ameliorate DNA-induced stalling. We performed pulse-chase experiments in which dATP was the labelled nucleotide, and chased with either excess unlabelled dATP alone, or an excess of all four dNTPs (raised from 30 μM to 400 μM). In the absence of pol δ, elevated dNTPs significantly improved replication past (CGG)61 but not (CG)24 (Fig. 5B, compare lanes 5 vs 6 and 8 vs 9). In the presence of pol δ, excess dNTPs also improved synthesis past (CG)24 (Fig. 5B, compare lanes 17 vs 18). However, there was no effect on replication of (G)50 or (C)50, regardless of pol δ (Fig. S7A). Thus, increased concentrations of dNTPs improve the ability of both replicative polymerases to resolve stalls induced by hairpin-forming repeats. Combined with the results obtained with pol δ and pol ε variants, we conclude that the replisome can cope with hairpin-forming repeats by a variety of replisome-intrinsic mechanisms.
Pif1 resolves DNA-induced stalls
In contrast to hairpin-forming repeats, none of the conditions or enzyme variants we tried thus far allowed the replisome to cope with stalls induced by (G)50 and (C)50, both of which can form quadruplex structures. We considered that the ssDNA binding protein RPA may play a protective role as it has been demonstrated to unfold G4 structures (Fan et al., 2009; Ray et al., 2013; Salas et al., 2006). However, stalled products were observed across a broad range of RPA concentrations (10-200 nM) with all tested sequences, regardless of pol δ (Fig. S7B, C). Therefore, in this context, RPA does not prevent or resolve DNA-induced leading strand stalls.
Several accessory helicases have been implicated in replication of repetitive or structure-prone DNA (Anand et al., 2012; Sauer and Paeschke, 2017). In budding yeast, Pif1 has been shown to play an important role in allowing efficient replication past G4 sequences in vivo (Dahan et al., 2018; Lopes et al., 2011; Paeschke et al., 2011) and in vitro (Byrd et al., 2018; Maestroni et al., 2020; Paeschke et al., 2013; Ribeyre et al., 2009; Sparks et al., 2019b). We therefore assayed the ability of purified Pif1 to rescue DNA-induced stalled forks. Strikingly, not only was Pif1 able to fully rescue replication past (G)50, it also accelerated replication through all of the other sequences (Fig. 6A). Importantly, an ATPase active site mutant of Pif1 (K264A) which cannot unwind DNA (Fig. S8A), was unable to perform any of these tasks (Fig. 6), indicating an essential requirement for its helicase motor function. For comparison, we also tested the nuclease-helicase Dna2, but found it had no effect on DNA-induced stalling despite showing robust nuclease activity (Fig. S8B, C). Pif1 was previously shown to directly bind PCNA (Buzovetsky et al., 2017) and to collaborate with pol δ and PCNA in break induced replication (BIR) (Saini et al., 2013; Wilson et al., 2013) and in stimulating strand displacement during lagging strand maturation (Koc et al., 2016; Osmundson et al., 2017; Rossi et al., 2008). However, our results show that the ability of Pif1 to resolve DNA-induced stalls is distinct from these functions, as it did not require pol δ (Fig. 6B) or PCNA (Fig. S8D). Altogether, we conclude that Pif1 is a general-purpose accessory helicase that accelerates recovery from a variety of leading strand DNA-induced stalls.
DISCUSSION
We have reconstituted repeat replication with eukaryotic replisomes and have found that DNA alone is sufficient to cause significant leading strand stalling. Therefore, certain DNA sequences are an important source of endogenous replication stress. Mechanistically, stalling induced by DNA repeats and leading strand DNA lesions is similar: CMG unwinding is unaffected and inhibition of synthesis triggers helicase-polymerase uncoupling. Furthermore, we demonstrate that the two major replicative polymerases exhibit different inherent capacities to cope with repetitive templates, with pol δ showing more robust activity than pol ε, allowing it to rescue leading strand stalls caused by hairpin-forming repeats. The replisome could recover from stalls induced by hairpin-forming sequences by employing a variety of replisome-intrinsic mechanisms, including pol δ, hyperactive pol ε or elevated dNTPs. In contrast, stalls induced by quadruplex-forming sequences required extrinsic support, revealing a general role for the Pif1 helicase in accelerating recovery from a variety of DNA-induced stalls. These results raise several interesting and important questions, including the root cause of stalling and the emergence of different recovery mechanisms.
It is evident that only certain sequences induce leading strand stalling, yet the underlying reason is unclear. Our results show stalling cannot be easily explained by the repetitive nature of sequences, their base pair composition or their strand bias. Rather, fork stalling is best correlated with the ability of sequences to fold into stable DNA secondary structures. Although (CGG)n repeats have been shown to fold into a G4 structure (Fry and Loeb, 1994) or Z-DNA (Renčiuk et al., 2011) in vitro, this only occurs under non-physiological conditions (Amrane and Mergny, 2006; Fojtík et al., 2004). Thus, hairpins are the most physiologically likely structures formed by (CGG)n, with stretches of over 12 repeats suggested to form branched hairpins (Amrane and Mergny, 2006). The stall threshold we observed (n=17) was surprisingly low, meaning that in most normal FMR1 alleles (n=5-63) local uncoupling may occur, providing a plausible mechanism for small scale expansions. Given that even short (CGG)n repeats are inherently difficult to replicate, it would be tempting to speculate that Pif1 and pol δ become more important in efficient and accurate replication of expanded Fragile X syndrome alleles.
Out of all the sequences we tested, stalling by relatively short (CG)n repeats exhibited the highest proportion of stalled forks. Recent NMR analysis shows that in solution (CG)n repeats form hairpins (Murat et al., 2020). Although (CG)n repeats could in theory also form cruciforms ahead of the fork, this does not happen even in negatively supercoiled plasmids (Singleton et al., 1982) because CG-rich DNA inhibits cruciform nucleation (Sinden, 1994). Interestingly, (CG)n repeats are extremely rare - not only in the human genome, but across the entire tree of life - constituting less than 1% of all dinucleotides in most species (Srivastava et al., 2019). Methylation of cytosine within CpG increases its rate of deamination, resulting in C to T transitions. This has been proposed as the main evolutionary mechanism for genomic suppression of (CG)n dinucleotides (Pfeifer, 2006). However, trinucleotides such as (CGG)n do not show such remarkable genomic depletion, despite harbouring the same CpG sequences. This suggests that (CG)n sequences undergo negative selection. We propose that the capacity of (CG)n to efficiently stall replication serves as a selective force that leads to their genomic suppression. Nonetheless, the human reference genome contains over 50 loci with 10 or more consecutive (CG)n repeats, with the longest one being a single instance of (CG)14. Since the minimal stalling threshold we observed was 10 repeats, these genomic loci may be hot spots of fork stalling and uncoupling and may be more dependent on the activities of pol δ and Pif1.
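A scan for candidate (CG)n loci at or above the observed stalling threshold can be sketched with a regular expression. The Python example below is an illustration only; it is not the analysis used to obtain the genome counts quoted above, the function name is hypothetical, and it runs on a toy sequence rather than a reference genome. Because CG is its own reverse complement, scanning one strand is sufficient.

```python
import re


def find_cg_tracts(seq: str, min_units: int = 10):
    """Return (start, units) for each uninterrupted (CG)n tract with
    n >= min_units. `start` is a 0-based coordinate on the scanned strand.
    Matching is case-insensitive to tolerate soft-masked sequence."""
    pattern = re.compile(f"(?:CG){{{min_units},}}", re.IGNORECASE)
    return [(m.start(), len(m.group()) // 2) for m in pattern.finditer(seq)]


# Toy sequence: one (CG)10 tract, one (CG)9 tract (below threshold),
# and one soft-masked (cg)14 tract.
toy = "AT" + "CG" * 10 + "TT" + "CG" * 9 + "A" + "cg" * 14
for start, units in find_cg_tracts(toy):
    print(start, units)
```

Applied to a real assembly, the sequence would be read chromosome by chromosome from a FASTA file, and the default threshold of 10 units corresponds to the minimal stalling threshold observed here.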
The lack of stalling by (GAA)n repeats may seem unexpected, as these repeats induce robust stalling in vivo in multiple organisms (Gerhardt et al., 2016; Kim et al., 2008; Krasilnikova and Mirkin, 2004; Shishkin et al., 2009). However, the fact that stalling is observed in opposite orientations in yeast and human patient derived cells strongly points to additional factors being involved. One possible factor could be sequence context. Analysis of SV40-based (GAA)n plasmids by electron microscopy revealed the formation of unusual fork structures such as reversed forks (Follonier et al., 2013). Interestingly, in this context only weak and transient stalling was observed. Triplex structures were also observed, and these formed between the (GAA)n repeats and other GA-rich regions within the plasmid. This is in agreement with earlier studies showing that in bacteria, two tracts of (GAA)n can form triplexes, whereas a single tract cannot (Vetcher et al., 2002). It is therefore possible that our substrates lack a sufficiently long second GA-rich array to serve as a dsDNA donor. An alternative explanation was raised by a recent study carried out in DT40 cells, where replication stalling by relatively short (GAA)n tracts was suggested to occur due to R-loops (Šviković et al., 2019). Altogether, we conclude that within our experimental conditions, (GAA)n repeats by themselves do not cause significant leading strand stalling.
Our results with guanine homopolymers are consistent with previous analysis of the effects of G4 forming sequences on replication and the role of Pif1 in resolving stalling (Paeschke et al., 2011). While past work supports the idea that G4 structures impede replication, the evidence is conflicting with regard to the effect of their orientation relative to replication origins. Loss of epigenetic information in avian DT40 cells due to uncoupling can be induced by a single G4 forming sequence, but only when positioned on the leading strand template (Sarkies et al., 2010). Similarly, genetic instability of G4-forming human minisatellites in budding yeast is only induced when the G-rich strand is positioned on the leading strand template (Lopes et al., 2011). In contrast, live cell imaging of fluorescent arrays in budding yeast detected delays in replisome progression only when G4 sequences were positioned on the lagging strand template (Dahan et al., 2018). Our results show that cytosine homopolymers also induce leading strand stalling, and NMR spectroscopy analysis directly demonstrated that (C)22 forms an i-motif (Murat et al., 2020). It is therefore possible that for some G4-forming sequences the C-rich strand produces a stall due to an i-motif structure, whereas in other cases the G-rich strand does so due to a G4 structure.
In the context of replication, DNA secondary structures could either form ahead of the fork or behind the fork. Our current working model is that DNA secondary structures form behind CMG, since we observe efficient uncoupling. However, we cannot rule out the possibility that our substrates contain pre-existing structures, and that these are bypassed intact by CMG. Recent work in Xenopus egg extracts revealed that CMG is able to bypass a large protein cross-linked to the leading strand template (Sparks et al., 2019a), although this required generation of ssDNA downstream by the accessory helicase Rtel1, and may require additional factors. Single molecule studies revealed that yeast CMG possesses an Mcm10-dependent “gate” that allows it to transition from ssDNA to dsDNA (Wasserman et al., 2019), which could perhaps allow it to bypass certain structures on the leading strand. Alternatively, CMG may unwind and dismantle pre-existing structures. While the capacity of CMG to unwind various DNA secondary structures is unknown, our results imply that if CMG does unwind structures, these must form again behind it to inhibit synthesis. But how could structures arise on the leading strand if unwinding and synthesis are coupled? Although pol ε directly binds CMG, its catalytic domain is tethered via a flexible linker (Zhou et al., 2017). This raises the possibility that stochastic disengagement of pol ε from the leading strand template leads to local uncoupling and exposure of short stretches of ssDNA, thereby allowing structures to form. However, fork stalling induced by (CG)24 was extensive, which would require such a stochastic event to be very frequent. Another option is that structures form on the ssDNA stretch that runs between the exit channel of CMG and the active site of pol ε. At present there is no exact information on the length of exposed leading strand template during coupled synthesis. 
Current estimates are at least 16 nt, based on a recent structure of pol ε bound to CMG (Yuan et al., 2020). Importantly, the minimum length required to form a three-stacked G4 or i-motif is 15 nt, whereas hairpins could nucleate from even shorter sequences. Very recent super-resolution imaging of individual replication forks in human cells has detected G4 structures behind CMG, but not in front of it (Lee et al., 2021), providing support for our model that structures form as a consequence of replication.
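The 15 nt figure can be reproduced with simple arithmetic, sketched below under the assumption of an intramolecular quadruplex built from four runs of G (or C, for an i-motif) joined by three loops of at least 1 nt each. The function and parameter names are illustrative and do not come from this study.

```python
def min_quadruplex_length(stacks=3, min_loop=1):
    """Shortest sequence able to fold into an intramolecular quadruplex.

    Assumes four runs of `stacks` nucleotides joined by three loops of
    `min_loop` nucleotides each (illustrative model, not a measured value).
    """
    return 4 * stacks + 3 * min_loop


three_stack_minimum = min_quadruplex_length()  # 4 * 3 + 3 * 1 = 15 nt
exposed_template_estimate = 16                 # nt, lower-bound estimate cited in the text
fits_in_exposed_template = three_stack_minimum <= exposed_template_estimate
```

Under these assumptions a three-stacked quadruplex would just fit within the estimated stretch of exposed leading strand template, which is the comparison the paragraph above relies on.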
We have discovered that the replisome can intrinsically resolve stalls induced by hairpin-forming sequences through multiple mechanisms, with pol δ playing a major role. In contrast, stalls induced by quadruplex-forming sequences require the extrinsic support of the accessory helicase Pif1. Our results are in strong agreement with a recent high-throughput primer extension assay that tested the ability of T7 polymerase to extend through all possible 1-6 nt long repeats, as well as a large library of hairpin, G4 and i-motif sequences (Murat et al., 2020). Synthesis by this model polymerase gradually progressed through hairpins, with more stable hairpins taking longer to resolve, but was terminally stalled at either G4s or i-motifs. It thus seems that quadruplexes are a more robust block to synthesis by many polymerases. In contrast, we found that the two major eukaryotic replicative polymerases exhibit varying intrinsic capacities to synthesise through hairpins. The strand displacement activity of pol δ most likely evolved for the purpose of Okazaki fragment maturation. However, this comes with the added benefit of allowing pol δ to rescue leading strand stalls caused by hairpin-forming sequences.
Replication fork uncoupling leads to exposure of ssDNA on the leading strand template, threatening genetic and epigenetic stability. It is therefore essential to minimize these events. Although pol δ was able to resume synthesis of hairpin-forming repeats on the leading strand, local uncoupling was not completely prevented. Several types of DNA lesions on the leading strand template induce events similar to those we observed here, including inhibition of synthesis and uncoupling of synthesis from unwinding. Interestingly, similar to its ability to synthesise past hairpin-forming sequences, pol δ could also rescue leading strand synthesis past 8-oxoguanine and thymine glycol (Guilliam and Yeeles, 2021). In contrast, replication past an abasic site or a CPD could not be carried out by any of the replicative polymerases (Taylor and Yeeles, 2018, 2019). However, the translesion polymerase pol η could synthesise past a CPD (Guilliam and Yeeles, 2020). This requirement for an external factor is very much akin to the role of Pif1 in rescuing replication of quadruplex-forming sequences. Thus, the molecular events that underlie DNA-induced stalling could be mechanistically analogous to those induced by leading strand DNA lesions, exhibiting both intrinsic and extrinsic recovery pathways.
The exposure of ssDNA by uncoupled CMG generates a checkpoint response by binding of RPA and subsequent activation of the ATR kinase. Drugs that induce replication stress, such as hydroxyurea and aphidicolin, as well as ATR inhibition, induce DNA double strand breaks in non-random locations across the genome. Genome wide studies have identified structure forming repeats, inverted repeats, quasi-palindromes (Shastri et al., 2018) and short poly(dA:dT) homopolymers (Tubbs et al., 2018) as hot spots for breakage. These observations are broadly in agreement with our findings and strengthen the notion that the ability of certain sequences to interfere with replication is dictated by their structure-forming potential. In addition, recent work revealed that in tumours which exhibit Microsatellite Instability (MSI), large scale expansions of (AT)n repeats generate genome-wide cleavage sites for the Mus81-Eme1 nuclease. Furthermore, the WRN helicase plays a major role in resolving such structures and preventing breakage (van Wietmarschen et al., 2020). Despite the clear ability of G4s and i-motifs to stall replication in our experiments, these were not identified as hot spots in such studies. One possibility is that Pif1 and other related helicases such as FancJ and Rtel1 resolve these structures very efficiently, thereby preventing uncoupling and breakage. Another possibility is that these structures are refractory to cleavage by structure-specific endonucleases. Nonetheless, it would be tempting to speculate that repetitive sequences that can inherently stall replication, such as those identified in this study, are also hotspots for fork collapse or breakage.
In summary, we have shown that repetitive DNA is an important potential source of endogenous replication stress and have revealed how the eukaryotic replisome is able to cope with difficult-to-replicate sequences. The response of the replisome to certain repetitive sequences is mechanistically similar to events driven by leading strand DNA lesions. We therefore propose that repetitive sequences can also induce the checkpoint response to replication stress. Thus, our broad knowledge and understanding of the cellular response to replication stress and DNA damaging agents may now be extended to encompass DNA-induced replication stalling.
AUTHOR CONTRIBUTIONS
Methodology and Investigation, C.S.C.D., M.D.M., S.W. and G.C.; Conceptualization, Supervision, Writing and Funding Acquisition, G.C.
DECLARATION OF INTERESTS
The authors declare no competing interests.
STAR METHODS
Cloning
All replication templates are based on the 9.8 kb pZN3 plasmid (Taylor and Yeeles, 2018), in which a new linker was inserted 3 kb downstream from the ARS306 origin, yielding pGC504. Repeats were cloned step-wise using a previously described method for expansion of repeats (Scior et al., 2011). Briefly, repeats were first cloned using annealed oligonucleotides. For the first expansion step, annealed duplexes were used as a source of insert. In subsequent steps, each resulting template was used both as a source of insert and as a target vector. The use of type IIS restriction enzymes (BsaI and Esp3I) allowed seamless cloning of uninterrupted repeats. Because of the unstable nature of some repeats, we first cloned repeats into a pSMART derivative in which we removed a BsaI site and introduced a new linker. Although this vector has been designed to better support unstable inserts, we found that repeats were overall more stable in the pZN3 backbone. We therefore removed two BsaI sites from pGC504, to generate pGC542, and from that point onward cloned all repeats directly into pGC542. To clone repeats in the reverse orientation we replaced the linker in pGC542 so that the PacI and NotI sites were reversed, yielding pGC558. See Table 1 for complete annotation of all plasmids used and generated in this study and Table 2 for a list of oligonucleotides.
Protein expression and purification
The expression and purification of most proteins used in this study have been described before (Baretić et al., 2020; Coster et al., 2014; Deegan et al., 2019; Douglas et al., 2018; Frigola et al., 2013; Goswami et al., 2018; Hill et al., 2020; On et al., 2014; Yeeles et al., 2015; Yeeles et al., 2017). For full details see Table 3. To generate the pol ε P301R expression strain (ySW1), a synthetic gene fragment spanning part of pol2 which contains the desired mutation (ordered as a gBlock, IDT) was cloned using HiFi assembly (New England Biolabs, E2621S) to replace the corresponding WT sequence in plasmid pAJ6, yielding plasmid pSW62. Plasmid pSW62 was linearised with Bsu36I and transformed into yeast strain yAE94 (Yeeles et al., 2015), and positive transformants were selected on plates lacking tryptophan. Integration was confirmed by PCR of genomic DNA as described (Coster et al., 2014). WT and mutant pol ε variants were purified as previously described (Yeeles et al., 2015) except that yeast cultures were not synchronized.
Pif1 and Pif1 K264A were expressed and purified as described (Deegan et al., 2019) with the following modifications: imidazole concentrations were 0 mM during lysis, 15 mM during washes and 300 mM for elution. The eluate from the His-tag pulldown was diluted 1:2 to reduce salt and loaded onto a MonoS column. Pif1-containing fractions were concentrated and loaded onto a 24 mL Superdex 200 column equilibrated in 0.15 mM NaCl. Pif1 was concentrated with a 30 kDa Amicon to 2 μM.
For Sld2 expression, pGC441 was transformed into BL21 bacteria and grown overnight in a starter culture of 250 ml LB-broth with 100 µg/ml ampicillin and 37 µg/ml chloramphenicol at 37°C. The next day, 20 ml of starter culture per litre was added to 12 litres of LB with ampicillin and chloramphenicol and incubated at 37°C until OD(600) reached 0.5. The culture was then cooled on ice for 20 min and IPTG was added to 0.2 mM. Induction took place at 16°C overnight. Cells were harvested by centrifugation and pellets were resuspended in buffer S [25 mM HEPES pH 7.6, 10% glycerol, 0.02% NP-40, 0.1% Tween, 1 mM EDTA, 1 mM DTT] + 0.5 M NaCl and protease inhibitors, incubated with 0.1 mg/ml lysozyme for 20 min at 4°C and sonicated for 4 min (5 sec on / 5 sec off) on ice. The lysate was cleared by centrifugation at 15,000 rpm for 15 min at 4°C using a JA-25.50 rotor. The cleared lysate was incubated for 1 hour at 4°C with 2.4 ml of 20% glutathione agarose slurry pre-washed in lysis buffer. The beads were washed extensively with buffer S + 0.5 M NaCl, resuspended in 3 ml wash buffer with 200 μg PreScission protease and incubated on a rotating wheel for 2 hours at 4°C. The eluate was collected, the beads were washed with 3 x 1 ml buffer S + 0.5 M NaCl, and all fractions were pooled and diluted 1:2 in buffer S without salt. The sample was applied to a 1 ml HiTrap SP FF column equilibrated in buffer S + 250 mM NaCl. After washing with 20 CV of equilibration buffer, Sld2 was eluted in 0.5 ml fractions with 100% buffer S + 700 mM NaCl for 10 CV. Sld2-containing fractions were pooled and concentrated with a 10 kDa Amicon to 1.1 μM.
Preparation of templates for replication assays
All plasmids were maintained in NEB Stable E. coli cells (New England Biolabs, C3040H) and purified using the HiSpeed Plasmid Maxi kit (Qiagen, 12663) from bacteria grown at 30°C to minimize loss or rearrangement of unstable inserts. We sometimes observed variability in the overall efficiency of in vitro replication between substrates, presumably due to a contaminant. This variability was eliminated by further purifying templates in batch using PlasmidSelect Xtra resin (VWR, 28-4024-02). Plasmid DNA was diluted fivefold in 3 M ammonium sulfate, added to 300 μl bead slurry pre-washed with 2.3 M ammonium sulfate and incubated for 30 min with rotation. The beads were washed four times with 1.9 M ammonium sulfate. Supercoiled plasmid was eluted with 2 x 1 ml of 1.5 M ammonium sulfate. Both fractions were pooled and dialysed for 3 h and then overnight against 1 L of 0.1x TE buffer in the dark using a D-Tube Dialyzer Mini, MWCO 6-8 kDa (Merck, 71504). The dialysed DNA sample was concentrated to 400 μl with a 100 kDa Amicon and precipitated with 1 ml 100% ethanol; both steps were performed at RT. All ammonium sulfate buffers contained 100 mM Tris HCl pH 7.5 and 10 mM EDTA.
In vitro replication assays
For MCM loading and phosphorylation, 3 nM plasmid DNA template was incubated with 5 mM ATP, 75 nM Cdt1/Mcm2-7, 45 nM Cdc6, 20 nM ORC, 50 nM DDK in 25 mM HEPES-KOH (pH 7.6), 100 mM potassium glutamate, 0.01% NP-40-S, 1 mM DTT, 10 mM Mg(OAc)2, 0.1 mg/ml BSA (1x reaction buffer) and 80 mM KCl at 24°C for 10 min. For replication of linearised templates 0.6 U/μl AhdI was added during MCM loading. For truncated templates in Figure 4C and D, 2 U/μl EcoRV was also added. Loading was stopped by adding 120 nM S-CDK for a further 5 min. The loaded reaction was diluted in 1x reaction buffer so that the final dilution in the replication reaction was 6-fold. To start replication the following components were added to the reaction: 200 μM CTP, GTP, UTP, 30 μM dATP, dCTP, dGTP, dTTP, 33 nM α-[33P]-dATP, 5 mM ATP, 10 nM S-CDK, 30 nM Dpb11, 100 nM GINS, 40 nM Cdc45, 20 nM Pol ε, 10 nM Mcm10, 40 nM RPA, 20 nM Csm3/Tof1, 20 nM Mrc1, 30 nM RFC, 40 nM PCNA, 10 nM TopoI (for circular reactions), 40 nM Pol α, 5 nM Pol δ (where indicated), 20 nM Sld3/7, 20 nM Sld2, and the mix was incubated at 30°C for the indicated time. For samples loaded on denaturing gels, 0.5 U/μl SmaI was added 5 min before the end of the reaction to eliminate the product length heterogeneity that stems from variable initiation sites (Taylor and Yeeles, 2018). Reactions were stopped with 100 mM EDTA. For pulse-chase experiments, unlabelled deoxyribonucleotide concentrations were adjusted during the pulse to 30 μM dCTP, dTTP, dGTP and 2.5 μM dATP, or 7.5 μM dATP for experiments without RFC/PCNA. After a 10 min pulse, the chase was performed by adding 200 μM unlabelled dATP, or for Figure 5B stopped at the indicated time point by addition of EDTA to 100 mM. For repriming experiments, oligonucleotides were added to 60 nM (molecules) before starting the replication reaction.
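To make the 6-fold dilution step concrete, the following minimal sketch computes the concentrations of loading-reaction components carried over into the replication reaction. It assumes simple volumetric dilution with no loss or consumption of any component; the dictionary and function names are illustrative only.

```python
# Loading-reaction concentrations as stated in the text (nM).
LOADING_NM = {
    "plasmid template": 3.0,
    "Cdt1/Mcm2-7": 75.0,
    "Cdc6": 45.0,
    "ORC": 20.0,
    "DDK": 50.0,
}

DILUTION_FOLD = 6  # final dilution of the loading reaction in the replication reaction


def dilute(concentration, fold):
    """Concentration after a simple `fold`-fold volumetric dilution."""
    return concentration / fold


# Carry-over concentrations (nM), assuming nothing is lost or consumed.
carried_over = {name: dilute(conc, DILUTION_FOLD) for name, conc in LOADING_NM.items()}
```

Under this assumption the template is present at 0.5 nM during replication. The components listed as added at the start of replication are given in the text at fixed concentrations and are not included in this calculation.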
Post-reaction sample processing
For samples to be analysed on denaturing gels, alkaline loading dye (0.5 M NaOH, 10% sucrose, xylene cyanol in water) was added at 1/10 volume. Samples were loaded in denaturing 0.8% agarose gels run at 32 V overnight in 30 mM NaOH, 2 mM EDTA.
For reactions to be loaded on native gels, SDS (to 0.1%) and proteinase K (1/100 volumes) were added and incubated at 37°C for 20 min. The sample volume was increased to 25 μl with TE and DNA was extracted with phenol:chloroform:isoamyl alcohol 25:24:1 (Sigma-Aldrich, P2069). The extracted sample was mixed with 5x Invitrogen™ Novex™ High-Density TBE Sample Buffer and loaded on a 1% agarose / TAE gel.
Substrate preparation for helicase assays
Complementary oligonucleotides containing a 5’ overhang were resuspended to 10 µM in 10 mM Tris pH 8.0. One oligo was labelled in a reaction containing 5 pmol of DNA, 1X PNK buffer, 1 U of PNK enzyme (NEB, M0201S), and γ-[32P]-ATP (0.03 mCi). The reaction was incubated for 30 min at 37°C and subsequently heat inactivated for 20 min at 65°C. Excess γ-[32P]-ATP was then removed using a G50 column (GE Healthcare, 2753002) and the volume adjusted to 100 µl (= 50 nM). To generate duplex DNA, 1 pmol of labelled oligo was mixed with 1.5 pmol of unlabelled oligo and incubated at 90°C for 5 min in a thermal cycler. The mix was then gradually cooled down to room temperature over 2 hours. Duplex DNA was stored at −20°C.
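The stated 50 nM concentration of labelled oligo follows from the amounts given above, as the short sketch below shows. It assumes complete recovery of the 5 pmol of oligo from the G50 column, which is an idealisation; the helper names are illustrative.

```python
def concentration_nM(amount_pmol, volume_ul):
    """Concentration in nM of `amount_pmol` dissolved in `volume_ul`.

    1 pmol/µl equals 1 µM, i.e. 1000 nM.
    """
    return amount_pmol / volume_ul * 1000.0


def volume_ul_for(amount_pmol, conc_nM):
    """Volume in µl that contains `amount_pmol` at a concentration of `conc_nM`."""
    return amount_pmol / conc_nM * 1000.0


# 5 pmol of labelled oligo brought to 100 µl (assuming full recovery).
labelled_nM = concentration_nM(5, 100)

# Volume of that stock supplying the 1 pmol used for annealing.
annealing_volume_ul = volume_ul_for(1, labelled_nM)
```

With these inputs the labelled stock is 50 nM and 1 pmol corresponds to 20 µl of it; any loss on the column would lower the true concentration proportionally.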
Helicase assays
Helicase assays were carried out using 0.5 nM γ-[32P]-ATP-labelled duplex with a 5’ overhang in buffer containing 25 mM HEPES pH 7.6, 2 mM Mg(OAc)2, 0.1 mg/ml BSA and 2 mM ATP. Reactions were assembled on ice, equilibrated to room temperature and the respective helicases (Pif1 or Dna2) added to 50 nM final concentration. Reactions were incubated for 30 min at 30°C and samples collected at different time points (5, 10 and 20 min). Reactions were stopped by addition of 0.5% SDS and 200 mM EDTA. The samples were supplemented with Novex Hi-Density TBE Sample buffer (ThermoFisher Scientific, LC6678) and analysed on 10% Novex TBE gels (ThermoFisher Scientific, EC62755BOX) at 150 V for 1 hour in 1X TBE. Gels were dried onto filter paper, autoradiographed with phosphor imaging plates (Fujifilm) and developed on a Typhoon phosphorimager (GE Healthcare).
CPD Substrate
A substrate containing site-specific DNA damage (CPD) was prepared as previously described (Taylor and Yeeles, 2018) with several modifications. An oligonucleotide containing a CPD (AflII CPD, HPLC-purified; TriLink Biotechnology) was synthesised and stored in 10 mM Tris-HCl (pH 8.0), 1 mM EDTA at −20°C. To introduce the oligo into the plasmid of interest (pGC504), 4 x 200 µg of the relevant plasmid was cut with 15 µl (150 U) of Nt.BbvCI (NEB, R0632) in a 200 µl final volume reaction at 37°C for 3 hours. The reaction was stopped by adding 50 mM EDTA. Following digestion, competitor oligonucleotide (AflII competitor, IDT) was added to 1000-fold molar excess over plasmid concentration (27 µl from a 1 mM stock). The mix was incubated at 50°C for 20 min, then transferred to 37°C and SDS added to 0.1%. After 5 min, 1/100 volumes of proteinase K (New England Biolabs, P8107S) was added and incubated at 37°C for a further 15 min. All tubes were then pooled and the gapped plasmid purified. Excess oligo was separated from gapped plasmid using HighPrep PCR magnetic beads (MagBio, AC-60050) with a ratio of 1.8 µl of bead slurry/µl of sample and binding for 30 min at room temperature. Bound fractions were washed 3 times with a mix containing 70% EtOH and 0.02% NP-40 and then eluted in 1X TE. DNA was pooled and concentration measured. This step usually yielded around 60% of input material.
100 µg of gapped plasmid was collected per oligonucleotide ligation. Complementary oligonucleotide containing a CPD (AflII CPD) was added at a 20-fold molar excess and incubated at 50°C for 15 min before gradually letting it cool down to room temperature. 100 µg of DNA was ligated in 1X T4 DNA ligase buffer (NEB: B0202S) and T4 ligase (100U/µg) (NEB: M0202M) plus 2 mM Mg(OAc)2 overnight at 16°C in the dark. The following day, SDS (to 0.1%) and proteinase K (1/100 volumes) were added and incubated at 37°C for 20 min. The ligated plasmid was then subjected to CsCl gradients as in (Taylor and Yeeles, 2018) to specifically purify fully ligated supercoiled substrates. Following the CsCl gradient DNA was dialyzed against two changes of 2 L TE over 16 h total in a D-Tube Dialyzer Midi, MWCO 6-8 kDa (Merck 71507) at 4°C in the dark to remove all traces of CsCl. The DNA was collected and subjected to ethanol precipitation using 0.3 M NaCl + 2.8 volumes ice cold 100% ethanol in dry ice. The pellet was harvested, washed with room temperature 70% ethanol, harvested, air-dried and resuspended in 50 μL TE. As a control, the exact same procedure was also carried out with an undamaged oligo (AflII undamaged) and the resulting template replicated in the same manner as the parental template, indicating that the observed stalling was induced by the CPD and not due to the process itself.
Analytical digestion of substrates
Substrates of interest were subjected to enzymatic digestion to verify the length of the repetitive sequence. Briefly, 100 ng of plasmid was digested with 0.5 U of NotI (NEB, R0189L) and PacI (NEB, R0547L) in 1X CutSmart buffer (NEB, B7204S) at 37°C for 30 min. Reactions were stopped by adding 50 mM EDTA. The samples were then supplemented with Novex Hi-Density TBE Sample buffer (ThermoFisher Scientific, LC6678) and analysed on 10% Novex TBE gels (ThermoFisher Scientific, EC62755BOX) at 150 V for 1 hour in 1X TBE. Gels were then stained with SYBR™ Gold Nucleic Acid Gel Stain (Invitrogen, S11494) for 20 min at room temperature in the dark and imaged on a Typhoon phosphorimager (GE Healthcare).
ACKNOWLEDGMENTS
This work was funded by a Wellcome Trust and Royal Society Sir Henry Dale fellowship (210470/Z/18/Z) as well as internal funding from the Institute of Cancer Research. We would like to thank Max Douglas for reagents, experimental support and critical reading of the manuscript. We would also like to thank Jonathon Pines, Wojciech Niedzwiedz, Sebastian Guettler, Marco Di Antonio, Christian Zierhut and Allison McClure for critical reading of the manuscript.