Abstract
Faithful DNA replication is crucial for viability of cells across all kingdoms of life. Targeting DNA replication is a viable strategy for inhibition of bacterial pathogens. Clostridioides difficile is an important enteropathogen that causes potentially fatal intestinal inflammation. Knowledge about DNA replication in this organism is limited and no data is available on the very first steps of DNA replication. Here, we use a combination of in silico predictions and in vitro experiments to demonstrate that C. difficile employs a bipartite origin of replication that shows DnaA-dependent melting at oriC2, located in the dnaA-dnaN intergenic region. Analysis of putative origins of replication in different clostridia suggests that the main features of the origin architecture are conserved. This study is the first to characterize aspects of the origin region of C. difficile and contributes to our understanding of the initiation of DNA replication in clostridia.
1. Introduction
Clostridioides difficile (formerly Clostridium difficile) (Lawson et al., 2016) is a Gram-positive anaerobic bacterium. C. difficile infections (CDI) can occur in individuals with a disturbed microbiota and are one of the main causes of hospital-associated diarrhea; the bacterium can also be found in the environment (Smits et al., 2016). The incidence of CDI has increased worldwide since the beginning of the century (Smits et al., 2016; Warriner et al., 2017). Consequently, the interest in the physiology of the bacterium has increased in order to understand its interaction with the host and the environment and to explore new pathways for intervention (van Eijk et al., 2017; Crobach et al., 2018).
One such pathway is the replication of the chromosome. Overall, DNA replication is a highly conserved process across different kingdoms of life (O’Donnell et al., 2013; Bleichert et al., 2017). In all bacteria, DNA replication is a tightly regulated process that occurs with high fidelity and efficiency, and is essential for cell survival. The process involves many different proteins that are required for the replication process itself, or to regulate and aid replisome assembly and activity (Katayama et al., 2010; Murray and Koh, 2014; Chodavarapu and Kaguni, 2016; Jameson and Wilkinson, 2017; Schenk et al., 2017). Replication initiation and its regulation are arguably candidates in the search for novel therapeutic targets (Fossum et al., 2008; Grimwade and Leonard, 2017; van Eijk et al., 2017).
In most bacteria, replication of the chromosome starts with the assembly of the replisome at the origin of replication (oriC) and proceeds bidirectionally (Chodavarapu and Kaguni, 2016). In the majority of bacteria replication is initiated by the DnaA protein, an ATPase Associated with diverse cellular Activities (AAA+ protein) that binds specific sequences in the oriC region. The binding of DnaA induces DNA duplex unwinding, which subsequently drives the recruitment of other proteins, such as the replicative helicase, primase and DNA polymerase III proteins (Chodavarapu and Kaguni, 2016). Termination of replication eventually leads to disassembly of the replication complexes (Chodavarapu and Kaguni, 2016).
In C. difficile, knowledge on DNA replication is limited. Though many proteins appear to be conserved between well-characterized species and C. difficile, only certain replication proteins have been experimentally characterized for C. difficile (Torti et al., 2011; Briggs et al., 2012; van Eijk et al., 2016). DNA polymerase C (PolC, CD1305) of C. difficile has been studied in the context of drug discovery and appears to have a conserved primary structure similar to other low-[G+C] Gram-positive organisms (Torti et al., 2011). It is inhibited in vitro and in vivo by compounds that compete for binding with dGTP (van Eijk et al., 2019; Xu et al., 2019). Helicase (CD3657), essential for DNA duplex unwinding, was found to interact in an ATP-dependent manner with a helicase loader (CD3654) and loading was proposed to occur through a ring-maker mechanism (Davey and O’Donnell, 2003; van Eijk et al., 2016). However, in contrast to helicase of the Firmicute Bacillus subtilis, C. difficile helicase activity is dependent on activation by the primase protein (CD1454), as has also been described for Helicobacter pylori (Bazin et al., 2015; van Eijk et al., 2016). C. difficile helicase stimulates primase activity at the trinucleotide 5’-d(CTA), but not at the preferred trinucleotide 5’-d(CCC) (van Eijk et al., 2016).
DnaA of C. difficile has not been studied to date. Although no full-length structure has been determined for DnaA, individual domains of the DnaA protein from different organisms have been characterized (Majka et al., 1997; Zawilak et al., 2003; Erzberger et al., 2006; Zawilak-Pawlik et al., 2017). DnaA proteins generally comprise four domains (Zawilak-Pawlik et al., 2017). Domain I is involved in protein-protein interactions and is responsible for DnaA oligomerization (Weigel et al., 1999; Abe et al., 2007; Natrajan et al., 2009; Jameson et al., 2014; Kim et al., 2017; Zawilak-Pawlik et al., 2017; Martin et al., 2018; Matthews and Simmons, 2019; Nowaczyk-Cieszewska et al., 2019). Little is known about a specific function of domain II and this domain may even be absent (Erzberger et al., 2002). It is thought to be a flexible linker that promotes the proper conformation of the other DnaA domains (Abe et al., 2007; Nozaki and Ogawa, 2008). Domain III and Domain IV are responsible for the DNA binding. Domain III contains the AAA+ motif and is responsible for binding ATP, ADP and single-stranded DNA, as well as certain regulatory proteins (Kawakami et al., 2005; Cho et al., 2008; Ozaki et al., 2008; Ozaki and Katayama, 2012). Recent studies have also revealed the importance of this domain for binding phospholipids present in the bacterial membrane (Saxena et al., 2013). The C-terminal Domain IV contains a helix-turn-helix motif (HTH) and is responsible for the specific binding of DnaA to so-called DnaA boxes (Blaesing et al., 2000; Erzberger et al., 2002; Fujikawa et al., 2003).
DnaA boxes are typically 9-mer non-palindromic DNA sequences, and the E. coli DnaA box consensus sequence is TTWTNCACA (Schaper and Messer, 1995; Wolanski et al., 2014). The boxes can differ in their affinity for DnaA, and even demonstrate different dependencies on the ATP co-factor (Speck et al., 1999; Patel et al., 2017). Binding of domain IV to the DnaA boxes promotes higher-order oligomerization of DnaA, forming a filament that wraps around DNA (Erzberger et al., 2006; Ozaki et al., 2012; Scholefield and Murray, 2013). It is thought that the interaction of the DnaA filament with the DNA helix introduces a bend in the DNA (Erzberger et al., 2006; Patel et al., 2017). The resulting superhelical torsion facilitates the melting of the adjacent A+T-rich DNA Unwinding Element (DUE) (Kowalski and Eddy, 1989; Erzberger et al., 2006; Zorman et al., 2012). Upon melting, the DUE provides the entry site for the replisome proteins. Another conserved structural motif, a triplet repeat called DnaA-trio, is involved in the stabilization of the unwound region (Richardson et al., 2016; Richardson et al., 2019).
The oriC region has been characterized for several bacterial species. These analyses show that oriC regions are quite diverse in sequence, length and even chromosomal location, all of which contribute to species-specific replication initiation requirements (Zawilak-Pawlik et al., 2005; Ekundayo and Bleichert, 2019). In Firmicutes, including C. difficile, the genomic context of the origin regions appears to be conserved and encompasses the rnpA-rpmH-dnaA-dnaN genes (Ogasawara and Yoshikawa, 1992; Briggs et al., 2012).
The oriC region can be continuous (i.e. located at a single chromosomal locus) or bipartite (Wolanski et al., 2014). Bipartite origins were initially identified in B. subtilis (Moriya et al., 1988) but more recently also in H. pylori (Donczew et al., 2012). The separate subregions of the bipartite origin, oriC1 and oriC2, are usually separated by the dnaA gene. Both oriC1 and oriC2 contain clusters of DnaA boxes, and one of the regions contains the major DUE region. The DnaA protein binds to both subregions and places them in close proximity to each other, consequently looping out the dnaA gene (Krause et al., 1997; Donczew et al., 2012). In H. pylori, DnaA domains I and II are important for maintaining the interactions between both oriC regions (Nowaczyk-Cieszewska et al., 2019).
In this study, we identified the putative oriC of C. difficile through in silico analysis and demonstrate DnaA-dependent unwinding of the oriC2 region in vitro. A clear conservation of the origin of replication organization is observed throughout the clostridia. The present study contributes to our understanding of clostridial DNA replication initiation in general, and replication initiation of C. difficile specifically.
2. Materials and Methods
2.1 Sequence alignments and structure modelling
Multiple sequence alignment of amino acid sequences was performed with Protein BLAST (blastP suite, https://blast.ncbi.nlm.nih.gov/Blast.cgi) for individual alignment scores and the PRALINE program (http://www.ibi.vu.nl/programs/pralinewww/) (Bawono and Heringa, 2014) for multiple sequence alignment. Sequences were retrieved from the NCBI Reference Sequences. DnaA protein sequences from C. difficile 630Δerm (CEJ96502.1), C. acetobutylicum DSM 1731 (AEI33799.1), Bacillus subtilis 168 (NP_387882.1), Escherichia coli K-12 (AMH32311.1), Streptomyces coelicolor A3(2) (TYP16779.1), Mycobacterium tuberculosis RGTB327 (AFE14996.1), Helicobacter pylori J99 (Q9ZJ96.1) and Aquifex aeolicus (WP_010880157.1) were selected for alignment. Alignment was visualized in JalView version 2.11, with coloring by percentage identity.
Secondary structure prediction and homology modelling were performed using Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2) (Kelley et al., 2015) using the intensive default settings. Phyre2 modelling of C. difficile 630Δerm DnaA (CEJ96502.1) was performed with 3 templates from A. aeolicus (PDB 2HCB, chain C), B. subtilis (PDB 4TPS, chain D) and E. coli (PDB 2E0G, chain A) and 21 residues were modelled ab initio. 95% of the residues were modelled with >90% confidence. Graphical representation was performed with the PyMOL Molecular Graphics System, Version 1.76.6. Schrödinger, LLC.
2.2 Prediction of the C. difficile oriC
To identify the oriC region of C. difficile the genome sequence of C. difficile 630Δerm (GenBank accession no. LN614756.1) was analyzed through different software in a stepwise procedure (Mackiewicz et al., 2004).
The GenSkew Java Application (http://genskew.csb.univie.ac.at/) was used with default settings for the analysis of the normal and the cumulative skew of two selectable nucleotides of the genomic nucleotide sequence ([G – C]/[G + C]). Calculations were performed with a window size of 4293 bp and a step size of 4293 bp. The inflection values of the cumulative GC skew plot are indicative of the chromosomal origin (oriC) and terminus of replication (ter).
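The windowed and cumulative skew calculation can be illustrated with a short Python sketch. This is not the GenSkew implementation; the function names are hypothetical, and the assignment of the cumulative minimum to the origin and the maximum to the terminus is the usual convention for (G − C)/(G + C), stated here as an assumption.

```python
def gc_skew_windows(seq, window=4293, step=4293):
    """Return (window start, skew) pairs, with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C get a skew of 0 to avoid division by zero.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        skews.append((start, skew))
    return skews


def cumulative_skew(skews):
    """Running sum of the per-window skew values."""
    total = 0.0
    cumulative = []
    for start, skew in skews:
        total += skew
        cumulative.append((start, total))
    return cumulative


def skew_extremes(cumulative):
    """Window starts of the cumulative minimum (assumed origin) and maximum (assumed terminus)."""
    min_start = min(cumulative, key=lambda item: item[1])[0]
    max_start = max(cumulative, key=lambda item: item[1])[0]
    return min_start, max_start
```

With the window and step size both set to 4293 bp, the 4293049 bp chromosome is covered by 1000 non-overlapping windows.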
Prediction of superhelicity-dependent helically unstable DNA stretches (SIDDs) was performed in the vicinity of the inflection point of the GC-skew plot, in 2.0 kb fragments comprising intergenic regions from nucleotide position 4291795 to 745 (oriC1) and 466 to 2465 (oriC2) of the C. difficile 630Δerm chromosome. Prediction of the SIDDs in the different clostridia (Table 1) was performed in the vicinity of the inflection points of the GC-skew plot retrieved from the DoriC 10.0 database (http://tubic.tju.edu.cn/doric/public/index.php) (Luo and Gao, 2019), in 2.0 kb fragments comprising the intergenic regions summarized in Table 1. The SIST program (https://bitbucket.org/benhamlab/sist_codes/src/master/) (Zhabinskaya et al., 2015) was used to predict free energies G(x) by running the melting transition algorithm only (SIDD) with default values (copolymeric energetics; default: σ = −0.06; T = 37°C; x = 0.01 M) and with superhelical density σ = −0.04.
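Because the chromosome is circular, the oriC1 fragment runs past the last base of the assembly and continues at position 1. The following Python sketch, with hypothetical helper names, shows how such 1-based inclusive coordinates translate into fragment lengths and sequence slices; only the genome length and the coordinates are taken from the text.

```python
GENOME_LENGTH = 4293049  # bp, C. difficile 630Δerm chromosome as stated in the text


def circular_span(start, end, genome_length=GENOME_LENGTH):
    """Length of a 1-based inclusive region that may wrap past the last base."""
    if start <= end:
        return end - start + 1
    # Wrapping region: tail of the assembly plus the first `end` bases.
    return (genome_length - start + 1) + end


def circular_slice(seq, start, end):
    """Extract a 1-based inclusive region from a circular sequence string."""
    if start <= end:
        return seq[start - 1:end]
    return seq[start - 1:] + seq[:end]
```

Under these conventions both the oriC1 interval (4291795 to 745) and the oriC2 interval (466 to 2465) span exactly 2000 bp, in agreement with the 2.0 kb fragment size given above.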
DnaA box clusters were identified by searching for the motif TTWTNCACA, allowing one mismatch (Supplementary information), in the leading strand of a 4432 bp sequence between nucleotide positions 4291488 and 2870 of the C. difficile 630Δerm chromosome, using Pattern Locator (https://www.cmbl.uga.edu//downloads/programs/Pattern_Locator/patloc.c) (Mrazek and Xie, 2006). Identification of the DnaA boxes in the different clostridia (Table 1) was performed with the same pattern motif in the leading strand of the intergenic regions summarized in Table 1.
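The motif search can be approximated by a simple scan that tolerates one mismatch against the degenerate consensus. The sketch below is not Pattern Locator; the function names are hypothetical, and it assumes the standard IUPAC meanings W = A or T and N = any base.

```python
# IUPAC codes needed for the consensus TTWTNCACA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "N": "ACGT",
}


def find_boxes(seq, motif="TTWTNCACA", max_mismatches=1):
    """Return (1-based start, matched sequence, mismatch count) for each hit on one strand."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        # A position mismatches when the base is not allowed by the motif code.
        mismatches = sum(base not in IUPAC[code] for base, code in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


def reverse_complement(seq):
    """Reverse complement, for scanning the opposite strand with the same function."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

Calling `find_boxes(reverse_complement(region))` scans the opposite strand; the reported positions then refer to the reverse-complemented sequence and need to be converted back to forward-strand coordinates.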
DnaA-trio sequences and ribosomal binding sites were manually predicted based on Richardson et al. (Richardson et al., 2016) and on Vellanoweth and Rabinowitz (Vellanoweth and Rabinowitz, 1992), respectively.
All output data was obtained as raw text files and further processed with Prism 8.3.1 (GraphPad, Inc, La Jolla, CA) and CorelDRAW X7 (Corel).
2.3 Strains and growth conditions
E. coli strains were grown aerobically at 37°C in lysogeny broth (LB, Affymetrix) supplemented with 15 μg/mL chloramphenicol or 50 μg/mL kanamycin when required. E. coli strain DH5α (Table 2) was used to maintain the DnaA-containing plasmid and E. coli strain MC1061 (Table 2) was used to maintain the oriC-containing plasmids. E. coli strain MS3898, kindly provided by Alan Grossman (MIT, Cambridge, USA) (Table 2), was used for recombinant DnaA expression. E. coli transformation was performed using standard procedures (Sambrook et al., 1989). Growth was followed by monitoring the optical density at 600 nm (OD600).
2.4 Construction of the plasmids
For overexpression of DnaA, the dnaA nucleotide sequence (CEJ96502.1) from C. difficile 630Δerm (GenBank accession no. LN614756.1) was amplified by PCR from C. difficile 630Δerm genomic DNA using primers oEVE-7 and oEVE-21 (Table 3). The PCR product was subsequently digested with NcoI and BglII. The vector pAV13 (Smits et al., 2011) (Table 4), containing B. subtilis dnaA cloned in pQE60 (Qiagen) was kindly provided by Alan Grossman (MIT, Cambridge, USA) and was digested with the same enzymes and ligated to the digested fragment to yield vector pEVE40 (Table 4).
To construct a plasmid carrying the complete predicted oriC, the predicted oriC region (nucleotide 4292150 to 1593 from C. difficile 630 GenBank accession no. LN614756.1) was amplified by PCR from C. difficile 630Δerm genomic DNA using primers oAP40 and oAP41 (Table 3). The PCR product was subsequently digested with EcoRI and PstI and ligated into pori1ori2 (Table 4), kindly provided by Anna Zawilak-Pawlik (Hirszfeld Institute of Immunology and Experimental Therapy, PAS, Wroclaw, Poland), that was digested with the same enzymes, to yield vector pAP205 (Table 4).
For the cloning of the predicted oriC1 region (nucleotide 4292150 to 24 of C. difficile 630Δerm genomic DNA) the primer set oAP30/oAP31 (Table 3) was used. The amplified fragment was digested with EcoRI and PstI and inserted into pori1ori2 (Table 4) digested with the same enzymes, yielding vector pAP83 (Table 4). For the cloning of the predicted oriC2 region (nucleotide 1291 to 1593 of C. difficile 630Δerm genomic DNA) the primer set oAP32/oAP33 (Table 3) was used. The amplified fragment was digested with EcoRI and PstI and inserted into pori1ori2 (Table 4) digested with the same enzymes, yielding vector pAP76 (Table 4).
All DNA sequences introduced into the cloning vectors were verified by Sanger sequencing. For oriC containing vectors primers oAP56 and oAP57 (Table 3) were used for sequencing.
2.5 Overproduction and purification of DnaA-6xHis
Overexpression of DnaA-6xHis was carried out in E. coli strain CYB1002 (Table 2), harbouring the expression plasmid pEVE40 (Table 4). Cells were grown in 800 mL LB and induced with 1 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) at an OD600 of 0.6 for 3 hours. The cells were collected by centrifugation at 4°C and stored at −80°C. Cells were resuspended in Binding buffer (1X Phosphate buffer pH 7.4, 10 mM imidazole, 10% glycerol), lysed by French Press and collected in phenylmethylsulfonyl fluoride (PMSF) at 0.1 mM (end concentration). Separation of the soluble fraction was performed by centrifugation at 13000 × g at 4°C for 20 min. Purification of the protein from the soluble fraction was done in Binding buffer on a 1 mL His Trap Column (GE Healthcare) according to the manufacturer’s instructions. Elution was performed with Binding buffer in stepwise increasing concentrations of imidazole (20, 60, 100, 300 and 500 mM). DnaA-6xHis was mainly eluted at imidazole concentrations equal to or greater than 300 mM.
Fractions containing the DnaA-6xHis protein were pooled together and applied to Amicon Ultra Centrifugal Filters with 30 kDa cutoff (Millipore). Buffer was exchanged to Buffer A (25 mM HEPES-KOH pH 7.5, 100 mM K-glutamate, 5 mM Mg-acetate, 10% glycerol). The concentrated DnaA protein was subjected to size exclusion chromatography on an Äkta pure instrument (GE Healthcare). 200 μL of concentrated DnaA-6xHis was applied to a Superdex 200 Increase 10/30 column (GE Healthcare) in Buffer A at a flow rate of 0.5 mL min⁻¹. UV detection was done at 280 nm. The column was calibrated with a mixture of proteins of known molecular weights (Mw): thyroglobulin (669 kDa), apoferritin (443 kDa), β-amylase (200 kDa), albumin (66 kDa) and carbonic anhydrase (29 kDa). Eluted fractions containing DnaA-6xHis of the expected molecular weight (51 kDa) were quantified and visualized by Coomassie. Pure fractions were aliquoted and stored at −80°C for further experiments.
2.6 Immunoblotting and detection
For immunoblotting, proteins were separated on a 12% SDS-PAGE gel and transferred onto nitrocellulose membranes (Amersham), according to the manufacturer’s instructions. The membranes were probed in PBST (PBS pH 7.4, 0.05% (v/v) Tween-20) with mouse anti-his antibody (1:3000, Invitrogen), followed by the secondary antibody goat anti-mouse-HRP (1:3000, DAKO). The membranes were visualized using the chemiluminescence detection kit Clarity ECL Western Blotting Substrates (Bio-Rad) in an Alliance Q9 Advanced machine (Uvitec).
2.7 P1 nuclease Assay
For the P1 nuclease assay, 100 ng pAP205 plasmid was incubated with increasing concentrations of DnaA-6xHis (0.14, 0.54, 1 and 6.3 μM), when required, in P1 buffer (25 mM HEPES-KOH (pH 7.6), 12% (v/v) glycerol, 1 mM CaCl2, 0.2 mM EDTA, 5 mM ATP, 0.1 mg/mL BSA), at 30°C for 12 min. 0.75 units of P1 nuclease (Sigma), resuspended in 0.01 M sodium acetate (pH 7.6), was added to the reaction and incubated at 30°C for 5 min. 220 μL of buffer PB (Qiagen) was added and the fragments were purified with the MinElute PCR Purification Kit (Qiagen), according to the manufacturer’s instructions. Digestion of the purified fragments with BglII, NotI or ScaI (NEB) was performed according to the manufacturer’s instructions for 1 hour at 37°C. Digested samples were resolved on 1% agarose gels in 0.5x TAE (40 mM Tris, 20 mM CH3COOH, 1 mM EDTA pH 8.0) and stained with 0.01 mg/mL ethidium bromide solution afterwards. Visualization of the gels was performed on the Alliance Q9 Advanced machine (Uvitec). Images were processed in CorelDRAW X7 software.
3. Results
3.1 C. difficile DnaA protein
C. difficile 630Δerm encodes a homolog of the bacterial replication initiator protein DnaA (GenBank: CEJ96502.1; CD630DERM_00010). Alignment of the full-length C. difficile DnaA amino acid sequence with selected DnaA homologs from other organisms demonstrates a sequence identity of 35% to 67%, with an even higher similarity (57% to 83%, Fig. 1A). C. difficile DnaA displays greater sequence identity with DnaA from the low-[G+C] Firmicutes (> 60%). When compared with the extensively studied DnaA proteins from E. coli and B. subtilis, the full-length protein has 43% and 62% identity, and a similarity of 63% and 78%, respectively (Fig. 1A).
To assess the structural properties of C. difficile DnaA, we predicted the secondary structure and generated a model of the protein using Phyre2 (Kelley et al., 2015) (Fig. 1B). The predicted DnaA model is based on three DnaA structures from different organisms: A. aeolicus (residues 101 to 318 and 334 to 437)(Erzberger et al., 2006) for domain III and IV, and B. subtilis (residues 2 to 79) (Jameson et al., 2014) and E. coli (residues 5 to 97) (Abe et al., 2007) for domain I and II.
Domain I of DnaA mediates interactions with a diverse set of regulators, and is involved in DnaA oligomerization (Zawilak-Pawlik et al., 2017; Nowaczyk-Cieszewska et al., 2019). We observe limited homology of C. difficile DnaA domain I with the equivalent domain of the selected organisms (Fig. 1A), although the overall fold is clearly conserved (Fig. 1B). Nevertheless, some residues (P45, F48) appear to be conserved in most of the selected organisms (Fig. 1A).
Domain II is a flexible linker that is possibly involved in aiding the proper conformation of the DnaA domains, and thus requires a minimal length for DnaA function in vivo (Nozaki and Ogawa, 2008). No clear sequence similarity is observed on domain II and modelling of the C. difficile DnaA protein suggests a putative disordered nature of this domain (Fig. 1).
Domain III is responsible for binding to the co-factors ATP and ADP and, in conjunction with domain IV, is essential for DNA binding (Kawakami et al., 2005; Ozaki et al., 2008; Ozaki and Katayama, 2012). Within domain III we readily identified the Walker A and Walker B motifs (WA and WB in Fig. 1A) of the AAA+ fold (residues 135-317), crucial for binding and hydrolyzing ATP. This domain is highly conserved among all the selected organisms (Fig. 1A) and comprises a structural center of β-sheets (Fig. 1B, pink domain). Other features of the AAA+ ATPase fold are present and conserved between the organisms, such as the sensor I and sensor II motifs required for nucleotide binding (I and II, Fig. 1A). The arginine finger motif (the equivalent of R285 of E. coli DnaA in box VII), important for the ATP-dependent activation of DnaA (Kawakami et al., 2005), is conserved in C. difficile DnaA as well (R256 in motif box VII; Fig. 1A).
The C-terminal domain IV of the DnaA protein (residues 317 to 439, Fig. 1A) contains the HTH motif required for the specific binding to DnaA boxes (Erzberger et al., 2002; Zawilak et al., 2003). Previous studies identified several residues involved in specific interactions with the DnaA boxes, which bind through hydrogen bonds and van der Waals contacts with thymines present in the DNA sequence (Blaesing et al., 2000; Fujikawa et al., 2003; Tsodikov and Biswas, 2011). The residues are conserved among all Firmicutes and E. coli, including the residues R371 (position R399 in E. coli), P395 (P423), D405 (D433), H406 (H434), T407 (T435), and H411 (H439) (Fig. 1B inset, red residues) (Fujikawa et al., 2003). Structural modeling of C. difficile DnaA predicts these residues to be exposed, providing an interface for DNA binding (Fig. 1B). Several residues were found to be involved in non-specific interactions with the phosphate backbone of the DNA, including some of the residues that confer the specificity (Fujikawa et al., 2003; Tsodikov and Biswas, 2011). These contacting residues appear less conserved between the selected organisms (Fig. 1A). Nevertheless, the residues for specific base recognition are conserved between the Firmicutes and E. coli, suggesting that C. difficile DnaA is likely to recognize the consensus DnaA box TTWTNCACA (Schaper and Messer, 1995).
3.2 Expression and purification of DnaA-6xHis
To allow for in vitro characterization of DnaA activity, we recombinantly expressed the C. difficile DnaA with a C-terminal 6xHis-tag in E. coli cells. To prevent the copurification of C. difficile DnaA with host DnaA protein, E. coli strain CYB1002 was used (a kind gift of A.D. Grossman). This strain is a derivative of E. coli MS3898, that lacks the dnaA gene and replicates in a DnaA-independent fashion (Sutton and Kaguni, 1997). Induction of the DnaA-6xHis protein was confirmed by Coomassie staining and immunoblotting with anti-his antibody at the expected molecular weight of 51 kDa (Fig. 2A, red arrow). Upon overexpression of DnaA-6xHis, smaller fragments were observed, which accumulated with a prolonged time of expression (Fig. 2A), most likely corresponding to proteolytic fragments of the DnaA-6xHis protein.
Purification of the recombinant DnaA-6xHis showed a clear band at the expected size when eluted at 300 mM imidazole concentration, but several lower molecular size bands were observed (Fig. S1). Therefore, the eluted fractions were further purified with size exclusion chromatography (SEC). This yielded a single product at the expected molecular weight of DnaA-6xHis, and its identity was confirmed by western blot with anti-his antibody (Fig. 2B, red arrow). A minor band of lower molecular weight (approximately 38 kDa, <1% of total protein) was observed (Fig. 2B, green asterisk), which may reflect some instability of the N-terminus of the DnaA-6xHis protein, as it appears to have retained the C-terminal 6xHis tag.
3.3 In silico prediction of the oriC region
To identify the oriC region and the elements that are part of it (DUE, DnaA-trio and DnaA boxes) we performed different prediction approaches in a stepwise procedure, as initially described (Mackiewicz et al., 2004).
We first analyzed the DNA asymmetry of the genome of C. difficile 630Δerm (GenBank accession no. LN614756.1) (van Eijk et al., 2015), by plotting the normalized difference of the complementary nucleotides (GC-skew plot) (Necsulea and Lobry, 2007). C. difficile 630Δerm has a circular genome of 4293049 bp and an average G+C content of 29.1%. We used the GenSkew Java Application (http://genskew.csb.univie.ac.at/) for determining the chromosomal asymmetry. Asymmetry changes in a GC-skew plot can be used to predict the origin of replication region and the terminus region of bacterial genomes. Based on this analysis, the origin is predicted at approximately position 1 of the chromosome. The terminus location is predicted at approximately 2.18 Mbp from the origin region (Fig. 3A). These results were confirmed when artificially reassigning the starting position of the chromosomal assembly (data not shown). The gene organization in the putative origin region is rnpA-rpmH-dnaA-dnaN (position 4291488 to 2870, Fig. 3B), identical to the origin of B. subtilis (Ogasawara et al., 1985; Briggs et al., 2012), and therefore encompasses the dnaA gene (CD630DERM_00010, Fig. 3B) (Ogasawara et al., 1985; Briggs et al., 2012).
We next used the SIST program (Zhabinskaya et al., 2015) to localize putative DUEs in the intergenic regions in the chromosomal region predicted to contain the oriC. Hereafter we refer to these regions as oriC1 (in the intergenic region of rpmH-dnaA) and oriC2 (in the intergenic region dnaA-dnaN), in line with nomenclature in other organisms (Ogasawara et al., 1985; Donczew et al., 2012) (Fig. 3B). SIST identifies helically unstable AT-rich DNA stretches (Stress-Induced Duplex Destabilization regions; SIDDs) (Donczew et al., 2012; Zhabinskaya et al., 2015). In regions with a lower free energy (G(x) < y kcal/mol) the double-stranded helix has a high probability to become single-stranded DNA. With increasing negative superhelicity (σ = −0.06, Fig. 3C, green line) regions of both oriC1 and oriC2 become single stranded DNA (G(x) <2 kcal/mol). At low negative superhelicity (σ = −0.04, Fig. 3C, red line) short stretches of DNA of approximately 27 bp were identified with a significantly lower free energy. These regions with lower free energy at a negative superhelicity of −0.04 and −0.06 are potential DUE sites. The nucleotide sequence of the possible unwinding elements identified are represented in detail in Fig. 4 (grey boxes).
We then identified DnaA box clusters through a search for the consensus DnaA box TTWTNCACA containing up to one mismatch, using Pattern Locator (Mrazek and Xie, 2006). In total, 22 putative DnaA boxes were identified in both the leading and lagging strand in the predicted C. difficile oriC regions (Fig. 4, pink boxes), 14 in the oriC1 region and 8 in the oriC2 region. Both the consensus DnaA box TTWTNCACA and variant boxes are found. A cluster of DnaA boxes was proposed to contain at least three boxes with an average distance lower than 100 bp in between (Mackiewicz et al., 2004). At least one such cluster can be found in each origin region (Fig. 4).
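The cluster criterion (at least three boxes with an average spacing below 100 bp) can be expressed as a small greedy grouping routine. The Python sketch below is one possible reading of that definition, not the procedure of Mackiewicz et al.; it assumes that distances are measured between box start positions, and the function name and example coordinates are hypothetical.

```python
def find_clusters(starts, min_boxes=3, max_mean_gap=100):
    """Group box start positions into clusters with a mean start-to-start gap below the limit."""
    starts = sorted(starts)
    clusters = []
    i = 0
    while i < len(starts):
        j = i + 1
        # Extend the group while the mean spacing from box i to box j stays below the limit.
        while j < len(starts) and (starts[j] - starts[i]) / (j - i) < max_mean_gap:
            j += 1
        if j - i >= min_boxes:
            clusters.append(starts[i:j])
            i = j
        else:
            i += 1
    return clusters


# Hypothetical start positions, for illustration only.
example = [10, 50, 120, 900, 1500, 1540, 1600]
clusters = find_clusters(example)
```

Measuring end-to-start distances instead would lower every gap by the box length (9 bp) and could merge groups that this version keeps separate.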
We also manually identified the putative ribosomal binding sites for the annotated genes (Fig. 4, dashed line).
Finally, we manually predicted DnaA-trio sequences (3’-[G/A]A[T/A]n>3-5’ preceded by a GC-cluster) in the predicted oriC regions, as this motif is required for successful replication in both E. coli and B. subtilis (Richardson et al., 2016; Katayama et al., 2017). We identified a clear DnaA-trio in the lagging strand upstream of a predicted DUE region in the oriC2 region, with the nucleotide sequence 5’- CACCTACTACTATTACTACTATGA-3’ (Fig. 4, light blue box), but no clear DnaA-trio was identified in the oriC1 region.
From all the observations, we anticipate that a bipartite origin is located in the dnaA chromosomal region of C. difficile with unwinding occurring downstream of dnaA, at the oriC2 region.
3.4 DnaA-dependent unwinding
To localize DnaA-dependent unwinding of oriC, we used the purified C. difficile DnaA-6xHis protein and the predicted oriC sequence to perform P1 nuclease assays as previously described (Sekimizu et al., 1988; Donczew et al., 2012). Localized melting resulting from DnaA activity exposes ssDNA to the action of the ssDNA-specific P1 nuclease. After incubation of a vector containing the oriC fragment with DnaA protein and cleavage by the P1 nuclease, the vector is purified and digested with different endonucleases to map the location of the unwound region.
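The mapping logic of the assay can be made concrete with a small calculation: every cut in a circular plasmid, whether by a restriction enzyme or by P1 nuclease at an unwound site, adds one fragment boundary. The Python sketch below uses a hypothetical function name and hypothetical cut positions; only the plasmid length of 4638 bp is taken from the BglII digest described in this section.

```python
def digest_fragments(plasmid_length, cut_sites):
    """Fragment sizes (largest first) after cutting a circular plasmid at the given positions."""
    sites = sorted(set(p % plasmid_length for p in cut_sites))
    if not sites:
        return []  # uncut circle: no linear fragments
    if len(sites) == 1:
        return [plasmid_length]  # a single cut linearizes the plasmid
    fragments = [b - a for a, b in zip(sites, sites[1:])]
    # The last fragment spans the junction between the final and the first site.
    fragments.append(plasmid_length - sites[-1] + sites[0])
    return sorted(fragments, reverse=True)


PLASMID_LENGTH = 4638    # bp, linear BglII fragment size reported in the text
restriction_site = 1000  # hypothetical position of a unique restriction site
p1_site = 3000           # hypothetical position of P1 cleavage at an unwound region

without_p1 = digest_fragments(PLASMID_LENGTH, [restriction_site])
with_p1 = digest_fragments(PLASMID_LENGTH, [restriction_site, p1_site])
```

Comparing the predicted fragment sets for several enzymes with the bands that appear only in the presence of P1 nuclease narrows down where the P1 cut, and hence the unwound region, must lie.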
We constructed vectors, based on pori1ori2 (Donczew et al., 2012), harboring C. difficile oriC1 (pAP76) or oriC2 (pAP83) individually, as well as the complete oriC region (pAP205) (Fig. 5A and S2A). For a more accurate determination of the unwound region, the vectors were subjected to digestion by three different restriction enzymes (BglII, NotI, ScaI), resulting in different restriction patterns. Limited spontaneous unwinding of the plasmid was observed in the C. difficile oriC-containing vectors (Fig. 5A and S2B). No DnaA-dependent change in restriction pattern was observed when using the single oriC regions (Fig. S2B), suggesting oriC1 and oriC2 individually lack the requirements for DnaA-dependent unwinding.
We did observe a DnaA-dependent change in digestion patterns for the oriC1oriC2-containing vector pAP205 (Fig. 5). Digestion of this vector with BglII in the absence of DnaA-6xHis and P1 nuclease resulted in a linear DNA fragment (4638 bp) due to the presence of a unique BglII restriction site (Fig. 5B, first lane, upper panel). The addition of P1 nuclease leads to the appearance of a faint band between 1650 and 3000 bp (Fig. 5B), consistent with previous observations that the presence of a plasmid DUE can result in low-level spontaneous unwinding due to the inherent instability of these AT-rich regions (Jaworski et al., 2016). Upon the addition of the DnaA-6xHis protein the observed band becomes more intense, suggesting a strong increase in unwinding (Fig. 5B, upper panel, red arrow).
Digestion of pAP205 with NotI in the absence of DnaA-6xHis and P1 nuclease results in fragments of 3804 and 842 bp, due to two NotI recognition sites in the vector (Fig. 5B, 1st lane, middle panel). In the presence of just P1 nuclease, a similar low level of spontaneous unwinding is observed, resulting in the appearance of two additional faint bands, one between 1650 and 3000 bp and another between 1000 and 1650 bp (Fig. 5B). The addition of DnaA-6xHis results in an increase in intensity of both these bands in a dose-dependent manner (Fig. 5A, middle panel, red arrows).
The ScaI digestions of pAP205 show a complex pattern, which we attribute to partially incomplete digestion under the conditions used, and which we have not been able to fully resolve. The most relevant observation is a clearly visible band between 650 and 850 bp in the presence of both P1 and DnaA-6xHis (Fig. 5A, lower panel, red arrow). We do not observe spontaneous unwinding in the presence of only P1 nuclease, although the pattern is distinct from that of the control lane (Fig. 5B, first lane, lower panel).
The DnaA-dependent appearance of the ~2000 bp band in the BglII digest, the ~1200 and ~2200 bp bands in the NotI digest, and the ~700 bp band in the ScaI digest localizes the DnaA-dependent unwinding of the C. difficile oriC to the oriC2 region (Fig. 5B, gray rectangle, DUE). Moreover, these results suggest that C. difficile has a bipartite origin of replication, as successful DnaA-dependent unwinding in the oriC2 region requires both oriC regions (oriC1 and oriC2).
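The localization argument above rests on fragment arithmetic: when a circular plasmid is cut at its restriction sites and receives one additional P1 cut within the unwound region, the sizes of the new fragments report the distance from each restriction site to the DUE. The following Python sketch illustrates this reasoning only. The site coordinates are hypothetical placeholders, since the map of pAP205 is not given in the text; the plasmid length of 4638 bp is the only value taken from the results above.

```python
def fragment_sizes(plasmid_len, cut_positions):
    """Return fragment sizes (largest first) for a circular plasmid cut at the given positions."""
    cuts = sorted(set(p % plasmid_len for p in cut_positions))
    if not cuts:
        return [plasmid_len]  # uncut circle, reported as its full length
    sizes = []
    for i, pos in enumerate(cuts):
        nxt = cuts[(i + 1) % len(cuts)]
        # Distance to the next cut going around the circle; a single cut
        # gives 0 here, which corresponds to the full-length linear fragment.
        sizes.append((nxt - pos) % plasmid_len or plasmid_len)
    return sorted(sizes, reverse=True)


PLASMID_LEN = 4638   # pAP205 length reported in the text
BGLII_SITE = 500     # hypothetical coordinate of the unique BglII site
P1_CUT = 2500        # hypothetical coordinate of a P1 cut inside the DUE

print(fragment_sizes(PLASMID_LEN, [BGLII_SITE]))          # restriction enzyme only
print(fragment_sizes(PLASMID_LEN, [BGLII_SITE, P1_CUT]))  # restriction enzyme plus P1
```

Repeating the calculation for enzymes that cut at different positions constrains the P1 cut site from several directions, which is why three independent digests were combined.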
3.5 Conservation of the origin organisation in related Clostridia
Our results suggest that the origin organization of C. difficile resembles that of a more distantly related Firmicute, B. subtilis. To extend our observations, we evaluated the genomic organization of the oriC region in different organisms phylogenetically related to C. difficile. We followed a similar approach as described above for C. difficile 630Δerm, taking advantage of the DoriC 10.0 database (http://tubic.tju.edu.cn/doric/public/index.php) (Luo and Gao, 2019). Importantly, our results with respect to the C. difficile origin of replication described above were largely congruent with the DoriC 10.0 database (data not shown). We retrieved the predicted oriC regions from the DoriC 10.0 database and performed an in-depth analysis of these regions for the closely related C. difficile strain R20291 (NC_013316.1), as well as the more distantly related C. botulinum A Hall (NC_009698.1), C. sordellii AM370 (NZ_CP014150), C. acetobutylicum DSM 1731 (NC_015687.1), C. perfringens str.13 (NC_003366.1) and C. tetani E88 (NC_004557.1) (Table 1).
Similar to C. difficile 630Δerm, the genomic context of the origin contains the rpmH-dnaA-dnaN region for most of the clostridia selected and mirrors that of B. subtilis (Fig. 6). The only exception is C. tetani E88 where the uncharacterized CLOTE0041 gene lies upstream of the dnaA-dnaN cluster (Fig. 6).
We also identified the possible DnaA boxes for the selected clostridia (Fig. 6, pink semicircle). Across the analyzed clostridia, the oriC1 region showed more variability in the number of putative DnaA boxes (9 to 19), whereas oriC2 contained 5 to 9 DnaA boxes. C. tetani E88 had the lowest number of possible DnaA boxes in both the oriC1 (9 boxes) and oriC2 (5 boxes) regions (Fig. 6, pink semicircle). In all the organisms, we observe at least one DnaA-box cluster in each origin region, as also observed for C. difficile 630Δerm.
Prediction of DUEs using the SIST program (Zhabinskaya et al., 2015) identified several helically unstable regions that are candidate sites for unwinding (Fig. 6, dashed lines, and Fig. S3). Notably, in all cases one such region in oriC2 (Fig. 6, grey circle) is preceded immediately by the manually identified DnaA-trio (Fig. 6, light blue circle). Based on our experimental data for C. difficile 630Δerm, we suggest that in all analyzed clostridia, DnaA-dependent unwinding occurs at a conserved DUE downstream of the DnaA-trio in the oriC2 region (Fig. 6).
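SIST evaluates stress-induced duplex destabilization with an energetic model. As a much cruder first-pass screen for candidate unwinding sites, one can scan for windows of high AT content. The Python sketch below is such a proxy and is not the SIST algorithm; the window size and AT threshold are hypothetical parameters chosen for illustration.

```python
def at_rich_windows(seq, window=20, min_at=0.8):
    """Return (start, end, at_fraction) for every window whose AT fraction is >= min_at."""
    seq = seq.upper()
    hits = []
    if len(seq) < window:
        return hits
    # Count A/T in the first window, then update incrementally while sliding.
    at = sum(base in "AT" for base in seq[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            at -= seq[start - 1] in "AT"           # base leaving the window
            at += seq[start + window - 1] in "AT"  # base entering the window
        frac = at / window
        if frac >= min_at:
            hits.append((start, start + window, frac))
    return hits


# Toy sequence: a 20-nt AT-only stretch flanked by GC-only stretches.
toy = "GC" * 10 + "AT" * 10 + "GC" * 10
for start, end, frac in at_rich_windows(toy):
    print(start, end, frac)
```

Overlapping hits would normally be merged into a single candidate region, and any candidate found this way would still need to be checked against an energetic prediction and, ultimately, an unwinding assay.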
4. Discussion
Chromosomal replication is an essential process for the survival of the cell. In most bacteria, the DnaA protein is the initiator of replication and, through a cascade of events, leads to the successful loading of the replication complex onto the origin of replication (Mott and Berger, 2007).
Initial characterization of bacterial replication was carried out in the model organisms E. coli and B. subtilis (Jameson and Wilkinson, 2017). Despite the similarities, the structure of the replication origins and the regulation mechanisms vary among bacteria (Wolanski et al., 2014). In contrast to E. coli, the B. subtilis origin region is bipartite, with two intergenic regions upstream and downstream of the dnaA gene. In C. difficile, the genomic organization of the predicted rnpA-rpmH-dnaA-dnaN cluster and the presence of AT-rich sequences in the intergenic regions are consistent with a bipartite origin, as in B. subtilis (Fig. 3).
The origin region contains several DnaA-boxes with different properties that are recognized by the DnaA protein. The specific binding of DnaA to the DnaA-boxes is mediated mainly through domain IV of the DnaA protein. From DNA-bound structures of DnaA, it was possible to identify several residues involved in the contact with the DnaA boxes, some of which confer specificity (Blaesing et al., 2000; Fujikawa et al., 2003; Tsodikov and Biswas, 2011). Analysis of the homology of C. difficile DnaA in domain IV did not show any difference in the residues involved in DnaA-box specificity (Fig. 1, vertical arrows), suggesting conservation of the same consensus motif as the E. coli DnaA-box, TTWTNCACA (Schaper and Messer, 1995). The conserved DnaA-box motif allowed us to identify several DnaA boxes along the intergenic regions of the oriC. As in the bipartite origin of B. subtilis, we identified at least one cluster of DnaA-boxes in both the oriC1 and the oriC2 regions of the C. difficile oriC (Fig. 4 and 6). However, the C. difficile DnaA-boxes were not precisely determined, and further footprinting assays could provide insights into DnaA-box conservation and affinities. Moreover, it remains to be determined whether the DnaA boxes are crucial for origin firing and/or transcriptional regulation.
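A consensus-based search for putative DnaA-boxes can be sketched in Python as follows. The snippet scans both strands for the TTWTNCACA consensus cited above, expanding the IUPAC codes W (A or T) and N (any base). It is an illustration under stated assumptions and not the pipeline used in this study: the mismatch tolerance (`max_mismatch`) and all function names are hypothetical.

```python
# IUPAC codes needed for the consensus: W = A or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}
CONSENSUS = "TTWTNCACA"  # E. coli DnaA-box consensus cited in the text


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mismatches(site, motif=CONSENSUS):
    """Number of positions at which the site violates the IUPAC motif."""
    return sum(base not in IUPAC[m] for base, m in zip(site, motif))


def find_dnaa_boxes(seq, max_mismatch=1, motif=CONSENSUS):
    """Return (position, strand, box_sequence, n_mismatches) for putative boxes.

    max_mismatch is a hypothetical tolerance chosen for illustration.
    """
    seq = seq.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        # Test the site as read on the top strand and on the bottom strand.
        for strand, candidate in (("+", site), ("-", revcomp(site))):
            mm = mismatches(candidate, motif)
            if mm <= max_mismatch:
                hits.append((i, strand, candidate, mm))
    return hits


# Toy sequence with one perfect box on each strand.
toy = "GGG" + "TTATCCACA" + "GGG" + revcomp("TTTTGCACA") + "GGG"
for hit in find_dnaa_boxes(toy, max_mismatch=0):
    print(hit)
```

Relaxing `max_mismatch` recovers weaker, degenerate boxes at the cost of more false positives, which is one reason experimental footprinting remains necessary to confirm predicted sites.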
The P1 nuclease assays place the site of DnaA-dependent unwinding in the oriC2 region of C. difficile. This is supported by the presence of several features in oriC2, such as the identified DUE and DnaA-trio, both required for unwinding (Kowalski and Eddy, 1989; Richardson et al., 2016). The presence of both oriC regions (oriC1 and oriC2) is required for melting in vitro, as observed for other bipartite origins (Wolanski et al., 2014). In contrast to the bipartite origin identified in H. pylori (Donczew et al., 2012), we did not observe unwinding of the oriC2 region alone. Though this may be a specific aspect of C. difficile oriC2, we cannot exclude that differences in the experimental setup (e.g. DnaA protein purification) could affect these observations. Nevertheless, our data are consistent with DnaA binding the DnaA-box clusters in both oriC regions, leading to potential DnaA oligomerization, loop formation, and unwinding at the AT-rich DUE site.
When comparing the origin region between different clostridia, features similar to those of C. difficile are observed, such as conservation of DnaA-box clusters within both oriC regions in the vicinity of the dnaA gene. Similar to C. difficile and B. subtilis, a putative DUE element, preceded by the DnaA-trio, was also located within the oriC2 region (Fig. 4 and 6). Thus, the overall origin organization and mechanism of DNA replication initiation is likely to be conserved within the Firmicutes (Briggs et al., 2012). As the spacing of the DnaA-boxes is a determinant of species-specific effective replication (Zawilak et al., 2003; Zawilak-Pawlik et al., 2005), these similarities do not exclude the possibility that subtle differences in replication initiation exist, and further studies are required.
Additionally, several proteins can interact with the oriC region or DnaA, including YabA, Rok, DnaD/DnaB, Soj and HU (Briggs et al., 2012; Jameson and Wilkinson, 2017). In doing so they shape the origin conformation and/or stabilize the DnaA filament or the unwound region, consequently affecting replication initiation.
YabA or Rok affect B. subtilis replication initiation (Goranov et al., 2009; Schenk et al., 2017; Seid et al., 2017), but no homologs of these proteins have been identified in C. difficile. In B. subtilis, the DnaD, DnaB and DnaI helicase loader proteins associate sequentially with the origin region, resulting in the recruitment of the DnaC helicase protein (Marsin et al., 2001; Velten et al., 2003; Smits et al., 2010; Jameson and Wilkinson, 2017). In B. subtilis, DnaD binds to DnaA and it is postulated that this affects the stability of the DnaA filament and consequently the unwinding of the oriC (Ishigo-Oka et al., 2001; Martin et al., 2018; Matthews and Simmons, 2019). The B. subtilis DnaB protein also affects the DNA topology and has been shown to be important for recruiting oriC to the membrane (Rokop et al., 2004; Zhang et al., 2005). C. difficile lacks a homolog of the DnaB protein, although the closest homolog of the DnaD protein (CD3653) (van Eijk et al., 2017) may perform similar functions in origin remodeling (van Eijk et al., 2016). The direct interaction between DnaA and DnaD through DnaA domain I has been structurally characterized, and the residues at the interface have been identified (Martin et al., 2018). Despite high variability of this domain between organisms, half of the identified contacts for the DnaA-DnaD interaction are conserved in C. difficile: S22 (S23 in B. subtilis DnaA), T25 (T26), F48 (F49), D51 (D52) and L68 (L69) (Fig. 1) (Martin et al., 2018; Matthews and Simmons, 2019). This might suggest a similar interaction surface for CD3653 on C. difficile DnaA. A characterization of the putative interaction between CD3653 and DnaA, and the resulting effect on DnaA oligomerization and origin melting, awaits purification and functional characterization of CD3653. The Soj protein, also involved in chromosome segregation, has been shown to interact with DnaA via domain III, regulating DnaA-filament formation (Scholefield et al., 2012), and C. difficile encodes at least one uncharacterized Soj homolog. Bacterial histone-like proteins (such as HU and HBsu) can modulate DNA topology and have been shown to influence oriC unwinding and replication initiation in other organisms (Krause et al., 1997; Chodavarapu et al., 2008). C. difficile encodes a homolog of HU, HupA (Oliveira Paiva et al., 2019). Though the role of Soj and HupA in DNA replication remains to be elucidated, our experiments show they are not strictly required for origin unwinding. Finally, Spo0A, the master regulator of sporulation, binds to several Spo0A-boxes present in the oriC region in B. subtilis (Boonstra et al., 2013). Some of the Spo0A-boxes partially overlap with DnaA-boxes and binding of Spo0A can prevent the DnaA-mediated unwinding, thus playing a significant role in the coordination between replication and sporulation (Boonstra et al., 2013). In C. difficile, Spo0A-binding has previously been investigated (Rosenbusch et al., 2012), but a role in DNA replication has not been assessed. For all the regulators with a C. difficile homolog discussed above (i.e. CD3653, Soj, HupA and Spo0A), further studies can be envisioned employing the P1 nuclease assays described here to assess the effects on DnaA-mediated unwinding of the origin.
In summary, through a combination of different in silico predictions and in vitro studies, we have shown DnaA-dependent unwinding in the dnaA-dnaN intergenic region of the bipartite C. difficile origin of replication. We have analysed the putative origin of replication in different clostridia and observed a conserved organization throughout the Firmicutes, although different mechanisms and modes of regulation might drive the initiation of replication. The present study is the first to characterize the origin region of C. difficile and forms the starting point for further unraveling the mechanism behind the DnaA-dependent regulation of the initiation of replication in C. difficile.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Author contributions
AMOP and WKS designed experiments. AMOP and CW performed the in silico analyses. AMOP, EVE and AF performed experiments. AMOP and WKS analysed data and wrote the manuscript. All authors read and approved the final version for submission.
Funding
Work in the group of WKS was supported by a Vidi Fellowship (864.10.003) of the Netherlands Organization for Scientific Research (NWO) and a Gisela Thier Fellowship from the Leiden University Medical Center.
Acknowledgments
We thank Alan Grossman for kindly providing the pAV13 vector and E. coli strain CYB1002. We thank Anna Zawilak-Pawlik for kindly providing the pori1ori2 vector and expert help in setting up the P1 assays. We also thank Luís Sousa for help with the SIDD and Pattern Locator coding files.