Application of transposon-insertion sequencing to determine gene essentiality in the acetogen Clostridium autoethanogenum

The majority of the genes present in bacterial genomes remain poorly characterised, with up to one third of the protein-encoding genes having no definitive function. Transposon-insertion sequencing is a high-throughput technique that can help rectify this deficiency. The technology, however, can only realistically be applied to easily transformable species, leaving those with low DNA-transfer rates out of reach. Here we have developed a number of approaches that overcome this barrier in the autotrophic species Clostridium autoethanogenum using a mariner-based transposon system. The inherent instability of such systems in the Escherichia coli conjugation donor due to transposition events was counteracted through the incorporation of a conditionally lethal codA marker on the plasmid backbone. Relatively low frequencies of transformation of the plasmid into C. autoethanogenum were circumvented through the use of a plasmid that is conditional for replication, coupled with the routine implementation of an Illumina library preparation protocol that eliminates plasmid-based reads. A transposon library was then used to determine the essential genes needed for growth using carbon monoxide as a sole carbon and energy source.

IMPORTANCE Although microbial genome sequences are relatively easily determined, assigning gene function remains a bottleneck. Consequently, relatively few genes are well characterised, leaving the function of many as either hypothetical or entirely unknown. High-throughput transposon sequencing can help remedy this deficiency but is generally only applicable to microbes with efficient DNA-transfer procedures. This excludes many microorganisms of importance to humankind, either as agents of disease or as industrial process organisms. Here we developed approaches to facilitate transposon-insertion sequencing in the acetogen Clostridium autoethanogenum, a chassis being exploited to convert single-carbon waste gases, CO and CO2, into chemicals and fuels at an industrial scale. This allowed the determination of gene essentiality under heterotrophic and autotrophic growth, providing insights into the utilisation of CO as a sole carbon and energy source. The strategies implemented are translatable and will allow others to apply transposon-insertion sequencing to other microbes where DNA transfer has until now represented a barrier to progress.


Although microbial genome sequences are relatively easily determined, assigning gene function remains a bottleneck. Consequently, relatively few genes are well characterised, leaving the function of many as either hypothetical or entirely unknown. Thus, even the Syn

The deployment of transposon-insertion sequencing (TIS) is typically dependent on high-frequency DNA transfer. This excludes its application to many microbial species. Anaerobic bacteria, and in particular members of the genus Clostridium, are of both medical and industrial importance but generally display low rates of DNA transfer. This has limited the exploitation of TIS in this grouping, where to date TIS has only been applied [6] to the pathogen Clostridioides difficile (formerly Clostridium difficile). One group of bacteria of increasing importance is the anaerobic acetogens, typified by Clostridium autoethanogenum. Acetogens possess the Wood-Ljungdahl pathway (WLP), or reductive acetyl-CoA pathway, which allows the fixation of CO and CO2 [7]. Suggested to be the earliest autotrophic pathway [8], it is the most energy efficient of the seven known carbon fixation pathways since it conserves energy while all others require its input [9]. Reducing equivalents needed for metabolic processes are obtained either from H2 or CO using hydrogenases or CO dehydrogenase (CODH), respectively. Carbon is fixed via the Eastern branch of the pathway where, through a series of cobalamin and tetrahydrofolate-dependent reactions, CO2 is reduced to a methyl group. The methyl group from the Eastern branch is then combined with CO to form acetyl-CoA, which is the root of subsequent anabolic reactions [10-13].

While the majority of acetogens synthesise acetate as the sole fermentation product, some, typified by C. autoethanogenum, naturally produce industrially relevant compounds such as 2,3-butanediol and ethanol, the latter on a commercial scale [14]. Commercial efforts to extend the product range further are ongoing, with isopropanol being a notable example [15]. C. autoethanogenum is one of the best understood autotrophic acetogens, with a manually annotated genome [16,17], and has been subjected to transcriptomic and proteomic analysis [18].

In the current study we sought to maximise the benefit of available C. autoethanogenum genome data through implementation of TIS. However, as DNA transfer into C. autoethanogenum is only possible at relatively low frequencies, a number of essential modifications to the procedure were required. Specifically, the use of a conditional replicon

For exploitation in C. autoethanogenum, further control was engineered into the system by

To confirm that TcdR production could be controlled by the addition of exogenous lactose in strain C24, the Clostridium perfringens catP reporter gene encoding a chloramphenicol

plasmid DNA (Fig. 3). This was assumed to be due to transposition of the mini-transposon from pMTL-YZ14 while in E. coli, either into the genome or, as transposition into closed circular autonomous plasmids is preferred, into alternative positions in the vector backbone.

The cut-and-paste nature of the transposition event would mean that plasmids would be generated that either no longer carried a mini-transposon or had been affected in their maintenance or ability to transfer. Similar instabilities have been noted elsewhere [25].

Cytosine deaminase catalyses conversion of 5-fluorocytosine (5-FC) to the toxic product 5-fluorouracil (5-FU), which ultimately blocks DNA and protein synthesis. On the plasmid pMTL-CW20, codA is separated from its Pthl promoter (derived from the thiolase gene of

Table 1. There were 439 genes (11%) identified as candidate essential genes out of a total of 4059 genes in the genome for heterotrophic growth on the rich medium YTF, where fructose and yeast extract serve as the carbon and energy source (Supplementary Table S1).

This is comparable with the number of genes in the Syn3.0 genome and close to the 404 reported in Clostridioides difficile [1,6]. As expected, genes involved in fundamental biological processes such as transcription, translation, DNA replication and cell division are common in the rich-medium essential gene list. Eighteen of the twenty common amino acids have clearly annotated tRNA synthetases that appear essential; the exceptions are tyrosine and asparagine.

Tyrosine appears to exhibit redundancy via CLAU_1290 (tyrZ) and CLAU_1635. There is only one annotated asparagine tRNA synthetase (asnB), but it seems likely that another is present (CLAU_2687) and that together they provide functional redundancy, meaning that both genes are found to be non-essential. CLAU_2687 is currently annotated as a tRNA synthetase class II but is most likely to be an asparagine-specific tRNA synthetase. Another explanation for the non-essential status of the asparagine tRNA synthetase could be that C. autoethanogenum uses a mechanism common to many bacterial and archaeal taxa which entirely lack an asparagine tRNA synthetase. These taxa rely on a non-discriminating aspartic acid tRNA synthetase followed by an amidotransferase to generate asparagine-tRNAs [30].

The candidate essential gene list for rich medium calls into question several of the annotations in the C. autoethanogenum genome. For instance, CLAU_0265, which is annotated as a small acid-soluble spore protein, is required on rich medium despite C. autoethanogenum C24 never having been observed to sporulate. In addition, sporulation should never have been required in the library preparation process. The gene must, therefore,

Table S1). In total, 758 genes (19% of the genome) were predicted to be required for autotrophic growth by the endpoint of the CO-fed reactor (Supplementary Table S1). This includes the entire 'core' gene set, which was also required on rich medium, and all of the genes required to grow on minimal medium lacking amino acids. The core gene set was predicted to be

deducing condition-specific genes, but the data is nevertheless extremely informative.

The importance of Nfn for autotrophic growth. In order to further verify the calling of gene essentiality under specific conditions using our parameters, a candidate gene was selected for directed CRISPR mutagenesis. The nfn gene (CLAU_1539) encodes an electron-bifurcating ferredoxin-dependent transhydrogenase, responsible for the production of NADPH from NADH and Fd2-, thus recycling NAD+. Our TIS data analysis found that the nfn gene was non-essential when C. autoethanogenum was grown on rich medium or on minimal medium with pyruvate, but essential under autotrophic conditions. This suggested that a directed CRISPR knockout mutant should be obtainable while the culture is grown under heterotrophic conditions but should fail to survive when transferred to autotrophic conditions. A CRISPR in-frame deletion mutant of nfn (∆nfn) was created which was viable on rich medium and on minimal medium with pyruvate as a carbon source, but was unable to grow when CO was the sole carbon and energy source.
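The reasoning used to select nfn can be sketched as set operations over per-condition lists of candidate essential genes. The snippet below is a minimal illustration only, not the authors' analysis pipeline: the two gene sets are a small hand-picked subset of the calls described in the text (not the full lists in Supplementary Table S1), and the names `essential`, `core_genes` and `specific_to` are hypothetical.

```python
# Toy subset of essentiality calls described in the text (assumption: this is
# only an illustrative sample, not the data in Supplementary Table S1).
essential = {
    "rich": {"CLAU_1574", "CLAU_1576", "CLAU_2947"},
    "autotrophic": {
        "CLAU_1539",  # nfn
        "CLAU_1572",
        "CLAU_1574",
        "CLAU_1576",
        "CLAU_2947",
    },
}


def core_genes(calls):
    """Return genes called essential in every condition."""
    return set.intersection(*calls.values())


def specific_to(calls, condition):
    """Return genes essential in `condition` and in no other condition."""
    others = set().union(*(g for c, g in calls.items() if c != condition))
    return calls[condition] - others


print("core:", sorted(core_genes(essential)))
print("autotrophic only:", sorted(specific_to(essential, "autotrophic")))
```

A gene that falls in the autotrophic-only set, such as CLAU_1539 (nfn), is the kind of candidate for which a deletion would be expected to be obtainable under heterotrophic conditions but not to support growth on CO.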

Initially the ∆nfn strain was characterised in serum bottles, using minimal PETC medium and either 10 mM sodium pyruvate or 1 bar of CO in the headspace as the carbon and energy source. Serum bottles were inoculated with 1 ml (1:50 inoculum) of a late exponential culture grown in the anaerobic cabinet on minimal medium with fructose as a carbon source. The cultures grown on pyruvate grew similarly to the wild-type control; however, no growth was evident when CO was used instead of pyruvate as the carbon and energy source.

This inability to utilise CO as a carbon and energy source was further demonstrated on a larger scale using a fed-batch CSTR experiment, whereby a 1.5 L culture was inoculated with 150 ml of an early exponential culture grown on minimal medium with pyruvate. The pH was controlled with NaOH and H2SO4, and the culture was sparged continuously with nitrogen at a rate of 60 ml/min. At the time of inoculation, 5 mM sodium pyruvate was added to the culture. Once an OD600 of approximately 0.3 had been reached, CO was introduced at a rate of 10 ml/min. The wild-type culture was able to adapt to CO as the carbon and energy source, and after 48 h the OD continued to increase once the pyruvate had been depleted. The nfn mutant culture, in contrast, was unable to adapt to utilising CO, and its optical density declined rapidly following depletion of the pyruvate.

Table S2). To that end, the confusion matrix was generated (Table 2)

selenocysteine residue [23]. It appears from our data that both ORFs are required under autotrophic conditions. Thus, the 44 kDa protein alone does not appear to be sufficient for autotrophy, and cells apparently require the 69 kDa protein to be autotrophic.

There are three putative formate dehydrogenases in the C. autoethanogenum genome, encoded by CLAU_0081, CLAU_2712/CLAU_2713 (fdhA), and CLAU_2907. Of these, only fdhA appears to be essential, and only on CO, while the remaining two genes are required in neither tested condition. The most important formate dehydrogenase is therefore fdhA, which is found in a complex with an NADP-specific electron-bifurcating [FeFe]-hydrogenase (Hyt) [42]. Two of the three putative formate dehydrogenases are selenoenzymes, which may have higher efficiency than the cysteine-containing analogues. It is therefore tempting to speculate that the non-selenoenzyme formate dehydrogenase may be present as a backup for low-selenium conditions [43]. However, it appears from our data that neither CLAU_0081 nor CLAU_2907 could provide sufficient activity to prevent fdhA mutants from being outcompeted, causing fdhA to appear essential under autotrophic conditions.

The steps from formate to methyl-THF are catalysed by the products of CLAU_1572-CLAU_1576, which all appear to be required for growth on CO. CLAU_1574 and CLAU_1576 also appear to be required for growth on the rich medium.

The methyl group of methyl-THF is transferred to the Corrinoid Iron-Sulfur Protein (CoFeSP) cofactor before being combined with the carbonyl group supplied by another

ferredoxin oxidoreductase (AOR; EC 1.2.7.5) but neither of them appears to be essential in either growth condition. In this case the result is best explained by a lack of biological necessity for this reaction, since it has been shown that a double AOR knockout strain was still viable autotrophically [23].

There are two candidate genes encoding pyruvate synthase enzymes for formation of pyruvate from acetyl-CoA (CLAU_0896 and CLAU_2947), of which only CLAU_2947 appears to be required; this is true in both growth conditions. All of the genes encoding functions for the pathways leading to lactate and 2,3-butanediol appear non-essential. For the conversion of acetolactate to acetoin and for the NADH-dependent production of lactate, there appears to be only one gene encoding each function (CLAU_2851 and CLAU_1108, respectively). In these cases redundancy is unlikely to be the reason for their non-essential status, so it is more likely that these routes are biologically unnecessary.

Overall, our findings highlight that TIS represents a powerful functional genomics tool which can be applied to less genetically tractable organisms using the methods applied here.

The data presented allow a confident determination of the Wood-Ljungdahl pathway genes of C.

with an atmosphere of 80% nitrogen, 10% carbon dioxide and 10% hydrogen at 37 °C. The three media used were YTF (Tables S4-S7), PETC (Tables S8-S10) and Fermentation (Tables S11-S13) medium. Plasmids were transferred from sExpress to C. autoethanogenum as

The Matthews correlation coefficient was used as a metric to assess the quality of the GSM predictions, where "1" is a perfect correlation between experimental and predicted gene essentiality, "0" no correlation, and "-1" perfect anti-correlation [39]. The model and the scripts are available on GitHub (https://github.com/SBRCNottingham/C.auto_essentiality).
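For readers unfamiliar with the metric, the Matthews correlation coefficient can be computed directly from the four cells of a confusion matrix. The function below is a minimal sketch written for illustration and is not taken from the authors' repository; the function name and the example counts are hypothetical, and returning 0 when a marginal total is zero is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Returns 0.0 when any marginal total is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only: genes predicted essential and
# found essential (tp), predicted and found non-essential (tn), and the two
# kinds of disagreement (fp, fn).
tp, tn, fp, fn = 40, 50, 5, 5
print(round(matthews_corrcoef(tp, tn, fp, fn), 3))
```

Unlike raw accuracy, this score stays informative when the two classes are very unequal in size, which is the situation here, since essential genes are a minority of the genome.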

Sequencing and bioinformatics. Sequencing library preparation was performed as an amplicon library using a splinkerette adapter [49]. Genomic DNA was fragmented to an average of 400 bp using a Covaris sonicator, followed by bead purification using NEB sample preparation beads at a ratio of 1.5X beads to sample. Fragmented DNA was end repaired and A-tailed using the NEB Ultra II library preparation kit. Splinkerette adapters were ligated onto the ends of A-tailed fragments with reagents from the Ultra II library preparation kit. A 1X bead purification was performed before an I-SceI digest step to cleave plasmid DNA between the library primer and P7 primer. Another 1X bead purification was performed before PCR amplification of the transposon junctions using KAPA HiFi polymerase. An initial denaturing step of 95 °C for two min was followed by 20 cycles of 95 °C for 20 sec, 61 °C for 30 sec and 72 °C for 30 sec, before a final extension of two min at 72 °C.
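The cycling conditions above can be written down as a small data structure, which makes the programme easy to check or transcribe. This is only a sketch: the variable and function names are hypothetical, and the total covers programmed hold times only, since ramp times depend on the instrument and are not given in the text.

```python
# Cycling programme as described in the text: (temperature in °C, seconds).
# Assumption: ramp times and any final hold are not included.
INITIAL_DENATURE = (95, 120)
CYCLE_STEPS = [(95, 20), (61, 30), (72, 30)]
N_CYCLES = 20
FINAL_EXTENSION = (72, 120)


def programmed_seconds():
    """Sum of programmed hold times across the whole PCR programme."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURE[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


total = programmed_seconds()
print(f"{total} s (about {total / 60:.1f} min) of programmed hold time")
```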

PCR products with a size range of 250-500 bp were gel extracted from a low-melt agarose gel using the NEB Monarch gel extraction kit. Gel-extracted products were analysed on an Agilent Bioanalyzer using a DNA 1000 chip and quantified via Qubit and qPCR. Two