A widespread toxin-antitoxin system exploiting growth control via alarmone signalling

Under stressful conditions, bacterial RelA-SpoT Homologue (RSH) enzymes synthesise the alarmone (p)ppGpp, a nucleotide messenger. (p)ppGpp rewires bacterial transcription and metabolism to cope with stress, and at high concentrations inhibits the process of protein synthesis and bacterial growth to save and redirect resources until conditions improve. Single domain Small Alarmone Synthetases (SASs) are RSH family members that contain the (p)ppGpp synthesis (SYNTH) domain, but lack the hydrolysis (HD) domain and regulatory C-terminal domains of the long RSHs such as Rel, RelA and SpoT. We have discovered that multiple SAS subfamilies can be encoded in broadly distributed conserved bicistronic operon architectures in bacteria and bacteriophages that are reminiscent of those typically seen in toxin-antitoxin (TA) operons. We have validated five of these SASs as being toxic (toxSASs), with neutralisation by the protein products of six neighbouring antitoxin genes. The toxicity of Cellulomonas marina ToxSAS FaRel is mediated by alarmone accumulation combined with depletion of cellular ATP and GTP pools, and this is counteracted by its HD domain-containing antitoxin. Thus, the ToxSAS-antiToxSAS system is a novel TA paradigm comprising multiple different antitoxins that exemplifies how ancient nucleotide-based signalling mechanisms can be repurposed as TA modules during evolution, potentially multiple times independently.


INTRODUCTION
(UFB) approximation method was used to compute support values for branches out of 1000 replicates. Trees from RAxML and IQ-TREE were visualized with FigTree and subfamilies were inspected for whether they contain mostly orthologues with at least moderate (>60%) bootstrap support. Overall, we could classify sequences into 13 subfamilies of long RSHs, 30 subfamilies of SASs and 11 subfamilies of SAHs. The sequences of each subfamily were aligned and used to make HMMs, as above. The 35615 sequences in the MySQL database were re-scanned with the updated subfamily HMMs and the database was updated, as reproduced here as two Excel files of all sequences (Supplementary Table S1), and subfamily distribution across taxonomy (Supplementary Table S2).

Phylogenetic analysis of RSH subfamily representatives
To make the representative trees of Figure 1, we selected taxa from the RSH database to sample broadly across the tree of life, and cover all subfamilies of RSHs. We used a Python script to select 15 representatives per SAS or SAH subfamily, 145 representatives for the almost universal bacterial protein Rel, and 80 representatives for RelA and SpoT, based on taxonomy of the RSH-encoding organism. The script calculates the total number of unique names on each taxonomic level (e.g., phylum, class, order, genus and species) and optimises selection of representative sequences accordingly, in order to sample taxonomy as broadly as possible within that subfamily.

Our Python tool FlaGs⁵ was used to find conserved genomic architectures, using the NCBI accession numbers of representative RSH subfamily members (one per genus) as input. The legend files for the gene cluster numbers in all conservation figures in this paper are found in Supplementary Table S3.
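The taxonomy-based selection of representatives described above can be illustrated with a minimal greedy sketch. This is not the script used in the study: the record layout (a dict with one key per taxonomic rank), the function name `select_representatives`, the example identifiers and the greedy "most novel taxon first" criterion are all assumptions made for illustration.

```python
# Illustrative sketch only; not the authors' selection script.
# Assumption: each sequence is a dict carrying its taxonomy at five ranks.
LEVELS = ("phylum", "class", "order", "genus", "species")


def select_representatives(records, n):
    """Greedily pick up to n records, preferring taxa not yet sampled.

    Novelty at higher ranks (phylum first) outweighs novelty at lower ranks.
    """
    seen = {level: set() for level in LEVELS}
    remaining = list(records)
    chosen = []
    while remaining and len(chosen) < n:
        # Tuples of booleans compare left to right, so a new phylum
        # always beats a new class, which beats a new order, and so on.
        best = max(
            remaining,
            key=lambda r: tuple(r[level] not in seen[level] for level in LEVELS),
        )
        chosen.append(best)
        remaining.remove(best)
        for level in LEVELS:
            seen[level].add(best[level])
    return chosen


# Hypothetical records (identifiers are placeholders, not database entries).
EXAMPLE_RECORDS = [
    {"id": "seq1", "phylum": "Firmicutes", "class": "Bacilli",
     "order": "Bacillales", "genus": "Bacillus", "species": "Bacillus subtilis"},
    {"id": "seq2", "phylum": "Firmicutes", "class": "Bacilli",
     "order": "Bacillales", "genus": "Staphylococcus",
     "species": "Staphylococcus aureus"},
    {"id": "seq3", "phylum": "Actinobacteria", "class": "Actinobacteria",
     "order": "Micrococcales", "genus": "Cellulomonas",
     "species": "Cellulomonas marina"},
    {"id": "seq4", "phylum": "Proteobacteria", "class": "Gammaproteobacteria",
     "order": "Enterobacterales", "genus": "Escherichia",
     "species": "Escherichia coli"},
]

if __name__ == "__main__":
    for rec in select_representatives(EXAMPLE_RECORDS, 3):
        print(rec["id"], rec["phylum"], rec["species"])
```

With three representatives requested from these four records, the sketch takes one sequence from each phylum before it would take a second Firmicutes sequence, which is the behaviour the text describes as sampling taxonomy as broadly as possible.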

Our previous evolutionary analysis of the RSH protein family applied high-throughput sensitive sequence searching of 1072 genomes from across the tree of life³. Since the number of available genomes has grown dramatically in the last decade, we revisited the evolution of RSHs, taking advantage of our new computational tool, FlaGs, to analyse the conservation of gene neighbourhoods that might be indicative of functional associations⁵. FlaGs clusters neighbourhood-encoded proteins into homologous groups and outputs a graphical visualisation of the gene neighbourhood and its conservation along with a phylogenetic tree annotated with flanking gene conservation. We identified and classified all the RSHs in 24072 genomes from across the tree of life using our previous Hidden Markov Model (HMM)-based method. We then carried out phylogenetic analysis to identify new subfamilies, generated new HMMs and updated the classification in our database (Supplementary Tables S1 and S2). We have identified 30 subfamilies of SASs, 11 subfamilies of SAHs, and 13 subfamilies of long RSHs (Figure 1). The nomenclature follows that of our previous analysis, where prefixes are used to indicate taxonomic distributions³.

Putative toxSAS TA modules are widespread in Actinobacteria, Firmicutes and Proteobacteria
We ran FlaGs on each of the subfamilies and discovered that Small Alarmone Synthetase (SAS) genes are frequently found in conserved bicistronic (sometimes overlapping) loci that are characteristic of toxin-antitoxin (TA) loci. Five SAS subfamilies displaying particularly well conserved TA-like arrangements, FaRel (which is actually tricistronic), FaRel2, PhRel, PhRel2 and CapRel (Figure 2, Supplementary File S1 and Supplementary Table S3), were selected for further investigation.
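As a rough illustration of what "conservation of gene neighbourhoods" means operationally, the sketch below scores how often each homologous cluster of flanking genes recurs next to a query gene across genomes. It is not FlaGs code and does not reproduce its clustering: the clustering step is assumed to have been done already, and the function names, the input layout (genome label mapped to a list of cluster numbers) and the 0.5 cut-off are hypothetical.

```python
# Illustrative sketch only; not part of FlaGs.
# Assumption: flanking genes have already been assigned to numbered
# homologous clusters, one list of cluster numbers per neighbourhood.
from collections import Counter


def cluster_conservation(neighbourhoods):
    """Return, for each cluster, the fraction of neighbourhoods containing it."""
    if not neighbourhoods:
        return {}
    counts = Counter()
    for clusters in neighbourhoods.values():
        counts.update(set(clusters))  # count each cluster once per neighbourhood
    total = len(neighbourhoods)
    return {cluster: n / total for cluster, n in counts.items()}


def conserved_clusters(neighbourhoods, min_fraction=0.5):
    """List clusters present in at least min_fraction of neighbourhoods."""
    scores = cluster_conservation(neighbourhoods)
    return sorted(c for c, frac in scores.items() if frac >= min_fraction)


# Hypothetical neighbourhoods around a query SAS gene in four genomes.
EXAMPLE_NEIGHBOURHOODS = {
    "genome_A": [1, 2],
    "genome_B": [1, 3],
    "genome_C": [1, 2, 4],
    "genome_D": [5],
}

if __name__ == "__main__":
    print(cluster_conservation(EXAMPLE_NEIGHBOURHOODS))
    print(conserved_clusters(EXAMPLE_NEIGHBOURHOODS))
```

A flanking cluster that recurs beside the SAS gene across many genera, as cluster 1 does in this toy input, is the kind of signal that would flag a candidate antitoxin for experimental testing.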
Among bacteria, PhRel (standing for Phage Rel, the group to which Gp29⁸ belongs) and FaRel are found in multiple species of Firmicutes and Actinobacteria (hence the "Fa" prefix), along with representatives of various Proteobacteria; FaRel2 is found in multiple Actinobacteria and Firmicutes, while PhRel2 is found in Firmicutes in addition to Bacillus phages. CapRel as a subfamily can be found in a wide diversity of bacteria (including Cyanobacteria, Actinobacteria and Proteobacteria), hence the "Cap" prefix. The putative antitoxins are non-homologous among cognate groups, with the exception of PhRel and CapRel, which share a homologous putative antitoxin (Figure 2). PhRel and CapRel are sister groups in the RSH phylogeny with medium support (81% MLB RAxML, 96% UFB IQ-TREE, Figure 1 and Supplementary Text S1), suggesting the TA arrangement has been conserved during the diversification of these groups from a common ancestor.

The potential antitoxins are named with an 'AT' prefix to the SAS name. ATfaRel is a predicted SAH of the PbcSpo family (Figure 1 Figure 3D). Despite the well-conserved bicistronic organisation, Mycobacterium tuberculosis AB308 CapRel (Figure 3E) initially displayed no detectable toxicity. Thus, we added a strong Shine-Dalgarno motif (5'-AGGAGG-3') to increase the translation initiation efficiency in order to drive up its expression levels. In the case of Mycobacterium sp. AB308 CapRel, the protein became toxic. Importantly, this toxicity is readily counteracted by the antitoxin ATcapRel (Figure 3E). Mycobacterium phage Squirty PhRel⁸ did not display significant toxicity even when the expression was driven with a strong Shine-Dalgarno sequence (Supplementary Figure S1A). The reason for this seems to be a large deletion in the synthetase active site in Squirty PhRel (Supplementary Figure S2).
We also tested well-studied bacterial SASs that are not encoded in TA-like arrangements (Staphylococcus aureus RelP²⁰,²¹ and Enterococcus faecalis RelQ²²,²³). We detected no toxicity, even when the expression is driven by a strong Shine-Dalgarno sequence (Supplementary Figure S1B).

The validated toxSAS toxins differ in the strength of the toxic effect in our system (Figure 3A-E): i) FaRel2 and PhRel2 are exceedingly potent and no bacterial growth is detected upon expression of these toxins from the original pBAD33 vector, ii) FaRel and PhRel are significantly weaker and small colonies are readily visible and iii) CapRel is weaker still, with toxicity requiring the introduction of a strong Shine-Dalgarno sequence in the pBAD33 vector. We have validated the observed toxicity by following bacterial growth in liquid culture (Figure 3).

Next we tested whether enzymatic activity is responsible for the toxicity of toxSASs. To do so, we substituted a conserved tyrosine in the so-called G-loop for alanine (Supplementary Figure S3). This residue is critical for binding the nucleotide substrate and is highly conserved in (p)ppGpp synthetases²⁴. All of the tested mutants, namely PhRel2 Y173A (Figure 4A), FaRel2 Y128A, PhRel Y143A and FaRel Y175A, are non-toxic (Supplementary Figure S3). Therefore, we conclude that production of a toxic alarmone is, indeed, the universal causative agent of growth inhibition by toxSASs. Finally, the toxicity does not rely on the functionality of the host RSH machinery, since the toxicity phenotype is identical in a ΔrelA ΔspoT (ppGpp⁰) BW25113 E. coli strain (Supplementary Figure S4).

We then investigated whether toxSAS antitoxins inhibit toxSASs on the level of RNA (as in type I and III TA systems) or protein (as in type II and IV TA systems). The former scenario is theoretically possible, since, as we have shown earlier, E. 
faecalis SAS RelQ binds single-stranded RNA and is inhibited in a sequence-specific manner²². To discriminate between the two alternatives, we mutated the start codon of the aTphRel2, aTfaRel2 and aTphRel antitoxin ORFs to a stop codon, TAA. Since all of these mutants fail to protect from the cognate toxSAS (Figure 4B and Supplementary Figure S5), we conclude that they act as proteins, that is, are type II or IV antitoxins.
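The logic of the start-codon experiment above is that a single-codon change leaves the transcript almost unchanged while preventing the protein from being made, so loss of protection points to a protein antitoxin. A minimal sketch of the substitution itself is given below; the function name, the accepted start codons and the example sequence are assumptions for illustration, not the actual antitoxin ORFs or the cloning procedure used.

```python
# Illustrative sketch only; the sequence below is a made-up toy ORF.
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons


def start_to_stop(orf, stop="TAA"):
    """Replace the start codon of an ORF with a stop codon (TAA by default)."""
    orf = orf.upper()
    if orf[:3] not in START_CODONS:
        raise ValueError("ORF does not begin with a recognised start codon")
    return stop + orf[3:]


def differing_positions(a, b):
    """Count nucleotide positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


if __name__ == "__main__":
    toy_orf = "ATGAAAGGCTAA"  # hypothetical ORF
    mutant = start_to_stop(toy_orf)
    print(mutant, differing_positions(toy_orf, mutant))
```

Only the first codon changes, so an RNA-level antitoxin would be expected to retain most of its sequence, whereas translation of the encoded protein can no longer start at that position.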

The C. marina ATfaRel SAH hydrolase antitoxin cross-inhibits all identified toxSAS SASs

The antitoxin ATfaRel is a member of the PbcSpo subfamily of SAH hydrolases (Figure 1A). This suggests it acts via degradation of the alarmone nucleotide produced by the toxSAS (and thus as a type IV TA system that does not require direct physical interaction of the TA pair). Therefore, we hypothesised that ATfaRel is able to mitigate the toxicity of all of the identified toxSAS classes through alarmone degradation. This is indeed the case (Figure 5A). Similarly, co-expression of human SAH MESH1²⁵ universally counteracts the toxicity of toxSASs (Supplementary Figure S6). To test if the hydrolysis activity is strictly necessary for antitoxin function, we generated a point mutant of ATfaRel (D54A). Mutation of the homologous active site residue of Rel from Streptococcus dysgalactiae subsp. equisimilis (RelSeq) abolishes (p)ppGpp hydrolysis²⁶. As expected, the D54A mutant is unable to counteract the toxicity from FaRel (Figure 5B). The location of the SAS immediately downstream of the SAH raises the question of whether this gene pair has evolved from fission of a long RSH. However, if this were the case, FaRel and ATfaRel would branch in the long RSH part of the phylogeny, which we do not see (Figure 1

rifampicin and nalidixic acid were used as controls for specific inhibition of translation, transcription and replication, respectively. Addition of antibiotics causes rapid (within two minutes) inhibition of the corresponding target process (Figure 6B, left panel). While expression of C. marina FaRel was inhibitory to transcription, translation and replication, the first process to be affected was transcription: the kinetics of inhibition is similar to that of rifampicin (Figure 6B, right panel). While the result is in good agreement with (p)ppGpp targeting all of the three processes, the swiftness of the effect on transcription is surprising.
We next proceeded to assess the effects on the intracellular nucleotide pools, with a special focus on (p)ppGpp. First, we used metabolic labelling with ³²P-orthophosphoric acid combined with TLC separation and autoradiography to assess the accumulation and degradation of nucleotide alarmones upon expression of the C. marina FaRel toxSAS and ATfaRel SAH (Figure 6C and Supplementary Figure S10). The expression of C. marina FaRel results in accumulation of ³²P-ppGpp, which is counteracted by wild type, but not D54A-substituted, ATfaRel. While the TLC-based approach is efficient, allowing simultaneous analysis of multiple samples, it lacks the resolution and the quantitative nature of the more laborious HPLC-based approach²⁸. Therefore, we analysed the kinetics of nucleotide pools upon expression of either FaRel alone (Figure 6DE) or co-expressed with ATfaRel (Supplementary Figure S11) by HPLC. Expression of FaRel dramatically perturbs both guanosine (Figure 5D) and adenosine (Figure 5E) pools. While both GTP and ATP are rapidly depleted, UTP and CTP levels, after the initial drop at two minutes, remain stable (Supplementary Figure S11). The result is consistent with neither UTP nor CTP serving as substrates for RSH enzymes. The ppGpp levels peak at five minutes and drop at ten. The likely explanation is exhaustion of ATP and GTP that serve as substrates for the RSH enzymes. Efficient depletion of ATP, which is approximately two times more abundant in E. coli than GTP (2.2 mM vs 900 µM)²⁸, is surprising given that RSH-catalysed pppGpp synthesis is expected to consume guanosine and adenosine nucleotides in a one-to-one ratio. A possible explanation is that, similarly to a Streptomyces morookaensis SAS enzyme²⁹, FaRel also catalyses synthesis of pppApp using two ATP molecules as substrates.
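The stoichiometric argument above can be made explicit with a short back-of-the-envelope calculation. The sketch below uses only the two pool sizes quoted in the text and assumes static pools, that is, no resynthesis of ATP or GTP during the experiment, which is a simplification; the function name is hypothetical.

```python
# Back-of-the-envelope sketch under a static-pool assumption
# (no nucleotide resynthesis), using the concentrations quoted in the text.
ATP_MM = 2.2  # approximate E. coli ATP pool, mM
GTP_MM = 0.9  # approximate E. coli GTP pool, mM (900 µM)


def atp_remaining(atp_mm, gtp_mm, atp_per_gtp=1.0):
    """ATP left (mM) once all GTP is consumed at a fixed ATP:GTP ratio."""
    consumed = atp_per_gtp * gtp_mm
    return max(atp_mm - consumed, 0.0)


if __name__ == "__main__":
    left = atp_remaining(ATP_MM, GTP_MM)
    print(f"ATP left after exhausting GTP at 1:1: {left:.1f} mM "
          f"({100 * left / ATP_MM:.0f}% of the starting pool)")
```

Under these assumptions, one-to-one consumption would exhaust GTP while leaving about 1.3 mM ATP, more than half of the starting pool. Efficient ATP depletion therefore calls for an additional ATP-consuming activity, such as the pppApp synthesis proposed in the text.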
As judged by our microscopy experiments using the membrane potential-sensitive dye DiSC₃(5)³⁰ and the membrane permeability indicator Sytox Green³¹, the cells remained both intact and well energised upon expression of FaRel (Supplementary Figure S8). Therefore, we can rule out an alternative hypothesis that the reduced nucleotide pools were caused either by FaRel-dependent rapid inhibition of cell metabolism, or by triggered leakage of cytoplasmic content.

The next logical step was to characterise the enzyme biochemically. Despite our best efforts, we failed to express and purify wild type FaRel to homogeneity, even when co-expressed with ATfaRel. We could, however, purify the enzymatically compromised Y175A mutant. Importantly, when overexpressed, FaRel Y175A potently inhibits bacterial growth and this toxicity is counteracted by ATfaRel (Supplementary Figure S12), indicating that Y175A and wild type FaRel share the same mechanism of toxicity. We tested the enzymatic activity of FaRel Y175A in the presence of radioactively labelled ³H-GTP or ³H-GDP combined with unlabelled ATP (Figure 6F). Since E. faecalis SAS RelQ is inhibited by single-stranded RNA and activated by pppGpp²², we also tested the effects of 1 µM mRNA and 100 µM pppGpp. While we detected no enzymatic activity in the presence of ³H-GTP, in the presence of ³H-GDP the activity of the catalytically compromised FaRel Y175A is similar to that of wild type E. faecalis RelQ²². Unlike RelQ, and similarly to Staphylococcus aureus RelP²⁰, FaRel Y175A is insensitive to the addition of pppGpp or mRNA(MF) (Figure 6F). When FaRel Y175A was incubated with ³H-ATP alone, we did not detect ³H-pppApp formation. However, both ³H-GTP and ³H-ATP are degraded (Figure 6G). While we did not detect similar NTP degradation using E.

Class II antitoxins protect only from cognate toxSAS toxins
The gp29-mediated abrogation of growth is employed by the Phrann phage as a defence mechanism against super-infection by other phages⁸. This raises the question of cross-inhibition between toxSAS TA systems: do all of the identified antitoxins inhibit all of the toxSASs (similarly to how the type IV antitoxin SAH ATfaRel protects from all of the tested toxSASs, see Figure 5A and Table 1) or is the inhibition specific to individual toxSAS TA subfamilies? Therefore, we exhaustively tested pairwise combinations of all of the toxSASs with all of the antitoxins (Table 1 and Supplementary Figure S13). ATphRel2, ATfaRel2, ATphRel, ATcapRel and AT2faRel antitoxins could not counteract their non-cognate toxSASs, demonstrating that different classes provide specific discrimination of self from non-self.

Numerous SASs and SAHs are encoded in prophage-derived regions of bacterial genomes
Our initial search has identified 13 SASs in bacteriophage genomes, five of which we have confirmed as toxSASs (Figures 2 and 3). However, this is likely to be an underestimate for two reasons.

Table S4). It is notable that of RelP and RelQ (the two most broadly distributed SASs), RelP but not RelQ can be phage-associated. An evolutionary history that includes transduction may be part of the reason why the various operon structures of RelP are less well conserved across genera compared with RelQ (Supplementary File S1). SAHs are found in many more prophages and prophage-like regions than SASs (90 versus 63 instances, Supplementary Table S4). We tested SAHs encoded by Salmonella phages PVP-SE1³³ (PbcSpo subfamily) and SSU5³⁴ (PaSpo subfamily) in toxicity neutralisation assays against validated toxSASs. Like the C. marina SAH ATfaRel, both of these stand-alone phage-encoded SAHs efficiently mitigate the toxicity of all the tested toxSASs (Table 1 and Supplementary Figure S14).

DISCUSSION
Using our tool FlaGs, we have made the surprising discovery that multiple SAS subfamilies can be encoded in TA-like genetic architectures. Through subsequent experimental validation, we have found that the organisation of SAS genes into conserved TA-like bi- (and in one case tri-) cistronic arrangements is an indicator of toxicity. Identification of bicistronic architectures has previously been used as a starting point for prediction of TAs³⁵,³⁶. However, these studies focussed on species that do not encode toxSASs, and therefore these TA systems were not detected. By being associated with novel antitoxins, toxSASs have also escaped identification in "guilt by association" analysis of thousands of genomes³⁷. This long-term obscurity is despite toxSAS-containing subfamilies being broadly distributed, present in 239 genera of 15 Gram-positive and -negative phyla of bacterial genomes sampled in this study. Thus, it is likely that there are other previously unknown TA systems to be found that are identifiable through searching for conservation of gene neighbourhoods across disparate lineages, as we have done with FlaGs.

The RSH protein family is widespread, most likely having been present in the last common ancestor of bacteria. Thus, for billions of years, these proteins have been used by bacteria to regulate their growth rate in response to their environment by synthesising and hydrolysing nucleotide alarmones. Paradoxically, the very ability of an alarmone to downregulate growth for continued survival is also what gives it toxic potential. We have identified 30 subfamilies of SASs, five of which we have validated as containing toxins, and two of which we have validated as non-toxic (RelP and RelQ). It is likely that SASs exist on a continuum in terms of toxicity, with an antitoxin only being required at a certain level of toxicity. This is supported by the observation that not all toxSASs have the same level of toxicity, with one (M. tuberculosis AB308 CapRel) requiring a strong Shine-Dalgarno in order to observe any toxicity in our system. For our five validated toxSAS systems, there are five different homologous groups of antitoxins. This, together with the lack of a multi-subfamily toxSAS-specific clade in phylogenetic analysis, suggests toxic SASs could have evolved independently multiple times from non-toxic SASs. In the evolution of a ToxSAS-antiToxSAS module from a non-toxic SAS, it is unlikely that the toxic component evolved before the regulatory antitoxin, as this would be detrimental to fitness. Rather, it is more likely that a SAS became regulated by a neighbouring gene, which relaxed enzymatic constraints on the SAS, allowing it to evolve increased alarmone synthesis rates as well as relaxed precision of enzymatic catalysis, leading to futile degradation of ATP. While depletion of ATP and GTP pools is expected to contribute to the inhibition of transcription, the fact that the SAH antitoxin efficiently counteracts the toxicity of all ToxSAS SAS enzymes suggests that accumulation of the alarmone is the key toxic effect. We hypothesise that the depletion of the ATP and GTP substrates is responsible for the decrease of (p)ppGpp levels after the initial spike at around five minutes after the induction of FaRel expression. Notably, the (p)ppGpp level remains high in relation to housekeeping ATP and GTP, thus ensuring the efficient shutdown of bacterial growth.

The specific cellular role of most of the toxSASs is unclear, with the exception of the phage PhRel-ATphRel (Gp29-Gp30) toxSAS TA pair, which seems to have a role in inhibition of superinfection⁸. In this system, PhRel encoded by a prophage protects Mycobacteria from infection by a second phage. Phage infection has previously been linked to alarmone accumulation and stringent response in bacteria³⁸,³⁹,⁴⁰. Presumably this is an example of a so-called abortive infection mechanism⁴¹, where infected hosts are metabolically restricted, but the larger population is protected. A corollary of alarmone-mediated phage inhibition is that incoming phages could bypass this defence system by encoding alarmone hydrolases. Indeed, we have found a variety of different SAHs in different phage genomes and prophage-like regions of bacterial genomes, suggesting there could be cross-talk between ToxSASs and SAHs during infection and superinfection.

DATA AVAILABILITY
FlaGs is an open source Python application available in the GitHub repository (https://github.com/GCA-VH-lab/FlaGs).

tested SAS proteins. Genes that encode proteins belonging to a homologous cluster in more than one genomic neighbourhood are coloured and numbered (see Supplementary Table S3 for the identity of clusters with flanking gene accession numbers). The SAS gene is shown in black, and non-conserved genes are uncoloured. Validated TAs have red taxon names. SASs that we have tested and are non-toxic have purple taxon names. Purple- and green-outlined grey genes are pseudogenes and

geometric mean of three biological replicates and shading represents the standard error, µ₂ is the growth rate (± standard error) either upon induction of the toxin (in red) or in the absence of the toxin (in black, vector control).