De novo assembly of the Pasteuria penetrans genome reveals high plasticity, host dependency, and BclA-like collagens

Pasteuria penetrans is a gram-positive, endospore forming bacterial parasite of Meloidogyne spp., the most economically damaging genus of plant parasitic nematodes globally. The obligate antagonistic nature of P. penetrans makes it an attractive candidate biological control agent. However, deployment of P. penetrans for this purpose is inhibited by a lack of understanding of its metabolism and the molecular mechanics underpinning parasitism of the host, in particular the initial attachment of the endospore to the nematode cuticle. Several attempts to assemble the genomes of species within this genus have been unsuccessful. This is primarily due to the obligate parasitic nature of the bacterium, which makes it challenging to obtain contamination-free genomic DNA of sufficient quantity and quality. Taking advantage of recent developments in whole genome amplification, long read sequencing platforms, and assembly algorithms, we have developed a protocol to generate large quantities of high molecular weight genomic DNA from a small number of purified endospores. We demonstrate this method via genomic assembly of P. penetrans. This assembly reveals a reduced genome of 2.64Mbp estimated to represent 86% of the complete sequence; its reduced metabolism reflects widespread reliance on the host and possibly associated organisms. Additionally, apparent expansion of transposases and prediction of partial competence pathways suggest a high degree of genomic plasticity. Phylogenetic analysis places our sequence within the Bacilli, and most closely related to Thermoactinomyces species. Seventeen predicted BclA-like proteins are identified which may be involved in the determination of attachment specificity. This resource may be used to develop in vitro culture methods and to investigate the genetic and molecular basis of attachment specificity.



Pasteuria penetrans is an endospore forming Firmicute which is an obligate parasite of root-knot nematode (RKN, Meloidogyne spp.), a globally distributed genus of plant parasitic nematodes which are among the most economically devastating in agriculture [1,2]. P. penetrans acts as a natural antagonist to RKN via two key mechanisms. Firstly, attachment of endospores to the nematode cuticle hinders movement, migration through the soil, and thus root invasion [3,4]. Secondly, bacterial infection of the plant feeding nematode results in sterilisation. As such, P. penetrans is of considerable interest as a biological alternative to chemical nematicides. The effective application of Pasteuria spp. for this purpose is currently limited by a lack of understanding of nematode attachment specificity and by the absence of in vitro culture methods. Attachment of the endospore to the nematode cuticle is a determinative process in infection [5]. Pasteuria spp. may exhibit extremely fastidious attachment profiles including species and population specificity [6,7]. Attempts to characterise the molecular basis of attachment have identified two components which appear to be involved in this process from the perspective of the endospore: collagen and N-acetyl-glucosamine (NAG). Treatment of endospores with collagenase, NAGase, and the collagen binding domain of fibronectin inhibits attachment [8][9][10][11]. This has prompted the current "Velcro-model" of attachment involving bacterial collagen-like fibres, observable under electron microscopy on the exosporium surface, and nematode cuticle associated mucins [12]. Recently, Phani et al. [13] demonstrated that knockdown of a mucin-like gene, Mi-muc-1, reduced cuticular attachment of P. penetrans endospores to M. incognita. However, the exact nature of this host-parasite interaction is not known at the genetic or molecular level.
Additionally, no published medium is available for in vitro production of P. penetrans [14]. This is attributable to its obligate parasitism of nematodes that are themselves obligate parasites. In short, it is not yet known what P. penetrans requires from its host in order to proliferate. Adding to this complex picture is the apparent influence of "helper-bacteria" which have been implicated in growth promotion [15]. This means that in order to complete its life cycle P. penetrans may rely on metabolic and/or signalling pathways from the plant, the nematode, and from associated bacteria.

The difficulty of obtaining genomic DNA of sufficient quantity, quality, and purity from P. penetrans has so far impeded attempts to obtain a high quality genomic assembly. An assembly of 20,360bp with an N50 of 949bp (GCA_000190395.

were transferred to a clean 1.5ml LoBind Eppendorf (Sigma) and washed three times in 1ml of HPLC water containing 0.5% Triton X-100. Washed females were burst with a micropestle in a clean 1.5ml LoBind Eppendorf (Sigma), and the contents subjected to a series of washes at room temperature: first three times in 1ml HPLC water; second three times in 1ml 70% ethanol; and finally, once in 500µl of 0.05% sodium hypochlorite solution, before density selection on a sterile 1.25g/ml sucrose gradient. All centrifugation steps were at 20817g for 15 minutes except spore pelleting after sodium hypochlorite incubation, which was 5 minutes. The resulting clean endospore suspensions were inspected at 1000x magnification (Zeiss Axioskop).

Clean endospores were subjected to a 30-minute lysozyme digestion, spun to pellet, ground for 1 minute with a micropestle, re-suspended in 4µl scPBS, and then passed immediately into the Repli-g whole genome amplification protocol for single cells (Qiagen). A 16hr isothermal amplification protocol produced 15µg of genomic DNA. Amplified genomic material was visualised on a 0.5% agarose gel, quantified with a Qubit hsDNA quantification kit (ThermoFisher), and assayed for Pasteuria spp. specific 16S rRNA gene sequence using primers 39F and 1166R as previously described [27]. The resultant library was submitted to Oslo Genomic Sequencing Centre for two runs of PacBio SMRT cell sequencing. Legacy WGA Illumina data from the same strain were included in these analyses. The clean-up protocol for this material has been previously described [27]. The WGA, debranching, S1 nuclease treatment, and Illumina library prep for this sample are described in supplementary methods file 1.

Legacy Illumina assembly
(v4.9.6); and "clean" Illumina read sets were re-assembled using MIRA. This was repeated iteratively, a total of 14 times, until no further improvements in assembly metrics were observed.

PacBio assembly
The PacBio sequence reads were trimmed and assembled initially using Canu (v1.5) [

an N50 of 2.26Mbp and a completeness score of 86%. Some coding sequences were predicted in one version of the assembly but not in another, including lineage specific marker genes used by CheckM for completeness scoring. A high number of contaminant and heterogenic markers were observed in the legacy Illumina data assemblies; however, this was significantly reduced using the BlobTools pipeline (Fig. 1 and Fig. 2). BLAST annotation of lineage specific marker genes returned by CheckM within raw and cleaned Illumina assemblies returned 73% and 74% of markers aligning to Pelosinus spp. with an average identity of 93% and 92% respectively. Clostridium spp. returned as the best hit in 14% of markers in both Illumina assemblies with an average identity of 89%. Although the genome survey sequence (GSS) also scored highly for contamination, no high scoring BLAST hits indicating specific identifiable contaminants were observable.

Contamination and heterogeneity were consistently lower in PacBio only assemblies, while completeness was typically higher, except for raw Illumina assemblies whose completeness score was inflated by contaminant markers. Hybrid assembly of the raw or BlobTools cleaned Illumina reads with initial SMRT cell long reads offered a slight improvement on either Illumina assembly but a significant decrease in overall quality relative to the same PacBio data assembled alone.

Comparison with existing published genomic sequences revealed high identity alignment with our PacBio assembly (Fig. 3a), although the coverage and length of alignments was often limited (Fig. 3b). Of the 2.4Mbp GSS [17], 0.48Mbp aligned with our genome with 98.5% identity. Legacy Illumina data, which had been restricted to firmicute contigs using the BlobTools pipeline, aligned with 99.4% identity to 0.77Mbp of our assembly. In contrast, 1.97Mbp of the legacy Illumina assembly aligned with 95% identity to the Pelosinus fermentans genome. ANIm alignment of the published P. nishizawae contigs covered only 286bp of both the P. penetrans PacBio assembly and GSS sequences, with 88.5% identity.
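The N50 values quoted for the assemblies follow the standard definition: the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of that calculation is given below. The function name `n50` and the example contig lengths are illustrative assumptions and are not taken from the assemblies described here.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths for illustration only (not real data).
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

Because N50 depends only on the distribution of contig lengths, it says nothing about correctness or contamination, which is why it is reported here alongside the CheckM completeness and contamination scores.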

Multiple marker gene phylogenetic analysis places P. penetrans within the Bacilli. Furthermore, within the Bacilli P. penetrans is most closely related to Thermoactinomycetae (Fig. 4).

Pasteuria penetrans contained the most unique clusters both in absolute and relative terms compared to firmicute genomes included in our analysis (Fig. 5a). Sporulation associated clusters showed much higher conservation (Fig. 5b). Pasteuria penetrans contained predicted proteins which clustered with Spo0F, Spo0B, and Spo0A from Bacillus species. Spo0A and Spo0F were also annotated by BlastKOALA; Spo0B was not. No SinI or SinR domain containing proteins were predicted from P. penetrans.

Of 3511 unique P. penetrans protein clusters 136 were annotated with transposase domains, 15 with collagen triple helix domains, and 3223 were not annotated. An additional two transposase protein clusters were shared by P. penetrans, B. thuringiensis, and T. vulgaris, giving a total transposase cluster count of 138 in P. penetrans. The total number of transposase annotated clusters was 163 across all predicted proteomes. One P. penetrans protein functionally annotated with a collagen triple helix repeat clustered with six proteins of B. thuringiensis and three proteins of C. difficile.

Metabolic modelling
Pasteuria penetrans showed a reduced metabolism relative to Thermoactinomyces vulgaris (Fig. 6), returning 755 KEGG orthologues compared to 1871, representing a relative reduction of 59.6% in components of well characterised pathways. The reduction of P. penetrans genome size is approximately 30% relative to T. vulgaris.

When compared to the plant parasitic nematode symbiont Xiphinematobacter spp. and the Wolbachia symbiont of the filarial parasite Brugia malayi (wBm), P. penetrans showed a comparative reduction in pathways, these returning 572 and 545 KEGG orthologues respectively.

Pasteuria penetrans appears to possess a complete fatty acid biosynthesis pathway, although it lacks the fatty acid degradation pathway in its entirety. Both wBm and Xiphinematobacter spp. also lack this pathway. Enzymes involved in glycolysis are absent up to and including the conversion of alpha-D-glucose 6-phosphate to beta-D-fructose 6-phosphate. Similarly, the pentose phosphate pathway includes no glucose processing enzymes, appearing to begin at beta-D-fructose 6-phosphate and/or D-ribulose 5-phosphate. Pasteuria penetrans also possesses a partial chitin degradation pathway capable of degrading chitin to chitobiose and N-acetyl-D-glucosamine.

Synthesis pathways for a significant majority of amino acids are absent, except for aspartate and glutamate. Conversion of glycine to serine and vice versa is predicted due to the presence of glyA. The lysine biosynthesis pathway proceeds only as far as meso-diaminopimelate, which feeds directly into a complete peptidoglycan synthesis pathway. Purine and pyrimidine biosynthesis pathways are present but appear to be peripherally reduced. Several predicted proteases are also present.

ABC transporters carrying zinc, iron (II), manganese, phosphate, and branched chain amino acids are present. An additional nucleotide binding ABC transporter implicated in cell division and/or salt transport is also present. One component of an iron complex transporter (FhuD) is predicted. From this model, isoprenoid biosynthesis appears to proceed following the non-mevalonate pathway. No pathways for the biosynthesis of siderophores were predicted from this assembly. None of the components of a flagellar assembly were observed.

Sec-SRP and Twin arginine targeting (TAT) secretion pathways are predicted from KEGG orthologies. We did not find evidence of orthologues to characterised toxins or virulence factors in the P. penetrans genome.

A complete pathway for prokaryotic homologous recombination is predicted in our assembly. Base excision and mismatch repair machinery also appears to be intact. Competence related proteins ComEA and ComEC are predicted from KEGG orthologues. KinFin analysis also returned a putative P. penetrans orthologue for ComEA as well as predicted proteins which clustered with competence related proteins CinA and MecA from related firmicutes.

however, as this data has not been made available it is not possible to evaluate this assembly directly.
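As a check on the orthologue comparison in the metabolic modelling results, the 59.6% figure can be reproduced from the reported KEGG orthologue counts. The sketch below is illustrative only: the function name `relative_reduction` and the dictionary layout are assumptions, while the counts themselves are those given in the text.

```python
def relative_reduction(count, reference_count):
    """Percentage by which `count` falls short of `reference_count`."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return 100.0 * (1 - count / reference_count)


# KEGG orthologue counts as reported in the text.
kegg_orthologues = {
    "P. penetrans": 755,
    "T. vulgaris": 1871,
    "Xiphinematobacter spp.": 572,
    "wBm": 545,
}

# Reduction of P. penetrans relative to its free-living relative.
reduction_vs_t_vulgaris = relative_reduction(
    kegg_orthologues["P. penetrans"], kegg_orthologues["T. vulgaris"]
)
print(f"{reduction_vs_t_vulgaris:.1f}%")
```

The same function can be applied to the Xiphinematobacter spp. and wBm counts to place all three host-associated bacteria on one scale relative to T. vulgaris.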

Characterisation of collagenous fibres
Our assembly is small with reference to free living bacilli but large in comparison to other bacteria obligately associated with nematodes such as wBm (~1.1Mbp) [52] and Xiphinematobacter spp. (~0.9Mbp) [53]. The completeness score of our assembly was high at 86% based on lineage specific marker genes. Notably, the same lineage specific markers were not predicted in PacBio assemblies at varying levels of coverage. This may indicate the interference of sequencing or amplification errors in gene prediction.
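The statement that the 2.64Mbp assembly represents an estimated 86% of the complete sequence implies a rough figure for the full genome length. The sketch below shows that arithmetic under a strong simplifying assumption, namely that the marker gene completeness score is directly proportional to the fraction of the genome recovered. The function name `estimate_full_length` is hypothetical and the result should be read as an approximation only.

```python
def estimate_full_length(assembled_mbp, completeness):
    """Estimate full genome length (Mbp) from assembly size and completeness.

    Assumes completeness (a fraction between 0 and 1) scales linearly
    with the proportion of the genome present in the assembly.
    """
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in the interval (0, 1]")
    return assembled_mbp / completeness


assembled_mbp = 2.64   # assembly size reported in the text
completeness = 0.86    # completeness score reported in the text

estimated_full_mbp = estimate_full_length(assembled_mbp, completeness)
estimated_missing_mbp = estimated_full_mbp - assembled_mbp
print(f"Estimated full length: {estimated_full_mbp:.2f} Mbp")
print(f"Estimated missing sequence: {estimated_missing_mbp:.2f} Mbp")
```

Since the same markers were not predicted consistently across PacBio assemblies, the completeness score itself carries uncertainty, and any estimate derived from it inherits that uncertainty.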

Maximum likelihood phylogenetic analysis of core genes (Fig. 3)

Along with this set of BclA-like collagens, CotE is also predicted from our assembly, where it is possible that it might similarly be involved in a multi-component attachment process. The CotE protein is also thought to be involved in the colonisation of the gut by C. difficile through C-terminal binding and degradation of mucins [87]. Glycosylated mucins on the nematode cuticle are implicated as the target in the 'Velcro' model of attachment [12].

notable as it has been suggested that electrostatic interactions may play an initial role in the attachment process; the electrostatic potential of P. penetrans endospores having previously been characterised as negative [93]. The predicted collagens in this assembly match very well with our expectations of the molecular components of attachment based on experimental evidence to date. However, further work is required to evaluate their role in this process, and to evaluate the Velcro-like attachment model.

Table 1: BclA/C1q-like collagens identified in the P. penetrans genome with net charge from pdb2pqr, Sec/TM domain prediction from PREDTAT, and predicted binding sites in the globular C-terminal domains from the RaptorX server.

Figure 7: Predicted structure of four BclA-like attachment candidate proteins recovered from the Pasteuria penetrans genome. Molecular structure left and corresponding electrostatic surface potential right. Protein structure was modelled in the RaptorX server and electrostatic potential was calculated using the pdb2pqr server and APBS (v1.5). Images were produced using the NGL viewer.