Exploration of polyyne biosynthetic gene cluster diversity in bacteria leads to the discovery of the Pseudomonas polyyne protegencin

Natural products that possess alkyne or polyyne moieties have been isolated from a variety of biological sources. In bacteria their biosynthesis has been defined; however, the distribution of polyyne biosynthetic gene clusters (BGCs), and their evolutionary relationship to alkyne biosynthesis, have not been addressed. We explored the distribution of alkyne biosynthesis gene cassettes throughout bacteria, revealing evidence of multiple horizontal gene transfer events. Investigating the evolutionary connection between alkyne and polyyne biosynthesis identified a monophyletic clade possessing a conserved seven-gene cassette for polyyne biosynthesis. Mapping the diversity of these conserved genes revealed a phylogenetic clade representing a polyyne BGC in Pseudomonas, pgn, and subsequent pathway mutagenesis and analytical chemistry characterised the associated metabolite, protegencin. In addition to unifying and expanding our knowledge of polyyne diversity, our results show that alkyne and polyyne biosynthetic gene clusters are promiscuous within bacteria. Systematic mapping of conserved biosynthetic genes across bacterial genomic diversity proved to be a successful method for discovering natural products.

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 6, 2021. ; https://doi.org/10.1101/2021.03.05.433886 doi: bioRxiv preprint

Detection of biosynthetic gene clusters, phylogenetic, and phylogenomic analyses
A BLASTp [20] search of NCBI genomes, excluding Burkholderia (taxid:32008), and of a local database of Burkholderia assemblies (downloaded genomes and genomes assembled from publicly available Illumina read data) was performed with the cepacin homologue (CcnK [4]) of the desaturase JamB as the query. The top 5,000 genus and species hits from NCBI were de-replicated, and their associated genomes were downloaded and combined with the local collection. The flanking 30 kbp of protein hits with an E-value less than 1.00e-50 was extracted, and encoded protein domains were predicted using InterProScan v5.38-76.0 [21]. Each sequence was screened for three domains corresponding to a fatty acyl-AMP ligase (IPR040097), a fatty acid desaturase (IPR005804), and an acyl carrier protein (IPR009081). The presence of these three homologues was considered evidence of alkyne biosynthesis potential. These sequence fragments were further screened via BLASTp for four additional protein homologues (two desaturases, a thioesterase, and a rubredoxin) to determine the potential for polyyne biosynthesis. Protein and nucleotide alignments were generated using MAFFT v7.455 [22], with the exception of core-gene alignments, which were generated with Roary v3.13 [...]
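The two-stage screen described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the record type `FlankingRegion`, the function `classify`, and the labels used for the four additional homologues are hypothetical names introduced for illustration, and the sketch assumes that the InterProScan and BLASTp results have already been parsed into sets.

```python
from dataclasses import dataclass, field

# InterPro accessions named in the text for the alkyne cassette.
ALKYNE_DOMAINS = {
    "IPR040097",  # fatty acyl-AMP ligase
    "IPR005804",  # fatty acid desaturase
    "IPR009081",  # acyl carrier protein
}

# Hypothetical labels for the four additional homologues detected by BLASTp.
POLYYNE_EXTRAS = {"desaturase_2", "desaturase_3", "thioesterase", "rubredoxin"}

# Hits must have an E-value strictly below this cutoff (from the text).
E_VALUE_CUTOFF = 1e-50


@dataclass
class FlankingRegion:
    """Hypothetical record: one BLASTp hit plus its extracted flanking region."""
    hit_id: str
    e_value: float
    interpro_domains: set = field(default_factory=set)
    extra_homologues: set = field(default_factory=set)


def classify(region: FlankingRegion) -> str:
    """Assign a region to 'excluded', 'no_cassette', 'alkyne' or 'polyyne'."""
    if region.e_value >= E_VALUE_CUTOFF:
        return "excluded"  # hit too weak to be considered
    if not ALKYNE_DOMAINS <= region.interpro_domains:
        return "no_cassette"  # at least one of the three domains is missing
    if POLYYNE_EXTRAS <= region.extra_homologues:
        return "polyyne"  # all seven conserved genes are present
    return "alkyne"  # three-domain cassette only
```

Under these assumptions, a region carrying all three InterPro domains but only some of the additional homologues is labelled "alkyne", which is where BGCs split across contigs would initially fall before manual inspection.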

Mutagenesis of polyyne biosynthetic cluster
A range of in-frame, gene replacement, and insertional inactivation mutants were constructed in P. protegens and T. caryophylli (Table S1) [...] were grown in LB medium at 30 °C overnight, and then inoculated onto PEM agar plates. After incubation at 22 °C for 3 d, the cells were removed and the medium from a single plate was cut into small pieces and extracted with 4 ml of ethyl acetate (EtOAc) for 2 h, followed by rotary evaporation and re-dissolving in 1 ml of 50% acetonitrile in water. The crude extracts were centrifuged to remove debris and then analysed by UHPLC-ESI-Q-TOF-MS. UHPLC-ESI-Q-TOF-MS analyses were performed using a Dionex UltiMate 3000 UHPLC connected to a Zorbax Eclipse Plus C-18 column (100 × 2.1 mm, 1.8 μm) coupled to a Bruker Compact mass spectrometer. Mobile phases consisted of water and acetonitrile (MeCN), each supplemented with 0.1% formic acid. After 5 min of isocratic elution at 5% MeCN, a gradient of 5% to 100% MeCN over 12 min was applied at a flow rate of 0.2 ml min−1, followed by a 5 min hold and a return to initial conditions within 3 min. The mass spectrometer was operated in positive-ion or negative-ion mode with a scan range of 50-3,000 m/z. Source conditions were: end-plate offset at −500 V, capillary at −4,500 V, nebulizer gas (N2) at 1.6 bar, dry gas (N2) at 8 l min−1 and dry [...]
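The elution programme described above can be written as a piecewise function of run time. The following Python sketch is illustrative only: the function name `mecn_percent` is hypothetical, the 5 min hold is assumed to be at 100% MeCN, and the 3 min return to initial conditions is assumed to be linear, neither of which is stated explicitly in the text.

```python
def mecn_percent(t_min: float) -> float:
    """Return the % MeCN in the mobile phase at t_min minutes into the run.

    Segments (times in minutes):
      0-5    isocratic at 5% MeCN
      5-17   linear gradient from 5% to 100% MeCN (12 min)
      17-22  hold (assumed to be at 100% MeCN)
      22-25  return to 5% MeCN (assumed to be linear)
    """
    if t_min < 0 or t_min > 25:
        raise ValueError("time lies outside the 25 min run")
    if t_min <= 5:
        return 5.0
    if t_min <= 17:
        return 5.0 + 95.0 * (t_min - 5) / 12
    if t_min <= 22:
        return 100.0
    return 100.0 - 95.0 * (t_min - 22) / 3
```

Under these assumptions the full programme lasts 25 min, and the midpoint of the gradient (11 min) corresponds to 52.5% MeCN.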

Distribution of alkyne biosynthesis and emergence of polyyne biosynthesis
Phylogenetic trees based on 4,990 representatives of the alkyne biosynthetic cassette proteins (fatty acyl-AMP ligase JamA, fatty acid desaturase JamB, and acyl carrier protein JamC) were constructed to assess the distribution of alkyne biosynthesis in bacteria (Fig. 2). All three phylogenies possessed the same broad topological structure, with most variation occurring within the central deep branches (Fig. 2). Phylogenies were also constructed based on the jamABC nucleotide sequences, which exhibited the same topological pattern as the protein-based phylogenies (Fig. S1).

The ability to synthesise the alkyne moiety was widely distributed across Proteobacteria compared to other phyla, with representatives in most major phylogenetic clades, potentially representing multiple acquisition events into the phylum (Fig. 2 and Fig. S2). Within the Proteobacteria the alkyne biosynthesis cassette was predominantly found in Betaproteobacteria, but also included representatives of the classes Alpha-, Delta- and Gammaproteobacteria (Fig. S2) [...] caryoynencin, and ergoynes, identified seven common genes (Fig. 3). In addition to the three genes of the alkyne biosynthetic cassette, jamABC [13], genes encoding two additional fatty acid desaturases, a thioesterase, and a rubredoxin were shared between the BGCs (Fig. 3). Using this knowledge of conserved genes, we screened the flanking DNA sequence of the alkyne jamABC cassettes for the presence of the remaining four genes. This revealed a monophyletic clade in the alkyne phylogenies (Fig. 2 and Fig. S2) where the 779 corresponding genomes possessed the conserved polyyne gene cassette (Fig. 3), with a few exceptions. Three discrepancies were observed within the monophyletic polyyne clade. First, B. gladioli strain 3848s-5 and three Streptomyces strains appeared to lack thioesterase and rubredoxin genes co-localised with the remaining polyyne core biosynthetic genes, but manual inspection of these genomes revealed that the BGCs were split across two contigs. Second, a subset of 10 Actinobacteria genomes appeared to have replaced the thioesterase- and rubredoxin-encoding genes with a gene encoding a cytochrome P450 protein. These 10 genomes represented three genera and were confined to a single sub-clade in the monophyletic polyyne clade. The final discrepancy included two representatives of the family Mycobacteriaceae that lacked the rubredoxin gene.

To investigate the diversity of the monophyletic clade, a separate phylogeny was constructed based on one of the polyyne-associated desaturase proteins (Fig. 4). This phylogeny was rooted using the basal branches of the clade of interest from both JamA and JamB phylogenies (Fig. 2): a Gammaproteobacteria sub-clade and a Betaproteobacteria sub-clade. Within the resulting phylogeny we defined five major clades: three Betaproteobacteria clades, one Gammaproteobacteria clade, and an Actinobacteria clade (Fig. 4). Each of the four previously characterised polyynes corresponded to a different clade, with collimonins, caryoynencin and cepacins localised to the three distinct Betaproteobacteria clades (Fig. 4). The ergoynes, synthesised by G. sunshinyii, were in the Gammaproteobacteria clade, but with deep branching separating G. sunshinyii from the remainder of the clade members (Fig. 4). Each Proteobacteria clade was dominated by a single genus and mainly structured with relatively shallow branching. In comparison, the Actinobacteria clade possessed deep branching and contained representatives of seven genera including Micromonospora, Actinomadura, and Rhodococcus, but was dominated by Streptomyces species. This analysis identified the cepacin [...]

As such, we sought to investigate the potential of an uncharacterised bacterial polyyne in Pseudomonas (Fig. 5a), focussing on Pseudomonas protegens (formerly P. fluorescens) strains Pf-5 and CHA0 as our model bacteria (Table S1). HPLC analysis of these two strains revealed [...] protegencin was isolated as a brownish, amorphous powder. Its 1H, 13C, COSY, HSQC, and HMBC spectra were acquired in DMSO-d6 (Table S2 and Fig. S4c-g). The 1H NMR spectroscopic data displayed two olefinic protons (δH 6.65, 1H, dt, J = 16.0, 6.5, H-9; δH 5.79, 1H, d, J = 16.0, H-10), a methine proton (δH 4.06, 1H, H-18), and seven pairs of methylene protons. The 13C NMR and HSQC spectroscopic data (Table S2) [...] COSY and HMBC spectroscopic data analysis (Fig. S4f-g). The HMBC correlations of H-9/C-11, C-8, and C-7, along with the couplings of H-10/C-9, C-11, C-12, C-8, and C-13, confirmed a double bond located at C-9/C-10 next to the polyyne scaffold, as observed in caryoynencin. The double bond at C-7/C-8 and the hydroxyl group at C-6 in caryoynencin were missing in protegencin, which was clarified by HMBC correlations from a methylene (H2-8) to two methine carbons (C-9 and C-10) and two methylene carbons (C-6 and C-7), and from a methylene (H2-4) to two methylene carbons (C-6 and C-5), as well as COSY couplings of H2-8/H-9 and H2-7. The other COSY correlations of H2-3/H2-4 and H2-2, and of H2-4/H2-5, together with HMBC correlations of H-2/C-1, C-3, and C-4, and of H-3/C-1, C-2, C-4, and C-5, confirmed the structure of the saturated region of this metabolite. Therefore, the structure of protegencin was elucidated as shown in Fig. 5c as a novel polyyne natural product.

Distribution of protegencin (pgn) BGC within Pseudomonas
Following the discovery of the previously uncharacterised polyyne metabolite, protegencin, we sought to fully understand the species distribution of the pgn locus. The Pseudomonas branches of the Gammaproteobacteria clade represented 67 Pseudomonas genomes. Subsequent average nucleotide identity (ANI) analysis of these genomes indicated the [...] protegens and Pseudomonas asturiensis, and four unnamed species. The relatedness of these species to one another is highlighted in the core-gene-based phylogeny (Fig. 5d).
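How pairwise ANI values can be used to place genomes into species-level groups is sketched below in Python. This is a hedged illustration and not the procedure used here: the function name `group_by_ani` is hypothetical, the 95% ANI threshold is a commonly used species boundary that is assumed rather than taken from this study, and single-linkage grouping is one of several possible clustering choices.

```python
from itertools import combinations


def group_by_ani(genomes, ani, threshold=95.0):
    """Group genomes by single linkage on pairwise ANI (%).

    genomes:   list of genome identifiers
    ani:       dict mapping (genome_a, genome_b) to an ANI value in percent
    threshold: assumed species boundary; 95% is a common convention
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values(), key=lambda members: members[0])
```

Each returned list holds genomes linked, directly or through intermediates, by ANI values at or above the threshold; pairs without an ANI value are treated as unlinked.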

A conserved desaturase triad is essential for polyyne formation
The high conservation of the three desaturase genes and the thioesterase gene across all orthologous polyyne BGCs is remarkable (Fig. 3). To identify their roles, we performed targeted gene replacements. Specifically, we individually replaced the desaturase and thioesterase genes with a kanamycin resistance cassette in the P. protegens pgn BGC and with an apramycin resistance cassette in the T. caryophylli cay BGC (Fig. 5e and Fig. S5). Sequence analyses indicated that pairs of desaturase genes (pgnE/cayB and pgnF/cayC) would have similar functions. The deduced gene product of pgnH is a didomain enzyme with desaturase and thioesterase functions, corresponding to cayE and cayF, respectively. The metabolic profiles of the mutant strains were compared by HPLC (220-400 nm) with those of the wild type strains, with or without the empty pGL42a or pJET1.2/blunt vector used during mutagenesis (Fig. 5e and Fig. S5). Whereas P. protegens Pf-5 (with or without the empty vector) produces protegencin, no polyyne precursor could be identified in the ΔpgnE-KanR, ΔpgnF-KanR, and ΔpgnH-KanR mutant strains (Fig. 5e). Deletions of the desaturase genes cayB, cayC, and cayE, and the thioesterase gene cayF in T. caryophylli abolished the production of caryoynencin. The wild type (with or without an empty vector) generates the 7E/Z-isomers of caryoynencin, but the mutant strains (ΔcayB-AprR, ΔcayC-AprR, ΔcayE-AprR, and ΔcayF-AprR) produce neither polyynes nor pathway intermediates (Fig. S5). These data indicate that the three desaturases and the thioesterase synergise in the production of polyynes. Interestingly, the same multienzyme system that gives rise to a tetrayne in the protegencin and caryoynencin BGCs appears to form three triple bonds in the collimonin pathway, and two triple bonds and an allene moiety in the cepacin pathway (Fig. 1).

Highly transmissible alkyne and polyyne cassettes
Our results identify a single point of emergence of polyyne biosynthesis within bacteria and demarcate its evolution from alkyne biosynthesis (Fig. 2). The basal positioning of Proteobacteria within the polyyne phylogeny hints at a potential origin of the biosynthetic ability (Fig. 4).

Evidence of uncharacterised polyyne in P. protegens
We identified and characterised a novel Pseudomonas polyyne metabolite produced by the widely studied P. protegens strains Pf-5 and CHA0 (Table S1). P. protegens (formerly P. [...] (Fig. 2), and the metabolic product was not identified.

ΔpgnD mutant (bottom). c) Structure of protegencin, the identity of which was confirmed by a combination of high resolution mass spectrometry and NMR spectroscopy (see Table S2 and Fig. S4a-g). d) Core [...]