1. Abstract
Cacaoidin is the first member of the new lanthidin RiPP family, a lanthipeptide produced by the strain Streptomyces cacaoi CA-170360 with unprecedented features such as an unusually high number of D-amino acids, a double methylation in the N-terminal alanine and a tyrosine residue glycosylated with a disaccharide. In this work, we describe the complete identification, cloning and heterologous expression of the cacaoidin biosynthetic gene cluster, which shows unique RiPP genes.
2. Introduction
Actinomycetes are an extremely diverse group of Gram-positive, filamentous bacteria with high GC content genomes (1), considered one of the most prolific sources for the discovery of new natural products (2,3). Among all the actinomycetes, the genus Streptomyces produces 70-80% of the secondary metabolites with described therapeutic properties (4).
The increasing number of sequenced genomes has revealed that actinomycetes carry the genetic potential to produce many more secondary metabolites than those detected under laboratory conditions (5). The development of bioinformatic resources to identify new secondary metabolite Biosynthetic Gene Clusters (BGCs), such as antiSMASH (6) or MIBiG (7), has permitted the development of targeted genome mining strategies directed at specific families of compounds (8).
Ribosomally synthesized and Post-translationally modified Peptides (RiPPs) are a group of secondary metabolites with a large structural diversity. Most of these compounds are synthesized as a longer precursor peptide, containing an N-terminal leader peptide that usually guides secretion and is excised from the C-terminal core peptide, which finally becomes the mature RiPP (9) after undergoing a broad diversity of post-translational modifications (PTMs).
We have recently described the discovery of the antibiotic cacaoidin, the first reported member of the lanthidins, a new RiPP family with unprecedented structural characteristics not found in other lanthipeptides (10) (Figure 1). This 23-amino acid molecule, produced by Streptomyces cacaoi CA-170360, contains several PTMs, some of them shared with other lanthipeptides. Cacaoidin presents a C-terminal S-[(Z)-2-aminovinyl-3-methyl]-D-cysteine (AviMeCys) residue, formed by oxidative decarboxylation of the C-terminal cysteine. AviMeCys is formed both in lanthipeptides and linaridins (11) by LanD (12) or LinD (13) enzymes, respectively, which belong to the HFCD (Homo-oligomeric Flavin-containing Cysteine Decarboxylase) protein family (12). Cacaoidin also shows a lanthionine (Lan) ring, another characteristic of lanthipeptides (14). These thioether cross-links involve the dehydration of Ser and Thr residues to 2,3-didehydroalanine (Dha) and 2,3-didehydrobutyrine (Dhb), respectively, followed by the addition of the Cys thiol to the unsaturated amino acid. None of the linaridins described to date contains lanthionine bridges, although some show Dhb; however, no obvious homologues of lanthipeptide dehydratases are present in their BGCs (15). Cacaoidin presents an unprecedented N,N-dimethyl lanthionine (NMe2Lan); N-terminal N,N-dimethylation is typical of linaridins, a RiPP family that lacks lanthionines. The N,N-dimethylation is introduced by α-N-methyltransferases homologous to CypM (16, 17), which are not found in lanthipeptide BGCs. These structural features, common to both lanthipeptides and linaridins, support the proposal of cacaoidin as the first reported member of the lanthidins (10).
Structure of cacaoidin
Cacaoidin also presents other unusual structural features, such as a high number of D-amino acids, including D-Abu, and an O-glycosylated tyrosine residue carrying a previously unreported disaccharide formed by α-L-rhamnose and β-L-6-deoxy-gulose. Cacaoidin shows potent antibacterial activity against MRSA (methicillin-resistant Staphylococcus aureus) (MIC 0.5 µg/mL) and moderate activity against a clinical isolate of Clostridium difficile (MIC 4 µg/mL) (10).
In this work, we present the identification and analysis of the cacaoidin BGC from the genome of Streptomyces cacaoi CA-170360, showing its distinct gene cluster organization. We show that the cacaoidin BGC contains all the genes required for biosynthesis of the antibiotic, which was successfully produced by heterologous expression.
3. Results and discussion
3.1. Sequencing of S. cacaoi CA-170360 genome and identification of cacaoidin BGC
The CA-170360 genome sequence was obtained with a combination of PacBio and Illumina approaches. De novo PacBio sequencing of the CA-170360 genome provided 2 contigs of 5,971,081 bp and 2,704,105 bp. These were used as a reference to map the 163 contigs obtained through Illumina sequencing, which in turn were used to correct PacBio frameshifts caused by the high GC content (73.1%).
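As a quick consistency check on the assembly figures above, the two PacBio contig lengths can be summed and a GC percentage computed from sequence. The following is a minimal Python sketch; the function names and the toy sequence are illustrative assumptions and are not part of the pipeline used in this work.

```python
# Contig lengths (bp) reported for the de novo PacBio assembly.
CONTIG_LENGTHS = [5_971_081, 2_704_105]


def assembly_size(lengths):
    """Total assembly size in bp."""
    return sum(lengths)


def gc_percent(seq):
    """GC content (%) over unambiguous A/C/G/T positions only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


print(assembly_size(CONTIG_LENGTHS))  # 8675186

# Hypothetical toy sequence, used only to exercise gc_percent().
toy = "GGCCAT"
print(round(gc_percent(toy), 1))
```

On a real assembly, `gc_percent` would be applied to the concatenated contig sequences read from the FASTA file.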
In order to identify the BGC responsible for the production of cacaoidin, the genome was analyzed with antiSMASH (6), BAGEL4 (18) and PRISM (19). Many BGCs were predicted, but none of these bioinformatic tools could predict the BGC responsible for cacaoidin biosynthesis, suggesting that the discovery of novel bioactive NPs by genome mining is still a challenge.
The C-terminal sequence of cacaoidin (Thr-Ala-Ser-Trp-Gly-Cys) was used as the query in a tBLASTn search against the whole genome sequence to find the gene encoding this peptide. A 162 bp Open Reading Frame (ORF) was found, which helped to elucidate the final structure of the peptide (10). The cacaoidin structural gene caoA encodes a 23-amino acid C-terminal core peptide (SSAPCTIYASVSASISATASWGC) following a predicted 30-amino acid N-terminal leader peptide (MGEVVEMVAGFDTYADVEELNQIAVGEAPE). Neither the leader nor the core peptide sequence of CaoA showed high sequence similarity with any other lanthipeptide or linaridin (Supporting Figure 1).
Considering the final structure of cacaoidin (10) (Figure 1) and the BLAST analysis of the ORFs located up- and downstream of caoA, we identified a putative 30 kb BGC (cao cluster) containing 27 ORFs that were associated with the biosynthesis (Figure 2, Supporting Table 2, Supporting Table 3). Interestingly, no homologues of the dehydratases or cyclases commonly found in the four current classes of lanthipeptides or in linaridins could be identified in this region.
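The search described above can be illustrated with a simplified stand-in for tBLASTn: translate the DNA in all six reading frames and look for an exact match to the peptide query. This Python sketch is not the tBLASTn algorithm (which scores inexact alignments). The toy DNA string and function names are assumptions made for illustration, while the peptide sequences are those reported above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

LEADER = "MGEVVEMVAGFDTYADVEELNQIAVGEAPE"   # predicted leader peptide
CORE = "SSAPCTIYASVSASISATASWGC"            # core peptide
QUERY = "TASWGC"                            # C-terminal query peptide


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate one frame; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_hits(dna, query):
    """Return (strand, frame, protein_position) for each exact match."""
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(query)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(query, pos + 1)
    return hits


def orf_length_bp(leader, core):
    """ORF length in bp, counting one stop codon."""
    return 3 * (len(leader) + len(core) + 1)


# Hypothetical toy DNA: two filler bases, codons for TASWGC, a stop, filler.
toy_dna = "GG" + "ACCGCCTCCTGGGGCTGCTGA" + "CC"
print(six_frame_hits(toy_dna, QUERY))
print(orf_length_bp(LEADER, CORE))  # 162
```

The last line reproduces the reported ORF size: 30 leader residues plus 23 core residues plus one stop codon give 54 codons, or 162 bp.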
Schematic representation of the BGC of cacaoidin, where caoA codes for the precursor peptide. The sequences of the leader and core peptides of cacaoidin are shown.
The BLAST analysis (Supporting Table 2) and the secondary structure prediction given by HHpred (Supporting Table 3) of each ORF led us to putatively assign them a role in the PTMs of the cacaoidin core peptide, involving the AviMeCys ring and lanthionine formation, the terminal N,N-dimethylation, the incorporation of D-amino acids, the disaccharide biosynthesis and the tyrosine glycosylation.
The BGC encodes a putative homologue of the cypemycin decarboxylase CypD (CaoD) containing a conserved phosphopantothenoylcysteine (PPC) synthetase/decarboxylase domain. CaoD has little sequence similarity with CypD and LanD enzymes, both of which belong to the HFCD protein family and catalyze the oxidative decarboxylation of the C-terminal cysteine residue in the presence of a flavin cofactor (20, 21). The presence of the PPC domain supports a potential role in the oxidative decarboxylation and, consequently, it can be postulated that CaoD may be involved in the formation of the AviMeCys ring.
The formation of lanthionine rings is accomplished by different dehydratases and cyclases depending on the lanthipeptide class (I-IV) (15). In class I, a dehydratase (LanB) generates the Dha and Dhb residues and a cyclase (LanC) adds the Cys thiol. In class II, a single modification enzyme (LanM) is involved that contains an N-terminal dehydratase domain and a C-terminal LanC-like domain. In classes III and IV, lanthionine rings are also produced by a single enzyme, called LanKC for class III and LanL for class IV. Both enzymes show an N-terminal phospho-Ser/phospho-Thr lyase domain, a central kinase-like domain and a C-terminal cyclase domain, which contains Zn-binding ligands only in LanL (15).
Surprisingly, none of the ORFs present in the cao cluster showed any homology with LanC, LanM, LanKC or LanL proteins. The BLAST analysis of Cao7, which was identified as a hypothetical protein, showed some degree of homology with the N-terminal sequence of a LanC-like protein from Raineyella antarctica (WP_139283243.1), but Cao7 did not contain the characteristic conserved cyclase domain. Both proteins show a HopA1 conserved domain (PFAM17914), which has been described in the HopA1 effector protein from Pseudomonas syringae (22), an effector shown to directly bind the Enhanced Disease Susceptibility 1 (EDS1) complex in Arabidopsis thaliana, activating the immune response signaling pathway. Future research is still needed to determine the function of this protein, which can only be tentatively proposed as a potential new type of lanthionine synthetase.
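The class-dependent lanthionine machinery summarized above can be expressed as a small lookup table. The Python sketch below is only an illustration of that classification logic; the dictionary and function names are hypothetical, and the enzyme names are those cited in the text.

```python
# Ring-forming enzymes per lanthipeptide class, as summarized in the text.
LANTHIONINE_MACHINERY = {
    "I": ("LanB", "LanC"),   # separate dehydratase and cyclase
    "II": ("LanM",),         # bifunctional dehydratase/cyclase
    "III": ("LanKC",),       # lyase, kinase-like and cyclase domains
    "IV": ("LanL",),         # as class III, with Zn-binding ligands
}


def compatible_classes(homologues_found):
    """Classes whose complete ring-forming enzyme set is present."""
    found = set(homologues_found)
    return [cls for cls, enzymes in LANTHIONINE_MACHINERY.items()
            if set(enzymes) <= found]


# A hypothetical class I cluster with an additional transporter.
print(compatible_classes({"LanB", "LanC", "LanT"}))  # ['I']

# No recognizable ring-forming homologues, as observed for the cao cluster.
print(compatible_classes(set()))  # []
```

An empty result, as for the cao cluster, does not rule out lanthionine formation; it only indicates that none of the known enzyme sets is recognizable by homology.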
The N-terminal Ala dimethylation of cypemycin, the prototypical member of linaridins (11), is carried out by the S-adenosylmethionine (SAM)-dependent methyltransferase CypM (13, 16). No CypM homologues have been found in the genome of the producing strain. Within the cao cluster, cao4 encodes a putative O-methyltransferase containing the conserved Methyltransf_2 domain, also belonging to the family of SAM-dependent methyltransferases. Many class I lanthipeptide clusters from actinobacteria contain an O-methyltransferase, generically known as LanS. Two types of LanS enzymes have been described: LanSA, which incorporates the β-amino acid isoaspartate (23), and LanSB, which methylates the C-terminal carboxylate of a RiPP precursor (24). Cao4 shows very low homology with both types of LanS proteins. Since cacaoidin contains neither isoaspartate nor a C-terminal methylation, the role of Cao4 in the N,N-dimethylation is currently under study.
D-Amino acids provide a wide variety of properties to lanthipeptides, such as resistance to proteolysis, induction of bioactivity or structural conformation (25). However, only L-amino acids can be added by the ribosomal machinery, so D-stereocenters are introduced into lanthipeptides by modifying L-Ser and L-Thr to Dha and Dhb, which are then subjected to a diastereoselective hydrogenation to yield D-Ala and D-Abu, respectively (15, 24). This reaction is carried out by dehydrogenases generically called LanJ (15), which are divided into two classes, namely the zinc-dependent dehydrogenases (LanJA) and the flavin-dependent dehydrogenases (LanJB). LanJB is able to reduce both Dha and Dhb, whereas LanJA can only hydrogenate Dha. To date, only two LanJB enzymes have been characterized, CrnJB and BsjJB, involved in the biosynthesis of carnolysin (26) and bicereucin (27), respectively. Recently, another flavin-dependent oxidoreductase (LahJB) has been described in the putative lanthipeptide biosynthetic gene cluster lah (24).
Within the cao BGC, the protein Cao12 shows homology with LLM class flavin-dependent oxidoreductases and might be involved in the incorporation of D-amino acids.
The cacaoidin disaccharide has not been previously reported and is formed by α-L-rhamnose and β-L-6-deoxy-gulose. Four proteins are required for the synthesis of α-L-rhamnose: a glucose-1-phosphate thymidylyltransferase (RmlA), a dTDP-D-glucose 4,6-dehydratase (RmlB), a dTDP-4-keto-6-deoxy-D-glucose 3,5-epimerase (RmlC) and a dTDP-4-keto-6-deoxy-L-mannose reductase (RmlD), although the corresponding genes do not necessarily have to be clustered (Figure 3) (28). The cao BGC only contains three of the four genes: rmlA, rmlB and rmlD. Nevertheless, a BLAST search of RmlC against the CA-170360 whole genome sequence shows the presence of an rmlC gene, as well as additional rmlA, rmlB and rmlD genes, outside the cacaoidin cluster.
Schematic presentation of the biosynthesis of dTDP-L-Rhamnose from D-Glucose
Bleomycin, tallysomycin and zorbamycin are antitumor antibiotics that incorporate NDP-L-gulose or NDP-6-deoxy-L-gulose into their structures, and their biosynthesis was used as a reference to search for similar proteins encoded in the CA-170360 genome. The sugar biosynthesis in the pathways of these compounds involves four classes of enzymes (Figure 4): a NTP-sugar synthase (BlmC/TlmC/ZbmC), a sugar epimerase (BlmG/TlmG), a GDP-mannose-4,6-dehydratase (ZbmL) and a NAD-dependent sugar epimerase (ZbmG) (29). Although no homologues of these genes were found in the cacaoidin BGC, a BLAST search against the whole genome sequence of CA-170360 identified some protein homologues. These include a D-glycero-beta-D-manno-heptose 1-phosphate adenylyltransferase homologous to BlmC/TlmC/ZbmC (48% similarity); a NAD-dependent epimerase/dehydratase homologous to BlmE/TlmE (38.7% similarity); a GDP-mannose 4,6-dehydratase with 62% similarity to ZbmL; and a dTDP-glucose 4,6-dehydratase with 34% similarity to ZbmG. However, as none of these homologous proteins are encoded within the same cluster, no conclusions can be drawn about β-L-6-deoxy-gulose biosynthesis.
Proposed pathway for the β-L-6-deoxy-gulose sugar biosynthesis for the BLM, TLM and ZBM compounds.
As one of the unusual structural features of cacaoidin, the disaccharide α-L-rhamnose-β-L-6-deoxy-gulose is O-linked to the aromatic ring of the tyrosine residue. While asparagine N-glycosylation and serine, threonine or hydroxyproline O-glycosylation have been reported in many natural glycopeptides (30), the O-glycosylation of tyrosine is not common. To date, the only natural products undergoing a tyrosine O-glycosylation are the lipoglycopeptide antibiotics mannopeptimycins, produced by Streptomyces hygroscopicus, which contain an O-linked di-mannose (31). In prokaryotes, the O-glycosylation of tyrosine residues has also been reported in the S-layer of the cell envelope of Paenibacillus alvei, Thermoanaerobacter thermohydrosulfuricus and Thermoanaerobacterium thermosaccharolyticum strains. In P. alvei CCM 2051T, a polymeric branched polysaccharide is O-glycosidically linked via an adaptor to specific tyrosine residues of the S-layer protein SpaA by the O-oligosaccharyl:protein transferase WsfB (32). This protein is encoded within the slg cluster, which carries the genes necessary for the biosynthesis of this glycan chain. The cao cluster lacks a homologue of WsfB, so we cannot propose a candidate that O-glycosylates the tyrosine residue of cacaoidin.
The cao cluster contains three glycosyltransferases (GTs) (Cao8, Cao16, Cao24) belonging to two families of glycosyltransferases, GT-2 and GT-4. Cao8 and Cao16 belong to the family GT-2 and contain a GT_2_WfgS_like domain, involved in O-antigen biosynthesis. Cao8 and Cao16 show 42% identity (54% similarity) and 43% identity (52% similarity), respectively, with a UDP-Glc:alpha-D-GlcNAc-diphosphoundecaprenol beta-1,3-glucosyltransferase WfgD from Streptomyces sp. F-1, which catalyzes the addition of Glc, the second sugar moiety of the O152-antigen repeating unit, to GlcNAc-pyrophosphate-undecaprenol (33). Cao24 belongs to the family GT-4 and has a GT4_GtfA-like domain and a conserved RfaB domain, involved in cell wall and membrane biosynthesis (34).
Despite the presence of three GTs in the cacaoidin BGC, only two sugars are detected in the structure. All three GTs could be involved in the glycosylation, as has already been proposed for other clusters that contain more GT genes than sugars attached to the compound, where the GTs are proposed to work together and to be required for efficient glycosylation. The biosynthetic pathways of PM100117/PM100118 (35), saquayamycins (36) and sipanmycin (37) are some examples. On the basis of the absolute configurations of the cacaoidin sugar moieties, it has been proposed that Cao8 and Cao16 might work cooperatively to attach the α-L-rhamnose unit, while Cao24 would incorporate the β-L-6-deoxygulose unit (10).
Two rhamnosyltransferases (WsfF and WsfG) have been identified in the slg cluster of P. alvei as responsible for the attachment of L-rhamnose to the tyrosine residue. In the case of mannopeptimycins, two peptide mannosyltransferases (MppH and MppI) would O-glycosylate the tyrosine residue. However, in all these cases low homologies were found between these enzymes and the GTs present in the cacaoidin cluster. Further studies are needed to confirm the role of each GT in cacaoidin biosynthesis.
Processing of the leader peptide is another key step in post-translational modification, with an impact on producer immunity and transport. The N-terminal leader peptide plays a role in the recognition of the unmodified precursor by the post-translational modifying enzymes, in the secretion of the peptide and in keeping the modified pre-peptide inactive (38). The enzymes responsible for the removal of the leader peptide depend on the type of lanthipeptide. Class I lanthipeptides are exported by the ABC transporter LanT and their leader peptides are cleaved by the serine protease LanP (14). In class II, both secretion and cleavage are performed by a single enzyme with a conserved N-terminal cysteine protease domain, called LanTP (39).
In the cacaoidin cluster, cao14 encodes a putative Zn-dependent peptidase belonging to the M16 peptidase family that may be involved in leader peptide processing. Recently, it has been reported that the leader peptide of the class III lanthipeptide NAI-112 (40) is removed by a bifunctional Zn-dependent M1-class metalloprotease, AplP, which first cleaves the N-terminal segment of the leader peptide as an endopeptidase, and subsequently removes the remaining leader sequence through its aminopeptidase activity (41). Leader peptide removal in class III lanthipeptides does not follow a general mechanism. In labyrinthopeptins and curvopeptins, an endopeptidase is involved in the partial removal of the N-terminal segment of the leader peptide and the remaining overhang is progressively trimmed off by an additional aminopeptidase (42). In other cases, such as flavipeptin (43), a designated prolyl oligopeptidase (POP) cleaves the leader peptide of modified precursor peptides at the C-terminal side of a Pro residue, although it is not clear if a second aminopeptidase is needed to complete the leader peptide removal. Class IV lanthipeptides often lack a designated protease to cleave the leader peptide, but it has been reported that some of them might also use AplP homologs (41). When AplP and Cao14 were compared, the two proteins showed a low degree of homology (17.2% identity, 25.4% similarity). Future research will clarify if Cao14 is the cacaoidin leader peptidase and if it has a dual function as endo- and aminopeptidase.
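Percent identity and similarity values such as those quoted above are derived from a pairwise alignment. The Python sketch below shows one common way to compute them. It assumes that percentages are taken over the full alignment length (gaps included) and uses a simple, hypothetical grouping of similar residues instead of a substitution matrix; the toy alignment is also invented. The alignment program and scoring scheme behind the reported values are not stated here, so this will not reproduce them exactly.

```python
# Hypothetical groups of physicochemically similar residues (an assumption;
# alignment programs normally use a substitution matrix such as BLOSUM62).
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")


def identity_similarity(aln_a, aln_b):
    """Return (% identity, % similarity) over the alignment length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("alignment is empty")
    identical = similar = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count only in the denominator
        if a == b:
            identical += 1
            similar += 1
        elif any(a in g and b in g for g in SIMILAR_GROUPS):
            similar += 1
    n = len(aln_a)
    return 100.0 * identical / n, 100.0 * similar / n


# Toy alignment: 3 identical columns, 2 similar columns, 1 gap column.
ident, sim = identity_similarity("MKV-LD", "MRVALE")
print(round(ident, 1), round(sim, 1))  # 50.0 83.3
```

Because similarity depends on the scoring scheme, values from different tools are only comparable when the same matrix and denominator are used.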
In addition, three ABC transporters were found in the pathway (Cao11, Cao18 and Cao19) that might be responsible for the export of, and self-resistance to, cacaoidin. In addition to the active removal of the leader sequence coupled to active transport, two non-universal immunity strategies have been adopted by strains producing class I and II lanthipeptides: active transport mediated by the ABC-type transport system LanFEG, and sequestration of the mature lanthipeptide in the extracellular environment by LanI immunity proteins (44). Self-immunity mechanisms have not been studied in depth for class III and IV lanthipeptides but, as in the case of cacaoidin, it has also been proposed that ABC transporters could play a role in the self-resistance of the producer strains (45).
Gene expression in the cacaoidin cluster seems to be under the control of different classes of regulators. Five transcriptional regulators are found, including one LuxR (CaoR1), two HTH-type XRE (CaoR2 and CaoR3), one TetR (CaoR4) and one SARP (CaoR5) regulator. XRE and TetR have been described as transcriptional repressors (46, 47), while LuxR and SARP have been described as transcriptional activators (48, 49). Further studies of the regulation of lanthipeptide biosynthesis will clarify their role in the production of the antibiotic.
Among the remaining eight genes identified in the cao cluster, six of the encoded proteins (Cao7, Cao14, Cao17, Cao21, Cao25 and Cao26) do not have any defined functions. Cao9 is a phosphotransferase containing a conserved APH domain, which confers resistance to various aminoglycosides (50). It has been reported that some phosphotransferases may provide self-resistance against aminoglycosides, as shown for streptomycin 6-phosphotransferase (51) or CapP, involved in the resistance to capuramycin antibiotics (52). The role of Cao9 in the cacaoidin biosynthetic cluster is currently unknown. A protein belonging to the START/RHO_alpha_C/PITP/Bet_v1/CoxG/CalC (SRPBCC) superfamily is also present in the cluster (Cao23). SRPBCC proteins share α/β helix-grip-fold structures and have a deep hydrophobic ligand-binding pocket (53, 54). This superfamily contains aromatase/cyclase (ARO/CYC) domains of proteins such as tetracenomycin from Streptomyces glaucescens (55), and the SRPBCC domains of Streptococcus mutans Smu.440 and related proteins (56).
The HHpred analysis of each ORF was also used for the detection of RiPP precursor peptide Recognition Elements (RREs) (57). RREs are structurally similar, conserved precursor peptide-binding domains present in the majority of known prokaryotic RiPP modifying enzymes and are usually responsible for leader peptide recognition (57). These RREs are related to the small peptide chaperone PqqD, involved in the biosynthesis of pyrroloquinoline quinone (PQQ) (58), which reportedly binds to PqqA (the precursor peptide) to perform its function (59). In this analysis, we used HHpred to search for PqqD-like domains in the putative biosynthetic proteins from the cao gene cluster, including those with unknown functions. Indeed, the identification of an RRE within the protease StmE, involved in the biosynthesis of the lasso peptide streptomonomicin (STM), and within an “ocin_ThiF_like” cyclodehydratase (TOMM F) protein from TOMM (Thiazole/Oxazole-Modified Microcin) biosynthetic gene clusters, allowed functions that had not previously been proposed to be assigned to these proteins. However, no RREs were found in the Cao proteins, suggesting the possibility of alternative leader peptide recognition domains that are unrelated to the already known RREs (57). As homology detection algorithms become more accurate and more sequences become available, additional RREs may be found.
3.2. Cloning and heterologous expression of cacaoidin BGC
The strain S. cacaoi CA-170360 is recalcitrant to genetic manipulation, which limits the generation of knockdown mutants to confirm the involvement of the cao gene cluster in the biosynthesis of cacaoidin. To confirm that the cao cluster was responsible for the biosynthesis of the antibiotic, we cloned and heterologously expressed the cacaoidin BGC in the genetically amenable host Streptomyces albus J1074.
We followed the CATCH method to clone a 40 kb region containing the cao BGC into the pCAP01 vector (60), yielding pCAO. pCAO was introduced into NEB-10-beta E. coli ET12567 cells by electroporation. A triparental conjugation was carried out between E. coli ET12567/pCAO, E. coli ET12567/pUB307 and S. albus J1074 spores (61). Five positive transconjugants, alongside the negative control (S. albus J1074/pCAP01) and the wild-type strain CA-170360, were grown in R2YE for 14 days at 28°C to confirm the production of the targeted antibiotic. After acetone extraction of the cultures, the organic solvent was evaporated, and the aqueous extracts in 20% DMSO were analyzed by LC-HRESI-TOF. The analysis of the extracts from pCAO transconjugants confirmed the presence of cacaoidin, as peaks at 3.35 minutes were detected, coincident with the retention time of cacaoidin in the wild-type strain and of purified cacaoidin standards. The agreement between the UV spectrum, exact mass and isotopic distribution of cacaoidin standards and those of the components detected in the transconjugants S. albus J1074/pCAO demonstrated that they correspond to cacaoidin (Supporting Figure 2). These preliminary results confirm that the cao BGC cloned in pCAO is sufficient for the biosynthesis of cacaoidin.
3.3. Comparison with other clusters
To study if more lanthidin-encoding clusters can be found within actinomycetes, a BLAST search against the NCBI whole genome shotgun sequences database was performed, and clusters with high degree of homology to cacaoidin were found in the strains Streptomyces cacaoi subsp. cacaoi strain NRRL B-1220 (MUBL01000486), Streptomyces sp. NRRL F-5053 (JOHT01000009), Streptomyces sp. NRRL S-1868 (JOGD01000003), Streptomyces cacaoi subsp. cacaoi NBRC 12748 (BJMM01000002.1) and Streptomyces cacaoi subsp. cacaoi OABC16 (VSKT010000024) (Figure 5, Supporting Table 4). An alignment of the precursor peptide of the cacaoidin in all homologous clusters showed that no variations in the protein sequence were found (Supporting Figure 3). No other cacaoidin-related peptides or pathways were found in the databases, indicating that the cacaoidin BGC is very conserved. A phylogenetic tree generated using neighbor-joining method and corrected with the Jukes and Cantor algorithm (62, 63) showed the close relatedness of strain Streptomyces cacaoi CA-170360 with the strains that also contain the cao cluster, which was highly supported by the bootstrap values (Supporting Figure 4). Moreover, when the 16S rDNA sequences of the strains harboring the cacaoidin BGC were analyzed in EzBiocloud, all of them were identified as Streptomyces cacaoi (data not shown), indicating that the cacaoidin BGC is so far limited to this specific species, with no identifiable orthologs in other species. Several genome comparative studies have found strain-specific BGCs in some species of Streptomyces, reflecting that chemical novelty can be found at the strain level and that the analysis of the genomes of closely related strains constitutes a promising approach for the identification of novel BGCs (64, 65).
Schematic representation of the alignment of the cacaoidin BGC in Streptomyces cacaoi CA-170360 and the clusters found in NCBI with a high degree of homology. Most of them also belong to strains of Streptomyces cacaoi. A: Streptomyces cacaoi CA-170360; B: Streptomyces cacaoi NBRC 12748; C: Streptomyces sp. NRRL S-1868; D: Streptomyces sp. NRRL F-5053; E: Streptomyces cacaoi NRRL B-1220; F: Streptomyces cacaoi OABC16.
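The Jukes and Cantor correction used for the phylogenetic tree converts the observed proportion of differing sites p between two aligned sequences into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The Python sketch below illustrates that standard formula only; the function names and the toy sequences are hypothetical, and it does not reproduce the tree-building software used in this work.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among columns where both are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3), defined for p < 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned fragments differing at 1 of 10 sites.
a = "ACGTACGTAC"
b = "ACGTACGTAT"
p = p_distance(a, b)
print(p)                 # 0.1
print(jukes_cantor(p))   # slightly larger than p
```

The corrected distance is always at least as large as p, because the model accounts for multiple substitutions at the same site. A neighbor-joining tree is then built from the matrix of such pairwise distances.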
Nevertheless, the analysis of below-threshold scores of CaoA BLAST results, together with the search for HopA1 domain-containing proteins similar to Cao7, allowed us to find some pathways that could encode new lanthidins (Supporting Figure 5). The alignment of the hypothetical precursor peptides shows the presence of some conserved residues that could be involved in leader peptide recognition by biosynthetic enzymes (Supporting Figure 6). Also, the analysis of the ORFs present in all these clusters shows that all of them contain a HopA1 domain-containing protein, a LLM flavin-dependent oxidoreductase, a CypD-related protein, a Zn-dependent or S9 peptidase and a putative phosphotransferase (Supporting Figure 5). Most of these clusters also contain an O-methyltransferase. These data suggest a broader distribution of potential BGCs encoding new lanthidins. However, more lanthidin molecules will need to be described before we can conclude that a minimal set of genes is required to produce a lanthidin.
4. Conclusions
Cacaoidin is the first member of the new lanthidin RiPP family, characterized by structural features of the lanthipeptide and linaridin families, and encoded by an unprecedented RiPP BGC organization that could not be detected by any of the bioinformatic tools used. The lack of homology with common lanthionine ring formation or N-terminal dimethylation enzymes suggests an alternative mechanism of biosynthesis. The other unusual structural features of cacaoidin, such as the high number of D-amino acids or the O-glycosylation of tyrosine, are supported by the presence in the cao cluster of a homologue of LLM class flavin-dependent oxidoreductases and of three glycosyltransferases. The heterologous expression of cacaoidin has demonstrated that the cao cluster contains all the genes necessary to biosynthesize cacaoidin, and future research is needed to clarify the unassigned functions of the cao genes. The cacaoidin BGC was found in the genomes of all publicly available Streptomyces cacaoi strains and not in any other species, suggesting that this cluster may be a species-specific trait. The cacaoidin BGC has an unprecedented genetic organization, different from any other previously described RiPP cluster. Moreover, the detection of putative homologous lanthidin clusters opens the door to the study of a new and exciting family of RiPPs.
5. Material and Methods
Detailed descriptions of all procedures are provided in the Supporting Information. Primer sequences for the cacaoidin gene cluster cloning are included in Supporting Table S1. The cao BGC sequence is available in the National Center for Biotechnology Information (NCBI) database under GenBank accession number MT210103.
6. Acknowledgments
This work is supported by Novo Nordisk Foundation grant NNF16OC0021746. The authors thank Daniel Oves-Costales for helpful advice during the whole process and the Microbiology and Chemistry areas of Fundación MEDINA for technical support. We thank José Antonio Salas and the University of Oviedo for kindly providing the strains Streptomyces albus J1074 and Escherichia coli ET12567/pUB307.