Abstract
Background Lanthipeptides belong to the ribosomally synthesized and post-translationally modified peptide group of natural products and display a variety of biological activities, ranging from antibiotic to antinociceptive. These peptides are cyclized through thioether crosslinks and can bear other secondary post-translational modifications. Lanthipeptide biosynthetic gene clusters can be identified by the presence of characteristic enzymes involved in the post-translational modification of these peptides. Locating the precursor peptides encoded within these clusters, however, is challenging because of their short length and high sequence variability, which limits the high-throughput exploration of lanthipeptide precursor peptides. To address this challenge, we enhanced the predictive capabilities of Rapid ORF Description & Evaluation Online (RODEO) to identify all known classes of lanthipeptides.
Results Using RODEO, we mined over 100,000 bacterial and archaeal genomes in the RefSeq database and identified nearly 8,500 lanthipeptide precursor peptides. These precursor peptides were found in a broad range of bacterial phyla as well as in the Euryarchaeota phylum of archaea. Bacteroidetes were found to encode a large number of these biosynthetic gene clusters, despite making up a relatively small portion of the genomes in this dataset. While a number of these precursor peptides are similar to those of previously characterized lanthipeptides, even more are not, including potential antibiotics. Additionally, examination of the biosynthetic gene clusters revealed that enzymes that install secondary post-translational modifications are more widespread than initially thought.
Conclusion Lanthipeptide biosynthetic gene clusters are more widely distributed and the precursor peptides encoded within these clusters are more diverse than previously appreciated, demonstrating that the lanthipeptide sequence-function space remains largely underexplored.
Footnotes
List of Abbreviations
- BGC: biosynthetic gene cluster
- HMM: hidden Markov model
- LanA: lanthipeptide precursor peptide
- LanB: class I lanthipeptide dehydratase
- LanC: class I lanthipeptide cyclase
- LanKC: class III lanthipeptide synthetase
- LanL: class IV lanthipeptide synthetase
- LanM: class II lanthipeptide synthetase
- MEME: Multiple EM for Motif Elicitation
- ORF: open reading frame
- Pfam: protein family
- RODEO: Rapid ORF Description & Evaluation Online
- RiPP: ribosomally synthesized and post-translationally modified peptide
- SVM: support vector machine