Abstract
Nonribosomal peptides are chemically and functionally diverse natural products with important applications in medicine and agriculture. Bacterial and fungal genomes contain thousands of nonribosomal peptide biosynthetic gene clusters (BGCs) of unknown function, providing a promising resource for peptide discovery. Core structural features of such peptides can be inferred by predicting the substrate(s) of adenylation (A) domains in nonribosomal peptide synthetases (NRPSs). However, existing approaches to A domain prediction rely on limited datasets and often struggle with domains that select large substrates or that come from less-studied taxa. Here, we systematically curate and computationally analyse 3,254 A domains and present two new high-accuracy specificity predictors, PARAS and PARASECT. Using PARAS, we identified a new type of A domain with unusually high L-tryptophan specificity, and intact protein mass spectrometry of the corresponding NRPS showed that it directs the production of tryptopeptin-related metabolites in Streptomyces species. Together, these technologies will accelerate the characterisation of novel NRPSs and their metabolic products.
PARAS and PARASECT are available at https://paras.bioinformatics.nl.
Competing Interest Statement
G.L.C. is a non-executive director, consultant and shareholder of ErebaGen Ltd. M.H.M. is a member of the scientific advisory board of Hexagon Bio.