Abstract
Motivation: Nonribosomal peptides (NRPs) are a class of secondary metabolites, mainly produced by bacteria and fungi, that are synthesized by multimodular enzymes called nonribosomal peptide synthetases (NRPSs). NRPs have been shown to have enormous structural and functional diversity, including antimicrobial activity; therefore, they are of increasing interest for modern biotechnology. Methods such as NMR and LC-MS/MS allow NRP structures to be determined precisely, but finding their natural producers is often not a trivial task. Today, such searches are usually performed manually, mostly with tools such as antiSMASH or PRISM. However, in some cases potential producers must be found among hundreds of strains, for instance, when analyzing metagenomic data. Thus, developing automated approaches is a high-priority task for further NRP research.
Results: We developed BioCAT, a two-sided approach for finding biosynthetic gene clusters (BGCs) that may produce a given NRP when the structure of the NRP of interest is already known. Formally, BioCAT combines the antiSMASH software and the rBAN retrosynthesis tool, with several improvements added to both the gene cluster analysis and the NRP chemical structure analysis. The main feature of the method is the use of position-specific scoring matrices (PSSMs) to store the specificities of NRPS modules, which improved alignment quality compared with the stricter approaches developed earlier. An ensemble model was implemented to calculate the final alignment score. We tested the method on a manually curated database of NRP producers and compared it with a competing tool, GARLIC. Finally, we demonstrated the method's applicability on several external examples.
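To illustrate why storing module specificities as a PSSM is less strict than assigning each module a single predicted substrate, here is a minimal sketch. It is not BioCAT's implementation, which the abstract does not detail, and it does not cover gaps or the ensemble scoring. The sketch rests on three assumptions. Each NRPS module comes with substrate probabilities, for example from adenylation-domain predictions. The background monomer frequencies are uniform. The peptide is aligned to consecutive modules without gaps. The names `build_pssm` and `best_ungapped_alignment`, and the toy module data, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One PSSM column per NRPS module: monomer -> score (hypothetical layout).
Column = Dict[str, float]


def build_pssm(modules: List[Column], background: Column,
               pseudocount: float = 1e-3) -> List[Column]:
    """Turn per-module substrate probabilities into log2-odds PSSM columns.

    Each module's probabilities are smoothed with a pseudocount, renormalized
    over the background alphabet, and divided by the background frequency.
    """
    norm = 1.0 + pseudocount * len(background)
    pssm: List[Column] = []
    for probs in modules:
        column = {
            monomer: math.log2((probs.get(monomer, 0.0) + pseudocount) / norm / bg)
            for monomer, bg in background.items()
        }
        pssm.append(column)
    return pssm


def best_ungapped_alignment(pssm: List[Column],
                            peptide: List[str]) -> Tuple[int, float]:
    """Slide the peptide along consecutive modules; return (offset, score)."""
    if not peptide or len(peptide) > len(pssm):
        raise ValueError("peptide must be non-empty and no longer than the module list")
    best_offset, best_score = -1, -math.inf
    for offset in range(len(pssm) - len(peptide) + 1):
        # Sum the log-odds of each monomer at its aligned module.
        score = sum(pssm[offset + i][m] for i, m in enumerate(peptide))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Toy example (hypothetical data): three modules, uniform background.
background = {"Ala": 0.25, "Gly": 0.25, "Leu": 0.25, "Ser": 0.25}
modules = [
    {"Leu": 0.9, "Ala": 0.1},  # module 1: mostly Leu
    {"Gly": 1.0},              # module 2: Gly
    {"Ser": 0.8, "Ala": 0.2},  # module 3: mostly Ser, sometimes Ala
]
pssm = build_pssm(modules, background)
offset, score = best_ungapped_alignment(pssm, ["Gly", "Ser"])
print(offset, round(score, 2))  # expected best offset: 1 (modules 2 and 3)
```

With a single-label approach, an Ala at module 3 would simply be a mismatch. In the log-odds column it scores only slightly below zero, so a peptide containing it can still align reasonably well.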
Availability: BioCAT is available from its GitHub repository or via pip.
Contact: konanovdmitriy{at}gmail.com
Competing Interest Statement
The authors have declared no competing interest.