PT - JOURNAL ARTICLE
AU - Rafael R. C. Cuadrat
AU - Danny Ionescu
AU - Alberto M. R. Davila
AU - Hans-Peter Grossart
TI - Recovering genomic clusters of secondary metabolites from lakes: a Metagenomics 2.0 approach
AID - 10.1101/183061
DP - 2017 Jan 01
TA - bioRxiv
PG - 183061
4099 - http://biorxiv.org/content/early/2017/08/31/183061.short
4100 - http://biorxiv.org/content/early/2017/08/31/183061.full
AB - Background: Metagenomic approaches have become increasingly popular over the past decades owing to falling DNA sequencing costs and advances in bioinformatics. So far, however, the recovery of long genes coding for secondary metabolism still represents a major challenge. Metagenome assemblies are often of poor quality, especially in environments with high microbial diversity, where sequence coverage is low and the complexity of natural communities is high. Recently, new and improved algorithms for binning environmental reads and contigs have been developed to overcome such limitations. Some of these algorithms use a similarity detection approach to classify the obtained reads into taxonomic units and to assemble draft genomes. This approach, however, is quite limited, since it can classify only sequences similar to those available (and well classified) in the databases. In this work, we used draft genomes from Lake Stechlin, north-eastern Germany, recovered by MetaBat, an efficient binning tool that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. These genomes were screened for secondary metabolism genes, such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS), using the Anti-SMASH and NAPDOS workflows. Results: With this approach, we identified 243 secondary metabolite clusters from 121 genomes recovered from the lake samples. A total of 18 NRPS, 19 PKS and 3 hybrid PKS/NRPS clusters were found. In addition, it was possible to predict the partial structure of several secondary metabolite clusters, allowing for taxonomic classification and phylogenetic inference. Conclusions: Our approach shows great potential for recovering and studying secondary metabolite genes from any aquatic ecosystem.