PT  - JOURNAL ARTICLE
AU  - Benoit Marc Bergk Pinto
AU  - Timothy M Vogel
AU  - Catherine Larose
TI  - EggVio: a user friendly and versatile pipeline for assembly and functional annotation of shallow depth sequenced samples
AID  - 10.1101/2022.04.23.489251
DP  - 2022 Jan 01
TA  - bioRxiv
PG  - 2022.04.23.489251
4099  - http://biorxiv.org/content/early/2022/04/25/2022.04.23.489251.short
4100  - http://biorxiv.org/content/early/2022/04/25/2022.04.23.489251.full
AB  - We introduce an in-house pipeline that improves the quality of metagenomic annotations obtained from shallow-depth metagenomic datasets. Our main motivation is to quantify, more precisely and with greater certainty, the genes involved in bacterial interactions. The limitation of our experimental design is that we use a lower-throughput sequencing platform (MiSeq) than the metagenomic standard (HiSeq), because we sample extensively (almost a hundred samples) in time series. As a consequence of this methodological constraint, the assembly is far from exhaustive: fewer than 50% of the sequences are assembled. In this chapter, we therefore present a new pipeline designed specifically for this kind of data. We use co-assembly together with a sequence annotation strategy to recover the sequences that could not be mapped onto the assembled contigs. In addition, to avoid adding too much noise when rescuing reads, we built an algorithm that defines an e-value threshold based on the annotation noise learned from the sequences that mapped to the assembly. We selected several recent tools known to be effective for assembling, mapping and annotating such data. The pipeline was also built to be easy to install: in the interest of reproducibility, accessibility and transparency, we designed an installation script that lets each user install every tool required by the pipeline in a simple and reproducible way. Regarding performance, we show that the expected error rate (false discovery rate) of the annotation is close to 5%. 
Finally, we applied the pipeline to a real dataset from a bioremediation site and showed that the samples appeared to be much better represented than with a classic metagenome assembly strategy. Competing Interest Statement: The authors have declared no competing interest.