RT Journal Article
SR Electronic
T1 Cascabel: a flexible, scalable and easy-to-use amplicon sequence data analysis pipeline
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 809384
DO 10.1101/809384
A1 Asbun, Alejandro Abdala
A1 Besseling, Marc A
A1 Balzano, Sergio
A1 van Bleijswijk, Judith
A1 Witte, Harry
A1 Villanueva, Laura
A1 Engelmann, Julia C
YR 2019
UL http://biorxiv.org/content/early/2019/10/17/809384.abstract
AB Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, and communities of multicellular organisms via environmental DNA sequencing. Because this technique sequences a single gene rather than the entire genome, fewer reads are needed per sample than for metagenome sequencing, which makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist or ecologist. We have developed Cascabel, a flexible and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) and a representative sequence tree. Our pipeline lets users customize the analyses by offering several choices for most of the steps, for example different OTU-generating methods. The pipeline can make use of multiple computing nodes and scales from personal computers to computing servers. The analyses and results are fully reproducible and are documented in an HTML report and an optional PDF report. Cascabel is freely available on GitHub (https://github.com/AlejandroAb/CASCABEL) and is licensed under the GNU GPLv3.