Abstract
Biologists often need to execute bioinformatic workflows (WFs) as part of their research. However, operating most WF-management platforms requires at least some programming expertise. Here we describe NeatSeq-Flow, a platform that enables users with no programming knowledge to design and execute complex high-throughput sequencing WFs on their own computer or computer cluster. WFs are composed of modules. NeatSeq-Flow provides a large compendium of pre-built modules as well as a generic module. Advanced users can also create custom-made, sophisticated modules using templates and only basic Python commands. Modules and WFs are easily shareable. To execute a WF, through either the graphical user interface or the command line, users need only specify the modules’ order and parameters (workflow design) and the input file locations (sample information). WF execution is parallelized over both samples and analysis steps, and progress can be tracked in real time. Results are obtained in a neat directory structure, along with a self-sustaining WF backup for reproducibility. NeatSeq-Flow operates by shell-script generation, allowing full transparency of the WF process. NeatSeq-Flow supports CONDA for easy installation and portability of entire environments. Together, these features make NeatSeq-Flow an easy-to-use WF platform without compromising flexibility, reproducibility, transparency and efficiency.
Availability: http://neatseq-flow.readthedocs.io/en/latest/
Contact: sklarz@bgu.ac.il
Footnotes
Added reference to the NeatSeq-Flow GUI and more