RT Journal Article SR Electronic T1 BGDMdocker: a Docker workflow for analysis and visualization pan-genome and biosynthetic gene clusters of bacterial JF bioRxiv FD Cold Spring Harbor Laboratory SP 098392 DO 10.1101/098392 A1 Gong Cheng A1 Quan Lu A1 Zongshan Zhou A1 Ling Ma A1 Guocai Zhang A1 WU Yilei A1 Chao Chen YR 2017 UL http://biorxiv.org/content/early/2017/02/03/098392.abstract AB Motivation At present Docker technology has received increasing level of attention throughout the bioinformatics community. However, its implementation details have not yet been mastered by most biologists and applied widely in biological researches. In order to popularizing this technology in the bioinformatics and sufficiently use plenty of public resources of bioinformatics tools (Dockerfile and image of scommunity, officially and privately) in Docker Hub Registry and other Docker sources based on Docker, we introduced full and accurate instance of a bioinformatics workflow based on Docker to analyse and visualize pan-genome and biosynthetic gene clusters of a bacteria in this article, provided the solutions for mining bioinformatics big data from various public biology databases. You could be guided step-by-step through the workflow process from docker file to build up your own images and run an container fast creating an workflow.Results We presented a BGDMdocker (bacterial genome data mining docker-based) workflow based on docker. The workflow consists of three integrated toolkits, Prokka v1.11, panX, and antiSMASH3.0. The dependencies were all written in Dockerfile, to build docker image and run container for analysing pan-genome of total 44 Bacillus amyloliquefaciens strains, which were retrieved from public? database. The pan-genome totally includes 172,432 gene, 2,306 Core gene cluster. The visualized pan-genomic data such as alignment, phylogenetic trees, maps mutations within that cluster to the branches of the tree, infers loss and gain of genes on the core-genome phylogeny for each gene cluster were presented. Besides, 997 known (MIBiG database) and 553 unknown (antiSMASH-predicted clusters and Pfam database) genes of biosynthesis gene clusters types and orthologous groups were mined in all strains. This workflow could also be used for other species pan-genome analysis and visualization. The display of visual data can completely duplicated as well as done in this paper. All result data and relevant tools and files can be downloaded from our website with no need to register. The pan-genome and biosynthetic gene clusters analysis and visualization can be fully reusable immediately in different computing platforms (Linux, Windows, Mac and deployed in the cloud), achieved cross platform deployment flexibility, rapid development integrated software package.Availability and implementation BGDMdocker is available at http://42.96.173.25/bapgd/ and the source code under GPL license is available at https://github.com/cgwyx/debian_prokka_panx_antismash_biodocker.Contact chenggongwyx{at}foxmail.comSupplementary information Supplementary data are available at biorxiv online.