TY - JOUR
T1 - BGDMdocker: an workflow base on Docker for analysis and visualization pan-genome and biosynthetic gene clusters of Bacterial
JF - bioRxiv
DO - 10.1101/098392
SP - 098392
AU - Gong Cheng
AU - Zongshan Zhou
AU - Ling Ma
AU - Guocai Zhang
AU - Yilei WU
AU - Chao Chen
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/01/10/098392.abstract
N2 - Motivation: Docker is attracting growing attention in the bioinformatics community, but most biologists have not yet mastered its implementation details or applied it in their research. To promote wider adoption and make fuller use of the many free bioinformatics tool images (community, official, and private) available on the Docker Hub Registry and other Docker sources, we present a complete worked example of a Docker-based bioinformatics workflow for the analysis and visualization of bacterial pan-genomes and biosynthetic gene clusters. It offers a way to mine bacterial data from the various free biological databases. We guide the reader step by step, from building an image from a Dockerfile to running a container, so that a workflow can be set up quickly. Results: We present BGDMdocker (bacterial genome data mining, Docker-based), a Docker-based workflow for the analysis and visualization of pan-genomes and biosynthetic gene clusters. It can be reused immediately on different computing platforms (Linux, Windows, Mac, and cloud deployments), giving flexible cross-platform deployment and rapid development of integrated software packages. The workflow integrates three toolkits: Prokka v1.11, panX, and antiSMASH 3.0. These and their dependencies are all specified in a Dockerfile, from which we built a Docker image and ran a container to analyze the pan-genome of 44 B.
amyloliquefaciens strains. The pan-genome is open, with 172,432 genes in total and 2,306 core gene clusters. For each gene cluster, the workflow visualizes pan-genomic data such as the alignment and a phylogenetic tree, maps mutations within the cluster to the branches of the tree, and infers gene gain and loss on the core-genome phylogeny. In addition, it mined 1,550 biosynthetic gene clusters in total, 997 known (MIBiG database) and 553 unknown (antiSMASH-predicted clusters and Pfam database), together with their types and orthologous groups across all strains. The workflow can also be used for pan-genome analysis and visualization of other species, and the visualizations shown in this paper can be reproduced in full. All results have been uploaded to our website, where all tools, data, and files can be downloaded without registration. Availability and implementation: BGDMdocker is available at http://42.96.173.25/bapgd/, and the source code is available at https://github.com/cgwyx/debian_prokka_panx_antismash_docker under the GPL license. Contact: chenggongwyx{at}foxmail.com Supplementary information: Supplementary data are available at bioRxiv online.
ER -