PT - JOURNAL ARTICLE
AU - Li Charlie Xia
AU - Dongmei Ai
AU - Hojoon Lee
AU - Noemi Andor
AU - Chao Li
AU - Nancy R. Zhang
AU - Hanlee P. Ji
TI - SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution
AID - 10.1101/247536
DP - 2018 Jan 01
TA - bioRxiv
PG - 247536
4099 - http://biorxiv.org/content/early/2018/01/12/247536.short
4100 - http://biorxiv.org/content/early/2018/01/12/247536.full
AB - Background: Simulating genome sequence data with structural variant features can facilitate the development and benchmarking of structural variant analysis programs. However, only a limited number of data simulators provide structural variants in silico. Moreover, there is a paucity of programs that generate structural variants with different allelic fractions and haplotypes. Findings: We developed SVEngine, an open source tool to address this need. SVEngine simulates next-generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK). It then simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs) and/or post-alignment files (BAMs). All of the files contain the desired variants, along with BED files containing the ground truth. SVEngine’s flexible design process enables one to specify size, position, and allelic fraction for deletion, insertion, duplication, inversion and translocation variants. Finally, SVEngine simulates sequence data that replicates the characteristics of a sequencing library with mixed sizes of DNA insert molecules. SVEngine is highly parallelized to reduce simulation time. Conclusions: We demonstrated the versatile features of SVEngine and its improved runtime in comparisons with other available simulators.
SVEngine’s features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated the accuracy of the simulations by checking sequence mapping features such as coverage change, read clipping, insert size shift and neighbouring hanging read pairs for representative variant types. SVEngine is implemented as a standard Python package and is freely available for academic use at: https://bitbucket.org/charade/svengine.