Abstract
Motivation Large-scale rearrangements and copy number changes, combined with different modes of clonal evolution, create extensive somatic genome diversity. This makes it difficult to develop versatile and scalable variant calling tools and to create well-calibrated benchmarks.
Results We developed tHapMix, a new simulation framework that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It scales easily to the simulation of hundreds of somatic genomes, and its re-use of real read data preserves the noise and biases present in sequencing platforms. We further demonstrate the utility of tHapMix by creating a simulated set of 140 somatic genomes and showing how it can be used to train and test somatic copy number variant calling tools.
Availability and implementation tHapMix is distributed under an open source license and can be downloaded from https://github.com/Illumina/tHapMix.
Contact sivakhno{at}illumina.com
Supplementary information Supplementary data are available at Bioinformatics online.