Abstract
Accurate detection and classification of somatic single nucleotide variants (SNVs) are important for defining the clonal composition of human cancers. Existing tools are prone to missing low-prevalence mutations, and methods for classifying mutations into clonal groups across the whole genome are underdeveloped. Increasing interest in deciphering clonal population dynamics across multiple samples taken over time or anatomic space from the same patient is producing whole genome sequence (WGS) data from phylogenetically related samples. With access to these data, we posited that injecting clonal structure information into the inference of mutations from multiple samples would improve mutation detection.
We developed MuClone, a novel statistical framework for simultaneous detection and classification of mutations across multiple tumour samples from a patient, using whole genome or exome sequencing data. The key advance lies in incorporating prior knowledge about the cellular prevalences of clones to improve mutation detection, particularly for low-prevalence mutations. We evaluated MuClone on synthetic data and on real data from spatially sampled ovarian cancers. The results support the hypothesis that clonal information improves sensitivity in detecting somatic mutations without compromising specificity. In addition, MuClone classifies mutations across whole genomes of multiple samples into biologically meaningful groups, providing additional phylogenetic insights and enhancing the study of WGS-derived clonal dynamics.
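The intuition behind using clonal prevalences as prior knowledge can be illustrated with a toy calculation. The sketch below is not MuClone's model; it is a minimal single-sample example under simplifying assumptions that are ours: a diploid, heterozygous locus with no copy number change (so the expected variant allele fraction is half the clone's cellular prevalence), a fixed sequencing error rate, and a binomial read-count likelihood. The function names, the clone labels, and the default parameter values are all hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k variant reads out of n total reads."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_by_hypothesis(k, n, clone_prevalences,
                            error_rate=0.001, prior_mutated=0.5):
    """Posterior over 'wildtype' and each known clone for one locus.

    clone_prevalences maps a clone label to its cellular prevalence
    (fraction of cells carrying the clone's mutations) in this sample.
    Assumptions (illustrative only): heterozygous diploid locus, so the
    expected variant allele fraction is prevalence / 2; the prior mass
    for 'mutated' is split evenly across the clones.
    """
    # Wild type: variant reads arise only from sequencing error.
    weights = {"wildtype": (1 - prior_mutated) * binom_pmf(k, n, error_rate)}
    per_clone_prior = prior_mutated / len(clone_prevalences)
    for label, prevalence in clone_prevalences.items():
        vaf = prevalence / 2
        # Probability that a read shows the variant, allowing for error.
        p_variant = vaf * (1 - error_rate) + (1 - vaf) * error_rate
        weights[label] = per_clone_prior * binom_pmf(k, n, p_variant)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical sample with a dominant clone A and a minor clone B.
clones = {"A": 0.6, "B": 0.1}

# 3 variant reads out of 60: a low allele fraction that a fixed
# threshold might discard, but that matches what clone B predicts.
low_vaf_call = posterior_by_hypothesis(3, 60, clones)
best_low = max(low_vaf_call, key=low_vaf_call.get)

# 0 variant reads out of 60: no evidence for a mutation.
no_evidence = posterior_by_hypothesis(0, 60, clones)
best_none = max(no_evidence, key=no_evidence.get)

print(best_low, best_none)
```

In this toy setting, a handful of variant reads is weak evidence on its own but becomes convincing once a clone with a matching low prevalence is known to exist, and the same calculation assigns the mutation to that clone. This mirrors, in a very reduced form, the joint detection and classification described above.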