%0 Journal Article
%A Nico Borgsmüller
%A Jose Bonet
%A Francesco Marass
%A Abel Gonzalez-Perez
%A Nuria Lopez-Bigas
%A Niko Beerenwinkel
%T Bayesian non-parametric clustering of single-cell mutation profiles
%D 2020
%R 10.1101/2020.01.15.907345
%J bioRxiv
%P 2020.01.15.907345
%X The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intra-tumor heterogeneity by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq data sets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task, rendering the applicability of existing methods more limited. Here we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. BnpC employs a Dirichlet process mixture model coupled with a Markov chain Monte Carlo sampling scheme, including a modified non-conjugate split-merge move and a novel posterior estimator to predict clones and genotypes. Our method was comprehensively benchmarked against state-of-the-art methods on simulated data using various data sizes and was applied to three cancer scDNA-seq data sets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime, and scalability. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data. As scDNA-seq data size constantly grows, scalable, efficient and accurate methods such as BnpC will become increasingly relevant, not only to solve intra-tumor heterogeneity, but also as a pre-processing step to reduce data size. BnpC is freely available under MIT license at https://github.com/cbg-ethz/BnpC.
%U https://www.biorxiv.org/content/biorxiv/early/2020/01/15/2020.01.15.907345.full.pdf