PT - JOURNAL ARTICLE
AU - Nico Borgsmüller
AU - Jose Bonet
AU - Francesco Marass
AU - Abel Gonzalez-Perez
AU - Nuria Lopez-Bigas
AU - Niko Beerenwinkel
TI - Bayesian non-parametric clustering of single-cell mutation profiles
AID - 10.1101/2020.01.15.907345
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.01.15.907345
4099 - http://biorxiv.org/content/early/2020/02/28/2020.01.15.907345.short
4100 - http://biorxiv.org/content/early/2020/02/28/2020.01.15.907345.full
AB - The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intra-tumor heterogeneity by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq data sets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Here we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. BnpC employs a Dirichlet process mixture model coupled with a Markov chain Monte Carlo sampling scheme, including a modified split-merge move and a novel posterior estimator to predict clones and genotypes. We benchmarked our method comprehensively against state-of-the-art methods on simulated data using various data sizes, and applied it to three cancer scDNA-seq data sets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime, and scalability. Its inferred genotypes were the most accurate, and it was the only method able to run and produce results on data sets with 10,000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data.
With ever growing scDNA-seq data sets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve intra-tumor heterogeneity but also as a pre-processing step to reduce data size. BnpC is freely available under MIT license at https://github.com/cbg-ethz/BnpC.