TY  - JOUR
T1  - Efficient Bayesian inference of phylogenetic trees from large scale, low-depth genome-wide single-cell data
JF  - bioRxiv
DO  - 10.1101/2020.05.06.058180
SP  - 2020.05.06.058180
AU  - Fatemeh Dorri
AU  - Sohrab Salehi
AU  - Kevin Chern
AU  - Tyler Funnell
AU  - Marc Williams
AU  - Daniel Lai
AU  - Mirela Andronescu
AU  - Kieran R. Campbell
AU  - Andrew McPherson
AU  - Samuel Aparicio
AU  - Andrew Roth
AU  - Sohrab Shah
AU  - Alexandre Bouchard-Côté
Y1  - 2020/01/01
UR  - http://biorxiv.org/content/early/2020/05/07/2020.05.06.058180.abstract
N2  - A new generation of scalable single-cell whole genome sequencing (scWGS) methods [Zahn et al., 2017; Laks et al., 2019] allows unprecedented high-resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing mutational processes. The ability to sequence tens of thousands of single genomes at high resolution per experiment [Laks et al., 2019] challenges the assumptions and scalability of existing phylogenetic tree building methods and calls for tailored phylogenetic models and scalable inference algorithms. We propose a phylogenetic model and an associated Bayesian inference procedure that exploit the specifics of scWGS data. A first highlight of our approach is a novel phylogenetic encoding of copy-number data, which provides an attractive statistical-computational trade-off by simplifying the site dependencies induced by rearrangements while still forming a sound foundation for phylogenetic inference. A second highlight is an innovative phylogenetic tree exploration move that bounds the cost of each MCMC iteration by O(|C| + |L|), where |C| is the number of cells and |L| is the number of loci. In contrast, existing off-the-shelf likelihood-based methods incur a per-iteration cost of O(|C| |L|). Moreover, the novel move considers an exponential number of neighbouring trees, whereas off-the-shelf moves consider a polynomial-size set of neighbours. The third highlight is a novel mutation calling method that incorporates the copy-number data and the underlying phylogenetic tree to overcome the missing data issue. This framework allows us to realistically consider routine Bayesian phylogenetic inference at the scale of scWGS data. Competing Interest Statement: SPS and SA are shareholders and consultants of Contextual Genomics Inc.
ER  - 