Abstract
CRISPR-Cas9-based lineage tracing technologies have enabled the reconstruction of single-cell phylogenies from transcriptional readouts. However, developing tree-reconstruction algorithms with theoretical guarantees in this setting is challenging. In this work, we derive a reconstruction algorithm with theoretical guarantees by applying Neighbor-Joining (NJ) to distances that are moment-matched to estimate the true tree distances. We develop a series of tools to analyze this algorithm and prove its guarantees. When the parameters of the data-generating process are known and there is no missing data, our results align with established results for common evolutionary models, such as Cavender-Farris-Neyman and Jukes-Cantor. To account for the realistic case where the parameters of the data-generating process are unknown and data are missing, we develop new theory showing, for the first time, that reconstruction guarantees can still be obtained in the CRISPR-Cas9 case and in other models of evolution. Empirically, on both simulated lineage tracing data and real data from a mouse model of lung cancer, our method outperforms the traditional use of NJ.
Competing Interest Statement
M.G.J. consults for and holds equity in Vevo Therapeutics.