Abstract
Understanding the molecular programs that guide cellular differentiation during development is a major goal of modern biology. Here, we introduce an approach, WADDINGTON-OT, based on the mathematics of optimal transport, for inferring developmental landscapes, probabilistic cellular fates and dynamic trajectories from large-scale single-cell RNA-seq (scRNA-seq) data collected along a time course. We demonstrate the power of WADDINGTON-OT by applying the approach to study 65,781 scRNA-seq profiles collected at 10 time points over 16 days during reprogramming of fibroblasts to iPSCs. We construct a high-resolution map of reprogramming that rediscovers known features; uncovers new alternative cell fates including neural- and placental-like cells; predicts the origin and fate of any cell class; highlights senescent-like cells that may support reprogramming through paracrine signaling; and implicates regulatory models in particular trajectories. Of these findings, we highlight Obox6, which we experimentally show enhances reprogramming efficiency. Our approach provides a general framework for investigating cellular differentiation.
Footnotes
Email: lander{at}broadinstitute.org (E.S.L.), aregev{at}broadinstitute.org (A.R.); jianshu{at}broadinstitute.org (J.S.)
1 A limitation of Waddington’s landscape is that it is cell-autonomous (i.e., it does not include the effects of other cells). Our model is more general.
2 For example, one simple cost function is the squared Euclidean distance c(x, y) = ||x - y||^2.
3 For example, imagine that ℙ_t is a mixture of Gaussians with time-varying mixture weights.
4 Advection, a term borrowed from fluid mechanics, refers to the transport of a substance by bulk motion. The constraint that the divergence of the flow is equal to the rate of change of ρ means that ρ flows according to the velocity field v, without gaining or losing mass.
5 As we discuss in Appendix S2, we have had some initial success with a rectified-linear function class. Following Cleary et al. 2017 [SMAF], we impose low-rank and sparsity constraints on the linear part, and we apply the function only to the transcription factors in X_{t_i}. Each low-rank component can then be interpreted as a regulatory module of transcription factors acting on a module of regulated genes.
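
As an illustration of the cost function mentioned in footnote 2, the following minimal sketch computes the pairwise squared Euclidean cost between two small sets of expression profiles. The function name `squared_euclidean_cost` and the toy arrays are hypothetical and chosen only for this example; this is not the WADDINGTON-OT implementation, and it assumes each row is one cell and each column one gene.

```python
import numpy as np

def squared_euclidean_cost(X, Y):
    """Return C with C[i, j] = ||X[i] - Y[j]||^2 (rows are points)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Broadcast to shape (n_X, n_Y, n_features), then sum squared differences.
    diff = X[:, None, :] - Y[None, :, :]
    return np.sum(diff ** 2, axis=-1)

# Hypothetical toy data: 2 cells at an earlier time point,
# 3 cells at a later one, 2 genes each.
early = np.array([[0.0, 0.0], [1.0, 1.0]])
late = np.array([[0.0, 1.0], [1.0, 1.0], [3.0, 4.0]])

C = squared_euclidean_cost(early, late)
print(C)
```

In an optimal transport setting, a matrix such as `C` would serve as the cost of moving mass from each earlier cell to each later cell; for large data sets the intermediate three-dimensional array can be memory-hungry, so a chunked or library-based computation may be preferable.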