Abstract
Single-cell transcriptomic studies of diverse and complex systems are becoming ubiquitous. Algorithms now attempt to integrate patterns across these studies by removing all study-specific information, without distinguishing unwanted technical bias from relevant biological variation. Integration is especially difficult when the biological variation of interest is distributed across studies, as when disparate temporal snapshots are combined into a panoramic, multi-study trajectory of cellular development. Here, we show that a fundamental analytic shift, from gene expression within individual cells to gene coexpression within clusters of cells, balances robustness to bias with preservation of meaningful inter-study differences. We leverage this insight in Trajectorama, an algorithm that we use to unify trajectories of neuronal development and hematopoiesis across studies that each profile separate developmental stages, a highly challenging task for existing methods. Trajectorama also reveals systems-level processes relevant to disease pathogenesis within the microglial response to myelin injury. Finally, Trajectorama is efficient and scalable, processing nearly one million cells in about an hour.
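To make the shift from per-cell expression to per-cluster coexpression concrete, here is a minimal sketch. It is not Trajectorama's implementation, and the abstract does not say how coexpression is measured. The sketch assumes Pearson correlation between genes, computed separately within each cluster of cells, and `cluster_coexpression` is a hypothetical helper. It also demonstrates one narrow sense of robustness to bias: a technical distortion that rescales each gene by a positive factor and adds a constant offset leaves the within-cluster correlations unchanged.

```python
import numpy as np


def cluster_coexpression(expr, labels):
    """Hypothetical helper: per-cluster gene-gene Pearson correlation.

    expr:   cells x genes array of expression values.
    labels: length-n_cells array of cluster labels.
    Returns {cluster label: genes x genes correlation matrix}.
    Genes with zero variance inside a cluster get all-zero rows and
    columns (instead of NaN) so every matrix stays finite.
    """
    result = {}
    for cluster in np.unique(labels):
        sub = expr[labels == cluster]              # cells in this cluster
        centered = sub - sub.mean(axis=0)          # center each gene
        norms = np.linalg.norm(centered, axis=0)   # per-gene spread
        norms[norms == 0] = 1.0                    # avoid divide-by-zero
        z = centered / norms                       # unit-norm gene columns
        result[cluster] = z.T @ z                  # genes x genes correlation
    return result


rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(200, 5)).astype(float)  # toy counts
labels = np.repeat([0, 1], 100)                        # two toy clusters

# Simulated technical bias: positive per-gene scaling plus a constant shift.
biased = expr * np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + 10.0

clean_coexp = cluster_coexpression(expr, labels)
biased_coexp = cluster_coexpression(biased, labels)
print(all(np.allclose(clean_coexp[c], biased_coexp[c]) for c in (0, 1)))  # True
```

Per-cell expression values in `expr` and `biased` differ substantially, yet their coexpression matrices agree. Real study-to-study effects are rarely this simple, so this illustrates the intuition rather than a guarantee.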