Abstract
Trajectory inference methods are essential for analyzing the developmental paths of cells in single-cell sequencing datasets. They provide insights into cellular differentiation, transitions, and lineage hierarchies, helping unravel the dynamic processes underlying development, disease progression, and tissue regeneration. However, many existing tools for trajectory inference lack a cohesive statistical model and reliable uncertainty quantification, limiting their utility and robustness. In this paper, we introduce VITAE (Variational Inference for Trajectory by AutoEncoder), a novel statistical approach that integrates a latent hierarchical mixture model with variational autoencoders to infer trajectories. The statistical hierarchical model underpinning the approach enhances interpretability, facilitating the exploration of differential gene expression patterns along the trajectory, while the posterior approximations generated by the variational autoencoder ensure computational efficiency, flexibility, and scalability. Notably, VITAE uniquely enables simultaneous trajectory inference and data integration across multiple datasets, enhancing accuracy in both tasks. We show that VITAE outperforms other state-of-the-art trajectory inference methods on both real and synthetic data under various trajectory topologies. Furthermore, we apply VITAE to jointly analyze three distinct single-cell RNA sequencing datasets of the mouse neocortex, unveiling comprehensive developmental lineages of projection neurons. VITAE effectively mitigates batch effects within and across datasets, aligning cells to elucidate clear trajectories and uncover finer structures that might be overlooked in individual datasets. Additionally, we showcase VITAE’s efficacy in integrative analyses of multi-omic datasets with continuous cell population structures.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Update title; update reference format.