RT Journal Article SR Electronic T1 PHATE: A Dimensionality Reduction Method for Visualizing Trajectory Structures in High-Dimensional Biological Data JF bioRxiv FD Cold Spring Harbor Laboratory SP 120378 DO 10.1101/120378 A1 Moon, Kevin R. A1 van Dijk, David A1 Wang, Zheng A1 Chen, William A1 Hirn, Matthew J. A1 Coifman, Ronald R. A1 Ivanova, Natalia B. A1 Wolf, Guy A1 Krishnaswamy, Smita YR 2017 UL http://biorxiv.org/content/early/2017/03/24/120378.abstract AB In recent years, dimensionality reduction methods have become critical for visualization, exploration, and interpretation of high-throughput, high-dimensional biological data, as they enable the extraction of major trends in the data while discarding noise. However, biological data contains a type of predominant structure that is not preserved in commonly used methods such as PCA and tSNE, namely, branching progression structure. This structure, which is often non-linear, arises from underlying biological processes such as differentiation, graded responses to stimuli, and population drift, which generate cellular (or population) diversity. We propose a novel, affinity-preserving embedding called PHATE (Potential of Heat-diffusion for Affinity-based Trajectory Embedding), designed explicitly to preserve progression structure in data.PHATE provides a denoised, two or three-dimensional visualization of the complete branching trajectory structure in high-dimensional data. It uses heat-diffusion processes, which naturally denoise the data, to compute cell-cell affinities. Then, PHATE creates a diffusion-potential geometry by free-energy potentials of these processes. This geometry captures high-dimensional trajectory structures, while enabling a natural embedding of the intrinsic data geometry. This embedding accurately visualizes trajectories and data distances, without requiring strict assumptions typically used by path-finding and tree-fitting algorithms, which have recently been used for pseudotime orderings or tree-renderings of cellular data. Furthermore, PHATE supports a wide range of data exploration tasks by providing interpretable overlays on top of the visualization. We show that such overlays can emphasize and reveal trajectory end-points, branch points and associated split-decisions, progression-forming variables (e.g., specific genes), and paths between developmental events in cellular state-space. We demonstrate PHATE on single-cell RNA sequencing and mass cytometry data pertaining to embryoid body differentiation, IPSC reprogramming, and hematopoiesis in the bone marrow. We also demonstrate PHATE on non-single cell data including single-nucleotide polymorphism (SNP) measurements of European populations, and 16s sequencing of gut microbiota.