Summary
The heterogeneity of neurodegenerative diseases is a key confound to disease understanding and treatment development, as study cohorts typically include multiple phenotypes on distinct disease trajectories. Here we present a new machine learning technique – Subtype and Stage Inference (SuStaIn) – able to uncover data-driven disease phenotypes with distinct temporal progression patterns from widely available cross-sectional patient studies. Results from imaging studies in two neurodegenerative diseases reveal new subgroups and their distinct trajectories of regional neurodegeneration. In genetic frontotemporal dementia, SuStaIn identifies genotypes from imaging alone, validating its ability to identify subtypes, and characterises within-group heterogeneity for the first time. In Alzheimer’s disease, SuStaIn uncovers three subtypes, uniquely revealing their temporal complexity. SuStaIn provides fine-grained patient stratification, which substantially improves prediction of conversion between diagnostic categories compared with standard models that ignore subtype (p = 7.18×10⁻⁴) or temporal stage (p = 3.96×10⁻⁵). SuStaIn thus offers new promise for enabling disease subtype discovery and precision medicine.
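To make the idea of joint subtype and stage assignment concrete, the toy sketch below assigns a single cross-sectional subject to the (subtype, stage) pair that best explains its regional measurements. It is not the SuStaIn algorithm. It assumes, purely for illustration, that each subtype is a fixed order in which regions become abnormal, that stage k means the first k regions in that order are abnormal, and that each normalised regional measurement is Gaussian with mean 0 (normal) or 1 (abnormal) and a shared standard deviation. The functions `infer_subtype_and_stage` and `stage_log_likelihood`, and the constant `SD`, are hypothetical names.

```python
import math

# Hypothetical noise level: normal regions ~ N(0, SD), abnormal regions ~ N(1, SD).
SD = 0.5


def log_normal_pdf(x, mean, sd=SD):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def stage_log_likelihood(x, order, stage):
    """Log-likelihood of measurements x if the first `stage` regions in `order` are abnormal."""
    abnormal = set(order[:stage])
    return sum(log_normal_pdf(v, 1.0 if i in abnormal else 0.0) for i, v in enumerate(x))


def infer_subtype_and_stage(x, subtype_orders):
    """Return (subtype, stage, posterior) under a uniform prior over all (subtype, stage) pairs."""
    scores = {}
    for s, order in enumerate(subtype_orders):
        for k in range(len(order) + 1):  # stages 0 (all normal) .. n (all abnormal)
            scores[(s, k)] = stage_log_likelihood(x, order, k)
    best = max(scores, key=scores.get)
    # Normalise with the log-sum-exp trick to obtain posterior probabilities.
    m = scores[best]
    z = sum(math.exp(v - m) for v in scores.values())
    posterior = {key: math.exp(v - m) / z for key, v in scores.items()}
    return best[0], best[1], posterior


if __name__ == "__main__":
    # Two hypothetical subtypes over four regions with opposite progression orders.
    orders = [[0, 1, 2, 3], [3, 2, 1, 0]]
    subject = [0.9, 1.1, 0.1, -0.2]  # regions 0 and 1 look abnormal
    s, k, post = infer_subtype_and_stage(subject, orders)
    print(s, k, round(post[(s, k)], 2))
```

For this subject the sketch assigns subtype 0 at stage 2, since the two abnormal-looking regions are exactly the first two in that subtype's order. The posterior over all pairs is what a stratification of this kind would carry forward; in this toy, stage 0 and the final stage are identical across subtypes, so those cases are inherently uninformative about subtype.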