PT - JOURNAL ARTICLE
AU - Pan, Shi
AU - Withnell, Eloise
AU - Secrier, Maria
TI - Classifying epithelial-mesenchymal transition states in single cell cancer data using large language models
AID - 10.1101/2024.08.16.608311
DP - 2024 Jan 01
TA - bioRxiv
PG - 2024.08.16.608311
4099 - http://biorxiv.org/content/early/2024/08/19/2024.08.16.608311.short
4100 - http://biorxiv.org/content/early/2024/08/19/2024.08.16.608311.full
AB - Epithelial–mesenchymal plasticity plays a significant role in various biological processes, including tumour progression and chemoresistance. However, the expression programmes underlying the epithelial–mesenchymal transition (EMT) in cancer are diverse, and accurately defining the EMT status of tumour cells remains challenging. In this study, we employed a pre-trained single-cell large language model (LLM) to develop an EMT-language model (EMT-LM) that captures discrete states within the EMT continuum in single-cell cancer data. In capturing EMT states, we achieved an average Area Under the Receiver Operating Characteristic curve (AUROC) of 90% across multiple cancer types. We propose a new metric, ADESI, to aid the biological interpretability of our model, and derive EMT signatures linked with the energy metabolism and motility reprogramming underlying these state switches. We further employ our model to explore the emergence of EMT states in spatial transcriptomics data, uncovering hybrid EMT niches with contrasting potential for antitumour immunity or immune evasion. Our study provides a proof of concept that LLMs can be applied to characterise cell states in single-cell data, and proposes a generalisable framework for predicting EMT in single-cell RNA-seq that can be adapted and expanded to characterise other cellular states. Competing Interest Statement: The authors have declared no competing interest.