Abstract
Recent advances in single-cell immune profiling now enable simultaneous measurement of the transcriptome and T-cell receptor (TCR) sequences, offering a promising approach to study immune responses at cellular resolution. Yet combining these different types of information from multiple datasets into a joint representation is complicated by the distinct characteristics of each modality and by technical batch effects between datasets. Here, we present mvTCR, a multimodal generative model that learns a unified representation across modalities and datasets for the joint analysis of single-cell immune profiling data. We show that mvTCR allows the construction of large-scale, multimodal T-cell atlases by distilling modality-specific properties into a shared view, enabling new and improved downstream analyses. Specifically, we demonstrate mvTCR's potential by revealing SARS-CoV-2-specific T-cell clusters and separating them from bystander T cells, distinctions that would have been missed by analyzing either modality alone. Finally, when combined with transfer-learning approaches, mvTCR can enable automated analysis of new datasets.
Overall, mvTCR provides a principled solution for standard analysis tasks such as multimodal integration, clustering, specificity analysis, and batch correction for single-cell immune profiling data.
Competing Interest Statement
F.J.T. reports receiving consulting fees from Roche Diagnostics GmbH and Cellarity, Inc., and an ownership interest in Cellarity, Inc. Y.A. acknowledges financial support from JURA Bio, Inc.
Footnotes
Revision includes the following changes:
- extended benchmark
- application showcase on SARS-CoV-2 dataset
- run-time analysis and integration capability over multiple studies