Abstract
The advent and success of foundation models such as GPT have sparked growing interest in their application to single-cell biology. Models like Geneformer and scGPT have emerged with the promise of serving as versatile tools for this specialized field. However, the efficacy of these models remains an open question, particularly in zero-shot settings, where a pretrained model is used as-is without any fine-tuning or further training. This question matters because practical constraints often require useful models to function in settings that preclude fine-tuning (e.g., discovery settings where labels are not fully known). This paper presents a rigorous evaluation of the zero-shot performance of these proposed single-cell foundation models. We assess their utility in tasks such as cell type clustering and batch effect correction, and evaluate the generality of their pretraining objectives. Our results indicate that both Geneformer and scGPT exhibit limited reliability in zero-shot settings and often underperform simpler methods. These findings serve as a cautionary note for the deployment of proposed single-cell foundation models and highlight the need for more focused research to realize their potential.²
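In a zero-shot evaluation of cell type clustering, the model's embeddings are used unchanged: cells are clustered in the frozen embedding space, and the resulting clusters are compared against annotated cell types. The sketch below illustrates only that comparison step, using two standard label-agreement scores, the adjusted Rand index (ARI) and normalized mutual information (NMI, arithmetic-mean normalization), written with the Python standard library alone. It is a minimal illustration under stated assumptions, not the authors' pipeline (their code is at the repository in footnote 2): the function names are hypothetical, at least two cells are assumed, and the clustering step itself (e.g. Leiden or k-means on the embeddings) is assumed to have already produced the cluster assignments.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels, clusters):
    """ARI between annotated labels and cluster assignments (hypothetical helper).

    Assumes len(labels) == len(clusters) >= 2.
    """
    n = len(labels)
    pairs = Counter(zip(labels, clusters))  # contingency table entries n_ij
    a = Counter(labels)                     # row sums
    b = Counter(clusters)                   # column sums
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)   # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings place every cell in one group.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def _entropy(counts, n):
    """Shannon entropy (natural log) of a labeling given its group sizes."""
    return -sum((c / n) * log(c / n) for c in counts.values())


def normalized_mutual_info(labels, clusters):
    """NMI normalized by the arithmetic mean of the two entropies (hypothetical helper)."""
    n = len(labels)
    pairs = Counter(zip(labels, clusters))
    a = Counter(labels)
    b = Counter(clusters)
    # MI = sum_ij p_ij * log(p_ij / (p_i * p_j)), rewritten with raw counts.
    mi = sum((c / n) * log(c * n / (a[i] * b[j])) for (i, j), c in pairs.items())
    h_a, h_b = _entropy(a, n), _entropy(b, n)
    if h_a == 0 and h_b == 0:
        # Both labelings are a single group: treat as perfect agreement.
        return 1.0
    return mi / ((h_a + h_b) / 2)


if __name__ == "__main__":
    cell_types = ["T", "T", "B", "B", "NK", "NK"]
    cluster_ids = [2, 2, 0, 0, 1, 0]  # one NK cell falls into the B-cell cluster
    print("ARI:", adjusted_rand_index(cell_types, cluster_ids))
    print("NMI:", normalized_mutual_info(cell_types, cluster_ids))
```

Both scores are invariant to how clusters are numbered, so a clustering that matches the annotations up to relabeling scores 1 on each; a clustering no better than chance gives an ARI near 0. Counting pairs and joint frequencies from a contingency table keeps the sketch dependency-free, whereas an established library implementation would be the better choice in practice.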
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* Work performed while interning at Microsoft Research New England.
kasia@well.ox.ac.uk, lcrawford@microsoft.com, ava.amini@microsoft.com, lualex@microsoft.com
Section 3.1 and Figures 2 and 3 revised after correcting the code, following the discovery of unexpected behavior in the package used for evaluation. Figure 6 updated for clarity and Figure S3 added for completeness. Tables S1–S4, listing the evaluation metrics, added. Several typos corrected.
2 The code used for our analyses can be accessed at https://github.com/microsoft/zero-shot-scfoundation.
3 FlashAttention currently supports Ampere, Ada, and Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100) as well as Turing GPUs (T4, RTX 2080). There are no plans to support other GPUs, such as the widely used V100.
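Because the GPU families in this footnote correspond to CUDA compute capabilities (Turing 7.5, Ampere 8.x other than 8.9, Ada 8.9, Hopper 9.0, while the V100 is a Volta card at 7.0), a quick pre-flight check can be sketched as below. This is a hypothetical helper that encodes only what the footnote states; it is not FlashAttention's own compatibility check, which may differ between versions. The capability values listed are those of the example GPUs named above.

```python
# Hypothetical pre-flight check: map a CUDA compute capability (major, minor)
# to the GPU architecture families named in the footnote, and report whether
# the footnote lists that family as supported by FlashAttention.

# Compute capabilities of the example GPUs mentioned in the footnote.
EXAMPLE_GPUS = {
    "T4": (7, 5),
    "RTX 2080": (7, 5),
    "A100": (8, 0),
    "RTX 3090": (8, 6),
    "RTX 4090": (8, 9),
    "H100": (9, 0),
    "V100": (7, 0),
}

# Families the footnote lists as supported.
SUPPORTED_ARCHS = {"Turing", "Ampere", "Ada", "Hopper"}


def architecture(capability: tuple[int, int]) -> str:
    """Name the architecture family for a (major, minor) compute capability."""
    major, minor = capability
    if major == 7 and minor == 5:
        return "Turing"
    if major == 7 and minor in (0, 2):
        return "Volta"
    if major == 8 and minor == 9:
        return "Ada"
    if major == 8:
        return "Ampere"
    if major == 9:
        return "Hopper"
    return "other"


def supports_flash_attention(capability: tuple[int, int]) -> bool:
    """True if the footnote lists this capability's family as supported."""
    return architecture(capability) in SUPPORTED_ARCHS


if __name__ == "__main__":
    for name, cap in EXAMPLE_GPUS.items():
        print(f"{name} (sm_{cap[0]}{cap[1]}): {architecture(cap)}, "
              f"supported={supports_flash_attention(cap)}")
```

With PyTorch installed, the capability of the current device can be obtained from torch.cuda.get_device_capability(), which returns a (major, minor) tuple, and passed to supports_flash_attention before attempting to run a model that depends on FlashAttention.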
4 Data available via the data.pbmc_dataset function of the scvi-tools [26] Python package.