Abstract
Large Language Models (LLMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of LLMs in single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. By comparing seven different single-cell LLMs with task-specific methods, we found that single-cell LLMs may not consistently outperform task-specific methods across all tasks. However, their emergent abilities and the successful applications of cross-species and cross-modality transfer learning are promising. In addition, we present a systematic evaluation of the effects of hyper-parameters, initial settings, and stability in training single-cell LLMs based on our proposed scEval framework, and provide guidelines for pre-training and fine-tuning. Our work summarizes the current state of single-cell LLMs and points to their constraints and avenues for future development.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Contributing authors: tianyu.liu@yale.edu; kexing.li@yale.edu; yuge.wang@yale.edu; hongyu.li@yale.edu