RT Journal Article
SR Electronic
T1 SC-MAMBA2: Leveraging State-Space Models for Efficient Single-Cell Ultra-Long Transcriptome Modeling
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2024.09.30.615775
DO 10.1101/2024.09.30.615775
A1 Zhao, Yalong
A1 Zhao, Bowen
A1 Zhang, Fan
A1 He, Chenfeng
A1 Wu, Wendao
A1 Lai, Lipeng
YR 2024
UL http://biorxiv.org/content/early/2024/10/01/2024.09.30.615775.abstract
AB Single-cell transcriptomics has revolutionized our understanding of cellular heterogeneity, yet modeling ultra-long transcriptome sequences (i.e., sequences whose length is the number of genes) remains a significant computational challenge. In this study, we introduce SC-MAMBA2, based on the most recent MAMBA2 architecture, as the first application of this architecture integrated with state-space models (SSMs) for single-cell transcriptome modeling. Unlike traditional Transformer-based language models, SC-MAMBA2 leverages the efficiency and scalability of SSMs, enabling it to handle longer transcriptome sequences with reduced computational overhead. We introduce unique design adaptations specifically tailored to transcriptome sequences and implement a bidirectional modeling approach under the SSM framework, facilitating comprehensive analysis of whole-genome transcriptome sequences. SC-MAMBA2 stands as the largest model in the single-cell transcriptomics domain, with over 150 million parameters, capable of processing transcriptome sequences covering more than 60,000 genes. The model was trained on a dataset of 57 million cells, making it the most comprehensive solution for handling ultra-long sequences to date. Through extensive benchmarking across various downstream tasks, SC-MAMBA2 consistently outperforms state-of-the-art models, demonstrating superior accuracy and computational efficiency. Our results underscore the effectiveness and advanced capabilities of SC-MAMBA2, positioning it as a pivotal tool for future single-cell transcriptome studies. Competing Interest Statement: The authors have declared no competing interest.