GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model

Abstract
Deciphering the universal gene regulatory mechanisms in diverse organisms holds great potential to advance our knowledge of fundamental life processes and to facilitate research on clinical applications. However, the traditional research paradigm focuses primarily on individual model organisms, which limits the collection and integration of complex features of various cell types across species. Recent breakthroughs in single-cell sequencing and advances in deep learning present an unprecedented opportunity to tackle this challenge. In this study, we developed GeneCompass, the first knowledge-informed, cross-species foundation model pre-trained on an extensive dataset of over 120 million single-cell transcriptomes from human and mouse. During pre-training, GeneCompass integrates four types of biological prior knowledge in a self-supervised manner to enhance its understanding of gene regulatory mechanisms. When fine-tuned for downstream tasks, GeneCompass outperforms competing state-of-the-art models on multiple single-species tasks and opens up new avenues for cross-species biological investigation. Overall, GeneCompass marks a milestone in advancing knowledge of universal gene regulatory mechanisms and in accelerating the discovery of key cell fate regulators and candidate targets for drug development.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
$ A list of affiliations appears at the end of the paper.
Data and code availability
All primary data presented in this study will be deposited in a public database, and all code will be uploaded to GitHub: https://github.com/xCompass-AI/GeneCompass.