RT Journal Article SR Electronic T1 Sequence modeling and design from molecular to genome scale with Evo JF bioRxiv FD Cold Spring Harbor Laboratory SP 2024.02.27.582234 DO 10.1101/2024.02.27.582234 A1 Nguyen, Eric A1 Poli, Michael A1 Durrant, Matthew G. A1 Thomas, Armin W. A1 Kang, Brian A1 Sullivan, Jeremy A1 Ng, Madelena Y. A1 Lewis, Ashley A1 Patel, Aman A1 Lou, Aaron A1 Ermon, Stefano A1 Baccus, Stephen A. A1 Hernandez-Boussard, Tina A1 Ré, Christopher A1 Hsu, Patrick D. A1 Hie, Brian L. YR 2024 UL http://biorxiv.org/content/early/2024/03/06/2024.02.27.582234.abstract AB The genome is a sequence that completely encodes the DNA, RNA, and proteins that orchestrate the function of a whole organism. Advances in machine learning combined with massive datasets of whole genomes could enable a biological foundation model that accelerates the mechanistic understanding and generative design of complex molecular interactions. We report Evo, a genomic foundation model that enables prediction and generation tasks from the molecular to genome scale. Using an architecture based on advances in deep signal processing, we scale Evo to 7 billion parameters with a context length of 131 kilobases (kb) at single-nucleotide, byte resolution. Trained on 2.7M prokaryotic and phage genomes, Evo can generalize across the three fundamental modalities of the central dogma of molecular biology to perform zero-shot function prediction that is competitive with, or outperforms, leading domain-specific language models. Evo also excels at multi-element generation tasks, which we demonstrate by generating synthetic CRISPR-Cas molecular complexes and entire transposable systems for the first time. Using information learned over whole genomes, Evo can also predict gene essentiality at nucleotide resolution and can generate coding-rich sequences up to 650 kb in length, orders of magnitude longer than previous methods.
Advances in multi-modal and multiscale learning with Evo provide a promising path toward improving our understanding and control of biology across multiple levels of complexity. Competing Interest Statement: M.P. is an employee of TogetherAI. M.G.D. acknowledges outside interest in Stylus Medicine. C.R. acknowledges outside interest in Factory and Google Ventures. P.D.H. acknowledges outside interest in Stylus Medicine, Spotlight Therapeutics, Circle Labs, Arbor Biosciences, Varda Space, Vial Health, and Veda Bio, where he holds various roles including as co-founder, director, scientific advisory board member, or consultant. B.L.H. acknowledges outside interest in Prox Biosciences as a scientific co-founder. All other authors declare no competing interests.