RT Journal Article
SR Electronic
T1 Single-sequence protein structure prediction using language models from deep learning
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.08.02.454840
DO 10.1101/2021.08.02.454840
A1 Ratul Chowdhury
A1 Nazim Bouatta
A1 Surojit Biswas
A1 Charlotte Rochereau
A1 George M. Church
A1 Peter K. Sorger
A1 Mohammed AlQuraishi
YR 2021
UL http://biorxiv.org/content/early/2021/08/04/2021.08.02.454840.abstract
AB AlphaFold2 and related systems use deep learning to predict protein structure from co-evolutionary relationships encoded in multiple sequence alignments (MSAs). Despite dramatic recent increases in accuracy, three challenges remain: (i) prediction of orphan and rapidly evolving proteins for which an MSA cannot be generated, (ii) rapid exploration of designed structures, and (iii) understanding the rules governing spontaneous polypeptide folding in solution. Here we report the development of an end-to-end differentiable recurrent geometric network (RGN) able to predict protein structure from single protein sequences without the use of MSAs. This deep learning system has two novel elements: a protein language model (AminoBERT) that uses a Transformer to learn latent structural information from millions of unaligned proteins, and a geometric module that compactly represents Cα backbone geometry. RGN2 outperforms AlphaFold2 and RoseTTAFold (as well as trRosetta) on orphan proteins and is competitive on designed sequences, while achieving up to a 10⁶-fold reduction in compute time. These findings demonstrate the practical and theoretical strengths of protein language models relative to MSAs in structure prediction. Competing Interest Statement: M.A. is a member of the SAB of FL2021-002, a Foresite Labs company, and consults for Interline Therapeutics. P.K.S. is a member of the SAB or Board of Directors of Glencoe Software, Applied Biomath, RareCyte and NanoString and has equity in several of these companies. A full list of G.M.C.'s tech transfer, advisory roles, and funding sources can be found on the lab website: http://arep.med.harvard.edu/gmc/tech.html.