TY  - JOUR
T1  - Large-scale design and refinement of stable proteins using sequence-only models
JF  - bioRxiv
DO  - 10.1101/2021.03.12.435185
SP  - 2021.03.12.435185
AU  - Jedediah M. Singer
AU  - Scott Novotney
AU  - Devin Strickland
AU  - Hugh K. Haddox
AU  - Nicholas Leiby
AU  - Gabriel J. Rocklin
AU  - Cameron M. Chow
AU  - Anindya Roy
AU  - Asim K. Bera
AU  - Francis C. Motta
AU  - Longxing Cao
AU  - Eva-Maria Strauch
AU  - Tamuka M. Chidyausiku
AU  - Alex Ford
AU  - Ethan Ho
AU  - Craig O. Mackenzie
AU  - Hamed Eramian
AU  - Frank DiMaio
AU  - Gevorg Grigoryan
AU  - Matthew Vaughn
AU  - Lance J. Stewart
AU  - David Baker
AU  - Eric Klavins
Y1  - 2021/01/01
UR  - http://biorxiv.org/content/early/2021/03/12/2021.03.12.435185.abstract
N2  - Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we report a neural network model that predicts protein stability based only on sequences of amino acids, and demonstrate its performance by evaluating the stability of almost 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We also report a second neural network model that is able to generate novel stable proteins. Finally, we show that the predictive model can be used to substantially increase the stability of both expert-designed and model-generated proteins. Competing Interest Statement: JMS and NL are employed by Two Six Technologies, which has filed a patent on a portion of the technology described in this manuscript.
ER  - 