PT - JOURNAL ARTICLE
AU - Ivan Anishchenko
AU - Tamuka M. Chidyausiku
AU - Sergey Ovchinnikov
AU - Samuel J. Pellock
AU - David Baker
TI - De novo protein design by deep network hallucination
AID - 10.1101/2020.07.22.211482
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.07.22.211482
4099 - http://biorxiv.org/content/early/2020/07/23/2020.07.22.211482.short
4100 - http://biorxiv.org/content/early/2020/07/23/2020.07.22.211482.full
AB - There has been considerable recent progress in protein structure prediction using deep neural networks to infer distance constraints from amino acid residue co-evolution [1–3]. We investigated whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generated random amino acid sequences and input them into the trRosetta structure prediction network to predict starting distance maps, which, as expected, are quite featureless. We then carried out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (KL-divergence) between the distance distributions predicted by the network and the background distribution. Optimization from different random starting points resulted in a wide range of proteins with diverse sequences and all-alpha, all-beta-sheet, and mixed alpha-beta structures. We obtained synthetic genes encoding 129 of these network-hallucinated sequences, expressed and purified the proteins in E. coli, and found that 27 folded to monomeric stable structures with circular dichroism spectra consistent with the hallucinated structures. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute, alongside traditional physically based models, to the de novo design of proteins with new functions. Competing Interest Statement: The authors have declared no competing interest.
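
The sampling loop described in the abstract (start from a random sequence, predict pairwise distance distributions, then run Monte Carlo moves in sequence space that increase the KL-divergence from a background distribution) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `make_toy_predictor` is a hypothetical stand-in built from a fixed random lookup table and is not trRosetta, the background is taken to be uniform, and the Metropolis acceptance rule, temperature, bin count, and single-residue mutation move are choices of this sketch rather than details given in the abstract.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 6  # number of distance bins in the toy model (assumption)


def make_toy_predictor(seed=0):
    """Return a hypothetical stand-in for a structure prediction network.

    It maps a sequence to one distance distribution per residue pair.
    This is NOT trRosetta: it is a fixed random lookup table, used only
    so that the sampling loop below has something to optimize.
    """
    rng = random.Random(seed)
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    table = [[[rng.gauss(0.0, 1.0) for _ in range(N_BINS)]
              for _ in AMINO_ACIDS] for _ in AMINO_ACIDS]

    def predict(seq):
        dists = []
        for i in range(len(seq)):
            for j in range(i + 1, len(seq)):
                logits = table[index[seq[i]]][index[seq[j]]]
                top = max(logits)
                exps = [math.exp(x - top) for x in logits]  # stable softmax
                total = sum(exps)
                dists.append([e / total for e in exps])
        return dists

    return predict


def mean_kl(pred, background):
    """Average KL(pred || background) over all residue pairs."""
    total = 0.0
    for p in pred:
        total += sum(pi * math.log(pi / qi)
                     for pi, qi in zip(p, background) if pi > 0.0)
    return total / len(pred)


def hallucinate(predict, length=12, steps=300, temperature=0.05, seed=1):
    """Metropolis sampling over sequences that maximizes the mean KL score.

    Returns (best_sequence, starting_score, best_score).
    """
    rng = random.Random(seed)
    background = [1.0 / N_BINS] * N_BINS  # uniform background (assumption)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = mean_kl(predict(seq), background)
    start_score = score
    best_seq, best_score = seq[:], score
    for _ in range(steps):
        candidate = seq[:]
        candidate[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        cand_score = mean_kl(predict(candidate), background)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if (cand_score >= score
                or rng.random() < math.exp((cand_score - score) / temperature)):
            seq, score = candidate, cand_score
            if score > best_score:
                best_seq, best_score = seq[:], score
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    designed, start, best = hallucinate(make_toy_predictor())
    print(designed, round(start, 3), round(best, 3))
```

In this toy setting a higher score only means the lookup table's distributions are more peaked than the uniform background. In the work described above, the analogous contrast is computed on the network's predicted distance maps, which start out featureless for random sequences.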