Abstract
Nature samples only a small fraction of sequence space, yet many more amino acid combinations can fold into stable proteins. Furthermore, small structural variations within a single fold, which may differ from the next homolog by only a few amino acids, define a protein's molecular function. Hence, to design proteins with novel molecular functionalities, such as molecular recognition, methods to control and sample shape diversity are necessary. To explore this space, we developed and experimentally validated a computational platform that can design a wide variety of small protein folds while sampling high shape diversity. We designed and evaluated about 30,000 de novo protein designs of 7 different folds. Among these designs, about 6,200 stable proteins were identified, with predicted structures that include a first-of-its-kind minimalized thioredoxin. The data obtained revealed more protein folding rules, such as helix-connecting loops, which were in nature. Beyond providing a resource database for protein engineering, our data provide a large training data set for machine learning. Using these data, we developed a high-accuracy classifier to predict the stability of our designed proteins. The methods and the wide range of new protein shapes provide a basis for designing new protein functions without compromising stability.
Competing Interest Statement
The authors have declared no competing interest.