Journal of Molecular Biology
A General Computational Approach for Repeat Protein Design
Introduction
Repeat proteins play key roles in biological processes ranging from adhesion to signaling to defense mechanisms [1]. These proteins consist of adjacent series of usually non-identical repeated amino acid sequences; in most cases, these repeated units fold cooperatively into either a solenoid-shaped or a toroid-shaped structure [2], [3], [4]. Although extremely diverse in structure and sequence, repeat proteins are characterized by short-ranged intra-repeat and inter-repeat interactions between residues [2]. The intrinsic modularity of repeat proteins allows combination of functionalities in a single domain (e.g., recognition motifs for nucleic acids [5] and peptides [6]) and can be used to generate biomaterials with tunable mechanical properties [7]. However, interactions between neighboring repeats are not always conserved; hence arbitrary extension by repeat insertion is not usually possible.
To allow modular extension, designed repeat proteins with self-compatible repeating elements have been generated using consensus-based approaches [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18]. Consensus sequences are defined by the most common residue at each position in a multiple sequence alignment (MSA) of the proteins or repeats in a family. This approach is conceptually simple and powerful but does have non-optimal features. First, the consensus sequence can vary depending on the collection size and the selection method for the sequences used in the alignment. Second, residue–residue packing, particularly critical in the formation of a uniquely defined hydrophobic core, is not considered, and hence in some cases, the consensus may have sub-optimal residue–residue interactions. Incorporating amino acid covariation information derived from statistical analysis of naturally occurring sequences can capture some of these residue–residue coupling effects [19], [20], [21], but reliable estimates of covariance require large numbers of sequences that are not available for all protein families.
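The consensus definition above (the most common residue at each MSA position) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `consensus_sequence`, the gap handling, and the toy alignment are assumptions made for illustration only.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most common non-gap residue at each alignment column.

    Assumption: ties are broken by first occurrence in the column,
    which is arbitrary and not specified by the source text.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        # A column made only of gaps stays a gap in the consensus.
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


# Toy alignment invented for illustration; not taken from any real family.
toy_alignment = [
    "GADVNA",
    "GADVNA",
    "GANVDA",
    "NADINA",
    "GA-VNA",
]

if __name__ == "__main__":
    print(consensus_sequence(toy_alignment))
```

Because each column is treated independently, the result depends on which sequences are in the alignment and ignores residue–residue coupling, which are the two limitations noted above.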
Here we describe a general computational approach for repeat protein design that integrates Rosetta de novo structure generation and design methodology with protein family-based sequence and structural information. By automatically generating very low energy design models compatible with the available sequence and structure information, the method provides increased versatility compared to standard sequence consensus-based approaches and reduces the manual intervention required to achieve stable designs.
Results and Discussion
We developed a computational approach that integrates sequence and structural information with Rosetta [22] de novo folding and design calculations for the generation of idealized repeat proteins (Fig. 1). Families with α-helical, β and mixed α/β secondary structure were chosen for redesign to illustrate the generality of the method. Sets of sequences were designed for six protein families: ankyrin (ank), armadillo (arm), tetratricopeptide repeat (TPR), HEAT, leucine-rich repeats (LRR) and
Conclusions
The approach presented here generalizes the current MSA-based methods for repeat protein design by automatically integrating sequence, structure and energetic information. Designing backbones de novo avoids potential bias due to the use of a single or few template structures.
Forty percent of the proteins designed with our method were folded and had a melting temperature (Tm) of 57 °C or greater (Table 2). The crystal structures we were able to solve had an RMSD of about 1 Å to the design models.
Generation of sequence and structural constraints
The repeat consensus sequences were obtained from family alignments in the SMART database [29], [33]. For the HEAT family, not present in SMART, the Pfam [30] seed alignment was used. A double-repeat sequence was generated by duplication of the consensus. When the consensus sequence did not cover the whole repeat, connecting fragments were added using alanine residues as placeholders. The length of this linking sequence was based on the shortest connection observed, with at least a 10%
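The duplication and placeholder steps described in this snippet can be sketched as follows. This is a hypothetical illustration, not the published protocol: the function name `double_repeat` is invented, the placeholder residues are assumed to be appended at the C-terminal end of each repeat copy, and the linker length is left as an input because the rule for choosing it is cut off in the text above.

```python
def double_repeat(consensus, linker_length=0, placeholder="A"):
    """Duplicate a repeat consensus into a double-repeat sequence.

    Assumptions (not stated in the source): the connecting fragment is
    appended after each copy of the consensus, and its length is supplied
    by the caller rather than derived from observed connections.
    """
    if linker_length < 0:
        raise ValueError("linker_length must be non-negative")
    if len(placeholder) != 1:
        raise ValueError("placeholder must be a single residue letter")

    # One repeat unit: consensus plus alanine (or other) placeholder residues.
    unit = consensus + placeholder * linker_length
    return unit * 2


if __name__ == "__main__":
    # Toy consensus invented for illustration.
    print(double_repeat("GDVNLQ", linker_length=2))
```

In a design workflow, the placeholder positions would then be the ones left for the design calculation to fill in; here they only mark where sequence information is missing.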
Acknowledgments
We thank the members of the protein production facility at the Institute for Protein Design, Seattle, WA, Sergey Ovchinnikov and Hetu Kamisetty for fruitful discussions on MSA, and James Thompson and Justin Ashworth for SequenceProfile implementation in Rosetta. This work was facilitated through the use of advanced computational, storage and networking infrastructure provided by the Hyak supercomputer system at the University of Washington. For technical assistance and coordination of efforts at
References (54)
Tandem repeats in proteins: from sequence to structure
J Struct Biol
(2012)
et al.
When protein folding is simplified to protein coiling: the continuum of solenoid protein structures
Trends Biochem Sci
(2000)
What curves α-solenoids? Evidence for an α-helical toroid structure of Rpn1 and Rpn2 proteins of the 26S proteasome
J Biol Chem
(2002)
et al.
Modular peptide binding: from a comparison of natural binders to designed armadillo repeat proteins
J Struct Biol
(2014)
et al.
Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins
J Mol Biol
(2003)
et al.
Designed armadillo repeat proteins as general peptide-binding scaffolds: consensus design and computational optimization of the hydrophobic core
J Mol Biol
(2008)
et al.
Design of stable α-helical arrays from an idealized TPR motif
Structure
(2003)
et al.
Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats
J Mol Biol
(2010)
et al.
The contribution of entropy, enthalpy, and hydrophobic desolvation to cooperativity in repeat-protein folding
Structure
(2011)
et al.
Designing repeat proteins: modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family
J Mol Biol
(2003)
Engineering of beta-propeller protein scaffolds by multiple gene duplication and fusion of an idealized WD repeat
Biomol Eng
Artificial leucine rich repeats as new scaffolds for protein design
Bioorg Med Chem Lett
Beyond consensus: statistical free energies reveal hidden interactions in the design of a TPR motif
J Mol Biol
Stabilizing proteins from sequence statistics: the interplay of conservation and correlation in triosephosphate isomerase stability
J Mol Biol
ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules
Methods Enzymol
Comparison of ARM and HEAT protein repeats
J Mol Biol
The high-throughput protein sample production platform of the Northeast Structural Genomics Consortium
J Struct Biol
Crystallization and preliminary X-ray analysis of the 9 kDa protein of the mouse signal recognition particle and the selenomethionyl-SRP9
FEBS Lett
Processing of X-ray diffraction data collected in oscillation mode
PRDB: Protein Repeat DataBase
Proteomics
Modular recognition of nucleic acids by PUF, TALE and PPR proteins
Mol Biosyst
A modular approach to the design of protein-based smart gels
Biopolymers
Consensus-derived structural determinants of the ankyrin repeat motif
Proc Natl Acad Sci
Design of a binding scaffold based on variable lymphocyte receptors of jawless vertebrates by module engineering
Proc Natl Acad Sci
Consensus design of a NOD receptor leucine rich repeat domain with binding affinity for a muramyl dipeptide, a bacterial cell wall fragment
Protein Sci
Evolutionary information for specifying a protein fold
Nature
Design of a novel globular protein fold with atomic-level accuracy
Science
† F.P. and P.-S.H. contributed equally to this work.
1 Present address: S. Caprari, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.