Journal of Molecular Biology
Volume 427, Issue 2, 30 January 2015, Pages 563-575

A General Computational Approach for Repeat Protein Design

https://doi.org/10.1016/j.jmb.2014.11.005

Highlights

  • Several repeat proteins are potential scaffolds for biomaterials and binders.

  • We developed a general computational method for design of repeat proteins.

  • Forty percent of the tested designs were soluble, monomeric and folded with high stability.

  • Crystal structures confirmed the high accuracy of the models.

  • The method can be used to easily generate modular proteins with desired topologies.

Abstract

Repeat proteins have considerable potential for use as modular binding reagents or biomaterials in biomedical and nanotechnology applications. Here we describe a general computational method for building idealized repeats that integrates available family sequences and structural information with Rosetta de novo protein design calculations. Idealized designs from six different repeat families were generated and experimentally characterized; 80% of the proteins were expressed and soluble and more than 40% were folded and monomeric with high thermal stability. Crystal structures determined for members of three families are within 1 Å root-mean-square deviation to the design models. The method provides a general approach for fast and reliable generation of stable modular repeat protein scaffolds.

Introduction

Repeat proteins play key roles in biological processes ranging from adhesion to signaling to defense mechanisms [1]. These proteins consist of adjacent series of usually non-identical repeated amino acid sequences; in most cases, these repeated units fold cooperatively into either a solenoid-shaped or a toroid-shaped structure [2], [3], [4]. Although extremely diverse in structure and sequence, repeat proteins are characterized by short-ranged intra-repeat and inter-repeat interactions between residues [2]. The intrinsic modularity of repeat proteins allows combination of functionalities in a single domain (e.g., recognition motifs for nucleic acids [5] and peptides [6]) and can be used to generate biomaterials with tunable mechanical properties [7]. However, interactions between neighboring repeats are not always conserved; hence arbitrary extension by repeat insertion is not usually possible.

To allow modular extension, designed repeat proteins with self-compatible repeating elements have been generated using consensus-based approaches [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18]. Consensus sequences are defined by the most common residue at each position in a multiple sequence alignment (MSA) of the proteins or repeats in a family. This approach is conceptually simple and powerful but does have non-optimal features. First, the consensus sequence can vary depending on the collection size and the selection method for the sequences used in the alignment. Second, residue–residue packing, particularly critical in the formation of a uniquely defined hydrophobic core, is not considered, and hence in some cases, the consensus may have sub-optimal residue–residue interactions. Incorporating amino acid covariation information derived from statistical analysis of naturally occurring sequences can capture some of these residue–residue coupling effects [19], [20], [21], but reliable estimates of covariance require large numbers of sequences that are not available for all protein families.
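As an illustration of the consensus definition given above, the following minimal Python sketch picks the most common non-gap residue in each column of an alignment. It is not the authors' implementation: the function name `consensus_sequence`, the alphabetical tie-breaking rule and the toy alignment are all assumptions made for this example.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most common non-gap residue at each MSA column.

    `alignment` is a list of equal-length aligned sequences (strings).
    Ties are broken alphabetically; this rule is an assumption of the
    sketch, not something specified in the article.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            # Column contains only gaps: keep the gap character.
            consensus.append(gap)
            continue
        # Highest count first, then alphabetical order for determinism.
        best = min(counts, key=lambda res: (-counts[res], res))
        consensus.append(best)
    return "".join(consensus)


# Hypothetical toy alignment, not taken from any real repeat family.
toy_alignment = [
    "GADVNA",
    "GADVNA",
    "GANINA",
    "GSDV-A",
]
print(consensus_sequence(toy_alignment))
```

Running the same function on different subsets of the toy alignment can return different strings, which is the first limitation noted above: the consensus depends on which sequences are collected.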

Here we describe a general computational approach for repeat protein design that integrates Rosetta de novo structure generation and design methodology with protein family-based sequence and structural information. By automatically generating very low energy design models compatible with the available sequence and structure information, the method provides increased versatility compared to standard sequence consensus-based approaches and reduces the manual intervention required to achieve stable designs.


Results and Discussion

We developed a computational approach that integrates sequence and structural information with Rosetta [22] de novo folding and design calculations for the generation of idealized repeat proteins (Fig. 1). Families with α helical, β and mixed α/β secondary structure were chosen for redesign to illustrate the generality of the method. Sets of sequences were designed for six protein families: ankyrin (ank), armadillo (arm), tetratrico peptide repeat (TPR), HEAT, leucine-rich repeats (LRR) and

Conclusions

The approach presented here generalizes the current MSA-based methods for repeat protein design by automatically integrating sequence, structure and energetic information. Designing backbones de novo avoids potential bias due to the use of a single or few template structures.

Forty percent of the proteins designed with our method were folded and had a melting temperature (Tm) of 57 °C or greater (Table 2). The crystal structures we were able to solve had an RMSD of about 1 Å to the design models.

Generation of sequence and structural constraints

The repeat consensus sequences were obtained from family alignments in the SMART database [29], [33]. For the HEAT family, not present in SMART, the Pfam [30] seed alignment was used. A double-repeat sequence was generated by duplication of the consensus. When the consensus sequence did not cover the whole repeat, connecting fragments were added using alanine residues as placeholders. The length of this linking sequence was based on the shortest connection observed, with at least a 10%
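The duplication step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `double_repeat`, the `linker_length` parameter, and the choice to append the alanine placeholders to each repeat copy are hypothetical, and the article's actual rule for choosing the linker length is not reproduced here.

```python
def double_repeat(consensus, linker_length=0, placeholder="A"):
    """Build a double-repeat sequence from a single-repeat consensus.

    `linker_length` placeholder residues (alanine by default) are
    appended to each copy when the consensus does not span the whole
    repeat. The placement and the length are assumptions of this sketch.
    """
    if linker_length < 0:
        raise ValueError("linker_length must be non-negative")
    unit = consensus + placeholder * linker_length
    return unit * 2


# Hypothetical four-residue consensus with a two-residue linker.
print(double_repeat("GADV", linker_length=2))
```

Keeping the two copies identical is what makes the repeats self-compatible in this sketch, so further copies could be added by multiplying `unit` by a larger number.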

Acknowledgments

We thank the members of the protein production facility at the Institute for Protein Design, Seattle, WA, Sergey Ovchinnikov and Hetu Kamisetty for fruitful discussions on MSA, and James Thompson and Justin Ashworth for SequenceProfile implementation in Rosetta. This work was facilitated through the use of advanced computational, storage and networking infrastructure provided by the Hyak supercomputer system at the University of Washington. For technical assistance and coordination of efforts at

References (54)

  • M. Nikkhah et al. Engineering of beta-propeller protein scaffolds by multiple gene duplication and fusion of an idealized WD repeat. Biomol Eng (2006)
  • H. Baabur-Cohen et al. Artificial leucine rich repeats as new scaffolds for protein design. Bioorg Med Chem Lett (2011)
  • T.J. Magliery et al. Beyond consensus: statistical free energies reveal hidden interactions in the design of a TPR motif. J Mol Biol (2004)
  • B.J. Sullivan et al. Stabilizing proteins from sequence statistics: the interplay of conservation and correlation in triosephosphate isomerase stability. J Mol Biol (2012)
  • A. Leaver-Fay et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol (2011)
  • M.A. Andrade et al. Comparison of ARM and HEAT protein repeats. J Mol Biol (2001)
  • R. Xiao et al. The high-throughput protein sample production platform of the Northeast Structural Genomics Consortium. J Struct Biol (2010)
  • S. Doublié et al. Crystallization and preliminary X-ray analysis of the 9 kDa protein of the mouse signal recognition particle and the selenomethionyl-SRP9. FEBS Lett (1996)
  • Z. Otwinowski et al. [20] Processing of X-ray diffraction data collected in oscillation mode
  • J. Jorda et al. PRDB: Protein Repeat DataBase. Proteomics (2012)
  • A. Filipovska et al. Modular recognition of nucleic acids by PUF, TALE and PPR proteins. Mol Biosyst (2012)
  • T.Z. Grove et al. A modular approach to the design of protein-based smart gels. Biopolymers (2012)
  • L.K. Mosavi et al. Consensus-derived structural determinants of the ankyrin repeat motif. Proc Natl Acad Sci (2002)
  • S.-C. Lee et al. Design of a binding scaffold based on variable lymphocyte receptors of jawless vertebrates by module engineering. Proc Natl Acad Sci (2012)
  • R. Parker et al. Consensus design of a NOD receptor leucine rich repeat domain with binding affinity for a muramyl dipeptide, a bacterial cell wall fragment. Protein Sci (2014)
  • M. Socolich et al. Evolutionary information for specifying a protein fold. Nature (2005)
  • B. Kuhlman et al. Design of a novel globular protein fold with atomic-level accuracy. Science (2003)

F.P. and P.-S.H. contributed equally to this work.

1 Present address: S. Caprari, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.
