Beyond directed evolution—semi-rational protein engineering and design

https://doi.org/10.1016/j.copbio.2010.08.011

Over the past two decades, directed evolution has transformed the field of protein engineering. The advances in understanding protein structure and function, in no insignificant part a result of directed evolution studies, are increasingly empowering scientists and engineers to devise more effective methods for manipulating and tailoring biocatalysts. As the field moves away from large combinatorial libraries, the focus has shifted to small, functionally rich libraries and rational design. Critical to the success of these emerging engineering strategies are computational tools for the evaluation of protein sequence datasets and the analysis of conformational variations of amino acids in proteins. Highlighting the opportunities and limitations of such approaches, this review focuses on recent engineering and design examples that require screening or selection of small libraries.

Introduction

Enzymes are highly versatile and proficient catalysts. Optimized by Darwinian evolution over millions of years, they can greatly accelerate chemical reactions while ensuring high substrate specificity, as well as exquisite enantioselectivity and stereoselectivity. These performance features make biocatalysts attractive candidates for asymmetric synthesis in the laboratory and industrial processes. However, there are often significant discrepancies between an enzyme's function in nature and the specific requirements for ex vivo applications envisioned by scientists and engineers. Enzyme engineering by directed evolution has become the strategy of choice for tailoring the catalytic, biophysical and molecular recognition properties of target proteins [1].

Traditionally, directed evolution relies on an iterative two-step protocol, initially generating molecular diversity by random mutagenesis and in vitro recombination, then identifying library members with improvements in desired phenotype by high-throughput screening or selection. The approach can be problematic as even protein libraries with millions of members still sample only a tiny fraction of the vast sequence space possible for an average protein. Biases in the experimental methods and the degeneracy of the genetic code further skew and restrict the library design [2]. Rather than addressing these problems through bigger libraries and more screening or selection, many researchers are moving beyond traditional directed evolution, instead advocating new strategies for designing smaller, higher quality libraries.
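The scale of the sampling problem described above can be illustrated with a short calculation. The sketch below is not taken from the review: the protein length of 300 residues and the library size of ten million variants are illustrative assumptions, and the function names are hypothetical. It also counts codons per amino acid in the standard genetic code to show one source of the bias mentioned above.

```python
from collections import Counter
from math import log10

# Standard genetic code (NCBI translation table 1); codons are ordered
# with bases T, C, A, G at each of the three positions. '*' marks a stop codon.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def sequence_space(length: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def log10_fraction_sampled(library_size: int, length: int) -> float:
    """log10 of the fraction of sequence space covered by a library.

    Computed in log space because the fraction itself underflows a float
    for proteins of realistic length.
    """
    return log10(library_size) - length * log10(20)


def codons_per_residue() -> Counter:
    """How many of the 64 codons encode each amino acid (or stop)."""
    return Counter(STANDARD_CODE)


if __name__ == "__main__":
    # Assumed values for illustration only: 300 residues, 10 million variants.
    print(f"log10(fraction sampled) = {log10_fraction_sampled(10**7, 300):.1f}")
    # Fully randomizing only a few chosen positions is far more tractable.
    for sites in (2, 3, 4):
        print(f"{sites} randomized sites -> {sequence_space(sites)} variants")
    counts = codons_per_residue()
    print(f"Leu codons: {counts['L']}, Trp codons: {counts['W']}, stops: {counts['*']}")
```

Under these assumptions the library covers roughly one part in 10^383 of sequence space, whereas full randomization of two selected positions yields only 400 variants. The codon counts show why random mutagenesis at the DNA level does not sample amino acids uniformly: leucine is encoded by six codons and tryptophan by one.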

Often referred to as semi-rational, smart or knowledge-based library design, these approaches utilize information on protein sequence, structure and function, as well as computational predictive algorithms, to preselect promising target sites and limited amino acid diversity for protein engineering. The focus on specific amino acid positions translates into dramatically reduced library sizes, while using evolutionary variability, topological constraints and mechanistic features to guide the choice of amino acid identity can result in libraries with higher functional content. In addition to the sequence-based and structure-based design strategies, quantum mechanics (QM) and molecular dynamics (MD) calculations, as well as machine-learning algorithms, have become invaluable tools to effectively explore the impact of amino acid substitutions on protein structure and stability. Together, these concepts offer promising predictors for altering protein features such as substrate specificity, stereoselectivity and stability by enzyme redesign (which leaves the catalytic machinery of the native biocatalyst intact), as well as for the creation of new function by de novo design.
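As a rough illustration of how evolutionary variability might guide both the choice of target sites and the amino acid diversity allowed at each site, the sketch below scores the columns of a toy multiple sequence alignment by Shannon entropy and restricts each variable site to the residues observed there. The alignment, the entropy threshold and all function names are hypothetical, and entropy is only one of several possible variability measures; this is not the procedure of any specific study cited in the review.

```python
from collections import Counter
from math import log2, prod

# Toy alignment of four hypothetical homologs (equal length, '-' = gap).
ALIGNMENT = [
    "MKTAYLV",
    "MKSAYLI",
    "MKTGYLV",
    "MRTAYLL",
]


def column_entropy(column: str) -> float:
    """Shannon entropy in bits of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) * log2(total / n) for n in counts.values())


def variable_sites(alignment, min_entropy=0.5):
    """Return (position, entropy, observed residues) for variable columns.

    Positions are 0-based. Only columns whose entropy reaches the
    threshold are proposed as target sites.
    """
    sites = []
    for pos, column in enumerate(zip(*alignment)):
        h = column_entropy("".join(column))
        if h >= min_entropy:
            observed = sorted(set(column) - {"-"})
            sites.append((pos, h, observed))
    return sites


def library_size(sites) -> int:
    """Variants needed if each site is limited to its observed residues."""
    return prod(len(observed) for _, _, observed in sites)


if __name__ == "__main__":
    sites = variable_sites(ALIGNMENT)
    for pos, h, observed in sites:
        print(f"position {pos}: entropy {h:.2f} bits, residues {observed}")
    print("focused library size:", library_size(sites))
    print("full randomization of the same sites:", 20 ** len(sites))
```

For this toy alignment, four positions pass the threshold and the focused library contains 24 variants, compared with 160,000 for full randomization of the same four positions.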

From a practical perspective, semi-rational protein engineering can significantly increase the efficiency of biocatalyst tailoring. Besides typically requiring fewer iterations to identify variants with the desired phenotype, the generation of small high-quality libraries can largely eliminate the need for high-throughput methods in library analysis. The smaller number of variants also creates new opportunities for the evaluation of library members by protocols not amenable to a high-throughput format. Finally, these design strategies provide an intellectual framework to predict and rationalize experimental findings, taking the field from discovery-based towards hypothesis-driven protein engineering. To highlight the rapidly growing number of successful enzyme engineering studies by semi-rational and computer-guided protein design, this review concentrates (with few exceptions) on recent studies that required libraries of fewer than 1000 members (Table 1).
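To make the practical benefit of small libraries concrete, the following sketch estimates how many clones must be screened to cover a library. It rests on a simplifying assumption that is not stated in the review: every variant is equally likely to be picked, so the expected fraction of distinct variants seen after screening T clones from a library of V variants is approximately 1 - exp(-T/V). The function names are hypothetical.

```python
from math import ceil, exp, log


def expected_coverage(clones_screened: int, library_size: int) -> float:
    """Expected fraction of distinct variants sampled at least once.

    Assumes all variants are equally represented in the library.
    """
    return 1.0 - exp(-clones_screened / library_size)


def clones_for_coverage(library_size: int, coverage: float = 0.95) -> int:
    """Clones to screen so that the expected coverage reaches the target."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be between 0 and 1 (exclusive)")
    return ceil(-library_size * log(1.0 - coverage))


if __name__ == "__main__":
    for size in (400, 1000, 160_000):
        print(f"{size} variants -> screen {clones_for_coverage(size)} clones for 95% coverage")
```

Under this model, 95% coverage requires roughly threefold oversampling, so a library of 1000 variants calls for about 3000 clones, a number that can be handled without high-throughput screening. Real libraries with codon or mutational bias would need more.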

Section snippets

Sequence-based enzyme redesign

A popular strategy to more effectively navigate and identify ‘islands’ of functionality in protein sequence space has been the use of evolutionary information. Multiple sequence alignments (MSAs) and phylogenetic analyses have become standard tools for the exploration of amino acid conservation and ancestral relationships among groups of homologous protein sequences and structures. Whether these statistics are derived from large natural sequence pools or through neutral drift experiments in the

Structure-based enzyme redesign

Protein function is usually intimately linked to three-dimensional structure, making the substitution of one or more amino acids in macromolecules a function of not just sequence context but also structural topology. The rapidly growing number of protein structures in the PDB and advances in homology modeling offer valuable assistance for protein engineers to more effectively locate key residues near active sites and at domain interfaces or hinge regions which can translate into superior

Computational enzyme redesign

Advances in computational protein design algorithms have made in silico modeling a highly promising strategy for the tailoring of biocatalysts. Rather than relying on evolutionary information as a guide for sequence alterations and combinatorial library preparation at the bench, computational methods can effectively estimate the energetics of amino acid variations on the overall protein structure through the use of rotamer libraries and backbone reorganization, hence reducing experimental

Computational de novo enzyme design

In the spirit of Richard Feynman's quote ‘what I cannot create, I do not understand’, the ultimate enzyme engineering challenge is not so much about engineering but rational design. Instead of remodeling an existing enzyme, the creation of biocatalysts from scratch not only offers potential practical benefits in that it empowers scientists and engineers to build synthetic enzymes for any chemical transformation, it also presents a testing ground for our fundamental understanding of the

Concluding remarks

The methodological advances in semi-rational enzyme engineering and de novo enzyme design in recent years provide researchers with powerful and effective new strategies to manipulate biocatalysts. As the examples in this review demonstrate, the integration of sequence and structure-based approaches in library preparation has already proven a potent guide to enzyme redesign. In the case of computational de novo and redesign methods, current models still tend to lag behind laboratory-evolved

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgements

This work was supported partly by the National Institutes of Health (GM69958), the US National Science Foundation (CBET-0730312) and a grant from the Petroleum Research Fund by the American Chemical Society (PRF 47135-AC1). Thanks also to the members of the Lutz lab for helpful comments on the manuscript.

References (40)

  • H.J. Joosten et al. Identification of fungal oxaloacetate hydrolyase within the isocitrate lyase/PEP mutase enzyme superfamily using a sequence marker-based method. Proteins (2008)
  • H. Jochens, U.T. Bornscheuer. Natural diversity to guide focused directed evolution. ChemBioChem (2010), 11:...
  • T. Bosma et al. Biodegradation of 1,2,3-trichloropropane through directed evolution and heterologous expression of a haloalkane dehalogenase gene. Appl Environ Microbiol (2002)
  • P. Banas et al. Mechanism of enhanced conversion of 1,2,3-trichloropropane by mutant haloalkane dehalogenase revealed by molecular modeling. J Comput Aided Mol Des (2006)
  • J. Ehren et al. Protein engineering of improved prolyl endopeptidases for celiac sprue therapy. Protein Eng Des Sel (2008)
  • J. Liao et al. Engineering proteinase K using machine learning and synthetic genes. BMC Biotechnol (2007)
  • J. Minshull et al. Engineered protein function by selective amino acid diversification. Methods (2004)
  • F. Chen et al. Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection. Proc Natl Acad Sci U S A (2010)
  • E. Malito et al. Crystal structure of a Baeyer-Villiger monooxygenase. Proc Natl Acad Sci U S A (2004)
  • M.W. Fraaije et al. Discovery of a thermostable Baeyer-Villiger monooxygenase by genome mining. Appl Microbiol Biotechnol (2005)