Beyond directed evolution—semi-rational protein engineering and design
Introduction
Enzymes are highly versatile and proficient catalysts. Optimized by Darwinian evolution over millions of years, they can greatly accelerate chemical reactions while ensuring high substrate specificity, as well as exquisite enantioselectivity and stereoselectivity. These performance features make biocatalysts attractive candidates for asymmetric synthesis, both in the laboratory and in industrial processes. However, there are often significant discrepancies between an enzyme's function in nature and the specific requirements for ex vivo applications envisioned by scientists and engineers. Enzyme engineering by directed evolution has become the strategy of choice for tailoring the catalytic, biophysical and molecular recognition properties of target proteins [1].
Traditionally, directed evolution relies on an iterative two-step protocol that first generates molecular diversity by random mutagenesis and in vitro recombination and then identifies library members with improvements in the desired phenotype by high-throughput screening or selection. This approach has an inherent limitation: even protein libraries with millions of members sample only a tiny fraction of the vast sequence space available to an average protein. Biases in the experimental methods and the degeneracy of the genetic code further skew and restrict library diversity [2]. Rather than addressing these problems with bigger libraries and more screening or selection, many researchers are moving beyond traditional directed evolution and instead advocate new strategies for designing smaller, higher-quality libraries.
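To put the "tiny fraction" into numbers, the short Python sketch below computes the size of sequence space for a protein of n residues (20^n, handled in log10 to avoid overflow) and the number of variants carrying exactly k substitutions, C(n, k)·19^k. The 300-residue length and the 10^6-member library are illustrative assumptions, not figures from this review.

```python
import math

def log10_sequence_space(length: int, alphabet: int = 20) -> float:
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)

def variants_with_k_substitutions(length: int, k: int, alphabet: int = 20) -> int:
    """Number of distinct variants carrying exactly k amino acid substitutions."""
    # choose k positions, then one of (alphabet - 1) alternative residues at each
    return math.comb(length, k) * (alphabet - 1) ** k

# Illustrative assumptions, not values taken from the review
protein_length = 300
library_size = 10**6

space = log10_sequence_space(protein_length)
print(f"sequence space: ~10^{space:.0f}")
print(f"fraction sampled: ~10^{math.log10(library_size) - space:.0f}")
for k in (1, 2, 3):
    print(k, variants_with_k_substitutions(protein_length, k))
```

With these assumptions the sequence space is roughly 10^390, so a million-member library samples about 10^-384 of it; even the complete set of double mutants (about 1.6 × 10^7 variants) is larger than the library.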
Often referred to as semi-rational, smart or knowledge-based library design, these approaches use information on protein sequence, structure and function, together with computational predictive algorithms, to preselect promising target sites and a limited set of amino acids for diversification. Focusing on specific amino acid positions dramatically reduces library size, while taking evolutionary variability, topological constraints and mechanistic features into account when choosing amino acid identities can yield libraries with a higher functional content. In addition to sequence- and structure-based design strategies, quantum mechanical (QM) and molecular dynamics (MD) calculations, as well as machine-learning algorithms, have become invaluable tools for exploring the impact of amino acid substitutions on protein structure and stability. Together, these concepts offer promising means of altering protein features such as substrate specificity, stereoselectivity and stability through enzyme redesign (while leaving the catalytic machinery of the native biocatalyst intact), as well as of creating new functions by de novo design.
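The library-size savings of focused designs can be estimated directly. The sketch below assumes two widely used degenerate codons, NNK (32 codons covering all 20 amino acids plus one stop codon) and NDT (12 codons for 12 amino acids, no stop codon), and the common oversampling estimate T = -V ln(1 - F) for the number of clones T needed to observe an expected fraction F of V equally likely codon variants. Real libraries deviate from equal representation, so these numbers are best-case estimates.

```python
import math

# Codons per randomized position for two common degenerate schemes
# NNK: N = A/C/G/T, K = G/T -> 4 * 4 * 2 = 32 codons (20 amino acids + 1 stop)
# NDT: D = A/G/T, T fixed   -> 4 * 3 * 1 = 12 codons (12 amino acids, no stop)
CODONS_PER_SITE = {"NNK": 32, "NDT": 12}

def codon_library_size(scheme: str, n_sites: int) -> int:
    """Number of distinct codon combinations when n_sites positions are randomized."""
    return CODONS_PER_SITE[scheme] ** n_sites

def transformants_for_coverage(n_variants: int, coverage: float = 0.95) -> int:
    """Clones to screen so that an expected `coverage` fraction of variants is observed.

    Uses T = -V * ln(1 - F), which assumes all V variants are equally likely.
    """
    return math.ceil(-n_variants * math.log(1.0 - coverage))

for scheme in ("NNK", "NDT"):
    for sites in (1, 2, 3):
        v = codon_library_size(scheme, sites)
        print(scheme, sites, v, transformants_for_coverage(v))
```

Under these assumptions, randomizing two positions with NDT yields 144 codon combinations and about 432 clones for 95% coverage, well within the 1000-member scale discussed here, whereas three NNK positions already require roughly 98,000 clones.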
From a practical perspective, semi-rational protein engineering can significantly increase the efficiency of biocatalyst tailoring. Semi-rational approaches typically require fewer iterations to identify variants with the desired phenotype, and the resulting small, high-quality libraries can largely eliminate the need for high-throughput methods in library analysis. The smaller number of variants also makes it possible to evaluate library members with protocols that are not amenable to a high-throughput format. Finally, these design strategies provide an intellectual framework for predicting and rationalizing experimental findings, moving the field from discovery-based towards hypothesis-driven protein engineering. To highlight the rapidly growing number of successful enzyme engineering studies based on semi-rational and computer-guided protein design, this review concentrates (with few exceptions) on recent studies that required libraries of fewer than 1000 members (Table 1).
Sequence-based enzyme redesign
A popular strategy to more effectively navigate and identify 'islands' of functionality in protein sequence space has been the use of evolutionary information. Multiple sequence alignments (MSAs) and phylogenetic analyses have become standard tools for the exploration of amino acid conservation and ancestral relationships among groups of homologous protein sequences and structures. Whether these statistics are derived from large natural sequence pools or through neutral drift experiments in the …
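One simple way to turn an MSA into a per-position conservation score is the Shannon entropy of each alignment column: 0 bits for a fully conserved position, rising as more residue types occur. The sketch below is a generic illustration, not the scoring used by any tool cited in this review; the four-sequence alignment is hypothetical, and gap characters are simply ignored.

```python
import math
from collections import Counter

def column_entropy(column: str, ignore_gaps: bool = True) -> float:
    """Shannon entropy (bits) of the residues in one alignment column."""
    residues = [r for r in column if not (ignore_gaps and r == "-")]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-position entropy for equal-length aligned sequences (0 = fully conserved)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]

# Hypothetical toy alignment of four homologs (not real sequences)
msa = [
    "MKTAYD",
    "MKSAFD",
    "MRTAWD",
    "MKTAYE",
]
profile = conservation_profile(msa)
for pos, h in enumerate(profile, start=1):
    print(pos, round(h, 2))
```

For this toy alignment, positions 1 and 4 score 0 (invariant), whereas position 5 (Y/F/W/Y) scores 1.5 bits, flagging it as a variable site that could be considered for diversification. Production analyses often also weight sequences to reduce the influence of redundant homologs.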
Structure-based enzyme redesign
Protein function is usually intimately linked to three-dimensional structure, so the effect of substituting one or more amino acids depends not just on sequence context but also on structural topology. The rapidly growing number of protein structures in the PDB and advances in homology modeling help protein engineers locate key residues near active sites and at domain interfaces or hinge regions more effectively, which can translate into superior …
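With a crystal structure or homology model in hand, a common first step is to list the residues that line the active site, for example every residue with an atom within a chosen distance of a bound ligand or substrate analog. The sketch below assumes the coordinates have already been parsed (libraries such as Biopython's Bio.PDB handle PDB parsing); the residue labels, coordinates and the 5 Å cutoff are hypothetical.

```python
import math
from typing import NamedTuple

class Atom(NamedTuple):
    residue: str                     # hypothetical residue label, e.g. "PHE87"
    xyz: tuple[float, float, float]  # Cartesian coordinates in angstroms

def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted labels of residues with any atom within `cutoff` angstroms of any ligand atom."""
    selected = set()
    for atom in protein_atoms:
        # Stop checking this atom as soon as one ligand atom is close enough
        for lig_xyz in ligand_atoms:
            if math.dist(atom.xyz, lig_xyz) <= cutoff:
                selected.add(atom.residue)
                break
    return sorted(selected)

# Hypothetical toy coordinates (angstroms)
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [
    Atom("PHE87", (3.0, 1.0, 0.0)),
    Atom("LEU244", (0.0, 6.0, 0.0)),
    Atom("HIS273", (-4.0, 0.0, 2.0)),
    Atom("ALA12", (10.0, 10.0, 10.0)),
]
print(residues_near_ligand(protein, ligand))               # → ['HIS273', 'PHE87']
print(residues_near_ligand(protein, ligand, cutoff=6.5))   # → ['HIS273', 'LEU244', 'PHE87']
```

The brute-force double loop is adequate for one ligand and a few thousand atoms; for whole-structure searches a spatial index such as a k-d tree is the usual choice.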
Computational enzyme redesign
Advances in computational protein design algorithms have made in silico modeling a highly promising strategy for the tailoring of biocatalysts. Rather than relying on evolutionary information to guide sequence alterations and combinatorial library preparation at the bench, computational methods can estimate the energetic effects of amino acid substitutions on the overall protein structure by using rotamer libraries and modeling backbone reorganization, hence reducing experimental …
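Computational redesign methods commonly represent side chains as discrete rotamers and score a conformation with a pairwise-additive energy: a self term for each rotamer against the fixed parts of the protein plus an interaction term for every pair of rotamers. The sketch below illustrates that abstraction with exhaustive search over a hypothetical three-position problem; the energies are made-up numbers, not a real force field.

```python
from itertools import product

# Hypothetical toy energies (arbitrary units).
# self_energy[i][r]: energy of rotamer r at position i against the fixed protein environment
self_energy = [
    [0.0, 1.0],  # position 0
    [0.5, 0.0],  # position 1
    [0.0, 0.2],  # position 2
]
# pair_energy[(i, j)][ri][rj]: interaction of rotamer ri at i with rotamer rj at j (i < j)
pair_energy = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],  # steric clash if both adopt rotamer 0
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [0.0, 1.5]],
}

def total_energy(assignment):
    """Pairwise-additive energy of one rotamer assignment (tuple of rotamer indices)."""
    e = sum(self_energy[i][r] for i, r in enumerate(assignment))
    for (i, j), table in pair_energy.items():
        e += table[assignment[i]][assignment[j]]
    return e

def best_assignment():
    """Exhaustive search over all rotamer combinations (feasible only for tiny problems)."""
    choices = [range(len(rotamers)) for rotamers in self_energy]
    return min(product(*choices), key=total_energy)

best = best_assignment()
print(best, total_energy(best))
```

The number of combinations is the product of the rotamer counts at every position, so exhaustive search breaks down quickly; design programs such as Rosetta instead rely on stochastic search (e.g. Monte Carlo simulated annealing) or on pruning methods such as dead-end elimination.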
Computational de novo enzyme design
In the spirit of Richard Feynman's quote 'what I cannot create, I do not understand', the ultimate enzyme engineering challenge is not engineering but rational design. Rather than remodeling an existing enzyme, creating biocatalysts from scratch not only offers practical benefits, empowering scientists and engineers to build synthetic enzymes for any chemical transformation, but also provides a testing ground for our fundamental understanding of the …
Concluding remarks
The methodological advances in semi-rational enzyme engineering and de novo enzyme design in recent years provide researchers with powerful and effective new strategies to manipulate biocatalysts. As the examples in this review demonstrate, the integration of sequence- and structure-based approaches in library preparation has already proven a potent guide to enzyme redesign. In the case of computational de novo and redesign methods, current models still tend to lag behind laboratory-evolved …
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
This work was supported partly by the National Institutes of Health (GM69958), the US National Science Foundation (CBET-0730312) and a grant from the Petroleum Research Fund by the American Chemical Society (PRF 47135-AC1). Thanks also to the members of the Lutz lab for helpful comments on the manuscript.
References (40)
- et al. Computational tools for designing and engineering biocatalysts. Curr Opin Chem Biol (2009)
Multiple protein sequence alignment. Curr Opin Struct Biol (2008)
- et al. Induced allostery in the directed evolution of an enantioselective Baeyer-Villiger monooxygenase. Proc Natl Acad Sci U S A (2010)
- et al. Macromolecular modeling with Rosetta. Annu Rev Biochem (2008)
- et al. The diversity challenge in directed protein evolution. Comb Chem High Throughput Screen (2006)
- et al. HotSpot Wizard: a web server for identification of hot spots in protein engineering. Nucleic Acids Res (2009)
- et al. Redesigning dehalogenase access tunnels as a strategy for degrading an anthropogenic substrate. Nat Chem Biol (2009)
- et al. 3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins (2010)
- et al. Correlated mutation analyses on super-family alignments reveal functionally important residues. Proteins (2009)
Identification of fungal oxaloacetate hydrolyase within the isocitrate lyase/PEP mutase enzyme superfamily using a sequence marker-based method. Proteins
Biodegradation of 1,2,3-trichloropropane through directed evolution and heterologous expression of a haloalkane dehalogenase gene. Appl Environ Microbiol
Mechanism of enhanced conversion of 1,2,3-trichloropropane by mutant haloalkane dehalogenase revealed by molecular modeling. J Comput Aided Mol Des
Protein engineering of improved prolyl endopeptidases for celiac sprue therapy. Protein Eng Des Sel
Engineering proteinase K using machine learning and synthetic genes. BMC Biotechnol
Engineered protein function by selective amino acid diversification. Methods
Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection. Proc Natl Acad Sci U S A
Crystal structure of a Baeyer-Villiger monooxygenase. Proc Natl Acad Sci U S A
Discovery of a thermostable Baeyer-Villiger monooxygenase by genome mining. Appl Microbiol Biotechnol