Abstract
Directed evolution and protein engineering approaches that aim to generate novel or enhanced biomolecular function often use the evolutionary sequence diversity of protein homologs to rationally guide library design. Fully capturing this sequence diversity, however, typically requires libraries containing millions of variants. Screening libraries of this size is frequently undesirable because of the inaccuracies of high-throughput assays, cost, and time constraints. The ability to cull sequence diversity effectively while still generating functional diversity within a library therefore holds considerable value. This is particularly relevant when high-throughput assays cannot be used to select or screen for certain biomolecular properties. Here, we summarize our recent attempts to develop an evolution-guided approach, Reconstructing Evolutionary Adaptive Paths (REAP), for directed evolution and protein engineering that exploits phylogenetic and sequence analyses to identify amino acid substitutions that are likely to alter or enhance the function of a protein. To demonstrate the utility of this technique, we highlight our previous work with DNA polymerases, in which a small REAP-designed library was used to identify a DNA polymerase capable of accepting non-standard nucleosides. We anticipate that the REAP approach will be used in the future to facilitate the engineering of biopolymers with expanded functions and will thus have a significant impact on the developing field of ‘evolutionary synthetic biology’.
Acknowledgments
This work was supported by a National Institutes of Health grant to EAG. MFC was supported by an NIH NRSA award and in part by the Emory University Fellowship in Research and Science Teaching (FIRST) program’s NIH/NIGMS IRACDA grant number K12 GM000680-11. This work was also supported by the National Aeronautics and Space Administration’s Exobiology and Astrobiology Programs.
Cite this article
Cole, M.F., Gaucher, E.A. Exploiting Models of Molecular Evolution to Efficiently Direct Protein Engineering. J Mol Evol 72, 193–203 (2011). https://doi.org/10.1007/s00239-010-9415-2
DOI: https://doi.org/10.1007/s00239-010-9415-2