Abstract
Deep mutational scanning has emerged as a promising tool for mapping sequence-activity relationships in proteins1–4, RNA5 and DNA6–8. In this approach, diverse variants of a sequence of interest are first ranked according to their activities in a relevant pooled assay, and this ranking is then used to infer the shape of the fitness landscape around the wild-type sequence. Little is currently know, however, about the degree to which such fitness landscapes are dependent on the specific assay conditions from which they are inferred. To explore this issue, we performed deep mutational scanning of APH(3’)II, a Tn5 transposon-derived kinase that confers resistance to aminoglycoside antibiotics9, in E. coli under selection with each of six structurally diverse antibiotics at a range of inhibitory concentrations. We found that the resulting fitness landscapes showed significant dependence on both antibiotic structure and concentration. This shows that the notion of essential amino acid residues is context-dependent, but also that this dependence can be exploited to guide protein engineering. Specifically, we found that differential analysis of fitness landscapes allowed us to generate synthetic APH(3’)II variants with orthogonal substrate specificities.
To enable efficient generation of comprehensive and relatively unbiased, single amino acid substitution libraries for deep mutational scanning and related applications, we developed a highly multiplexed approach to site-directed mutagenesis that we refer to as Mutagenesis by Integrated TilEs (MITE; Fig. 1). Briefly, we design a set of long DNA oligonucleotides that encode all desired mutations, but are otherwise homologous to the template sequence. The oligonucleotides are organized into ‘tiles’, where those within each tile differ in a central variable region but share identical 5’ and 3’ ends. The tiles are staggered such that their variable regions collectively span the entire template. The oligonucleotides are synthesized on a programmable microarray10 and individual tiles are then PCR amplified using primers complementary to their shared ends. To avoid hybridization and extension of partially overlapping oligonucleotides, the tiles can be split into two non-overlapping pools that are synthesized and amplified separately. Finally, the PCR products from each tile are inserted into linearized plasmids that carry the remaining template sequence using multiplexed sequence- and ligation-independent cloning. Because only a portion of each resulting mutant sequence is derived from the oligonucleotide pool, this approach limits the impact of synthesis errors but still benefits from the cost efficiency of microarray-based DNA synthesis.
We next applied MITE to perform deep mutational scanning of the Tn5 transposon-derived aminoglycoside-3’-phosphotransferase-II (APH(3’)II), a 264 amino acid residues kinase that confers resistance to a variety of aminoglycoside antibiotics9, with the goal of elucidating which residues are essential for its activity and whether these residues are the same for different substrates. We first designed and synthesized six 200 nucleotide (nt) tiles that encoded all possible single amino acid substitutions across APH(3’)II. Each tile contained a 140 nt variable region flanked by 30 nt constant ends. To test whether the cost-efficiency of our method could be further improved by synthesizing multiple mutant libraries in parallel, we also included corresponding tiles for nine homologous proteins, resulting in two non-overlapping pools of 26,250 and 23,666 distinct oligonucleotide sequences (Supplementary Data 1). We found that all six APH(3’)II tiles could be selectively amplified from these pools with minimal optimization of PCR conditions. We then inserted these tiles into plasmids that carried the corresponding remainders of the APH(3’)II coding sequence. Each amplification and multiplexed cloning reaction was performed in duplicate to generate two mutant libraries.
To characterize the resulting libraries, we first shotgun-sequenced their APH(3’)II coding regions to a depth of ∼120,000X (Supplementary Fig. 1). At this depth, we could detect 4,993 of 4,997 possible substitutions and found that the frequency of each mutant amino acid at each position was close to the expected value from an ideal single substitution library (observed median = 0.016%, interquartile range = 0.011%-0.023%, expected = 0.020%). Amino acid mutations caused by single nucleotide substitutions were slightly overrepresented, which is likely due to the combined effects of synthesis- and sequencing-related errors. We next sub-cloned 90 plasmids from the libraries for full-length sequencing of the coding region and found that the majority (90%) of these clones carried a single amino acid substitution, while the rest carried either none (2.2%) or two substitutions (7.8%). To ensure full representation and minimize the impact of synthesis errors, each library was titrated to contain plasmids from ∼107 transformants.
To perform mutational scanning in vivo, we cultured E. coli transformed with APH(3’)II substitution libraries in liquid media supplemented with decreasing concentrations of one of six aminoglycoside antibiotics with diverse structures and a wide range of potencies: kanamycin, ribostamycin, G418, amikacin, neomycin or paromomycin (Supplementary Fig. 2). To normalize the selective pressures induced by the different antibiotics, we first established their minimum inhibitory concentrations (MICs) for E. coli transformed with “wild-type” (WT) APH(3’)II and then used 1:1, 1:2, 1:4 and 1:8 dilutions of WT MICs for the respective library selections. We note that, with the exception of G418 and amikacin, even the 1:8 dilutions were sufficient to inhibit growth of untransformed E. coli. After overnight selection, plasmids were isolated from the surviving cells and their coding regions were shotgun sequenced to compare the amino acid composition of the selected libraries to that of the original input libraries (median coverage = 51,400X; Supplementary Data 2).
We began our analysis by examining changes in the relative abundance of mutant versus WT amino acids at each position after selection with kanamycin (Fig. 2a). At the highest concentration (1:1 WT MIC), we observed a significant depletion of mutant amino acids at approximately 210 of 263 positions (χ2-test at the 5% false discovery rate (FDR) threshold, as determined by the Benjamini-Hochberg procedure12). At lower concentrations, we observed both a decrease in the number of positions with significant depletion of mutations (approximately 155, 130 and 113 at 1:2, 1:4 and 1:8 WT MIC, respectively; 5% FDR) and an increase in the diversity of amino acids that were tolerated at the remaining positions. Conservative substitutions to amino acids with biochemical properties that were similar to the WT residue were on average ∼1.4-fold more likely to be tolerated compared to non-conservative substitutions13, regardless of the kanamycin concentration (Supplementary Fig. 3a). The tolerance for specific substitutions appeared, however, to be both concentration- and position-dependent. Importantly, the observed changes in amino acid frequencies were highly consistent between independent selections (Pearson’s r = 0.76-0.91 across the four kanamycin dilutions; Supplementary Fig. 4).
To better understand the patterns of selection, we projected them onto a previously reported co-crystal structure of APH(3’)II in complex with kanamycin9 (Fig. 2b). These projections show that while mutations are depleted from positions all over the protein at high concentrations, the positions that do not tolerate mutations even at low concentrations tend to cluster internally and near the active site of the enzyme. The projections also provide insights into selection patterns at specific positions. For example, residues 190, 157 and 154 are all aspartates in the vicinity of the active site in the WT enzyme (Supplementary Fig. 3b). At 1:1 WT MIC, we observed depletion of all amino acids except the wild-type aspartates at 190 and 157, while both aspartate and glycine were tolerated at 154. At more moderate selection pressures (1:4 WT MIC), aspartate remained the only amino acid tolerated at 190, aspartate and asparagine were both tolerated at 157, while a wide variety of amino acids, including glycine, valine and arginine, were tolerated at 154. Consistent with its apparent immutability, structural and functional studies have previously identified Asp190 as the catalytic residue in APH(3’)II and it is conserved across all known aminoglycoside phosphotransferases9. The crystal structure also shows that Asp157 forms a stabilizing hydrogen bond with the second ring of kanamycin, which explains why substitutions to amino acids other than the very similar asparginine might be deleterious. In contrast, Asp154 does not appear to directly interact with the substrate, which is consistent with the higher tolerance we observed for non-conservative substitutions at this position. These results support the quantitative and qualitative accuracy of our mutational scanning data, but also suggest that the distribution of relative fitness among the mutants is highly sensitive to the conditions in which they are assayed. We note that if this is a general phenomenon, then establishing quantitative control over selection pressures will be critical to interpretation of deep mutational scanning experiments.
To compare the selection patterns induced by kanamycin in our experiments to those that have molded APH(3’)II over evolutionary timescales, we examined a conservation profile derived from alignment of 133 homologs14. We found positive rank correlations between evolutionary conservation and depletion of mutant amino acids at most positions across the protein (from Spearman’s ρ = 0.51 at ∼1:1 WT MIC to ρ = 0.74 at 1:8 WT MIC; Supplementary Fig. 5a). The most common discrepancies were positions that were relatively conserved, yet highly tolerant of mutations under selection with kanamycin. For example, we identified 11 conserved residues that consistently tolerated non-WT amino acids even at 1:1 WT MIC (Supplementary Fig. 5b). Interestingly, 7 of the 11 residues directly flank the active site, but none of them appear to interact with kanamycin in the WT enzyme. This may reflect that kanamycin is not the substrate, or at least not the only substrate, that has driven the natural evolution of APH(3’)II, and consequently that artificial selection with kanamycin will tolerate or even favor some substitutions that would be deleterious in other contexts.
We next examined the effects of selection using the five other aminoglycosides. These substrates generated concentration- and position-dependent selection patterns that were qualitatively similar to those generated by kanamycin (Supplementary Figs. 6-10). To quantitatively compare the patterns, we ranked the twenty amino acids by the relative changes in their abundances after selection and then computed the rank correlations at each position for each pair of conditions. We found that selection with any two aminoglycosides at matched concentrations showed positive correlations at the majority of positions (for example, median Spearman’s ρ = 0.79 for kanamycin and paromomycin at 1:2 WT MIC and ρ > 0 at 235 of 264 positions at 5% FDR; Supplementary Fig. 11), which implies that the relative fitness values of the mutants were largely similar. We did, however, notice some substitutions that were consistently tolerated under selection with one substrate but depleted under selection with another. We hypothesized that further analysis of these exceptions might identify residues that influence the relative activity of APH(3’)II on different substrates.
To identify such specificity-determining residues, we queried our data for individual substitutions that showed significant depletion after kanamycin selection at 1:2 WT MIC (at 5% FDR) but no trend towards depletion after selection with a second aminoglycoside at its highest concentration. These stringent criteria identified a handful of substitutions that appeared to be well-tolerated in the presence of ribostamycin, G418 or paromomycin but not kanamycin (11, 3 and 4 single substitutions, respectively; Supplementary Fig. 12a; Supplementary Table 1). To confirm, we re-synthesized APH(3’)II variants containing a selection of these substitutions and compared their specificities towards the second aminoglycoside to that of WT APH(3’)II (here, we define specificity as the ratio of the MIC of the second aminoglycoside over the MIC of kanamycin). In six out of nine cases, the tested substitution led to a ≥2-fold increase in specificity towards the second aminoglycoside (range: 2- to 8-fold; see Supplementary Table 2). Moreover, in nine of eleven additional test cases, we found that combining two or more substitutions that favored the same aminoglycoside increased the specificity to a greater extent than any one of the substitutions did alone (range: 4- to 32-fold; Supplementary Table 2). This indicates that our approach accurately identified specificity-determining residues in APH(3’)II and that these residues can have additive effects.
To expand our search for specificity-determining residues, we next looked for substitutions that showed significant depletion after kanamycin selection at 1:2 WT MIC (at 5% FDR) but no trend towards depletion by the second aminoglycoside at its 1:2 WT MIC, or vice versa. These less stringent criteria identified additional candidates in every pairwise comparison (Fig. 3a-e; Supplementary Table 1). Many of these candidate substitutions flanked the active site, particularly at or near residues that interact with the second and third ring of kanamycin in the WT crystal structure (in particular, the α5-α7 and α9-α10 helices). This has been reported as an expected hotspot for specificity-determining residues, because the first ring is constrained by being the phosphate acceptor, while the remaining rings show significant structural diversity among the aminoglycosides9. Candidate substitutions were also frequently found in a relatively flexible part of the N-terminal domain that extends over the phosphate donor (ATP) and aminoglycoside binding sites (in particular, positions 27-57), and in the linker region between the N-terminal domain and the central core (in particular, position 96). These substitutions may influence the presentation of the two substrates near the catalytic residue, but the absence of ATP in the crystal structure limits potential insights from the available model. To confirm the expected functional effects of the identified substitutions, we synthesized and determined the substrate-specificity of five pairs of designed APH(3’)II variants, where each pair incorporated all substitutions that were preferentially tolerated by either kanamycin or the second aminoglycoside in the respective comparison (if multiple candidate substitutions were found at one position, we picked the one that showed the largest difference in abundance). Nine of the ten tested variants showed changes in specificity that favored the targeted substrate (range: 2- to 128-fold; Fig. 3f and Supplementary Table 2). The tenth variant had minimal activity on both substrates, which was likely a consequence of the relatively large number of substitutions in this variant (24 compared to 12 or less in the others). Indeed, in six of the nine successful cases, the expected increase in specificity also came with a concomitant but lesser decrease in absolute activity (as measured by the relevant MIC; Supplementary Table 2), which suggests a precarious balance between the two properties. Strikingly, the two variants that were designed to favor or disfavor paromomycin relative to kanamycin showed a >2,000-fold combined change in their specificities. In this case, a single round of MITE-enabled mutational scanning was therefore sufficient to derive two synthetic enzyme variants with essentially orthogonal substrate specificities from one non-specific progenitor (Fig. 3g).
In summary, MITE coupled with deep sequencing allowed us to perform quantitative mutational scanning of the complete APH(3’)II protein coding sequence with sufficient accuracy to identify individual kinase activity- and substrate specificity-determining residues. We emphasize that structural information or molecular dynamics simulations was not required or directly utilized to identify these residues. Our approach is therefore applicable even when accurate structural models are unavailable or intractable. Assuming that indirect selection strategies can be devised, it can also be extended to proteins and activities that do not confer a direct growth advantage. Notably, we have recently used a similar approach to identify activity- and specificity-determining nucleotides in mammalian gene regulatory elements8. In both cases, we found that combining mutational scanning data from two separate conditions is an effective method for identifying mutations that change the ratio of the activities in the two conditions in the desired direction. Engineering families of biomolecules with orthogonal activities is a central challenge in biotechnology and synthetic biology2,15, but traditional directed evolution is poorly suited to this task due to the difficulty of performing direct selection for the absence of activity in one or more conditions. We therefore expect that performing parallel assays of designed mutant libraries under multiple conditions, followed by integrative analysis and iterative synthesis of new variants, will be a useful strategy for dissecting and engineering a wide variety of biomolecular activities and interactions.
METHODS
Plasmid construction and cloning
All sequences used in this study, including gene synthesis products, oligonucleotide libraries and primers, are provided as Supplementary Data 1. Primers and individual oligonucleotides were synthesized by Integrated DNA Technologies (Coralville, IA), the two 200mer oligonucleotide pools containing mutagenesis tiles were synthesized by Agilent (Santa Clara, CA) as previously described10, the WT APH(3’)II coding region template for library construction was synthesized by GenScript USA (Piscataway, NJ), and the single- and multi-substitution synthetic ORF variants for validation of specificity-determining amino acid residues were synthesized by Gen9 (Cambridge, MA).
To generate a plasmid vector carrying a constitutive EM7 promoter (Life Technologies) and a synthetic gene encoding APH(‘3)II (neo; UniProtKB entry name: KKA2_KLEPN) flanked by two SacI restriction sites, we first combined a DNA fragment containing EM7 (template: EM7_pBR322; primers: EM7_Amp_F/R) with a PCR-linearized pBR322 backbone (template: pBR322; primers: pBR322_EM7_F/R) using the In-Fusion PCR Cloning System (Clontech Laboratories, Mountain View, CA). The resulting plasmid (pBR322[EM7]) was then re-linearized by inverse PCR (primers: pBR322_tetRLin_F/R) and combined with a PCR amplified neo ORF fragment (template: KKA2_KLEPN_opt; primers: KKA2_pBR322_F/R) by In-Fusion to replace the tetR ORF of pBR322 with neo. The final product (pBR322[EM7-neo]) was isolated using QIAprep Spin Miniprep kits (Qiagen, Gaitehrsburg, MD) and verified by Sanger sequencing (primers: pBR322_Seq_F/R).
To generate the APH(‘3)II single-substitution libraries, full-length oligonucleotides were first isolated from the synthesized pools using 10% TBE-Urea polyacrylamide gels (Life Technologies, Carlsbad, CA). Each pair of APH(3’)II tiles and corresponding linearized pBR322[EM7-neo] plasmids were PCR amplified by Herculase II DNA Polymerase (Agilent) with their respective primers (primer prefix: KKA2_TileAmp for tiles, KKA2_LinAmp for backbones), size selected on 1% E-Gel EX agarose gels (Life Technologies), purified with MinElute Gel Extraction kits (Qiagen) and combined using In-Fusion reactions. Each reaction was separately transformed into Stellar chemically competent cells (Clontech), the transformants were grown in LB media with carbenicillin (50 µg/mL) and plasmid DNA libraries were isolated using QIAprep Spin Miniprep kits (Qiagen). Finally, complete substitution libraries were generated by pooling equimolar amounts of the resulting six single-tile plasmid libraries.
To generate plasmids encoding selected single- and multiple-substitution APH(3’)II variants, we amplified the Gen9 synthetic ORFs using Herculase II DNA Polymerase (primers: KA2_pBR322_F/R) and inserted them into a PCR-linearized backbone (primers: pBR322_tetLin_F/R, template: pBR322[EM7]) using In-Fusion reactions. The resulting plasmids (pBR322[EM7-K1] through pBR322[EM7-K44]) were verified by Sanger sequencing and preserved in E. coli as glycerol stock.
Cell culture and selection
To determine the minimum inhibitory concentration (MIC) of each of the six aminoglycosides: kanamycin, ribostamycin, G418, amikacin, neomycin, paromomycin (Sigma-Aldrich, St. Louis, MO) in E. coli expressing WT APH(3’)II, Stellar cells were transformed with pBR322[EM7-neo], recovered in SOC medium (New England Biolabs (NEB), Ipswich, MA) at 37°C for 1 hr, diluted 1:100 with LB and then divided into 96-well growth blocks containing LB with carbenicillin (50 µg/mL) and 2-fold serial dilutions of aminoglycosides. After growth at 37°C with shaking for 24 hrs, the culture densities were assessed by absorbance at 600 nm (A600) using a NanoDrop 8000 (Thermo Scientific, Billerica, MA). The MIC for each aminoglycoside was estimated as the lowest dilution at which A600 was less than 0.025.
To perform mutational scanning, Stellar cells were transformed with 10 ng (∼3 fmol) of mutant library plasmids, recovered in SOC medium at 37°C for 1 hour, diluted into 15 mL LB with carbenicillin (50 µg/mL) and one of the aminoglycosides at 1:1, 1:2, 1:4 or 1:8 dilutions of the following concentrations (estimated 1:1 WT MICs): 225 µg/mL for kanamycin, 2500 µg/mL for ribostamycin, 5 µg/mL for G418, 10 µg/mL for amikacin (see notes in Supplementary Data 1), 40 µg/mL for neomycin and 320 µg/mL for paromomycin. The cultures were incubated in 50 mL tubes at 37°C with shaking for 24 hours, pelleted by centrifugation and frozen at −20°C. Plasmids were isolated from the pellets using QIAprep Spin Miniprep kits (Qiagen). Each transformation and selection was performed in duplicate, using each of the two independently generated mutant libraries.
To establish the substrate specificity of the synthetic APH(3’)II variants relative to WT APH(3’)II, glycerol stocks of the relevant clones (pBR322[EM7-K1] through pBR322[EM7-K44]) were streaked onto LB agar plates and cultured overnight at 37°C. Single colonies were then inoculated into 1 mL LB with 50 µg carbenicillin, incubated at 37°C with shaking for 6 hours, diluted 1:100 into LB media, and then split into 96-well deep well plates containing LB with carbenicillin (50 µg/mL) and 2-fold serial dilutions of aminoglycosides dilution series. We note that the longer recovery in these follow-up experiments increased the absolute MICs compared to those we measured immediately following transformation).
Mut-Seq
To sequence and count mutations, the APH(3’)II coding regions were first isolated from the plasmid pools by SacI digest (NEB) followed by agarose gel purification. The SacI fragments were ligated into high molecular weight concatemers using T4 DNA ligase (NEB). The concatemers were fragmented and converted to sequencing libraries using Nextera DNA Sample Prep kits (Illumina, San Diego CA). Library fragments from the 200-800 nt size range were selected using agarose gels and then sequenced on Illumina MiSeq instruments using 2×150 nt reads. The reads were subsequently aligned to a reference sequence consisting of concatenated copies of the WT APH(3’)II coding region using BWA version 0.5.9-r16 with default parameters16. The number of occurrences of each amino acid at each position along the coding region was then counted and tabulated from the aligned reads. Only codons for which all three sequenced and aligned nucleotides had phred quality scores ≥ 30 were included in these counts.
Computational analysis
Data processing and statistical analysis, including computation of Pearson’s and Spearman’s correlation coefficients, χ2-statistics and associated p-values, were performed using the Enthought Python Distribution (http://www.enthought.com), with IPython17, NumPy version 1.6.1, SciPy version 0.10.1 and matplotlib version 1.1.018. Rendering of the APH(3’)II crystal structure (PDB accession 1ND4) was performed using PyMOL version 1.5 (Schrödinger). The rendering of its secondary structure was derived from PDBsum at EMBL-EBI (http://www.ebi.ac.uk/pdbsum/). The evolutionary conservation profile for APH(3’)II was obtained from ConSurf-DB14.
The magnitude of changes in the abundance of mutant amino acids at each position after selection was estimated as where Mutinput and WTinput are, respectively, the observed counts of mutant and wild-type amino acids at that position in the input library and Mutselected and WTselected are the corresponding observed counts after selection.
The magnitude of changes in the abundance of each specific amino acid at each position after selection was estimated as where AAinput and ¬AAinput are, respectively, the observed counts of that amino acid and all other amino acids at that position in the input library and AAselected and ¬AAselected are the corresponding observed counts after selection.
The statistical significance of a deviation of any ΔMut or ΔAA from 1.0 was estimated using a χ2-test for independence on a 2×2 contingency table that contained the four corresponding counts with a pseudocount of 1 added to each. To correct for multiple hypothesis testing, the Benjamini-Hochberg procedure was applied to identify the 5% false discovery rate (FDR) threshold12. We note that ΔMut or ΔAA values greater than 1.0 do not necessarily indicate higher fitness relative to the WT APH(3’)II (i.e., positive selection), because the majority of ORFs that contributed to the count of WT residues at one position still carried substitutions at other positions.
Substitutions that favored growth under selection with one aminoglycoside relative to another were identified by requiring ΔAA < 1.0 at 5% FDR across two replicates under selection with the first and ΔAA ≥ 1.0 across two replicates for the same substitution under selection with the other, as well as a minimum difference of 0.5 between the log10-transformed ΔAA values, at the concentrations indicated in the main text. These thresholds were established empirically to select a limited number of high confidence candidates.
AUTHOR CONTRIBUTIONS
A.M. and T.S.M. developed the MITE method and designed the study. A. M., P. R., A. G. and T. S. M. performed or supervised the molecular biology experiments. L. W. and T. S. M. performed the cell culture experiments. T.S.M. analyzed the data and wrote the manuscript with substantial input from all authors.
ACCESSION NUMBERS
All sequencing data has been submitted to the NCBI Short Read Archive under accession [pending].
ACKNOWLEDGEMENTS
The authors would like to thank E. M. LeProust and S. Chen for oligonucleotide library synthesis, and X. Zhang, A. Majithia, G. S. Ramachandran, members of the Broad Technology Labs and the members of the Lander laboratory for discussions. This project was supported by funds from the Broad Institute.