SUMMARY
The rapid and escalating spread of SARS coronavirus 2 (SARS-CoV-2) poses an immediate public health emergency. The viral spike protein S binds ACE2 on host cells to initiate molecular events that release the viral genome intracellularly. Soluble ACE2 inhibits entry of both SARS and SARS-2 coronaviruses by acting as a decoy for S binding sites, and is a candidate for therapeutic and prophylactic development. Using deep mutagenesis, variants of ACE2 are identified with increased binding to the receptor binding domain of S. Mutations are found across the interface, in the N90-glycosylation motif, and at buried sites where they are predicted to enhance folding and presentation of the interaction epitope. When single substitutions are combined, large increases in binding can be achieved. The mutational landscape offers a blueprint for engineering high affinity proteins and peptides that block receptor binding sites on S to meet this unprecedented challenge.
In December 2019, a novel zoonotic betacoronavirus closely related to bat coronaviruses spilled over to humans at the Huanan Seafood Market in the Chinese city of Wuhan (1, 2). The virus, called SARS-CoV-2 due to its similarities with the severe acute respiratory syndrome (SARS) coronavirus responsible for a smaller outbreak nearly two decades prior (3, 4), has since spread human-to-human rapidly across the world, precipitating extraordinary containment measures from governments (5). Stock markets have fallen, travel restrictions have been imposed, public gatherings canceled, and large numbers of people are self-isolating. These events are unlike any experienced in generations. Symptoms of coronavirus disease 2019 (COVID-19) range from mild illness to dry cough, fever, pneumonia and death, and SARS-CoV-2 is devastating among the elderly and other vulnerable groups (6, 7).
The S spike glycoprotein of SARS-CoV-2 binds angiotensin-converting enzyme 2 (ACE2) on host cells (2, 8–13). S is a trimeric class I viral fusion protein that is proteolytically processed into S1 and S2 subunits that remain noncovalently associated in a prefusion state (8, 11, 14). Upon engagement of ACE2 by a receptor binding domain (RBD) in S1 (15), conformational rearrangements occur that cause S1 shedding, cleavage of S2 by host proteases, and exposure of a fusion peptide adjacent to the S2’ proteolysis site (14, 16–18). Favorable folding of S to a post-fusion conformation is coupled to host cell/virus membrane fusion and cytosolic release of viral RNA. Atomic contacts with the RBD are restricted to the protease domain of ACE2 (19, 20), and soluble ACE2 (sACE2) in which the neck and transmembrane domains are removed is sufficient for binding S and neutralizing infection (12, 21–24). In principle, the virus has limited potential to escape sACE2-mediated neutralization without simultaneously decreasing affinity for native ACE2 receptors, thereby attenuating virulence. Furthermore, fusion of sACE2 to the Fc region of human immunoglobulin can provide an avidity boost while recruiting immune effector functions and increasing serum stability, an especially desirable quality if intended for prophylaxis (23, 25), and sACE2 has proven safe in healthy human subjects (26) and patients with lung disease (27). Recombinant sACE2 has now been rushed into a clinical trial for COVID-19 in Guangdong province, China (Clinicaltrials.gov #NCT04287686), and peptide derivatives of ACE2 are also being explored as cell entry inhibitors (28).
Since human ACE2 has not evolved to recognize SARS-CoV-2 S, it was hypothesized that mutations may be found that increase affinity for therapeutic and diagnostic applications. The coding sequence of full length ACE2 with an N-terminal c-myc epitope tag was diversified to create a library containing all possible single amino acid substitutions at 117 sites spanning the entire interface with S and lining the substrate-binding cavity. S binding is independent of ACE2 catalytic activity (23) and occurs on the outer surface of ACE2 (19, 20), whereas angiotensin substrates bind within a deep cleft that houses the active site (29). Substitutions within the substrate-binding cleft of ACE2 therefore act as controls that are anticipated to have minimal impact on S interactions, yet may be useful for engineering out substrate affinity to enhance in vivo safety. It is important to note though that catalytically active protein may have desirable effects for replenishing lost ACE2 activity in COVID-19 patients in respiratory distress (30, 31).
The ACE2 library was transiently expressed in human Expi293F cells under conditions that typically yield no more than one coding variant per cell, providing a tight link between genotype and phenotype (32, 33). Cells were then incubated with a subsaturating dilution of medium containing the RBD of SARS-CoV-2 fused C-terminally to superfolder GFP (sfGFP: (34)) (Fig. 1A). Levels of bound RBD-sfGFP correlate with surface expression levels of myc-tagged ACE2 measured by dual color flow cytometry. Compared to cells expressing wild type ACE2 (Fig. 1C), many variants in the ACE2 library fail to bind RBD, while a smaller number of ACE2 variants appear to have higher binding signals (Fig. 1D). Cells expressing ACE2 variants with high or low binding to RBD were collected by fluorescence-activated cell sorting (FACS), referred to as “nCoV-S-High” and “nCoV-S-Low” sorted populations, respectively. During FACS, fluorescence signal for bound RBD-sfGFP continuously declined, requiring the collection gates to be regularly updated to ‘chase’ the relevant populations. This is consistent with RBD dissociating over hours during the experiment. Reported affinities of RBD for ACE2 range from 1 to 15 nM (8, 10).
(A) Media from Expi293F cells secreting the SARS-CoV-2 RBD fused to sfGFP was collected and incubated at different dilutions with Expi293F cells expressing myc-tagged ACE2. Bound RBD-sfGFP was measured by flow cytometry. The dilutions of RBD-sfGFP-containing medium used for FACS selections are indicated by arrows.
(B-C) Expi293F cells were transfected with wild type ACE2 plasmid diluted with a large excess of carrier DNA. It has been previously shown that under these conditions, cells typically acquire no more than one coding plasmid and most cells are negative. Cells were incubated with RBD-sfGFP-containing medium and co-stained with fluorescent anti-myc to detect surface ACE2 by flow cytometry. During analysis, the top 67% (magenta gate) were chosen from the ACE2-positive population (purple gate) (B). Bound RBD was subsequently measured relative to surface ACE2 expression (C).
(D) Expi293F cells were transfected with an ACE2 single site-saturation mutagenesis library and analyzed as in B. During FACS, the top 15% of cells with bound RBD relative to ACE2 expression were collected (nCoV-S-High sort, green gate) and the bottom 20% were collected separately (nCoV-S-Low sort, blue gate).
Transcripts in the sorted populations were deep sequenced, and frequencies of variants were compared to the naive plasmid library to calculate the enrichment or depletion of all 2,340 coding mutations in the library (Fig. 2). This approach of tracking an in vitro selection or evolution by deep sequencing is known as deep mutagenesis (35). Enrichment ratios (Fig. 3A and 3B) and residue conservation scores (Fig. 3D and 3E) closely agree between two independent sort experiments, giving confidence in the data. For the most part, enrichment ratios (Fig. 3C) and conservation scores (Fig. 3F) in the nCoV-S-High sorts are anticorrelated with the nCoV-S-Low sorts, with the exception of nonsense mutations which were appropriately depleted from both gates. This indicates that most, but not all, nonsynonymous mutations in ACE2 did not eliminate surface expression. The library is biased towards solvent-exposed residues and has few substitutions of buried hydrophobics that might have bigger effects on plasma membrane trafficking (33).
Log2 enrichment ratios from the nCoV-S-High sorts are plotted from ≤ −3 (i.e. depleted/deleterious, orange) to neutral (white) to ≥ +3 (i.e. enriched, dark blue). ACE2 primary structure is on the vertical axis, amino acid substitutions are on the horizontal axis. *, stop codon.
(A-B) Log2 enrichment ratios for ACE2 mutations in the nCoV-S-High (A) and nCoV-S-Low (B) sorts closely agree between two independent FACS experiments. Nonsynonymous mutations are black, nonsense mutations are red. Replicate 1 used a 1/40 dilution and replicate 2 used a 1/20 dilution of RBD-sfGFP-containing medium. R² values are for nonsynonymous mutations.
(C) Average log2 enrichment ratios tend to be anticorrelated between the nCoV-S-High and nCoV-S-Low sorts. Nonsense mutations (red) and a small number of nonsynonymous mutations (black) are not expressed at the plasma membrane and are depleted from both sort populations (i.e. fall below the diagonal).
(D-F) Correlation plots of residue conservation scores from replicate nCoV-S-High (D) and nCoV-S-Low (E) sorts, and from the averaged data from both nCoV-S-High sorts compared to both nCoV-S-Low sorts (F). Conservation scores are calculated from the mean of the log2 enrichment ratios for all amino acid substitutions at each residue position.
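The conservation score defined in this caption (the mean of the log2 enrichment ratios over all substitutions at a residue position) can be illustrated with a minimal sketch. The function name, the dictionary layout, and the numbers are assumptions made for illustration; they are not taken from the study's analysis scripts or data, and the sketch does not decide whether stop codons are included in the mean.

```python
from collections import defaultdict
from statistics import mean


def conservation_scores(log2_enrichment):
    """Mean log2 enrichment ratio over all substitutions at each position.

    log2_enrichment maps (position, substituted_residue) -> log2 ratio.
    This layout is a hypothetical choice for the sketch.
    """
    by_position = defaultdict(list)
    for (position, _residue), ratio in log2_enrichment.items():
        by_position[position].append(ratio)
    return {position: mean(values) for position, values in by_position.items()}


# Made-up values for two arbitrary positions; not data from the study.
example = {
    (1, "A"): 2.0, (1, "D"): 1.0, (1, "Q"): 3.0,   # mostly enriched
    (2, "A"): -3.0, (2, "D"): -2.0,                # mostly depleted
}
scores = conservation_scores(example)
print(scores)
```

Under this definition, a strongly negative score marks a position where most substitutions are depleted (conserved for binding), while a positive score marks a hot spot for enriched mutations.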
Mapping the experimental conservation scores from the nCoV-S-High sorts to the structure of RBD-bound ACE2 (19) shows that residues buried in the interface tend to be conserved, whereas residues at the interface periphery or in the substrate-binding cleft are mutationally tolerant (Fig. 4A). The region of ACE2 surrounding the C-terminal end of the ACE2 α1 helix and β3-β4 strands has a weak tolerance of polar residues, while amino acids at the N-terminal end of α1 and the C-terminal end of α2 prefer hydrophobics (Fig. 4B), likely in part to preserve hydrophobic packing between α1-α2. These discrete patches contact the globular RBD fold and a long protruding loop of the RBD, respectively.
(A) Conservation scores from the nCoV-S-High sorts are mapped to the cryo-EM structure (PDB 6M17) of RBD (pale green ribbon) bound to ACE2 (surface). The view at left is looking down the substrate-binding cavity, and only a single protease domain is shown for clarity. Residues conserved for high RBD binding are orange; mutationally tolerant residues are pale colors; residues that are hot spots for enriched mutations are blue; and residues maintained as wild type in the ACE2 library are grey. Glycans are dark red sticks.
(B) Average hydrophobicity-weighted enrichment ratios are mapped to the RBD-bound ACE2 structure, with residues tolerant of polar substitutions in blue, while residues that prefer hydrophobic amino acids are yellow.
(C) A magnified view of part of the ACE2 (colored by conservation score as in A) / RBD (pale green) interface. Accompanying heatmap plots log2 enrichment ratios from the nCoV-S-High sort for substitutions of ACE2-T27, D30 and K31 from ≤ −3 (depleted) in orange to ≥ +3 (enriched) in dark blue.
Two ACE2 residues, N90 and T92, which together form a consensus N-glycosylation motif, are notable hot spots for enriched mutations (Fig. 2 and 4A). Indeed, all substitutions of N90 and T92, with the exception of T92S which maintains the N-glycan, are highly favorable for RBD binding, and the N90-glycan is thus predicted to partially hinder S/ACE2 interaction.
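The reasoning about the glycosylation motif can be made concrete with a short sketch. It assumes the standard N-X-S/T sequon rule (X is any residue except proline) and takes the wild type tripeptide at positions 90-92 to be N-L-T, based on the residue names N90, L91 and T92 used in this paper. The helper names are hypothetical, and the check says nothing about glycosylation efficiency.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
SEQUON = re.compile(r"N[^P][ST]")


def has_sequon(tripeptide):
    """Return True if a three-residue string matches N-X-S/T (X != P)."""
    return SEQUON.fullmatch(tripeptide) is not None


def substitute(motif, index, new_residue):
    """Return the motif with the residue at a 0-based index replaced."""
    return motif[:index] + new_residue + motif[index + 1:]


wild_type = "NLT"  # ACE2 residues N90, L91, T92

variants = {
    "wild type": wild_type,
    "T92S": substitute(wild_type, 2, "S"),  # serine keeps the motif
    "T92Q": substitute(wild_type, 2, "Q"),  # loses the S/T position
    "N90D": substitute(wild_type, 0, "D"),  # loses the asparagine
    "L91P": substitute(wild_type, 1, "P"),  # proline at X breaks the motif
}
for name, motif in variants.items():
    print(name, motif, has_sequon(motif))
```

Only the wild type and T92S retain the motif in this check, which matches the observation that T92S is the exception among otherwise favorable N90 and T92 substitutions.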
Mining the data identifies many ACE2 mutations that are enriched for RBD binding. For instance, there are 122 mutations to 35 positions in the library that have log2 enrichment ratios >1.5 in the nCoV-S-High sort. At least a dozen ACE2 mutations at the structurally characterized interface enhance RBD binding, and will be useful for engineering highly specific and tight binders of SARS-CoV-2 S. The molecular basis for how some of these mutations enhance RBD binding can be rationalized from the RBD-bound cryo-EM structure (Fig. 4C): hydrophobic substitutions of ACE2-T27 increase hydrophobic packing with aromatic residues of S, ACE2-D30E extends an acidic side chain to reach S-K417, and aromatic substitutions of ACE2-K31 contribute to an interfacial cluster of aromatics. However, engineered ACE2 receptors with mutations at the interface may present binding epitopes that are sufficiently different from native ACE2 that virus escape mutants can emerge, or they may be strain specific and lack breadth to bind S in future coronavirus outbreaks.
Instead, attention was drawn to mutations in the second shell and farther from the interface that do not directly contact S but instead have putative structural roles. For example, proline substitutions were enriched at five library positions (S19, L91, T92, T324 and Q325) where they might entropically stabilize the first turns of helices. Proline was also enriched at H34, where it may enforce the central bulge in α1. Multiple mutations were also enriched at buried positions where they will change local packing (e.g. A25V, L29F, W69V, F72Y and L351F). The selection of ACE2 variants for high binding signal therefore not only reports on affinity, but also on presentation at the membrane of folded structure recognized by SARS-CoV-2 S. The presence of enriched structural mutations in the sequence landscape is especially notable considering the ACE2 library was biased towards solvent-exposed positions.
Thirty single substitutions highly enriched in the nCoV-S-High sort were validated by targeted mutagenesis (Fig. 5). Binding of RBD-sfGFP to full length ACE2 mutants increased compared to wild type, yet improvements were small and were most apparent on cells expressing low ACE2 levels (Fig. 5A). To rapidly assess mutations in a format more relevant to therapeutic development, the soluble ACE2 protease domain was fused to sfGFP. Expression levels of sACE2-sfGFP were qualitatively evaluated by fluorescence of the transfected cultures (Fig. 6A), and binding of sACE2-sfGFP to full length S expressed at the plasma membrane was measured by flow cytometry (Fig. 6B). A single substitution (T92Q) that eliminates the N90 glycan gave a small increase in binding signal (Fig. 6B). Focusing on the most highly enriched substitutions in the selection for S binding that were also spatially segregated to minimize negative epistasis (36), combinations of mutations in sACE2 gave large increases in S binding (Table 1 and Fig. 6B). While this assay only provides relative differences, the combinatorial mutants have enhanced binding by at least an order of magnitude, which would give KDs in the picomolar range based on the published KD of wild type ACE2 with S (8, 10). Other combinations of mutations may have even greater effects. While monomeric sACE2 has been extensively studied in vivo (including in man) and is the most likely to advance first to clinical use, fusions of sACE2 to Fc of IgG or IgA classes may provide additional benefits of avidity, recruitment of immune effector functions, and secretion into the respiratory mucosa. Plasmids for these fusions are deposited with Addgene.
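The KD estimate in the paragraph above is simple arithmetic, sketched here under an explicit assumption: that a fold increase in relative binding signal translates directly into the same fold decrease in KD, which the relative assay itself does not establish. The 1 to 15 nM range is the reported wild type range quoted earlier in the text; the function name is hypothetical.

```python
def improved_kd_pm(wild_type_kd_nm, fold_improvement):
    """KD in pM after a fold improvement.

    Assumes the fold gain in binding maps one-to-one onto KD.
    """
    return wild_type_kd_nm * 1000.0 / fold_improvement


reported_wild_type_kd_nm = (1.0, 15.0)  # range quoted in the text
fold = 10.0                             # "at least an order of magnitude"

upper_bounds_pm = [improved_kd_pm(kd, fold) for kd in reported_wild_type_kd_nm]
print(upper_bounds_pm)
```

Because the enhancement is stated as "at least" tenfold, the values computed this way are upper bounds on the estimated KD for each end of the reported wild type range.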
(A) Expi293F cells expressing full length ACE2 were stained with RBD-sfGFP-containing medium and analyzed by flow cytometry. Data are compared between wild type ACE2 (black) and a single mutant (L79T, red). Increased RBD binding is most discernible in cells expressing low levels of ACE2 (blue gate).
(B) RBD-sfGFP binding was measured for 30 single amino acid substitutions in ACE2. Data are GFP mean fluorescence in the low expression gate (blue box in panel A) with background fluorescence subtracted.
(A) Expression of sACE2-sfGFP mutants was qualitatively evaluated by fluorescence of the transfected cell cultures.
(B) Cells expressing full length S were stained with dilutions of sACE2-sfGFP-containing media and binding was analyzed by flow cytometry.
While deep mutagenesis of viral proteins in replicating viruses has been extensively pursued to understand escape mechanisms from drugs and antibodies, the work here shows how deep mutagenesis can be directly applicable to therapeutic design when the selection method is decoupled from virus replication and focused on host factors.
METHODS
Plasmids
The mature polypeptide (a.a. 19-805) of human ACE2 (GenBank NM_021804.1) was cloned into the NheI-XhoI sites of pCEP4 (Invitrogen) with an N-terminal HA leader (MKTIIALSYIFCLVFA), myc-tag, and linker (GSPGGA). Soluble sACE2 fused to superfolder GFP (34) was constructed by genetically joining the protease domain (a.a. 1-615) of ACE2 to sfGFP (GenBank ASL68970) via a gly/ser-rich linker (GSGGSGSGG), and inserting between the NheI-XhoI sites of pcDNA3.1(+) (Invitrogen). A synthetic human codon-optimized gene fragment (Integrated DNA Technologies) for the RBD (a.a. 333-529) of SARS-CoV-2 S (GenBank YP_009724390.1) was N-terminally fused to an HA leader and C-terminally fused to superfolder GFP (34) and ligated into the NheI-XhoI sites of pcDNA3.1(+). Human codon-optimized full length S (a.a. 16-1273) was subcloned from pUC57-2019-nCoV-S(Human) (Molecular Cloud) with an N-terminal HA leader (MKTIIALSYIFCLVFA), myc-tag, and linker (GSPGGA).
Tissue Culture
Expi293F cells (ThermoFisher) were cultured in Expi293 Expression Medium (ThermoFisher) at 125 rpm, 8 % CO2, 37 °C. For production of RBD-sfGFP, cells were prepared to 2 × 10⁶ / ml. Per ml of culture, 500 ng of pcDNA3-RBD-sfGFP and 3 μg of polyethylenimine (MW 25,000; Polysciences) were mixed in 100 μl of OptiMEM (Gibco), incubated for 20 minutes at room temperature, and added to cells. Transfection Enhancers (ThermoFisher) were added 19 h post-transfection, and cells were cultured for 110 h. Cells were removed by centrifugation at 800 × g for 5 minutes and medium was stored at −20 °C. After thawing and immediately prior to use, remaining cell debris and precipitates were removed by centrifugation at 20,000 × g for 5 minutes. Soluble ACE2-sfGFP protein was produced by the same protocol with the following modifications: Transfection Enhancers were added 22.5 h post-transfection, and medium supernatant was harvested after 60 h.
Deep mutagenesis
117 residues within the protease domain of ACE2 were diversified by overlap extension PCR (37) using primers with degenerate NNK codons. The plasmid library was transfected into Expi293F cells using Expifectamine under conditions previously shown to typically give no more than a single coding variant per cell (32, 33); 1 ng coding plasmid was diluted with 1,500 ng pCEP4-ΔCMV carrier plasmid per ml of cell culture at 2 × 10⁶ / ml, and the medium was replaced 2 h post-transfection. The cells were collected after 24 h, washed with ice-cold PBS supplemented with 0.2 % bovine serum albumin (PBS-BSA), and incubated for 30 minutes on ice with a 1/20 (replicate 1) or 1/40 (replicate 2) dilution of medium containing RBD-sfGFP into PBS-BSA. Cells were co-stained with anti-myc Alexa 647 (clone 9B11, 1/250 dilution; Cell Signaling Technology). Cells were washed twice with PBS-BSA, and sorted on a BD FACS Aria II at the Roy J. Carver Biotechnology Center. The main cell population was gated by forward/side scattering to remove debris and doublets, and DAPI was added to the sample to exclude dead cells. Of the myc-positive (Alexa 647) population, the top 67% were gated (Fig. 1B). Of these, the 15% of cells with the highest and 20% of cells with the lowest GFP fluorescence were collected (Fig. 1D) in tubes coated overnight with fetal bovine serum and containing Expi293 Expression Medium. Total RNA was extracted from the collected cells using a GeneJET RNA purification kit (Thermo Scientific), and cDNA was reverse transcribed with high fidelity Accuscript (Agilent) primed with gene-specific oligonucleotides. Diversified regions of ACE2 were PCR amplified as 5 fragments. Flanking sequences on the primers added adapters to the ends of the products for annealing to Illumina sequencing primers, unique barcoding, and for binding the flow cell. Amplicons were sequenced on an Illumina NovaSeq 6000 using a 2×250 nt paired end protocol.
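The NNK diversification named at the start of this section can be checked with a short enumeration. The sketch assumes the standard genetic code and the usual IUPAC meanings (N = A, C, G or T; K = G or T). The final line shows one way to arrive at the 2,340 coding mutations quoted in the main text, assuming each of the 117 sites contributes 19 amino acid substitutions plus one stop codon; that breakdown is an interpretation, not a statement from the methods.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate bases used in an NNK codon.
N = "ACGT"
K = "GT"

nnk_codons = ["".join(codon) for codon in product(N, N, K)]
encoded = [CODON_TABLE[codon] for codon in nnk_codons]
amino_acids = sorted(set(encoded) - {"*"})
stop_codons = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]

print(len(nnk_codons), "codons encode", len(amino_acids), "amino acids")
print("stop codons reachable with NNK:", stop_codons)

# Assumed breakdown: 19 substitutions plus one stop codon per diversified site.
sites = 117
library_size = sites * (19 + 1)
print("coding mutations in library:", library_size)
```

NNK reaches every amino acid while admitting only one of the three stop codons, which is why nonsense mutations still appear in the library and can serve as internal controls in the sorts.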
Data were analyzed using Enrich (38), and commands are provided in the GEO deposit. Briefly, the frequencies of ACE2 variants in the transcripts of the sorted populations were compared to their frequencies in the naive plasmid library to calculate an enrichment ratio.
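The enrichment ratio described above can be sketched as follows. This is a minimal illustration of comparing variant frequencies between a sorted population and the naive library; it is not a reproduction of Enrich or of the commands in the GEO deposit. The function name, variant labels and read counts are hypothetical, and variants with zero reads in either sample are simply skipped here.

```python
from math import log2


def log2_enrichment(selected_counts, naive_counts):
    """log2(frequency in sorted population / frequency in naive library).

    Variants with zero reads in either sample are skipped in this sketch.
    """
    selected_total = sum(selected_counts.values())
    naive_total = sum(naive_counts.values())
    ratios = {}
    for variant, naive in naive_counts.items():
        selected = selected_counts.get(variant, 0)
        if naive == 0 or selected == 0:
            continue
        selected_freq = selected / selected_total
        naive_freq = naive / naive_total
        ratios[variant] = log2(selected_freq / naive_freq)
    return ratios


# Made-up read counts for illustration only; not data from the study.
naive = {"wild_type": 800, "variant_A": 100, "variant_B": 100}
sorted_high = {"wild_type": 1200, "variant_A": 800, "variant_B": 0}

ratios = log2_enrichment(sorted_high, naive)

# The main text uses log2 enrichment > 1.5 as an example cutoff for enriched mutations.
enriched = [variant for variant, ratio in ratios.items() if ratio > 1.5]
print(ratios)
print(enriched)
```

A positive value means a variant is more frequent after sorting than in the naive library, and a negative value means it is depleted.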
Flow Cytometry Analysis of ACE2-S Binding
Expi293F cells were transfected with pcDNA3-myc-ACE2 or pcDNA3-myc-S plasmids (500 ng DNA per ml of culture at 2 × 10⁶ / ml) using Expifectamine (ThermoFisher). Cells were analyzed by flow cytometry 24 h post-transfection. To analyze binding of RBD-sfGFP to full length myc-ACE2, cells were washed with ice-cold PBS-BSA, and incubated for 30 minutes on ice with a 1/30 dilution of medium containing RBD-sfGFP and a 1/240 dilution of anti-myc Alexa 647 (clone 9B11, Cell Signaling Technology). Cells were washed twice with PBS-BSA and analyzed on a BD LSR II. To analyze binding of sACE2-sfGFP to full length myc-S, cells were washed with PBS-BSA, and incubated for 30 minutes on ice with a serial dilution of medium containing sACE2-sfGFP and a 1/240 dilution of anti-myc Alexa 647 (clone 9B11, Cell Signaling Technology). Cells were washed twice with PBS-BSA and analyzed on a BD Accuri C6. Data were processed with FCS Express (De Novo Software).
Reagent and data availability
Plasmids are deposited with Addgene under IDs 141183-5 and 145145-78. Raw and processed deep sequencing data are deposited in NCBI’s Gene Expression Omnibus (GEO) with series accession no. GSE147194.
CONFLICT OF INTEREST STATEMENT
E.P. is the inventor on a provisional patent filing by the University of Illinois covering aspects of this work. E.P. is a cofounder of Orthogonal Biologics, Inc.
ACKNOWLEDGEMENTS
Staff at the UIUC Roy J. Carver Biotechnology Center assisted with FACS and Illumina sequencing. Hannah Choi assisted with plasmid preparation. The development of deep mutagenesis to study virus-receptor interactions was supported by NIH award R01AI129719.
Footnotes
New data has been added in Figures 5 and 6 to show that mutations can be combined to substantially increase the binding of soluble ACE2 to the spike of SARS-CoV-2 by at least an order of magnitude.