SUMMARY
The rapid and escalating spread of SARS coronavirus 2 (SARS-CoV-2) poses an immediate public health emergency, and no approved therapeutics or vaccines are currently available. The viral spike protein S binds ACE2 on host cells to initiate molecular events that release the viral genome intracellularly. Soluble ACE2 inhibits entry of both SARS and SARS-2 coronaviruses by acting as a decoy for S binding sites, and is a candidate for therapeutic and prophylactic development. Using deep mutagenesis, variants of ACE2 are identified with increased binding to the receptor binding domain of S at a cell surface. Mutations are found across the interface and also at buried sites where they are predicted to enhance folding and presentation of the interaction epitope. The N90-glycan on ACE2 hinders association. The mutational landscape offers a blueprint for engineering high affinity ACE2 receptors to meet this unprecedented challenge.
In December 2019, a novel zoonotic betacoronavirus closely related to bat coronaviruses spilled over to humans at the Huanan Seafood Market in the Chinese city of Wuhan (1, 2). The virus, called SARS-CoV-2 due to its similarities with the severe acute respiratory syndrome (SARS) coronavirus responsible for a smaller outbreak nearly two decades prior (3, 4), has since spread human-to-human rapidly across the world, precipitating extraordinary containment measures from governments (5). Stock markets have fallen, travel restrictions have been imposed, public gatherings have been canceled, and large numbers of people are quarantined. These events are unlike any experienced in generations. Symptoms of coronavirus disease 2019 (COVID-19) range from mild to dry cough, fever, pneumonia and death, and SARS-CoV-2 is devastating among the elderly and other vulnerable groups (6, 7).
The S spike glycoprotein of SARS-CoV-2 binds angiotensin-converting enzyme 2 (ACE2) on host cells (2, 8-13). S is a trimeric class I viral fusion protein that is proteolytically processed into S1 and S2 subunits that remain noncovalently associated in a prefusion state (8, 11, 14). Upon engagement of ACE2 by a receptor binding domain (RBD) in S1 (15), conformational rearrangements occur that cause S1 shedding, cleavage of S2 by host proteases, and exposure of a fusion peptide adjacent to the S2’ proteolysis site (14, 16-18). Favorable folding of S to a post-fusion conformation is coupled to host cell/virus membrane fusion and cytosolic release of viral RNA. Atomic contacts with the RBD are restricted to the protease domain of ACE2 (19, 20), and soluble ACE2 (sACE2) in which the neck and transmembrane domains are removed is sufficient for binding S and neutralizing infection (12, 21-23). In principle, the virus has limited potential to escape sACE2-mediated neutralization without simultaneously decreasing affinity for native ACE2 receptors, thereby attenuating virulence. Furthermore, fusion of sACE2 to the Fc region of human immunoglobulin can provide an avidity boost while recruiting immune effector functions and increasing serum stability, an especially desirable quality if intended for prophylaxis (23, 24), and sACE2 has proven safe in healthy human subjects (25) and patients with lung disease (26). Recombinant sACE2 has now been rushed into a clinical trial for COVID-19 in Guangdong province, China (Clinicaltrials.gov #NCT04287686).
Since human ACE2 has not evolved to recognize SARS-CoV-2 S, it was hypothesized that mutations may be found that increase affinity for therapeutic and diagnostic applications. The coding sequence of full length ACE2 with an N-terminal c-myc epitope tag was diversified to create a library containing all possible single amino acid substitutions at 117 sites spanning the entire interface with S and lining the substrate-binding cavity. S binding is independent of ACE2 catalytic activity (23) and occurs on the outer surface of ACE2 (19, 20), whereas angiotensin substrates bind within a deep cleft that houses the active site (27). Substitutions within the substrate-binding cleft of ACE2 therefore act as controls that are anticipated to have minimal impact on S interactions, yet may be useful for engineering out substrate affinity to enhance in vivo safety. It is important to note though that catalytically active protein may have desirable effects for replenishing lost ACE2 activity in COVID-19 patients in respiratory distress (28, 29).
The ACE2 library was transiently expressed in human Expi293F cells under conditions that typically yield no more than one coding variant per cell, providing a tight link between genotype and phenotype (30, 31). Cells were then incubated with a subsaturating dilution of medium containing the RBD of SARS-CoV-2 fused C-terminally to superfolder GFP (sfGFP (32)) (Fig. 1A). Levels of bound RBD-sfGFP correlate with surface expression levels of myc-tagged ACE2 measured by dual color flow cytometry. Compared to cells expressing wild type ACE2 (Fig. 1C), many variants in the ACE2 library fail to bind RBD, while a smaller number of ACE2 variants appear to have higher binding signals (Fig. 1D). Cells expressing ACE2 variants with high or low binding to RBD were collected by fluorescence-activated cell sorting (FACS), referred to as “nCoV-S-High” and “nCoV-S-Low” sorted populations, respectively. During FACS, fluorescence signal for bound RBD-sfGFP continuously declined, requiring the collection gates to be regularly updated to ‘chase’ the relevant populations. This is consistent with RBD dissociating over hours during the experiment. Reported affinities of RBD for ACE2 range from 1 to 15 nM (8, 10).
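The nested sort gates can be illustrated with a short Python sketch. It uses the gate fractions stated in the Methods (the top 67% of myc-positive cells, then the highest 15% and lowest 20% by GFP fluorescence). The simulated fluorescence values, the function name `sort_gates`, and the rank-based gating rule are assumptions made for illustration; they do not reproduce the gating performed on the instrument.

```python
import random

def sort_gates(cells, myc_top=0.67, gfp_high=0.15, gfp_low=0.20):
    """Split (myc, gfp) tuples into hypothetical High and Low sort populations.

    cells: list of (myc_signal, gfp_signal) pairs for myc-positive cells.
    """
    # Keep the fraction of cells with the highest myc (surface ACE2) signal.
    by_myc = sorted(cells, key=lambda c: c[0], reverse=True)
    expressing = by_myc[: round(len(by_myc) * myc_top)]

    # Rank the expressing cells by bound RBD-sfGFP signal (ascending).
    by_gfp = sorted(expressing, key=lambda c: c[1])
    n_low = round(len(by_gfp) * gfp_low)
    n_high = round(len(by_gfp) * gfp_high)
    low = by_gfp[:n_low]
    high = by_gfp[len(by_gfp) - n_high:]
    return high, low

random.seed(0)
# Simulated log-normal fluorescence values (made-up data, illustration only).
cells = [(random.lognormvariate(0, 1), random.lognormvariate(0, 1))
         for _ in range(10_000)]
high, low = sort_gates(cells)
print(len(high), len(low))
```

With these fractions, the High gate holds roughly 10% (0.67 × 0.15) and the Low gate roughly 13% (0.67 × 0.20) of the myc-positive cells.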
Transcripts in the sorted populations were deep sequenced, and frequencies of variants were compared to the naive plasmid library to calculate the enrichment or depletion of all 2,340 coding mutations in the library (Fig. 2). This approach of tracking an in vitro selection or evolution by deep sequencing is known as deep mutagenesis (33). Enrichment ratios (Fig. 3A and 3B) and residue conservation scores (Fig. 3D and 3E) closely agree between two independent sort experiments, giving confidence in the data. For the most part, enrichment ratios (Fig. 3C) and conservation scores (Fig. 3F) in the nCoV-S-High sorts are anticorrelated with the nCoV-S-Low sorts, with the exception of nonsense mutations which were appropriately depleted from both gates. This indicates that most, but not all, nonsynonymous mutations in ACE2 did not eliminate surface expression. The library is biased towards solvent-exposed residues and has few substitutions of buried hydrophobics that might have bigger effects on plasma membrane trafficking (31).
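The enrichment calculation described above can be sketched in Python as follows. This is a minimal illustration, not the Enrich implementation: the read counts are invented, the function name `log2_enrichment` is hypothetical, and the pseudocount is an assumption added so that variants with zero reads do not cause a division by zero.

```python
import math

def log2_enrichment(sorted_counts, naive_counts, pseudocount=0.5):
    """Return {variant: log2(frequency in sorted / frequency in naive)}.

    The pseudocount is an assumption of this sketch, not a documented
    parameter of the analysis used in the study.
    """
    n_sorted = sum(sorted_counts.values())
    n_naive = sum(naive_counts.values())
    ratios = {}
    for variant, naive_reads in naive_counts.items():
        sorted_reads = sorted_counts.get(variant, 0)
        f_sorted = (sorted_reads + pseudocount) / n_sorted
        f_naive = (naive_reads + pseudocount) / n_naive
        ratios[variant] = math.log2(f_sorted / f_naive)
    return ratios

# Made-up read counts, for illustration only (not data from this study).
naive = {"WT": 1000, "T27Y": 100, "N90Q": 100, "K31*": 100}
high = {"WT": 1000, "T27Y": 400, "N90Q": 300, "K31*": 2}
scores = log2_enrichment(high, naive)
for variant, score in scores.items():
    print(f"{variant}: {score:+.2f}")
```

A positive score means the variant is more frequent in the sorted population than in the naive library, and a negative score means it is depleted.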
Mapping the experimental conservation scores from the nCoV-S-High sorts to the structure of RBD-bound ACE2 (19) shows that residues buried in the interface tend to be conserved, whereas residues at the interface periphery or in the substrate-binding cleft are mutationally tolerant (Fig. 4A). The region of ACE2 surrounding the C-terminal end of the ACE2 α1 helix and β3-β4 strands has a weak tolerance of polar residues, while amino acids at the N-terminal end of α1 and the C-terminal end of α2 prefer hydrophobics (Fig. 4B), likely in part to preserve hydrophobic packing between α1-α2. These discrete patches contact the globular RBD fold and a long protruding loop of the RBD, respectively.
Two ACE2 residues, N90 and T92 that together form a consensus N-glycosylation motif, are notable hot spots for enriched mutations (Fig. 2 and 4A). Indeed, all substitutions of N90 and T92, with the exception of T92S which maintains the N-glycan, are highly favorable for RBD binding, and the N90-glycan is thus predicted to partially hinder S/ACE2 interaction.
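The logic behind the N90/T92 observation can be checked with a short Python sketch. It assumes the standard N-glycosylation consensus rule N-X-S/T, where X is any residue except proline, and applies it to ACE2 residues 90-92 (N, L, T). The function name `has_n_glycan_sequon` is hypothetical, and the rule is a sequence-level approximation that says nothing about actual glycan occupancy.

```python
def has_n_glycan_sequon(tripeptide):
    """True if a 3-residue string matches N-X-S/T with X not proline."""
    n, x, st = tripeptide
    return n == "N" and x != "P" and st in ("S", "T")

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
wild_type = "NLT"  # ACE2 residues N90, L91, T92

# Substitutions of T92 that leave the consensus motif intact.
kept_at_92 = [aa for aa in AMINO_ACIDS
              if aa != "T" and has_n_glycan_sequon("NL" + aa)]
# Substitutions of N90 that leave the consensus motif intact.
kept_at_90 = [aa for aa in AMINO_ACIDS
              if aa != "N" and has_n_glycan_sequon(aa + "LT")]

print(has_n_glycan_sequon(wild_type))  # True
print(kept_at_92)                      # ['S']
print(kept_at_90)                      # []
```

Under this rule, T92S is the only substitution at either position that preserves the motif, in line with it being the one exception noted above.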
Mining the data identifies many ACE2 mutations that are enriched for RBD binding. For instance, there are 122 mutations to 35 positions in the library that have log2 enrichment ratios >1.5 in the nCoV-S-High sort. At least a dozen ACE2 mutations at the structurally characterized interface enhance RBD binding, and may be useful for engineering highly specific and tight binders of SARS-CoV-2 S, especially for point-of-care diagnostics. The molecular basis for how some of these mutations enhance RBD binding can be rationalized from the RBD-bound cryo-EM structure (Fig. 4C): hydrophobic substitutions of ACE2-T27 increase hydrophobic packing with aromatic residues of S, ACE2-D30E extends an acidic side chain to reach S-K417, and aromatic substitutions of ACE2-K31 contribute to an interfacial cluster of aromatics. However, engineered ACE2 receptors with mutations at the interface may present binding epitopes that are sufficiently different from native ACE2 that virus escape mutants can emerge, or they may be strain specific and lack breadth.
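Counting enriched mutations and the positions they fall at, as in the tally above, amounts to a threshold filter over the enrichment table. The Python sketch below shows one way to do this. The scores are invented for illustration, and the function name `enriched_hits` and the mutation-name format (wild type residue, position, substituted residue or `*` for stop) are assumptions.

```python
import re

def enriched_hits(scores, threshold=1.5):
    """Return (mutations scoring above threshold, set of their positions)."""
    hits = {m: s for m, s in scores.items() if s > threshold}
    # Parse the residue number out of names such as "T27Y" or "K31*".
    positions = {int(re.fullmatch(r"[A-Z](\d+)[A-Z*]", m).group(1))
                 for m in hits}
    return hits, positions

# Made-up log2 enrichment ratios, for illustration only (not study data).
scores = {"T27Y": 2.1, "T27W": 1.8, "D30E": 1.6, "K31F": 1.2,
          "N90Q": 2.4, "T92Q": 2.0, "K31*": -3.0}
hits, positions = enriched_hits(scores)
print(len(hits), "mutations at", len(positions), "positions:", sorted(positions))
```

Several mutations at the same residue count once toward the position total, which is why the number of positions is smaller than the number of mutations.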
Instead, attention was drawn to mutations in the second shell and farther from the interface that do not directly contact S but instead have putative structural roles. For example, proline substitutions were enriched at five library positions (S19, L91, T92, T324 and Q325) where they might entropically stabilize the first turns of helices. Proline was also enriched at H34, where it may enforce the central bulge in α1. Multiple mutations were also enriched at buried positions where they will change local packing (e.g. A25V, L29F, W69V, F72Y and L351F). The selection of ACE2 variants for high binding signal therefore not only reports on affinity, but also on presentation at the membrane of folded structure recognized by SARS-CoV-2 S. The presence of enriched structural mutations in the sequence landscape is especially notable considering the ACE2 library was biased towards solvent-exposed positions.
Deep mutational scans in human cells have errors (34), and it is unclear how large an effect an enriched mutation in a selection will have when introduced in a purified protein. Mutations of interest for ACE2 engineering will need careful assessment by targeted mutagenesis, as well as considerations on how best to combine mutations for production of conformationally-stable, high affinity sACE2. Other considerations will be whether to fuse sACE2 to Fc of IgG1 or IgA1 to evoke specialized immune effector functions, or to fuse with albumin to boost serum stability without risking an excessive inflammatory response. These are unknowns.
While deep mutagenesis of viral proteins in replicating viruses has been extensively pursued to understand escape mechanisms from drugs and antibodies, the work here shows how deep mutagenesis can be directly applicable to therapeutic design when the selection method is decoupled from virus replication and focused on host factors.
METHODS
Plasmids
The mature polypeptide (a.a. 19-805) of human ACE2 (GenBank NM_021804.1) was cloned into the NheI-XhoI sites of pCEP4 (Invitrogen) with an N-terminal HA leader (MKTIIALSYIFCLVFA), myc-tag, and linker (GSPGGA). A synthetic human codon-optimized gene fragment (Integrated DNA Technologies) for the RBD (a.a. 333-529) of SARS-CoV-2 S (GenBank YP_009724390.1) was N-terminally fused to an HA leader and C-terminally fused to superfolder GFP (32) and ligated into the NheI-XhoI sites of pcDNA3.1(+) (Invitrogen).
Tissue Culture
Expi293F cells (ThermoFisher) were cultured in Expi293 Expression Medium (ThermoFisher) at 125 rpm, 8 % CO2, 37 °C. For production of RBD-sfGFP, cells were prepared to 2 × 10^6 / ml. Per ml of culture, 500 ng of pcDNA3-RBD-sfGFP and 3 μg of polyethylenimine (MW 25,000; Polysciences) were mixed in 100 μl of OptiMEM (Gibco), incubated for 20 minutes at room temperature, and added to cells. Transfection Enhancers (Thermo Fisher) were added 19 h post-transfection, and cells were cultured for 110 h. Cells were removed by centrifugation at 800 × g for 5 minutes and medium was stored at -20 °C. After thawing and immediately prior to use, remaining cell debris and precipitates were removed by centrifugation at 20,000 × g for 5 minutes.
Deep mutagenesis
117 residues within the protease domain of ACE2 were diversified by overlap extension PCR (35) using primers with degenerate NNK codons. The plasmid library was transfected into Expi293F cells using Expifectamine under conditions previously shown to typically give no more than a single coding variant per cell (30, 31); 1 ng coding plasmid was diluted with 1,500 ng pCEP4-ΔCMV carrier plasmid per ml of cell culture at 2 × 10^6 / ml, and the medium was replaced 2 h post-transfection. The cells were collected after 24 h, washed with ice-cold PBS supplemented with 0.2 % bovine serum albumin (PBS-BSA), and incubated for 30 minutes on ice with a 1/20 (replicate 1) or 1/40 (replicate 2) dilution of medium containing RBD-sfGFP into PBS-BSA. Cells were co-stained with anti-myc Alexa 647 (clone 9B11, 1/250 dilution; Cell Signaling Technology). Cells were washed twice with PBS-BSA, and sorted on a BD FACS Aria II at the Roy J. Carver Biotechnology Center. The main cell population was gated by forward/side scattering to remove debris and doublets, and DAPI was added to the sample to exclude dead cells. Of the myc-positive (Alexa 647) population, the top 67 % were gated (Fig. 1B). Of these, the 15 % of cells with the highest and 20 % of cells with the lowest GFP fluorescence were collected (Fig. 1D) in tubes coated overnight with fetal bovine serum and containing Expi293 Expression Medium. Total RNA was extracted from the collected cells using a GeneJET RNA purification kit (Thermo Scientific), and cDNA was reverse transcribed with high fidelity Accuscript (Agilent) primed with gene-specific oligonucleotides. Diversified regions of ACE2 were PCR amplified as 5 fragments. Flanking sequences on the primers added adapters to the ends of the products for annealing to Illumina sequencing primers, unique barcoding, and for binding the flow cell. Amplicons were sequenced on an Illumina NovaSeq 6000 using a 2×250 nt paired end protocol.
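The coding diversity produced by NNK codons can be enumerated directly. The Python sketch below assumes the standard genetic code and the usual degenerate-base definitions (N = A, C, G or T; K = G or T). The final line is one reading of how 117 diversified sites lead to the 2,340 coding mutations reported in the results: 19 amino acid substitutions plus one stop codon per site.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + c for a, b, c in product(BASES, BASES, "GT")]
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
amino_acids = encoded - {"*"}
stops = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]

print(len(nnk_codons), len(amino_acids), stops)  # 32 20 ['TAG']

# Per site: 19 non-wild-type amino acids plus the single stop codon.
sites = 117
outcomes_per_site = (len(amino_acids) - 1) + len(stops)
print(sites * outcomes_per_site)  # 2340
```

NNK covers all 20 amino acids with 32 codons while admitting only one of the three stop codons, which keeps the fraction of nonsense variants in the library low.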
Data were analyzed using Enrich (36), and commands are provided in the GEO deposit. Briefly, the frequencies of ACE2 variants in the transcripts of the sorted populations were compared to their frequencies in the naive plasmid library to calculate an enrichment ratio.
Reagent and data availability
Plasmids are deposited with Addgene under IDs 141183-5. Raw and processed deep sequencing data are deposited in NCBI’s Gene Expression Omnibus (GEO). At this time, a series accession number has not been assigned.
CONFLICT OF INTEREST STATEMENT
E.P. is the inventor on a provisional patent filing by the University of Illinois covering aspects of this work.
ACKNOWLEDGEMENTS
Staff at the UIUC Roy J. Carver Biotechnology Center assisted with FACS and Illumina sequencing. The development of deep mutagenesis to study virus-receptor interactions was supported by NIH award R01AI129719.