ABSTRACT
Molecular-level understanding of human neutralizing antibody responses to SARS-CoV-2 could accelerate vaccine design and facilitate drug discovery. We analyzed 294 SARS-CoV-2 antibodies and found that IGHV3-53 is the most frequently used IGHV gene for targeting the receptor binding domain (RBD) of the spike (S) protein. We determined crystal structures of two IGHV3-53 neutralizing antibodies +/- Fab CR3022 ranging from 2.33 to 3.11 Å resolution. The germline-encoded residues of IGHV3-53 dominate binding to the ACE2 binding site epitope with no overlap with the CR3022 epitope. Moreover, IGHV3-53 is used in combination with a very short CDR H3 and different light chains. Overall, IGHV3-53 represents a versatile public VH in neutralizing SARS-CoV-2 antibodies, where their specific germline features and minimal affinity maturation provide important insights for vaccine design and assessing outcomes.
MAIN
The ongoing COVID-19 pandemic, which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is far from an end (1). The increasing global health and socioeconomic damage require urgent development of an effective COVID-19 vaccine. While multiple vaccine candidates have entered clinical trials (2), the molecular features that contribute to an effective antibody response are not clear. Over the past decade, the concept of a public antibody response (also known as multidonor class antibodies) to specified microbial pathogens has emerged. A public antibody response describes antibodies that have shared genetic elements and modes of recognition, and can be observed in multiple individuals against a given antigen. Such responses to microbial pathogens have been observed against influenza (3), dengue (4), malaria (5), and HIV (6). Identification of public antibody responses and characterization of the molecular interactions with cognate antigen can provide insight into the fundamental understanding of the immune repertoire and its ability to quickly respond to novel microbial pathogens, as well as facilitate rational vaccine design against these pathogens (7, 8).
The spike (S) protein is the major surface antigen of SARS-CoV-2. The S protein utilizes its receptor-binding domain (RBD) to engage the host receptor ACE2 for viral entry (9–12). Therefore, RBD-targeting antibodies could neutralize SARS-CoV-2 by blocking ACE2 binding. A number of antibodies that target the RBD of SARS-CoV-2 have now been discovered in very recent studies (13–28). We compiled a list of 294 SARS-CoV-2 RBD-targeting antibodies where information on IGHV gene usage is available (17–28) (Table S2), and found that IGHV3-53 is the most frequently used IGHV gene among such antibodies (Fig. 1A). Of 294 RBD-targeting antibodies, 10% are encoded by IGHV3-53, as compared to only 0.5% to 2.6% in the repertoire of naïve healthy individuals (29) with a mean of 1.8% (30). The prevalence of IGHV3-53 in the antibody response in SARS-CoV-2 patients has also been recognized in some recent antibody studies (20, 22, 27). These observations indicate that IGHV3-53 represents a frequent and public antibody response to the SARS-CoV-2 RBD.
To understand the molecular features that endow IGHV3-53 with the ability to act as a public antibody, we determined crystal structures of two IGHV3-53 neutralizing antibodies, namely CC12.1 and CC12.3, in complex with the SARS CoV-2 RBD and also in the presence of the SARS-CoV1/2 cross-reactive Fab CR3022 (17). CC12.1 and CC12.3 were previously isolated from a SARS-CoV-2-infected patient and shown to be SARS-CoV-2 RBD-specific (27). Although CC12.1 and CC12.3 are both encoded by IGHV3-53, CC12.1 utilizes IGHJ6, IGKV1-9, and IGKJ3, whereas CC12.3 utilizes IGHJ4, IGKV3-20, and IGKJ1. This variation in IGHJ, IGKV, and IGKJ usage indicates that CC12.1 and CC12.3 belong to different clonotypes, but are encoded by a common IGHV3-53 germline gene. IgBlast analysis (31) shows that IGHV and IGKV of CC12.1 are only 1% somatically mutated at the nucleotide sequence level (two amino-acid changes each). Similarly, the IGHV and IGKV of CC12.3 are also minimally somatically mutated at 1.4% in both IGHV (four amino-acid changes) and IGKV (a single amino-acid deletion). The binding affinities (Kd) of Fabs CC12.1 and CC12.3 to SARS-CoV-2 RBD are 17 nM and 14 nM, respectively (Fig. S2). Moreover, competition experiments suggest that CC12.1 and CC12.3 bind to a similar epitope, which overlaps with the ACE2 binding site, but not the CR3022 epitope (Fig. S3).
We determined four complex crystal structures, CC12.1/RBD, CC12.3/RBD, CC12.1/RBD/CR3022, and CC12.3/RBD/CR3022 at resolutions of 3.11 Å, 2.33 Å, 2.90 Å, and 2.70 Å, respectively (Table S1). CC12.1 and CC12.3 bind to the ACE2 binding site on SARS-CoV-2 RBD with an identical angle of approach (Fig. 1B-F). Interestingly, another IGHV3-53 antibody B38, whose structure was determined recently (23), also binds to the ACE2 binding site on SARS-CoV-2 RBD in a similar manner (Fig. S4). Similar to the ACE2 binding site (11), the epitopes of these antibodies can only be accessed when the RBD is in the “up” conformation (Fig. S5). Among 16 ACE2 binding residues on RBD, 10 are within the epitopes of CC12.1 and B38, and 6 are in the epitope of CC12.3 (Fig. 2A-D). Many of the epitope residues are not conserved between SARS-CoV-2 and SARS-CoV (Fig. 2E), explaining their lack of cross-reactivity (27). The buried surface area (BSA) from the heavy-chain interaction is quite similar in CC12.1 (723 Å2), CC12.3 (698 Å2), and B38 (713 Å2). In contrast, the light-chain interaction is much smaller for CC12.3 (176 Å2) compared to CC12.1 (566 Å2) and B38 (495 Å2), consistent with different light-chain gene usage. While both CC12.1 and B38 utilize IGKV1-9, CC12.3 utilizes IGKV3-20. This observation suggests that IGHV3-53 can pair with different light chains to target the ACE2 binding site of the SARS-CoV-2 RBD. Given that CC12.3 (80% BSA from the heavy chain) binds the RBD with similar affinity to CC12.1 (56% BSA from heavy chain) (Fig. S2), the light-chain identity seems not to be as critical as the heavy chain. In fact, among the RBD-targeting IGHV3-53 antibodies, nine different light chains are observed, although IGKV1-9 and IGKV3-20 are the most frequently found to date (Fig. S6).
To understand why IGHV3-53 is elicited as a public antibody response, the molecular interactions between the RBD and the heavy chains of CC12.1, CC12.3, and B38 were analyzed. The complementarity-determining regions (CDR) H1 and H2 of these antibodies interact extensively with the RBD mainly through specific hydrogen bonds (Fig. 3A-B). Interestingly, all residues on CDR H1 and H2 that hydrogen bond with the RBD are encoded by the germline IGHV3-53 (Fig. S1 and S7, Table S3). These interactions are almost identical among CC12.1, CC12.3, and B38 with the only difference at VH residue 58. A somatic mutation VH Y58F is present in both CC12.1 and CC12.3, but not in B38 (Fig. 3A-C, boxed residues). Nevertheless, this somatic mutation is unlikely to be essential for IGHV3-53 to engage the RBD, since VH residue 58 in B38 still interacts with the RBD through an additional hydrogen bond (Fig. 3C). Of note, none of these antibody interactions mimic ACE2 binding (Fig. 3D).
Our structural analysis reveals two key motifs in the IGHV3-53 germline sequence that are important for RBD binding, namely an NY motif at VH residues 32 and 33 in the CDR H1, and an SGGS motif at VH residues 53 to 56 in the CDR H2 (Fig. S8). The side chain of VH N32 in the NY motif forms a hydrogen bond with the backbone carbonyl of A475 on the RBD, and this interaction is stabilized by an extensive network of hydrogen bonds with other antibody residues as well as a bound water molecule (Fig. 4A). VH N32 also hydrogen bonds with VH R94, which in turn hydrogen bonds with N487 and Y489 on the RBD (Fig. 4A). These interactions enhance not only RBD-Fab interaction, but also stabilize CDR and framework residues and conformations. VH Y33 in the NY motif inserts into a hydrophobic cage formed by RBD residues Y421, F456, L455 and the aliphatic component of K417 (Fig. 4B). A hydrogen bond between VH Y33 and the carbonyl oxygen of L455 on RBD further strengthens the interaction. The second key motif SGGS in CDR H2 forms extensive hydrogen bond network with the RBD (Fig. 4C), including four hydrogen bonds that involve the hydroxyl side chains of VH S53 and VH S56, and four water-mediated hydrogen bonds to the backbone carbonyl of VH G54, the backbone amide of VH S56, and the side chain of VH S56. Along with VH Y52, the SGGS motif takes part in a type I beta turn, with a positive Φ-angle for VH G55 at the end of the turn. In addition, the Cα of VH G54 is only 4 Å away from the RBD, indicating that side chains of other amino acids would clash with the RBD if they were present at this position. As a result, the SGGS motif is a perfect fit for interacting with the RBD at this location.
Overall, these observations demonstrate the importance of the NY and SGGS motifs, which are both encoded in the IGHV3-53 germline, for engaging the RBD. In fact, besides IGHV3-53, the only other IGHV gene that contains an NY motif in CDR H1 and an SGGS motif in CDR H2 is IGHV3-66, which is a closely-related IGHV gene to IGHV3-53 (32). As compared to IGHV3-53, IGHV3-66 has a lower occurrence frequency in the repertoire of healthy individuals (0.3% to 1.7%) (29), which may explain why IGHV3-66 is less prevalent than IGHV3-53, but yet is still quite commonly observed (19–22, 24, 26) in antibodies in SARS-CoV-2 patients (Fig. 1A). Overall, our structural analysis has identified two germline-encoded binding motifs that enable IGHV3-53 to act as a public antibody and target the SARS CoV-2 RBD with no mutations required from affinity maturation.
While the binding mode of CDR H1 and H2 to RBD is highly similar among CC12.1, CC12.3, and B38, CDR H3 interaction with the RBD is different (Fig. 3A-C) due to differences in the CDR H3 sequences and conformations (Fig. 5A-B). For example, while CDR H3 of CC12.1 can interact with RBD Y453 through a hydrogen bond, CDR H3s of CC12.3 and B38 do not from such a hydrogen bond (Fig. 3A-C). Similarly, due to the difference in light-chain gene usage, the light-chain interactions with the RBD can vary substantially in IGHV3-53 antibodies (Fig. S9). Overall, our structural analysis demonstrates that IGHV3-53 provides a highly versatile framework for antibodies to target the ACE2 binding site in SARS-CoV-2 RBD.
An interesting feature of CC12.1 and CC12.3 is their relatively short CDR H3. While the CDR H3 sequences of CC12.1 and CC12.3 are very different, both have a length of nine amino acids (Kabat numbering), whereas the average CDR H3 length for human antibodies is around 13 (33), as compared to for example very long CDR H3s (up to 30 residues) on average that seem to be required for many broadly neutralizing antibodies to HIV-1 (34). Similarly, antibody B38 has a very short CDR H3 length of seven residues (23). It is unlikely that longer CDR H3’s can be accommodated in these antibodies since their epitopes are relatively flat with no large pocket to insert a protruding CDR (Fig. 2A-C). This conclusion was also arrived in a recent study that also reported SARS-CoV-2 RBD-targeting antibodies that are encoded by either IGHV3-53 or IGHV3-66 tend to have a short CDR H3 (28). In fact, IGHV3-53 and IGHV3-66 antibodies in general have slightly shorter than average CDR H3’s (by around one residue), but do appear to have a few much shorter CDR H3’s (<10 amino acids) than average in the baseline antibody repertoire (30). In CC12.1 and CC12.3, the space for CDR H3 to fit in the interface within the RBD is limited (Fig. 5A-B), which thus constrains its length. Consistently, among RBD-targeting antibodies currently reported (17–28), those encoded by IGHV3-53 have a significantly shorter CDR H3 compared to those encoded by other IGHV genes (p-value = 6e-8, Mann-Whitney U test) (Fig. 5C). These observations provide structural and statistical evidence that a short CDR H3 length is a molecular feature of the IGHV3-53-encoded public antibody response to the SARS-CoV-2 RBD, reminiscent of a short 5-residue CDR L3 in IGHV1-2 antibodies to the CD4 receptor binding site in gp120 of HIV-1 Env (35).
Besides IGHV3-53, several other IGHV genes such as IGHV1-2, IGHV3-9, and IGHV3-30 are also more frequently observed than other germlines in SARS-CoV-2 RBD-targeting antibodies (Fig. 1A). The molecular mechanisms of these antibody responses to SARS-CoV-2 will need to be characterized in the future. In addition, whether other antibody germline gene segments, including the IGHD and the light chain, contribute to public antibody responses to SARS-CoV-2 will also need to be further addressed. Notwithstanding, the detailed characterization of this public antibody response to SARS CoV-2 is already be a promising starting point for rational vaccine design (36), especially given limited to no affinity maturation is required from the germline to achieve a high affinity neutralizing antibody response to the RBD. In addition, IGHV3-53 exists at a reasonable frequency in healthy individuals (29, 30), indicating that this public antibody could be commonly elicited during vaccination (37), and aid in design of both antibody and small molecule therapeutics (7, 38).
AUTHOR CONTRIBUTIONS
M. Y., H.L., N.C.W., F.Z., D.H., T.F.R., E.L., D.S, J.G.J., D.R.B. and I.A.W. conceived and designed the study. M.Y., H.L., N.C.W., C.C.D.L., W.Y. and Y.H. expressed and purified the proteins. M.Y. and C.C.D.L. performed biolayer interferometry binding assays. M.Y., H.L., N.C.W., X.Z. and H.T. performed the crystallization and X-ray data collection. M.Y. and X.Z. determined and refined the X-ray structures. M.Y., H.L., N. C.W., C.C.D.L. and X.Z. analyzed the data. M.Y., H.L., N.C.W. and I.A.W. wrote the paper and all authors reviewed and/or edited the paper.
MATERIALS AND METHODS
Expression and purification of SARS-CoV-2 RBD
The receptor-binding domain (RBD) (residues 319-541) of the SARS-CoV-2 spike (S) protein (GenBank: QHD43416.1) was cloned into a customized pFastBac vector (40), and fused with an N-terminal gp67 signal peptide and C-terminal His6 tag (17). A recombinant bacmid DNA was generated using the Bac-to-Bac system (Life Technologies). Baculovirus was generated by transfecting purified bacmid DNA into Sf9 cells using FuGENE HD (Promega), and subsequently used to infect suspension cultures of High Five cells (Life Technologies) at an MOI of 5 to 10. Infected High Five cells were incubated at 28 °C with shaking at 110 r.p.m. for 72 h for protein expression. The supernatant was then concentrated using a 10 kDa MW cutoff Centramate cassette (Pall Corporation). The RBD protein was purified by Ni-NTA, followed by size exclusion chromatography, and buffer exchanged into 20 mM Tris-HCl pH 7.4 and 150 mM NaCl.
Expression and purification of Fabs
For CC12.1 and CC12.3, the heavy and light chains were cloned into phCMV3. The plasmids were transiently co-transfected into ExpiCHO cells at a ratio of 2:1 (HC:LC) using ExpiFectamine™ CHO Reagent (Thermo Fisher Scientific) according to the manufacturer’s instructions. The supernatant was collected at 10 days post-transfection. The Fabs were purified with a CaptureSelect™ CH1-XL Affinity Matrix (Thermo Fisher Scientific) followed by size exclusion chromatography. CR3022 was expressed and purified as described previously (17).
Expression and purification of ACE2
The N-terminal peptidase domain of human ACE2 (residues 19 to 615, GenBank: BAB40370.1) was cloned into phCMV3 vector, and fused with a C-terminal Fc tag. The plasmids were transiently transfected into Expi293F cells using ExpiFectamine™ 293 Reagent (Thermo Fisher Scientific) according to the manufacturer’s instructions. The supernatant was collected at 7 days post-transfection. Fc-tagged ACE2 protein was then purified with a Protein A column (GE Healthcare) followed by size exclusion chromatography.
Crystallization and structural determination
CC12.1/RBD, CC12.3/RBD, CC12.1/CR3022/RBD, and CC12.3/CR3022/RBD complexes were formed by mixing each of the protein components at an equimolar ratio and incubated overnight at 4°C. Each complex was adjusted to 13 mg/ml and screened for crystallization using the 384 conditions of the JCSG Core Suite (Qiagen) and ProPlex screen (Molecular Dimensions) on either our custom-designed robotic CrystalMation system (Rigaku) or an Oryx8 (Douglas Instruments) at Scripps Research. Crystallization trials were set-up by the vapor diffusion method in sitting drops containing 0.1 μl of protein and 0.1 μl of reservoir solution. Diffraction-quality crystals were obtained in the following conditions:
CC12.1/RBD complex (13 mg/ml): 0.1M sodium citrate pH 5.5 and 15% (w/v) polyethylene glycol 6000 at 20°C
CC12.3/RBD complex (13 mg/mL): 0.1 M sodium phosphate pH 6.5 and 12% (w/v) polyethylene glycol 8000 at 20°C
CC12.1/RBD/CR3022 complex (13 mg/mL): 20% PEG-3000, 0.2 M sodium chloride, 0.1 M HEPES pH 7.5 at 20°C
CC12.3/RBD/CR3022 complex (13 mg/mL): 0.1M Tris pH 8, 15% ethylene glycol, 1M lithium chloride, 10% PEG 6000 at 20°C
Of note, these four complexes crystallized in a broad range of pHs. All crystals appeared on day 3 and were harvested on day 7. Before flash cooling in liquid nitrogen for X-ray diffraction studies, crystals were equilibrated in reservoir solution supplemented the following cryoprotectants:
CC12.1/RBD complex: 20% glycerol
CC12.3/RBD complex: 20% glycerol
CC12.1/RBD/CR3022 complex: 10% ethylene glycol
CC12.3/RBD/CR3022 complex: none were required
Diffraction data were collected at cryogenic temperature (100 K) at Stanford Synchrotron Radiation Lightsource (SSRL) on the new Scripps/Stanford beamline 12-1 with a beam wavelength of 0.97946 Å, and processed with HKL2000 (41). Structures were solved by molecular replacement using PHASER (42) with PDB 6YLA (43), 4TSA, and 4ZD3 (44). Iterative model building and refinement were carried out in COOT (45) and PHENIX (46), respectively. Epitope and paratope residues, as well as their interactions, were identified by accessing PISA at the European Bioinformatics Institute (http://www.ebi.ac.uk/pdbe/prot_int/pistart.html) (39).
Biolayer interferometry binding assay
Antibody binding and competition assays were performed by biolayer interferometry (BLI) using an Octet Red instrument (FortéBio) as described previously (47), with Ni-NTA biosensors. There were five steps in the assay: 1) baseline: 60s with 1x kinetics buffer; 2) loading: 180 s with 20 μg/mL of 6x His-tagged SARS-CoV-2 RBD proteins; 3) baseline: 135 s with 1x kinetics buffer; 4) association: 240 s with serial diluted concentrations of CC12.1 Fab or CC12.3 Fab; and 5) dissociation: 240 s with 1x kinetics buffer. For Kd estimation, a 1:1 binding model was used.
For competition assays, CC12.1 Fab, CC12.3 Fab, CR3022 Fab, and human ACE2-Fc were all diluted to 250 nM. Ni-NTA biosensors were used. In brief, the assay has five steps: 1) baseline: 60s with 1x kinetics buffer; 2) loading: 120 s with 20 μg/mL, 6x His-tagged SARS-CoV-2 RBD proteins; 3) baseline: 120 s with 1x kinetics buffer; 4) first association: 180 s with CC12.1 Fab, CC12.3 Fab or CR3022 Fab; and 5) second association: 180 s with CC12.1 Fab, CC12.3 Fab, CR3022 Fab, or human ACE2-Fc or disassociation with 1x kinetics buffer for each first association.
ACKNOWLEDGEMENTS
We thank Robyn Stanfield for assistance in data collection and Bryan Briney for naïve antibody germline analysis. We are grateful to the staff of Stanford Synchrotron Radiation Laboratory (SSRL) Beamline 12-1 for assistance. This work was supported by NIH K99 AI139445 (N.C.W.), the Bill and Melinda Gates Foundation OPP1170236 (I.A.W. and D.R.B.), NIH CHAVD (UM1 AI44462 to I.A.W., D.S. and D.R.B.), and the IAVI Neutralizing Antibody Center. Use of the SSRL, SLAC National Accelerator Laboratory, is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515. The SSRL Structural Molecular Biology Program is supported by the DOE Office of Biological and Environmental Research, and by the National Institutes of Health, National Institute of General Medical Sciences (including P41GM103393).