Abstract
Since the COVID-19 pandemic onset, the antibody response to SARS-CoV-2 has been extensively characterized. Antibodies to the receptor binding domain (RBD) on the spike protein are frequently encoded by IGHV3-53/3-66 with a short CDR H3. Germline-encoded sequence motifs in CDRs H1 and H2 play a major role, but whether any common motifs are present in CDR H3, which is often critical for binding specificity, have not been elucidated. Here, we identify two public clonotypes of IGHV3-53/3-66 RBD antibodies with a 9-residue CDR H3 that pair with different light chains. Distinct sequence motifs on CDR H3 are present in the two public clonotypes that appear to be related to differential light chain pairing. Additionally, we show that Y58F is a common somatic hypermutation that results in increased binding affinity of IGHV3-53/3-66 RBD antibodies with a short CDR H3. Overall, our results advance fundamental understanding of the antibody response to SARS-CoV-2.
Introduction
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) is the etiological agent of coronavirus disease 2019 (COVID-19)1,2, which primarily results in respiratory distress, cardiac failure, and renal injury in the most severe cases3,4. The virion is decorated with the spike (S) glycoprotein, which contains a receptor-binding domain (RBD) that mediates virus entry by binding to angiotensin-converting enzyme-2 (ACE-2) receptor on the surface of host cells1,5-7. To mitigate the devastating social and economic consequences of the pandemic, vaccines and post-exposure prophylaxes including antibody cocktails that exploit reactivity to the S protein are being developed at an unprecedented rate. Several vaccines are currently in various stages of clinical trials8,9. Most notable are the mRNA vaccines from Pfizer-BioNTech and Moderna, which have been issued emergency use authorization by the Food and Drug Administration for distribution in the United States10–12 and the Oxford-AstraZeneca chimpanzee adenovirus vectored DNA vaccine in the United Kingdom13–15. In humans, most neutralizing antibodies to SARS-CoV-2 target the immunodominant RBD on the S protein16,17, and can abrogate virus attachment and entry into host cells18,19. In the past year, many RBD antibodies have been isolated and characterized from convalescent SARS-CoV-2 patients 20–40.
Antibody diversity is generated through V(D)J recombination41–43. Three genes, one from each of the variable (V), diversity (D) and joining (J) loci, are combined to form the coding region for the heavy chain. In humans, genes encoding for the V, D and J regions are denoted as IGHV, IGHD and IGHJ, respectively. Two complementarity-determining regions on the heavy chain (CDRs H1 and H2) are encoded by the V gene while the third (CDR H3) is encoded by the V(D)J junction. A similar process occurs in assembly of the coding region for the light chain except that the D gene is absent. The light chain genes also encode kappa and lambda chains that are denoted as IGKV and IGKJ, as well as IGLV and IGLJ, respectively. To further improve affinity of antibodies to an antigen, affinity maturation occurs via somatic hypermutation (SHM)44,45. V(D)J recombination and SHM therefore ensure a diverse repertoire of antibodies is available for an immune response to the enormous number and variety of potential antigens.
Notwithstanding this antibody diversity, some RBD antibodies with strikingly similar sequences have been found in multiple convalescent SARS-CoV-2 patients32,46,47. These antibodies can be classified as public clonotypes if they share the same IGHV gene with similar CDR H3 sequences48–52. Over the past decade, public clonotypes to human immunodeficiency virus48, malaria52, influenza49, and dengue virus53 have been discovered. Antibodies to SARS-CoV-2 RBD frequently use IGHV3-53 and IGHV3-6623,31,47,54, which only differ by one amino acid (i.e. I12 in IGHV3-53 and V12 in IGHV3-66). IGHV3-53/3-66 antibodies carry germline-encoded features that are critical for RBD binding – an NY motif in CDR H1 and an SGGS motif in CDR H231,47,54. Nevertheless, IGHV3-53/3-66 RBD antibodies have varying lengths of CDR H3 with diverse sequences, which seem to deviate from the canonical definition of a public clonotype.
By categorizing IGHV3-53/3-66 RBD antibodies based on CDR H3 length and light chain usage, we now report on two public clonotypes of IGHV3-53/3-66 RBD antibodies, both of which have a CDR H3 length of 9 amino acids but with distinct sequence motifs. Our structural and biochemical analyses reveal that these sequence motifs on CDR H3 are associated with light chain pairing preference. We also identify Y58F as a signature SHM among IGHV3-53/3-66 RBD antibodies that have a CDR H3 length of less than 15 amino acids (Kabat numbering). As the COVID-19 pandemic continues, knowledge of public antibodies against SARS-CoV-2 can inform on therapeutic development as well as vaccine assessment.
Results
Two public clonotypes of IGHV3-53/3-66 RBD antibodies
In this study, we define clonotypic IGHV3-53/3-66 RBD antibodies as antibodies that share the same IGL(K)V genes and with identical CDR H3 length. Literature mining of 214 published IGHV3-53/3-66 RBD antibodies obtained from convalescent patients (Supplementary Table 1) revealed that the two most common clonotypes have a CDR H3 length of 9 amino acids and are paired with light chains IGKV1-9 (clonotype 1) and IGKV3-20 (clonotype 2), respectively (Figure 1a). Antibodies from clonotype 1 have been observed across 10 studies22-24,32-36,40, whereas antibodies from clonotype 2 are found across seven studies22,24,32-34,37,40. Interestingly, sequence logos revealed distinct sequence features of CDR H3 between clonotype 1 and clonotype 2 antibodies (Figure 1b).
We further determined IGHJ gene usage in the two major clonotypes of IGHV3-53/3-66 RBD antibodies. Among the IGHV3-53/3-66 RBD antibodies with a CDR H3 length of 9 amino acids, we observed a statistically significant bias in IGHJ gene usage (p-value = 2e-6, Fisher’s exact test), where clonotypes 1 and 2 preferentially pair with IGHJ6 and IGHJ4, respectively (Figure 1c). In fact, IGHJ6 encodes the last four amino acids (GMDV) in CDR H3 that are highly conserved in clonotype 1 (Figure 1d, Supplementary Figure 1a). Similarly, IGHJ4 encodes the last four amino acids (YFDY) in CDR H3 that are highly conserved in clonotype 2 (Figure 1d, Supplementary Figure 1b). Taken together, we demonstrate that IGHV3-53/3-66 RBD antibodies can be categorized into at least two public clonotypes.
Structural analysis of signature motifs on CDR H3
We further investigated sequence signatures of CDR H3s in clonotypes 1 and 2 (Figure 1b). In particular, we focused on amino acid residues 96, 98 and 100 in CDR H3 since these residues show clear patterns of differential amino-acid preference between clonotype 1 and clonotype 2 antibodies. Subsequently, analysis was performed on structures of BD-604 (PDB 7CH4) and CC12.1 (PDB 6XC2), which are two clonotype 1 antibodies, as well as BD-629 (PDB 7CH5) and CC12.3 (PDB 6XC4), which are two clonotype 2 antibodies.
Residue 96 is usually Leu in clonotype 1 antibodies, while an aromatic residue, usually Tyr, occupies residue 96 in clonotype 2 antibodies. While VH L96 interacts with Y489 of the RBD in clonotype 1 antibodies via van der Waals interactions, VH F/Y96 is located at the center of a π-π stacking network that involves F456, Y489 and VH Y100 (Figure 2a, 2b, Supplementary Figure 2a, 2b; left panels). Substituting VH L96 in clonotype 1 with Y96 would result in a clash with RBD Y489, whereas substituting VH F/Y96 in clonotype 2 with L96 would abolish the π-π stacking network but still maintain a hydrophobic core.
Residue 98 in CDR H3 of clonotype 1 antibodies does not show a strong amino-acid preference, since it is located in a relatively open space (Figure 1b, 2a, Supplementary Figure 2a; middle panels). On the other hand, a highly conserved acidic residue at position 98 in the CDR H3 loop of clonotype 2 antibodies contributes to formation of hydrogen bond interactions with VH Y52 as well as electrostatic interactions with RBD K417 and VL R96 (Figure 2b, Supplementary Figure 2b; middle panels). Consistently, VL R96 is highly conserved in clonotype 2 antibodies, but not in other IGHV3-53/3-66 RBD antibodies (Supplementary Figure 3). Thus, the electrostatic interactions between VH D/E98 and VL R98 are highly conserved in clonotype 2 antibodies and can likely help stabilize the CDR H3 loop conformation to minimize entropic cost upon binding to SARS-CoV-2 RBD.
Residue 100 is usually Gly in CDR H3 of clonotype 1 antibodies (Figure 1b). Structural analysis shows that small, non-polar amino acids are favored at position 100 due to the limited space around that residue (Figure 2a, Supplementary Figure 2a; right panels). Moreover, G100 in clonotype 1 has a positive Φ angle, which is typically less favorable for non-Gly amino acids. In contrast, residue 100 is a highly conserved Tyr in CDR H3 of clonotype 2 antibodies (Figure 1b). Structural analysis shows that VH Y100 contributes to the π-π stacking network that is formed via the aromatic ring at VH residue 96 (see above) and an aromatic residue at VL residue 49 (Figure 2b, Supplementary Figure 2b; right panels).
Additionally, we investigated the structural basis of the conservation of VH Y102 among clonotype 2 antibodies. Structural analysis reveals that VH Y102 interacts with RBD Y486 via π-π interactions (Supplementary Figure 4). Only IGHJ4 offers a bulky aromatic side chain at residue 102 (Figure 1d), which explains the common usage of IGHJ4 in clonotype 2 antibodies. In contrast, clonotype 1 antibodies frequently use IGHJ6 (Figure 1d), which has a much shorter Val at residue 102, most likely because IGHJ6 encodes a Gly at residue 100 that can avoid steric clashes with the light chain (see above, Figure 2a, Supplementary Figure 2a; right panels). Of note, the only other IGHJ gene that encodes a non-bulky amino acid at residue 100 is IGHJ3 (Ala). IGHJ1, IGHJ2, IGHJ4, and IGHJ5 all encode a bulky residue at residue 100 (Figure 1d), which may be disfavored in clonotype 1 antibodies due to the limited space where VH residue 100 is located (Supplementary Figure 5). Overall, our structural analyses provide a structural basis for the differential signature sequence motifs in CDR H3 between clonotype 1 and clonotype 2 antibodies.
Incompatibility of CDR H3 between clonotype 1 and clonotype 2 antibodies
To understand the influence of light-chain usage in CDR H3 sequences, we performed a structural alignment of RBD-bound CDR H3 from two clonotype 1 antibodies, namely BD-604 and CC12.1, and two clonotype 2 antibodies, namely BD-629 and CC12.3 (Supplementary Figures 2c-2f). While the CDR H3 conformations are similar within each clonotype (RMSD ranges from 0.27 to 0.41 Å), they are quite different between clonotypes (RMSD ranges from 0.77 Å to 1.5 Å). Although our sample size is small, this analysis suggests that antibodies from clonotypes 1 and 2 have different preferences for their CDR H3 conformations. Such differential preference of CDR H3 conformations may be partly influenced by light-chain usage, as indicated by the structural analyses above on VH residues 96, 98, and 100 (Figure 2, Supplementary Figures 2 and 5).
To experimentally examine the compatibility between CDR H3 and the light chains from clonotype 1 and clonotype 2 antibodies, we focused on antibodies COV107-23 (clonotype 1) and COVD21-C8 (clonotype 2). The heavy-chain sequences of these two antibodies only differ by four amino acids in CDR H3, namely VH residues 96, 98, 99, and 100 (Supplementary Figure 6a). Of note, COV107-23 uses IGHJ4, which is seldom observed among clonotype 1 antibodies but highly preferred in clonotype 2 antibodies (Figure 1c), to encode the two amino acids at the C-terminus of its CDR H3 (Supplementary Figure 6b). Both COV107-23 and COVD21-C8 bind strongly to the SARS-CoV-2 RBD, with dissociation constants (KD) of 1 nM and 4 nM, respectively (Figure 3a). However, when their light chains are swapped, their binding affinity to the RBD is weakened substantially to KD > 1 μM. We further determined apo crystal structures of COV107-23 paired with its native light chain and with the light chain from COVD21-C8 to 2.0 Å and 3.3 Å, respectively (Supplementary Table 2). The conformations of CDR H3 indeed differ when paired with different light chains, as exemplified by the 3.3 Å displacement of VH G97 near the tip of CDR H3 and different side-chain orientations of VH T98 (Figure 3b). In addition, a type I’ β-turn is observed at the tip of CDR H3 in COV107-23 when paired with its native light chain but not with the light chain from COVD21-C8 (Figure 3c). These observations demonstrate that the conformation of CDR H3 changes substantially when IGKV1-9 in COV10-23 is swapped to IGKV3-20, which abolishes the binding to RBD (Figure 3a). The CDR H3 conformation is therefore a determinant for compatibility between the CDR H3 sequence and the light chain in IGHV3-53/3-66 RBD antibodies.
Compatibility of different CDR H3 variants with IGHV1-9 for binding to RBD
Besides antibodies from clonotypes 1 and 2, other IGHV3-53/3-66 RBD antibodies with a range of CDR H3 lengths pair with different light chains (Figure 1a). We further aimed to expand our analysis on CDR H3 compatibility to include CDR H3 from IGHV3-53/3-66 RBD antibodies other than clonotypes 1 and 2. In particular, we focused on identifying CDR H3 sequences that are compatible with IGKV1-9, which is used by clonotype 1 antibodies for binding to RBD. We first compiled a list of 143 CDR H3 variants that were observed in IGHV3-53/3-66 RBD antibodies (Supplementary Table 3). A yeast display library was then constructed with these 143 CDR H3 variants in the B38 antibody, which is a IGHV3-53/IGKV1-9 RBD antibody26. Subsequently, fluorescence-activated cell sorting (FACS) was performed on the yeast display library based on antibody expression level and binding to SARS-CoV-2 RBD (Supplementary Figures 7 and 8). The enrichment level of each CDR H3 variant in the sorted library was quantified by next-generation sequencing (see Methods, Supplementary Table 4). CDR H3 variants that were positively enriched in binding (log10 enrichment > 0) are derived from both IGKV1-9 and non-IGKV1-9 antibodies (Figure 4a). The native CDR H3 for B38 has a log10 enrichment level of - 0.002. As a result, positively enriched CDR H3 variants should have a higher affinity than wild-type B38. A total of 68% (17 out of 25) binding-enriched CDR H3 variants have a length of 9 amino acids, whereas only 31% (37 out of 118) have a length of 9 amino acids in the non-enriched group (Figure 4b). Interestingly, binding-enriched CDR H3 variants with a length of 9 amino acids displayed very similar sequence features as that of clonotype 1 antibodies obtained from literature mining (Figure 1b and 4c). Of note, 41% (7 out of 17) binding-enriched CDR H3 variants with a length of 9 amino acids come from non-IGKV1-9 antibodies. Overall, our yeast display screen indicates that certain CDR H3s from non-IGKV1-9 RBD antibodies are compatible with IGKV1-9 for RBD binding and have similar sequence features as those CDR H3s from clonotype 1 antibodies.
We noticed that some CDR H3 sequences that come from IGKV1-9 RBD antibodies do not enrich in binding. One possibility is that they are still able to bind to RBD, but with a lower affinity than B38, which has a KD of 70 nM to the RBD26. However, as shown by our yeast display screen, CDR H3 sequences from IGKV1-9 antibodies in general have a significantly stronger binding to RBD than those from non-IGKV1-9 antibodies (p-value = 0.002, Figure 4d), whereas their expression level is only marginally higher than that from non-IGKV1-9 antibodies (p-value = 0.06, Figure 4d).
Y58F is a signature SHM in IGHV3-53/3-66 RBD antibodies
We further aimed to understand if there are common SHMs among IGHV3-53/3-66 RBD antibodies. We first categorized IGHV3-53/3-66 RBD antibodies from convalescent SARS-CoV-2 patients by CDR H3 length. The occurrence frequencies of individual SHMs in each category were then analyzed (Figure 5a). This analysis included 214 IGHV3-53/3-66 RBD antibodies that have sequence information available. One clear observation is that Y58F is highly common among IGHV3-53/3-66 RBD antibodies with a CDR H3 length of less than 15 amino acids, but completely absent when the CDR H3 length is 15 amino acids or above, suggesting that Y58F improves the binding of affinity IGHV3-53/3-66 antibodies to RBD only when they have a short CDR H3 loop (CDR H3 < 15 amino acids). To understand the effect of Y58F on the binding affinity of IGHV3-53/3-66 antibodies to the RBD, we compared the binding affinity of the same antibodies that carry either Y58 or F58 to the RBD. In particular, we focused on three IGHV3-53/3-66 RBD antibodies that have a CDR H3 length of 9 amino acids – one in clonotype 1 (COV107-23), and two in clonotype 2 (COVD21-C8 and CC12.3). Our BLI experiments showed that the Y58F mutation dramatically improved the affinity of the three antibodies (COV107-23, COVD21-C8 and CC12.3) by ~10-fold to ~1000-fold (Figure 5b, Supplementary Figure 9). As a control, we also performed the same experiment on an IGHV3-53/3-66 antibody with a CDR H3 length of 15 amino acids, namely COVA2-20. In contrast to those three IGHV3-53/3-66 RBD antibodies with a short CDR H3, COVA2-20 shows similar binding affinity to RBD between Y58 and F58 variants (Figure 5b, Supplementary Figure 5). Taken together, our results show that Y58F appears to be a signature SHM in IGHV3-53/3-66 RBD antibodies with CDR H3 length of < 15 amino acids. In fact, the results here are consistent with our previous finding that IGHV3-53/3-66 RBD antibodies with CDR H3 length of 15 amino acids or longer adopt a different binding mode as compared to those with a shorter CDR H354.
Interestingly, a Y58F mutation results in a loss of hydrogen bonding interactions between residue 58 of the heavy chain and T415 of the RBD (Supplementary Figure 10), yet the mutation significantly increases the binding affinity of the antibody to the RBD. We then performed a structural analysis on seven IGHV3-53/66 RBD antibodies with Y58F mutation and nine without26,29,38,40,47,54-57. Our results indicate that, by removal of the hydroxyl group, the side chain of Y58F moves closer to the backbone carbon of RBD T415 (Supplementary Figure 10). The average distance between the centroid of the side-chain aromatic ring at VH residue 58 and the backbone carbon of RBD T415 are 5.3 Å and 5.9 Å for antibodies that carry F58 and Y58, respectively. Since T-shaped π-π stacking is optimal at around 5.0 to 5.2 Å58,59, F58 but not Y58 can form strong T-shaped π-π stacking interactions with the amide backbone of RBD T415. This observation can at least partly explain why Y58F improves affinity despite the loss of a hydrogen bond with the RBD.
Discussion
While several studies to date have described IGHV3-53/3-66 as a commonly used germline for SARS-CoV-2 RBD antibodies23,31,47,54, the exact sequence requirements for generating an IGHV3-53/3-66 antibody to SARS-CoV-2 RBD has remained largely elusive. As a result of numerous efforts from multiple groups in isolating RBD antibodies and reporting their sequences20–40, detailed characterization of RBD antibody sequence features has become possible. Through sequence analysis, biophysical experiments, and high-throughput screening, we identified distinct sequence requirements for two public clonotypes (clonotypes 1 and 2) of IGHV3-53/3-66 RBD antibodies. In fact, the frequent occurrence of IGHV3-53/3-66 RBD antibodies with IGHJ6 and a CDR H3 length of 9 amino acids, which are germline features of clonotype 1 antibodies, have also been reported in previous publications23,60.
One important finding in this study is that the CDR H3 sequence that supports IGHV3-53/3-66 antibodies binding to RBD is light chain-dependent. This finding is consistent with our previous observation that there is a large diversity of CDR H3 sequences in IGHV3-53/3-66 RBD antibodies54. In addition, our findings explain a recent observation by Banach and colleagues61 who showed that swapping the heavy and light chains of different IGHV3-53/3-66 RBD antibodies often substantially reduced their neutralization potency. Therefore, IGHV3-53/3-66 provides a robust framework to generate different public clonotypes that have distinct CDR H3 and light chain sequence signatures. While only two major clonotypes of IGHV3-53/3-66 RBD antibodies are examined in this study, it will be worth characterizing other minor clonotypes to obtain a more complete understanding of the compatibility between CDR H3 sequence and light-chain identity among IGHV3-53/3-66 RBD antibodies.
Although this study revealed that Y58F is a common SHM that improves the affinity of IGHV3-53/3-66 antibodies with a short CDR H3 to RBD, other common SHMs have also shown up in our sequence analysis (Figure 5a), albeit with a lower frequency. Most noticeably, a cluster of common SHMs is found in VH framework region 1 from residues 26 to 28. This cluster of SHMs is also likely to be important for affinity maturation to RBD. A recent study has indeed shown that SHMs VH F27V and T28I together increase affinity by 100-fold of an IGHV3-53/3-66 antibody to the SARS-CoV-2 RBD38. Additional common SHMs among IGHV3-53/3-66 RBD antibodies with a short CDR H3 include S31R in CDR H1 and V50L in CDR H2 (Figure 5a). As a result, while IGHV3-53/3-66 RBD antibodies do not require any SHM to neutralize SARS-CoV-2 57, this study along with others have shown that SHM can substantially improve the binding affinity of IGHV3-53/3-66 antibodies to RBD38,57. Consistently, RBD antibodies from convalescent SARS-CoV-2 patients have significantly more SHMs and higher neutralization potency at 6 months post-infection than at 1-month post-infection62.
Circulating SARS-CoV-2 mutant variants represent a major ongoing challenge to natural immunity and vaccination. In particular, a lot of attention has been focused on RBD mutation E484K, which has emerged in multiple independently SARS-CoV-2 lineages63,64 and can alter the antigenicity of the spike protein65–67. Another naturally occurring RBD mutation, K417N, which has emerged in South Africa and Brazil (B.1.351 lineage and B.1.1.28, respectively)63,64,68, has recently been shown to also alter antigenicity of the spike protein66,69-71. Consistently, we found that K417N dramatically decreased the binding of COV107-23 (clonotype 1) and COVD21-C8 (clonotype 2) to RBD (Supplementary Figures 11a-11b). In fact, K417 forms an electrostatic interaction with the signature residue VH D/E98 of CDR H3 in clonotype 2 antibodies (Figure 2b) and can also interact with CDR H3 of clonotype 1 antibodies (Supplementary Figure 11c), providing a structural explanation for its change in antigenicity. Constant antigenic drift of SARS-CoV-2 is unavoidable if it keeps circulating among humans. Thus, sustained efforts in characterizing the antibody response to SARS-CoV-2 as it evolves will not only benefit vaccine development and assessment, but also improve our fundamental understanding of the ability of the antibody repertoire to rapidly respond to viral infections.
Methods
Literature mining for antibodies to SARS-CoV-2 RBD
Sequences of anti-SARS-CoV-2 RBD from convalescent patients infected with SARS-CoV-2 were obtained from published articles20–40 (Supplementary Table 1). IgBlast was used to identify somatic hypermutations and analyze IGHJ gene usage72. Of note, IgBlast can only identify IGHJ gene usage for antibodies with available nucleotide sequences. Sequence logos were generated by WebLogo73.
Expression and purification of Fc-tagged RBD
The receptor-binding domain (RBD) (residues 319-541) of the SARS-CoV-2 spike (S) protein (GenBank: QHD43416.1) was fused with an N-terminal Igk secretion signal and a C-terminal SSSSG linker followed by an Fc tag and cloned into a phCMV3 vector. The plasmid was transiently transfected into Expi293F cells using ExpiFectamine™ 293 Reagent (Thermo Fisher Scientific) according to the manufacturer’s instructions. The supernatant was collected at 7 days post-transfection. The Fc-tagged RBD was purified with by KanCapA protein A affinity resin (Kaneka).
Expression and purification of Fabs
Fab heavy and light chains were cloned into phCMV3. Heavy chain Y58F or F58Y mutants were constructed using the QuikChange XL Mutagenesis kit (Stratagene) according to the manufacturer’s instructions. The plasmids were transiently co-transfected into Expi293F cells at a ratio of 2:1 (HC:LC) using ExpiFectamine™ 293 Reagent (Thermo Fisher Scientific) according to the manufacturer’s instructions. The supernatant was collected at 7 days post-transfection. The Fab was purified with a CaptureSelect™ CH1-XL Pre-packed Column (Thermo Fisher Scientific).
Biolayer interferometry binding assay
Binding assays were performed by biolayer interferometry (BLI) using an Octet Red96e instrument (FortéBio) as described previously74. Briefly, Fc-tagged SARS-CoV-2 RBD proteins at 20 to 100 μg/ml in 1x kinetics buffer (1x PBS, pH 7.4, 0.01% w/v BSA and 0.002% v/v Tween 20) were loaded onto streptavidin (SA) biosensors and incubated with the indicated concentrations of Fabs. The assay consisted of five steps: 1) baseline: 60 s with 1x kinetics buffer; 2) loading: 300 s with His6-tagged S or RBD proteins; 3) baseline: 60 s with 1x kinetics buffer; 4) association: 60 s with samples (Fab or IgG); and 5) dissociation: 60s with 1x kinetics buffer. For estimating the exact KD, a 1:1 binding model was used.
X-ray crystallography
Fabs COV107-23 (15 mg/ml) and COV107-23 paired with the light chain of COVD21-C8 (COV107-23-swap, 14 mg/ml) were screened for crystallization using the 384 conditions of the JCSG Core Suite (Qiagen) on our custom-designed robotic CrystalMation system (Rigaku) at Scripps Research by the vapor diffusion method in sitting drops containing 0.1 μl of protein and 0.1 μl of reservoir solution. For COV107-23, optimized crystals were grown in 0.085 M of sodium citrate - citric acid pH 5.6, 0.17 M ammonium acetate, 15% (v/v) glycerol, and 25.5% (w/v) polyethylene glycol 4000 at 20°C. For COV107-23-swap, optimized crystals were grown in 0.1 M of sodium citrate pH 4, 1 M lithium chloride, and 20% (w/v) polyethylene glycol 6000 at 20°C. Crystals were grown for 7 days and then harvested and flash cooled in liquid nitrogen. Diffraction data were collected at cryogenic temperature (100 K) at Stanford Synchrotron Radiation Lightsource (SSRL) on the Scripps/Stanford beamline 12-1 with a beam wavelength of 0.97946 Å, and processed with HKL200075. Structures were solved by molecular replacement using PHASER76, where the models were generated by Repertoire Builder (https://sysimm.org/rep_builder/)77. Iterative model building and refinement were carried out in COOT78 and PHENIX79, respectively.
Construction of plasmids and CDR H3 library
143 oligonucleotides (Supplementary Table 3) encoding CDR H3 were obtained from Integrated DNA Technologies (IDT) and PCR-amplified using 5’-ACC TAC AGA TGA ATT CTC TTA GGG CAG AAG ATA CCG CCG TCT ACT ACT GC-3’ as forward primer and 5’-GGG CCT TTT GTA GAA GCT GAA CTC ACA GTG ACG GTA GTC CCT TGT CCC CA-3 as reverse primer. Then, the amplified oligonucleotide pool was gel-purified using a GeneJET Gel Extraction Kit (Thermo Scientific).
Wild-type (WT) B38 yeast display plasmid, pCTcon2_B38, was generated by cloning the coding sequence of (from N-terminal to C-terminal, all in-frame) Aga2 secretion signal, B38 Fab light chain, V5 tag, ERBV-1 2A self-cleaving peptide, Aga2 secretion signal, B38 Fab heavy chain, HA tag, and Aga2p, into the pCTcon2 vector80. pCTcon2_B38 was PCR-amplified using 5’-TGG GGA CAA GGG ACT ACC GTC ACT GTG-3 as forward primer and 5-GCA GTA GTA GAC GGC GGT ATC TTC TGC-3’ as reverse primer to generate the linearized vector. The PCR product was then gel-purified.
Yeast antibody display library generation
5 μg of the amplified oligonucleotide pool and 4 μg of purified linearized vector were transformed into Saccharomyces cerevisiae EBY100 via electroporation following previously published protocol81 to generate a B38 yeast display library with different CDR H3 variants.
Fluorescence-activated cell sorting of yeast antibody display library
100 μl of WT B38 yeast antibody display library glycerol stock was recovered in 50 ml SD-CAA medium (2% w/v D-glucose, 0.67% w/v yeast nitrogen base with ammonium sulfate, 0.5% w/v casamino acids, 0.54% w/v Na2HPO4, 0.86% w/v NaH2PO4·H2O, all dissolved in deionized water) by incubating at 27°C with shaking at 250 rpm until OD600 reached between 1.5 and 2.0. At this time, 15 ml of the yeast culture was harvested, and the yeast pellet was obtained via centrifugation at 4,000 × g at 4°C for 5 min. The supernatant was discarded, and SGR-CAA (2% w/v galactose, 2% w/v raffinose, 0.1% w/v D-glucose, 0.67% w/v yeast nitrogen base with ammonium sulfate, 0.5% w/v casamino acids, 0.54% w/v Na2HPO4, 0.86% w/v NaH2PO4·H2O, all dissolved in deionized water) was added to make up the volume to 50 ml. The yeast culture was then transferred to a baffled flask and incubated at 18°C with shaking at 250 rpm. Once OD600 has reached between 1.3 and 1.6, 1 ml of yeast culture was harvested, and the yeast pellet was obtained via centrifugation at 4,000 × g at 4°C for 5 min. The pellet was subsequently washed with 1 ml of 1x PBS twice. After the final wash, cells were resuspended in 1 ml of 1x PBS.
Then, for expression assay, 1 μg of PE anti-HA.11 (epitope 16B12, BioLegend, Cat. No. 901517) buffer-exchanged into 1x PBS was added to the cells. A negative control was set up with nothing added to the PBS-resuspended cells. Samples were incubated overnight at 4°C with rotation. Then, the yeast pellet was washed twice in 1x PBS and resuspended in FACS tubes containing 2 ml 1X PBS. Using a BD FACS Aria II cell sorter (BD Biosciences), PE-positive cells were collected in 1 ml of SD-CAA containing 1x Penicillin/Streptomycin. Cells were then collected via centrifugation at 4,500 rpm at 20°C for 15 min. The supernatant was discarded. Subsequently, the pellet was resuspended in 100 μl of SD-CAA and plated on SD-CAA plates at 37°C. After 40 h, colonies were collected in 2 ml of SD-CAA. Frozen stocks were made by reconstituting the pellet in 15% v/v glycerol (in SD-CAA medium) and then stored at −80°C.
For binding assay, 20 μg of SARS-CoV-2 S RBD-Fc was added to washed cells. A negative control was set up with nothing added to the PBS-resuspended cells. Samples were incubated overnight at 4°C with rotation. The yeast pellet was then washed twice in 1x PBS. After the last wash, cells were resuspended in 1 ml of 1x PBS. Subsequently, 1 μg of PE anti-human IgG Fc antibody (clone HP6017, BioLegend, Cat. No. 409304) buffer-exchanged into 1x PBS was added to yeast. Cells were incubated at 4°C for 1 h with rotation. The yeast pellet was then washed twice in 1x PBS and resuspended in FACS tubes containing 2 ml 1x PBS. Using a BD FACS Aria II cell sorter (BD Biosciences), PE-positive cells were collected in 1 ml of SD-CAA containing 1x Penicillin/Streptomycin. Cells were then collected via centrifugation at 4,500 rpm at 20°C for 15 min. The supernatant was then discarded. Subsequently, the pellet was resuspended in 100 μl of SD-CAA and plated on SD-CAA plates at 37°C. After 40 h, colonies were collected in 2 ml of SD-CAA, and subsequently pelleted. Frozen stocks were made by reconstituting yeast pellets with 15% v/v glycerol (in SD-CAA medium) and then stored at −80°C.
Next-generation sequencing of CDR H3 loops
Plasmids from the unsorted yeast display library (input) as well as two replicates of sorted yeast display library based on RBD-binding and expression were extracted from sorted yeast cells using a Zymoprep Yeast Plasmid Miniprep II Kit (Zymo Research) following the manufacturer’s protocol. The CDR H3 region was subsequently amplified via PCR using 5’-ACC TAC AGA TGA ATT CTC TTA GG-3’ and 5’-GGG CCT TTT GTA GAA GCT GAA CT-3’ as forward and reverse primers, respectively. Subsequently, adapters containing sequencing barcodes were appended to the genes encoding the CDR H3 region via PCR. 100 ng of each sample was used for paired-end sequencing using Illumina MiSeq PE150 (Illumina). PEAR was used for merging the forward and reverse reads82. Regions corresponding to the CDR H3 were extracted from each paired read. The number of reads corresponding to each CDR H3 variant in each sample is counted. A pseudocount of 1 was added to the final count to avoid division by zero in enrichment calculation. The enrichment for variant i was computed as follows:
Code availability
Custom python scripts for analyzing the deep mutational scanning data have been deposited to https://github.com/wchnicholas/IGHV3-53_sequence_features. Files for Rosetta modeling are available at https://github.com/timothyjtan/ighv3-53_3-66_antibody_sequence_features.
Data availability
Raw sequencing data have been submitted to the NIH Short Read Archive under accession number: BioProject PRJNA691562. The X-ray coordinates and structure factors will be deposited to the RCSB Protein Data Bank prior to publication.
Competing interests
The authors declare no competing interests.
Author Contributions
T.J.C.T., G.C.P. and N.C.W. conceived and designed the study. J.R.B., M.Y. and B.M.S. expressed and purified the proteins. T.J.C.T., K.K., X.C., J.R.C. and C.B.B. performed the yeast display experiments. T.J.C.T., Y.W. and N.C.W. processed the next-generation sequencing data. M.Y. and X.Z. performed the crystallization, X-ray data collection, determined and refined the X-ray structures. T.J.C.T., M.Y., G.C.P., I.A.W. and N.C.W. analyzed the data. T.J.C.T., M.Y., I.A.W. and N.C.W. wrote the paper and all authors reviewed and/or edited the paper.
Acknowledgements
We thank the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign for assistance with fluorescence-activated cell sorting and next-generation sequencing. This work was supported by NIH R00 AI139445 (N.C.W.), and Bill and Melinda Gates Foundation INV-004923 (I.A.W.).