De novo design of high-affinity antibody variable regions (Fv) against the SARS-CoV-2 spike protein

The emergence of SARS-CoV-2 is responsible for the pandemic of respiratory disease known as COVID-19, which emerged in the city of Wuhan, Hubei province, China in late 2019. Both vaccines and targeted therapeutics for treatment of this disease are currently lacking. Viral entry requires binding of the viral spike receptor binding domain (RBD) with the human angiotensin converting enzyme (hACE2). In an earlier paper 1 , we reported on the specific residue interactions underpinning this event. Here we report on the de novo computational design of high affinity antibody variable regions through the recombination of VDJ genes targeting the most solvent-exposed hACE2-binding residues of the SARS-CoV-2 spike protein using the software tool OptMAVEn-2.0 2 . Subsequently, we carry out computational affinity maturation of the designed prototype variable regions through point mutations for improved binding with the target epitope. Immunogenicity was restricted by preferring designs that match sequences from a 9-mer library of “human antibodies” based on H-score (human string content, HSC) 3 . We generated 106 different designs and report in detail on the top five that trade off the greatest affinity for the spike RBD epitope (quantified using the Rosetta binding energies) with low H-scores. By grafting the designed heavy (VH) and light (VL) chain variable regions onto a human framework (Fc), high-affinity and potentially neutralizing full-length monoclonal antibodies (mAb) can be constructed. A potent antibody that recognizes the viral spike protein with high affinity would enable both the design of sensitive SARS-CoV-2 detection devices and its deployment as a therapeutic antibody.

vaccine © 2020 Moderna, Inc) and neutralizing antibodies 9 has been made. A computational study identified the structural basis for multi-epitope vaccines 10,11 whereas in another study, the glycosylation patterns of the spike SARS-CoV-2 protein were computationally deduced 12 . In addition, fully human single domain anti-SARS-CoV-2 antibodies with sub-nanomolar affinities were identified from a phage-displayed single-domain antibody library. Naïve CDRs were grafted to framework regions of an identified human germline IGHV allele using SARS-CoV-2 RBD and S1 protein as antigens 13 . In another study 14 , a human antibody 47D11 was identified to have cross neutralizing effect on SARS-CoV-2 by screening a library of SARS-CoV-1 antibodies. In two other studies, potent neutralizing antibodies were isolated from the sera of convalescent COVID-19 patients 15,16 . To the best of our knowledge, none of these neutralizing antibody sequences are publicly available. In a follow up effort 17 , human antibody CR3022 (which is neutralizing against SARS-CoV-1 18 ) has been shown to bind to SARS-CoV-2 RBD in a cryptic epitope but without a

Motivated by these shortcomings, here we explore the de novo design of antibody variable regions targeting the most solvent-exposed residues of the spike protein that are also part of the residue contact map involved in hACE2 binding, and trade off binding energy against human sequence content in the variable region.

The goal here is to exhaustively explore the sequence space of all possible variable region designs and report a range of diverse solutions that can serve as potentially neutralizing antibodies (nAb). We find that many different combinations of VDJ genes followed by mutation can yield potentially high affinity variable regions (scored using the Rosetta binding energy function) against an epitope of the spike protein RBD.

Pareto optimal designs with respect to binding affinity vs. human content were drawn and five affinity matured designs are detailed in the results section.
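The notion of Pareto optimality used here can be illustrated with a minimal Python sketch. It is not the procedure implemented in OptMAVEn-2.0; the `Design` record, the function names, and the toy numbers in the usage note are hypothetical. A design is kept if no other design is at least as good in both binding energy (lower is stronger) and humanness score, and strictly better in one. Because the preferred direction of the humanness metric depends on how it is defined, that direction is left as a flag rather than assumed.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """Hypothetical record for one variable-region design."""
    name: str
    binding_energy: float  # kcal/mol; more negative means stronger binding
    h_score: float         # humanness metric; preferred direction set by flag


def dominates(a: Design, b: Design, lower_h_is_better: bool) -> bool:
    """True if `a` is no worse than `b` in both objectives and better in one."""
    # Flip the sign of the humanness score when higher values are preferred,
    # so that both objectives are minimized below.
    sign = 1.0 if lower_h_is_better else -1.0
    ha, hb = sign * a.h_score, sign * b.h_score
    no_worse = a.binding_energy <= b.binding_energy and ha <= hb
    strictly_better = a.binding_energy < b.binding_energy or ha < hb
    return no_worse and strictly_better


def pareto_front(designs: List[Design], lower_h_is_better: bool = True) -> List[Design]:
    """Return the designs that are not dominated by any other design."""
    return [
        d for d in designs
        if not any(dominates(other, d, lower_h_is_better) for other in designs)
    ]
```

With made-up values such as `Design("A", -60, 50)`, `Design("B", -55, 40)`, `Design("C", -50, 45)` and `Design("D", -65, 70)`, design C is dominated by B when lower scores are preferred, so the front contains A, B and D.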

We first performed solvent accessibility analysis using the STRIDE 21 program on the 21 hACE2-binding residues of the SARS-CoV-2 spike protein (S-protein) RBD to define our binding epitope. The top seven residues with the highest solvent accessibility scores (i.e., SAS) are (Arg346, Phe347, Ala348, Tyr351,

Figure 1. The SARS-CoV-2 spike protein RBD in complex with Human ACE2 protein (PDB-id: 6LZG) is shown along with the most solvent accessible residues at the binding interface highlighted in purple. A zoomed view of these seven epitope residues is shown in the inset box. The numbering scheme for the S-protein residues is same as in PDB accession id 6LZG (rcsb.org/structure/6LZG or 6VW1 7 and 6M0J 6 ).
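The epitope selection step described above (ranking the hACE2-binding residues by solvent accessibility and keeping the most exposed ones) can be sketched as follows. This is a minimal illustration that assumes the per-residue accessibility values have already been extracted from the STRIDE output into a dictionary; the function name and the dictionary layout are assumptions, and parsing of STRIDE output is not shown.

```python
from typing import Dict, List


def top_exposed_residues(sas_by_residue: Dict[str, float], k: int = 7) -> List[str]:
    """Return the k residue labels with the largest solvent accessibility.

    `sas_by_residue` maps a residue label (e.g. a name plus sequence number)
    to its solvent accessibility score. Ties keep their input order because
    Python's sort is stable.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(sas_by_residue.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]
```

Calling `top_exposed_residues(values, k=7)` on the 21 interface residues would return the seven labels that define the epitope.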

We next used the previously developed OptMAVEn-2.0 2 software to computationally identify the combination of VDJ genes forming the variable region that best binds the desired epitope. OptMAVEn 22 has been used before successfully to design five high affinity CDRs against a FLAG tetrapeptide 23 , three thermally and conformationally stable antibody variable regions (sharing less than 75% sequence similarity to any naturally occurring antibody sequence) against a dodecapeptide mimic of carbohydrates 24 and two thermostable, high affinity variable heavy chain domains (VHH) against α-synuclein peptide responsible for Parkinson's disease 25 . All these designs were experimentally constructed and nanomolar affinities for their respective target antigens were demonstrated.

Through a combination of rotations and translations, OptMAVEn-2.0 identified 3,234 unique antigen poses that presented the epitope to the antibody differently. The combinatorial space of different VDJ genes that upon recombination form the variable region of the prototype antibody was informed by the MAPs database of antibody parts 26 . MAPs (see Supp. Info. S1 for link to full database) contains 929 modular antibody (i.e., variable-V*, complementarity determining -CDR3, and joining-J*) parts from 1,168 human, humanized, parts, C-terminus-shortened V parts (i.e. V* parts) and N-terminus-shortened J parts (J* parts). Note that CDR3 includes the entire D gene and also up to the C-terminus of the V gene and up to the N-terminus of the J gene. In the remainder of the manuscript, the list of parts used to design the variable region are referred to as CDR3, V* and J* parts.

For each one of the 3,234 spike poses, OptMAVEn-2.0 identified a variable region combination composed of end-to-end joined V*, CDR3, and J* region parts that minimized the Rosetta binding energy between the variable region and spike epitope formed by the seven residues. As part of OptMAVEn-2.0, the the complete design of the variable region using parts denoted as HV*, HCDR3, HJ* for the heavy chain paper 2 ). The top five prototype designs with the highest Rosetta binding energies were present in four clusters and spanned a highly diverse set of choices of MAPs parts (see Table 1) with minimal conservation of the same part among the five prototype designs. The number entries in Table 1

Inspection of the interaction of design P1 with the spike epitope reveals strong electrostatic contacts between the S-protein residues Tyr351, Asn354, and Arg355 (see Figure 1c) all of which have been deemed important for hACE2 binding 1 . The strongest contacts with the three epitope residues are established by five antibody residues spanning both the heavy and light chains (shown in yellow in Figure 2b). Spike Tyr351 interacts with Ser64 in the HV* domain, Asn354 interacts with Glu38 and Tyr114 in HV* and KV* domains respectively, while spike Arg355 interacts with Asn37 and Asp110 of HV* and HCDR3 domains, respectively, in the stable spike-antibody complex (see Figure 2c).

We next applied Rosetta-based in silico affinity maturation (see Methods) for each one of the top five prototype designs shown in Table 1 Figure 3a). On average, upon affinity maturation, the binding energy was improved by ~14 kcal/mol and the overall energy was improved by ~37 kcal/mol.

S3) between computational affinity matured and prototype variable region designs.
We next assessed the departure of the 106 designed variable regions from fully-human antibody sequences antibodies 22 . The value of H-score is scaled to 100 and normalized by the length of the sequence. An antibody sequence with all 9-mers exactly matching 9-mers of human antibodies will have a perfect H-score of 100. Figure 3b illustrates the trade-off between the Rosetta binding energy and H-score for these affinity matured variable region designs. For comparison, we calculated the H-score for the human antibodies CR3022 32 , 80R 33 , S230 34 and M396 35 which are known to be neutralizing against SARS-CoV-  spike protein RBD using complex structure (PDB-id:6W41) to be -56.4 kcal/mol which is very close to the on the spike RBD than the one that CR3022 targets (see Supp. Fig S7).
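To make the 9-mer matching idea concrete, the following Python sketch computes the percentage of 9-residue windows in a sequence that exactly match an entry in a library of human 9-mers. It is a simplified stand-in based only on the statement that a sequence whose 9-mers all match exactly scores 100; the published H-score may treat partial matches differently, and the function names and the library representation (a Python set of strings) are assumptions.

```python
from typing import List, Set


def k_mers(sequence: str, k: int = 9) -> List[str]:
    """Return all overlapping windows of length k in the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def exact_match_score(sequence: str, human_k_mers: Set[str], k: int = 9) -> float:
    """Percentage (0 to 100) of k-mer windows found exactly in the library.

    Dividing by the number of windows normalizes for sequence length, so
    sequences of different lengths can be compared.
    """
    windows = k_mers(sequence, k)
    if not windows:
        raise ValueError("sequence is shorter than k")
    matches = sum(1 for window in windows if window in human_k_mers)
    return 100.0 * matches / len(windows)
```

For example, if the library is built from the windows of a 12-residue toy sequence, that sequence scores 100, while a variant with its first residue mutated loses only the first of its four windows and scores 75.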

Figure 4 shows the sequence alignment of these five selected affinity matured sequences (i.e., P1.D1,

Table 2. List of important contacts between the spike protein epitope residues and residues of each of the selected affinity matured designs. For each contact, the loss in binding energy upon mutation of antibody residue from the interface to alanine is tabulated. The corresponding interacting spike residue is also shown.

Antibodies that strongly bind to the RBD but do not inhibit hACE2 binding have been shown to be neutralizing for SARS-CoV-2 (47D11 14 ) and for SARS-CoV-1 (CR3022 in combination with CR3014 32 ).

The mechanisms of neutralization of such antibodies are not completely known 14 . It is possible that upon

In comparison, our design P1.D1 forms strong contacts (see Table 2) with many residues of the RBD which in turn also indirectly interact with hACE2 (see Figure 5). For example, residues L455 and T470 of the RBD are in contact with both hACE2 contacting RBD residues Y449, F490 and P1.D1 contacting RBD residues Y351, I465. By perturbing the inter-residue interaction network of RBD-hACE2 it is plausible that a neutralizing effect can be achieved.
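The identification of RBD residues that link the two interfaces can be expressed as a simple set operation. The sketch below is illustrative only: it assumes an intra-RBD contact map is already available as a dictionary from each residue label to the set of residues it contacts, and the function name and data layout are hypothetical. It returns the residues that touch at least one hACE2-contacting residue and at least one antibody-contacting residue.

```python
from typing import Dict, Iterable, List, Set


def bridging_residues(
    contacts: Dict[str, Set[str]],
    receptor_contacting: Iterable[str],
    antibody_contacting: Iterable[str],
) -> List[str]:
    """Residues in contact with both interface groups, sorted by label.

    `contacts` maps each RBD residue to the set of RBD residues it contacts.
    """
    receptor_set = set(receptor_contacting)
    antibody_set = set(antibody_contacting)
    return sorted(
        residue
        for residue, neighbours in contacts.items()
        # A non-empty intersection with each group means at least one contact.
        if neighbours & receptor_set and neighbours & antibody_set
    )
```

Applied to a contact map of the RBD, with the hACE2-contacting and P1.D1-contacting residues as the two groups, the function would list the residues through which the two interfaces are indirectly coupled.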

We also carried out an all-atom Molecular Dynamics ( RBD of the SARS-CoV-2 spike protein potentially sequestering it from hACE2. This is also corroborated by the Rosetta binding energy value of around -48.3 kcal/mol 1 calculated for the spike protein RBD with hACE2 (from PDB-id: 6LZG) which is weaker by over 7 kcal/mol compared to designs P1.D1 and P1.D2.

Finally, it is important to stress that our designs rely on the accuracy of the Rosetta energy function to recapitulate experimental affinities and that experimental binding assays are needed to confirm or refute these findings.

Figure 5. Space filling plot of the P1.D1-RBD complex superimposed on the hACE2-RBD complex. In blue are shown the RBD residues in contact with hACE2 and in green the RBD residues in contact with the P1.D1 designed antibody. In pink are shown all the RBD residues that are in contact with both hACE2 contacting RBD residues and P1.D1 contacting RBD residues. A few residues that are a part of these inter-residue contacts are labelled.

In summary, the goal of this computational analysis was to assess the range of possible antibody designs that can affect binding with the viral spike protein by interacting with residues involved in hACE2 binding.

We reported on de novo prototype variable regions targeting the most solvent accessible seven-residue epitope in the spike and their (computationally) affinity matured sequences with the lowest Rosetta binding energies. Designs were rank ordered not only in terms of their Rosetta binding energy but also their humanness score metric H-score. We reported complete amino acid sequences for the 106 affinity matured designs as well as the five prototype sequences and V*, CDR3, and J* parts used. Importantly, we would like to note that high affinities of designed antibodies, as modeled using the Rosetta binding energy