Engineered display of ganglioside-sugars on protein elicits a clonally and structurally constrained B cell response

Ganglioside sugars, as Tumour-Associated Carbohydrate Antigens (TACAs), are long-proposed targets for vaccination and therapeutic antibody production, but their self-like character imparts immunorecessive characteristics that classical vaccination approaches have so far failed to overcome. One prominent TACA, the glycan component of ganglioside GM3 (GM3g), is over-expressed on diverse tumours. To probe the limits of glycan tolerance, we used protein editing methods to display GM3g in systematically varied non-native presentation modes, attaching it to carrier protein lysine sidechains via diverse chemical linkers. We report here that such presentation creates glycoconjugates that are strongly immunogenic in mice and elicit robust IgG responses specific to GM3g. Characterisation of this response by antigen-specific B cell cloning and phylogenetic and functional analyses suggests that such display engages a highly restricted naïve B cell class with a defined germline configuration dominated by members of the IGHV2 subgroup. Strikingly, structural analysis reveals that glycan features appear to be recognised primarily by antibody CDRH1/2 and, despite the presence of an antigen-specific Th response and B cell somatic hypermutation, we found no evidence of affinity maturation towards the antigen. Together these findings suggest a ‘reach-through’ model in which glycans, when displayed in non-self formats at sufficient distance from a conjugate backbone, may engage ‘glycan-ready’ V-region motifs encoded in the germline. Structural constraints explain why, despite engaging the trisaccharide, these antibodies do not bind natively presented glycans, such as the glycan presented on lipid GM3. Our findings provide an explanation for the long-standing difficulties in raising antibodies reactive with native TACAs, and a possible template for rational vaccine design against this and other TACA antigens.
Highlights
Display of GM3g synthetically coupled in a longer, orthogonal (from backbone) glycoconjugate (LOG) presentation format (thioethyl-lysyl-amidine) elicits high-titre IgG responses in mice.
The germinal centre experience of LOG glycoconjugate-specific B cell responses is directly influenced by the protein backbone.
Structural characterisation of the antibody response to LOGs reveals highly restricted germline-encoded glycan-engaging motifs that mediate GM3g recognition.
Failure of the antibodies to bind the natively presented trisaccharide highlights barriers to be overcome in the rational design of anti-TACA antibodies.

Differing presentation of GM3g modulates B cell immunogenicity
GM3 natively presents its glycan (GM3g) (Fig 1a) at a short distance from the surface of its native macromolecular (lipid membrane) assembly (estimated at 6 Å, based on display from the head group via the native O-glycoside with three-bond O-hydroxymethyl spacing). We first chose to interrogate the inherent immunogenicity in mice of natively presented GM3g in the context of intact GM3 lipid.
Assembly of GM3-bearing liposomes (PG:PC:Chol:GM3 = 39:39:19:3) created an appropriate macromolecular assembly bearing multi-copy GM3g (Fig S1a). […] groups of 2,156), and similarly low anti-GM3 IgM titres (EPT = 492) (Fig S1b-d). These findings are consistent with non-specific IgM binding and the absence of specific GM3g-binding antibodies, reflecting the low-affinity, high-avidity nature of IgM in ELISA formats (29,30). Unsurprisingly, in the absence of T cell help (classically provided by protein in the antigenic complex), antigen-specific IgG was not detected against either ceramide or GM3 (Fig S1e,f). These data confirm the profoundly limited immunogenicity of GM3g in this macromolecular format.
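Endpoint titres (EPT) such as those quoted above are commonly defined as the reciprocal of the highest serum dilution whose ELISA signal still exceeds a background cutoff. The section does not state the cutoff or interpolation rule used, so the sketch below assumes a cutoff of mean blank + 3 SD and log-linear interpolation between the last dilution above the cutoff and the first at or below it. The function name and the example readings are hypothetical.

```python
import math
from statistics import mean, stdev


def endpoint_titre(dilutions, signals, blanks, n_sd=3.0):
    """Estimate an ELISA endpoint titre (reciprocal dilution) by log-linear interpolation.

    dilutions: reciprocal dilutions in increasing order, e.g. [100, 300, 900]
    signals:   OD readings at each dilution (expected to fall as dilution increases)
    blanks:    OD readings of negative-control wells (at least two)
    Assumed cutoff: mean(blanks) + n_sd * SD(blanks).
    """
    cutoff = mean(blanks) + n_sd * stdev(blanks)
    if signals[0] <= cutoff:
        return None  # no dilution exceeds background
    for i in range(1, len(dilutions)):
        if signals[i] <= cutoff:
            # Interpolate in log10(dilution) between the last point above the
            # cutoff and the first point at or below it
            x0, x1 = math.log10(dilutions[i - 1]), math.log10(dilutions[i])
            y0, y1 = signals[i - 1], signals[i]
            frac = (y0 - cutoff) / (y0 - y1)
            return 10 ** (x0 + frac * (x1 - x0))
    return dilutions[-1]  # still above cutoff at the last dilution: a lower bound only


# Hypothetical 3-fold dilution series
print(endpoint_titre([100, 300, 900, 2700, 8100],
                     [2.1, 1.6, 0.9, 0.35, 0.05],
                     [0.08, 0.10, 0.09]))
```

Interpolating on the log-dilution axis reflects the geometric spacing of serial dilutions; a curve fit to the full titration is a common alternative.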

We next explored an alternative, non-native macromolecular assembly upon which to display GM3g. Orthogonal display on macromolecular protein scaffolds has the potential to mimic membrane-like multi-copy GM3g display while allowing control of copy-number density, site-specific conjugation and, critically, distance from the surface through longer orthogonal display. Precise protein-editing methods using lysine (Lys)-selective (31) 'tag-and-modify' chemistry (32) allow GM3g presentation on diverse protein scaffolds (Fig 1b) […] subsequently GM3g copy number and spacing (Fig 1e,f; Fig S3). This set of mutants permitted the dissection of features including moiety spacing, such as proximal versus distal GM3g glycoconjugates (HEL-[-amidine-GM3g]3p and HEL-[-amidine-GM3g]3d). Notably, predicted pI values were essentially unaltered: 9.32 for wtHEL and 9.48 for HEL-null (in which all Lys were mutated to Arg). In this way, full control of Lys sites and copy numbers (n = 0-6) allowed editing of the GM3g in […] Strikingly, these revealed not only that copy number is a determining factor but that, contrary to prior avidity-centric perceptions, maximal loading does not deliver maximum titres. Indeed, optimal sugar loading with respect to anti-glycoconjugate antibody production was not proportional to the number of modifications but was found to be 2-4 (HEL-[-amidine-GM3g]2-4) in the absence of adjuvant, with significant reductions in IgG titres for HEL-[-amidine-GM3g]5 and HEL-[-amidine-GM3g]6 (P < 0.0001) (Fig 1g). Interestingly, glycoconjugate spacing (HEL-[-amidine-GM3g]3p versus HEL-[-amidine-GM3g]3d) had no obvious bearing on the final GM3g-specific IgG titres.
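The predicted pI values quoted above (9.32 for wtHEL, 9.48 for HEL-null) are the kind of number obtained by finding the pH at which a sequence's net charge is zero. The section does not name the tool or pKa scale, so the sketch below uses an illustrative pKa set and a hypothetical toy peptide rather than the HEL sequence; absolute values will differ between scales, but it shows why Lys-to-Arg substitution nudges the pI upwards (Arg sidechains have the higher pKa).

```python
# Illustrative pKa values (an assumption; published scales differ)
POS = {"K": 10.8, "R": 12.5, "H": 6.5}
NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
NTERM, CTERM = 8.6, 3.6


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    charge = 1 / (1 + 10 ** (ph - NTERM)) - 1 / (1 + 10 ** (CTERM - ph))
    for aa in seq:
        if aa in POS:
            charge += 1 / (1 + 10 ** (ph - POS[aa]))
        elif aa in NEG:
            charge -= 1 / (1 + 10 ** (NEG[aa] - ph))
    return charge


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH of zero net charge (net charge falls monotonically with pH)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


toy = "GKDEKAKRWLYKC"            # hypothetical peptide, not HEL
toy_null = toy.replace("K", "R")  # all Lys -> Arg, analogous to a "null" variant
print(round(isoelectric_point(toy), 2), round(isoelectric_point(toy_null), 2))
```

Bisection is sufficient here because every ionisable term decreases monotonically with pH, so there is exactly one zero crossing.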

To understand the origins of this counterintuitive outcome, we evaluated possible mechanisms. First, we tested whether the increased GM3g-specific titres arising from HEL-[-amidine-GM3g]2-4 immunisation were a consequence of Lys-to-Arg mutations changing the T cell immunogenicity of the protein backbone, possibly introducing artificial T cell epitopes that enhanced the response, rather than a genuine GM3g loading effect. To assess this, we immunised mice with incompletely amidine-GM3g-modified wtHEL, generated under chemical modification conditions adjusted to lower the mean glycan occupancy to ~3.7 per HEL. Mice immunised with this lower-copy product again showed greater GM3g-specific IgG titres than those immunised with the high-copy-number LOG, HEL-[-amidine-GM3g]6 (Fig S4a,b) (P = 0.029), implying that the differential GM3g titres were unlikely to result from carrier amino acid substitutions affecting T cell help. Notably, HEL is a weak T cell antigen in BALB/c mice (34) and, although the high IgG titres imply that sufficient T help is generated to support reliable antigen-specific isotype switching, we were unable to detect Th recall responses, including in mice that had received HEL in MPLA (Fig S4c-g).

To further probe the relationship between glycoconjugate occupancy and the downstream humoral response, we evaluated the anti-GM3g IgM response two weeks post-prime (Fig S4h). These titres reflect the early humoral response, which may not require Th support. Although IgM titres were lower and the data more dispersed than for IgG, the trends with respect to glycan occupancy were the same, again implying that this is likely to be a Th cell-independent effect.
This GM3g occupancy effect was distinct from that observed for the response against the HEL backbone, which was largely adjuvant-dependent (P < 0.0001) rather than sugar loading-dependent (P = 0.3496, two-way ANOVA) (Fig S4i). Collectively, these data highlight that glycan occupancy may have a substantial effect on antibody outcomes, suggesting that optimal loading can be titrated to deliver higher titres. Interestingly, HEL-[-amidine-GM3g]0, in which all lysines were mutated to arginine, elicited a low-titre anti-GM3g response (EPT = 2,940) in formulation with MPLA (Fig 1g). These data, along with mass spectrometric analysis (Fig S3b), suggest that even partial incorporation of GM3g onto the N-terminal primary amine is sufficient to initiate a response against the glycoconjugate.

GM3g-specific antibodies raised with multiple protein carriers
Having demonstrated that HEL LOGs elicit substantial IgG titres even with relatively low glycan copy numbers, we next tested the immunogenicity of the amidine-GM3g LOG on a different protein carrier, truncated gp120. This provided an excellent additional test of the LOG method, with more potential Lys 'tag' sites and a backbone that supplies multiple Th epitopes. Notably, although the gp120 construct used contains 25 lysines, after application of the same benign editing methods for LOG generation we estimated, from electrophoretic analysis and densitometry, that amidine-GM3g loading delivered a mean of approximately 16 modifications (gp120-[-amidine-GM3g]16, Fig S5a,b). This partial lysine occupancy may be a consequence of the heavy endogenous N-linked glycosylation of gp120 reducing the accessibility of some lysine sidechains.
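Mean occupancies such as ~3.7 glycans per HEL or ~16 per gp120, quoted above, can be expressed as an abundance-weighted average over species carrying n modifications. The raw intensities and exact procedure are not given in this section, so the sketch below assumes that the readout (for example deconvoluted MS peak areas or gel-band densitometry) is proportional to the amount of each species; the function name and the example distribution are hypothetical.

```python
def mean_occupancy(abundance_by_n):
    """Abundance-weighted mean number of modifications per protein.

    abundance_by_n: dict mapping modification count n -> relative abundance
    (e.g. MS peak areas or band intensities). Assumes the readout is
    proportional to the amount of each species.
    """
    total = sum(abundance_by_n.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return sum(n * a for n, a in abundance_by_n.items()) / total


# Hypothetical distribution of glycoforms carrying 2, 3 or 4 glycans
print(mean_occupancy({2: 1.0, 3: 2.0, 4: 1.0}))  # 3.0
```

A non-integer mean such as 3.7 therefore describes a mixture of glycoforms rather than a single species.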

Antigen-specific T helper responses are unaltered in LOGs
Any protein alteration, including the methods we used here to generate LOGs, may also affect downstream peptide processing and antigen presentation. We therefore tested the specific impact of LOGs on antigen-specific T cell recall responses. Whole spleen suspensions from gp120-[-amidine-GM3g]16-immunised mice were stimulated in vitro with unmodified gp120, gp120-[-amidine-GM3g]16 or HEL-[-amidine-GM3g]6 for 16 h (adding Brefeldin A for the final 6 h). IFN-γ+ CD4 T cells were quantified and compared across vaccination and re-stimulation conditions (Fig S7a-d). Detectable antigen-specific responses were found only in the adjuvanted groups, irrespective of the GM3g-presentation status of the immunogen. Moreover, the recall response was of equal magnitude whether gp120 or gp120-[-amidine-GM3g]16 was used. HEL-[-amidine-GM3g]6 did not induce any recall response, confirming the important role of the conjugated carrier in providing T cell help. Together these results suggested that presenting GM3g in LOGs inhibited neither the processing of the corresponding antigen nor the recognition of anchored peptide by the corresponding T cells (P > 0.9999). We further evaluated secretion of a broader panel of cytokines in the supernatant after 72 h and observed similar trends for both IL-2 and IL-4 (Fig S7e-g). As is classical in the Th2-biased BALB/c background, IgG1 was the predominant isotype, with the TLR-4/Th1-biasing MPLA adjuvant bolstering IgG2a production (Fig S7h,i).

Variation of the glycan in LOGs elicits orthogonal antibody outcomes
Having demonstrated that GM3g LOGs can be created in forms that are strongly immunogenic for B cell responses, we tested whether this phenomenon extends to other self-glycans. We chose the Lewis group trisaccharide Lewis-X (LeX) as a second representative glycan because of its similar size (trisaccharide) but differing sugar content and arrangement (branched, non-linear) and charge state (neutral) (Fig 1h). The corresponding gp120-[-amidine-LeX]n LOG was constructed in an essentially identical manner and used in formulation with MPLA in identical immunisation protocols. Antibodies were similarly raised against the LeX LOG, with significantly greater titres than in animals immunised with unmodified gp120 (P = 0.005) (Fig 1i). Notably, antisera raised against the LeXg and GM3g LOGs were orthogonal, each binding strictly the autologous glycan, implying tight glycan specificity.

B cell clonality against GM3g LOG is narrow
To dissect the molecular mechanisms underpinning the surprisingly robust B cell response against the LOGs, we conducted comprehensive clonotyping using animals primed with the HEL-[-amidine-GM3g]6 LOG. Antigen-specific B cells were sorted from pre-gated IgD− B cells using molecular probes specific either to the glycoconjugate or to the protein backbone (Fig 2a,b; Fig S8a). Heavy chain variable regions (VH) were recovered from one mouse and sequenced from 87 events, of which the majority (80/87) were GM3g-specific (Fig 2c). Clonality was defined according to the inferred heavy chain VDJ gene origins (Fig 2d; Fig S8b). Antigen-specific events were found in the spleen and bone marrow rather than in the inguinal lymph nodes, suggesting that draining follicular responses had ceased by four weeks post-administration (Fig 2e).
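Clonality here is defined by inferred VDJ gene origins. A minimal sketch of that bookkeeping, assuming each sorted cell already carries IGHV, IGHD and IGHJ assignments (for example from an alignment tool), groups cells sharing all three calls into one family and reports the non-singleton fraction and the share of a V-gene subgroup. Many pipelines additionally require matching CDRH3 length or similarity, which is omitted here; the record layout, function names and gene calls below are hypothetical.

```python
from collections import defaultdict


def group_clonotypes(records):
    """Group cells into clonal families keyed by (V, D, J) gene calls.

    records: iterable of (cell_id, v_gene, d_gene, j_gene) tuples.
    """
    families = defaultdict(list)
    for cell_id, v, d, j in records:
        families[(v, d, j)].append(cell_id)
    return dict(families)


def non_singleton_fraction(families):
    """Fraction of cells that belong to a family with more than one member."""
    n_cells = sum(len(members) for members in families.values())
    in_families = sum(len(m) for m in families.values() if len(m) > 1)
    return in_families / n_cells


def subgroup_fraction(records, prefix="IGHV2-"):
    """Fraction of cells whose V-gene call starts with the given subgroup prefix."""
    v_genes = [v for _, v, _, _ in records]
    return sum(v.startswith(prefix) for v in v_genes) / len(v_genes)


cells = [  # hypothetical gene assignments
    ("c1", "IGHV2-9*02", "IGHD1-1*01", "IGHJ2*01"),
    ("c2", "IGHV2-9*02", "IGHD1-1*01", "IGHJ2*01"),
    ("c3", "IGHV2-3*01", "IGHD2-4*01", "IGHJ3*01"),
    ("c4", "IGHV1-26*01", "IGHD2-4*01", "IGHJ1*01"),
]
fams = group_clonotypes(cells)
print(len(fams), non_singleton_fraction(fams), subgroup_fraction(cells))
```

Keying on the full (V, D, J) tuple is the strictest reading of "VDJ gene origins"; relaxing it to (V, J) plus CDRH3 criteria would merge more cells into families.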

The specific gene segments present in the isolated clones (Fig 2f-h; Fig S8c) reveal striking homology in their IGHV usage. In particular, the IGHV2 subgroup was the predominant VH-gene class in the GM3g-specific events and was expressed in > 80% of sorted B cells. The phylogenetically related IGHV2-3*01, IGHV2-6-5*01 and IGHV2-9*02 genes were the best represented among the GM3g binders (Fig 2i). By contrast, V-gene usage among HEL-binding B cells was significantly more diverse. Furthermore, D- and J-gene usage was highly diverse among these clones, implying that they tolerate broad CDRH3s and joining orientations.

VH-gene usage was also highly consistent between animals, implying a striking reliance on this VH-gene-dependent clonal class for LOG binding (Fig 2j). This was unlike the HEL-binding clones, for which a broader, more diverse set of clonotypes was isolated, fully consistent with the larger antigenic protein surface compared with the more restricted but seemingly immunodominant glycan surface of the corresponding LOGs (Fig 2k). The odds ratio that a given V-gene would be shared, with respect to the antibody binding target, revealed significantly narrower V-gene usage against the LOG than against the protein backbone alone in all animals (Fig 2l).

Given the strikingly restricted clonotypic profile of the anti-[-amidine-GM3g] response despite broad tolerance of diverse DH and JH genes, we hypothesised that the LOG accesses a high frequency of naïve B cells. To interrogate this, we examined murine splenocytes and detected LOG-binding naïve B cells at a strikingly high frequency of 0.025% of IgD+ IgMmid-hi B cells (Fig S8d,e).
These events were sequenced from one mouse, revealing an enrichment of the IGHV2 subgroup (88%) similar to that in immunised mice (Fig S8f,g).

A representative subset of GM3g-binding IgGs of IGHV2-subgroup origin was expressed recombinantly, and culture supernatants were screened against gp120-[-amidine-GM3g]16 (Fig 2m); all bound specifically, confirming functionality. The best binder among these antibodies, termed BAR-1, with inferred germline VH gene IGHV2-9*02 (Fig 2m), was purified for further analysis.

[…] LOGs, we aimed to determine the effects of the protein backbone on B cell clonal outcomes. We similarly sorted B cells from gp120-[-amidine-GM3g]16-immunised mice (4 weeks post-prime). No B cells binding the gp120 backbone were identified (Fig 3a,b), consistent with the absence of detectable gp120-binding serum antibodies in these animals (Fig 3c) and in other animals primed with this LOG (Fig S5). Strikingly, V-gene usage of antibodies raised against gp120-[-amidine-GM3g]16 again revealed dominance of the IGHV2 subgroup, representing > 90% of clones (Fig 3d,e), of the same clonotype as that observed in HEL-[-amidine-GM3g]6-immunised mice.

We observed that a higher proportion of B cells in gp120-[-amidine-GM3g]16-immunised mice were members of clonal families than in HEL-[-amidine-GM3g]6-immunised mice, with an average 2.67-fold increase in the proportion of non-singleton B cells (Fig 3f,g). This may imply that the gp120 protein backbone supports greater clonal expansion, probably as a function of its greater T cell immunogenicity compared with HEL (Fig S4, Fig S7). We further assessed the impact of the protein backbone on clonal diversity using the Chao1 richness estimator (35, 36). Although there was a trend towards lower Chao1 estimates in gp120-[-amidine-GM3g]16-immunised mice (implying narrower clonal diversity), this was not statistically significant (Fig 3h). We also observed that at four weeks post-prime some antigen-specific B cells were present in the iLN (Fig 3i); this was not seen in HEL-[-amidine-GM3g]6-immunised mice and may suggest that the different protein backbone maintains activated B cells within secondary or tertiary lymphoid organ (S/TLO) structures, where much of the antigen persists, driving prolonged maintenance of the follicular response. In sequences isolated from gp120-[-amidine-GM3g]16-immunised mice, the degree of SHM was compartment-specific (Fig 3j): the mean number of nucleotide mismatches in VH sequences was 6.8 for lymph node, 1.4 for spleen and 0.8 for bone marrow. Moreover, the extent of SHM in clones raised against gp120-[-amidine-GM3g]16 was significantly greater than in those raised against HEL-[-amidine-GM3g]6 (P = 0.0059, Kolmogorov-Smirnov test) (Fig 3k). These data implicate the protein backbone in determining the maintenance of primary germinal centre (GC) reaction conditions.
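Three quantities in the paragraph above can be sketched in pure Python: a Chao1 richness estimate from clonotype counts, per-sequence nucleotide mismatch counts against a germline, and the two-sample Kolmogorov-Smirnov D statistic used to compare SHM distributions. Assumptions: the bias-corrected Chao1 form is shown (refs 35, 36 may use the classic S_obs + F1²/(2·F2) form), sequences are already aligned to equal length, and only the KS statistic is computed, not its P value (libraries such as SciPy provide that). Function names and inputs are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def chao1(clone_labels):
    """Bias-corrected Chao1 richness estimate from one clonotype label per cell."""
    abundances = list(Counter(clone_labels).values())
    s_obs = len(abundances)
    f1 = sum(1 for a in abundances if a == 1)  # clonotypes seen once (singletons)
    f2 = sum(1 for a in abundances if a == 2)  # clonotypes seen twice (doubletons)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def mismatches(seq, germline):
    """Nucleotide differences between pre-aligned, equal-length sequences (gaps skipped)."""
    if len(seq) != len(germline):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for a, b in zip(seq, germline) if a != b and "-" not in (a, b))


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    # The supremum of |F_x - F_y| is reached at one of the observed values
    return max(abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
               for t in xs + ys)


print(chao1(["A", "A", "B", "C", "D", "D", "D", "E"]))
print(mismatches("ACGTACGT", "ACGAACGA"))
print(ks_statistic([0, 1, 1, 2], [3, 5, 6, 8]))
```

Chao1 grows with the number of singletons relative to doubletons, which is why a sample dominated by expanded families (few singletons) tends to give lower richness estimates.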

To understand the cellular underpinnings of the improved GC experience of gp120-[-amidine-GM3g]16-raised clones, we measured the induction of follicular helper T (Tfh) cells with respect to protein carrier. The gp120 carrier elicited a larger Tfh population (Fig 3l-n; Fig S9), which is consistent with the concept that the […]

LOGs induce minimal affinity maturation despite SHM
Having shown differential SHM rates with respect to the protein carrier, we next evaluated the functional effect of SHM on antibody affinity. First, we analysed mutation frequencies across the VH gene in an unbiased manner to identify codons that were commonly mutated across gp120-[-amidine-GM3g]16-immunised mice (Fig S10a), and found that two CDRH1 substitutions, T6I and S7N, recurred across multiple animals (Fig S10b). To evaluate the effect of these mutations, we introduced them into BAR-1 and screened binding by ELISA; the data revealed no significant differences in binding compared with the wild-type mAb (Fig S10c), implying a lack of affinity maturation associated with these mutations. Second, we selected the largest clonal family, which had undergone substantial expansion and diversification and was of inferred IGHV2-9*02 origin (Fig 3o). These antibodies were expressed recombinantly and screened by ELISA against HEL-[-amidine-GM3g]6, and their EC50 values were compared with that of the inferred germline (iGL) antibody (Fig 3p). Our data showed no evidence of increased affinity for the glycoconjugate despite substantial SHM, collectively suggesting a strongly limited capacity for B cells to further improve binding to the carbohydrate.
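EC50 values such as those compared with the iGL are usually obtained by fitting a four-parameter logistic curve to ELISA titrations. As a simpler, fit-free stand-in (a swapped-in approximation, not necessarily the procedure used here), the sketch below interpolates in log concentration to the point where the signal crosses midway between the lowest and highest observed responses. The function name and data are hypothetical.

```python
import math


def ec50_interpolated(concs, responses):
    """Approximate EC50 from a monotonically increasing dose-response series.

    concs:     antibody concentrations in increasing order (all > 0)
    responses: matching signals (e.g. ELISA OD)
    The half-maximal level is taken midway between the min and max observed response.
    """
    half = (min(responses) + max(responses)) / 2
    for i in range(1, len(concs)):
        r0, r1 = responses[i - 1], responses[i]
        if r0 < half <= r1:
            # Linear interpolation on a log10(concentration) axis
            x0, x1 = math.log10(concs[i - 1]), math.log10(concs[i])
            frac = (half - r0) / (r1 - r0)
            return 10 ** (x0 + frac * (x1 - x0))
    return None  # midpoint not bracketed (e.g. non-monotonic data)


# Hypothetical titrations of a mutated clone and its iGL (same concentration series)
concs = [0.01, 0.1, 1.0, 10.0]
mutant = ec50_interpolated(concs, [0.05, 0.40, 1.60, 2.00])
germline = ec50_interpolated(concs, [0.05, 0.45, 1.55, 2.00])
print(mutant, germline, germline / mutant)  # a ratio near 1 suggests no affinity gain
```

Unlike a logistic fit, this approach ignores the plateaus beyond the observed range, so it is only reasonable when the titration reaches near-saturation.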

[…] analysis.
Although the data imply that the complete GM3g glycan structure is a required component of antibody binding, we also observed broad, substantial contributions from differing non-reducing aglycones (Fig 4d, left): enhanced binding for amide and aminoalkyl aglycones was potentiated further by the presence of an amidine. Any such potentiation was notably lost in the absence of the correct glycan (Fig 4d, right), further highlighting the role of tight glycan recognition in driving affinity, despite apparent engagement of both glycan and aliphatic constituents. Thus, although these data suggest that the antibody response targets the linker-glycan motif, it is nonetheless specific to the GM3g glycan.

We next interrogated the binding of GM3g LOG-raised antibodies to natively displayed GM3g by screening gp120-[-amidine-GM3g]16 antiserum by ELISA against GM3 and a ceramide control. The data revealed no indication of GM3-specific binding, but rather elevated non-specific reactivity with both ceramide and GM3 in an MPLA-dependent manner, potentially because the adjuvant drives non-specific antibody responses with a substantial hydrophobic element (Fig 4e,f). To eliminate serological background and control for the non-specific binding observed in the MPLA-adjuvanted gp120-[-amidine-GM3g]16 antiserum, GM3g LOG-reactive monoclonal antibodies of IGHV2 origin were purified and again screened by ELISA. No binding was detected against either ceramide or GM3 for any of the 11 clones tested (Fig 4g). These data imply that GM3g presented synthetically in this manner elicits antibodies that fail to react with natively presented glycan.
We generated and purified the Fab of the GM3g-binding mAb clone BAR-1 and quantified binding by surface plasmon resonance (SPR) against an amidine (C(NH)NH)-GM3g-coated surface, bearing the same extended side-chain motif as used in LOGs, yielding KD = 17 ± 1 μM (Fig 5a). Next, we synthesised an equivalent soluble ligand, Lys-amidine-GM3g, as a representative minimal LOG motif, together with a truncated variant, Me-amidine-GM3g, and conducted solution-phase isothermal titration calorimetry (ITC), yielding similar KD values of 5.4 ± 1.2 μM (Lys-amidine-GM3g; Fig 5b, Fig S11a) and 2.1 ± 0.7 μM (Me-amidine-GM3g; Fig S11b,c). Notably, and consistent with the LOG design, both displayed balanced binding thermodynamics rather than a dominant entropic cost (TΔS = -1 kcal/mol and -5.7 kcal/mol, respectively). Competition ELISAs using these soluble ligands gave results consistent with those obtained using polyclonal sera: binding could be competed out with soluble GM3g, but Me-amidine-GM3g was the more effective competitor (Fig 5c).

Dynamic structural interrogation of the BAR1•Lys-amidine-GM3g complex using universal saturation transfer analysis (uSTA) protein NMR (39) (Fig 5d,e; Fig S11c-[…]) gave KD = 49 ± 10 μM and koff = 3.77 ± 3 s⁻¹, consistent with values obtained by complementary methods (Fig 5a,b). Atomic-level 'heat maps' of magnetization transfer in uSTA revealed a ligand pose in which BAR1 engages primarily the glycan motif of GM3g rather than the Lys-amidine linker moiety. Interestingly, analyses of two truncated ligand variants, GM3g itself and the tip disaccharide Neu5Ac-Gal (Fig 5e-g), further showed that, upon removal of the longer LOG linker moiety present in Lys-amidine-GM3g, GM3g alone relaxes into a pose that creates even greater contact with the Gal.
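To place the KD values above on a common energetic scale, the standard binding free energy can be computed as ΔG° = RT ln(KD/c°) with c° = 1 M. The ITC temperature is not stated in this section, so the sketch assumes 298.15 K; the function name is illustrative. For example, KD = 5.4 μM corresponds to roughly -7.2 kcal/mol under this assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M (c° = 1 M)."""
    return R_KCAL * temp_k * math.log(kd_molar)


# KD values reported in the text, in micromolar
for label, kd_um in [("SPR, amidine-GM3g surface", 17),
                     ("ITC, Lys-amidine-GM3g", 5.4),
                     ("ITC, Me-amidine-GM3g", 2.1),
                     ("uSTA, Lys-amidine-GM3g", 49)]:
    dg = binding_free_energy(kd_um * 1e-6)
    print(f"{label}: KD = {kd_um} uM -> dG = {dg:.2f} kcal/mol")
```

Because ΔG° depends on the logarithm of KD, the roughly 10-fold spread between methods amounts to only about 1.9 kcal/mol at this temperature.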
This suggested topological frustration in the complex with Lys-amidine-GM3g (and, by extension, the LOG) that, when removed, allows relaxation further into the binding motif.

Next, the atomic-level basis of these interactions was probed through complementary methods, allowing structural analysis of BAR-1 in complex with Lys-amidine-GM3g. Crystallization of the BAR1•Lys-amidine-GM3g complex revealed a striking, seemingly LOG-specific arrangement in the 3D structure of the holo complex (Fig 5h,i and Fig S12a). Notably, and consistent with design, the longer length of the LOG moiety allowed the GM3g to 'reach through' a seemingly flexibly engaged CDRL3 region to engage key residues in CDRH2 and, to some extent, CDRH1, leaving the part of the groove formed by CDRH3, CDRL1 and CDRL2 unoccupied. The antibody binding pocket is largely hydrophobic in character.

The crystal contained two complete copies of the complex, which are largely identical (light chain rmsd 0.5 Å). In both, the electron density is well ordered for all three sugar rings and the amidine of Lys-amidine-GM3g, but less well ordered for the 'reach through' lysine side chain. As a seemingly key 'foothold', the indole of Trp57H stacks against the α-face of the Gal residue of GM3g to create a classical CH-π interaction (Fig 5i and Fig S12c), as found in diverse carbohydrate-binding modules (CBMs) (40, 41). This is supported by binding of the tip Neu5Ac residue of GM3g, which makes five hydrogen bonds to the BAR1 backbone, including a striking bidentate interaction of its C-1 carboxylate with the amide nitrogen atoms of Ala58H and Val59H; notably, there is no charge-driven interaction. Several highly coordinated water molecules (W) also contribute to binding, as does a hydrophobic CH-π interaction with Phe37H. The reducing-end Glc of GM3g also makes hydrogen bonds to three water molecules, two of which bridge to the protein (including W3, which bridges to Tyr99L, Asn63H and the galactose), but makes only three direct van der Waals contacts with the protein. The amidine linkage of Lys-amidine-GM3g makes hydrogen bonds to the protein (Tyr97L) and to a water molecule (W5) that bridges to the glucose and, intriguingly, forms a cation-π interaction with Tyr37L, confirming a contribution of the longer amidine linker to binding. The aliphatic side chain of the lysine makes van der Waals contacts with Tyr97L.

To dissect specific contributions to binding, including that of the 'foothold' Trp57H, we probed the residues lining the BAR-1 binding site by Ala-scanning mutagenesis, using uSTA protein NMR to assess how each mutation modulated the binding pose adopted by Lys-amidine-GM3g. Strikingly, whereas mutants of lining residues Phe37H, Val59H, Tyr103H and Tyr99K retained residual binding in an ELISA format (Fig S14), their interaction surfaces were all essentially similar (Fig 5j). By contrast, no interaction at all was observed between Lys-amidine-GM3g and the 'foothold' mutant Trp57Ala, further emphasizing its key role (Fig S14c). Specifically, on excitation of the 'ligand only' sample at 8 ppm, a small residual signal is seen in the STD spectrum; this was of identical magnitude to that in the spectrum with the BAR-1 Trp57Ala mutant, revealing no detectable binding between ligand and protein.

Together, these structural and biophysical analyses highlight the key residues driving binding. These residues were notably conserved among the IGHV2 subgroup-containing clones that we had validated for GM3g binding, with particularly high sequence similarity in their VH-encoded CDRH1 and CDRH2 loops (Fig 5k), consistent with our mutagenic and structural analyses. These data also showed that involvement of CDRH3 in ligand binding was limited, which aligns not only with our structural analyses but also with the observation from our […]

For self-glycans, there is strong pressure to shape the naïve B cell repertoire so as to exclude self- or self-like glycan-reactive B cells and prevent the generation of autoreactive antibodies (1, 8), as supported by evidence of anti-glycan responses associated with various autoimmune conditions (42, 43).
Notably, previous studies have failed to reliably raise high-titre antibody responses against GM3 using conventional autologous formulations (21, 26, 44).

The LOG modular format has potential advantages over immunisation with autologous GM3, namely: i) docking of the sugar to a peptidic carrier allows associated T cell help, and ii) non-native presentation of otherwise immunorecessive TACAs via a bespoke chemical linker may bypass the tolerogenic constraints that prevent antibodies being raised against native glycan presentations in endogenous glycoconjugates. Our finding that GM3g-specific IgG responses were readily mounted in mice (predominantly by the IGHV2 subgroup) reveals that the LOG modular format of self-glycans can access a subset of naïve B cells that native presentations of the same glycan do not.

We have rationalised the lack of native glycoconjugate cross-reactivity by combining immunogenetic, structural, biochemical and biophysical analyses. The structure of the BAR-1 Fab with Lys-amidine-GM3g reveals that the sugar portion is recognised by the CDR1 and CDR2 loops of the VH domain. Intriguingly, the recognition of the galactose and sialic acid sugars closely resembles (Fig S13) […] varied LOGs conjugated to different chemical linkers might exploit affinity maturation processes to 'walk' clones towards native glycan reactivity. This is unlike classical germline-targeting approaches, which use isolated, highly mutated antibodies of known functional effect as templates for a germline clonal class (15, 47). However, in the instance described here, it is not known whether 'up-mutation' of the BAR-1 class can move it towards native GM3g recognition.

Finally, the explicit demonstration of germline-encoded lectin-like motifs (48-50) in the murine BCR germline is striking. This not only challenges the dogma of the poor immunogenicity of glycans (51) but may also explain the greatly divergent views and results obtained in the past from immunisations with glycoconjugates. These may reflect not only conjugate presentation format (e.g. 'parallel' versus 'orthogonal', or shorter versus longer linkage), as we argue here, but also the restricted clonotypic response that we have discovered here.
It may be that only the correct glycoconjugate or glyco-epitope engages appropriate 'predisposed' germline BCRs and thereby activates a large proportion of naïve B cells, improving the frequency of B cell activation events in vivo and explaining the relatively high titres of anti-GM3g antibodies elicited after a limited immunisation regimen. We therefore propose that rational design of the entire conjugate, and not just of the glycan as has been typical, is important to properly exploit these rare, correct engagement events in the design of future immunogens.

[…] (Life Technologies) containing 5 μL 1X TCL buffer supplemented with 1% 2-ME. Immediately following sorting, plates were centrifuged at 1,500 g for 1 minute. Plates were stored at -80°C until use.

B cell receptor variable region recovery
Recovery of antigen-specific B cell receptor variable regions was adapted from previous publications (53, 54). We are happy to share a detailed step-by-step protocol upon request. Briefly, single-cell lysates were thawed on ice, and RNA was captured on RNAClean XP beads (Beckman Coulter) and washed with 70% ethanol. RNA was eluted and cDNA libraries were synthesised using SuperScript III (Life Technologies) with random primers (Life Technologies). VH and VK regions were recovered using the first-round PCR primer sets (Table S1) and Q5 polymerase. VH amplicons were purified and sequenced using the 5' MsVHE primer. These sequences were used to determine B cell clonality. To confirm that the recovered sequences were truly antigen-specific, antibodies were synthesised recombinantly. To incorporate the variable regions into an expression vector, vector-overlapping adapters were added via PCR (Table S1)

Immunogenetic analyses
Immunogenetic analyses were performed on the VH regions of successfully recovered clones (Table S2)

All NMR experiments were recorded at 298 K on a 950 MHz spectrometer with a Bruker Avance III HD console and a 5 mm TCI CryoProbe, running TopSpin 3.6.1 and using a SampleJet. All ligands in this work were first assigned using selective 1D Hartmann-Hahn TOCSY and HSQC experiments. The uSTA experiments were recorded either with the stddiffesgp.2 pulse program as previously described (39), or with a pseudo-3D version that used a vdlist file to increment the saturation times. The number of points was set to 32768 or 65536 and the sweep width to 16.05 ppm, giving acquisition times of 2.150 s and 4.300 s, respectively. All spectra were processed using nmrPipe within the uSTA workflow as previously described (39)

interaction surface for the X-ray data was calculated from the structure using a ⟨1/r⁶⟩ expectation value between each proton in the ligand and all protons in the protein, as described previously.
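To make the ⟨1/r⁶⟩ weighting concrete, the following is a minimal sketch of how a per-proton contact score could be computed from a structure. It assumes that proton positions are available as Cartesian coordinates in ångström, that the score for each ligand proton is the sum of r⁻⁶ over all protein protons, and that the surface is normalised to its largest value. The function name `interaction_surface` and the normalisation are illustrative assumptions, not the published uSTA implementation.

```python
import numpy as np


def interaction_surface(ligand_h: np.ndarray, protein_h: np.ndarray) -> np.ndarray:
    """Hypothetical helper: relative <1/r^6> contact score per ligand proton.

    ligand_h:  (N, 3) coordinates (Å) of ligand protons
    protein_h: (M, 3) coordinates (Å) of protein protons
    Returns an (N,) array scaled so that the largest value is 1.
    """
    # Pairwise ligand-protein proton distances, shape (N, M);
    # coincident atoms (r = 0) are not expected in a real structure.
    d = np.linalg.norm(ligand_h[:, None, :] - protein_h[None, :, :], axis=-1)
    # Dipolar (NOE-like) weighting: sum r^-6 over every protein proton.
    score = np.sum(d ** -6.0, axis=1)
    # Assumed normalisation: express each proton relative to the strongest contact.
    return score / score.max()


# Toy example: one ligand proton 3 Å from two protein protons, one far away.
lig = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
prot = np.array([[0.0, 0.0, 3.0], [0.0, 3.0, 0.0]])
surf = interaction_surface(lig, prot)
print(surf)
```

The steep r⁻⁶ dependence means that only protons in close contact contribute appreciably: the distant proton in the toy example scores below 0.1% of the buried one.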

In Lys-amidine-GM3g, we would anticipate a range of R1 relaxation rates, which could affect the transfer efficiencies. To address this, we measured the R1 and R2 relaxation rates of each ligand proton and developed a correction that allowed us to rescale the transfer efficiencies to account for variations in relaxation rate. The resulting adjustments to the interaction surfaces were modest (Fig S11).

These reveal good-quality fits of the data. Iteratively changing and fixing the KD value, refitting the data and following the variation in the probability of the model being correct, exp(−χ²/2), allows construction of an error surface. To an excellent approximation, the variation in the fitted KD follows a Gaussian distribution (e). Performing the same analysis on the koff parameter resulted in a non-central distribution, indicating that in this case, while KD is well determined, koff is not. The distribution is reasonably described by a log-normal distribution, giving a most probable value of 3.77 s⁻¹ with asymmetric error bars (+4 s⁻¹, −2 s⁻¹). The distribution can be interpreted as placing an upper limit on koff, such that koff < 8 s⁻¹. (f,g) R1 and R2 relaxation rates were obtained for each proton in amidine-lysine. The relaxation rates varied by approximately a factor of 3, prompting us to consider the effect of this on the transfer efficiency. Notably, the R1 determined from the KD analysis for the NAc proton (0.37 s⁻¹) was consistent with the value measured directly and independently (0.4 s⁻¹), supporting the quantitative uSTA analysis. (h,i) The simulated parameters from the KD analysis in (c) were used to simulate the variation in transfer efficiency as a function of R1 and R2, revealing almost no variation with R2 but a modest variation with R1.
(j) These curves were interpolated using a biexponential function for R1 and a linear function for R2, and were used to provide a rescaling factor that adjusts the transfer efficiency of each atom to the value expected if its relaxation were identical to that of the NAc proton. The largest correction was for the lysine delta proton (R1 1.4 s⁻¹), whose R1 was furthest from that of the NAc proton (0.4 s⁻¹); in this extreme case, the transfer efficiency was corrected by a factor of 2. (k) The original and rescaled interaction surfaces for Lys-C(NH)NH-GM3g. The overall pattern is largely invariant to the rescaling, although some positions vary more than others. The main conclusions drawn from inspection of the surface (that the NAc methyl group and the sialic acid moiety dominate the interaction, that protons in all GM3g sugars are important, and that the lysine does not contribute substantially to the interaction) are independent of the relaxation correction. In the manuscript, all interaction surfaces shown have had their transfer efficiencies 'corrected' using this method.

(a) Sequence schematic of BAR-1 and selected residues targeted for mutagenesis. (b) ELISA EC50 binding was compared against gp120-[-amidine-GM3g]16 binding (n = 4). Data were compared using Tukey's post hoc multiple-comparison test. P-value denotations: '****' P < 0.0001, '***' P < 0.001, '**' P < 0.01 and '*' P < 0.05. (c) 'Pulse off' 1D NMR (black) and saturation transfer difference (STD) spectra for the various BAR-1 mutants considered, highlighting the distinctive NAc methyl groups that cap the lysine moiety (left-hand peak) and of the sialic acid (right-hand peak).
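The two numerical steps described for the uSTA analysis (Fig S11) can be sketched as follows, as a minimal NumPy illustration under stated assumptions. `profile_error` turns a χ² profile (one parameter fixed on a uniform grid, all others refitted) into relative probabilities exp(−χ²/2) and summarises them by a mean and standard deviation, which suits a near-Gaussian profile such as that of KD. `rescale_efficiency` applies the relaxation correction, scaling an observed transfer efficiency by f(R1,ref)/f(R1,atom), where f is a biexponential in R1 and the weak R2 dependence is neglected. All function names, the synthetic χ² profile and the biexponential parameters are hypothetical placeholders; in practice the parameters would be fitted to the simulated curves, for example with `scipy.optimize.curve_fit`.

```python
import numpy as np


def profile_error(grid, chi2):
    """Hypothetical helper: Gaussian summary of a 1D chi^2 profile.

    grid: uniformly spaced values at which the parameter was fixed
    chi2: best-fit chi^2 at each grid value (other parameters refitted)
    """
    grid = np.asarray(grid, dtype=float)
    chi2 = np.asarray(chi2, dtype=float)
    # Relative probability that the model is correct, exp(-chi2/2);
    # subtracting the minimum only rescales p and avoids underflow.
    p = np.exp(-(chi2 - chi2.min()) / 2.0)
    p /= p.sum()  # discrete normalisation, valid for a uniform grid
    mean = np.sum(p * grid)
    std = np.sqrt(np.sum(p * (grid - mean) ** 2))
    return float(mean), float(std)


def biexp(r1, a1, k1, a2, k2):
    """Biexponential model of simulated transfer efficiency versus R1."""
    return a1 * np.exp(-k1 * r1) + a2 * np.exp(-k2 * r1)


def rescale_efficiency(eff_obs, r1_atom, r1_ref, params):
    """Hypothetical helper: rescale an observed efficiency to the value
    expected if the atom relaxed with the reference (NAc) R1.
    R2 is ignored because the simulated efficiency barely depends on it."""
    factor = biexp(r1_ref, *params) / biexp(r1_atom, *params)
    return eff_obs * factor


# Synthetic profile: chi^2 rises quadratically about a hypothetical KD of 100
# (arbitrary units) with a curvature corresponding to sigma = 10.
kd_grid = np.linspace(50.0, 150.0, 1001)
chi2 = 10.0 + ((kd_grid - 100.0) / 10.0) ** 2
print(profile_error(kd_grid, chi2))  # approximately (100.0, 10.0)

# Placeholder biexponential parameters, not fitted values.
params = (0.6, 0.5, 0.4, 3.0)
print(rescale_efficiency(0.2, r1_atom=1.4, r1_ref=0.4, params=params))
```

For these placeholder parameters the efficiency falls with R1, so a proton relaxing faster than the NAc reference receives a factor above 1; the direction and size of the correction for real data depend on the fitted curve. A skewed profile such as that of koff would instead call for an asymmetric summary, for example a log-normal fit.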