Insights into the stability of a therapeutic antibody Fab fragment by molecular dynamics and its stabilization by computational design

Successful development of protein therapeutics depends critically on achieving stability under a range of conditions, while retaining their specific mode of action. Gaining a deeper understanding of the drivers of instability across different stress conditions, will potentially enable the engineering of protein scaffolds that are inherently manufacturable and stable. Here, we compared the structural robustness of a humanized antibody fragment (Fab) A33 using atomistic molecular dynamics simulations under two different stresses of low pH and high temperature. RMSD calculations, structural alignments and contact analysis revealed that low pH unfolding was initiated through loss of contacts at the constant domain interface (CL-CH1), prior to CL domain unfolding. By contrast, thermal unfolding began with loss of contacts in both the CL-CH1 and variable domain interface (VL-VH), followed by domain unfolding of CL and also of VH, thus revealing divergent unfolding pathways. FoldX and Rosetta both agreed that mutations at the CL-CH1 interface have the greatest potential for increasing the stability of Fab A33. Additionally, packing density calculations found these residues to be under-packed relative to other inter-domain residues. Two salt bridges were identified that possibly drive the conformational change at low pH, while at high temperature, salt bridges were lost and reformed quickly, and not always with the same partner, thus contributing to an overall destabilization. Sequence entropy analysis of existing Fab sequences revealed considerable scope for further engineering, where certain natural mutations agreed with FoldX and Rosetta predictions. Lastly, the unfolding events at the two stress conditions exposed different predicted aggregation-prone regions (APR), which would potentially lead to different aggregation mechanisms. Overall, our results identified the early stages of unfolding and stability-limiting regions of Fab A33, which provide interesting targets for future protein engineering work aimed at stabilizing to both thermal and pH-stresses simultaneously. Author Summary Currently, antibody-based products are the most rapidly growing class of pharmaceuticals due to their high specificity towards their targets, such as biomarkers on the surface of cancer cells. However, they tend to aggregate at all stages of product development, which leads to decreased efficiency and could elicit an immunological response. Improvements in the stability of therapeutic antibodies are generally made during the development phase, by trial and error of the composition of the formulated product, which is both costly and time consuming. There is great demand and potential for identifying the drivers of instability across different stress conditions, early in the discovery phase, which will enable the rational engineering of protein scaffolds. This work elucidated the stability-limiting regions of the antibody fragment Fab A33 using several computational tools: atomistic molecular dynamics simulations, in-silico mutational analysis by FoldX and Rosetta, packing density calculators, analysis of existing Fab sequences and predictors of aggregation-prone regions. Results identified particular regions in which mutagenesis has the potential to stabilize Fab against both thermal and pH-stresses simultaneously. Overall, the methodology used here could improve the developability screening of candidate antibody products for many diseases, such as cancer, chronic inflammatory diseases and infectious diseases.

7 148 the two stress conditions, leading to partial unfolding of only the C L domain at low pH, compared 149 to destabilization of both C L and V H domains in the high temperature condition. These 150 conformational changes exposed different predicted aggregation-prone regions (APR), which 151 would additionally support divergent aggregation mechanisms. Salt-bridge analysis provided 152 insights into the location of those that were broken most rapidly due to protonation in the low pH 153 simulation, and also showed that high temperature led to an increased fluctuation of salt bridge 154 formation and breaking, more generally throughout the structure. An in-silico mutational analysis 155 by both FoldX and Rosetta, predicted that the constant domain interface had the greatest potential 156 for further stabilization, a finding that was also supported by lower packing-density calculations.
157 Taken together, our results determined the stability-limiting regions at low pH and high 158 temperature for Fab A33, and also identified those with the greatest potential for mutations that 159 simultaneously improve stability under both low pH and high temperature conditions. 167 Simulations in the unfolding trajectories (pH 3.5 and pH 4.5 at 300 K, for low pH; pH 7.0 at 340 K 168 and 380 K, for high temperature) were compared to the simulations in the native trajectory (pH 169 7.0 at 300 K). For every condition of pH and temperature, three independent simulations were 170 performed, and their average RMSD are shown in Figures 2 and 3. Additionally, structures from 171 the unfolding trajectories (pH 3.5, for low pH; 380 K, for high temperature) were aligned to 172 structures from the native trajectory (pH 7.0 at 300 K), to visualize the structural changes that 173 individual domains experienced. For each domain alignment, ten structures were taken every 3 ns 10 225 While V L and C H 1 experienced only small domain displacements (Fig 3C and 3I), clear 226 domain unfolding was observed for both C L and V H (Fig 3E and 3G). At 380 K from 20 ns to 50 ns, 227 the V H domain displayed an increase in RMSD from 2.4 Å at pH 7.0, to 3.2 Å at pH 3.5. That for 228 the C L domain increased from 1.8 Å at pH 7.0 to 2.4 Å at pH 3.5 (all averages were ± 0.1 Å). In 229 these cases, many interface contacts were also lost with respect to pH 7.0 and 300 K, before the 230 unfolding of individual structural domains, again consistent with destabilization of the interface 231 contributing to the loss of stability for the individual domains. For both V L and C H 1, structures from 232 the simulations at 300 K and 380 K aligned well (Fig 3D and 3J), whereas for V H and C L (Fig 3H 233 and 3F) the domains were structurally perturbed at the higher temperature. The V H domain 234 experienced a displacement of the loops on the N-terminal region, including the three CDR loops 235 (Fig 3H). Differences in the C L domain at high temperature were found in the loops and within an 236 internal β-strand (Fig 3F). This was consistent with previous work which identified instability and 237 structural changes in the V H domain of another Fab at high temperature [22]. Taken together, 238 these findings suggest a different unfolding pathway for Fab A33 at low pH and at high 239 temperature.  The unfolding of individual domains was additionally followed by their loss in secondary 253 structure (SS); specifically, we monitored the change in β-strand structure. Constant domains are 254 composed of seven β-strands named A to G, while variable domains contain two more strands, a 255 total of nine, with the two additional strands termed C' and C'' (Fig 4A and 4B). To calculate the 256 loss in β-strand structure for each of the strands, we first tracked the secondary structure 257 designation for each residue in Fab A33 throughout the simulations (Fig S3). The percentage of 258 time occupied within β-strand was calculated for each residue, and then summed for each of the 259 32 β-strands in Fab A33. This value was averaged for each of the three repeats at each condition.
260 The percentage change in β-strand occupancy was then calculated, to determine the loss relative 261 to the reference simulations at pH 7, 300 K (Fig 4 and Fig S4).

269
At pH 3.5, the C L domain had an overall loss in secondary structure content, confirming 270 the results found in the previous section ( Fig 4C). Strands F (-13 ± 3 %) and G (-12 ± 7 %) of the 271 C L domain showed the highest β-sheet structure loss, both located in the outer β-sheet. Strands 272 B (-8 ± 6 %), C (-6 ± 4 %) and E (-8 ± 6 %) also experienced significant losses. Additionally, the 273 β-strand C'' of the V L domain also showed a large variability between repeat trajectories. C'' is the 274 shortest strand, and is located at the extreme of the outer β-sheet connecting the CDR-2 loop, 275 and so the large variability suggests that this is a flexible region prone to the loss of secondary 276 structure in some but not all simulations.

277
At 380 K, the losses in β-strand content were located in both the C L and V H domains, 278 consistent with the unfolding described in the previous section ( Fig 4D). Many strands in the C L 279 domain showed significant losses, A (-7 ± 2 %), C (-8 ± 3 %), D (-16 ± 12 %) and G (-5 ± 1 %), 280 located at the extremes of the inner and outer β-sheets. The V H domain also showed high losses 281 of β-strand. However, of these strands A (-18 ± 17 %) and G (-8 ± 9 %) also showed high variability 282 between repeats. Interestingly, these same two strands in V H were previously found to deform at

312
At low pH, pH 3.5, 300 K, a total of 27 salt bridges were observed, but most of them were 313 very short lived. The more persistent salt bridges at pH 3.5 were Asp151-His189 (86 ± 6 %), 314 Asp82-Arg61 (78 ± 11 %), Asp122-Lys126 (73 ± 1 %), Asp362-Lys335 (56 ± 6 %) (Fig 5B, 5E 315 and 5H). The protonation state at the end of the pH 3.5 simulations was calculated again using 316 these Fab conformations, which revealed these salt bridges to be still present due to predicted 317 pK a values of below 3.5 for these aspartates. Comparison of the salt bridges at pH 7.0 and 3.5, 318 indicated the presence of two salt bridges that potentially trigger the conformational change 319 observed at low pH, and thus the loss of Fab A33 stability. Glu165-Lys103 and Glu195-Lys149, 320 were the most persistent contacts at pH 7.0, but were not present at pH 3.5. Glu165-Lys103 321 bridges the C L domain to the V L domain, and Glu195-Lys149 bridges the outer β-strands C and 322 F of the C L domain. Loss of these salt bridges at low pH, would therefore destabilize the C L 323 domain, and promote the observed C L domain displacement.

324
At high temperature, pH 7.0 and 380 K, a total of 45 salt bridges were observed. The 325 greater number than at 300 K, reflects an increased conformational flexibility of many salt bridges 326 at the higher temperature, in which they often broke, but then reformed with a different partner.
327 Indeed, at the high temperature, salt bridges broke and reformed much faster (Fig 5C). At 380 K, 328 the total time present for the most persistent salt bridges observed at 300 K, had decreased to 47 14 329 ± 9 % for Glu165-Lys103, 58 ± 3 % for Glu195-Lys149, 21 ± 7 % for Asp151-His189, and 43 ± 330 22 % for Asp304-Arg252. However, Asp82-Arg61 increased in occurrence to 70 ± 3 % (Fig 5C, 331 5F and 5I). These findings indicate that the increased dynamics at high temperature, results in   residues were highlighted in magenta in Fig 6A and 6B, as those predicted by both algorithms to 351 have the greatest potential for stabilization. These residues corresponded to S176, N137, S397, 352 S159, S12 and T180, and all of their mutations predicted to be most stabilizing, were to more 353 hydrophobic amino acids, such as Trp, Leu, Ile, Phe and Tyr (Fig S6 and Table S2). Four of the 354 six residues (S176, N137, S397 and T180), are located in the constant domain interface, between 15 355 C L and C H 1 domains (Fig 6B), suggesting significant potential for further stabilization within the 356 C L -C H 1 interface. Furthermore, S159 is in the C L domain, but interacts with an outer β-strands of 357 C L , and S12 is in the V L domain, but interacts with the C L domain (Fig 6B). Thus overall, the C L 358 domain has a relatively high potential for stabilization, through repacking of the C L -C H 1 interface, 359 within the C L domain itself, and also through improved interaction between C L and V L . This is 360 consistent with the MD simulations which found the displacement of C L away from the interface 361 with C H , and subsequent unfolding of the C L domain, to be critical steps in early or partial 362 unfolding.

378
The packing density of each Fab A33 residue was calculated using the package occluded 379 surface (OS) software, which calculates occluded surface and atomic packing [30,31]. The 380 occluded surface packing (OSP) value of each atom is calculated from normal vectors that extend 16 381 outward from the atom surface until they intersect a neighboring van der Waals surface (Fig S7).
382 This value is 0.0 for completely exposed residues and 1.0 where 100% of molecular surface is in 383 contact with other van der Waals surface. The average OSP for all 28 β-strand residues within 384 domain interfaces (V L -V H and C L -C H 1) was 0.49 ± 0.01 (OSP values shown in Table S3). By 385 contrast, the average OSP of the five constant-domain interface residues (S176, N137, S397, 386 T180, and S395), identified by FoldX and Rosetta as having high stabilization potential, was 0.41 387 ± 0.05 (OSP values shown in Table S3). This can be visualized in Fig 7, where OSP values were 388 mapped in the structure of Fab A33, with red to indicate high packing density, and blue to indicate 389 low packing density. β-strand residues within domain interfaces were highlighted as sticks (Fig   390 7A). Residues in the constant interface (C L -C H 1) were lighter colored than residues in the variable 391 interface (V L -V H ), indicating less tight packing of the constant interface. An insight of the residues 392 identified by FoldX and Rosetta is provided in Fig 7B, where a lighter color than the residues in 393 the variable interface was also observed. This result shows that the predicted residues are under-394 packed, and therefore have the potential to be mutated to pack the C L -C H 1 interface more tightly.  406 Sequence alignment and sequence entropy calculations for each residue were obtained using 18 432 Table 1

490
Antibody-based products are the main class of approved biopharmaceuticals, due to their 491 high target specificity [1]. However, there are many barriers to their successful development into 492 therapeutics, with protein aggregation being perhaps the most common and challenging to 493 prevent [5]. There is a need to identify potential instabilities of therapeutic proteins during their 494 early development, particularly against stresses that they will encounter during manufacture, 495 storage and delivery. This would allow their early elimination from further development, or 496 otherwise rational mutagenesis into more stable products. In this context, we have elucidated the 497 first unfolding events that take place on a humanized Fab A33 using atomistic MD simulations, 498 and compared these to predictions of potentially stabilising mutations using computational tools.

499
Our simulations showed that contacts at the interface between domains (V L -V H and C L -

512
The further goal could be to improve the stability of the individual domains. The C L domain 513 was found to unfold at both, low pH and high temperature. Salt bridge analyses identified two key 514 salt bridges at the heart of this domain unfolding at low pH, Glu165-Lys103 and Glu195-Lys149.
515 Glu165-Lys103 bridges the C L domain to the V L domain, and Glu195-Lys149 is located in outer 516 β-strands of the C L domain, bridging β-strands C and F. FoldX and Rosetta also identified 517 stabilizing mutations in the C L domain. To stabilize the interaction between the C L and V L domain, 518 hydrophobic mutants of S12 and K103 were predicted. Interestingly, the mutation S12 to Tyr is 519 found naturally. In the C L domain, S159 was identified, which interacts with an outer β-strand, 520 suggesting this interaction can also be improved. Lastly, the V H domain was also found to unfold 521 at high temperature. The only mutation identified in this domain is S267, to a Pro, which notably 522 is found naturally. Overall, the results found with MD simulations and stabilizing software 523 predictors strongly agree in the domains of Fab A33 that can be stabilized further.

524
In order to gain insights into the mechanisms by which aggregation might occur, APRs in 525 Fab A33 were identified, and their solvent accessibilities were compared. All APRs in Fab A33 526 are located in the interior of the protein, however, at low pH and high temperature the SASA of 527 certain APRs increased. Notably, different APRs were exposed under both stresses, suggesting 546 and 300 K) and under two stresses, low pH (pH 3.5 and pH 4.5 at 300 K) and high temperature 547 (pH 7.0 at 340 K and 380 K). Many high temperature simulations are performed at relatively high 548 temperatures (e.g. 500 K), to achieve complete denaturation of the protein, however, in this case, 549 we aim to partially unfold Fab A33 and detect the regions prone to early unfolding. Simulations 550 were carried out using the OPLS-AA/L all-atom force field [50]. The Fab PDB file was first 551 converted to a topology file with its five (four intra-chain and one inter-chain) disulfide bonds 552 retained. The protonation state of each residue was entered manually, and these were determined