The Atomic-Level Physiochemical Determinants of T Cell Receptor Dissociation Kinetics

The rational design of T Cell Receptors (TCRs) for immunotherapy has stagnated due to a limited understanding of the dynamic physiochemical features of the TCR that elicit an immunogenic response. The physiochemical features of the TCR-peptide major histocompatibility complex (pMHC) bond dictate bond lifetime which, in turn, correlates with immunogenicity. Here, we: i) characterize the force-dependent dissociation kinetics of the bond between a TCR and a set of pMHC ligands using Steered Molecular Dynamics (SMD); and ii) implement a machine learning algorithm to identify which physiochemical features of the TCR govern dissociation kinetics. Our results demonstrate that the total number of hydrogen bonds between the CDR2β-MHCα(β), CDR1α-Peptide, and CDR3β-Peptide are critical features that determine bond lifetime. We propose that amino acid substitutions to these hypervariable regions of the TCR can efficiently manipulate immunogenicity and thus be used in the rational design of TCRs for immunotherapy.


1
T cell-based immunotherapies (e.g., chimeric antigen receptor-T, or CAR-T; and TCR-2 engineered-T, or TCR-T) have provided transformative therapeutic responses in a small 3 subset of cancers and patients 1-5 ; however, progress in solid tumors has been agonizingly 4 slow. For example, CAR-T cells require an antigen on the tumor cell surface, but the 5 majority (~85%) of identified neoantigens are intracellular 6 and thus are immunogenic only 6 when a representative fragment is presented on the cell surface in a peptide-major 7 histocompatibility complex (i.e., pMHC). Although TCR-T therapy is MHC-restricted, this 8 approach can target intracellular antigens, and the remarkable sensitivity of a TCR to 9 recognize a single pMHC molecule 7 provides an additional strategic advantage.

10
Nonetheless, identifying neoepitopes, matching these with immunogenic TCRs, and 11 minimizing off-target effects remain significant challenges to implementation of these 12 therapies 8 .

14
Recent reports demonstrate that single-cell sequencing and machine learning technologies 15 can identify patient-and tumor-specific neoepitopes 9, 10 . However, identification of partner 16 TCRs remains challenging, despite the fact that tumor-specific T cells can be found in the 17 peripheral blood 11,12 . The human immune system generates tumor-specific T cells in a 18 process that begins with random V(D)J recombination to create the hypervariable regions of 19 the TCR  and  chains. While this process generates a stunningly large number of 20 possible TCRs (>10 20 -10 61 ) 13, 14 , including 10 6 -10 8 in the peripheral blood, it is inherently 21 inefficient and does not necessarily produce a TCR with appropriate immunogenicity for a 22 given tumor 15 . Alternate strategies of TCR identification have also fallen short; for example, 23 TCR affinity enhancement can lead to a loss of TCR specificity 16,17 and does not always 24 determine immunogenicity 18 .

26
Computational techniques such as steered molecular dynamics (SMD) and machine learning 27 may enable the creation of highly immunogenic, tumor-specific TCRs through rapid and 28 efficient screening of the vast number of possible TCRs. The success of these techniques 1 depends on accurate in vitro predictions of T cell immunogenicity, a goal that remains 2 elusive. Quantitative descriptors of the TCR-pMHC bond identified in previous studies do not 3 consistently correlate with immunogenicity [18][19][20][21] . The majority of these studies measured 4 equilibrium parameters of the TCR-pMHC bond, which do not account for the mechanical 5 forces on the TCR-pMHC bond present in vivo. Recent studies using DNA-based tension 6 probes have estimated this force at ~10-20 pN 22,23 , and subsequent studies demonstrate 7 that dissociation kinetics (i.e., bond lifetime) of the TCR-pMHC bond at this physiologic force 8 can predict immunogenicity [24][25][26][27][28][29][30][31] . These correlations are consistent across species, TCR-9 pMHC pairs, and experimental systems 24-31 .

11
Here, we seek to discern the atomic-level physiochemical features that determine the TCR-12 pMHC bond lifetime under force (i.e., characterize the TCR-pMHC's force-dependent 13 dissociation kinetics). As a first attempt to manipulate the bond lifetime of the TCR-pMHC 14 over a wide range, we characterized the force-dependent dissociation kinetics of a single 15 TCR (with a known crystal structure) to 17 possible pMHCs using steered molecular 16 dynamics (SMD). Then, we used machine learning to identify the physiochemical features 17 and the specific regions of the TCR regulating bond lifetime. Our results demonstrate that 18 the total number of hydrogen bonds (H-bonds) between the CDR2-MHC⍺(), CDR1-

19
Peptide, and CDR3-Peptide are critical features that determine bond lifetime. This finding 20 may inform the rational design of TCRs for TCR-T cell therapy, and provide a path forward to 21 create more advanced and predictive maching learning algorithms.  7.4. The resulting systems were solvated in rectangular water boxes using the TIP3P water 7 model 36 large enough to satisfy the minimum image convention. Na + and Clions were 8 added to neutralize protein charge and reach physiologic salt concentration of ~150 mM. All 9

Figure 1: Steered Molecular Dynamics (SMD) simulations and machine learning algorithms were used to identify the physiochemical features that predict TCR-pMHC bond lifetime. (A)
Starting structure for SMD of TCR and pMHC (shown at the top with black lines and circle arrowheads). The location/direction of pulling are depicted with yellow circles/arrows, respectively; the black scale bar with diamond arrowheads denotes the locality of distance between center of masses. The non-interacting bodies of the TCR and pMHC are colored in gray. Axis directions are indicated in left corner (red: +x-direction, blue: +y-direction, and green: +z-direction). (B) The primary interfacial substructures: (i) MHC⍺(⍺) & MHC⍺(β) = green, Epitope=black; and (ii) TCR CDR1⍺ = light blue, TCR CDR2⍺ = cyan, TCR CDR2⍺ = dark blue, TCR CDR1β = salmon, TCR CDR2β = light red, and TCR CDR3β = red. (C) A two-layer Elastic Net-Exhaustive Feature Selection algorithm (dashed boxes) was used to obtain ranked and reduced feature sets. (D) Selected features were used to tune hyperparameters (dashed box) for each machine learning model (Linear Regression = blue, Elastic Net = orange, k-Nearest Neighbors = green, Support Vector Machines = red, Decision Tree = purple, Random Forest = brown, AdaBoost = pink, Neural Net = gray). simulations were performed with Gromacs 2019.1 35 using the CHARM 22 plus CMAP force 1 field for proteins 37 and orthorhombic periodic boundary conditions. All simulations were in full 2 atomistic detail.      Engineering and the best model for each feature set was scored on absolute error and 28 ranked by the arithmetic average of repeated threefold cross-validation (i.e., n_splits=3, 1 n_repeats=3, random_state=1). Detailed documentation regarding the cross validation and 2 hyperparameter optimization of two-layer Elastic Net -Exhaustive Search feature selection 3 and machine learning predictions are provided in the supporting information. In addition, this 4 dataset has been made freely available on the GitHub repository. features of the entire TCR-pMHC interaction (e.g., total H-bonds between the TCR and 1 pMHC). This characterization provides an overall assessment of the physiochemical features 2 that might impact bond lifetime and is consistent with the quaternary structure of globular 3 proteins. We considered features likely to impact dissociation kinetics and thus included H-4 bonds 56 , LJ-contacts 57 , distance between the TCR and pMHC 58, 59 , solvent accessible 5 surface area (SASA) 60 , root mean square fluctuations (RMSF) 61 , and the gyration tensor of 6 the TCR and pMHC. This approach resulted in 18 features for the first set, and we dubbed 7 these quaternary features (Figure 1-supplement 2).

5
To examine how quaternary physiochemical features influence TCR-pMHC bond 6 dissociation kinetics, we ranked the top ten quaternary features after an Elastic Net grid 7 search for each individual feature ( Figure 3A). The scoring criterion was mean absolute 8

Figure 3: Quaternary Feature Selection and Bond Lifetime Predictions. (A)
Mean absolute test error from elastic net regularization was used to select the top ten quaternary features. Errors represent the best test set standard deviation from repeated threefold cross-validation. (B) According to an exhaustive search, the best feature sets (i.e., p = 1, 3, 5, and 7) to predict bond lifetime. (C) The mean accuracies of bond lifetime prediction for all feature sets in (B) and machine learning models after hyperparameter tuning (Linear Regression = blue, Elastic Net = orange, k-Nearest Neighbors = green, Support Vector Machines = red, Decision Tree = purple, Random Forest = brown, AdaBoost = pink, Neural Net = gray). Errors represent the best test set standard error from repeated threefold cross-validation. The machine learning model standard error from cross-validation (n=9) was statistically compared for increasing feature sets by a one-tailed student's t-test: #p<0.10, *p<0.05, **p<0.01. Spearman correlation coefficients (Figure 3-supplement 1, Figure 3-supplement 3).

6
We next explored whether a combination of quaternary physiochemical features would 7 improve predictions of bond lifetime. To accomplish this, we applied a regularized 8 regression method (Elastic Net; see Methods) as a filter to identify predictive feature sets.

9
To avoid overfitting 62-64 , feature sets were reduced utilizing an Elastic Net 53 -Exhaustive 10 Search 52 algorithm ( Figure 1C) to determine the best combinations of 3, 5, and 7 features.

11
Using these combinations, we then trained and tested 8 different machine learning 12 algorithms to estimate TCR-pMHC bond lifetime ( Figure 1D)   Mean absolute test error from elastic net regularization was used to select the top ten secondary features. Errors represent the best test set standard deviation from repeated threefold cross-validation. (B) According to an exhaustive search, the best feature sets (i.e., p = 1, 3, 5, and 7) to predict bond lifetime. (C) The mean accuracies of bond lifetime prediction for all feature sets in (B) and machine learning models after hyperparameter tuning (Linear Regression = blue, Elastic Net = orange, k-Nearest Neighbors = green, Support Vector Machines = red, Decision Tree = purple, Random Forest = brown, AdaBoost = pink, Neural Net = gray). Errors represent the best test set standard error from repeated threefold cross-validation. The machine learning model standard error from cross-validation (n=9) was statistically compared for increasing feature sets by a one-tailed student's t-test: #p<0.10, *p<0.05, **p<0.01.

25
T cell-based immunotherapies, such as TCR-engineered-T cells, provide exciting potential to 26 treat a wide range of cancers, including solid tumors. However, this potential has not been 27 reached, due, in part, to the inability to rapidly and efficiently explore the vast TCR space to 28 identify optimal tumor-specific TCRs. Experimental methods to design and test potential 1 TCRs are expensive and slow, thus hindering throughput. In contrast, computational 2 algorithms that utilize machine learning have enormous potential to rapidly interrogate the 3 TCR space and identify a small number of candidates for more efficient experimental testing.

4
We tested this premise using SMD to create a small database of TCR-pMHC bond lifetimes, 5 then created machine learning algorithms to predict bond lifetime based on quaternary and 6 secondary features of the TCR-pMHC bond. Using the quaternary features, we found that 7 total LJ-contacts could predict bond lifetime with 90% accuracy. More importantly, we also 8 found that we could predict bond lifetime with an accuracy of 84% using only the total H-  (Figure 4-supplement 2). Moreover, although physical features 18 (e.g., x-Gyration of TCR) were selected in the exhaustive feature selection process ( Figure   19 3B), these did not significantly increase mean accuracy. This demonstrates that no selected 20 physical features improve predictive performance and thus the atomic motion of the TCR or 21 pMHC is unlikely to regulate dissociation kinetics.

20
The inclusion of CDR1-Peptide H-bonds draws new attention to the CDR1 region. Similar 21 to the CDR3 region, CDR1 is in proximity to the peptide (Figure 1A, B) and thus 22 hydrogen bonding between these substructures may be expected. However, surprisingly,

26
Peptide, and CDR3-Peptide may enhance TCR-pMHC force-dependent bond lifetime. It is important to acknowledge that the interactions between these interfacial substructures may 1 be specific to the DMF5 TCR and will require further investigation to generalize.

2
Nonetheless, these results bring new attention to the CDR1 and CDR2 regions in the future 3 of TCR design. Finally, in contrast to previous reports 28, 57 , peptide radius of gyration and 4 CDR3⍺-CDR3 distance were not selected in the top ten predictive features. This is likely 5 due to the artificial pMHC unfolding by pulling from TCR-pMHC termini 27 and the lack of 6 diversity in TCR-pMHC pairs evaluated 28, 57 , respectively.

1
Conclusions. We have demonstrated the utility of combining two computational methods -2 steered molecular dynamics and machine learningto create a methodology that can 3 potentially be used to rapidly and efficiently examine the vast TCR space to predict the bond 4 lifetime, and thus the immunogenic response, of a given TCR-pMHC pair. Our initial results