Calculating Protein-Ligand Residence Times Through State Predictive Information Bottleneck based Enhanced Sampling

Understanding drug residence times in target proteins is key to improving drug efficacy and understanding target recognition in biochemistry. While drug residence time is just as important as binding affinity, atomic-level understanding of drug residence times through molecular dynamics (MD) simulations has been difficult primarily due to the extremely long timescales. Recent advances in rare event sampling have allowed us to reach these timescales, yet predicting protein-ligand residence times remains a significant challenge. Here we present a semi-automated protocol to calculate the ligand residence times across 12 orders of magnitudes of timescales. In our proposed framework, we integrate a deep learning-based method, the state predictive information bottleneck (SPIB), to learn an approximate reaction coordinate (RC) and use it to guide the enhanced sampling method metadynamics. We demonstrate the performance of our algorithm by applying it to six different protein-ligand complexes with available benchmark residence times, including the dissociation of the widely studied anti-cancer drug Imatinib (Gleevec) from both wild-type Abl kinase and drug-resistant mutants. We show how our protocol can recover quantitatively accurate residence times, potentially opening avenues for deeper insights into drug development possibilities and ligand recognition mechanisms.


T4 Lysozyme L99A-Benzene
MD simulation was initiated using the crystal structures of benzene bound to the T4 Lysozyme L99A (PDB ID: 1L83).Initially, before setting up the simulation box, we removed all existing water molecules from the crystal structure.Subsequent setup steps were prepared using the CHARMM-GUI [1] solution builder module.This includes the addition of sodium (Na + ) and chloride (Cl − ) ions to achieve an ionic concentration of 100 mM, along with the addition of TIP3P water molecules [2] for solvation.During the solvation, a total of 27 Na + ions, 35 Cl − ions and 13k TIP3P water molecules were included in the simulation box.By solvating the structures in a TIP3P water box, one edge distance was extended to ∼ 77 Å.The simulation box ultimately contained approximately 46K atoms, including all components of T4 Lysozyme L99A, benzene, waters, and ions.For long-range electrostatic interactions, we used the PME with a cutoff distance of 1.2 nm.The remainder of the simulation preparation followed the detailed protocols outlined in Section II D of the main text.

Abl kinase domain-imatinib
MD simulation was initiated using the crystal structure of imatinib bound to Abl kinase domain (PDB ID: 1OPJ).Mutant structures N368S and L364I were generated using the side-chain mutation module from MOE software.[5] Subsequently, the system was solvated using CHARMM-GUI [1] solution builder module.This involved adding sodium (Na + ) and chloride (Cl − ) ions to achieve an ionic concentration of 100 mM, and adding TIP3P water molecules [2].During the solvation, By solvating the structures in a TIP3P water box, one edge distance was extended to ∼ 85 Å.For long-range electrostatic interactions, the PME was used with a cutoff distance of 1.2 nm.The simulation box ultimately contained approximately 60 K atoms, including all components of Abl kinase protein, Imatinib, waters, and ions.The remainder of the simulation preparation followed the detailed protocols outlined in Section II D of the main text.These steps are followed from Ref. [4] supplementary information (SI).

SPIB accelerated time
In our SPIB-iMetaD protocol, we observed the reduction of the accelerated time at the completion of each round of simulation.This acceleration time refers to the Eq.1 shown in the main text.For direct comparison of dissociation rates, the reduction in acceleration time corresponding to each simulation round is shown in Fig. S1 and Fig. S2.
Figure S1 presents the SPIB-metadynamics iterations across three systems: (a) The FKBP-DMSO system underwent a total of 3 rounds of SPIB-metadynamics, achieving minimal accelerated time in round 2. Thereby, infrequent metadynamics (iMetaD) was performed upon completion of SPIB round 2. (b) The FKBP-DSS system completed a total of 2 rounds of SPIB-metadynamics, with the lowest accelerated time occurring in round 1.Thus, subsequent iMetaD was performed at SPIB round 1. (c) The T4 Lysozyme L99A-benzene underwent a total of 4 rounds of SPIB-metadynamics.Although the accelerated time slightly decreased in round 4 compared to round 3 of SPIB-metadynamics, the larger error bars at round 4 led to the selection of round three as the most preferred choice of RC for subsequent iMetaD simulations.
Figure S2 illustrates the SPIB-metadynamics iterations for Abl kinase domain with wild-type (WT) and two mutations bound with Imatinib: (a) The Abl kinase domain WT-Imatinib system underwent a total of 4 rounds of SPIB-metadynamics, reaching minimal accelerated time in round 3. Thereby, iMetaD was performed upon completion of SPIB round 3. (b) The Abl kinase domain with N368S mutation -Imatinib system completed a total of 3 rounds of SPIB-metadynamics, achieving the lowest accelerated time occurring in round 2. Thus, subsequent iMetaD was performed at SPIB round 2. (c) The Abl kinase domain with L364I mutation underwent a total of 3 rounds of SPIB-metadynamics.Despite a minor reduction in accelerated time in round 4 compared to round 3 of SPIB-metadynamics, the larger error bars at round 4 results made round 3 the preferred choice of RC for further iMetaD simulations.

p-value analysis
The Kolmogorov-Smirnov (KS) test is a crucial tool for evaluating whether sample data significantly differ from a theoretical distribution, particularly the exponential CDF.This test calculates the maximum distance between the empirical distribution of the sample and the exponential CDF.In particular, a low p-value (< 0.05) indicates significant differences between the empirical and exponential distributions such that we reject the null hypothesis.This analysis, including curve-fit results and p-values, is presented in the following section.

FKBP-ligands and T4 Lysozyme L99A-benzene
In the final stage of the protocol, 12 independent rounds of iMetaD were simulated for both the FKBP-DMSO and FKBP-DSS.The results of the curve fitting analysis and the KS test for these simulations are shown in Fig. S3 (a) and (b).Similarly, for the T4 Lysozyme L99A-benzene, the analysis involved conducting 10 independent rounds of iMetaD during the final stage.Subsequently, curve fitting was performed, and the results of the KS test are shown in Fig. S3 (c).

Abl Kinase Domain-Imatinib dissociation pathway clustering and KS-test
The pathway clustering technique for analyzing dissociation pathways has been applied for the Abl kinase domain -imatinib system.As described in Ref. [4], the association of each pathway with varying dissociation time is important, potentially explaining the observed kinetic resistance in patients treated with Imatinib.Thus, we applied the pathway clustering approach exclusively for this system.While a more advanced approach has been proposed in other paper Ref. [3], we used simple metrics for pathway clustering.Specifically, we took the last 5 ns of the trajectory frame for the dissociation path analysis and distance calculations.
We analyze two different paths, namely the Cα-helix and the front door pathways, the latter comprising both the P-loop and the hinge region.To determine the dissociation pathways of individual trajectories, we quantified three distinct distances: between the COM of Imatinib and the COMs of the hinge, P-loop, and Cα-helix regions.Following these measurements, we choose the shortest of the three distances as the dissociation pathway for each trajectory.The process of distance measurement and shortest distance selection was consistently applied across all trajectories.Subsequently, the trajectories were clustered based on these minimum distances using these defined proximity criteria.
Based on this clustering approach, we identified two primary dissociation pathways for all Abl-kinase on its WT and the mutants, depicted in Fig. S4.Particularly, Fig. S4   Based on the clustering analysis, we selected pathways that passed the p-value test.We did not choose the results with p-values less than 0.05, which was deemed less convincing.Consequently, we chose the front door to be the main pathway for both the wild-type and N368S mutant kinase and the Cα-helix pathway  for the L364I mutant.These results can be found in the main text, and curve fit and p-value results are presented in Fig. S5.

Figure S3 :
FigureS3: FKBP-ligands and T4 Lysozyme L99A-benzene KS test and fitting results.(a) FKBP-DMSO ligand system curve fit results.The p-value was evaluated as 0.36.(b) FKBP-DSS ligand system curve fit results.The p-value was evaluated as 0.07.(c) T4 Lysozyme L99A-benzene system curve fit results.The p-value was evaluated as 0.38.
(a)  shows the representative 'front door' pathways where imatinib dissociates near the P-loop and A-loop, highlighted in blue and magenta, respectively.Fig.S4 (b)shows the representative 'Cα-helix' pathways where imatinib dissociates closer to Cα-helix colored in green.This visual representation is used to help our understanding of the clustering mechanism.The summary of all these clustering results is presented in Table.S1.

Figure S5 :
Figure S5: Abl-kinase with Imatinib: (a) Abl-kinase wild-type (WT) system curve fit result.The p-value was evaluated as 0.07.(b) Abl-kinase mutant N368S system curve fit result.The p-value was evaluated as 0.19.(b) Abl-kinase mutant L364I system curve fit result.The p-value was evaluated as 0.06.

Table S1 :
Summary table for all Abl-kinase-imatinib experimental results (expected residence time) and simulation results (SPIB-iMetaD residence time) from our own semi-automated protocol approach depending on each dissociation pathway, including p-values.