Bayesian Optimization for Ternary Complex Prediction (BOTCP)

Arjun Rao; Tin M. Tunjic; Michael Brunsteiner; Michael Müller; Hosein Fooladi; Noah Weber

doi:10.1101/2022.06.03.494737

Abstract

Proximity-inducing compounds (PICs) are an emergent drug technology through which a protein of interest (POI), often a drug target, is brought into the vicinity of a second protein which modifies the POI’s function, abundance or localisation, giving rise to a therapeutic effect. One of the best-known examples for such compounds are heterobifunctional molecules known as proteolysis targeting chimeras (PROTACs). PROTACs reduce the abundance of the target protein by establishing proximity to an E3 ligase which targets the protein towards degradation via the ubiquitin-proteasomal pathway. Design of PROTACs in silico requires the computational prediction of the ternary complex consisting of POI, PROTAC molecule, and the E3 ligase.

Here, we present a novel machine learning-based method for predicting PROTAC-mediated ternary complex structures using Bayesian optimization. We show how a fitness score combining an estimation of protein-protein interactions with PROTAC binding energy calculations enables the sample-efficient exploration of candidate structures. Furthermore, our method presents two novel scores for filtering and reranking which take PROTAC stability (Autodock-Vina based PROTAC stability score) and protein interaction restraints (the TCP-AIR score) into account. We evaluate our method using DockQ scores and demonstrate, that even with a clustering that require members to have a high similarity, i.e. with smaller clusters, we can assign high ranks to those clusters that contain poses close to the experimentally determined native structure of the ternary complexes. We also demonstrate the resultant improved yeild of near-native poses in these clusters.

1 Introduction

Targeted protein degradation is an emerging therapeutic modality which, instead of inhibiting the activity of a drug target, acts by inducing degradation of the target protein itself [1, 2]. It employs so-called monofunctional degraders such as molecular glues or heterobifunctional degraders such as Proteolysis Targeting Chimeras (PROTACs) which act as proximity-inducing compounds. PICs bring together the protein of interest (POI) and an E3 ubiquitin ligase to induce subsequent ubiquitination and degradation of the target protein (See Figure 1) [3, 4, 5, 6].

Figure 1.

Mode of action for targeted protein degradation: (A) A POI is shown in a proximity with an E3 ligase complex – the receptor part of E3 ligase is highlighted in magenta – a heterobifunctional degrader such as PROTAC brings together the POI and E3 ligase, which initiate the degradation process. (B) After formation of a ternary complex, the POI is marked with a polyubiquitin chain (blue), which subsequently leads to protein degradation through UPS. After degradation, the ubiquitin molecules are detached and recycled by the UPS. Moreover, the PROTAC molecule is detached and can induce degradation of multiple POIs (C) Ternary complex formed from the interaction of the POI, receptor part of the E3 ligase and heterobifunctional degrader.

Targeted protein degradation using PROTACs offers several advantages over traditional occupancy-based inhibitors such as [7]:

Expanding the druggable space: there is no need for tight binding to the functional site of the POI, and as a result, PROTACs can target proteins more shallow binding pockets (e.g., transcription factors and scaffolding proteins).
Working at a substoichiometric concentration: A single PROTAC molecule can induce degradation of multiple POI molecules over time.
Improved efficacy: Recovery of target protein abundance (and therefore overall activity) is linked to protein synthesis in cells and not just to the absence of the PROTAC.
Improved selectivity: PROTACs are typically larger than traditional inhibitors and can utilize more binding sites on the target which results in higher selectivity.

Structurally, a PROTAC consists of a warhead that binds to the POI, an E3 ligand that binds into the E3 ligase, and a linker that connects these two parts. Current PROTACs utilize a limited number of E3 ligases. The most prominent E3 ligases are cereblon (CRBN) and Von-Hippel-Lindau Tumor Suppressor (VHL) due to the availability of high-affinity binders and favourable properties for both proteins. For CRBN binders, immunomodulatory drugs such as thalidomide [8], lenalidomide [9], and pomalidomide [10] are widely used. For VHL, binders were initially derived from a natural degron peptide and evolved to highly potent small molecules [11, 12]. In addition, some other E3 ligases such as cell inhibitors of apoptosis protein (cIAP) [13], murine double minute 2 (MDM2) [14], and KEAP1 [15] have been harnessed for targeted protein degradation. There is an ongoing effort on expanding the library of E3 ligases and finding suitable potent ligands against other E3 ligases.

While the set of E3 ligases used for targeted protein degradation with PROTACs has so far been limited, they have been used to target a diverse range of POIs from different protein families. There are prominent examples with good evidence of safety and efficacy in initial clinical trials, including degraders targeting the androgen receptor (AR) [16, 17], estrogen receptor (ER) [18], IRAK4, BCL-XL [19], Helios (IKZF2) [20], and GSPT1. Furthermore, in vivo studies for PROTAC targeting BRD4 [21], BTK [22, 23], RIPK2 [24], and SMARCA2 [25] have shown promising results. These findings highlight the potential of PROTAC for the degradation of diverse protein families.

Judging the efficacy of a candidate PROTAC molecule in-silico is a daunting task owing to a plethora of interactions that are difficult to evaluate. For a PROTAC to be effictive, it must at-least satisfy the following requirements. Firstly, it must be able to penetrate the cell membrane and stay stable inside the cell. Secondly, its interactions with the E3 ligase and the POI must result in the formation of a stable ternary complex where the POI is held near the E3 ligase. Finally, this complex must be one that lends itself to being tagged by ubiquitin so that the POI may be degraded.

In light of the above steps, it is clear that accurate and efficient prediction of the structure of the ternary complex is imperative for improvements at many stages of the PROTAC development process. For instance, computational modeling of ternary complexes allows the screening of multiple different E3 ligases for a candidate target, maximising the stability of the ternary complex [26]. Knowing the structure of the ternary complex is the first step to calculating its stability, and experimental evidence shows that the formation of a stable ternary complex can lead to more efficient degradation [26].

However ternary complex prediction is in itself a challenging task as one needs to take the interactions between several components into account. For example, the interaction between the warhead and the E3-binding fragments and their respective proteins, the protein-protein interaction between the E3-ligase and the target protein, as well as the interactions and physical constraints imposed by the linker. Furthermore it requires the ability to score the validity of any single physical configuration of the two proteins and the degrader, and to optimize this score to ultimately predict the structure of the ternary complex efficiently. Even if we know the fragments that bind to the E3 ligase and the POI, what constitutes an optimal linker is not yet fully understood [27]. The linker length and composition can have a significant impact on cooperativity and stability of the ternary complex [28, 22], and consequently on the efficiency of the degradation [29].

Accurate and efficient ternary complex prediction is imperative for improvements at many stages of the PRO- TAC development process. For instance, computational modeling of ternary complexes allows the screening of multiple different E3 ligases for a candidate target, maximising the stability of the ternary complex [26]. It can further facilitate linker design by investigating the impact of the linker on the ternary complex. These steps can incorporate different warheads, finding an ideal combination with available E3 ligase and binders and a linker.

In one of the first attempts for computationally predicting ternary complexes, Drummond et al. [30] proposed four different methods for the prediction ensemble of PROTAC mediated ternary complexes. With the most promising method of combining docking of the two protein-ligand structures with separate PROTAC conformer generation, they showed that it is possible to recover near-native ternary complexes for a few available PDBs. In follow-up work [31], Drummond et al. added two additional methods which improve the previous performance. This work introduces clustering of the generated structures, an essential step in obtaining good results. Zaidman et al. [32] proposed another method, PRosettaC, which adds consecutive steps such as global protein-protein docking and local docking refinement, generating conformation of PROTACs that can fit in the docking solutions and final clustering. Although they have been able to reproduce a good amount of crystal structures of ternary complexes, they are initiating their prediction from bound structures, which does not reflect real-world conditions. Bound structures come from the decomposition of already available ternary complexes, but a priori we do not have access to this information. In one of the recent works, Weng et al. [33] proposed a method again based on global protein-protein docking, local refinement with RosettaDock [34], generating PROTACs using RDkit, and applying filtering and re-ranking criteria for predicting near-native ternary complexes. They started from unbound structures and showed that good results could be achieved with these methods.

The first contribution of our current work, is the demonstration of the successful use of a Bayesian optimization (BO) [35] framework to perform a sample-efficient exploration of the space of ternary complex candidates. BO is an established method which allows optimizing functions which are costly to evaluate by proposing promising evaluation candidates (here: candidate ternary complex structures). This leads to sample efficiency, i.e., fewer candidates need to be evaluated (e.g. by a docking software) compared to (e.g. Monte Carlo-) search methods, and allows the use of potentially complex fitness functions which are needed to characterize the multiple interactions within a ternary complex. We use this to sample poses scored by a weighted sum of a protein-protein interaction (PPI) score and a PROTAC score to model the feasibility of a candidate ternary complex structure. The PPI score quantifies the interactions between the proteins in a particular candidate, while the PROTAC score takes into account the feasibility of embedding a conformation of the PROTAC molecule given the position of the binding pockets in this configuration of proteins.

The second contribution of our work lies in the design of two novel scores specialized for filtering, ranking and clustering. The first score is used for filtering favourable complexes - a version of the ambiguous interaction restraints energy (here abbreviated as “TCP-AIR Energy”) [36] that has been modified specifically for ternary complexes and prioritizes poses with ideal protein configurations. The second score is the PROTAC stability score, which is calculated using a modified version of Autodock-Vina. It packs the PROTAC and indicates its stability given a particular configuration of proteins. This score is useful for ranking.

We show in our current work (See table 2) that, with the efficient sampling with the BO loop, combined with filtering, clustering, and ranking using the above scores, we are able to effectively find highly ranked clusters with near-native poses. Specifically, compared to previous works, our clustering is more confined and each cluster contains only a small number of points. Our ability to effectively rank these, gives us clusters which have a high yield of near-native scores.

View this table:

Table 1:

Details of the protein structures used in this work. Columns show the PDB ID for each ternary complex as well as the chain identifiers (complex), the gene name, the PDB ID of the structure used as input for prediction (template), the chain IDs of E3 ligase and POI, the first and last residue number included into the used models (N-term, C-term). For the PROTAC, the colums show the residue ID, the nature of any ionizable group, and the net charge used in molecular models.

View this table:

Table 2:

Results of the BOTCP method for ternary complex prediction on unbound structures before the refinement. ¹The best rank containing at least one model with DockQ≥0.23. ²The total number of clusters. ³The percentage of the near-native poses in that specific ranked cluster. For each ternary complex, we selected a cluster which includes an RRT with a high DockQ score. Columns also show the PDB ID and characterization of the best RRT in terms of f_nat, I_RMSD, L_RMSD and DockQ score (see text).

2 Methods

In this paper, we propose the Bayesian Optimization for Ternary Complex Prediction (BOTCP) method. At its core lies a Bayesian optimization (BO) loop capable of finding appropriate ensembles of ternary complexes by optimizing an objective function which includes a fitness describing the strength of the interaction of the two proteins (“PPI fitness”) and a fitness modeling the possibility of connecting the binding pockets with the PROTAC (“PROTAC fitness”). We then perform a local optimization around the most promising candidates using a score which takes into account the stability of the PROTAC between the two proteins (“PROTAC stability score”, calculated by our own extension to Autodock-Vina). Afterwards, we cluster the resulting candidates before performing filtering using an extension to the ambiguous interaction restraint (AIR) energy [36], and re-ranking the final selections using the PROTAC stability score. In a nutshell, BOTCP consists of the following steps/stages:

Input initialization (Section 2.1)
Bayesian optimization (Section 2.2)
Local optimization with simulated annealing of the PROTAC stability score (Section 2.3)
Clustering, filtering using TCP-AIR energy, and re-ranking (Section 2.4)
Refinement (Section 2.5)

2.1 Initialization

At the time of writing, the protein database (PDB [37]) contained fifteen experimentally (x-ray) determined ternary complex structures which include a PROTAC molecule, a protein of interest, and an E3 ligase. We discarded one of these (7PI4) since here the POI is a kinase with a large flexible loop that is close to the proteinprotein interface and the PROTAC molecule but not resolved in the experimental structure. The remaining 14 structures are listed in Table 2.1. Each structure was downloaded from the PDB in its native format and preprocessed as follows. All (non-PROTAC) hetero atoms were removed, and the chains corresponding to the POI and the E3 ligase as well as the PROTAC molecule were saved in a single PDB file for comparison with predictions. In several cases, the asymmetric unit contained two or more ternary complexes, which were saved individually. The protein structures used as input for modeling were taken from other PDB entries (unbound structures) so as to emulate the practical use case where the experimental structure of the ternary complex is not available. The structures used here include three different E3 ligases and seven POIs. For each of these ten proteins, a PDB entry was selected according to the following criteria:

The protein has a bound ligand that is similar to the warhead or binder of the corresponding PROTAC molecule.
The structure comes with few and/or short gaps/unresolved loops and the resolution (R-factor) is good.

For one structure (7KHH), no PDB entry fulfilling the first criterion was available. Here we estimated the structure of the bound warhead by docking the warhead into the binding pocket, using the GNINA docking tool [38] with default parameters. The quality of the resulting complex structure was ascertained by visual inspection and comparison to the corresponding portion of the ternary complex structure. The resulting warhead and binder molecules (only non-hydrogen atoms) were saved in SDF format for usage as inputs to the BO method. Each of the ten individual single structures of E3 ligase or POI were then preprocessed using the following steps:

All heteroatoms (including water and small molecules) were removed while keeping ions (in four cases, the structure contained a Zinc ion complexed by four cysteine or histidine residues).
For those structures that had gaps, we downloaded the corresponding AlphaFold structures [39, 40], aligned the experimental and the AlphaFold structures using Chimera [41], and used the coordinates of the AlphaFold structures to fill the gaps.
We cropped flexible N- and C-terminal tails (usually with poor confidence levels in AlphaFold predictions) as specified in 2.1.
Missing atoms were replaced using the PDBfixer tool [42].
We used Reduce[43] to protonate/add hydrogens, and, where appropriate, flip asparagine, glutamine, or histidine side chains.

The PROTAC molecules were extracted from the original PDB file and converted to SDF format using Open Babel [44]. If required, bond orders and protonation were corrected manually. Here we assumed that all weak bases (secondary and tertiary amines) and acids (carboxylic acid) are (de)protonated to form anions or cations. In a few cases, the presence of piperazines required a choice to protonate one of the two nitrogens. This choice was made based on chemical intuition and the corresponding expected pKA variations. All SDF files are provided in the SI. The protein PDB and PROTAC SDF files resulting from this procedure were used as input for the BO loop.

Our method uses three inputs: the POI with the docked warhead, the E3 ligase with the docked E3 binder, and the PROTAC SMILES (see Figure 2 for an example), which are provided to the Bayesian optimization loop. It is more suitable to access the co-crystal structure of POI+warhead and E3 ligase+binder, but if this information is not available, it is required first to dock the corresponding ligands into their proteins and use these structures after molecular docking for the next step in out BOTCP pipeline.

Figure 2:

Example of the required inputs for the ternary complex 6HAX (PDB ID). (A) PROTAC molecule. (B) and (D) E3 ligase docked to warhead. (C) and (E) POI docked to the warhead.

It is possible to apply our method to the structures of proteins and docked ligands extracted from the co-crystal structure of known PROTAC ternary complexes (so-called bound protein data). However, as we do not have access to these bound structures in most real-world situations, we focus on using unbound data (individual protein structures). Note that this makes the problem more challenging as the 3D structures in the bound and unbound cases differ.

2.2 Bayesian optimization

Our goal is to optimize a black box function y(x) describing the quality of a ternary complex pose (defined by the vector x), where evaluations are expensive and result in noisy observations. In the present work, the noisy observations result from an objective function that combines information concerning the favorability of protein-protein interaction with information on the feasibility of finding a stable PROTAC conformer for a particular ternary complex x. This is a classical use case for Bayesian optimization [45, 46]. Formally, our goal is to find the optimal pose x* which maximizes y(x), i.e., while searching within some conformation sample space χ. This is accomplished by using the noisy observations obtained by evaluating the objective to update a surrogate model (here, a Gaussian process), which is used to sample promising points for evaluation using an acquisition strategy. These steps are repeated in a loop (Figure 3) until some stopping criterion is met. The loop is described in algorithm 1. In the following sections, we describe the elements of this loop (search space, objective evaluation, surrogate model, acquisition strategy) in detail.

Figure 3:

Overview of Bayesian optimization pipeline. We are sampling from the roto-translational space of POI. Then we assign a score for each sample D = {(x_i,y_i)}. Then, the dataset D is used for updating the GP into posterior GP. For sampling the next batch of points, the acquisition function is used. In our proposed method, we used the UCB acquisition function to determine the next sampling points. This iterative process continues until the end of some Bayesian optimization steps. The number of iterations is a hyperparameter that controls when do we get the list of final valid complexes.

2.2.1 Search space

Using the Bayesian optimization loop, we optimize the ternary complex pose. Each such pose is described by a vector x. As we are concerned with the relative orientation of the two proteins, we arbitrarily fix the E3 ligase at the origin and only modify the position of the POI while attempting to find the pose with the highest fitness. The vector x thus describes the relative rotation and translation (RRT) of the POI with respect to the E3 ligase, and comes from the 7D roto-translational space χ (3 dimensions for translation, 4 dimensions for rotation as a quaternion). In the remainder of this paper, we refer to such protein-protein configurations as RRTs (see Figure 4A for an illustration). Note that the alternative of optimizing using an all-atom representations for the POI would also be possible (e.g., encoding the location of each residue within x to allow the optimization algorithm to also alter the protein structure), but the resulting search space is less suitable for practical Bayesian optimization due to its high dimensionality.

Figure 4:

Illustration of the search space. The E3 ligase (yellow) is fixed in space. The Bayesian optimization optimizes the ternary complex by sampling different poses (A-D), each of which describes the rotation and relative translation (RRT) of the POI.

2.2.2 Objective evaluation

Our model uses Bayesian optimization to propose promising candidates x ∈ χ for evaluation. This requires training a surrogate model using the currently available data, i.e., a dataset D = {(x_i,y_i) | i = 1, 2, &,N} consisting of RRTs x_i with their assigned fitness values y_i. We used a combined with two components: where λ determines the relative weighting of the PPI fitness y_PPI vs. the PROTAC fitness y_PROTAC (we used λ = 1). The PPI fitness represents the strength of interaction between the two proteins. While strong PPIs are important for a stable ternary complex, there are cases in which the proteins are oriented such that the respective binding pockets for the E3-binder and the warhead are too far apart for the length of the linker. On the other hand, the proteins’ relative positions might result in binder and warhead being too close, leading to steric clashes within the linker itself. The PROTAC fitness y_PROTAC(x) ensures these cases have a low overall fitness values.

Algorithm 1

The BO loop

Algorithm 2

Acquiring new points for evaluation

Algorithm 3

Acquisition optimization algorithms

Algorithm 4

Evaluation of objective

PPI fitness

A number of protein-protein docking scoring schemes have been proposed in the literature. In this work, we use the distance-scale finite ideal-gas reference (DFIRE) energy score [47] as y_PPI. This score is a relatively unbiased estimate of the validity of the protein-protein interaction while allowing rapid evaluation. The PPI fitness serves to filter out infeasible protein-protein orientations.

PROTAC fitness

The PROTAC fitness y_PROTAC is calculated based on a linker conformer, which we generate using RDKit. To do this, we constrain the two binding fragments are at their respective target positions (specified by the protein orientations as described by x) via a restoring force field. The linker atoms are free from any such constraint, and we sample linker conformations by minimizing the UFF [48] starting with initial conformers generated using distance geometry. After generating 4 candidate conformers, we choose the output to be the conformer with the lowest RMSD between the fragments’ positions and their target positions. The PROTAC fitness is defined as the negative square-root of the UFF energy of this conformer.

The generation of this conformer and hence fitness is computationally expensive. We found that we can effectively train a neural network model to effectively predict the PROTAC fitness. For a particular POI, E3 ligase, and PROTAC, we train a fitness model which maps the 7D RRT vector x to the PROTAC fitness values generated using RDKit (as described above). The model consisted of two components: a simple model which learns to predict asymptotic component of the energy (based on the distance between the translation component of the RRT and a learned center vector), and a 3-hidden layer neural network which learns to predict the residual component. Prior to the BO loop, we pre-train this model using a dataset of RDKit-generated conformers and use it to map the x values proposed by the Bayesian optimization loop. The advantage is that after learning the model, for computing the fitness for new rotation and translation, just one forward pass is required, making the fitness computation very fast.

It is important to note that the above PROTAC fitness values only use the protein geometry in order to define the fragment positions. For the linker atom positions, the proteins are not taken into account, thus, it cannot be used to filter out cases with steric clashes between PROTAC and proteins. These cases are taken into account by the PROTAC stability score (see below).

Local fitness ascent

When evaluating the objective, we additionally perform a local ascent in the space of RRTs using a black box optimization algorithm (Nelder-mead simplex ascent [49]) and take the final optimized objective as the objective corresponding to a particular orientation x. This gives a more robust estimation of the objective as in some cases, small changes of x can have a somewhat large influence on the fitness. This approach is described in algorithm 4.

2.2.3 Surrogate model

We aim to optimize the black box function y (x) using the evaluations y_i at some points x_i but without having access to an analytical form of this function. To optimize y, we need to approximate/extrapolate our belief about y at points that have not been evaluated in order to find regions of interest within the search space χ. This is done using a surrogate model f (x), which describes our belief/approximation of the actual black box function y(x) based on our observed points so far. The surrogate model is trained so that it matches the true function y at the evaluated points x_i, but it can also be evaluated at other points at a low computational cost. The important characteristic of the surrogate model is the ability to output uncertainty of the prediction in the form of posterior over function values f (x) at points x.

While in principle Bayesian optimization can utilize different surrogate models, we employed the widely used Gaussian process (GP) [50, 51]. A GP is defined by a prior mean function m(x) and a covariance kernel k(x, x’) which describes the covariance between any two points x and x’, i.e.,

We use a constant learned value m(x) ≡ m₀ as prior mean. As covariance kernel, we use the a well-known Matern kernel [51] (see Figure 5), i.e., where d is the distance between x and x’, v is a smoothness parameter (taking on values , or , with larger values increasing smoothness), and K_ν is a modified Bessel function. As distance d, we use the RMSD between the atomic coordinates of the protein of interest (scaled by a learned parameter parameter Θ), which (between two rigid transformations of an object) can be calculated very efficiently, making this kernel computationally inexpensive.

Figure 5:

Illustration of the Matern kernel, used as covariance kernel for the Gaussian process. (A) RMSD between different POI positions relative to the E3 ligase (i.e., different RRTs x). (B) Matern kernel (see Equation 4) as a function of the RMSD.

2.2.4 Acquisition strategy

Bayesian optimization requires a strategy by which we select new candidate RRTs for which we evaluate the objective. Intuitively, we want to perform exploration by sampling points for which we have no information about their potential value, while also performing exploitation by sampling points which we assume will have a high objective value. At the same time, we should avoid sampling points where we already possess enough information to predict that the objective will be low. Formally, we define an acquisition function α, and evaluate the objective on values x which maximize α. There are different acquisition strategies allowing to balance exploration and exploitation [35]. We found that the popular upper confidence bound (UCB) acquisition function [52, 35] is effective in our scenario. It is defined as where μ(x) and σ(x) are the predicted mean and standard deviation of f (x) (which can be calculated from the existing data points using the mean function and covariance kernel). The parameter β > 0 defines the balance of exploration versus exploitation (a large β leads to large α at points where the standard deviation is high, thus increasing the exploration, while a small β leads to large α at points with a high predicted mean, thus increasing exploitation).

Our full acquisition strategy is described in algorithms 2 and 3. In each iteration of the BO loop, we sample two batches of new points, one for exploitation and for exploration. We do this by selecting points with a high value of the above acquisition function, with low and high β values respectively, and performing local optimization using gradient ascent (using ADAM [53]). We use the open-source BoTorch package [54] to solve this optimization problem and implement the BO loop.

2.3 Local optimization with simulated annealing of the PROTAC stability score

From the above points, we filter out RRTs with infeasible PROTAC scores or PPI scores. We next locally optimized the remaining RRTs using a combination of the PPI score and a PROTAC stability score.

PROTAC stability score

We calculated a PROTAC stability score using an extension of AutoDock-Vina [55]. This score takes into account the stability of the PROTAC within the context of the protein configuration corresponding to a particular RRT (which is ignored in the calculation of the PROTAC fitness, see above). We extended Autodock-Vina by adding an additional restoring force that allows constraining some atoms of the ligand while performing docking. We used this force to keep the E3 binder and the warhead at their respective coordinates given the current protein-protein RRT, ensuring that the binding fragments do not leave their binding pockets. We calculated ad PROTAC stability score for each sampled RRT by performing the following steps:

We optimized the PROTAC molecule’s conformation under only the restoring force and the intra-molecular part of the Vina force field, creating a viable conformer with the binding fragments held at their positions.
We then turned off the restoring force and perform an optimization using the full (intra- and inter- molecular) Vina force field.

For each RRT, Vina creates 10 (random) PROTAC conformers and performs this optimization (using the sum of the energies of the restoring force and the Vina force fields as fitness). Correspondingly, for each such PROTAC conformer i we have the following two energies. First, the Vina energy , which is the combination of the intermolecular and internal energy of the conformer. Second, the restoring energy, which measures the deviation of the E3 binder and warhead from their constrained positions in the particular RRT. To generate a score corresponding to the stability of the PROTAC in a particular protein-protein RRT, we look at the following cases:

is low and is low: the conformation is stable and doesn’t strain the warhead and E3 ligase.
is low and is high: the conformation is stable, but the warhead and E3 ligase need to be displaced to achieve this conformation.
is high: the conformation is unstable/infeasible.

We thus define the following weighted average of the Vina energy as the PROTAC stability score: where η^v and η^R are scaling parameters. This score rewards stable conformers while being forgiving to outliers.

Simulated annealing

We used a combination of this score and the PPI score to perform local optimization of the remaining candidates using simulated annealing [56]. For any RRT, simulated annealing proceeds via a sequence of steps, where in each step we randomly sample an RRT near the current RRT. The samples generated here are optimized to have a reasonable PPI and PROTAC score. We evaluate the above mentioned weighted sum of the PROTAC stability and PPI scores and stochastically accept and reject it based on that.

As this process allows the PROTAC conformation to change (both explicitly through simulated annealing and implicitly within Autodock-Vina), it resembles a refinement of the PROTAC conformer.

Finally, we ensure that each sampled point has a reasonable PPI and PROTAC score.

2.4 Clustering and filtering using TCP-AIR energy

After generating this refined set of candidate RRTs, we perform a clustering step. These clusters are first filtered using the TCP-AIR energy, then, we re-rank them to obtain the final output set of RRTs.

Clustering

We cluster the set of candidates using a combination of the PPI score and the PROTAC stability score. Clusters were greedily created by selecting the RRT with the highest fitness, choosing that as the center of the new cluster, and assigning all sampled RRTs with an RMSD (vs. the center of the cluster) of below 5 Å to this cluster. This step was repeated (starting again with the RRT with the highest fitness which was not yet part of any cluster) until all points were assigned to some cluster. The motivation for this simple clustering strategy is that it guarantees a limit on the maximum RMSD between the poses assigned to a single cluster. It is to be noted here that clusters with a single element are also considered.

Filtering using TCP-AIR energy

We use a filtering technique based on the ambiguous interaction restraints (AIRs) energy [57] that has been used extensively in [36] for energy calculation. In order to calculate the TCP-AIR energy terms, for both proteins, active and passive residues need to be defined. Active and passive residues are chosen based on two criteria: first, they have some atoms on the surface of the protein, and second, these atoms lie within some distance of the atoms of the fragments binding to this particular protein. We used distances of 5 Å and 7.5 Å for active and passive distances, respectively. For instance, residues within 5 Å of the warhead were defined as active residues for POI. For these residues, we calculate the effective distances (as done in previous work [36]). These are used to calculate the TCP-AIR energy (using the NOE energy formula, [57]) using three threshold values L, U, and S. L and U are the lower and upper bounds on the effective distance, while S is the value from where the computation of the energy changes from quadratic to linear. These parameters need to be specified for each ternary complex separately since each structure has different distances between two active sites of the proteins (due to the space occupied by the PROTAC molecule). We set the values, we calculated the average distances of each active residue of one protein with the active and passive residues of the other protein, resulting in two distances for each candidate. We considered all of our samples that have PROTAC stability score > 5. For each of them we took an average of the two distances of the proteins and define L as the 25th quantile. We then defined U = L + 3 Å and S = U + 2 Å. When filtering, we preserved all clusters that had at least one candidate with an TCP-AIR energy large than the 90th quantile of all AIR energies (see Figure 6).

Figure 6:

Usefulness of TCP-AIR energy and the PROTAC stability score for filtering. A) Correlation between the TCP-AIR energy and the DockQ score for different example ternary complexes. As high DockQ scores are only achieved when the TCP-AIR energy is 0 or close to 0, the later can be used as a filtering criterion. Text insets show Pearson (r) and Spearman (ρ) correlation values. B) Same as A, but for the PROTAC stability score.

Re-rank clustering

After filtering, we re-rank the remaining clusters using the PROTAC stability score (see above, Section 2.3). Clusters are ranked according to the largest PROTAC stability score of their members. We then calculate the rank of the near-native clusters (Table 2). For the refinement, we generate a diverse set of complexes by selecting the structure with the highest PROTAC stability score out of each of the 100 top-ranked clusters. For each of these, we use the 10 PROTAC conformers generated by AutoDock-Vina (see Section 2.3) as input to the refinement (i.e., the refinement is performed for 1000 structures).

2.5 Refinement

Starting with a given number of ternary complex structures from re-ranked and clustered TCP output, we performed further refinement using classical force field-based molecular simulation with an implicit solvent model. Force field parameters for each molecule were taken from the Amber ffSB14 force field [58] for proteins, and GAFF2 [59] for PROTAC molecules. Partial charges for the PROTAC molecules were determined as AM1BCC charges [60] using antechamber/Ambertools version 21 [61]. MD simulations were performed using OpenMM [62] with a Langevin type integrator and a time step of two femtoseconds. The implicit solvent model used was the second GBneck model (GBN2 [63]). Each structure was first optimized through energy minimization for 100 steps to remove close contacts, followed by a simulated annealing run, with the temperature varying from 300 to 0 Kelvin in a linear fashion, for a total simulation time of 200 picoseconds. This procedure was repeated four times to approximate the average total free energy of solvation and standard deviations for each input structure. The resulting energies are essentially a simplified version of MM-GB/SA energies, where a calculation of entropic contributions is omitted. Including entropy terms would require much longer simulation times, and, given the approximate nature of current approaches for entropy estimation [64], we consider the benefit of including this contribution limited.

2.6 Re-clustering and Re-ranking

The refined structures (1000 of them) were clustered by the fraction of common contact (FCC) clustering method [65]. Afterwards, the clusters were ranked based on size (the cluster with the highest number of elements being the first-ranked). The clustering threshold was set to 0.9 (this could lead to smaller entropy in the clusters [65]) and minimum number of elements in a cluster was set to 4. The rest of non-clustered structures were not taken into consideration.

2.7 Fitness landscape analysis

Using 5 exemplary ternary complexes from PDB (bound structures), we extracted the RRT corresponding to the native pose and modified them to investigate the resulting change of the fitness y(x). For each ternary complex, we generated 4000 random near-native samples by first randomly shifting the POI before rotating them. The shift directions were drawn from a von Mises-Fisher distribution with the mean set to the offset of the POI (i.e., pointing away from the E3 ligase) and the directional parameter κ = 1 (i.e., shifts moderately concentrated in this direction). The shift distances were drawn from an exponential distribution with λ = 1 Å. The rotations were generated by drawing a random rotation direction from a uniform distribution (no directions favored) and a rotation angle drawn from a von Mises distribution with mean 0 and κ =10 (i.e., small rotations favored).

3 Results

3.1 Evaluation of fitness functions

To check whether our proposed fitness functions are a meaningful objective for the BO loop, we analyzed the change of the fitness values around the true ternary complexes. Using bound structures, we created random near-native RRTs by altering the position of the POI using small, random translations and rotations (see Section 2.7 for details). We then compared the values of the PPI and PROTAC fitnesses of these random RRTs to the fitness of the actual native complex (Figure 7). The results show that these fitnesses are higher at the native RRT compared to other RRTs in its vicinity, thus indicating the their usefulness as objective for the BO loop.

Figure 7:

Native complexes have high fitness values. The columns show (for five sample ternary complexes) histograms of PPI fitness (left) and PROTAC fitness (right) for random near-native RRTs, created by randomly shifting and rotationing the POI (see text for details). The actual native complex (pink line) achieves a comparably high fitness.

3.2 Evaluation of the BO loop

We futhermore evaluated the BO method to test whether it performs adequate exploration and exploitation. Ideally, the distribution of candidates acquired by the BO loop is clustered in regions where the fitness y(x) is high. We compared the BO outputs to a set of randomly sampled candidates, and found that this is indeed the case (Figure 8): while random sampling occasionally finds RRTs with a high fitness, the BO loop generates samples in a more structured way (Figure 8A-B), resulting in more consistently high fitnesses (Figure 8) and a higher maximum value.

Figure 8: Bayesian optimization results in higher fitness values than random search.

Shown here are samples from the acquisition function at iteration 15 of the BO loop for the 6HAX complex, compared to random samples picked with proteins close to each other. (A) Translation values from random sampling (pink) and proposed by Bayesian optimization (blue). The latter leads to a candidate concentration in a promising region. The spread of points also suggests that the the BO loop sufficiently explores the space without converging prematurely to any specific minima. Points sampled via Both random sampling and BO are subject to a local optimization (B) Rotation quaternions mapped onto a 3-D cube from random sampling and Bayesian optimization. (C) Resulting fitness histograms. While random samples find some conformations with high fitness, the best fitness acquired by the Bayesian optimization is higher, and the mean fitness is substantially better, confirming that our samples efficiently search through the space of higher fitness values.

3.3 Evaluation metric

For evaluating the quality of the candidates generated by the BO loop (Table 2), we used the DockQ score [66], which is based on the metrics underlying the well-known critical assessment of predicted interactions (CAPRI) score [67]: f_nat, I_RMSD, and L_RMSD. f_nat is the fraction of native contacts that a candidate recovers. I_RMSD is calculated over all backbone atoms for those residues found in the interface of the reference structure (residues within a radius of 5 Å). L_RMSD is calculated based on the backbone atoms of the ligand between the predicted and native model (after aligning the receptor structures, i.e., the E3 ligase). The DockQ score combines these measures into a single scalar value between 0 and 1, with higher values corresponding to ternary complexes which resemble the original structure more closely. The distribution of the DockQ scores has been defined as follows [66]:

0 < DockQ < 0.23 - Incorrect complex
0.23 ≤ DockQ < 0.49 - Acceptable quality complex
0.49 ≤ DockQ < 0.80 - Medium quality complex
DockQ ≥ 0.80 - High quality complex

3.4 Prediction of near-native poses

We run our pipeline on 14 ternary complexes, some of which have multiple reported native ternary complex structures leading to a total of 22 cases. By running our pipeline and comparing our results to these 22 native ternary complex structures, we have found that the BO loop can effectively sample the near-native poses. In addition, the filtering and ranking procedures can assign them a relevant rank. For 21 out of 22 cases, we could sample at least acceptable quality complexes. For 16 out of 22 (Table 2), we were able to rank them within the top 20 clusters, and for 12 out of 22 complexes, we were able to recover a medium quality ternary structure. Additionally, 14 out of 22 clusters have at least 50% of the near-native poses in the reported cluster (see Table 2). This method proves that even with using a relatively simple PPI interaction and unbound structures, taking into account the constraints imposed by the linker, with the incorporation of the PROTAC score in the BO loop, as well as the TCP-AIR energy for filtering and PROTAC stability score for ranking, we can recover near-native predictions of the ternary complex structures.

After the use of refinement, we can see clear improvements. For 22/22 cases, we obtained complexes with the DockQ ≥ 0.23. For 17 out of 22 (Table 3), we were able to rank them within the top 20 clusters. For 14 out of 22 cases, we could recover at least a medium-quality ternary structure. For 2 out of 22 complexes, we recovered a high-quality ternary structure. Additionally, in 14/22 cases, in the reported highly-ranked near-native cluster, at least 50% of the poses in it are near-native poses (see Table 3).

View this table:

Table 3:

Results of the BOTCP method for ternary complex prediction on unbound structures after the refinement. ¹The best rank containing at least one model with DockQ≥0.23. ²The total number of clusters. ³The percentage of the near-native poses in that specific ranked cluster. For each ternary complex, we selected a cluster which includes an RRT with a high DockQ score. Columns also show the PDB ID and characterization of the best RRT in terms of f_nat, I_RMSD, L_RMSD and DockQ score (see text).

3.5 Comparison to previous work

Ternary complex prediction models have frequently been evaluated on bound structures (e.g., PRosettaC [32]). We have demonstrated in this work that effective ranking of predicted structures in the more challenging scenario of unbound structures is possible with our method. We are aware of one other approach based on unbound structures has been presented by Weng et al. [33], who have shown successful ranking of the near-native clusters within the top-15 clusters. However, Weng et al. used clustering with FCC using a threshold of 0.5, which is relatively low, leading to a large spread of poses within the resulting clusters. Even though there may be a near-native pose in such a cluster, a threshold of 0.5 implies that the fraction of near-native poses in the cluster would be low due to the diversity of poses within the cluster. Moreover, there is a higher likelihood that the best-scored pose in this cluster is, in fact, not a near-native pose. To avoid this, we choose a higher threshold of 0.9, which restricts the spread of points placed in a single cluster. Accordingly, a cluster ranked high in our case has lower entropy, i.e., the yield of near-native structures in it is much higher. With this form of constrained clustering, we have near-native clusters ranked within the top-15 clusters in 13 out of 22 cases before refinement and 16 out of 22 cases after refinement. These results demonstrate the efficacy of the Vina and TCP-AIR energies.

4 Conclusion and outlook

We have demonstrated our BOTCP module for predicting unbound rigid ternary complex structures (Table 2). Our method can sample and appropriately rank the near-native poses for most of them. Initially, 7KHH was the only complex for which our method is (currently) unable to sample near-native poses effectively. However, after refinement, we were able to recover near-native poses for all of the complexes (Table 3). Our results do not require the use of GPUs to compute the scores and, using 128 standard x64 CPU cores, for each complex, our results including filtering and re-ranking excluding refinement, typically take less than 2 hours. This represents a significant improvement over conventional techniques, that attempt to handle many interactions in a ternary complex via GPU intensive molecular dynamics simulations which typically take days or weeks to report a near-native pose. Note that this code is a prototype with significant room to optimize the various components further still.

At the heart of this, is the fact that Bayesian optimization learns to sample regions of high scores in a few iterations. In all of our simulations, we sample only around 2500 points, where the BO loop converges rougly after sampling around 700 points. This sample efficiency suggests modifying the fitness function to use more computationally expensive but informative data (e.g., the PROTAC stability score) for the BO loop. Such a fitness could take into account the interactions of the linker with the proteins, which has the potential to significantly improve the yield of near-native poses.

Currently, there are several avenues of improvement for both the quality of results and computational efficiency. The subsequent refinement of the structures provided from the BOTCP module currently takes about 25 hours but could be further optimized. For instance by running the refinement for less than 1000 poses or by using a smaller number of PROTAC conformations per structure. While the refinement is still computationally intensive, we see that the results without refinement suggest that the current scores and sampling technique can be beneficial in assessing the interactions for a given ternary complex.

As mentioned above, incorporating more informative PPI and constraint scores in the BO loop will further filter non-native poses. Currently, the PROTAC stability score and the PPI score are affected because the side-chains are misaligned with the side chains as in the original crystal structure. Our experiments show that the native structure consistently has a PROTAC stability fitness that is among the highest observed for that complex, which promises significant improvements if this score includes side-chain flexibility (e.g., using Autodock-Vina). To summarize, we demonstrate in this work the successful application of Bayesian optimization as well as the design of two specialized scores that can effectively sample and rank highly clusters of near-native poses for Ternary Complexes.

Author Contributions

A.R., T.T. conceived the algorithms, and ran the simulations using the TCP-AIR score, PROTAC stability score and the BO pipeline. A.R., T.T., M.M., H.F. implemented the software necessary. M.B. performed input data processing and refinement of structures. M.M. performed analysis of the fitness scores and result visualization. A.R., T.T., M.M., M.B., N.W. wrote the manuscript.

References

[1].↵
Matthieu Schapira, Matthew F Calabrese, Alex N Bullock, and Craig M Crews. Targeted protein degradation: expanding the toolbox. Nature reviews Drug discovery, 18(12):949–963, 2019.
OpenUrl CrossRef
[2].↵
Nan Bai, Kristin M Riching, Aman Makaju, Hao Wu, Timothy M Acker, Shu-Ching Ou, Yaru Zhang, Xiaomeng Shen, Daryl Bulloch, Huan Rui, et al. Modeling the crl4a ligase complex to predict target protein ubiquitination induced by cereblon-recruiting protacs. Journal of Biological Chemistry, page 101653, 2022.
[3].↵
Stuart L Schreiber. The rise of molecular glues. Cell, 184(1):3–9, 2021.
OpenUrl
[4].↵
Guoqiang Dong, Yu Ding, Shipeng He, and Chunquan Sheng. Molecular glues for targeted protein degradation: from serendipity to rational discovery. Journal of Medicinal Chemistry, 64(15):10606–10620, 2021.
OpenUrl
[5].↵
Taavi K Neklesa, James D Winkler, and Craig M Crews. Targeted protein degradation by protacs. Pharmacology & therapeutics, 174:138–144, 2017.
OpenUrl
[6].↵
Mariell Pettersson and Craig M Crews. Proteolysis targeting chimeras (protacs)—past, present and future. Drug Discovery Today: Technologies, 31:15–27, 2019.
OpenUrl
[7].↵
Alessio Ciulli and Nicole Trainor. A beginner’s guide to protacs and targeted protein degradation. The Biochemist, 43(5):74–79, 2021.
OpenUrl
[8].↵
Eric S Fischer, Kerstin Böhm, John R Lydeard, Haidi Yang, Michael B Stadler, Simone Cavadini, Jane Nagel, Fabrizio Serluca, Vincent Acker, Gondichatnahalli M Lingaraju, et al. Structure of the ddb1-crbn e3 ubiquitin ligase in complex with thalidomide. Nature, 512(7512):49–53, 2014.
OpenUrl CrossRef PubMed Web of Science
[9].↵
Emma C Fink and Benjamin L Ebert. The novel mechanism of lenalidomide activity. Blood, The Journal of the American Society of Hematology, 126(21):2366–2369, 2015.
OpenUrl
[10].↵
Shaobing Gao, Shichao Wang, and Yongping Song. Novel immunomodulatory drugs and neo-substrates. Biomarker Research, 8(1):1–8, 2020.
OpenUrl
[11].↵
Dennis L Buckley, Inge Van Molle, Peter C Gareiss, Hyun Seop Tae, Julien Michel, Devin J Noblin, William L Jorgensen, Alessio Ciulli, and Craig M Crews. Targeting the von hippel-lindau e3 ubiquitin ligase using small molecules to disrupt the vhl/hif-1α interaction. Journal of the American Chemical Society, 134(10):4465–4468, 2012.
OpenUrl CrossRef PubMed
[12].↵
Carles Galdeano, Morgan S Gadd, Pedro Soares, Salvatore Scaffidi, Inge Van Molle, Ipek Birced, Sarah Hewitt, David M Dias, and Alessio Ciulli. Structure-guided design and optimization of small molecules targeting the protein-protein interaction between the von hippel-lindau (vhl) e3 ubiquitin ligase and the hypoxia inducible factor (hif) alpha subunit with in vitro nanomolar affinities. Journal of Medicinal Chemistry, 57(20):8657–8663, 2014.
OpenUrl CrossRef PubMed
[13].↵
James Schiemer, Reto Horst, Yilin Meng, Justin I Montgomery, Yingrong Xu, Xidong Feng, Kris Borzilleri, Daniel P Uccello, Carolyn Leverett, Stephen Brown, et al. Snapshots and ensembles of btk and ciap1 protein degrader ternary complexes. Nature Chemical Biology, 17(2):152–160, 2021.
OpenUrl
[14].↵
John Hines, Schan Lartigue, Hanqing Dong, Yimin Qian, and Craig M Crews. Mdm2-recruiting protac offers superior, synergistic antiproliferative activity via simultaneous degradation of brd4 and stabilization of p53. Cancer research, 79(1):251–262, 2019.
OpenUrl Abstract/FREE Full Text
[15].↵
Jieli Wei, Fanye Meng, Kwang-Su Park, Hyerin Yim, Julia Velez, Prashasti Kumar, Li Wang, Ling Xie, He Chen, Yudao Shen, et al. Harnessing the e3 ligase keap1 for targeted protein degradation. Journal of the American Chemical Society, 143(37):15073–15083, 2021.
OpenUrl
[16].↵
Daniel Peter Petrylak, Xin Gao, Nicholas J Vogelzang, Mary Harlow Garfield, Ian Taylor, Marcia Dougan Moore, Ronald Alan Peck, and Howard A Burris III. First-in-human phase i study of arv-110, an androgen receptor (ar) protac degrader in patients (pts) with metastatic castrate-resistant prostate cancer (mcrpc) following enzalutamide (enz) and/or abiraterone (abi)., 2020.
[17].↵
Taavi Neklesa, Lawrence B Snyder, Ryan R Willard, Nicholas Vitale, Jennifer Pizzano, Deborah A Gordon, Mark Bookbinder, Jennifer Macaluso, Hanqing Dong, Caterina Ferraro, et al. Arv-110: An oral androgen receptor protac degrader for prostate cancer. J Clin Oncol, 37(7):14–16, 2019.
OpenUrl
[18].↵
JJ Flanagan, Y Qian, SM Gough, M Andreoli, M Bookbinder, G Cadelina, J Bradley, E Rousseau, R Willard, J Pizzano, et al. Abstract p5-04-18: Arv-471, an oral estrogen receptor protac degrader for breast cancer, 2019.
[19].↵
Yonghan He, Raphael Koch, Vivekananda Budamagunta, Peiyi Zhang, Xuan Zhang, Sajid Khan, Dinesh Thummuri, Yuma T Ortiz, Xin Zhang, Dongwen Lv, et al. Dt2216—a bcl-xl-specific degrader is highly active against bcl-xl-dependent t cell lymphomas. Journal of hematology & oncology, 13(1):1–13, 2020.
OpenUrl
[20].↵
Joshua D Hansen, Matthew Correa, Mark A Nagy, Matt Alexander, Veronique Plantevin, Virginia Grant, Brandon Whitefield, Dehua Huang, Timothy Kercher, Roy Harris, et al. Discovery of crbn e3 ligase modulator cc-92480 for the treatment of relapsed and refractory multiple myeloma. Journal of medicinal chemistry, 63(13):6648–6676, 2020.
OpenUrl
[21].↵
Bing Zhou, Jiantao Hu, Fuming Xu, Zhuo Chen, Longchuan Bai, Ester Fernandez-Salas, Mei Lin, Liu Liu, Chao-Yie Yang, Yujun Zhao, et al. Discovery of a small-molecule degrader of bromodomain and extra-terminal (bet) proteins with picomolar cellular potencies and capable of achieving tumor regression. Journal of medicinal chemistry, 61(2):462–481, 2018.
OpenUrl CrossRef PubMed
[22].↵
Adelajda Zorba, Chuong Nguyen, Yingrong Xu, Jeremy Starr, Kris Borzilleri, James Smith, Hongyao Zhu, Kathleen A Farley, WeiDong Ding, James Schiemer, et al. Delineating the role of cooperativity in the design of potent protacs for btk. Proceedings of the National Academy of Sciences, 115(31):E7285–E7292, 2018.
OpenUrl Abstract/FREE Full Text
[23].↵
Dennis Dobrovolsky, Eric S Wang, Sara Morrow, Catharine Leahy, Tyler Faust, Radoslaw P Nowak, Katherine A Donovan, Guang Yang, Zhengnian Li, Eric S Fischer, et al. Bruton tyrosine kinase degradation as a therapeutic strategy for cancer. Blood, The Journal of the American Society of Hematology, 133(9):952–961, 2019.
OpenUrl
[24].↵
Alina Mares, Afjal H Miah, Ian ED Smith, Mark Rackham, Aditya R Thawani, Jenni Cryan, Pamela A Haile, Bartholomew J Votta, Allison M Beal, Carol Capriotti, et al. Extended pharmacodynamic responses observed upon protac-mediated degradation of ripk2. Communications biology, 3(1):1–13, 2020.
OpenUrl
[25].↵
Christiane Kofink, Nicole Trainor, Barbara Mair, Simon Wöhrle, Melanie Wurm, Nikolai Mischerikow, Gerd Bader, Klaus Rumpel, Thomas Gerstberger, Yunhai Cui, and et al. A selective and orally bioavailable vhl-recruiting protac achieves smarca2 degradation in vivo. ChemRxiv, 2022.
[26].↵
Scott J Hughes and Alessio Ciulli. Molecular recognition of ternary complexes: a new dimension in the structure-guided design of chemical degraders. Essays in biochemistry, 61(5):505–516, 2017.
OpenUrl Abstract/FREE Full Text
[27].↵
Troy A Bemis, James J La Clair, and Michael D Burkart. Unraveling the role of linker design in proteolysis targeting chimeras: Miniperspective. Journal of Medicinal Chemistry, 64(12):8042–8052, 2021.
OpenUrl
[28].↵
Morgan S Gadd, Andrea Testa, Xavier Lucas, Kwok-Ho Chan, Wenzhang Chen, Douglas J Lamont, Michael Zengerle, and Alessio Ciulli. Structural basis of protac cooperative recognition for selective protein degradation. Nature chemical biology, 13(5):514–521, 2017.
OpenUrl
[29].↵
Robert Ian Troup, Charlene Fallan, and Matthias GJ Baud. Current strategies for the design of protac linkers: a critical review. Exploration of Targeted Anti-tumor Therapy, 1:273–312, 2020.
OpenUrl
[30].↵
Michael L Drummond and Christopher I Williams. In silico modeling of protac-mediated ternary complexes: validation and application. Journal of chemical information and modeling, 59(4):1634–1644, 2019.
OpenUrl
[31].↵
Michael L Drummond, Andrew Henry, Huifang Li, and Christopher I Williams. Improved accuracy for modeling protac-mediated ternary complex formation and targeted protein degradation via new in silico methodologies. Journal of Chemical Information and Modeling, 60(10):5234–5254, 2020.
OpenUrl
[32].↵
Daniel Zaidman, Jaime Prilusky, and Nir London. Prosettac: Rosetta based modeling of protac mediated ternary complexes. Journal of chemical information and modeling, 60(10):4894–4903, 2020.
OpenUrl
[33].↵
Gaoqi Weng, Dan Li, Yu Kang, and Tingjun Hou. Integrative modeling of protac-mediated ternary complexes. Journal of Medicinal Chemistry, 64(21):16271–16281, 2021.
OpenUrl
[34].↵
Nicholas A Marze, Shourya S Roy Burman, William Sheffler, and Jeffrey J Gray. Efficient flexible backbone protein—protein docking for challenging targets. Bioinformatics, 34(20):3461–3469, 2018.
OpenUrl CrossRef PubMed
[35].↵
Eric Brochu, Vlad M Cora, and Nando De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
[36].↵
Cyril Dominguez, Rolf Boelens, and Alexandre MJJ Bonvin. Haddock: a protein-protein docking approach based on biochemical or biophysical information. Journal of the American Chemical S’ociety, 125(7):1731–1737, 2003.
OpenUrl CrossRef PubMed Web of Science
[37].↵
Helen M. Berman. The Protein Data Bank. Nucleic Acids Research, 28(1):235–242, jan 2000.
OpenUrl CrossRef PubMed Web of Science
[38].↵
Andrew T. McNutt, Paul Francoeur, Rishal Aggarwal, Tomohide Masuda, Rocco Meli, Matthew Ragoza, Jocelyn Sunseri, and David Ryan Koes. GNINA 1.0: molecular docking with deep learning. Journal of Cheminformatics, 13(1):1–20, 2021.
OpenUrl
[39].↵
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, aug 2021.
OpenUrl CrossRef PubMed
[40].↵
Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yor-danova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Ankur Vora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard Kleywegt, Ewan Birney, Demis Hassabis, and Sameer Velankar. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1):D439–D444, jan 2022.
OpenUrl CrossRef
[41].↵
Eric F. Pettersen, Thomas D. Goddard, Conrad C. Huang, Gregory S. Couch, Daniel M. Greenblatt, Elaine C. Meng, and Thomas E. Ferrin. UCSF Chimera - A visualization system for exploratory research and analysis. Journal of Computational Chemistry, 25(13):1605–1612, oct 2004.
OpenUrl CrossRef PubMed Web of Science
[42].↵
P. Eastman. PDBFixer, 2019.
[43].↵
Christopher J. Williams, Jeffrey J. Headd, Nigel W. Moriarty, Michael G. Prisant, Lizbeth L. Videau, Lindsay N. Deis, Vishal Verma, Daniel A. Keedy, Bradley J. Hintze, Vincent B. Chen, Swati Jain, Steven M. Lewis, W. Bryan Arendall, Jack Snoeyink, Paul D. Adams, Simon C. Lovell, Jane S. Richardson, and David C. Richardson. MolProbity: More and better reference data for improved all-atom structure validation. Protein Science, 27(1):293–315, jan 2018.
OpenUrl CrossRef PubMed
[44].↵
Noel M O’Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. Open Babel: An open chemical toolbox. Journal of cheminformatics, 3(1):33, jan 2011.
OpenUrl
[45].↵
Apoorv Agnihotri and Nipun Batra. Exploring bayesian optimization. Distill, 5(5):e26, 2020.
OpenUrl
[46].↵
Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015.
OpenUrl
[47].↵
Chi Zhang, Song Liu, Qianqian Zhu, and Yaoqi Zhou. A knowledge-based energy function for proteinligand, protein-protein, and protein-dna complexes. Journal of medicinal chemistry, 48(7):2325–2335, 2005.
OpenUrl CrossRef PubMed Web of Science
[48].↵
Anthony K Rappíe, Carla J Casewit, KS Colwell, William A Goddard III, and W Mason Skiff. Uff, a full periodic table force field for molecular mechanics and molecular dynamics simulations. Journal of the American chemical society, 114(25):10024–10035, 1992.
OpenUrl CrossRef Web of Science
[49].↵
J. A. Nelder and R. Mead. A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313, 01 1965.
OpenUrl CrossRef Web of Science
[50].↵
Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer school on machine learning, pages 63–71. Springer, 2003.
[51].↵
Motonobu Kanagawa, Philipp Hennig, Dino Sejdinovic, and Bharath K Sriperumbudur. Gaussian processes and kernel methods: A review on connections and equivalences. arXiv preprint arXiv:1807.02582, 2018.
[52].↵
Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397–422, 2002.
OpenUrl
[53].↵
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1f12.6980, 2014.
[54].↵
Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. In Advances in Neural Information Processing Systems 33, 2020.
[55].↵
Jerome Eberhardt, Diogo Santos-Martins, Andreas F Tillack, and Stefano Forli. Autodock vina 1.2. 0: New docking methods, expanded force field, and python bindings. Journal of Chemical Information and Modeling, 61(8):3891–3898, 2021.
OpenUrl CrossRef
[56].↵
Scott Kirkpatrick, C Daniel Gelatt Jr., and Mario P Vecchi. Optimization by simulated annealing. science, 220(4598):671–680, 1983.
OpenUrl Abstract/FREE Full Text
[57].↵
Michael Nilges. A calculation strategy for the structure determination of symmetric demers by 1h nmr. Proteins: Structure, Function, and Bioinformatics, 17(3):297–309, 1993.
OpenUrl
[58].↵
James A. Maier, Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E. Hauser, and Carlos Simmerling. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of chemical theory and computation, 11(8):3696–713, aug 2015.
OpenUrl
[59].↵
Junmei Wang, Romain M. Wolf, James W. Caldwell, Peter A. Kollman, and David A. Case. Development and testing of a general amber force field. Journal of Computational Chemistry, 25(9):1157–74, jul 2004.
OpenUrl CrossRef PubMed Web of Science
[60].↵
Araz Jakalian, David B. Jack, and Christopher I. Bayly. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. Journal of Computational Chemistry, 23(16):1623–1641, oct 2002.
OpenUrl CrossRef PubMed Web of Science
[61].↵
D.A. Case, H.M. Aktulga, K. Belfon, I.Y. Ben-Shalom, S.R. Brozell, D.S. Cerutti, III T.E. Cheatham, G.A. Cisneros, V.W.D. Cruzeiro, T.A. Darden, R.E. Duke, G. Giambasu, M.K. Gilson, H. Gohlke, A.W. Goetz, R. Harris, S. Izadi, S.A. Izmailov, C. Jin, and K. Ka. Amber, 2021.
[62].↵
Peter Eastman, Jason Swails, John D. Chodera, Robert T. McGibbon, Yutong Zhao, Kyle A. Beauchamp, Lee-Ping Wang, Andrew C. Simmonett, Matthew P. Harrigan, Chaya D. Stern, Rafal P. Wiewiora, Bernard R. Brooks, and Vijay S. Pande. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS’ Computational Biology, 13(7):e1005659, jul 2017.
OpenUrl
[63].↵
Hai Nguyen, Daniel R. Roe, and Carlos Simmerling. Improved generalized born solvent model parameters for protein simulations. Journal of Chemical Theory and Computation, 9(4):2020–2034, apr 2013.
OpenUrl
[64].↵
Vilhelm Ekberg and Ulf Ryde. On the Use of Interaction Entropy and Related Methods to Estimate Binding Entropies. Journal of Chemical Theory and Computation, 17(8):5379–5391, 2021.
OpenUrl
[65].↵
Rodrigues J.P.G.L.M, Trellet M., Schmitz C., Kastritis P., Karaca E., Melquiond A.S.J., and Bonvin A.M.J.J. Clustering biomolecular complexes by residue contacts similarity. Proteins: Structure, Function, and Bioinformatics, 80(7):1810–1817, 2012.
OpenUrl
[66].↵
Sankar Basu and Björn Wallner. Dockq: a quality measure for protein-protein docking models. PloS one, 11(8):e0161879, 2016.
OpenUrl CrossRef
[67].↵
Marc F. Lensink, Nurul Nadzirin, Sameer Velankar, and Shoshana J. Wodak. Modeling protein-protein, protein-peptide, and protein-oligosaccharide complexes: Capri 7th edition. Proteins: Structure, Function, and Bioinformatics, 88(8):916–938, 2020.
OpenUrl

View the discussion thread.

Posted June 05, 2022.

Download PDF

Citation Tools

Subject Area

Bioinformatics

Subject Areas

All Articles

Animal Behavior and Cognition (5197)
Biochemistry (11699)
Bioengineering (8715)
Bioinformatics (29119)
Biophysics (14927)
Cancer Biology (12047)
Cell Biology (17347)
Clinical Trials (138)
Developmental Biology (9405)
Ecology (14138)
Epidemiology (2067)
Evolutionary Biology (18261)
Genetics (12216)
Genomics (16760)
Immunology (11839)
Microbiology (27996)
Molecular Biology (11549)
Neuroscience (60781)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3228)
Physiology (4937)
Plant Biology (10382)
Scientific Communication and Education (1679)
Synthetic Biology (2876)
Systems Biology (7332)
Zoology (1642)

[1] [1].↵
Matthieu Schapira, Matthew F Calabrese, Alex N Bullock, and Craig M Crews. Targeted protein degradation: expanding the toolbox. Nature reviews Drug discovery, 18(12):949–963, 2019.
OpenUrl CrossRef

[2] [2].↵
Nan Bai, Kristin M Riching, Aman Makaju, Hao Wu, Timothy M Acker, Shu-Ching Ou, Yaru Zhang, Xiaomeng Shen, Daryl Bulloch, Huan Rui, et al. Modeling the crl4a ligase complex to predict target protein ubiquitination induced by cereblon-recruiting protacs. Journal of Biological Chemistry, page 101653, 2022.

[3] [3].↵
Stuart L Schreiber. The rise of molecular glues. Cell, 184(1):3–9, 2021.
OpenUrl

[4] [4].↵
Guoqiang Dong, Yu Ding, Shipeng He, and Chunquan Sheng. Molecular glues for targeted protein degradation: from serendipity to rational discovery. Journal of Medicinal Chemistry, 64(15):10606–10620, 2021.
OpenUrl

[5] [5].↵
Taavi K Neklesa, James D Winkler, and Craig M Crews. Targeted protein degradation by protacs. Pharmacology & therapeutics, 174:138–144, 2017.
OpenUrl

[6] [6].↵
Mariell Pettersson and Craig M Crews. Proteolysis targeting chimeras (protacs)—past, present and future. Drug Discovery Today: Technologies, 31:15–27, 2019.
OpenUrl

[7] [7].↵
Alessio Ciulli and Nicole Trainor. A beginner’s guide to protacs and targeted protein degradation. The Biochemist, 43(5):74–79, 2021.
OpenUrl

[8] [8].↵
Eric S Fischer, Kerstin Böhm, John R Lydeard, Haidi Yang, Michael B Stadler, Simone Cavadini, Jane Nagel, Fabrizio Serluca, Vincent Acker, Gondichatnahalli M Lingaraju, et al. Structure of the ddb1-crbn e3 ubiquitin ligase in complex with thalidomide. Nature, 512(7512):49–53, 2014.
OpenUrl CrossRef PubMed Web of Science

[9] [9].↵
Emma C Fink and Benjamin L Ebert. The novel mechanism of lenalidomide activity. Blood, The Journal of the American Society of Hematology, 126(21):2366–2369, 2015.
OpenUrl

[10] [10].↵
Shaobing Gao, Shichao Wang, and Yongping Song. Novel immunomodulatory drugs and neo-substrates. Biomarker Research, 8(1):1–8, 2020.
OpenUrl

[11] [11].↵
Dennis L Buckley, Inge Van Molle, Peter C Gareiss, Hyun Seop Tae, Julien Michel, Devin J Noblin, William L Jorgensen, Alessio Ciulli, and Craig M Crews. Targeting the von hippel-lindau e3 ubiquitin ligase using small molecules to disrupt the vhl/hif-1α interaction. Journal of the American Chemical Society, 134(10):4465–4468, 2012.
OpenUrl CrossRef PubMed

[12] [12].↵
Carles Galdeano, Morgan S Gadd, Pedro Soares, Salvatore Scaffidi, Inge Van Molle, Ipek Birced, Sarah Hewitt, David M Dias, and Alessio Ciulli. Structure-guided design and optimization of small molecules targeting the protein-protein interaction between the von hippel-lindau (vhl) e3 ubiquitin ligase and the hypoxia inducible factor (hif) alpha subunit with in vitro nanomolar affinities. Journal of Medicinal Chemistry, 57(20):8657–8663, 2014.
OpenUrl CrossRef PubMed

[13] [13].↵
James Schiemer, Reto Horst, Yilin Meng, Justin I Montgomery, Yingrong Xu, Xidong Feng, Kris Borzilleri, Daniel P Uccello, Carolyn Leverett, Stephen Brown, et al. Snapshots and ensembles of btk and ciap1 protein degrader ternary complexes. Nature Chemical Biology, 17(2):152–160, 2021.
OpenUrl

[14] [14].↵
John Hines, Schan Lartigue, Hanqing Dong, Yimin Qian, and Craig M Crews. Mdm2-recruiting protac offers superior, synergistic antiproliferative activity via simultaneous degradation of brd4 and stabilization of p53. Cancer research, 79(1):251–262, 2019.
OpenUrl Abstract/FREE Full Text

[15] [15].↵
Jieli Wei, Fanye Meng, Kwang-Su Park, Hyerin Yim, Julia Velez, Prashasti Kumar, Li Wang, Ling Xie, He Chen, Yudao Shen, et al. Harnessing the e3 ligase keap1 for targeted protein degradation. Journal of the American Chemical Society, 143(37):15073–15083, 2021.
OpenUrl

[16] [16].↵
Daniel Peter Petrylak, Xin Gao, Nicholas J Vogelzang, Mary Harlow Garfield, Ian Taylor, Marcia Dougan Moore, Ronald Alan Peck, and Howard A Burris III. First-in-human phase i study of arv-110, an androgen receptor (ar) protac degrader in patients (pts) with metastatic castrate-resistant prostate cancer (mcrpc) following enzalutamide (enz) and/or abiraterone (abi)., 2020.

[17] [17].↵
Taavi Neklesa, Lawrence B Snyder, Ryan R Willard, Nicholas Vitale, Jennifer Pizzano, Deborah A Gordon, Mark Bookbinder, Jennifer Macaluso, Hanqing Dong, Caterina Ferraro, et al. Arv-110: An oral androgen receptor protac degrader for prostate cancer. J Clin Oncol, 37(7):14–16, 2019.
OpenUrl

[18] [18].↵
JJ Flanagan, Y Qian, SM Gough, M Andreoli, M Bookbinder, G Cadelina, J Bradley, E Rousseau, R Willard, J Pizzano, et al. Abstract p5-04-18: Arv-471, an oral estrogen receptor protac degrader for breast cancer, 2019.

[19] [19].↵
Yonghan He, Raphael Koch, Vivekananda Budamagunta, Peiyi Zhang, Xuan Zhang, Sajid Khan, Dinesh Thummuri, Yuma T Ortiz, Xin Zhang, Dongwen Lv, et al. Dt2216—a bcl-xl-specific degrader is highly active against bcl-xl-dependent t cell lymphomas. Journal of hematology & oncology, 13(1):1–13, 2020.
OpenUrl

[20] [20].↵
Joshua D Hansen, Matthew Correa, Mark A Nagy, Matt Alexander, Veronique Plantevin, Virginia Grant, Brandon Whitefield, Dehua Huang, Timothy Kercher, Roy Harris, et al. Discovery of crbn e3 ligase modulator cc-92480 for the treatment of relapsed and refractory multiple myeloma. Journal of medicinal chemistry, 63(13):6648–6676, 2020.
OpenUrl

[21] [21].↵
Bing Zhou, Jiantao Hu, Fuming Xu, Zhuo Chen, Longchuan Bai, Ester Fernandez-Salas, Mei Lin, Liu Liu, Chao-Yie Yang, Yujun Zhao, et al. Discovery of a small-molecule degrader of bromodomain and extra-terminal (bet) proteins with picomolar cellular potencies and capable of achieving tumor regression. Journal of medicinal chemistry, 61(2):462–481, 2018.
OpenUrl CrossRef PubMed

[22] [22].↵
Adelajda Zorba, Chuong Nguyen, Yingrong Xu, Jeremy Starr, Kris Borzilleri, James Smith, Hongyao Zhu, Kathleen A Farley, WeiDong Ding, James Schiemer, et al. Delineating the role of cooperativity in the design of potent protacs for btk. Proceedings of the National Academy of Sciences, 115(31):E7285–E7292, 2018.
OpenUrl Abstract/FREE Full Text

[23] [23].↵
Dennis Dobrovolsky, Eric S Wang, Sara Morrow, Catharine Leahy, Tyler Faust, Radoslaw P Nowak, Katherine A Donovan, Guang Yang, Zhengnian Li, Eric S Fischer, et al. Bruton tyrosine kinase degradation as a therapeutic strategy for cancer. Blood, The Journal of the American Society of Hematology, 133(9):952–961, 2019.
OpenUrl

[24] [24].↵
Alina Mares, Afjal H Miah, Ian ED Smith, Mark Rackham, Aditya R Thawani, Jenni Cryan, Pamela A Haile, Bartholomew J Votta, Allison M Beal, Carol Capriotti, et al. Extended pharmacodynamic responses observed upon protac-mediated degradation of ripk2. Communications biology, 3(1):1–13, 2020.
OpenUrl

[25] [25].↵
Christiane Kofink, Nicole Trainor, Barbara Mair, Simon Wöhrle, Melanie Wurm, Nikolai Mischerikow, Gerd Bader, Klaus Rumpel, Thomas Gerstberger, Yunhai Cui, and et al. A selective and orally bioavailable vhl-recruiting protac achieves smarca2 degradation in vivo. ChemRxiv, 2022.

[26] [26].↵
Scott J Hughes and Alessio Ciulli. Molecular recognition of ternary complexes: a new dimension in the structure-guided design of chemical degraders. Essays in biochemistry, 61(5):505–516, 2017.
OpenUrl Abstract/FREE Full Text

[27] [27].↵
Troy A Bemis, James J La Clair, and Michael D Burkart. Unraveling the role of linker design in proteolysis targeting chimeras: Miniperspective. Journal of Medicinal Chemistry, 64(12):8042–8052, 2021.
OpenUrl

[28] [28].↵
Morgan S Gadd, Andrea Testa, Xavier Lucas, Kwok-Ho Chan, Wenzhang Chen, Douglas J Lamont, Michael Zengerle, and Alessio Ciulli. Structural basis of protac cooperative recognition for selective protein degradation. Nature chemical biology, 13(5):514–521, 2017.
OpenUrl

[29] [29].↵
Robert Ian Troup, Charlene Fallan, and Matthias GJ Baud. Current strategies for the design of protac linkers: a critical review. Exploration of Targeted Anti-tumor Therapy, 1:273–312, 2020.
OpenUrl

[30] [30].↵
Michael L Drummond and Christopher I Williams. In silico modeling of protac-mediated ternary complexes: validation and application. Journal of chemical information and modeling, 59(4):1634–1644, 2019.
OpenUrl

[31] [31].↵
Michael L Drummond, Andrew Henry, Huifang Li, and Christopher I Williams. Improved accuracy for modeling protac-mediated ternary complex formation and targeted protein degradation via new in silico methodologies. Journal of Chemical Information and Modeling, 60(10):5234–5254, 2020.
OpenUrl

[32] [32].↵
Daniel Zaidman, Jaime Prilusky, and Nir London. Prosettac: Rosetta based modeling of protac mediated ternary complexes. Journal of chemical information and modeling, 60(10):4894–4903, 2020.
OpenUrl

[33] [33].↵
Gaoqi Weng, Dan Li, Yu Kang, and Tingjun Hou. Integrative modeling of protac-mediated ternary complexes. Journal of Medicinal Chemistry, 64(21):16271–16281, 2021.
OpenUrl

[34] [34].↵
Nicholas A Marze, Shourya S Roy Burman, William Sheffler, and Jeffrey J Gray. Efficient flexible backbone protein—protein docking for challenging targets. Bioinformatics, 34(20):3461–3469, 2018.
OpenUrl CrossRef PubMed

[35] [35].↵
Eric Brochu, Vlad M Cora, and Nando De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.

[36] [36].↵
Cyril Dominguez, Rolf Boelens, and Alexandre MJJ Bonvin. Haddock: a protein-protein docking approach based on biochemical or biophysical information. Journal of the American Chemical S’ociety, 125(7):1731–1737, 2003.
OpenUrl CrossRef PubMed Web of Science

[37] [37].↵
Helen M. Berman. The Protein Data Bank. Nucleic Acids Research, 28(1):235–242, jan 2000.
OpenUrl CrossRef PubMed Web of Science

[38] [38].↵
Andrew T. McNutt, Paul Francoeur, Rishal Aggarwal, Tomohide Masuda, Rocco Meli, Matthew Ragoza, Jocelyn Sunseri, and David Ryan Koes. GNINA 1.0: molecular docking with deep learning. Journal of Cheminformatics, 13(1):1–20, 2021.
OpenUrl

[39] [39].↵
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, aug 2021.
OpenUrl CrossRef PubMed

[40] [40].↵
Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yor-danova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Ankur Vora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard Kleywegt, Ewan Birney, Demis Hassabis, and Sameer Velankar. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1):D439–D444, jan 2022.
OpenUrl CrossRef

[41] [41].↵
Eric F. Pettersen, Thomas D. Goddard, Conrad C. Huang, Gregory S. Couch, Daniel M. Greenblatt, Elaine C. Meng, and Thomas E. Ferrin. UCSF Chimera - A visualization system for exploratory research and analysis. Journal of Computational Chemistry, 25(13):1605–1612, oct 2004.
OpenUrl CrossRef PubMed Web of Science

[42] [42].↵
P. Eastman. PDBFixer, 2019.

[43] [43].↵
Christopher J. Williams, Jeffrey J. Headd, Nigel W. Moriarty, Michael G. Prisant, Lizbeth L. Videau, Lindsay N. Deis, Vishal Verma, Daniel A. Keedy, Bradley J. Hintze, Vincent B. Chen, Swati Jain, Steven M. Lewis, W. Bryan Arendall, Jack Snoeyink, Paul D. Adams, Simon C. Lovell, Jane S. Richardson, and David C. Richardson. MolProbity: More and better reference data for improved all-atom structure validation. Protein Science, 27(1):293–315, jan 2018.
OpenUrl CrossRef PubMed

[44] [44].↵
Noel M O’Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. Open Babel: An open chemical toolbox. Journal of cheminformatics, 3(1):33, jan 2011.
OpenUrl

[45] [45].↵
Apoorv Agnihotri and Nipun Batra. Exploring bayesian optimization. Distill, 5(5):e26, 2020.
OpenUrl

[46] [46].↵
Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015.
OpenUrl

[47] [47].↵
Chi Zhang, Song Liu, Qianqian Zhu, and Yaoqi Zhou. A knowledge-based energy function for proteinligand, protein-protein, and protein-dna complexes. Journal of medicinal chemistry, 48(7):2325–2335, 2005.
OpenUrl CrossRef PubMed Web of Science

[48] [48].↵
Anthony K Rappíe, Carla J Casewit, KS Colwell, William A Goddard III, and W Mason Skiff. Uff, a full periodic table force field for molecular mechanics and molecular dynamics simulations. Journal of the American chemical society, 114(25):10024–10035, 1992.
OpenUrl CrossRef Web of Science

[49] [49].↵
J. A. Nelder and R. Mead. A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313, 01 1965.
OpenUrl CrossRef Web of Science

[50] [50].↵
Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer school on machine learning, pages 63–71. Springer, 2003.

[51] [51].↵
Motonobu Kanagawa, Philipp Hennig, Dino Sejdinovic, and Bharath K Sriperumbudur. Gaussian processes and kernel methods: A review on connections and equivalences. arXiv preprint arXiv:1807.02582, 2018.

[52] [52].↵
Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397–422, 2002.
OpenUrl

[53] [53].↵
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1f12.6980, 2014.

[54] [54].↵
Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. In Advances in Neural Information Processing Systems 33, 2020.

[55] [55].↵
Jerome Eberhardt, Diogo Santos-Martins, Andreas F Tillack, and Stefano Forli. Autodock vina 1.2. 0: New docking methods, expanded force field, and python bindings. Journal of Chemical Information and Modeling, 61(8):3891–3898, 2021.
OpenUrl CrossRef

[56] [56].↵
Scott Kirkpatrick, C Daniel Gelatt Jr., and Mario P Vecchi. Optimization by simulated annealing. science, 220(4598):671–680, 1983.
OpenUrl Abstract/FREE Full Text

[57] [57].↵
Michael Nilges. A calculation strategy for the structure determination of symmetric demers by 1h nmr. Proteins: Structure, Function, and Bioinformatics, 17(3):297–309, 1993.
OpenUrl

[58] [58].↵
James A. Maier, Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E. Hauser, and Carlos Simmerling. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of chemical theory and computation, 11(8):3696–713, aug 2015.
OpenUrl

[59] [59].↵
Junmei Wang, Romain M. Wolf, James W. Caldwell, Peter A. Kollman, and David A. Case. Development and testing of a general amber force field. Journal of Computational Chemistry, 25(9):1157–74, jul 2004.
OpenUrl CrossRef PubMed Web of Science

[60] [60].↵
Araz Jakalian, David B. Jack, and Christopher I. Bayly. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. Journal of Computational Chemistry, 23(16):1623–1641, oct 2002.
OpenUrl CrossRef PubMed Web of Science

[61] [61].↵
D.A. Case, H.M. Aktulga, K. Belfon, I.Y. Ben-Shalom, S.R. Brozell, D.S. Cerutti, III T.E. Cheatham, G.A. Cisneros, V.W.D. Cruzeiro, T.A. Darden, R.E. Duke, G. Giambasu, M.K. Gilson, H. Gohlke, A.W. Goetz, R. Harris, S. Izadi, S.A. Izmailov, C. Jin, and K. Ka. Amber, 2021.

[62] [62].↵
Peter Eastman, Jason Swails, John D. Chodera, Robert T. McGibbon, Yutong Zhao, Kyle A. Beauchamp, Lee-Ping Wang, Andrew C. Simmonett, Matthew P. Harrigan, Chaya D. Stern, Rafal P. Wiewiora, Bernard R. Brooks, and Vijay S. Pande. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS’ Computational Biology, 13(7):e1005659, jul 2017.
OpenUrl

[63] [63].↵
Hai Nguyen, Daniel R. Roe, and Carlos Simmerling. Improved generalized born solvent model parameters for protein simulations. Journal of Chemical Theory and Computation, 9(4):2020–2034, apr 2013.
OpenUrl

[64] [64].↵
Vilhelm Ekberg and Ulf Ryde. On the Use of Interaction Entropy and Related Methods to Estimate Binding Entropies. Journal of Chemical Theory and Computation, 17(8):5379–5391, 2021.
OpenUrl

[65] [65].↵
Rodrigues J.P.G.L.M, Trellet M., Schmitz C., Kastritis P., Karaca E., Melquiond A.S.J., and Bonvin A.M.J.J. Clustering biomolecular complexes by residue contacts similarity. Proteins: Structure, Function, and Bioinformatics, 80(7):1810–1817, 2012.
OpenUrl

[66] [66].↵
Sankar Basu and Björn Wallner. Dockq: a quality measure for protein-protein docking models. PloS one, 11(8):e0161879, 2016.
OpenUrl CrossRef

[67] [67].↵
Marc F. Lensink, Nurul Nadzirin, Sameer Velankar, and Shoshana J. Wodak. Modeling protein-protein, protein-peptide, and protein-oligosaccharide complexes: Capri 7th edition. Proteins: Structure, Function, and Bioinformatics, 88(8):916–938, 2020.
OpenUrl

Bayesian Optimization for Ternary Complex Prediction (BOTCP)

Abstract

1 Introduction

2 Methods

2.1 Initialization

2.2 Bayesian optimization

2.2.1 Search space

2.2.2 Objective evaluation

PPI fitness

PROTAC fitness

Local fitness ascent

2.2.3 Surrogate model

2.2.4 Acquisition strategy

2.3 Local optimization with simulated annealing of the PROTAC stability score

PROTAC stability score

Simulated annealing

2.4 Clustering and filtering using TCP-AIR energy

Clustering

Filtering using TCP-AIR energy

Re-rank clustering

2.5 Refinement

2.6 Re-clustering and Re-ranking

2.7 Fitness landscape analysis

3 Results

3.1 Evaluation of fitness functions

3.2 Evaluation of the BO loop

3.3 Evaluation metric

3.4 Prediction of near-native poses

3.5 Comparison to previous work

4 Conclusion and outlook

Author Contributions

References

Citation Manager Formats

Subject Area