Druggability Assessment in TRAPP using Machine Learning Approaches

Accurate protein druggability predictions are important for the selection of drug targets in the early stages of drug discovery. Due to the flexible nature of proteins, the druggability of a binding pocket may vary due to conformational changes. We have therefore developed two statistical models, a logistic regression model (TRAPP-LR) and a convolutional neural network model (TRAPP-CNN), for predicting druggability and how it varies with changes in the spatial and physicochemical properties of a binding pocket. These models are integrated into TRAPP (TRAnsient Pockets in Proteins), a tool for the analysis of binding pocket variations along a protein motion trajectory. The models, which were trained on publicly available and self-augmented data sets, show equivalent or superior performance to existing methods on test sets of protein crystal structures, and have sufficient sensitivity to identify potentially druggable protein conformations in trajectories from molecular dynamics simulations. Visualization of the evidence for the decisions of the models in TRAPP facilitates identification of the factors affecting the druggability of protein binding pockets.


Introduction
Drug development is a very costly and time-consuming process, with only 7-11% of compounds in phase I testing finally becoming approved drugs. 1 A thorough drug target validation in the earlier stages of drug discovery projects could reduce the waste of resources and improve the success rate. Thus, the druggability of a protein, 2-4 which describes the protein's potential to accommodate low molecular weight drug molecules that modulate its biological function, should be evaluated during target validation. Low molecular weight compounds may show a therapeutic effect when they bind with high affinity to a druggable disease-related protein at a specific position, known as the binding pocket or binding cavity.
Information on protein druggability is particularly useful for prioritizing therapeutic targets, and for guiding the design of new drugs to avoid side-effects or unwanted polypharmacology.
The concept of druggability is used to described a biological target that is disease-related and can be modulated by commercially viable compounds that are usually orally available and are known as drug-like molecules. 5,6 An empirical rule for evaluating the druglikeness of a given compound is Lipinski's rule-of-five (RO5), 7 which was derived based on the physicochemical properties of 90% of orally active drugs that achieved Phase II status. Thus, a more precise and practical definition of protein druggability can be formulated as the protein's ability to bind drug-like ligands with high affinity. Most experimental approaches for assessing druggability have involved high-throughput or NMR-based screening using libraries of molecules that conform to lead-like characteristics. 3,8 However, the high cost of instrumentation and the uncertainty of negative assessments due to potentially inadequate compound collections restrict the scope of experimental approaches. Therefore, computational procedures, typically based on virtual screening or machine learning, 9 have been developed to provide a more generally applicable approach.
Here, we focus on protein structure-based druggability prediction using machine learning approaches to take advantage of the large amount of structural data on proteins. Existent statistical models for druggability prediction typically have 3 essential elements: an informative featurization of protein binding pockets, a druggability dataset, and a machine learning approach. The featurization usually involves two steps. First, a cavity or crevice in the protein is defined using a pocket estimation method. Then, the features are derived for the specific cavity. A wide range of features or descriptors can be used to characterize a pocket, including geometric information such as pocket volume or surface area, and physicochemical properties like hydrophobicity or polarity. The features that enable better discrimination between druggable and less-druggable pockets are then used as the input for the druggability prediction model. Based on the knowledge contained in the druggability dataset, the model learns to capture the relation between input features and output druggability using machine learning approaches. In order to compare the performance of different methods, consensus datasets, such as the non-redundant druggable and less druggable (NRDLD) dataset, 10 have been developed.
One of the earliest methods, developed by Hajduk et al., 3 predicts experimental screening hit rate, as a druggability measure, with a linear regression model based on descriptors of the polar/apolar surface area, the surface complexity, and the pocket dimensions. For DrugPred, 10 partial least-squares projection to latent structures discriminant analysis (PLS-DA) was used to select 5 out of 22 descriptors that show significant contributions to the model decision with positive contributions to druggability from hydrophobicity and proteinligand contact surface and a negative contribution from polar surface area. Volsite 6 encodes both the shape and the pharmacophoric properties of a binding cavity on regularly spaced grid points and derives 73 global and local descriptors from the grids, using a support vector machine algorithm (SVM) with a nonlinear kernel to train the druggability prediction model.
The non-linearity of the SVM means that the contribution of each feature to the model cannot be obtained directly. Another grid-based method is DoGSiteScorer, 11,12 which maps the protein of interest onto a grid and estimates pockets and subpockets using a Difference of Gaussian filter. 13 Based on the pocket-forming grid points and the pocket-lining residues, global and local descriptors are computed. From 17 global descriptors three descriptors namely, pocket depth, number of apolar amino acid residues and the pocket volume, are used to build a SVM model with a Gaussian kernel. In addition, local pocket properties are described by distance dependent histograms between atom pairs, e.g. hydrophilic-hydrophilic and lipophilic-lipophilic atom pairs, which allows druggability assessment through a nearest neighbor search. The local descriptors reveal that druggable pockets tend to have fewer shortrange hydrophilic interaction pairs and more short-range lipophilic pairs. The combination of global and local predictors increases the reliability of the model. Subsequently, as the druggability prediction of a single target varies when applying different pocket estimation methods, PockDrug 14 was developed to overcome these uncertainties by optimizing the model with different pocket estimation methods. It performs druggability prediction based on the consensus of seven linear discriminant analysis (LDA) models with complementary input features. In general, the properties involved describe the geometry, hydrophobicity and aromaticity of the pockets, as for most other druggability prediction models. 15,16 Most druggability prediction methods have been developed to use the static structures of proteins determined by experimental methods, such as X-ray crystallography, NMR, or cryo-electron microscopy. However, proteins are dynamic and possess an inherent flexibility 17 which alters the shape and properties of their binding pockets. [18][19][20] Therefore, it is valuable to explore the different conformations of a binding pocket by molecular simulation and the corresponding variations in druggability. Some methods that combine pocket druggability prediction and molecular dynamics simulation have been developed, such as MDpocket 21 and JEDI. 22 To compute the druggability score of a complete trajectory efficiently, these methods use only a few descriptors, such as volume and hydrophobicity, and therefore might not be sensitive enough to capture the variations in druggability due to subtle conformational changes. Therefore, based on the assumption that a change in the conformation of a binding site leads to a change in its druggability, our aim in this work was to build druggability assessment models that are able to trace the fluctuation of druggability as a result of conformational changes. TRAPP is a tool that allows the exploration of different protein conformations, the analysis of binding pocket flexibility and dynamics, and the extraction of spatial and physicochemical information on the binding pocket conformations. The goal of this study was to make use of the information provided by TRAPP for assessing the druggability of a binding pocket as its shape changes using machine-learning based approaches.
Global descriptors of the pocket were generated as the input for linear models using logistic regression (LR) and a support vector machine (SVM), whereas a grid representation of the pocket provided input to a convolutional neural network (CNN) for druggability prediction.
Visualization tools were developed to elucidate the evidence for the decisions of the models.
The performance of the models was evaluated and compared with other methods to predict druggability and they were found to perform equally well or better.

Data sets
Two data sets of experimentally determined protein structures were used to train and validate the druggability assessment models. In addition, to illustrate the capabilities of our models, MD trajectories of pocket motions were generated.

NRDLD dataset
The Non-Redundant Druggable and Less Druggable (NRDLD) dataset is the largest publically accessible druggability dataset available to date. 10 It contains 113 instances of proteins with high resolution (less than 2.6 Angstroms ) crystal structures of which 71 are labeled as druggable and 42 are labeled as less-druggable. A protein is defined as druggable if it is able to noncovalently bind small drug-like ligands that are orally available and do not require administration as prodrugs, whereas a protein is less-druggable if none of the ligands with which it was co-crystallized simultaneously met the following three requirements: (1) satisfied Lipinski's rule of five for orally available drugs; 7 (2) had clogP ≥ −2; 23 and (3) The ligand efficiency was ≥ 0.3 kcal mol −1 /heavy atom. 24 The redundant instances were removed by choosing a single crystal structure to represent protein targets sharing a pairwise sequence identity of greater than 60%. The NRDLD data set has a pre-defined split, with a training set with 76 instances and a test set with 37 instances. Since the performance of the latest druggability prediction tools was mainly assessed using the NRDLD test set, it allows comparison of the performance of our method with other methods.

DaPB dataset
A larger dataset, the Druggability augmented from PDBbind (DaPB) dataset, was constructed by filtering the 2017 release of the PDBbind refined set. 25 The PDBbind database collects the experimentally measured binding affinity data for the biomolecular complexes in the Protein Data Bank (PDB), and the PDBbind refined set is a subset containing 4154 protein-ligand complexes with better quality with regard to binding data, crystal structures and the nature of the complexes.
In this paper, we use the term bindability 15,26 to describe pockets that bind drug-like ligands with submicromolar affinity and refer to the positive and negative instances in the DaPB dataset as bindable and less-bindable pockets, in order to distinguish the concepts of druggability and bindability in the NRDLD and DaPB datasets. Although it was stated that borderline druggable proteins are bindable, 27 the distinction between bindability and druggability is rather obscure and the two terms are sometimes used synonymously. 5,6 Thus, the DaPB dataset used in this work was built through filtering or selection of data on the basis of bindability. To collect positive complexes, we extracted the properties of the ligands using Pybel, a Python wrapper for the OpenBabel cheminformatics toolkit, 28 and then selected the bindable pockets if they bind ligands that satisfy drug-like properties as defined in Sheridan et al. 15 with a binding affinity stronger than 1 µM (Table S1). For the negative instances in the DaPB dataset, instead of introducing a set of rules for filtering out the negative (lessdruggable) data, we directly applied one of the models trained on the NRDLD training set, TRAPP-SVM (see below), on the remaining unlabeled data and selected the negative data according to its prediction. In summary, the DaPB dataset used here contained 892 positive instances and 1180 negative instances. One fifth of the DaPB dataset was used as the test set and contained 190 positive instances and 224 negative instances.

Input data generation
Two types of input format were generated during the TRAPP-pocket procedure as explained in more detail in Ref. 29.

Preprocessing of the PDB files
Before interpolating the binding pocket on to a grid and computing descriptors, the input PDB file of the protein was preprocessed in two steps. First, redundant small molecules such as solvent, buffer, or some peptide linking groups were removed if they were not in the list of retained co-factors (Table S2). The protein residues and retained co-factors were then protonated at pH 7.0 using Pybel. 28 Multi-channel grid 8 different 3-dimensional grids, each corresponding to a different channel as shown in Table   1, were generated for each pocket. Each channel describes a specific property of the pocket ( Figure 1), resembling the idea of a color channel in an RGB image. The size of the grid box can be defined by either the grid edge length, e.g. 24Å, or the extent around the ligand.
For druggability assessment, only a fixed grid edge length was used, due to the limitation of a fixed input dimension for the CNN model. In this 4-dimensional grid representation of the pocket, the first channel describes the shape and position of the cavity, which is computed using the cavity distribution function G(r i , p) as described in Ref. 29 with a new pocket selection algorithm summarized in the Supplementary Information. For a cavity grid in which a single grid point is denoted as r i and the protein as p , an atom-occupied position has the grid value G(r i , p) = 0 and a pocket position has the grid value 0 < G(r i , p) ≤ 1. The remaining 7 channels describe the physicochemical properties of the cavity and were assigned in two steps. First, the atoms in the grid box were assigned to certain physicochemical properties if they satisfied the criteria given in Table 1. The positively and negatively charged atoms were assigned based on the atom and residue information, whereas the atoms with other properties were specified using Pybel 28 definitions. Second, Gaussian distribution functions G prop (r i , p) were spanned over those atoms with certain properties in order to map those atomic features to the grid.

Global descriptors
After the grid computations, each channel was compiled into a single value or a few values describing the global pocket properties. In total, ten global descriptors were extracted from the grids in the TRAPP-pocket procedure. The list of the ten global descriptors and their definitions are shown in Table S4. The pocket volume, protein-exposed and solvent-exposed surface area were computed from the cavity grid. The pocket volume is defined as the summation of the grid points that have G(r i , p) > 0 and then multiplied by the volume of a single grid element. The protein-exposed and solvent-exposed surface areas were computed by counting the number of transitions from a cavity grid point (G(r i , p) > 0) to the protein grid point (G(r i , p) = 0) or solvent grid point (G(r i , p) = −1), respectively, and then multiplying by the square of the grid spacing. The ratio of solvent-exposed surface area to protein-exposed surface area is defined as the pocket exposure, which is an approximate measure of the pocket shape and is used as one of the global descriptors for druggability prediction.
The other physicochemical property descriptors were computed based on the overlap between the cavity grid and the corresponding property grid. If G(r i , p) > 0, the property is incremented by G prop (r i , p). The final value representing each property was then multiplied by the volume of a single grid element in order to make it comparable to the other descriptors that are derived with a different grid setting.

Statistical Protocol Linear models based on global descriptors
The global pocket descriptors were taken as the inputs for the druggability assessment models using logistic regression and linear SVM, as implemented in scikit-learn. 30 The models, whose number of parameters is the number of input features plus one for the bias term, were trained on the small NRDLD training set with L1 or L2 weight regularization. The pipeline, which consisted of input generation, data standardization, and model prediction, was constructed and optimized as a whole. The hyperparameters, e.g. grid size and grid spacing for grid generation, as well as the parameter C, which controls the regularization strengths, were optimized by a 5-fold cross-validated grid search (Table S5). The splitting of the validation set was performed so as to ensure that each fold had similar numbers of the instances with the two labels.

Convolutional neural network based on a multi-channel grid
To capture the relationships between the druggability and the spatial/physicochemical properties of the binding pockets, the multi-channel grid, which serves as a detailed description of the binding site, was used as the input for druggability prediction with a convolutional neural network. The network architecture used for druggability prediction was inspired by

Ref. 31, which had 3 3 × 3 × 3 convolutional layers with rectified linear activation units
alternating with max pooling layers followed by a fully connected layer with two outputs and a softmax layer to obtain a probability distribution over two classes.
The DaPB dataset was used for training and hyperparameter searching of the neural network. To combat overfitting, data augmentation through 90 • rotation and random translation of 2 voxels in x, y, z coordinates for each training instance was performed. The network architecture and other hyperparameters, such as the learning rate and the weight decay for regularization, were optimized using a 3-fold cross-validated grid search. In total, 60 different network architectures with varying numbers of convolutional layers, numbers of convolutional filters per layer, and numbers of neurons in the fully-connected layer were compared based on their cross-validated F1 score. Finally, the optimized network structure was trained using the Adam optimizer with default parameters for momentum scheduling (β 1 = 0.99, β 2 = 0.999), a learning rate of 0.001 and a weight decay of 0.0001.

Model evaluation
The quality of the models was assessed using accuracy (Eq. 1), sensitivity, specificity (Eq. 2), and Matthew's correlation coefficient (MCC, Eq. 3). These metrics can be derived based on the four elements in the confusion matrix during binary classification, that is, the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
The sensitivity and specificity indicate the ability to identify druggable and less-druggable pockets, respectively. The MCC, which takes into account all four values, is a balanced metric for an uneven class distribution and an ideal measure for assessing the quality of a binary classifier. Since the aim is to achieve a good balance between precision and recall, the harmonic mean of precision and recall, also known as the F 1 score or F-measure, is used in the cross validated grid search for the optimal hyperparameters (Eq. 4).
Interpretation and visualization of the druggability prediction mod- Model testing using a trajectory from an L-RIP simulation To assess the ability of our models to monitor druggability in an MD simulation, we used an enhanced sampling MD method called L-RIP 33 (Rotamerically Induced Perturbation 34 with Langevin dynamics) to sample different conformations of the active site of p38 mitogenactivated protein (MAP) kinase and the cryptic pocket of TEM-1 β-lactamase. The p38 MAP kinase is an intracellular signaling protein involved in cytokine synthesis and is an important target for the treatment of osteoarthritis and inflammation. β-lactamase is a hydrolase that breaks the β-lactam ring, leading to resistance to β-lactam antibiotics among bacteria. To study how transient pocket formation in these two example proteins impacts druggability, L-RIP perturbation was applied with 300 perturbation pulses and 300 implicit solvent MD timesteps of 0.002 ps after each perturbation. 33 The last frame of each perturbation step was stored for further analysis. The remaining parameters for applying L-RIP followed the default settings on the TRAPP webserver. 33,35 Before druggability analysis, all snapshots in the trajectory were superimposed by fitting all binding site heavy atoms that were within 4 A of the centre of any ligand atom.

Results
Three global descriptors -pocket volume, hydrophobicity and the number of H-bond donors -are the most important for predicting druggability using linear models Before training the models, we first inspected the data sets by plotting the global descriptors in a bar chart and a scatter-plot matrix (Figure 2A  On the other hand, the scatter-plot matrix allowed us to detect correlations between pairs of global features ( Figure 2B). First, we observed a linear correlation between the pocket volume and some of the physicochemical properties, such as the numbers of H-bond donors and acceptors and the hydrophobicity. This correlation reflects the fact that the computation of a physicochemical property is basically a summation of certain atom types located within the pocket and thus the value is proportional to the pocket volume. We therefore included the normalization of each physicochemical property to the pocket volume as one of the hyperparameters (parameter values: "yes"/"no") to be tuned during optimization. It was noticeable that the scatter plots for hydrophobicity/positively charged or hydrophobicity/number of H-bond donors show some linear separability between bindable and less bindable pockets, indicating the importance of these global features. In contrast, the most severe overlaps between the positive and negative data for bindability occur for the scatter-plots for the pocket exposure, negatively charged, and aromaticity features. The red and blue dots represent bindable and less-bindable binding pockets, respectively. The corresponding plot for the NRDLD set is given in Figure S1 To construct linear models using LR or a linear SVM, the global features extracted from the TRAPP procedure were used as input for classification. Both methods are simple, fast, and ideal for binary classification, and less prone to over-fitting for small datasets than more  (Table S3). Geometric descriptors have been widely used for druggability prediction, for example, the pocket depth or buriedness has been shown to have a positive contribution. 11,14,15,36 However, we find that the coefficient of pocket exposure is relatively small. This difference may arise because some of the previous models were built using only a few simple input features, thus overestimating the importance of pocket shape. 15,36 The use of a different druggability training set may also affect the coefficients of the final model. 11 Another possible reason is that the ratio between solvent-exposed and protein-exposed surface area is not an ideal shape descriptor.
In PockDrug, 14 the longest distance in the pocket or the radius of the smallest enclosing sphere were used instead.
The coefficients of the TRAPP-LR models with L1 or L2 regularization were overall similar. L1 regularization penalizes the absolute value of the weights instead of the squared value penalized in L2 regularization. As the L1 penalty induces sparsity, the weights corresponding to the negatively charged descriptor and aromaticity vanished, indicating that these two properties are less important for druggability prediction. This is as expected because of the severe overlap of the druggable and less-druggable classes in these two feature dimensions   The LRP results of TRAPP-CNN can be visualized as meshes on the structure of a binding pocket in order to identify critical sub-regions of the pocket for its bindability.
The positive and negative evidence are shown by red and blue regions, respectively. In general, we observe that most of the evidence is located close to or within the pocket mesh, demonstrating that the cavity grid generated from TRAPP-pocket is able to guide the model to the region of interest. On the other hand, for pockets predicted to be in the positive class (druggable), mainly positive evidence but not negative evidence is obtained and vice versa. This phenomenon is especially prominent for the pockets with predictions of very high confidence, that is, druggability scores close to 0 or 1. As pixel-wise decomposition distributes the model raw output onto the input grid points, a confident prediction is associated with a large difference between the output of the two classes. Therefore, the evidence corresponding to the class opposite to the prediction should be negligible. indicating that the cavity volume has the largest contribution to the positive (druggable) prediction. A closer examination of the binding site residues reveals that the positive evidence tends to be surrounded by hydrophobic residues. Thus, hydrophobicity is a critical feature for enabling TRAPP-CNN to discriminate druggable pockets from less druggable ones.
The negative examples shown are the α-kinase domain of myosin-II heavy chain kinase A and fumarylacetoacetate hydrolase (FAH). The α-kinases (PDB ID:4zme, Figure 5C) are serine/threonine protein kinases that exhibit no sequence similarity with conventional eukaryotic protein kinases. 37 The active site of α-kinase is, compared to that of p38 MAP kinase, more solvent-exposed and less hydrophobic according to the heatmaps. Two regions with negative evidence are observed. One of them is surrounded by positively charged residues, which corresponds to the large negative coefficients for the positively-  The TRAPP-based druggability prediction models show equivalent or slightly improved performance on a public test set The performance of the tuned models was first evaluated using the DaPB test set, which is the 1/5th of the DaPB dataset that was not used for training, see Table 2. As the negative data were labeled by the TRAPP-SVM model, its specificity is 1 and both accuracy and These data points in the convolved space, which is optimal for separating druggable from less-druggable pockets, are likely to be close and in the vicinity of the decision boundary.
As the negative data are slightly more numerous than the positive data, the model would be optimized more towards correct classification of the less-druggable class, leading to the higher specificity than sensitivity ( Figure S2A, S2B).
To compare with other existing druggability prediction tools, the performance of all three models was further evaluated on the NRDLD test set as shown in Table 2. The TRAPP-LR and TRAPP-SVM models perform equally to the other methods in terms of accuracy and MCC value. The TRAPP-CNN model, however, displays superior performance with only 2 misclassified positive data points. Note that the number of data points in the test set is low (37 instances), and thus the difference in performance arises from a difference in the misclassification of just two data points.  However, the low amount of positively charged groups in the pocket seems to increase the probability to be druggable and reverts the prediction.
We further applied the TRAPP-CNN model to these four examples that were misclassified by the TRAPP-LR model and visualized the evidence by pixel-wise decomposition with LRP ( Figure 6). For the complete NRDLD test set, 1owe and 1r55 are the only two misclassified cases when using the TRAPP-CNN model, whereas 1bmq and 1b74 are correctly predicted as less-druggable owing to the higher specificity of the TRAPP-CNN model. According to the structures and the cavity meshes, all four pockets are shallow and solvent-exposed.
In general, the evidence for less-druggable pockets highlights the regions that are at the surface or outskirts of each cavity, close to some hydrophobic residues. In comparison to many druggable pockets characterized by a well defined hydrophobic environment, these hydrophobic residues tend to be situated on flexible loop structures and are not clustered together in a confined region; thus they are unable to form a hydrophobic patch for ligand binding that would favour a druggable pocket. In 1owe, the shallow regions, as well as some positively charged residues, are identified as evidence for low druggability. The cavity in 1r55 is situated on the protein surface and the bound ligand is barely buried. As the original pixelwise decomposition result for 1r55 only emphasizes the metal ion, as discussed previously, the procedure was repeated using the metal-ion free structure as input. The less-druggable evidence also points out the shallow region where the ligand binds, as well as two other sub-regions which may be noise for the druggability prediction. Similar to the previous two cases, the shallow regions or dynamic hydrophobic residues close to the exterior of the pocket are highlighted with less-druggable evidence in 1bmq and 1b74. The cavity centers of 1bmq and 1b74 are surrounded by positively charged and hydrogen bonding groups, respectively, which are not characteristics of druggable pockets. Even though 1b74 is correctly predicted as less druggable by the TRAPP-CNN model and has the lowest druggability score among the four examples, the impact of the hydrogen bonding groups is unexplained by LRP.

More druggable conformations of transient pockets are identified using the TRAPP procedure
To demonstrate how conformational changes impact druggability, we applied our druggability prediction models to two systems containing cryptic binding sites. [38][39][40] It has been reported that appropriately sized pockets of the biologically relevant drug targets are necessary for the strong binding of drug-sized ligands. 41,42 For the target proteins that are considered tractable but less-druggable, it is of interest to identify their cryptic sites and to exploit those sites to boost the druggability of the drug targets. 41 The first example considered is the p38 MAP kinase. It has a deep ATP-binding pocket  To train the linear models for druggability prediction based on the global descriptors, namely TRAPP-LR and TRAPP-SVM, the NRDLD dataset 10 was used. This data set is a consensus data set that has been used for training and validation of state-of-the-art druggability prediction models. To optimize the grid settings, the preprocessing steps, and the hyperparameters for the models together, a pipeline, consisting of preprocessing, standard scaling of the input vectors, and then the LR or SVM model for binary classification, was constructed and a cross-validated grid search was performed. For the final TRAPP-LR and TRAPP-SVM models, which achieve a top mean and a relatively small variance of the crossvalidated score in the grid search, a grid edge length of 24Å, a grid spacing of 0.75Å, and a hyperparameter C of 1 was used.
The parameters in the trained linear models (Figure 3) indicate the importance of each global descriptor for druggability. The negatively charged and aromatic descriptors are the least important features with small or zero weights in L2 or L1 regularized TRAPP-LR. One of the most intriguing results is that the H-bond donor property has the largest absolute value for its weight, larger than for pocket volume and hydrophobicity, which occupy the second and third places, demonstrating the strong (but negative) contribution of the H-bond donor property to druggability prediction.
TRAPP-CNN, which has detailed grids as inputs and a large amount of adaptive weights, requires a larger training set than TRAPP-LR/SVM. Thus, we augmented the DaPB dataset from the PDBbind refined set 2017, 25 in which the positive class is selected by thresholding the properties and the binding affinity of the bound ligand while the negative class was identified using TRAPP-SVM. By projecting the data on to the global descriptors, similar trends were observed as in the NRDLD dataset ( Figure 2), indicating that the DaPB dataset is a reasonable surrogate dataset. To overcome overfitting, data augmentations with offline 90 • rotation and online random translation were applied to the training data. The network architecture and other hyperparameters were optimized through a cross-validated grid search.
The final network contained 3 convolutional layers and 2 fully-connected layers ( Figure 4).
The trained models were evaluated on the DaPB test set and the NRDLD test set, both of which are subsets of the dataset that were not used for training. Starting from the performance on the DaPB test set (Table 2), the high specificity was as expected since the parameters of TRAPP-LR are similar to those of TRAPP-SVM, which is responsible for the labeling of the negative class. However, the sensitivity of TRAPP-LR on the DaPB dataset was rather low. This could be due to the fact that the definition of druggable pockets in the NRDLD set is stricter than the definition of druggable (bindable) pockets in the DaPB set. The performance of TRAPP-CNN is close to that of TRAPP-LR, with higher specificity than sensitivity, potentially due to the slightly imbalanced dataset which makes the TRAPP-CNN model favor the negative class. Moreover, all TRAPP-based models have equivalent or superior performance on the NRDLD test set in comparison to the other druggability prediction tools. Both linear models misjudged the same 2 positive and 2 negative data points, whereas the TRAPP-CNN model, which is more precise in detecting the negative class, only missed the 2 positive data points. Thus, the TRAPP-CNN model, although trained on the DaPB bindability dataset, is suitable for druggability prediction.
Apart from the good performance in terms of druggability prediction, a major advantage of our models is the explanatory visualization for model prediction ( Figure 5). The contribution of each global descriptor to the druggability prediction from TRAPP-LR is color-coded, which allows easy reasoning of the model output. Furthermore, TRAPP-CNN, which is the first druggability assessment model using the grid representation with detailed spatial information as the input, allows visualization of the regions contributing to the model prediction.
We observe that for druggable pockets, TRAPP-CNN mainly identifies extensive hydrophobic patches in the druggable pockets, whereas for less-druggable pockets, it highlights the shallow regions, positively charged groups, or metal ions.
The druggability prediction was integrated as a subroutine in TRAPP-pocket and L-RIP generated trajectories for P38 MAP kinase and β-lactamase were analyzed as examples.
As the variation of the druggability score is only sensitive when it is close to the decision boundary, we displayed the pre-squashed output from TRAPP-LR and TRAPP-CNN for druggability. The two models show consistent predictions upon conformational changes of the binding pockets. For p38 MAP kinase, the druggability rises as the sub-pocket becomes more connected to the main pocket, which is independent of the position of D168. On the other hand, the striking increase in druggability upon the opening of the cryptic pocket in βlactamase is an ideal example demonstrating the power of combining the TRAPP procedure and druggability prediction.
In summary, we have built three TRAPP-based druggability prediction models, TRAPP-LR, TRAPP-SVN and TRAPP-CNN, which allow the comparison of the druggability of different conformations of a protein binding pocket and the identification of critical sub-regions or properties of a pocket for the corresponding prediction. The druggability assessment is embedded into the pocket estimation procedure in TRAPP, thus providing an efficient platform for selecting protein binding pocket conformations based on druggability scores.