SAMbinder: A web server for predicting SAM binding residues of a protein from its amino acid sequence

Piyush Agrawal; Gaurav Mishra; Gajendra P. S. Raghava

doi:10.1101/625806

Abstract

Motivation S-adenosyl-L-methionine (SAM) is one of the important cofactor present in the biological system and play a key role in many diseases. There is a need to develop a method for predicting SAM binding sites in a protein for designing drugs against SAM associated disease. Best of our knowledge, there is no method that can predict the binding site of SAM in a given protein sequence.

Result This manuscript describes a method SAMbinder, developed for predicting SAM binding sites in a protein from its primary sequence. All models were trained, tested and evaluated on 145 SAM binding protein chains where no two chains have more than 40% sequence similarity. Firstly, models were developed using different machine learning techniques on a balanced dataset contain 2188 SAM interacting and an equal number of non-interacting residues. Our Random Forest based model developed using binary profile feature got maximum MCC 0.42 with AUROC 0.79 on the validation dataset. The performance of our models improved significantly from MCC 0.42 to 0.61, when evolutionary information in the form of PSSM profile is used as a feature. We also developed models on realistic dataset contains 2188 SAM interacting and 40029 non-interacting residues and got maximum MCC 0.61 with AUROC of 0.89. In order to evaluate the performance of our models, we used internal as well as external cross-validation technique.

Availability and implementation https://webs.iiitd.edu.in/raghava/sambinder/.

Introduction

Structural and functional annotation of a protein is one among the major challenges in the era of genomics. With the rapid advancement in sequencing technologies and concerted genome projects, there is an increasing gap between the sequenced protein and functionally annotated proteins which are very few in number (Casari et al., 1995; Yu et al., 2014). Therefore, there is an urgent need for the development of automated computational methods which can identify the residues playing an important role in protein functions. Protein-ligand interaction has been identified as one of the important functions which play a vital function in all biological processes (Agrawal, Singh, et al., 2019). In the past, considerable efforts have been to develop tools which can identify the ligand binding residues in a protein (Sousa et al., 2006). In the early stage, generalized methods have been developed which predicts the binding site or pockets in the proteins regardless of their ligand (Hendlich et al., 1997; Dundas et al., 2006; Laskowski, 1995; Levitt and Banaszak, 1992; Le Guilloux et al., 2009). Later on, it was realized that all ligands are not the same and there is a wide variation in shape and size of binding pockets. Therefore, researchers started developing ligand specific methods (Chauhan, Mishra, and G. P. S. Raghava, 2009; Hu et al., 2018; Yu, Hu, Huang, et al., 2013; Chen et al., 2012; Chauhan et al., 2010; Hu et al., 2016), and it was observed that these ligand specific methods performed better than generalized methods (Yu, Hu, Yang, et al., 2013a; Chen et al., 2012; Hu et al., 2016).

All living organism consists of small molecular weight ligands or cofactors which carries out an important function in some metabolic and regulatory pathways. S-adenosyl-L-methionine (SAM) is one such important cofactor, which was first discovered in the year 1952. After ATP, SAM is the second most versatile and widely used small molecule (Cantoni, 1975). It is a natural substance present in the cells of the body and is a direct metabolite of L-methionine which is an essential amino acid. SAM is a conjugate molecule of two ubiquitous biological compounds; (i) adenosine moiety of ATP and (ii) amino acid methionine (CATONI, 1953; Waddell et al., 2000). One of the most important functions of the SAM is the transfer or donation of different chemical groups such as methyl (Thomas et al., 2004; Wuosmaa and Hager, 1990), aminopropyl (Lin, 2011), ribosyl (Kozbial and Mushegian, 2005), 5’deocxyadenosyl and methylene group (Gana et al., 2013; Kozbial and Mushegian, 2005) for carrying out covalent modification of a variety of substrates. SAM is also used as a precursor molecule in the biosynthesis of nicotinamide phytosiderophores, plant hormone ethylene, spermine, and spermidine; carry out chemical reactions such as hydroxylation, fluorination which takes place in bacteria (Cadicamo et al., 2004). It has become the choice of various clinical studies and possess therapeutic value for treating diseases like osteoarthritis (Najm et al., 2004), cancer (Chaib et al., 2011; Wagner et al., 2010), epilepsy (Item et al., 2004), Alzheimer’s (Borroni et al., 2004), dementia and depression (Bottiglieri et al., 1990; Rosenbaum et al., 1990), Parkinson (Zhu, 2004), and other psychiatric and neurological disorders (Bottiglieri, 1997). In the previous studies, it has been shown that mutation in the binding site of SAM has changed the protein function. For example, Aktas et. al. showed that Alanine substitution in the predicted SAM binding residues reduced the SAM binding affinity and enzyme activity dramatically (Aktas et al., 2011). Thus, there is a need to develop a method that can predict SAM binding sites in a protein as it is an important ligand.

Material and Methods

Dataset Creation

In the first step, we extracted 244 SAM binding proteins PDB IDs from ccPDB 2.0 (Agrawal, Patiyal, et al., 2019). In total, we obtained 457 SAM binding protein chains. In the next step, we filtered all the sequences with 40% sequence similarity using CD-HIT software (Huang et al., 2010) and resolution better than 3 Å and obtained total of 145 protein chains. In the next step, we run Ligand Protein Contact (LPC) software (Sobolev et al., 1999) on the chains to extract the contact information of SAM with residues present in the protein chains with cutoff criteria of 4 Å, which means if the distance of contact of SAM with a residue is less than or equal to 4 Å, the residue is called interacting otherwise the residue is called as non-interacting. This is well-established criteria adopted in many previous studies (Chauhan et al., 2010; Mishra and Raghava, 2010).

Additional dataset creation

An additional dataset was also created consisting of the SAM binding chains released after March 2018 and up to February 28, 2019. The dataset was generated following the same procedures as mentioned above and comprised of total 10 chains.

Internal and External Validation

Dataset was randomly divided into two parts; (i) training dataset, which comprises 80% of the protein chains and (ii) validation dataset, which consists of remaining 20% of the protein chains. Datasets were created at the protein level and not at pattern or residue level since in previous studies it has been shown that dataset generated at pattern level is biased and show higher performance (Yu et al., 2014). The balanced dataset contains 1798 SAM interacting and non-interacting residue in training dataset and 390 SAM interacting and non-interacting residue in the validation dataset. The realistic dataset consists of 1798 SAM interacting residues and 33314 SAM non-interacting residues in training dataset whereas validation dataset consists of 390 SAM interacting residues and 6715 SAM non-interacting residues.

Five-fold cross-validation

The five-fold cross technique was performed for evaluating the performance of different prediction models. This kind of performance evaluation has been used in many previous studies (Nagpal et al., 2018; Kumar et al., 2018).

Window or Pattern size

We created overlapping patterns of each sequence of different window size ranging from 5-23 amino acid length. If the pattern central residue is SAM interacting, it is assigned as a positive pattern; otherwise, it was assigned as a negative pattern. In order to generate the pattern for terminus residues, we added (L-1)/2 dummy residue ‘X’ at both the termini of the protein chain (where L is pattern length).

Binary Profile

We generated binary profile of each patterns by assigning binary values to the amino acids in fixed length pattern. A vector of dimension 21 represented each amino acid present in the pattern hence leading to final vector of N x 21, where N is length of the pattern. For example, residue ‘A’ was represented by 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; which contains 20 amino acids and one dummy amino acid ‘X’. X was represented by 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 (Agrawal, Kumar, et al., 2019; Agrawal and Raghava, 2018).

Position Specific Scoring Matrix (PSSM)

PSSM profiles containing evolutionary information has been shown as an important feature in many previous studies for predicting protein-ligand interaction and other bioinformatics problems (Yu, Hu, Huang, et al., 2013; Chen et al., 2012). PSSM profiles of a sequence were generated using Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST) and searching against the Swiss Prot database. Three iterations were performed with E-value cut-off of 0.001 against each sequence. The original PSSM profiles obtained were further normalized to get value in between 0 and 1, followed by calculation of position specific score for each residue. The final matrix obtained consists of 21× N elements (20 amino acids residue and one dummy residue ‘X’). Here N, is the length of the pattern.

Machine Learning Techniques

We implemented the python based machine learning package SCIKIT-learn (Pedregosa FABIANPEDREGOSA et al., 2011) for developing prediction models. We implemented Support Vector Classifier (SVC), Random Forest classifier (RFC), ExtraTree classifier (ETC), K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP) and Ridge classifier for developing prediction models. Before, developing prediction models, we optimized different parameters on our internal dataset using Grid Search parameter present in the package.

Evaluation Parameters

Performance of developed prediction models was evaluated in terms of Sensitivity (Sen), Specificity (Spc), Accuracy (Acc), Matthew’s Correlation Coefficient (MCC) and Area Under Receiver Operating Characteristics (AUROC). ‘pROC package’ implemented in R was used for computing AUROC (Title Display and Analyze ROC Curves, 2019). The formula for calculating is explained in the equation 1-4. Where TP represents correctly predicted positive value, TN represents the correctly predicted negative value, FP represents the actual negative value which has been wrongly predicted as positive, and FN represents the positive value which has been wrongly predicted as negative.

Results

Composition Analysis

We have analyzed the amino acid composition of SAM interacting and non-interacting residues in SAM binding proteins. As shown in Figure 1, composition of residues C, D, F, G, H, M, N, S, W, and Y are higher in SAM interacting sites whereas composition of residues like A, E, I, K, L, P, Q, R, and V are higher in SAM non-interacting sites.

Figure 1.

Percentage composition of SAM interacting and non-interacting residues.

Propensity Analysis

We also analyzed the propensities of amino acid residues in SAM interacting and non-interacting sites. We observed that propensities of residues like C, D, F, G, H, M, N, S, W, and Y was higher in SAM interacting sites whereas propensities of residues like A, E, I, K, L, P, Q, R, T, and V was higher in SAM non-interacting sites (Supplementary Figure S1).

Physiochemical Properties Analysis

We also analyzed the various physiochemical properties in SAM binding proteins. We found that SAM interacting sites are rich in acidic, small, polar and aromatic amino acids whereas SAM non-interacting sites are more predominant in charge, basic, and aliphatic amino acids. (Supplementary Figure S2).

Machine learning model performance using binary patterns

Various machine learning models were developed using binary patterns for window size 5-23 on the balanced dataset. We compiled the best result obtained for each window size in Table 1 and plot the AUROC obtained for both training and validation dataset as shown in Figure 2(a) and Figure 2(b) respectively. In our analysis, we found that prediction model developed using random forest on the window size 21 performed best among all the prediction models. The model achieved accuracy of 70.79%, 0.42 MCC and 0.78 AUROC on the training dataset and accuracy of 70.85%, 0.42 MCC and 0.79 AUROC on the validation dataset. Detail result obtained for each window size by different machine learning techniques is provided in the Supplementary Table S1-S10.

View this table:

Table 1.

The performance of best machine learning model developed using amino acid sequence (binary pattern) for individual window size on balanced dataset.

Figure 2.

AUROC plots obtained for various window length developed using binary profile on balanced dataset for (a) training dataset and (b) validation dataset.

Machine learning model performance using evolutionary information (PSSM profile)

Prediction models were developed using PSSM profiles for all the considered window size on the balanced dataset. Best result obtained for each window size is compiled in Table 2 and AUROC was plotted for the training (Figure 3(a)) and validation dataset (Figure 3(b)). We observed that, in case of PSSM profiles, the performance of the prediction models were increased. ExtraTree Classifier model developed on the window size 17 performed best among all the developed models. It achieved the highest accuracy of 80.39%, MCC of 0.61 and AUROC of 0.88 on training dataset whereas on validation dataset it achieved accuracy of 77.07%, MCC of 0.54, and AUROC of 0.86. Result obtained by different classifiers on each window size has been provided in the Supplementary Table S11-S20.

View this table:

Table 2.

The performance of best machine learning model developed using PSSM profile for individual window size on balanced dataset.

Figure 3.

AUROC plots obtained for various window length developed using evolutionary profile on balanced dataset for (a) training dataset and (b) validation dataset.

Machine learning model performance using hybrid feature

We also developed models on the hybrid feature where we sum up the values of binary profile and the evolutionary information obtained for the residue. Best result obtained for each window length is compiled in the Supplementary Table S21 and plotted AUROC for the training dataset (Supplementary Figure S3 (a)) and validation dataset (Supplementary Figure S3 (b)). Maximum accuracy of 80.58%, MCC of 0.61 and AUROC of 0.89 was obtained by SVC on the training dataset for window size 19. In case of validation dataset, accuracy of 78.50%, MCC of 0.57 and AUROC of 0.87 was obtained. Result for all the window size obtained by different classifiers is provided in the Supplementary Table S22-S31.

Performance of the machine learning models developed on the realistic dataset

Window size 17 was found to be optimum window size as the model developed using PSSM profile performed best among all the models. Therefore, we used this window size for developing prediction models on the realistic dataset using PSSM profile as an input feature. When balanced specificity and sensitivity was considered SVC based model achieved maximum MCC value of 0.32 on training dataset and 0.31 on independent dataset. However, MCC value increases to 0.61 on training dataset and 0.52 on validation dataset when balanced sensitivity and specificity was not taken into account (Table 3). The AUROC achieved on the training dataset and validation dataset was 0.89 and 0.87 respectively (Figure 4).

View this table:

Table 3.

The performance of PSSM profile based models developed using different machine learning techniques for window size 17 on realistic dataset.

Figure 4.

AUROC plots obtained for window length 17 developed using evolutionary profile on realistic dataset for (a) training dataset and (b) validation dataset.

Performance on additional dataset

We also checked the performance of our best model developed on realistic dataset using PSSM profile and window size 17 on an additional dataset. As shown in Supplementary Table S32, model developed using SVC classifier achieved the best performance with accuracy of 82.27%, and 0.33 MCC with balanced sensitivity and specificity. However the maximum accuracy of 97.14% and MCC of 0.62 was obtained when no balanced sensitivity and specificity were considered. The AUROC obtained was 0.88.

Implementation of Model in Web server

In order to help biologist for predicting SAM interacting residues, we implemented our best models in a web server “SAMbinder”. The web server consists of several modules such as “Sequence”, “PSSM Profile”, “Peptide Mapping”, “Standalone” and “Download”. These modules have been explained below in detail.

(i) Sequence

This module allows users to predict SAM interacting residue in a protein from its primary sequence. A user can submit either single or multiple sequences or upload the sequence file in the FASTA format and can select the desired probability cut-off and machine learning classifier for prediction. The module utilizes the binary profile as an input feature and number of machine learning models has been implemented into it. The classifier provides the prediction score which is normalized in between propensity score 0-9. Residues having the propensity score equal or above the selected cut-off threshold are highlighted in blue colour and remaining residues are highlighted in black colour. Blue colour indicates that the probability of these residues in SAM binding is high in comparison to the residues present in black colour. The result is downloadable in the “csv” file format and will be sent to email also if the user has provided the email.

(ii) PSSM Profile

As the name suggests, this module utilizes the PSSM profile as an input feature for predicting SAM interacting residues in a given protein sequence. This feature is better than the binary profile however the only limitation is that it is very computer intensive. Therefore, a user can use this module if the number of the sequences are very few. The output is provided in the same format as Sequence module provides. For doing the prediction for multiple sequences using PSSM profile, we suggest user to use the standalone version of the software.

(iii) Peptide Mapping

In this module, we have provided the facility where a user can map the peptide that contains SAM interacting central residue. We pre-computed propensity (between 0-9) of each tri and penta-peptides which contains SAM interacting central residues from known PDB protein structure. The propensity was computed using all SAM interacting protein chains i.e. redundancy was not removed, in order to avoid loss of information. Once a user submits sequence in FASTA format, all the possible segment of selected length is generated and mapped on the protein sequence along with the propensity score. Based on that mapping server predicts whether the peptide segment is SAM interacting or non-interacting. If propensity of residue is equal to greater than the selected threshold, it is known as SAM interacting residue.

Standalone

Standalone of SAMbinder is Python-based and is available at the Github site. The user can download it from the site https://github.com/raghavagps/sambinder/. SAMbinder standalone version is also implemented in the docker technology. Complete usage of downloading the image and its implementation is provided in the docker manual “GPSRdocker” which can be downloaded for the website https://webs.iiitd.edu.in/gpsrdocker/.

Discussion

SAM is one of the important essential metabolic cofactor/intermediates which is found in almost every cellular life forms and enzymes. SAM binding proteins are predominant in two major types of folds; (i) Rossman fold and TIM barrel fold and different motifs (Motif I-VI) (Gana et al., 2013). SAM binding proteins play a vital role in many metabolic and regulatory pathways in almost all form of living organism and acts as a potential drug target in a number of diseases. In Europe, SAMe is used as drug for treating diseases like liver disorder, depression, fibromyalgia and osteoarthritis. It has also been used as dietary supplements in United States for supporting the bones and joints. Therefore, it is very important to predict the SAM interacting residues in a given protein. We analysed various properties of SAM interacting protein chains such as composition, propensity and physiochemical properties and developed various machine learning models for predicting SAM interacting residue in new protein using number of input features. The models were first developed on the balanced dataset and different window sizes. We observed that model developed using PSSM profile and window size 17 performed best among all the models. Performance of the models was also validated on an independent dataset and an additional dataset. Python-based machine learning package scikit-learn was implemented for developing the prediction models. In order to assist the scientific community, we have created a python-based standalone version of our software and also developed a web server where a user can predict the SAM interacting residues in the target protein. The server can be freely accessible at http://webs.iiitd.edu.in/raghava/sambinder. Complete workflow of SAMbinder is shown in figure 5.

Figure 5.

Architecture of SAMbinder.

Conflict of Interest Statement

The authors declare that they have no conflict of interest.

Author’s Contribution

PA collected and compiled the datasets. PA performed the experiments. PA and GPSR developed the web interface. PA and GM developed the standalone software. PA, and GPSR analysed the data and prepared the manuscript. GPSR conceived the idea and coordinated the project. All authors read and approved the final paper.

Funding Information

This work was supported by J.C. Bose National Fellowship, Department of Science and Technology, Govt. of India.

Acknowledgement

Authors are thankful to J.C. Bose National Fellowship, Department of Science and Technology (DST), Government of India, and DST-INSPIRE for fellowships and the financial support.

Footnotes

↵Emails of Authors: PA: piyush_11{at}imtech.res.in, GM: gm213{at}snu.edu.in, GPSR: raghava{at}iiitd.ac.in

References

↵
Agrawal, P., Singh, H., et al. (2019) Benchmarking of different molecular docking methods for protein-peptide docking. BMC Bioinformatics, 19, 426.
OpenUrl
↵
Agrawal, P., Patiyal, S., et al. (2019) ccPDB 2.0: an updated version of datasets created and compiled from Protein Data Bank. Database (Oxford)., 2019.
↵
Agrawal, P., Kumar, S., et al. (2019) NeuroPIpred: a tool to predict, design and scan insect neuropeptides. Sci. Rep., 9, 5129.
OpenUrl
↵
Agrawal, P. and Raghava, G.P.S. (2018) Prediction of Antimicrobial Potential of a Chemically Modified Peptide From Its Tertiary Structure. Front. Microbiol., 9, 2551.
OpenUrl
↵
Aktas, M. et al. (2011) S-adenosylmethionine-binding properties of a bacterial phospholipid N-methyltransferase. J. Bacteriol., 193, 3473–81.
OpenUrl Abstract/FREE Full Text
↵
Borroni, B. et al. (2004) Catechol-O-methyltransferase gene polymorphism is associated with risk of psychosis in Alzheimer Disease. Neurosci. Lett., 370, 127–9.
OpenUrl CrossRef PubMed Web of Science
↵
Bottiglieri, T. (1997) Ademetionine (S-adenosylmethionine) neuropharmacology: implications for drug therapies in psychiatric and neurological disorders. Expert Opin. Investig. Drugs, 6, 417–426.
OpenUrl PubMed
↵
Bottiglieri, T. et al. (1990) Cerebrospinal fluid S-adenosylmethionine in depression and dementia: effects of treatment with parenteral and oral S-adenosylmethionine. J. Neurol. Neurosurg. Psychiatry, 53, 1096–8.
OpenUrl Abstract/FREE Full Text
↵
Cadicamo, C.D. et al. (2004) Enzymatic fluorination in Streptomyces cattleya takes place with an inversion of configuration consistent with an SN2 reaction mechanism. Chembiochem, 5, 685–90.
OpenUrl PubMed
↵
Cantoni, G.L. (1975) Biological methylation: selected aspects. Annu. Rev. Biochem., 44, 435–51.
OpenUrl CrossRef PubMed Web of Science
↵
Casari, G. et al. (1995) A method to predict functional residues in proteins. Nat. Struct. Biol., 2, 171–8.
OpenUrl CrossRef PubMed Web of Science
↵
Catoni, G.L. (1953) S-Adenosylmethionine; a new intermediate formed enzymatically from L-methionine and adenosinetriphosphate. J. Biol. Chem., 204, 403–16.
OpenUrl FREE Full Text
↵
Chaib, H. et al. (2011) [Histone methyltransferases: a new class of therapeutic targets in cancer treatment?]. Med. Sci. (Paris)., 27, 725–32.
OpenUrl
↵
Chauhan, J.S., Mishra, N.K., and Raghava, G.P. (2009) Identification of ATP binding residues of a protein from its primary sequence. BMC Bioinformatics, 10, 434.
OpenUrl CrossRef PubMed
↵
Chauhan, J.S. et al. (2010) Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. BMC Bioinformatics, 11, 301.
OpenUrl CrossRef PubMed
↵
Chen, K. et al. (2012) Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics, 28, 331–41.
OpenUrl CrossRef PubMed Web of Science
↵
Dundas, J. et al. (2006) CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res., 34, W116–8.
OpenUrl CrossRef PubMed Web of Science
↵
Gana, R. et al. (2013) Structural and functional studies of S-adenosyl-L-methionine binding proteins: a ligand-centric approach. BMC Struct. Biol., 13, 6.
OpenUrl CrossRef PubMed
↵
Le Guilloux, V. et al. (2009) Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics, 10, 168.
OpenUrl CrossRef PubMed
↵
Hendlich, M. et al. (1997) LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graph. Model., 15, 359–63, 389.
OpenUrl CrossRef PubMed Web of Science
↵
Hu, J. et al. (2018) ATPbind: Accurate Protein-ATP Binding Site Prediction by Combining Sequence-Profiling and Structure-Based Comparisons. J. Chem. Inf. Model., 58, 501– 10.
OpenUrl
↵
Hu, X. et al. (2016) Recognizing metal and acid radical ion-binding sites by integrating ab initio modeling with template-based transferals. Bioinformatics, 32, 3260–3269.
OpenUrl CrossRef PubMed
↵
Huang, Y. et al. (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics, 26, 680–2.
OpenUrl CrossRef PubMed Web of Science
↵
Item, C.B. et al. (2004) Characterization of seven novel mutations in seven patients with GAMT deficiency. Hum. Mutat., 23, 524.
OpenUrl PubMed
↵
Kozbial, P.Z. and Mushegian, A.R. (2005) Natural history of S-adenosylmethionine-binding proteins. BMC Struct. Biol., 5, 19.
OpenUrl CrossRef PubMed
↵
Kumar, V. et al. (2018) Prediction of Cell-Penetrating Potential of Modified Peptides Containing Natural and Chemically Modified Residues. Front. Microbiol., 9, 725.
OpenUrl
↵
Laskowski, R.A. (1995) SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph., 13, 323–30, 307–8.
OpenUrl CrossRef PubMed Web of Science
↵
Levitt, D.G. and Banaszak, L.J. (1992) POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J. Mol. Graph., 10, 229–34.
OpenUrl CrossRef PubMed Web of Science
↵
Lin, H. (2011) S-Adenosylmethionine-dependent alkylation reactions: when are radical reactions used? Bioorg. Chem., 39, 161–70.
OpenUrl CrossRef PubMed
↵
Mishra, N.K. and Raghava, G.P.S. (2010) Prediction of FAD interacting residues in a protein from its primary sequence using evolutionary information. BMC Bioinformatics, 11 Suppl 1, S48.
OpenUrl
↵
Nagpal, G. et al. (2018) Computer-aided prediction of antigen presenting cell modulators for designing peptide-based vaccine adjuvants. J. Transl. Med., 16, 181.
OpenUrl
↵
Najm, W.I. et al. (2004) S-adenosyl methionine (SAMe) versus celecoxib for the treatment of osteoarthritis symptoms: a double-blind cross-over trial. [ISRCTN36233495]. BMC Musculoskelet. Disord., 5, 6.
OpenUrl CrossRef PubMed
↵
Pedregosa FABIANPEDREGOSA, F. et al. (2011) Scikit-learn: Machine Learning in Python Gaël Varoquaux Bertrand Thirion Vincent Dubourg Alexandre Passos PEDREGOSA, VAROQUAUX, GRAMFORT ET AL. Matthieu Perrot.
↵
Rosenbaum, J.F. et al. (1990) The antidepressant potential of oral S-adenosyl-l-methionine. Acta Psychiatr. Scand., 81, 432–6.
OpenUrl CrossRef PubMed Web of Science
↵
Sobolev, V. et al. (1999) Automated analysis of interatomic contacts in proteins. Bioinformatics, 15, 327–32.
OpenUrl CrossRef PubMed Web of Science
↵
Sousa, S.F. et al. (2006) Protein-ligand docking: current status and future challenges. Proteins, 65, 15–26.
OpenUrl CrossRef PubMed Web of Science
↵
Thomas, D.J. et al. (2004) Elucidating the pathway for arsenic methylation. Toxicol. Appl. Pharmacol., 198, 319–26.
OpenUrl CrossRef PubMed Web of Science
↵
Title Display and Analyze ROC Curves (2019).
↵
Waddell, T.G. et al. (2000) Prebiotic methylation and the evolution of methyl transfer reactions in living cells. Orig. Life Evol. Biosph., 30, 539–48.
OpenUrl PubMed
↵
Wagner, J.M. et al. (2010) Histone deacetylase (HDAC) inhibitors in recent clinical trials for cancer therapy. Clin. Epigenetics, 1, 117–136.
OpenUrl CrossRef PubMed
↵
Wuosmaa, A.M. and Hager, L.P. (1990) Methyl chloride transferase: a carbocation route for biosynthesis of halometabolites. Science, 249, 160–2.
OpenUrl Abstract/FREE Full Text
↵
Yu, D.-J., Hu, J., Yang, J., et al. (2013a) Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering. IEEE/ACM Trans. Comput. Biol. Bioinforma., 10, 994–1008.
OpenUrl
Yu, D.-J., Hu, J., Yang, J., et al. (2013b) Designing Template-Free Predictor for Targeting Protein-Ligand Binding Sites with Classifier Ensemble and Spatial Clustering. IEEE/ACM Trans. Comput. Biol. Bioinforma., 10, 994–1008.
OpenUrl
↵
Yu, D.-J. et al. (2014) Enhancing protein-vitamin binding residues prediction by multiple heterogeneous subspace SVMs ensemble. BMC Bioinformatics, 15, 297.
OpenUrl
Yu, D.-J., Hu, J., Huang, Y., et al. (2013) TargetATPsite: a template-free method for ATP-binding sites prediction with residue evolution image sparse representation and classifier ensemble. J. Comput. Chem., 34, 974–85.
OpenUrl
↵
Zhu, B.T. (2004) CNS dopamine oxidation and catechol-O-methyltransferase: importance in the etiology, pharmacotherapy, and dietary prevention of Parkinson’s disease. Int. J. Mol. Med., 13, 343–53.
OpenUrl PubMed Web of Science

View the discussion thread.

Posted May 02, 2019.

Download PDF

Supplementary Material

Citation Tools

Subject Area

Bioinformatics

Subject Areas

All Articles

Animal Behavior and Cognition (5215)
Biochemistry (11753)
Bioengineering (8752)
Bioinformatics (29201)
Biophysics (14974)
Cancer Biology (12100)
Cell Biology (17413)
Clinical Trials (138)
Developmental Biology (9422)
Ecology (14182)
Epidemiology (2067)
Evolutionary Biology (18309)
Genetics (12245)
Genomics (16804)
Immunology (11869)
Microbiology (28098)
Molecular Biology (11596)
Neuroscience (60975)
Paleontology (451)
Pathology (1871)
Pharmacology and Toxicology (3238)
Physiology (4959)
Plant Biology (10427)
Scientific Communication and Education (1683)
Synthetic Biology (2886)
Systems Biology (7340)
Zoology (1651)

[1] ↵
Agrawal, P., Singh, H., et al. (2019) Benchmarking of different molecular docking methods for protein-peptide docking. BMC Bioinformatics, 19, 426.
OpenUrl

[2] ↵
Agrawal, P., Patiyal, S., et al. (2019) ccPDB 2.0: an updated version of datasets created and compiled from Protein Data Bank. Database (Oxford)., 2019.

[3] ↵
Agrawal, P., Kumar, S., et al. (2019) NeuroPIpred: a tool to predict, design and scan insect neuropeptides. Sci. Rep., 9, 5129.
OpenUrl

[4] ↵
Agrawal, P. and Raghava, G.P.S. (2018) Prediction of Antimicrobial Potential of a Chemically Modified Peptide From Its Tertiary Structure. Front. Microbiol., 9, 2551.
OpenUrl

[5] ↵
Aktas, M. et al. (2011) S-adenosylmethionine-binding properties of a bacterial phospholipid N-methyltransferase. J. Bacteriol., 193, 3473–81.
OpenUrl Abstract/FREE Full Text

[6] ↵
Borroni, B. et al. (2004) Catechol-O-methyltransferase gene polymorphism is associated with risk of psychosis in Alzheimer Disease. Neurosci. Lett., 370, 127–9.
OpenUrl CrossRef PubMed Web of Science

[7] ↵
Bottiglieri, T. (1997) Ademetionine (S-adenosylmethionine) neuropharmacology: implications for drug therapies in psychiatric and neurological disorders. Expert Opin. Investig. Drugs, 6, 417–426.
OpenUrl PubMed

[8] ↵
Bottiglieri, T. et al. (1990) Cerebrospinal fluid S-adenosylmethionine in depression and dementia: effects of treatment with parenteral and oral S-adenosylmethionine. J. Neurol. Neurosurg. Psychiatry, 53, 1096–8.
OpenUrl Abstract/FREE Full Text

[9] ↵
Cadicamo, C.D. et al. (2004) Enzymatic fluorination in Streptomyces cattleya takes place with an inversion of configuration consistent with an SN2 reaction mechanism. Chembiochem, 5, 685–90.
OpenUrl PubMed

[10] ↵
Cantoni, G.L. (1975) Biological methylation: selected aspects. Annu. Rev. Biochem., 44, 435–51.
OpenUrl CrossRef PubMed Web of Science

[11] ↵
Casari, G. et al. (1995) A method to predict functional residues in proteins. Nat. Struct. Biol., 2, 171–8.
OpenUrl CrossRef PubMed Web of Science

[12] ↵
Catoni, G.L. (1953) S-Adenosylmethionine; a new intermediate formed enzymatically from L-methionine and adenosinetriphosphate. J. Biol. Chem., 204, 403–16.
OpenUrl FREE Full Text

[13] ↵
Chaib, H. et al. (2011) [Histone methyltransferases: a new class of therapeutic targets in cancer treatment?]. Med. Sci. (Paris)., 27, 725–32.
OpenUrl

[14] ↵
Chauhan, J.S., Mishra, N.K., and Raghava, G.P. (2009) Identification of ATP binding residues of a protein from its primary sequence. BMC Bioinformatics, 10, 434.
OpenUrl CrossRef PubMed

[15] ↵
Chauhan, J.S. et al. (2010) Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. BMC Bioinformatics, 11, 301.
OpenUrl CrossRef PubMed

[16] ↵
Chen, K. et al. (2012) Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics, 28, 331–41.
OpenUrl CrossRef PubMed Web of Science

[17] ↵
Dundas, J. et al. (2006) CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res., 34, W116–8.
OpenUrl CrossRef PubMed Web of Science

[18] ↵
Gana, R. et al. (2013) Structural and functional studies of S-adenosyl-L-methionine binding proteins: a ligand-centric approach. BMC Struct. Biol., 13, 6.
OpenUrl CrossRef PubMed

[19] ↵
Le Guilloux, V. et al. (2009) Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics, 10, 168.
OpenUrl CrossRef PubMed

[20] ↵
Hendlich, M. et al. (1997) LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graph. Model., 15, 359–63, 389.
OpenUrl CrossRef PubMed Web of Science

[21] ↵
Hu, J. et al. (2018) ATPbind: Accurate Protein-ATP Binding Site Prediction by Combining Sequence-Profiling and Structure-Based Comparisons. J. Chem. Inf. Model., 58, 501– 10.
OpenUrl

[22] ↵
Hu, X. et al. (2016) Recognizing metal and acid radical ion-binding sites by integrating ab initio modeling with template-based transferals. Bioinformatics, 32, 3260–3269.
OpenUrl CrossRef PubMed

[23] ↵
Huang, Y. et al. (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics, 26, 680–2.
OpenUrl CrossRef PubMed Web of Science

[24] ↵
Item, C.B. et al. (2004) Characterization of seven novel mutations in seven patients with GAMT deficiency. Hum. Mutat., 23, 524.
OpenUrl PubMed

[25] ↵
Kozbial, P.Z. and Mushegian, A.R. (2005) Natural history of S-adenosylmethionine-binding proteins. BMC Struct. Biol., 5, 19.
OpenUrl CrossRef PubMed

[26] ↵
Kumar, V. et al. (2018) Prediction of Cell-Penetrating Potential of Modified Peptides Containing Natural and Chemically Modified Residues. Front. Microbiol., 9, 725.
OpenUrl

[27] ↵
Laskowski, R.A. (1995) SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph., 13, 323–30, 307–8.
OpenUrl CrossRef PubMed Web of Science

[28] ↵
Levitt, D.G. and Banaszak, L.J. (1992) POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J. Mol. Graph., 10, 229–34.
OpenUrl CrossRef PubMed Web of Science

[29] ↵
Lin, H. (2011) S-Adenosylmethionine-dependent alkylation reactions: when are radical reactions used? Bioorg. Chem., 39, 161–70.
OpenUrl CrossRef PubMed

[30] ↵
Mishra, N.K. and Raghava, G.P.S. (2010) Prediction of FAD interacting residues in a protein from its primary sequence using evolutionary information. BMC Bioinformatics, 11 Suppl 1, S48.
OpenUrl

[31] ↵
Nagpal, G. et al. (2018) Computer-aided prediction of antigen presenting cell modulators for designing peptide-based vaccine adjuvants. J. Transl. Med., 16, 181.
OpenUrl

[32] ↵
Najm, W.I. et al. (2004) S-adenosyl methionine (SAMe) versus celecoxib for the treatment of osteoarthritis symptoms: a double-blind cross-over trial. [ISRCTN36233495]. BMC Musculoskelet. Disord., 5, 6.
OpenUrl CrossRef PubMed

[33] ↵
Pedregosa FABIANPEDREGOSA, F. et al. (2011) Scikit-learn: Machine Learning in Python Gaël Varoquaux Bertrand Thirion Vincent Dubourg Alexandre Passos PEDREGOSA, VAROQUAUX, GRAMFORT ET AL. Matthieu Perrot.

[34] ↵
Rosenbaum, J.F. et al. (1990) The antidepressant potential of oral S-adenosyl-l-methionine. Acta Psychiatr. Scand., 81, 432–6.
OpenUrl CrossRef PubMed Web of Science

[35] ↵
Sobolev, V. et al. (1999) Automated analysis of interatomic contacts in proteins. Bioinformatics, 15, 327–32.
OpenUrl CrossRef PubMed Web of Science

[36] ↵
Sousa, S.F. et al. (2006) Protein-ligand docking: current status and future challenges. Proteins, 65, 15–26.
OpenUrl CrossRef PubMed Web of Science

[37] ↵
Thomas, D.J. et al. (2004) Elucidating the pathway for arsenic methylation. Toxicol. Appl. Pharmacol., 198, 319–26.
OpenUrl CrossRef PubMed Web of Science

[38] ↵
Title Display and Analyze ROC Curves (2019).

[39] ↵
Waddell, T.G. et al. (2000) Prebiotic methylation and the evolution of methyl transfer reactions in living cells. Orig. Life Evol. Biosph., 30, 539–48.
OpenUrl PubMed

[40] ↵
Wagner, J.M. et al. (2010) Histone deacetylase (HDAC) inhibitors in recent clinical trials for cancer therapy. Clin. Epigenetics, 1, 117–136.
OpenUrl CrossRef PubMed

[41] ↵
Wuosmaa, A.M. and Hager, L.P. (1990) Methyl chloride transferase: a carbocation route for biosynthesis of halometabolites. Science, 249, 160–2.
OpenUrl Abstract/FREE Full Text

[42] ↵
Yu, D.-J., Hu, J., Yang, J., et al. (2013a) Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering. IEEE/ACM Trans. Comput. Biol. Bioinforma., 10, 994–1008.
OpenUrl

[43] Yu, D.-J., Hu, J., Yang, J., et al. (2013b) Designing Template-Free Predictor for Targeting Protein-Ligand Binding Sites with Classifier Ensemble and Spatial Clustering. IEEE/ACM Trans. Comput. Biol. Bioinforma., 10, 994–1008.
OpenUrl

[44] ↵
Yu, D.-J. et al. (2014) Enhancing protein-vitamin binding residues prediction by multiple heterogeneous subspace SVMs ensemble. BMC Bioinformatics, 15, 297.
OpenUrl

[45] Yu, D.-J., Hu, J., Huang, Y., et al. (2013) TargetATPsite: a template-free method for ATP-binding sites prediction with residue evolution image sparse representation and classifier ensemble. J. Comput. Chem., 34, 974–85.
OpenUrl

[46] ↵
Zhu, B.T. (2004) CNS dopamine oxidation and catechol-O-methyltransferase: importance in the etiology, pharmacotherapy, and dietary prevention of Parkinson’s disease. Int. J. Mol. Med., 13, 343–53.
OpenUrl PubMed Web of Science

SAMbinder: A web server for predicting SAM binding residues of a protein from its amino acid sequence

Abstract

Introduction

Material and Methods

Dataset Creation

Additional dataset creation

Internal and External Validation

Five-fold cross-validation

Window or Pattern size

Binary Profile

Position Specific Scoring Matrix (PSSM)

Machine Learning Techniques

Evaluation Parameters

Results

Composition Analysis

Propensity Analysis

Physiochemical Properties Analysis

Machine learning model performance using binary patterns

Machine learning model performance using evolutionary information (PSSM profile)

Machine learning model performance using hybrid feature

Performance of the machine learning models developed on the realistic dataset

Performance on additional dataset

Implementation of Model in Web server

(i) Sequence

(ii) PSSM Profile

(iii) Peptide Mapping

Standalone

Discussion

Conflict of Interest Statement

Author’s Contribution

Funding Information

Acknowledgement

Footnotes

References

Citation Manager Formats

Subject Area