Machine learning-based modulation of Ca2+-binding affinity in EF-hand proteins and comparative structural insights into site-specific cooperative binding

Ca2+-binding proteins are present in almost all living organisms, and different types display different binding affinities for the cation. Here, we applied two new scoring schemes that enable users to manipulate the binding affinities of such proteins. We designed a unique EF-hand loop capable of binding calcium with high affinity by altering five residues of the loop based on these scores. We worked with the N-terminal domain of Entamoeba histolytica calcium-binding protein 1 (NtEhCaBP1) and used site-directed mutagenesis to incorporate the designed loop sequence into the second EF-hand motif of this protein. Binding isotherms measured by isothermal titration calorimetry (ITC) showed a ∼500-fold greater association constant (Ka) for the mutant. The crystal structure of the mutant was also determined and displayed more compact Ca2+-coordination spheres in both of its EF loops than the structure of the wild-type protein, consistent with the greater calcium-binding affinity of the mutant. The NtEhCaBP1 mutant was also shown to form a hexamer rather than a trimer, and hexamer formation was attributed to a change in the position of the last helix of the mutant resulting from the strong calcium coordination. Dynamic correlation analysis further revealed that the mutation in the second EF loop changed the entire residue network of the monomer, resulting in stronger Ca2+ coordination even in the other EF-hand loop.

Previously, we developed a method to identify Ca2+-binding EF-hand loops and classify them by their Ca2+-binding affinities using support vector machine (SVM) based approaches operating through a classification-based scoring system [23]. The classifier uses a nonlinear SVM with a Gaussian radial basis function kernel [24]. The program predicted the Ca2+-binding motifs available from the literature with high accuracy, but it predicts affinities only qualitatively. Despite the broad importance of EF-hand-mediated Ca2+ signaling across species, no method has yet been developed for systematically manipulating Ca2+-binding affinity. Although Ca2+-binding affinity can be manipulated by mutating critical residues of the EF-hand loop (positions 1, 3, 5, 7 and 12), such mutations often abolish Ca2+ binding completely, which limits studies of CaBP function over a wide range of Ca2+-binding affinities.
To overcome this shortcoming in the field of Ca2+ signaling, we upgraded our previously available software so that users can not only predict Ca2+-binding affinity qualitatively but also manipulate binding affinities in vitro by mutating EF-hand loop residues. For this upgrade, we extracted, for each prediction, the margin distance from our nonlinear hyperplane (the decision boundary defined by the support vectors) and calculated weighted position-specific scores (PSM) for each prediction [23]. With these two additional parameters to further classify EF-hand loops, we designed a unique EF-hand loop predicted to bind Ca2+ with high affinity. The machine trained to identify Ca2+-binding sites and estimate affinity was further used to design, and to discriminate between, two similar sites based on the new scores.
To validate the binding ability of the newly designed site, we introduced the designed sequence into EF-hand loop 2 of the N-terminal domain of Entamoeba histolytica Ca2+-binding protein 1 (Nt-EhCaBP1 EF-2 mutant) and applied biochemical, structural and computational approaches to test our hypothesis. ITC measurements clearly show that the Nt-EhCaBP1 EF-2 mutant binds Ca2+ with higher affinity than wild-type Nt-EhCaBP1 (Wt-Nt-EhCaBP1).
Furthermore, we used X-ray crystallography to understand the atomic-level changes that lead to the change in Ca2+-binding affinity. We also used normal mode analysis (NMA) [25] to gain insight into the coupled motions and allosteric effects that directly influence the interactions between the two Ca2+-binding sites of each protein.
The analysis indicated that the EF-hand loops directly influence the central helix through long- and short-range interactions that affect the entire residue network, thereby enhancing the allosteric interactions in high-affinity sites. Based on sequence and structural investigations followed by network analysis, we derived the basis of the allosteric cooperativity shown by different EF-hand Ca2+-binding loops.
A set of scripts with new scoring functions and a user-friendly webserver to predict, design and engineer EF-hand binding loops were developed. The webserver and the downloadable set of scripts are available at http://sbl.jnu.ac.in/calb/index.html.

Scoring system for each prediction
Previously, we developed a method to identify and classify Ca2+-binding EF-hand loops [23]. The classifier is a nonlinear support vector machine (SVM) with a Gaussian radial basis function (RBF) kernel, tuned through its C and gamma parameters [23]. To increase the efficiency of the classification, we extracted for each prediction the margin distance between the decision boundary and the support vectors.
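In general, the margin of a vector x is its Euclidean distance from the separating hyperplane w^T x + b = 0 [26], given by

$$d(\mathbf{x}) = \frac{\mathbf{w}^{\top}\mathbf{x} + b}{\lVert \mathbf{w} \rVert},$$

where w is the normal vector perpendicular to the hyperplane and b is the parameter that determines the offset of the hyperplane from the origin along the normal vector. For the nonlinear RBF classifier, x is replaced by its feature-space image φ(x).
The distance of the hyperplane from the origin was calculated as $|b| / \lVert \mathbf{w} \rVert$. The margin, d(margin), was calculated as

$$d_{\text{margin}}(\mathbf{x}) = \frac{\mathbf{w}^{\top}\mathbf{x}}{\lVert \mathbf{w} \rVert} + \frac{b}{\lVert \mathbf{w} \rVert}.$$

The output is a signed distance that is either negative or positive, depending on which side of the hyperplane the point x lies [27,28].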
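A minimal Python sketch of the signed margin is given below. For the RBF kernel, w lives in feature space and is never formed explicitly, so both w^T φ(x) and ||w|| have to be expressed through kernel evaluations on the support vectors. The function names and the convention that `dual_coefs` holds α_i·y_i are illustrative assumptions, not the Calbinder code.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(gamma: float) -> Kernel:
    """Gaussian RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    def kernel(a: Vector, b: Vector) -> float:
        return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return kernel


def linear_kernel(a: Vector, b: Vector) -> float:
    """Plain dot product; with this kernel w is an explicit vector."""
    return sum(ai * bi for ai, bi in zip(a, b))


def signed_margin(x: Vector,
                  support_vectors: Sequence[Vector],
                  dual_coefs: Sequence[float],
                  intercept: float,
                  kernel: Kernel) -> float:
    """Signed distance d(x) = (w^T phi(x) + b) / ||w|| of x from the hyperplane.

    dual_coefs[i] is alpha_i * y_i for support vector i, so that
    w^T phi(x) = sum_i alpha_i y_i K(x_i, x) and
    ||w||^2    = sum_i sum_j (alpha_i y_i)(alpha_j y_j) K(x_i, x_j).
    """
    decision = sum(c * kernel(sv, x) for c, sv in zip(dual_coefs, support_vectors)) + intercept
    w_norm_sq = sum(
        ci * cj * kernel(si, sj)
        for ci, si in zip(dual_coefs, support_vectors)
        for cj, sj in zip(dual_coefs, support_vectors)
    )
    return decision / math.sqrt(w_norm_sq)
```

With two support vectors at (1, 0) and (-1, 0), coefficients 0.5 and -0.5 and b = 0, the linear kernel gives d = 2.0 for the point (2, 0), and the RBF kernel gives d = 0 at the origin by symmetry. With scikit-learn, a fitted binary `SVC` stores α_i·y_i in `dual_coef_[0]`, the support vectors in `support_vectors_` and b in `intercept_[0]`, and its `decision_function` returns the numerator of d(x); multiclass models use a one-vs-one layout that this sketch does not handle.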

PSSM based log likelihood scores (PSMLogL)
The conservation score based on the PSSM (position-specific scoring matrix) for each sequence submitted for classification was calculated as

$$S_{ij} = \log\left(\frac{B_{ij}}{P_i}\right),$$

where S_ij is the PSSM score and B_ij is the frequency of amino acid residue i at position j in matrix S, calculated as

$$B_{ij} = \frac{q_{ij} + b\,P_i}{n + b},$$

where q_ij denotes the observed count of amino acid residue type i at position j, P_i is the probability of amino acid residue type i, b is a pseudocount, taken here as the square root of the total number of training sequences (b = √n), and n is the number of training sequences. B_ij represents the foreground model, which reflects true homology, and P_i represents the probability that a match occurs at random (background model), calculated using the BLOSUM62 substitution matrix [29]. In short, PSSM-based scoring uses the relative frequencies obtained by counting how often each amino acid residue occurs at each position of the alignment, followed by normalization of the frequencies.
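The following sketch builds a PSSM with the pseudocount formula above and scores a loop by summing per-position log-odds. Two assumptions are labeled in the code: a uniform background P_i = 1/20 stands in for the BLOSUM62-derived background, and the sequence score is an unweighted sum, whereas the text describes weighted position-specific scores whose weights are not given here.

```python
import math
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(loops: Sequence[str], background: Dict[str, float]) -> List[Dict[str, float]]:
    """Return one column per loop position mapping residue -> S_ij = log(B_ij / P_i).

    B_ij = (q_ij + b * P_i) / (n + b), with pseudocount b = sqrt(n).
    The background P_i is supplied by the caller (assumption: uniform in the example).
    """
    n = len(loops)
    length = len(loops[0])
    if any(len(loop) != length for loop in loops):
        raise ValueError("all loops must be aligned to the same length")
    b = math.sqrt(n)
    pssm = []
    for j in range(length):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for loop in loops:
            counts[loop[j]] += 1  # raises KeyError for non-standard residues
        column = {}
        for aa in AMINO_ACIDS:
            b_ij = (counts[aa] + b * background[aa]) / (n + b)
            column[aa] = math.log(b_ij / background[aa])
        pssm.append(column)
    return pssm


def log_likelihood_score(sequence: str, pssm: List[Dict[str, float]]) -> float:
    """Unweighted sum of per-position log-odds (assumed PSMLogL-style score)."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    return sum(column[aa] for aa, column in zip(sequence, pssm))
```

For example, `build_pssm(["DKDGDGFIDFEE", "DADGNGEIDQNE"], {aa: 0.05 for aa in AMINO_ACIDS})` returns a 12-column matrix whose pseudocount-corrected frequencies B_ij sum to 1 in every column; the two loops here are toy inputs, not the training set.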

Design of the unique EF-loop site
We used the SVM margin score (SVMMar) and the PSSM-based log-likelihood score (PSMLogL) to design a unique Ca2+-binding site, one not present in any protein sequence database [30,31]. The design was intended to improve the predictions of the classifier and to help understand the binding mechanism by deciphering the roles of Ca2+-binding residues. The sequence design was carried out by substituting each of the 20 standard amino acid residues at each of the 12 positions of the Ca2+-binding loop, giving a score for every variant. The sequence DKDGDGFIDFEE scored highly, and a proteome-wide BlastP search [30] showed that this site is absent from all known databases. Therefore, the 2nd EF-hand loop of NtEhCaBP1 (DADGNGEIDQNE) was replaced with the DKDGDGFIDFEE sequence by site-directed mutagenesis. NtEhCaBP1 denotes the N-terminal domain of the EF-hand-containing Ca2+-binding protein 1, which has been well characterized in our laboratory [32][33][34]; this domain has two Ca2+-binding EF-hand motifs. To construct the desired mutant, we introduced five point mutations, A47K, N50D, E52F, Q55F and N56E, into the 2nd EF-loop of NtEhCaBP1.
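The substitution scan can be sketched as follows. The scoring function is passed in because the text does not state how SVMMar and PSMLogL are combined; accepting the best single substitution in each round (a greedy search) is an assumption, and the BlastP database-absence check is not modeled. `point_mutations` numbers the loop from residue 46, which is inferred from the mutation names (A47K falls at loop position 2).

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ScoreFn = Callable[[str], float]  # hypothetical: any loop -> score function


def single_site_scan(loop: str, score_fn: ScoreFn) -> List[Tuple[float, str]]:
    """Score every single substitution (positions x 19 alternatives), best first."""
    variants = []
    for pos, wild_type in enumerate(loop):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutant = loop[:pos] + aa + loop[pos + 1:]
                variants.append((score_fn(mutant), mutant))
    variants.sort(reverse=True)
    return variants


def greedy_design(loop: str, score_fn: ScoreFn, max_rounds: int = 12) -> Tuple[str, float]:
    """Accept the best-scoring substitution each round until none improves the score."""
    current, current_score = loop, score_fn(loop)
    for _ in range(max_rounds):
        best_score, best_seq = single_site_scan(current, score_fn)[0]
        if best_score <= current_score:
            break
        current, current_score = best_seq, best_score
    return current, current_score


def point_mutations(wild_type: str, designed: str, first_residue: int) -> List[str]:
    """List substitutions such as 'A47K', numbering the loop from first_residue."""
    return [f"{w}{first_residue + i}{d}"
            for i, (w, d) in enumerate(zip(wild_type, designed)) if w != d]
```

Applied to DADGNGEIDQNE and DKDGDGFIDFEE with `first_residue=46`, `point_mutations` returns A47K, N50D, E52F, Q55F and N56E, matching the five substitutions listed above. Each scan round evaluates 12 × 19 = 228 variants.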
We selected NtEhCaBP1 and its EF2 site as the model for this study because they offered several advantages. 1) The sequences of the designed loop and the wild-type loop were fairly similar, so relatively few single-site substitutions were required (Figure 1D). 2) Optimized protocols, as well as biophysical and structural data, were already available for the N-terminal domain construct of the protein.
3) The NtEhCaBP1 protein construct, with two known Ca2+-binding sites, allowed us to investigate the effect of one binding site on the other and hence potentially to understand the mechanism of cooperative binding in this protein. 4) The designed mutant sequence was unique.

Cloning of the NtEhCaBP1 mutant
The gene fragment corresponding to the N-terminal domain of the EhCaBP1 protein was cloned into the bacterial expression vector pET28(b) using an existing N-terminal clone of EhCaBP1 as a template. The mutations were introduced into EF-II of NtEhCaBP1 at nucleotide positions 141, 150, 156, 165 and 168 (codons 47, 50, 52, 55 and 56) by site-directed mutagenesis. The following primers were used for the mutation: The mutations were confirmed by DNA sequencing.

Overexpression and purification of the NtEhCaBP1 EF-2 mutant
NtEhCaBP1 was expressed and purified as described previously [32]. The NtEhCaBP1 EF-2 mutant construct was transformed into Escherichia coli strain BL21 (DE3) for expression. Cells were grown in Luria-Bertani (LB) medium supplemented with 50 µg ml-1 kanamycin at 37˚C. The culture was induced with 0.8 mM IPTG when the OD600 reached 0.7.
The culture was then incubated at the same temperature for a further 3 h. Cells were harvested by centrifugation at 7000 rpm for 10 min. The cell pellet was resuspended in suspension buffer (50 mM Tris pH 7.5, 2 mM EGTA). The cells were then lysed by freeze-thaw followed by sonication. A clear supernatant was obtained by centrifugation at 12,000 × g for 30 min.
The protein supernatant was loaded onto a DEAE ion-exchange column pre-equilibrated with ten bed volumes of suspension buffer. The column was then washed with 30-40 ml of wash buffer (50 mM Tris pH 7.5, 5 mM NaCl) to remove nonspecifically bound proteins. Finally, the protein was eluted with elution buffer (50 mM Tris pH 7.5, 5 mM CaCl2). The ion-exchange-purified protein was then subjected to gel filtration chromatography on a Superdex G-75 column pre-equilibrated in 50 mM Tris pH 7.5, 5 mM CaCl2. The NtEhCaBP1-EF2 mutant protein eluted as a peak at 65.93 ml (Figure 3.1). The molecular weight corresponding to the observed peak was calculated using standard protein markers of different molecular weights supplied by Sigma Aldrich. The purity of the protein was checked by 15% SDS-PAGE.
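Molecular-weight estimation from a SEC standard curve is typically a linear fit of log10(MW) against elution volume; a minimal sketch follows. The function names are illustrative, no calibration data from this study are included, and many protocols regress on the partition coefficient Kav = (Ve - V0)/(Vt - V0) rather than on Ve directly.

```python
import math
from typing import Sequence, Tuple


def fit_calibration(elution_ml: Sequence[float], mw_kda: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(MW) = slope * Ve + intercept for the standards."""
    if len(elution_ml) != len(mw_kda) or len(elution_ml) < 2:
        raise ValueError("need at least two paired standards")
    ys = [math.log10(m) for m in mw_kda]
    n = len(ys)
    mean_x = sum(elution_ml) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in elution_ml)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(elution_ml, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(elution_ml: float, slope: float, intercept: float) -> float:
    """Apparent molecular weight (kDa) of a peak read off the calibration line."""
    return 10 ** (slope * elution_ml + intercept)


def oligomer_state(apparent_mw_kda: float, monomer_mw_kda: float) -> int:
    """Nearest whole number of monomers consistent with the apparent mass."""
    return round(apparent_mw_kda / monomer_mw_kda)
```

`oligomer_state(42.0, 7.0)` and `oligomer_state(44.0, 7.0)` both return 6, which is the arithmetic behind the hexamer assignment for a 42-44 kDa peak of a 7.0 kDa monomer.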

Crystallization of the NtEhCaBP1 EF-2 mutant
Prior to crystallization, the SEC-purified NtEhCaBP1 EF-2 mutant protein was concentrated to 15 mg ml-1 using a 3 kDa cut-off Centricon filter. Crystallization trials were performed by the hanging-drop vapour-diffusion method in 24-well plates. 2 µl of protein sample was mixed with an equal volume of precipitant solution in hanging drops and equilibrated against 500 µl of reservoir solution (precipitant). The condition in which Wt-NtEhCaBP1 was crystallized [32] was tried first but did not yield crystals. Instead, the NtEhCaBP1 EF-2 mutant crystallized in conditions similar to those used for full-length EhCaBP1 [35]: MPD (58%-63%), 50 mM sodium acetate (pH 5.0-5.5) and 5 mM CaCl2.

X-ray diffraction, data collection, processing and structure determination
The NtEhCaBP1 EF-2 mutant crystals diffracted to a resolution of 1.9 Å. Diffraction data were processed and scaled using HKL2000 [36]. The crystals belonged to space group P212121, with unit-cell parameters a = 44.6, b = 101.3, c = 107.4 Å. The Matthews coefficient (VM) of 2.90 Å3 Da-1 suggested the presence of six molecules in the asymmetric unit, with a solvent content of 57.5% [37]. The structure was determined by molecular replacement using Phaser [38], with the assembled trimer of the wild-type EhCaBP1 structure (PDB code 2NXQ) as the search model [35]. Overall, thirteen Ca2+ ions were identified in the electron density: two in each chain, one at the centre of each EF-hand loop, and one extra Ca2+ ion that was not bound to an EF-hand motif. All thirteen Ca2+ ions were included in the refinement. The structure was refined with REFMAC5 [39], with iterative model building in COOT [40]. For the final model, Rwork and Rfree were 21.2% and 24.9%, respectively, and validation with PROCHECK [41] showed good stereochemistry, with 97.6% of residues in the most-favored regions of the Ramachandran plot. The structure factors and refined model of the NtEhCaBP1 EF-2 mutant were deposited in the Protein Data Bank [42] under accession code 5XOP. Data collection and final refinement statistics are shown in Table 1.
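The Matthews analysis reduces to V_M = V_cell/(Z·n·M) and a solvent fraction of 1 - 1.23/V_M. The sketch below assumes Z = 4 (the number of asymmetric units in a P212121 cell), n = 6 molecules per asymmetric unit and M ≈ 7000 Da from the 7.0 kDa monomer mass quoted for NtEhCaBP1; with the unit-cell parameters above it gives V_M ≈ 2.89 Å3 Da-1 and about 57.4% solvent, close to the reported 2.90 Å3 Da-1 and 57.5%.

```python
from typing import Tuple


def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Unit-cell volume (cubic angstroms) when alpha = beta = gamma = 90 degrees."""
    return a * b * c


def matthews(cell_volume: float, z: int, molecules_per_asu: int, mw_da: float) -> Tuple[float, float]:
    """Return (V_M in A^3/Da, solvent fraction) using V_solv = 1 - 1.23 / V_M."""
    vm = cell_volume / (z * molecules_per_asu * mw_da)
    return vm, 1.0 - 1.23 / vm
```

Comparing V_M for several trial values of n is the usual way to decide how many molecules the asymmetric unit can hold.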

Structure and sequence analyses
We performed the sequence alignments using Clustal Omega and BioEdit [43,44]. The structural alignment was performed using the Dali server [45]. Protein-protein interactions between the subunits were calculated using the PDBsum webserver [46] and Dimplot [47]. The Ca 2+ coordination distances and angles were calculated using LIGPLOT [47] and the PLIP server [48] as well as using PyMol software [49]. Representations of the structure were prepared by using PyMol, Chimera [50] [45] and Photoshop software [51]. The individual residues involved in interactions of both the structures were analyzed by using the Arpeggio web server [52] and Hydrogen Bonds Computing Server (HBCS) [53]. The contact maps from the protein structures were calculated by using the Residue Interaction Network Generator (RING) web server [54].

Cross-correlation and dynamic network analysis
Network and dynamic cross-correlation map (DCCM) analyses were performed using the Bio3D package [55]. Here, protein network analysis was used to gain insight into the structural relationships among three EF-hand-containing protein systems exhibiting different binding affinities. The dynamic networks were generated from an ensemble of structures by computing the cross-correlations between intramolecularly connected residues. We used normal mode analysis (NMA) [25] to obtain insights into the coupled motions and allosteric effects directly influencing the interactions between the two Ca2+-binding sites of each protein. Community clustering (clusters as communities in a graph), which is a graph-partitioning problem, was solved using the Girvan-Newman edge-betweenness approach implemented in Bio3D to generate the network [55]. Residues belonging to the same community are shown in the same color, and the position of each residue in the DCCM is indicated along the X and Y axes.
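The normalized cross-correlation plotted in a DCCM is C_ij = <Δr_i·Δr_j>/sqrt(<Δr_i²><Δr_j²>), averaged over an ensemble of structures. The Python sketch below computes it from coordinates that are assumed to be pre-superimposed, with one atom (for example Cα) per residue; it illustrates the formula and is not Bio3D's `dccm` function, which is an R routine.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def dccm(ensemble: Sequence[Sequence[Coord]]) -> List[List[float]]:
    """Normalized cross-correlation C_ij = <dr_i . dr_j> / sqrt(<dr_i^2><dr_j^2>).

    ensemble[f][i] is the (x, y, z) position of atom i in frame f; frames are
    assumed to be already superimposed on a common reference.
    """
    n_frames = len(ensemble)
    n_atoms = len(ensemble[0])
    mean = [[sum(frame[i][k] for frame in ensemble) / n_frames for k in range(3)]
            for i in range(n_atoms)]
    disp = [[[frame[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
            for frame in ensemble]

    def avg_dot(i: int, j: int) -> float:
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    var = [avg_dot(i, i) for i in range(n_atoms)]
    matrix = []
    for i in range(n_atoms):
        row = []
        for j in range(n_atoms):
            denom = math.sqrt(var[i] * var[j])
            row.append(avg_dot(i, j) / denom if denom > 0 else 0.0)  # static atoms -> 0
        matrix.append(row)
    return matrix
```

Values near +1 mark residues moving in phase, and values near -1 mark residues moving in opposite directions.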

Roles of EF-hand loop residues in Ca2+ binding
Mutations in the sequences of many critical genes are a major driving force in the evolution of complex organisms. Such mutations have been observed in different protein families across all biological systems. However, mutations are not prevalent at certain locations in proteins, such as their structural cores, and the conserved nature of these residues suggests important functional roles [56]. Enhancing the ability of a protein to carry out a specific function requires an understanding of its functional domains and the roles of its conserved residues.
The EF-hand is a well-studied motif with many conserved residues, and conservation is particularly strict at the Ca2+-binding site. The strong preference for certain amino acid residues at particular positions of the motif, such as aspartate at the beginning and glutamate at the end, indicates that some ligands are indispensable for Ca2+ binding [17,57]. Negatively charged residues placed at several positions within a short stretch of sequence make an ideal site for binding a positively charged divalent ion such as Ca2+ or Mg2+. Under physiological conditions both ions are present in the cell, Mg2+ at much higher concentrations than free Ca2+, yet EF-hand proteins are highly selective for the metal ion [58,59]. Ca2+-binding loops have evolved a range of binding affinities that reflect the biological functions of the proteins [5]. To understand the role of individual residues in the Ca2+-binding loop, we classified our data into two groups, high-affinity sites (HASs) and low-affinity sites (LASs), and aligned the sequences within each group to derive the sequence motifs of the two groups (Figure 1).
Figures 1A and 1B show, for the two groups, the distribution of residues at each position of the EF-hand Ca2+-binding loop. The first position (X) of the loop is dominated by aspartic acid (D), which is indispensable for Ca2+ binding because it participates in the critical hydrogen-bonding network and provides an oxygen that coordinates the Ca2+ [60]. Some EF-hand Ca2+-binding sites with a non-aspartate residue at the X position have shown low or even no affinity for Ca2+, an observation also made in numerous mutational studies of different EF-hand proteins [60]. Comparison of the high-affinity and low-affinity sequences (Figures 1A, 1B) suggests that a positively charged lysine at the second position (a non-coordinating position) of the loop occurs more often in HAS sequences than in LAS sequences.
At this position, LAS sequences show a variety of amino acid residues, such as alanine and isoleucine. At the third position (Y), aspartic acid occurs most often, followed by asparagine. The eighth position is mostly occupied by the hydrophobic residues isoleucine, valine and leucine. A close inspection of the sequences indicates a relatively high occurrence of serine at the 9th position of LAS sequences but not of HAS sequences. At the twelfth position, glutamic acid is conserved in HAS sequences, which is not surprising since this residue provides a bidentate side-chain ligand for Ca2+. In a few LAS sequences, aspartic acid is present at the twelfth position. Because of its shorter side chain, aspartic acid at the 12th position sometimes requires an additional water molecule to coordinate Ca2+, resulting in a lower affinity for Ca2+ due to increased solvent entropy [60]. The analysis suggested that Ca2+-binding loop residues that do not directly coordinate the Ca2+ nevertheless have important roles in Ca2+ binding, including stabilizing the loop, contributing to the extensive hydrogen-bond network required for proper folding of the loop, and positioning the negatively charged residues of the loop in the three-dimensional arrangement best suited for Ca2+ binding [60].

Scoring scheme to assist modeling of the EF-hand Ca2+-binding site
In an earlier study, we classified EF-hand domains by their affinity for Ca2+ (high, low and none) [61]. The classification was performed with machine-learning models, specifically support vector machines (SVMs), trained on EF-hand datasets. Correlations derived from various amino acid indexes gave high prediction accuracies when tested against many EhCaBPs [62]. In the present study, we extracted the margin distance from the decision boundary

Designing a high binding affinity EF-hand loop
So far, several mutational and design studies on EF-hand proteins have produced distinct changes in their binding affinities, thereby influencing the functions of these proteins [19,23] [59].
In many of these studies, mutations were made to assess the Ca2+-binding capability of the protein. Substitutions at the first (X) and last (-Z) positions of the binding site did yield the desired results, as these ligand positions are extremely important and require negatively charged chelating residues for proper function [57,63]. To validate our prediction method, we designed a unique EF-hand loop that is not present in any protein sequence database (Figure 1C). Its SVMMar score … and its PSMLogL score of 6.46 were both comparable to those of high-affinity Ca2+-binding sites from the literature [23]. Sequence alignment with EF-hand-containing proteins from different families (Figure 1C) suggested that the designed loop is most similar to that of the Ca2+-binding pollen allergen [64], whose structure, like that of WtNtEhCaBP1, shows a head-to-tail arrangement with domain-swapped EF-hand pairing. We inserted the designed loop sequence, with its high predicted Ca2+-binding affinity, into a Ca2+-binding site of the well-characterized WtNtEhCaBP1 in an attempt to manipulate the Ca2+-binding affinity of this protein.
We specifically mutated the Ca2+-binding loop of the 2nd EF-hand motif of NtEhCaBP1. The NtEhCaBP1-EF2 loop was chosen because its sequence is more similar to the designed loop than are the loops of other characterized Ca2+-binding proteins of E. histolytica (Figure 1D).
The crystal structure of the N-terminal construct NtEhCaBP1 has two Ca2+-binding sites, whereas the full-length protein has four Ca2+-binding EF-hand motifs [32,35]. In a previous study, ITC experiments with the full-length protein showed four binding sites, two with high and two with low Ca2+-binding affinity [65]. For proper folding of the molecule, the construct was built with two EF-hand loops. We were also interested in understanding how synthetic EF-hand loops influence cooperative binding in this system. To accomplish these aims and to characterize the biophysical properties of the protein containing the engineered loop, we purified both WtNtEhCaBP1 and the NtEhCaBP1 mutant (EF2) and compared them using biophysical and computational experiments.
(Figure 1B) Sequence logo of the residues forming low-Ca2+-affinity EF-hand loops from the published dataset with experimentally known binding affinities [62]. The alignment showed some positions of the EF-hand loop to be relatively variable and others to be highly conserved.

Solution state oligomerization and three-dimensional structure of NtEhCaBP1 EF-2 mutant
Our previous reports showed that both full-length EhCaBP1 and its C-terminally truncated form (NtEhCaBP1) form trimers, in solution and in the crystal [32,35].
In contrast to these previous observations, we found during purification that the NtEhCaBP1 EF-2 mutant protein differed in oligomeric state from Nt-EhCaBP1. The size exclusion chromatography (SEC) profile of the NtEhCaBP1 EF-2 mutant protein shows a peak corresponding to a molecular mass of about 42-44 kilodaltons (kDa). As the molecular weight of NtEhCaBP1 is 7.0 kDa, the SEC profile suggests the formation of a hexamer in solution (Figure 3A).
Consistent with the hexameric form in solution, the crystal structure shows six molecules in the asymmetric unit, arranged as two trimers that together form a hexamer. The structure is well refined, and representative electron density maps of EF-1 and EF-2 are shown in Figure 3B and 3C. Structural analysis suggested that each trimer in the Nt-EhCaBP1 EF-2 mutant structure adopts a trimeric arrangement similar to that of the Wt-NtEhCaBP1 structure [32] (Figure 3D). Table 2) with the nitrogen atom of glutamate-36 of the central helix (Figure 4E). This interaction brings the two trimers into close proximity, leading to hexamer formation in solution as well as in the crystal.
Table 2. Hydrogen bonds formed between trimer 1 (chains A, B, C) and trimer 2 (chains D, E, F).
The interactions between the three assembled domains in each trimer of the Nt-EhCaBP1-EF2 mutant structure were similar to those observed in the Wt-NtEhCaBP1 structure.
Some additional hydrogen-bonding interactions were found in the hexamer of the NtEhCaBP1-EF2 mutant structure. The various interactions involved in the hexameric state of the Nt-EhCaBP1-EF2 structure are shown in Supplementary Figures S1 and S2 (Tables S4-S5).

An extra Ca2+ ion bound outside of the EF-hand motif
Apart from the Ca2+ ions bound at each EF-hand motif, an extra Ca2+ ion was observed at the 2nd EF-hand motif of chain B. This extra ion occupied a position outside the regular Ca2+-binding coordination sphere. To understand the binding of two Ca2+ ions to one EF-hand motif, we calculated and plotted all atomic interactions involving each of the two Ca2+ ions (Figure 4A). The first Ca2+ ion was observed to interact with X, Y, Z, -Y, -Z, and -X, including specifically with one of the side-chain oxygens (

Mutation-induced bend in the third helix (H3) of the NtEhCaBP1 EF-2 structure
To investigate whether the mutation induced structural changes in the overall structure or in a particular EF-hand motif, we superimposed the NtEhCaBP1 EF-2 mutant structure on Wt-NtEhCaBP1 using PyMol. An alignment of all 66 Cα atoms yielded an RMSD of 0.96 Å.
Structural superimposition shows a difference in the orientation of the third α-helix (H3) (Figure 5A). The angle between the second helix and the C-terminal part of the third helix is 76.3° in the NtEhCaBP1 EF-2 mutant structure and 83.4° in the wild-type structure, indicating a distinct structural change caused by the mutation of the Ca2+-binding site. We quantified this difference in the orientation of the third helix by measuring the angle formed by the CD2 atom of Leu-36, the CD1 atom of Ile-44 and the OE1 atom of Gln-65 (Figures 5B & 5C). This measurement indicates a difference of about 7.1° in the orientation of the third helix with respect to helix 2 in the mutant structure (Figure 5C).
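A minimal sketch of the geometric measurements used here and in the coordination-sphere comparison: an interatomic distance and a three-atom angle. It assumes the middle atom is the vertex (here Ile-44 CD1); with Ca2+ as the vertex, the same function gives HOH-CA-OD1 angles. Coordinates would come from the deposited models (for example 5XOP); none are included.

```python
import math
from typing import Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two atoms (angstroms if inputs are in angstroms)."""
    return math.dist(a, b)


def angle_deg(a: Sequence[float], vertex: Sequence[float], c: Sequence[float]) -> float:
    """Angle a-vertex-c in degrees, with the middle atom as the vertex."""
    v1 = [ai - vi for ai, vi in zip(a, vertex)]
    v2 = [ci - vi for ci, vi in zip(c, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.degrees(math.acos(cosine))
```

Subtracting the angle measured in one structure from the same three-atom angle in the other gives the orientation difference between helices.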

Hydrophobic interactions leading to positive cooperativity
Hydrophobic interactions are one of the driving forces in EF-hand proteins because of their role in the cooperative binding of the two EF-hand motifs to each other [13,17,21]. The interacting hydrophobic residues in the NtEhCaBP1 structure and in the NtEhCaBP1 mutant structure are highlighted in green (Figure 5E and 5F); the four helices forming the hydrophobic core are tightly associated with one another. These interactions mostly involved phenylalanine and tyrosine residues, which appear to play an important role in stabilizing the EF-hand loops by forming interactions between one EF-hand loop and the other (Figure 5E and 5F).
The interactions were aided by the formation of an antiparallel EF β-scaffold (Figure 5E), which helps the loop to close during folding without causing large structural changes [64]. Stronger allosteric interactions between the odd-numbered and even-numbered EF loops were found to lead to greater calcium-binding affinity; in the NtEhCaBP1 mutant, these interactions appeared to be further enhanced by the two additional aromatic residues (E52F and Q55F). The conserved hydrophobic residues at the 6th and 22nd positions of EF-hand motifs suggested that communication between the two EF-hands is chiefly mediated by non-covalent interactions.
… interactions of the EF-1 and EF-2 motifs from two subunits of the NtEhCaBP1 mutant were found to mimic the pairing required for the proper functioning of EF-hand proteins [53]. The hydrophobic residues are labeled, and the residues in the EF-2 loop are shown in green. (F) Similarly, two chains from the trimer of NtEhCaBP1 are shown with the residues forming the hydrophobic core.

Changes in overall charge distribution
To understand the differences between the charge distribution in the mutated loop and that in the  Figure S3). This result suggested that increasing the binding affinity for Ca2+ does not necessarily depend on introducing more negatively charged atoms, but on creating an optimum distribution of charges in which the stabilizing hydrophobic interactions overcome the destabilizing electrostatic interactions caused by negatively charged residues in close proximity [66].
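A simple count illustrates this point for the two loops in this study. The sketch counts Asp/Glu as -1 and Lys/Arg as +1 and ignores His, the termini and pKa shifts, so it is only a coarse approximation of the electrostatics. Under this count, the wild-type loop DADGNGEIDQNE and the designed loop DKDGDGFIDFEE both carry a net charge of -5: the design adds one acidic residue overall (N50D and N56E, with E52F removing one) but also introduces Lys at position 2 (A47K).

```python
from typing import Dict

# Simplified side-chain charges at neutral pH (assumption: His treated as neutral).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def loop_net_charge(loop: str) -> int:
    """Net side-chain charge of a loop, counting Asp/Glu as -1 and Lys/Arg as +1."""
    return sum(CHARGE.get(residue, 0) for residue in loop.upper())


def charged_positions(loop: str) -> Dict[int, int]:
    """Map 1-based loop position -> charge for every charged residue."""
    return {i + 1: CHARGE[r] for i, r in enumerate(loop.upper()) if r in CHARGE}
```

Comparing `charged_positions` for the two loops shows where the charges moved rather than how many there are, which is the distinction drawn above.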

Comparison of EF-hand-2 of Nt-EhCaBP1-EF2 mutant structure with Wt-NtEhCaBP1
To determine whether the enhanced Ca2+ binding observed in the ITC data for the Nt-EhCaBP1-EF2 mutant protein is accompanied by changes in the Ca2+ coordination sphere, we compared the distances and angles between the bound Ca2+ and its interacting residues in the EF-2 loop of the mutant with those in EF-2 of Wt-NtEhCaBP1 (PDB code 2NXQ). The Ca2+ coordination distances (d1) of the Nt-EhCaBP1-EF2 mutant structure were shorter than the corresponding distances (d2) in the Wt-NtEhCaBP1 structure (Table 3). Each Ca2+-coordinating residue of the mutant structure differs by ~0.1-0.5 Å from its wild-type counterpart, suggesting that the shorter coordination distances in the Nt-EhCaBP1-EF2 structure contribute to its higher Ca2+-binding affinity.
The water-Ca2+-oxygen angles (Θ1) of the Nt-EhCaBP1-EF2 mutant structure also differ; for example, Asp-46, interacting from the top of the pyramidal geometry, showed an HOH-CA-OD1 angle of 172.6° compared with 158° in Wt-NtEhCaBP1 (Figure 6A & 6B). The other coordinating contacts varied by ~15° (80° to 95°) among themselves (Table 3). These differences in bond lengths and angles between the mutant and wild-type structures may account for the difference in the ITC results, where the NtEhCaBP1-EF2 mutant protein sites showed higher binding affinities for Ca2+.
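Two small helpers sketch the distance comparisons used in this section and in the chain-pooled analysis of both loops: the per-ligand shift d2 - d1 between matched ligands, and the mean coordination distance per EF-hand loop after pooling all chains of a structure. The record layout and site labels are hypothetical, and no distances from the structures are included.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (structure, chain, loop, ligand atom, Ca2+-O distance in angstroms)
Record = Tuple[str, str, str, str, float]


def coordination_shifts(mutant: Dict[str, float], wild_type: Dict[str, float]) -> Dict[str, float]:
    """Per-ligand d2 - d1 (wild type minus mutant); positive means a tighter mutant contact."""
    return {lig: wild_type[lig] - mutant[lig] for lig in mutant if lig in wild_type}


def mean_distance_by_site(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Pool all chains and ligands of a structure and average per EF-hand loop."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for structure, _chain, loop, _ligand, dist in records:
        groups[(structure, loop)].append(dist)
    return {site: mean(values) for site, values in groups.items()}


def rank_sites(means: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Sites ordered from the most compact (shortest mean distance) to the least."""
    return sorted(means, key=means.get)
```

Pooling chains before averaging reduces the influence of any single chain's local refinement errors on the comparison.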

The EF-1 loop of the NtEhCaBP1-EF2 mutant also shows reduced bond distance and bond angle
Although, we could not change any residues of EF-1 of NtEhCaBP1-EF2 mutant protein, but a comparative analysis of the Ca 2+ coordinating distances of EF-1 of NtEhCaBP1-EF2 mutant structure also shows shrinkage in the Ca 2+ coordinating sphere with respect to EF-1 of Wt-NtEhCaBP1 structure (Table 4) ( Figure 6C and 6D). The OD1 atom of Asp-10, the first coordinating residue, was found to be positioned at the same site in both structures, but closer to the Ca 2+ ion in the NtEhCaBP1-EF2 mutant than Wt-NtEhCaBP1 structure. Along with other residues which are involved in the Ca 2+ coordination sphere, the 12 th position Glu-21 ligand was also observed to be closer to the Ca 2+ ion in the mutant than in WtNtEhCaBP1 (Table 4).   To overcome the problem of having small dataset, we included all six chains from two other reported structure of EhCaBP1 (3LI6 and 2NXQ) and six chains of NtEhCaBP1-EF2 mutant structure (5XOP) for distance comparison of each Ca 2+ ligating oxygen in EF-hand motifs ( Figure   6E). The average Ca 2+ coordination distance of NtEhCaBP1-EF2 mutant structure shows the shortest coordination distance for EF-2 than EF-1 and longest in EF1-of WtEhCaBP1 followed by EF-2 (Listings of the coordinating residues and distances for each of the individual chains from each structure are shown in Supplementary tables S7-S13). Cooperative binding enhances Ca 2+ binding capability of the EF-1 loop of the mutant protein EF-hand motifs mostly occur in pairs, and the Ca 2+ -binding mechanism and the resulting affinity also rely on how well these two motifs influence each other's presence by interacting with the residues at the interface of Ca 2+ -binding sites. The non-covalent interactions play very important role in the stabilization of a pair of EF-hand loops [61,64]. The communication between the two EF-hands in our structure was observed to be mediated by the non-covalent interactions involving conserved hydrophobic residues. 
We used the Residue Interaction Network Generator (RING) to identify covalent and non-covalent bonds, including π-π stacking and π-cation interactions, between the EF-1 and EF-2 motifs in both the Wt-NtEhCaBP1 and NtEhCaBP1-EF2 mutant structures.
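RING classifies such contacts by applying geometric criteria to atomic coordinates. The idea can be illustrated with a simplified sketch: aromatic ring centroids are compared with each other (π-π) and with cationic atoms (π-cation) using distance cutoffs. The cutoff values, the residue labels and the coordinates below are assumptions chosen for illustration, not RING's actual parameters, and real criteria also take ring-plane angles into account.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]

# Assumed cutoffs in Angstrom, for illustration only (not RING's thresholds)
PI_PI_CUTOFF = 6.5
PI_CATION_CUTOFF = 5.0


def centroid(points: List[Vec]) -> Vec:
    """Geometric centre of a set of ring atoms."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def hexagon(cx: float, cy: float, cz: float, r: float = 1.4) -> List[Vec]:
    """Six atoms of an idealised planar aromatic ring (hypothetical test geometry)."""
    return [(cx + r * math.cos(k * math.pi / 3), cy + r * math.sin(k * math.pi / 3), cz)
            for k in range(6)]


def pi_contacts(rings: Dict[str, List[Vec]], cations: Dict[str, Vec]):
    """Return (pi_pi, pi_cation) lists of residue-label pairs within the cutoffs.

    `rings` maps a residue label to its aromatic ring atom coordinates;
    `cations` maps a residue label to one cationic atom (e.g. Lys NZ, Arg CZ).
    """
    cents = {res: centroid(atoms) for res, atoms in rings.items()}
    labels = sorted(cents)
    # Ring-ring pairs whose centroids are close enough to count as stacking
    pi_pi = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]
             if math.dist(cents[a], cents[b]) <= PI_PI_CUTOFF]
    # Ring-cation pairs whose centroid-to-cation distance is within the cutoff
    pi_cation = [(r, c) for r in labels for c, xyz in sorted(cations.items())
                 if r != c and math.dist(cents[r], xyz) <= PI_CATION_CUTOFF]
    return pi_pi, pi_cation


if __name__ == "__main__":
    rings = {"F1": hexagon(0, 0, 0), "Y2": hexagon(0, 0, 3.5), "W3": hexagon(20, 0, 0)}
    cations = {"K4": (0.0, 0.0, -4.0)}
    print(pi_contacts(rings, cations))
```

A pure distance criterion keeps the sketch short; a production tool would add angular terms so that, for example, parallel and T-shaped stacking can be distinguished.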

Dynamically coupled networks and correlated motions in EF-hand proteins
One aspect of Ca2+-binding proteins that has not yet been well studied is how the motion of one domain influences that of the other, and how the residues of these domains are networked by long-range interactions so that they can modulate each other's functions through small conformational changes. During the comparative analysis in this study, we noticed dissimilarities in many side-chain interactions, arising from different orientations of the side chains (Supplementary Figure S4). That the wild-type and mutant structures, differing by only 5 residues, showed such dissimilarities was quite surprising. Six residues (Figure 8A) from the EF loops of NtEhCaBP1 were found to participate in the largest community cluster (grey), compared with 13 residues in the NtEhCaBP1-EF2 mutant and 11 residues in CaM (community 2). The largest communities in each of the EF-hand motifs were found to consist mainly of residues from the central helix. However, the pattern differed in the high-affinity motifs, where the residues of the largest community were distributed throughout the structure, especially in the designed mutant (Figure 8B), all of whose C-terminal residues were part of the largest connected community. The coarse-grained network based on dynamically coupled communities shows the central-helix residues forming the largest community, connected with only a few residues from the N- and C-termini in NtEhCaBP1, in contrast to the NtEhCaBP1-EF2 mutant and CaM, where the central helix is strongly connected to residues from the EF-loop sites and at the C-terminal end.
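Dynamic correlation analysis of this kind is commonly based on the normalized cross-correlation C_ij = ⟨Δr_i·Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩) computed over an ensemble of superposed conformations. The sketch below assumes a NumPy array of C-alpha coordinates of shape (frames, residues, 3). The grouping step is a deliberately crude stand-in for community detection: residues joined by |C_ij| at or above a hypothetical threshold are merged into connected components, whereas community analyses typically use an algorithm such as Girvan-Newman on a weighted network.

```python
import numpy as np


def dccm(traj: np.ndarray) -> np.ndarray:
    """Dynamic cross-correlation matrix from a (frames, residues, 3) array of
    C-alpha coordinates already superposed on a common reference."""
    disp = traj - traj.mean(axis=0)  # displacement of each residue from its mean position
    # <dr_i . dr_j>, summed over x/y/z and averaged over frames
    cov = np.einsum("fix,fjx->ij", disp, disp) / traj.shape[0]
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)


def coupled_groups(c: np.ndarray, threshold: float = 0.6) -> list:
    """Connected components of the graph with edges where |C_ij| >= threshold.

    `threshold` is a hypothetical value; components are returned largest first.
    """
    adj = np.abs(c) >= threshold
    seen, groups = set(), []
    for start in range(c.shape[0]):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of correlated neighbours
            i = stack.pop()
            group.append(i)
            for j in np.flatnonzero(adj[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    groups.sort(key=lambda g: (-len(g), g[0]))
    return groups


if __name__ == "__main__":
    # Synthetic 4-residue "trajectory": residues 0/1 move anti-correlated along x,
    # residues 2/3 move together along y, independently of 0/1.
    f = np.arange(100)
    s, u = np.sin(2 * np.pi * f / 100), np.cos(2 * np.pi * f / 100)
    base = np.array([[3.8 * i, 0.0, 0.0] for i in range(4)])
    traj = np.repeat(base[None], 100, axis=0)
    traj[:, 0, 0] += s
    traj[:, 1, 0] -= s
    traj[:, 2, 1] += u
    traj[:, 3, 1] += u
    print(np.round(dccm(traj), 2))
    print(coupled_groups(dccm(traj)))
```

Using |C_ij| lets strongly anti-correlated residues (C_ij near -1) join the same group, since both signs indicate dynamic coupling.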

Discussion
In this study, we upgraded our previously developed software 'Calbinder' so that, in addition to identifying EF-hand protein sequences, users can design and manipulate the Ca2+-binding affinity of their protein of interest. To confirm that the upgraded software manipulates Ca2+-binding affinity as intended, we designed a high-affinity EF-hand motif sequence and validated our hypothesis using biophysical and structural approaches. As predicted, the NtEhCaBP1-EF2 mutant protein carrying the newly designed EF-hand motif showed a higher Ca2+-binding affinity (ITC data) than the wild-type protein. Further, relative to Wt-NtEhCaBP1, EF-2 of the NtEhCaBP1-EF2 structure shows a more compact Ca2+ coordination sphere, suggesting tight binding of Ca2+. The compact Ca2+ coordination sphere induced a bend in helix-3, which allows helix-3 of one trimer to interact with helix-2 of another trimer, leading to hexamer formation both in solution and in the crystal.
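As a rough energetic scale for this affinity gain, the ∼500-fold increase in Ka measured by ITC for the mutant can be converted into a free-energy difference using ΔG° = −RT ln Ka. This is a back-of-envelope estimate that assumes an ITC temperature of 25 °C (298.15 K), which is an assumption here rather than a stated experimental condition, and it treats each overall Ka as a single apparent constant:

$$
\Delta\Delta G^\circ = -RT\,\ln\frac{K_a^{\text{EF2 mutant}}}{K_a^{\text{Wt}}} \approx -\left(1.987\times10^{-3}\ \text{kcal mol}^{-1}\,\text{K}^{-1}\right)(298.15\ \text{K})\,\ln 500 \approx -3.7\ \text{kcal mol}^{-1}\ \left(\approx -15.4\ \text{kJ mol}^{-1}\right)
$$

A free-energy change of this size is comparable to a few additional favourable non-covalent contacts, which is consistent with a gain arising from a tighter coordination sphere rather than from a new binding mode.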
We propose that the mutation incorporated into the NtEhCaBP1-EF2 protein induced cooperativity, i.e., binding of Ca2+ at one site promotes binding at the second site. Owing to this induced cooperativity, the NtEhCaBP1-EF2 protein may bind Ca2+ with higher affinity.
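How cooperativity between two sites reshapes a binding curve can be illustrated with a minimal two-site model. The sketch assumes two identical sites with intrinsic association constant k and an interaction factor alpha (alpha = 1 for independent sites, alpha > 1 when occupancy of one site favours the other); the real EF-1 and EF-2 sites are not identical, and the k value below is hypothetical. For this model the Hill coefficient at half-saturation is 2√α/(1+√α), which rises from 1 toward 2 as cooperativity increases.

```python
import math


def fractional_saturation(ca: float, k: float, alpha: float) -> float:
    """Fraction of the two sites occupied at free Ca2+ concentration `ca`.

    `k` is a hypothetical intrinsic association constant (units of 1/ca);
    `alpha` is the interaction factor between the two identical sites.
    """
    x = k * ca
    # Binding polynomial: empty + two singly bound states + doubly bound state
    p = 1.0 + 2.0 * x + alpha * x * x
    # Mean number of bound ions, (2x + 2*alpha*x^2)/p, divided by 2 sites
    return (x + alpha * x * x) / p


def hill_slope(k: float, alpha: float, h: float = 1e-4) -> float:
    """Numerical Hill coefficient d ln(Y/(1-Y)) / d ln[Ca] at half-saturation."""
    ca_half = 1.0 / (k * math.sqrt(alpha))  # Y = 0.5 at this point in this model

    def logit(ca: float) -> float:
        y = fractional_saturation(ca, k, alpha)
        return math.log(y / (1.0 - y))

    # Central difference in log-concentration space
    up, down = ca_half * math.exp(h), ca_half * math.exp(-h)
    return (logit(up) - logit(down)) / (2.0 * h)


if __name__ == "__main__":
    for alpha in (1.0, 4.0, 100.0):
        print(f"alpha={alpha:g}  Hill slope={hill_slope(k=1e5, alpha=alpha):.3f}")
```

Fitting ITC data with such a model would require separate macroscopic constants for the two non-identical sites; the identical-site form is used here only because its behaviour is easy to check analytically.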
Because the crystal structure of Wt-NtEhCaBP1 has Ca2+ bound at both of its EF-hand motifs [28], the weak Ca2+ binding observed for Wt-NtEhCaBP1 in the ITC experiment was surprising. A plausible explanation is that Wt-NtEhCaBP1 was crystallized in the presence of a high Ca2+ concentration (5 mM), which allows Ca2+ ions to be accommodated in the crystal lattice and therefore to be observed in the crystal structure. To check whether a higher Ca2+ concentration would allow better binding of Wt-NtEhCaBP1, so that the binding curve reaches saturation in the ITC experiment, we increased the Ca2+ concentration stepwise up to 25 mM; however, this did not produce a saturation point similar or close to that of the NtEhCaBP1-EF2 mutant protein (data not shown). This suggests that Wt-NtEhCaBP1 cannot bind Ca2+ with high affinity even in the presence of high Ca2+ concentrations.
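Why a weak binder fails to reach a clear plateau in an ITC titration can be sketched with the standard single-site 1:1 model with ligand depletion and the Wiseman c-value (c = n·Ka·[M]), for which a commonly cited window for well-defined sigmoidal isotherms is roughly 1 < c < 1000. In the sketch below the protein concentration and both Kd values are hypothetical placeholders, not measured values for NtEhCaBP1, and a real EF-hand pair would require a two-site model.

```python
import math


def bound_complex(p_tot: float, l_tot: float, kd: float) -> float:
    """Complex concentration for a single-site 1:1 model with ligand depletion.

    All arguments share the same units (molar here).
    """
    b = p_tot + l_tot + kd
    disc = max(b * b - 4.0 * p_tot * l_tot, 0.0)  # guard against tiny negative round-off
    # Physically meaningful (smaller) root of x^2 - b*x + p_tot*l_tot = 0,
    # written in a form that avoids cancellation when kd is very small
    return 2.0 * p_tot * l_tot / (b + math.sqrt(disc))


def wiseman_c(p_tot: float, kd: float, n: float = 1.0) -> float:
    """Wiseman c-value, c = n * Ka * [M] = n * [M] / Kd."""
    return n * p_tot / kd


if __name__ == "__main__":
    p = 50e-6  # hypothetical 50 uM protein in the cell
    for kd in (1e-7, 1e-3):  # hypothetical tight vs weak binder
        fractions = [bound_complex(p, r * p, kd) / p for r in (0.5, 1, 2, 4)]
        print(f"Kd={kd:g} M  c={wiseman_c(p, kd):g}  fraction bound:",
              [round(x, 3) for x in fractions])
```

With the tight hypothetical binder the protein is essentially saturated once the molar ratio exceeds 1, whereas with the weak one only a small fraction is bound at the same ratios, so the isotherm stays shallow unless the ligand is raised far above the protein concentration.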
The Ca2+-binding affinity of EF-hand-containing CaBPs is largely governed by the residues of the EF-hand loop, and it is usually manipulated by altering the 1st, 3rd, 5th, and 12th residues of the binding site. Altering these critical Ca2+-binding residues may drastically reduce the Ca2+-binding affinity or abolish Ca2+ binding completely, which limits the possibility of studying the function of these proteins over a wide range of Ca2+-binding affinities. Our software addresses this issue: users can design or manipulate the EF-hand motif sequence to achieve the Ca2+-binding affinity required by their experiments. Most importantly, manipulation of Ca2+-binding affinity with this software is not limited to the residues directly involved in Ca2+ coordination; rather, users can explore substitutions at all twelve residues of the EF-hand loop and tune the affinity through combinations of changes across the entire loop. Such flexibility may allow users to manipulate Ca2+-binding affinity without disturbing the conserved structural core (EF-hand motifs) of CaBPs. Although the experimental data presented in this manuscript come from only one protein, they nevertheless lend strong support to the machine-learning-based approach to manipulating Ca2+-binding affinity.
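The combinatorial space of the twelve-residue loop is large: with 20 amino acids at each position there are 20^12 ≈ 4.1 × 10^15 possible loops, so a design search has to be restricted in practice. The sketch below shows one such restriction, enumerating variants with at most a given number of substitutions at user-chosen (0-based) positions and ranking them with a scoring function. The `score_fn` argument and the `toy_score` function are hypothetical placeholders, not Calbinder's SVM-based scoring schemes, and the example loop and positions are illustrative only.

```python
import heapq
from itertools import combinations, product
from typing import Callable, Iterator, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def variants(loop: str, positions: Sequence[int], max_subs: int) -> Iterator[str]:
    """Yield every loop carrying 1..max_subs substitutions at the given
    0-based positions (the unmodified loop itself is not yielded)."""
    for k in range(1, max_subs + 1):
        for sites in combinations(positions, k):
            # For each chosen site, try every residue except the current one
            choices = [[aa for aa in AMINO_ACIDS if aa != loop[i]] for i in sites]
            for residues in product(*choices):
                seq = list(loop)
                for i, aa in zip(sites, residues):
                    seq[i] = aa
                yield "".join(seq)


def top_designs(loop: str, positions: Sequence[int], max_subs: int,
                score_fn: Callable[[str], float], n_best: int = 5) -> List[Tuple[float, str]]:
    """Return the n_best (score, sequence) pairs, highest score first."""
    scored = ((score_fn(v), v) for v in variants(loop, positions, max_subs))
    return heapq.nlargest(n_best, scored)


def toy_score(seq: str) -> float:
    """Hypothetical placeholder score (counts acidic residues); not an affinity predictor."""
    return float(sum(seq.count(aa) for aa in "DE"))


if __name__ == "__main__":
    for score, seq in top_designs("DKDGDGTITTKE", [1, 6], max_subs=1, score_fn=toy_score, n_best=3):
        print(score, seq)
```

Because `variants` is a generator and `heapq.nlargest` keeps only the best candidates, memory use stays small even when the number of enumerated variants grows; allowing two substitutions at two positions already gives 2×19 + 19×19 = 399 variants.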
With sufficient data, we expect that cooperative binding can be incorporated into a predictive model, which should provide deeper insight into the physiological Ca2+-binding state.
In summary, the non-bonded interactions at the interface of the two EF loops and the ionic interactions at the Ca2+-binding sites changed the binding characteristics of EF-1 in the designed protein, even though no EF-1 residues were altered. Allosteric communication arising from the coupling of structural changes at the Ca2+-binding sites helped regulate the binding characteristics of the proteins. Acting as an allosteric effector, the EF-2 mutation modulated the behavior of the EF-1 site. This intramolecular communication provides insight into how residues that participate in Ca2+ binding are dynamically coupled with other residues in the structure. We expect the structure-function relationships obtained from the analysis of EF-hand proteins to aid in building highly accurate predictive models of Ca2+ binding in EF-hand proteins. We also expect that a better understanding of the interactions made by all residues, not only those at the active sites but across the entire EF-hand motifs, will help in building machine-learned models capable of predicting the effects of cooperativity. Moreover, accurate protein interaction networks derived from high-resolution structural data can further enable the design of proteins that mimic EF-hand motif functions.