IDH1 and IDH2 mutants identified in cancer lose inhibition by isocitrate because of a change in their binding sites

IDH1 and IDH2 are human enzymes that convert isocitrate (ICT) into α-ketoglutarate (AKG). However, mutations at positions R132 of IDH1 and R140 and R172 of IDH2 cause these enzymes to convert AKG into 2-hydroxyglutarate (2HG). Accumulation of 2HG in the cell is correlated with the development of cancer. This change in activity is mainly due to the loss of competitive inhibition of these enzymes by ICT, but the molecular mechanism behind this loss of inhibition is currently unknown. In this work, we characterized the inhibition and loss of inhibition of IDH1 and IDH2 using binding energies derived from molecular docking calculations. We characterized the substrate binding sites, and how they differ between the mutant and wild-type enzymes, using a Jaccard similarity coefficient based on the residues involved in binding the substrates. We found that molecular docking effectively identifies the inhibition by ICT in the wild-type enzymes and in the mutant enzymes that do not appear in tumors, and the loss of inhibition in the mutant enzymes that appear in tumors. Additionally, we found that the binding sites of the mutant enzymes differ among themselves. Finally, we found that the regulatory segment of IDH1 plays a prominent role in the change of binding sites between the mutant enzymes and the wild-type enzymes. Our findings show that the loss of inhibition is related to variations in the enzyme binding sites. They also suggest that a drug capable of targeting all IDH1 and IDH2 mutations in cancer is unlikely to be found, because of the significant differences among the binding sites of these paralogs. Moreover, the methodology developed here, which combines molecular docking calculations with binding site similarity estimation, can be useful for engineering enzymes, for instance when aiming to modify the substrate affinity of an enzyme.

Structures without substrate
In the IDH1 R132H WS structure obtained from the PDB (PDB ID: 3MAR), some of the unresolved residues were considered relevant for protein-ligand molecular docking because of their possible proximity to the binding site. An additional structure of IDH1 R132H WS containing all its residues was therefore modeled on the structure of IDH1 R132H WS and named IDH1 R132H Modeled (Mod.) WS. The modeling protocol was the same as for the IDH2 WT OF model. Because IDH1 R100Q WS was not available in the PDB, it was modeled by introducing mutations into the structure of IDH1 R132H WS. The structure was mutated (R100Q and H132R) on all chains with the Maestro Suite Platform, and the mutated residues were minimized using the Prime tool of Maestro. IDH2 R140Q WS (PDB ID: 5SVO) and IDH2 R172K WS (PDB ID: 5SVN) were available in the PDB.
We aimed to compare the binding sites of the open, intermediate and closed structures among the WT and mutant IDH1 and IDH2 enzymes. The secondary structure of the regulatory segments was annotated using the Maestro Suite Platform (Table 3). [...] each subunit. This step was essential to identify the OC and the SC of each structure, so that the binding sites and binding energies of the complexes were compared among clefts of the same kind. The residues flanking the clefts in IDH1 were obtained from the literature, and the residues in IDH2 and porcine IDH2 were inferred by sequence alignment with IDH1 (Table 4). The back cleft is flanked by residues of the same subunit, whereas the active site cleft is flanked by residues on opposite subunits, so we noted whether the flanking residues belonged to the same chain (S. Ch.) or the opposite chain (O. Ch.). The width was measured as the distance between the α-carbons of the flanking residues. The Root Mean Square Deviation (RMSD) between both subunits of each structure was also calculated in order to classify the structures as symmetric or asymmetric. These measurements were used for later analyses of the binding sites. RMSD values were calculated based on the α-carbons of the structures using Protein3Dfit (Lessel & Schomburg, 1994).
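The cleft width measurement described above reduces to a Euclidean distance between two α-carbon positions. The following is a minimal sketch of that calculation only; the coordinates and the function name `cleft_width` are hypothetical and are not taken from the structures or tools used in this work.

```python
import math


def cleft_width(ca_first, ca_second):
    """Distance in Å between the α-carbons of two flanking residues."""
    return math.dist(ca_first, ca_second)


# Hypothetical α-carbon coordinates (Å) for one pair of flanking residues.
ca_first = (12.0, 4.0, -3.0)
ca_second = (15.0, 8.0, -3.0)

width = cleft_width(ca_first, ca_second)
print(f"cleft width: {width:.1f} Å")
```

In practice the two coordinates would be read from the α-carbon records of the flanking residues listed in Table 4, taken from the same chain or from opposite chains depending on the cleft.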
Molecular docking
A validation step was necessary to establish the suitability of the molecular docking algorithms for the enzyme-substrate complexes under study. The validation consisted of removing co-crystallized molecules from protein complex structures and re-docking them into their original site, then testing whether the docked pose was equivalent to the original pose (Warren et al., 2006). The molecular docking algorithms AutoDock Vina (Morris et al., 2009) and Glide XP (Schrödinger, 2015) were validated by redocking ICT on IDH1 WT CF and AKG on IDH1 R132H CF (data not shown). Glide XP outperformed AutoDock Vina and was therefore selected for the molecular docking simulations reported in this work.
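One common way to test whether a re-docked pose is equivalent to the crystallographic pose is an in-place RMSD over matched ligand atoms, computed without superposition because both poses share the receptor frame. The sketch below illustrates that idea under stated assumptions: the coordinates are hypothetical, the 2 Å cutoff is a widely used convention rather than a value given in the text, and symmetry-equivalent atoms (such as carboxylate oxygens) are not handled.

```python
import math


def pose_rmsd(reference, docked):
    """In-place RMSD (Å) between matched atoms of two poses, with no superposition."""
    if len(reference) != len(docked) or not reference:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(math.dist(r, d) ** 2 for r, d in zip(reference, docked))
    return math.sqrt(squared / len(reference))


# Hypothetical heavy-atom coordinates (Å); the re-docked pose is shifted 1 Å along z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

REDOCK_CUTOFF = 2.0  # Å; assumed convention, not a value reported in the text

rmsd = pose_rmsd(crystal, redocked)
pose_recovered = rmsd <= REDOCK_CUTOFF
print(f"RMSD = {rmsd:.2f} Å, recovered = {pose_recovered}")
```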
Receptor files for the molecular docking simulations were prepared with the "Protein Preparation Wizard" of the Maestro Suite Platform. All substrate structures were obtained from PubChem (Kim et al., 2016); the PubChem CID is 1198 for ICT and 51 for AKG. Ligand files were prepared using the "Ligand Preparation Wizard" of the Maestro Suite Platform. The grid size was set to 40 Å × 40 Å × 40 Å, and the grid was centered on the centroid of the binding residues in the closed form of the enzymes, calculated independently for each grid (Table 5). For all structures, binding residues were defined as the residues within 4 Å of the substrates, and the centroid was calculated using the α-carbons of the binding residues. The closed-form structures used to obtain the binding residues were IDH1 WT CF and IDH1 R132H CF. The binding residues were identified in all chains of both structures and included in the residue set used to define the centroid location. Interestingly, both closed structures had the same binding residues. The binding residues of IDH2 were mapped from IDH1 by sequence alignment.
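The grid-centering procedure above can be sketched in two steps: select the residues that have an atom within 4 Å of the substrate, then average the α-carbon coordinates of those residues to obtain the geometric center (centroid). This is an illustrative sketch, not the Maestro workflow; the atom records are hypothetical, and treating "within 4 Å" as "any residue atom within 4 Å of any substrate atom" is an assumption.

```python
import math

CUTOFF = 4.0  # Å, the distance used in the text to define binding residues


def binding_residues(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Residue ids with at least one atom within `cutoff` Å of any ligand atom."""
    return {
        res_id
        for res_id, _atom_name, xyz in protein_atoms
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms)
    }


def ca_centroid(protein_atoms, residues):
    """Mean position of the α-carbons (atom name 'CA') of the given residues."""
    cas = [xyz for res_id, name, xyz in protein_atoms
           if name == "CA" and res_id in residues]
    if not cas:
        raise ValueError("no α-carbons found for the selected residues")
    return tuple(sum(c[i] for c in cas) / len(cas) for i in range(3))


# Hypothetical records: ((chain, residue number), atom name, (x, y, z) in Å).
protein_atoms = [
    (("A", 100), "CA", (0.0, 0.0, 0.0)),
    (("A", 100), "NH1", (3.0, 0.0, 0.0)),
    (("A", 132), "CA", (4.0, 4.0, 0.0)),
    (("A", 132), "NH2", (5.0, 2.0, 0.0)),
    (("A", 200), "CA", (20.0, 20.0, 20.0)),
]
ligand_atoms = [(5.0, 0.0, 0.0)]

site = binding_residues(protein_atoms, ligand_atoms)
grid_center = ca_centroid(protein_atoms, site)
print(sorted(site), grid_center)
```

Note that a residue can belong to the binding site through a side-chain atom even when its α-carbon lies beyond the cutoff, which is why the selection and the centroid use different atom sets.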
Molecular docking comparisons
We used the Glide XP docking score as the binding energy of each docking simulation. For each receptor, we computed the difference between the binding energies of ICT and AKG (hereafter ΔΔG) in order to confirm that the mutant enzymes were, in effect, uninhibited when compared to the wild-type enzymes. The ΔΔG value of each mutant was then compared with the corresponding difference in the wild-type enzymes (hereafter ΔΔΔG). The lowest ΔΔG value registered for the WT enzymes in each cleft (LΔΔG) was used in these comparisons to increase the stringency of our analyses.
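The comparison scheme above can be illustrated with a short calculation. The sketch below rests on several assumptions that the text does not fix: ΔΔG is taken as the AKG score minus the ICT score (so a positive value means ICT binds more favorably), "lowest" is taken as the numerical minimum under that convention, ΔΔΔG is taken as the mutant ΔΔG minus LΔΔG, and all scores and receptor labels are hypothetical.

```python
def ddg(scores):
    """ΔΔG for one receptor: AKG score minus ICT score (assumed sign convention)."""
    return scores["AKG"] - scores["ICT"]


# Hypothetical docking scores (kcal/mol) for one cleft; more negative = tighter binding.
wild_type = {
    "WT_a": {"ICT": -9.0, "AKG": -6.0},
    "WT_b": {"ICT": -8.0, "AKG": -6.5},
}
mutant = {"ICT": -6.0, "AKG": -5.5}

# LΔΔG: the smallest ICT-over-AKG preference seen in any wild-type structure.
l_ddg = min(ddg(s) for s in wild_type.values())

# ΔΔΔG: how the mutant's preference for ICT compares with that reference.
dddg = ddg(mutant) - l_ddg

print(f"LΔΔG = {l_ddg}, mutant ΔΔG = {ddg(mutant)}, ΔΔΔG = {dddg}")
```

Under these conventions, a negative ΔΔΔG means the mutant favors ICT over AKG less than even the least ICT-selective wild-type structure does, which is the pattern expected for loss of inhibition. Using the minimum as reference makes the test harder to pass than using a mean would.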
The binding sites in the molecular docking assays were defined as the residues within 4 Å of the substrates. The similarity between binding sites was measured using the Jaccard Similarity Coefficient (JS) (Jaccard, 1912), as shown in Eq. 1. It was computed either for the binding sites of both substrates (Both S.), ICT and AKG, taken together, or for each substrate individually. The structure of IDH1 R132H CF was excluded because it presents the same binding residues as IDH1 WT CF.
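In its standard form, the Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|, that is, the number of shared residues divided by the number of distinct residues across both sites. A minimal sketch follows; the residue labels are hypothetical and do not correspond to the binding sites reported in this work, and the value returned for two empty sites is a convention assumed here.

```python
def jaccard(site_a, site_b):
    """Jaccard similarity between two binding sites given as collections of residue labels."""
    a, b = set(site_a), set(site_b)
    if not a and not b:
        return 0.0  # assumed convention: two empty sites are not counted as similar
    return len(a & b) / len(a | b)


# Hypothetical binding sites: residues within 4 Å of the docked substrate.
site_1 = {"R100", "R109", "R132", "Y139"}
site_2 = {"R100", "R132", "Y139", "K212", "D275"}

score = jaccard(site_1, site_2)  # 3 shared residues out of 6 distinct ones
print(score)
```

The coefficient ranges from 0 (no shared residues) to 1 (identical residue sets), so it depends only on residue identity and not on the geometry of the contacts.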

Reported mutations for IDH1 and IDH2
Substitutions reported in tumours for positions IDH1 R100, IDH1 R132, IDH2 R140 and IDH2 R172 were considered for identifying the most common mutation per position (Table 6). A total [...]

Secondary structure of the regulatory segment
IDH1, and by homology IDH2, present a self-regulating mechanism that blocks the conversion of ICT into AKG when the concentration of ICT is low. The segment of the enzyme participating in this mechanism, the regulatory segment, blocks the access of the substrate to the catalytic residues, and can only be displaced when a sufficient concentration of the substrate is reached (Xu et [...]
Mutation IDH1 R132H increases the flexibility of the regulatory segment, hindering its regulatory function. This increased flexibility also prevents the regulatory segment from being resolved by X-ray crystallography, as happens in the structure IDH1 R132H WS (Yang et al., 2010). As Dang et al. showed, the absence of the regulatory segment from the binding site creates a new binding site (Dang et al., 2009). Because of the importance of the secondary structure of the regulatory segment, we annotated the secondary structure of all its residues in all the studied structures (Table 3).
The following findings are worth considering:
- The IDH1 R132H IF regulatory segment has some unresolved residues.
- The IDH2 R140Q WS and IDH2 R172K WS regulatory segments present an α-helix structure, in agreement with the observations above that the differences between cleft widths of the same kind and the RMSD values between chains are equivalent to those of the known closed forms.
- The IDH1 R132H Mod. WS regulatory segment has a loop structure, as expected, given that it was modelled using the IDH1 R132H WS structure, in which most of the regulatory segment is structurally unresolved.
Molecular docking binding energies
We docked both substrates, ICT and AKG, into the active sites of the open forms of the WT enzyme structures as well as into the WS forms of the mutant enzymes. We then evaluated their binding energies (i.e. ΔG) in order to confirm that the difference in binding energy between the substrates (ΔΔG) was smaller in the mutant enzymes (IDH1 R132H WS, IDH2 R140Q WS and IDH2 R172K WS) than in the wild-type and IDH1 R100Q WS structures, which would explain the loss of inhibition by ICT in the former group of mutants. We used the lowest ΔΔG of the wild-type complexes (LΔΔG) for each cleft as reference, independent of the enzyme studied (IDH1 or IDH2), to increase the stringency of our calculations ( [...]
We identified the residues in the binding sites that bind each substrate (Figure 2) and compared them using the Jaccard similarity index (Figure 3) to define a binding site similarity (BSS). We did not attempt to identify a BSS threshold above which sites are similar with statistical significance. Instead, we defined a threshold of 0.5 to separate two broad populations of comparisons, and considered only those with BSS greater than 0.5 in our discussion of the results, noting that 15.6% of all comparisons fell in this group. We also aimed to identify binding sites among the mutant enzymes that were similar to the binding sites of known functional structures: IDH1 WT CF, IDH1 R132H IF, IDH1 WT OF and IDH2 WT OF.

Discussion
Contrasts between IDH1 R100Q WS and IDH1 R132H WS binding sites
The IDH1 R100Q WS structure was modelled by introducing two mutations (R100Q and H132R) on both chains of IDH1 R132H WS. Although small structural changes were expected between the IDH1 R100Q WS and IDH1 R132H WS binding sites, significant variations were detected in both the binding energies and the binding residues of these structures. The binding energies obtained from our molecular dockings indicated that IDH1 R100Q retains its inhibition by ICT, and thus it is not expected to produce 2HG. Although to our knowledge there are no functional characterizations of IDH1 R100Q, another IDH1 R100 mutant, IDH1 R100A, has been shown to produce 2HG (Ward et al., 2012). However, nucleotide substitutions resulting in alanine mutations are rarely found in tumours and, in addition, alanine is not among the amino acid substitutions at IDH1 R100 that can be reached with a single nucleotide substitution. Instead, the most common mutation of IDH1 R100 in cancer is IDH1 R100Q (Table 6). This is due to the presence of a CpG site at IDH1 R100. It is well established that CpG (CG) sites are prone to mutating into TG sites (Cheng & Blumenthal, 2011). It is interesting to note that the two reported mutations at IDH1 R100 are CG -> TG changes in the sense (R100X) and antisense (R100Q) strands.
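The codon argument above can be checked with the standard genetic code. The sketch below assumes that the R100 codon is CGA; this is not stated in the text, but CGA is the only CGN arginine codon for which a C -> T change at the first position gives a stop codon (R100X) and a G -> A change at the second position gives glutamine (R100Q). A CG -> TG change on the antisense strand reads as CG -> CA on the sense strand.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_substitutions(codon):
    """Amino acids (or '*' for stop) reachable by changing exactly one nucleotide."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                reachable.add(CODON_TABLE[codon[:pos] + base + codon[pos + 1:]])
    return reachable


codon = "CGA"  # assumed R100 codon, inferred as explained above

sense_mutant = "T" + codon[1:]                # CG -> TG on the sense strand: TGA
antisense_mutant = codon[0] + "A" + codon[2]  # CG -> TG on the antisense strand: CAA

print(CODON_TABLE[codon], CODON_TABLE[sense_mutant], CODON_TABLE[antisense_mutant])
print(sorted(single_substitutions(codon)))
```

Under this assumption, alanine (codons GCN) is absent from the set of outcomes reachable by a single substitution, in line with the statement that R100A requires more than one nucleotide change.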
The binding sites of the mutant enzymes are different among themselves
One of the expected results of this work was a high similarity between the binding sites of IDH1 R100Q WS and IDH2 R140Q WS, and between those of IDH1 R132H WS and IDH2 R172K WS, given that the mutations occur at analogous positions. However, our results show large differences among these binding sites, as shown in Figure 3. Even though the binding sites are very different, all of these mutants, with the exception of IDH1 R100Q, produce 2HG. Moreover, it has been reported that mutating the positions analogous to IDH1 R132 in isocitrate dehydrogenase enzymes leads to 2HG production in at least two yeast species (Song et al., 2014).
Differences among mutants are found not only in the binding sites but also in other structural features. The regulatory segment is unresolved in IDH1 R132H WS, so presumably it behaves as a flexible loop. However, the regulatory segments in IDH2 R140Q WS and IDH2 R172K WS are fully folded into α-helices, as in the WT closed-form enzyme (Table 3). It could be that the regulatory segment in IDH2 does not need ICT or AKG to fold into an α-helix and successfully close the enzyme. This observation is also consistent with other structural features of the IDH2 mutants (differences between cleft widths of the same kind and RMSD values between chains), which are similar to those of the closed-form enzymes ( [...]

The importance of the regulatory segment
Mutant binding sites are deeply influenced by the behavior of the regulatory segment, which either exposes a previously inaccessible site (IDH1 R100Q WS and IDH1 R132H WS), is part of the binding site, or helps the change in enzyme form (IDH2 R140Q WS and IDH2 R172K WS). However, a regulatory segment like the one in IDH1 has not previously been characterized in enzymes that are not IDH1 homologs. The regulatory segment is in the β-sandwich domain of the enzyme, while the active site is in the Rossmann domain. This means that the regulatory segment is not necessarily exclusive to enzymes with the Rossmann domain or to enzymes with isocitrate dehydrogenase activity.
Conclusion
Our methodology, i.e. characterization of the loss of inhibition through molecular docking, proved to be a successful approach for our system. Binding energies reported by molecular docking were consistent with the known inhibition/loss of inhibition phenotypes of the different enzymes