Computational evidence of a new allosteric communication pathway between active sites and putative regulatory sites in the alanine racemase of Mycobacterium tuberculosis

Alanine racemase, a popular drug target from Mycobacterium tuberculosis, catalyzes the biosynthesis of D-alanine, an essential component of bacterial cell walls. Using elastic network models of alanine racemase from Mycobacterium tuberculosis, we show that the mycobacterial enzyme fluctuates between two previously undescribed states: a closed state and an open state. A previous experimental screen identified several drug-like lead compounds against the mycobacterial alanine racemase, whose inhibitory mechanisms are not known. Docking simulations of these inhibitor leads onto enzyme conformations obtained from the dynamics of the enzyme provide the first clues to a putative regulatory role for two new pockets targeted by the leads. Further, our results implicate the movements of a short helix in the communication between the new pockets and the active site, indicating allosteric mechanisms for the inhibition. Based on our findings, we theorize that catalysis is feasible only in the open state. The putative regulatory pockets and the enzyme fluctuations are conserved across several alanine racemase homologs from diverse, mostly pathogenic, bacterial species, pointing to a common regulatory mechanism important in drug discovery.

Author summary
In spite of the discovery of many inhibitors against the TB-causing pathogen Mycobacterium tuberculosis, only a very few have reached the market as effective TB drugs. Most of the marketed TB drugs induce toxic side effects in patients, as they non-specifically target human cells in addition to pathogens. One such TB drug, D-cycloserine, targets the pyridoxal phosphate moiety non-specifically, regardless of whether it is present in the pathogen or in the human host enzymes. D-cycloserine was developed to inactivate alanine racemase in the TB-causing pathogen. Alanine racemase is a bacterial enzyme essential in cell wall synthesis. Serious side effects caused by TB drugs like D-cycloserine lead to patients' non-compliance with the treatment regimen, often causing fatal outcomes. Current drug discovery efforts focus on finding specific, non-toxic TB drugs. Through computational studies, we have identified new pockets on the mycobacterial alanine racemase and show that they can bind drug-like compounds. The location of these pockets away from the pyridoxal phosphate-containing active site makes them attractive target sites for novel, specific TB drugs. We demonstrate the presence of these pockets in alanine racemases from several pathogens and expect our findings to accelerate the discovery of non-toxic drugs against TB and other bacterial infections.

Tuberculosis is one of the top 10 causes of mortality globally and, according to the latest available estimates, 10.4 million people developed this disease in 2016, of whom 4.9 million were infected with multidrug-resistant TB strains (MDR-TB) [1]. The prevalence of MDR-TB and extensively drug-resistant tuberculosis (XDR-TB) necessitates the inclusion of novel anti-tubercular therapies and strategies in the treatment of TB. A treatment regimen comprising the simultaneous use of multiple drugs is the current strategy in practice [2]. Despite the implementation of this strategy, TB mortality rates have not abated. Therefore, efforts to eradicate the TB pandemic have been stepped up globally through research oriented towards finding new drugs against the tubercle bacilli [3].

Alanine racemase (EC 5.1.1.1; Alr), an essential bacterial enzyme [4], is a popular drug target due to the absence of human homologs. The enzyme catalyzes the interconversion of L- and D-alanine and requires pyridoxal 5'-phosphate (PLP) as a cofactor. PLP is covalently attached to the enzyme through an internal Schiff's base linkage [5]. In the L to D direction, the enzyme catalyzes the formation of D-alanine, an essential component of D-alanyl-D-alanine found in the peptidoglycan layer in bacterial cell walls [5]. In some bacteria, including Escherichia coli [6], Salmonella typhimurium [7] and Pseudomonas aeruginosa [8], there are two Alr isozymes, Alr1 and Alr2 (also known as DadX), responsible for the anabolic and catabolic functions, respectively.

The catalytically active form of Alr is a dimer [9], because residues from both monomers participate in forming a functional active site. A narrow passage from the exterior forms an entryway to the substrate binding cavity in the active site and is lined by conserved residues, some of which have been demonstrated to orient the substrate molecules during their entry into the active site [10,11]. In AlrMtb, the substrate binding cavity is a small, conical space gated by two tyrosine residues (inner gates), which restrict the entry of substances into the active site [12]. Carboxylates such as acetate and propionate, and substrate analogs such as alanine phosphonate, co-crystallize in the substrate binding cavities of alanine racemases [13-15] and are suggested to regulate catalysis by competitive inhibition, though the exact control mechanisms are not known [16].

Including the structure of AlrMtb [12], there are around a dozen and a half unique alanine racemase structures in protein databases [13,17-23]. Though there has been considerable interest in elucidating the detailed catalytic mechanism of D- to L-alanine racemization in several organisms [5,10,24,25], the regulatory aspects of catalysis have received little attention. In spite of the discovery of a plethora of inhibitors against pathogenic Alr [26-28], only one of them has reached the market as a TB drug. This drug (D-cycloserine) is a structural analog of D-alanine and binds non-specifically to all PLP-containing enzymes, including those in the host, inducing toxic side effects [29]. Current drug discovery efforts focus on finding safer, selective, non-substrate inhibitors. Several inhibitors of Alr are non-substrate leads, whose target sites on the enzyme are not known. Of these, five were shown to be non-toxic to mammalian cells in a high-throughput screen for anti-tubercular small molecule inhibitors [28]. Until now, there have been no studies concerning the binding sites of these five drug-like leads (Fig 1) on the enzyme. Considering the numerous hurdles in culturing M. tb and the urgency in developing novel drugs to contain the superbug strains, we sought to determine the target sites of these leads through computational studies.

In recent years, normal mode analysis (NMA) has been widely used in probing large-scale, collective motions of proteins and has been increasingly utilized to characterize the dynamic aspects of enzymes [30-32]. In particular, elastic network model (ENM) based NMA has been useful in studying the intrinsic dynamics of slow protein motions over longer timescales [33,34]. Computationally, generating elastic network models of diverse protein conformations is less expensive than running molecular dynamics (MD) simulations [35]. In enzymes, ENM-NMA-predicted global motions represent biologically relevant functional motions and have been shown to include local fluctuations, such as loop movements, that are essential in catalysis [36]. We searched ENM-based AlrMtb conformations for target sites of lead inhibitors through multiple, robust search algorithms using a blind docking (BD) strategy. BD remains a common choice in the discovery of novel, allosteric binding sites [37,38]. In conjunction with pocket search tools, BD is capable of identifying new functional pockets on the target protein [39]. This strategy helped us in the successful identification and validation of new pockets in AlrMtb. Further to the above investigations, a comparative study of the intrinsic dynamics of Alr homologs with the help of a range of computational tools helped us gain new insights into the regulatory aspect of D-alanine synthesis.
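To make the ENM-NMA idea above concrete, the following is a minimal sketch of an anisotropic network model built on C-α coordinates. It is an illustration under stated assumptions, not the force field or software used in this work: the function names, the 15 Å cutoff and the uniform spring constant `gamma` are all hypothetical choices.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build a 3N x 3N anisotropic network model Hessian.

    coords: (N, 3) array of C-alpha positions in angstroms.
    cutoff and gamma are illustrative assumptions, not values from the paper.
    """
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue  # residues too far apart share no spring
            # Off-diagonal 3x3 block for the spring between i and j.
            block = -gamma * np.outer(d, d) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal blocks accumulate the negative of the off-diagonal ones.
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def normal_modes(hessian, n_trivial=6):
    """Return eigenvalues and eigenvectors, skipping the rigid-body modes."""
    vals, vecs = np.linalg.eigh(hessian)  # eigenvalues in ascending order
    return vals[n_trivial:], vecs[:, n_trivial:]
```

The lowest non-trivial eigenvectors returned by `normal_modes` correspond to the slow, collective motions discussed above; displacing the structure along one of them is one simple way to generate alternative conformations for docking.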

All-atom normal mode analysis

The putative regulatory pockets are conserved across homologs
The crystal structure of AlrMtb is a kidney-shaped dimer, with two active site cavities opening on the convex side (Figs 2A, 2C and 2E) and two pockets located on the concave side (Figs 2A and 2D). Residues found to be missing (Fig 2B) in the crystal structure were from both internal and terminal regions. The internal stretches of missing residues (176-180 of subunit A and 266-280 of subunit B) were located in the same region, i.e., the mouth of the first active site cavity (Fig 2E).

Fig 2 Structure of alanine racemase from Mycobacterium tuberculosis.
A. Molecular surface representation of the structure of alanine racemase (monomers A and B shown in green and cyan, respectively). The magnified region shows the putative dimer interface groove (DIG) pocket on the dimer interface. B. Unresolved regions in the crystal structure of Alr, indicated by different colours in the cartoon representation of the enzyme (missing N-terminus, yellow; missing C-terminus, blue; missing internal stretches, red). C. TIM-barrel of active site 2 showing the cofactor PLP (red sticks) covalently attached to the catalytic residue Lys44 (green sticks). Note that the active site is composed of residues from both monomers (the B monomer is shown in cyan and residues from the A monomer are coloured green). D. Surface representation of the enzyme showing the putative regulatory sites (yellow). E. Surface representation of the enzyme showing tiny pockets (pink) flanked on either side by the active site cavities (red). (Due to the revision in UniProt sequence information, the residue numbers given in this work should be decremented by 2 for comparison with the numbering provided in LeMagueres et al., 2005 [12]. For example, residues 176-180 in our work correspond to residues 174-178 in LeMagueres et al., 2005 [12].)
Alignment of the protein sequences of Alr homologs (Figs 3, S1 and S2) revealed highly similar residues in the newly identified regions (described later): the dimer interface groove region (Fig 3B), the putative regulatory sites (Fig 3C) and a short helix (Fig 3D). On the other hand, the N-termini of the homologous Alr were of different lengths and were dissimilar in sequence composition (Fig 3A). Despite the presence of termini in their sequences, 8 of the crystal structures of the homologs lacked either the N-terminus (3-15 residues) or the C-terminus (1-6 residues) or both. Of the remaining structures, eight were complete and showed disordered coils in their termini. Both PSI-PRED (a secondary structure predictor based on position-specific scoring matrices of unique fold libraries) and Phyre2 (a protein structure modeller based on a combination of ab initio and template-based strategies) generated highly disordered coils in the terminal regions

homologs. In some of the homologs, the putative lid regions were shorter than those in M. tb. In the closed conformations of such homologs, the pockets were not completely covered.

Other residues of the two pockets exhibit a unique arrangement in the sequence and are placed side by side in an alternating fashion (Fig 3C). Consequently, the adjacent residues in the structure belong to one of the two pockets and assume opposing states at any given instant during the dynamics. Therefore, the tiny and the RS pockets may be fulfilling opposing roles in regulation.

Concordantly, higher deformation energies were seen in the pivot residues (96, 141, 146, 261 and 263) of the dimer interface pocket region (DIG pocket region), part of which is the far C-terminal region (Fig 7).
Apart from this hinge-like region, the putative terminal lid regions, the catalytic tyrosine and the short helix region showed higher deformation energy peaks, signifying greater local flexibility (Fig 7).

homologs, refer to Text S1. Across the homologs, the conserved residues constituting the invariant core of the enzyme were found clustered around the same amplitude (Fig S4-A). Moreover, the alignment positions displaying partial conservation of residues also fluctuated more or less to the same extent (Fig S4-B).

subsp. tengcongensis, which is a remote homolog of AlrMtb (sequence identity = 28.6%), shows 99.84% similarity in dynamics, as measured by the Bhattacharyya coefficient (BC). Though the differences between the RMSIP scores were more pronounced than those of BC (Table 3), the latter is generally considered to be a better index for assessing the similarity of dynamics, as it incorporates eigenvalues. It is to be noted that RMSIP does not represent the energetic separation between the modes in the sets [43]. Sequence and structural similarity measures, such as RMSD values, scored lower than dynamics similarity measures such as RMSIP and BC values (Table 3), indicating that the conservation of dynamics far exceeds the sequence and structural conservation in alanine racemases.

Exploring inhibitor binding sites on Alr
(Table S1)). The aromatic ring side of L2-04 was often found in pi-stacking interactions between the inner gate residues, Tyr366 and Tyr273' (the prime indicates that the residue belongs to the opposite monomer), while its tail formed hydrogen bonds with the cofactor in the substrate binding cavity of AlrMtb. The substrate binding cavity measures 5.5 × 5.0 × 2.5 Å³ and accommodates the substrate, L-alanine. Many guest substrates, substrate analogues and inhibitors such as acetate, propanoate, L-alanine phosphonate, lysine and D-cycloserine have been reported to occupy this cavity in homologs [13-15, 17, 44]. In the crystal structure of a thermostable Alr of a novel thermophile, Caldanaerobacter subterraneus subsp. tengcongensis [21], the substrate is found between the catalytic residues, Lys40 and Tyr268 (equivalent to Lys44 and Tyr273 in AlrMtb), in the substrate binding cavity and forms hydrogen bonds (2.7 Å) with the catalytic tyrosine. We found that the substrate, alanine (Fig S12)

Superposition of the open and closed states, both of whose RS pockets were bound with high-affinity inhibitor poses (Fig 10A), clearly demonstrated the twisted active site cavity in the closed state. Docked poses of the bound inhibitors were observed to interact with the charged RS pocket residues Arg378 or Asp46, or both, and such interactions appear to drive the pull experienced by the short helix, H2 (Y48-G47-D46-A45-K44), seen in the normal mode motions. Such a movement of the short helix (Fig 10B) between the active site cavity and the RS pocket leads to the expansion and contraction of the active site cavities, as observed in the conformations of LF8. The cofactor PLP, which is covalently linked to the catalytic residue Lys44 on the inside of the TIM-barrel of the active site cavity (Fig 2C), would be dragged along with Lys44 towards the periphery of the active site cavity. As a result, the orientation of the catalytic residues would be lost. Tyr48, which walls the substrate binding cavity on one side through its side chain, forms the other end of the short helix and would therefore also be displaced, leading to the rearrangement of the substrate binding cavity (Fig 10B).
Thus, the dynamic interactions between the inhibitor and the enzyme residues, viz., Arg378---Asp46 (

ranks all the three middle residues of the short helix as highly flexible residues in the order, Glycine > Serine > Alanine, Aspartic acid and Asparagine. This result is in agreement with the need for higher conformational flexibility in the short helix residues in order to move between the RS pocket and the active site upon inhibitor binding. Supporting the above results, NMA studies show that the deformation energies of the short helix residues are higher than those of the surrounding structure, indicating higher local flexibility (Fig 7). Generally, in TIM-barrel structures, there is a repetition of 8 alternating α helices and β strands. In alanine racemases, however, the arrangement of the active site TIM-barrel is as follows: α1-β1-α2-α3-β2-α4-β3-α5-β4-α6-β5-α7-β6-α8-β7-α9-α10-β8. It appears that the short helix H2 (α2) is an additional insertion into the conventional TIM-barrel arrangement (most likely by the splitting of the original second helix into α2 and α3), an event that probably arose later in evolution in order to carry out allosteric regulation.

(Table 4). In contrast, the active site entrance was twisted and closed in the closed state, rendering the entryway (active site entrance) inaccessible. In such a shut active site cavity, catalysis is not feasible. Therefore, we reason that the open state is catalytically active.

Table 4 Hydrogen bonds between L-alanine and alanine racemase residues

\hat{K}_{AA} = K_{AA} - K_{AQ} K_{QQ}^{-1} K_{QA} \qquad (1)

where K_{AA} represents the sub-matrix of K corresponding to the aligned C-α atoms, K_{QQ} the sub-matrix for the gapped regions, and K_{AQ} and K_{QA} the sub-matrices relating the aligned and gapped sites [65]. The normal modes of the individual structure in the ensemble can then be obtained by solving the eigenvalue problem,

V^{T} \hat{K}_{AA} V = \lambda \qquad (2)

where V is the matrix of eigenvectors and λ, the associated eigenvalues.
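The reduction to aligned sites described here can be sketched numerically. The snippet below is a hedged illustration, assuming the effective Hessian is the Schur complement K_AA − K_AQ K_QQ⁻¹ K_QA of the gapped block; the helper names `dof_indices` and `effective_hessian` are hypothetical and do not come from the software used in this work.

```python
import numpy as np


def dof_indices(atom_indices):
    """Expand atom indices into the x, y, z row/column indices of a 3N Hessian."""
    atoms = np.asarray(atom_indices)
    return (3 * atoms[:, None] + np.arange(3)).ravel()


def effective_hessian(k, aligned_dofs):
    """Effective Hessian for the aligned sites (assumed Schur complement form).

    k: full (3N, 3N) Hessian of one structure.
    aligned_dofs: coordinate indices of the aligned C-alpha atoms.
    """
    aligned = np.asarray(aligned_dofs)
    gapped = np.setdiff1d(np.arange(k.shape[0]), aligned)
    k_aa = k[np.ix_(aligned, aligned)]
    if gapped.size == 0:
        return k_aa  # nothing to integrate out
    k_aq = k[np.ix_(aligned, gapped)]
    k_qa = k[np.ix_(gapped, aligned)]
    k_qq = k[np.ix_(gapped, gapped)]
    # Solve K_QQ x = K_QA instead of forming the explicit inverse.
    return k_aa - k_aq @ np.linalg.solve(k_qq, k_qa)


def aligned_modes(k, aligned_dofs):
    """Eigenvalues and eigenvectors of the effective Hessian."""
    return np.linalg.eigh(effective_hessian(k, aligned_dofs))
```

Because every structure in the ensemble is reduced to the same set of aligned sites, the resulting eigenvectors have equal length across homologs, which is what makes mode-by-mode comparison possible.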
In order to analyze the flexibility profile of the mycobacterial racemase, cross-correlations of residue fluctuations and deformation energy profiles were generated on the filtered normal mode data. Across homologs, the alanine racemase motions along the selected normal modes were compared with the help of similarity measures, viz., RMSIP and the Bhattacharyya coefficient.

where v_i represents the i-th eigenvector, λ_i the corresponding eigenvalue, and N the number of C-α atoms in the protein structure (3N−6 non-trivial modes). As formulated by Fuglebakk et al. [69], the Bhattacharyya coefficient can then be written as,

Table S1. Results of docking simulation runs of substrate and inhibitors on the ensemble conformations of alanine racemase from Mycobacterium tuberculosis.
Table S2. Properties of pockets of NMA ensemble conformations as calculated on the CASTp server.

Flexibility measures to assess Alr
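As a rough illustration of one of the similarity measures used in this work, the root mean square inner product (RMSIP) between two sets of normal modes can be computed as below. This is a sketch under assumptions: the function name is hypothetical, the modes are taken to be orthonormal columns defined on the same aligned sites, and the default of ten modes is a common convention rather than a value stated in the text.

```python
import numpy as np


def rmsip(modes_a, modes_b, n_modes=10):
    """Root mean square inner product between two mode subspaces.

    modes_a, modes_b: (3N, M) arrays whose columns are orthonormal
    eigenvectors, ordered from lowest to highest frequency.
    n_modes: number of leading modes compared (assumed default).
    """
    a = modes_a[:, :n_modes]
    b = modes_b[:, :n_modes]
    overlaps = a.T @ b  # all pairwise inner products between the two sets
    return float(np.sqrt(np.sum(overlaps ** 2) / a.shape[1]))
```

A value of 1 means the two subspaces coincide and 0 means they are orthogonal. Since the score uses eigenvectors only, it carries no information about how the modes are separated energetically, which is the limitation that eigenvalue-weighted measures such as the Bhattacharyya coefficient address.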
Text S1. Multiple sequence alignment of Alr homologs utilized in ensemble NMA (shown with alignment positions).
Movie S1. Secondary structure representation of normal mode number 8 in AlrMtb. The N-terminal putative lid-like region is shown in red. Helices H3 (yellow) and H4 (violet) undergo displacements.