Structural and functional characterization of Rv0792c from Mycobacterium tuberculosis: identifying small molecule inhibitors against GntR protein

In order to adapt in host tissues, microbial pathogens regulate their gene expression through an array of transcription factors. Here, we have functionally characterized Rv0792c, a GntR homolog from M. tuberculosis. In comparison to the parental strain, ΔRv0792c mutant strain of M. tuberculosis was compromised for survival upon exposure to oxidative stress, cell wall agents and infection in guinea pigs. RNA-seq analysis revealed that Rv0792c regulates the expression of genes that are involved in stress adaptation and virulence of M. tuberculosis. Solution small angle X-ray scattering (SAXS) data steered model building confirmed that the C-terminal region plays a pivotal role in dimer formation. Systematic evolution of ligands by exponential enrichment resulted in identification of ssDNA aptamers that can be used as a tool to identify small molecule inhibitors targeting Rv0792c. Using SELEX and SAXS data based modelling, we identified residues essential for the DNA binding activity of Rv0792c and I-OMe-Tyrphostin as an inhibitor of Rv0792c aptamer binding activity. Taken together, we provide a detailed shape-function characterization of GntR family of transcription factors from M. tuberculosis. To the best of our knowledge, this is the first study that has resulted in the identification of small molecule inhibitors against GntR family of transcription factors from bacterial pathogens.


INTRODUCTION
The models of unliganded (His)6-Rv0792c and ssDNA aptamers with residue/base level details were generated using the primary structures of the protein and aptamers. The SWISS-MODEL server was used to search for structural templates of Rv0792c (https://swissmodel.expasy.org) (31). The results provided the putative template for Rv0792c from residues 53-286 of lin2111 from Listeria innocua [PDB ID: 3EDP]. The amino-terminal residues 1-52 and the carboxy-terminal segment comprising residues 287-303 were generated using molecular dynamics of the segments. MD simulation studies were performed using the Tinker molecular modeling package v 4.2 along with the OPLS-UA force field. The advanced Newton-Raphson method was employed to compute structures of the segments at 298 K in implicit water (ε = 80). Simulations were run for 10 ns with restart coordinates written every 1 ps. The predominant low energy structures were filtered out as described previously (32,33). SAXS data supported a dimeric state of unliganded Rv0792c; thus, using the dimer as a central scaffold, two copies of the predominant low energy conformations of the N- and C-terminal segments were aligned in space using the SASREF program, as reported previously (34). The composite structure of Rv0792c was generated and energy minimized by performing template-based modeling using the SWISS-MODEL server. The ELNEMO server was used to compute low frequency collective vibrations accessible to the protein structure (35). ssDNA aptamers were modeled using their sequences and the ICM 3.8 program. SAXS data supported their masses to be close to those of their monomers, and thus monomeric forms of all three aptamers were considered for modeling studies. Repeated runs of global minimization and local optimization were performed until the conformations did not change by more than 0.01 RMSD across all atoms. 
By comparing theoretical SAXS profiles of the ten lowest energy conformations of aptamers with experimental SAXS data on the aptamers, the conformation of each aptamer agreeing best with the experimental data was identified. Further, the ELNEMO server was used to compute the most collective low energy vibration mode to compare with SAXS data-based information and shape. These structures of ssDNA aptamers were docked on the structure of dimeric Rv0792c protein, and their pose on the protein was identified using SAXS data of the complexes as reference. The graphs pertaining to SAXS data analysis were prepared using OriginLab v5 software. The images of molecular models were prepared using open-source PyMOL v 1.1 and UCSF Chimera v 1.14.
In silico screening of drug-like molecules. Molecular docking studies were performed using ICM Chemist Pro software v3.8. Using the interaction distance mapping option, all residues within 3 Å of the interacting surface of Rv0792c to aptamers, from the models of protein:aptamer complexes, were selected. From both chains of Rv0792c, residues 224-231, 253-257, 279-289 and 300-303 were selected to form the aptamer binding site. The library of approved drugs from drugbank.ca was used for docking studies, and the docking was done in an automated manner. Full degrees of freedom and rotations were given to the ligand while evaluating docking poses on the identified receptor surface. The docking was carried out individually with each ligand, and its various poses with respect to the receptor pocket's charge and shape profile were calculated. The scores of the docked poses were then arranged from low to high, and the ten lowest scoring ligands were further selected for optimizing receptor residues around the low-score pose of the ligand to obtain a new score. 
The shortlisted ligands were further arranged according to the 4D docking score.
Next, we performed competitive ALISA to determine the ability of the top two hits to compete with the binding of aptamer to Rv0792c. The coating of the wild-type protein and blocking of non-specific sites were performed as described above. Subsequently, the binding of aptamer was determined in the presence or absence of the top two small molecule inhibitors. For IC50 determination, inhibition assays were performed in the presence of 2.0-fold serial dilutions of the small molecules. IC50 values were calculated as the drug concentration that showed 50% inhibition of aptamer binding to Rv0792c.
Data availability: The raw data files for RNA-seq experiments have been deposited at NCBI: PRNJA727912. The data pertaining to unliganded protein, aptamers and protein-aptamer complexes are available at https://www.sasbdb.org/project/1396/kv4wukfdsj.
Statistical analysis. GraphPad Prism 8 software (version 8.4.3, GraphPad Software Inc., CA, USA) was used for statistical analysis and graph generation. Differences between the indicated groups were calculated using the 't-test' function and were considered significant at a P-value of <0.05.
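The IC50 definition given above (the concentration at which aptamer binding is inhibited by 50% across a 2.0-fold dilution series) can be illustrated with a minimal sketch. It is not the fitting procedure used in this study, which is not specified beyond that definition; the function names, the log-linear interpolation between bracketing points, and the example readings are all assumptions made for illustration.

```python
import math


def two_fold_dilutions(top_conc, n_points):
    """Concentrations of a 2.0-fold serial dilution, highest first."""
    return [top_conc / (2 ** i) for i in range(n_points)]


def percent_inhibition(signal, signal_no_inhibitor):
    """Percent loss of aptamer-binding signal relative to the no-inhibitor control."""
    return 100.0 * (1.0 - signal / signal_no_inhibitor)


def ic50_by_interpolation(concs, inhibitions):
    """Concentration giving 50% inhibition, interpolated on a log10 axis.

    Assumption: a straight line between the two points that bracket 50%.
    Returns None when the series never crosses 50%.
    """
    pairs = sorted(zip(concs, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(pairs, pairs[1:]):
        if i_lo != i_hi and min(i_lo, i_hi) <= 50.0 <= max(i_lo, i_hi):
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical readings, for illustration only (not data from this study).
concs = two_fold_dilutions(200.0, 4)       # 200, 100, 50, 25 (arbitrary units)
inhibitions = [80.0, 60.0, 40.0, 20.0]     # percent inhibition at each concentration
ic50 = ic50_by_interpolation(concs, inhibitions)
```

Interpolating on a logarithmic concentration axis matches the even spacing of a serial dilution; a full dose-response fit (for example a four-parameter logistic model) would be the usual choice when enough points are available.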

Rv0792c from M. tuberculosis belongs to the HutC sub-family of GntR transcription factors.
The GntR family of transcription factors is highly conserved in the bacterial kingdom, and the M. tuberculosis genome encodes eight GntR homologs (Rv0043c, Rv0165c, Rv0494, Rv0586, Rv0792c, Rv1152, Rv3060c and Rv3575c). Multiple sequence alignment revealed that GntR homologs from M. tuberculosis share almost identical residues in the DNA binding amino-terminal region. As expected, not much sequence identity was seen in the effector binding region of the GntR family of transcription regulators. Phylogenetic analysis revealed the formation of two preponderant groups. Group-1 is the largest and ~91% of proteins are clustered together in Group-1, possibly due to high similarity in amino acid sequences ( as HutC from P. putida, DasR from S. coelicolor, NagR from B. subtilis, PhnR from S. enterica, MngR from E. coli (10, 36-38). Multiple sequence alignment analysis between Rv0792c and other HutC homologs revealed that these proteins share an identity of ~30% among themselves (Fig. S1). FadR homologs from M. tuberculosis (Rv0494, Rv0586 and Rv3060c) and the YtrA homolog (Rv1152) also grouped with their respective analogs from other bacterial species (Fig. 1A). The Group-2 cluster consisted of Rv3575c and a GntR homolog from Klebsiella pneumoniae that shared 24% similarity (Fig. 1A). In the present study, we have performed experiments to biochemically, functionally and structurally characterize Rv0792c from M. tuberculosis.
Rv0792c is a dimeric protein and binds its own promoter. For biochemical characterization, Rv0792c was cloned in pET28b and the recombinant protein was purified with an amino-terminal histidine tag (Fig. 1B). The purity of various fractions was confirmed by SDS-PAGE analysis. The purified fractions were dialyzed, concentrated and subjected to sedimentation velocity ultracentrifugation experiments at varying protein concentrations: 0.38 mg/mL, 0.76 mg/mL and 1.52 mg/mL (Fig. 1C). The continuous distribution (c(s)) analysis of absorbance scans at different protein concentrations revealed that Rv0792c predominantly sediments at an s20,w of ~2.3S, consistent with a molecular weight of ~58 kDa, thereby suggesting that the protein is primarily dimeric in solution (Fig. 1C and Table 1). Additionally, very small fractions of higher order oligomeric species sedimenting at s20,w of ~4.3S (8-11%) and ~8.5S (4-5%), corresponding to 130 kDa and 396 kDa, respectively, were also observed (Fig. 1C). The increase in the fraction of species sedimenting at ~4.3S and ~8.5S with increased protein concentration suggests formation of higher order oligomers at relatively higher concentrations of the protein. It has been previously reported that the GntR family of transcription factors bind to their own promoters and autoregulate their own expression (39,40). Therefore, we next performed EMSA assays to study the binding of purified Rv0792c with its native promoter. As shown in Fig. 1D, the purified protein was able to bind to the radiolabeled promoter in a dose-dependent manner. Clear retardation was seen in the mobility of labeled DNA in the presence of purified protein. These observations suggest that, similar to other GntR homologs, Rv0792c binds to its own promoter and likely autoregulates its expression (39,40). The DNA binding domains of GntR proteins at the amino-terminus are highly conserved (14, 41). 
Multiple sequence alignment analysis revealed that the residues important for DNA binding, Arg49 and Gly80 of Rv0792c, which correspond to Arg35 and Gly66 of FadR proteins, were conserved. Next, we performed EMSA assays using the labeled Rv0792c promoter and purified wild type, Rv0792c R49A and Rv0792c G80D mutant proteins. We observed that the Rv0792c R49A mutant protein binds to the Rv0792c promoter (Fig. 1D). However, the mutation of glycine 80 to aspartic acid completely abrogated the ability of Rv0792c to bind to its native promoter (Fig. 1D). Based on these findings, we conclude that Rv0792c binds to its own promoter and that the glycine residue at position 80 is essential for its DNA binding ability. The exact role of GntR homologs in M. tuberculosis pathogenesis has not been deciphered extensively. Here, we determined the role of Rv0792c in the physiology, stress adaptation and virulence of M. tuberculosis. Using temperature sensitive mycobacteriophages, we generated a Δ0792c mutant strain of M. tuberculosis (Fig. S2A). The construction of the mutant strain was confirmed by PCR and qPCR using gene specific primers. As shown in the complemented strain was confirmed by qPCR (Fig. S2C). As shown in Fig. S2D and S2E, no changes were observed in the growth patterns and colony morphology of the parental and Δ0792c mutant strains of M. tuberculosis.
We next compared the survival of various strains in different stress conditions in vitro. As shown in Fig. 2A, a growth defect of 5.0- and 11.0-fold was seen in the survival of the mutant strain in comparison to the parental strain after exposure to oxidative stress for 24 hrs and 72 hrs, respectively (Fig. 2A, *P<0.05). This growth defect associated with the mutant strain was restored in the complemented strain (Fig. 2A). 
The mutant strain also exhibited a ~5.0-fold growth defect after exposure to the cell wall degrading agent lysozyme in comparison to the wild type strain (Fig. 2B, **P<0.01). We observed that both wild type and mutant strains were susceptible to comparable levels after exposure to the other stress conditions tested in this study (Fig. 2C, 2D, 2E and 2F). Since GntR proteins have been shown to be involved in the biofilm formation of bacterial pathogens, we also determined the role of Rv0792c in M. tuberculosis biofilm formation in vitro (43-46). We observed that the parental and Δ0792c mutant strains of M. tuberculosis were comparable in their ability to form biofilms in vitro (data not shown). In order to understand the role of Rv0792c in drug tolerance, we next determined the susceptibility of various strains to drugs with different mechanisms of action. We also observed that complementation with Rv0792c partially restored the growth defect associated with the mutant strain at both 4 and 8 weeks post-infection (Fig. 3B, 3C, 3D and 3E). Concordantly, minimal tissue involvement was observed in hematoxylin and eosin stained sections from guinea pigs infected with the mutant strain at 8 weeks post-infection (Fig. 3F). Granuloma formation was seen in sections from animals infected with either wild-type or complemented strains (Fig. 3F). Taken together, we show that Rv0792c is not essential for growth in vitro but is indispensable for M. tuberculosis to establish infection in host tissues.

Effect of deletion of Rv0792c on the transcriptional profile of M. tuberculosis.
The observed growth defect of the Rv0792c mutant strain in guinea pigs suggests that it might regulate the expression of genes that are involved in the virulence or stress adaptation of M. tuberculosis. In order to define the Rv0792c regulon, RNA-seq experiments were performed using total RNA isolated from mid-log phase cultures of the wild type and mutant strains as described in Materials and Methods. We observed that the majority of the genes were expressed to similar levels in the mutant and wild type strains. Using a cut-off value of >2.0-fold change and p-value <0.05, transcriptome analysis revealed that a total of 197 genes were differentially expressed in the mutant strain compared to the wild type strain (Fig. 4A). Among these, the levels of 108 and 89 transcripts were increased and decreased, respectively, in the mutant strain (Fig. 4A, Table S4). These differentially expressed genes were further characterized based on their annotations in Mycobrowser (https://mycobrowser.epfl.ch/). We noticed that most of the differentially expressed genes either encoded conserved hypothetical proteins or were involved in processes such as cell wall synthesis or intermediary metabolism (Fig. 4B). The transcript levels of genes such as Rv0383c, Rv1094 (desA2), Rv1285 (cysD), Rv1350 (fabG2), Rv2166c, Rv2846c, Rv2988c and Rv3139 that are essential for M. tuberculosis growth in vitro were downregulated in the mutant strain (Table S4). RNA-seq analysis revealed that the transcripts of genes upregulated in low oxygen conditions (such as Rv2624c, Rv2625c, Rv3126c, Rv0572c and Rv1734c) or nutrient limiting conditions (such as Rv1149, Rv1285, Rv1929c, Rv2169c, Rv2269c, Rv2660c and Rv2745c) were reduced in the mutant strain (Table S4) (47,48). Among the upregulated genes, the transcript levels of genes adjacent to Rv0792c and of the mymA operon were increased in the mutant strain (49, 50). 
A subset of these differentially expressed genes in RNA-seq experiments was also assessed by qPCR. As expected, the expression patterns obtained by qPCR were similar to those obtained from RNA-seq data (Fig. 4C). These observations indicate that Rv0792c regulates the expression of genes and this transcriptional reprogramming is required for M. tuberculosis to adapt and survive in host tissues.
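The thresholding described above (>2.0-fold change with p-value <0.05) can be written out as a short sketch. The gene names and values below are hypothetical, the function names are illustrative, and treating a decrease symmetrically on the log2 scale (ratio below 1/2.0) is an assumption; the source states only the cut-off values.

```python
import math


def classify_gene(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Label a gene 'up', 'down' or 'unchanged' in mutant versus wild type.

    fold_change is the mutant/wild-type expression ratio and must be > 0.
    """
    if p_value >= p_cutoff:
        return "unchanged"
    log2_fc = math.log2(fold_change)
    threshold = math.log2(fc_cutoff)
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:
        return "down"
    return "unchanged"


def summarize(genes):
    """Count genes per class; each entry is (name, fold_change, p_value)."""
    counts = {"up": 0, "down": 0, "unchanged": 0}
    for _name, fold_change, p_value in genes:
        counts[classify_gene(fold_change, p_value)] += 1
    return counts


# Hypothetical entries, for illustration only (not values from Table S4).
example = [
    ("geneA", 4.0, 0.01),    # increased and significant
    ("geneB", 0.25, 0.001),  # decreased and significant
    ("geneC", 3.0, 0.20),    # large change but not significant
    ("geneD", 1.5, 0.01),    # significant but below the fold-change cut-off
]
counts = summarize(example)
```

With the counts reported in the text, 108 increased and 89 decreased transcripts sum to the 197 differentially expressed genes.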

Generation of Rv0792c binding aptamer through SELEX.
We next performed Systematic Evolution of Ligands by EXponential enrichment (SELEX) experiments to find DNA aptamers as a possible tool to identify epitope(s) which may bind small molecule inhibitors against Rv0792c. Thus, SELEX was performed using an 80-nucleotide long random ssDNA library. To diversify the sequences of the aptamer library, SELEX binding experiments were performed using an error-prone Taq DNA polymerase (51). Prior to successive SELEX rounds, the double-stranded (ds) PCR products that were obtained were converted to single-stranded form using previously reported methods (51). After 6 rounds of SELEX, the enrichment of Rv0792c-specific binders was determined by ALISA. As shown in (13)).
Next, to confirm the binding of aptamers with Rv0792c, EMSA assays using a panel of selected aptamer candidates (Rv0792c_1, 2, 3, 4 and 5) were performed. As shown in Fig. 5C, we observed that these aptamers interacted with Rv0792c with varying strength. Based on these findings, we selected the three best aptamer candidates, namely Rv0792c_1, Rv0792c_2 and Rv0792c_5, for further biochemical and functional characterization of Rv0792c. As shown in Fig. 6A, maximum binding with aptamers was observed with the wild-type protein, Rv0792c. Also, as expected, mutation of arginine 49 and glycine 80 abrogated the aptamer binding ability of Rv0792c. Based on these observations, we conclude that Rv0792c_2 displayed the highest binding for Rv0792c and, in concordance with previous data, Arg49 and Gly80 are essential for binding of aptamers by Rv0792c (39,40). In order to determine the role of Mg2+ ions in Rv0792c aptamer binding, ALISA assays were performed in the presence or absence of 10 mM EDTA. We observed that the inclusion of EDTA resulted in ~90% reduction in aptamer binding to Rv0792c, thereby indicating that the Mg2+ ion is essential for the aptamer-protein interaction (Fig. 6B). 
We determined the dissociation constant for binding of aptamer Rv0792c_2 with wild-type and mutant proteins, and fitting of the data was performed using the non-linear regression method. As expected, the highest binding was observed for the wild-type protein, with a Kd value of ~51 nM. In comparison, Rv0792c R49A and Rv0792c G80D

aptamers.
We next acquired SAXS data to build a structural model in solution and identify the aptamer binding regions of Rv0792c. SAXS data were collected for Rv0792c at a concentration of 3.2 mg/ml as shown in Fig. S5A. The double logarithm mode of presentation confirmed a lack of aggregation or interparticle effects in the protein sample (58). The inset in Figure S5A shows the Guinier region, assuming a globular scattering nature, and the linear fit confirms the monodisperse profile of the sample. Guinier analysis suggested that the particle size is characterized by a radius of gyration (Rg) of about 3.3 nm (Table S3). Indirect Fourier transformation of the data provided the frequency distribution of pairwise interatomic vectors, which further provided an estimate of the maximum linear dimension (Dmax) and Rg of 12.5 and 3.31 nm, respectively (Fig. S5B). Molecular mass estimation from different Bayesian models applied to the experimental SAXS data suggested that the mass of the scattering particles was ~63.2 ± 5.7 kDa, supporting a dimeric state of association in solution (the theoretical mass of the monomer is 32.5 kDa).
As mentioned in materials and methods, a dummy residue model best representing the scattering shape of Rv0792c in solution was restored by averaging ten independent models and is presented in transparent map format, with variation amongst models reflected in wire format (Fig. 7A and Fig. S6A). A normalized spatial discrepancy (NSD) value of 0.93 supported the similarity of the ten models solved and averaged for Rv0792c using SAXS data (Table S3). In order to compare the SAXS based envelope with a structural model of Rv0792c, a sequence-based homology model was searched. The best sequence identity of 18.14% was observed between residues 53-285 of Rv0792c and the solved structure of protein lin2111 from Listeria innocua Clip11262 (PDB deposition 3EDP; unpublished structure). 
We observed that most templates were similar in fold with a predicted association state of dimer. As stated in materials and methods, the missing 52 and 16 residues from the amino and carboxy terminus, respectively, were modelled, and their predominant conformations were oriented and subsequently attached to this central structural model of the Rv0792c dimer. The inertial axes of this structure were superimposed with those of the SAXS based model for Rv0792c, and similarities in the profile can be visually judged in the orthogonal views shown in Fig. 7A and Fig. S6A. Furthermore, a χ² value of 1.3 between the theoretical SAXS profile of the residue-level model of Rv0792c and the experimental data supported a similarity between the two models in three dimensions (Table S3). The zoomed-in image in Fig. 7A highlights that the two C-terminal extensions of the chains bind each other, thus contributing to additional stabilization of the dimeric entity.
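The Guinier analysis referred to above rests on the approximation ln I(q) = ln I(0) - (Rg² q²)/3, which holds at low q (conventionally q·Rg below about 1.3 for globular particles). The sketch below recovers Rg from the slope of ln I against q². It is an illustration on synthetic, noise-free intensities generated with Rg = 3.3 nm, not the software or data used in this study, and the function name is an assumption.

```python
import math


def guinier_fit(q_values, intensities):
    """Least-squares line through (q^2, ln I); returns (Rg, I0).

    Guinier approximation: ln I(q) = ln I0 - (Rg^2 / 3) * q^2.
    Raises ValueError (from math.sqrt) if the fitted slope is positive.
    """
    xs = [q * q for q in q_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic low-q data (q in nm^-1) for a particle with Rg = 3.3 nm and I0 = 1.
true_rg = 3.3
q_values = [0.05 + 0.02 * k for k in range(15)]   # max q*Rg is about 1.09
intensities = [math.exp(-(true_rg * q) ** 2 / 3.0) for q in q_values]
rg, i0 = guinier_fit(q_values, intensities)
```

On real data the fit range has to be chosen so that the points are linear in the Guinier plot; curvature at the lowest q values is the usual sign of aggregation or interparticle effects.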
Further, in order to perceive local and relative flexibility embedded in the computed structure of dimeric Rv0792c, low frequency normal modes accessible to the protein were calculated (Fig. S6B). The collective modes indicated that the N-terminal domain moved in a synchronized mode independent of the central β-barrel type dimeric contact. The C-terminal tail of the protein also moves up and down the interacting β-barrel and the linker connecting the barrel and N-terminal domain of the other chain in the dimer. These theoretical analyses imply that the C-terminal ends of the dimer remain attached to each other. Next, using SAXS data analysis, solution shape parameters, association state, and structure of Rv0792c binding aptamers were determined in their unliganded state (Fig. 7B). The double log profile of SAXS data from the aptamers confirmed a lack of any aggregation or interparticle effects in the samples (Fig. S5C). Guinier analyses for globular scattering profiles are shown as insets, and linearity of the fits in the low q range further validated the monodisperse nature of the aptamers. The parameters deduced for the predominant scattering shape of the aptamers are listed in Table S3. In summary, the aptamers had Rg and Dmax in the range of 1.8-2.1 nm and 7.1-8.6 nm, respectively. For all aptamers, the calculated P(r) indicated a "tailing" at higher r values, suggesting flexible ends about a core shape (Fig. S5D). Using SAXS data profiles, their estimated molecular masses were in the range of 12.9-13.5 kDa, clearly supporting a monomeric state of these ssDNA molecules in solution. Their dummy residue models solved within SAXS data-based constraints are shown in Fig. 7B. 
As mentioned in methods, considering their monomeric status, the predominant low energy conformations of the ssDNA aptamers were calculated in an implicit dielectric of 80 (representing water), and the best-resembling conformation was selected using the lowest χ² value between the calculated SAXS profile for the conformation and the experimental data. These models for the aptamers are shown overlaid on the SAXS data-based model and alone in Fig. 7B. Additional views are shown in Figs. S7A, S7B and S7C. Relative to Rv0792c, NSD values for the aptamers in the range of 0.5-0.7 indicated the differential nature of the ten models solved for the three aptamers (Table S3). This implied relatively higher inherent disorder in the unliganded aptamers as monomers. Similar disorder was observed in the higher terminal motions in the residue level models computed for these aptamers. Pertinently, it also explained the extended nature of their computed P(r) curves.
Having established that the Rv0792c protein adopts a dimeric state in solution and that all the binding aptamers are monomers, the next set of SAXS data was acquired on molar mixtures of the protein and individual aptamers (the ratio was computed for the dimeric and monomeric states of Rv0792c and ssDNA, respectively) (Fig. S5E and S5F). It is important to state here that the concentrations of the molecules were higher than the estimated binding constant of the protein and DNA molecules, supporting a higher order of binding between the available molecules and leaving little or no unliganded molecules in the samples used for SAXS data collection. Double log plots and Guinier analysis of the Rv0792c_1, Rv0792c_2 and Rv0792c_5 aptamers support that the scattering molecules did not aggregate or undergo interparticle effects upon mixing (Fig. S5E). 
While mixtures of Rv0792c with aptamers Rv0792c_5 and Rv0792c_1 showed Dmax and Rg values of 12.5 and 3.4-3.5 nm, respectively, the complex of GntR and the Rv0792c_2 aptamer adopted a decreased Dmax and Rg of 9.8 and 3.1 nm, respectively (Table S3). The lower dimensions of the complex with Rv0792c_2 correlated with the SAXS based molecular mass prediction, which indicated a mass of ~44 kDa for this complex and ~75 kDa for samples with Rv0792c and aptamers Rv0792c_5 and Rv0792c_1. This result indicated that Rv0792c_5 and Rv0792c_1 bind to the dimeric protein and do not alter the association state of Rv0792c into monomers. In contrast, binding of the Rv0792c_2 aptamer induces dissociation of the dimeric protein into monomers. The observed differences in Rg, Dmax values and P(r) profiles between these Rv0792c-aptamer complexes, unliganded Rv0792c and the aptamers supported that protein and aptamer molecules were bound to each other during data collection. Shapes restored for these scattering species showed higher NSD values than the unliganded aptamers.
As mentioned in materials and methods, results from SAXS data analysis revealed that Rv0792c_5 and Rv0792c_1 form a 2:1 complex with Rv0792c, but binding of Rv0792c_2 dissociates the Rv0792c dimer into monomers. Accordingly, the low energy structures of the aptamers were docked on Rv0792c dimers for the Rv0792c_5 and Rv0792c_1 aptamers, and on the monomer for the Rv0792c_2 aptamer, to obtain models for their complexes. For the latter, the approximation was made that no large shape change occurred upon theoretically detaching the monomer from the dimer of Rv0792c. Different poses of docked aptamers on Rv0792c were filtered to correlate with the shape solved for the complexes (Fig. 7C and Fig. S8). The models selected for Rv0792c:aptamer complexes indicated that the aptamers bind to the C-terminal portion of Rv0792c (Fig. 8A). 
Energy minimization of the residue-level models of complexes obtained from docking indicated that while the Rv0792c_5 and Rv0792c_1 aptamers coalesced with the dimer interface, the Rv0792c_2 aptamer induced opening of the C-tail latch of Rv0792c. Probably, this last event in the case of Rv0792c_2 weakens the protein-protein interaction between Rv0792c chains and leads to eventual formation of a 1:1 complex. In summary, all aptamers remain monomeric in the presence or absence of Rv0792c and bind to its C-tail region, and some interaction extends to the stretch encompassing PRG (residues 40-42) of the Rv0792c protein (boxed in the zoomed image in the lower panel in Fig. 8A).
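The stoichiometry argument in this section can be checked with simple mass arithmetic: each SAXS-derived mass is compared against the masses expected for candidate protein:aptamer compositions. The sketch below uses the monomer mass (32.5 kDa) and the SAXS-derived masses quoted in the text; the single aptamer mass of 13.2 kDa (the midpoint of the reported 12.9-13.5 kDa range), the candidate set and the function name are assumptions made for illustration.

```python
def closest_stoichiometry(observed_kda, protein_kda=32.5, aptamer_kda=13.2,
                          max_protein=2, max_aptamer=2):
    """Return (n_protein, n_aptamer, expected_mass) closest to the observed mass.

    Candidate compositions are 1..max_protein protein chains combined with
    0..max_aptamer aptamer molecules.
    """
    candidates = []
    for n_protein in range(1, max_protein + 1):
        for n_aptamer in range(0, max_aptamer + 1):
            expected = n_protein * protein_kda + n_aptamer * aptamer_kda
            candidates.append((abs(expected - observed_kda), n_protein, n_aptamer, expected))
    _, n_protein, n_aptamer, expected = min(candidates)
    return n_protein, n_aptamer, expected


free_protein = closest_stoichiometry(63.2)   # unliganded Rv0792c
complex_1_5 = closest_stoichiometry(75.0)    # mixtures with Rv0792c_1 or Rv0792c_5
complex_2 = closest_stoichiometry(44.0)      # mixture with Rv0792c_2
```

Under these assumptions the ~75 kDa species is closest to two protein chains plus one aptamer (about 78 kDa) and the ~44 kDa species to one chain plus one aptamer (about 46 kDa), in line with the 2:1 and 1:1 complexes described above.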

Computational docking to identify small molecule inhibitors for Rv0792c.
As seen from shape restoration and docking data, all screened aptamers bound to the C-terminal dimerizing segment of the Rv0792c protein. Additionally, binding of Rv0792c_2 induces dissociation of the Rv0792c dimer. Presuming that this segment is key to the structural organization of Rv0792c and that small molecules capable of binding this segment may alter the native functioning of this protein, we used the aptamer binding segments to screen for molecules which can bind to this protein. Fig. 8B shows the defined aptamer binding region and the three best hits in their lowest energy pose with the receptor. These molecules were I-OMe-Tyrphostin, Clofibrate and Rottlerin, in decreasing order of their relative docking score. Next, we performed ALISA experiments to investigate (i) whether the PRG motif is required for aptamer binding to Rv0792c and (ii) whether the small molecules identified from computational docking can inhibit the binding of the Rv0792c_2 aptamer to Rv0792c. In order to determine the role of the PRG motif in binding to these aptamers, Rv0792c harboring the Pro40-Ala40 and Arg41-Ala41 mutations was cloned, expressed and purified as a (His)6 tagged protein. In concordance with SAXS based modeling results, we observed that mutation of either proline 40 or arginine 41 to alanine abrogated the aptamer binding ability of Rv0792c (Fig. 8C). Of the three compounds, we were able to procure only 2 compounds with assured purity. Neither compound exhibited solubility issues, and both were evaluated for their ability to inhibit the aptamer binding activity of Rv0792c. As shown in Fig. 8D, among these two compounds, I-OMe-Tyrphostin was able to inhibit the binding of aptamer to the Rv0792c protein by ~60% at 200 µM concentration. We noticed no inhibition of aptamer binding in the presence of Clofibrate even at 200 µM concentration (Fig. 8D). 
We also observed that I-OMe-Tyrphostin was able to inhibit Rv0792c activity in a concentration dependent manner. As shown, the small molecule inhibited aptamer binding with an IC50 of ~109 nM (Fig. 8E). Taken together, this is the first study in which we show that the GntR homolog Rv0792c from M. tuberculosis is essential to establish infection in host tissues. We also report novel aptamers which bind to the dimerizing segment of the protein, and we used this information to identify an FDA-approved drug which can conceptually act as an inhibitor of the Rv0792c protein.

Discussion
Transcriptional regulation has been shown to be essential for adaptation of various bacterial pathogens upon exposure to unfavourable and harsh environmental conditions. M. tuberculosis is a highly successful intracellular pathogen owing to its ability to sense external stimuli, reprogram its transcription machinery and persist in host tissues. The M. activity of HutC subfamily has also been reported to be regulated by N-acetyl-glucosamine and urocanic acid (55, 62). We also evaluated the ability of various effectors to regulate the DNA binding activity of Rv0792c and observed that inclusion of L-histidine and urocanic acid did not affect the DNA binding ability of Rv0792c. However, arabinose increased the DNA binding ability of Rv0792c by ~2.0-fold, which suggests that L-arabinose might be the effector molecule for Rv0792c. Nevertheless, regulation of Rv0792c might still be fine-tuned by this specific ligand interaction and/or by some unknown ligand that needs to be investigated in the near future.
In order to investigate the role of GntR in physiology, stress adaptation and virulence, an Rv0792c mutant strain was generated using temperature sensitive mycobacteriophages. We noticed that the colony morphology, growth pattern and biofilm formation of the mutant and parental strains were comparable. Transcriptional regulators are well known to mediate mycobacterial stress adaptation. In order to mimic the environmental cues encountered by mycobacteria within the host macrophage or granuloma, the survival of various strains was evaluated in different stress conditions. The mutant strain was compromised for survival upon exposure to oxidative stress and cell wall damage. However, no differences were belonging to either PE/PPE or toxin-antitoxin modules or lipid metabolism were also downregulated in the mutant strains. 
We also observed that the transcript levels of Rv0793 (the gene neighbouring Rv0792c) and the mymA operon (shown to be upregulated in acidic conditions) were increased in the mutant strain (50, 67). These findings suggest that Rv0792c regulates the expression of a subset of genes that enables the bacteria to adapt and persist in host tissues.

The SELEX strategy was employed to search for novel ssDNA aptamers capable of tightly binding Rv0792c. An extended aim was to use the aptamer binding information to screen for small molecules that may efficiently bind Rv0792c and act as inhibitors of its function. The direct interactions between Rv0792c and SELEX-derived DNA aptamers were confirmed using EMSA and ALISA. Among those identified, three DNA aptamer candidates (Rv0792c_1, Rv0792c_2 and Rv0792c_5) showed good binding to Rv0792c, albeit with varied intensity. This difference in the intensity of the DNA-protein complex reflects differential rates of aptamer-target complex association and dissociation (56, 68). Interestingly, the binding of all three aptamers was abrogated by the substitution of glycine 80 with aspartic acid and arginine 49 with alanine in the helix-turn-helix motif. This is possibly for two reasons: (i) aptamer selection was performed with the wild-type protein and (ii) these residues (glycine 80 and arginine 49) play a critical role in maintaining the protein structure where the aptamer binds (41). Notably, the presence of EDTA also abrogated aptamer binding to Rv0792c, indicating that Mg2+ is essential to maintain the active conformation of aptamers required for protein binding. This finding is in agreement with previous reports in which the role of divalent ions in aptamer binding to a cognate protein target has been established (69).
Interestingly, all these aptamer candidates displayed Kd values in the nanomolar range, an observation in concordance with previously reported protein-binding aptamers (29, 70, 71). Notably, the primary sequences of the Rv0792c_1, Rv0792c_2 and Rv0792c_5 DNA aptamer candidates designed against Rv0792c, a HutC protein, showed high similarity to DNA binding sequences of FadR subfamily transcription factors (13). We observed that the level of similarity was much higher for Rv0792c_1 and Rv0792c_2 than for Rv0792c_5. This pattern indicates a possible role for these nucleotides in providing affinity for transcription factors of the GntR family. In silico structure prediction and its validation by CD demonstrate the presence of stem-loop-like structures, which are very common among protein-binding DNA aptamers (70, 72). As mutation of arginine and glycine abrogated aptamer binding, our data suggest that the aptamers bind to a region in Rv0792c that is essential for DNA binding. Therefore, we hypothesized that the epitope at which these aptamers bind may be functionally important and that a small molecule binding to this site may impede the function of the protein.

Further, to gain insight into the complexes of aptamers bound to Rv0792c, SAXS data analysis and molecular modelling were utilized. Analysis of the unliganded protein and aptamers showed that in solution their association states are predominantly dimer and monomer, respectively. The dimeric status of Rv0792c correlated well with the AUC data, which also showed the presence of minor higher-order associated species. Interestingly, mixing of aptamers with dimeric Rv0792c showed that while one molecule of the Rv0792c_1 or Rv0792c_5 aptamer binds to one dimer of Rv0792c, binding of the Rv0792c_2 aptamer led to dissociation of the Rv0792c dimer into monomers.
In silico molecular modelling, steered and selected within constraints from experimental SAXS data, provided a key insight that Rv0792c dimerizes across its C-terminal region and that the extended C-tail of each chain wraps around the other, providing additional stability to the dimer. The dimeric status, or even the association architecture, is not novel to the GntR family of proteins, but the unique wrapping of the C-tail of each chain around the other raises questions about its functional relevance and possible uniqueness in this family of regulators. Solution scattering data supported that the aptamers remained predominantly monomeric. Interestingly, the structural analysis revealed that the exposed side of the dimeric Rv0792c is also the interaction site of the three aptamers identified in the SELEX study. Taken together, SAXS data provided the insight that binding of the Rv0792c_2 aptamer induces rearrangement(s) that lead to dissociation of the Rv0792c protein dimer.

Taking a cue from the poses of the aptamers on the Rv0792c dimer, we considered using the interacting residues in the protein to screen for small molecules that may compete with the binding of aptamers. Two molecules from the identified top hits were experimentally evaluated in our aptamer binding assays, and we observed that I-OMe-Tyrphostin was able to inhibit binding of the Rv0792c_2 aptamer to Rv0792c. It is worth mentioning here that Rv0792c_2 binds with the highest affinity to Rv0792c, so it can reasonably be extrapolated that the Tyrphostin analog may also competitively inhibit binding of the other two aptamers. This molecule, I-OMe-Tyrphostin, and its analogs have been assayed before for their [...] of our experiments will explore the efficacy of this small drug molecule in inhibiting the growth or survivability of M. tuberculosis in different assays or models. As it is an approved drug, any efficacy against M. tuberculosis would enable its quick translation. In conclusion, we have (i) delineated the role and contribution of GntR-like factors in Mtb physiology, stress tolerance and pathogenesis and (ii) identified a small molecule inhibitor against Rv0792c, an in vivo essential transcription factor.