GWAS and functional studies implicate a role for altered DNA repair in the evolution of drug resistance in Mycobacterium tuberculosis

The emergence of drug resistance in Mycobacterium tuberculosis (Mtb) is alarming and demands in-depth knowledge for timely diagnosis. We performed genome-wide association analysis (GWAS) using 2237 clinical strains of Mtb to identify novel genetic factors that evoke drug resistance. In addition to the known direct targets, for the first time, we identified a strong association between the mutations in the DNA repair genes and the multidrug-resistant phenotype. To evaluate the impact of variants identified in the clinical samples in the evolution of drug resistance, we utilized knockouts and complemented strains in Mycobacterium smegmatis (Msm) and Mtb. Results show that variant mutations abrogated the function of MutY and UvrB. MutY variant showed enhanced survival compared with wild-type (Rv) when the Mtb strains were subjected to multiple rounds of ex vivo antibiotic stress. Notably, in an in vivo Guinea pig infection model, the MutY variant outcompeted the wild-type strain. Collectively, we show that novel variant mutations in the DNA repair genes abrogate their function and contribute to better survival under antibiotic/host stress conditions.


Introduction 36
The acquisition of drug resistance in Mtb has evoked a precarious situation lineages-Lineage 1-Indo Oceanic (EAI); Lineage 2-Beijing; Lineage 3-Central Asian (CAS) 48 and Lineage 4-Euro-American (4) are prevalent in humans. Although Mtb has a lower 49 mutation rate, it gains drug resistance in clinical settings (5). Clinical strains that belong 50 to lineage 2 are more prone to develop drug resistance faster than the lineage 4 strains 51 (6). Acquisition of drug resistance in Mtb is majorly attributed to the chromosomal 52 mutations that either modify the antibiotic's direct target or increase the expression of 53 efflux pumps that helps in decreasing the effective concentration of the drug inside the 54 cell. The expression of drug modifying/degrading enzymes also contributes towards the 55 acquisition of drug resistance (7). Despite well-known mechanisms of drug resistance, in 56 to generate 152 msmmutY::mutY, and msmmutY::mutY-R262Q strains. 8 deletion resulted in 4.32 fold increase in the mutation frequency (Fig 3c) (31). While 156 complementation with wild-type mutY restored the mutation frequency, generated the gene replacement mutant of mutY in laboratory strain H37Rv, wherein the 169 mutY at native loci was disrupted with hygromycin resistance cassette. Replacement at the native loci was confirmed by performing multiple PCRs (Fig S8b-c). We determined 171 the mutation frequency in the presence of rifampicin and isoniazid (Fig 3g-i). The fold 172 increase in the mutation frequency relative to Rv for RvmutY, RvmutY::mutY, and in the presence of rifampicin (Fig 3g-i). Together the data suggested that variants of mutY and uvrB identified in the GWAS compromised their function.

176
The variant of mutY confers survival advantage ex vivo.

10
To test these conclusions, we performed the co-infection experiment with the 209 strains that were subjected to three rounds of selection in the host in the presence or 210 absence of antibiotics as described in Fig 4. Peritoneal macrophages were infected with a 211 combination Rv+RvmutY or Rv+RvmutY::mutY or Rv+RvmutY::mutY-R262Q (Fig 5a).
212 24 h p.i cells were either treated or not treated with an antibiotic for the subsequent 72 h, 213 and total CFUs were enumerated as described above to evaluate the survival rates. As 214 expected, there was no difference in the CFUs either at 4 h p.i (Fig 5b-

217
RvmutY::mutY did not show any real advantage over Rv under any conditions. These 218 results suggest that subjecting deletion or variant strains repeatedly to antibiotic stress in 219 the host helps evolve strains that can outcompete the wild-type strain. 220

221
Upon entering the host macrophages, Mtb encounters multiple stresses that 222 impede its growth. To survive and grow in such a hostile environment, Mtb employs 223 multiple defense mechanisms (34). The treatment regime with anti TB drugs imposes a 224 supplementary layer of stress on the pathogen. An auxiliary mechanism used by the 225 pathogen is to accumulate mutations in its genome that improve its ability to combat 226 antibiotic and host-induced stress. We asked if the variant mutations identified in DNA 227 repair genes are one such auxiliary mechanism? To test this hypothesis, we performed 228 guinea pig infection experiments using Rv, RvmutY, RvmutY::mutY, and

230
CFUs obtained on day 1 showed the deposition was equal for wild type, mutant, and 231 completed strains. Gross and histopathology analysis of infected lungs showed the 232 presence of well-formed granulomas (Fig 6b-c). Significantly, 56 p.i, RvmutY, spleen homogenates were plated on 7H11 to determine total CFUs (Fig 6f, Fig S10). The 238 total CFUs were found to be comparable in both lungs and spleen across all combinations 239 (Fig 6f). Lung and spleen homogenates were plated on either Kanamycin (Rv) or 240 Hygromycin (RvmutY, RvmutY::mutY or RvmutY::mutY-R262Q) containing plates to 241 determine survival (Fig 6g). As was the case with independent infections, RvmutY, or plotted as percent survival, the data clearly showed that mutY deletion or 244 complementation with the variant confers a survival advantage to the pathogen (Fig 6h). We performed a gene-based GWAS analysis on a large dataset of susceptible and 268 MDR/XDR clinical strains (Fig 1). We identified mutations in the known direct targets of 269 both first and second-line antibiotics and few recently reported genetic variants using 270 association analysis (Table 1-3). Besides, analysis captured mutations in genes that are 271 involved in cell metabolism (Fig 2). This finding supports a recent study in E.coli, where 272 drug treatment led to the acquisition of mutations in the metabolic genes that impart 273 drug resistance (29). We speculate that these mutations may be compensatory mutations 274 that placate the fitness cost associated with antibiotic resistance and hence might be the 275 consequence of antibiotic resistance. 276 Salmonella typhimurium, where mutations or deletions in the DNA repair genes were 281 identified in the clinical isolates (39). Moreover, the deletion of ung and udgB, BER 282 pathway genes, independently or together, provides a survival advantage to the bacteria 283 (40). It is known that lineage 2 clinical strains have polymorphisms in the DNA repair and 284 replication genes (41, 42). However, we identified mutations in DNA repair genes in 285 lineage 4, suggesting that the phenomenon is not confined to lineage 2 ( Fig S8a). 286 Mutations in the dnaA, DNA binding protein that initiates replication, confers a survival 287 advantage in the presence of isoniazid (43). We also identified a mutation in the dnaA, 288 which is associated with drug resistance (Table 2). 289 We investigated if the mutations in DNA repair pathways are the cause or 290 consequence of antibiotic resistance. We functionally validated the identified mutations 291 of two different pathway genes, mutY and uvrB, in Msm using gene replacement mutants.

292
Mutation frequency analysis suggests that GWAS identified mutations Arg262Gln and 293 Ala524Val, in mutY and uvrB, respectively, abrogated their function (Fig 3). This agrees 294 with the previously published study, wherein WGS analysis of antibiotic susceptible strain 295 isolated from a patient showed a non-synonymous mutation in UvrB (A582V). Notably, 296 the variant strain eventually evolved into XDR-TB over 3.5 years after the first and second-297 line drugs treatment (44). Therefore, we propose that mutations identified in DNA repair 298 genes contribute to the evolution of antibiotic resistance (Fig S6). 299 Poor adherence of the patients to the antibiotic regimen is the leading cause of 300 the emergence of drug resistance. The continual treatment of antibiotics provides 301 sufficient time to evolve strains into MDR or XDR. We emulated the condition by 302 repeatedly exposing strains to antibiotics in the ex vivo model, followed by growing them the mixed infection scenario. Competition experiment using the evolved strains showed 306 that RvmutY and RvmutY::mutY-R262Q could outcompete Rv (Fig 5). Host stress drives 307 the selection of the bacterial population that acquires the ability to withstand adverse 308 conditions. We evaluated the survival of the different strains in guinea pigs. Results 309 suggest that RvmutY and RvmutY::mutY-R262Q exhibit improved survival. Similarly, we 310 observed that RvmutY and RvmutY::mutY-R262Q could successfully outcompete Rv in 311 the guinea pig model of infection (Fig 6). 312 Using the GWAS approach and functional validation of the clinical mutations 313 identified in the BER and NER pathway, we established a novel link between compromised 314 DNA repair and the evolution of antibiotic resistance (Fig 7). Collectively, the data 315 presented here for the first time suggests that the evolution of MDR or XDR-TB is a 316 consequence of the loss of function of DNA repair genes. The presence of mutations in 317 DNA repair genes can be an early stage diagnostic marker for the evolution of the strain 318 into MDR/XDR-TB. Molecular diagnosis of DNA repair gene mutations at the onset of 319 infection should help design better therapies to impede the evolution of these strains into 320 MDR or XDR. These findings indicate that compromised DNA repair in bacteria can 321 contribute to the accumulation of mutations, which can evolve into MDR and XDR-TB with 322 continued antibiotics treatment and selection. 323 (Table S1), representing all four lineages from 9 countries. Sequence data was retrieved 328 from EBA (https://www.ebi.ac.uk/) and NCBI database (https://www.ncbi.nlm.nih.gov/), 329 and quality filtered using Trimmometic software (45). Adapter and E. coli sequence

Association analysis 343
Genome-wide association analysis was carried out with 1,815 susceptible strains and 422 344 MDR/XDR strains (Table S2) mapping analysis was carried out by combining the population structure analysis and the population structure (Q) as well as kinship (K) values and often leads to overcorrection 354 (21, 22). We selected a corrected p-value of 10 -5 as the threshold for selecting associated 355 genes. The associated SNPs were annotated using the snpEff v4.11 (51). A snpeff database 356 was generated using the H37Rv (ASM19595v2, was used as a reference for mapping). The 357 corresponding .gff file and the SNPs were annotated based on their position on the 358 genome. Dot-plot, Manhattan's plot, and volcano plot were generated using custom R 359 scripts. 360

Analysis of Mutation frequency. 361
Antibiotic sensitive cultures of msm, msmmutY, msmmutY::mutY, msmmutY::mutY-Cultures were grown for six days in a 37 0 C incubator at 200 pm. On the 15 th day, 365 appropriate dilutions were plated on 7H11-OADC plain plates for determining the load, 366 and 1ml was plated on rifampicin (50 g/ml). Rv, RvmutY, RvmutY::mutY and RvmutY::mutY R262Q strains were inoculated at A600 ~0.1. Strains were grown to A600 ~ 368 0.6 and plated on 7H11-plain or rifampicin (2 g/ml) or isoniazid (10 g/ml) plates to 369 calculate the mutation frequency. Two biologically independent experiments were 370 performed for determining mutation frequency. Each experiment was performed in 371 technical triplicates. Data represent one of the two biological experiments. Data represent 372 mean and standard deviation. Statistical analysis (two-way ANOVA) was performed using 373 Graphpad prism software. *** p<0.0001, **p<0.001 and *p<0.01. 374 Survival of strains ex vivo.   negative effect (red dots) shows that the SNPs would restrain the development of 596 MDR/XDR. b) Manhattan plot representing the association between the genes and drug 597 resistance phenotype. A total of 188 genes that include intergenic regions were identified 598 above the 10 -5 cut-off value through association studies. Blue dots represent mutation in 599 the lipid metabolism, membrane proteins, intermediary metabolism genes, and others. 600 Green dots represent mutation in the direct targets for the first-and second-line 601 antibiotics. Red dots represent mutations associated with the DNA repair genes. A 602 detailed list of associated genes is provided in Table S3 and S4. c-e) Pie chart represents 603 the total (c), non-synonymous (d) and synonymous (e) SNPs identified in the genes that 604 belong to different categories. f) Bar plot represents the percentage of synonymous 605 mutations in the genes that resulted in abundant/moderate codon usage to the rare 606 codon compared to H37Rv. 607  used to perform a competition experiment (Fig 4a). b-e) Percent survival of Rv, RvmutY, pigs at one day (whole lung, n=3) and 56-days p.i (CFU/ml, n=7). Statistical analysis was 652 performed using two-way ANOVA. Graphpad prism was employed for performing 653 statistical analysis. *** p<0.0001, **p<0.001 and *p<0.01. h) Survival of each strain at