Deciphering Bedaquiline and Clofazimine Resistance in Tuberculosis: An Evolutionary Medicine Approach

Bedaquiline (BDQ) and clofazimine (CFZ) are core drugs for treatment of multidrug resistant tuberculosis (MDR-TB), however, our understanding of the resistance mechanisms for these drugs is sparse which is hampering rapid molecular diagnostics. To address this, we employed a unique approach using experimental evolution, protein modelling, genome sequencing, and minimum inhibitory concentration data combined with genomes from a global strain collection of over 14,151 Mycobacterium tuberculosis complex isolates and an extensive literature review. Overall, 230 genomic variants causing elevated BDQ and/or CFZ MICs could be discerned, with 201 (87.4%) variants affecting the transcriptional repressor (Rv0678) of an efflux system (mmpS5-mmpL5). Structural modelling of Rv0678 suggests four major mechanisms that confer resistance: impairment of DNA binding, reduction in protein stability, disruption of protein dimerization, and alteration in affinity for its fatty acid ligand. These modelling and experimental techniques will improve personalized medicine in an impending drug resistant era.


Introduction
population were genotypically characterized, and single selected mutations were phenotypically defined 95 (elevated MIC) (Tables 1, S1). 96 Strikingly, exposure to BDQ or CFZ concentrations as low as one-eighth of the WT MIC over 20 days (12 to 97 19 generations) led to the selection and enrichment of significantly more resistant cells as compared to 98 growth in the absence of the drugs (P=2.8 x 10 -10 , and P=2.4 x 10 -3 , respectively, Figure S2). As expected, 99 there was less enrichment of resistant populations at early timepoints, e.g. after 4 days. 100 Table 1: In vitro selected mutations conferring BDQ resistance. Genotype of each variant was determined 101 by whole genome sequencing, and substitution annotated by coding position, "alternative allele" denotes 102 base pair change. Minimum inhibitory concentration (MIC) was determined by resazurin microtiter plate 103 assay and compared to wild type (WT) susceptible ancestor to describe MIC fold increase. The number of 104 times a mutation was independently selected was included under "# of clones". Finally, from which 105 experiment (BDQ or CFZ) the mutant was isolated from included as "isolate from". 106 In total, we randomly selected 270 single colonies from BDQ-and CFZ-supplemented agar plates that had 110 been inoculated with samples from passages 1 (4 days), 3 (12 days), and 5 (20 days), of two independent 111 experiments ( Figure S1). In total, 203/270 mutants were successfully pheno-and genotyped (67 were 112 excluded due to DNA library preparation issues or inadequate growth). The single colonies were sub-113 cultured and the MICs for BDQ and CFZ were determined for each clone using broth microtiter dilution. 114 All clones exhibited an elevated MIC at least 2-fold higher than the WT drug susceptible ancestor and we 115 identified in total 61 unique variants with one mutation in the Rv0678 gene per clone. We additionally 116 selected five unique mutations in the atpE (Rv1305) gene (Table 1). 117 Clones which harbored a mutation in Rv0678 had a 2-4-fold MIC increase for BDQ, whereas mutations in 118 atpE, exhibited 4 to over 10 times the MIC of the WT strain (Table 1). For 12 randomly selected clones 119 with seven different variants we verified the results obtained in broth microtiter dilution plates using a 120 Mycobacterium growth indicator tube (MGIT) assay. These experiments showed comparable MICs of the 121 mutant clones with those obtained from the previous method (Table S2). 122 The 61 unique mutations in Rv0678 affected 54 different codons (Table 1), out of which 21 (34%) were 123 frameshift (fs) mutations, 34 (56%) had one non-synonymous SNP, and 6 (10%) a premature inserted stop 124 codon. The mutations were scattered over the entire sequence of Rv0678 and no dominant cluster could 125 be identified ( Figure 1A). However, the four mutations which were most frequently observed were: 192 126 ins g (64fs), A99V, 193 del g (65fs), and R50W; these mutations were detected in 19, 17, 13, and 12 single 127 selected clones, respectively (Table 1). Notably, nucleotides 192-198 were also identified as a hotspot for 128 frameshifting variation in a prior review, potentially due to the homopolymeric nature of this region 41 . As mentioned before, no mutant clone had more than one resistance mediating variant in Rv0678 or atpE, 137 however, 38 mutants harbored a second mutation in one of 14 genes that have not been previously 138 proposed to be associated with resistance against BDQ and/or CFZ (Table S3). The most common mutation 139 found in addition to Rv0678 mutations was R119H in Rv1890c -this gene codes for a conserved 140 hypothetical protein 42 , and the mutation was selected 11 times along with six different resistance 141 mutations. The second most common co-selected mutation was *130R in Rv1871c (a conserved 142 hypothetical membrane protein 43 ) that was selected seven times along with four different resistance 143 mutations. Other secondary mutations occurred in genes which were involved in cell wall synthesis, 144 information pathways, metabolism/respiration, protein regulation, or lipid metabolism (Table S3). 145 In addition to single colony sequencing, we also employed a more unbiased approach based on population 146 sequencing of all the mutant colonies on a given plate, in order to elucidate all possible variants. On 147 average the genome wide coverage for a total of 81 samples was 308±50. We then reported mutations if 148 the position was covered by at least one read in both forward and reverse orientations and there were 149 two reads with a phred score over 20. This identified 45 additional mutations in Rv0678 and one in atpE 150 (A63T), as well as four mutations in pepQ (Rv2535c; F97V, G96G, V92G, A87G) ( Table S1). As in the single 151 colony analysis, we found the following mutations dominating in different independent evolutionary 152 experiments and selected across different drug concentrations: Rv0678 192 ins g (64fs), A99V, R50W, and 153

Resistance caused by huge inversion interruption in Rv0678 155
Three mutants with elevated BDQ (2 mg/L) and CFZ (2 mg/L) MICs but lacking mutations in the BDQ/CFZ 156 resistance associated genes were further subjected to long read sequencing with the PacBio Sequel II 157 system. A de novo assembly employing the PacBio SMRT® Link software resulted in one closed Mtbc 158 genome, and two assemblies with 1 and 4 four contiguous sequences (contigs), respectively. All 159 assemblies covered more than 99.9% of the H37Rv reference genome (NC_000962.3). All three assemblies 160 showed a large-scale sequence re-arrangement with the borders at position 779,073 (Rv0678 coding 161 sequence) and 3,552,584 (intergenic region), flanked by a transposase open reading frame (IS6110). A 162 2.5Mb fragment was inverted and thus split the Rv0678 gene into halves (Figure 2), which was not 163 detected by a classical reference mapping approach and revealed a further interesting resistance 164 mechanism. We scanned all BDQ resistant samples in the CRyPTIC strain collection for similar events using 165 local assembly with ARIBA 44 . In one isolate we assembled one contig which clearly represented a similar 166 inversion disrupting Rv0678, but since the depth of support was just 1 read (mean, across the contig) and 167 this was Illumina (short read) data, we considered this only circumstantial evidence.  (Table S4). In this patient dataset, we identified 98 178 mutations throughout Rv0678, 71 of which were not previously described in patients (Table 2). In addition, 179 seven variants in Rv0678 harbored mutations found in different lineages (Table S4), pointing to 180 convergent evolution at these sites. Eleven mutations from the patient-derived set were also shared with 181 our in vitro derived datasets (selected variants and population sequencing) ( Figure 1B & 3A). 182  (Table S4). To better explain this high number of BDQ susceptible strains with 201 Rv0678 mutations we looked for additional mutations in the efflux pump genes mmpL5 and mmpS5, as it 202 has been postulated that variants in these genes may reduce or abrogate acquired resistance 59 . However, 203 we found that 64/94 (68%) of susceptible isolates harboring Rv0678 mutations had WT mmpL5/mmpS5 204 genes (Table S4). Likely functional/inactivating mutations in mmpL5 (19/19 isolates with frameshifts and 205 3/5 with missense) were associated with susceptibility while synonymous mutations appeared 206 uncorrelated with susceptibility (7/18 isolates susceptible). 207 Plotting all 147 resistance and borderline resistance-associated Rv0678 mutations on the gene sequence 208 resistance were compared between three datasets: in vitro selected mutants (Table 1), in vitro population sequencing (Table S1), and patient derived 214 variants (Table 2) Figure S4). 281

Figure 5: Networks of the most persistent pockets found in the wild-type (WT) and mutated Rv0678. 282
The pocket crosstalk analysis on simulated systems identifies allosteric signaling. All systems shared a 283 large common pocket located at the interface of the two monomers ( Figure S5 Although BDQ/CFZ are rolled out worldwide as core drugs in treatment regimens for MDR-TB patients 8 , 299 antibiotic susceptibility testing for these compounds is rarely available and patients are often treated 300 empirically. Indeed, recent studies report treatment failure and/or poor outcome of MDR-TB patients as 301 mutations in Rv0678 emerge that are associated with BDQ/CFZ resistance 12,15,17,26,54 . Additionally, 302 misdiagnosis can promote the selection and subsequent transmission of specific MDR Mtbc strains. For 303 example, it was recently reported in Eswatini and South Africa, that patients infected with an MDR 304 outbreak strain harboring the RMP mutation I491F, which is not detectable by conventional phenotypic 305 or genotypic testing, probably continued to receive RMP although it was ineffective 30 . These strains now 306 represent more than 60% of MDR strains in Eswatini and additionally more than half of these developed 307 enhanced BDQ/CFZ MICs via acquired Rv0678 mutations 1,29,30,64 . This scenario underlines the urgent need 308 for approaches to rapidly detect resistance, including to BDQ/CFZ, with baseline diagnostics and during 309 therapy (e.g. via WGS or targeted genome sequencing) to avoid treatment failure, further resistance 310 development and ongoing transmission of specific MDR strains. 311 Genome sequencing techniques have been demonstrated to accurately identify genomic variants in all 312 target regions involved in resistance development in a single analysis 3,65 . However, several major 313 challenges exist-including linking genotype and phenotype to distinguish resistance-mediating from 314 benign mutations and interpreting how multiple mutations interact when the resistance phenotype is 315 polygenic, as is the case with BDQ and CFZ 60 . This work catalogues the phenotype of 260 unique genomic 316 variants across six genes. Importantly, not all variation leads to resistance, as 38 of these mutations in 317 Rv0678 are not associated with resistance to BDQ/CFZ (Table S5), which facilitates the prediction of 318 BDQ/CFZ susceptibility. Furthermore, 30/94 BDQ susceptible isolates with Rv0678 mutations harbored an 319 additional mutation in the mmpL5 or mmpS5 efflux pump genes, which may abrogate the efflux pump 320 mechanism and reconstitute the susceptible phenotype 59 . Here, we provide preliminary evidence that 321 isolates with inactivating (frameshift, nonsense, or missense variants) mmpL5 mutations can override 322 Rv0678 resistance-mediating mutations, resulting in hyper-susceptibility to BDQ. Further work to 323 understand the frequency, distribution and origin of these inactivating mutations is necessary to 324 understand their clinical relevance worldwide. 325 While genome sequencing of clinical Mtbc strains has been deployed in several countries, its 326 implementation is constrained not just by technical and data analysis considerations, but also because 327 working directly from clinical samples (e.g. sputum) is difficult 3 . We recently demonstrated that targeted 328 genome sequencing using the commercially available Deeplex®-MycTB has a high sensitivity and 329 specificity for first-and second-line drug resistance prediction, including BDQ/CFZ resistance mutations 330 that have been found in patients 64,66,67 . In combination with our comprehensive set of mutations that 331 confer resistance to BDQ/CFZ, Deeplex®-MycTB and other targeted sequencing approaches allow for rapid 332 detection of BDQ/CFZ resistance from clinical samples in few days using small portable genome 333 sequencers such as the MinION 68 . 334 Importantly, in vitro laboratory evolution experiments identified 50 resistance-conferring mutations that 335 have not yet appeared in clinical strains, thus allowing us to characterize and identify a significant 336 proportion of resistance-conferring mutations before they have even occurred in a patient. These results 337 highlight the value of in vitro evolution experiments as a complementary method for prospective 338 identification of resistance, as compared to traditional surveillance of patient isolates. Furthermore, this 339 method provides a large strain collection of de novo resistant clones from a comparable wild-type 340 susceptible ancestor. Finally, by using PacBio sequencing to establish the full genome sequence of mutant 341 clones without any mutations in the relevant targets, we were able to identify a large genomic inversion 342 that disrupts Rv0678, which should be considered as a novel BDQ/CFZ resistance mechanism. 343 Another method for prospectively identifying mutations that confer resistance is structure-based 344 modeling, which has recently been combined with machine learning methods to design algorithms for 345 predicting resistance to rifampicin, isoniazid, and pyrazinamide 69-73 . These approaches rely on the fact 346 that resistance-conferring mutations are often located in drug binding pockets or active sites and have 347 distinct biophysical consequences as compared to their susceptible counterparts 74 . This study identifies 348 several resistance mechanisms in Rv0678 with quantifiable structural effects (protein stability, dimer 349 interactions, SNAP2 scores, and interaction with the DNA), suggesting that a structure-based machine 350 learning approach could also be successful for predicting BDQ/CFZ resistance. 351 As we have shown, any approach to predict BDQ resistance must consider not only mutations present in 352 Rv0678, but also potential resistance/susceptibility determinants in other putative resistance-conferring 353 genes e.g. atpE, pepQ, Rv1979c, and mmpL/S5. It has been attempted to predict BDQ resistance from atpE 354 mutations 75 , so one feasible option would be a meta-approach that flags if any individual gene-level 355 algorithms predict resistance/susceptibility. In addition, CRyPTIC and other studies are producing large 356 MIC datasets, which will enable much more nuanced modeling of resistance, which is particularly 357 important for drugs with efflux-mediated resistance that often exhibit small sub-threshold changes in MIC. 358 Although structure-based modeling is helpful in predicting resistance, these structural hypotheses must 359 be confirmed experimentally for final validation. In addition to clear-cut resistance-associated mutations, additional "secondary" mutations were co-370 selected with resistance-associated variants in vitro. These secondary mutations affected several genes, 371 mainly associated with cellular processes such as cell wall synthesis, metabolic processes, and protein 372 regulation; with mutations also in genes of unknown function (Table S3). It is possible that these mutations 373 bring about an additional phenotypic affect, or they may be compensatory mutations 5,78 . However, until 374 further experiments are conducted, such as competitive fitness assays (or even transcriptomic analysis), 375 it cannot be ruled out that these are merely hitchhiking mutations or culture-selected variants. 376 In this study, we focused on structural mechanisms of Rv0678 mediated resistance, as these mutations 377 are the most clinically relevant and correlated with BDQ treatment failures. Other genes which have been 378 linked to BDQ or CFZ resistance, such as atpE, pepQ and Rv1979c, but their clinical relevance has yet to 379 be established 10,29,49,53 . In the CRyPTIC dataset we identified four Rv1979c variants which presented a 380 resistant phenotype to BDQ with variable CFZ cross resistance, and seven mutations in pepQ garnering 381 borderline BDQ resistance and variable CFZ resistance. Finally, mutations in atpE (coding for the target of 382 BDQ) were rare among the in vitro selection experiments and the CRyPTIC dataset, suggesting that atpE 383 mutations might have a high fitness cost. 384 In conclusion, our work advances our understanding of how resistance can arise to BDQ and CFZ. This 385 information provides an updated variant catalogue, comprising of mutations and structural variations 386 associated with BDQ/CFZ resistance as well as benign variants and mutations implicated with hyper-387 susceptibility. Employing this information in DNA sequencing-based diagnostic approaches will allow for 388 the first time a rational design of BDQ and CFZ -containing MDR-TB therapies. 389

In vitro evolution experiments 391
In vitro experiments were carried out with H37Rv strain (ATCC 27294). Bedaquiline was purchased from 392 Janssen-Cilag GmbH and clofazimine was ordered from Sigma (C8895-1G) both were reconstituted from 393 powder in DMSO and stored at -20°C. test.CRyPTIC strains were phenotyped using either the UKMYC5 plate or the updated UKMYC6 plate 83 . 466 Plates were sealed, incubated at 37°C, and read at 14 days. In addition to manual plate readings, all plate 467 images underwent an automated reading using AMyGDA software 46 . Plates without essential agreement 468 for a drug MIC were marked as low quality and sent to a citizen science project (BashTheBug,469 http://bashthebug.net) for additional verification. Plates with exact agreement between at least two 470 phenotyping methods were marked as high quality. Low (no method agrees) and medium (two methods 471 with essential agreement) quality phenotypes were excluded from this analysis. 472 All strains included in this study had at least phenotypic data for BDQ. Some strains do not include CFZ 473 data due to missing information, low quality analysis, or removal due to experimental error. 474

Rv0678 variant literature search 475
An extensive search was performed to include and summarize previously published BDQ and/or CFZ 476 resistant associated mutations from in vitro, in vivo, and patient derived isolates. We used PubMed, 477 Google, and Google Scholar to search literature published from 2014 to June 2021. Search criteria included 478 the key "TB", "Mycobacterium tuberculosis", "MTB", "bedaquiline", "clofazimine", "treatment", "clinical 479 report", "patient", "MDR-TB", "XDR-TB", "diarylquinoline", and "drug resistance". 480 Mutations in all resistance associated genes: Rv0678, atpE, Rv1979c, and pepQ were included in our final 481 analysis, all variants with low MICs or multiple mutations in the same gene were excluded (

Molecular dynamics simulations 494
All the systems simulated in the present work, including Rv0678-WT, Rv0678-A101E, Rv0678-L40V and 495 Rv0678-L40F, were prepared using BiKi Life Sciences Software Suite version 1.3.5 of BiKi Technologies s.r.l 496 87 . Each simulated system consisted of Rv0678 homodimer unit X-ray structure (PDB ID 4NB5) and the 497 mutations were generated using UCSF Chimera software 84 . The Amber14 force field was used in all 498 molecular dynamic simulations. TIP3P waters were added to make an orthorhombic box. Adding a suitable 499 number of counter-ions neutralized the overall system. Then, the energy of the whole system was 500 minimized. Four consecutive equilibration steps were then performed: 1) 100 ps in the NVT ensemble at 501 100K with the protein backbone restrained (k=1000 kJ/mol nm 2 ), 2) 100 ps in the NVT ensemble at 200K 502 with the protein backbone similarly restrained, 3) 100 ps in the NVT ensemble at 300K with the protein 503 backbone restrained, and 4) 1000 ps in NPT ensemble at 300K with no restraints. For atoms less than 504 1.1nm apart, electrostatic forces were calculated directly; for atoms further apart electrostatics were 505 calculated using the Particle Mesh Ewald. Van der Waals forces were only calculated for atoms within 1.1 506 nm of one another. The temperature was held constant using the velocity rescale thermostat, which is a 507 modification of the Berendsen's coupling algorithm. Finally, simulations 100 ns long in the NPT ensemble 508 at 300K were performed for each system. To detect allosteric signal transmission networks across the 509 protein surface, defined as interconnected pocket motions, we carried out the allosteric communication