Phomoarcherin B as a novel HIV-1 reverse transcriptase RNase H activity inhibitor: conclusions from comprehensive computational analysis

The HIV epidemic has claimed more than 32.7 million live since its emergence in 1981, while many ART and HAART therapies are available and provide relief and control for patients, most of these therapeutics come with long-term side effects, resistance, socio-economical barriers, and other obstacles. In this study, genomic analysis was performed on 98 HIV-1 genomes to determine the most coherent target that could be utilized to restrict and cease the viral replication, the reverse transcriptase enzyme. Following the identification of the target protein, the RNase H activity of the reverse transcriptase was nominated as the potent target given the limited research associated with it. A library of 94 thousand small molecule inhibitors was generated and virtual screening was performed to identify hits, based on the reproducibility of the screening results, 4 compounds with the best scores were considered and their interaction within the active site was analyzed. Subsequently, all-atom molecular dynamics simulations and MM-PBSA was performed to validate the stability and binding free energy of the hits within the RNase H active. In silico ADMET assays were performed on the hit compounds to analyze their drug-likeness, physicochemical and pharmacological properties. Phomoarcherin B, a pentacyclic aromatic sesquiterpene naturally found in the endophytic fungus Phomopsis archeri, known for its anticancer properties scored the best in all the experiments and was nominated as a potential inhibitor of the HIV-1 reverse transcriptase RNase H activity. Author Summary The Human immunodeficiency virus 1 (HIV-1) has remained a global public health issue for the past 4 decades with millions of patients around the globe affected. Long years of research in the field of anti-viral therapies had introduced several drugs to combat the viral infection, however, most of these drugs come with their shortcomings. In this study, computational drug discovery approaches were utilized to research novel drugs that could be used as anti-viral agents against the former virus, the results had revealed Phomoarcherin B, a natural compound from an endophytic fungus to hold promising therapeutic potentials.

P a g e 2 | 57 of the screening results, 4 compounds with the best scores were considered and their interaction within 21 the active site was analyzed. Subsequently, all-atom molecular dynamics simulations and MM-PBSA was 22 performed to validate the stability and binding free energy of the hits within the RNase H active. In silico 23 ADMET assays were performed on the hit compounds to analyze their drug-likeness, physicochemical and 24 pharmacological properties. Phomoarcherin B, a pentacyclic aromatic sesquiterpene naturally found in 25 the endophytic fungus Phomopsis archeri, known for its anticancer properties scored the best in all the 26 experiments and was nominated as a potential inhibitor of the HIV-1 reverse transcriptase RNase H 27 activity. anti-viral therapies had introduced several drugs to combat the viral infection, however, most of these 33 drugs come with their shortcomings. In this study, computational drug discovery approaches were utilized 34 to research novel drugs that could be used as anti-viral agents against the former virus, the results had 35

38
The Human immunodeficiency virus 1 (HIV-1) remains a global public health issue ever since its 39 first identification in 1981, its the caustic agent for the Acquired immunodeficiency syndrome (AIDS) which 40 compromises the immune system of the host and eventually leave them defenseless against secondary 41 infection [1]. The World Health Organization refers to HIV-1 as a "global epidemic" [2], and according to 42 P a g e 4 | 57 membranes which leads to the insertion of the viral capsid into the host cell. Once inside the host, the 67 capsid disintegrates and the viral RNAs are reverse transcribed into DNA molecules via RT which starts 68 with an RNA/DNA hybrid and further cleavage of the RNA, and synthesis of dsDNA takes place. The proviral 69 dsDNA is further integrated into the host genome via the integrase enzyme where it remains as a reservoir 70 for the virus until the cell is activated upon which the cellular transcription and translation mechanisms 71 are hijacked to produce the viral proteins and viral RNA genomes [6][7][8]. 72 The first clinical attempts to treat the virus started in 1987 and later that year zidovudine (ZDV,73 also referred to as AZT or azidothymidine) was approved for clinical usage which marked the beginning of 74 the development of antiretroviral therapies (ARTs) strategies [9]. Nowadays, 6 classes of ART drugs 75 targetting 5 different phases of the HIV life cycle has been developed and they're classified as nucleoside 76 and nucleotide reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse transcription inhibitors 77 (NNRTIs), protease inhibitors (PIs), entry or fusion inhibitors, and integrase strand transfer inhibitors 78 (INSTIs) [10]. A list of all the FDA-approved ART drugs against HIV-1 is shown in Table 1. 79

81
Another substantial discovery in the field of ART for HIV treatment started with the high active 82 antiretroviral therapy (HAART) era in the mid-1990s. The replication mechanism of HIV-1 is far from 83 perfect, mainly due to its error-prone reverse transcriptase which randomly introduces mutations into the 84 viral genome which could over time lead to resistance against drugs and HAART provided the solution for 85 the issues. In HAART, a cocktail of several ART drugs from different drugs are prescribed at the same time, 86 since the probability of developing mutations to become resistant to several drugs at the same time is far 87 less likely than developing mutations against a single drug, HAART has minimized the drug resistance issue 88 that single ART drugs caused substantially [1,11,12]. Despite the hope, HAART holds for its patients and 89 the opportunities it provides, its yet far from perfect as the current drugs prescribed under HAART comes 90 with many short and/or long term side effects such as hepatotoxicity, cardiovascular complications, distal 91  The results from the whole-genome alignment are provided in Fig 3, the alignment is visualized 186 such that each LCB is clustered together, the alignment is zoomed out so as each clustered LCB is shown 187 P a g e 12 | 57 as a black continuous bar, a region with a length of around 2.4 kb within the ≈2.8-5.3 kb range from the 188 consensus sequence is highly conserved with almost no gaps in most of the sequences. The alignment for 189 this conserved region further shows a very high degree of conservation within this region throughout all 190 of the HIV-1 genomes aligned with a gap only in one of the sequences (Fig 4). 191 The virtual screening of the ligand dataset against the HIV-1 RT enzyme nominated 7 compounds 204 with significantly high affinities (≥ -8.5 kcal/mol), among them only 4 compounds successfully achieved 205 the same affinity in 3 subsequent runs, hence only these 4 compounds were selected and further 206 analyzed, the top docking poses' for these 4 compounds are shown in Fig 5,       The RMSD for the Cα of the RT from each frame throughout the 50 ns equilibration simulation 246 was extracted and calculated using the 1 st frame in each trajectory as the reference point, Fig 7 (a) shows 247 the RT equilibration state throughout the equilibration simulation (plateau in the 40-50 ns interval). Fig 7  248 (b) shows the Cα RMSF calculated using the same 1 st frame as the reference point for each equilibration 249 simulation. An average RMSF plot for the 4 simulations was also calculated by averaging the RMSF of each 250 residue over all of the simulations (Fig 7. c) to analyze the overall RMSF of the RT through the course of 251 equilibration under the simulation conditions. 252 The plots for Cα RMSD against time for each RT-lead compound simulation (Fig 8), Cα RMSF for 259 each residue for each RT-lead compound (Fig 9), and the hydrogens bonds formed as a function of time 260 in each RT-lead compound system (Fig 10) throughout the 30 ns production simulation were plotted to 261 subsequently analyze the motion of the lead compound within the simulation system as well as the 262 stability of the RT backbone in presence of the lead compounds.
were calculated for the RT enzyme, lead 284 compound, and RT-lead complex separately from each production simulation, and their total sum was 285 used in equation 4 to calculate the binding free energy ∆ / , the sums of each energy term is 286 provided in Table 4 along with their standard deviations, the values of ∆ indicates the 287 spontaneity of the interaction between the RT and lead compounds (i.e. more negative values implies a 288 more spontaneous reaction). Detailed value for each energy term for the protein, ligand, and complex 289 separately is provided in S5. 290 Table 4. MM/PBSA energy terms for ∆ calculated for each RT-lead pair, energies were calculated 291 from the last 10 ns of the production trajectory using the single trajectory approach. 292 The physiochemical properties of the 4 lead compounds are listed in Table 5 along with their drug-295 likeness results, all 4 lead compounds passed the Lipinksi rule of 5, Ghose filters, Veber filter, Egan filter, 296 and Muegge filter without any violations. The ADMET profiles of each lead compound are also 297 summarized in Table 6 based on the results from admetSAR and ADMETlab. 298 Lipinksi rule of 5 ͱ Pass (0) Pass (0) Pass (0) Pass (0) Ghose filters ͱ Pass (0) Pass (0) Pass (0) Pass (0) Veber filters ͱ Pass (0) Pass (0) Pass (0) Pass (0) Egan filters ͱ Pass (0) Pass (0) Pass (0) Pass (0) Muegge filters Pass Pass Pass Pass *Based on the Log S (ESOL) scale, insoluble < -10 < poor < -6 < moderate < -4 < soluble < -2 < very < 0 < highly soluble.  ͱ Probability of half-life being greater than 3 hours is given within parentheses, below 0.5 was considered to have T1/2 < 3.
The SMILES notation of the lead compounds was used as the input to calculate each of the properties listed in the and HAART therapies provide some relief and support for the patients, it comes with its disadvantages 307 and limitations. As mentioned in the introduction, this study aimed to perform extensive in silico analysis 308 to discover and evaluate potent novel inhibitors of HIV-1 replication within the host by targeting the most 309 coherent target, providing therapeutic options for the patients while at the time accelerating the drug 310 development processes by providing potential leads. 311 The mutation rate for HIV-1 is reported to be 10 −4 to 10 −2 mutants/clones and with the estimated 312 production of 10 9 virions/day within an infected individual, the virus mutates quite efficiently to develop 313 resistance and/or evade the immune system [60]. However, not all of these mutants are expected to 314 survive and replicate as mutations to some genes could be lethal, hence the most coherent target for drug 315 discovery and development efforts would be the phenotypes that mutate less frequently as their chance 316 of developing resistance or evasion are lower than their highly mutating counterparts, to determine such 317 regions within the HIV-1 genome, the comparative genomics approach was used, comparative genomics 318 considers the correlations and differences between the genotype (genome) of closely related species or 319 even different variants of the same specie to answer the reasons behind their characteristic phenotypes, 320 this method has been widely used to determine resistance genes in several bacterias in the past [61][62][63]. 321 In this study, we utilized 98 high-quality HIV-1 sequences from the Los Alamos database and performed 322 whole-genome alignment with MAUVE to determine the regions within the HIV-1 that genome that shows 323 the highest level of consensus among all of the 98 sequences, as visualized in Fig 3, a genomic  The RT enzyme being among the most extensively studied proteins of HIV-1, this study aimed to 330 discover and analyze potent targets inhibiting the RT enzyme itself, given that several NNRTI and NRTI has 331 already been approved by the FDA for use and since all of them function by inhibiting the DNA polymerase 332 activity of the RT, the aim of the study focused specifically on discovering and analyzing potent RT RNase 333 H inhibitors, which is equally critical for the viral replication as its polymerase activity [64][65][66][67]. 334 Virtual screening has been used extensively in recent years to discover novel therapeutics and/or 335 repurpose existing drugs to new targets, in summary, it can be viewed as a form of batch molecular 336 docking where an attempt is made to position each compound within a library of compounds to the best 337 position it can have within a specified region of a protein, usually an active site or a receptor's binding 338 domain, such that it would attain maximum affinity towards it, this method is more efficient, faster, and 339 cheaper than the high throughput screening (HTS) approaches [55,68,69]. In this study, a library of 94,545 340 small molecules stable at physiological pH with a charge of 0, -1, and -2 (since the HIV-1 RNase H active 341 site contains positive Mn/Mg cofactors) was generated with their weights limited to be in the range of 342 350-500 g/mol (based on the size of the RT's RNase H active site pocket). The virtual screening result 343 initially reported 7 hits with affinity scores below -8.5 kcal/mol, upon 3 subsequent repetitions, only 4 of 344 these compounds successfully reproduced the same affinity scores in the same pose within the RNase H 345 active site (referred to as Compound 1, 2, 3, and 4 in this study, also briefly described in Table 2)  RNase H active site (as shown in Fig 6), the binding poses are reasonable as all the lead compounds form 353 several hydrogen bonds except for compound 3 only, all the leads also happen to be in less than 4 Å 354 proximity to form hydrophobic and ionic interactions with the active site residues and the active site Mn 355 cofactor which provides some basis to the potential these lead compounds have in being actual potent 356 inhibitors [70,71]. 357 Molecular dynamics provide a great opportunity to explore and analyze the behavior of 358 biomolecules at nano-scale levels, which makes it an indispensable tool for drug discovery, as the 359 interactions a drug could make under simulation conditions designed to replicate the physiological cellular 360 conditions are more likely to represents its interactions in in vitro and in vivo, and therefore dictate its 361 activity and side effects [72,73]. In both X-ray and Cryo-EM (cryogenic electron microscopy), the protein 362 structures are not determined in physiological conditions but rather in extremely low temperatures (often 363 liquid Nitrogen temperature), hence the first step in simulating near-actual physiological conditions is to 364 equilibrate the protein into its native state under the physiological temperature [74]. Performing 365 equilibration of proteins in a single step is often ill-advised as it can cause irregularities and clashes within 366 the structure resulting in unexpected behaviors and biases during downstream analysis, hence, in this 367 study, each of the 4 RT-lead compound systems was minimized and equilibrated in 5 steps, first, the 368 energy of the system was minimized for 2 ns, then the RT-lead complex along with the Mn 2+ cations was 369 restrained to allow the water and salt ions to equilibrate around the RT-lead complex for 5 ns, then the 370 restraints on the protein's side chains was released to allow them to equilibrate for 10 ns, following this 371 step was the whole system equilibration for 50 ns with only the Mn 2+ cations restained to avoid it from 372 drifting away, and finally, 30 ns simulation with no restraint was performed from which the trajectory was 373 P a g e 31 | 57 collected for analysis [75]. To ensure that the RT enzyme was well-equilibrated before performing the 374 production simulation, the RMSD of the RT's Cα was plotted throughout the 50 ns equilibration simulation 375 to ensure it reached a plateau, which happened as seen in Fig 7 (a), the RT enzyme in all of the 4 systems 376 reached a plateau after the 30 ns time-lapse which meant the production runs would provide accurate 377 and less biased results for the behavior of the lead compounds. The HIV-1 RT enzyme is a large complex 378 with many loop regions, which unlike α-helix and β-sheet conformations have no absolute stable 379 conformation, hence they often move randomly throughout the simulation resulting in higher overall 380 RMSD and RMSF, this phenomena can be noticed in Fig 7 (b and c), where residues close to the C-and N-381 terminal, i.e. the fingers subdomain (residue 1-85 & 118-155, consisting of several loops) and the palm 382 subdomain (residue 86-117 & 156-236, containing some long loops) has higher RMSF values in each of the 383 4 simulation systems. The RMSD plot of compound 1 (Fig 8a) is definitively a negative result as compound 384 1's backbone (cyan line) appear to deviate ≈ 2 Å throughout the simulation, this implies that on average, 385 compound 1 had moved ≈ 2 Å from its initial docked position, which can be interpreted into it leaving the 386 active site, thereby possessing no potent affinity towards the RT enzyme's RNase H domain, compound 2 387 and 4 (Fig 8 b &  also plotted to ensure that the RT enzyme structure was not compromised in any of the simulation 394 systems, since the enzyme structure was already equilibrated for 50 ns prior (their RMSD graphs reached 395 a plateau in Fig 7), no high peaks are to be expected like those of Fig 7 (above 10 Å RMSD), Cα RMSF below 396 5 Å is mainly contributed by the vibration effect and random movement of the loop regions [76]. Fig 10  397 P a g e 32 | 57 shows all the potential hydrogens between each lead compound and the residues within 3 Å range from 398 them at a maximum cut-off angle of 20°, each lead compound retains at least 3 hydrogen bonds on 399 average throughout simulation which is relatively similar to the hydrogen bonds predicted using PyMol 400 (Fig 6 and Table 3) except for compound 3, which was predicted to form no hydrogen bonds, however, 401 even in Fig 10 (c) compound 3 doesn't form stable hydrogen bonds until it passes the 18 ns mark, 402 considering its RMSD of around 1 Å (Fig 8c), it probably rotated around itself to some degree and/or got 403 closer to other nearby residues to form these hydrogen bonds and even then, it made on average fewer 404 hydrogen bonds than the other lead compounds. 405 To determines the spontaneity or propensity of a drug candidate at a given (physiological) 406 temperature, its binding free energy toward its biomolecular target is used as an objective quantity in the 407 selection process. Binding free energy calculations are quite challenging both in vitro and in silico, 408 however, in silico routes are less labor-intensive and more computationally expensive, in this study the 409 fast and accurate MM-PBSA protocol was used on the last 10 ns (where the RMSD had reached a plateau 410 and were stable) of each simulation system to estimate their binding free energies, a total of 8 energy 411 terms was calculated for each of the RT enzyme, the lead compound, and the RT enzyme in complex with 412 the lead compound. Equation 4 was used to calculate ∆ which was also approximated to be equal 413 to the respective ∆ as mentioned in equation 5 (and justified by the former reasons) [77,78]. The 414 binding energies for each of the compounds (in Table 4) are promising with the best result from compound 415 3 (which also had the lowest overall RMSD) followed by compound 1, compound 4, and finally, compound 416 2, all the estimated ∆ values being negative can conclude that each of these compounds can 417 spontaneously bind to the RT's RNase site, however, given the RMSD plot of compound 1 (Fig 8a), it is 418 unlikely that this value is true for the RNase H active site since it drifted from the active site ≈ 2 Å, hence, 419 its binding free energy is to some region ≈2 Å away from the RNase H active, therefore, from the molecular 420 dynamics simulations and MM-PBSA calculations, we can conclude that compound 2, 3, and 4 has a 421 P a g e 33 | 57 significant affinity to the HIV-1 RT's RNase H active site and thereby could potentially hold some inhibitory 422 activity. 423 Evaluation of the physicochemical, pharmacological, and ADMET profiles of the lead compounds 424 has been used extensively in the very early stages of drug discovery to accelerate the conversion of leads 425 into qualified development candidates, most of these evaluations are performed in vitro which costs a lot 426 of time and resources, however, with the developments in the field of machine learning and quantitative 427 structure-activity relationship (QSAR), these experiments could be performed in silico with statistically 428 significant results, this study utilized the pre-trained algorithms of 3 different ADMET assessing webserver 429 to conclude the physiochemical, pharmacological, and ADMET profiles of the lead compounds analyzed 430 [79-81]. Since compound 1 had shown to not poses proper inhibitory potentials from previous analyses, 431 further interpretations focused strictly on compounds 2-4. As given in Table 6, each of compounds 2-4 432 exercised high gastrointestinal absorption levels which are critical for any drug as they have to be 433 absorbed into the body to exert their effect, among the lead compounds, only compound 4 showed the 434 potential to permeate blood-brain barrier which makes it's a valuable lead as HIV-1 is known to cross the 435 blood-brain barrier and infect target cells causing neurocognitive disorders and brain damage in the 436 processes [82,83]. As for the distribution of the compounds, the most important parameter is the fraction 437 of the compound unbound in the plasma, ideally, the higher is better as the unbound fraction is the active 438 drug available to exert its function (>5% is a decent threshold), among the lead compounds, both of 439 compound 3 and 4 provide accepTable unbound fractions for a therapeutic agent [84,85]. 440 Cytochrome P450 (CYP450) enzymes are essential for the metabolism of many drugs, around 50 441 isoforms of CYP450 enzymes are identified in humans, however, 90% of drugs are known to be 442 metabolized by 6 of these isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, CYP3A4, and CYP3A5), the overall 443 performance of the lead compounds was evaluated by CYP inhibitory promiscuity parameter (Table 6), 444 P a g e 34 | 57 the classification of high and low refers to whether the respective compound does or does not possess 445 substantial CYP450 inhibitory promiscuity, lower CYP inhibitory promiscuity indicates less risk of clinical 446 drug-drug interaction and better/higher metabolism (and therefore excretion as well) of the drugs, among 447 the lead compounds reported in this paper, both compound 3 and 4 exhibits low CYP inhibitory 448 promiscuity making them ideal choices for drug candidacy [86,87]. The excretion of the lead compounds 449 was evaluated based on their half-life (T1/2) within the body, this parameter indicates the time it takes for 450 the lead compounds to reach half of their initial concentration in the body once administered. While there 451 is no ideal benchmark for T1/2 of therapeutic drugs, 12-48 h are often considered ideal as shorter T1/2 would 452 require more frequent intake of the drug to maintain its activity and longer T1/2 would cause accumulation 453 of the drug and prolonged side effects [88]. Among the lead compounds reported in this paper, 454 compounds 2 and 4 exhibit T1/2 greater than 3 hours (although with a lower probability for compound 2) 455 whereas compound 3 exhibited T1/2 lower than 3 hours, further nominating compound 4 as a potent drug 456 candidate. Table 6 also reports the toxicity of the lead compounds based on pre-trained regression models 457 by admetSAR, admetSAR database contains 5 quantitative regression models with high predictive 458 accuracy as their models are trained with over 210,000 ADMET annotated data points for more than 459 96,000 unique compounds with 45 kinds of ADMET-associated properties curated from literature [89]. 460 According to its toxicity predictions based on rat acute toxicity and Tetrahymena Pyriformis toxicity, 461 compound 4 is the best performing lead, ideally, the higher the LD50 and pIGC50 for rat acute toxicity and 462 Tetrahymena Pyriformis toxicity respectively, the higher the dose it would require to attain a toxic effect, 463 hence providing a wider range for a therapeutic dose [90]. None of the lead compounds were predicted 464 to exhibit any carcinogenic activity. The acute oral toxicity parameter was predicted based on a model 465 trained on 12,204 diverse compounds with their LD50 classified based on the criterion of US EPA for all 466 drugs, all of the lead compounds reported in this paper were predicted to have an acute oral toxicity 467 P a g e 35 | 57 between 500-5000 mg/kg, which a implies that doses below this range are unlikely to result in any acute 468 oral toxicity, hence each one of them are suitable for oral administration. 469 Considering all the in silico analysis reported in this paper and the lead selection criteria applied, 470 compound 4 is the best performing lead compound, with a docking score of -8.5 kcal/mol, several 471 hydrophobic, hydrogen bond, and ionic interactions with active site residues of the HIV-1 RNase H and 472 the cofactor Mn 2+ cations, it had only shown less than 1 Å deviation from its initial docked pose throughout 473 the 30 ns molecular dynamic simulation, and attained a binding free energy of ≈ -35.58±4.84 kcal/mol 474 with a near-ideal ADMET profiles, a video of the production simulation of the RT with compound 4 is 475 provided in S7. Compound 4, with the IUPAC name (6aR,6bS,10aR,12aS)-5-Hydroxy-6b, 10,10,12a-476 tetramethyl-6a,7,8,10,10a,11,12,12a-octahydro-1H-benzo[a]furo [3,4-h]xanthene-3,9(6H,6bH)-dione is 477 also known in the literature as Phomoarcherin B. Phomoarcherin B (PubChem CID 52952104) is a natural 478 compound found in the endophytic fungus Phomopsis archeri, it was first isolated and characterized as a 479 pentacyclic aromatic sesquiterpene via spectroscopic analysis by Hemtasin et al. (2011) and tested for 480 antimalarial and anticancer activities on cholangiocarcinoma cell lines [91]. It was also featured in 2 481 different reviews for the same characteristics, however, no in vitro or in vivo assay has been performed 482 regarding its antiviral or RNase H inhibitory activity, making it a potential novel lead that could inhibit HIV-483 1 replication by inhibiting its RNase H activity [92,93] database [19]. The sequences were manually selected such that only 2 sequences (where applicable) were 501 chosen from each country and the dataset included all the geographic regions available (however, this 502 trend was not strictly followed as some countries like the US received higher coverage for its size and the 503 number of high-quality sequences deposited whereas other countries like India despite its size, due to 504 lack of abundant high-quality sequences deposited, received lower coverage). Whole-genome alignment 505 was performed via progressiveMAUVE algorithm with match seed weight set to automatic calculation, 506 minimum Locally Collinear Blocks set to default (3 times the minimum match size), progressive Muscle 507 (v3.6) was selected for as gap aligner for each LCB, and minimum island size, maximum backbone gap size, 508 minimum backbone size were set to 50 [94,95]. The list of all the sequences used for the alignment is 509 included in supplementary materials 1 (S1) and the whole genome alignment result is included in S2. The 510 alignment result was visualized in Geneious Prime (v2020.1) and the highest conserved continuous region 511 with no gaps in the alignment was excised from the alignment and visualized separately in-depth [96]. A 512 consensus identity sequence from the conserved fragment was generated using Jalview and submitted to 513 NCBI BLASTx with the default parameters (max target sequences 100, expected threshold 0.05, word size 514 of 6, max match in a query range 0, matrix BLOSUM62, gap costs for existence 11, an extension of 1, and 515 compositional adjustments via conditional compositional score matrix adjustment), the alignment for the 516 excised fragments are provided in S3 and the consensus sequence is provided in S4 [97][98][99]. research group was utilized to generate the topology and parameter files for the RT enzyme, the same 541 force field was used to parameterize the ligand molecules as well (via LigParGen server with 1.14 CM1A 542 charge model) [108][109][110][111]. Each pair of RT-ligand complex was immersed in a square box with explicit TIP3P 543 water such that such a distance of 5 Å was maintained between the edge of the box to the RT-ligand 544 complex along each axis, the system was neutralized with Na + and Clions and their final concentration 545 was maintained at 0.15 mol/L (physiological salt concentration). The system was minimized for 2 ns to 546 reach its lowest energy relaxed state from the X-ray diffraction state, the system was then equilibrated 547 for 5 ns at 310 K with periodic boundary conditions, Langevin dynamics, Particle Mesh Ewald (PME) for 548 electrostatics, and Langevin piston (at 1 atm pressure) with the RT-ligand complex constrained to allow 549 the water and ions equilibrate around the complex, this step was followed by 10 ns equilibration with the 550 constraints on the protein side chains released to allow the side chains to relax. The system was then 551 subjected to 50 ns equilibration with constraints only on the cofactor Mn 2+ cations to allow the system to 552 reach its equilibrium while maintaining the cofactor in the active site. The root mean square deviation 553 (RMSD) and root mean square fluctuation (RMSF) of RT's Cα from equilibration simulation was calculated 554 using the 1 st frame as the reference point to monitor the RT's behavior under the simulation system. 555 Finally, a 30 ns production simulation with no constraints was performed from which the trajectory was 556 collected for analysis. 557 Throughout the simulation, the outputs were written to the trajectory every 1 ps, and RMSD of 558 RT's Cα and the lead ligand as a function of time elapsed was plotted, RMSF of the Cα for each residue of 559 the RT throughout the production simulation was also plotted. The number of hydrogen bonds between 560 the RT enzyme and the respective lead ligand throughout the production simulation was also plotted (with 561 the donor-acceptor thresholds set to < 3 Å and angle cutoff set to 20° ), all the statistical analysis and 562 visualizations was performed using the Matplotlib and Seaborn libraries [112,113]. 563 4.5. Binding free energy calculation via MM/PBSA 564 The binding free energy (∆Gbind, Gibbs free energy) between the lead compounds and RT enzyme 565 was calculated from the last 10 ns (stable RMSD interval) for each simulation system (compromising of 566 1001 snapshots) via the MM/PBSA single trajectory protocol, the formula in equation 2 was followed to 567 calculate the energy terms for each of the RT enzymes, lead compounds and RT-lead complex using the 568 CaFE plugin (v1.0) in VMD. Finally, equation 4 was used to calculate the ∆Gbind [114,115]. 569 Where ∆ = ( ∆ + ∆ + ∆ ), the sum of (∆ + 574 ∆ + ∆ ) aka ∆ represents the changes of the gas phase molecular mechanic 575 energy, and − ∆ is the product of temperature and the change in the conformational entropy upon 576 binding. One of the limitations of in silico end-point binding free energy calculation methods is the 577 difficulty in capturing the changes in configurational entropy involved with the association of the protein 578 with the ligand, this step requires adequate sampling of all the relevant movements of the ligand to the 579 corresponding protein, however, experimentally it demands extensive computation and often provides 580 highly inaccurate results, hence, most in silico approximate binding free energy estimator protocols tend 581 to drop the conformational entropy terms in ∆ calculations, which was also followed in this study, 582 therefore, the ∆ values were calculated using equation 5 to calculate the approximate ∆ , to 583 compensate for this, the gas phase (∆ ) energy calculation was performed and a longer trajectory (10 584 ns) was utilized to make the closest approximation of the binding free energy possible. [114,116,117]. 585 ∆ ≈ ∆ (5) 586 587 4.6. In silico physicochemical and ADMET profile analysis 588 The top lead compounds were submitted to the SwissADME webserver from the Swiss Institute 589 of Bioinformatics to calculate their physiochemical properties and drug-likeness [118]. The lead 590 compounds drug-likeness were evaluated based on 5 filters, the Lipinksi rule of 5, Ghose filters, Veber 591 filter, Egan filter,. As for the ADMET analysis, admetSAR (which is also used 592 by DrugBank to evaluate drugs ADMET profile) and ADMETlab (v2.0) webserver were collectively used to 593 analyze each lead compound [59,89]. 594 595