Structure-based discovery of inhibitors of the SARS-CoV-2 Nsp14 N7-methyltransferase

An under-explored target for SARS-CoV-2 is non-structural protein 14 (Nsp14), a crucial enzyme for viral replication that catalyzes the methylation of N7-guanosine at the 5′ end of the viral RNA; this enables the virus to evade the host immune response by mimicking the eukaryotic post-transcriptional modification mechanism. We sought new inhibitors of the S-adenosyl methionine (SAM)-dependent methyltransferase (MTase) activity of Nsp14 with three large library docking strategies. First, up to 1.1 billion make-on-demand (“tangible”) lead-like molecules were docked against the enzyme’s SAM site, seeking reversible inhibitors. On de novo synthesis and testing, three inhibitors emerged with IC50 values ranging from 6 to 43 μM, each with novel chemotypes. Structure-guided optimization and in vitro characterization supported their non-covalent mechanism. In a second strategy, docking a library of 16 million tangible fragments revealed nine new inhibitors with IC50 values ranging from 12 to 341 μM and ligand efficiencies from 0.29 to 0.42. In a third strategy, a newly created library of 25 million tangible, virtual electrophiles was docked to covalently modify Cys387 in the SAM binding site. Seven inhibitors emerged with IC50 values ranging from 3.2 to 39 μM, the most potent being a reversible aldehyde. Initial optimization of a second series yielded a 7 μM acrylamide inhibitor. Three inhibitors characteristic of the new series were tested for selectivity against 30 human protein and RNA MTases, with one showing partial selectivity and one showing high selectivity. Overall, 32 inhibitors encompassing eleven chemotypes had IC50 values <50 μM and 5 inhibitors in four chemotypes had IC50 values <10 μM. These molecules are among the first non-SAM-like inhibitors of Nsp14, providing multiple starting points for optimizing towards antiviral activity.


Ultra-large library docking against Nsp14 identifies novel inhibitors. Because no structure of SARS-CoV-2 Nsp14 was available when this project began, we initially used the N7-MTase domain of SARS-CoV-1 Nsp14 (PDB ID 5C8S) 23 for retrospective control calculations, which helped us to validate the recognition of known ligands. The SAM binding site of SARS-CoV-1 Nsp14 was used without any modifications, as the active site residues are conserved between the SARS-CoV-1 and SARS-CoV-2 N7-MTase domains (Figure 1A). These control calculations confirmed that we could preferentially dock known adenosyl-containing MTase compounds (SAM, SAH, and sinefungin) and other known MTase inhibitors, including LLY283 41, BMS-compd7f 42, and Epz04777 41-45, in favorable geometries with high ranks versus 300 property-matched decoys 46,47. The structure used in these retrospective control calculations was subsequently supported by the

Seeking non-covalent inhibitors, we first docked over 680 million molecules, mostly in the "lead-like" range of the ZINC20 database (e.g., molecular weight < 350 amu, cLogP values < 3.5). Each library molecule was sampled for complementarity in an average of 3438 orientations and, for each of these, about 187 conformations; over 3.6 × 10^14 ligand configurations were sampled in the site in 121,018 core hours (about 5 days on 1000 cores). Seeking novel chemotypes, molecules topologically similar to SAM analogs were discarded. The remaining compounds were clustered based on ECFP4 fingerprints to identify unique chemotypes. Most cluster representatives were prioritized for interactions with Trp292, Gly333, Asn334, Asp352, Ala353, Phe367, Tyr368 and Val389 using LUNA 48. Molecules with strained conformations were deprioritized 38. Of the remaining molecules, the best-scoring 5000 were visually inspected for key interactions and for unfavorable features, such as uncomplemented polar groups buried in the active site, using Chimera 49. Ultimately, 93 molecules, each from a different scaffold, were de novo synthesized and tested for enzyme inhibition at 30 and 50 µM, measuring the transfer of [3H]-methyl from the SAM methyl donor onto the cap structure of an RNA substrate (GpppAC4) (Table S1). Of the 93 molecules tested, only ZINC475239213 ('9213) inhibited by more than 50% and was considered active. This molecule had an IC50 of 20 µM in concentration-response (Figure 2A, middle panel). In its docked pose, the base-like moiety of '9213 hydrogen-bonds with backbone amides of Ala353, Phe367 and Tyr368, while more distal parts of the molecule hydrogen-bond with Gln354 and Lys336 (Figure 2A, right panel). Van der Waals and stacking interactions are also apparent in the docked pose; overall these interactions resemble those observed among SAM and established SAM-like inhibitors but are made with different ligand groups.
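The clustering step in this funnel can be illustrated with a short sketch. It is not the authors' pipeline: the text says only that compounds were clustered on ECFP4 fingerprints, so the greedy best-first scheme, the `tc_cutoff` value of 0.5, and the toy molecules below are all assumptions made for illustration. Fingerprints are represented as plain sets of on-bit indices, standing in for what a cheminformatics toolkit would compute.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def best_first_clusters(
    molecules: List[Tuple[str, float, FrozenSet[int]]],
    tc_cutoff: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, List[str]]:
    """Greedy clustering: visit molecules from best to worst docking score.

    Each molecule joins the first existing cluster head it resembles
    (Tanimoto >= tc_cutoff); otherwise it founds a new cluster.
    """
    heads: List[Tuple[str, FrozenSet[int]]] = []
    clusters: Dict[str, List[str]] = {}
    # Docking scores are energies, so more negative means better.
    for name, _score, fp in sorted(molecules, key=lambda m: m[1]):
        for head_name, head_fp in heads:
            if tanimoto(fp, head_fp) >= tc_cutoff:
                clusters[head_name].append(name)
                break
        else:
            heads.append((name, fp))
            clusters[name] = [name]
    return clusters


# Toy input: (name, docking score, fingerprint on-bits); all values invented.
toy = [
    ("mol_b", -40.5, frozenset({1, 2, 3, 5})),
    ("mol_a", -42.0, frozenset({1, 2, 3, 4})),
    ("mol_c", -39.0, frozenset({7, 8, 9})),
]
clusters = best_first_clusters(toy)
```

Because the best-scoring member of each cluster becomes its head, the dictionary keys are the cluster representatives that would go on to interaction filtering and visual inspection.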

With the determination of the cryo-EM structure of the SARS-CoV-2 Nsp10-Nsp14 complex (PDB ID 7N0B) 23, and the development of a larger "tangible" ZINC22 50 library of 1.1 billion molecules, we launched a second docking screen. The same retrospective control calculations were performed to optimize docking parameters, leading to similar sampling and calculation times. Following the same prioritization strategy as before, but seeking different chemotypes, 72 diverse molecules were de novo synthesized and experimentally tested for enzyme inhibition (Table S1). Two inhibitors were found, ZINC730084824 ('4824) with an IC50 of 43 µM and ZINC61142882 ('2882) with an IC50 of 6 µM (Figure 2B-C), a hit rate of 2.7%. The origins of the low hit rates for these two initial screens, and strategies to improve upon them, will be considered below.

We tested the docking hits and analogs for colloidal aggregation, perhaps the dominant mechanism of artifactual activity in early discovery 52-54 (Supplementary Figures 3.1 to 3.4). As a first line of defense, all Nsp14 assays were conducted in the presence of 0.01% v/v Triton X-100, a non-ionic detergent that disrupts colloidal aggregates and right-shifts their potency 55, 56. We also conducted follow-up assays for actives, looking for particle formation by dynamic light scattering (DLS) and for activity against the widely used counter-screening enzymes malate dehydrogenase (MDH) and AmpC β-lactamase (AmpC), both with and without detergent. The '9213 analog, '3888, did form colloid-like particles by DLS, with an apparent critical aggregation concentration (CAC) in the 10 µM range. As with most of the inhibitors studied here, however, the scattering intensity was relatively modest, and the molecule did not inhibit the counter-screening enzymes even at concentrations substantially higher than its IC50 for Nsp14, and even in the absence of detergent. While '3888 may form particles, we do not believe these are relevant for its inhibition of Nsp14. Compound '4824 did not form detectable particles by DLS, but did inhibit the counter-screening enzymes MDH and AmpC at relevant concentrations. However, this inhibition can be at least partly attributed to strong absorbance at assay wavelengths (such absorbance was not an issue for the Nsp14 radioligand assay). Moreover, when we imitated the conditions of the Nsp14 assay by adding 0.01% Triton to the MDH and AmpC assays, inhibition was largely or entirely eliminated. Meanwhile, the '4824 analogs ('6947, '6953, '6943) did form colloid-like particles by DLS, though again with relatively modest scattering intensities. All three inhibited MDH or AmpC at relevant concentrations, but here too inhibition was largely eliminated by the addition of 0.01% detergent. Compound '1988 formed colloid-like particles by DLS, but with a CAC 10-fold higher than its Nsp14 IC50. While the compound's inhibition of MDH was in a relevant range, its activity against AmpC was not, and the inhibition of both enzymes disappeared when we copied the Nsp14 conditions by adding Triton. We conclude that many of these inhibitors do aggregate, but this does not appear to be relevant for their inhibition of Nsp14, for which the inclusion of detergent appears to be prophylactic. These studies do support the usefulness of including detergents like Triton or Tween in enzyme and receptor inhibition assays.

The experiments were performed in triplicate.

Docking 16 million fragment-like molecules. With only three inhibitor scaffolds discovered by lead-like docking, we stepped back to interrogate the site with fragment-based docking. Fragment screens explore more chemical space than a larger lead-like library 14, 57, 58, which may be helpful for an under-explored site where warheads and key residue interactions have not been characterized. With the proviso that they have lower affinities, fragments also have higher hit rates in empirical 59 and docking screens 57, 58, 60 than do lead-like molecules, providing a richer tiling of the binding site by ligand functional groups. Indeed, a strategy of fragment docking was effective against another under-studied SARS-CoV-2 enzyme, Mac1 14, 15, and fragment-based discovery nucleated a successful drug-discovery campaign against the Mpro enzyme 10. Accordingly, from the 16 million molecule fragment-like set (e.g., molecular weight < 250, cLogP < 2.5) in ZINC22, we targeted the full SARS-CoV-2 (PDB 7N0B) SAM site, the adenine portion of that site, and the SAM-tail region in three independent campaigns (Figure 4A) (Methods) 61. Overall, 14,406,946, 14,124,978, and 14,908,652 molecules were scored, respectively. For each, the top-ranked 300,000 fragments were filtered as above, and the remaining fragments were clustered by topological similarity. Top-ranking cluster heads were visually inspected in Chimera 49 for favorable interactions, prioritizing those in the adenine site campaign for hydrogen bonds to Tyr368 and Ala353, and hydrophobic interactions with Phe367 48. For the SAM-tail docking screen, interactions with Gly333 were prioritized, and additional interactions, such as those with Gln313 and Asn386, were also selected for. For fragments docked against the entire SAM binding site, a combination of these interaction criteria was used. Ultimately 69 fragments were prioritized, of which 54 were successfully synthesized (78% fulfilment rate) (Table S1).
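Fragment hits are usually compared by ligand efficiency (LE), the binding free energy per heavy atom, which is the metric behind the 0.29 to 0.42 range quoted in the abstract. The sketch below uses the conventional definition LE = -RT ln(IC50) / HAC. Treating IC50 as a stand-in for the dissociation constant is an assumption, as are the temperature of 298.15 K and the heavy-atom count of 16 in the example, which is hypothetical and not taken from the paper's tables.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(ic50_molar: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """Return ligand efficiency in kcal/mol per heavy atom.

    Assumes IC50 approximates Kd, so delta_G = RT * ln(IC50) with IC50 in mol/L.
    """
    if ic50_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("IC50 and heavy-atom count must be positive")
    delta_g = R_KCAL * temp_k * math.log(ic50_molar)  # negative for IC50 < 1 M
    return -delta_g / heavy_atoms


# Illustration only: a 12 uM fragment with a hypothetical 16 heavy atoms.
le_example = ligand_efficiency(12e-6, 16)
```

Under these assumptions a 12 µM fragment of 16 heavy atoms lands near the top of the reported LE range, while the same potency spread over twice as many atoms would halve the value. This is why weak fragments can still be efficient starting points.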

Curation of 25 million aldehyde and acrylamide electrophiles for covalent docking. In a final strategy, we sought potential covalent electrophiles that could react with the enzyme's active site Cys387. Such covalent docking has been successful in campaigns that targeted catalytic serine and non-catalytic, active site cysteine and lysine residues in enzymes such as β-lactamase, Jak kinases 62, eIF4e 63, MPro 64 and targets such as RSK2 and MSK1 65. These earlier campaigns had been limited to several hundred thousand electrophiles, largely from "in-stock" libraries. With the advent of the ultra-large tangible libraries, we thought to curate a larger set of electrophiles, focusing on aldehydes and acrylamides. Searching SMARTS patterns allowed us to build databases of 7.3 million aldehydes and 17.7 million acrylamides. We compared our aldehyde and acrylamide libraries to those that can be found in other in-stock or physical screening libraries, including the UCSF Small Molecule Discovery Center (SMDC) 66, Molecular Libraries Small Molecule Repository of the NIH (MLSMR) 67, and the in-stock set curated in ZINC20 51. By total numbers, the aldehyde library contains 196- to 10,000-fold more aldehydes than the other libraries, while the number of core scaffolds represented by these electrophiles is 252- to 3,600-fold larger than those sampled in the previous libraries (Table 1)

with non-covalent DOCK3.7 scores < 0 kcal/mol were further filtered for internal strain 38, stranded hydrogen bond donors and acceptors, and for modeled hydrogen bonds with either Tyr368, Ala353, or Gly333 48. Lastly, 33,156 molecules were clustered for topological similarity, and 9,591 molecules were prioritized for visual inspection in Chimera 49. From these, 92 molecules were selected for de novo synthesis. Of 61 aldehydes and 31 acrylamides, 47 and 26 were successfully synthesized, respectively, a 79% fulfilment rate (Table S1). On experimental testing, hits were defined as having at least 50% inhibition at 100 µM. For the aldehydes, four compounds were active of 51 tested (a hit rate of 8%) and had IC50 values ranging from 3.2 to 19 µM (

with Gly333 and Gly313 in its docked pose (Supplementary Figure 7).
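The fulfilment and hit rates quoted for the covalent campaign follow from simple ratios, reproduced here as a quick arithmetic check. The counts are copied from the text above; the helper name `percent` and the variable names are ours, introduced only for illustration.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Counts as stated in the text for the covalent docking campaign.
aldehydes_ordered, aldehydes_made = 61, 47
acrylamides_ordered, acrylamides_made = 31, 26

# Overall fulfilment: molecules successfully synthesized out of those ordered.
fulfilment = percent(
    aldehydes_made + acrylamides_made,
    aldehydes_ordered + acrylamides_ordered,
)

# Aldehyde hit rate: four actives among the 51 compounds reported as tested.
aldehyde_hit_rate = percent(4, 51)

print(f"fulfilment rate: {fulfilment:.1f}%")
print(f"aldehyde hit rate: {aldehyde_hit_rate:.1f}%")
```

Rounded to whole percentages, these ratios match the 79% and 8% reported.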

In early optimization of acryl42, analog acryl42_10 was 4.5-fold more potent at 7 µM with the addition of a methoxy (Figure 5C). Adding a hydroxyl in the same place in analog acryl42_11 resulted in an inactive analog, indicating the methoxy could be adding hydrophobic contacts, as opposed to additional hydrogen bonds with the protein (Table S6). We tested the importance of the free amide of the acrylamide warhead with methylation of analog acryl42_5; the analog was inactive, perhaps reflecting the loss of a modeled hydrogen bond with the mainchain of Ala353.

Figure 1). While acryl41 did not form a measurable adduct by mass spectrometry, acryl42 and its analog acryl42_10 did do so, supporting a covalent inhibition mechanism (Supplementary Figure 8). We also changed the acrylamide warhead to the saturated propanamide group in compound ZD160-68, resulting in no enzymatic inhibition, which further supported acryl42 acting through covalent inhibition (Figure 5C). Overall, acryl42 and its analog, acryl42_10, appear to be irreversible covalent inhibitors, while '1911 appears to be a reversible covalent inhibitor. We expect that acryl41 is also acting as a covalent inhibitor but note that further mechanistic study of these classes is warranted.

The covalent inhibitors were evaluated for colloidal aggregation (Supplementary Figures 3.1, 3.2, 3.3, 3.4). The 12 µM aldehyde Z5185631889 ('1889) (Figure 5) had a CAC five-fold higher than its Nsp14 IC50 and did not inhibit either counter-screening enzyme under any measured condition; if this compound aggregates, it is not relevant for its Nsp14 activity. While the 3.8

likely is an aggregator, its aggregation is unlikely to be relevant to its Nsp14 inhibition.

Compounds were tested for inhibition of the enzymes at 10 µM, then selected for IC50 determination if higher than 50% inhibition was observed. The non-covalent, 6 µM lead-like inhibitor, '1988, showed only modest selectivity, inhibiting nine enzymes more than 50% with IC50 values ranging from 4 to 26 µM (Figure 6)

This study yields some of the first Nsp14 inhibitors unrelated to SAM, either topologically or by physical properties. Overall, 23 non-covalent, lead-like inhibitors across three scaffolds were found with IC50 values less than 50 µM, providing SAR for additional optimization (Figure 2, Figure 3, Table S2, Table S3, Table S4). Additional characterization and structure-based optimization demonstrated their competitive, non-covalent mechanism of action against Nsp14 (Supplementary Figure 1, Supplementary Figure 2). The most active covalent inhibitors were the initial aldehyde docking hits, with IC50 values ranging from 3.5 to 12 µM, and the acrylamide analog acryl42_10 with an IC50 of 7 µM, all modeled to modify Cys387 of Nsp14 (Figure 5). Finding these depended on developing new tangible libraries of 25 million electrophiles; these have been made publicly available for community use (https://covalent2022.docking.org) (Table 1). Another eight families of inhibitors were revealed from docking a library of 16 million tangible fragments (Figure 4). While affinities were naturally lower than the best of the lead-like inhibitors, several fragments had mid-µM IC50 values, and the four most potent had LEs of 0.32 to 0.42 kcal/mol per heavy atom. Taken together, 19 new chemotypes were found; of these, 11 had members with IC50 values <50 µM.

SARS-CoV-2 Nsp14 inhibitors described to date are SAM analogs 33, 69, 70 or fragments with extensive water networks 71. While the SAM analogs are widely studied, they typically suffer from both low cell permeability, owing to their size and ionization state, and from low selectivity, owing to their high similarity to the shared co-factor of this large family of MTases. Conversely, the new molecules described here are smaller and mostly uncharged, and topologically unrelated to SAM (Table S7)

to draw.
Moreover, because Nsp14 is a SAM-dependent enzyme with many related human enzymes, chemical novelty was important. Thus, as may be true with many SARS-CoV-2 targets, we could not leverage knowledge from previous chemical series other than SAM analogs. The lack of chemical precedent meant that these screens had a bootstrapping element to them: a small number of successes in early campaigns enabled us to optimize subsequent ones, contributing to improved hit rates and affinities. We do note that our most informative screens, those against the 16 million tangible fragments, occurred late in the campaign. Whereas there may still be skepticism about fragment docking, our own experience, not only here but also against the SARS-CoV-2 enzyme Mac1 14, 15 and in earlier studies against β-lactamases 57, 58, 60, is that fragment docking can reveal multiple chemotypes with high ligand efficiency and fidelity to subsequently determined crystal structures. Were we to begin again, we might have started with the fragment screen, leveraging the interactions it revealed for campaigns against the larger, lead-like libraries. Such an approach may be useful against other understudied viral targets.

Certain caveats merit airing. Our most potent inhibitors are low-µM, weaker than the most potent of the SAM analogs previously characterized for Nsp14, the best of which inhibited in the 100 nM range 33,69,70. Compound '1911 needs additional characterization of its reversible covalent mechanism of inhibition, limited here by its reversibility in mass spectrometry analysis and by its low-µM activity in the rapid dilution experiments. Many of the inhibitors form colloidal aggregates, which would ordinarily be a concern for selectivity and artifactual activity. Control experiments suggest that such aggregation is not relevant for Nsp14 inhibition.
Still, it remains true that this activity must be controlled for in subsequent optimization, and is a general hazard to navigation in early discovery. Importantly, antiviral activity, cell toxicity, and cell permeability remain to be explored for these molecules. Understanding these will inform future compound advancement.

(http://sw.docking.org, http://arthor.docking.org), the latter primarily containing Enamine REAL compounds (http://enamine.net/compound-collections/real-compounds/real-space-navigator). The resulting analogs were further filtered based on Tc > 0.4 and docked to the N7-MTase domain of SARS-CoV-2 Nsp14. Compounds were also designed by modifying the 2D structure and custom synthesized by Enamine Ltd. (Kyïv, Ukraine). The docked poses were visually inspected for compatibility with the site, and prioritized analogs were synthesized and tested for each series (Table S1).

Fragment docking. The optimized docking setup from the second, SARS-CoV-2 non-covalent lead-like screen described above was used. Three different screens were run with different matching spheres 61: those in the adenine site, those in the SAM-tail site, or all matching spheres (Figure 4A), with 15,738,235 docked and 14,406,946 scored, 15,738,278 docked and 14,124,978 scored, and 16,299,173 docked and 14,908,652 scored, respectively. Each setup was analyzed separately until visualization in Chimera 49; the top 300,000 ranked poses were filtered for having total torsional strain less than 7 REU and single-torsion strain less than 2.5 REU 38, fewer than 2 stranded hydrogen bond donors, fewer than 4 stranded hydrogen bond acceptors, and greater than 1 hydrogen bond to Tyr368, Ala353, or Gly333 48. Remaining molecules were visually inspected for having favorable interactions.
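The post-docking filter for the fragment screens can be written as a handful of threshold checks, sketched below. This is an illustration, not the authors' scripts: the `Pose` record, its field names, and the toy values are hypothetical. The strain and stranded-group cutoffs are the ones stated above. The hydrogen-bond criterion is exposed as a parameter, `min_key_hbonds`, defaulting to "at least one" bond to Tyr368, Ala353, or Gly333. That default is our assumption, and a strict reading of "greater than 1" would set it to 2.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Pose:
    """Per-pose descriptors (hypothetical record; names are illustrative)."""
    name: str
    total_strain: float        # total torsional strain, REU
    max_single_strain: float   # largest single-torsion strain, REU
    stranded_donors: int       # unsatisfied hydrogen bond donors
    stranded_acceptors: int    # unsatisfied hydrogen bond acceptors
    key_hbonds: int            # hydrogen bonds to Tyr368, Ala353, or Gly333


@dataclass(frozen=True)
class Cutoffs:
    max_total_strain: float = 7.0    # exclusive upper bound, REU
    max_single_strain: float = 2.5   # exclusive upper bound, REU
    max_stranded_donors: int = 2     # exclusive upper bound
    max_stranded_acceptors: int = 4  # exclusive upper bound
    min_key_hbonds: int = 1          # inclusive; assumption, see lead-in


def passes(pose: Pose, c: Cutoffs) -> bool:
    """True if the pose satisfies every strain and hydrogen-bond criterion."""
    return (
        pose.total_strain < c.max_total_strain
        and pose.max_single_strain < c.max_single_strain
        and pose.stranded_donors < c.max_stranded_donors
        and pose.stranded_acceptors < c.max_stranded_acceptors
        and pose.key_hbonds >= c.min_key_hbonds
    )


def filter_poses(poses: Iterable[Pose], cutoffs: Cutoffs = Cutoffs()) -> List[Pose]:
    """Keep only the poses that pass all cutoffs, preserving input order."""
    return [p for p in poses if passes(p, cutoffs)]


# Toy poses with invented values.
toy_poses = [
    Pose("frag_ok", 4.2, 1.8, 0, 1, 2),
    Pose("frag_strained", 8.1, 3.0, 0, 0, 1),
    Pose("frag_no_hbond", 3.0, 1.0, 1, 2, 0),
]
kept = [p.name for p in filter_poses(toy_poses)]
```

Keeping the cutoffs in one object makes it straightforward to apply the slightly different thresholds used in the covalent screens, for example `Cutoffs(max_total_strain=6.5, max_single_strain=2.0)`.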
In total, 65 compounds were selected for purchasing, 50 from Enamine and 19 from WuXi, and overall, 53 were successfully synthesized for a fulfilment rate of 82%.

REAL databases, finding 20 million acrylamides and 6 million aldehydes. The DOCKovalent 3D files were generated as previously described 63-65. Briefly, the electrophiles were converted to their transition state product and a dummy atom was placed to indicate to the docking algorithm which atom should be modeled as covalently bound to the sulfur of the cysteine. Both 2D structures and 3D DOCKovalent files are now publicly available at http://covalent2022.docking.org. To compare to other public molecule databases, we used the ZINC20 in-stock set 51, the MLSMR library 67 and the UCSF SMDC library 66, and searched the same SMARTS patterns for acrylamides and aldehydes. The number of chemotypes was determined by  .

Covalent docking and compound optimization. The optimized docking setup from the first SARS-CoV-1 lead-like screen described above was used, with the differences being which residues were hyper-polarized 77 (Tyr368; Tyr368 and Ala353; or Tyr368, Ala353, and Gly333, referred to as 1-HP, 2-HP and 3-HP, respectively). For the acrylamide screen against 1-HP, molecules with docked scores less than 0 were selected for filtering (top 341,000); those with total internal torsional strain less than 6.5 REU and single-torsion strain less than 2 REU 38, fewer than 2 stranded hydrogen bond donors and fewer than 4 stranded hydrogen bond acceptors were prioritized. Molecules were also selected that formed at least one hydrogen bond to Tyr368, Ala353 or Gly333 using LUNA 48, leaving 2,423. After clustering for chemical similarity, 533 were visually inspected in Chimera 49.
For the 2-HP setup, molecules with scores less than 0 (top 440,661) were filtered using the same criteria, with 2,961 molecules remaining, comprising 622 clusters that were visually inspected. For the 3-HP setup, no molecules passed the strain, IFP, and hydrogen bond filters, and this setup was not considered further. Visual inspection prioritized molecules with the same criteria as above. Lastly, selected compounds from both 1-HP and 2-HP setups were clustered to select unique chemotypes, and 31 were purchased. Synthesis was successful for 26, for a fulfilment rate of 84%.

For the aldehydes in the 1-HP setup, the top 894,979 compounds (dock score less than 0) were filtered and prioritized as the acrylamides were above, with clustering for chemical similarity leaving 1,340 for visual filtering. For the 2-HP setup, the top 1,494,350 were filtered to 3,548, and for the 3-HP setup the top 1,494,345 were filtered to 3,548 for visual inspection. Compounds were prioritized for the same interactions as the acrylamides, and finally 61 aldehydes were selected. Synthesis was successful for 47 of these, for a fulfilment rate of 77% and an overall covalent fulfilment rate of 79%.

Acryl42 analogs acryl42_5, acryl42_11 and acryl42_10 were designed from the acryl42 2D chemical structure and synthesized by Enamine; ZD160-68 was designed to test the activity of

Aggregation.
Dynamic Light Scattering (DLS). Samples were prepared as 8-point half-log dilutions in filtered 50 mM KPi buffer, pH 7.0, with final DMSO concentration at 1% (v/v). Colloidal particle formation was detected using a DynaPro Plate Reader II (Wyatt Technologies). All compounds were screened in triplicate at each concentration. For compounds that formed colloid-like particles, the critical aggregation concentration (CAC) was determined by splitting the data into two sets, aggregating (i.e. >10^6 scattering intensity) and non-aggregating (i.e. <10^6 scattering intensity); each set was fitted with a separate nonlinear regression curve, and the point of intersection was determined using GraphPad Prism software version 9.1.1 (San Diego, CA).

Levo Therapeutics, Inc., Cullgen, Inc. and Cullinan Oncology, Inc. J.J. is a cofounder and equity shareholder in Cullgen, Inc. and a consultant for Cullgen, Inc., EpiCypher, Inc., and Accent Therapeutics, Inc.