Abstract
A common challenge in drug design is finding chemical modifications to a ligand that increase its affinity for the target protein. An underutilised advance is the increase in structural biology throughput, which has progressed from an artisanal endeavour to a monthly throughput of up to 100 different ligands against a protein at modern synchrotrons. However, the missing piece is a framework that turns high-throughput crystallography data into predictive models for ligand design. Here we designed a simple machine learning approach that predicts protein-ligand affinity from experimental structures of diverse ligands bound to a single protein, paired with biochemical measurements. Our key insight is to represent protein-ligand complexes with physics-based energy descriptors and to use a learning-to-rank approach that infers the relevant differences between binding modes. We ran a high-throughput crystallography campaign against the SARS-CoV-2 Main Protease (MPro), obtaining parallel measurements of over 200 protein-ligand complexes and their binding activity. This allowed us to design one-step library syntheses that improved the potency of two distinct micromolar hits by over 10-fold, arriving at a non-covalent and non-peptidomimetic inhibitor with 120 nM antiviral efficacy. Crucially, our approach successfully extends ligands to unexplored regions of the binding pocket, executing large and fruitful moves in chemical space with simple chemistry.
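To illustrate the learning-to-rank idea named above, the following is a minimal sketch of pairwise ranking on per-complex descriptor vectors. It is not the model used in this work: the linear scoring function, the logistic pairwise loss, the function names (`make_pairs`, `fit_pairwise_ranker`, `score`), and the synthetic random "descriptors" are all assumptions made for illustration. In a real setting each row of `X` would hold physics-based energy descriptors computed from a crystal structure, and `y` would hold the measured activity.

```python
import numpy as np

def make_pairs(X, y):
    """Return descriptor differences and labels for every pair of ligands
    with distinct activity (label 1.0 if the first ligand is more active)."""
    diffs, labels = [], []
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue  # ties carry no ranking information
            diffs.append(X[i] - X[j])
            labels.append(1.0 if y[i] > y[j] else 0.0)
    return np.array(diffs), np.array(labels)

def fit_pairwise_ranker(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a linear scoring vector w by gradient descent on a logistic
    pairwise loss (an assumed, illustrative choice of ranking model)."""
    D, t = make_pairs(X, y)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(D @ w)))       # P(first ligand ranks higher)
        grad = D.T @ (p - t) / len(t) + l2 * w   # loss gradient plus L2 penalty
        w -= lr * grad
    return w

def score(X, w):
    """Higher score means the complex is predicted to rank as more active."""
    return X @ w

# Synthetic stand-in data: 40 hypothetical complexes, 5 hypothetical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + 0.1 * rng.normal(size=40)

w = fit_pairwise_ranker(X, y)
ranking = np.argsort(-score(X, w))  # indices from predicted most to least active
```

Training on pairwise differences means the model only has to learn which of two complexes is more active, not the absolute activity of either, which is why a ranking objective suits the task of inferring relevant differences between binding modes.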
Competing Interest Statement
KLS is a consultant for Transition Bio Ltd. JDC is a current member of the Scientific Advisory Board of OpenEye Scientific Software, Redesign Science, and Interline Therapeutics, and has equity interests in Redesign Science and Interline Therapeutics. AAL has equity interests in PostEra.