RT Journal Article
SR Electronic
T1 Overview of the SAMPL6 pKa Challenge: Evaluating small molecule microscopic and macroscopic pKa predictions
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.10.15.341792
DO 10.1101/2020.10.15.341792
A1 Mehtap Işık
A1 Ariën S. Rustenburg
A1 Andrea Rizzi
A1 M. R. Gunner
A1 David L. Mobley
A1 John D. Chodera
YR 2020
UL http://biorxiv.org/content/early/2020/10/15/2020.10.15.341792.abstract
AB The prediction of acid dissociation constants (pKa) is a prerequisite for predicting many other properties of a small molecule, such as its protein-ligand binding affinity, distribution coefficient (log D), membrane permeability, and solubility. The prediction of each of these properties requires knowledge of the relevant protonation states and solution free energy penalties of each state. The SAMPL6 pKa Challenge was the first time that a separate challenge was conducted for evaluating pKa predictions as part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) exercises. This challenge was motivated by significant inaccuracies observed in prior physical property prediction challenges, such as the SAMPL5 log D Challenge, caused by protonation state and pKa prediction issues. The goal of the pKa challenge was to assess the performance of contemporary pKa prediction methods for drug-like molecules. The challenge set was composed of 24 small molecules that resembled fragments of kinase inhibitors, a number of which were multiprotic. Eleven research groups contributed blind predictions for a total of 37 distinct pKa prediction methods. In addition to blinded submissions, four widely used pKa prediction methods were included in the analysis as reference methods. Collecting both microscopic and macroscopic pKa predictions allowed in-depth evaluation of pKa prediction performance.
This article highlights deficiencies of typical pKa prediction evaluation approaches when the distinction between microscopic and macroscopic pKas is ignored; in particular, we suggest more stringent evaluation criteria for microscopic and macroscopic pKa predictions guided by the available experimental data. Top-performing submissions for macroscopic pKa predictions achieved RMSE of 0.7-1.0 pKa units and included both quantum chemical and empirical approaches, where the total number of extra or missing macroscopic pKas predicted by these submissions was fewer than 8 for 24 molecules. A large number of submissions had RMSE spanning 1-3 pKa units. Molecules with sulfur-containing heterocycles or iodo and bromo groups were less accurately predicted on average considering all methods evaluated. For a subset of molecules, we utilized experimentally determined microstates based on NMR to evaluate the dominant tautomer predictions for each macroscopic state. Prediction of dominant tautomers was a major source of error for microscopic pKa predictions, especially errors in charged tautomers. The degree of inaccuracy in pKa predictions observed in this challenge is detrimental to protein-ligand binding affinity predictions due to errors in dominant protonation state predictions and the calculation of free energy corrections for multiple protonation states. Underestimation of ligand pKa by 1 unit can lead to errors in binding free energy of up to 1.2 kcal/mol. The SAMPL6 pKa Challenge demonstrated the need for improving pKa prediction methods for drug-like molecules, especially for challenging moieties and multiprotic molecules.
Competing Interest Statement: JDC was a member of the Scientific Advisory Board for Schrodinger, LLC during part of this study, and is a current Scientific Advisory Board member for OpenEye Scientific and scientific advisor to Foresite Labs.
DLM is a current member of the Scientific Advisory Board of OpenEye Scientific and an Open Science Fellow with Silicon Therapeutics. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Vir Biotechnology, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, XtalPi, the Molecular Sciences Software Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, The Einstein Foundation, and the Sloan Kettering Institute. A complete list of funding can be found at http://choderalab.org/funding.
Abbreviations
SAMPL: Statistical Assessment of the Modeling of Proteins and Ligands
pKa: -log10 of the acid dissociation equilibrium constant
log P: log10 of the organic solvent-water partition coefficient (Kow) of neutral species
log D: log10 of the organic solvent-water distribution coefficient (Dow)
SEM: Standard error of the mean
RMSE: Root mean squared error
MAE: Mean absolute error
τ: Kendall’s rank correlation coefficient (Tau)
R²: Coefficient of determination (R-Squared)
MPSC: Multiple protonation states correction for binding free energy
DL: Database Lookup
LFER: Linear Free Energy Relationship
QSPR: Quantitative Structure-Property Relationship
ML: Machine Learning
QM: Quantum Mechanics
LEC: Linear Empirical Correction