Abstract
Predicting the safety of a drug from preclinical data is a major challenge in drug discovery, and progressing an unsafe compound into the clinic puts patients at risk and wastes resources. Methods and analytic decisions known to yield poor predictions are nevertheless common in drug safety pharmacology and related fields. They include creating arbitrary thresholds, binning continuous values, giving all assays equal weight, and reusing the same information multiple times. In addition, the metrics used to evaluate models often omit important criteria, and assessments of how models perform on new data are often insufficient. Prediction models with these problems are unlikely to perform well, and published models suffer from many of these issues. We describe these problems in detail, in many cases demonstrate their negative consequences, and propose simple solutions that are standard in other disciplines where predictive modelling is used.
Competing Interest Statement
The authors have declared no competing interest.