Abstract
Drug side effects are a leading cause of morbidity and mortality. Currently, the frequency of drug side effects is determined experimentally during human clinical trials through placebo-controlled studies. Here we present a novel framework to computationally predict the frequency of drug side effects. Our algorithm is based on learning a latent variable model for drugs and side effects by matrix decomposition. Extensive evaluations on held-out test sets show that the frequency class is predicted with 67.8% to 94% accuracy in the neighborhood of the correct class. Evaluations on prospective data confirm the commonly held hypothesis that most post-marketing side effects are very rare in the population, with occurrences of less than 1 in 10,000. Importantly, our model provides explanations of the biology underlying drug side effect relationships. We show that the drug latent representations in our model are related to distinct anatomical drug activities and that the similarity between these representations is predictive of drug clinical activity as well as drug targets.
One-sentence summary: A novel explainable machine learning algorithm predicts the frequency of drug side effects in the population.