Abstract
Recently, there has been growing interest in the development of pharmacological interventions targeting ageing, and in the use of machine learning for analysing ageing-related data. In this work we use machine learning methods to analyse data from DrugAge, a database of chemical compounds (including drugs) that modulate lifespan in model organisms. To this end, we created four datasets for predicting whether a compound extends the lifespan of C. elegans (the most frequent model organism in DrugAge). Each dataset uses a different type of predictive biological feature: compound-protein interactions, interactions between compounds and proteins encoded by ageing-related genes, and two types of terms annotated for the proteins targeted by the compounds, namely Gene Ontology (GO) terms and physiology terms from WormBase's Phenotype Ontology. To analyse these datasets we combined feature selection methods, applied in a data pre-processing phase, with the well-established random forest algorithm, which learned predictive models from the selected features. The two best models were learned using GO terms and protein interactors as features, with predictive accuracies of about 82% and 80%, respectively. In addition, we interpreted the most important features in those two best models in light of the biology of ageing, and we predicted the most promising novel compounds for extending lifespan from a list of previously unlabelled compounds.
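As a rough illustration of how a dataset of this kind could be assembled, the sketch below turns compound-protein interactions into binary features and applies a simple frequency filter. It is a minimal sketch under stated assumptions, not the authors' pipeline: the toy compounds and proteins, the function `build_binary_dataset`, and the `min_compounds` threshold are all hypothetical, and the frequency filter merely stands in for the feature selection methods, which the abstract does not specify. A random forest implementation would then be trained on the resulting rows together with lifespan-extension labels.

```python
from collections import Counter

# Hypothetical toy data: compound -> set of protein interactors.
# Names are illustrative only and are not taken from DrugAge.
interactions = {
    "compound_A": {"P1", "P2"},
    "compound_B": {"P2", "P3"},
    "compound_C": {"P2"},
    "compound_D": {"P1", "P4"},
}


def build_binary_dataset(interactions, min_compounds=2):
    """Return (features, rows), with one 0/1 feature vector per compound.

    A protein is kept as a feature only if at least `min_compounds`
    compounds interact with it. This frequency filter is an assumed,
    simplified stand-in for a feature selection step.
    """
    # Count how many compounds interact with each protein.
    counts = Counter(p for proteins in interactions.values() for p in proteins)
    # Keep sufficiently frequent proteins, in a stable sorted order.
    features = sorted(p for p, n in counts.items() if n >= min_compounds)
    # Encode each compound as a binary vector over the kept features.
    rows = {
        compound: [1 if f in proteins else 0 for f in features]
        for compound, proteins in interactions.items()
    }
    return features, rows


features, rows = build_binary_dataset(interactions)
```

Filtering out rare features before learning keeps the feature matrix small, which matters when the number of candidate proteins or annotation terms far exceeds the number of labelled compounds.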
Competing Interest Statement
Joao Pedro de Magalhaes is an advisor/consultant for the Longevity Vision Fund, NOVOS, Insilico Medicine and YouthBio Therapeutics, and is the founder of Magellan Science Ltd, a company providing consulting services in longevity science. The other authors declare no conflict of interest.