TY  - JOUR
T1  - Hubble2D6: A deep learning approach for predicting drug metabolic activity
JF  - bioRxiv
DO  - 10.1101/684357
SP  - 684357
AU  - Gregory McInnes
AU  - Rachel Dalton
AU  - Katrin Sangkuhl
AU  - Michelle Whirl-Carrillo
AU  - Seung-been Lee
AU  - Russ B. Altman
AU  - Erica L. Woodahl
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/06/27/684357.abstract
N2  - A major limitation of phenotype prediction in genetics is the ability to model the complexities of genetic variation when sample sizes are small. This is especially true in pharmacogenetics, a highly translational yet data-limited subfield of genetics. Drug metabolism is a critical facet of pharmacogenetics and can have consequences for drug safety and efficacy. CYP2D6 is an important enzyme, metabolizing more than 25% of clinically used drugs. It is highly polymorphic which leads to a heterogeneous response to drugs among the population. We present Hubble2D6, a set of deep learning models for predicting metabolic activity of CYP2D6 genotype and predicting functional classification of CYP2D6 haplotypes. We train our models on 249 samples, addressing data scarcity by pretraining on simulated data, weakly supervised learning, and using a functional representation of genetic variants. We validate our models using in vitro data for haplotypes previously unseen by the model and explain 38% of the variance with the genotype-based activity predictor and predict haplotype function with an AUC of 0.85. We demonstrate a procedure to build a computational model of a complex gene using primarily simulated and unlabeled data which can then be used to make functional predictions about novel genetic variation, and present a model that may be of clinical significance for an important application of genetics.
ER  - 