PT - JOURNAL ARTICLE
AU - Michael L. Chen
AU - Akshith Doddi
AU - Jimmy Royer
AU - Luca Freschi
AU - Marco Schito
AU - Matthew Ezewudo
AU - Isaac S. Kohane
AU - Andrew Beam
AU - Maha Farhat
TI - Deep learning predicts tuberculosis drug resistance status from genome sequencing data
AID - 10.1101/275628
DP - 2018 Jan 01
TA - bioRxiv
PG - 275628
4099 - http://biorxiv.org/content/early/2018/06/05/275628.short
4100 - http://biorxiv.org/content/early/2018/06/05/275628.full
AB - Background: The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. Whole genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic antimicrobial susceptibility testing, but gaps remain for predicting phenotype accurately from genotypic data. Methods and Findings: Using targeted or whole genome sequencing and conventional drug resistance phenotyping data from 3,601 Mycobacterium tuberculosis strains, 1,228 of which were multidrug resistant, we investigated the use of machine learning to predict phenotypic drug resistance to 10 anti-tuberculosis drugs. The final model, a multitask wide and deep neural network (MD-WDNN), achieved high predictive performance: the average AUCs were 0.979 for first-line drugs and 0.936 for second-line drugs during repeated cross-validation. On an independent validation set, the MD-WDNN showed average AUCs, sensitivities, and specificities, respectively, of 0.937, 87.9%, and 92.7% for first-line drugs and 0.891, 82.0%, and 90.1% for second-line drugs. In addition to being able to learn from samples that have only been partially phenotyped, our proposed multidrug architecture shares information across different anti-tuberculosis drugs and genes to provide a more accurate phenotypic prediction.
We use t-distributed Stochastic Neighbor Embedding (t-SNE) visualization and feature importance analyses to examine inter-drug similarities. Conclusions: Machine learning is capable of accurately predicting resistance status using genomic information and holds promise in bringing sequencing technologies closer to the bedside.
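
The abstract notes that the multitask model can learn from isolates phenotyped for only some of the drugs. One common way to achieve this is to leave untested drug labels out of the loss. The sketch below illustrates that idea in NumPy; it is not the authors' implementation. The function name `masked_bce`, the use of NaN to mark an untested drug, and the toy values are all assumptions made for illustration.

```python
import numpy as np


def masked_bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over observed labels only (hypothetical helper).

    y_true: (n_isolates, n_drugs) array; 1 = resistant, 0 = susceptible,
            NaN = drug not phenotyped for that isolate (assumed encoding).
    y_prob: predicted resistance probabilities, same shape.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)

    observed = ~np.isnan(y_true)            # True where a phenotype exists
    y = np.where(observed, y_true, 0.0)     # placeholder value for missing labels

    per_label = -(y * np.log(y_prob) + (1.0 - y) * np.log(1.0 - y_prob))

    n_observed = observed.sum()
    if n_observed == 0:
        return 0.0
    # Missing labels are multiplied by zero, so they contribute nothing.
    return float((per_label * observed).sum() / n_observed)


# Toy example: two isolates, two drugs; isolate 0 was not tested for drug 1.
labels = np.array([[1.0, np.nan],
                   [0.0, 1.0]])
probs = np.array([[0.9, 0.2],
                  [0.1, 0.8]])
loss = masked_bce(labels, probs)
```

Because the unobserved entry is masked out, changing the prediction for the untested drug leaves the loss unchanged, so a partially phenotyped isolate still contributes through the drugs for which it was tested.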