TY - JOUR
T1 - DeepMicro: deep representation learning for disease prediction based on microbiome data
JF - bioRxiv
DO - 10.1101/785626
SP - 785626
AU - Min Oh
AU - Liqing Zhang
Y1 - 2019/01/01
UR - http://biorxiv.org/content/early/2019/10/18/785626.abstract
N2 - Human microbiota plays a key role in human health, and growing evidence supports the potential use of the microbiome as a predictor of various diseases. However, the high dimensionality of microbiome data, often on the order of hundreds of thousands, combined with low sample sizes, poses a great challenge for machine learning-based prediction algorithms. This imbalance makes the data highly sparse, which hinders learning a good prediction model. Also, there has been little work on deep learning applications to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles. DeepMicro successfully transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and applies machine learning classification algorithms to the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile in five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates model training and hyperparameter optimization, with an 8X-30X speedup over the basic approach.
DeepMicro is freely available at https://github.com/minoh0201/DeepMicro. Abbreviations: IBD, inflammatory bowel disease; EW-T2D, type 2 diabetes in European women; C-T2D, type 2 diabetes in Chinese; Obesity, obesity; Cirrhosis, liver cirrhosis; Colorectal, colorectal cancer; SAE, shallow autoencoder; DAE, deep autoencoder; VAE, variational autoencoder; CAE, convolutional autoencoder; ReLu, rectified linear unit; KL, Kullback-Leibler; SVM, support vector machine; RF, random forest; MLP, multi-layer perceptron; RBF, radial basis function; AUC, area under the receiver operating characteristics curve; PCA, Principal Component Analysis; RP, Gaussian Random Projection.
ER -