Abstract
The human microbiota plays a key role in human health, and growing evidence supports the potential use of the microbiome as a predictor of various diseases. However, the high dimensionality of microbiome data, often on the order of hundreds of thousands of features, combined with small sample sizes, poses a great challenge for machine learning-based prediction algorithms. This imbalance makes the data highly sparse, which hinders learning an accurate prediction model. In addition, there has been little work applying deep learning to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework that learns effective representations of microbiome profiles. DeepMicro transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and then applies machine learning classification algorithms to the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile in five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates model training and hyperparameter optimization, with an 8x to 30x speedup over the basic approach. DeepMicro is freely available at https://github.com/minoh0201/DeepMicro.
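The two-stage idea described above (compress the profile with an autoencoder, then classify on the learned representation) can be illustrated with a minimal sketch. This is not the DeepMicro implementation: it assumes a synthetic presence/absence matrix in place of real marker profiles, a hand-written NumPy shallow autoencoder, and a nearest-centroid rule as a simple stand-in for the SVM, RF, and MLP classifiers. All names (`train_shallow_autoencoder`, `encode`, `latent_dim`) and hyperparameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a binary marker profile (NOT real microbiome data):
# 80 samples x 200 markers; each class has its own block of frequent markers.
n_samples, n_features, latent_dim = 80, 200, 8
y = np.repeat([0, 1], n_samples // 2)
rates = np.full((2, n_features), 0.05)
rates[0, :40] = 0.5
rates[1, 40:80] = 0.5
X = (rng.random((n_samples, n_features)) < rates[y]).astype(float)

# Deterministic split: every 4th sample is held out for testing.
test = np.arange(n_samples)[::4]
train = np.setdiff1d(np.arange(n_samples), test)


def train_shallow_autoencoder(X, latent_dim, epochs=500, lr=0.005, seed=0):
    """Fit a one-hidden-layer autoencoder (ReLU encoder, linear decoder)
    with full-batch gradient descent on the squared reconstruction error."""
    init = np.random.default_rng(seed)
    n, d = X.shape
    W1 = init.normal(0.0, 0.05, (d, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = init.normal(0.0, 0.05, (latent_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then reconstruct.
        pre = X @ W1 + b1
        H = np.maximum(pre, 0.0)
        err = (H @ W2 + b2) - X
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Backward pass for loss = (1 / 2n) * sum of squared errors.
        g_out = err / n
        g_pre = (g_out @ W2.T) * (pre > 0)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return (W1, b1), losses


def encode(X, encoder):
    """Map samples to the low-dimensional representation."""
    W1, b1 = encoder
    return np.maximum(X @ W1 + b1, 0.0)


# Stage 1: learn the representation on the training samples only.
encoder, losses = train_shallow_autoencoder(X[train], latent_dim)
Z_train, Z_test = encode(X[train], encoder), encode(X[test], encoder)

# Stage 2: classify in the latent space (nearest centroid as a stand-in).
centroids = np.stack([Z_train[y[train] == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
accuracy = float((dists.argmin(axis=1) == y[test]).mean())

print("latent shape:", Z_train.shape)
print("reconstruction loss: %.3f -> %.3f" % (losses[0], losses[-1]))
print("held-out accuracy: %.2f" % accuracy)
```

Fitting the autoencoder on the training samples alone keeps the held-out samples out of representation learning, in keeping with a rigorous evaluation scheme. The classifier then works on 8 features instead of 200, which illustrates why training and hyperparameter search become cheaper after dimensionality reduction.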
Footnotes
List of abbreviations
- IBD: inflammatory bowel disease
- EW-T2D: type 2 diabetes in European women
- C-T2D: type 2 diabetes in Chinese
- Obesity: obesity
- Cirrhosis: liver cirrhosis
- Colorectal: colorectal cancer
- SAE: shallow autoencoder
- DAE: deep autoencoder
- VAE: variational autoencoder
- CAE: convolutional autoencoder
- ReLU: rectified linear unit
- KL: Kullback-Leibler
- SVM: support vector machine
- RF: random forest
- MLP: multi-layer perceptron
- RBF: radial basis function
- AUC: area under the receiver operating characteristic curve
- PCA: principal component analysis
- RP: Gaussian random projection