Abstract
Background Deep learning is beginning to be applied to analyses of DNA methylation data, where it can distill non-linear relationships among high-dimensional data features. However, a generalized and user-friendly approach for executing, training, and interpreting deep learning models for methylation data is lacking.
Results We introduce MethylNet and demonstrate its robust performance on downstream tasks of DNA methylation analysis, including cell-type deconvolution, pan-cancer classification, and subject age prediction. We interrogate the features learned in the pan-cancer classification to show high-fidelity clustering of cancer subtypes, and we compare the importance assigned to CpGs in the age and cell-type analyses to demonstrate concordance with expected biology.
Conclusions Our findings demonstrate the high accuracy of end-to-end deep learning methods on methylation prediction tasks. Together, our results highlight the promise of applying transfer learning, hyperparameter optimization, and feature interpretation to DNA methylation data in future work.
Abbreviations
- 450K: HumanMethylation450
- 850K: HumanMethylationEPIC
- ANN: Artificial Neural Networks
- CpG: Cytosine-Guanine Dinucleotides
- CWL: Common Workflow Language
- DNAm: DNA Methylation
- EWAS: Epigenome-Wide Association Studies
- SHAP: SHapley Additive exPlanations
- SVM: Support Vector Machine
- UMAP: Uniform Manifold Approximation and Projection
- VAE: Variational Auto-encoders