Abstract
We present Scaden, a deep neural network for cell deconvolution that uses gene expression information to infer the cellular composition of tissues. Scaden is trained on single-cell RNA-seq data to engineer discriminative features that confer robustness to bias and noise, making complex data preprocessing and feature selection unnecessary. We demonstrate that Scaden outperforms existing deconvolution algorithms in both precision and robustness, across tissues and species. A single trained network reliably deconvolves bulk RNA-seq and microarray expression data from both human and mouse tissues. Due to this stability and flexibility, we surmise that deep learning-based cell deconvolution will become a mainstay across data types and algorithmic approaches. Scaden’s software package is easy to apply to novel data as well as to the diverse expression datasets already available in public resources, which should help deepen the molecular and cellular understanding of developmental and disease processes.
Footnotes
Updated data and code access. Minor changes to the text. Inclusion of unknown cell type analysis.
List of abbreviations
- RNA-seq: Next Generation RNA Sequencing
- GEP: gene expression profile matrix
- SVR: Support Vector Regression
- DNN: Deep Neural Network
- scRNA-seq: single-cell RNA-seq
- simulated tissue: training data generated by mixing proportions of scRNA-seq data
- PBMC: peripheral blood mononuclear cells
- CCC: concordance correlation coefficient
- r: Pearson’s correlation coefficient
- CS: CIBERSORT
- CSx: CIBERSORTx
- CPM: Cell Population Mapping
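Two of the terms above, simulated tissue and CCC, are easier to follow with a small example. The following is a minimal sketch and not Scaden's actual implementation: the function names `simulate_tissue` and `ccc`, the Dirichlet sampling of fractions, and the default of 500 cells per sample are assumptions made for illustration. It mixes scRNA-seq cells into one artificial bulk sample with known cell type fractions, and computes the concordance correlation coefficient in its standard form (Lin's definition).

```python
import numpy as np

def simulate_tissue(counts, labels, n_cells=500, rng=None):
    """Mix randomly chosen cells into one artificial bulk sample.

    counts: (cells, genes) array; labels: (cells,) array of cell type names.
    Returns the summed expression vector and the true cell type fractions,
    ordered like np.unique(labels).
    """
    rng = np.random.default_rng() if rng is None else rng
    cell_types = np.unique(labels)
    # Assumption: target fractions are drawn from a flat Dirichlet distribution.
    fractions = rng.dirichlet(np.ones(len(cell_types)))
    n_per_type = np.round(fractions * n_cells).astype(int)
    picked = []
    for cell_type, n in zip(cell_types, n_per_type):
        pool = np.flatnonzero(labels == cell_type)
        # Sample with replacement so rare cell types can still fill their quota.
        picked.append(rng.choice(pool, size=n, replace=True))
    picked = np.concatenate(picked)
    bulk = counts[picked].sum(axis=0)
    # Report the fractions actually realised after rounding.
    true_fractions = n_per_type / n_per_type.sum()
    return bulk, true_fractions

def ccc(x, y):
    """Concordance correlation coefficient between two vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)
```

Unlike Pearson's r, CCC penalises systematic offsets: predicted fractions that are perfectly correlated with the true ones but shifted by a constant still give r = 1, while CCC falls below 1.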