Abstract
Background Cell line perturbation data can serve as a reference for inferring the molecular processes underlying new gene expression profiles. Accurate and computationally efficient algorithms are needed to exploit the biological knowledge in the growing compendium of existing perturbation data and to harness it for new predictions.
Results We reframed the problem of inferring the likely gene perturbation from a reference perturbation database as a classification task and evaluated deep neural network models on it. A fully connected multi-layer neural network achieved up to 74.9% accuracy on a holdout test set, but the model's generalizability was limited by the consistency between training and testing data.
Conclusion The capacity and flexibility of neural network models enable them to efficiently represent transcriptomic features associated with single-gene knockdown perturbations. When signals are consistent between training and testing sets, neural networks may be trained to classify new samples into experimentally confirmed molecular phenotypes.
Competing Interest Statement
The authors have declared no competing interest.
List of Abbreviations
- CMap: Connectivity Map
- DNN: Deep Neural Network
- shRNA: short hairpin RNA
- CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
- CGS: Consensus Gene Signatures
- ES: Enrichment Score
- WTCS: Weighted Connectivity Score
- ELU: Exponential Linear Unit