Abstract
Motivation The heterogeneous nature of cancers, many of which comprise multiple subtypes, makes them challenging to treat. Multi-omics data can help identify new therapeutic targets, and we established a computational strategy to improve the mining of such data.
Results Using our approach, we identified genes and pathways specific to cancer subtypes that can serve as biomarkers and therapeutic targets. On a TCGA breast cancer dataset, we applied ExtraTreesClassifier-based dimensionality reduction (feature selection) together with logistic regression to select a subset of genes for model training. Hyperparameter tuning increased model accuracy to as much as 92%. Finally, we identified 20 significant genes using differential expression analysis. These targetable genes are associated with various cellular processes that affect cancer progression. We then applied our approach to a glioma dataset and again identified subtype-specific targetable genes.
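The workflow summarized above (extra-trees gene selection, logistic regression, hyperparameter tuning, then differential expression) can be sketched with scikit-learn and SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the use of `SelectFromModel`, the importance thresholds, the `C` grid, 5-fold `GridSearchCV`, and the choice of a per-gene Welch t-test with Benjamini-Hochberg correction for the differential expression step are all our assumptions, and the helper names `build_subtype_classifier` and `differential_expression` are hypothetical. Synthetic data stands in for the TCGA expression matrix.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_subtype_classifier(random_state=0):
    """Hypothetical helper: extra-trees gene selection -> scaling -> logistic regression.

    Estimator count, thresholds, and the C grid are illustrative assumptions.
    """
    pipe = Pipeline([
        # Keep genes whose extra-trees impurity importance passes the threshold.
        ("select", SelectFromModel(
            ExtraTreesClassifier(n_estimators=100, random_state=random_state))),
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=5000)),
    ])
    grid = {
        "select__threshold": ["mean", "median"],
        "clf__C": [0.01, 0.1, 1.0, 10.0],
    }
    # Hyperparameter tuning: 5-fold cross-validated grid search scored by accuracy.
    return GridSearchCV(pipe, grid, cv=5, scoring="accuracy")


def differential_expression(X, y, group, alpha=0.05):
    """Hypothetical helper: per-gene Welch t-test (one subtype vs. the rest), BH FDR."""
    in_group = y == group
    _, p = stats.ttest_ind(X[in_group], X[~in_group], axis=0, equal_var=False)
    m = p.size
    order = np.argsort(p)
    # Benjamini-Hochberg: scale sorted p-values by m/rank, then enforce monotonicity.
    ranked = p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return np.flatnonzero(q < alpha), q


# Synthetic stand-in for an expression matrix (samples x genes) with 3 subtypes.
rng = np.random.default_rng(0)
n_samples, n_genes = 150, 300
y = rng.integers(0, 3, size=n_samples)
X = rng.normal(size=(n_samples, n_genes))
X[:, :10] += 1.5 * y[:, None]  # only genes 0-9 depend on subtype

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
search = build_subtype_classifier().fit(X_train, y_train)
acc = search.score(X_test, y_test)
sig_genes, q_values = differential_expression(X, y, group=0)
print(search.best_params_, f"test accuracy = {acc:.2f}")
print("DE genes for subtype 0:", sig_genes)
```

Placing the feature selector inside the pipeline means gene selection is refit within each cross-validation fold, so the tuned accuracy is not inflated by selecting genes on data later used for validation. On real data, the synthetic block would be replaced by a samples-by-genes expression matrix and subtype labels.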
Conclusion Our results indicate that this strategy is broadly applicable for identifying specific cancer subtypes and targetable pathways across various cancers.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Oregon Health & Science University, Portland, OR 97239, USA