bioRxiv
New Results

Identifying common transcriptome signatures of cancer by interpreting deep learning models

Anupama Jha, Mathieu Quesnel-Vallières, Andrei Thomas-Tikhonenko, Kristen W. Lynch, Yoseph Barash
doi: https://doi.org/10.1101/2021.11.11.467790
Anupama Jha
1 Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
For correspondence: anupamaj@seas.upenn.edu, mathieu.quesnel-vallieres@pennmedicine.upenn.edu, yosephb@pennmedicine.upenn.edu
Mathieu Quesnel-Vallières
2 Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA
3 Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
Andrei Thomas-Tikhonenko
4 Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
5 Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
6 Division of Cancer Pathobiology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
Kristen W. Lynch
2 Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA
3 Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
Yoseph Barash
1 Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
3 Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA

Abstract

Cancer is a set of diseases characterized by unchecked cell proliferation and invasion of surrounding tissues. The many genes that have been genetically associated with cancer or shown to contribute directly to oncogenesis vary widely between tumor types, but common gene signatures related to core cancer pathways have also been identified, indicating that cancer cases display common hallmark molecular features. It is not clear, however, whether there exist additional sets of genes or transcriptomic features that are less well known in cancer biology but are also commonly deregulated across several cancer types. Here, to agnostically identify transcriptomic features shared between cancer types, we used RNA-Seq datasets encompassing thousands of samples from 19 healthy tissue types and 18 solid tumor types to train three feed-forward neural networks, based on protein-coding gene expression, lncRNA expression, or splice junction use, respectively, to distinguish between healthy and tumor samples. All three models achieve high precision, recall and accuracy on test sets derived from 13 datasets used during training and on an independent test dataset, indicating that our models recognize transcriptome signatures that are consistent across tumors. Analysis of attribution values extracted from our models reveals that genes commonly altered in cancer by expression or splicing variations are under strong evolutionary and selective constraints, suggesting that they have important cellular functions. Importantly, we found that genes composing our cancer transcriptome signatures are not frequently affected by mutations or genomic alterations and that their functions differ widely from those of the genes genetically associated with cancer. Finally, our results highlight that deregulation of RNA-processing genes and aberrant splicing are pervasive features across a large array of solid tumor types.
The transcriptomic features that we highlight here define cancer signatures that may reflect causal variations, consequences of the disease state, or a combination of both.
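The general pattern described above, training a feed-forward network to separate healthy from tumor samples and then reading per-feature attribution values from it, can be illustrated with a small sketch. This is not the authors' code: the one-hidden-layer ReLU architecture, full-batch gradient descent, the gradient × input attribution method, and the helper names (`train_mlp`, `predict_proba`, `gradient_x_input`) are all assumptions made for illustration, and the "expression" matrix is synthetic random data in which only feature 0 determines the label.

```python
import numpy as np


def sigmoid(z):
    # Clip logits to avoid overflow in exp for very confident predictions.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def train_mlp(X, y, hidden=16, lr=0.5, epochs=1000, seed=0):
    """Hypothetical one-hidden-layer ReLU classifier trained by full-batch
    gradient descent on mean binary cross-entropy (label 1 = "tumor")."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        pre = X @ W1 + b1                 # hidden pre-activations, shape (n, hidden)
        h = np.maximum(pre, 0.0)          # ReLU
        p = sigmoid(h @ w2 + b2)          # predicted P(tumor), shape (n,)
        g_logit = (p - y) / n             # dLoss/dlogit for mean BCE
        g_w2 = h.T @ g_logit
        g_b2 = g_logit.sum()
        g_pre = np.outer(g_logit, w2) * (pre > 0)  # backprop through ReLU
        g_W1 = X.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return W1, b1, w2, b2


def predict_proba(params, X):
    W1, b1, w2, b2 = params
    return sigmoid(np.maximum(X @ W1 + b1, 0.0) @ w2 + b2)


def gradient_x_input(params, X):
    """Per-sample, per-feature attribution: input times d(logit)/d(input)."""
    W1, b1, w2, _ = params
    active = (X @ W1 + b1) > 0            # which ReLU units are on, (n, hidden)
    grad = (active * w2) @ W1.T           # d(logit)/dX, shape (n, d)
    return X * grad


# Synthetic example: 400 samples x 5 features; only feature 0 drives the label.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = (X[:, 0] > 0).astype(float)
params = train_mlp(X, y)
acc = np.mean((predict_proba(params, X) > 0.5) == y)
importance = np.abs(gradient_x_input(params, X)).mean(axis=0)
top_feature = int(np.argmax(importance))
print(f"train accuracy={acc:.2f}, most important feature={top_feature}")
```

Averaging absolute attributions over samples gives one score per feature, so the feature that actually drives the synthetic label should rank first. In a real analysis the columns would be genes, lncRNAs, or splice junctions, and a more robust attribution method (for example integrated gradients) might be preferred over plain gradient × input.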

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted November 12, 2021.

Identifying common transcriptome signatures of cancer by interpreting deep learning models
Anupama Jha, Mathieu Quesnel-Vallières, Andrei Thomas-Tikhonenko, Kristen W. Lynch, Yoseph Barash
bioRxiv 2021.11.11.467790; doi: https://doi.org/10.1101/2021.11.11.467790
Subject Area

  • Cancer Biology