bioRxiv
New Results

Deep learning of representations for transcriptomics-based phenotype prediction

Aaron M. Smith, Jonathan R. Walsh, John Long, Craig B. Davis, Peter Henstock, Martin R. Hodge, Mateusz Maciejewski, Xinmeng Jasmine Mu, Stephen Ra, Shanrong Zhao, Daniel Ziemek, Charles K. Fisher
doi: https://doi.org/10.1101/574723
Aaron M. Smith
1 Unlearn.AI, Inc., San Francisco, CA, USA.
Jonathan R. Walsh
1 Unlearn.AI, Inc., San Francisco, CA, USA.
John Long
2 Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.
Craig B. Davis
3 Oncology Global Product Development, Pfizer Inc., San Diego, CA, USA.
Peter Henstock
4 Business Technology, Pfizer Inc., Cambridge, MA, USA.
Martin R. Hodge
5 Inflammation and Immunology, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.
Mateusz Maciejewski
5 Inflammation and Immunology, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.
Xinmeng Jasmine Mu
6 Oncology Research & Development, Worldwide Research & Development, Pfizer Inc., San Diego, CA, USA.
Stephen Ra
2 Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.
Shanrong Zhao
2 Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.
Daniel Ziemek
7 Inflammation and Immunology, Worldwide Research & Development, Pfizer Pharma GmbH, Berlin, Germany.
Charles K. Fisher
1 Unlearn.AI, Inc., San Francisco, CA, USA.

Abstract

The ability to predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. This task is complicated because expression data are high-dimensional whereas each experiment is usually small (e.g., ∼20,000 genes may be measured for ∼100 subjects). However, thousands of transcriptomics experiments with hundreds of thousands of samples are available in public repositories. Can representation learning techniques leverage these public data to improve predictive performance on other tasks? Here, we report a comprehensive analysis using different gene sets, normalization schemes, and machine learning methods on a set of 24 binary and multiclass prediction problems and 26 survival analysis tasks. Methods that combine large numbers of genes outperformed single-gene methods, but neither unsupervised nor semi-supervised representation learning techniques yielded consistent improvements in out-of-sample performance across datasets. Our findings suggest that l2-regularized regression methods applied to centered log-ratio transformed transcript abundances provide the best predictive analyses.
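
To make the recommended combination concrete, the following is a minimal sketch of a centered log-ratio (CLR) transform followed by an l2-regularized logistic regression. It is not the authors' pipeline: the pseudocount, the plain gradient-descent fit, the hyperparameter values, the function names, and the six-sample toy abundance table are all assumptions made for illustration, and the code is written in dependency-free Python rather than with any particular library.

```python
import math


def clr_transform(abundances, pseudocount=1.0):
    """Centered log-ratio (CLR) transform of one sample's transcript abundances.

    The pseudocount is an assumption of this sketch; it keeps log() defined
    when an abundance is zero.
    """
    logs = [math.log(a + pseudocount) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logistic(X, y, l2=0.1, lr=0.1, epochs=500):
    """Fit logistic regression with an l2 penalty by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus the l2 penalty gradient (intercept unpenalized).
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return 1 where the linear score is non-negative, else 0."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


# Toy abundance table invented for illustration: 6 samples x 4 genes.
counts = [
    [100, 10, 10, 10],
    [120, 12, 9, 11],
    [90, 8, 11, 10],
    [10, 10, 100, 10],
    [12, 9, 110, 11],
    [9, 11, 95, 10],
]
labels = [1, 1, 1, 0, 0, 0]

X = [clr_transform(row) for row in counts]
w, b = fit_l2_logistic(X, labels)
predictions = predict(X, w, b)
```

Each CLR-transformed sample sums to zero, so the features describe relative rather than absolute abundance, and the l2 penalty keeps the fit well-posed despite the resulting collinearity. With ∼20,000 genes and ∼100 subjects, the penalty strength would normally be chosen by cross-validation rather than fixed as it is here.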

Footnotes

  • ∗ drams{at}unlearn.ai

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted March 15, 2019.

Subject Area

  • Bioinformatics