New Results

A data mining paradigm for identifying key factors in biological processes using gene expression data

Jin Li, Le Zheng, Akihiko Uchiyama, Lianghua Bin, Theodora M. Mauro, Peter M. Elias, Tadeusz Pawelczyk, Monika Sakowicz-Burkiewicz, Magdalena Trzeciak, Donald Y. M. Leung, Maria I. Morasso, Peng Yu
doi: https://doi.org/10.1101/327478
Jin Li
1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
2 TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
Le Zheng
1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
Akihiko Uchiyama
3 Laboratory of Skin Biology, National Institute for Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
Lianghua Bin
4 Department of Pediatrics, National Jewish Health, Denver, Colorado, USA
Theodora M. Mauro
5 Dermatology Service, Veterans Affairs Medical Center, and Department of Dermatology, UCSF, San Francisco, California, USA
Peter M. Elias
5 Dermatology Service, Veterans Affairs Medical Center, and Department of Dermatology, UCSF, San Francisco, California, USA
Tadeusz Pawelczyk
6 Department of Molecular Medicine, Medical University of Gdansk, Gdansk, Poland
Monika Sakowicz-Burkiewicz
6 Department of Molecular Medicine, Medical University of Gdansk, Gdansk, Poland
Magdalena Trzeciak
7 Department of Dermatology, Venerology and Allergology, Medical University of Gdansk, Gdansk, Poland
Donald Y. M. Leung
4 Department of Pediatrics, National Jewish Health, Denver, Colorado, USA
Maria I. Morasso
3 Laboratory of Skin Biology, National Institute for Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
Peng Yu
1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
2 TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA

Abstract

A large volume of biological data is being generated to study the mechanisms of various biological processes. These data enable large-scale computational analyses that yield biological insights. However, mining the data efficiently for knowledge discovery remains a challenge: the heterogeneity of these data makes it difficult to integrate them consistently, which slows biological discovery. We introduce a data processing paradigm to identify key factors in biological processes via systematic collection of gene expression datasets, primary analysis of the data, and evaluation of consistent signals. To demonstrate its effectiveness, we applied the paradigm to epidermal development and identified many genes that potentially play a role in this process. Besides the known epidermal development genes, a substantial proportion of the identified genes are not yet supported by gain- or loss-of-function studies, yielding many novel candidate genes for future studies. Among them, we selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation, supporting the ability of this paradigm to identify new factors in biological processes. In addition, the paradigm revealed many key genes in cold-induced thermogenesis using data from cold-challenged tissues, demonstrating its generalizability. This paradigm can lead to fruitful results for studying molecular mechanisms in an era of explosive accumulation of publicly available biological data.
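The "evaluation of consistent signals" step can be illustrated with a minimal sketch. The abstract does not specify how consistency is scored, so the vote-counting scheme below is an assumption and not the authors' method. The function name `rank_consistent_genes`, the `min_fraction` threshold, and the +1/-1 encoding of differential-expression calls are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_consistent_genes(datasets, min_fraction=0.5):
    """Rank genes by how consistently they change across datasets.

    Assumed input: one dict per dataset mapping gene name to +1
    (up-regulated) or -1 (down-regulated), as produced by some primary
    differential-expression analysis. Genes absent from a dataset
    contribute no vote for that dataset.
    """
    n = len(datasets)
    if n == 0:
        return []

    # Net vote per gene: up-calls minus down-calls across all datasets.
    net = defaultdict(int)
    for calls in datasets:
        for gene, direction in calls.items():
            if direction not in (1, -1):
                raise ValueError(f"direction for {gene!r} must be +1 or -1")
            net[gene] += direction

    ranked = []
    for gene, score in net.items():
        support = abs(score)
        # Drop genes with no net direction or with too little net support.
        if support == 0 or support / n < min_fraction:
            continue
        ranked.append((gene, "up" if score > 0 else "down", support))

    # Strongest net support first; ties broken alphabetically by gene name.
    ranked.sort(key=lambda r: (-r[2], r[0]))
    return ranked


# Toy input with made-up gene labels (not real results).
example = [
    {"A": 1, "B": -1, "C": 1},
    {"A": 1, "B": 1},
    {"A": 1, "C": 1},
]
print(rank_consistent_genes(example))  # [('A', 'up', 3), ('C', 'up', 2)]
```

In this toy input, gene B is called up in one dataset and down in another, so its net vote is zero and it is filtered out. A real analysis would likely also weight datasets by quality or effect size, which this sketch omits.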

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted May 23, 2018.

Subject Area

  • Bioinformatics