bioRxiv
New Results

recount-brain: a curated repository of human brain RNA-seq datasets metadata

Ashkaun Razmara, Shannon E. Ellis, Dustin J. Sokolowski, Sean Davis, Michael D. Wilson, Jeffrey T. Leek, Andrew E. Jaffe, Leonardo Collado-Torres
doi: https://doi.org/10.1101/618025
Ashkaun Razmara
1Frank H. Netter MD School of Medicine at Quinnipiac University, North Haven, CT 06473, USA
2Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
Shannon E. Ellis
3Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
4Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
5Department of Cognitive Science, University of California San Diego, La Jolla, CA, 92161, USA
Dustin J. Sokolowski
6Genetics and Genome Biology, Hospital for Sick Children, Toronto M5G 0A4, Canada
7Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada
Sean Davis
8Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA
Michael D. Wilson
6Genetics and Genome Biology, Hospital for Sick Children, Toronto M5G 0A4, Canada
7Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada
9Heart and Stroke Richard Lewar Centre of Excellence in Cardiovascular Research, Toronto M5S 3H2, Canada
Jeffrey T. Leek
3Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
4Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
Andrew E. Jaffe
3Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
4Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
10Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA
11Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
Leonardo Collado-Torres
4Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
10Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA
  • For correspondence: leo.collado@libd.org

Abstract

The usability of publicly available gene expression data is often limited by the availability of high-quality, standardized biological phenotype and experimental condition information (“metadata”). We released the recount2 project, which involved re-processing ∼70,000 samples from the Sequence Read Archive (SRA), Genotype-Tissue Expression (GTEx), and The Cancer Genome Atlas (TCGA) projects. While samples from the latter two projects are well characterized with extensive metadata, the ∼50,000 RNA sequencing (RNA-seq) samples from SRA in recount2 are inconsistently annotated. Tissue type, sex, and library type can be estimated from the RNA-seq data itself. However, more detailed and harder-to-predict metadata, such as age and diagnosis, should ideally be provided by the labs that deposit the data.

To facilitate more analyses of human brain tissue data, we have complemented phenotype predictions by manually constructing a uniformly curated database of public RNA-seq samples present in SRA and recount2. We describe the reproducible curation process used to construct recount-brain, which involves a systematic review of each study's primary manuscript and can serve as a guide for annotating other studies and tissues. We further expanded recount-brain by merging it with GTEx and TCGA brain samples and by linking to controlled vocabulary terms for tissue, Brodmann area, and disease. Furthermore, we illustrate how to integrate the sample metadata in recount-brain with the gene expression data in recount2 to perform differential expression analysis. We then provide three analysis examples involving modeling postmortem interval, glioblastoma, and meta-analyses across GTEx and TCGA. Overall, recount-brain facilitates expression analyses and improves their reproducibility because individual researchers do not have to manually curate the sample metadata. recount-brain is available via the add_metadata() function from the recount Bioconductor package at bioconductor.org/packages/recount.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted April 24, 2019.

Subject Area

  • Genomics