bioRxiv
New Results

StereoGene: Rapid Estimation of Genomewide Correlation of Continuous or Interval Feature Data

Elena D. Stavrovskaya, Tejasvi Niranjan, Elana J. Fertig, Sarah J. Wheelan, Alexander Favorov, Andrey Mironov
doi: https://doi.org/10.1101/059584
Elena D. Stavrovskaya
1Dept. of Bioengineering and Bioinformatics, Moscow State University, Moscow, 119992, Russia
2Institute for Information Transmission Problems, RAS, Moscow, 127994, Russia
Tejasvi Niranjan
3Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Elana J. Fertig
3Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Sarah J. Wheelan
3Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Alexander Favorov
3Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
4Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, RAS, Moscow, 119333, Russia
5Laboratory of Bioinformatics, Research Institute of Genetics and Selection of Industrial Microorganisms, Moscow, 117545, Russia
Andrey Mironov
1Dept. of Bioengineering and Bioinformatics, Moscow State University, Moscow, 119992, Russia
2Institute for Information Transmission Problems, RAS, Moscow, 127994, Russia

Abstract

Motivation Genomic features with similar genomewide distributions are generally hypothesized to be functionally related; for example, co-localization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms that compute spatial, genomewide correlation among genomic features are required.

Results Here, we propose a method, StereoGene, that rapidly estimates genomewide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene correlates continuous data directly, avoiding data binarization and the resulting loss of information. Correlations are computed among neighboring genomic positions using kernel correlation. Because the correlation is represented as a function of genome position, StereoGene outputs a local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types that are consistent with known biology, and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples of the broad applicability of StereoGene for regulatory genomics.
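
To make the two statistical ideas named above concrete, the following is a minimal Python sketch of a kernel-smoothed correlation between two binned tracks and of the standard first-order partial correlation formula. It is an illustration under stated assumptions, not StereoGene's own algorithm or code (which is distributed as C++): the function names, the Gaussian kernel, the bin-based `sigma` parameter and the synthetic tracks are all hypothetical choices made for this example.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Discrete Gaussian kernel (sigma in bins), normalized to sum to 1.
    The Gaussian shape is an assumption made for this sketch."""
    if radius is None:
        radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def pearson(a, b):
    """Plain Pearson correlation of two equal-length 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def kernel_correlation(f, g, sigma):
    """Correlate track f with a kernel-smoothed copy of track g, so that
    signal at neighboring positions contributes, not only exact overlaps.
    Assumes the tracks are longer than the kernel."""
    g_smooth = np.convolve(g, gaussian_kernel(sigma), mode="same")
    return pearson(f, g_smooth)

def partial_correlation(r_fg, r_fc, r_gc):
    """First-order partial correlation of f and g given a confounder c,
    computed from the three pairwise correlations."""
    return (r_fg - r_fc * r_gc) / np.sqrt((1 - r_fc**2) * (1 - r_gc**2))

# Hypothetical example: narrow peaks in f, the same peaks shifted by 3 bins in g.
f = np.zeros(2000)
f[np.arange(50, 1950, 50)] = 1.0
g = np.roll(f, 3)

plain = pearson(f, g)                     # position-by-position comparison
kernel = kernel_correlation(f, g, sigma=5)  # tolerant of the small shift
```

In this toy case the peaks never coincide exactly, so the position-by-position correlation sees no association, while the kernel-smoothed version registers the nearby signal. The same pairwise correlations computed against a control track (for example, input DNA) could be passed to `partial_correlation` to discount a shared confounder.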

Availability The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/

Contact favorov{at}sensi.org

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted May 25, 2017.
Subject Area

  • Bioinformatics