New Results

Exploring different sequence representations and classification methods for the prediction of nucleosome positioning

Nikos Kostagiolas, Nikiforos Pittaras, Christoforos Nikolaou, George Giannakopoulos
doi: https://doi.org/10.1101/482612
Nikos Kostagiolas
1Information Management Systems Institute, “Athena” Research & Innovation Center, Athens, 15125, Greece
  • For correspondence: kostagio@imis.athena-innovation.gr
Nikiforos Pittaras
2Institute of Informatics and Telecommunications, NCSR “Demokritos”, Athens, 15341, Greece
  • For correspondence: pittarasnikif@iit.demokritos.gr ggianna@iit.demokritos.gr
Christoforos Nikolaou
3Department of Biology, University of Crete, Heraklion, 70013, Greece
  • For correspondence: nikolaou@uoc.gr
George Giannakopoulos
2Institute of Informatics and Telecommunications, NCSR “Demokritos”, Athens, 15341, Greece
  • For correspondence: pittarasnikif@iit.demokritos.gr ggianna@iit.demokritos.gr

Abstract

Motivation: Nucleosomes form the first level of DNA compaction and thus play a critical role in overall genome organization. They also modulate chromatin accessibility and, through a dynamic equilibrium with other DNA-binding proteins, may shape gene expression. Large-scale nucleosome positioning maps, obtained for various genomes, have underscored the importance of nucleosomes in the regulation of gene expression and have shown that constraints on the relative positions of nucleosomes are much stronger around regulatory elements (i.e. promoters, splice junctions and enhancers). The great majority of nucleosome positions, by contrast, appear to be rather flexible. Various computational methods have been used to capture the sequence determinants of nucleosome positioning, but this has proved difficult because the extent to which DNA sequence preferences guide nucleosome occupancy varies widely. To focus on highly specific sequence attributes, in this work we analyzed two well-defined sets of nucleosome-occupied sites (NOS) and nucleosome-free regions (NFR) from the genome of S. cerevisiae, using textual representations.

Results: We employed three genomic sequence representations (Hidden Markov Models, Bag-of-Words and N-gram Graphs), combined with a number of machine learning algorithms, on the task of classifying genomic sequences as nucleosome-free (NFR) or nucleosome-occupied (NOS) [to be further amended based on updated results]. We found that the choice of representation and algorithm affects how effectively nucleosome positioning can be predicted from the textual data of the underlying genomic sequence. More interestingly, we show that N-gram Graphs, a sequence representation that takes into account both k-mer occurrences and their relative positioning at various length scales, outperform the other methodologies and may thus be a preferred choice for the analysis of DNA sequences with subtle constraints.
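
To make the contrast between the representations concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a Bag-of-Words view built from overlapping k-mer counts and a much-simplified n-gram graph whose edge weights count how often one n-gram is followed by another within a small window. The function names (kmer_counts, ngram_graph) and the parameters k, n and window are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Bag-of-Words view: count every overlapping k-mer, ignoring position."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ngram_graph(seq, n=3, window=2):
    """Simplified n-gram graph (illustrative assumption, not the paper's code).

    Nodes are n-grams; the weight of edge (a, b) counts how often n-gram b
    starts within `window` positions after n-gram a.
    """
    seq = seq.upper()
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    edges = Counter()
    for i, a in enumerate(grams):
        for b in grams[i + 1:i + 1 + window]:
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    # Same 1-mer content, different ordering: the k-mer counts match,
    # while the graph edges differ.
    print(kmer_counts("ACGT", k=1) == kmer_counts("TGCA", k=1))
    print(ngram_graph("ACGT", n=1, window=1) == ngram_graph("TGCA", n=1, window=1))
```

In this sketch, two sequences with identical k-mer content are indistinguishable to the Bag-of-Words view but yield different graphs, which is one way to read the abstract's remark that N-gram Graphs capture relative positioning in addition to k-mer occurrences.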

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted December 03, 2018.