bioRxiv

A cross-organism framework for supervised enhancer prediction with epigenetic pattern recognition and targeted validation

Anurag Sethi, Mengting Gu, Emrah Gumusgoz, Landon Chan, Koon-Kiu Yan, Joel Rozowsky, Iros Barozzi, Veena Afzal, Jennifer Akiyama, Ingrid Plajzer-Frick, Chengfei Yan, Catherine Pickle, Momoe Kato, Tyler Garvin, Quan Pham, Anne Harrington, Brandon Mannion, Elizabeth Lee, Yoko Fukuda-Yuzawa, Axel Visel, Diane E. Dickel, Kevin Yip, Richard Sutton, Len A. Pennacchio, Mark Gerstein
doi: https://doi.org/10.1101/385237
Anurag Sethi
1Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
2Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
Mengting Gu
1Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
Emrah Gumusgoz
6Department of Internal Medicine, Section of Infectious Diseases, Yale University School of Medicine, New Haven, Connecticut, United States of America
Landon Chan
3School of Medicine, The Chinese University of Hong Kong, China
Koon-Kiu Yan
1Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
2Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
Joel Rozowsky
1Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
2Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
Iros Barozzi
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Veena Afzal
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Jennifer Akiyama
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Ingrid Plajzer-Frick
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Chengfei Yan
Catherine Pickle
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Momoe Kato
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Tyler Garvin
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Quan Pham
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Anne Harrington
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Brandon Mannion
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Elizabeth Lee
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Yoko Fukuda-Yuzawa
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Axel Visel
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Diane E. Dickel
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Kevin Yip
4Department of Computer Science, The Chinese University of Hong Kong, China
Richard Sutton
6Department of Internal Medicine, Section of Infectious Diseases, Yale University School of Medicine, New Haven, Connecticut, United States of America
Len A. Pennacchio
7Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Mark Gerstein
1Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
2Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
5Department of Computer Science, Yale University, New Haven, Connecticut, United States of America

Abstract

Enhancers are important noncoding elements, but they have traditionally been hard to characterize experimentally. Only a few mammalian enhancers have been validated, which makes it difficult to properly train statistical models to identify them. Instead, postulated patterns of genomic features have been used heuristically for identification. Massively parallel assays now allow large numbers of enhancers to be characterized for the first time. Here, we developed a framework that uses Drosophila STARR-seq data to create shape-matching filters based on enhancer-associated meta-profiles of epigenetic features. We combined these features with supervised machine learning algorithms (e.g., support vector machines) to predict enhancers. We demonstrated that our model can be applied to predict enhancers in mammalian species (i.e., mouse and human). We comprehensively validated the predictions using a combination of in vivo and in vitro approaches: transgenic assays in mouse and transduction-based reporter assays in human cell lines. Overall, the validations involved 153 enhancers in 6 mouse tissues and 4 human cell lines. The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription-factor binding patterns at predicted enhancers and promoters in human cell lines. We demonstrated that these patterns enable the construction of a secondary model that effectively discriminates between enhancers and promoters.
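The abstract describes shape-matching filters built from enhancer-associated meta-profiles of epigenetic signal. As a rough illustration only, the sketch below slides a fixed template (standing in for a meta-profile) along a signal track and scores each window by Pearson correlation, reporting the best-matching offset. The correlation-based score, the function names (`pearson`, `shape_match_scores`, `best_match`), and the toy `TEMPLATE` values are hypothetical assumptions for illustration, not the authors' implementation.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# Hypothetical meta-profile: a double-peaked histone-mark shape flanking a
# central dip (illustrative values only, not taken from the paper).
TEMPLATE: List[float] = [1.0, 3.0, 1.0, 0.5, 1.0, 3.0, 1.0]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length windows; 0.0 if either is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / sqrt(vx * vy)


def shape_match_scores(signal: Sequence[float], template: Sequence[float]) -> List[float]:
    """Slide the template along the signal and score every full-length window."""
    w = len(template)
    if w == 0 or len(signal) < w:
        raise ValueError("signal must be at least as long as a non-empty template")
    return [pearson(signal[i:i + w], template) for i in range(len(signal) - w + 1)]


def best_match(signal: Sequence[float], template: Sequence[float]) -> Tuple[int, float]:
    """Return (offset, score) of the best-matching window (first one on ties)."""
    scores = shape_match_scores(signal, template)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Toy signal track: positions 2..8 hold a 2x-scaled copy of TEMPLATE.
    track = [0.2, 0.1, 2.0, 6.0, 2.0, 1.0, 2.0, 6.0, 2.0, 0.3]
    offset, score = best_match(track, TEMPLATE)
    print(offset, round(score, 6))  # -> 2 1.0
```

Because Pearson correlation ignores overall scale and baseline, a window that is a scaled copy of the template scores 1.0 regardless of signal strength, so the score captures shape rather than magnitude. In a pipeline like the one described, per-mark scores of this kind could be stacked into a feature vector for a supervised classifier such as an SVM; the exact filter design and features used in the paper may differ.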

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted August 05, 2018.
Subject Area

  • Bioinformatics