bioRxiv
New Results

MOCCS profile analysis clarifies the cell type dependency of transcription factor-binding sequences and cis-regulatory SNPs in humans

Saeko Tahara, Takaho Tsuchiya, Hirotaka Matsumoto, Haruka Ozaki
doi: https://doi.org/10.1101/2022.04.08.487641
Saeko Tahara
1Bioinformatics Laboratory, Faculty of Medicine, University of Tsukuba, Tsukuba 1-1-1, Tennodai, Tsukuba, Ibaraki 305-8577, Japan
2School of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
Takaho Tsuchiya
1Bioinformatics Laboratory, Faculty of Medicine, University of Tsukuba, Tsukuba 1-1-1, Tennodai, Tsukuba, Ibaraki 305-8577, Japan
3Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
Hirotaka Matsumoto
4School of Information and Data Sciences, Nagasaki University, 1-14, Bunkyo-machi, Nagasaki City, Nagasaki, 852-8521, Japan
5Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics, Wako, Saitama, 351-0198, Japan
Haruka Ozaki
1Bioinformatics Laboratory, Faculty of Medicine, University of Tsukuba, Tsukuba 1-1-1, Tennodai, Tsukuba, Ibaraki 305-8577, Japan
3Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
5Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics, Wako, Saitama, 351-0198, Japan
  • For correspondence: haruka.ozaki@md.tsukuba.ac.jp

Abstract

Transcription factors (TFs) show heterogeneous DNA-binding specificities in individual cells and whole organisms under natural conditions: de novo motif discovery usually yields multiple motifs even from a single ChIP-seq sample. Despite the accumulation of ChIP-seq data and ChIP-seq-derived motifs, the diversity of DNA-binding specificities across different TFs and cell types remains largely unexplored. Here, we propose MOCCS profiles, a new representation of the DNA-binding specificity of TFs, which describes a ChIP-seq sample as a profile of TF-binding specificity scores (MOCCS2scores) for every k-mer sequence. Using our k-mer-based motif discovery method MOCCS2, we systematically computed MOCCS profiles for >10,000 human TF ChIP-seq samples across diverse TFs and cell types. Comparison of MOCCS profiles revealed the global distributions of DNA-binding specificities and showed that one-third of the analyzed TFs differed in DNA-binding specificity across cell types. Moreover, we showed that the differences in MOCCS2scores (ΔMOCCS2scores) predicted the effect of variants on TF binding, as validated with in vitro and in vivo assay datasets. We also demonstrate that ΔMOCCS2scores can be used to interpret non-coding GWAS-SNPs as TF-affecting SNPs and to provide their candidate responsible TFs and cell types. Our study provides the basis for investigating gene expression regulation and non-coding disease-associated variants in humans.
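
As a rough illustration of how a per-k-mer score profile could be turned into a variant-effect score, the sketch below sums the scores of all k-mers overlapping a SNP for the reference and alternative alleles and takes the difference. This is only a minimal sketch under stated assumptions: the abstract does not specify how ΔMOCCS2scores are aggregated, so summing over overlapping k-mers, the helper names `overlapping_kmers` and `delta_score`, and the toy profile values are all hypothetical and are not taken from MOCCS2 itself.

```python
from typing import Dict, List


def overlapping_kmers(seq: str, pos: int, k: int) -> List[str]:
    """Return every k-mer of `seq` that covers the 0-based index `pos`."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first_start, last_start + 1)]


def delta_score(ref_seq: str, pos: int, alt_base: str,
                profile: Dict[str, float], k: int) -> float:
    """Hypothetical variant-effect score: alt total minus ref total.

    ASSUMPTION: scores of the k-mers overlapping the variant are summed;
    k-mers absent from the profile contribute 0.0.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_total = sum(profile.get(km, 0.0)
                    for km in overlapping_kmers(ref_seq, pos, k))
    alt_total = sum(profile.get(km, 0.0)
                    for km in overlapping_kmers(alt_seq, pos, k))
    return alt_total - ref_total


# Toy profile with made-up scores for two 3-mers (not real MOCCS2 output).
toy_profile = {"CGT": 2.0, "ACT": 0.5}

# G>T substitution at index 2 of a short reference sequence.
effect = delta_score("ACGTAC", pos=2, alt_base="T", profile=toy_profile, k=3)
print(effect)
```

Under this convention, a negative value means the alternative allele removes k-mers that score highly in the profile, which would be read as a predicted loss of TF binding in that ChIP-seq sample; a positive value would suggest a gain.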

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • # Email addresses for all authors: Saeko Tahara, tahara.saeko.ss{at}alumini.tsukuba.ac.jp; Takaho Tsuchiya, takaho.tsuchiya{at}md.tsukuba.ac.jp; Hirotaka Matsumoto, hirotaka.matsumoto{at}nagasaki-u.ac.jp; Haruka Ozaki, haruka.ozaki{at}md.tsukuba.ac.jp

  • Abbreviations

    ASB
    Allele-specific binding
    AUROC
    Area Under Receiver Operating Characteristic Curve
    ChIP-seq
    Chromatin immunoprecipitation sequencing
    GWAS
    Genome-wide association study
    PBS
    Preferential binding score
    PWM
    Position weight matrix
    SNP
    Single-nucleotide polymorphism
    TF
    Transcription factor
    TFBS
    Transcription factor binding site
    MOCCS
    Motif centrality analysis of ChIP-seq
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted June 16, 2022.

    Subject Area

    • Bioinformatics