Abstract
We propose a two-step approach for predicting active cis-regulatory modules (CRMs) in a cell/tissue type. We first predict a map of CRM loci in the genome using all available transcription factor (TF) binding data in the organism, and then predict the functional states of all the putative CRMs in any cell/tissue type using a few epigenetic marks. We recently developed a pipeline, dePCRM2, for the first step; here we present machine-learning methods for the second step. Our approach substantially outperforms existing methods. Our results suggest that common epigenetic rules define the functional states of CRMs across various cell/tissue types in humans and mice.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
We updated the Results section and rewrote most of the manuscript.
Abbreviations
- AUROC
- area under the receiver operating characteristic curve
- ATAC
- assay for transposase-accessible chromatin
- ATAC-seq
- assay for transposase-accessible chromatin using sequencing
- CA
- chromatin accessibility
- ChIP-seq
- chromatin immunoprecipitation sequencing
- CRM
- cis-regulatory module
- DNase-seq
- DNase I hypersensitive sites sequencing
- ESC
- embryonic stem cells
- mESC
- mouse ESC
- FDRs
- false discovery rates
- LR
- logistic regression
- mCG
- cytosine methylation in CpG dinucleotides
- MPRA
- massively parallel reporter assays
- ROC
- receiver operating characteristic curve
- SVM
- support vector machine
- TF
- transcription factor
- TFBS
- TF binding site
- STARR-seq
- self-transcribing active regulatory region sequencing
- UFSPs
- universal functional states predictors
- WHG-STARR-seq
- whole-genome STARR-seq