Abstract
Motivation Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the limited amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific.
Results We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that use transcription factor sequence preferences in the form of position weight matrices, predicting binding for 34 transcription factors (accuracy > 0.99; Matthews correlation coefficient > 0.3). In at least one validation cell type, Virtual ChIP-seq outperforms every participant in the DREAM Challenge for in vivo transcription factor binding site prediction for 4 of the 9 transcription factors that we could compare.
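The two thresholds reported above measure different things when bound regions are rare. As a minimal sketch, the following computes accuracy and the Matthews correlation coefficient from a confusion matrix. The counts are hypothetical, chosen only to illustrate class imbalance, and are not taken from the Virtual ChIP-seq results; the function names are likewise illustrative.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all genomic bins that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical example: 1,000,000 genomic bins, of which 2,000 are bound.
predictor = dict(tp=600, fp=1000, tn=997000, fn=1400)

# A trivial classifier that never predicts binding on the same bins.
always_unbound = dict(tp=0, fp=0, tn=998000, fn=2000)

for name, counts in [("predictor", predictor), ("always unbound", always_unbound)]:
    print(f"{name}: accuracy={accuracy(**counts):.4f}, MCC={mcc(**counts):.4f}")
```

Under these assumed counts, both classifiers exceed 0.99 accuracy because unbound bins dominate, but only the first has a Matthews correlation coefficient above 0.3. This is one reason to report both metrics together.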
Availability The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), and the predictions for 34 transcription factors on Roadmap cell types (http://doi.org/10.5281/zenodo.1066932).
Footnotes
5 Lead contact: michael.hoffman@utoronto.ca