TY  - JOUR
T1  - Learning from mistakes: Accurate prediction of cell type-specific transcription factor binding
JF  - bioRxiv
DO  - 10.1101/230011
SP  - 230011
AU  - Jens Keilwagen
AU  - Stefan Posch
AU  - Jan Grau
Y1  - 2018/01/01
UR  - http://biorxiv.org/content/early/2018/06/12/230011.abstract
N2  - Computational prediction of cell type-specific, in-vivo transcription factor binding sites is still one of the central challenges in regulatory genomics, and a variety of approaches has been proposed for this purpose. Here, we present our approach that earned a shared first rank in the “ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge” in 2017. This approach employs features derived from chromatin accessibility, binding motifs, gene expression, genomic sequence and annotation to train classifiers using a supervised, discriminative learning principle. Two further key aspects of this approach are learning classifier parameters in an iterative training procedure that successively adds additional negative examples to the training set, and creating an ensemble prediction by averaging over classifiers obtained for different training cell types. In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance for in-vivo binding site predictions. We also show that the iterative training procedure and the ensemble prediction are pivotal for the final prediction performance. To make predictions of this approach readily accessible, we predict 682 peak lists for a total of 31 transcription factors in 22 primary cell types and tissues, which are available for download at https://www.synapse.org/#!Synapse:syn11526239, and we demonstrate that these may help to yield biological conclusions. Finally, we provide a user-friendly version of our approach as open source software at http://jstacs.de/index.php/Catchitt. Contact: grau{at}informatik.uni-halle.de
ER  - 