Abstract
Decoding transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF class/family labels into the same space. By training on binding data for hundreds of TFs and embedding over 1M DNA sequences, BindSpace achieves state-of-the-art multiclass binding prediction performance, in vitro and in vivo, and can distinguish signals of closely related TFs.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.