RT Journal Article
SR Electronic
T1 Representation Learning of Genomic Sequence Motifs with Convolutional Neural Networks
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 362756
DO 10.1101/362756
A1 Peter K. Koo
A1 Sean R. Eddy
YR 2018
UL http://biorxiv.org/content/early/2018/07/08/362756.abstract
AB Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they work. Here we perform systematic experiments on synthetic sequences to reveal principles of how CNN architecture influences the internal representations of genomic sequence motifs that are learned. We focus our study on representations learned by first convolutional layer filters. We find that deep CNNs tend to learn distributed representations of partial sequence motifs. However, we demonstrate that the architecture of a CNN can be modified to predictively learn more interpretable localist representations, i.e. whole motifs. We then validate that the representation learning principles established from synthetic sequences generalize to in vivo sequences.