RT Journal Article
SR Electronic
T1 Representation Learning of Genomic Sequence Motifs with Convolutional Neural Networks
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 362756
DO 10.1101/362756
A1 Peter K. Koo
A1 Sean R. Eddy
YR 2018
UL http://biorxiv.org/content/early/2018/07/08/362756.abstract
AB Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they work. Here we perform systematic experiments on synthetic sequences to reveal principles of how CNN architecture influences the internal representations of genomic sequence motifs that are learned. We focus our study on representations learned by first convolutional layer filters. We find that deep CNNs tend to learn distributed representations of partial sequence motifs. However, we demonstrate that the architecture of a CNN can be modified to predictively learn more interpretable localist representations, i.e. whole motifs. We then validate that the representation learning principles established from synthetic sequences generalize to in vivo sequences.