RT Journal Article
SR Electronic
T1 Global Importance Analysis: A Method to Quantify Importance of Genomic Features in Deep Neural Networks
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.09.08.288068
DO 10.1101/2020.09.08.288068
A1 Koo, Peter K.
A1 Ploenzke, Matthew
A1 Anand, Praveen
A1 Paul, Steffan B.
A1 Majdandzic, Antonio
YR 2020
UL http://biorxiv.org/content/early/2020/09/09/2020.09.08.288068.abstract
AB Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. For model interpretability, attribution methods have been employed to reveal learned patterns that resemble sequence motifs. First-order attribution methods only quantify the independent importance of single nucleotide variants in a given sequence; they do not provide the effect size of motifs (or their interactions with other patterns) on model predictions. Here we introduce global importance analysis (GIA), a new model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses of putative patterns and their interactions with other patterns, as well as map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a new convolutional network, which we call ResidualBind, and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC-bias. Competing Interest Statement: The authors have declared no competing interest.