PT  - JOURNAL ARTICLE
AU  - Gherman Novakovsky
AU  - Oriol Fornes
AU  - Manu Saraswat
AU  - Sara Mostafavi
AU  - Wyeth W. Wasserman
TI  - ExplaiNN: interpretable and transparent neural networks for genomics
AID  - 10.1101/2022.05.20.492818
DP  - 2022 Jan 01
TA  - bioRxiv
PG  - 2022.05.20.492818
4099  - http://biorxiv.org/content/early/2022/05/22/2022.05.20.492818.short
4100  - http://biorxiv.org/content/early/2022/05/22/2022.05.20.492818.full
AB  - Sequence-based deep learning models, particularly convolutional neural networks (CNNs), have shown superior performance on a wide range of genomic tasks. A key limitation of these models is the lack of interpretability, slowing their broad adoption by the genomics community. Current approaches to model interpretation do not readily reveal how a model makes predictions, can be computationally intensive, and depend on the implemented architecture. Here, we introduce ExplaiNN, an adaptation of neural additive models for genomic tasks wherein predictions are computed as a linear combination of multiple independent CNNs, each consisting of a single convolutional filter and fully connected layers. This approach brings together the expressivity of CNNs with the interpretability of linear models, providing global (cell state level) as well as local (individual sequence level) insights into the biological processes studied. We use ExplaiNN to predict transcription factor (TF) binding and chromatin accessibility states, demonstrating performance levels comparable to state-of-the-art methods, while providing a transparent view of the model’s predictions in a straightforward manner. Applied to de novo motif discovery, ExplaiNN detects motifs equivalent to those obtained from specialized algorithms across a range of datasets. Finally, we present ExplaiNN as a plug-and-play platform in which pre-trained TF binding models and annotated position weight matrices from reference databases can be combined in a simple framework. We expect that ExplaiNN will accelerate the adoption of deep learning by biological domain experts in their daily genomic sequence analyses. Competing Interest Statement: The authors have declared no competing interest.