PT - JOURNAL ARTICLE
AU - Sasse, Alexander
AU - Chikina, Maria
AU - Mostafavi, Sara
TI - Quick and effective approximation of in silico saturation mutagenesis experiments with first-order Taylor expansion
AID - 10.1101/2023.11.10.566588
DP - 2023 Jan 01
TA - bioRxiv
PG - 2023.11.10.566588
4099 - http://biorxiv.org/content/early/2023/11/14/2023.11.10.566588.short
4100 - http://biorxiv.org/content/early/2023/11/14/2023.11.10.566588.full
AB - To understand the decision process of genomic sequence-to-function models, various explainable AI algorithms have been proposed. These methods determine the importance of each nucleotide in a given input sequence to the model’s predictions, and enable discovery of cis-regulatory motif grammar for gene regulation. The most commonly applied method is in silico saturation mutagenesis (ISM) because its per-nucleotide importance scores can be intuitively understood as the computational counterpart to in vivo saturation mutagenesis experiments. While ISM is highly interpretable, it is computationally challenging to perform, because it requires computing three forward passes for every nucleotide in the given input sequence; these computations add up when analyzing a large number of sequences, and become prohibitive as the length of the input sequences and the size of the model grow. Here, we show how to use the first-order Taylor approximation for ISM, which reduces its computational cost to a single forward pass for an input sequence, placing its scalability on equal footing with gradient-based approximation methods such as “gradient-times-input”. We show that the Taylor ISM (TISM) approximation is robust across different model ablations, random initializations, training parameters, and data set sizes.
We use our theoretical derivation to connect ISM with the gradient values and show how this approximation is related to a recently suggested correction of the model’s gradients. Competing Interest Statement: The authors have declared no competing interest.
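
The relationship between ISM and its first-order Taylor approximation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the toy model (a tanh of a position-weighted sum over a one-hot sequence), its hand-written gradient, the weights `W`, and the function names `ism` and `tism` are all hypothetical. The sketch relies only on the standard first-order expansion f(s') - f(s) ≈ ∇f(s) · (s' - s), where s' differs from s by swapping the reference base at one position for an alternative base.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an L x 4 nested list of 0.0/1.0 values."""
    return [[1.0 if b == base else 0.0 for b in BASES] for base in seq]


def score(x, w):
    """Weighted sum of the one-hot input (hypothetical linear layer)."""
    return sum(w[l][b] * x[l][b] for l in range(len(x)) for b in range(4))


def model(x, w):
    """Toy sequence-to-function model (an assumption): tanh of the score."""
    return math.tanh(score(x, w))


def gradient(x, w):
    """Analytic gradient of the toy model with respect to every input entry."""
    scale = 1.0 - math.tanh(score(x, w)) ** 2
    return [[scale * w[l][b] for b in range(4)] for l in range(len(x))]


def ism(seq, w):
    """Exact ISM: one model evaluation per alternative base (3 per position)."""
    ref_pred = model(one_hot(seq), w)
    scores = []
    for l, ref_base in enumerate(seq):
        row = []
        for b in BASES:
            if b == ref_base:
                row.append(0.0)  # no change at the reference base
            else:
                mutated = seq[:l] + b + seq[l + 1:]
                row.append(model(one_hot(mutated), w) - ref_pred)
        scores.append(row)
    return scores


def tism(seq, w):
    """Taylor approximation: gradient at the alternative base minus the
    gradient at the reference base, from a single gradient evaluation."""
    g = gradient(one_hot(seq), w)
    return [
        [g[l][b] - g[l][BASES.index(ref_base)] for b in range(4)]
        for l, ref_base in enumerate(seq)
    ]


# Hypothetical example inputs: a 6-nucleotide sequence and small fixed weights.
SEQ = "ACGTAC"
W = [[0.05 * ((4 * l + b) % 7 - 3) for b in range(4)] for l in range(len(SEQ))]

if __name__ == "__main__":
    exact, approx = ism(SEQ, W), tism(SEQ, W)
    for l, base in enumerate(SEQ):
        print(l, base, [round(v, 4) for v in exact[l]],
              [round(v, 4) for v in approx[l]])
```

In this toy setting the exact scores need 3 × L model evaluations for a sequence of length L, while the approximation reuses one gradient. The two agree closely only because the perturbations to the toy model's score are small; how well the approximation holds for real sequence-to-function models is the question the paper itself examines.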