PT - JOURNAL ARTICLE
AU - Antonio Majdandzic
AU - Peter K. Koo
TI - Statistical correction of input gradients for black box models trained with categorical input features
AID - 10.1101/2022.04.29.490102
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.04.29.490102
4099 - http://biorxiv.org/content/early/2022/05/01/2022.04.29.490102.short
4100 - http://biorxiv.org/content/early/2022/05/01/2022.04.29.490102.full
AB - Gradients of a deep neural network’s predictions with respect to the inputs are used in a variety of downstream analyses, notably in post hoc explanations with feature attribution methods. For data with input features that live on a lower-dimensional manifold, we observe that the learned function can exhibit arbitrary behaviors off the manifold, where no data exists to anchor the function during training. This leads to a random component in the gradients which manifests as noise. We introduce a simple correction for this off-manifold gradient noise for the case of categorical input features, where input values are subject to a probabilistic simplex constraint, and demonstrate its effectiveness on regulatory genomics data. We find that our correction consistently leads to a significant improvement in gradient-based attribution scores.
CI - The authors have declared no competing interest.
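
The abstract describes removing off-manifold gradient noise for inputs constrained to a probabilistic simplex (e.g. one-hot genomic sequences). A minimal sketch of such a correction, assuming it amounts to subtracting the per-position mean of the gradient across the categorical channels (the component along the all-ones direction, which is unconstrained by data on the simplex); the function name and array shapes here are illustrative, not taken from the paper's code:

```python
import numpy as np

def correct_gradients(grads):
    """Project input gradients onto the simplex's tangent space.

    For categorical inputs whose values sum to one at each position,
    the gradient component along the all-ones direction lies off the
    data manifold; subtracting the per-position mean across categories
    removes it.

    grads: array of shape (L, A) — sequence length L, A categories.
    """
    return grads - grads.mean(axis=-1, keepdims=True)

# Toy example: gradients for a length-4 DNA sequence (A, C, G, T).
rng = np.random.default_rng(0)
g = rng.normal(size=(4, 4))
g_corr = correct_gradients(g)
# After correction, gradients sum to ~0 across the 4 nucleotides.
assert np.allclose(g_corr.sum(axis=-1), 0.0)
```

The same one-liner applies to batched gradients of shape (N, L, A), since the mean is taken only over the last (category) axis.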