Abstract
Deep convolutional neural networks have emerged as the state of the art for predicting single-unit responses in a number of visual areas. While such models outperform classical linear-nonlinear and wavelet-based feature representations, we currently do not know what additional nonlinear computations they approximate. Divisive normalization (DN) has been suggested as one such nonlinear, canonical cortical computation, and it has proved crucial for explaining nonlinear responses to combinations of simple stimuli such as gratings. However, it has not been tested rigorously for its ability to account for spiking responses to natural images, nor do we know to what extent it can close the gap to high-performing black-box models. Here, we developed an end-to-end trainable model of DN that learns the pool of normalizing neurons and the magnitude of their contribution directly from the data. We used this model to investigate DN in monkey primary visual cortex (V1) under stimulation with natural images. We found that this model outperformed linear-nonlinear and wavelet-based feature representations and came close to the performance of deep neural networks. Surprisingly, within the classical receptive field, oriented features were normalized preferentially by features with similar orientation preference rather than non-specifically, as assumed by current models of DN. Thus, our work provides a new, quantitative, and interpretable predictive model of V1 that is applicable to arbitrary images, and it refines our view on the mechanisms of gain control within the classical receptive field.
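To make the DN computation concrete, the following is a minimal NumPy sketch of a generic divisive normalization stage with a learnable normalization pool. All names and parameter values here are illustrative assumptions, not the paper's actual implementation: `pool_weights[i, j]` stands in for the learned magnitude with which feature `j` contributes to the normalization pool of feature `i`, and the semi-saturation constant and exponent are standard ingredients of textbook DN models.

```python
import numpy as np

def divisive_normalization(features, pool_weights, semi_saturation=1.0, exponent=2.0):
    """Generic divisive normalization (illustrative sketch, not the paper's model).

    Each feature's rectified, exponentiated response is divided by a weighted
    sum of the other features' responses plus a semi-saturation constant.
    `pool_weights[i, j]` is the (in the paper: learned) contribution of
    feature j to the normalization pool of feature i.
    """
    drive = np.maximum(features, 0.0) ** exponent   # rectified driving input
    norm_pool = pool_weights @ drive                # weighted normalization pool
    return drive / (semi_saturation ** exponent + norm_pool)

# Toy usage: 4 oriented features with a uniform (orientation-unspecific) pool,
# the assumption the paper's results call into question.
feats = np.array([1.0, 0.5, 0.2, 0.0])
w = np.full((4, 4), 0.25)
out = divisive_normalization(feats, w)
```

An orientation-specific pool, as the results suggest, would correspond to `pool_weights` concentrating mass on features with similar orientation preference rather than spreading it uniformly.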