Abstract
Task-optimized deep convolutional neural networks are the most quantitatively accurate models of the primate ventral visual stream. However, such networks are implausible as models of the mouse visual system because mouse visual cortex is known to have a shallower hierarchy, and the supervised objectives these networks are typically trained with are likely ethologically relevant in neither content nor quantity. Here we develop shallow network architectures that are more consistent with anatomical and physiological studies of mouse visual cortex than current models. We demonstrate that hierarchically shallow architectures trained using contrastive objective functions applied to visual-acuity-adapted images achieve neural prediction performance that exceeds that of the same architectures trained in a supervised manner, and result in the most quantitatively accurate models of the mouse visual system. Moreover, these models’ neural predictivity significantly surpasses that of supervised, deep architectures that are known to correspond well to the primate ventral visual stream. Finally, we derive a novel measure of inter-animal consistency and show that the best models closely match this quantity across visual areas. Taken together, our results suggest that contrastive objectives operating on shallow architectures with ethologically motivated image transformations may be a biologically plausible computational theory of visual coding in mice.
Competing Interest Statement
The authors have declared no competing interest.