ABSTRACT
Primates—including humans—can typically recognize objects in visual images at a glance, even in the face of naturally occurring identity-preserving image transformations such as changes in viewpoint. A primary neuroscience goal is to uncover neuron-level mechanistic models that quantitatively explain this behavior, not only predicting average primate performance but also predicting primate performance for each and every image. Here, we applied this stringent behavioral prediction test to the leading mechanistic models of primate vision (specifically, deep convolutional artificial neural networks; ANNs) by directly comparing their behavioral patterns, at high resolution over a large number of object discrimination tasks, against those of humans and rhesus macaque monkeys. Using high-throughput data collection systems for human and monkey psychophysics, we collected over one million behavioral trials for 2400 images of 24 broadly sampled basic-level objects, resulting in 276 binary object discrimination tasks (all pairwise comparisons among the 24 objects). Consistent with previous work, we observed that state-of-the-art deep, feed-forward, convolutional ANNs trained for visual categorization (termed DCNNIC models) accurately predicted primate patterns of object-level confusion (e.g. how often a camel is confused with a dog, on average). However, when we examined behavioral performance for individual images within each object discrimination task, we found that all of the DCNNIC models were significantly non-predictive of primate performance. We found that this prediction failure was not accounted for by simple image attributes, nor was it rescued by simple model modifications. These results show that current DCNNIC models cannot account for the image-level behavioral patterns of primates, even when images are not optimized to be adversarial. This suggests that new ANN models are needed to more precisely capture the neural mechanisms underlying primate object vision, and that high-resolution, large-scale behavioral metrics could serve as a strong constraint for discovering such models.
SIGNIFICANCE STATEMENT
Recently, specific feed-forward deep convolutional artificial neural network (ANN) models have dramatically advanced our quantitative understanding of the neural mechanisms underlying primate core object recognition. In this work, we tested the limits of those ANNs by systematically comparing the behavioral responses of these models with those of humans and monkeys, at the resolution of individual images. Using these high-resolution metrics, we found that all tested ANN models significantly diverged from primate behavior. Going forward, such high-resolution, large-scale behavioral metrics could serve as a strong constraint for discovering better ANN models of the primate visual system.