ABSTRACT
Convolutional neural networks (CNNs) have achieved very high object categorization performance recently. It has increasingly become a common practice in human fMRI research to regard CNNs as working model of the human visual system. Here we reevaluate this approach by comparing fMRI responses from the human brain in three experiments with those from 14 different CNNs. Our visual stimuli included original and filtered versions of real-world object images and images of artificial objects. Replicating previous findings, we found a brain-CNN correspondence in a number of CNNs with lower and higher levels of visual representations in the human brain better resembling those of lower and higher CNN layers, respectively. Moreover, the lower layers of some CNNs could fully capture the representational structure of human early visual areas for both the original and filtered real-world object images. Despite these successes, no CNN examined could fully capture the representational structure of higher human visual processing areas. They also failed to capture that of artificial object images in all levels of visual processing. The latter is particularly troublesome, as decades of vision research has demonstrated that the same algorithms used in the processing of natural images would support the processing of artificial visual stimuli in the primate brain. Similar results were obtained when a CNN was trained with stylized object images that emphasized shape representation. CNNs likely represent visual information in fundamentally different ways from the human brain. Current CNNs thus may not serve as sound working models of the human visual system.
Significance Statement Recent CNNs have achieved very high object categorization performance, with some even exceeding human performance. It has become common practice in recent neuroscience research to regard CNNs as working models of the human visual system. Here we evaluate this approach by comparing fMRI responses from the human brain with those from 14 different CNNs. Despite CNNs’ ability to successfully perform visual object categorization like the human visual system, they appear to represent visual information in fundamentally different ways from the human brain. Current CNNs thus may not serve as sound working models of the human visual system. Given the current dominating trend of incorporating CNN modeling in visual neuroscience research, our results question the validity of such an approach.