Abstract
Neural encoding, a crucial step toward understanding the human brain's information processing, aims to establish a quantitative relationship between stimuli and the evoked brain activity. In visual neuroscience, population receptive field (pRF) models, which can explain how neuronal populations in primary visual cortex respond to stimuli, have enjoyed wide popularity and made steady progress in recent years. However, existing models rely either on inflexible prior assumptions about the pRF or on cumbersome parameter estimation methods, which severely limits their expressiveness and interpretability. In this paper, we propose a novel neural encoding framework that learns “what” and “where” with deep neural networks. The modeling approach addresses two separate aspects of neuronal populations in visual cortex: their spatial characteristics (“where”) and their feature selectivity (“what”). Specifically, we learn these two aspects through receptive field estimation and multi-feature regression, respectively, implemented simultaneously within a single deep neural network. Two forms of regularization, sparsity and smoothness, are adopted so that the receptive field can be estimated automatically, without prior assumptions about its shape. Furthermore, we extend the voxel-wise modeling approach to multi-voxel joint encoding models and show that this helps rescue voxels with poor signal-to-noise characteristics. Extensive empirical results demonstrate that the proposed method provides an effective strategy for neural encoding in human visual cortex, achieving higher encoding performance under weaker prior constraints.
Author summary
Characterizing the quantitative relationship between stimuli and the evoked brain activity usually involves learning the spatial characteristics (“where”) and feature selectivity (“what”) of neuronal populations. We propose a novel end-to-end “what”-and-“where” architecture for neural encoding. The proposed approach combines receptive field estimation with multi-feature regression, learning “where” and “what” simultaneously in a deep neural network. Unlike previous methods, we use sparsity and smoothness regularization within the network to guide receptive field estimation, so that the receptive field of each voxel can be estimated automatically. Moreover, given the computational similarities between adjacent voxels, we extend the proposed approach to multi-voxel joint encoding models, improving the encoding performance of voxels with poor signal-to-noise characteristics. Empirical evaluations show that the proposed method outperforms baseline methods and achieves state-of-the-art performance.
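To make the “where”-then-“what” computation concrete, the following minimal NumPy sketch illustrates the idea in the simplest voxel-wise form. It is our own simplification, not the paper's implementation: the function names, the exact penalty forms (an L1 term for sparsity and squared spatial differences for smoothness), and all array shapes are assumptions for illustration only.

```python
import numpy as np

def encoding_penalties(rf, lam_sparse=0.1, lam_smooth=0.1):
    """Regularizers on a 2-D receptive-field map.

    Sparsity is an L1 penalty; smoothness penalizes squared differences
    between neighboring RF weights. Both forms are assumed here.
    """
    sparsity = lam_sparse * np.abs(rf).sum()
    dy = np.diff(rf, axis=0)            # vertical neighbor differences
    dx = np.diff(rf, axis=1)            # horizontal neighbor differences
    smoothness = lam_smooth * ((dy ** 2).sum() + (dx ** 2).sum())
    return sparsity + smoothness

def predict_voxel(feature_maps, rf, weights):
    """'Where' then 'what' for one voxel.

    Each feature map (C maps of size H x W) is pooled through the voxel's
    spatial RF, then the pooled features are combined linearly.
    """
    pooled = np.tensordot(feature_maps, rf, axes=([1, 2], [0, 1]))  # (C,)
    return pooled @ weights             # scalar response prediction

# Toy example with made-up shapes: 8 feature maps on a 16 x 16 grid.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16, 16))
rf = np.zeros((16, 16))
rf[6:10, 6:10] = 0.25                   # a compact, smooth toy RF
w = rng.standard_normal(8)
y = predict_voxel(feats, rf, w)
```

In a full model, `rf` and `weights` would be trainable network parameters, with `encoding_penalties(rf)` added to the prediction loss so that gradient descent favors compact, smooth receptive fields without any assumed shape.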