Abstract
Mapping brain functions to their underlying neural substrates is a central goal of cognitive neuroscience. Functional magnetic resonance imaging (fMRI) has proven indispensable in this endeavour. Recently, there has been growing interest in tackling this problem by mapping semantic concepts onto brain regions using repositories of images and text from the neuroimaging literature. However, no study has thus far approached this problem using (dense) vector representations of words. Using data from the Neurosynth database, we sought to develop a model that could (A) capture local correlations between words in text, as well as topics, (B) capture representations of distributed brain networks in relation to word embeddings, and (C) generate synthetic images from word inputs. We show that jointly embedding words and brain imaging data in a shared vector space can yield semantic representations that sensibly relate concepts across biological, psychological, and observational levels of analysis. Moreover, our proposed model makes no assumptions about the spatial orientation of fMRI voxels, which allows distributed brain networks to be embedded in the semantic space. We demonstrate this capability by generating synthetic brain activation vectors from word inputs. Our model has the potential to advance neuroimaging meta-analysis, as well as contextual word-embedding methods more broadly.
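To make the final claim concrete, the following is a minimal illustrative sketch (not the authors' implementation) of generating a synthetic activation vector from word inputs. It assumes pre-trained word embeddings and a learned linear decoder `W` from the shared semantic space to flattened voxel space; the vocabulary, dimensions, and the helper `synthesize_activation` are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical sketch: decode a word-derived semantic vector into a
# flattened (order-free) voxel activation vector. In the actual model,
# the word embeddings and the decoder W would be learned jointly.

rng = np.random.default_rng(0)

embedding_dim = 100   # dimensionality of the joint semantic space (assumed)
n_voxels = 28_000     # flattened fMRI voxels, no spatial ordering assumed

# Stand-ins for learned parameters: a tiny vocabulary of word vectors
# and a decoder matrix mapping embeddings to voxel activations.
vocab = {
    "memory": rng.normal(size=embedding_dim),
    "hippocampus": rng.normal(size=embedding_dim),
}
W = rng.normal(size=(n_voxels, embedding_dim))  # learned in practice

def synthesize_activation(words):
    """Average the query words' embeddings, then decode to voxel space."""
    v = np.mean([vocab[w] for w in words], axis=0)
    return W @ v  # synthetic activation vector, one value per voxel

activation = synthesize_activation(["memory", "hippocampus"])
print(activation.shape)  # (28000,)
```

Because the voxel dimension is treated as an unordered vector, the same decoder can express distributed networks rather than only spatially contiguous regions, which is the property the abstract highlights.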