Language processing in brains and deep neural networks: computational convergence and its limits

Deep learning has recently enabled substantial progress in language tasks such as translation and completion. Do such models process language similarly to humans, and is this similarity driven by systematic structural, functional and learning principles? To address these issues, we tested whether the activations of 7,400 artificial neural networks trained on image, word and sentence processing linearly map onto the hierarchy of human brain responses elicited during a reading task, using source-localized magnetoencephalography (MEG) recordings of 104 subjects. Our results confirm that visual, word and language models sequentially correlate with distinct areas of the left-lateralized cortical hierarchy of reading. However, only specific subsets of these models converge towards brain-like representations during training. Specifically, when the algorithms are trained on language modeling, their middle layers become increasingly similar to the late responses of the language network in the brain. By contrast, input and output word-embedding layers often diverge away from brain activity during training. These differences are primarily rooted in the sustained and bilateral responses of the temporal and frontal cortices. Together, these results suggest that the compositional, but not the lexical, representations of modern language models converge to a brain-like solution.

Figure 1: A. Hypotheses. An artificial neural network is said to converge to brain-like representations if training makes its activations increasingly correlated with those of the brain, and to diverge if training makes them less so. Because artificial neural networks are high-dimensional, part of a random network could significantly correlate with brain activity, and thus lead to a "fortunate relationship" between the brain and the algorithm. In this schema, each dot represents one artificial neural network frozen at a given training step. B. To quantify the similarity between an artificial neural network and the brain, a linear regression can be fit from the model's activations (X) to the brain response (Y) to the same stimulus sequence (here: 'once upon a'). The resulting "brain score" (Yamins et al., 2014) is independent of the training objective of the model (e.g. predicting the next word). C. Average (absolute) MEG responses to word onset in various regions of the cortical network associated with reading, normalized by their maximum amplitudes to highlight their relative onsets and peaks (top ticks). See Movie 1 for additional results.
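The model-to-brain mapping of panel B can be sketched as follows. This is an illustrative toy example, not the paper's pipeline: random data stand in for the model activations X and the MEG response Y, and a ridge regression plus a Pearson correlation on held-out words yields the "brain score".

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy stand-ins: X = model activations (n_words x n_units),
# Y = brain response at one sensor/time sample (n_words,).
n_words, n_units = 200, 50
X = rng.normal(size=(n_words, n_units))
Y = X @ rng.normal(size=n_units) + rng.normal(size=n_words)  # partly predictable

# Fit the linear model-to-brain mapping on a training split ...
train, test = slice(0, 150), slice(150, None)
reg = Ridge(alpha=1.0).fit(X[train], Y[train])

# ... and score it on held-out words: the "brain score" is the
# Pearson correlation between predicted and actual responses.
brain_score = np.corrcoef(reg.predict(X[test]), Y[test])[0, 1]
```

Note that the score is computed on data not used to fit the regression, which guards against the "fortunate relationship" a high-dimensional random network can otherwise produce.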
Figure 2: A. Average brain scores across time (0-1 s after word onset) and subjects for the deep CNN trained on character recognition (top), Word2Vec (middle) and the difference between the two (bottom), in response to words presented in random word lists. B. Average brain scores within each region of interest (panels) obtained with the CNN (gray) and with Word2Vec (W2V). The coloured area indicates when W2V is higher than the CNN. C. Second-level statistical comparison across subjects of the brain scores obtained within each region of interest (averaged from 0-1 s) with the CNN (gray) and W2V (color), resulting from a two-sided Wilcoxon signed-rank test. Error bars are the 95% confidence intervals of the scores' distribution across subjects.
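The second-level comparison of panel C can be reproduced in miniature as follows; the per-subject scores below are simulated, not real data, and the test itself is `scipy.stats.wilcoxon`, a paired, two-sided signed-rank test across subjects.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)

# Simulated per-subject brain scores (one value per subject, n=104)
# for two models in one region of interest; Word2Vec is given a
# small true advantage over the CNN.
cnn_scores = rng.normal(loc=0.05, scale=0.02, size=104)
w2v_scores = cnn_scores + rng.normal(loc=0.01, scale=0.01, size=104)

# Second-level (across-subject) comparison: two-sided Wilcoxon
# signed-rank test on the paired score differences.
stat, p_value = wilcoxon(w2v_scores, cnn_scores, alternative="two-sided")
```

Because the test is paired within subjects and non-parametric, it makes no normality assumption about the distribution of brain-score differences.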
The middle, but not the outer, layers of deep language models systematically converge to the sustained representations of the bilateral frontal and temporal cortices.

[…] and middle temporal gyri, as well as the pre-motor and infero-frontal cortices, between 150 and 500 ms after word onset.

Figure 3: A. Brain scores of a 256-dimensional CBOW Word2Vec embedding, trained with a context size of 5 words, at different training stages (full corpus size: 278M words from Wikipedia), as a function of time (averaged across MEG sensors). B. Same as (A) using a skip-gram Word2Vec. C. Average brain scores obtained across time, space (all MEG channels) and subjects (y-axis) for each CBOW Word2Vec embedding (dots, n=6,889 models in total), as a function of training step (x-axis) and performance on the training task (color) (see Methods for details). D. Same as (C) using a skip-gram Word2Vec. E-G. Average brain scores obtained as a function of the objective (CBOW vs skip-gram), context size and dimensionality, irrespective of other properties (e.g. training, performance). Error bars indicate the 95% confidence interval. H. Feature importance estimated from a random forest regressor fitted for each subject separately (dots, n=95) to predict the brain scores (averaged across time and space) of an embedding model given its loss, training step, objective, context size and dimensionality. Overall, the performance of word embeddings has a modest and non-monotonic impact on brain scores.
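The CBOW and skip-gram objectives compared in Figure 3 differ only in the direction of prediction within the context window: CBOW predicts the center word from its context, whereas skip-gram predicts each context word from the center word. A minimal sketch of the (input, target) pairs each objective extracts from a token sequence (the `training_pairs` helper is hypothetical, for illustration only):

```python
def training_pairs(tokens, context_size, objective):
    """Enumerate (input, target) pairs for one Word2Vec objective.

    CBOW predicts the center word from its surrounding context;
    skip-gram predicts each context word from the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - context_size)
        hi = min(len(tokens), i + context_size + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if objective == "cbow":
            pairs.append((tuple(context), center))        # many -> one
        elif objective == "skipgram":
            pairs.extend((center, c) for c in context)    # one -> many
    return pairs

sentence = ["once", "upon", "a", "time"]
cbow = training_pairs(sentence, context_size=2, objective="cbow")
skip = training_pairs(sentence, context_size=2, objective="skipgram")
print(cbow[0])   # (('upon', 'a'), 'once')
print(skip[:2])  # [('once', 'upon'), ('once', 'a')]
```

Skip-gram thus generates more training pairs per sentence than CBOW, which is consistent with its faster plateau in brain scores reported below.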
Word embeddings specifically correlate with late, distributed and lateralized brain responses to words

Figure 4: A. Average brain scores across time (0-2 s after word onset) and subjects for Word2Vec (top), the ninth layer of a 13-layer Causal Language Transformer (CLT, middle) and the difference between the two (bottom), in response to words presented within sentences. B. Average brain scores within each region of interest (panels) obtained with the CNN (gray), Word2Vec (color) and the CLT (black). C. Second-level statistical comparison across subjects of the brain scores obtained within each region of interest (averaged from 0-2 s) with Word2Vec (color) and the CLT (black), resulting from a two-sided Wilcoxon signed-rank test. Error bars are the 95% confidence intervals of the scores' distribution across subjects. See Movie 2 for additional results.

[…] and the learning objective (CBOW vs skip-gram) (ΔR² = 50% ± 3%, p < 10⁻¹⁶) appeared to be important predictors of whether an embedding would linearly correlate with brain responses to words.

To our surprise, however, brain scores did not vary monotonically with the training and performance of word embeddings.

For example, the brain scores of CBOW-trained Word2Vec steeply increased at the beginning of training, decreased after 60 training steps (i.e. after having been exposed to 3M words, among which 155,000 distinct words), and finally reached a plateau after 3,000 training steps (≈150M words and 750,000 distinct words) (Fig. 3C). Similarly, the Word2Vec embeddings trained with a skip-gram objective reached a plateau after only about five training steps (i.e. ≈250,000 words).

Together, these results suggest that training word embedding algorithms does not make them systematically converge to brain-like representations.

To test whether deep language models learn representations that linearly correlate with those of the human brain, we applied the above brain-score analyses to their activations. To ensure that the models, like the brain, process and combine words sequentially, we restrict our analyses to causal (i.e. unidirectional, left-to-right) language transformers (CLT), and now focus on the brain responses to words presented within isolated but meaningful sentences.
Figure 4 summarizes the brain scores obtained with visual, word and contextualized word embeddings, generated by the visual CNN, a representative Word2Vec embedding and the ninth layer of a representative 13-layer CLT, respectively.

Video 2 displays the brain scores obtained with the visual CNN (blue), as well as the gains in brain scores obtained with the Word2Vec embedding (green) and with the ninth layer of a CLT (red). Overall, the CLT reached higher brain scores than word embeddings, especially one second after word onset. These improvements in brain scores […]

Figure 5, panel G: Feature importance estimated, for each subject separately (dots, n=95), with a random forest fitted to predict the average brain score from the model's hyperparameters (number of attention heads, total number of layers, and dimensionality of its layers), training step and performance, as well as from the relative depth of the layer used to compute this brain score. Error bars are the 95% confidence intervals of the scores' distribution across subjects.

[…] the brain aims to achieve during language processing (i.e. finding the learning objective), rather than how it achieves it (i.e. finding the representations necessary to achieve this goal).

Finally, the present study only starts to elucidate the precise nature of linguistic representations in brains and artificial neural networks. Similarly, it only starts to unravel the complex interplay between the regions of the language network. How the mind builds and organizes its lexicon, and how it parses and manipulates sentences, thus remain open questions. We hope that the present work will serve as a stepping stone to progress on these historical questions.

We aimed to compare brain activity to three families of models, targeting visual, lexical and compositional representations […] vocabulary words in total). These design choices ensure that the differences in brain scores observed across models cannot be explained by differences in corpora or text preprocessing.

To evaluate the networks' performance on language modeling, we computed their perplexity (the exponential of the entropy) and their accuracy (at predicting the current word given the previous words) on a test dataset of 180,883 words from Wikipedia. Note that we had to use a larger test dataset to evaluate the Word2Vec networks because their loss was not stable.

[…] to reliably optimize hyperparameters on each training set. We then evaluated the performance of the K × |T| ridge regression fits by computing the Pearson correlation between predicted and actual MEG responses on the test folds. We thus obtained K × |T| correlation scores, corresponding to the ability of the network's activations to correlate with brain activity at time step t, for fold k. We call "brain scores" the correlation scores averaged across folds.

To evaluate the anatomical location of these effects, we computed the brain scores on source-reconstructed MEG signals, by correlating the single-trial source estimates ("true" sources) with the single-trial source predictions generated by the model-to-brain mapping regressions.
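These two performance metrics can be sketched as follows, assuming toy predicted probability distributions rather than the actual Wikipedia test set: perplexity is the exponential of the average per-word cross-entropy, and accuracy is the fraction of words the model ranks first.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins: model-assigned probability distributions over a small
# vocabulary for each position in a test sequence, plus the true words.
n_words, vocab = 1000, 50
logits = rng.normal(size=(n_words, vocab))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
targets = rng.integers(0, vocab, size=n_words)

# Perplexity: exponential of the average per-word cross-entropy
# (lower is better; a uniform model over V words scores V).
cross_entropy = -np.mean(np.log(probs[np.arange(n_words), targets]))
perplexity = np.exp(cross_entropy)

# Top-1 accuracy at predicting the current word given the context.
accuracy = np.mean(probs.argmax(axis=1) == targets)
```

With random logits and random targets, as here, accuracy hovers near chance (1/vocab) and perplexity near its uninformed baseline, which is the behaviour expected of an untrained network.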

To this aim, FreeSurfer (Fischl, 2012) was used to automatically segment the T1-weighted anatomical MRI of each subject, using the 'recon-all' pipeline. We then manually co-referenced the subjects' segmented skull with the head- […]

Brain scores were then averaged across channels, time samples and subjects to obtain the results in Figures 3 and 5. To evaluate the convergence of a model, we computed, for each subject, the Pearson correlation between the brain scores of the network and its performance and/or its training step.

To systematically quantify how the architecture, the performance and the learning of the artificial neural networks impacted their ability to linearly correlate with brain activity, we fitted, for each subject separately, a random forest across the models' properties (e.g. dimensionality, training stage) to predict their brain scores, using scikit-learn's […]
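A minimal sketch of this random-forest analysis, with simulated model properties and brain scores (the feature names, effect sizes and forest settings below are illustrative assumptions, not the study's actual values):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Toy table of model properties, one row per network: dimensionality,
# training step, loss and context size, plus a simulated brain score
# that in this toy example depends almost entirely on the loss.
n_models = 300
props = np.column_stack([
    rng.integers(64, 1025, n_models),   # dimensionality
    rng.integers(1, 5000, n_models),    # training step
    rng.normal(5.0, 1.0, n_models),     # loss
    rng.integers(1, 11, n_models),      # context size
])
brain_scores = -0.02 * props[:, 2] + rng.normal(0.0, 0.005, n_models)

# Fit a random forest to predict brain scores from model properties,
# then inspect which property drives the prediction.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(props, brain_scores)

names = ["dimensionality", "training_step", "loss", "context_size"]
importance = dict(zip(names, forest.feature_importances_))
```

The `feature_importances_` attribute sums to one across features, so each value can be read as the relative share of predictive power attributable to that property.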