GPT-2’s activations predict the degree of semantic comprehension in the human brain

Language transformers, like GPT-2, have demonstrated remarkable abilities to process text, and now constitute the backbone of deep translation, summarization and dialogue algorithms. However, whether these models encode information that relates to human comprehension remains controversial. Here, we show that the representations of GPT-2 not only map onto the brain responses to spoken stories, but also predict the extent to which subjects understand narratives. To this end, we analyze 101 subjects recorded with functional Magnetic Resonance Imaging while listening to 70 min of short stories. We then fit a linear model to predict brain activity from GPT-2’s activations, and correlate this mapping with subjects’ comprehension scores as assessed for each story. The results show that GPT-2’s brain predictions significantly correlate with semantic comprehension. These effects are bilaterally distributed in the language network and peak with a correlation of R=0.50 in the angular gyrus. Overall, this study paves the way to model narrative comprehension in the brain through the lens of modern language algorithms.

…whether the representations of GPT-2 relate to a human-like understanding remains largely unknown. Here, we propose to evaluate how the similarity between the brain and GPT-2 varies with semantic comprehension. Specifically, we first compare GPT-2's activations to the functional Magnetic Resonance Imaging (fMRI) recordings of 101 subjects listening to 70 min of seven short stories, and we quantify this similarity with a "brain score" (M)* (8, 9). Second, we evaluate how the brain scores systematically vary with semantic comprehension, as individually assessed by a questionnaire at the end of each story.

To assess whether GPT-2 generates representations similar to those of the brain, we first evaluate, for each voxel, subject and narrative independently, whether the fMRI responses can be predicted from a linear combination of GPT-2's activations (Figure 1A). We summarize the precision of this mapping with a brain score M: i.e., the correlation between the true fMRI responses and the fMRI responses linearly predicted, with cross-validation, from GPT-2's responses to the same narratives (cf. Methods). To mitigate the limits of fMRI spatial resolution and the need to correct each observation for the number of statistical comparisons, we here report either (1) the average brain scores across voxels or (2) the average score within each region of interest (n = 314, following an automatic subdivision of the Destrieux atlas (10), cf. SI.1). Consistent with previous findings (2, 4, 11, 12), these brain scores are significant over a distributed and bilateral cortical network, and peak in the middle and superior temporal gyri and sulci, as well as in the supramarginal and inferior frontal cortex (2, 4, 11) (Figure 1B).

* As assessed using the Hugging Face interface (https://github.com/huggingface/transformers) and the pretrained GPT-2 model with temperature=0.
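The core of this brain-score computation can be sketched in a few lines: fit a cross-validated linear (ridge) mapping from activations to each voxel, then score it with a Pearson correlation on held-out data. The snippet below is a minimal illustration on synthetic data; the fold count, regularization grid and data shapes are assumptions, and the paper's full pipeline additionally includes a temporal alignment step.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def brain_scores(X, Y, n_splits=5):
    """Cross-validated Pearson r between predicted and true fMRI, per voxel.

    X: (n_samples, n_features) model activations; Y: (n_samples, n_voxels) fMRI.
    """
    scores = np.zeros((n_splits, Y.shape[1]))
    for i, (train, test) in enumerate(KFold(n_splits).split(X)):
        model = RidgeCV(alphas=np.logspace(-1, 6, 8)).fit(X[train], Y[train])
        pred = model.predict(X[test])
        pc = pred - pred.mean(0)
        tc = Y[test] - Y[test].mean(0)
        scores[i] = (pc * tc).sum(0) / (
            np.linalg.norm(pc, axis=0) * np.linalg.norm(tc, axis=0))
    return scores.mean(0)  # one brain score per voxel

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))                # stand-in "activations"
Y = X @ rng.standard_normal((20, 3)) + 0.1 * rng.standard_normal((500, 3))
print(brain_scores(X, Y))                         # close to 1 for each "voxel"
```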
By extracting GPT-2 activations from multiple layers (from layer one to layer twelve), we confirm that middle layers best map onto the brain (Figure 1C), as seen in previous studies (2, 4, 11). For clarity, the following analyses focus on the activations extracted from the eighth layer, i.e. GPT-2's most "brain-like" layer (Figure 1B).

GPT-2's brain predictions correlate with semantic comprehension. Does the linear mapping between GPT-2 and the brain reflect a fortunate correspondence (4)? Or, on the contrary, does it reflect similar representations of high-level semantics? To address this issue, we correlate these brain scores with the level of comprehension of the subjects, assessed for each subject-story pair. On average across all voxels, this correlation reaches R = 0.50 (p < 10⁻¹⁵, Figure 1D, as assessed across subject-story pairs with the Pearson test provided by SciPy). This correlation is significant across a wide variety of the bilateral temporal, parietal and prefrontal cortices typically linked to language processing (Figure 1E). Together, these results suggest that the shared representations between GPT-2 and the brain reliably vary with semantic comprehension.

Fig. 1. A. 101 subjects listen to narratives (70 min of unique audio stimulus in total) while their brain signal is recorded using functional MRI. At the end of each story, a questionnaire is submitted to each subject to assess their understanding, and the answers are summarized into a comprehension score specific to each (narrative, subject) pair (grey box). In parallel (blue box on the left), we measure the mapping between the subject's brain activations and the activations of GPT-2, a deep network trained to predict a word given its past context, both elicited by the same narrative. To this end, a linear spatio-temporal model (f ∘ g) is fitted to predict the brain activity of one voxel Y, given GPT-2 activations X as input. The degree of mapping, called "brain score", is defined for each voxel as the Pearson correlation between predicted and actual brain activity on held-out data (blue equation, cf. Methods). Finally, we test the correlation between the comprehension scores of the subjects and their corresponding brain scores using Pearson's correlation (red equation). A positive correlation means that the representations shared between the brain and GPT-2 are key for the subjects to understand a narrative. B. Brain scores (fMRI predictability) of the activations of the eighth layer of GPT-2. Scores are averaged across subjects, narratives, and voxels within brain regions (142 regions in each hemisphere, following a subdivision of the Destrieux atlas (10), cf. SI.1). Only significant regions are displayed, as assessed with a two-sided Wilcoxon test across (subject, narrative) pairs, testing whether the brain score is significantly different from zero (threshold: .05). C. Brain scores, averaged across fMRI voxels, for different activation spaces: phonological features (word rate, phoneme rate, phonemes, tone and stress, in green), the non-contextualized word embedding of GPT-2 ("Word", light blue), and the activations of the contextualized layers of GPT-2 (from layer one to layer twelve, in blue). The error bars refer to the standard error of the mean across (subject, narrative) pairs (n = 237). D. Comprehension and GPT-2 brain scores, averaged across voxels, for each (subject, narrative) pair. In red, Pearson's correlation between the two (denoted R), the corresponding regression line and the 95% confidence interval of the regression coefficient. E. Correlations (R) between comprehension and brain scores over regions of interest. Brain scores are first averaged across voxels within brain regions (similar to B.), then correlated with the subjects' comprehension scores. Only significant correlations are displayed (threshold: .05). F. Correlation scores (R) between comprehension and the subjects' brain mapping with (i) phonological features, M(Phonemic); (ii) the share of the word-embedding mapping not accounted for by phonological features, M(Word) − M(Phonemic); and (iii) the share of the GPT-2 eighth layer's mapping not accounted for by the word embedding, M(GPT-2) − M(Word). G. Relationship between the average GPT-2-to-brain mapping (eighth layer) per region of interest (similar to B.) and the corresponding correlation with comprehension (R, similar to D.). Only regions of the left hemisphere that are significant in both B. and E. are displayed. In black, the top ten regions in terms of brain and correlation scores (cf. SI.1 for the acronyms). Significance in D, E and F is assessed with Pearson's p-value provided by SciPy†. In B, E and F, p-values are corrected for multiple comparisons using a False Discovery Rate (Benjamini/Hochberg) correction over the 2 × 142 regions of interest.

…representations typically vary with attention (13, 14), and could thus, in turn, influence downstream comprehension processes. Consequently, one can legitimately wonder whether the correlation between comprehension and GPT-2's brain mapping is simply driven by variations in low-level auditory processing. To address this issue, we evaluate the predictability of fMRI given low-level phonological features: the word rate, phoneme rate, phonemes, stress and tone of the narrative (cf. Methods). The corresponding brain scores correlate with the subjects' understanding (R = 0.17, p < 10⁻²), but less so than the brain scores of GPT-2 (ΔR = 0.32). These low-level correlations with comprehension peak in the left superior temporal cortex (Figure 1F). Overall, this result suggests that the link between comprehension and GPT-2's brain mapping may be partially explained by, but not reduced to, the variations of low-level auditory processing.

… is displayed in Figure 1F. Strictly lexical effects (word embedding versus phonological) peak in the superior temporal lobe and in the pars triangularis. By contrast, higher-level effects (GPT-2's eighth layer versus word embedding) peak in the superior frontal cortex, the posterior superior temporal gyrus, the precuneus, and both the triangular and opercular parts of the inferior frontal gyrus, a network typically associated with high-level language comprehension (4, 15–19).

The variability in comprehension scores could result from exogenous factors (e.g. some stories may be harder to comprehend than others for GPT-2) and/or from endogenous factors (e.g. some subjects may better understand specific texts because of their prior knowledge). To address this issue, we fit a linear mixed model to predict comprehension scores given brain scores, specifying the narrative as a random effect (cf. SI.1). The fixed effect of the brain score (shared across narratives) is highly significant (β = 0.04, p < 10⁻²⁹, cf. SI.1). However, the random effect (the slope specific to each single narrative) is not (β < 10⁻², p > 0.11). We also replicate the main analysis (Figure 1D) within each single narrative: the correlation with comprehension reaches 0.76 for the 'sherlock' story and is above 0.40 for every story (cf. SI.1). Overall, these analyses confirm that the link between GPT-2 and semantic comprehension is mainly driven by subjects' individual differences in their ability to make sense of the narratives.
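This fixed-versus-random-effect analysis can be sketched with statsmodels' `mixedlm`, here on synthetic (subject, narrative) pairs; the column names, effect sizes and noise levels below are illustrative assumptions, not the paper's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_pairs = 237  # one row per (subject, narrative) pair
df = pd.DataFrame({
    "brain_score": rng.normal(0.02, 0.01, n_pairs),
    "narrative": rng.integers(0, 7, n_pairs).astype(str),
})
# Simulated comprehension: driven by the brain score, plus per-pair noise.
df["comprehension"] = 4.0 * df["brain_score"] + rng.normal(0.7, 0.1, n_pairs)

# Fixed effect of brain score, with a random intercept and slope per narrative.
model = smf.mixedlm("comprehension ~ brain_score", df,
                    groups=df["narrative"], re_formula="~brain_score")
result = model.fit()
print(result.params["brain_score"])  # fixed-effect slope, shared across narratives
```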

Discussion
Our analyses reveal a positive correlation between semantic comprehension and the degree to which GPT-2 maps onto brain responses to spoken narratives.

These results strengthen and complete prior work on the brain bases of semantic comprehension. In particular, previous studies have used inter-subject brain correlation to reveal the brain regions associated with understanding (17). For example, …

The relationship between GPT-2's representations and human comprehension remains to be qualified. First, although highly significant, our brain scores are relatively low (2, 9, 17). …

Overall, the present study strengthens and clarifies the similarity between the brain and deep language models, repeatedly observed in the past three years (2–4, 11, 20). Together, these findings reinforce the relevance of deep language models in unraveling the neural bases of narrative comprehension.

Methods

Our analyses rely on the "Narratives" dataset (21), composed of the brain signals, recorded using fMRI, of 345 subjects listening to 27 narratives.

Narratives and comprehension score
Among the 27 stories of the dataset, we selected the seven stories for which subjects were asked to answer a comprehension questionnaire at the end, and for which the answers varied across subjects (more than ten different comprehension scores across subjects), resulting in 70 minutes of audio stimuli in total, from four to 19 minutes per story (Figure 2). Questionnaires were either multiple-choice, fill-in-the-blank, or open questions (answered with free text) rated by humans (21). Here, we used the comprehension score computed in the original dataset, which was either a proportion of correct answers or the sum of the human ratings, scaled between 0 and 1 (21). It summarizes the …

Fig. 2. For each of the seven narratives: number of subjects (n), distribution of comprehension scores across subjects, and length of the narrative.

Brain activations
The brain activations of the 101 subjects who listened to the selected narratives were recorded using fMRI, as described in (21). As suggested in the original paper, pairs of (subject, narrative) were excluded because of noisy recordings, resulting in 237 pairs in total.

… (1). Each layer l can be seen as a nonlinear system that takes a sequence of w words as input, and outputs a contextual vector of dimension (w, d), called the "activations" of layer l (d = 768). Intermediate layers were shown to better encode syntactic and semantic information than input and output layers (22), and to better map onto brain activity (2, 4). Here, we show that the eighth layer of GPT-2 best predicts brain activity (Figure 1C). We thus select the eighth layer of GPT-2 for our analyses. Our conclusions remain unchanged with other intermediate-to-deep layers of GPT-2 (from the 6th to the 12th layer).

In practice, the narratives' transcripts were formatted (replacing …) … The brain score of a (subject, narrative) pair is:

M^(s,w) = corr( f ∘ g(X), Y^(s,w) )    (1)

with f ∘ g the fitted estimator (g: temporal and f: spatial mapping).
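The temporal component g can be illustrated with a deliberately naive scheme that pools word-level activations into fMRI TR bins; the paper's actual temporal model is richer, and the TR, word timings and dimensions below are assumptions.

```python
import numpy as np

def align_to_tr(word_times, word_vectors, n_trs, tr=1.5):
    """Naive temporal alignment g: sum word activations within each fMRI TR.

    A stand-in for the paper's temporal model; no hemodynamic modeling here.
    """
    X = np.zeros((n_trs, word_vectors.shape[1]))
    idx = np.minimum((np.asarray(word_times) / tr).astype(int), n_trs - 1)
    np.add.at(X, idx, word_vectors)  # accumulate words falling in the same TR
    return X

rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0, 30, 50))   # 50 word onsets over 30 s of audio
vecs = rng.standard_normal((50, 768))     # per-word GPT-2 activations (d = 768)
X = align_to_tr(times, vecs, n_trs=20)    # 20 TRs of 1.5 s
print(X.shape)                            # (20, 768)
```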

Supporting Information (SI)
Brain parcellation. In Figure 1B, E, and F, we used a subdivision of the parcellation from the Destrieux atlas (10). … In Figure 1G, …

Linear mixed model. … the fixed effect parameters β ∈ R and η ∈ R (shared across narratives), and the random effect parameters β_w ∈ R and η_w ∈ R (specific to the narrative w), such that:

C^(s,w) = (β + β_w) · M^(s,w) + (η + η_w) + ε^(s,w)

The effects were assessed with a t-test, as implemented in statsmodels.

Replication across single narratives. To further support that the correlations R were not driven by the narratives' variability, we replicate the analysis of Figure 1D within single narratives. In …

Noise ceiling. … Thus, we estimate an upper bound of the best brain score that can be obtained given the level of noise in the Narratives dataset.

To this end, for each (subject, narrative) pair, we linearly map the fMRI recordings, not with the GPT-2 activations, but with the average fMRI recordings of the other subjects who listened to that narrative. More precisely, we use the exact same setting as in Eq. (1), but we predict Y^(s), not from g(X) (GPT-2's features after temporal alignment, of size n_times × n_dim), but from the mean of the other subjects' brains, Ȳ = 1/|S| Σ_{s′≠s} Y^(s′) (of size n_times × n_voxels). This score is called the noise ceiling for the (subject, narrative) pair. The noise ceilings for each brain region are displayed in Figure 4, and correspond to upper bounds of the brain scores displayed in Figure 1B.

†† https://www.statsmodels.org/

Fig. 3. Replication within single narratives. Same as Figure 1D for each single narrative.
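This noise-ceiling estimate can be sketched on synthetic data (a shared stimulus-driven signal plus subject-specific noise); the 5-fold linear mapping below is a stand-in for the paper's exact cross-validation setup, and the signal and noise levels are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def noise_ceiling(Ys, s):
    """Ceiling for subject s: predict Y^(s) from the mean of the other subjects."""
    Y = Ys[s]  # (n_times, n_voxels)
    Ybar = np.mean([Ys[j] for j in range(len(Ys)) if j != s], axis=0)
    pred = cross_val_predict(LinearRegression(), Ybar, Y, cv=5)
    pc = pred - pred.mean(0)
    tc = Y - Y.mean(0)
    return (pc * tc).sum(0) / (
        np.linalg.norm(pc, axis=0) * np.linalg.norm(tc, axis=0))

rng = np.random.default_rng(0)
signal = rng.standard_normal((300, 4))  # stimulus-driven signal, shared by all
Ys = [signal + 0.5 * rng.standard_normal((300, 4)) for _ in range(10)]
print(noise_ceiling(Ys, s=0))           # high: subjects share most of the variance
```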