Abstract
The field of human cognitive neuroscience is increasingly acknowledging inter-individual differences in the precise locations of functional areas and the corresponding need for individual-level analyses in fMRI studies. One approach to identifying functional areas and networks within individual brains is based on robust and extensively validated ‘localizer’ paradigms: contrasts of conditions that aim to isolate some mental process of interest. Here, we present a new version of a localizer for the fronto-temporal language-selective network. This localizer is similar to a commonly used localizer based on the reading of sentences and nonword sequences (Fedorenko et al., 2010) but uses speeded presentation (200 ms per word/nonword). Based on a direct comparison between the standard version (450 ms per word/nonword) and the speeded version of the language localizer in 24 participants, we show that a single run of the speeded localizer (3.5 min) is highly effective at identifying the language-selective areas: indeed, it is more effective than the standard localizer given that it leads to an increased response to the critical (sentence) condition and a decreased response to the control (nonwords) condition. This localizer may therefore become the version of choice for identifying the language network in neurotypical adults or special populations (as long as they are proficient readers), especially when time is of the essence.
Introduction
Neuroscientific studies of uniquely human abilities rely predominantly on non-invasive neuroimaging techniques such as functional magnetic resonance imaging (fMRI). A widespread methodological approach in fMRI studies of human brain function is to average individual activation maps for some contrast of interest in a template brain space and perform statistical analyses in each voxel across individuals to derive a group-level whole-brain statistical map. However, functional regions vary in their precise locations across individuals (Fischl et al., 2008; Frost & Goebel, 2012; Tahmasebi et al., 2012; Vázquez-Rodríguez et al., 2019; Somers et al., 2021). Correspondingly, reliance on these group-averaging approaches can lead to low sensitivity and functional resolution (Brett et al., 2002; Saxe et al., 2006; Nieto-Castañón & Fedorenko, 2012). Inter-individual variability is particularly problematic when functional regions of interest lie in proximity to functionally distinct regions, as is the case with both frontal and temporal language regions (e.g., Tomaiuolo et al., 1999; Fedorenko et al., 2012; Tahmasebi et al., 2012; Deen et al., 2015; Braga et al., 2020; Du et al., 2024; see Fedorenko & Blank, 2020, for discussion of this issue for ‘Broca’s area’).
One increasingly popular solution that circumvents inter-individual variability in the precise locations of functional regions is the use of functional ‘localizers’ (Saxe et al., 2006; Nieto-Castañón & Fedorenko, 2012; Gratton & Braga, 2021). In this approach, a brain region or network that supports a mental process of interest is identified with a functional contrast in each individual brain and subsequently, the region’s/network’s responses to some critical condition(s) of interest are examined. Consistent use of these localizers across studies and labs (and in some cases, species; Russ et al., 2021) affords greater confidence that the ‘same’ region or set of regions is being studied, compared to relying on anatomical landmarks alone, and thus facilitates the accumulation of scientific knowledge.
This functional localization approach has been successful across many domains of perception and cognition including high-level visual and auditory processing, social cognition, and language (Kanwisher et al., 1997; Epstein & Kanwisher, 1998; Downing et al., 2001; Belin et al., 2002; Saxe & Kanwisher, 2003; Baker et al., 2007; Fedorenko et al., 2010, 2013; Overath et al., 2015; Fischer et al., 2016; Isik et al., 2017). In the domain of language, Fedorenko et al. (2010) developed a localizer that relies on a contrast between language processing and the processing of a perceptually similar condition that lacks linguistic structure or meaning (e.g., reading or listening to sentences vs. nonword lists, or listening to sentences vs. backwards speech or acoustically degraded sentences; Bedny et al., 2011; Scott et al., 2017; Lipkin et al., 2022; Malik-Moraleda, Ayyash et al., 2022). Such contrasts target brain areas that support computations related to accessing words and combining them into complex linguistic structures and meanings. These ‘language localizers’ robustly identify the left-lateralized fronto-temporal language network, which has long been implicated in language processing based on investigations of patients with aphasia (e.g., Luria, 1970; Goodglass, 1993; Bates et al., 2003; Fridriksson et al., 2018; Wilson et al., 2023) and group-averaging neuroimaging investigations of language processing (e.g., Binder et al., 1997; Price, 2010; Friederici, 2012). Importantly, language localizers are highly generalizable, eliciting similar activations across presentation modalities, materials, and tasks (see Fedorenko et al., 2024). Moreover, the brain regions that this localizer identifies closely correspond to those that emerge from the bottom-up clustering of voxel time-courses obtained during task-free resting state data (Braga et al., 2020). 
This correspondence highlights that the language network is a ‘natural kind’ in the brain: an ontologically meaningful grouping of a set of brain regions that show highly synchronized activity over time. Neuroimaging studies of language that rely on the functional localization approach have produced a number of robust and replicable findings both about i) the relationship between language and other perceptual and cognitive processes (e.g., Fedorenko et al., 2011; Deen et al., 2015; Amalric & Dehaene, 2019; Ivanova et al., 2020; Jouravlev et al., 2019; Chen et al., 2023; Shain et al., 2023), and ii) the internal organization of the language system (e.g., Blank et al., 2016; Fedorenko et al., 2020; Shain, Kean et al., 2024).
One practical concern that researchers often express about the use of localizers is that they take time. Time is often a precious commodity in neuroimaging research, either because the critical task is already long and/or because the population of interest may have low tolerance for the scanner environment. However, given the advantages that localizers provide—including greater sensitivity, greater functional resolution, more accurate effect size estimation, higher interpretability of the responses in the critical tasks, and the ability to meaningfully accumulate knowledge across studies, labs, and species—many researchers continue to adopt this approach. One recent effort in the field has been to try to optimize localizers so that they can be as short as possible while still yielding robust individual-level responses (e.g., Dodell-Feder et al., 2011; Lee et al., 2024; Hutchinson et al., 2024).
In this study, we develop a shorter version of a widely used reading-based language localizer (Fedorenko et al., 2010). We leverage the fact that humans can read at fast rates, especially when the need for eye movements is minimized by presenting words one at a time in the center of the screen in a rapid serial visual presentation (RSVP) paradigm (e.g., Forster, 1970; Potter et al., 1980, 1986; Potter, 2012; Mollica & Piantadosi, 2017). In these studies, participants can process linguistic information even when each word is presented for as little as ∼80–200 ms, as evidenced by accurate recall of the stimuli and high accuracy in answering comprehension questions about the content. Moreover, Vagharchakian et al. (2012) found that speeded reading, similar to reading at slower speeds, activates the language areas, but their study used a group-averaging approach, leaving open the question of whether speeded reading elicits sufficiently robust responses in individual participants. This is the question our study aims to address. Although this question is primarily methodological in nature, our study’s design allows us to additionally ask a theoretically interesting question about whether the increased processing difficulty due to speeded presentation affects neural responses in the language-selective network, or instead (or in addition) in the domain-general Multiple Demand network, which is sensitive to cognitive effort across diverse paradigms (e.g., Fedorenko et al., 2013; Duncan et al., 2012; Duncan, 2010; Duncan et al., 2020; Assem et al., 2020b).
Methods
Brief overview
24 adults each completed two versions of a language localizer task. In both versions, participants read sentences and lists of unconnected pronounceable nonwords presented on the screen one word/nonword at a time. The two versions differed in the presentation speed of each word/nonword. One localizer version was an extensively validated language localizer task (Fedorenko et al., 2010; Mahowald & Fedorenko, 2016; see Lipkin et al., 2022 for data from >600 participants on this version) where each word/nonword is presented for 450 ms (‘standard language localizer’). The other version was a new, speeded version of the task where each word/nonword was presented for 200 ms (‘speeded language localizer’). 22 of the 24 participants completed the two versions of the language localizer in the same scanning session; the remaining two—in separate sessions (1 and 463 days apart). For all participants, the speeded version was run after the standard version. Each scanning session lasted between 1 and 2 hours and included a variety of additional tasks for unrelated studies. The materials, scripts, and screen recordings for the two language localizer versions are available at https://www.evlab.mit.edu/resources-all/download-localizer-tasks (standard version) and https://github.com/el849/speeded_language_localizer/ (speeded version).
Participants
24 neurotypical adults (12 female, 12 male), aged 18 to 60 (mean: 28.04; std: 9.25), participated for payment between June 2021 and December 2022. All participants were native speakers of English, had normal or corrected-to-normal vision, and had no history of neurological, developmental, or language impairments. 22 participants (∼92%) were right-handed, as determined by the Edinburgh handedness inventory (Oldfield, 1971); the remaining 2 participants (∼8%) were left-handed. All participants gave informed written consent in accordance with the requirements of MIT’s Committee on the Use of Humans as Experimental Subjects (COUHES).
fMRI tasks
Language network localizer tasks
Standard language localizer task
A reading task contrasted sentences (e.g., THE SPEECH THAT THE POLITICIAN PREPARED WAS TOO LONG FOR THE MEETING) and lists of unconnected, pronounceable nonwords (e.g., LAS TUPING CUSARISTS FICK PRELL PRONT CRE POME VILLPA OLP WORNETIST CHO) in a standard blocked design with a counterbalanced condition order across runs, as introduced in Fedorenko et al. (2010). Each stimulus consisted of 12 words/nonwords. Stimuli were presented in the center of the screen, one word/nonword at a time, at the rate of 450 ms per word/nonword. Each stimulus was preceded by a 100 ms blank screen and followed by a 400 ms screen showing a picture of a finger pressing a button, and a blank screen for another 100 ms, for a total trial duration of 6 s. Participants were instructed to read attentively (silently, to themselves) and to press a button on the button box whenever they saw the picture of a finger pressing a button on the screen. The button-pressing task was included to help participants remain alert. Experimental blocks lasted 18 s (with 3 trials per block) and fixation blocks lasted 14 s. Each run (consisting of 16 experimental blocks and 5 fixation blocks) lasted 358 s (5 min 58 s). Participants completed 2 runs.
Speeded language localizer task
The speeded version of the language localizer was identical to the standard version except that each word/nonword was presented for 200 ms instead of 450 ms (i.e., a ∼56% shorter presentation duration per item). Each stimulus was preceded by a 100 ms blank screen and followed by a 400 ms screen showing a picture of a finger pressing a button, and a blank screen for another 100 ms, for a total trial duration of 3 s. The instructions to the participants were the same as in the standard version, although they were warned that the presentation would be somewhat fast, and they were told not to worry if they missed some button presses. Experimental blocks lasted 9 s (with 3 trials per block) and fixation blocks lasted 14 s. Each run (consisting of 16 experimental blocks and 5 fixation blocks) lasted 214 s (3 min 34 s). Participants completed 2 runs.
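The trial, block, and run durations reported for the two localizer versions follow directly from the stated timing parameters. The sketch below reproduces that arithmetic; the class and field names are illustrative assumptions and are not taken from the released task scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalizerTiming:
    """Timing parameters of one localizer version (names are illustrative)."""
    word_ms: int                  # presentation time per word/nonword
    items_per_stimulus: int = 12  # words/nonwords per sentence or nonword list
    pre_blank_ms: int = 100       # blank screen before the stimulus
    button_cue_ms: int = 400      # picture of a finger pressing a button
    post_blank_ms: int = 100      # blank screen after the button cue
    trials_per_block: int = 3
    experimental_blocks: int = 16
    fixation_blocks: int = 5
    fixation_block_s: float = 14.0

    @property
    def trial_s(self) -> float:
        total_ms = (self.pre_blank_ms
                    + self.items_per_stimulus * self.word_ms
                    + self.button_cue_ms
                    + self.post_blank_ms)
        return total_ms / 1000

    @property
    def block_s(self) -> float:
        return self.trials_per_block * self.trial_s

    @property
    def run_s(self) -> float:
        return (self.experimental_blocks * self.block_s
                + self.fixation_blocks * self.fixation_block_s)


standard = LocalizerTiming(word_ms=450)
speeded = LocalizerTiming(word_ms=200)

for name, timing in (("standard", standard), ("speeded", speeded)):
    minutes, seconds = divmod(int(timing.run_s), 60)
    print(f"{name}: trial {timing.trial_s} s, block {timing.block_s} s, "
          f"run {timing.run_s} s ({minutes} min {seconds} s)")
```

Under these parameters, the standard version yields a 6 s trial and a 358 s run, and the speeded version a 3 s trial and a 214 s run, matching the values reported above.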
Language network localizer experimental materials
Standard language localizer materials
The materials consisted of five sets, each set comprising 48 sentences and 48 nonword sequences, for a total of 240 sentences and 240 nonword sequences. The sentences were drawn from the Brown corpus (Bird & Loper, 2004; Francis & Kucera, 1964) and were selected to include a variety of syntactic constructions and topics. The nonwords were created using the ‘Wuggy’ software (https://github.com/WuggyCode/wuggy; the default parameters were used) so as to respect the phonotactic constraints of English. In cases where Wuggy was unable to generate a nonword candidate, we relied on one of the following strategies: i) broke down the word into composite words (for compound words) or morphemes, matched each composite word/morpheme to a nonword, and then reassembled those; ii) used one of the nonwords created for another word; or iii) created an English-sounding nonword ourselves. Any given participant saw one set of materials.
Speeded language localizer materials
The first 11 participants were presented with the materials from the standard version (ensuring that a different set was used). Approximately halfway through data collection, we created a new set of materials for the speeded language localizer in order to: i) generalize the findings to a new set of materials, and ii) avoid potential material overlaps between the standard and speeded localizer materials in future experiments. Hence, for the remaining 13 participants, we created five new sets, each consisting of 48 sentences and 48 nonword sequences, for a total of 240 new sentences and 240 new nonword sequences. The sentences were again selected from the Brown corpus (Bird, 2009). In particular, we sampled 1,000 12-word-long sentences and then selected a set of 240 sentences that were not already included in the original set of materials, were syntactically and semantically diverse, and did not contain offensive/inappropriate content. The nonword strings were created as in the standard version.
Multiple Demand network localizer task
In addition to the language tasks, we included a non-linguistic demanding task: a spatial working memory (WM) task. The goal was two-fold. First, including a non-linguistic task allowed us to evaluate the selectivity of the language fROIs (defined by the two versions of the localizer) for language processing (Fedorenko et al., 2011, 2024). Second, this task allowed us to examine brain responses to the conditions of the language localizer tasks in another set of functional areas: areas that comprise the domain-general Multiple Demand (MD) network (Duncan, 2010; Duncan et al., 2012; Fedorenko et al., 2013). This network supports executive functions like working memory and cognitive control. The spatial WM task has been previously established to robustly identify these areas at the individual-participant level (e.g., Blank et al., 2014; Mineroff, Blank et al., 2018; Shashidhara et al., 2020; Assem et al., 2020a; Malik-Moraleda, Ayyash et al., 2022). Although the areas of the MD network have been shown to not support any ‘core’ linguistic computations, such as those related to lexical access, syntactic structure building, or semantic composition (e.g., Blank & Fedorenko, 2017; Diachek, Blank, Siegelmann et al., 2020; Quillen et al., 2021; Shain, Blank et al., 2020; Shain et al., 2022), their engagement has been reported for some cases of effortful perception and comprehension (e.g., Mattys & Wiget, 2011; MacGregor et al., 2022; Liu et al., 2022; see Discussion). We therefore wanted to evaluate the MD areas’ responses to speeded comprehension, to see whether this kind of processing difficulty draws on domain-general resources.
The spatial working memory task contrasted a hard condition with an easy condition in a standard blocked design with a counterbalanced condition order across runs (e.g., Fedorenko et al., 2011, 2013; Blank et al., 2014). On each trial (duration = 8 s), participants saw a fixation cross for 500 ms, followed by a 3 × 4 grid within which randomly generated locations were sequentially flashed (1 s per flash) two at a time for a total of eight locations (hard condition) or one at a time for a total of four locations (easy condition). Then, participants indicated their memory for these locations in a two-alternative forced-choice paradigm via a button press (the choices were presented for 1,000 ms, and participants had up to 3 s to respond). Feedback, in the form of a green checkmark (correct responses) or a red cross (incorrect responses), was provided for 250 ms, with fixation presented for the remainder of the trial. Experimental blocks lasted 32 s (with 4 trials per block) and fixation blocks lasted 16 s. Each run (consisting of 12 experimental blocks and 4 fixation blocks) lasted 448 s (7 min 28 s). Participants completed 2 runs.
23 of the 24 participants completed the MD localizer in the same scanning session as the standard language localizer; the remaining participant—in a separate session (98 days apart).
fMRI data acquisition, preprocessing and first-level analysis
fMRI data acquisition
Structural and functional data were collected on a whole-body 3 Tesla Siemens Trio scanner with a 32-channel head coil at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. T1-weighted, Magnetization Prepared RApid Gradient Echo (MP-RAGE) structural images were collected in 176 sagittal slices with 1 mm isotropic voxels (TR = 2,530 ms, TE = 3.48 ms, TI = 1,100 ms, flip = 8 degrees). Functional, blood oxygenation level dependent (BOLD) data were acquired using one of three similar sequences (denoted as sequences A, B, and C). The data from the majority of participants (22 out of 24) were acquired using sequence A, which we describe in this paragraph. See SI Table 1 for the specifications of sequences B and C (importantly, the scanning sequence was kept constant in all comparisons between the standard and the speeded versions of the localizer, except in a single participant). Sequence A was an SMS EPI sequence (with a 90 degree flip angle and a slice acceleration factor of 2), with the following acquisition parameters: fifty-two 2 mm thick near-axial slices acquired in the interleaved order (with 10% distance factor), 2 mm × 2 mm in-plane resolution, FoV in the phase encoding (A ≫ P) direction 208 mm and matrix size 104 × 104, TR = 2,000 ms and TE = 30 ms, and partial Fourier of 7/8. The first 10 s of each run were excluded to allow for steady state magnetization.
fMRI preprocessing
fMRI data were analyzed using SPM12 (release 7487), the CONN EvLab module (release 19b), and custom MATLAB scripts. Each participant’s functional and structural data were converted from DICOM to NIfTI format. All functional scans were coregistered and resampled using B-spline interpolation to the first scan of the first session (Friston et al., 1995). Potential outlier scans were identified from the resulting subject-motion estimates as well as from BOLD signal indicators using default thresholds in the CONN preprocessing pipeline (5 standard deviations above the mean in global BOLD signal change, or framewise displacement values above 0.9 mm; Nieto-Castanon, 2020). Functional and structural data were independently normalized into a common space (the Montreal Neurological Institute [MNI] template; IXI549Space) using the SPM12 unified segmentation and normalization procedure (Ashburner & Friston, 2005) with a reference functional image computed as the mean functional data after realignment across all timepoints omitting outlier scans. The output data were resampled to a common bounding box between MNI-space coordinates (−90, −126, −72) and (90, 90, 108), using 2 mm isotropic voxels and 4th order spline interpolation for the functional data, and 1 mm isotropic voxels and trilinear interpolation for the structural data. Last, the functional data were smoothed spatially using spatial convolution with a 4 mm FWHM Gaussian kernel.
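The two outlier criteria stated above (global BOLD signal change more than 5 standard deviations above the mean, or framewise displacement above 0.9 mm) can be illustrated with a simplified sketch. This is not the CONN implementation: the function name, the inputs, and the use of a plain mean and standard deviation over the run are assumptions made only for illustration.

```python
import statistics


def flag_outlier_scans(global_signal_change, framewise_displacement_mm,
                       sd_threshold=5.0, fd_threshold_mm=0.9):
    """Return indices of scans that exceed either threshold.

    Assumption: one global-signal-change value and one framewise-displacement
    value per scan, with the mean and SD computed over the whole run.
    """
    if len(global_signal_change) != len(framewise_displacement_mm):
        raise ValueError("both series need one value per scan")
    mean = statistics.fmean(global_signal_change)
    sd = statistics.pstdev(global_signal_change)
    flagged = []
    for scan, (change, fd) in enumerate(
            zip(global_signal_change, framewise_displacement_mm)):
        signal_outlier = sd > 0 and (change - mean) / sd > sd_threshold
        motion_outlier = fd > fd_threshold_mm
        if signal_outlier or motion_outlier:
            flagged.append(scan)
    return flagged


# Toy data: one large signal change (last scan) and one large movement (scan 3).
signal_change = [0.0] * 100 + [10.0]
displacement = [0.1] * 101
displacement[3] = 1.2
outliers = flag_outlier_scans(signal_change, displacement)
```

In a GLM, flagged scans are typically handled with one nuisance regressor per scan, which is consistent with the outlier regressors mentioned in the first-level analysis.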
First-level analysis
Effects were estimated using a General Linear Model (GLM) in which each experimental condition was modeled with a boxcar function convolved with the canonical hemodynamic response function (HRF) (fixation was modeled implicitly, such that all timepoints that did not correspond to one of the conditions were assumed to correspond to a fixation period). Temporal autocorrelations in the BOLD signal timeseries were accounted for by a combination of high-pass filtering with a 128 s cutoff, and whitening using an AR(0.2) model (first-order autoregressive model linearized around the coefficient a = 0.2) to approximate the observed covariance of the functional data in the context of Restricted Maximum Likelihood estimation (ReML). In addition to experimental condition effects, the GLM design included first-order temporal derivatives for each condition (included to model variability in the HRF delays), as well as nuisance regressors to control for the effect of slow linear drifts, subject-motion parameters, and potential outlier scans on the BOLD signal.
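The core of the condition modeling described above, a boxcar convolved with a canonical HRF, can be sketched in a few lines. The double-gamma shape used here (a response term with shape 6 minus one sixth of an undershoot term with shape 16, over 32 s) is assumed to approximate the SPM default; the block onsets are hypothetical, and the sketch omits temporal derivatives, nuisance regressors, filtering, and whitening.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale, defined as 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(tr_s, length_s=32.0):
    """Double-gamma HRF sampled once per TR and normalized to sum to 1."""
    n_samples = int(length_s / tr_s) + 1
    hrf = [gamma_pdf(i * tr_s, 6) - gamma_pdf(i * tr_s, 16) / 6
           for i in range(n_samples)]
    total = sum(hrf)
    return [value / total for value in hrf]


def boxcar(n_scans, tr_s, onsets_s, duration_s):
    """1.0 for scans acquired during a block of the condition, else 0.0."""
    return [1.0 if any(onset <= i * tr_s < onset + duration_s
                       for onset in onsets_s) else 0.0
            for i in range(n_scans)]


def convolve_causal(signal, kernel):
    """Discrete convolution truncated to the length of the signal."""
    return [sum(signal[i - j] * kernel[j]
                for j in range(min(i + 1, len(kernel))))
            for i in range(len(signal))]


TR_S = 2.0
onsets = [14.0, 60.0]  # hypothetical block onsets in seconds
design = boxcar(n_scans=60, tr_s=TR_S, onsets_s=onsets, duration_s=18.0)
regressor = convolve_causal(design, double_gamma_hrf(TR_S))
```

The resulting regressor is zero before the first block, rises a few seconds after block onset, and dips below baseline after block offset, which is the delayed and smoothed response profile that the GLM fits to each voxel's time series.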
Definition of functional regions of interest (fROIs)
Language and Multiple Demand (MD) fROIs were defined using a group-constrained subject-specific (GSS) approach (Fedorenko et al., 2010) where a set of spatial masks, or parcels, is combined with each individual subject’s localizer activation map, to constrain the definition of individual fROIs. The parcels delineate the expected gross locations of activations for a given contrast and are sufficiently large to encompass the variability in the locations of individual activations. Within each parcel, we selected the top 10% most localizer-responsive voxels, based on t-values.
To define the language fROIs, we used a set of six parcels derived from a group-level probabilistic activation overlap map for the sentences > nonwords contrast in 220 independent participants. The parcels included two regions in the left inferior frontal gyrus (LIFG, LIFGorb), one in the left middle frontal gyrus (LMFG), two in the left temporal lobe (LAntTemp and LPostTemp), and one extending into the angular gyrus (LAngG). Following prior work (e.g., Blank et al., 2014), to define the right-hemisphere (RH) fROIs, the left-hemisphere (LH) parcels were transposed onto the RH, but the individual LH and RH fROIs were allowed to differ in their precise locations within the homotopic parcels. To define the MD fROIs, we used a set of 20 parcels (10 in each hemisphere) derived from a group-level probabilistic activation overlap map for the hard > easy spatial working memory contrast (Fedorenko et al., 2013) in 197 independent participants. The parcels included symmetrical regions in frontal and parietal lobes, as well as a region in the anterior cingulate cortex. All parcels are available for download from https://evlab.mit.edu/funcloc/.
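The voxel-selection step of the GSS approach (taking the top 10% most localizer-responsive voxels within a parcel, ranked by t-value) can be sketched as follows. The data structures are illustrative assumptions: voxels are represented by integer identifiers, the t-map by a dictionary, and the rounding rule for the voxel count is not specified in the text.

```python
def define_froi(t_values, parcel_voxels, top_fraction=0.10):
    """Select the most responsive voxels within one parcel.

    t_values: dict mapping voxel id -> t-value for the localizer contrast.
    parcel_voxels: iterable of voxel ids belonging to the parcel.
    Assumption: the number of voxels kept is rounded to the nearest integer.
    """
    candidates = [v for v in parcel_voxels if v in t_values]
    if not candidates:
        raise ValueError("parcel contains no voxels with a t-value")
    n_keep = max(1, round(top_fraction * len(candidates)))
    ranked = sorted(candidates, key=lambda v: t_values[v], reverse=True)
    return set(ranked[:n_keep])


# Toy example: 30 voxels whose t-value equals their id; the parcel covers 0..19.
t_map = {voxel: float(voxel) for voxel in range(30)}
parcel = range(20)
froi = define_froi(t_map, parcel)
```

Voxels 20 to 29 have the highest t-values in this toy map but lie outside the parcel, so they are never selected; the parcel constrains where the individual fROI can fall while the individual t-map determines its exact location.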
Extraction of fMRI BOLD responses
We evaluated language and MD networks’ responses by estimating response magnitudes to the conditions of the standard and speeded language localizers in the individually defined fROIs. For each fROI in each participant, we averaged the responses across voxels to get a single value per participant per fROI per condition (i.e., the sentences and nonwords conditions for the language localizer tasks, and the hard and easy conditions for the MD localizer task). The responses to the conditions used to localize the areas of interest (e.g., the responses to the sentences and nonwords conditions in the language fROIs) were estimated using an across-runs cross-validation procedure, where one run of the standard or speeded language localizer was used to define the fROI and the other run of the same localizer version was used to estimate the response magnitudes. The procedure was repeated across run partitions, switching which run was used for fROI definition vs. response estimation. Finally, the estimates were averaged to derive a single value per participant per fROI per condition.
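The across-runs cross-validation described above keeps fROI definition and response estimation independent. A minimal sketch for one parcel and one condition is given below; the function names and the dictionary-based inputs are illustrative assumptions rather than the toolbox interface.

```python
def top_voxels(t_map, parcel, fraction=0.10):
    """Voxels in the parcel with the highest localizer t-values."""
    n_keep = max(1, round(fraction * len(parcel)))
    return sorted(parcel, key=lambda v: t_map[v], reverse=True)[:n_keep]


def cross_validated_response(contrast_t_by_run, response_by_run, parcel,
                             fraction=0.10):
    """Define the fROI on one run, estimate on the other(s), then average.

    contrast_t_by_run: one dict per run, voxel id -> localizer contrast t-value.
    response_by_run: one dict per run, voxel id -> response to one condition.
    """
    n_runs = len(contrast_t_by_run)
    estimates = []
    for define_run in range(n_runs):
        froi = top_voxels(contrast_t_by_run[define_run], parcel, fraction)
        for estimate_run in range(n_runs):
            if estimate_run == define_run:
                continue  # never estimate on the run used for definition
            values = [response_by_run[estimate_run][v] for v in froi]
            estimates.append(sum(values) / len(values))
    return sum(estimates) / len(estimates)


# Toy data: 10 voxels, so the top 10% is a single voxel per run.
parcel = list(range(10))
t_run1 = {v: 0.0 for v in parcel}
t_run2 = {v: 0.0 for v in parcel}
t_run1[3] = 5.0  # run 1 peaks at voxel 3
t_run2[7] = 5.0  # run 2 peaks at voxel 7
resp_run1 = {v: 1.0 for v in parcel}
resp_run2 = {v: 1.0 for v in parcel}
resp_run1[7] = 4.0
resp_run2[3] = 2.0
estimate = cross_validated_response([t_run1, t_run2], [resp_run1, resp_run2], parcel)
```

With two runs, the procedure yields two estimates (define on run 1 and estimate on run 2, then the reverse), and their mean is the single value per participant, fROI, and condition.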
Statistical analysis
We used linear mixed effects (LME) models to test for statistical significance. All LMEs reported in the paper were implemented using the lmer function from the lme4 R package (D. Bates et al., 2015; version 1.1-31). Statistical significance testing was performed using the lmerTest package (Kuznetsova et al., 2017; version 3.1-3). The R-squared values were obtained using the r.squaredGLMM function from the MuMIn R package (version 1.47.1). The likelihood ratio tests were performed using the anova function from the lme4 R package. For each LME model reported, we provide tables (in SI 2D, 3C, 3D, 3E, 4C) with model formulae, fixed effects regression coefficients, standard errors, random effects coefficients, and p-values.
Results
We compared the fMRI BOLD responses from two versions of a language localizer task (a ‘standard language localizer’ and a ‘speeded language localizer’). The results are organized according to the following two questions: 1) Can speeded reading be used to reliably localize language-responsive areas in individual participants? 2) Does increased processing difficulty during speeded reading affect brain responses in the domain-general Multiple Demand brain network?
1. The speeded language localizer can reliably localize language-responsive areas in individual participants
1.A. The activation topography is similar between the standard and speeded language localizer versions
Twenty-four participants completed a standard language localizer task (Fedorenko et al., 2010) and a speeded localizer task. In both tasks, they silently read sentences (sentences condition) and sequences of nonwords (nonwords condition) (see Methods; fMRI tasks).
The activation maps for the sentences > nonwords contrast are visually highly similar between the standard language localizer and the speeded language localizer (Figure 1A). To quantify this similarity, we correlated voxel-wise activation patterns (restricted to the LH language parcels; see SI 2A for whole-brain correlations) across localizer runs and versions. The correlation values were Fisher-transformed and averaged across the six LH language parcels, leading to a single value for each comparison.
First, we correlated the activation patterns across the two runs within each localizer version. These values characterize the stability of the activation patterns for each version and also delimit the similarity that could be obtained between the two localizer versions. The within-localizer correlations were high for both versions: 0.881 and 0.887 for the standard and speeded versions, respectively (Figure 1B; left bars), and did not statistically differ from each other (speeded > standard; β=0.005, t=0.118, p=0.906 via linear mixed effects (LME) modeling). Next and critically, we correlated the activation patterns between the two versions of the localizer. To match the amount of data to the within-version comparisons, we correlated activations for each run of the standard version with each run of the speeded version (four pairwise combinations, given two runs of each localizer version). The between-localizer correlation, computed as the average of these four pairwise combinations, was 0.795 (Figure 1B; right bar). To statistically compare the within- vs. between-version correlations, we modeled the average within-version and between-version correlation coefficients in an LME model with a fixed effect for comparison type (within vs. between), and random intercepts for participants and parcels. Although the similarity of the activations within a given localizer version was statistically higher than between localizer versions (Figure 1B), the effect size was relatively small (within > between; β=0.089, t=2.629, p=0.009), in line with both within and between correlations being high.
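The similarity measure used in this analysis (a voxel-wise Pearson correlation per parcel, Fisher-transformed and averaged across parcels) can be sketched as follows. The function names and the dictionary layout are illustrative assumptions, and whether the averaged value is transformed back to the correlation scale for reporting is not stated in the text, so the back-transform is shown only as an option.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of voxel values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_fisher_z(maps_a, maps_b):
    """Average Fisher-transformed correlation across parcels.

    maps_a, maps_b: dict mapping parcel name -> list of voxel values, with
    voxels in the same order in both maps (for example, two runs).
    """
    z_values = [math.atanh(pearson_r(maps_a[p], maps_b[p])) for p in maps_a]
    return sum(z_values) / len(z_values)


# Toy activation values for two parcels in two runs (made-up numbers).
run_a = {"LIFG": [1.0, 2.0, 3.0, 4.0], "LMFG": [0.5, 1.5, 1.0, 3.0]}
run_b = {"LIFG": [1.0, 2.0, 3.0, 5.0], "LMFG": [0.4, 1.0, 1.2, 2.5]}
z_mean = mean_fisher_z(run_a, run_b)
r_mean = math.tanh(z_mean)  # optional back-transform to the correlation scale
```

Averaging in Fisher-transformed space is a common choice because correlation coefficients near 1 are compressed, which would otherwise bias a simple arithmetic mean.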
In a secondary analysis, we quantified the extent of voxel overlap between functional regions of interest (fROIs) using the Dice coefficient (Dice, 1945). The results mirrored the spatial correlation analyses above. The overlap between the sentences > nonwords fROIs, defined as the top 10% of language-responsive voxels, was high for both within and between comparisons (0.704 and 0.670 for the within comparisons for the standard and speeded versions, respectively; and 0.649 for the between comparison), but slightly higher across the runs within a localizer version than between localizer versions (β=0.038, t=2.453, p=0.015) (Figure 1C). For fROIs of larger size (e.g., fROIs defined as the top 20% or 30% of most language-responsive voxels), the within vs. between differences get smaller (20%: β=0.022, t=1.775, p=0.077; 30%: β=0.011, t=1.018, p=0.310), which suggests that although the peaks of the activation topographies are slightly more similar within a localizer version than between the two versions, the overall topographies are highly similar (see SI 2B for Dice coefficient comparisons across a larger range of fROI thresholds).
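For reference, the Dice coefficient between two fROIs is twice the number of shared voxels divided by the summed sizes of the two fROIs. A minimal sketch, assuming each fROI is represented as a set of voxel identifiers (an illustrative choice):

```python
def dice_coefficient(voxels_a, voxels_b):
    """Dice overlap: 2 * |A intersect B| / (|A| + |B|)."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        raise ValueError("Dice coefficient is undefined for two empty sets")
    return 2 * len(a & b) / (len(a) + len(b))


# Toy fROIs sharing two of their four voxels.
froi_run1 = {1, 2, 3, 4}
froi_run2 = {3, 4, 5, 6}
overlap = dice_coefficient(froi_run1, froi_run2)
```

The coefficient ranges from 0 (no shared voxels) to 1 (identical fROIs). Because fROIs defined as a fixed percentage of a parcel have equal sizes, the value here is simply the proportion of voxels the two fROIs share.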
1.B. The fROIs defined by the speeded language localizer respond at least as strongly and as selectively during language processing as the fROIs defined by the standard localizer
Having established that the activation topographies are similar across the localizer versions (Figure 1), we examined the magnitude of the BOLD responses for the sentences and nonwords conditions across the two versions in the fROIs defined by the standard approach of selecting top 10% of most language-responsive voxels within six broad, anatomical parcels (Figure 2A; see Methods; Extraction of fMRI BOLD responses). Figure 2B shows the average BOLD responses across the six LH language fROIs and Figure 2C shows the responses for each of the six fROIs individually. To statistically compare the two localizers, the BOLD responses were modeled in an LME with fixed effects for condition (sentences vs. nonwords) and localizer version (standard vs. speeded) and random intercepts for participants and fROIs.
As expected, the effect of condition was highly significant (sentences > nonwords, β=1.698, t=22.060, p<0.0001) (Figure 2B,C); in contrast, the main effect of localizer version was not significant (speeded > standard, β=0.011, t=0.142, p=0.887). To further examine whether the standard and speeded localizer versions differed with respect to their responses to sentences and nonwords, we used a similar LME as above but also included an interaction term between condition and localizer version. We tested for significance of the interaction using a likelihood-ratio test with a chi-square test statistic (χ2). Indeed, the interaction was significant (χ2=19.919, p<0.0001), suggesting that the responses to the two conditions (sentences and nonwords) differed between localizer versions. To better understand this difference, we examined the effect size of the sentences > nonwords contrast and found that it was greater in the speeded localizer compared to the standard localizer (speeded > standard; β=0.681, t=7.858, p<0.0001). As can be seen in Figure 2B,C, the response to the sentences condition was higher in the speeded localizer compared to the standard localizer (speeded > standard; β=0.351, t=3.054, p=0.002), and the response to the nonwords condition was lower in the speeded localizer compared to the standard localizer (speeded > standard; β=-0.329, t=-4.125, p<0.0001). Taken together, these analyses show that the fROIs defined by the speeded language localizer show a larger sentences > nonwords effect compared to the standard localizer, due to both higher responses to sentences and lower responses to nonwords in the speeded version.
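The likelihood-ratio test compares a model with and without the interaction term: the statistic is twice the difference in log-likelihoods and is referred to a chi-square distribution. The sketch below assumes 1 degree of freedom, which is what one added interaction parameter between two two-level factors would give; the degrees of freedom are not stated in the text, and the log-likelihood values in the example are hypothetical.

```python
import math


def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """Twice the log-likelihood gain of the full model over the reduced one."""
    return 2 * (loglik_full - loglik_reduced)


def chi2_p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 df.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("the statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical log-likelihoods, for illustration only.
statistic = likelihood_ratio_statistic(loglik_reduced=-1010.0, loglik_full=-1005.0)
p_value = chi2_p_value_df1(statistic)

# The statistic reported for the interaction, under the 1-df assumption.
p_reported = chi2_p_value_df1(19.919)
```

Under the 1-df assumption, the reported statistic of 19.919 corresponds to a p-value well below 0.0001, in line with the value reported above.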
We also examined the selectivity of the language fROIs defined by both the standard and the speeded localizers for language processing relative to a non-linguistic demanding cognitive task. Prior work has established that language-responsive brain areas (as defined by standard versions of the language localizer task) are highly selective for language relative to diverse non-linguistic inputs and tasks (e.g., Fedorenko et al., 2011; Ivanova et al., 2020, 2021; Chen et al., 2023; for reviews, see Fedorenko & Blank, 2020; Fedorenko et al., 2024). Here, we investigated whether the fROIs defined by the speeded language localizer exhibit a similar degree of selectivity. To do so, we collected brain responses during a spatial working memory task (see Methods; fMRI tasks) and examined BOLD response magnitudes to the hard and easy conditions in the LH language regions, defined by the standard versus speeded language localizers (Figure 2D,E). As expected given the results in Section 1, both sets of fROIs showed selectivity for language, with no response during the cognitively demanding spatial working memory task (standard localizer: hard: t=-1.059, p=0.301; easy: t=0.392, p=0.699 via two-sided, one-sample t-test against zero; speeded localizer: hard: t=0.274, p=0.786; easy: t=1.513, p=0.143). This lack of response in the language areas is in sharp contrast with the Multiple Demand areas, which respond strongly to both conditions, and show a clear hard > easy effect (SI 3A).
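For reference, the test statistic of a one-sample t-test against zero can be computed as in the sketch below. It assumes one response value per participant (e.g., averaged across fROIs), which the text does not spell out, and the example values are made up rather than taken from the study.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t, df) for a one-sample t-test of the sample mean against popmean."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean - popmean) / sem, n - 1


# Made-up per-participant responses (illustrative only).
responses = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, df = one_sample_t(responses)
```

The two-sided p-value then follows from the t distribution with n - 1 degrees of freedom, which is available in standard statistics packages (e.g., `scipy.stats.ttest_1samp`).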
Finally, in addition to the analyses reported in 1.A and 1.B above, we tested whether the BOLD response magnitudes from the fROIs defined by the standard versus speeded localizers were stable over time: across runs (SI 3B) and, for two participants who completed the localizers several times, across scanning sessions (SI 3C). This is important to know given that BOLD response magnitudes are often used in individual-differences investigations that aim to relate neural measures to behavior (e.g., Mahowald & Fedorenko, 2016; Assem et al., 2020a; Kong et al., 2020). We found that the magnitudes were indeed highly stable within participants over time.
2. Speeded sentence reading engages the domain-general Multiple Demand (MD) system more than standard reading
In addition to examining responses in the language network (Section 1.B), we investigated responses in the domain-general Multiple Demand (MD) network. This network supports computations related to goal-directed behaviors and is recruited during a broad array of cognitively demanding tasks (e.g., Duncan, 2010; Duncan et al., 2012; Fedorenko et al., 2013; Shashidhara et al., 2019; Assem et al., 2020b; Duncan et al., 2020). Of most relevance to the current investigation, the MD network appears to be engaged in some cases of effortful comprehension, including processing speech in noisy conditions (Mattys & Wiget, 2011; MacGregor et al., 2022; Liu et al., 2022), processing accented speech (Adank & Janse, 2010; Janse & Adank, 2012; Adank et al., 2012; Banks et al., 2015), processing non-native languages (Malik-Moraleda, Jouravlev et al., 2024; Wolna et al., 2024), and processing linguistic inputs that are not syntactically well-formed (Kuperberg et al., 2003; Nieuwland et al., 2012; Mollica et al., 2020; Tuckute et al., 2024; Kauf et al., 2024). However, the full range of conditions under which the MD network is recruited during language processing is not well understood, even though it is critical for understanding the contributions of this network to comprehension.
Following prior work (e.g., Malik-Moraleda, Ayyash et al., 2024), we defined MD fROIs (10 in each hemisphere; Figure 3A) using the hard > easy contrast of the spatial working memory task described in the previous section (Section 1.B; and Methods; fMRI tasks). We then examined the responses to the sentences and nonwords conditions across the two versions of the language localizer to test whether speeded reading taxes the MD network. (For validation that the MD fROIs behave as expected, i.e., show a reliably greater response to the hard spatial working memory condition compared to the easy one, see SI 3A.)
The BOLD response magnitudes for the sentences and nonwords conditions across both localizer versions are shown in Figure 3B for the average of the ten left and right hemisphere MD fROIs and Figure 3C for each hemisphere separately (see SI 4A for each of the twenty fROIs individually). In line with prior work (e.g., Fedorenko et al., 2013; Diachek, Blank, Siegelman et al., 2020), we found that the MD fROIs showed a robust nonwords > sentences effect in the standard language localizer (β=-0.275, t=-7.547, p<0.0001). In contrast, in the speeded language localizer, reading of nonwords did not engage the MD network to a greater extent than reading of sentences (sentences > nonwords, β=-0.012, t=-0.283, p=0.777). As evident from Figure 3C, some participants exhibited higher MD network engagement in the nonwords condition, whereas others exhibited the opposite pattern. To statistically compare the responses to the two localizers, the BOLD responses were modeled in an LME with fixed effects for condition (sentences vs. nonwords) and localizer version (standard vs. speeded) and random intercepts for participants and fROIs. Using likelihood ratio tests, we confirmed a significant interaction between condition and localizer version (χ2=18.274, p<0.0001), suggesting that the MD network was engaged differently by the two localizers. In particular, the MD network was more engaged in the sentences condition during the speeded localizer compared to the standard localizer (speeded > standard, β=0.200, t=4.584, p<0.0001), whereas the responses to the nonwords condition did not reliably differ between the two versions (speeded > standard, β=-0.064, t=-1.519, p=0.129). In summary, speeded sentence reading was more effortful than slower-paced reading, and under the speeded-reading conditions, no nonwords > sentences effect was observed.
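The structure of the LMEs described above can be written out as follows. The notation is illustrative and not taken from the original analysis: the symbols, the indicator coding of the two factors, and the assumption that the likelihood-ratio test compares the model with the interaction against the model without it are all ours.

```latex
% Illustrative notation (assumed): y is the BOLD response for participant p,
% fROI f, condition c, and localizer version v; c and v are indicator codes;
% u_p and w_f are random intercepts for participants and fROIs.
\begin{align}
\text{Additive model:}\quad
  y_{pfcv} &= \beta_0 + \beta_1 c + \beta_2 v + u_p + w_f + \varepsilon_{pfcv} \\
\text{Interaction model:}\quad
  y_{pfcv} &= \beta_0 + \beta_1 c + \beta_2 v + \beta_3 (c \times v) + u_p + w_f + \varepsilon_{pfcv} \\
\text{Likelihood-ratio statistic:}\quad
  \chi^2 &= 2\,\bigl(\ell_{\text{interaction}} - \ell_{\text{additive}}\bigr)
\end{align}
% with u_p \sim \mathcal{N}(0, \sigma_u^2), w_f \sim \mathcal{N}(0, \sigma_w^2),
% \varepsilon_{pfcv} \sim \mathcal{N}(0, \sigma^2), and \ell the maximized log-likelihood.
```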
Discussion
In cognitive neuroscience, there is a growing recognition of inter-individual differences in the precise functional topographies, especially in the association cortex (e.g., Brett et al., 2002; Saxe et al., 2006; Nieto-Castañón & Fedorenko, 2012; Fedorenko & Blank, 2020). We here show that a standard localizer for the language network (Fedorenko et al., 2010) can be halved in time by using speeded reading, and that the speeded-reading-based contrast is even more robust than the one based on standard-paced reading. In the remainder of the Discussion, we elaborate on these findings and their implications.
1. Robustness and generalizability of the language localizer
The standard language localizer (Fedorenko et al., 2010) investigated in our study has been widely used over the past decade (e.g., Fedorenko et al., 2010; Mahowald & Fedorenko, 2016; Braga et al., 2020; Lipkin et al., 2022; Du et al., 2024). The localizer contrasts the reading of well-formed sentences versus sequences of nonwords. The brain areas identified by this contrast have been shown to be robust across materials (e.g., Fedorenko et al., 2010) and tasks (e.g., Diachek, Blank, Siegelman et al., 2020; Ivanova et al., in prep). Moreover, this contrast generalizes well to the auditory and audio-visual presentation modalities (e.g., Fedorenko et al., 2010; Scott et al., 2017; Olson et al., 2023) and works well across typologically diverse languages (Richardson et al., 2020; Malik-Moraleda et al., 2022; Terhune-Cotter et al., 2023) and for diverse populations, including children (Ozernov-Palchik, O’Brien et al., 2024), older healthy adults (Billot, Jhingan et al., in prep), and individuals with stroke aphasia (Billot, 2023; Clercq et al., 2024; Billot et al., in prep). In the current study, we show that the reading version of the localizer is robust to presentation speed, in line with past behavioral work showing the ability to understand language at fast speeds when presented word-by-word in a rapid serial visual presentation (RSVP) paradigm (e.g., Forster, 1970; Potter et al., 1980, 1986; Potter, 2012; Mollica & Piantadosi, 2017). In the speeded version that we evaluated, each word was presented for 200 ms (compared to 450 ms in the standard localizer, i.e., a ∼56% shorter presentation duration per word), and we demonstrate that language areas in individual participants can be reliably localized using this version.
2. The speeded language localizer shows at least as strong selectivity for language relative to the control condition and a non-linguistic demanding task
In the current work, we first established that the voxel-level activation topographies were highly similar between the standard and speeded language localizers. We then demonstrated that the fROIs defined by each localizer version were highly similar in their responses to both the language condition and the control condition, and that these fROIs exhibited selectivity for language processing relative to a non-linguistic demanding spatial working memory task (e.g., Duncan, 2010; Fedorenko et al., 2013). Moreover, the speeded localizer is actually more effective than the standard version given that it better differentiates the critical language condition and the control condition. Specifically, the speeded localizer elicited a stronger response to the sentences condition, possibly due to an increase in attentional demands or processing difficulty (but see next discussion section), and a weaker response to the control condition (nonwords). The reduced response to nonwords may be due to the increased challenge of reading nonwords quickly, which in turn might reduce the accessibility of information about their phonotactic properties (e.g., Regev et al., 2024). Thus, the speeded localizer produced a response profile with at least as strong responses to language as the standard localizer. Additionally, the areas identified by the speeded localizer were selective for language relative to a non-linguistic spatial working memory task, similar to the profile of the areas identified using the standard localizer (see Fedorenko & Blank, 2020 and Fedorenko et al., 2024 for reviews).
We also found that the sentences > nonwords response magnitude was stable across runs for the speeded localizer version, similar to the standard version, which suggests that the speeded localizer can also be used in studies that relate neural markers to behavior or genetics to study individual differences (e.g., Mahowald & Fedorenko, 2016; Assem et al., 2020a; Kong et al., 2020).
3. Contributions of the Multiple Demand (MD) network to language comprehension
The Multiple Demand (MD) network is broadly implicated in cognitively demanding tasks and goal-directed action, showing strong responses to diverse executive function tasks (Duncan & Owen, 2000; Duncan, 2010; Duncan et al., 2012; Fedorenko et al., 2013; Shashidhara et al., 2019b; Assem et al., 2020b; Duncan et al., 2020) as well as during some domains of reasoning, like arithmetic reasoning (e.g., Monti et al., 2009; Fedorenko et al., 2013; Amalric & Dehaene, 2019) and understanding computer code (e.g., Ivanova et al., 2020; Liu et al., 2020). Some language tasks where comprehension/production are accompanied by task demands can also engage the MD network (e.g., Diachek, Blank, Siegelman et al., 2020). However, during naturalistic comprehension of even syntactically complex stimuli, the MD network is not engaged, and the costs of language processing are localized to the language-selective system (Diachek, Blank, Siegelman et al., 2020; Quillen et al., 2021; Shain, Blank et al., 2020; Wehbe et al., 2021; for a review, see Fedorenko & Shain, 2021).
In contrast to the costs associated with linguistic processing specifically (e.g., processing unexpected elements or non-local inter-word dependencies; Shain, Blank et al., 2020; Shain et al., 2022), some cases of effortful comprehension, even without external task demands, appear to engage the MD network. Such cases include listening to speech in noisy conditions (Mattys & Wiget, 2011; MacGregor et al., 2022; Liu et al., 2022), processing accented speech (Adank & Janse, 2010; Janse & Adank, 2012; Adank et al., 2012; Banks et al., 2015), processing sentences in non-native languages (Malik-Moraleda, Jouravlev et al., 2024; Wolna et al., 2024), and processing linguistic inputs that are not syntactically well-formed (Kuperberg et al., 2003; Nieuwland et al., 2012; Mollica et al., 2020; Tuckute et al., 2024; Kauf et al., 2024). A possible generalization about these cases is that they all involve difficulty extracting a syntactically parsable word sequence from perceptual linguistic inputs.
Here, we present another case where passive language comprehension engages the MD network: speeded reading (for earlier evidence, see Vagharchakian et al., 2012, although the evidence is indirect as no independent MD localizer is included). The MD regions’ response during the sentences condition was ∼43% higher in the speeded version compared to the standard version (cf. a much smaller difference observed in the language regions: a ∼16% increase for the speeded version). Interestingly, in some previously reported cases, the linguistic condition that engages the MD network to a greater extent elicits a lower response in the language areas. For example, Malik-Moraleda, Jouravlev et al. (2024) show that comprehension of relatively low-proficiency languages engages the MD network more strongly than higher-proficiency languages, but elicits a lower response in the language network. In contrast, the speeded sentence reading condition elicited a higher response compared to the normal-speed reading condition in both the MD network and the language network. This pattern may be taken to suggest that the generalization above (that the MD network gets engaged when it is difficult to extract a syntactically parsable word sequence from perceptual inputs) is not correct: this kind of difficulty should systematically lead to lower responses in the language network given that partially comprehensible stimuli should not be able to engage linguistic computations to the full extent (see Malik-Moraleda, Jouravlev et al., 2024 for discussion). Thus, the precise contributions of the MD network during different kinds of effortful linguistic processing remain to be determined.
Finally, given the MD network’s stronger response during the speeded sentence reading condition but a similarly strong response during the nonword reading condition, the speeded localizer does not elicit a nonwords > sentences effect in the MD regions, in contrast to the standard language localizer (Fedorenko et al., 2013; Diachek, Blank, Siegelman et al., 2020). A practical implication is that it is not possible to use the nonwords > sentences contrast in the speeded version to localize the MD network (in addition to the language network) as is sometimes done (e.g., Shain, Blank et al., 2020). Whether the time saved by the speeded language localizer version is worth this trade-off of not being able to functionally define the MD regions using the same localizer will depend on the researcher’s goals.
4. Other efforts in cognitive neuroscience to develop efficient localizers
Functional localizers increase the sensitivity, functional resolution, and interpretability of research in cognitive neuroscience, but they take up precious time during the study. As a result, there is growing interest in making localizers more efficient. There are two strategies to make a localizer shorter: i) by reducing the number of blocks or making the blocks shorter, or ii) by trying to increase the size of the critical > control effect (typically, by trying to increase the critical condition’s response magnitude). Our approach falls into the first category. In particular, by shortening the per-word presentation duration of the (visually presented) linguistic materials by ∼56% (from 450 ms to 200 ms), we shortened experimental blocks from 18 s (three 6-second trials) to 9 s (three 3-second trials). (Note that although we retained the original 14 s fixation blocks, the fixation blocks could likely be shortened to 9 s, which would shave off another 30 s from the run’s duration.) Lee et al. (2024) also took the first approach, but instead of changing the presentation speed, they showed that for a standard auditory language localizer based on the contrast of intact speech > degraded speech (as introduced in Scott et al., 2017) fewer blocks suffice for localizing the language regions.
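The timing figures above follow from simple arithmetic, sketched here using only the values given in the text (450 ms and 200 ms per word/nonword; three trials per block; 6 s and 3 s trials). The variable names are ours.

```python
STANDARD_MS = 450  # per-word/nonword duration in the standard localizer
SPEEDED_MS = 200   # per-word/nonword duration in the speeded localizer

# Fraction of per-word presentation time saved by the speeded version.
duration_reduction = 1 - SPEEDED_MS / STANDARD_MS

# Equivalent statement in terms of presentation rate (items per unit time).
rate_ratio = STANDARD_MS / SPEEDED_MS

TRIALS_PER_BLOCK = 3
standard_block_s = TRIALS_PER_BLOCK * 6  # three 6-second trials
speeded_block_s = TRIALS_PER_BLOCK * 3   # three 3-second trials
```

The ∼56% figure thus describes the reduction in per-word duration; expressed as a rate, items are presented 2.25 times as fast in the speeded version.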
The approach of trying to increase the magnitude of the critical condition requires selection of stimuli that maximally engage the system of interest. For example, Dodell-Feder et al. (2011) analyzed responses to individual stimuli in a standard Theory of Mind (ToM) network localizer (Saxe & Kanwisher, 2003) and in a large dataset of a few hundred participants identified a) a subset of the critical-condition items (false belief stories) that elicit the highest response in the ToM brain areas, and b) a subset of the control-condition items (false photograph stories) that elicit the lowest response in the ToM areas. These subsets were used to create a highly efficient ToM localizer (see Chen, Kamps et al., 2024 for a related approach). Other studies have attempted to select stimuli that would be especially exciting for particular individuals based on their interests. For example, Olson, D’Mello et al. (2023) used language materials on topics of interest to different individuals with autism and found stronger responses in the language areas with those custom-selected stimuli. Finally, with the advent of neural networks that are predictive of brain responses (e.g., Yamins et al., 2014; Schrimpf et al., 2021), it is now possible to create or select stimuli that elicit maximal responses in the target region/network (Bashivan et al., 2019; Xiao & Kreiman, 2020; Ratan Murty et al., 2021; Gu et al., 2023; Tuckute et al., 2024). To our knowledge, these advances have not yet been leveraged in the creation of efficient localizers.
In addition to increasing the efficiency of a given localizer, another recent effort is to combine several localizers into a single experiment. For example, Hutchison et al. (2024) propose a multimodal localizer with simultaneous presentation of visual and auditory stimuli to target processing of e.g., faces and scenes, as well as speech, language, and higher-level cognitive areas.
Increasing localizer efficiency in all these ways is valuable given the increasing popularity of precision imaging approaches in cognitive neuroscience (Gordon et al., 2017; Naselaris et al., 2021; Gratton & Braga, 2021; Allen et al., 2022).
Data and code availability
The scripts for running the speeded language localizer as well as the associated analyses can be found here: https://github.com/el849/speeded_language_localizer/. The data can be found on OSF: https://osf.io/2vskh/.
Supplementary Information
SI 1: Details on fMRI acquisition sequences
SI 2: Information related to Results Section 1.A
SI 2A: Whole-brain spatial correlation (supplementing language parcel correlations in Figure 1B and Figure 1C)
SI 2B: Dice coefficient between the standard and speeded language localizer versions
Note that in some participants (in particular for larger values of n), not all top n% voxels displayed a positive sentences > nonwords t-statistic. In this case, the voxels with negative t-statistic (i.e., opposite selectivity) were excluded from the Dice coefficient analyses. See the number of included voxels across the range of n in SI Table 2C.
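The overlap computation with the exclusion rule described above can be sketched as follows. The representation of voxels as list indices, the function names, and the toy t-values are assumptions for illustration and do not come from the analysis scripts.

```python
def top_nonnegative_voxels(t_values, top_fraction):
    """Indices of the top-n% voxels by t-value, excluding voxels with a negative t-statistic."""
    n_keep = max(1, round(top_fraction * len(t_values)))
    ranked = sorted(range(len(t_values)), key=lambda i: t_values[i], reverse=True)
    return {i for i in ranked[:n_keep] if t_values[i] >= 0}


def dice(set_a, set_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); returns 0.0 when both sets are empty."""
    total = len(set_a) + len(set_b)
    return 2 * len(set_a & set_b) / total if total else 0.0


# Toy sentences > nonwords t-values for the same six voxels in each localizer version.
standard_t = [4.0, 3.0, 2.0, -1.0, 0.5, -2.0]
speeded_t = [3.5, 0.2, 2.5, 1.0, -0.5, -1.5]

overlap_top50 = dice(top_nonnegative_voxels(standard_t, 0.5),
                     top_nonnegative_voxels(speeded_t, 0.5))
overlap_all = dice(top_nonnegative_voxels(standard_t, 1.0),
                   top_nonnegative_voxels(speeded_t, 1.0))
```

With larger values of n, more negative-t voxels enter the top-n% set and are dropped, so the number of voxels entering the Dice computation can be smaller than n% of the parcel.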
SI 2D: Statistics tables for Results Section 1.A
In the tables below, “SpCorr” denotes the Fisher-transformed spatial correlation coefficient. “within_between” denotes whether a spatial correlation coefficient was computed within localizer or between localizer versions. “participant” denotes each of the n=24 participants. “fROI” denotes each of the LH or RH regions of interest (six fROIs in each hemisphere).
SI 3: Information related to Results Section 1.B
SI 3A: Validation of hard > easy contrast from the MD localizer
SI 3B: Sentences > nonwords BOLD response magnitudes are highly correlated across runs for both the standard and speeded language localizer versions
To investigate how stable the sentences and nonwords BOLD responses were across individual scanning runs, we quantified the average BOLD response magnitudes of the sentences > nonwords contrast for each LH language fROI for the odd and even run of each localizer version separately. Note that independent data were used to localize the fROI (i.e., data from the odd run were used to define the fROI, and responses were extracted from the even run, and vice versa).
The odd-even correlation of the average sentences > nonwords magnitude across LH language fROIs was greater for the standard language localizer than for the speeded language localizer (SI Figure 3B, panel A). The correlation of the sentences > nonwords magnitude between odd and even runs across the six language fROIs was r = 0.78 (p<0.0001) for the standard language localizer, and r = 0.57 (p=0.0036) for the speeded language localizer. (Note that without the one outlier participant (bottom right in SI Figure 3B, panel A), the correlation for the speeded language localizer was r = 0.85, p<0.0001.)
The odd-even correlation values for individual fROIs were similarly high (SI Figure 3B, panel B): The average correlation across the six language fROIs was 0.79 (SD across fROIs: 0.10; six ps<0.005) for the standard language localizer, and 0.72 (SD: 0.08; six ps<0.005) for the speeded language localizer.
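The odd-even stability measure can be sketched as a Pearson correlation across participants between the sentences > nonwords magnitudes estimated from the two runs. The values below are made up for illustration, and the function name is ours.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up sentences > nonwords magnitudes, one value per participant.
# Each value would be extracted from one run in fROIs defined on the other run.
odd_run = [1.2, 0.8, 2.1, 1.5, 0.4]
even_run = [1.0, 0.9, 1.9, 1.7, 0.6]
r_odd_even = pearson_r(odd_run, even_run)
```

As the note on the outlier participant suggests, a correlation computed over a small sample can be sensitive to single data points, so it is worth inspecting the scatter alongside the coefficient.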
SI 3C: Consistency of localizers within participants across sessions
SI 3D: Statistics tables for Results Section 1.B (responses to language)
In the tables below, “BOLD response” denotes the BOLD response magnitude for the given condition (sentences, nonwords, or sentences > nonwords; note that “language” denotes both sentences and nonwords responses). “condition” denotes the sentences and nonwords conditions in the LMEs where they are modeled together. “version” denotes the language localizer version, either standard or speeded. “participant” denotes each of the n=24 participants. “fROI” denotes each of the six LH fROIs.
SI 3E: Language BOLD responses for right hemisphere fROIs
The statistics tables accompanying SI Figure 3E are found below.
SI 3F: Statistics tables for Results Section 1.B (responses to working memory task)
In the tables below, “BOLD response” denotes the BOLD response magnitude for the given condition (hard, easy). “version” denotes the language localizer version, either standard or speeded. “participant” denotes each of the n=24 participants. “fROI” denotes each of the six LH fROIs.
SI 4: Information related to Results Section 2 (Multiple Demand network)
SI 4A: MD responses to the language localizer versions across fROIs
SI 4B: Statistics tables for Results Section 2
In the tables below, “BOLD response” denotes the BOLD response magnitude for the given condition (sentences, nonwords; note that “language” denotes both sentences and nonwords responses). “condition” denotes the sentences and nonwords conditions in the LMEs where they are modeled together. “version” denotes the language localizer version, either standard or speeded. “participant” denotes each of the n=24 participants. “fROI” denotes each of the twenty LH/RH MD fROIs.
Footnotes
* Co-first authorship