Abstract
A small fraction of the world’s population master five or more languages. How do such polyglots represent and process their different languages, and more generally, what can this unique population tell us about the language system? We identified the language network in each of 25 polyglots (including 16 hyperpolyglots with knowledge of 10+ languages) and examined its response to the native language, languages of varying proficiency, and unfamiliar languages. We found that all languages elicit a response reliably above the perceptually matched control condition in all areas of the language network. The response magnitude across languages generally scaled with comprehension level: aside from the native language, which elicited a relatively low response, languages that were more comprehensible to the participant elicited stronger responses. This pattern held for both familiar (studied) languages and unfamiliar languages (cognate languages of high-proficiency languages elicited a stronger response than non-cognate languages). We also replicated a prior finding of weaker responses during native language processing in polyglots compared to non-polyglots. These results contribute to our understanding of how multiple languages co-exist within a single brain and provide new evidence that the language-selective network responds more strongly to stimuli from which more linguistic meaning can be extracted.
Introduction
A large fraction of the world’s population speak or sign more than one language. Not surprisingly, research in psycholinguistics and cognitive neuroscience has long been tackling questions about how bilingual minds and brains work. The research questions have ranged from how multiple languages are represented and processed (e.g., Fabbro, 2001; Lucas et al., 2004; Perani & Abutalebi, 2005; Liu & Cao, 2016) to whether language processing differs between bilingual and monolingual individuals (e.g., Kovelman et al., 2008; Jones et al., 2012; Grundy et al., 2017; Pliatsikas et al., 2020; Arredondo et al., 2022), to whether bilingualism confers advantages outside of the language domain (e.g., Bialystok et al., 2016; Cespón & Carreiras, 2020; Blanco-Elorrieta & Caramazza, 2021).
Although a handful of studies have investigated individuals who speak three or more languages (e.g., Vingerhoets et al., 2003; Briellmann et al., 2004; Jeong et al., 2007; Lemhöfer et al., 2010; Videsott et al., 2010; De Bruin et al., 2014; Yazbek et al., 2020), most past work has focused on bilinguals. Yet bilinguals are not an ideal test population for some research questions. First, by definition, research on bilinguals is limited to asking questions about two languages, and some findings about the representation and processing of two languages may not generalize to the representation and processing of multiple languages. For example, although many have argued that in bilinguals, the two languages draw on the same brain areas (e.g., Illes et al., 1999; Roux & Trémoulet, 2002; Klein et al., 2006; Sulpizio et al., 2020), it is possible that additional languages (perhaps the later-acquired ones or the lower-proficiency ones) would not show the same pattern and would instead draw on a system that supports the acquisition of novel cognitive skills later in life (e.g., Abutalebi, 2008; Liu et al., 2010; Yazbek et al., 2020). And second, in many cases, comparisons between a bilingual’s two languages involve a comparison between a native language (L1) and a non-native language acquired later in life (L2). Because one’s native language may have a privileged status in how it is represented and processed (e.g., Cutler, 2012; Keysar et al., 2012; Pierce et al., 2014), such comparisons may be difficult to interpret when trying to understand, for example, how proficiency influences neural responses (i.e., is L2 eliciting a weaker/stronger response than L1 because the individual is less proficient in it or because it is not the individual’s native language?).
Furthermore, past cognitive neuroscience research on bilingualism has suffered from a number of limitations. First, much past work has relied on the traditional group-averaging fMRI approach (e.g., comparing group-level activation maps for L1 vs. L2), which suffers from low sensitivity, low functional resolution, and low interpretability (e.g., Saxe et al., 2006; Nieto-Castañón & Fedorenko, 2012; Fedorenko, 2021). And second, many studies have relied on paradigms that conflate language processing and general task demands, which recruit distinct brain networks: the language-selective network (Fedorenko et al., 2011) and the domain-general Multiple Demand network (e.g., Duncan, 2010, 2013), respectively. As a result, overlap between languages may reflect similar task demands rather than something about language representation/processing specifically.
To address these limitations, we here turn to polyglots and hyperpolyglots, who have some proficiency in at least 5 languages (range: 5-54). Using robust individual-level fMRI analyses and an extensively validated language network ‘localizer’ paradigm (e.g., Fedorenko et al., 2010; Lipkin et al., 2022; Malik-Moraleda, Ayyash et al., 2022), we ask two questions that have been mostly probed in bilinguals and that remain debated (e.g., Costa & Sebastián-Gallés, 2014; Sulpizio et al., 2020; Pascual et al., 2023). First, we ask whether multiple languages (including the native language and three other languages of varying proficiency) all draw on the frontotemporal language-selective network. And second, we ask how proficiency affects the magnitude of neural response in the language areas. In addition, we ask a novel question about neural responses to unfamiliar languages that are cognates (or ‘sister languages’; e.g., Campbell, 2017) of the participants’ native/high-proficiency languages. Because cognate languages descend from a common ancestral language, they overlap (often substantially) in vocabulary, so participants should be able to extract some meaning from them (e.g., Gooskens et al., 2017). We tested whether this level of comprehension would manifest as a stronger response in the language areas compared to completely unfamiliar non-cognate languages, which should be incomprehensible. Finally, in line with increasing emphasis on robustness and replicability in cognitive neuroscience (e.g., Poldrack et al., 2017), we attempt to replicate a recent finding of lower neural responses during native language processing in polyglots compared to non-polyglots (Jouravlev et al., 2021).
To foreshadow our results, all languages, including unfamiliar ones, elicit a robust response in the language-selective brain network relative to a perceptually matched control condition. Aside from the native language, which elicits the lowest-magnitude response of the four familiar languages, responses to different languages scale with comprehensibility. This pattern of results suggests that the amount of neural activity in the language network reflects the level of understanding (i.e., how much meaning the participant can extract from the linguistic input). The native language, at least in this population, constitutes an exception, however, eliciting a lower response than would be expected based on comprehensibility alone.
Methods
Participants
Polyglots were defined as individuals who (a) have some proficiency (beyond basic proficiency) in at least 5 languages (i.e., their native language and four other languages), and (b) have advanced proficiency in at least one language other than their native language. Participants assessed their own proficiency in listening, speaking, reading, and writing abilities in each language they have some familiarity with, on a scale from 0=no knowledge to 5=native/native-like proficiency. The four scores (for listening, speaking, reading, and writing) were summed to obtain an overall proficiency score for each language.
Twenty-six polyglots were recruited by word of mouth. Most participants resided in the Boston area, but a few traveled from other cities in the US. One participant’s data were excluded due to excessive motion. The remaining 25 participants were between 19 and 67 years of age at the time of testing (M=36.68 years; SD=13.05). Most (19 of the 25) were native speakers of English; the remaining six were native speakers of French (n=2), Dutch (n=1), German (n=1), Mandarin (n=1), and Spanish (n=1) and proficient speakers of English. Fourteen participants were male, and 22 were right-handed (all participants showed typical, left-lateralized language activations).
The mean number of languages spoken/signed with some level of proficiency was 16.6 (median=11, range: 5-54 languages; Table S1). The mean self-rated proficiency for their most proficient language was 20 out of the maximum score of 20 (SD=0), 18.4 for their second most proficient language (SD=2.24, range: 12-20), 15.6 for their third most proficient language (SD=3.56, range: 8-20), 13.4 for their fourth most proficient language (SD=4.07, range: 7-20), 12.0 for their fifth most proficient language (SD=3.72, range: 6-20), and, for individuals with proficiency in more than five languages, 10.8 for their sixth most proficient language (SD=3.61, range: 6-18). Thus, in addition to having native-like proficiency in their second most proficient language, most of these individuals had quite high proficiency in their third and fourth most proficient languages, and some in their fifth and sixth most proficient languages (Table S1).
For one analysis, we additionally used a previously published dataset of eighty-six bilingual non-polyglot individuals, who were native speakers of diverse languages and had completed a language listening task in their native language (Malik-Moraleda, Ayyash et al., 2022). Half of the participants (n=43) were male. Participants were between 19 and 45 years of age at the time of testing (M=27.52, SD=5.49). All participants were right-handed and showed typical, left-lateralized language activations.
All participants gave informed consent in accordance with the requirements of the Committee on the Use of Humans as Experimental Subjects at MIT and were paid for their participation.
Experimental Design and Materials
Each participant completed a localizer for the language network (Fedorenko et al., 2010) and the critical multi-language listening task. Some participants completed one or two additional tasks for unrelated studies. The entire scanning session lasted approximately 2 hours.
Language localizer
Participants passively read sentences and lists of pronounceable nonwords in a blocked design. The Sentences>Nonwords contrast targets brain regions that are sensitive to high-level linguistic processing, including the understanding of word meanings and combinatorial phrase-structure building (Fedorenko et al., 2010). In prior work, we had established the robustness of the language localizer to changes in the materials, task, timing parameters, and other aspects of procedure (e.g., Fedorenko et al., 2010; Fedorenko, 2014; Mahowald & Fedorenko, 2016; Scott et al., 2017; Cheung et al., 2020; Malik-Moraleda et al., 2022). We used English materials in this localizer for all participants (see Malik-Moraleda, Ayyash et al., 2022 for evidence that this localizer works well for non-native but proficient speakers of English). Each trial started with 100 ms pre-trial fixation, followed by a 12-word-long sentence or a list of 12 nonwords presented on the screen one word/nonword at a time at the rate of 450 ms per word/nonword. Then, a line drawing of a hand pressing a button appeared for 400 ms, and participants were instructed to press a button whenever they saw this icon. Finally, a blank screen was shown for 100 ms, for a total trial duration of 6 s. The simple button-pressing task was included to help participants stay awake and focused. Each block consisted of 3 trials and lasted 18 s. Each run consisted of 16 experimental blocks (8 per condition) and five fixation blocks (14 s each), for a total duration of 358 s (5 min 58 s). Each participant performed two runs. Condition order was counterbalanced across runs.
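The timing figures above can be checked with a short worked calculation. All values are taken from the description of the localizer; nothing is assumed beyond adding them up.

```latex
\begin{align*}
t_{\text{trial}} &= 100 + 12 \times 450 + 400 + 100 = 6000\ \text{ms} = 6\ \text{s} \\
t_{\text{block}} &= 3 \times 6\ \text{s} = 18\ \text{s} \\
t_{\text{run}}   &= 16 \times 18\ \text{s} + 5 \times 14\ \text{s} = 288\ \text{s} + 70\ \text{s} = 358\ \text{s}
\end{align*}
```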
Multi-language listening task
Participants listened to brief passages and to control, acoustically scrambled versions of those passages (see below for details) in eight languages in a blocked design. The eight languages were selected separately for each participant based on their linguistic background (Table S2) and included (a) the participant’s native language (L1), (b) three non-native languages that the participant was somewhat proficient in (L2, L3, and L4), (c) two unfamiliar languages that were cognates of the languages that the participant had familiarity with, and (d) two languages that the participant was completely unfamiliar with. For the familiar non-native languages, L2 was the language that participants reported being most proficient in after their native language. L3 and L4 were chosen so that participants were somewhat proficient in them, but these were not always the next most proficient languages due to the limitations on the languages for which the experimental materials were available. For the languages that we used in this experiment, the mean self-rated proficiency was 18.5 out of the maximum score of 20 for L2 (SD=1.92, range: 12-20), 13.5 for L3 (SD=3.72, range: 6-20), and 9.96 for L4 (SD=3.22, range: 4-16). Across participants, materials from 29 languages were used (Table S2).
Two sets of materials were used in this experiment. One set (used for n=18 participants) came from the publicly available corpus of Bible audio stories (https://www.biblegateway.com/resources/audio/), which consists of Bible-based stories that are narrated by native speakers of different languages. This set included materials for 25 languages (Arabic, Bangla, Basque, Dutch, English, French, Georgian, German, Hausa, Hebrew, Hindi, Indonesian, Italian, Japanese, Mandarin, Romanian, Russian, Persian, Spanish, Swahili, Thai, Turkish, Wolof, Xhosa, and Yiddish). The other set (used for n=7 participants) used passages from Alice in Wonderland (Carroll, 1865) that are narrated by native speakers and are based on work by Malik-Moraleda, Ayyash et al. (2022) (https://evlab.mit.edu/aliceloc/). This set included materials for 46 languages (Afrikaans, Arabic, Armenian, Assamese, Basque, Belarusian, Bulgarian, Catalan, Czech, Danish, Dutch, English, Farsi, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Mandarin, Marathi, Nepali, Norwegian, Polish, Portuguese, Romanian, Russian, Serbo-Croatian, Slovene, Spanish, Swahili, Swedish, Tagalog, Tamil, Telugu, Turkish, Ukrainian and Vietnamese).
For both the Bible stories and Alice in Wonderland materials, for the critical condition, a set of 8 audio clips was selected, each 16 s long. The control condition was created using a sound “quilting” procedure (Overath et al., 2015). To do so, for each language, a 1-1.5 min clip of continuous speech was selected (from the same respective corpus: the Bible stories corpus or the Alice in Wonderland materials). These clips were divided into 30 ms segments, and the segments were then re-ordered in such a way that (a) segment-to-segment cochleogram changes match the original signal as closely as possible, and (b) boundary artifacts are minimized. Eight quilted clips, each 16 s long, were created for each language. Finally, for all the resulting intact and quilted clips, sound intensity was normalized, and each clip was further edited to include brief (1 s long) fade-in and fade-out periods at the beginning and end, respectively. Intensity normalization and fade-in/fade-out editing were performed using the Audacity software (Audacity, 2014) for the Bible stories materials and MATLAB (Mathworks, 2020) for the Alice in Wonderland materials. All the materials used in this experiment are available at: https://osf.io/3he75/. The experimental script is available upon request. For each participant, the full set of materials (64 intact and 64 quilted clips) was divided into 8 sets, which corresponded to 8 scanning runs. Each run consisted of 16 experimental blocks (8 intact clips with one clip per language and 8 quilted clips with one clip per language) and 3 fixation blocks each lasting 12 s, for a total run duration of 292 s (4 min 52 s). Each participant performed 8 runs. The order of conditions (intact, quilted) and languages was counterbalanced across runs and participants.
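As a sanity check on the design described above, the clip counts and run durations can be derived from a handful of constants. The sketch below uses only numbers stated in the text; the variable and function names are illustrative and do not come from the experimental script.

```python
# All constants are taken from the task description; the names are illustrative.
N_LANGUAGES = 8          # languages per participant
N_RUNS = 8               # scanning runs per participant
CLIP_S = 16              # duration of each intact or quilted clip, in seconds
N_FIXATION_BLOCKS = 3    # fixation blocks per run
FIXATION_S = 12          # duration of each fixation block, in seconds


def listening_task_summary():
    """Derive clip counts and durations from the design constants."""
    blocks_per_run = 2 * N_LANGUAGES  # one intact and one quilted clip per language
    run_s = blocks_per_run * CLIP_S + N_FIXATION_BLOCKS * FIXATION_S
    return {
        "intact_clips": N_LANGUAGES * N_RUNS,
        "quilted_clips": N_LANGUAGES * N_RUNS,
        "blocks_per_run": blocks_per_run,
        "run_duration_s": run_s,
        "task_duration_s": run_s * N_RUNS,
    }


summary = listening_task_summary()
print(summary)
```

The derived values agree with the text: 64 intact and 64 quilted clips per participant, 16 experimental blocks per run, and a run duration of 292 s.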
fMRI data acquisition
Structural and functional data were collected on the whole-body 3 Tesla Siemens Trio scanner with a 32-channel head coil at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. T1-weighted structural images were collected in 179 sagittal slices with 1 mm isotropic voxels (TR=2,530 ms, TE=3.48 ms). Functional, blood oxygenation level dependent (BOLD) data were acquired using an EPI sequence (with a 90° flip angle and using GRAPPA with an acceleration factor of 2), with the following acquisition parameters: thirty-one 4 mm thick near-axial slices, acquired in an interleaved order with a 10% distance factor; 2.1 mm x 2.1 mm in-plane resolution; field of view of 200 mm in the phase encoding anterior to posterior (A>P) direction; matrix size of 96 x 96 voxels; TR of 2,000 ms; and TE of 30 ms. Prospective acquisition correction (Thesen et al., 2000) was used to adjust the positions of the gradients based on the participant’s motion one TR back. The first 10 s of each run were excluded to allow for steady-state magnetization.
fMRI data preprocessing
fMRI data were analyzed using SPM12 (release 7487), the CONN EvLab module (release 19b), and other custom MATLAB scripts. Each participant’s functional and structural data were converted from DICOM to NIFTI format. All functional scans were coregistered and resampled using B-spline interpolation to the first scan of the first session (Friston et al., 1995). Potential outlier scans were identified from the resulting subject-motion estimates as well as from BOLD signal indicators using default thresholds in the CONN preprocessing pipeline (5 standard deviations above the mean in global BOLD signal change, or framewise displacement values above 0.9 mm; Nieto-Castañón, 2020). Functional and structural data were independently normalized into a common space (the Montreal Neurological Institute [MNI] template; IXI549Space) using the SPM12 unified segmentation and normalization procedure (Ashburner & Friston, 2005) with a reference functional image computed as the mean functional data after realignment across all timepoints omitting outlier scans. The output data were resampled to a common bounding box between MNI-space coordinates (−90, −126, −72) and (90, 90, 108), using 2 mm isotropic voxels and 4th order spline interpolation for the functional data, and 1 mm isotropic voxels and trilinear interpolation for the structural data. Last, the functional data were smoothed spatially using spatial convolution with a 4 mm FWHM Gaussian kernel.
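Two quantities implied by the resampling and smoothing steps can be made concrete with a small sketch: the size of the resampled voxel grid and the standard deviation of the smoothing kernel. The grid computation assumes that both corners of the bounding box are voxel centres and are included in the grid; the function names are illustrative and are not part of SPM12 or CONN.

```python
import math


def grid_shape(corner_min, corner_max, voxel_mm):
    """Number of voxels along each axis, assuming both corners are voxel centres."""
    return tuple(
        int(round((hi - lo) / voxel_mm)) + 1
        for lo, hi in zip(corner_min, corner_max)
    )


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full width at half maximum to a standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


BOX_MIN = (-90, -126, -72)   # MNI coordinates from the text, in mm
BOX_MAX = (90, 90, 108)

functional_grid = grid_shape(BOX_MIN, BOX_MAX, voxel_mm=2.0)
structural_grid = grid_shape(BOX_MIN, BOX_MAX, voxel_mm=1.0)
sigma_mm = fwhm_to_sigma(4.0)

print(functional_grid, structural_grid, round(sigma_mm, 3))
```

Under these assumptions the functional data occupy a 91 x 109 x 91 grid, the structural data a 181 x 217 x 181 grid, and the 4 mm FWHM kernel corresponds to a standard deviation of roughly 1.7 mm.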
First-level modeling
Effects were estimated using a General Linear Model (GLM) in which each experimental condition was modeled with a boxcar function convolved with the canonical hemodynamic response function (HRF) (fixation was modeled implicitly, such that all timepoints that did not correspond to one of the conditions were assumed to correspond to a fixation period). Temporal autocorrelations in the BOLD signal timeseries were accounted for by a combination of high-pass filtering with a 128 s cutoff, and whitening using an AR(0.2) model (first-order autoregressive model linearized around the coefficient a=0.2) to approximate the observed covariance of the functional data in the context of Restricted Maximum Likelihood estimation (ReML). In addition to experimental condition effects, the GLM design included first-order temporal derivatives for each condition (included to model variability in the HRF delays), as well as nuisance regressors to control for the effect of slow linear drifts, subject-motion parameters, and potential outlier scans on the BOLD signal.
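The core of this model, a boxcar convolved with a hemodynamic response function and fitted by least squares, can be sketched in a few lines. This is a simplified illustration, not the SPM12 implementation: it uses a double-gamma HRF sampled once per TR, hypothetical block onsets, noiseless simulated data, and ordinary least squares, and it leaves out the temporal derivatives, high-pass filtering, AR whitening, and nuisance regressors described above.

```python
import math

import numpy as np

TR = 2.0  # seconds, from the acquisition parameters


def double_gamma_hrf(tr, length_s=32.0):
    """Double-gamma HRF sampled every `tr` seconds (peak near 5 s, undershoot near 15 s)."""
    t = np.arange(0.0, length_s, tr)
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def block_regressor(onsets_s, duration_s, n_scans, tr=TR):
    """Boxcar for the given blocks, convolved with the HRF and cut to the run length."""
    boxcar = np.zeros(n_scans)
    for onset in onsets_s:
        start = int(round(onset / tr))
        stop = int(round((onset + duration_s) / tr))
        boxcar[start:stop] = 1.0
    return np.convolve(boxcar, double_gamma_hrf(tr))[:n_scans]


def fit_glm(y, regressors):
    """Ordinary least squares with one column per condition plus a constant term."""
    X = np.column_stack(list(regressors) + [np.ones(len(y))])
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas


# Hypothetical onsets (in seconds) for two conditions in a 292 s run.
n_scans = 146
intact = block_regressor([12, 76, 140, 204], 16, n_scans)
quilted = block_regressor([28, 92, 156, 220], 16, n_scans)

# Noiseless simulated voxel: stronger response to intact than to quilted speech.
y = 1.5 * intact + 0.5 * quilted + 100.0
betas = fit_glm(y, [intact, quilted])
print(np.round(betas, 3))
```

Because fixation is not given its own regressor, it is absorbed by the constant term, which mirrors the implicit modeling of fixation described above.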
Language fROI definition and response estimation
For each participant, functional regions of interest (fROIs) were defined using the Group-constrained Subject-Specific (GSS) approach (Fedorenko et al., 2010), whereby a set of parcels or “search spaces” (i.e., brain areas within which most individuals in prior studies showed activity for the localizer contrast) is combined with each individual participant’s activation map for the same or similar contrast.
To define the language fROIs, we used five parcels derived from a group-level representation of data for the Sentences>Nonwords contrast in 220 independent participants (Figure 2a). These parcels were used in much prior work (e.g., Pereira et al., 2018; Fedorenko et al., 2020; Malik-Moraleda, Ayyash et al., 2022; Hu, Small et al., 2022) and included three regions in the left frontal cortex: two located in the inferior frontal gyrus (LIFG and LIFGorb), and one located in the middle frontal gyrus (LMFG); and two regions in the left temporal cortex spanning the entire extent of the lateral temporal lobe (LAntTemp and LPostTemp). Individual fROIs were defined by selecting within each parcel the 10% of most localizer-responsive voxels based on the t-values for the Sentences>Nonwords contrast. The responses of these individually defined fROIs to the 16 conditions of the critical task were then extracted, averaging across the voxels in each fROI. The responses to the eight quilted control conditions were further averaged in order to obtain a more robust control condition baseline.
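The voxel-selection step can be illustrated with a minimal sketch. It assumes that the localizer t-map, the parcel mask, and the critical-task effect map are already available as arrays in the same space; the arrays below are random placeholders, and the function names are illustrative rather than taken from the analysis toolbox.

```python
import numpy as np


def define_froi(t_map, parcel_mask, top_fraction=0.10):
    """Select the top `top_fraction` of voxels within a parcel, ranked by localizer t-value."""
    parcel_idx = np.flatnonzero(parcel_mask)
    n_keep = max(1, int(round(top_fraction * parcel_idx.size)))
    t_in_parcel = t_map.ravel()[parcel_idx]
    top = parcel_idx[np.argsort(t_in_parcel)[-n_keep:]]
    froi = np.zeros(t_map.size, dtype=bool)
    froi[top] = True
    return froi.reshape(t_map.shape)


def froi_response(effect_map, froi):
    """Average a critical-task effect map across the voxels of an fROI."""
    return float(effect_map[froi].mean())


rng = np.random.default_rng(0)
shape = (10, 10, 10)
t_map = rng.normal(size=shape)          # placeholder for a Sentences>Nonwords t-map
effect_map = rng.normal(size=shape)     # placeholder for one critical-task condition
parcel_mask = np.zeros(shape, dtype=bool)
parcel_mask[:5] = True                  # placeholder parcel of 500 voxels

froi = define_froi(t_map, parcel_mask)
print(froi.sum(), froi_response(effect_map, froi))
```

Note that the voxels are selected with the localizer data and the response is read out from the critical task, so the data used to define an fROI are independent of the data used to estimate its response.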
Additionally, we examined activations in the right hemisphere (RH) homotopes of the language regions. To define the fROIs in the RH, the left hemisphere parcels were mirror-projected onto the RH to create five homotopic parcels. By design, the parcels cover relatively large swaths of cortex in order to be able to accommodate inter-individual variability. Hence the mirrored versions are likely to encompass RH language regions despite possible hemispheric asymmetries in the precise locations of activations (for validation, see Blank et al., 2014; Mahowald & Fedorenko, 2016; Lipkin et al., 2022; Shain, Paunov, Chen et al., 2022).
Statistical Analyses
Does the polyglots’ language network respond to all their languages?
To evaluate whether different languages of polyglot individuals all engage the same brain network, we examined the responses in the language network to each of the languages in which the individuals had some proficiency (L1, L2, L3 and L4) as well as to the four unfamiliar languages, as discussed in Methods. To do so, for each language, we fitted a model that predicted the BOLD response of the LH language network to that language with participants and fROIs modeled as random intercepts. Dummy coding was used, with the quilted control condition used as the reference level:
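A model of this form can be written out as follows. The expression is reconstructed from the verbal description above rather than quoted from the analysis scripts, and the symbols are illustrative notation.

```latex
\begin{equation*}
y_{prc} = \beta_0 + \beta_1\, x_c + u_p + v_r + \varepsilon_{prc},
\qquad u_p \sim \mathcal{N}(0, \sigma_u^2),\quad
v_r \sim \mathcal{N}(0, \sigma_v^2),\quad
\varepsilon_{prc} \sim \mathcal{N}(0, \sigma^2)
\end{equation*}
```

Here $y_{prc}$ is the BOLD response of participant $p$ in fROI $r$ under condition $c$, $x_c$ equals 0 for the quilted control condition (the reference level) and 1 for the intact language condition, and $u_p$ and $v_r$ are the random intercepts for participants and fROIs. Under this coding, $\beta_1$ estimates the difference between the response to the language and the response to the quilted control.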
In most analyses, we treat the language network as an integrated whole given that these regions a) have similar functional profiles with respect to their selectivity for language (e.g., Fedorenko et al., 2011; Fedorenko & Blank, 2020) and their role in lexico-semantic and combinatorial processing during language comprehension and production (e.g., Fedorenko et al., 2010, 2016, 2020; Blank et al., 2016; Bautista & Wilson, 2016; Hu, Small et al., 2022) and b) exhibit strong inter-region correlations in their activity during naturalistic cognition paradigms (e.g., Blank et al., 2014; Paunov et al., 2019; Braga et al., 2020). However, for some analyses, we additionally fitted models that predicted the BOLD response in each language fROI separately in order to explore potential differences between fROIs.
How does proficiency (comprehensibility) affect the magnitude of response in the polyglots’ language network?
To compare the magnitudes of response in the language network to the four languages for which the participants had different proficiencies (L1-L4), we fitted a model that predicted the BOLD response in the language network with Language (L1, L2, L3, L4) modeled as a fixed effect and participants and fROIs modeled as random intercepts. Effect coding was used for the contrasts in order to make three comparisons: L1 vs. L2, L2 vs. L3, and L3 vs. L4:
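The exact contrast matrix is not spelled out in the text. One coding that yields precisely these three comparisons is a successive-differences scheme, sketched below with fixed effects only (the random intercepts for participants and fROIs are omitted for brevity). The coding matrix is obtained by inverting a matrix whose rows state the desired hypotheses; the input values are the rounded condition means from the Results, used purely as toy data.

```python
import numpy as np

# Each row states what the corresponding coefficient should estimate,
# expressed as weights on the four condition means (L1, L2, L3, L4).
HYPOTHESES = np.array([
    [0.25, 0.25, 0.25, 0.25],   # intercept: grand mean
    [1.0, -1.0, 0.0, 0.0],      # L1 vs. L2
    [0.0, 1.0, -1.0, 0.0],      # L2 vs. L3
    [0.0, 0.0, 1.0, -1.0],      # L3 vs. L4
])

# Rows of CODING are the design-matrix rows for L1..L4.
CODING = np.linalg.inv(HYPOTHESES)


def successive_difference_fit(levels, y):
    """Least-squares fit whose coefficients are the grand mean and adjacent differences."""
    one_hot = np.eye(4)[levels]
    X = one_hot @ CODING
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas


# Toy data: five noiseless observations per language at the rounded reported means.
means = np.array([1.11, 1.86, 1.71, 1.38])
levels = np.tile(np.arange(4), 5)
y = means[levels]

betas = successive_difference_fit(levels, y)
print(np.round(betas, 2))
```

With this toy input the three contrast coefficients are simply the differences between adjacent condition means. Estimates from the full mixed-effects model are fitted to unrounded data with random intercepts and therefore need not coincide exactly with these differences.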
How does the polyglots’ language network respond to unfamiliar cognate (sister) languages of the languages familiar to them?
To test whether the processing of cognate languages results in a stronger response than the processing of unfamiliar non-cognate languages (while eliciting a lower response than familiar languages), we fitted a model that predicted the BOLD response in the language network with Condition (Cognates, Familiar, Unfamiliar) modeled as a fixed effect and participants and fROIs modeled as random intercepts. Dummy coding was used, with the Cognates condition used as the reference level:
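Written out, a model of this form and the interpretation of its coefficients under dummy coding are as follows. The notation is illustrative and reconstructed from the verbal description above.

```latex
\begin{align*}
y_{prc} &= \beta_0 + \beta_F\, x_F + \beta_U\, x_U + u_p + v_r + \varepsilon_{prc} \\
\beta_0 &\approx \bar{y}_{\text{Cognates}}, \qquad
\beta_F \approx \bar{y}_{\text{Familiar}} - \bar{y}_{\text{Cognates}}, \qquad
\beta_U \approx \bar{y}_{\text{Unfamiliar}} - \bar{y}_{\text{Cognates}}
\end{align*}
```

Here $x_F$ and $x_U$ are indicator variables for the Familiar and Unfamiliar conditions, and $u_p$ and $v_r$ are the random intercepts for participants and fROIs. A positive $\beta_F$ together with a negative $\beta_U$ would indicate that cognate languages elicit a response between that to familiar languages and that to unfamiliar non-cognate languages.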
Does the polyglots’ language network respond less strongly during native language processing than the language network of non-polyglots?
To our knowledge, the only past functional brain imaging investigation that focused on polyglots is Jouravlev et al. (2021) (cf. Amunts et al., 2004; Hervais-Adelman et al., 2018; Palmann & Golestani, 2020 for past anatomical investigations of polyglot brains; see Hervais-Adelman et al., 2015 for an fMRI study that included some polyglot participants). Jouravlev and colleagues reported that the language network of polyglots responds less strongly during native language processing compared to both a set of pairwise-matched controls and a larger group of control participants. In their study, which used a subset of the individuals (13 of the 25) tested in the current study, Jouravlev and colleagues used the sentence condition in the reading-based language localizer task (Fedorenko et al., 2010). Here, we attempted to replicate this finding using responses to auditory language in a larger set of polyglots. To do so, we compared the responses in the language network to passages in one’s native language between our polyglot participants and a relatively large set of bilingual non-polyglot participants (n=86) who had previously completed a similar passage listening task in their native language (reported in Malik-Moraleda, Ayyash et al., 2022; data available at https://osf.io/b9c4z). We fitted a model that predicted the BOLD response in the language network to the native language condition (relative to fixation), with Group (polyglot, control) modeled as a fixed effect and participants and fROIs modeled as random intercepts:
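In illustrative notation, reconstructed from the description above rather than quoted from the analysis scripts, the group comparison can be written as:

```latex
\begin{equation*}
y_{pr} = \beta_0 + \beta_1\, g_p + u_p + v_r + \varepsilon_{pr}
\end{equation*}
```

Here $y_{pr}$ is the response of participant $p$ in fROI $r$ to the native language relative to fixation, and $g_p$ is an indicator of group membership. Assuming that the control group is the reference level ($g_p = 0$ for controls and $g_p = 1$ for polyglots), a negative $\beta_1$ corresponds to a weaker response in polyglots.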
Results
1. The polyglots’ language network responds to all their languages
As can be seen in Figure 1 (see also Figures S1 and S2), all four familiar languages (L1 (Native language), L2, L3, and L4) elicited activation in the left frontal and temporal cortical areas. Each language elicited a reliable response in the (individually defined) language network relative to the quilted control condition (ps<0.001). Furthermore, cognate languages and unfamiliar noncognate languages also elicited a reliable response relative to the quilted control (ps<0.001). This response profile was also largely apparent in each fROI individually (Figure S3) and it was similar when the language fROIs were defined using the L1 > Quilted contrast from the critical task (Figure S4).
2. Aside from their native language, the polyglots’ language network responds more strongly to languages that are more comprehensible to the polyglot
The response magnitudes varied across languages (Figure 1b), and this across-condition pattern was robust across scanning runs and across participants (Figure S5).
First, the native language condition elicited a relatively low response compared to the other familiar languages (L2, L3, and L4). The response to L1 is reliably lower than the response to L2 (1.11 vs. 1.86% BOLD signal change relative to fixation; β = −0.75, p<0.001). Interestingly, examination of the individual fROI profiles (Figure S3) reveals that this difference is more pronounced in the frontal compared to the temporal fROIs: in the frontal ROIs, the magnitude of response to the native language is comparable to that to unfamiliar non-cognate languages. Although the condition (L1, L2) by ROI group (frontal, temporal) interaction is not significant (p=0.068), this qualitative pattern suggests that in this population, native language processing is carried out more locally within the temporal lobe, with only minimal contribution from the frontal areas (cf. Figures 2b, S8 for evidence of strong frontal responses during native language processing in non-polyglots).
Second, the response to the three non-native familiar languages showed a gradient, with a stronger response to languages for which the polyglot reported higher proficiency. The response to L2 is numerically higher than the response to L3 (1.86 vs. 1.71; β = 0.14, n.s.), and the response to L3 is reliably higher than the response to L4 (1.71 vs. 1.38; β = 0.32, p=0.02). (This pattern also held when excluding the participants (n=3) who were early bilinguals in their L1 and L2 (Figure S6).)
And third, the unfamiliar cognate languages elicited a lower response than familiar languages (1.17 vs. 1.65% BOLD signal change relative to fixation; β = 0.48, p<0.001), in line with the proficiency result observed for the familiar languages. However, in spite of no past experience with any of the unfamiliar languages, cognates elicited a stronger response than the unfamiliar non-cognate languages (1.17 vs. 0.61% BOLD signal change relative to fixation; β = −0.56, p<0.001; this pattern also holds when excluding subjects (n=2) for whom the right cognate language could not be selected based on the material availability or other criteria (Figure S7)).
The pattern of response to the different languages was generally similar in the RH homotopic network (Figure S8), but the magnitudes did not differentiate familiar and unfamiliar languages as clearly due to a relatively stronger response to the unfamiliar languages, compared to the LH language network.
3. Replication of Jouravlev et al. (2021): the polyglots’ language network responds less strongly during native language processing than the language network of non-polyglots
As can be seen in Figure 2 (see also Figure S9), we successfully replicate Jouravlev et al.’s (2021) finding in the auditory modality: polyglots showed a lower response in their language network while listening to their native language than a control group of 86 bilingual non-polyglots (1.10 vs. 2.45% BOLD signal change relative to the fixation baseline; β = −1.32, p<0.001; see Figure S10 for the results for the subset of 12 polyglots that were not included in Jouravlev et al., 2021). These results held in each of the five language fROIs (ps<0.001; Table S3).
Discussion
The vast majority of humans grow up speaking or signing one or two languages. However, a small fraction of the population master a large number (sometimes, several dozen) of languages. Although this phenomenon of polyglotism is not new (e.g., Erard, 2012), very few past studies have attempted to characterize the minds and brains of such individuals (e.g., Papagno & Vallar, 1995; Paradis, 2001; Amunts et al., 2004; Hervais-Adelman et al., 2015, 2018). The only prior fMRI study that has investigated the language system of polyglots (Jouravlev et al., 2021) focused on comparing polyglots and non-polyglots. Building on that work, we here took a deeper dive into the polyglots’ language system and examined neural responses to several familiar and unfamiliar languages. We found that i) all languages elicit a reliable response in the language network relative to a perceptually matched control condition, ii) languages that are more comprehensible to the participant elicit stronger responses (except for the native language, which elicits a relatively low response), and iii) languages that are cognates of familiar languages elicit a reliably greater response than unfamiliar non-cognate languages. In line with recent emphasis in the field on robustness and replicability (e.g., Ioannidis, 2005; Button et al., 2013; Ioannidis et al., 2014; Simmons et al., 2015; Poldrack et al., 2017), we also replicate Jouravlev et al.’s (2021) finding of lower responses during native language processing in polyglots compared to non-polyglots. Below, we elaborate on the significance of these findings and situate them within the broader empirical and theoretical landscape.
All languages of a polyglot activate the language network
A set of frontal and temporal brain areas in the left hemisphere selectively support language processing relative to diverse non-linguistic tasks (e.g., Fedorenko et al., 2011; Monti et al., 2012; Fedorenko et al., 2012; Deen et al., 2015; Pritchett et al., 2018; Jouravlev et al., 2019; Ivanova et al., 2020; Liu et al., 2020; Chen et al., in press; inter alia). Recent work has also established that this network supports the processing of typologically diverse languages (Malik-Moraleda, Ayyash et al., 2022; see e.g., Illes et al., 1999; Chee et al., 1999; Hernandez et al., 2001; Briellmann et al., 2004 for earlier evidence from smaller sets of languages), including constructed languages, like Esperanto (Malik-Moraleda et al., in prep.). This selectivity for language and cross-linguistic universality jointly suggest that some features of the language network—including its ‘location’ with respect to perceptual, motor, and non-linguistic cognitive systems—make it well-suited to support the broadly common features of languages, shaped by biological and cultural evolution. Perhaps not surprisingly then, when an individual acquires multiple languages (multiple sets of mappings between linguistic forms and meanings), those languages are represented and processed by the same neural system. This claim has been previously made based on data from bilingual and trilingual individuals (e.g., Illes et al., 1999a; Chee et al., 2000; Hernandez et al., 2001; Klein et al., 2006; Emmorey & McCullough, 2009; Buchweitz et al., 2009; Videsott et al., 2010; Willms et al., 2011; Honey et al., 2012; see Sebastian et al., 2011 and Sulpizio et al., 2020 for meta-analyses; see Perani & Abutalebi, 2005 and van Heuven & Dijkstra, 2010 for reviews). However, many past studies have used group-averaging analyses (cf. Dehaene et al., 1997), which may overestimate overlap, and paradigms that conflate linguistic and general task demands, making the nature of the overlap difficult to interpret.
We used individual-subject fMRI analyses and an extensively validated language network “localizer” paradigm to investigate neural responses to familiar and unfamiliar languages in a set of polyglots. We found that all languages that we examined (the native language, non-native languages of varying proficiency, and even unfamiliar languages) elicit a reliable response in the left-hemisphere language network relative to a perceptually matched control condition (created using the quilting approach; Overath et al., 2015). Of course, as discussed in the following two sections, languages vary in how strongly they activate the language network, but it appears that, at least in polyglots, processing any linguistic input, even if little/no meaning can be extracted from it, engages the language-processing mechanisms (see Malik-Moraleda, Ayyash et al., 2022, for evidence that this pattern also holds in non-polyglot bilingual individuals, with an unfamiliar foreign language condition eliciting a reliable response in the language network, albeit weaker than the response to one’s native language).
It is worth noting that, of course, at a finer spatial scale, the different languages of a bilingual or multilingual individual must be dissociable (otherwise, there would be too much interference among the languages, making comprehension and production impossible). Indeed, a number of fMRI studies have reported reliable decoding of language identity in multivariate patterns of neural activity (Correia et al., 2014; Xu et al., 2017; Van de Putte et al., 2017; see Xu et al., 2021 for a review). That said, the current results suggest that, at least in polyglots, the same set of frontal and temporal language-selective brain areas supports the processing of linguistic input across languages, including both familiar and unfamiliar ones.
The magnitude of the language network’s response scales with comprehensibility (except for the native language)
Setting aside the native language, responses to the other languages (L2-L4, two cognate languages, and two unfamiliar non-cognate languages) appear to scale with comprehension level, i.e., with how much meaning the individual can extract from the input. In particular, we see a) stronger responses to familiar than unfamiliar languages, b) stronger responses to familiar languages of higher proficiency than those of lower proficiency, and c) stronger responses to cognate languages than unfamiliar non-cognate languages. This pattern of response aligns with the idea of proper domains of specialized information processing systems (e.g., Sperber, 1994). In particular, the proper domain of the language network may be interpretable (meaningful) linguistic signals (structured and meaningful word sequences): the better the input fits this domain, the stronger the response in the language areas.
This pattern of stronger responses to more language-like (meaningful and structured) stimuli has been previously reported in experiments that manipulate the information contained in a linguistic signal within a language. For example, Fedorenko et al. (2010; see also Pallier et al., 2011; Bedny et al., 2011; Fedorenko et al., 2016; Shain et al., 2021) have reported that the language areas respond strongly to sentences, more weakly to stimuli that have lexical meanings (lists of words) or a syntactic frame (Jabberwocky sentences), and most weakly to stimuli that have neither lexical meanings nor a syntactic frame (lists of pseudowords). Here, we see that a similar pattern holds across the languages of a polyglot: the level of response in the language areas is higher for languages that the polyglot can understand better. Interestingly, this pattern holds not only for familiar languages, which the polyglot has studied, but also for unfamiliar cognate languages that are somewhat comprehensible due to overlap in vocabulary with some of their high-proficiency languages. Future work may investigate whether the latter effect is restricted to polyglots (individuals who are highly attuned to languages) or whether it holds for non-polyglots as well (see Gooskens et al., 2017 for behavioral evidence suggesting that this phenomenon may not be restricted to polyglots).
The native language holds a privileged status, at least in polyglots
The native language did not follow the comprehensibility pattern described above: in particular, even though the native language should be maximally comprehensible (at least as comprehensible as L2), it elicited a relatively low response in the language network. Numerically, the magnitude of the response to the native language was comparable to the response to unfamiliar cognate languages. The fact that the response to the native language in the polyglot’s language network deviated from the pattern observed for all other languages aligns with a) the finding reported in Jouravlev et al. (2021), and replicated here, of lower responses to the native language in polyglots compared to non-polyglots, and b) several bodies of work that have argued that one’s native language may be represented and processed in a distinct way from later-acquired languages. For example, one’s native language has been shown to have unique and lasting advantages for processing speech in noisy conditions (e.g., Cutler, 2012; Blanco-Elorrieta et al., 2020); certain words, like taboo words and swear words, elicit stronger responses when presented in one’s native language (e.g., Chen et al., 2015; Sulpizio et al., 2019); and, more generally, linguistic content presented in one’s native language is processed in more emotional / less rational ways, as has been shown in the domains of economic and moral decision making (e.g., Keysar et al., 2012; Costa et al., 2014; Hayakawa et al., 2017).
Whether various previously reported effects of differential language processing in one’s native vs. non-native language relate to the lower magnitude of response to the native language in the left-hemisphere language network, or whether they instead relate to some other aspect(s) of the neural infrastructure of native language processing, remains to be determined. For example, although not the focus of the current paper, native and non-native languages appear to elicit differential responses in the fronto-parietal Multiple Demand (MD) network, which has been linked to executive control and goal-directed behaviors (e.g., Duncan, 2010, 2013; Assem et al., 2020a). In particular, non-native, but not native, language processing elicits above-baseline responses in this network (Figure S11; see also Malik-Moraleda, Ayyash et al., 2022, for evidence of below-baseline responses to one’s native language in the MD network of non-polyglot bilinguals). It is possible that the engagement of this network during non-native language processing is what leads to more rational responses to linguistic information, as reported in past studies (e.g., Keysar et al., 2012; Costa et al., 2014; Hayakawa et al., 2017). This hypothesis can be tested by relating the level of response in the MD network to the degree of rationality demonstrated in the economic/moral scenarios, although it would be necessary to first establish that the strength of the response to one’s non-native language in the MD network, as well as the relevant behavioral responses, are reliable within individuals and sufficiently variable across individuals.
We also observed interesting differences between language regions in their response to native language processing. As can be seen from Figure S3, it appears that the response to the native language is especially low in the frontal language areas, where it is similar in magnitude to the response to unfamiliar non-cognate languages (cf. the temporal language areas, where it is much closer in magnitude to L2 and L3). This qualitative regional difference suggests that in this population, the native language may be processed more locally, within the temporal component of the language network, with minimal involvement from the frontal language areas. If frontal language areas indeed do not play a significant role in the processing of the native language in polyglots, then interfering with the activity of the inferior frontal areas in these individuals (e.g., via TMS; e.g., Devlin & Watkins, 2007) should have less of an effect on language performance, and damage to the inferior frontal areas should not have strong consequences on language function (see Wilson et al., 2022 for evidence from non-polyglot individuals that frontal brain damage is generally less consequential than temporal damage for aphasia).
Replication of Jouravlev et al.’s (2021) finding of lower responses to language in polyglots compared to non-polyglots
Jouravlev et al. (2021) examined the language network in a set of 17 native-English-speaking polyglot individuals (13 of whom were included in the current study) and reported weaker and less spatially extensive responses during native language processing compared to both a set of carefully matched (on age, gender, and IQ) control participants and a larger set of controls. This effect was spatially selective: polyglots and controls did not differ in the strength or extent of activation in the right-hemisphere homotope of the language network or in two other large-scale networks (the Multiple Demand network and the Default network). Here, we replicated and extended Jouravlev and colleagues’ results. In particular, in contrast to Jouravlev et al. (2021), who relied on a reading-based language localizer paradigm, we examined the magnitude of response to auditory language comprehension and found reliably weaker responses in a set of 25 polyglots (including in the subset of n=12 polyglots who were not included in Jouravlev et al., 2021; Figure S9). Thus, it appears that this finding is robust.
Jouravlev and colleagues interpreted their results as reflecting greater processing efficiency in polyglots and speculated that the difference between polyglots and non-polyglots is experientially driven. In particular, drawing on findings in the domain of motor learning (e.g., Poldrack et al., 1998; Fletcher et al., 1999; Kelly & Garavan, 2005; Bernardi et al., 2013), they hypothesized that language representation and processing may become more efficient as a result of acquiring multiple languages. Some support for this possibility comes from the relationship that we observe between the response to the native language and the number of languages that polyglots listed as having some proficiency in, such that individuals with more languages show weaker responses in the language network (Figure S12). However, the possibility that individuals who become polyglots represent and process language more efficiently from the start remains a viable alternative. Distinguishing between these possibilities will require genetic investigations of polyglots and/or longitudinal investigations of individuals as they acquire new languages.
Overall, the current fMRI investigation constitutes the first attempt to characterize the responses to different languages in the language network of polyglots and hyperpolyglots. Using a robust individual-subject approach and identifying language areas using a validated language localizer paradigm (Fedorenko et al., 2010), we uncovered several clear patterns, including a relatively low response to the native language (especially in the frontal language areas) and responses to other languages varying as a function of comprehension level (greater comprehensibility associated with stronger responses). These findings contribute to our general understanding of the human language system and its ability to process familiar and unfamiliar languages in polyglot individuals.
Acknowledgements
We would like to acknowledge the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT, including its support team (Steve Shannon and Atsushi Takahashi). We would also like to thank i) Nancy Kanwisher for supporting pilot investigations for this study (back in 2013-14), ii) Josef Affourtit, Zoya Fan, Matt Siegelman, and Sophia Zhang for help with collecting and organizing the language background questionnaire data, iii) Richard Futrell and Shawn Wen for help with the Bible stories materials, iv) Sam Norman-Haignere for help in creating the quilted control clips, v) Zuzanna Balewski for help with experimental scripts, vi) members of the Fedorenko and Gibson labs for help with fMRI data collection (especially Eghbal Hosseini, Hope Kean, Anna Ivanova, and Lia Washington), vii) Judith Thurman, Yvonne Stapp, Simon Calder, Christian Saunders, Patrick Cox, Jessica Contrera, Gretchen McCulloch, Kim Mills, Melina von Kivvon, and Susan Fitzgerald for covering this work in podcasts and news outlets and thus helping us recruit more polyglots, viii) Michael Erard, Simon Fisher, and Narly Golestani for helpful discussions, and of course, ix) our participants for their time and enthusiasm. This work was supported by research funds to EF from the McGovern Institute for Brain Research, the Brain and Cognitive Sciences Department, the Simons Center for the Social Brain, and the Middleton Professorship. EF was additionally supported by NIH awards R01-DC016607, R01-DC016950, and U01-NS121471.