Abstract
Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g. TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g. hygiene and hijack share /haid3/). Manipulations that increase lexical uncertainty should increase neural responses associated with word recognition when words cannot be uniquely identified (during the first syllable). In contrast, predictive-selection accounts (e.g. Predictive-Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words like hygiene and hijack will increase prediction error and hence neural activity only at later time points when different segments are predicted (during the second syllable). We collected MEG data to distinguish these two mechanisms and used a competitor priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighbouring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the STG showed greater neural responses for word-primed words after the point at which they were uniquely identified (after /haid3/ in hygiene) but not before, while similar changes were again absent for pseudowords. These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.
Significance Statement
Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words; that is, Bayesian perceptual inference. This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g. Predictive-Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
Introduction
In daily conversation, listeners identify 200 words per minute (Tauroza & Allison, 1990) from a vocabulary of ∼40,000 words (Brysbaert et al., 2016). This poses a substantial cognitive challenge: they must recognise more than 3 words per second and constantly select from sets of similar words which may be transiently ambiguous (e.g. hijack and hygiene both begin with /haid3/). Although it is recognised that humans and machines achieve word recognition by combining the current speech input with the prior probability of words using Bayes' theorem (Norris & McQueen, 2008; Davis & Scharenborg, 2016), the underlying neural implementation of Bayesian perceptual inference remains unclear (Aitchison & Lengyel, 2017).
Here, we compare two neural mechanisms for spoken word recognition. In competitive-selection accounts (e.g. TRACE, McClelland & Elman, 1986, Figure 1A), word recognition is achieved through within-layer lateral inhibition between neural units that represent similar words. By this view, hijack and hygiene compete for identification such that an increase in probability for one word inhibits units representing other similar sounding words. Conversely, predictive-selection accounts (e.g. Predictive-Coding, Davis & Sohoglu, 2020) suggest that word recognition is achieved through computations of prediction error (Figure 1D). On hearing transiently ambiguous speech like /haid3/, higher-level units representing matching words make contrasting predictions (/æk/ for hijack, /i:n/ for hygiene). Prediction error – the difference between sounds predicted and actually heard – provides a signal to update word probabilities such that the correct word can be selected (Gagnepain et al., 2012).
In this study, we manipulated the prior probability of words through the competitor priming effect (Monsell & Hirsh, 1998; cf. Marsolek, 2008), by which the recognition of a word (hygiene) is delayed if a similar word (hijack) has been heard earlier. This delay could either be due to increased lateral inhibition (competitive-selection) or greater prediction error (predictive-selection). Thus, similar behavioural effects of competitor priming are predicted by two distinct neural computations (cf. Spratling, 2008). To distinguish these two theories, it is critical to investigate neural data that reveals the direction, timing and level of processing at which competitor priming modulates neural responses. Existing neural data remains equivocal with some evidence consistent with competitive-selection (Bozic et al., 2010; Okada & Hickok, 2006), predictive-selection (Gagnepain et al, 2012), or both mechanisms (Brodbeck et al., 2018; Donhauser et al., 2019). We followed these previous studies in correlating two computational measures with neural activity: lexical entropy (for competitive-selection accounts) and segment prediction error (or phoneme surprisal as in other studies, for predictive-selection theories).
Here, we used magnetoencephalography (MEG) to record the location and timing of neural responses during spoken word recognition in a competitor priming experiment. Pseudowords (e.g. hijure) were included in our analysis to serve as a negative control for competitor priming, since existing research found that pseudowords neither produce nor show this effect (Monsell & Hirsh, 1998). We compared items with the same initial segments (e.g. words hygiene, hijack, pseudowords hijure, higent all share /haid3/) and measured neural and behavioural effects concurrently so as to link these two effects for single trials.
While lexical entropy and prediction error are highly correlated for natural speech, this competitor priming manipulation allows us to make differential predictions under the two theories as illustrated in Figure 1. Specifically: (1) before the deviation point (DP, the point at which similar-sounding words diverge), competitor priming will increase lexical entropy and hence neural responses in competitive-selection theories (Figure1B,C Pre-DP). However, prediction error, as supported by predictive-selection accounts, will be reduced for pre-DP segments, since heard segments are shared and hence more strongly predicted (Figure1E,F Pre-DP). (2) After the DP, predictive-selection but not competitive-selection accounts propose that pseudowords should evoke greater neural signals, since they evoke maximal prediction errors (Figure1E,F Pseudoword panel, Post-DP). (3) Furthermore, in predictive-selection but not competitive-selection theories, competitor priming is associated with an increased response to post-DP segments due to enhanced prediction error caused by mismatch between primed words (predictions) and heard speech (Figure1E,F Word panel, Post-DP).
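The two measures contrasted above can be made concrete with a toy calculation. The sketch below is illustrative only and is not the model used in this study: the three-word lexicon, the ASCII segment labels, the prior weights, and the use of segment surprisal as a stand-in for prediction error are all assumptions introduced for the example.

```python
import math

# Hypothetical toy lexicon: each word is a tuple of ASCII segment labels.
LEXICON = {
    "hijack": ("h", "ai", "dZ", "a", "k"),
    "hygiene": ("h", "ai", "dZ", "i:", "n"),
    "hybrid": ("h", "ai", "b", "r", "I", "d"),
}


def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def entropy(probs):
    """Shannon entropy (bits) of a distribution over candidate words."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def track(heard, prior_weights):
    """Return (segment surprisal, posterior lexical entropy) per heard segment."""
    probs = normalise(prior_weights)
    trace = []
    for i, segment in enumerate(heard):
        # Candidates whose i-th segment matches what was just heard.
        consistent = {
            word: p for word, p in probs.items()
            if len(LEXICON[word]) > i and LEXICON[word][i] == segment
        }
        p_segment = sum(consistent.values())  # predicted probability of the segment
        if p_segment > 0:
            surprisal = -math.log2(p_segment)
            probs = normalise(consistent)  # Bayesian update of word probabilities
        else:
            surprisal = math.inf  # segment not predicted by any known word
            probs = {}
        trace.append((surprisal, entropy(probs)))
    return trace


# Assumed priors: priming "hijack" is modelled as doubling its prior weight.
unprimed_weights = {"hijack": 1.0, "hygiene": 1.0, "hybrid": 1.0}
primed_weights = {"hijack": 2.0, "hygiene": 1.0, "hybrid": 1.0}

unprimed = track(LEXICON["hygiene"], unprimed_weights)
primed = track(LEXICON["hygiene"], primed_weights)
pseudoword = track(("h", "ai", "dZ", "U@"), unprimed_weights)  # "hijure"-like item
```

With these assumed weights, priming lowers surprisal for the last shared segment and raises it for the first post-DP segment of the target, while the pseudoword's first post-DP segment is unpredicted by any word. The direction of the entropy change depends on the priors chosen, so this toy should not be read as reproducing the entropy prediction in Figure 1C.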
Illustration of neural predictions based on competitive-selection and predictive-selection models respectively for recognition of a word (hygiene) or pseudoword (hijure) that is unprimed or primed by a similar-sounding word (hijack) or pseudoword (higent). A. In a competitive-selection model, such as TRACE (McClelland & Elman, 1986), word recognition is achieved through within-layer lexical competition. B. Illustration of the competitive-selection procedure for word (e.g. hygiene) and pseudoword (e.g. hijure) recognition. Phoneme input triggers the activation of multiple words beginning with the same segments, which compete with each other until one word is selected. No word can be selected when hearing a pseudoword, though it would be expected that lexical probability (although not lexical entropy) should be greater for words than for pseudowords. C. Illustration of neural predictions based on lexical entropy. Lexical entropy gradually reduces to zero as more speech is heard. Before the deviation point (hereafter DP) at which the prime (hijack) and target (hygiene) diverge, these items are indistinguishable, and competitor priming should transiently increase lexical entropy (shaded area). After the DP, competitor priming should not affect entropy since prime and target words can be distinguished. D. In a predictive-selection model such as the Predictive-Coding account (PC, Davis & Sohoglu, 2020), words are recognised by minimising prediction error, which is calculated by subtracting the predicted segments from the current sensory input. E. Illustration of the predictive-selection procedure during word (e.g. hygiene) and pseudoword (e.g. hijure) recognition. 
Speech input evokes predictions for the next segment (based on word knowledge as in B), which is then subtracted from the speech input and used to generate prediction errors that update lexical predictions (+ shows confirmed predictions that increase lexical probability, – shows disconfirmed predictions that decrease lexical probability). F. Illustration of neural predictions based on segment prediction error. Before the DP, priming of initial word segments should strengthen predictions and reduce prediction error. After the DP, there will be greater mismatch between predictions and heard speech for competitor-primed words, and hence primed words should evoke greater prediction error than unprimed words (shaded area). This increased prediction error should still be less than that observed for pseudowords, which should evoke maximal prediction error regardless of competitor priming due to their post-DP segments being entirely unpredictable.
Materials and Methods
Participants
Twenty-four (17 female, 7 male) right-handed, native English speakers were tested after giving informed consent under a process approved by the Cambridge Psychology Research Ethics Committee. This sample size was selected based on previous studies measuring similar neural effects with the same MEG system (e.g. Gagnepain et al., 2012; Sohoglu & Davis, 2016; Sohoglu et al., 2012). All participants were aged 18-40 years and had no history of neurological disorder or hearing impairment based on self-report. MEG data from two participants were excluded from subsequent analyses, owing to technical problems and excessive head movement respectively, leaving 22 participants in total. All recruited participants received monetary compensation.
Experimental Design
To distinguish competitive- and predictive-selection accounts, we manipulated participants’ word recognition process by presenting partially mismatched auditory stimuli prior to targets. Behavioural responses and MEG signals were acquired simultaneously. Prime and target stimuli pairs form a repeated measures design with two factors (lexicality and prime type). The lexicality factor has 2 levels: word and pseudoword, while the prime type factor contains 3 levels: unprimed, primed by same lexical status, primed by different lexical status. Hence the study is a factorial 2 ⨯ 3 design with 6 conditions: unprimed word (hijack), word-primed word (hijack-hygiene), pseudoword-primed word (basef-basis), unprimed pseudoword (letto), pseudoword-primed pseudoword (letto-lettan), word-primed pseudoword (boycott-boymid). Prime-target pairs were formed only by stimuli sharing the same initial segments. Items in the two unprimed conditions served as prime items in other conditions and they were compared with target items (Figure 2A).
Experimental design and stimuli. A. Four different types of prime-target pairs. Each pair was formed by two stimuli from the same quadruplet, separated by 20 to 80 trials of items that do not share the same initial segments. B. Lexical decision task. Participants made a lexicality judgment on each item they heard via a button-press. The response time was recorded from the onset of the stimulus. As shown, items within each quadruplet are repeated after a delay of 1-4 minutes following a number of other intervening stimuli. C. Stimuli within the same quadruplet have identical onsets in STRAIGHT parameter space (Kawahara, 2006) and thus only diverge from each other after the deviation point (DP). MEG responses were time-locked to the DP. D. Histogram of stimulus lengths.
The experiment used a lexical decision task (Figure 2B) implemented in MATLAB through Psychtoolbox-3 (Kleiner et al. 2007), during which participants heard a series of words and pseudowords while making lexicality judgments to each stimulus via button-press responses. 344 trials of unique spoken items were presented every ∼3 seconds in two blocks of 172 trials, each block lasting approximately 9 minutes. Each prime-target pair was separated by 20 to 80 trials of items that do not start with the same speech sounds, resulting in a relatively long delay of 1-4 minutes between presentations of phonologically-related items. This delay was chosen based on Monsell and Hirsh (1998), who suggest that it prevents strategic priming effects (Norris et al. 2002). Stimuli from each of the quadruplets were Latin-square counterbalanced across participants, i.e. stimulus quadruplets that appeared in one condition for one participant were allocated to another condition for another participant. The stimulus sequences were pseudo-randomised using Mix software (van Casteren & Davis, 2006), so that the same type of lexical status (word/pseudoword) did not appear successively on more than 4 trials.
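The run-length constraint on lexical status can be illustrated with simple rejection sampling. This is a minimal sketch under stated assumptions, not the algorithm used by the Mix software: the function names, the seed, and the equal split of 86 words and 86 pseudowords per block are hypothetical, and the 20 to 80 trial lag between primes and targets is not modelled.

```python
import random

MAX_RUN = 4  # no more than 4 successive trials with the same lexical status


def longest_run(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = current = 0
    previous = None
    for label in labels:
        current = current + 1 if label == previous else 1
        longest = max(longest, current)
        previous = label
    return longest


def constrained_shuffle(labels, max_run=MAX_RUN, seed=0, max_tries=10000):
    """Reshuffle until no run exceeds max_run (rejection sampling)."""
    rng = random.Random(seed)
    order = list(labels)
    for _ in range(max_tries):
        rng.shuffle(order)
        if longest_run(order) <= max_run:
            return order
    raise RuntimeError("no valid order found within max_tries")


# Hypothetical block of 172 trials, assumed to be half words and half pseudowords.
block_labels = ["word"] * 86 + ["pseudoword"] * 86
block_order = constrained_shuffle(block_labels)
```

Rejection sampling is easy to verify but becomes slow as constraints are added, which is one reason dedicated tools are preferred for full designs.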
Stimuli
The stimuli consisted of 160 sets of four English words and pseudowords, with durations ranging from 372 to 991 ms (M = 643, SD = 106). Each set contained 2 words (e.g. letter, lettuce) and 2 phonotactically-legal pseudowords (e.g. letto, lettan) that share the same initial segments (e.g. /let/) but diverge immediately afterwards.
We used polysyllabic word pairs (Msyllable = 2.16, SDsyllable = 0.36) instead of monosyllabic ones in our experiments so as to identify a set of optimal lexical competitors that are similar to their prime yet dissimilar from all other items. All words were selected from the CELEX database (Baayen et al., 1993). Their frequencies were taken from the SUBTLEX-UK corpus (Van Heuven et al., 2014) and restricted to items under 5.5 based on log frequency per million words (Zipf scale, Van Heuven et al., 2014). In order to ensure that any priming effect was caused purely by phonological rather than semantic similarity, we also checked that all prime and target word pairs had a semantic distance above 0.7 on a scale from 0 to 1 based on the Snaut database of semantic similarity scores (Mandera et al., 2017), such that morphological relatives (e.g. darkly/darkness) were excluded.
All spoken stimuli were recorded onto a Marantz PMD670 digital recorder by a male native speaker of southern British English in a sound-isolated booth at a sampling rate of 44.1 kHz. Special care was taken to ensure that shared segments of stimuli were pronounced identically (any residual acoustic differences were subsequently eliminated using audio morphing as described below).
The point at which items within each quadruplet begin to differ acoustically from each other is the deviation point (hereafter DP, Figure 2C). Pre-DP length ranged from 150 to 672 ms (M = 353, SD = 96), while post-DP length ranged from 42 to 626 ms (M = 290, SD = 111) (Figure 2D). Epochs of MEG data were time-locked to the DP. Using phonetic transcriptions (phonDISC) in CELEX, the location of the DP was decided based on the phoneme segment at which items within each quadruplet set diverge (Mseg = 3.53, SDseg = 0.92). To determine the time in each speech file that corresponds to the onset of the first post-DP segment, we aligned phonetic transcriptions to the corresponding speech files using the WebMAUS forced alignment service (Kisler et al., 2017; Schiel, 1999). In order to ensure that the pre-DP portion of the waveform was acoustically identical, we cross-spliced the pre-DP segments of the 4 stimuli within each quadruplet and conducted audio morphing to combine the syllables using STRAIGHT (Kawahara, 2006) implemented in MATLAB. In this way, phonological co-articulation in natural speech was reduced to the lowest level possible at the DP, so any divergence in neural responses between stimuli can only be caused by post-DP deviation.
Behavioural Data Analyses
Response times (RTs) were inverse-transformed so as to maximise the normality of the data and residuals; Figures report untransformed response times for clarity. Inverse-transformed RTs and error rates were analysed using linear and logistic mixed-effect models respectively using the lme4 package in R (Bates et al. 2014). Lexicality (word, pseudoword) and prime type (unprimed, primed by same lexical status, primed by different lexical status) were fixed factors, while participant and item were random factors. Maximal models accounting for all random effects were attempted wherever possible, but reduced random effects structures were applied when the full model did not converge (Barr et al., 2013). Likelihood-ratio tests comparing the full model to a nested reduced model using the Chi-Square distribution were conducted to evaluate main effects and interactions. Significance of individual model coefficients was obtained using t (reported by linear mixed-effect models) or z (reported by logistic mixed-effect models) statistics in the model summary. One-tailed t statistics for RTs are also reported for two planned contrasts: (1) word-primed versus unprimed conditions for word targets, and (2) word-primed versus pseudoword-primed conditions for word targets.
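The likelihood-ratio test described above reduces to a short calculation once the two models have been fitted. The sketch below is not the lme4 analysis itself; the function names and the log-likelihood values are hypothetical, and the p-value formula covers only the two-degrees-of-freedom case, for which the chi-square survival function has the closed form exp(−x/2).

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic for a full model against a nested reduced model."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Hypothetical log-likelihoods, chosen only to illustrate the arithmetic.
statistic = lrt_statistic(loglik_full=-100.0, loglik_reduced=-105.365)
p_value = chi2_sf_df2(statistic)
```

Applied to the two-degrees-of-freedom statistics reported in the Results, for example 10.73 and 1.62, this formula gives p-values of approximately .005 and .445.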
When assessing priming effects, we excluded data from target trials in which the participant made an error in the corresponding prime trial, because it is unclear whether such target items will be affected by priming given that the prime word was not correctly identified.
MEG Data Acquisition and Processing
Magnetic fields were recorded with a VectorView system (Elekta Neuromag) which contains a magnetometer and two orthogonal planar gradiometers at each of 102 locations within a hemispherical array around the head. Although electric potentials were recorded simultaneously using 68 Ag-AgCl electrodes according to the extended 10-10% system, these EEG data were excluded from further analysis due to excessive noise. All data were digitally sampled at 1 kHz. Head position was monitored continuously using five head-position indicator (HPI) coils attached to the scalp. Vertical and horizontal electro-oculograms were also recorded by bipolar electrodes. A 3D digitizer (FASTRAK; Polhemus, Inc.) was used to record the positions of three anatomical fiducial points (the nasion, left and right preauricular points), HPI coils and evenly distributed head points for use in source reconstruction.
MEG data were preprocessed using the temporal extension of Signal Space Separation in MaxFilter software (Elekta Neuromag) to reduce noise sources, normalise the head position over blocks and participants to the sensor array and reconstruct data from bad MEG sensors. Subsequent processing was conducted in SPM12 (https://www.fil.ion.ucl.ac.uk/spm/) and FieldTrip (http://www.fieldtriptoolbox.org/) software implemented in MATLAB. The data were epoched from −1100 to 2000ms time-locked to the DP and baseline corrected relative to the −1100 to −700ms prior to the DP, which is a period before the onset of speech for all stimuli (Figure 1C). Low-pass filtering to 40 Hz was conducted both before and after robust averaging across trials (Litvak et al., 2011). A time window of −150 to 0ms was defined for pre-DP comparisons based on the shortest pre-DP stimulus length. A broad window of 0 to 1000ms was defined for post-DP comparisons, which covered the possible period for lexicality and prime effects. After averaging over trials, an extra step was taken to combine the gradiometer data from each planar sensor pair by taking the root mean square (RMS) of the two amplitudes.
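Two of the steps above, baseline correction and the RMS combination of planar gradiometer pairs, are simple enough to sketch directly. This is an illustrative example rather than the SPM12 or FieldTrip implementation; the function names and sample values are hypothetical, and treating the baseline window as inclusive at both ends is an assumption.

```python
import math


def baseline_correct(samples, times_ms, start_ms=-1100, end_ms=-700):
    """Subtract the mean of the baseline window from every sample in an epoch."""
    baseline = [s for s, t in zip(samples, times_ms) if start_ms <= t <= end_ms]
    baseline_mean = sum(baseline) / len(baseline)
    return [s - baseline_mean for s in samples]


def combine_planar_pair(grad_a, grad_b):
    """Root mean square of the two planar gradiometer amplitudes at each sample."""
    return [math.sqrt((a * a + b * b) / 2.0) for a, b in zip(grad_a, grad_b)]


# Hypothetical three-sample epoch: two baseline samples and one post-DP sample.
times_ms = [-1000, -800, 100]
corrected = baseline_correct([2.0, 4.0, 10.0], times_ms)
combined = combine_planar_pair([3.0, 0.0], [4.0, 0.0])
```

Because the RMS is non-negative, the combined gradiometer signal reflects response magnitude and discards the sign of the underlying field gradient.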
After converting the sensor data into 3D images (2D sensor x time), F tests for main effects were performed across sensors and time (the term “sensors” here denotes interpolated scalp locations in 2D image space). Reported effects were obtained with a cluster-defining threshold of p < .001, and significant clusters identified as those whose extent (across space and time) survived p < 0.05 FWE-correction using Random Field Theory (Kilner & Friston, 2010). Region of interest (ROI) analyses for the priming effect were then conducted over scalp locations and time windows that encompassed significant clusters for the (orthogonal) lexicality effect. When plotting waveforms and topographies, data are shown for sensors nearest to the critical points in 2D image space.
In addition to the two planned contrasts mentioned above (see Behavioural Data Analyses), which were applied to the post-DP analysis, a one-tailed t statistic is also reported for the pre-DP planned contrast between unprimed and word-primed items.
Source Reconstruction
In order to determine the underlying brain sources of the sensor-space effects, source reconstruction was conducted using SPM’s Parametric Empirical Bayes framework (Henson et al., 2011). Forward models for each sensor type (magnetometers and gradiometers) were based on a template brain normalised to each participant’s T1-weighted structural MRI (sMRI) scan obtained on a 3T Prisma system (Siemens, Erlangen, Germany) using an MPRAGE sequence. Fiducials, sensor positions and head-shape points (with nose points removed) in MEG space were projected onto sMRI space to co-register the two coordinate systems. The data were then inverted using the ‘IID’ solution, equivalent to classical minimum norm, fusing the magnetometer and gradiometer data (Henson et al., 2011). This inversion was performed using a 1-40 Hz frequency band with a time window of −150 to 0ms for pre-DP analysis and 100 to 800ms for post-DP analysis. This post-DP time window was defined by the temporal extent between the approximate lexicality diverging point (between words and pseudowords) (∼100ms) and the approximate average ending latency of the significant cluster (∼800ms) shown in gradiometers and magnetometers. The total energy within the pre- and post-DP time windows was then written to 3D images in MNI space. As in sensor space, ROI analyses were conducted over significant scalp locations and time windows from the most reliable lexicality cluster. Factorial ANOVAs were carried out on main effects and one-tailed paired t-tests on planned contrasts (see MEG Data Acquisition and Processing).
Results
Behaviour
Response Times
As shown in Figure 3A, factorial analysis of lexicality (word, pseudoword) and prime type (unprimed, primed by same lexical status, primed by different lexical status) indicated a significant main effect of lexicality, in which RTs for pseudowords were significantly longer than for words, χ²(3) = 23.60, p < .001. In addition, there was a significant interaction between lexicality and prime type, χ²(2) = 10.73, p = .005. This interaction was followed up by two separate one-way models for words and pseudowords, which showed a significant effect of prime type for words, χ²(2) = 10.65, p = .005, but not for pseudowords, χ²(2) = 1.62, p = .445. Consistent with the competitor priming results from Monsell and Hirsh (1998), words that were primed by another word sharing the same initial segments were recognised significantly more slowly than unprimed words (for mean raw RTs see Fig 3A), β = 0.02, SE = 0.01, t(79.69) = 3.33, p < .001, and more slowly than pseudoword-primed words, β = 0.02, SE = 0.01, t(729.89) = 2.37, p = .018. As mentioned earlier (see Introduction), both competitive- and predictive-selection models predict longer response times to word-primed target words than to unprimed words. It is therefore critical to distinguish the two accounts through further investigation of the MEG responses.
Response time results (A) and accuracy results (B) of the lexical decision task. Bars are color-coded by lexicality and prime type on the x axis (words, blue frame; pseudowords, orange frame; unprimed, no fill; primed by same lexicality, consistent fill and frame colors; primed by different lexicality, inconsistent fill and frame colors). Bars show the subject grand averages, error bars represent ± within-subject SE, adjusted to remove between-subjects variance (Cousineau, 2005). Statistical significance is shown based on generalised linear mixed-effects regression: * p<0.05, ** p<0.01, *** p<0.001. Statistical comparisons shown with solid lines indicate the lexicality by prime-type interaction and main effects of prime-type for each lexicality, whereas comparisons with broken lines indicate the significance of pairwise comparisons.
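The within-subject error bars mentioned in this caption follow the normalisation of Cousineau (2005), which can be sketched as below. The function name and the example data are hypothetical, and the sketch applies the basic normalisation only, without any further bias correction.

```python
import math
import statistics


def within_subject_se(data):
    """Per-condition SE after removing between-participant variance.

    `data` holds one list of condition means per participant.
    """
    n_participants = len(data)
    n_conditions = len(data[0])
    grand_mean = statistics.mean([value for row in data for value in row])
    # Centre each participant on the grand mean by removing their own mean.
    normalised = [
        [value - statistics.mean(row) + grand_mean for value in row]
        for row in data
    ]
    return [
        statistics.stdev([row[c] for row in normalised]) / math.sqrt(n_participants)
        for c in range(n_conditions)
    ]


# Hypothetical data: participants differ only by a constant offset,
# so all between-participant variance is removed and the SEs are zero.
offset_only = [[1.0, 2.0], [11.0, 12.0], [21.0, 22.0]]
se_offset_only = within_subject_se(offset_only)
```

When participants differ in overall speed but show the same condition differences, this adjustment shrinks the error bars so that they reflect only the variability relevant to within-subject comparisons.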
Accuracy
There was a trend towards more lexical decision errors in response to words than to pseudowords, although this lexicality effect was marginal, χ²(3) = 7.31, p = .063. The error rates for words and pseudowords were also affected differently by priming, as indicated by a significant interaction between lexicality and prime type, χ²(2) = 6.08, p = .048. Follow-up analyses using two separate models for each lexicality type showed there was a main effect of prime type for words, χ²(2) = 13.95, p < .001, but not for pseudowords, χ²(2) = 1.93, p = .381. Since we had not anticipated these priming effects on accuracy, post-hoc pairwise z tests were Bonferroni corrected for multiple comparisons. These showed that pseudoword priming reliably increased the error rate compared to the unprimed condition, β = 1.68, SE = 0.54, z = 3.14, p = .005, and to the word-primed condition, β = 2.74, SE = 0.89, z = 3.07, p = .007. Although no specific predictions on accuracy were made a priori by either competitive- or predictive-selection model, it is worth noting that participants might have expected pseudowords to be repeated given the increased error rate of responses to pseudoword-primed target words.
MEG
In order to explore the impact of lexicality and competitor priming on neural responses to critical portions of speech stimuli, both before and after they diverge from each other, MEG responses were time-locked to the DP. All reported effects are family-wise error (FWE)-corrected at cluster level for multiple comparisons across scalp location and time at a threshold of p < 0.05.
Pre-DP analyses
We assessed neural responses before the DP, during which only the shared speech segments have been heard and hence the words and pseudowords in each stimulus set are indistinguishable. Since there could not have been any effect of lexical status pre-DP, only prime type effects were considered in this analysis. Predictive- and competitive-selection accounts make opposite predictions for pre-DP neural signals evoked by word-primed items compared to unprimed items. We therefore conducted an F-test for neural differences between these two conditions across the scalp and source spaces over a time period of −150 to 0ms before the DP. Two significant clusters were found in gradiometers respectively over the mid-left scalp locations at −21 to −14ms (Figure 4A) and posterior-right scalp locations at −9ms (Figure 4B). In both of these locations, unprimed items evoked significantly greater neural responses than word-primed items. We did not find any cluster showing stronger neural responses for word-primed items than unprimed items and no clusters survived correction for multiple comparisons for magnetometer responses or for analysis in source space.
Pre-DP results. A & B. Pre-DP response difference between items that are unprimed and primed by a word in MEG gradiometer sensors within −150 to 0ms (a time window at which words and pseudowords are indistinguishable). The topographic plots show F-statistics for the entire sensor array with the scalp locations that form two statistically significant clusters highlighted and marked with black dots. Waveforms represent MEG response averaged over the spatial extent of the significant cluster shown in the topography. C. ROI analysis of neural responses evoked by unprimed and primed items averaged over the same pre-DP time period of −150-0ms but across gradiometer sensor locations which showed the post-DP pseudoword>word lexicality effect (see Figure 5A). Bars are color-coded by prime type on the x axis (unprimed items, no fill; word-primed items, blue; pseudoword-primed items, orange; black frame indicates that words and pseudowords are indistinguishable). All error bars represent ± within-participant SE, adjusted to remove between-participant variance (Cousineau, 2005). Statistical significance: * p<0.05.
To further examine these results, we also conducted ROI analysis of gradiometer signals evoked by unprimed and primed items averaged over the same −150 to 0ms pre-DP time window but across the set of scalp locations that showed the post-DP lexicality effect at which pseudowords elicited greater neural responses than words (see Figure 5A). As shown in Figure 4C, the results indicated that unprimed items elicited significantly stronger neural responses than word-primed items, t(21) = 2.41, p = .013, consistent with the whole-brain analysis. In particular, the mid-left cluster shown in panel A partially overlaps with the post-DP pseudoword>word cluster. The direction and location of these pre-DP neural responses are in accordance with the predictive-selection account and inconsistent with the competitive-selection account. A surprising finding is that post-hoc analysis also showed greater neural responses evoked by unprimed items than pseudoword-primed items, t(21) = 2.69, p = .014, although we had not predicted these effects from pseudoword primes.
Post-DP results showing lexicality effects and corresponding ROI responses evoked by conditions of interest. A & B. Post-DP lexicality effects in MEG gradiometer and magnetometer sensors. The topographic plots show the statistically significant cluster with a main effect of lexicality (pseudoword > word). Waveforms represent MEG response averaged over the spatial extent of the significant cluster shown in the topography. The width of waveforms represents ± within-participant SE, adjusted to remove between-participants variance (Cousineau, 2005). C. Statistical parametric map showing the cluster (pseudoword > word) rendered onto an inflated cortical surface of the Montreal Neurological Institute (MNI) standard brain thresholded at FWE-corrected cluster-level p < 0.05. D, E & F. Post-DP ROI ANOVA on neural signals and source strength evoked by conditions of interest averaged over the time window and scalp locations of the significant cluster shown in panel A, B & C. Bars are color-coded by lexicality and prime type on the x axis (words, blue frame; pseudowords, orange frame; unprimed, no fill; primed by same lexicality, consistent fill and frame colors; primed by different lexicality, inconsistent fill and frame colors). All error bars represent ± within-participant SE, adjusted to remove between-participants variance (Cousineau, 2005). Statistical significance from ANOVAs: * p<0.05, ** p<0.01, *** p<0.001. Statistical comparisons shown with solid lines indicate the lexicality by prime-type interaction and main effects of prime-type for each lexicality, whereas comparisons with broken lines indicate the significance of planned pairwise comparisons.
Post-DP analyses
We then examined the post-DP response differences between words and pseudowords (lexicality effect). The gradiometer sensors showed a significant cluster over the left side of the scalp at 410-726ms post-DP (Figure 5A). In this cluster, pseudowords evoked a significantly stronger neural response than words. Similarly, magnetometer sensors also detected a significant left-hemisphere cluster at 398-903ms post-DP (Figure 5B) showing the same lexicality effect. We did not find any significant cluster in which words evoked greater neural responses than pseudowords. These results are consistent with findings from Gagnepain and colleagues (2012). To locate the likely neural source of the effects found in sensor space, we conducted source reconstruction by integrating gradiometers and magnetometers. As shown in Figure 5C, results from source space showed that neural generators of the lexicality effect were estimated to lie within the superior temporal gyrus (STG, peaks at x = −56, y = −28, z = −10; x = −54, y = −26, z = 4 and x = −58, y = −14, z = 16). This location, and direction of response, is consistent with a sub-lexical (e.g. phonemic) process being modulated by lexicality, in line with the predictive-selection account.
Next, we investigated whether the neural responses that were modulated by lexicality were also influenced by prime type, by testing the interaction between prime type and lexicality on data averaged over the time window and the scalp locations of the significant cluster shown in panels A and B (Figure 5D & E). This interaction was significant in both the gradiometer data, F(1.96, 41.11) = 7.30, p = .002, and the magnetometer data, F(1.90, 39.99) = 5.80, p = .007. Follow-up tests showed that there was a significant effect of prime type for words, F(1.93, 40.55) = 8.01, p = .001 (gradiometers), F(1.81, 37.96) = 5.61, p = .009 (magnetometers), such that neural signals evoked by word-primed words were significantly stronger than those evoked by unprimed words, t(21) = 2.22, p = .019 (gradiometers), t(21) = 3.33, p = .002 (magnetometers), and pseudoword-primed words, t(21) = 3.70, p < .001 (gradiometers), t(21) = 2.64, p = .008 (magnetometers). In contrast, there was no reliable main effect of prime type for pseudowords, F(1.94, 40.80) = 0.67, p = .514 (gradiometers), F(1.79, 37.61) = 0.80, p = .446 (magnetometers). The corresponding tests performed on the source-reconstructed power within the lexicality ROI of suprathreshold voxels (Figure 5F) did not show a reliable interaction effect between lexicality and competitor priming, F(1.47, 30.91) = 1.06, p = .34. Nevertheless, consistent with the sensor space results, source power did show a significant effect of prime type for words, F(1.59, 33.49) = 4.21, p = .031, but not pseudowords, F(1.68, 35.36) = 1.02, p = .359. Pairwise comparisons also indicated that word-primed words evoked greater source strength than unprimed words, t(21) = 2.28, p = .017, and pseudoword-primed words, t(21) = 2.20, p = .020. Thus, in line with behavioural results, neural responses evoked by words and pseudowords were also influenced differently by prime type.
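The fractional degrees of freedom reported above are what one would expect if a sphericity correction was applied to the repeated-measures ANOVA. The correction method is not named in this section, so the following check rests on that assumption. With k = 3 prime types (unprimed, word-primed, pseudoword-primed) and n = 22 participants (implied by t(21)), a correction estimate ε scales both degrees of freedom by the same factor, so their ratio should stay close to n − 1:

```latex
\begin{align*}
df_1 &= \varepsilon\,(k-1), & df_2 &= \varepsilon\,(k-1)(n-1), \\
\frac{df_2}{df_1} &= n-1 = 21, & \frac{41.11}{1.96} &\approx 20.97 .
\end{align*}
```

The small departure from 21 is consistent with rounding of the reported values. The same ratio holds for the interaction terms, because lexicality has two levels and contributes a factor of 1 to the numerator degrees of freedom.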
Critically, competitor priming modulated the post-DP neural responses evoked by words, but not those evoked by pseudowords, and these effects were localised to STG regions that plausibly contribute to sub-lexical processing of speech. This matches the pattern of responses proposed in the predictive-selection model (see Figure 1F).
To ensure that other response patterns were not overlooked, we also investigated whether there was any lexicality by prime-type interaction at other locations across the scalp and source spaces, and during other time periods. As shown in Figure 6A, a significant cluster of gradiometers at midline posterior scalp locations was found at 405-427 ms post-DP, in which the effect of priming was significantly different for words and pseudowords. Figure 6B shows gradiometer signals evoked by conditions of interest averaged over the spatial and temporal extent of the significant cluster in panel A. To explore this profile, we computed an orthogonal contrast to assess the overall lexicality effect (the difference between words and pseudowords), and the result was marginal, F(1.00, 21.00) = 3.50, p = .075. The effect of prime type was marginally significant for words, F(1.89, 39.78) = 3.08, p = .060, but significant for pseudowords, F(1.80, 37.85) = 7.14, p = .003. The location and pattern of this interaction cluster were dissimilar to those predicted by either competitive- or predictive-selection theories, and no cluster survived correction in magnetometer sensors or source space.
Post-DP results showing lexicality-by-priming interaction effects in MEG gradiometers. A. The topographic plot shows F-statistics for the statistically significant cluster that showed an interaction between lexicality and prime type. Waveforms represent gradiometer responses averaged over the spatial extent of the significant cluster shown in the topography. B. Gradiometer signals evoked by conditions of interest averaged over the temporal and spatial extent of the significant cluster in panel A. All error bars represent ± within-participant SE, adjusted to remove between-participants variance (Cousineau, 2005). Statistical significance: ** p<0.01. The statistical comparison lines indicate main effects of prime type for each lexicality. The lexicality by prime-type interaction is statistically reliable as expected based on the defined cluster.
Linking neural and behavioural effects
To further examine the relationship between neural and behavioural response differences attributable to competitor priming or lexicality, we conducted single-trial regression analyses using linear mixed-effect models that account for random intercepts and slopes for participants and stimulus sets (grouped by their initial segments). We calculated behavioural RT differences and neural MEG differences caused by: (1) lexicality, i.e. the difference between pseudoword and word trials (collapsed over primed and unprimed conditions), and (2) competitor priming, i.e. the difference between unprimed and word-primed word trials, with MEG signals averaged over the spatial and temporal extent of the post-DP pseudoword>word cluster seen in sensor space and the STG peak voxel, x = −54, y = −26, z = 4, in source space (see Figure 5). We then assessed the relationship between these behavioural and neural difference effects in linear mixed-effect regression of single trials, with differences in RTs as the independent variable and differences in MEG responses as the dependent variable.
As shown in Figure 7A, we observed a significant positive relationship between RTs and magnetometers on lexicality difference (β = 0.11, SE = 0.01, t(23.31) = 7.77, p < .001), although associations between RTs and gradiometers or source response were not significant (gradiometers: β = −0.0001, SE = 0.0002, t(97.47) = −0.77, p = .444; source: β = −0.000002, SE = 0.000001, t(27.49) = −1.22, p = .234). These observations from magnetometers indicated that slower lexical decision times evoked by pseudowords were associated with greater neural responses. Furthermore, the intercept parameter for the magnetometers model was significantly larger than zero, β = 37.58, SE = 5.72, t(23.09) = 6.57, p < .001. We can interpret this intercept as the neural difference that would be predicted for trials in which there was no delayed response to pseudowords compared to words. The significant intercept indicated a baseline difference in neural responses to words and pseudowords, even in the absence of any difference in processing effort (as indexed by lexical decision RTs). This suggested the engagement of additional neural processes specific to pseudowords regardless of the behavioural effect (cf. Taylor et al., 2014).
Single-trial linear mixed-effect models which accounted for random intercepts and slopes for participants and stimulus sets (grouped by initial segments) were constructed to compute the relationship between RTs and magnetometers on (A) lexicality difference (i.e. between pseudowords and words, collapsed over unprimed and primed conditions) and (B) competitor priming difference (i.e. between word-primed word and unprimed word conditions). Magnetometer responses were averaged over the time window and scalp locations of the significant post-DP pseudoword>word cluster (see Figure 5). β1 refers to the model slope, β0 refers to the model intercept. Statistical significance: *** p<0.001.
Figure 7B showed another significant positive relationship between RTs and magnetometers on competitor priming difference (β = 0.15, SE = 0.02, t(38.85) = 7.89, p < .001), while relationships between RTs and gradiometers or source response were again not significant (gradiometers: β = 0.0004, SE = 0.0003, t(20.61) = 1.08, p = .293; source: β = −0.0000009, SE = 0.000002, t(15.04) = −0.47, p = .646). Interestingly, unlike for the lexicality effect, the intercept in this competitor priming magnetometers model did not reach significance (β = 12.88, SE = 7.27, t(21.33) = 1.77, p = .091). This non-significant intercept suggested that if word-primed words did not evoke longer RTs than unprimed words, magnetometer signals would not be reliably different between the two conditions either. Hence, consistent with predictive-selection accounts, the increased post-DP neural responses in the STG caused by competitor priming were both positively linked to and mediated by longer response times.
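The interpretation of slope and intercept in these models can be illustrated with a deliberately simplified sketch. It uses ordinary least squares on made-up numbers and ignores the random intercepts and slopes that the reported mixed-effect models include, so it is not a reimplementation of the analysis. The function name and all values are hypothetical. The slope captures how the neural difference scales with the RT difference, and the intercept is the neural difference predicted when the RT difference is zero.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical single-trial differences (not data from the study).
rt_diff = [0.0, 100.0, 200.0, 300.0]      # behavioural RT difference
meg_offset = [40.0, 50.0, 60.0, 70.0]     # neural difference with a baseline offset
meg_no_offset = [0.0, 10.0, 20.0, 30.0]   # neural difference that vanishes with the RT effect

slope_a, intercept_a = fit_line(rt_diff, meg_offset)
slope_b, intercept_b = fit_line(rt_diff, meg_no_offset)

print("with baseline offset:", slope_a, intercept_a)
print("without baseline offset:", slope_b, intercept_b)
```

Both hypothetical data sets have the same positive slope, but only the first has a non-zero intercept. This mirrors the contrast drawn above between the lexicality model (reliable intercept) and the competitor priming model (intercept not reliably different from zero).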
Discussion
In this study, we distinguished different implementations of Bayesian perceptual inference by manipulating the prior probability of spoken words and examining the pattern of neural responses. We replicated the competitor priming effect such that a single prior presentation of a competitor word (prime, e.g. hijack) delayed the recognition of a word with the same initial sounds (target, e.g. hygiene), whereas this effect was not observed when the prime or target was a pseudoword (e.g. hijure). Armed with this behavioural evidence, we used MEG data to test the neural bases of two Bayesian theories of spoken word recognition.
Competitive- vs predictive-selection
Competitive-selection accounts propose that word recognition is achieved through direct inhibitory connections between representations of similar candidate words (e.g. McClelland & Elman, 1986). Priming boosts the activation of heard words and increases lateral inhibition applied to neighbouring words, which delays their subsequent identification (Monsell & Hirsh, 1998). The effect of competitor priming is to increase lexical uncertainty, and hence lexical-level neural responses, until later time points when target words can be distinguished from the competitor prime (see Figure 1C). In contrast, predictive-selection accounts propose that word recognition is achieved by subtracting predicted speech from heard speech and using these computations of prediction error to update lexical probabilities (see Gagnepain et al., 2012; Davis & Sohoglu, 2020). By this view, predictions for segments that are shared between competitor primes and targets (pre-DP segments, like /haid3/ for hijack and hygiene) will be enhanced after presentation of prime words. Thus, competitor priming will reduce the magnitude of prediction error, and hence neural responses, before the DP (Figure 1F). Only when speech diverges from predictions (post-DP segments, such as /i:n/ in hygiene) will competitor-primed words evoke greater prediction error, leading to an increased neural response in brain areas involved in pre-lexical (e.g. phonemic) processing of speech that have been shown to represent prediction error (Blank et al., 2018; Blank & Davis, 2016).
We tested these predictions for the direction and timing of neural responses associated with competitor priming using MEG data which showed opposite neural effects of competitor priming before and after the DP. In the pre-DP period, consistent with predictive-selection but contrary to competitive-selection, we saw decreased neural responses for word-primed items compared to unprimed items. The initial, shared segments between prime (hijack) and target (hygiene) words evoke a reduced response during early time periods in line with a reduction in prediction error. However, during the post-DP period, we found competitor primed words evoked stronger neural responses than unprimed words in exactly the same scalp locations and time periods that show increased responses to pseudowords (hijure) compared to words. These post-DP response increases are in line with increased processing difficulty for competitor-primed words and for pseudowords being due to an increase in prediction error. Thus, the time course of the neural effects of competitor priming – with reduced neural responses pre-DP and increased neural responses post-DP – closely resembles the expected changes in prediction error shown in Figure 1F based on predictive-selection mechanisms.
In addition to the direction and timing of neural responses, effects of lexicality and competitor priming localise to the superior temporal gyrus (STG). This is a brain region that has long been associated with lower-level sensory processing of speech (see Yi et al., 2019, for a review). Our observation of increased responses to pseudowords in this region is in accordance with source-localised MEG findings (Gagnepain et al., 2012; Shtyrov et al., 2012) and evidence from a meta-analysis of PET and fMRI studies (Davis & Gaskell, 2009). This location is also consistent with the proposal that lexical influences on segment-level computations (rather than lexical-level computations themselves) produce reliable neural differences between words and pseudowords. We take this finding as further evidence in favour of computation of segment prediction error as a critical mechanism underlying word identification. Increased prediction error for pseudowords has also been linked to encoding of novel lexical items in theoretical work (Davis & Sohoglu, 2020) and in studies of word learning in young children (Ylinen et al., 2017).
We further show using regression analyses that neural (MEG) and behavioural (RT) effects of lexicality and competitor priming are linked on a trial-by-trial basis. Trials in which pseudoword processing or competitor priming leads to a larger increase in RT also lead to larger post-DP neural responses. These links between behavioural and neural effects of lexicality and competitor priming are once more in line with the proposal that post-DP increases in prediction error are a key neural mechanism for word and pseudoword processing and can explain the delayed behavioural responses seen in competitor priming. Interestingly, although regression analyses show positive relationships between RT and MEG effects on both lexicality and competitor priming, they differ in terms of whether a reliable neural response difference would be seen for trials in which response time effects were null (i.e. the baseline difference). While neural lexicality effects were significant even for trials that did not show behavioural effects, the same was not true for the competitor priming effect. These results indicate that, in accordance with predictive-selection accounts, the post-DP neural competitor priming effect was mediated by changes in behavioural response times. Only those trials in which competitor priming slowed behavioural responses led to larger neural responses. In contrast, an increased neural response to pseudowords would be expected even in trials for which response times did not differ between pseudowords and words. We will consider the implications of these and other findings for pseudoword processing in the next section.
How do listeners process pseudowords?
Participants identified pseudowords with a speed and accuracy similar to that seen during recognition of familiar words. This is consistent with an optimally-efficient language processing system (Marslen-Wilson, 1984; Zhuang et al., 2014), in which pseudowords can be distinguished from real words as soon as deviating speech segments are heard. Beyond this well-established behavioural finding, however, we reported two seemingly contradictory observations concerning pseudoword processing.
The first is that, while post-DP neural activity and response times for words were modulated by competitor priming, processing of pseudowords was not similarly affected. This might suggest that the prior probability of hearing a pseudoword and the prediction error elicited by mismatching segments are not changed by our experimental manipulations. This may be because pseudowords have a low or zero prior probability and elicit maximal prediction errors that cannot be modified by a single prime. Yet, memory studies suggest that even a single presentation of a pseudoword can be sufficient for listeners to establish a lasting memory trace (McKone & Trynes, 1999; Arndt et al., 2008). However, it is possible that this memory for pseudowords reflects a different type of memory (e.g. episodic memory) from that produced by a word, with only the latter able to temporarily modify long-term, lexical-level representations and predictions for word speech segments (as in Complementary Learning Systems theories, cf. McClelland et al., 1995; Davis & Gaskell, 2009). Additionally, these differences between words and pseudowords may be influenced by the lexical decision task, which may have implicitly cued participants to treat words and pseudowords differently. Participants need to identify the exact form of a single word in order to confirm its lexical status, but a deviation from all known words needs to be established to indicate a pseudoword (Norris & Kinoshita, 2008).
A second observation is that, contrary to the null result for post-DP processing, pseudoword priming reduced subsequent pre-DP neural responses evoked by target items to a similar degree as real word priming (Figure 4C). This pre-DP effect is surprising given previous evidence suggesting that pseudowords must be encoded into memory and subject to overnight, sleep-associated consolidation in order to modulate the speed of lexical processing (Tamminen et al., 2010; James et al., 2017) or neural responses (Davis & Gaskell, 2009; Landi et al., 2018). It might be that neural effects seen for these pre-DP segments were due to changes to the representation of familiar words that our pseudowords resembled, though these were insufficient to modulate processing of post-DP segments.
Summary
Our work provides compelling evidence in favour of neural computations of prediction error during spoken word recognition. Unlike previous work (Brodbeck et al., 2018; Donhauser & Baillet, 2020), which reported neural responses correlated with lexical entropy as well as prediction error (surprisal), we did not find any similarly equivocal evidence. These earlier studies measured neural responses only to familiar words in continuous speech sequences such as stories or talks. However, since lexical uncertainty (entropy) and segment-level predictability (segment prediction error or surprisal) are highly correlated in natural speech, these studies may be less able to distinguish between the lexical and segmental mechanisms that we assessed here. In contrast, our speech materials were carefully selected to change lexical probability (through priming) and for priming to have opposite effects on segment prediction error before and after DP. This manipulation provides conclusive evidence in favour of predictive-selection mechanisms that operate using computations of prediction error during spoken word recognition.
Acknowledgments
The research was supported by the UK Medical Research Council (SUAG/044 & SUAG/046 G101400) and by a China Scholarship Council award to Yingcan Carol Wang. We are grateful to Clare Cook, Ece Kocagoncu and Tina Emery for their assistance with data collection, and also to Olaf Hauk for his advice on MEG data analysis.
Footnotes
The authors declare no competing financial interests.