Abstract
In the last few decades, the idea that people routinely and implicitly predict upcoming words during language comprehension has turned from a controversial hypothesis into a widely accepted assumption. Current theories of language comprehension1–3 posit prediction, or context-based pre-activation, as an essential mechanism that occurs at all levels of linguistic representation (semantic, morpho-syntactic and phonological/orthographic) and facilitates the integration of words into the unfolding discourse representation. The strongest evidence to date for phonological pre-activation comes from DeLong, Urbach and Kutas4, who monitored participants’ electrophysiological brain responses as they read sentences, presented one word at a time, containing expected or unexpected indefinite article + noun combinations such as “The day was breezy so the boy went outside to fly a kite/an airplane”. The sentences varied the expectation (‘cloze’ probability) for a consonant- or vowel-initial noun, as determined in a sentence-completion task with a separate group of participants. As expected, the amplitude of the N400 event-related potential (ERP) decreased (became less negative) with increasing cloze, reflecting ease of processing5–6. The decreased N400 at the noun could be due either to its pre-activation or to high-cloze nouns being easier to integrate. Crucially, however, N400s at the immediately preceding article a or an showed the same relationship with cloze: encountering an indefinite article that mismatched a highly expected word (e.g., an when expecting kite) also elicited a larger N400. This led to the claim that participants pre-activated highly expected nouns, including their initial phonemes, based on the preceding context, with larger N400s on mismatching articles reflecting disconfirmation of this prediction.
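As a minimal sketch of how a cloze probability can be derived from sentence-completion data, the snippet below computes the proportion of respondents who produced each completion. The function name `cloze_probabilities` and the response list are hypothetical illustrations; they are not the stimuli or norming data of either study.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Map each completion to the proportion of respondents who gave it."""
    # Normalize case and whitespace; drop empty responses.
    normalized = [c.strip().lower() for c in completions if c.strip()]
    counts = Counter(normalized)
    total = len(normalized)
    # An empty response list yields an empty dict (no division occurs).
    return {word: n / total for word, n in counts.items()}


# Hypothetical responses to "... the boy went outside to fly a ___"
responses = ["kite"] * 8 + ["plane", "drone"]
cloze = cloze_probabilities(responses)
```

Under these made-up responses, "kite" is the high-cloze completion and "plane" and "drone" are low-cloze completions.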
The DeLong et al. study warranted stronger conclusions than related results available at the time. Unlike previous work, it did not rely on a preceding visual depiction of upcoming nouns, it clearly de-confounded prediction and integration effects, and it tested for graded phonological pre-activation of specific word form. Correspondingly, the study has been enthusiastically received as strong evidence for probabilistic phonological pre-activation, receiving over 650 citations to date and featuring in authoritative reviews2–3. However, there is good cause to question the soundness of the original finding (and the appropriateness of the analysis used). Attempts to replicate the critical article-effect have failed7. Moreover, an earlier, alternative analysis of the same data by the authors8 failed to reach statistical significance, but was omitted from the published report.
To obtain more definitive evidence, we conducted a direct replication study spanning 9 laboratories (Ntotal = 334). We pre-registered one replication analysis that was faithful to the original, and one single-trial analysis that modeled subject- and item-level variance using linear mixed-effects models. When we applied the replication analysis to our article data (Fig. 1a), the original finding did not replicate: no laboratory observed a significant negative relationship between cloze and N400 at central-parietal electrodes. In contrast, the negative relationship was successfully replicated for the nouns: 6 laboratories observed such an effect and 2 laboratories observed relatively strong but non-significant effects in the expected direction (range r = .30 to .50). In the single-trial analysis (Fig. 1b-c), there was no statistically significant effect of cloze on article-N400s, even with stricter control for pre-article voltage levels (Supplementary Fig. 1). Crucially, there was a strong and significant cloze effect on noun-N400s (in all laboratories), which was significantly different from that on article-N400s. We observed no significant differences between laboratories for article or noun effects. Exploratory Bayesian analyses with priors based on DeLong et al. further support our conclusions (Fig. 1d, Supplementary Fig. 2). Finally, a control experiment confirmed our participants’ sensitivity to the a/an rule during online language comprehension (Supplementary Fig. 3).
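The relationship between cloze and N400 amplitude can be illustrated with a plain Pearson correlation. This is only a sketch under stated assumptions: the values below are invented, the grouping into ten cloze levels is an assumption for illustration, and the code is not the pre-registered analysis pipeline. With amplitudes expressed in microvolts, an N400 that becomes less negative as cloze increases yields a positive r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: cloze (%) and mean noun N400 amplitude (microvolts).
cloze_percent = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
noun_n400_uv = [-3.1, -2.9, -2.4, -2.6, -1.9, -1.7, -1.2, -1.4, -0.8, -0.5]

r_noun = pearson_r(cloze_percent, noun_n400_uv)
```

A correlation of this kind treats each averaged data point as equally informative; the single-trial mixed-effects analysis described above instead models variance across subjects and items directly.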
Despite a sample size 10 times larger than the original and improved statistical analysis, we observed no statistically significant effect of cloze on article-N400s, while replicating the strong and statistically significant effect of cloze on noun-N400s4,6. The effect of cloze on article-N400s, if it exists, must be very small to have evaded detection given our expansive approach. Whether such an effect would constitute convincing evidence for routine phonological pre-activation as assumed in theories of language comprehension3 can be questioned. More generally, an effect this small cannot be meaningfully studied in typical small-scale studies. Consequently, current theoretical positions may be based on potentially unreliable findings and require revision. In particular, the strong prediction view, which claims that pre-activation routinely occurs across all levels, including the phonological level3, can no longer be viewed as having strong empirical support.
Our results do not constitute evidence against prediction in general. We note a lack of convincing evidence specifically for phonological pre-activation, which would have to be measured before a noun appears and unobscured by processes instigated by the noun itself.
However, our results neither support nor necessarily exclude phonological pre-activation. Unlike gender-marked articles9 (e.g., in Dutch or Spanish), which agree with nouns irrespective of intervening words, English a/an articles index the subsequent word, which is not always a noun. Our participants may not have used mismatching articles to disconfirm predicted nouns, possibly because doing so was not a viable strategy (American and British English corpus data show a mere 33% chance that a noun follows such articles). Perhaps a revision of the predicted meaning is required to trigger differential ERPs.
DeLong et al. recently described filler-sentences in their experiment10 (cf. 7), which were omitted from their original report, and were neither provided nor mentioned to us upon our request for their stimuli. DeLong et al. used the existence of these filler-sentences to dismiss an alternative explanation of their results, namely that an unusual experimental context wherein every sentence contains an article-noun combination leads participants to strategically predict upcoming nouns. Importantly, we failed to replicate their article-effects despite an experimental context that could inadvertently encourage strategic prediction. Therefore, the difference between their experiment and ours cannot explain the different results, and may even strengthen our conclusions.
In sum, our findings do not support a strong prediction view involving routine and probabilistic pre-activation of phonological word form based on preceding context.
Moreover, our results further highlight the importance of direct replication, large sample sizes, transparent reporting and pre-registration for advancing reproducibility and replicability in the neurosciences.