Looking forward does not mean forgetting about the past: ERP evidence for the interplay of predictive coding and interference during language processing

Interference and prediction have independently been identified as crucial influencing factors during language processing. However, their interaction remains severely underinvestigated. Furthermore, despite the growing body of behavioral studies investigating interference during sentence processing, the neurobiological basis of cue-based retrieval and retrieval interference remains insufficiently understood. Here, we addressed these issues with an ERP experiment that systematically examined the interaction of interference and prediction during language processing. We used the neurobiologically well-established predictive coding framework for the theoretical framing of our study. German sentence pairs were presented word-by-word, with an article in the second sentence constituting the critical word. We analyzed mean single trial EEG activity in a standard N400 time window and found an interaction between interference and prediction as measured by an offline cloze probability test. Under high predictability, sensory input was explained away (in the sense of predictive coding). Hence, no retrieval operations were necessary and no interference effects were observable. In contrast, under low predictability, model updating engendered memory retrieval, including the evaluation of distractor items, thus leading to interference effects. We conclude that interference should be included in predictive coding-based accounts of language because prediction errors can trigger retrieval operations and, therefore, induce interference.


Introduction
To successfully process a sentence or discourse, it is necessary to access previously encountered words, so that new words can be integrated with ones that have already been processed. The cognitive mechanisms underlying this process have been modeled in detail within the cue-based parsing framework (Lewis, Vasishth, & Van Dyke, 2006; Van Dyke & Lewis, 2003). It views memory retrieval as a cue-based process, which compares retrieval cues generated at the retrieval site with the features of the words in memory. Each new word or constituent is encoded into memory as a feature bundle (e.g. case, number, gender). Importantly, not only the constituent itself is represented, but also predicted constituents that are required to form a grammatical sentence. Thus, when the noun toy is encountered in example (1) (from Lewis et al., 2006), two memory representations are constructed: one depicting its features and the other a prediction of the corresponding verb. When the verb arrived is processed, retrieval cues are generated to integrate it into the sentence context. The cues are matched in parallel against all (recent) memory representations. A sufficient match results in retrieval of the verb prediction and the integration of verb and noun.
(1) Melissa knew that the toy from her uncle in Bogotá arrived today.
But retrieval and, hence, comprehension difficulty can arise due to similarity-based interference from cue overlap, i.e. an overlap between the retrieval cues and features of more than one word / prediction recently stored in memory (Lewis et al., 2006). The correct to-be-retrieved word / prediction is called the target of the retrieval and other words / predictions in memory may function as interfering distractors during the retrieval process. This perspective is supported by numerous behavioral studies, in which interference was induced either through distracting words within a sentence or by extra-sentential memory items (see Jäger, Engelmann, and Vasishth (2017) for a recent review and meta-analysis). For example, Van Dyke and McElree (2006) used a dual-task paradigm to study interference from extra-sentential memory items (see 2a) during sentence comprehension. They presented it-clefts (see 2b) that required a retrieval operation to integrate the matrix sentence verb (sailed/fixed) with its object (boat). The object is the target of the retrieval.
(2a) Memory load set: table - sink - truck
(2b) It was the boat that the guy who lived by the sea sailed / fixed in two sunny days.
In load conditions, three nouns that functioned as distractors (cf. 2a) were presented before a self-paced sentence reading task (for sentences such as 2b). In high interference conditions, the distractors were plausible objects of the matrix verb, and in low interference conditions, they were implausible (i.e. it is possible to fix, but not to sail a table / sink / truck; thus, interference is high for matrix verbs such as fix, but low for verbs such as sail). To ensure that there were no confounding differences in the comprehension of the verbs, no-load conditions were also included. In these conditions, the participants just read the sentences without an additional task.
Reading times for the critical verb carrying the interference manipulation showed that, within the memory load conditions, high interference conditions led to longer reading times. In addition, comprehension accuracy was influenced by both memory load and interference, with responses being more accurate for low interference conditions and for no load conditions than for their high interference / load counterparts, respectively. These results were interpreted as showing that retrieval is the locus of interference during sentence comprehension rather than encoding, as the retrieval context was manipulated while encoding was held constant (Van Dyke & McElree, 2006).
Evidence for interference via distractors from within the sentence was provided, for example, by Gordon, Hendrick, and Johnson (2001), who found that similar noun phrase types within a sentence lead to higher processing difficulty. Van Dyke (2007) showed that distractors that match only a subset of retrieval cues could nevertheless influence retrieval and cause processing difficulties. Furthermore, it was found that distractors in between target and retrieval site (retroactive interference) have more impact on processing than distractors that are processed before the target (proactive interference) (Martin & McElree, 2009). However, the neurobiological basis of cue-based retrieval and interference in sentence processing remains insufficiently understood. Initial ERP evidence was presented by Martin, Nieuwland, and Carreiras (2012), who studied how distractors affect the processing of (un)grammatical NP ellipses in Spanish by manipulating the (mis)match between the gender of retrieval cues, targets and distractors. They reported a sustained, broadly distributed negativity for grammatical ellipses with a mismatching distractor compared to grammatical ellipses with a matching distractor and ungrammatical ellipses. It was concluded that the mismatching distractor interrupted the processing of grammatical ellipses via cue-based interference. In a follow-up study, Martin, Nieuwland, and Carreiras (2014) changed the syntactic structure of their material to render the antecedents of the ellipses (the target of the retrieval) less distinct. In this study, grammatical ellipses with matching distractors elicited an early anterior negativity compared to grammatical ellipses with mismatching distractors and ungrammatical ellipses. This was regarded as an effect of similarity-based interference. Thus, while both studies found negativities, these were elicited by different conditions (matching vs. mismatching distractors) and differed in their distributions and latencies.
The neurocognitive correlates of interference therefore clearly require further investigation. In addition, no attempt has been made to date to elucidate the possible neurobiological mechanisms underlying retrieval interference in language.
With the present study, we aimed to deepen the neurobiological understanding of interference during language comprehension by studying the interplay of interference and prediction within the neurobiologically well-established predictive coding framework. Predictive coding states that prediction is a general principle of brain function, with generative predictive models serving to explain the causes of sensory input (Friston, 2005). Predictive processing is organized within a hierarchically structured cortical architecture, within which anatomical feedback connections (from higher to lower levels) communicate predictions. Feedforward connections (from lower to higher levels), by contrast, communicate sensory input or, more precisely, only the parts of the input that are divergent from the prediction (Clark, 2013; Rao & Ballard, 1999). Thus, prediction errors essentially serve as a proxy for sensory information (e.g. Feldman & Friston, 2010), since the feedforward activity associated with sensory input is "silenced" when predicted (i.e. excitatory bottom-up activity is countered by inhibitory top-down activity in this case; cf. Friston, 2010). Overall, the system strives to reduce prediction errors, but when they do occur, they are used to update the current model so that more accurate predictions can be generated in the future.
According to a recent proposal by Bornkessel-Schlesewsky and Schlesewsky (2019), the N400 event-related potential provides a window on predictive coding during language comprehension. Specifically, this account posits that the N400 indexes precision-weighted prediction errors, with precision defined as the inverse of variance (Feldman & Friston, 2010) and, in the context of language, equating to the relevance of a particular information source for the form-to-meaning mapping. This offers an explanation for the presence versus absence of N400 effects for the same, prediction-error inducing phenomenon across different languages, depending on the language-specific relevance (or "validity", cf. MacWhinney, Bates, & Kliegl, 1984) of cues such as word order, animacy or case marking (see Bornkessel-Schlesewsky et al. (2011) for empirical evidence, and Bornkessel-Schlesewsky and Schlesewsky (2019) for detailed discussion). In summary, while N400 amplitude shows a general tendency to decrease for predicted stimuli and increase for stimuli that involve prediction error (Dröge, Fleischer, Schlesewsky, & Bornkessel-Schlesewsky, 2016; Frank, Otten, Galli, & Vigliocco, 2015; Kuperberg, 2016), the magnitude of the N400 effect is assumed to be proportional to the degree to which the prediction error actually leads to a model update. In the present study, we will draw on this proposal to formulate our hypotheses with regard to the relation between prediction and interference during language processing.
The interplay of prediction and interference has, to the best of our knowledge crossed, as they investigated two levels of prediction (high, low) within the interference conditions, but only low prediction within no interference conditions. Thus, it is not possible to fully characterize the interplay between prediction and interference on the basis of these results.
We can, nevertheless, put forward a hypothesis regarding the relationship between prediction and interference within the context of a predictive coding framework. Recall from our discussion of predictive coding above that predictions essentially correspond to top-down activity within a hierarchically organized processing architecture, and that this activity cancels out bottom-up, stimulus-related activity when all features of a stimulus are fully predictable (silencing of prediction errors). By contrast, interference arises when bottom-up information calls for a retrieval and the features of the retrieval target are too similar to those of other items currently held in memory. It thus appears plausible that retrieval operations are only required in the absence of perfect predictability, i.e. when it is not possible to fully cancel out stimulus-related activity. In other words, when an upcoming word is fully predicted, it does not contain any new information that could serve to initiate a retrieval operation. We sought to examine this hypothesis with the present study.

The Present Study
The present study examined the interplay between prediction and interference in German, using a design that fully crossed the two factors. We focused on German articles as critical stimuli within a broader (two) sentence context, because they overtly mark case, number and gender and are thus ideal for manipulating interference. In view of the design opportunities afforded by German for the examination of interference, it is somewhat surprising that only one study has investigated retrieval interference during German sentence processing to date (Nicenboim, Vasishth, Engelmann, & Suckow, 2018). Here, we aimed to provide further evidence for interference effects in German in addition to presenting what is, to the best of our knowledge, the first ERP study to investigate the interplay of predictive coding and interference. Specifically, we manipulated gender interference and prediction during the processing of an article (see example 4). In German, this word class carries (only) case, number and gender information. processing system should seek to establish coreference with a noun in the preceding context (cf. Schumacher, 2009). We hypothesized that this process would be influenced by both the predictability of the article (and of the following noun) and the degree of interference from the distractor noun. Materials were constructed so that they varied in terms of both target article and noun cloze probability and the degree of interference (see Methods for further details). By extending the distance between the retrieval site and the to-be-retrieved word across a sentence boundary, we hoped to gain insights into the scope of potentially interfering distractors.
Our hypotheses were as follows: when the article is fully predictable, prediction confirmation at this position further supports the existing prediction of the following noun. In cases of prediction disconfirmation at the article, by contrast, the internal model needs to be adjusted to prevent further prediction errors and the prediction for the upcoming noun needs to be revised. This revision necessarily involves memory retrieval of potential candidates and we hypothesized that it should be influenced by interference. We expected similar effects at the position of the following noun. Again, if the specific noun was predicted, there will be prediction confirmation and no further processing is required (silencing of prediction errors). If it was not (fully) predicted, its integration calls for retrieval of the corresponding memory entry as well as an adjustment of the predictive model to prevent further prediction errors. Following Bornkessel-Schlesewsky and Schlesewsky (2019), we expected to find N400 effects for prediction errors versus prediction confirmation at both the article and the following noun, and hypothesized that these would be modulated by interference.

Participants
Forty-four right-handed participants (8 male; mean age: 22.75 years, range: 18-35) from the University of Salzburg gave written informed consent and received either course credit or 20 Euros for their participation. All were native (Austrian) German speakers with normal or corrected-to-normal vision and no history of neurological or psychiatric disorders. None of the participants took part in the questionnaire pretest. Four participants had to be excluded because of technical issues, three were excluded because their response accuracy was below 75%, and five were excluded because of excessive artifacts (more than 25% of the critical sentences were affected). The data of the remaining thirty-two participants (5 male; mean age: 22.34 years, range: 18-35) were used for the final analyses.

Materials
Eighty-five sets of four sentence pairs were constructed (see Table 1 for an example). Two noun phrases (NPs), which either matched or mismatched according to their gender features, were introduced in a context sentence and one of them was repeated in the target sentence. Target sentences were identical across conditions. The repeated NP was always the object of the target sentence and its article was our primary critical position for ERP analyses, though we also report additional analyses of the noun position. In German, articles and corresponding nouns must have congruent gender, number and case features. In the high interference conditions (1/2), the article could grammatically be followed by each of the two nouns in the context, while, in the low interference conditions (3/4), there is only one compatible noun in the context. Additionally, the distance between the target noun and the retrieval site (the article) was manipulated via the word order in the context sentence.

Questionnaire Pretest
A sentence completion pretest was conducted to obtain cloze values for the critical article and the following noun in context (Kutas & Hillyard, 1984; Taylor, 1953). The material was split into four lists, with only one sentence pair from each lexical set occurring per list. In addition, there were two versions of each list: one truncated the target sentence before and one after the critical article, resulting in eight different questionnaires. The different lists / versions were constructed to ensure that the pretest participants stayed naïve to the manipulation. No fillers were included in the pretest. A total of 144 students of the University of Salzburg participated in the pretest (16 to 20 per questionnaire version, mean age = 22.5 years, age range: 18 to 40 years, 39 men). Pretest participants were native speakers of (Austrian) German and were instructed to intuitively complete the sentence pairs.
None of the pretest participants took part in the main study.
The obtained cloze probability (CP) values ranged from 0 to 1 for both the article and noun. The results of the pretest and pairwise comparisons are shown in Table 2.
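As an illustration of how cloze probability values of this kind are typically derived, the following minimal sketch computes, for one truncated sentence pair, the proportion of respondents who produced each completion. The function name, the normalisation of responses, and the example responses are assumptions made for illustration; they are not taken from the pretest data.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Return the proportion of respondents who produced each completion."""
    if not completions:
        raise ValueError("at least one completion is required")
    # Assumption: responses are compared after trimming whitespace and lowercasing.
    cleaned = [c.strip().lower() for c in completions]
    counts = Counter(cleaned)
    total = len(cleaned)
    return {word: n / total for word, n in counts.items()}


# Hypothetical responses from 16 respondents at the article position.
responses = ["den"] * 12 + ["die"] * 3 + ["das"] * 1
cp = cloze_probabilities(responses)
print(cp["den"])  # 0.75
```

A value of 0 then means that no respondent produced the form in question, and a value of 1 that all respondents did.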
The CP of the article in the high interference conditions was higher overall than in the low interference conditions. This is a logical consequence of the fact that, no matter with which of the two nouns of the high interference context the participants chose to continue the sentence, the article had the same form because the nouns shared the same gender feature. In contrast, the CP values for the article in the low interference conditions were lower because the article form varied based on the decision regarding which of the context nouns should follow. In both interference conditions, the participants showed a recency effect, i.e. they chose an article congruent with the last-mentioned noun more often than one congruent with the first-mentioned noun (cf. the higher CP values for a short distance to referent across conditions). In the high interference conditions, this recency effect is difficult to interpret because the nouns share the same gender; it might be due to an implicit bias related to verb semantics, i.e. one of the nouns being more closely associated with the verb.
The CP values for the target noun are lower overall in the high interference conditions compared to low interference conditions. This reflects the fact that the article is ambiguous in the high interference conditions and that, in the low interference conditions, the article only matches one noun. Like the article results, the noun CP results show a recency effect.
In summary, both the article and the noun CPs obtained by our pretest reveal an offline recency effect. The participants expect the last-mentioned noun to be repeated in the target sentence. Also, for both the article and the noun, we see that the participants use the likelihood of an article and noun with a specific gender feature to complete the sentence pair.

Procedure
Eighty of the original eighty-five items were presented in the main experiment.
The exclusion was done to achieve an even number of items and was not based on pretest results. Two pseudo-randomized lists were constructed so that each participant saw only two versions of each item (either versions 1 and 4 or versions 2 and 3). To avoid order effects, half of the participants were presented with a reversed version of the lists. Each of the lists consisted of 240 sentence pairs (160 critical and 80 filler sentence pairs). The first sentences of the filler sentence pairs were identical in structure to those of the critical sentence pairs. The second sentence of the fillers was constructed to be coherent with the first sentence but did not include a referential link to it. All sentence pairs (both critical and filler pairs) were grammatical. A third of all sentence pairs in both lists (80 out of 240) was followed by a "yes"/"no" comprehension question. Half of the questions required a "yes" answer indicated by a right or left mouse click (balanced across participants).
Participants were instructed to silently read the sentence pairs, which were presented in the center of a computer screen in a word-by-word manner. Words were presented in white letters on a black background and participants were asked to minimize eye movements, blinks and general movements. A trial began with a 400 ms fixation cross followed by a 200 ms blank screen. Words were presented for 400 ms each with a 100 ms inter-stimulus interval between words; after the last word of the first sentence in the pair, there was a 1000 ms inter-stimulus interval. To encourage participants to read carefully, a third of the trials was followed by a yes/no comprehension question (e.g. 'Steht das Auto vor dem Haus?', engl.: 'Is the car in front of the house?'), which the participants had five seconds to answer. After that (or in trials without a task), there was a self-paced inter-trial interval, during which participants could blink as much as needed before pressing a button to start the next trial. All sessions began with a five-sentence-pair practice block to familiarize participants with the procedure. The following six experimental blocks were separated by short breaks. Each experimental session lasted approximately 2.5 hours; participants spent approximately 75 minutes under task.
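The presentation timing described above can be summarised in a short sketch that derives word onsets within a trial. The function name and the sentence lengths are hypothetical, and the sketch assumes that the 1000 ms interval after the first sentence replaces, rather than adds to, the regular 100 ms inter-stimulus interval.

```python
FIXATION_MS = 400       # fixation cross
BLANK_MS = 200          # blank screen after fixation
WORD_MS = 400           # presentation time per word
ISI_MS = 100            # inter-stimulus interval between words
SENTENCE_GAP_MS = 1000  # assumed to replace the regular ISI after sentence one


def word_onsets(n_words_first, n_words_second):
    """Return word onset times (ms from trial start) for one sentence pair."""
    onsets = []
    t = FIXATION_MS + BLANK_MS
    for i in range(n_words_first):
        onsets.append(t)
        is_last = i == n_words_first - 1
        t += WORD_MS + (SENTENCE_GAP_MS if is_last else ISI_MS)
    for _ in range(n_words_second):
        onsets.append(t)
        t += WORD_MS + ISI_MS
    return onsets


# Hypothetical pair: six words in the context sentence, five in the target sentence.
onsets = word_onsets(6, 5)
print(onsets[6], onsets[7] - onsets[6])  # 4500 500
```

The 500 ms stimulus onset asynchrony between adjacent words follows directly from the 400 ms presentation time plus the 100 ms interval.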

Electroencephalogram recording and preprocessing
The electroencephalogram (EEG) was recorded from 58 active scalp electrodes positioned according to the standard 10-20 system and attached to an elastic cap (ActiCap from Easycap GmbH, Herrsching, Germany). EEG recordings used an ActiCHamp amplifier (Brain Products GmbH, Gilching, Germany) with a sampling rate of 500 Hz and AFz as the ground electrode. Six electro-oculogram (EOG) electrodes were positioned above and below both eyes and at the outer canthus of each eye.
Impedances were kept below 10 kΩ. All electrodes were online referenced to the left mastoid and re-referenced to linked mastoids offline. All preprocessing steps were carried out using BrainVision Analyzer 2. The raw data were filtered with a bandpass of 0.1-30 Hz. Artifact rejection was carried out semi-automatically, based on BrainVision Analyzer 2's Max-Min criterion. According to this criterion, intervals are rejected in which the six EOG channels show a difference of more than 100 µV within 200 ms windows, plus an additional 200 ms window before and after the artifact.
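The logic of a Max-Min criterion of this kind can be illustrated with a minimal single-channel sketch. This is not BrainVision Analyzer 2's implementation: the function name, the sample-by-sample sliding window, and the way the padding is applied are assumptions made for illustration, combined with the 500 Hz sampling rate and the thresholds reported above.

```python
def max_min_artifact_mask(signal_uv, srate_hz=500, window_ms=200,
                          threshold_uv=100.0, pad_ms=200):
    """Flag samples in windows whose max-min difference exceeds the threshold."""
    win = int(round(window_ms * srate_hz / 1000))  # 200 ms -> 100 samples
    pad = int(round(pad_ms * srate_hz / 1000))     # 200 ms -> 100 samples
    n = len(signal_uv)
    bad = [False] * n
    for start in range(max(n - win + 1, 1)):
        segment = signal_uv[start:start + win]
        if max(segment) - min(segment) > threshold_uv:
            # Mark the offending window plus padding on both sides.
            lo = max(start - pad, 0)
            hi = min(start + win + pad, n)
            for i in range(lo, hi):
                bad[i] = True
    return bad


# Hypothetical EOG trace: flat signal with a single 150 µV deflection.
signal = [0.0] * 1000
signal[500] = 150.0
mask = max_min_artifact_mask(signal)
print(sum(mask))  # 399 samples flagged
```

In practice, a criterion like this would be evaluated on each EOG channel, and flagged intervals would then be reviewed, in keeping with the semi-automatic procedure.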
ERPs were computed in epochs of -200 to 1550 ms relative to critical article onset (noun onset was 500 ms after article onset). While we present grand average ERPs (see Figure 1) for comparability to previous research, the current design required single trial-based analyses, as we outline below.

Statistical Analysis
We performed linear mixed model analyses using the R (R Core Team, 2017) package lme4 (Bates, Maechler, Bolker, & Walker, 2015) on the basis of the mean activity of EEG single trial data between 300 and 500 ms post onset of the article and noun. We chose the standard N400 time window based on theoretical considerations and our hypotheses, rather than via visual inspection of ERP waveforms. For both words, offline cloze probability values of the article and following noun were included as continuous predictors, together with the topographical factors sagittality (anterior: Firstly, models with maximal random effects structures did not converge for our data or were not computable within a manageable time frame. Secondly, as already noted by Barr and colleagues (2013), their suggestion explicitly targets confirmatory research rather than exploratory investigations such as ours (see also Alday, Schlesewsky, & Bornkessel-Schlesewsky, 2017).
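To make the dependent measure concrete, the following sketch shows how a mean amplitude in a 300-500 ms window could be extracted from a single-trial epoch that starts at -200 ms relative to article onset and is sampled at 500 Hz. The function name, the end-exclusive window, and the toy epoch are assumptions for illustration; baseline handling is not addressed.

```python
def mean_window_amplitude(epoch_uv, window_ms, epoch_start_ms=-200, srate_hz=500):
    """Return the mean amplitude (µV) of one epoch within a time window."""
    start_ms, end_ms = window_ms
    # Convert times relative to article onset into sample indices.
    first = int(round((start_ms - epoch_start_ms) * srate_hz / 1000))
    last = int(round((end_ms - epoch_start_ms) * srate_hz / 1000))
    samples = epoch_uv[first:last]  # assumption: window end is exclusive
    return sum(samples) / len(samples)


# Hypothetical epoch from -200 to 1550 ms (875 samples at 500 Hz) with a
# -4 µV deflection 300-500 ms after article onset.
epoch = [0.0] * 875
for i in range(250, 350):
    epoch[i] = -4.0

article_n400 = mean_window_amplitude(epoch, (300, 500))
# Noun onset is 500 ms after article onset, so its window is 800-1000 ms.
noun_n400 = mean_window_amplitude(epoch, (800, 1000))
print(article_n400, noun_n400)  # -4.0 0.0
```

One such value per trial and electrode would then enter the mixed model as the dependent variable.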
Beyond the commonalities of the article and noun models outlined above, the two models differed in the coding of the factors interference and distance. In the article model, Interference was modeled as a three-level factor (high, low with short distance, low with long distance between target and retrieval site) because, in the high interference conditions, the target noun cannot be determined at the article and therefore its recency is unclear until the noun position (i.e. both conditions are identical up to the position of the article). In the noun model, we modeled the two factors separately: Interference (high, low) and Distance between target and retrieval site (short, long). Using the R package car (Fox & Weisberg, 2011), categorical variables were encoded with sum contrasts, such that the estimate for a given fixed effects level represents the difference between this level and the grand mean. For brevity, we present Type-II Wald χ² tests (also provided by the car package; Fox & Weisberg, 2011) instead of the model summaries. We visualize effects of interest based on effects matrices created using the R package effects (Fox, 2003) and plotted using ggplot2 (Wickham, 2016). Prior to the analyses, we excluded two items that were inconsistently constructed, leaving us with the data for 78 items.
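The interpretation of sum-coded estimates as deviations from the grand mean can be checked with a small sketch. It builds a sum contrast matrix for a three-level factor (with the last level coded -1 in every column, as in R's contr.sum) and reconstructs the cell means from deviation coefficients. The level labels mirror the article model, while the cell means are invented purely for illustration.

```python
def sum_contrasts(levels):
    """Return a sum contrast coding: one row per level, k - 1 columns."""
    k = len(levels)
    coding = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            coding[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            coding[level] = [-1] * (k - 1)  # last level balances the others
    return coding


levels = ["high", "low_short", "low_long"]
coding = sum_contrasts(levels)

# Hypothetical cell means (µV); not values from the study.
cell_means = {"high": -2.0, "low_short": -1.0, "low_long": -3.0}
grand_mean = sum(cell_means.values()) / len(cell_means)

# Under sum coding, each coefficient is a level's deviation from the grand mean.
coefs = [cell_means[lvl] - grand_mean for lvl in levels[:-1]]

# Reconstruct every cell mean from intercept + contrast row * coefficients.
predicted = {
    lvl: grand_mean + sum(c * b for c, b in zip(coding[lvl], coefs))
    for lvl in levels
}
print(coefs, predicted["low_long"])  # [0.0, 1.0] -3.0
```

The deviation of the last level is not estimated separately; it is the negative sum of the other coefficients.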

Comprehension Question Accuracy
Participants were encouraged to answer the comprehension questions correctly and within the time limit (five seconds). As noted above, three participants were excluded for accuracy rates below 75%. For the remaining participants,

Traditional ERP waveforms
For the sake of completeness, we present traditional grand average ERP waveforms (N = 32) in Figure 1. Article onset is at 0 ms and noun onset at 500 ms. The traditional ERP methodology cannot account for the complexity of our design with continuous predictors (cf. Alday et al., 2017). Thus, while visual inspection of the grand average ERP waveforms is suggestive of some condition-based differences in the N400 time windows at the article and noun positions, we will not interpret these further as they do not reflect our full design. Converging support for this assumption is presented in Table 3, which shows that, for both the article and noun positions, models including cloze probability values show a better fit to our data than models without them. This justifies our complex design and, in addition, our departure from the analysis of effects apparent in the ERP waveforms.

Electrophysiological Data: Article
Type-II Wald test results for the linear mixed model for the article are shown in Table 4. We focus on the four-way interaction between Sagittality, Interference, Article CP and Noun CP (χ²(4) = 17.54, p = 0.001), which is visualized in Figure 2. At posterior electrode sites, effects of interference emerged only when both Article CP and Noun CP were low and manifested themselves in the form of increased negativity for the high interference and low interference / long target distance conditions in comparison to the low interference / short target distance condition.

Electrophysiological Data: Noun
Type-II Wald tests for the linear mixed model for the noun are shown in Table 5.
We focus on the Sagittality × Interference × Target distance × Noun CP interaction (χ²(2) = 29.54, p < 0.001), which is visualized in Figure 3. When Noun CP is low, at anterior electrode sites, the low interference / short target distance condition is more negative than the other conditions and the low interference / long target distance condition is more positive than the others. The high interference conditions do not differ from each other and show an amplitude that is intermediate between the two low interference conditions.

Discussion
We have presented an ERP study that investigated the interplay of prediction and interference within sentence pairs. In addition to the experimental manipulations, we included offline cloze probability (CP) values of the critical article and following noun in linear mixed model analyses of mean single trial EEG activity within the N400 time window (300-500 ms post stimulus onset). At the position of the critical article, we found a posterior negativity for conditions with a high-interference distractor and for conditions with a close, low-interference distractor (i.e. long distance to target) in comparison to conditions with a distant, low-interference distractor (i.e. short distance to target). These effects only emerged when the CP of the article itself and of the following noun were low. At the noun position, interference and distance effects also manifested themselves when CP was low, but only the CP of the noun itself mattered. Here, the noun in the low interference condition with a short distance to the target elicited an anterior negativity compared to the other conditions. Amplitudes were less negative for the high interference conditions, which did not differ depending on target distance.
Finally, conditions with a low-interference distractor and a long distance to target showed the least negative / most positive responses at the noun position.
In the following, we will first discuss the results for the article and noun positions, in turn. Subsequently, we will consider potential theoretical implications of our findings, before offering some conclusions.

Article
Effects at the article emerged only when the CP of both the article and noun were low. Thus, under conditions of high predictability, the distractors were essentially "ignored" by the processing system. We suggest that this aligns well with a predictive coding perspective, according to which high predictability leads to all (or almost all) of the sensory input for a current item being "explained away" (cf. Friston, 2010; and see the introduction for further discussion).
In contrast, when predictability is low, bottom-up information gains more weight.
Since the input is not "explained away" under these circumstances, bottom-up sensory information (corresponding to prediction error) is propagated up through the levels of the hierarchically organized predictive processing architecture. Prediction errors induce model updating, which, we assume, requires an evaluation of all (other) possible candidate referents for the article and upcoming noun. Since this evaluation necessarily involves retrieval of the candidates, it is a natural locus for interference. We propose here that interference modulates the complexity of the model update and, thereby, N400 amplitude. In sentences of the type employed in our experiment, the required model update involves retrieving the correct referent for the upcoming noun. This retrieval operation is facilitated under conditions of low interference: since the gender information provided by the article uniquely matches exactly one noun in the context, this scenario involves a low risk of future prediction errors in spite of the fact that the CP of noun and article was low prior to processing of the corresponding stimuli. In essence, retrieval is easy due to the lack of alternatives, possibly due to the restricted experimental context. In the low interference, short target distance condition, retrieval of the target noun is additionally supported by recency (see section 2.3 for a discussion of recency effects in our materials as revealed by the pretest). By contrast, when recency information contradicts gender information during referent retrieval, as is the case in the low-interference, long distance to target condition, model updating is more complex, due to the two conflicting information sources. Hence, we observe a more pronounced N400 effect under these conditions.
Finally, under conditions of high interference, a low CP article induces a prediction error but, at the same time, provides no additional evidence to guide a model update (since it matches both referents in the context sentence). While a model update is clearly required here, it is associated with a high degree of uncertainty, again correlating with a more pronounced N400 amplitude.
Thus, in line with Bornkessel-Schlesewsky and Schlesewsky's (2019) proposal, our results at the position of the critical article suggest that N400 amplitude during incremental sentence comprehension does not reflect the presence/absence of a prediction error per se, but rather the effects of a prediction error on internal model updating. The present findings suggest that interference may play an important role in this regard, as the presence of interfering distractors not only impacts memory retrieval (i.e. the likelihood of retrieving the correct item from memory), but also the complexity of model updating. In addition, if an account along these lines is correct, it suggests that prediction error within a hierarchically organized predictive coding architecture might trigger memory retrieval. We return to this point in section 4.3 below.

Noun
At the noun position, effects are also only apparent under low (noun) CP. As already argued for the article position, we assume that highly predicted nouns do not engender model updating (and, hence, additional retrieval operations), as they are fully compatible with top-down predictions and thus lead to a silencing of prediction errors.
Under low noun CP, the most pronounced negativity was observable for the low interference / short target distance condition. In line with our discussion of the article effects, this could reflect the falsification (reflected in the low CP) of a high-precision prediction that was built even before the article was processed (arising from the absence of distractors and recency of the target). This leads to a pronounced prediction error response, i.e. increased N400 amplitude.
Both high interference conditions are less negative than the low interference / short target distance condition, irrespective of target distance. This could also be attributable to the precision-weighting of prediction errors (cf. Bornkessel-Schlesewsky & Schlesewsky, 2019): the article('s gender information) is a cue with low precision in these conditions, because its compatibility with both nouns in the context gives rise to two highly likely predictions. Accordingly, the precision of these predictions is reduced (see Todd et al., 2014, for discussion of precision modulation for predictions as opposed to sensory input) and N400 amplitude is reduced for a prediction error in comparison to the low interference / short target distance condition. This provides converging support for the idea (cf. Bornkessel-Schlesewsky & Schlesewsky, 2019) that the precision-based weighting of cues and predictions is not only language-specific, but also situation-, i.e. context-sensitive. Gender is normally a valid cue for forming dependencies between articles and nouns in German, but the high interference context renders it invalid.
The low interference / long target distance condition was least negative at the noun position. This observation is more difficult to derive from our assumptions and we can thus only offer some speculative attempts as a post-hoc explanation. We assume that, in this condition, prediction precision is lower than in the low interference / short target distance condition because here, the gender cue is in conflict with the recency cue. However, this raises the question of why prediction precision should be even lower here than in the case of the high-interference conditions, as would be required to explain the extremely reduced (or even absent) prediction error response in the N400. One possibility as to why this could be the case is that, in the low interference / long target distance condition, there are two different competing information sources (cues). In the high interference conditions, by contrast, the competing information stems from the same type of cue, namely gender. However, to test whether this explanation indeed holds, it would need to be rendered more precise via a quantitative model of the relative weighting (cue validity) of gender and recency in German and examined in a confirmatory study.
Finally, we would like to note that the topography of the noun effect is not typical for an N400 in the visual modality. It is, however, similar to that found by Martin et al. (2014), who reported an anterior negativity for ellipses with a matching distractor. But it is difficult to further connect these previous findings to our results, as our high interference conditions, which are most directly comparable to Martin and colleagues' critical condition, are associated with an intermediary amplitude of the interference-related anterior negativity effect for low CP nouns.

Theoretical implications
Our work has potential implications for all of the theoretical frameworks under discussion: predictive coding, cue-based parsing (retrieval) and the functional interpretation of N400 amplitude as reflecting model updating via precision-weighted prediction error signals. We will briefly discuss each of these in turn.
Within a hierarchically organized predictive coding architecture, prediction errors trigger model updating. Our results provide some initial indications as to how this updating process may be shaped by different information sources in the context of a complex domain such as higher-order language processing. Specifically, the updates required in our experimental conditions cannot depend entirely on incoming sensory information, but also require recourse to previous input, which needs to be accessed via retrieval operations. Retrieval can, in turn, be influenced by interference, which lowers the probability of correctly retrieving the target or, in a worst-case scenario, may even favor retrieval of an incorrect (distractor) item. If the incorrect item is retrieved, the priors of the new model are incorrect, thereby inevitably leading to prediction error in the future. If interference manifests as multiple possible targets for the retrieval, new predictions that are based on this retrieval situation are less trusted because of the increased risk of prediction error. The resulting model thus has less reliability due to the lower-precision predictions. In summary, our findings suggest that: (a) prediction errors trigger memory retrieval when the incoming sensory information is not sufficient to guide model updating; and (b) retrieval interference during model updating diminishes the ability of the model update to minimize future prediction errors.
Under the cue-based parsing framework, memory representations not only comprise already processed words, but also predictions for upcoming words or constituents. The former are retrieved when new words need to be integrated with them (e.g. to establish co-reference or to saturate dependencies between heads such as verbs and their dependent arguments). By contrast, the latter are retrieved when the predicted words are actually encountered. The retrieval operation investigated in the present study might be comparable to those during reflexive-/reciprocal-antecedent or ellipsis processing. Rather than retrieving an argument-verb dependency, for example, it calls for the retrieval of the correct (co-)referent for the current item. But there are no structural constraints (e.g. c-command in Chomskyan frameworks) that function as additional retrieval cues during the retrieval of our referent.
There appears to be a conceptual discrepancy between the memory representation of a predicted word that needs to be retrieved in cue-based parsing and predictions in the sense of predictive coding. In cue-based parsing, the memory representation of the prediction is shifted out of attentional focus when other words are processed (see McElree, 2006). Thus, when the predicted word is encountered, the prediction must be retrieved in order to allow for an integration of the predicted, newly encountered word with the corresponding "old" word. Within the predictive coding architecture, by contrast, the prediction of an upcoming stimulus of a specific kind would be (repeatedly?) communicated via feedback connections until the stimulus is encountered, at which point the prediction would "explain away" the sensory input. However, given the propensity for non-adjacent dependencies in language, all stimuli that are encountered between the generation of the prediction and the predicted stimulus' appearance would lead to prediction error.
This could either engender model updating and subsequent dismissal of the prediction or, what is perhaps more likely, lead to a generalization of the prediction (e.g. the prediction that a noun phrase should be followed, at some point, by a verb).
This may lead to maintenance or even strengthening of the prediction (cf. anti-locality effects; Husain, Vasishth, & Srinivasan, 2014; Konieczny, 2000), which appears incompatible with the notion that a predicted memory representation disappears from the focus of attention and becomes active again later. One way to operationalize predictions for non-adjacent elements in a predictive coding framework may be to assume that these correspond to higher levels, and hence longer timescales, within the hierarchically organized predictive coding architecture (cf. Bornkessel-Schlesewsky & Schlesewsky, 2019; Bornkessel-Schlesewsky, Schlesewsky, Small, & Rauschecker, 2015).
The N400-as-model-updating account by Bornkessel-Schlesewsky and Schlesewsky (2019) posits that N400 amplitude reflects precision-weighted prediction errors and, hence, the consequences of a prediction error for model updating rather than prediction-error-related activity per se. Initially, the notion of precision-weighting was introduced to account for N400 modulations via the relevance (cue validity) of a particular feature within a given language. The results of the present study provide evidence to suggest that this notion could be extended to the varying validity of a cue according to the specific context. Interference contexts reduce the validity of a cue, as already captured by the notion of cue diagnosticity (Nairne, 2002). A cue is diagnostic if it uniquely picks out the target and excludes the distractors (see Martin et al., 2012, for detailed discussion). Thus, interference during a retrieval operation as part of a model update is closely related to the precision of future predictions.
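The relation between interference and cue diagnosticity can be illustrated with a toy sketch. It rests on a simplifying assumption of our own for illustration, namely that diagnosticity is the reciprocal of the number of items in memory that match the cue (and zero if the target itself does not match); this is not a definition taken from Nairne (2002) or from the present experiment. The function name `cue_diagnosticity` and the example nouns are likewise hypothetical and are not drawn from our materials.

```python
def cue_diagnosticity(cue, target, memory):
    """Toy measure (an assumption for illustration): 1 / number of
    memory items matching the cue, or 0.0 if the target does not match."""
    feature, value = cue
    matches = [item for item in memory if item.get(feature) == value]
    if target not in matches:
        return 0.0
    return 1.0 / len(matches)


# Hypothetical context nouns; the gender cue stems from a masculine article.
gender_cue = ("gender", "masc")

# Low interference: only the target matches the article's gender.
low_interference = [
    {"noun": "Löffel", "gender": "masc"},  # target
    {"noun": "Gabel", "gender": "fem"},    # distractor
]

# High interference: target and distractor both match the article's gender.
high_interference = [
    {"noun": "Löffel", "gender": "masc"},  # target
    {"noun": "Teller", "gender": "masc"},  # distractor
]

low = cue_diagnosticity(gender_cue, low_interference[0], low_interference)
high = cue_diagnosticity(gender_cue, high_interference[0], high_interference)
print(low, high)
```

Under this assumption, the gender cue is fully diagnostic in the low interference context and only half as diagnostic in the high interference context, which mirrors the reduced precision we attribute to predictions formed under interference.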
Finally, the present findings provide further converging support for the notion that predictive processing during language comprehension may, at least under certain circumstances, lead to the anticipation of the form of upcoming input elements. This idea was prominently put forward by DeLong, Urbach, and Kutas (2005), who reported evidence for an anticipation of the English articles "a" versus "an" on the basis of a predicted, following noun. A recent large-scale replication study failed to replicate this effect at the position of the article (Nieuwland et al., 2018), thus leading to a controversial discussion of whether the language comprehension system indeed predicts upcoming form-based features. By showing effects of prediction and interference at an article (i.e. a function word that does not vary in its compatibility with the context unless viewed in conjunction with the following noun), our study supports the notion that form-based predictions can indeed be generated.
This adds further support to a growing body of literature in languages other than English, e.g. (noun-)gender-based N400 effects at the position of pre-nominal articles in spoken Spanish (Wicha, Bates, Moreno, & Kutas, 2003; but see Wicha, Moreno, & Kutas, 2004, for a P600 effect in a similar design) and pre-nominal adjectives in Polish (Szewczyk & Schriefers, 2013), and handshape-based N400 effects prior to the onset of a critical sign in German Sign Language (Hosemann, Herrmann, Steinbach, Bornkessel-Schlesewsky, & Schlesewsky, 2013). By further demonstrating that these pre-nominal prediction effects are modulated by a range of influences that have not been examined to date (most notably interference, but also the interaction of article and noun CP), our study provides an initial indication as to why previous studies may have shown apparently contradictory results.

Conclusion
The results of the present ERP study suggest that prediction outweighs interference. Under high predictability, interference effects do not arise, because the sensory input at the retrieval site is "explained away" in the sense of predictive coding. Hence, retrieval is not necessary, thus eliminating the possibility of interference. We conclude that interference may play a crucial role in a hierarchically organized, cortical predictive coding architecture for language, as it influences the complexity of model updating and the precision of future predictions, particularly when predictability is low.

[Table note] Model names reflect the predictors used: sag = sagittality, lat = laterality, interf = interference, dist = target distance, artCP = article cloze probability, nounCP = noun cloze probability.

[Figure legend] Conditions: high interference / long target distance; low interference / long target distance; high interference / short target distance; low interference / short target distance.