## Abstract

The effects of prior knowledge and expectations on perceptual and decision-making processes have been extensively studied. Yet the computational mechanisms underlying those effects remain controversial. Recently, using a recursive Bayesian updating scheme, unmet expectations have been shown to entail further computations, and consequently to delay perceptual processes. Here we take a step further and model these empirical findings with a recurrent cortical model, which was previously suggested to approximate Bayesian inference (Heeger, 2017). Our model-fitting results show that the cortical model can successfully predict the behavioral effects of expectation. That is, when the actual sensory input does not match the expectations, the sensory process needs additional, and consequently longer, computations to complete. We suggest that this process underlies the increase in perceptual thresholds under unmet expectations. Overall, our findings demonstrate that a parsimonious recurrent cortical model can explain the effects of expectation on sensory processes.

## Introduction

A growing body of work in the last two decades has examined whether and how prior knowledge and expectations affect perceptual processes. These studies have consistently shown that expected stimuli are detected more rapidly and accurately than unexpected stimuli. While these effects are well established at the behavioral level, the computational mechanisms that may underlie them remain relatively unclear.

The Bayesian modeling framework has been successful in accounting for a wide range of empirical data in visual perception. Numerous studies have provided evidence that perception can be modeled as an inference process in which noisy or ambiguous sensory stimuli are combined with the prior (Chalk, Seitz, & Seriès, 2010; de Lange, Heilbron, & Kok, 2018; Ernst & Banks, 2002; Kersten, Mamassian, & Yuille, 2004; Knill & Pouget, 2004; Maloney & Mamassian, 2009; Mamassian, Landy, & Maloney, 2002; Summerfield & De Lange, 2014; Teufel, Subramaniam, & Fletcher, 2013; Weiss, Simoncelli, & Adelson, 2002; Yuille & Kersten, 2006). In line with this, we recently used a behavioral paradigm and Bayesian modeling to examine whether and how prior knowledge and expectations affect perceptual processes (Urgen & Boyaci, 2021). Contrary to what is commonly suggested in the field, we found that valid expectations do not speed up perceptual processes. Instead, our findings indicate that unmet expectations lead to a delay, and consequently to longer processing times. Moreover, we showed that a recursive Bayesian updating scheme can successfully capture that behavior (Urgen & Boyaci, 2021).

The Bayesian framework is not limited to modeling human behavior. It has also gained considerable interest as a way to model brain function (e.g., Friston, 2005; Heeger, 2017; Rao & Ballard, 1999). These models are generally known as *predictive processing* models. Despite some architectural differences, at a common and fundamental level all these mechanistic cortical models suggest that information processing in the brain can be implemented via dynamic interaction between bottom-up sensory input and top-down prior knowledge. There is currently a considerable amount of evidence in line with the main assumptions of these models (e.g., Shipp, 2016). For example, an increased neural response to unexpected events has been shown consistently in several neuroimaging studies, and is interpreted as reflecting the prediction error signal. However, even though there has been an extensive effort to link empirical findings with predictive processing models, a crucial step has been overlooked in many studies. Unlike Bayesian models of behavior, the proposed mechanistic cortical models have not usually been directly tested against behavioral and neuronal data. Given the power of computational models not only in interpreting empirical data but also in providing a mechanistic understanding of information processing and even in making predictions, it is important to test these predictive processing models directly against empirical data.

Here, we propose a recurrent cortical model (Heeger, 2017) to examine the effects of expectation on perceptual processes. To this end, we modeled the behavioral findings of Urgen and Boyaci (2021), where we found that unmet expectations lead to higher temporal thresholds. As mentioned above, in this earlier work we showed, using a recursive Bayesian model, that the system needs a longer time to complete the sensory process when sensory input and expectations disagree. Here, we take a step further and examine how these effects can be modeled in the predictive processing framework. The cortical model we present here is a parsimonious one and simply assumes that the activity of neural units is an interplay between weighted responses to the bottom-up sensory input and the top-down prior. There is no specified sub-population of neural units, e.g. for prediction or error computation. Notably, within each trial of the behavioral experiment the prior is updated recursively to capture the temporal dynamics of the sensory process. Modeling the behavioral data allowed us to test whether the proposed cortical model can explain the behavioral effects of expectation. This approach also made it possible to test whether the cortical model predictions approximate the Bayesian model implemented in Urgen and Boyaci (2021).

## Methods

The cortical model we propose here is used to model the behavioral findings of Urgen and Boyaci (2021). See Figure 1 for the experimental paradigm and behavioral results. Briefly, in the behavioral experiment each trial started with a foveally presented cue, either a house or a face symbol. This predictive cue was informative about the category of the upcoming target image. Subsequently, an intact (target) image and its scrambled version were briefly shown on either side of the central fixation point, followed by new scrambled images as masks. The participants' task was to indicate the location, left or right, of the intact (target) image. The validity of the cue was set at 100%, 75%, 50%, and neutral (no expectation) in different experimental sessions. We computed temporal thresholds in neutral, congruent (expected) and incongruent (unexpected) trials under the different validity conditions. We found that incongruent trials lead to longer thresholds than congruent trials in the 75%-validity condition (Urgen & Boyaci, 2021). We also found that thresholds under the 100%-validity condition are not lower than those under the neutral condition or those in the valid trials of the other validity conditions. Hence, we concluded that valid expectations do not speed up sensory processes; instead, violations of expectations slow them down.

### Cortical Model

For a biologically plausible mechanistic model to explain these findings, we adapted a recently proposed cortical model (Heeger, 2017; Heeger & Mackey, 2019). Figure 2 outlines the model, which was composed of one input, one decision and three intermediate layers, and three category-specific feature units (representing populations of neurons) for face, house and scrambled images. We first define an energy function that the system tries to minimize:

where the indices *i* and *j* run over units and layers, respectively, and *Ŷ*_{i} are the priors. The parameters *γ*^{(j)} take values between 0 and 1 and determine the relative weights of the feedforward and prior drives, whereas the *α*^{(j)} determine the relative contributions of the layers. The energy function also depends on the unit responses and on the weights of the connections between units of different layers. Unit responses are updated by minimizing the energy function (Eq. 1) with respect to the unit responses using gradient descent, in which *a* is the inverse of a time constant and is set to 1/5. Note that the feedback and “horizontal” interactions between different units in the same layer emerge in the equations after taking the derivative of the energy function. The number of iterations, *N*, is determined by *N* = *τ*/Δ*t*, where *τ* is the duration of presentation of the images in the trial and Δ*t* determines how long each iteration lasts in the system. At the beginning of a trial (*t* = 0), intermediate-layer unit responses are randomly drawn from a normal distribution with mean 0 and a spread set by *σ*_{u}, which defines the noise in unit responses (Ma, Beck, Latham, & Pouget, 2006).
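As a rough illustration of the update described above, the following Python sketch performs gradient descent on a simplified single-layer energy with one squared feedforward term and one squared prior term, weighted by γ and 1 − γ. The function names, the single-layer form, the rounding of *N* down to an integer, and the example values are assumptions made for illustration and are not taken from the model specification; only the value *a* = 1/5 and the ratio *N* = *τ*/Δ*t* come from the text.

```python
def n_iterations(tau, dt):
    """Number of update steps in a trial, N = tau / dt (rounded down: an assumption)."""
    return int(tau // dt)


def energy(y, z, prior, gamma):
    """Simplified single-layer energy: weighted squared feedforward and prior terms."""
    feedforward = sum((yi - zi) ** 2 for yi, zi in zip(y, z))
    prior_term = sum((yi - pi) ** 2 for yi, pi in zip(y, prior))
    return gamma * feedforward + (1.0 - gamma) * prior_term


def gradient_step(y, z, prior, gamma, a=1.0 / 5.0):
    """One gradient-descent step on the energy with respect to the responses y."""
    return [
        yi - a * (2.0 * gamma * (yi - zi) + 2.0 * (1.0 - gamma) * (yi - pi))
        for yi, zi, pi in zip(y, z, prior)
    ]


def run_trial(y0, z, prior, gamma, tau, dt):
    """Iterate the update N times and record the energy after every step."""
    y = list(y0)
    energies = [energy(y, z, prior, gamma)]
    for _ in range(n_iterations(tau, dt)):
        y = gradient_step(y, z, prior, gamma)
        energies.append(energy(y, z, prior, gamma))
    return y, energies


# Hypothetical values: three units (face, house, scrambled), a face-like
# feedforward drive, a prior favouring "house", and a 200 ms presentation
# with 10 ms per iteration.
y_final, energies = run_trial(
    y0=[0.0, 0.0, 0.0], z=[1.0, 0.0, 0.0], prior=[0.25, 0.75, 0.0],
    gamma=0.7, tau=200, dt=10,
)
```

With the prior held fixed, the responses in this sketch settle at a weighted compromise between the feedforward drive and the prior, which is the role the text assigns to *γ*.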

The values of the priors, *Ŷ*_{i}, are initialized based on the *cue* and its *validity* at the beginning of a trial (*t* = 0). Later (*t* > 0), however, the priors are updated based on the responses of layer 3 units in previous iterations. This amounts to using priors that are updated over time.

### Input layer units

We defined the input stimulus, **s** = (*s*_{1}, *s*_{2}, *s*_{3}), as a three-element vector
and at each iteration we computed a noisy abstracted observation
where *σ*_{s} defines the noise level. Next, we calculated the input layer responses
where *ψ*_{i} are noise-free neuronal responses based on their tuning curves
Note that the input layer units were not subject to the energy minimization, and they did not receive feedback and prior drive.

### Prior units

At the beginning of each trial (*t* = 0) we defined the *initial prior probabilities*, *c* = (*c*_{1}, *c*_{2}, *c*_{3}), which depended on the cue and its validity. For example, in a trial under the 75%-validity condition, if the cue is a *face*,
Then we computed the activity of prior units at t = 0 as follows:
For *t* > 0, the prior unit activities were updated recursively at each iteration (within a single trial) with the past values of the unit responses. Specifically, the values of the Layer 3 unit responses in the previous iteration become the prior in the next iteration.
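The recursive scheme can be illustrated with a toy two-unit (face, house) simulation in Python. Everything specific here is an assumption made for illustration: the two-unit reduction, the zero initial responses (the model draws them randomly), the single-layer squared-error update, the mapping of cue validity onto initial priors, and the stopping criterion. The sketch only shows how feeding the previous responses back in as the next prior lets an initially misleading prior be overwritten over iterations.

```python
def iterations_to_criterion(stimulus, cue, validity, gamma=0.5, a=1.0 / 5.0,
                            criterion=0.5, max_iter=1000):
    """Count update steps until the unit matching the stimulus leads by `criterion`."""
    units = ["face", "house"]
    # Feedforward drive: 1 for the presented category, 0 otherwise (assumption).
    z = [1.0 if u == stimulus else 0.0 for u in units]
    # Initial prior: cue validity for the cued category, the remainder for the other.
    prior = [validity if u == cue else 1.0 - validity for u in units]
    y = [0.0, 0.0]
    target = units.index(stimulus)
    other = 1 - target
    for n in range(1, max_iter + 1):
        # Gradient step on gamma*(y - z)^2 + (1 - gamma)*(y - prior)^2 per unit.
        y = [
            yi - a * (2.0 * gamma * (yi - zi) + 2.0 * (1.0 - gamma) * (yi - pi))
            for yi, zi, pi in zip(y, z, prior)
        ]
        # Recursive update: the current responses become the next iteration's prior.
        prior = list(y)
        if y[target] - y[other] > criterion:
            return n
    return max_iter


congruent_75 = iterations_to_criterion("face", cue="face", validity=0.75)
incongruent_75 = iterations_to_criterion("house", cue="face", validity=0.75)
congruent_50 = iterations_to_criterion("face", cue="face", validity=0.50)
incongruent_50 = iterations_to_criterion("house", cue="face", validity=0.50)
```

In this toy setting the incongruent 75%-validity trial needs more iterations than the congruent one, while the two trial types need the same number under 50% validity. The sketch is not fitted to data and is not intended to reproduce the reported thresholds.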

### Decision

To make a decision, we calculated the sum of the face and house unit responses in the last layer (Layer 3) separately for the left and right locations (T_{LEFT}, T_{RIGHT}). A decision is then made by the model
where λ is the decision threshold (Heekeren, Marrett, Bandettini, & Ungerleider, 2004). If the above-mentioned conditions are not satisfied, a choice is made randomly.
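A decision rule of this kind might be sketched in Python as follows. The exact comparison is an assumption: here the model chooses a side when that side's summed face and house response exceeds the other side's by more than λ, and otherwise guesses. The function name, the dictionary layout of the Layer 3 responses, and the example values are hypothetical.

```python
import random


def decide(layer3_left, layer3_right, lam, rng=random):
    """Return "left" or "right" from Layer 3 responses at the two locations."""
    # Sum the face and house responses per location; scrambled units are left out.
    t_left = layer3_left["face"] + layer3_left["house"]
    t_right = layer3_right["face"] + layer3_right["house"]
    if t_left - t_right > lam:
        return "left"
    if t_right - t_left > lam:
        return "right"
    # Neither side clears the threshold: choose at random.
    return rng.choice(["left", "right"])


# Hypothetical responses: an intact image on the left, a scrambled one on the right.
left = {"face": 0.9, "house": 0.1, "scrambled": 0.0}
right = {"face": 0.1, "house": 0.1, "scrambled": 0.8}
choice = decide(left, right, lam=0.5)
```

Passing a seeded `random.Random` instance as `rng` makes the random fallback reproducible across simulated trials.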

## Results

### Model Simulations of Behavioral Data

We tested whether the cortical model can explain the observed behavioral effect. To this end, we fit the model to the observer data at the individual-participant level by optimizing three parameters: λ (the decision threshold), Δ*t* (the duration of an iteration), and the variance of the tuning curves at the input layer.

Figure 3*a* shows the simulation results for a single participant (see *Supplementary Material* for simulations of all participants). The results agree well with the empirical findings: when the cues are invalid, the curve shifts to the right in the 75%-validity condition, indicating that the cortical model also needs a longer time to detect the location of the target image in an incongruent (unexpected) trial. There was no difference between the congruent and incongruent trials in the 50% validity condition, again consistent with the empirical data.

Next, we tested whether, in the cortical model, further processing (in other words, a greater number of iterations) leads to the longer thresholds in incongruent trials. For this, we compared the number of iterations performed by the model across trial types (congruent, incongruent) and validity conditions (100%, 75%, 50%). Recall that the number of iterations, *N*, is computed by taking the ratio of the duration of that trial, *τ*, and the time it takes to complete a single iteration, Δ*t*. Figure 3*b* shows the number of iterations under each condition. The results show that *N* is greater for incongruent trials under the 75%-validity condition but not under the 50%-validity condition. Specifically, we performed a 2 (trial type: congruent, incongruent) × 2 (validity: 75, 50) repeated-measures ANOVA to investigate the effects of expectation and validity on *N*. We found that the main effect of expectation was statistically significant (*F* (1,7) = 18.511, *p* = 0.004), but there was no main effect of validity and no interaction (*F* (1,7) = 0.299, *p* = 0.602; *F* (1,7) = 0.738, *p* = 0.419). The number of iterations was significantly greater in incongruent trials than in congruent trials in the 75%-validity condition (*t* (7) = 3.220, *p* = 0.015). However, there was no significant difference between the congruent and incongruent trials in the 50%-validity condition (*t* (7) = 2.047, *p* = 0.08), and no significant differences between the 100%-validity condition and the congruent trials of the 50% (*t* (7) = -1.829, *p* = 0.110) and 75% validity conditions (*t* (7) = -1.247, *p* = 0.253). These results are consistent with the empirical data, and show that further processing, and thus a longer time, is required to converge on a decision when the expectations are not met.

### Timecourse of unit responses

Figure 4 shows the unit responses at each layer of the cortical model for a single trial. The trial is from the 75%-validity condition in which a face cue is presented. In all layers, at the beginning of the trial, face units respond more strongly than other units whether the presented image is congruent or incongruent with the cue. However, the responses change over the iterations of the trial. Specifically, in the congruent trials (i.e. when the presented image is a face), face units continue to be the most responsive units until the end of the trial. In the incongruent trials (i.e. when the presented image is a house), by contrast, face unit responses decrease while house unit responses gradually increase throughout the trial. The comparison between the unit responses of the congruent and incongruent trials shows that the model responses (for a correct decision) are delayed in the incongruent trials compared to the congruent trials.

## Discussion

In this study we present a recurrent cortical model to explain the behavioral effects of expectation on early visual processes that we found in Urgen and Boyaci (2021). Recurrent models have been suggested to be superior for visual inference compared to models with only a feedforward architecture (van Bergen & Kriegeskorte, 2020). Model-fitting results reveal that the cortical model can successfully predict the behavioral effects of expectation. Specifically, when expectations are not met, the cortical model needs to compute more iterations, which results in longer processing, to converge on a decision. Notably, this result is in line with our previous findings from Bayesian modeling of the same data (Urgen & Boyaci, 2021), and further supports the view that additional steps of computation are responsible for the higher perceptual thresholds under unmet expectations.

There are several mechanistic cortical models whose computational constructs are analogous to those of the Bayesian framework (e.g., Friston, 2005; Heeger, 2017; Mumford, 1992; Rao & Ballard, 1999). Despite the architectural differences between these models, referred to here as predictive processing models, they all provide a compelling framework for understanding the involvement of prior knowledge in cortical information processing. In line with this, several neuroimaging findings provide strong neural evidence that top-down information coming from higher regions has a modulatory effect on the activity of early visual areas as well as higher visual processing areas (e.g. FFA) (Bar, 2004; Egner, Monti, & Summerfield, 2010; Gilbert & Sigman, 2007; Kok, Bains, van Mourik, Norris, & de Lange, 2016; Kok, Brouwer, van Gerven, & de Lange, 2013; Kok, Jehee, & De Lange, 2012; Muckli et al., 2015; Muckli & Petro, 2013; Summerfield et al., 2006; Summerfield & Koechlin, 2008). Specifically, recent neuroimaging findings showed that prior knowledge and expectations influence several information processing stages, including the early and late stages of visual processing (e.g., Alink, Schwiedrzik, Kohler, Singer, & Muckli, 2010; Egner et al., 2010; Kok et al., 2012, 2013; Richter, Ekman, & de Lange, 2018; Summerfield et al., 2006; Summerfield & Koechlin, 2008). Accordingly, recent neural evidence has been interpreted as consistent with the predictive processing account of brain function. However, the models have not usually been tested directly against empirical data, which limits their practical use and explanatory power. Our primary effort in this study was to test a predictive processing model directly against empirical data and to provide a mechanistic understanding of the effect of expectation on visual perception.

The recurrent cortical model adapted in this study is a simple and parsimonious one that includes neither subpopulations of special neural units, e.g. for error or prediction computation, nor a comparison of (low-level) sensory input and (high-level) predictions (Heeger, 2017). The model assumes that information processing in the brain can be accomplished simply by feedforward and feedback connections. Our findings show that even such a simple model can successfully account for the behavioral effects of expectation on perceptual processes. Specifically, we suggest that when we are exposed to an unexpected stimulus, there might be a change in feedforward-feedback interactions, e.g. additional neural units may become active and get involved in the process. This may in turn elicit additional processing, and consequently result in longer computations. This idea can account for why an unexpected stimulus leads to higher perceptual thresholds and a delay in sensory processes, as found in Urgen and Boyaci (2021).

## Conclusion

We contend that the cortical model we propose here offers a parsimonious explanation for the effects of expectation on sensory processes. Delays in human responses to unexpected stimuli can be explained simply by the further, and consequently longer, computations required by the system. The model simulations also agree well with a Bayesian model. From a broader perspective, the model offers a biologically plausible mechanism underpinning Bayesian perceptual inference in the brain and offers a rigorous link between behavioral and neuronal responses.

## Author Contributions

HB and BMU conceived the original study. BMU implemented the cortical model and performed the modeling with support from HB.

## Competing Interests

The authors declare no competing interests.

## Acknowledgements

This work was funded by a grant of the Turkish National Scientific and Technological Council (TÜBİTAK 217K163) awarded to HB. We thank Katja Doerschner for her valuable comments on an earlier version of the manuscript.