## Abstract

During perception, the brain combines information received from its senses with prior information about the world (von Helmholtz, 1867) – a process whose neural basis is still unclear. If sensory neurons represent posterior beliefs in a Bayesian inference process, then they, just like the beliefs themselves, must depend both on sensory inputs and on prior information. We derive predictions for how prior knowledge relates a neuron’s stimulus tuning to its response covariability in a way specific to the psychophysical task performed by the brain, and for how this covariability arises from both feedforward and feedback signals. We show that our predictions are in agreement with existing measurements. Finally, we demonstrate how to use neurophysiological measurements to reverse-engineer information about the subject’s internal beliefs about the structure of the task. Our results reinterpret neural covariability as a signature of Bayesian inference and provide new insights into its cause and its function.

## Introduction

At any moment in time, the sensory information entering the brain is insufficient to give rise to our rich perception of the outside world. To compute those rich percepts from incomplete and noisy inputs, the brain has to employ prior experience about which causes are most likely responsible for a given input (von Helmholtz, 1867). Mathematically, this process can be formalized as probabilistic inference in which posterior beliefs about the outside world (our perception), are computed as the product of a likelihood function (based on sensory inputs) and prior expectations. While there is ample empirical evidence that human behavior is consistent with such probabilistic computations (reviewed in (Pouget et al., 2013; Ma and Jazayeri, 2014)), how these computations are implemented in the brain is far from clear. Our work builds on the previous observation that these Bayesian computations map naturally onto a cortical architecture in which feedforward (bottom-up) pathways communicate the information in the likelihood function about the sensory inputs, feedback (top-down) pathways communicate prior expectations, and cortical sensory neurons compute posterior beliefs about the variables that they represent (Mumford, 1992; Lee and Mumford, 2003). While it is conceptually straightforward to investigate the feedforward pathway by varying the external stimulus in a way controlled by the experimenter and recording neural responses and behavior (reviewed in (Parker and Newsome, 1998)), it is less obvious how to probe the feedback influences on sensory neurons without control of internal representations (see Figure 1). There are two principal ways to overcome this challenge: correlational studies that rely on changes to internal representations over natural development (Berkes et al., 2011), and – as we describe below – causal studies that affect internal representations in an experimenter-controlled way.

In the first part of this paper, we describe a general hypothesis of ‘posterior coding’ that relates firing rates directly to Bayesian inference with few assumptions. From this hypothesis we derive relationships between sensory neurons’ stimulus tuning and their (co-)variability while the experimenter keeps the external stimulus constant. Importantly, those relationships are specific to the task context defined by the experimenter and thereby allow interventional tests of the predictions. A comparison of our *task-specific* predictions with existing empirical studies confirms them. We further relate these predictions to the ongoing debate about the cause and interpretation of decision-related signals and response correlations in sensory cortex.

The functional implications of response variability and covariability for sensory coding have almost exclusively been analyzed and discussed in the context of classical feedforward encoding/decoding models (Zohary et al., 1994; Abbott and Dayan, 1999; Shamir and Sompolinsky, 2006; Ecker et al., 2011) (reviewed in (Kohn et al., 2016)), even when explicitly acknowledging that some of that variability may be induced by extrasensory common inputs (Ecker et al., 2014, 2016). While the classical framework enables one to compute the effect of covariability on the information contained in neural responses about the external stimulus, it makes no predictions about the structure or source of that covariability.

Our results extend those in a recent numerical study (Haefner et al., 2016) based on specific assumptions about how exactly probabilities are represented in the brain, about the stimulus tuning of the sensory neurons, and about the structure of the internal model. Our results further expose the analytical relationships that drive the numerical observations in that study.

In the second part of this paper, we build on insights from the first part and demonstrate a way to use recordings of sensory neurons’ responses to infer aspects of a subject’s internal, prior beliefs. In particular, we describe how to interpret them in terms of the stimulus to yield information about the subject-specific strategies in psychophysics tasks.

## Results

Our central hypothesis is of ‘posterior coding’ – that sensory neurons encode *posterior* beliefs over latent variables in the brain’s internal model (Lee and Mumford, 2003; Hoyer and Hyvärinen, 2003; Fiser et al., 2010; Haefner et al., 2016). If they do, then their responses will depend both on information from the sensory periphery (likelihood), and on relevant information in the rest of the brain (prior). In a hierarchical model, the former are communicated by feedforward connections from the periphery, and the latter are relayed by feedback connections from higher-level areas (Lee and Mumford, 2003) (Figure 2a). Many of our predictions below stem from the simple insight that any given posterior (Figure 2b–c, middle row) may arise from the combination of an uninformative prior with an informative likelihood (Figure 2b), or from the reverse (Figure 2c), implying that neurons that encode the *output* of the Bayesian computation (posteriors) will respond equivalently when they are informed by the stimulus or when they are informed by prior expectations about the stimulus.

We formalize these ideas in a hierarchical generative model (Figure 1, Figure 2a). **E** represents the directly observed variable – the sensory input, and **x** represents the variable corresponding to the recorded neural population under consideration. **I** is a high-dimensional vector representing all other internal variables in the brain that are probabilistically related to **x**. For instance, when considering the responses of a population of V1 neurons, **E** is the high-dimensional image projected onto the retina, and **x** has been hypothesized to represent the presence or absence of Gabor-like features at particular retinotopic locations (Bornschein et al., 2013) or the intensity of such features (Olshausen and Field, 1996; Schwartz and Simoncelli, 2001), though the exact nature of these variables is not important for our results. In higher visual areas, variables are likely related to the identity of objects and faces (Kersten et al., 2004). **I** represents these higher-level variables, as well as knowledge about the visual surround, task-related knowledge about the probability of upcoming stimuli, etc. There is an important distinction between the *variables* in the brain’s internal model (i.e. **x** and **I**) and the responses of neurons that encode the *distributions* of these variables via some representation ℛ (Figure 3).

In this framework, classical feedforward tuning curves (Dayan and Abbott, 2001) reflect probabilistic relationships between the variables represented by a neuron and the sensory inputs. Changes to the evidence **E** along an experimenter-defined direction *s* (e.g. rotating an image of a grating) affect the inferred probability of *P*(**x**|**E**). If the variable **x** represented by the recorded neurons is statistically dependent on *s*, then the likelihood *P*(**E**|**x**) will vary as *s* is varied. As a result, the posterior *P*(**x**|**E**) will also vary (Figure 3a), and in turn so will the neural responses representing it. The dependence of the mean of those responses on *s* gives rise to tuning curves, denoted **f**(*s*) (Figure 3b, Methods). Furthermore, for small changes in *s* around some reference point, *s* = 0, we can linearly approximate the average neural responses: **f**(*s*) ≈ **f**(0) + *s* f′. That is, the population response, **r**, changes in the f′ ≡ d**f**/d*s*-direction due to a changing posterior belief about **x**, which in turn is driven by changes in the external stimulus **E**(*s*) (Averbeck et al., 2006).
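This linearization is easy to sketch numerically. In the toy example below, the Gaussian tuning curves, the preferred stimulus values, and the tuning width are all illustrative assumptions, not quantities taken from the text:

```python
import numpy as np

# Hypothetical population of 7 neurons with Gaussian tuning to s (e.g. orientation).
prefs = np.linspace(-90, 90, 7)   # assumed preferred stimulus values (deg)
width = 30.0                      # assumed tuning width (deg)

def f(s):
    """Mean population response f(s) at stimulus value s."""
    return np.exp(-0.5 * ((s - prefs) / width) ** 2)

# Tuning-curve slope f' = df/ds at the reference point s = 0 (finite differences).
eps = 1e-4
f_prime = (f(eps) - f(-eps)) / (2 * eps)

# For small s, the mean response moves along the f'-direction:
# f(s) ≈ f(0) + s * f'.
s = 2.0
assert np.allclose(f(0) + s * f_prime, f(s), atol=1e-2)
```

The same f′-direction reappears throughout the predictions below: any small shift of the posterior mean over **x** that mimics a change in *s* moves the population response along f′.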

We now derive predictions for the effect of the *prior* on sensory responses. When a subject performs a perceptual decision-making task, the experimenter defines a distribution of stimuli *P*_{task}(**E**) used in that task. Learning a task implies an increase in the subject’s prior for *P*_{task}(**E**) as they begin to expect stimuli drawn from this distribution. In discrimination tasks, the stimulus is varied along the experimenter-defined axis *s*, and subjects must make decisions about the category of *s* by observing **E**. For example, *s* could be the orientation coherence of a grating (Bondy and Cumming, 2016), dot motion coherence (Britten et al., 1992), or the frequency of tactile stimulation (Romo and Salinas, 2001). The experimenter also determines the distribution of stimuli at a particular value of *s* (e.g. by embedding the signal in noise), as well as the distribution of *s*, *P*(*s*). Consequently, *P*_{task}(**E**) = ∫ *P*(**E**|*s*)*P*(*s*)d*s*. If the subject has completely learnt the task, the prior over **x** will correspond to the posterior averaged over the stimuli in the task (Berkes et al., 2011):

*P*_{task}(**x**) = ∫ *P*(**x**|**E**) *P*_{task}(**E**) d**E**.

Intuitively, *P*(**x**) defines a small volume of increased probability mass in **x**–space, *elongated* along a line given by the dependence of the mean of *P*(**x**|*s*) on *s* (Figure 3a).
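This elongation can be illustrated with a minimal simulation. The 2-D latent space, the linear dependence of the posterior mean on *s*, and the isotropic posterior width below are all hypothetical choices made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy posterior: a 2-D latent x whose mean moves linearly with s.
def posterior_samples(s, n):
    mean = np.array([s, 0.3 * s])            # the line traced out by the posterior mean
    return mean + 0.2 * rng.standard_normal((n, 2))

# Learned prior = posterior averaged over the task distribution P(s).
s_vals = rng.uniform(-1, 1, size=2000)       # symmetric task distribution around s = 0
prior = np.vstack([posterior_samples(s, 1) for s in s_vals])

# The prior mass is elongated along the posterior-mean line: the leading
# principal axis of the prior aligns with the direction (1, 0.3).
evals, evecs = np.linalg.eigh(np.cov(prior.T))
leading = evecs[:, np.argmax(evals)]
line = np.array([1.0, 0.3]) / np.linalg.norm([1.0, 0.3])
assert abs(leading @ line) > 0.99
```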

For illustration, consider a stimulus distribution, *P*(*s*), that is symmetric with respect to the decision-boundary, *s* = 0, i.e. training with an equivalent number of trials for each signal level for both choices. This induces a prior in the brain that is symmetric along this line. Many experiments contain a fraction of ‘zero-signal’ trials in which the average stimulus is uninformative about the correct decision (Britten et al., 1996; Nienborg et al., 2012); that is, the likelihood is symmetric with respect to the two categories. If both categories are equally likely *a priori*, then performing exact inference in these trials will yield a symmetric posterior (see Figure 4a for an example). However, inference in the brain is at best approximate, both in terms of computation and in terms of representation. Hence on any one trial, the actual likelihood and prior used by the brain deviate from the correct ones. The likelihood varies as the result of noise in the stimulus and because of noise in the afferent pathway. The prior varies if the subject erroneously assumes serial dependencies between trials (Fischer and Whitney, 2014), or if the subject develops a belief about the value of *s* over the course of each trial (Haefner et al., 2016).

Trial-to-trial changes in the likelihood entail trial-to-trial changes in the posterior that lie primarily along the line along which most of the prior mass is concentrated (Figure 4b; see also Figure S1). Furthermore, changes in the subject’s internal beliefs about *s* – both within and across trials – will by definition cause a shift in the posterior mass along the same line, this time through the prior (Figure 4c; see also Figure S2). At the same time, any changes along this line entail changes in the neural responses along the f′–direction – at least to a linear approximation as explained above (Figure 4d). Intuitively, this means that both variation in the stimulus and variation in the subject’s beliefs about the stimulus are reflected in changes in neural responses along f′. The consequence is increased covariability proportional to f′f′^{T}. Dividing both sides by the response variability, task-dependent noise correlations are predicted to be proportional to the product of the neural sensitivities: Δcorr(*r*_{i}, *r*_{j}) ∝ *d*′_{i} *d*′_{j}, where *d*′_{i} = f′_{i}/*σ*_{i} (using d-prime to measure sensitivity). This predicted proportionality has two direct implications: first, performing a task should most change the noise correlation between those pairs of neurons that are the most informative for this specific task, i.e. for which *d*′_{i} *d*′_{j} has the largest magnitude. Second, this change should be positive for neurons with the same task-specific selectivity, i.e. that both increase or both decrease their activity in response to a stimulus predictive of a particular choice, and negative for those with opposite preferences. This is exactly the correlation structure observed in the empirical data recorded from primary visual cortex while a monkey was performing a coarse orientation discrimination task (Bondy and Cumming, 2016).
Furthermore, it explains and generalizes numerical results generated for the specific case of a neural sampling-based representation (Haefner et al., 2016) to a wide range of representations including neural sampling and probabilistic population codes (Tajima et al., 2016; Hoyer and Hyvärinen, 2003; Haefner et al., 2016; Buesing et al., 2012; Pecevski, 2011; Savin and Denève, 2014).
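A toy simulation makes the predicted structure concrete. Here we assume, purely for illustration, that a scalar belief about *s* fluctuates across zero-signal trials and feeds back onto the population along f′, on top of independent private noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative tuning slopes at the decision boundary (one entry per neuron).
f_prime = np.array([1.0, 0.5, -0.3, -1.2, 0.0])
n_trials = 200_000

# Zero-signal responses: a fluctuating belief about s projected along f',
# plus independent private noise of unit variance.
belief = 0.5 * rng.standard_normal(n_trials)
r = belief[:, None] * f_prime[None, :] + rng.standard_normal((n_trials, f_prime.size))

# Predicted covariance: a rank-one f' f'^T component on top of private noise.
cov = np.cov(r.T)
assert np.allclose(cov, 0.25 * np.outer(f_prime, f_prime) + np.eye(5), atol=0.02)

# Correlations follow the sign rule: positive for same task preference
# (neurons 0 and 1), negative for opposite preference (neurons 0 and 3).
corr = np.corrcoef(r.T)
assert corr[0, 1] > 0 and corr[0, 3] < 0
```

Note that the untuned neuron (f′ = 0) picks up no task-dependent correlation at all, matching the prediction that the largest changes occur for the most task-informative pairs.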

We emphasize that our predictions only describe how learning a task-specific prior *changes* response correlations, and make no predictions about correlations induced by the prior that the brain has learnt for natural images (Olshausen and Field, 2004; Berkes et al., 2011), or those that are the result of specific connectivity patterns between neurons. One strategy to experimentally test the task-specific predictions is to hold the stimulus constant while switching between two comparable tasks a subject is performing, predictably altering their task-specific prior (Methods). The difference in neural responses to zero-signal stimuli will isolate the task-dependent component to which the above predictions apply (Figure 5b). At least two existing studies have used a similar approach (Cohen and Newsome, 2008; Bondy and Cumming, 2016), and found changes in the correlation structure consistent with our predictions (discussed in (Haefner et al., 2016)). A related approach is to compare the amount of correlated variability in the current task’s direction with that in other ‘hypothetical’ tasks, which is possible once the neurons’ tuning curves in those other contexts have been measured (Figure 5c–e). A third strategy is to *statistically* isolate the top-down component of neural variability within a single task using a sufficiently powerful regression model. A recent study (Rabinowitz et al., 2015) used this type of approach to infer the primary top-down ‘modulators’ of V4 responses in a change-detection task (Cohen and Newsome, 2009), and found that the dominant modulator had projections to each neuron proportional to the neuron’s *d*′, implying correlated variability in the population proportional to **d**′**d**′^{T} (their data replotted in Figure 5f).

In addition to making empirically testable predictions for the influence of top-down signals on neural responses, the probabilistic inference framework provides a normative explanation for their existence. While in the classic feedforward framework decision-related signals contaminate the sensory evidence and decrease behavioral performance (Wimmer et al., 2015), here they serve the function of communicating to a sensory neuron knowledge derived from stimuli at earlier points in time, or any other relevant information from the brain’s complex internal model. Consider the case of a dynamic stimulus in which the noise obscuring the fixed signal is dynamically redrawn over the course of the trial. Given the knowledge that the underlying signal has not changed, the brain’s posterior belief about the signal should integrate information over all stimulus frames presented up to that moment. At any point in time, this belief over the previous stimulus frames acts as a prior that is to be combined with the likelihood representing the next stimulus frame. Communicating that prior to sensory neurons allows them to take the information provided by previous stimulus frames into account and not just rely on the current inputs (Figure 5f). Interestingly, the *d*′*d*′ correlations induced through top-down signals here have the same shape as the information-limiting correlations previously described (Moreno-Bote et al., 2014). However, unlike in the feedforward case where these correlations limit information (Moreno-Bote et al., 2014), here they are induced through feedback signals that reflect prior beliefs about the stimulus, e.g. from earlier frames in the trial (Figure 5g), or due to the subject’s internal beliefs going into the trial. In general, differential correlations reduce information only when they are induced by variability unrelated to the stimulus (i.e. actual noise), and not if they are induced by prior knowledge about the stimulus.

We next ask what the implications of learning the task-specific sensory prior are for decision-related signals in sensory neurons (Parker and Newsome, 1998; Nienborg et al., 2012). Under the assumption that the behavioral decision of the subject is based on the posterior belief represented by the neurons under consideration, the average posterior preceding choice 1 will have more mass favoring choice 1, and the average posterior preceding choice 2 will have more mass favoring choice 2, even if the average posterior across all trials is symmetric with respect to the decision boundary. Since the difference in the corresponding mean responses is proportional to the tuning curve slope vector f′, it follows that CTA_{i} ∝ f′_{i}, where CTA_{i} is the ‘choice triggered average,’ or difference between neuron *i*’s mean response preceding choice 1 and its mean response preceding choice 2. This prediction relates the dependence of a neuron’s response on the external stimulus at the category boundary to the dependence of its response on the choice *given a fixed stimulus*. In fact, when dividing both sides of this proportionality by the standard deviation of the neuron’s response, *σ*_{i}, one obtains a prediction for the relationship between a neuron’s choice probability (CP) and its neural sensitivity: CP_{i} − 1/2 ∝ f′_{i}/*σ*_{i} = *d*′_{i} (Figure 5a). Many empirical studies have found such a relationship between a neuron’s CP and its neurometric sensitivity (Nienborg et al., 2012). Interestingly, the classic feedforward-only framework makes the same prediction as the probabilistic inference framework when the decoding weights are linear optimal (Haefner et al., 2013). Therefore, this prediction alone cannot distinguish between the classic feedforward framework and the probabilistic inference framework.
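The CTA and CP predictions can be checked in the same kind of toy model as above; the model itself and the readout of the choice from the sign of the fluctuating belief are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

f_prime = np.array([1.0, 0.5, -0.3, -1.2, 0.0])   # illustrative tuning slopes
n_trials = 50_000
belief = 0.5 * rng.standard_normal(n_trials)
r = belief[:, None] * f_prime[None, :] + rng.standard_normal((n_trials, 5))
choice = belief > 0                                # decision read out from the belief

def choice_probability(x, choice):
    """Area under the ROC curve separating responses by choice (Mann-Whitney)."""
    a, b = x[choice], x[~choice]
    ranks = np.argsort(np.argsort(np.concatenate([a, b]))) + 1
    u = ranks[: a.size].sum() - a.size * (a.size + 1) / 2
    return u / (a.size * b.size)                   # = P(a > b) for random pairs

cps = np.array([choice_probability(r[:, i], choice) for i in range(5)])

# CP − 1/2 tracks d'_i = f'_i / σ_i: its sign follows the task preference,
# and an untuned neuron (f' = 0) shows no choice-related signal.
assert cps[0] > 0.5 and cps[3] < 0.5
assert abs(cps[4] - 0.5) < 0.02
```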

### Reverse-engineering the internal model

We have shown that internal beliefs about the stimulus induce corresponding structure in the correlated variability of sensory neurons’ responses. Conversely, this means that the statistical structure in sensory responses can be used to infer properties of these beliefs.

The task structure of a simple discrimination task as discussed above determines the only task-relevant belief (which of two target stimuli is the better explanation for the external inputs). However, more complicated tasks may involve inference over more than one variable, and therefore more than one task-relevant belief. For instance, a task in which the categories to be discriminated can vary from trial to trial involves inference both over the correct task and over the correct choice. Even if a pre-trial cue indicates the correct task, the cue may not be completely reliable, or the subject may not be completely certain about the cue (Cohen and Newsome, 2008; Sasaki and Uka, 2009). This uncertainty may be about the task parameters (e.g. the specific target orientation, or spatial frequency), or due to confusion with a previously learnt task. If those task-related uncertainties are sufficiently large, trial-to-trial variability in the associated beliefs will lead to measurable changes in the statistical structure of sensory responses (Figure 6a), as well as a decrease in behavioral performance.

Importantly, the probabilistic inference framework also suggests an intuitive method for interpreting top-down sources of covariability. As described above, tuning curves have a general probabilistic interpretation in terms of the statistical dependence between *s* and the variable(s) **x** represented by a population. As responses are assumed to encode the posterior over **x**, it follows that variability in **x** – whether due to top-down or bottom-up sources – may be understood in terms of the same stimulus parameters (i.e. *s* or **E**) to which the neurons are tuned. For example, top-down modulators of neurons that are tuned to visual orientation may themselves be understood, in part, as varying prior beliefs about orientation.

In order to demonstrate the usefulness of this approach, we used it to infer the structure of an existing neural-sampling-based probabilistic inference model for which the ground truth is known (Haefner et al., 2016). In the simulated task, subjects had to perform a coarse orientation discrimination task either between a vertical and a horizontal grating (cardinal context), or between a −45° and a +45° grating (oblique context). The model was cued to the correct context before each trial, but had remaining uncertainty about the correct task context corresponding to an 80%–20% prior. The model simulates the responses of a population of primary visual cortex neurons with oriented receptive fields. Since the relevant stimulus dimension for this task is orientation, we sorted the neurons by preferred orientation. The resulting noise correlation matrix – computed for *zero-signal trials* – has a characteristic structure in qualitative agreement with empirical observations (Figure 6b) (Bondy and Cumming, 2016). The correlation matrix has five significant eigenvalues (Figure 6d) corresponding to five eigenvectors (Figure 6c). Each of these eigenvectors (equivalent to the principal components of the population activity) represents one direction in which the trial-by-trial variability in the neural responses is larger than expected. Knowing the stimulus selectivity of each neuron, i.e. how the response of each neuron depends on variables in the external world, allows us to interpret the eigenvectors in terms of variables in the external world. For instance, the elements of the eigenvector associated with the largest eigenvalue (blue in Figure 6c) are largest for neurons with vertically oriented receptive fields, and negative for those neurons with preferred horizontal orientation.
Finding such an eigenvector in empirical data therefore indicates that there is trial-to-trial variability in the subject’s internal belief (represented by the rest of the brain and communicated as a prior on the sensory responses) about whether “there is a vertical grating and not a horizontal grating” – or vice versa – in the stimulus. Recall that the external stimulus was fixed, i.e. that this variability is due to variability in the internal beliefs, not the external stimulus. Knowing the stimulus-dependence of the neurons’ responses allows us to interpret the abstract statistical structure in neural covariability in terms of the stimulus space defined by the experimenter. Equally, one can interpret the eigenvector corresponding to the third-largest eigenvalue (yellow in Figure 6c–d) as corresponding to the belief that a +45° grating is being presented, but not a −45° grating, or vice versa. This is the correct axis for the wrong (oblique) context, indicating that the subject maintained some uncertainty about which is the correct task context across trials (see Methods for the interpretation of the other eigenvectors shown in Figure 6c).
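The reverse-engineering logic can be sketched in simulation. The cosine orientation tuning, the two fluctuating beliefs (standing in for the cued cardinal context and the residual oblique context), and all magnitudes below are hypothetical stand-ins for the sampling model of (Haefner et al., 2016):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical V1-like population: preferred orientations tile [0°, 180°).
prefs = np.linspace(0, 180, 32, endpoint=False)

def belief_axis(theta):
    """Assumed projection of the belief 'grating at theta (vs. orthogonal)'."""
    return np.cos(2 * np.deg2rad(prefs - theta))   # cosine tuning, period 180°

cardinal = belief_axis(90)   # vertical vs. horizontal
oblique = belief_axis(45)    # +45° vs. −45°

# Zero-signal trials: strong fluctuations of the (cued) cardinal belief,
# weaker fluctuations of the wrong-context oblique belief, private noise.
n = 50_000
r = (1.0 * rng.standard_normal((n, 1)) * cardinal
     + 0.4 * rng.standard_normal((n, 1)) * oblique
     + 0.5 * rng.standard_normal((n, prefs.size)))

# The top two eigenvectors of the covariance recover the two belief axes
# (up to sign), which can then be labeled in stimulus (orientation) space.
evals, evecs = np.linalg.eigh(np.cov(r.T))
align = lambda a, b: abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
assert align(evecs[:, -1], cardinal) > 0.99
assert align(evecs[:, -2], oblique) > 0.99
```

Because each neuron’s preferred orientation is known, the recovered eigenvectors can be read out directly in orientation space – the step that turns abstract covariance structure into statements about the subject’s beliefs.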

Maintaining this uncertainty is the optimal strategy from the subject’s perspective given their imperfect knowledge of the world. However, when compared to certain (perfect) knowledge, it decreases behavioral performance on the actual task defined by the experimenter. In the probabilistic inference framework, behavioral performance is optimal when the internal model learnt by the subject exactly corresponds to the experimenter-defined one. An empirical prediction, therefore, is that eigenvalues corresponding to the correct task-defined stimulus dimension will increase with learning, while eigenvalues representing other dimensions should decrease. While no study has analyzed data in this framework, we know that the first and third eigenvalues must initially increase during task learning simply because task-dependent correlations can by definition only emerge over the course of learning. At the same time, the third eigenvalue should decrease again at some point since it represents uncertainty over the correct task context, which is presumably decreasing with learning. A previous study reported a decrease in average noise correlations due to learning (Gu et al., 2011). In our analysis, this would correspond to a decrease in the second eigenvalue (average noise correlations are captured by the red eigenvector since it is approximately constant).

Much research has gone into inferring latent variables that contribute to neural responses (Cunningham and Yu, 2014; Archer et al., 2014; Kobak et al., 2016). Our predictions in the context of the probabilistic inference framework suggest that at least some of these latent variables can usefully be characterized as internal beliefs. Importantly, our framework suggests that the coefficients with which each latent variable influences each of the recorded sensory neurons can be interpreted in the stimulus space using knowledge of the stimulus-dependence of each neuron’s tuning function (Figure 6c).

## Discussion

We have derived task-specific, neurophysiologically testable, predictions within the mathematical framework of probabilistic inference (Ma and Jazayeri, 2014; Pouget et al., 2013; Fiser et al., 2010; Knill and Pouget, 2004; Kersten et al., 2004). Our assumption that sensory neurons represent posterior beliefs, not likelihoods, means that sensory responses do not just represent information about the external stimulus but also include information about the brain’s expectations about this stimulus (Lee and Mumford, 2003). By treating task-training as an experimenter-controlled perturbation of the brain’s expectations (part of the internal model), we have derived predictions for how neural responses should change as a result of this perturbation. Our derivation makes only minimal assumptions about the relationship between neural responses and posterior beliefs, making it applicable to a wide range of proposed neural implementations of probabilistic inference (Lee and Mumford, 2003; Tajima et al., 2016; Hoyer and Hyvärinen, 2003; Haefner et al., 2016; Buesing et al., 2012; Pecevski, 2011; Savin and Denève, 2014). Our approach has allowed us to sidestep two major challenges: that the brain’s internal model is currently unknown, and that there is no consensus on how neurons represent probabilities (Pouget et al., 2013; Fiser et al., 2010). While the presented theoretical predictions are novel, they are in agreement with a range of prior (Cohen and Newsome, 2008; Law and Gold, 2008; Gu et al., 2011; Rabinowitz et al., 2015) and new (Bondy and Cumming, 2016) empirical findings. Finally, we have used this framework to show how aspects of the low-dimensional structure in the observed covariability can be used to reverse engineer the structure of the internal beliefs that vary on a trial-to-trial basis.

The nature of our predictions directly addresses several debates in the field. First, they provide a rationale for the apparent ‘contamination’ of sensory responses by top-down decision signals (Nienborg and Cumming, 2009; Wimmer et al., 2015; Ecker et al., 2016; Rabinowitz et al., 2015). In the context of our framework, top-down signals allow sensory responses to incorporate stimulus information from earlier in the trial, not reflecting the decision per se but integrating information about the outside world (Nienborg and Roelfsema, 2015). Second, this dynamic feedback of feedforward stimulus information from earlier in the trial induces choice probabilities that are the result of both feedforward and feedback components (Nienborg and Cumming, 2009, 2014; Haefner et al., 2016). Third, the same process introduces correlated sensory variability that appears to be information-limiting (Moreno-Bote et al., 2014) but is not. Whether *f*′*f*′–covariability increases or decreases information depends on its source: if the latent variable driving it contains information about the stimulus, it adds information; if it is due to noise (Kanitscheider et al., 2015), then it reduces it.

Furthermore, the assumption that sensory responses represent posterior beliefs formalizes previous ideas and agrees with empirical findings about the top-down influence of experience and beliefs on sensory responses (von der Heydt et al., 1984; Lee and Mumford, 2003; Nienborg and Cumming, 2014). It also relates to a large literature on association learning and visual imagery (reviewed in (Albright, 2012)). In particular, the idea of ‘perceptual equivalence’ (Finke, 1989) reflects our starting point that the very same posterior belief (and hence the same percept) can be the result of different combinations of sensory inputs and prior expectations. In a discrimination task, for instance, there are three distinct associations inducing correlations. First, showing the same input many times induces positive correlations between sensory neurons responding to the same input. Second, presenting only one of two possible inputs induces negative correlations between neurons responding to different inputs. Third, keeping the input constant within a trial induces positive auto-correlations.

It seems plausible that only a subset of sensory neurons actually represent the output of the hypothesized probabilistic computations (posterior), while others represent information about necessary ‘ingredients’ (likelihood, prior), or carry out other auxiliary functions (Pecevski, 2011). Since our work also shows how to generate task-dependent predictions for those ingredients, it can serve as a tool for a hypothesis-driven exploration of the functional and anatomical diversity of sensory neurons.

In deriving the predictions for changes in the task-specific correlations we have implicitly assumed that the feedforward encoding of sensory information, i.e. the likelihood *P*(**E**|**x**), remains unchanged between the compared conditions. This is well-justified for lower sensory areas in adult subjects (Hensch, 2005), or when task contexts are switched on a trial-by-trial basis (Cohen and Newsome, 2008). However, it is not necessarily true for higher cortical areas (Li and DiCarlo, 2008), especially when the compared conditions are separated by long periods of task (re)training. In those cases, changing sensory statistics may lead to changes in the feedforward encoding, and hence in the nature of the represented variable **x** (Ganguli and Simoncelli, 2014; Wei and Stocker, 2015).

Previous work has demonstrated the possibility of using *behavioral* judgements to infer the shape of a subject’s prior (Houlsby et al., 2013). Our results are complementary to behavioral methods, but have the advantage that the amount of information that can be collected in neurophysiology experiments far exceeds that in psychophysical studies.

The detail with which the internal beliefs can be recovered from the statistical structure in neurophysiological recordings is primarily limited by experimental techniques. Much current research is aimed at developing those techniques and at extracting the latent structure in the resulting recordings. For illustration, we used principal component analysis in Figure 6, implicitly assuming linear effects of varying beliefs on the sensory population (Methods) and orthogonality of their directions. To capture nonlinear effects of the prior, and to infer non-orthogonal causes, more sophisticated tools for inferring latent structure in sensory responses will be required (Cunningham and Yu, 2014). Importantly, our work suggests a way to interpret this structure, and makes predictions about how it should change with learning and attention.

## Methods

### Definition of tuning curves

Most generally, one can think of the process of encoding the posterior as a functional *R* that maps from a distribution over **x** to a distribution of neural responses: *P*(**r**) = *R*[*P*(**x**)] (Figure 1). We require that the mean response $\bar{\mathbf{r}}$ changes smoothly as *P*(**x**) changes (where $\bar{y}$ denotes the mean of *y* across trials), which allows us to use linear approximations of tuning functions. We define the tuning function of neuron *i* as the neuron’s mean response across trials within a specific task context as **E** is changed with *s*:

$$f_i(s) \equiv \langle r_i \rangle_{P(\mathbf{r}\,|\,s)},$$

where *P*(**x**|*s*) ≡ *∫ P*(**x**|**E**, **I**)*P*(**E**|*s*)*P*(**I**) d**E** d**I**.
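Operationally, this tuning function is just a trial average of each neuron’s response at each stimulus level. A minimal sketch with synthetic Poisson responses (the stimulus grid and rate function are illustrative assumptions, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experiment: stimulus s on a grid, Poisson spike counts
# whose rate depends on s through a Gaussian tuning shape that is
# unknown to the analyst.
s_grid = np.linspace(-1.0, 1.0, 9)
true_rate = lambda s: 5.0 + 20.0 * np.exp(-(s - 0.2) ** 2 / 0.1)

n_trials = 500
f_hat = np.empty_like(s_grid)
for k, s in enumerate(s_grid):
    counts = rng.poisson(true_rate(s), size=n_trials)
    f_hat[k] = counts.mean()  # f_i(s): mean response across trials at this s

# The empirical tuning curve tracks the underlying rate to within
# the trial-averaging noise.
print(np.max(np.abs(f_hat - true_rate(s_grid))))
```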

### Prediction for the difference between comparable tasks

The magnitude of task-dependent response variability depends on the magnitude of the trial-to-trial changes in beliefs about *s*, and on the strength and shape of the learned prior along the task-relevant stimulus dimension. Two arbitrary tasks will in general differ in these aspects as well as in the intrinsic covariance of responses to the zero-signal stimulus. We call two tasks ‘comparable’ when they agree in both the magnitude of the prior and the intrinsic response covariance, as can reasonably be expected, for instance, in rotationally symmetric situations where all that changes between the tasks is the angle (Bondy and Cumming, 2016) or direction (Cohen and Newsome, 2008) of the discrimination boundary while the zero-signal stimulus stays the same. In that case the strength of the respective *f*′*f*′–component can be assumed to be the same and hence, the intrinsic covariability can be subtracted out:

$$C^{(1)}_{ij} - C^{(2)}_{ij} \propto f'^{(1)}_i f'^{(1)}_j - f'^{(2)}_i f'^{(2)}_j,$$

where superscripts denote the task. That is, $f'^{(1)}_i$ denotes the slope of neuron *i*’s tuning curve with respect to the discrimination axis in task 1, measured at *s* = 0. Note that two fine discrimination tasks (e.g. orientation discrimination around the vertical and the horizontal axes, respectively) are not necessarily ‘comparable’ since the two tasks differ in their zero-signal stimulus (a vertical and a horizontal grating, respectively), which may yield different intrinsic covariability.
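This prediction can be checked numerically. The sketch below generates zero-signal responses in two ‘comparable’ tasks that, by construction, share the same belief variance λ and the same intrinsic noise, and verifies that the difference of the measured covariance matrices matches the λ-scaled difference of the *f*′*f*′ outer products (all parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_trials = 40, 20000

# Assumed tuning-curve slopes at s = 0 in the two tasks, e.g. two
# discrimination boundaries around the same zero-signal stimulus.
theta = np.linspace(0, 2 * np.pi, n_neurons, endpoint=False)
fp1 = np.cos(theta)  # f'^(1): slopes w.r.t. the task-1 boundary
fp2 = np.sin(theta)  # f'^(2): slopes w.r.t. the task-2 boundary

lam = 0.5        # variance of trial-to-trial belief fluctuations
intrinsic = 0.3  # intrinsic noise variance, identical in both tasks

def simulate(fp):
    """Zero-signal covariance: belief fluctuations along fp plus noise."""
    b = rng.normal(0.0, np.sqrt(lam), size=(n_trials, 1))
    noise = rng.normal(0.0, np.sqrt(intrinsic), size=(n_trials, n_neurons))
    return np.cov(b * fp + noise, rowvar=False)

diff_measured = simulate(fp1) - simulate(fp2)
diff_predicted = lam * (np.outer(fp1, fp1) - np.outer(fp2, fp2))
print(np.abs(diff_measured - diff_predicted).max())  # small: sampling noise only
```

The intrinsic covariance cancels in the subtraction, leaving only the task-specific *f*′*f*′ structure, as in the equation above.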

### Inferring internal model

Complex tasks (e.g. those switching between different contexts), or incomplete learning (e.g. uncertainty about fixed task parameters), will often induce variability in multiple internal beliefs about the stimulus. Assuming that this variability is independent between the beliefs, we can write the observed covariance between two neurons as

$$C_{ij} = \sum_k \lambda^{(k)} \Delta f^{(k)}_i \Delta f^{(k)}_j + \Sigma^{(0)}_{ij}.$$

Here, each vector $\Delta\mathbf{f}^{(k)}$ corresponds to the change in the population response corresponding to a change in internal belief *k*. The coefficients $\lambda^{(k)}$ correspond to the variance of the trial-to-trial variability in belief *k*, and $\Sigma^{(0)}$ represents the intrinsic covariance.
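Given candidate belief directions — here assumed orthonormal and, together with the intrinsic variance, known for simplicity — the belief variances λ⁽ᵏ⁾ can be read off by projecting the measured covariance onto those directions. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
n_neurons, n_trials = 30, 50000

# Two assumed orthonormal belief directions and their true variances.
df = np.zeros((2, n_neurons))
df[0, :15] = 1.0   # belief 1 drives the first half of the population
df[1, 15:] = 1.0   # belief 2 drives the second half
df /= np.linalg.norm(df, axis=1, keepdims=True)
lam_true = np.array([1.5, 0.6])
sigma0 = 0.2       # isotropic intrinsic variance, assumed known

beliefs = rng.normal(0.0, np.sqrt(lam_true), size=(n_trials, 2))
r = beliefs @ df + rng.normal(0.0, np.sqrt(sigma0), size=(n_trials, n_neurons))
C = np.cov(r, rowvar=False)

# For orthonormal directions: lambda^(k) = df^(k) C df^(k)^T - sigma0.
lam_hat = np.einsum('ki,ij,kj->k', df, C, df) - sigma0
print(lam_hat)  # approximately [1.5, 0.6]
```

With unknown or non-orthogonal directions, the same low-rank-plus-diagonal structure would instead be fit by factor analysis or related latent-variable methods.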

The model in our proof-of-concept simulations has been described previously (Haefner et al., 2016). In brief, it performs inference by neural sampling in a linear sparse-coding model of primary visual cortex (Olshausen and Field, 1996; Hoyer and Hyvärinen, 2003; Fiser et al., 2010). The prior is derived from an orientation discrimination task with 2 contexts – oblique orientations, and cardinal orientations – that is modeled on an analogous direction discrimination task (Cohen and Newsome, 2008). We simulated the responses of 1024 V1 neurons whose receptive fields uniformly tiled the orientation space. Each neuron’s response corresponds to a sample from the posterior distribution over the intensity of its receptive field in the input image. We simulated zero-signal trials by presenting white noise images to the model. The elements of the eigenvector corresponding to the 2nd largest eigenvalue are all approximately the same, indicating that the associated latent variable adds response variability that does not depend on the neurons’ preferred orientations. Since the recovered eigenvectors are orthogonal to each other, the eigenvalue corresponding to a constant eigenvector determines the average correlations in the population. The eigenvectors not described in the main text correspond to stimulus-driven covariability, plotted in Figure S3 for comparison.

## Acknowledgements

We thank the many colleagues with whom we have discussed this work and who have provided us with valuable feedback, in particular Alex Ecker, Matthias Bethge, Hendrikje Nienborg, Jakob Macke, Adrian Bondy, and Bruce Cumming.

## Footnotes

\* ralf.haefner{at}gmail.com