## Abstract

Sensory stimuli can be recognized more rapidly when they are expected. This phenomenon depends on expectation affecting the cortical processing of sensory information. However, virtually nothing is known about the mechanisms responsible for the effects of expectation on sensory networks. Here, we report a novel computational mechanism underlying the expectation-dependent acceleration of coding observed in the gustatory cortex (GC) of alert rats. We use a recurrent spiking network model with a clustered architecture capturing essential features of cortical activity, including the metastable activity observed in GC before and after gustatory stimulation. Relying both on network theory and computer simulations, we propose that expectation exerts its function by modulating the intrinsically generated dynamics preceding taste delivery. Our model, whose predictions are confirmed in the experimental data, demonstrates how the modulation of intrinsic metastable activity can shape sensory coding and mediate cognitive processes such as the expectation of relevant events. Altogether, these results provide a biologically plausible theory of expectation and ascribe a new functional role to intrinsically generated, metastable activity.

## Introduction

Expectation exerts a strong influence on sensory processing. It improves stimulus detection, enhances discrimination between multiple stimuli and biases perception towards an anticipated stimulus^{1-3}. These effects, demonstrated experimentally for various sensory modalities and in different species^{2,4-7}, can be attributed to changes in sensory processing occurring in primary sensory cortices. However, despite decades of investigations, little is known regarding how expectation shapes the cortical processing of sensory information.

While different forms of expectation likely rely on a variety of neural mechanisms, modulation of pre-stimulus activity is believed to be a common underlying feature^{8-10}. Here, we investigate the link between pre-stimulus activity and the phenomenon of general expectation in a recent set of experiments performed in gustatory cortex (GC) of alert rats^{6}. In those experiments, rats were trained to expect the intraoral delivery of one of four possible tastants following an anticipatory cue. The use of a single cue allowed the animal to predict the availability of gustatory stimuli, without forming expectations on which specific taste was being delivered. Cues predicting the general availability of taste modulated the firing rates of GC neurons. Tastants delivered after the cue were encoded more rapidly than uncued tastants, and this improvement was phenomenologically attributed to the activity evoked by the preparatory cue. However, the precise computational mechanism linking faster coding of taste and cue responses remains unknown.

Here we propose a mechanism whereby an anticipatory cue modulates the timescale of temporal dynamics in a recurrent population model of spiking neurons. In the model proposed here, neurons are organized in strongly connected clusters and produce sequences of metastable states similar to those observed during both pre-stimulus and evoked activity periods^{11-17}. A metastable state is a vector of firing rates across simultaneously recorded neurons that can last for several hundred milliseconds before giving way to the next state in a sequence. The ubiquitous presence of state sequences in many cortical areas and behavioral contexts^{18-24} has raised the issue of their role in sensory and cognitive processing. Here, we elucidate the central role played by pre-stimulus metastable states in processing forthcoming stimuli, and show how cue-induced modulations of state sequences drive anticipatory coding. Specifically, we show that an anticipatory cue affects sensory coding by decreasing the duration of metastable states and accelerating the pace of state sequences. This phenomenon, which results from a reduction in the effective energy barriers separating the metastable states, accelerates the onset of specific states coding for the presented stimulus, thus mediating the effects of general expectation. The predictions of our model were confirmed in a new analysis of the experimental data, also reported here.

Altogether, our results provide a model for general expectation, based on the modulation of pre-stimulus ongoing cortical dynamics by anticipatory cues, leading to acceleration of sensory coding.

## Results

### Anticipatory cue accelerates stimulus coding in a clustered population of neurons

To uncover the computational mechanism linking cue-evoked activity with coding speed, we modeled the gustatory cortex (GC) as a population of recurrently connected excitatory and inhibitory spiking neurons. In this model, excitatory neurons are arranged in clusters^{12,25,26} (Fig. 1a), reflecting the existence of assemblies of functionally correlated neurons in GC and other cortical areas^{27,28}. Recurrent synaptic weights between neurons in the same cluster are potentiated compared to neurons in different clusters, to account for metastability in GC^{11,16} and in keeping with evidence from electrophysiological and imaging experiments^{27-30}. This spiking network also has bidirectional random and homogeneous (i.e., non-clustered) connections among inhibitory neurons and between inhibitory and excitatory neurons. Such connections stabilize network activity by preventing runaway excitation and play a role in inducing the observed metastability^{11,12,15}.

The model was probed by sensory inputs modeled as depolarizing currents injected into randomly selected neurons. We used four sets of simulated stimuli, wired to produce gustatory responses reminiscent of those observed in the experiments in the presence of sucrose, sodium chloride, citric acid, and quinine (see Supplementary Methods for details). The specific connectivity pattern used was inferred from the presence of both broadly and narrowly tuned responses in GC^{31,32}, and the temporal dynamics of the inputs were varied to determine the robustness of the model (Supplementary Results, Sec. 1.3).

In addition to input gustatory stimuli, we included anticipatory inputs designed to produce cue-responses analogous to those seen experimentally in the case of general expectation. To simulate general expectation, we connected anticipatory inputs with random neuronal targets in the network. The peak value of the cue-induced current for each neuron was sampled from a normal distribution with zero mean and fixed variance (see Fig. S1-S2 and Supplementary Results for details), thus introducing a spatial variance in the afferent currents. This choice reflected the large heterogeneity of cue responses observed in the empirical data, where excited and inhibited neural responses occurred in similar proportions^{10} and overlapped partially with taste responses^{6,10}. Fig 1b shows two representative cue-responsive neurons in the model: one inhibited by the cue and one excited by the cue (more details and examples are reported in the Supplementary Results).
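The statistical property of the cue input described above can be illustrated with a minimal sketch. The network size `n_neurons`, the standard deviation `sigma`, and the helper name `sample_cue_peaks` are hypothetical placeholders chosen for illustration, not values or code from the model. The sketch shows only that peak currents drawn from a zero-mean normal distribution leave the population-averaged input roughly unchanged while adding spatial variance, with excited and inhibited targets in similar proportions.

```python
import random
import statistics

def sample_cue_peaks(n_neurons, sigma, seed=0):
    """Draw one peak cue current per neuron from N(0, sigma^2).

    n_neurons and sigma are illustrative placeholders, not the paper's values.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n_neurons)]

peaks = sample_cue_peaks(n_neurons=4000, sigma=0.2)

mean_peak = statistics.fmean(peaks)    # close to 0: mean input is unchanged
spatial_sd = statistics.pstdev(peaks)  # close to sigma: spatial variance is added
frac_excited = sum(p > 0 for p in peaks) / len(peaks)  # close to one half

print(f"mean={mean_peak:.3f}, sd={spatial_sd:.3f}, excited={frac_excited:.2f}")
```

In this toy setting, raising `sigma` widens the spread of currents across neurons without shifting their average, which is the sense in which the cue is said to introduce spatial variance.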

Given these conditions, we simulated the experimental paradigm adopted in awake-behaving rats to demonstrate the effects of general expectation^{6,10}. In the original experiment, rats were trained to self-administer into an intra-oral cannula one of four possible tastants following an anticipatory cue. At random trials and times during the inter-trial interval, tastants were unexpectedly delivered in the absence of a cue. To match this experiment, the simulated paradigm interleaves two conditions: in expected trials, a stimulus (out of 4) is delivered at *t* = 0 after an anticipatory cue (the same for all stimuli) delivered at *t* = −0.5 s (Fig. 1b); in unexpected trials the same stimuli are presented in the absence of the cue. Importantly, in the general expectation paradigm adopted here, the anticipatory cue is identical for all stimuli in the expected condition. Therefore, it does not convey any information regarding the identity of the stimulus being delivered.

We tested whether cue presentation affected stimulus coding. A multi-class classifier (see Methods and Fig. S3) was used to assess the information about the stimuli encoded in the neural activity, where the four class labels correspond to the four tastants. Stimulus identity was encoded well in both conditions, reaching perfect average accuracy across the four tastants after a few hundred milliseconds (Fig. 1c, across-taste average decoding accuracy). However, comparing the time course of the decoding accuracy between conditions, we found that the increase in decoding accuracy was significantly faster in expected than in unexpected trials (Fig. 1c, pink and blue curves represent expected and unexpected conditions, respectively). Indeed, the onset time of a significant decoding occurred earlier in the expected vs. the unexpected condition (decoding latency was 0.13 ± 0.01 s [mean±s.e.m.] for expected compared to 0.21 ± 0.02 s for unexpected, across 20 independent sessions; p=0.002, signed-rank=14, d.o.f.=39; inset in Fig. 1c). Similar decoding accuracies were obtained for each individual tastant separately (see Supplementary Results and Fig. S3). Thus, in the model network, the interaction of cue response and activity evoked by the stimuli results in faster encoding of the stimuli themselves, mediating the expectation effect.

To clarify the role of neural clusters in mediating expectation, we simulated the same experiments in a homogeneous network (i.e., without clusters) operating in the balanced asynchronous regime^{25,26} (Fig. 1d; intra- and inter-cluster weights were set equal, and all other network parameters and inputs were the same as for the clustered network). Even though single neurons’ responses to the anticipatory cue were comparable to the ones observed in the clustered network (Fig. 1e, Fig. S2 and Supplementary Results), stimulus encoding was not affected by cue presentation (Fig. 1f). In particular, the onset of a significant decoding was similar in the two conditions (latency of significant decoding was 0.17 ± 0.01 s for expected and 0.16 ± 0.01 s for unexpected tastes averaged across 20 sessions; p=0.31, signed-rank=131, d.o.f.=39; inset in Fig. 1f).

The anticipatory activity observed in the clustered network was robust to variations in key parameters related to the sensory and anticipatory inputs, as well as network connectivity, size and architecture (Fig. S4-6 and Supplementary Results). Furthermore, acceleration of coding depended on the patterns of connectivity of anticipatory inputs, specifically on the fact that it increased the spatial variance in the cue afferent currents (Fig. S6). In a model where the cue recruited the recurrent inhibition (by increasing the input currents to the inhibitory population), stimulus coding was decelerated (Fig. S7), suggesting a potential mechanism mediating the effect of distractors.

Overall, these results demonstrate that a clustered network of spiking neurons can successfully reproduce the acceleration of sensory coding induced by expectation and that removing clustering impairs this function.

### Anticipatory cue speeds up the network’s dynamics

Having established that a clustered architecture mediates the effects of expectation on coding, we investigated the underlying mechanism.

Clustered networks spontaneously generate highly structured activity characterized by coordinated patterns of ensemble firing. This activity results from the network hopping between metastable states in which different combinations of clusters are simultaneously activated^{11,14,15}. To understand how anticipatory inputs affected network dynamics, we analyzed the effects of cue presentation for a prolonged period of 5 seconds in the absence of stimuli. Activating anticipatory inputs led to changes in network dynamics, with clusters turning on and off more frequently in the presence of the cue (Fig. 2a). We quantified this effect by showing that a cue-induced increase in input spatial variance (σ^{2}) led to a shortened cluster activation lifetime (top panel in Fig. 2b; Kruskal-Wallis one-way ANOVA: p<10^{−17}, *χ*^{2}(5)=91.2), and a shorter cluster inter-activation interval (i.e., the quiescent interval between consecutive activations of the same cluster; bottom panel in Fig. 2b; Kruskal-Wallis one-way ANOVA: p<10^{−18}, *χ*^{2}(5)=98.6).
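The two quantities compared in this analysis, activation lifetime and inter-activation interval, can be read off a binary on/off trace of a single cluster. The following is a minimal sketch, assuming the cluster's activity has already been thresholded into one boolean per time bin; the function name `activation_stats`, the bin width, and the example trace are hypothetical and are not taken from the simulations.

```python
def activation_stats(active, bin_s):
    """Return (lifetimes, inter_activation_intervals) in seconds.

    active: sequence of truthy/falsy values, one per time bin (True = cluster on).
    bin_s:  bin width in seconds (a hypothetical choice for this sketch).
    Runs touching the start or end of the record are dropped because their
    true duration is unknown.
    """
    runs = []  # [state, length_in_bins] for each maximal run of equal values
    for state in active:
        state = bool(state)
        if runs and runs[-1][0] == state:
            runs[-1][1] += 1
        else:
            runs.append([state, 1])
    interior = runs[1:-1]  # discard the censored first and last runs
    lifetimes = [n * bin_s for s, n in interior if s]
    intervals = [n * bin_s for s, n in interior if not s]
    return lifetimes, intervals

# Toy trace: off, on (3 bins), off (4 bins), on (2 bins), off (1 bin), on.
trace = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1]
lifetimes, intervals = activation_stats(trace, bin_s=0.05)
print(lifetimes, intervals)
```

Pooling such lifetimes and intervals across clusters and trials for each value of σ would give the distributions that a Kruskal-Wallis test then compares.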

Previous work has demonstrated that metastable states of co-activated clusters result from attractor dynamics^{11,14,15}. Hence, the shortening of cluster activations and inter-activation intervals observed in the model could be due to modifications in the network’s attractor dynamics. To test this hypothesis, we performed a mean field theory analysis^{33-36} of a simplified network with only two clusters, therefore producing a reduced repertoire of configurations. Those include two configurations in which either cluster is active and the other inactive (‘A’ and ‘B’ in Fig. 2c), and a configuration where both clusters are moderately active (‘C’). The dynamics of this network can be analyzed using a reduced, self-consistent theory of a single excitatory cluster, said to be *in focus*^{33} (see Methods for details), based on the effective transfer function relating the input and output firing rates of the cluster (*r*_{in} and *r*_{out}, Fig. 2c). The latter are equal in the A, B and C network configurations described above - also called ‘fixed points’ since these are the points where the transfer function intersects the identity line, *r*_{out} = Φ(*r*_{in}).

Configurations A and B would be stable in an infinitely large network, but they are only metastable in networks of finite size, due to intrinsically generated variability^{15}. Transitions between metastable states can be modeled as a diffusion process and analyzed with Kramers’ theory^{37}, according to which the transition rates depend on the height ∆ of an effective energy barrier separating them^{15,37}. In our theory, the effective energy barriers (Fig. 2c, bottom row) are obtained as the area of the region between the identity line and the transfer function (shaded areas in top row of Fig. 2c; see Methods for details). The effective energy is constructed so that its local minima correspond to stable fixed points (here, A and B) while local maxima correspond to unstable fixed points (C). Larger barriers correspond to less frequent transitions between stable configurations, whereas lower barriers increase the transition rates and therefore accelerate the network’s metastable dynamics.

This picture provides the substrate for understanding the role of the anticipatory cue in the expectation effect. Basically, the presentation of the cue modulates the shape of the effective transfer function, which results in the reduction of the effective energy barriers. More specifically, the cue-induced increase in the spatial variance, σ^{2}, of the afferent current flattens the transfer function along the identity line, reducing the area between the two (shaded regions in Fig. 2c). In turn, this reduces the effective energy barrier separating the two configurations (Fig. 2c, bottom row), resulting in faster dynamics. The larger the cue-induced spatial variance σ^{2} in the afferent currents, the faster the dynamics (Fig. 2d; lighter shades represent larger σs).
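The chain of reasoning from spatial variance to barrier height to transition rate can be made concrete with a toy calculation. The sketch below is not the mean field theory used in the paper: it assumes a hypothetical sigmoidal transfer function with normalized rates, models the spatial variance as Gaussian averaging of the input, and uses an arbitrary noise intensity `NOISE` in a Kramers-type exponential. All names and parameter values (`GAIN`, `MIDPOINT`, `NOISE`, `transfer`, `barrier`) are illustrative assumptions. The barrier is computed, as in the text, as the area between the identity line and the transfer function from the low-rate fixed point A to the unstable fixed point C.

```python
import math

GAIN = 8.0       # hypothetical steepness of the toy transfer function
MIDPOINT = 0.5   # unstable fixed point C; rates are normalized to [0, 1]
NOISE = 0.01     # hypothetical noise intensity in the Kramers exponent

# Quadrature grid for averaging over a standard normal variable z.
ZS = [-5.0 + 0.05 * k for k in range(201)]
WS = [math.exp(-0.5 * z * z) for z in ZS]
W_SUM = sum(WS)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def transfer(r_in, sigma):
    """Toy transfer function, averaged over inputs spread with s.d. sigma."""
    total = sum(w * sigmoid(GAIN * (r_in + sigma * z - MIDPOINT))
                for w, z in zip(WS, ZS))
    return total / W_SUM

def low_fixed_point(sigma, lo=0.0, hi=0.4, n_iter=60):
    """Bisection for the stable low-rate fixed point A, where Phi(r) = r."""
    f = lambda r: transfer(r, sigma) - r
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("toy network is not bistable for this sigma")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def barrier(sigma, n_steps=400):
    """Area between the identity line and the transfer function from A to C."""
    a = low_fixed_point(sigma)
    h = (MIDPOINT - a) / n_steps
    area = 0.0
    for k in range(n_steps):
        r = a + (k + 0.5) * h  # midpoint rule
        area += (r - transfer(r, sigma)) * h
    return area

sigmas = [0.0, 0.1, 0.2]
barriers = [barrier(s) for s in sigmas]
# Kramers-type scaling: rate proportional to exp(-barrier / NOISE),
# reported relative to the sigma = 0 case.
relative_rates = [math.exp(-(b - barriers[0]) / NOISE) for b in barriers]

for s, b, k in zip(sigmas, barriers, relative_rates):
    print(f"sigma={s:.1f}  barrier={b:.4f}  relative rate={k:.1f}")
```

In this toy, larger `sigma` lowers the effective gain of the averaged transfer function, shrinks the shaded area, and therefore raises the escape rate exponentially, which mirrors the qualitative trend described for Fig. 2d.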

In summary, this analysis shows that the anticipatory cue increases the spontaneous transition rates between the network’s metastable configurations by reducing the effective energy barrier necessary to hop among configurations. In the following we uncover an important consequence of this phenomenon for sensory processing.

### Anticipatory cue induces faster onset of taste-coding states

The cue-induced modulation of attractor dynamics led us to formulate a hypothesis for the mechanism underlying the acceleration of coding: The activation of anticipatory inputs prior to sensory stimulation may allow the network to enter more easily configurations encoding stimuli while exiting more easily non-coding configurations. Fig. 3a shows simulated population rasters in response to the same stimulus presented in the absence of a cue or after a cue. Spikes in red hue represent activity in taste-selective clusters and show a faster activation latency in response to the stimulus preceded by the cue compared to the uncued condition. A systematic analysis revealed that in the cued condition, the clusters activated by the subsequent stimulus had a significantly faster activation latency than in the uncued condition (Fig. 3b, 0.22 ± 0.01 s (mean±s.e.m.) during cued compared to 0.32 ± 0.01 s for uncued stimuli; p<10^{−5}, rank sum test R(39)=232).

We elucidated this effect using mean field theory. In the simplified two-cluster network of Fig. 3c (the same network as in Fig. 2d), the configuration in which the taste-selective cluster is active (“coding state”) and the configuration in which the nonselective cluster is active (“non-coding state”) initially have the same effective potential energy in the absence of stimulation (local minima of the black line in Fig. 3c). They are separated by an effective energy barrier whose height is reduced by the anticipatory cue (dashed vs. full line). When the taste stimulus is presented, it activates the stimulus-selective cluster, so that the coding state now sits in a deeper well (lighter lines) than the non-coding state. Stronger stimuli (lighter shades in Fig. 3c) increase the difference between the wells’ depths, breaking their initial symmetry, so that a transition from the non-coding to the coding state becomes more likely than a transition from the coding to the non-coding state^{37}, even in the absence of the cue (full lines). The anticipatory cue further reduces the existing barrier and thereby increases the transition rate into coding configurations. This results in *faster* coding, on average, of the stimuli encoded by those states.

We tested this model prediction on the data from Samuelsen et al. (Fig 4)^{6}. To compare the data to the model simulations, we randomly sampled ensembles of model neurons so as to match the sizes of the empirical datasets. Since we only have access to a subset of neurons in the experiments, rather than the full network configuration, we segmented the ensemble activity in sequences of metastable states via a Hidden Markov Model (HMM) analysis (see Methods). Previous work has demonstrated that HMM states, i.e. patterns of coordinated ensemble firing activity, can be treated as proxies of metastable network configurations^{11}. In particular, activation of taste-coding configurations for a particular stimulus results in HMM states containing information about that stimulus (i.e., taste-coding HMM states). If the hypothesis originating from the model is correct, transitions from non-coding HMM states to taste-coding HMM states should be faster in the presence of the cue compared to uncued trials. We indeed found faster transitions to HMM coding states in cued trials for both model and data (Fig. 4a and 4c, respectively; color-coded horizontal bars overlay coding states). The latency of coding states was significantly faster during cued compared to uncued trials in both the model (Fig. 4b, mean latency of the first coding state was 0.32 ± 0.02 s for expected vs 0.38 ± 0.01 s for unexpected trials; rank sum test R(39)=319, p=0.014) and the empirical data (Fig. 4d: 0.46 ± 0.02 s for expected vs 0.56 ± 0.03 s for unexpected trials; rank sum test R(37)=385, p=0.026).

Altogether, these results demonstrate that anticipatory inputs speed up sensory coding by reducing the effective energy barriers from non-coding to coding metastable states (i.e., the transitions facilitated by the stimulus).

## Discussion

Expectations modulate perception and sensory processing. Typically, expected stimuli are recognized more accurately and rapidly than unexpected ones ^{1-3}. In the gustatory cortex, acceleration of taste coding has been related to changes in firing activity evoked by cues predicting the general availability of tastants^{6}. However, the computational mechanisms linking pre-stimulus activity with changes in the latency of sensory coding are still unknown. Here we propose a novel mechanism that explains the effects of expectation through the modulation of the dynamics intrinsically generated by the cortex. Our results provide a new functional interpretation for the intrinsically generated activity that is ubiquitously observed in cortical circuits^{11,18-21,24,38-42}.

The proposed mechanism requires a recurrent spiking network where excitatory neurons are arranged in clusters, which has been demonstrated to capture essential features of the dynamics of neural activity in sensory circuits^{11,16}. In such a model, network activity during both spontaneous and stimulus-evoked periods unfolds through state sequences, each state representing a metastable network attractor. In response to an anticipatory cue, the pace of state sequences speeds up, accelerated by a higher transition probability among states. The latter is caused by lowering the potential barrier separating metastable states in the attractor landscape. This anticipates the offset of states not conveying taste information and the onset of states containing the most information about the delivered stimulus (‘coding states’), causing the faster decoding observed by Samuelsen *et al*^{6} (see Fig. 1c).

Notably, this novel mechanism for anticipation is unrelated to increases in network excitability which would lead to unidirectional changes in taste-evoked firing rates. It relies instead on an increase in the spatial variance of the cue afferent currents to the sensory network brought about by the anticipatory cue. This increase in the input’s variance is observed experimentally after training^{10}, and is therefore the consequence of having learned the anticipatory meaning of the cue. The acceleration of the dynamics of state sequences predicted by the model was also confirmed in the data from ensembles of simultaneously recorded neurons in awake-behaving rats.

These results provide a precise explanatory link between the intrinsic dynamics of neural activity in a sensory circuit and a specific cognitive process, that of general expectation^{6} (see also ^{43,44}).

### Clustered connectivity and metastable states

A key feature of our model is the clustered architecture of the excitatory population. Removing excitatory clusters eliminates the cue-induced anticipatory effect (Fig. 1d-f). Theoretical work in recurrent networks had previously shown that a clustered architecture can produce stable patterns of population activity called attractors^{12}. Noise (either externally^{14,45} or internally generated^{11,15}) may destabilize those states, driving the emergence of temporal dynamics based on the progression through metastable states. Network models with clustered architecture provide a parsimonious explanation for the state sequences that have been observed ubiquitously in alert mammalian cortex, during both task engagement^{17,18,46,47} and inter-trial periods^{11,39,40}. In addition, this type of model accounts for various physiological observations such as stimulus-induced reduction of trial-to-trial variability^{11,14,15,48}, neural dimensionality^{16}, and firing rate multistability^{11} (see also^{49,50}). In particular, models with metastable attractors have been used to explain the state sequences observed in rodent gustatory cortex during taste processing and decision making^{11,45,51}.

In this work, we propose that clustered networks have the ability to modulate coding latency, and demonstrate one specific mechanism for modulation that can underlie the phenomenon of general expectation. Changes in the depth of attractor wells, induced by a non-stimulus specific anticipatory cue (which in turn may depend on the activation of top-down and neuromodulatory afferents^{6,52}), can accelerate or slow down network dynamics. The acceleration resulting from shallower wells leads to a reshaping of ongoing activity and to a quicker recruitment of states coding for sensory information. To our knowledge, the link between generic anticipatory cues, network metastability, and coding speed as presented here is novel and represents the main innovation of our work.

### Functional role of heterogeneity in cue responses

As stated in the previous section, the presence of clusters is a necessary ingredient to obtain a faster latency of coding. Here we discuss the second necessary ingredient, i.e., the presence of heterogeneous neural responses to the anticipatory cue (Fig. 1b).

Responses to anticipatory cues have been extensively studied in cortical and subcortical areas in alert rodents^{6,10,53,54}. Cues evoke heterogeneous patterns of activity, either exciting or inhibiting single neurons. The proportion of cue responses and their heterogeneity develop with associative learning^{10,54}, suggesting a fundamental function of these patterns. In the generic expectation paradigm considered here, the anticipatory cue does not convey any information about the identity of the forthcoming tastant; rather, it just signals the availability of a stimulus. Experimental evidence suggests that the cue may induce a state of arousal, which was previously described as “priming” the sensory cortex^{6,55,56}. Here, we propose an underlying mechanism in which the cue is responsible for acceleration of coding by increasing the spatial variance of pre-stimulus activity. In turn, this modulates the shape of the neuronal current-to-rate transfer function and thus lowers the effective energy barriers between metastable configurations.

We note that the presence of both excited and inhibited cue responses poses a challenge to simple models of neuromodulation. The presence of cue-evoked suppression of firing^{10} suggests that cues do not improve coding by simply increasing the excitability of cortical areas. Additional mechanisms and complex patterns of connectivity may be required to explain the suppression effects induced by the cue. However, here we provide a parsimonious explanation of how heterogeneous responses can improve coding without postulating any specific pattern of connectivity other than i) random projections from thalamic and anticipatory cue afferents and ii) the clustered organization of the intra-cortical circuitry. Notice that the latter contains wide distributions of synaptic weights and can be understood as the consequence of Hebb-like re-organization of the circuitry during training^{57,58}.

It is also worth noting that our model incorporates excited and inhibited cue responses in such a manner to affect only the spatial variance of the activity across neurons, while leaving the mean input to the network unaffected. As a result, the anticipatory cue leaves average firing rates unchanged in the clustered network (Fig. S10), and only modulates the network temporal dynamics. Our model thus provides a mechanism whereby increasing the spatial variance of top-down inputs has, paradoxically, a beneficial effect on sensory coding.

### Specificity of the anticipatory mechanism

Our model of anticipation relies on gain reduction in clustered excitatory neurons due to a larger spatial variance of the afferent currents. This model is robust to variations in parameters and architecture (Fig. S1, S4-S6). A priori, this effect might be achieved through different means, such as: increasing the strength of feedforward couplings; decreasing the strength of recurrent couplings; or modulating background synaptic inputs^{59}. However, when scoring those models on the criteria of coding anticipation and heterogeneous cue responses, we found that they failed to simultaneously match both criteria, although for some range of parameters they could reproduce either one (see Fig. S8-9 and Supplementary Results for a detailed analysis). Thus, we concluded that only the main mechanism proposed here (Fig. 1a) captures the plurality of experimental observations pertaining to anticipatory activity in a robust and biologically plausible way.

### Cortical timescales, state transitions, and cognitive function

In populations of spiking neurons, a clustered architecture can generate reverberating activity and sequences of metastable states. Transitions from state to state can be typically caused by external inputs^{11,15}. For instance, in frontal cortices, sequences of states are related to specific epochs within a task, with transitions evoked by behavioral events^{18,19,22}. In sensory cortex, progressions through state sequences can be triggered by sensory stimuli and reflect the dynamics of sensory processing^{23,46}. Importantly, state sequences have been observed also in the absence of any external stimulation, promoted by intrinsic fluctuations in neural activity^{11,41}. However, the potential functional role, if any, of this type of ongoing activity has remained unexplored.

Recent work has started to uncover the link between ensemble dynamics and sensory and cognitive processes. State transitions in various cortical areas have been linked to decision making^{45,60}, choice representation^{22}, rule-switching behavior^{24}, and the level of task difficulty^{23}. However, no theoretical or mechanistic explanations have been given for these phenomena.

Here we provide a mechanistic link between state sequences and expectation, by showing that intrinsically generated sequences can be accelerated, or slowed down, thus affecting sensory coding. Moreover, we show that the interaction between external stimuli and intrinsic dynamics does not result in the simple triggering of state transitions, but rather in the modulation of the intrinsic transition probabilities. The modulation of intrinsic activity can dial the duration of states, producing either shorter or longer timescales. A shorter timescale leads to faster state sequences and coding anticipation after stimulus presentation (Fig. 1 and 4). Other external perturbations may induce different effects: for example, recruiting the network’s inhibitory population slows down the timescale, leading to a slower coding (Fig. S7).

The interplay between intrinsic dynamics and anticipatory influences presented here is a novel mechanism for generating diverse timescales, and may have rich computational consequences. We demonstrated its function in increasing coding speed, but its role in mediating cognition is likely to be broader and calls for further explorations.

## Author Contributions

LM, GLC, and AF designed the project, discussed the models and the data analyses, and wrote the manuscript; LM performed the data analysis, model simulations and theoretical analyses.

## Competing Financial Interests

The authors declare no competing financial interests.

## Methods

### Experimental dataset

The experimental data come from a previously published dataset^{1} (for details, see Supplementary Methods). Experimental procedures were approved by the Institutional Animal Care and Use Committee of Stony Brook University and complied with university, state, and federal regulations on the care and use of laboratory animals.

### Ensemble states detection

A Hidden Markov Model (HMM) analysis was used to detect ensemble states in both the empirical data and model simulations. Here, we give a brief description of the method used and we refer the reader to Refs. ^{2-5} for more detailed information.

The HMM assumes that an ensemble of *N* simultaneously recorded neurons is in one of *M* hidden states at each given time bin. States are firing rate vectors *r_{i}*(*m*), where *i* = 1,…,*N* is the neuron index and *m* = 1,…,*M* identifies the state. In each state, neurons were assumed to discharge as stationary Poisson processes (Poisson-HMM) conditional on the state’s firing rates. Trials were segmented in 2 ms bins, and the value of either 1 (spike) or 0 (no spike) was assigned to each bin for each given neuron (Bernoulli approximation for short time bins); if more than one neuron fired in a given bin (a rare event), a single spike was randomly assigned to one of the firing neurons. A single HMM was used to fit all trials in each recording session, resulting in the emission probabilities *r_{i}*(*m*) and in a set of transition probabilities between the states. Emission and transition probabilities were calculated with the Baum-Welch algorithm^{6} with a fixed number of hidden states *M*, yielding a maximum likelihood estimate of the parameters given the observed spike trains. Since the model log-likelihood *LL* increases with *M*, we repeated the HMM fits for increasing values of *M* until we hit a minimum of the Bayesian Information Criterion (BIC, see below and Ref. ^{6}). For each *M*, the *LL* used in the BIC was the sum over 10 independent HMM fits with random initial guesses for emission and transition probabilities. This step was needed since the Baum-Welch algorithm only guarantees reaching a local rather than global maximum of the likelihood. The model with the lowest BIC (having *M*^{*} states) was selected as the winning model, where *BIC* = −2*LL* + [*M*(*M* − 1) + *MN*] ln *T*, *T* being the number of observations in each session (= number of trials × number of bins per trial). Finally, the winning HMM model was used to “decode” the states from the data according to their posterior probability given the data. During decoding, only those states with probability exceeding 80% in at least 25 consecutive 2 ms bins were retained (henceforth denoted simply as “states”)^{3,5}. This procedure eliminates states that appear only very transiently and with low probability, also reducing the chance of overfitting. A median of 6 states per ensemble was found, varying from 3 to 9 across ensembles.
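
The model-selection and state-retention rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' MATLAB/C code: the function names, the dictionary of log-likelihoods, and the list-of-lists posterior layout are assumptions, and the log-likelihood passed in is taken as already computed by the HMM fits.

```python
import math

def bic(log_likelihood, n_states, n_neurons, n_obs):
    """BIC = -2 LL + [M(M-1) + M N] ln T, as defined in the text."""
    n_params = n_states * (n_states - 1) + n_states * n_neurons
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_n_states(ll_by_m, n_neurons, n_obs):
    """Return the M with the lowest BIC; ll_by_m maps M -> log-likelihood."""
    return min(ll_by_m, key=lambda m: bic(ll_by_m[m], m, n_neurons, n_obs))

def retain_states(posterior, p_min=0.8, min_bins=25):
    """Return (state, start_bin, end_bin) runs in which one state's posterior
    exceeds p_min for at least min_bins consecutive bins (end_bin exclusive).
    posterior[t][m] is the posterior probability of state m in bin t."""
    runs, current, start = [], None, 0
    for t, probs in enumerate(list(posterior) + [None]):  # sentinel closes the last run
        winner = None
        if probs is not None:
            best = max(range(len(probs)), key=lambda m: probs[m])
            if probs[best] > p_min:
                winner = best
        if winner != current:
            if current is not None and t - start >= min_bins:
                runs.append((current, start, t))
            current, start = winner, t
    return runs
```

With 2 ms bins, the default `min_bins=25` corresponds to a minimum state duration of 50 ms.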

### Coding states

In each condition (i.e., expected vs. unexpected), the frequencies of occurrence of a given state across taste stimuli were compared with a test of proportions (chi-square, p<0.001 with Bonferroni correction to account for multiple states). When a significant difference was found across stimuli, a post-hoc Marascuilo test was performed^{7}. A state whose frequency of occurrence was significantly higher in the presence of one taste stimulus compared to all other tastes was deemed a ‘coding state’ for that stimulus (Fig. 4).

### Spiking network model

We modeled the local neural circuit as a recurrent network of *N* leaky-integrate-and-fire (LIF) neurons, with a fraction *n_{E}* = 80% of excitatory (E) and *n_{I}* = 20% of inhibitory (I) neurons.^{8} Connectivity was random with probability *p_{EE}* = 0.2 for E to E connections and *p_{EI}* = *p_{IE}* = *p_{II}* = 0.5 otherwise. Synaptic weights *J_{ij}* from pre-synaptic neuron *j* ∈ *E*, *I* to post-synaptic neuron *i* ∈ *E*, *I* were scaled versions of values *j_{ij}* drawn from normal distributions with mean *j_{αβ}* (for *α*, *β* = *E*, *I*) and 1% SD. Networks of different architectures were considered: *i)* networks with segregated clusters (referred to as “clustered network,” parameters as in Tables 1 and 2); *ii)* networks with overlapping clusters (see Suppl. Table S2 and Suppl. Methods for details); *iii)* homogeneous networks (parameters as in Table 1). In the clustered network, E neurons were arranged in *Q* clusters with *N_{C}* = 100 neurons per cluster on average (1% SD), the remaining fraction *n_{bg}* of E neurons belonging to an unstructured “background” population. In the clustered network, neurons belonging to the same cluster had intra-cluster synaptic weights potentiated by a factor *J_{+}*; synaptic weights between neurons belonging to different clusters were depressed by a factor *J_{−}* = 1 − *γf*(*J_{+}* − 1) < 1 with *γ* = 0.5; *f* = (1 − *n_{bg}*)/*Q* is the average fraction of E neurons in each cluster.^{8} When changing the network size *N*, all synaptic weights *J_{ij}* were rescaled with *N*; the intra-cluster potentiation values were *J_{+}* = 5, 10, 20, 30, 40 for *N* = 1, 2, 4, 6, 8 × 10^{3} neurons, respectively, and cluster size remained unchanged (see also Table 1); all other parameters were kept fixed. In the homogeneous network, there were no clusters (*J_{+}* = *J_{−}* = 1).
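
As a worked illustration of the potentiation and depression rule, the Python sketch below computes *J_{−}* from *J_{+}* and returns the factor applied to an E-to-E weight given the cluster labels of the two neurons. The function names are hypothetical, leaving weights that involve background neurons unscaled is an assumption of this sketch, and the example values *Q* = 14 and *n_{bg}* = 0.125 are made up (the actual values are in Table 1).

```python
def depression_factor(j_plus, n_clusters, n_bg, gamma=0.5):
    """J_- = 1 - gamma * f * (J_+ - 1), with f = (1 - n_bg) / Q."""
    f = (1.0 - n_bg) / n_clusters
    return 1.0 - gamma * f * (j_plus - 1.0)

def weight_factor(cluster_i, cluster_j, j_plus, j_minus):
    """Multiplicative factor on an E-to-E weight given the two cluster labels.
    A label of None marks a background neuron (assumed here to be unscaled)."""
    if cluster_i is None or cluster_j is None:
        return 1.0
    return j_plus if cluster_i == cluster_j else j_minus

# Hypothetical example: J_+ = 10 with 14 clusters and a 12.5% background fraction.
j_minus = depression_factor(10.0, 14, 0.125)
```

Setting `j_plus = 1` gives `j_minus = 1`, which recovers the homogeneous network.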

### Model neuron dynamics

Below threshold, the LIF neuron membrane potential *V* evolved in time as a leaky integrator of its input currents, with membrane time constant *τ_{m}*. The input currents *I* are a sum of a recurrent contribution *I_{rec}* coming from the other network neurons and an external current *I_{ext}* = *I_{0}* + *I_{stim}* + *I_{cue}* (units of Volt/s). Here, *I_{0}* is a constant term representing input from other brain areas; *I_{stim}* and *I_{cue}* represent the incoming stimuli and cue, respectively (see *Stimulation protocols* below). When *V* hits threshold *V_{thr}*, a spike is emitted and *V* is then clamped to the reset value *V_{reset}* for a refractory period *τ_{ref}*. Thresholds were chosen so that the homogeneous network neurons fired at rates *r_{E}* = 5 spks/s and *r_{I}* = 7 spks/s. The recurrent contribution to the postsynaptic current to the *i*-th neuron was a low-pass filter of the incoming spike trains with synaptic time constant *τ_{syn}*, in which the *k*-th spike of the *j*-th presynaptic neuron contributed with the recurrent synaptic weight *J_{ij}* from presynaptic neuron *j* to postsynaptic neuron *i*. The constant external current was *I_{0}* = *N_{ext}p_{i0}J_{i0}v_{ext}*, with *N_{ext}* = *n_{E}N*, *p_{i0}* = 0.2, *J_{i0}* set by *j_{E0}* for excitatory and *j_{I0}* for inhibitory neurons (see Table 1), and *v_{ext}* = 7 spks/s. For a detailed mean field theory analysis of the clustered network and a comparison between simulations and mean field theory during ongoing and stimulus-evoked periods we refer the reader to the Suppl. Methods and Refs. ^{5,9}.

### Stimulation protocols

Stimuli were modeled as time-varying stimulus afferent currents targeting 50% of neurons in stimulus-selective clusters, *I_{stim}*(*t*) = *I_{0}* · *r_{stim}*(*t*), where *r_{stim}*(*t*) was expressed as a fraction of the baseline external current *I_{0}*. Each cluster had a 50% probability of being selective to a given stimulus, thus different stimuli targeted overlapping sets of clusters. The anticipatory cue, targeting a random 50% subset of E neurons, was modeled as a double exponential with rise and decay times of 0.2 s and 1 s, respectively, unless otherwise specified; its peak value for each selective neuron was sampled from a normal distribution with zero mean and standard deviation σ (expressed as fraction of the baseline current *I_{0}*; σ = 20% unless otherwise specified). The cue did not change the mean afferent current but only its spatial (quenched) variance across neurons.

In both the unexpected and the expected conditions, stimulus onset at *t* = 0 was followed by a linear increase *r_{stim}*(*t*) in the stimulus afferent current to stimulus-selective neurons reaching a value *r_{max}* at *t* = 1 s (*r_{max}* = 20%, unless otherwise specified). In the expected condition, stimuli were preceded by the anticipatory cue *r_{cue}*(*t*) with onset at *t* = −0.5 s, i.e., 0.5 s before stimulus presentation.
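
The stimulus and cue time courses can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the function names are hypothetical, the double exponential is written as a difference of two exponentials normalized to unit peak, and holding the stimulus at *r_{max}* after the 1 s ramp is an assumption (the text only specifies the ramp).

```python
import math
import random

def stimulus_gain(t, r_max=0.2, ramp=1.0):
    """Linear ramp r_stim(t): 0 up to onset (t = 0), r_max at t = ramp seconds.
    Holding r_max after the ramp is an assumption of this sketch."""
    if t <= 0.0:
        return 0.0
    return r_max * min(t / ramp, 1.0)

def cue_shape(t, onset=-0.5, tau_rise=0.2, tau_decay=1.0):
    """Double exponential starting at `onset`, normalized to unit peak."""
    s = t - onset
    if s <= 0.0:
        return 0.0
    t_peak = (tau_rise * tau_decay / (tau_decay - tau_rise)) * math.log(tau_decay / tau_rise)
    peak = math.exp(-t_peak / tau_decay) - math.exp(-t_peak / tau_rise)
    return (math.exp(-s / tau_decay) - math.exp(-s / tau_rise)) / peak

def cue_peaks(n_neurons, sigma=0.2, fraction=0.5, seed=0):
    """Per-neuron cue peak as a fraction of I_0: zero-mean Gaussian with SD sigma
    for a random subset of neurons, zero for the rest."""
    rng = random.Random(seed)
    targeted = set(rng.sample(range(n_neurons), int(fraction * n_neurons)))
    return [rng.gauss(0.0, sigma) if i in targeted else 0.0 for i in range(n_neurons)]
```

Under these assumptions, the cue current to neuron `i` at time `t` would be `I_0 * cue_peaks(...)[i] * cue_shape(t)`. Because the peaks have zero mean, the cue changes the spread of afferent currents across neurons rather than their mean.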

### Network simulations

All data analyses, model simulations and mean field theory calculations were performed using custom software written in MATLAB (MathWorks) and C. Simulations comprised 20 realizations of each network (each one representing a different experimental session), with 20 trials per stimulus in each of the 2 conditions (unexpected and expected), or 40 trials per session in the condition with “cue-on” and no stimuli (Fig. 2). Dynamical equations for the LIF neurons were integrated with the Euler method with a 0.1 ms step.
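
For readers unfamiliar with the integration scheme, the sketch below applies the forward Euler method with a 0.1 ms step to a single LIF neuron. It is written in Python for illustration and is not the authors' MATLAB/C code. The subthreshold equation dV/dt = −V/τ_m + I is an assumed form chosen to match currents expressed in Volt/s, and all parameter values (membrane time constant, threshold, reset, refractory period) are hypothetical placeholders rather than the values of Table 1.

```python
def simulate_lif(current, t_stop=1.0, dt=1e-4, tau_m=0.02, v_thr=1.0,
                 v_reset=0.0, tau_ref=0.005):
    """Forward-Euler integration of dV/dt = -V/tau_m + I (assumed form).
    `current` is a function of time returning I in V/s.
    Returns the list of spike times in seconds."""
    n_steps = int(round(t_stop / dt))
    refractory_steps = int(round(tau_ref / dt))
    v, wait, spikes = v_reset, 0, []
    for step in range(n_steps):
        if wait > 0:
            wait -= 1                      # V stays clamped at v_reset
            continue
        v += dt * (-v / tau_m + current(step * dt))
        if v >= v_thr:                     # threshold crossing: emit a spike
            spikes.append((step + 1) * dt)
            v = v_reset
            wait = refractory_steps
    return spikes

# Constant suprathreshold drive (hypothetical value): V would settle at I * tau_m = 2 > v_thr.
spike_times = simulate_lif(lambda t: 100.0)
```

In a full network simulation the same update would be applied to every neuron at each step, with `current` replaced by the sum of the recurrent and external currents.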

### Mean field theory

Mean field theory was used in a simplified network with 2 excitatory clusters (parameters as in Table 2) using the population density approach^{10-12}: the input to each neuron was completely characterized by the infinitesimal mean *μ_{α}* and variance of the post-synaptic current (see Sec. 2.3 of Supplementary Methods for their expressions). The network fixed points satisfied the *Q* + 2 self-consistent mean field equations^{8} (equation (1)) for the population firing rate vector **r** (boldface represents vectors). *F_{α}* is the current-to-rate function for population *α*, which varied depending on the population and the condition. In the absence of the anticipatory cue, the LIF current-to-rate function was used^{13,14}. In the presence of the anticipatory cue, a modified current-to-rate function was used to capture the cue-induced Gaussian noise in the cue afferent currents to the cue-selective populations (*α* = 1,…,*Q*); it involves a Gaussian measure with zero mean and unit variance, the baseline afferent current *μ_{ext}* = *I_{0}*, and the anticipatory cue’s SD *σ* as a fraction of *μ_{ext}* (Fig. 3d; in Fig. 3c, parameters were chosen for illustration purposes). Fixed points *r*^{*} of equation (1) were found with Newton’s method; the fixed points were stable (attractors) when the stability matrix (2) evaluated at *r*^{*} was negative definite. Stability was defined with respect to an approximate linearized dynamics of the mean *m_{α}* and SD *s_{α}* of the input currents^{15}, whose stationary values are given in the Suppl. Methods.

### Effective mean field theory for the reduced network

The mean field equations (1) for the *P* = *Q* + 2 populations may be reduced to a set of effective equations governing the dynamics of a smaller subset of *q* < *P* populations, henceforth referred to as populations *in focus*^{16}. The reduction is achieved by integrating out the remaining *P* − *q* *out-of-focus* populations. This procedure was used to estimate the energy barrier separating the two network attractors in Fig. 2d and Fig. 3c. Given a fixed set of values for the in-focus populations, one solves the mean field equations for the *P* − *q* out-of-focus populations, *β* = *q* + 1,…,*P*, to obtain the stable fixed point of the out-of-focus populations as functions of the in-focus firing rates. Stability of the solution is computed with respect to the stability matrix (2) of the reduced system of *P* − *q* out-of-focus populations. Substituting these values into the fixed-point equations for the *q* populations in focus yields a new set of equations relating input rates to “output” rates *r*^{out}, for *α* = 1,…,*q*. The input rates and the output rates *r*^{out} of the in-focus populations will be different, except at a fixed point of the full system where they coincide. The correspondence between input and output rates of in-focus populations defines the effective current-to-rate transfer functions for the *α* = 1,…,*q* in-focus populations at the given input rates. The fixed points of the in-focus equations (4) are fixed points of the entire system. It may occur, in general, that the out-of-focus populations attain multiple attractors for a given value of the in-focus rates, in which case the set of effective transfer functions is labeled by the chosen attractor; in our analysis of the two-clustered network, only one attractor was present for a given value of the in-focus rates.

### Energy potential

In a network with *Q* = 2 clusters, one can integrate out all populations (out-of-focus) except one (in-focus) to obtain the effective transfer function for the in-focus population representing a single cluster (equation (4) for *q* = 1). Network dynamics can be visualized on a one-dimensional curve, where it is well approximated by first-order dynamics (see ref. ^{16} for details). These dynamics can be expressed in terms of an effective energy function, so that they can be understood as a motion in an effective potential energy landscape, as if driven by an effective force. The minima of the energy with respect to the in-focus firing rate are the stable fixed points of the effective 1-dimensional dynamics, while its maxima represent the effective energy barriers between two minima, as illustrated in Fig. 2c. The one-cluster network has 3 fixed points, two stable attractors (‘A’ and ‘B’ in Fig. 2c) and a saddle point (‘C’). We estimated the height *Δ* of the potential energy barrier on the trajectory from A to B through C as minus the integral of the force from the first attractor A to C, which represents the area between the identity line and the effective transfer function (see Fig. 2c). In the finite network, where the dynamics comprise stochastic transitions among the states, switching between A and B would occur with a frequency that depends on the effective energy barrier *Δ*, as explained in the main text.
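
The barrier estimate can be illustrated numerically. The Python sketch below uses a toy sigmoidal transfer function as a stand-in for the effective transfer function (the real one comes from the mean field reduction and is not reproduced here), locates the fixed points A and C by bisection, and integrates the area between the identity line and the transfer function with the trapezoid rule. Taking the effective force as the transfer function minus the rate, up to a constant time-scale factor, is an assumption of this sketch, as are all names and parameter values.

```python
import math

def transfer(r, r_max=1.0, theta=0.5, width=0.1):
    """Toy sigmoidal transfer function (hypothetical stand-in, not the model's)."""
    return r_max / (1.0 + math.exp(-(r - theta) / width))

def fixed_point(f, lo, hi, tol=1e-12):
    """Bisection on g(r) = f(r) - r; assumes g changes sign on [lo, hi]."""
    g = lambda r: f(r) - r
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0:
            return mid
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def barrier(f, r_a, r_c, n=10000):
    """Trapezoid estimate of minus the integral of the force f(r) - r from A to C,
    i.e. the area between the identity line and the transfer function."""
    h = (r_c - r_a) / n
    vals = [(r_a + k * h) - f(r_a + k * h) for k in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

r_a = fixed_point(transfer, 0.0, 0.25)    # low-rate attractor A
r_c = fixed_point(transfer, 0.25, 0.75)   # unstable point C between the attractors
delta = barrier(transfer, r_a, r_c)
```

In this toy setting, making the sigmoid shallower (larger `width`) shrinks the area between the two curves and therefore lowers the barrier.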

### Population decoding

The amount of stimulus-related information carried by spike trains was assessed through a decoding analysis^{17} (see Fig. S3a for illustration). A multiclass classifier was constructed from *Q* neurons sampled from the population (one neuron from each of the *Q* clusters for clustered networks, or *Q* random excitatory neurons for homogeneous networks). Spike counts from all trials of *n_{stim}* taste stimuli in each condition (expected vs. unexpected) were split into training and test sets for cross-validation. A “template” was created for the population PSTH for each stimulus, condition and time bin (200 ms, sliding over in 50 ms steps) in the training set. The PSTH contained the trial-averaged spike counts of each neuron in each bin (the same number of trials across stimuli and conditions was used). Population spike counts for each test trial were classified according to the smallest Euclidean distance from the templates across 10 training sets (‘bagging’ or bootstrap aggregating procedure^{18}). Specifically, from each training set *L*, we created bootstrapped training sets *L_{b}*, for *b* = 1,…,*B* = 10, by sampling with replacement from *L*. In each bin, each test trial was then classified *B* times using the *B* classifiers, obtaining *B* different “votes”, and the most frequent vote was chosen as the bagged classification of the test trial. Cross-validated decoding accuracy in a given bin was defined as the fraction of correctly classified test trials in that bin.
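
The bagged nearest-template classification for a single time bin can be sketched as below. This Python illustration is not the authors' implementation: the function names and data layout are hypothetical, and resampling trials separately within each stimulus is an assumption (the text only states that bootstrapped sets are drawn with replacement from the training set).

```python
import random
from collections import Counter

def template(trials):
    """Trial-averaged spike-count vector (one entry per neuron)."""
    n = len(trials)
    return [sum(trial[i] for trial in trials) / n for i in range(len(trials[0]))]

def nearest(templates, x):
    """Label of the template closest to x in Euclidean distance."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(templates[label], x))
    return min(templates, key=dist2)

def bagged_classify(train, x, n_boot=10, seed=0):
    """train maps stimulus label -> list of spike-count vectors for one time bin.
    Each bootstrap resamples the training trials with replacement, rebuilds the
    templates, and casts one vote; the most frequent vote is returned."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_boot):
        boot = {s: [rng.choice(trials) for _ in trials] for s, trials in train.items()}
        votes.append(nearest({s: template(tr) for s, tr in boot.items()}, x))
    return Counter(votes).most_common(1)[0][0]

def accuracy(train, test):
    """Fraction of (label, vector) test trials classified correctly."""
    hits = sum(bagged_classify(train, x) == label for label, x in test)
    return hits / len(test)
```

Repeating `accuracy` over sliding 200 ms bins would give a decoding time course of the kind described above.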

Significance of decoding accuracy was established via a permutation test: 1000 shuffled datasets were created by randomly permuting stimulus labels among trials, and a ‘shuffled distribution’ of 1000 decoding accuracies was obtained. In each bin, decoding accuracy of the original dataset was deemed significant if it exceeded the upper bound, *α*_{0.05}, of the 95% confidence interval of the shuffled accuracy distribution in that bin (this included a Bonferroni correction for multiple bins, so that *α*_{0.05} = 1 - 0.05/*N_{b}*, with *N_{b}* the number of bins). Decoding latency (insets in Figs. 1c and 1f) was estimated as the earliest bin with significant decoding accuracy.
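
A compact Python sketch of the significance criterion and the latency estimate follows. The function names are hypothetical, and the empirical quantile convention (sorting the shuffled accuracies and indexing at the Bonferroni-corrected level) is an assumption of this sketch rather than a detail given in the text.

```python
def significance_threshold(shuffled_acc, n_bins, alpha=0.05):
    """Empirical (1 - alpha / n_bins) quantile of the shuffled accuracies in one bin."""
    ranked = sorted(shuffled_acc)
    level = 1.0 - alpha / n_bins
    index = min(len(ranked) - 1, int(level * len(ranked)))
    return ranked[index]

def decoding_latency(bin_times, accuracy, shuffled_by_bin, alpha=0.05):
    """Time of the earliest bin whose accuracy exceeds its threshold, else None.
    shuffled_by_bin[k] holds the shuffled accuracies for bin k."""
    n_bins = len(bin_times)
    for t, acc, shuffled in zip(bin_times, accuracy, shuffled_by_bin):
        if acc > significance_threshold(shuffled, n_bins, alpha):
            return t
    return None
```

Comparing the value returned for expected and unexpected trials would then give the latency difference between the two conditions.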

### Cluster dynamics

To analyze the dynamics of neural clusters (lifetime, inter-activation interval, and latency; see Figs. 2 and 3), cluster spike count vectors *r_{i}* (for *i* = 1,…,*Q*) in 5 ms bins were obtained by averaging spike counts of neurons belonging to a given cluster. A cluster was deemed active if its firing rate exceeded 10 spks/s. This threshold was chosen so as to lie between the inactive and active clusters’ firing rates, which were obtained from a mean field solution of the network^{5}.
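
The thresholding step can be sketched in Python as follows. The function names are hypothetical, and defining the inter-activation interval as the gap between the end of one activation and the start of the next is an assumption of this sketch.

```python
def active_intervals(rates, threshold=10.0, bin_s=0.005):
    """Return (start_s, end_s) intervals in which a cluster's rate exceeds threshold.
    `rates` holds one cluster's firing rate (spks/s) in consecutive 5 ms bins."""
    intervals, start = [], None
    for k, rate in enumerate(list(rates) + [0.0]):  # sentinel closes a final interval
        if rate > threshold and start is None:
            start = k
        elif rate <= threshold and start is not None:
            intervals.append((start * bin_s, k * bin_s))
            start = None
    return intervals

def lifetimes(intervals):
    """Duration of each activation in seconds."""
    return [end - start for start, end in intervals]

def inter_activation_intervals(intervals):
    """Gaps between the end of one activation and the start of the next."""
    return [b[0] - a[1] for a, b in zip(intervals, intervals[1:])]
```

Applied to each of the *Q* clusters in turn, these functions yield the lifetime and inter-activation statistics for a trial.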

### Code and data availability statement

Experimental datasets are available from the corresponding authors upon request. All data analysis and network simulation scripts are available from the authors upon request.

## Acknowledgements

This work was supported by a National Institute on Deafness and Other Communication Disorders Grant K25-DC013557 (LM), by the Swartz Foundation Award 66438 (LM), by National Institute on Deafness and Other Communication Disorders Grants NIDCD R01DC012543 and R01DC015234 (AF), and partly by a National Science Foundation Grant IIS-1161852 (GLC). The authors would like to thank Drs. S. Fusi, A. Maffei, G. Mongillo, and C. van Vreeswijk for useful discussions.

## References
