Predictive learning rules generate a cortical-like replay of probabilistic sensory experiences

The brain is thought to construct an optimal internal model representing the probabilistic structure of the environment accurately. Evidence suggests that spontaneous brain activity gives such a model by cycling through activity patterns evoked by previous sensory experiences with the experienced probabilities. The brain's spontaneous activity emerges from internally-driven neural population dynamics. However, how cortical neural networks encode internal models into spontaneous activity is poorly understood. Recent computational and experimental studies suggest that a cortical neuron can implement complex computations, including predictive responses, through soma-dendrite interactions. Here, we show that a recurrent network of spiking neurons subject to the same predictive learning principle provides a novel mechanism to learn the spontaneous replay of probabilistic sensory experiences. In this network, the learning rules minimize probability mismatches between stimulus-evoked and internally driven activities in all excitatory and inhibitory neurons. This learning paradigm generates stimulus-specific cell assemblies that internally remember their activation probabilities using within-assembly recurrent connections. The plasticity of cells' intrinsic excitabilities normalizes neurons' dynamic ranges to further improve the accuracy of probability coding. Our model contrasts previous models that encode the statistical structure of sensory experiences into Markovian transition patterns among cell assemblies. We demonstrate that the spontaneous activity of our model well replicates the behavioral biases of monkeys


Introduction
The brain is believed to construct an internal statistical model of an uncertain environment from sensory information streams for predicting the external events that are likely to occur. Evidence suggests that spontaneous brain activity learns the representation of such a model through repeated experiences of sensory events. In the cat visual cortex, spontaneously emerging activity patterns cycle through cortical states that include neural response patterns to oriented bars (Kenet et al., 2003). In the ferret visual cortex, spontaneous activity gradually comes to resemble a superposition of activity patterns evoked by natural scenes, eventually giving an optimal model of the visual experience (Berkes et al., 2011). As replay activities can provide prior information for hierarchical Bayesian computation by the brain (Ernst & Banks, 2002; Kording & Wolpert, 2004; Friston, 2010; Fiser et al., 2010; Bastos et al., 2012; Orban et al., 2016; Legaspi & Toyoizumi, 2019), clarifying how the brain learns the spontaneous replay of optimal internal models is crucial for understanding whole-brain computing. However, the neural mechanisms underlying this modeling process are poorly understood.
Several mechanisms of the brain's probabilistic computation have been explored (Jimenez Rezende & Gerstner, 2014; Li et al., 2022). Models with reverberating activity are particularly interesting owing to their potential ability to generate spontaneous activity. For instance, spiking neuron networks with symmetric recurrent connections were proposed for Markov Chain Monte Carlo sampling of stochastic events (Buesing et al., 2011; Bill et al., 2015). Spike-timing-dependent plasticity was used to organize spontaneous sequential activity patterns, providing a predictive model of sequence input (Hartmann et al., 2015). However, previous models did not clarify how recurrent neural networks learn the spontaneous replay of the probabilistic structure of sensory experiences, for which these networks should learn the accurate probabilities of sensory stimuli and an appropriate excitation-inhibition balance simultaneously. Moreover, previous models assumed that each statistically salient stimulus in temporal input is already segregated and is delivered to a pre-assigned assembly of coding neurons, implying that the recurrent network, at least partly, knows the stochastic events to be modeled before learning. How the brain extracts salient events for statistical modeling has not been addressed.
Here, we present a learning principle to encode the probability structure of experiences into spontaneous network activity. To this end, we extensively use the synaptic plasticity rule proposed previously based on the hypothesis that the dendrites of a cortical neuron learn to predict its somatic responses (Urbanczik & Senn, 2014; Asabuki & Fukai, 2020). We generalize the hypothetical predictive learning to a learning principle at the entire network level. Namely, in a recurrent network driven by external input, we ask all synapses on the dendrites of each excitatory or inhibitory neuron to learn to predict its somatic responses (although the dendrites will not be explicitly modeled). This enables the network model to simultaneously learn the events' probabilistic structure and the excitation-inhibition balance required to replay this structure. Further, our network model requires no pre-assigned cell assemblies since the model neuron can automatically segment statistically salient events in temporal input (Asabuki & Fukai, 2020), a cognitive process known as "chunking" (Fujii & Graybiel, 2003; Jin & Costa, 2010; Jin et al., 2014; Schapiro et al., 2013; Zacks et al., 2001). Intriguingly, the cell assemblies generated by our model store their replay probabilities primarily in the within-assembly network structure, and intrinsic dynamical properties of member neurons also contribute to this coding. This is in striking contrast to other network models that encode probabilities into the Markovian transition dynamics among cell assemblies (Buesing et al., 2011; Hartmann et al., 2015; Asabuki & Clopath, 2024).
Our model trained on a perceptual decision-making task can replicate both unbiased and biased decision behaviors of monkeys without fine-tuning of parameters (Hanks et al., 2011). In addition, in a network model consisting of distinct excitatory and inhibitory neural populations, our learning rule predicts the emergence of two types of inhibitory connections with different computational roles. We show that the emergence of the two inhibitory connection types is crucial for robust learning of an optimal internal model.

Replay of probabilistic sensory experiences: a toy example
We first explain the task our model solves with a toy example. Consider a task in which the animal should decide whether a given stimulus coincides with or resembles either of two previously learned stimuli. Whether the animal learned these stimuli with a 50-50 chance or a 30-70 chance should affect the animal's anticipation of their occurrence and hence affect its decision.
It has been suggested that spontaneous activity expresses an optimal internal model of the sensory environment (Berkes et al., 2011). In our toy example, the evoked activity patterns of the two stimuli should be spontaneously replayed with the same probabilities as these stimuli were experienced during learning: where features = {stimulus 1, stimulus 2} and the right-hand side expresses the probabilities of replayed activities. The angular brackets indicate averaging over the stimuli. According to Hebb's hypothesis, two cell assemblies should be formed to memorize the two stimuli in the toy example. Moreover, the spontaneous replay of these cell assemblies should represent the probabilities given in the right-hand side of the above equation. Below, we propose a mathematical principle of learning to achieve these requirements.
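The matching requirement of the toy example can be made concrete with a small numerical check. The sketch below is only illustrative: the function name `replay_mismatch`, the choice of the Kullback-Leibler divergence as a mismatch measure, and the example counts are assumptions and are not part of the model described here.

```python
import numpy as np

def replay_mismatch(experienced_counts, replay_counts):
    """KL divergence between experienced and replayed stimulus probabilities.

    Both arguments are counts per stimulus (all strictly positive).
    A value near zero means replay frequencies match the experienced ones.
    """
    p = np.asarray(experienced_counts, dtype=float)
    q = np.asarray(replay_counts, dtype=float)
    p = p / p.sum()  # experienced probabilities, e.g. 0.3 and 0.7
    q = q / q.sum()  # replay probabilities during spontaneous activity
    return float(np.sum(p * np.log(p / q)))

# Stimuli experienced 30-70; a replay that is also 30-70 matches,
# whereas a 50-50 replay does not.
matched = replay_mismatch([30, 70], [300, 700])
unmatched = replay_mismatch([30, 70], [50, 50])
```

In this reading, an optimal internal model corresponds to a mismatch close to zero for the replayed cell assemblies.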

Prediction-driven synaptic plasticity for encoding an internal model
We previously proposed a learning rule for a single two-compartment neuron (Asabuki & Fukai, 2020). Briefly, our previous model learns statistically salient features repeated in input sequences by minimizing the error between somatic and dendritic response probabilities without external supervision to identify the temporal locations of these features. In this study, we extend this plasticity rule to recurrent networks by asking all neurons in a network to minimize the error in response probabilities between the internally generated and stimulus-evoked activities (Fig. 1). Our central interest is whether this learning principle generates spontaneous activity representing the statistical model of previous experiences.
Figure 1. (a) We considered two modes of activity (i.e., evoked and spontaneous activity). In the evoked mode, the membrane potential u of a network neuron was calculated as a linear combination of inputs across all different connections (v^W, v^M, and v^G). This evoked mode is considered during the learning phase, when all synapses attempt to predict the network activity, as we will explain in the main text. Once all synapses have sufficiently learned, all FF inputs are removed, and the network is driven spontaneously (spontaneous mode). Our interest lies in the statistical similarity of the network activity in these two modes. (b) The gain and threshold of the output response function were controlled by a dynamic variable, h, which tracks the history of the membrane potential. (c) A schematic of the learning rule for a network neuron is shown (top). During learning, for each type of connection on a postsynaptic neuron, synaptic plasticity minimizes the error between output (gray diamond) and synaptic prediction (colored diamonds). Note that all types of synapses share the common plasticity rule, where weight updates are calculated as the product of the error term and the presynaptic activities (bottom). Our hypothesis is that such a plasticity rule allows a recurrent neural network to spontaneously replay the learned stochastic activity patterns without external input.

We first introduce our learning principle using a recurrent network model (nDL model) that does not obey Dale's law for distinguishing between excitatory and inhibitory neurons (Materials and Methods). A more realistic model with distinct excitatory and inhibitory neuron pools will be shown later. The nDL model consists of Poisson spiking neurons, each receiving Poisson spike trains from all input neurons via a modifiable all-to-all afferent feedforward (FF) connection matrix W (Fig. 1a). These input neurons may be grouped into multiple input neuron groups responding to different sensory features. Due to the all-to-all connectivity, the afferent input has no specific predefined structure. Two types of all-to-all modifiable recurrent connections (REC), M and G, exist among the neurons. Matrix M is a mixture of excitatory and inhibitory connections, and matrix G represents inhibitory-only connections. Due to a minus sign for v^G, all components of G are positive. The firing rates of neurons are defined as a modifiable sigmoidal function of the membrane potential (Fig. 1b), which we will explain later in detail. All types of connections, both afferent and recurrent ones, are modifiable by unsupervised learning rules derived from a common principle: on each neuron, all synapses learn to predict the neuron's response optimally (Fig. 1c; see Materials and Methods). In reality, all synaptic inputs may terminate on the dendrites, although the dendrites are not modeled explicitly.
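The structure of the nDL model and its common plasticity rule can be sketched in a few lines. This is a rate-based simplification under stated assumptions, not the spiking implementation of Materials and Methods: the static `sigmoid`, the learning rate `lr`, the clipping that keeps G non-negative, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(u, gain=1.0, threshold=0.0):
    """Static sigmoid standing in for the modifiable response function."""
    return 1.0 / (1.0 + np.exp(-gain * (u - threshold)))

def membrane_potential(W, M, G, x_in, r_rec, evoked=True):
    """u = v^W + v^M - v^G; the FF term is removed in spontaneous mode."""
    v_W = W @ x_in if evoked else np.zeros(W.shape[0])
    v_M = M @ r_rec          # mixed excitatory/inhibitory recurrent input
    v_G = G @ r_rec          # inhibitory-only input, enters with a minus sign
    return v_W + v_M - v_G, (v_W, v_M, v_G)

def predictive_update(W, M, G, x_in, r_rec, lr=0.01):
    """One learning step: every connection type predicts the neuron's output.

    Each update is (output - synaptic prediction) times presynaptic activity.
    """
    u, (v_W, v_M, v_G) = membrane_potential(W, M, G, x_in, r_rec, evoked=True)
    out = sigmoid(u)
    W_new = W + lr * np.outer(out - sigmoid(v_W), x_in)
    M_new = M + lr * np.outer(out - sigmoid(v_M), r_rec)
    # Assumption: G is kept non-negative by clipping after each update.
    G_new = np.clip(G + lr * np.outer(out - sigmoid(v_G), r_rec), 0.0, None)
    return W_new, M_new, G_new
```

After learning, calling `membrane_potential(..., evoked=False)` gives the internally driven potential whose statistics are compared with the evoked ones.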
Without a teaching signal, predictive learning may suffer a trivial solution problem in which all synapses vanish, and hence all neurons become silent (Asabuki & Fukai, 2020). To avoid it, we homeostatically regulate the dynamic range of each neuron (i.e., the slope and threshold of the response function) according to the history ℎ of its subthreshold activity (see Eqs. 10-12). When the value of ℎ is increased, the neuron's excitability is lowered (Fig. 1b). The input-output curves of neurons are known to undergo homeostatic regulations through various mechanisms (Chance et al., 2002; Mitchell and Silver, 2003; Torres-Torrelo et al., 2014). Though no direct experimental evidence is available for our homeostatic process via ℎ, it mathematically avoids saturating neuronal activity.
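The qualitative effect of ℎ on the response function can be illustrated as follows. The functional forms below are assumptions for illustration only and are not the model's Eqs. 10-12: ℎ is taken to decay slowly and to jump up when the membrane potential exceeds it, the gain is taken as `beta0 / h`, and the threshold as `theta0 * h`; all names and parameter values are hypothetical.

```python
import numpy as np

def update_h(h, u, dt=1e-3, tau=10.0, h_min=1e-3):
    """Track the recent history of the membrane potential u.

    Assumed dynamics: slow exponential decay over seconds, with an
    immediate jump to u whenever u rises above the current h.
    """
    h = h - dt * h / tau          # slow decay
    h = np.maximum(h, u)          # jump on an abrupt increase of u
    return np.maximum(h, h_min)   # keep h positive so the gain stays finite

def response(u, h, beta0=5.0, theta0=1.0):
    """Sigmoid whose gain decreases and threshold increases with h."""
    gain = beta0 / h
    threshold = theta0 * h
    return 1.0 / (1.0 + np.exp(-gain * (u - threshold)))

# For the same positive membrane potential, a larger h gives a weaker response.
low_h = response(1.0, h=1.0)
high_h = response(1.0, h=2.0)
```

With these assumed forms, a neuron that has recently been strongly driven (large ℎ) responds less to the same positive input, which is the sense in which excitability is lowered.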
Note that the present homeostatic regulation of intrinsic excitability differs from the homeostatic synaptic scaling mechanism. The role of homeostatic synaptic scaling in generating irregular cell-assembly activity patterns was previously studied computationally (Hiratani and Fukai, 2014; Litwin-Kumar and Doiron, 2014; Zenke et al., 2015). However, unlike the present model, the previous models did not address whether and how synaptic scaling contributes to statistical modeling by recurrent neural networks. Furthermore, unlike our model, in which neurons in the recurrent layer and input neurons are initially connected in an all-to-all manner, most previous models assumed preconfigured receptive fields for recurrent-layer neurons, implying that these models had predefined stimulus-specific cell assemblies.

Cell assembly formation for learning statistically salient stimuli
We first explain how our network segments salient stimuli and forms stimulus-specific cell assemblies via network-wide predictive learning rules. To this end, we tested a simple case in which two non-overlapping input groups are intermittently and repeatedly activated with equal probabilities. The two input patterns were separated by irregular, low-frequency, unrepeated spike trains of all input neurons (Materials and Methods). We will consider input patterns with unequal occurrence probabilities later. After several presentations of individual input patterns, each network neuron responded selectively to one of the repeated patterns (Fig. 2a). This result is consistent with our previous results (Asabuki & Fukai, 2020) that the plasticity of feedforward connections segments input patterns. Indeed, feedforward synapses W on each neuron were strengthened or weakened when they mediated its preferred or non-preferred stimulus, respectively (Fig. 2b, left; Fig. 2c). Inhibitory connections G grew between neurons within the same assembly but not between assemblies (Fig. 2b, right; Fig. 2c, bottom), enhancing the decorrelation of within-assembly neural activities (Asabuki & Fukai, 2020).
Recurrent connections M were modified to form stimulus-specific cell assemblies, as evidenced by the self-organization of excitatory (Fig. 2c, top) and inhibitory (Fig. 2c, bottom) recurrent connections within and between cell assemblies, respectively. The inhibitory components are necessary for suppressing the simultaneous replay of different cell assemblies, as shown later.
We then investigated whether and how spontaneous activity preserves and replays these cell assemblies in the absence of afferent input. To demonstrate this in a more complex task, we trained the network with afferent input involving five repeated patterns and then removed the input and observed post-training spontaneous network activity (Fig. 2d). The termination of afferent input initially lowered the activities of neurons, but their dynamic ranges gradually recovered with the excitability of the neural population (indicated by the population-averaged ℎ value), and the network eventually started spontaneously replaying the learned cell assemblies. All plasticity rules were turned off during the recovery period (about 20 seconds from the input termination), after which the network settled in a stable spontaneous firing state (plasticity off), with firing rates lower than those of the evoked activity (inset). Then, the plasticity rules could be turned on (plasticity on) without drastically destroying the structure of spontaneous replay. Intriguingly, spontaneous neuronal activities were highly correlated within each cell assembly but were uncorrelated between different cell assemblies (Fig. 2e). This was because self-organized recurrent connections M were excitatory within each cell assembly, whereas the between-assembly recurrent connections were inhibitory, as in Fig. 2c.
Thus, the network model successfully segregates, remembers, and replays stimulus-evoked activity patterns in temporal input. The loss of between-assembly excitatory connections is interesting as it indicates that the present spontaneous reactivation is not due to the sequential activation of cell assemblies. This can also be seen from the relatively long intervals between consecutive cell-assembly activations: spontaneous neural activity does not propagate directly from one cell assembly to another (Fig. 2d). Indeed, within-assembly excitation is the major cause of spontaneous replay in this model, which we will study later in detail.
In summary, we have proposed the predictive learning rules as a novel plasticity mechanism for all types of synapses (i.e., feedforward and recurrent connections). We have shown that the plasticity rules in our model learn the segmentation of salient patterns in input sequences and form pattern-specific cell assemblies without preconfigured structures. We have also shown that our model replays the learned assemblies after external inputs are removed.

Replays of cell assemblies reflect a learned statistical model
We now turn to the central question of this study. We asked whether internally generated network dynamics through recurrent synapses (i.e., spontaneous replay of cell assemblies) can represent an optimal model of previous sensory experiences. Specifically, we examined whether the network spontaneously reactivates learned cell assemblies with relative frequencies proportional to the probabilities with which external stimuli activated these cell assemblies during learning. We addressed these questions in slightly more complex cases with increased numbers of external stimuli.
We first examined a case with five stimuli in which stimulus 1 was presented twice as often as the other four stimuli (Fig. 3a). Hereafter, the probability ratio refers to the relative number of times stimulus 1 is presented during learning. For instance, the case shown in Fig. 2d corresponds to a probability ratio of one. As in Fig. 2d, the network self-organized five cell assemblies to encode stimuli 1 to 5 and replayed all of them in subsequent spontaneous activity (Fig. 3b). We found that output neurons were activated more frequently and strongly in cell assembly 1 than in other cell assemblies. Therefore, we assessed quantitative differences in neuronal activity between different cell assemblies by varying the probability ratio.
The neuronal firing rate of cell assembly 1 relative to other cell assemblies increased approximately linearly with an increase in the probability ratio (Fig. 3c). Similarly, the size of cell assembly 1 relative to other cell assemblies also increased with the probability ratio (Fig. 3d). However, neither the relative firing rate nor the relative assembly size faithfully reflects changes in the probability ratio: scaling the probability ratio with a multiplicative factor does not scale these quantities with this factor. Therefore, we further investigated whether the assembly activity ratio, the ratio in the total firing rate of cell assembly 1 to other cell assemblies (Materials and Methods), scales faithfully with the probability ratio of cell assembly 1. This was the case: the scaling was surprisingly accurate (Fig. 3e). To examine the ability of the nDL network further, we trained it with five stimuli occurring with various probabilities (Fig. 3f and Supplementary Fig. 1a). After learning, the spontaneous activity of the model replayed the learned cell assemblies at the desired ratios of population firing rates (Fig. 3g and Supplementary Fig. 1b).
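A minimal sketch of how an assembly activity ratio could be computed from per-neuron firing rates is given below. The exact definition is in Materials and Methods; here it is assumed, for illustration only, that the total firing rate of the target assembly is divided by the mean total firing rate of the other assemblies. The function name and the example numbers are hypothetical.

```python
import numpy as np

def assembly_activity_ratio(rates, labels, target=0):
    """Total firing rate of the target assembly relative to the others.

    rates  : firing rate of each neuron during spontaneous activity
    labels : assembly index of each neuron
    Assumption: the denominator is the mean total rate of non-target assemblies.
    """
    rates = np.asarray(rates, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    totals = np.array([rates[labels == a].sum() for a in ids])
    is_target = ids == target
    return float(totals[is_target][0] / totals[~is_target].mean())

# Assembly 0 has twice as many neurons, each firing twice as fast, so its
# total rate is four times that of each other assembly.
rates = [2, 2, 2, 2, 1, 1, 1, 1]
labels = [0, 0, 0, 0, 1, 1, 2, 2]
ratio = assembly_activity_ratio(rates, labels, target=0)
```

Because the total rate combines per-neuron firing rate and assembly size, it can scale with the probability ratio even when neither factor does so alone.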
We then asked whether our model would learn a prior distribution for more stimuli. To this end, we presented seven stimulus patterns to the same network with graded probabilities (Supplementary Fig. 1c). The self-organized spontaneous activity exhibited cell assemblies that learned the graded probability distribution of these stimuli well (Supplementary Fig. 1d). These results demonstrate that the trained network remembers the probabilities of repetitively experienced stimuli by the spontaneous firing rates of the encoding cell assemblies and that this dynamical coding scheme has a certain degree of scalability.
So far, we have represented external stimuli with non-overlapping subgroups of input neurons. However, in biologically realistic situations, input neuron groups may share part of their member neurons. We tested whether the proposed model could learn the probability structure of overlapping input patterns in a case where two input neuron groups shared half of their members. The two patterns were presented with probabilities of 30% and 70%, respectively (Supplementary Fig. 2a). After sufficient learning, the network model generated two assemblies that encoded the two stimuli without sharing the coding neurons (Supplementary Fig. 2b) and replayed these assemblies with frequencies proportional to the stimulus presentation probabilities (Supplementary Fig. 2c). The results look reasonable because each neuron in the network segments one of the stimulus patterns and recurrent connections within each non-overlapping assembly can encode the probability of its replay.
Altogether, these results suggest that our model spontaneously replays learned cell assemblies with relative frequencies proportional to the probability that each cell assembly was activated during the learning phase. We have shown that the population activities of assemblies, rather than the firing rates of individual neurons, encode the occurrence probabilities of stimulus patterns.

Within-assembly recurrent connections encode probabilistic sensory experiences
To understand the mechanism underlying the statistical similarity between the evoked patterns and spontaneous activity, we then investigated whether and how biases in probabilistic sensory experiences influence the strengths of recurrent connections. To this end, we compared two cases in which two input patterns (stim 1 and stim 2) occurred with equal (50% vs. 50%) and different (30% vs. 70%) probabilities during learning (Fig. 4a). From the results shown in Fig. 3, we hypothesized that within-assembly learned connections should reflect the stimulus occurrence probabilities and hence the activation probabilities of the corresponding cell assemblies during spontaneous activity. Therefore, we calculated the total strengths of incoming recurrent synapses on each neuron within the individual cell assemblies (Fig. 4b). While the distributions of incoming synaptic strengths are similar between cell assemblies coding stimulus 1 and stimulus 2 in the 50-vs-50 case, they look different in the 30-vs-70 case (Fig. 4c).
Since incoming weights increased more significantly in the cell assembly activated by a more frequent stimulus (i.e., the assembly encoding stimulus 2 in the 30-vs-70 case), we expect that the degree of positive shifts in incoming weight distributions will reflect stimulus probabilities. To examine whether this is indeed the case, we computed the sum of total excitatory incoming weights (i.e., the sum of positive elements of M) over neurons belonging to each assembly after training. We then normalized these excitatory incoming weights over the two assemblies. Interestingly, we found that the normalized excitatory incoming weights for the two assemblies closely approximate the empirical probabilities of the two stimuli in both the 50-vs-50 and 30-vs-70 cases (Fig. 4d). These analyses revealed that recurrent connections learned within assemblies encode biases in probabilistic sensory experiences. Indeed, the elimination of between-assembly excitatory connections did not significantly affect the replay probabilities, as the sampling is driven by strong within-assembly recurrent inputs after learning (Supplementary Fig. 3).
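The weight analysis described above can be sketched as follows. The sketch assumes that rows of M index postsynaptic neurons and that all positive incoming weights of a neuron are summed regardless of their source assembly; the function name and the toy weight matrix are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

def normalized_excitatory_input(M, labels):
    """Sum positive incoming weights per assembly and normalize over assemblies.

    M      : recurrent weight matrix, M[i, j] = weight from neuron j to neuron i
             (row = postsynaptic neuron; this orientation is an assumption)
    labels : assembly index of each neuron
    """
    M = np.asarray(M, dtype=float)
    labels = np.asarray(labels)
    exc_in = np.clip(M, 0.0, None).sum(axis=1)   # excitatory input per neuron
    ids = np.unique(labels)
    totals = np.array([exc_in[labels == a].sum() for a in ids])
    return ids, totals / totals.sum()

# Toy matrix: within-assembly weights 0.3 (assembly 0) and 0.7 (assembly 1),
# inhibitory (-0.5) between assemblies.
labels = np.array([0, 0, 1, 1])
M = np.full((4, 4), -0.5)
M[:2, :2] = 0.3
M[2:, 2:] = 0.7
ids, fractions = normalized_excitatory_input(M, labels)
```

In this toy matrix the normalized values come out as 0.3 and 0.7, mirroring how the learned weights in the 30-vs-70 case are reported to approximate the stimulus probabilities.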

Roles of inhibitory plasticity for stabilizing cell assemblies
Experimental and computational results suggest that inhibitory synapses are more robust to spontaneous activity than excitatory synapses and are crucial for maintaining cortical circuit function (Mongillo et al., 2018). To see the crucial role of the inhibitory plasticity of G for cell assembly formation, we compared the spontaneously driven activities in the learned network between two cases, plastic inhibitory connection G versus fixed G, in the 30-vs-70 case. The results show that only a single, highly active assembly self-organizes for fixed inhibitory synapses (Supplementary Fig. 4a). In contrast, such unstable dynamics do not emerge from plastic inhibitory synapses (Supplementary Fig. 4b), suggesting the crucial role of inhibitory plasticity in stabilizing spontaneous activity.
To further clarify the functional role of inhibitory plasticity in regulating spontaneous activity, we compared how the self-organized assembly structure of recurrent connections M evolves in the two simulation settings shown in Supplementary Fig. 5a. In the control model, we turned off the plasticity of G for a while after the cessation of external stimuli but then switched it on again, as was done previously in Fig. 2. The cell-assembly structure initially dissipated but eventually reached a well-defined equilibrium structure (Supplementary Fig. 5b, magenta). Consistent with this, the postsynaptic potentials mediated by connections M and G predicted the normalized firing rate of a postsynaptic excitatory neuron in the control model (Supplementary Fig. 5c). In striking contrast, the cell-assembly structure rapidly dissipated in the truncated model in which the G-plasticity was kept turned off after the cessation of external stimuli (Supplementary Fig. 5b, blue). Accordingly, the postsynaptic potentials induced by M and G, as well as the normalized firing rate, evolved into trivial solutions and almost vanished in the truncated model (Supplementary Fig. 5d). Only the control model, but not the truncated model, could keep prediction errors small and nearly constant after the termination of the stimuli (Supplementary Fig. 5e). These results indicate that maintaining the learned representations requires the continuous tuning of within-assembly inhibition.

The role of homeostatic regulation of neural activities
As indicated by the weak couplings between cell assemblies, the present mechanism of probability learning differs from the conventional sequence learning mechanisms. Consistent with this, the network trained repetitively by a fixed sequence of patterned inputs does not exhibit stereotyped sequential transitions among cell assemblies (due to the lack of strong inter-assembly excitatory connections; Supplementary Fig. 6). Indeed, the probability-encoding spontaneous activity emerges in the present model mainly from the within-assembly dynamics driven by strong within-assembly reverberating synaptic input. However, the homeostatic variable ℎ also plays a role in maintaining a stable spontaneous network activity after learning (see Fig. 2d; activity pattern from 5 to 10 sec). This is achieved by the time evolution of ℎ, which maintains the firing rate of each neuron in a suitable range by adjusting the threshold and gain of the somatic sigmoidal response function (Fig. 1b).
Therefore, we explored the role of the homeostatic variable in learning an accurate internal model of the sensory environment. In each neuron, the variable ℎ is updated whenever the membrane potential undergoes an abrupt increase (Eq. 10). Therefore, the time evolution of ℎ monitors the activity of each neuron over the timescale of seconds, which in turn regulates the neural activity by controlling the activation function (Supplementary Fig. 7a; Eqs. 8-9). When the instantaneous value of ℎ is high, the neuron's excitability is lowered (namely, the gain of the response function is decreased and its threshold is increased; see Eqs. 10-12). This activity regulation is crucial to avoid the trivial solution of the plasticity rules (Asabuki and Fukai, 2020) but not critical for sampling with appropriate probabilities. Indeed, a model with a fixed value of ℎ still showed spontaneous replay, although the true probability distribution was estimated less accurately (Supplementary Fig. 7b; cf. Fig. 3f).

Learning conditioned prior distributions
The predictive coding hypothesis posits that top-down input from higher cortical areas provides prior knowledge about computations in lower cortical areas. In the brain's hierarchical computation, this implies that the top-down input conditions the prior distributions in local cortical areas to those relevant to the given context. The proposed learning rules can account for how a conditioning input from other cortical areas conditions the prior distribution in a local cortical circuit.
The neural network consists of two mutually interacting non-overlapping subnetworks of equal sizes, where the subnetworks may represent different cortical areas (Supplementary Fig. 8a). Subnetwork A was randomly exposed to stimuli 1 and 2 (S1 and S2) with equal probabilities 1/2, whereas subnetwork B was exposed to stimuli 3 and 4 (S3 and S4) with the conditional probabilities 1/3 and 2/3 if S1 was presented to subnetwork A and the conditional probabilities 2/3 and 1/3 if S2 was presented to subnetwork A. After learning, the network model self-organized four cell assemblies, each of which responded preferentially to one of the four stimuli (Supplementary Fig. 8b). Consistent with this, the self-organized connection matrix represented strong within-assembly connections within each cell assembly and weak between-assembly connections (Supplementary Fig. 8c). Note that between-assembly connections were inhibitory between assemblies encoding mutually exclusive stimuli, i.e., S1 and S2, and S3 and S4, as they should be. Now, we turned off S3 and S4 to subnetwork B and only applied S1 or S2 to subnetwork A, one at a time. Applying the same stimulus (i.e., S1 or S2) to subnetwork A activated either the S3- or S4-coding cell assembly in subnetwork B in a probabilistic manner (Supplementary Fig. 8d). The cell assemblies evoked in subnetwork B by S1 or S2 to subnetwork A varied the total firing rates approximately in proportion to the conditional probabilities (e.g., P(S3|S1) = 1/3 vs. P(S4|S1) = 2/3) used during learning (Supplementary Fig. 8e). Note that S3- and S4-coding cell assemblies could become simultaneously active to represent the desired activation probabilities (e.g., a vertical arrow in Supplementary Fig. 8d). Together, these results indicate that our network can learn prior distributions conditioned by additional inputs through different pathways.
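The stimulus statistics used in this conditional task can be reproduced with a short sampling sketch. Only the probabilities (1/2 for S1 and S2; 1/3 and 2/3 for the conditionals) come from the text; the function names, the trial-wise pairing of stimuli, and the random seed are assumptions made for illustration.

```python
import numpy as np

# P(S3 | stimulus to subnetwork A); P(S4 | .) is the complement.
P_S3_GIVEN = {"S1": 1.0 / 3.0, "S2": 2.0 / 3.0}

def sample_trials(n, rng):
    """Draw n paired stimuli: one for subnetwork A and one for subnetwork B."""
    a = rng.choice(np.array(["S1", "S2"]), size=n)          # equal probabilities
    p_s3 = np.where(a == "S1", P_S3_GIVEN["S1"], P_S3_GIVEN["S2"])
    b = np.where(rng.random(n) < p_s3, "S3", "S4")          # conditional draw
    return a, b

def empirical_conditional(a, b, given, target):
    """Fraction of trials with B-stimulus `target` among trials with A-stimulus `given`."""
    mask = a == given
    return float(np.mean(b[mask] == target))

rng = np.random.default_rng(0)
a, b = sample_trials(20000, rng)
p_s3_s1 = empirical_conditional(a, b, "S1", "S3")   # close to 1/3
p_s3_s2 = empirical_conditional(a, b, "S2", "S3")   # close to 2/3
```

After learning, the same conditional fractions would be the target for the relative total firing rates of the S3- and S4-coding assemblies when only S1 or S2 is applied.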

Replication of biased perceptual decision making in monkeys
Prior knowledge about the environment often biases our percept of the external world. For instance, if we know that two possible stimuli exist and that stimulus A appears more often than stimulus B, we tend to feel that a given stimulus is more likely to be stimulus A than stimulus B. Previously, a similar bias was quantitatively studied in monkeys performing a perceptual decision-making task (Hanks et al., 2011). In the experiment, monkeys had to judge the direction (right or left) of the coherent motion of moving dots on a display. When both directions of coherent motion appeared randomly during learning, the monkey showed unbiased choice behaviors. However, if the frequencies of the two motion directions were different, the monkey's choice was biased toward the direction of a more frequent motion stimulus.
We constructed a network model shown in Fig. 5a to examine whether the present mechanism of spontaneous replay could account for the behavioral bias. The model comprises a recurrent network similar to that used in Fig. 2 and two input neuron groups, L and R, encoding leftward or rightward coherent dot movements, respectively. We modulated the firing rates of these input neurons in proportion to the coherence of moving dots (Materials and Methods). During learning, we trained this model with external stimuli having input coherence Coh of either -0.5 or +0.5 (Materials and Methods), where all dots move leftward in the former or rightward in the latter. In so doing, we mimicked the two protocols used in the behavioral experiment of monkeys: in the 50:50 protocol, two stimuli with Coh = ±0.5 were presented randomly with equal probabilities, while in the 80:20 protocol, stimuli with Coh = +0.5 and -0.5 were delivered with probabilities of 80% and 20%, respectively. In the 80:20 protocol, stimuli were highly biased toward a coherent rightward motion.
The network model could explain the biased choices of monkeys surprisingly well. In either training protocol, the recurrent network self-organized two cell assemblies, each responding selectively to one of the R and L input neuron groups.
Then, we examined whether the responses of the self-organized network are consistent with experimental observations by stimulating it with external inputs having various degrees of input coherence. The resultant psychometric curves closely match those obtained in the experiment (Fig. 5b). We note that the psychometric curves of the model do not significantly depend on the specific choices of parameter values as long as the network learned stable spontaneous activity. We did not perform any curve fitting to experimental data, implying that the psychometric curves are free from parameter fine-tuning. Biases in the psychometric curves emerged from biased firing rates in the spontaneous activity of the self-organized cell assemblies. To show this, we investigated how the activities of the two self-organized cell assemblies change before and after the onset of test stimuli in three relatively simple cases, i.e., Coh = -0.5, 0, and +0.5. Figure 5c shows the activity ratio AR between the R-encoding cell assembly and the entire network (Materials and Methods) in pre-stimulus spontaneous and post-stimulus evoked activity. When the network was trained in an unbiased fashion (i.e., in the 50:50 protocol), the activity ratio was close to 0.5 in spontaneous activity, implying that the two cell assemblies had similar activity levels. In contrast, when the network was trained in a biased fashion (i.e., in the 80:20 protocol), the activity ratio in spontaneous activity was close to 0.8, implying that the total spontaneous firing rate of the R-encoding cell assembly was four times that of the L-encoding cell assembly. Our results show that the spontaneous activity generated by the proposed mechanism can account for the precise relationship between motion coherence and perceptual biases in decision making by monkeys.

Crucial roles of distinct inhibitory pathways
The model presented so far lacked biological plausibility in several key aspects. Specifically, we assumed that the recurrent connections M could change their sign through plasticity and be either excitatory or inhibitory, while the inhibitory connections G were restricted to being inhibitory only. This setting does not reflect the biological constraint that synapses maintain a consistent excitatory or inhibitory type. Furthermore, due to this unconstrained recurrent connectivity M, the original model had two types of inhibitory connections (i.e., the negative part of M and the inhibitory connections G) without a clear computational role for each type of inhibition.
To address these limitations and to understand the role of the two types of inhibition, we considered a novel architecture in which all recurrent connections are constrained to be either exclusively excitatory or exclusively inhibitory, maintaining their sign throughout the learning process. The refined model includes two different types of inhibitory connections (i.e., $M^{\mathrm{inh}}$ and G), each serving a specific computational purpose: minimizing prediction error and maintaining the excitatory-inhibitory balance. In combination with the excitatory connections $M^{\mathrm{exc}}$, the $M^{\mathrm{inh}}$ connections are trained to minimize the prediction error between somatic and dendritic activity, as for the original M connections in Figure 1. We found that the trained $M^{\mathrm{inh}}$ connections introduce competition among cell assembly activities by forming strong connections between assemblies (Fig. 6b), allowing the network to effectively sample and replay the activities of individual assemblies.
In contrast, the inhibitory connections G were trained to balance network dynamics, as in the original setting. We found that the inhibitory G connections form strong intra-assembly inhibition (Fig. 6c), which balances the strong excitatory connections that arise within cell assemblies through plasticity (Fig. 6a).
In summary, the dual inhibitory mechanism allows the network to reactivate different cell assemblies while regulating their internal dynamics. The prediction-error-minimizing inhibitory connections $M^{\mathrm{inh}}$ facilitate selecting and activating specific assemblies through competition such that the learned probabilities are replayed. In contrast, the network-balancing inhibitory connections G prevent runaway excitation within active assemblies.

An elaborate network model with distinct excitatory and inhibitory neuron pools
The predictive learning rule performed well in training the nDL model to learn the probabilistic structure of the stimulus-evoked activity patterns. However, whether the same learning rule works in a more realistic neural network is yet to be investigated. To examine this, we constructed an elaborate network model (DL model) consisting of distinct excitatory and inhibitory neuron pools, obeying Dale's law (Fig. 7a). The nDL model suggested the essential roles of inhibitory plasticity in maintaining excitation-inhibition balance and generating an appropriate number of cell assemblies. To achieve these functions, inhibitory neurons in the DL model project to excitatory and other inhibitory neurons via two synaptic paths (Fig. 7b), motivated by the results shown in Figure 6. In path 1, inhibitory connections alone predict the postsynaptic activity, whereas in path 2, inhibitory and excitatory connections jointly predict the activity of the postsynaptic neuron (Materials and Methods). All synapses in the DL model are subject to the predictive learning rule. We trained the DL model with three input neuron groups while varying their activation probabilities. As in the nDL model, the DL model self-organized three cell assemblies activated selectively by the three input neuron groups (Supplementary Fig. 9a). Furthermore, in the absence of external stimuli, the DL model spontaneously replayed these assemblies with assembly activity ratios in proportion to the occurrence probabilities of the corresponding stimuli during learning (Fig. 7c).
The two inhibitory paths divided their labor in a manner consistent with the results shown in Figure 6. To see this, we investigated the connectivity structures learned by these paths. In path 1, inhibitory connections were primarily found on excitatory neurons in the same assemblies (Fig. 7d, top). In contrast, in path 2, inhibitory connections were stronger on excitatory neurons in different assemblies than on those in the same assemblies (Fig. 7d, bottom). On both excitatory and inhibitory neurons, the total inhibition (i.e., path 1 + path 2) was balanced with excitation (Fig. 7e). Figure 7f summarizes the connectivity structure of the DL model. Excitatory neurons in a cell assembly project to inhibitory neurons in the same assembly. Then, these inhibitory neurons project back to excitatory neurons in the same or different assemblies via paths 1 and 2. Interestingly, lateral inhibition through path 1 is more potent between excitatory neurons within each cell assembly than between different assemblies (Fig. 7g). In contrast, path 2 mediates equally strong within-assembly and between-assembly inhibition.
We can understand the necessity of the two inhibitory paths based on the dynamical properties of competitive neural networks. Figure 7h displays the effective competitive network of excitatory cell assemblies suggested by the above results. Both paths 1 and 2 contribute to within-assembly inhibition among excitatory neurons, whereas between-assembly inhibition (i.e., lateral inhibition) mainly comes from path 2. In a competitive network, the ratio of lateral inhibition strength to self-inhibition strength determines the number of winners having non-vanishing activities: the higher the ratio, the smaller the number of winners (Fukai & Tanaka, 1997). Therefore, self-organizing the same number of excitatory cell assemblies as that of external stimuli requires tuning the balance between the within-assembly and between-assembly inhibitions. This tuning during learning is likely easier when the network has two independently learnable inhibitory circuits. Indeed, a network model with only one inhibitory path rarely succeeded in encoding and replaying all stimuli used in learning (Supplementary Fig. 9b, c).
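The dependence of the number of winners on the lateral-to-self inhibition ratio can be illustrated with a toy competitive network. The sketch below assumes Lotka-Volterra-type dynamics with unit self-inhibition, three units with slightly different inputs, and forward-Euler integration; these equations, parameter values, and the function name are illustrative assumptions and are not the DL model itself.

```python
def count_winners(lateral_to_self, inputs=(1.0, 0.9, 0.8),
                  t_end=200.0, dt=0.01, eps=1e-3):
    """Count units with non-vanishing activity in a toy competitive network.

    Assumed dynamics (self-inhibition strength fixed to 1):
        dx_i/dt = x_i * (b_i - x_i - k * sum_{j != i} x_j)
    where k = lateral_to_self is the lateral-to-self inhibition ratio.
    """
    x = [0.1] * len(inputs)  # identical small initial activities
    for _ in range(int(t_end / dt)):
        total = sum(x)
        x = [
            xi + dt * xi * (b - xi - lateral_to_self * (total - xi))
            for xi, b in zip(x, inputs)
        ]
    return sum(1 for xi in x if xi > eps)


if __name__ == "__main__":
    for k in (0.5, 2.0):
        print(f"lateral/self ratio {k}: {count_winners(k)} winner(s)")
```

In this toy setting, a ratio below one lets all three units coexist, whereas a ratio above one leaves a single winner, in line with the qualitative statement above.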
In summary, we have shown the roles of distinct recurrent inhibitory connections. Using a network consisting of excitatory and inhibitory populations, we have shown that distinct inhibitory circuits are necessary to generate the within- and between-assembly competition that is crucial for maintaining the stability of multiple learned assemblies.

Discussion
Having proper generative models is crucial for accurately predicting statistical events. The brain is thought to improve the prediction accuracy of inference by learning internal generative models of the environment. These models are presumably generated through multiple mechanisms. For instance, the predictive coding hypothesis posits that top-down cortical inputs provide lower sensory areas with prior information about sensory experiences (Friston, 2010; Bastos et al., 2012; Keller & Mrsic-Flogel, 2018). However, experimental evidence also suggests that spontaneous activity represents an optimal model of the environment in sensory cortices. This study proposed a biologically plausible mechanism to learn such a model, or priors for experiences, with the brain's internal dynamics.
Our model adopted a single predictive learning principle for the plasticity of excitatory and inhibitory synapses to learn the replay of probabilistic experiences. On each neuron, excitatory and inhibitory synaptic weights undergo plastic changes to improve their independent predictions of the cell's firing. This was done by minimizing the mismatch between the output firing rate and the network predictions (Eqs. 13 and 17). This simple learning rule showed excellent performance in a simplified network model and in a more realistic model obeying Dale's law. The latter model predicts a division of labor between two inhibitory paths. Intriguingly, inhibitory path 2 of this model resembles interpyramidal inhibitory connections driven directly by nearby pyramidal cells (Ren et al., 2007) [...]. Typically, these models embed prior knowledge of sensory experiences into the wiring patterns of afferent (and sometimes also recurrent) synaptic inputs such that these inputs can evoke the learned activity patterns associated with the prior knowledge. The present model differs from the previous models in several aspects: (i) the model segments repeated stimuli to be remembered in an unsupervised fashion; (ii) it then generates cell assemblies encoding the segmented stimuli; and (iii) it finally replays these cell assemblies spontaneously with the learned probabilities. Note that the same learning rules enable the network to perform all necessary computations for (i) to (iii). To our knowledge, our model is the first to perform all these steps for encoding an optimal model of the environment into spontaneous network activity.
The present mechanism of memory formation differs from the previous ones that self-organize cell assemblies through Hebbian learning rules (Vogels et [...]

[...] with a constant value of 3, and we have dropped the explicit time dependence in our notation for the sake of simplicity. Here, the dynamical variable h is determined by the history of the membrane potential (Eq. 10). The maximum instantaneous firing rate is 50 Hz, and the characteristic time window is 10 s. Through Eq. 10, h tracks the maximum value of the membrane potential u in a time window of approximately this length in the immediate past. The value of h is utilized to regulate the gain and threshold of the sigmoidal response function, where the constant parameters for the gain and threshold are 5 and 1, respectively. Each neuron generates a Poisson spike train at its instantaneous firing rate. While a small value of h leads to a steep slope of our activation function (Eq. 11), we have shown numerically that this does not lead to a problem in neural dynamics. Further, the saturating part of the sigmoidal function is crucial for the stable formation of assemblies.

Learning rules
We first explain the plasticity rule for feedforward connections. Synaptic connections were modified to minimize the Kullback-Leibler divergence (KL-divergence) between two Poisson distributions associated with the neuron's output and the feedforward activity over a sufficiently long period (Eq. 13). In this cost function, the feedforward potential $v^W$ gives a feedforward prediction of the firing rate through a static sigmoidal function, and the output is the firing rate of the neuron. The cost function evaluates to what extent the feedforward potential predicts the activity of postsynaptic neurons (Asabuki and Fukai, 2020). We have previously shown that taking the gradient of the cost function in Eq. 13 yields the online plasticity rule for the feedforward connections. The learning rate was set to 10 <C, unless otherwise specified. Here, we have dropped the explicit time dependence in our notation for the sake of simplicity.
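As a worked illustration of how such a cost function leads to an error-driven update, consider the KL-divergence between two Poisson distributions for a single neuron at a single time point. The symbols below ($f$ for the output rate, $v = \sum_j W_j e_j$ for the feedforward potential, $\phi$ for the static sigmoid, and $e_j$ for the presynaptic activity) are assumed notation, the ordering of the two distributions is an assumption, and $f$ is treated as a fixed target. This is a sketch of the generic calculation, not a reproduction of the original equations.

```latex
% Assumed notation: f = output rate, v = \sum_j W_j e_j, \phi = static sigmoid,
% e_j = presynaptic activity. f is treated as a fixed target.
\begin{align}
D_{\mathrm{KL}}\bigl[\mathrm{Poisson}(f)\,\|\,\mathrm{Poisson}(\phi(v))\bigr]
  &= f \log\frac{f}{\phi(v)} - f + \phi(v), \\
\frac{\partial D_{\mathrm{KL}}}{\partial W_j}
  &= \Bigl(1 - \frac{f}{\phi(v)}\Bigr)\,\phi'(v)\, e_j
   = -\frac{\phi'(v)}{\phi(v)}\,\bigl[f - \phi(v)\bigr]\, e_j, \\
\Delta W_j \;\propto\; -\frac{\partial D_{\mathrm{KL}}}{\partial W_j}
  &= \frac{\phi'(v)}{\phi(v)}\,\bigl[f - \phi(v)\bigr]\, e_j .
\end{align}
```

Under these assumptions, the weight change is the product of a prediction error, $f - \phi(v)$, and the presynaptic activity $e_j$, scaled by a positive factor that depends on the sigmoid.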
Similarly, the recurrent connections were modified to minimize an analogous cost function (Eq. 17), in which the recurrent potential $v^M$ gives a recurrent prediction. As for the feedforward plasticity, gradient descent on this cost function leads to the recurrent plasticity rule. The derived recurrent plasticity rule suggests that the recurrent prediction learns the statistical model of the evoked activity, which in turn allows the network to replay the learned internal model.
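The common structure of these rules, a prediction error multiplied by the presynaptic activity, can be illustrated with a toy example for a single postsynaptic neuron. The sketch below assumes a logistic sigmoid with rates normalized to the range (0, 1), constant presynaptic activities, a fixed target rate, and an arbitrary learning rate. These choices and the function names are illustrative assumptions rather than the model's actual equations or parameters.

```python
import math


def phi(x):
    """Assumed static sigmoid mapping a potential to a normalized rate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def prediction(weights, inputs):
    """Synaptic prediction of the output rate from weighted presynaptic activity."""
    return phi(sum(w * e for w, e in zip(weights, inputs)))


def train_weights(inputs, target_rate, lr=0.1, n_steps=5000):
    """Apply updates of the form: delta_w_j = lr * (output - prediction) * e_j."""
    weights = [0.0] * len(inputs)
    for _ in range(n_steps):
        error = target_rate - prediction(weights, inputs)
        weights = [w + lr * error * e for w, e in zip(weights, inputs)]
    return weights


if __name__ == "__main__":
    e = [1.0, 0.5]                      # hypothetical presynaptic activities
    w = train_weights(e, target_rate=0.3)
    print(prediction(w, e))             # approaches the target rate of 0.3
```

After training, the synaptic prediction matches the target rate, which is the sense in which a set of synapses "predicts" the neuron's output in this toy setting.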
In addition to the above plasticity rules, we defined a cost function for the inhibitory plasticity, in which $v^G$ is the inhibitory input onto the postsynaptic neuron via the inhibitory connections G. Again, by taking the gradient of this cost function with respect to G, we derive an inhibitory plasticity rule that keeps the network dynamics balanced. While the resultant rule is not the same as the feedforward and recurrent plasticity rules, all of these rules are similar in the sense that the weight updates are proportional to the prediction error and the presynaptic activity. We therefore assumed a rule for the inhibitory plasticity that has the same structure as the rest of the plasticity rules explained above (Eq. 22).

Measures for cell assembly activities
Here, we explain the measures used in Fig. 3. We calculated the firing rate ratio of cell assembly 1 in Fig. 3c from the firing rates of the neurons in each cell assembly and the number of neurons belonging to that cell assembly. Similarly, we defined the assembly size ratio of cell assembly 1 in Fig. 3d and the assembly activity ratio of cell assembly 1 in Fig. 3e. The assembly activity ratio is based on the population neural activity of each cell assembly [...]

Simulations of perceptual decision making
In each learning trial, we trained the network with either leftward or rightward dot movement represented by the corresponding input neurons firing at 50 Hz. In test trials, we defined input coherence Coh as the ratio of the firing rate of R input neurons to the summed firing rate of R and L input neurons, minus 0.5, according to Hanks et al. (2011). The value of Coh ranges between -0.5 (all dots moving leftward) and +0.5 (all dots moving rightward). Then, in test trials for input coherence Coh, we generated Poisson spike trains of R and L input neurons at rates of (Coh + 0.5) and (−Coh + 0.5) times this 50 Hz input rate, respectively.
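The mapping from input coherence to input firing rates can be sketched as follows. The constant name `R_MAX_HZ`, the 1 ms time bin, and the Bernoulli approximation of a Poisson process are illustrative assumptions; only the 50 Hz rate and the two rate formulas come from the text.

```python
import random

R_MAX_HZ = 50.0  # input rate used in learning trials (from the text); name is hypothetical


def input_rates(coh):
    """Return (R rate, L rate) in Hz for an input coherence in [-0.5, 0.5]."""
    if not -0.5 <= coh <= 0.5:
        raise ValueError("Coh must lie between -0.5 and +0.5")
    return (coh + 0.5) * R_MAX_HZ, (-coh + 0.5) * R_MAX_HZ


def poisson_spike_train(rate_hz, duration_s, dt=0.001, rng=None):
    """Approximate a Poisson spike train with one Bernoulli draw per time bin.

    The approximation assumes rate_hz * dt is much smaller than 1.
    """
    rng = rng or random.Random()
    n_bins = int(round(duration_s / dt))
    p_spike = rate_hz * dt
    return [1 if rng.random() < p_spike else 0 for _ in range(n_bins)]


if __name__ == "__main__":
    rate_r, rate_l = input_rates(0.25)
    spikes_r = poisson_spike_train(rate_r, duration_s=1.0, rng=random.Random(0))
    print(rate_r, rate_l, sum(spikes_r))
```

Note that the summed rate of the R and L groups stays at 50 Hz for every coherence level, so only the balance between the two groups changes across test trials.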
In Fig. 5c, we calculated the activity ratio (AR) as $\mathrm{AR} = r_R^{\mathrm{pop}} / (r_R^{\mathrm{pop}} + r_L^{\mathrm{pop}})$, where $r_R^{\mathrm{pop}}$ and $r_L^{\mathrm{pop}}$ represent the average population firing rates of the R-encoding and L-encoding cell assemblies, respectively. In Fig. 5b, we defined "choices to right" as Choices to right = AR × 100 (%).
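These two measures can be written as a short sketch. It assumes that AR is the R-assembly rate divided by the summed rates of the R and L assemblies, which is consistent with the statement in the Results that an activity ratio of 0.8 corresponds to a fourfold rate difference; the function names are hypothetical.

```python
def activity_ratio(rate_r, rate_l):
    """Assumed form: AR = r_R / (r_R + r_L) for average population firing rates."""
    total = rate_r + rate_l
    if total <= 0:
        raise ValueError("summed firing rate must be positive")
    return rate_r / total


def choices_to_right_percent(rate_r, rate_l):
    """Choices to right = AR x 100 (%), as defined in the text."""
    return 100.0 * activity_ratio(rate_r, rate_l)


if __name__ == "__main__":
    # An R assembly firing four times as much as the L assembly gives AR = 0.8.
    print(activity_ratio(4.0, 1.0), choices_to_right_percent(4.0, 1.0))
```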

A network model with distinct excitatory and inhibitory synapses
Here, we explain the network model and the plasticity rules used in Fig. 6. The network consists of 500 neurons. In the equation for the membrane potential of a neuron, W denotes the afferent synaptic weights, which are a mixture of excitatory and inhibitory connections as in the nDL model, and $M^{\mathrm{exc}}$ denotes the weights of recurrent excitatory synapses. We considered two types of recurrent inhibitory connections, denoted by $M^{\mathrm{inh}}$ and G, respectively. We assumed that half of the recurrent connections were excitatory and the remaining connections were all inhibitory, half of which were $M^{\mathrm{inh}}$ and the other half were G.
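The stated proportions of connection types can be sketched as a random assignment. The sketch assumes that each recurrent connection is assigned a type independently at random and that self-connections are excluded; neither assumption is stated in the text, and the labels and function name are hypothetical.

```python
import random
from collections import Counter


def assign_connection_types(n_neurons, seed=0):
    """Label each recurrent connection as 'M_exc', 'M_inh', or 'G'.

    Assumed scheme: independent draws with probabilities 1/2, 1/4, and 1/4,
    so that half of the connections are excitatory and the inhibitory half is
    split evenly between M_inh and G. Self-connections are skipped (assumption).
    """
    rng = random.Random(seed)
    types = {}
    for post in range(n_neurons):
        for pre in range(n_neurons):
            if post == pre:
                continue
            u = rng.random()
            if u < 0.5:
                types[(post, pre)] = "M_exc"
            elif u < 0.75:
                types[(post, pre)] = "M_inh"
            else:
                types[(post, pre)] = "G"
    return types


if __name__ == "__main__":
    counts = Counter(assign_connection_types(100).values())
    total = sum(counts.values())
    print({name: round(c / total, 3) for name, c in counts.items()})
```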

Figure 1. Unsupervised prior learning in a recurrent neural network. (a) A schematic of a network model is shown. The interconnected circles denote the model neurons, the activities of which are controlled by two types of inputs: feedforward (FF) and recurrent (REC) inputs. Colored circles indicate active neurons. Here, W denotes FF connections, and M and G denote REC connections. We considered two modes of activity (i.e., evoked and spontaneous activity). In the evoked mode, the membrane potential u of a network neuron was calculated as a linear combination of inputs across all different connections ($v^W$, $v^M$, and $v^G$). This evoked mode is considered during the learning phase, when all synapses attempt to predict the network activity, as we will explain in the main text. Once all synapses are sufficiently learned, all FF inputs are removed, and the network is driven spontaneously (spontaneous mode). Our interest lies in the statistical similarity of the network activity in these two modes. (b) The gain and threshold of the output response function were controlled by a dynamic variable, h, which tracks the history of the membrane potential. (c) A schematic of the learning rule for a network neuron is shown (top). During learning, for each type of connection on a postsynaptic neuron, synaptic plasticity minimizes the error between the output (gray diamond) and the synaptic prediction (colored diamonds). Note that all types of synapses share the common plasticity rule, where weight updates are calculated as the product of the error term and the presynaptic activities (bottom). Our hypothesis is that such a plasticity rule allows a recurrent neural network to spontaneously replay the learned stochastic activity patterns without external input.

Figure 2. Formation of stimulus-selective assemblies in a recurrent network. (a) Example dynamics of neuronal output and synaptic predictions are shown before (left) and after (right) learning. Colored bars at the top of the figures represent periods of stimulus presentations. (b) Example dynamics of feedforward connection W and inhibitory connection G are shown. W connections onto neurons organizing to encode the same or different input patterns are shown in red and blue, respectively. Similarly, the same colors are used to represent G connections within and between assemblies. (c) Dynamics of the mean connection strengths are shown for a neuron in cell assembly 1. Shaded areas represent SDs. In the schematic, triangles indicate input neurons and circles indicate network neurons. The color of each neuron indicates its stimulus preference. (d) Example dynamics of the averaged dynamical variable h (top) and the learned

Figure 3. Priors coded in spontaneous activity. An nDL network was trained with five probabilistic inputs. (a) Stimulus 1 appeared twice as often as the other four stimuli during learning. The example empirical probabilities of the stimuli used for learning are shown. (b) The spontaneous activity of the trained network shows distinct assembly structures. (c) The mean ratio of the

Figure 4. Probability encoding by learned within-assembly synapses.(a) Two input stimuli were presented in two protocols: uniform (50% vs. 50%) or biased (30% vs. 70%).(b) The total incoming synaptic strength on each neuron was calculated within each cell assembly.(c) left, The distributions of incoming synaptic strength are shown for the learned assemblies in the 50-vs-50 case.right, Same as in the left figure, but in the 30-vs-70 case.(d) left, The empirical probabilities of stimuli 1 and 2 and the normalized excitatory incoming weights within assemblies are compared in the 50-vs-50 case.right, Same as in the left figure, but in the 30-vs-70 case.

Figure 5. Simulations of biased perception of visual motion coherence. (a) The network model simulated perceptual decision-making of coherence in random dot motion patterns. In the network shown here, network neurons have already learned two assemblies encoding leftward or rightward movements from input neuron groups L and R. The firing rates of input neuron groups were modulated according to the coherence level Coh of random dot motion patterns (Materials and Methods). (b) The choice probabilities of monkeys (circles) and the network model (solid lines) are plotted against the motion coherence in two learning protocols with different prior probabilities. The experimental data were taken from Hanks et al. (2011). In the 50:50 protocol, moving dots in the "R" (Coh = 0.5) and "L" (Coh = -0.5) directions were presented randomly with equal probabilities, while in the 80:20 protocol, the "R" and "L" directions were trained with 80% and 20% probabilities, respectively. Shaded areas represent SDs over 20 independent simulations. The computational and experimental results show surprising coincidence without curve fitting. (c) Spontaneous and evoked activities of the trained networks are shown for the 50:50 (left) and 80:20 (right) protocols. Evoked responses were calculated for three levels of coherence: Coh = -50%, 0%, and 50%. In both protocols, the activity ratio in spontaneous activity matches the prior probability and gives the baseline for evoked responses. In the 80:20 protocol, the biased priors of "R" and "L" motion stimuli shift the activity ratio in spontaneous activity to an "R"-dominant regime.

Figure 6. A network model with distinct excitatory and inhibitory connections. (a) Strong excitatory connections were formed within assemblies. (b) The first type of recurrent inhibitory connections, $M^{\mathrm{inh}}$, became stronger between assemblies, enhancing assembly competition. (c) The second type of inhibitory connections G were strengthened within assemblies to

Figure 7. The DL model of excitatory and inhibitory cell assemblies. (a) This model consists of distinct excitatory and inhibitory neuron pools, obeying Dale's law. (b) Each inhibitory neuron projects to another neuron X through two inhibitory paths, path 1 and path 2, where the index X refers to an excitatory or an inhibitory postsynaptic neuron. Hexagons represent minimal units for prediction and learning in the neuron model and may correspond to dendrites, which were not modeled explicitly. (c) The probability ratios estimated by numerical simulations are plotted for the assembly activity ratios (purple), firing rate ratios (cyan), and assembly size ratios (green) as functions of the true probability ratio of external stimuli. Error bars indicate SEs calculated over five simulation trials with different initial states of neurons and synaptic weights in each parameter setting. (d) Inhibitory connection matrices are shown for path 1 and path 2. (e) The mean weights of self-organized synapses on excitatory and inhibitory postsynaptic neurons are shown. (f) Within-assembly and between-assembly connectivity patterns of excitatory and inhibitory neurons are shown. Colors indicate the three self-organized cell assemblies. (g) The strengths of lateral inhibition within assemblies (W/N) and between assemblies (B/N) are shown for paths 1 and 2. Horizontal bars show the medians and quartiles. (h) The resultant connectivity pattern suggests an effective [...]

We have shown by numerical simulation that the rule keeps the network dynamics balanced. Initial values of W and M are sampled from Gaussian distributions with mean 0 and variances 0.1/√ and 0.1/√, respectively. During learning, the elements of W and M can take both positive and negative values. After sufficient learning, the postsynaptic potentials $v^W$ and $v^M$ on a neuron converge to a common value $v^*$. Therefore, the corresponding predictions both approximate the neuron's output firing rate, implying that the postsynaptic potentials of afferent and recurrent synaptic inputs to the neuron can both predict its output after learning. The initial values of G are uniformly set to 1/√, and its elements are truncated to non-negative values during learning. This implies that $v^G$ does not become negative. After learning, the prediction from $v^G$ also approximates the neuron's output firing rate. Although some elements of M may give recurrent inhibitory connections, modifiable connections in G are necessary to encode all external inputs into specific cell assemblies.

Stimulation protocols
Feedforward input to the recurrent network consisted of Poisson spike trains with a background firing rate of 2 Hz. The input randomly presented non-overlapping patterns of 100 spike trains each (duration 100 ms, mean frequency 50 Hz), one at a time, with pattern-to-pattern intervals of 100 ms. Therefore, the number of input neurons equals 100 times the number of patterns. For simplicity, we simulated the constant-interval case, but using irregular intervals does not change the essential results. The number of patterns varies from task to task, and the values for each figure are as follows: 5 (Fig. 2c-e, Fig. 3, Fig. 6); 2 (Fig. 2a-b, Figs. 4-5); 3 (Fig. 7). The typical time length required for the convergence of learning is 1,000 s.