Abstract
The properties of different dopamine receptors constrain the function of dopamine signals in the striatum of the basal ganglia. Still, dopamine receptor kinetics are often neglected in considerations of the temporal dynamics of dopamine signalling. Here we develop a neurochemical model of dopamine receptor binding taking into account slow receptor kinetics. Contrary to current views, in our model D1 and D2 dopamine receptor populations react very similarly to dopamine signals independent of their timescale and integrate them over minutes. Furthermore, our model explains why ramping dopamine concentrations, observed experimentally, are an effective signal for increasing the occupancy of dopamine receptors.
Introduction
The neuromodulator dopamine (DA) has complex effects on the activity of striatal neurons by changing their excitability (Day et al., 2008) and strength of synaptic inputs (Reynolds et al., 2001) in the context of motor control (Syed et al., 2016), action-selection (Redgrave et al., 2010), reinforcement learning (Schultz, 2007), and addiction (Everitt and Robbins, 2005). Striatal DA concentration ([DA]) may change over multiple timescales (Schultz, 2007). Fast, abrupt increases in [DA] lasting for ≈ 1 − 3s result from phasic bursts in DA neurons (Roitman et al., 2004), which signal reward-related information (Schultz, 2007; Grace et al., 2007). Slightly slower [DA] ramps occur when rats approach a goal location (Howe et al., 2013) or perform a reinforcement learning task (Hamid et al., 2016). Finally, slow tonic spontaneous firing of DA neurons controls the baseline [DA] and may change on a timescale of minutes or longer (Grace et al., 2007). However, whether fast and slow changes in [DA] actually represent distinct signalling modes, e.g. for learning and motivation (Niv et al., 2007), has recently been challenged (Berke, 2018). Furthermore, DA acts on two different main receptor types, D1 and D2, adding another layer of complexity to its signalling.
Based on different DA affinities of D1 and D2 receptors (D1R and D2R), it is often assumed that striatal medium spiny neurons (MSN) respond differently to tonic and phasic DA changes, depending on which DA receptor they express predominantly (Dreyer et al., 2010; Surmeier et al., 2007; Grace et al., 2007; Schultz, 2007; Frank and O’Reilly, 2006). According to this “affinity-based” model, the low affinity D1Rs (i.e. high dissociation constant; Richfield et al., 1989) cannot detect tonic changes in [DA] because the fraction of occupied D1Rs is small (≈ 1%) at baseline [DA] (20 nM; see Methods) and does not change much during tonic, low amplitude [DA] changes. However, D1Rs seem well suited to detect phasic, high amplitude [DA] increases because they saturate only at very high [DA]. By contrast, D2Rs have a high affinity (i.e. low dissociation constant; Richfield et al., 1989), leading to ≈ 40% of D2Rs being occupied at baseline [DA] (20 nM). Due to their high affinity, D2Rs can detect low amplitude, tonic increases/decreases in [DA]. However, as D2Rs saturate at a relatively low [DA], they seem unable to detect high amplitude, phasic increases in [DA]. This suggests that D1 and D2 type MSNs differentially encode phasic and tonic changes in [DA] solely because of the different affinities of D1Rs and D2Rs (Schultz, 2007). However, this view is incompatible with recent findings that D2R expressing MSNs can detect phasic changes in [DA] (Yapo et al., 2017; Marcott et al., 2014).
The affinity-based model assumes that the reaction equilibrium is reached instantaneously, whereby the receptor binding affinity can be used to approximate the fraction of receptors bound to DA. However, this assumption holds only if the receptor kinetics are fast with respect to the timescale of the DA signal, which is typically not the case. For instance, D1Rs and D2Rs unbind from DA with a half-life time of t1/2 ≈ 80s (Burt et al., 1976; Sano et al., 1979; Maeno, 1982; Nishikori et al., 1980), much longer than phasic signals of a few seconds (Robinson et al., 2001; Schultz, 2007; Hamid et al., 2016). Moreover, the fraction of bound receptors might be a misleading measure for the effect of DA signals, since the abundances of D1R and D2R are quite different (see below). Therefore, we developed a model of receptor binding based on the kinetics and abundances of D1Rs and D2Rs to re-evaluate current views on DA signalling in the striatum.
Results and Discussion
To provide a realistic description of receptor kinetics, the binding and unbinding rates that determine the receptor affinity are required. The available experimental measurements indicate that the different D1R and D2R affinities are largely due to different binding rates, while their unbinding rates are similar (Burt et al., 1976; Sano et al., 1979; Maeno, 1982; Richfield et al., 1989). We incorporated these measurements into our slow kinetics model (see Methods) and investigated the model in a variety of scenarios mimicking DA signals on different timescales.
Firstly, to examine our model at baseline [DA], we investigated receptor binding for a range of affinities (Fig. 1a), reflecting the range of measured values in different experimental studies (Neve and Neve, 1997). We report the resulting receptor occupancy in terms of the concentration of D1Rs and D2Rs bound to DA (denoted as [D1 − DA] and [D2 − DA], respectively). Due to the low affinity of D1Rs, slow changes in [DA] only lead to small changes in the fraction of bound D1 receptors. However, there are overall more D1Rs than D2Rs (Richfield et al., 1989), and ≈ 80% of D2Rs are retained in the endoplasmic reticulum (Prou et al., 2001). Therefore, the concentration of D1Rs in the membrane available to extracellular DA is much higher than the concentration of D2Rs (e.g. 20 times higher in the nucleus accumbens; Nishikori et al., 1980; Methods). Thus, in our simulation, the actual concentration of bound D1Rs ([D1 − DA] ≈ 20 nM) was, at DA baseline, much closer to the concentration of bound D2Rs ([D2 − DA] ≈ 35 nM) than suggested by the different D1 and D2 affinities alone. We further confirmed that this was not due to a specific choice of the dissociation constants in the model, as [D1 − DA] and [D2 − DA] remained similar over the range of experimentally measured D1R and D2R affinities (Neve and Neve, 1997) (Fig. 1a). This suggests that [D1 − DA] and [D2 − DA] differ by at most a factor of two, rather than by the factor of 40 suggested by the difference in the fraction of bound receptors. Therefore, [D1 − DA] and [D2 − DA] might be better indicators of the signal transmitted to MSNs than the fraction of bound receptors, which neglects the different receptor type abundances.
Next, we investigated the effect of slow [DA] changes (Grace, 1995; Schultz, 1998; Floresco et al., 2003) by exposing our model to changes in the [DA] baseline. For signalling timescales that are long with respect to the half-life time of the receptors (tslow ≫ t1/2 ≈ 80s), we used the dissociation constant to calculate the steady state receptor occupancy. We found that for slow changes to a range of [DA] baselines, [D1 − DA] and [D2 − DA] were also similar (Fig. 1b). Thus, we conclude that D1R and D2R occupancy reacts similarly to slow, low amplitude [DA] changes because of the different abundances of D1 and D2 receptors. This is contrary to instant kinetics models, which suggest that D2Rs are better suited to encode slow or tonic changes in [DA].
To study the impact of faster [DA] signals, we measured the step response of the model to a [DA] change from 20nM to 1μM. This is quite a large change compared to phasic DA signals in vivo (Robinson et al., 2001; Cheer et al., 2007; Hamid et al., 2016), which we chose to illustrate that our results are not just due to a small amplitude DA signal. We found that binding to both receptor subtypes increased very slowly. Even for the high affinity D2Rs it took more than 5s to reach their new equilibrium (Fig. 1c). Thus, unlike the instant kinetics model, our model suggests that the D2Rs will not saturate for single reward events, which last overall for up to ≈ 3s. Note that the non-saturation is independent of the abundance of the receptors and is only determined by the kinetics of the receptors (see Methods). Due to their slow unbinding, D1Rs and D2Rs also took a long time to return to baseline receptor occupancy after a step down from [DA] = 1μM to [DA] = 20nM (Fig. 1d). Thus, we conclude that with slow kinetics of receptor binding both D1Rs and D2Rs can detect single phasic DA signals and that both remain occupied long after the [DA] has returned to baseline.
Next, we investigated [D1 − DA] and [D2 − DA] for a phasic DA increase (mimicking reward responses; Robinson et al., 2001; Cheer et al., 2007), a phasic DA increase followed by a decrease (mimicking responses to non-reward, salient stimuli; Schultz, 2016), and a prolonged DA ramp (mimicking goal approach; Howe et al., 2013; Hamid et al., 2016). In the instant kinetics model the D1Rs mirrored the [DA] time course, since even at [DA] = 200nM they are far from saturation, whereas the D2Rs showed saturation effects at elevated [DA], leading to differing D1 and D2 time courses (Fig. 1e, f). Importantly, in our model with slow kinetics, the time courses of [D1 − DA] and [D2 − DA] were similar for each of the three types of phasic DA signals.
While in our model we assumed slow kinetics based on neurochemical estimates of wildtype DA receptors (Burt et al., 1976; Sano et al., 1979; Maeno, 1982), recent genetically-modified DA receptors, used to probe [DA] changes, have apparent fast kinetics (Sun et al., 2018; Patriarchi et al., 2018). Although their kinetics changed strongly between receptor variants and may not reflect the kinetics of the wildtype receptor, we also examined our model in the context of faster DA kinetics and found that the similarity between [D1 − DA] and [D2 − DA] can be observed even if the actual kinetics were 100 times faster than assumed in our model (Supp. Fig. 1). Therefore, our results do not depend on the exact kinetics parameters or potential temperature effects, as long as the parameter changes are roughly similar for D1 and D2 receptors. Furthermore, taking into account different affinity states for D1Rs and D2Rs (Richfield et al., 1989) preserved the similarity of time courses of D1R and D2R occupancy (Supp. Fig. 7). Finally, pauses in the DA firing following aversive stimuli (Schultz, 2007) that lead to reductions in [DA] (Roitman et al., 2008) also have a similar effect on D1R and D2R occupancy (Supp. Fig. 4e).
Another striking effect of incorporating receptor kinetics was that a phasic increase in [DA] kept the receptors occupied for a long time (Fig. 1e). However, when a phasic increase was followed by a decrease, [D1 − DA] and [D2 − DA] quickly returned to baseline. This indicates that burst-pause firing patterns observed in DA cells for aversive or salient non-rewarding signals (Schultz, 2016) can be distinguished from pure burst firing patterns (which only lead to a phasic increase in [DA]) on the level of the MSN DA receptor occupancy. This supports the view that the fast component of the DA firing patterns (Schultz, 2016) is a salience response, and points to the intriguing possibility that the pause following the burst can, at least partly, revoke the receptor-ligand binding induced by the burst (see also Supp. Fig.2). This effect even persists in a sequence of burst and burst-pause events (Supp. Fig. 5). Thereby, the burst-pause firing pattern of DA neurons could effectively signal a reward false-alarm.
The similarity of [D1 − DA] and [D2 − DA] responses to both slow and fast [DA] changes indicates that the different DA receptors respond similarly independent of the timescale of [DA] changes. To understand why the D1Rs and D2Rs respond similarly, we considered the relevant model parameters in more detail. The binding rate constants of D1Rs and D2Rs differ by a factor of ≈ 60 (Burt et al., 1976; Sano et al., 1979; Maeno, 1982; Methods), suggesting faster D2Rs. However, experimental data suggest that there are ≈ 40 fold more unoccupied D1 receptors (≈ 1600nM) than unoccupied D2 receptors (≈ 40nM) on MSN membranes in the extracellular space of the rat striatum (Nishikori et al., 1980). Therefore, the absolute binding rate differs only by a factor of ≈ 1.5 between the D1Rs and D2Rs. That is, the difference in the kinetics of D1Rs and D2Rs is compensated by the different receptor numbers, resulting in nearly indistinguishable aggregate kinetics (Fig. 1e, f). This is consistent with recent experimental findings that D2R expressing MSNs can detect phasic [DA] signals (Yapo et al., 2017; Marcott et al., 2014).
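The compensation argument above is simple arithmetic and can be checked in a few lines. The following is only an illustrative sketch using the approximate ratios quoted in the text; the variable names are hypothetical and no fitted parameters are involved.

```python
# Illustrative arithmetic only: approximate ratios quoted in the text.
kon_ratio_d2_over_d1 = 60.0   # D2R binding rate constant is ~60x that of D1R
free_d1_nM = 1600.0           # unoccupied D1R concentration (nM)
free_d2_nM = 40.0             # unoccupied D2R concentration (nM)

# The absolute binding rate is k_on * [DA] * [free receptor].
# [DA] is shared by both receptor types, so it cancels in the ratio.
abundance_ratio_d1_over_d2 = free_d1_nM / free_d2_nM
rate_ratio_d2_over_d1 = kon_ratio_d2_over_d1 / abundance_ratio_d1_over_d2

print(abundance_ratio_d1_over_d2)  # 40.0
print(rate_ratio_d2_over_d1)       # 1.5
```

The 60-fold difference in rate constants shrinks to a 1.5-fold difference in absolute binding rate once the 40-fold difference in receptor abundance is included.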
Incorporating the slow kinetics in the model is crucial for functional considerations of the DA system. Currently, following the instant kinetics model, the amplitude of a DA signal (i.e. peak [DA]) is often considered as a key signal e.g. in the context of reward magnitude or probability (Hamid et al., 2016; Tobler et al., 2005; Morris et al., 2004). However, as DA unbinds slowly (over tens of seconds; Fig. 1d) and the binding rate changes approximately linearly with [DA], the amount of receptor occupancy primarily depended on the area under the curve of the [DA] signal (Supp. Fig. 3). Therefore, DA ramps, even with a relatively small amplitude (Fig. 1f and Supp. Fig. 4), were very effective in increasing DA receptor binding. In contrast, for locally very high [DA] (e.g. at corticostriatal synapses during phasic DA cell activity; Grace et al., 2007) the high concentration gradient would only lead to a very short duration of this local DA peak and thereby make it less effective in occupying DA receptors.
The dynamics introduced by the slow kinetics had further effects on the timecourse of DA signalling. With instant kinetics the maximum receptor occupancy was reached at the peak [DA] (Fig. 1e, f). By contrast, for slow kinetics the maximum receptor occupancy was reached when [DA] returned to its baseline (Fig. 1e) because as long as [DA] was higher than the equilibrium value of [D1-DA] and [D2-DA], more receptors continued to become occupied. Therefore for all DA signals, the maximum receptor occupancy was reached towards the end of the pulse (Fig. 1e, f and Supp. Fig. 4).
Another effect of the slow kinetics was that DA receptors remained occupied long after the DA pulse was over (Fig. 1e, f). This allowed the integration of DA pulses over minutes (Fig. 2a, b and Supp. Fig. 5). We investigated potential functional consequences of this integration by exposing the model to a sequence of trials modeling a simple behavioural experiment with stochastic rewards (see Methods). We found that both D1R and D2R occupancy coded for reward probability (Fig. 2 and Supp. Fig. 6), consistent with functional roles of DA signalling in motivation. However, this does not preclude potential DA roles on shorter time scales, such as the invigoration of movements (Roesch et al., 2009) or fast updates of state value (Hamid et al., 2016), as a sensitive readout mechanism could also detect small increases in [D1-DA] and [D2-DA] (Lamb and Pugh Jr, 1992).
Overall, our slow kinetics model of DA receptor binding casts doubt on several long-held views on DA signalling. Our model indicates that both D1R and D2R systems can detect [DA] changes, independent of the timescale, equally well. Although D1Rs and D2Rs have opposing effects on the excitability (Flores-Barrera et al., 2011) and strength of cortico-striatal synapses of D1 and D2 type MSNs (Centonze et al., 2001), we challenge the current view that differences in receptor affinity introduce additional asymmetries in D1 and D2 signalling. Instead of listening to different components of the DA signal, D1 and D2 MSNs seem to respond to the same DA input, increasing the differential effect on firing rate response of D1 and D2 MSNs.
Methods and Materials
The models were implemented in Python. The scripts used to generate the data and figures can be accessed here: https://bitbucket.org/Narur/abundance_kinetics/src/.
Kinetics model
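In the instant kinetics model the fractions of occupied D1 and D2 receptors (fD1 and fD2) are calculated directly from the concentration of free DA in the extracellular space, [DA], and the dissociation constant KD:
$$f_{D1} = \frac{[\mathrm{DA}]}{[\mathrm{DA}] + K_D^{D1}}, \qquad f_{D2} = \frac{[\mathrm{DA}]}{[\mathrm{DA}] + K_D^{D2}} \tag{1}$$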
However, the dissociation constant is an equilibrium constant, so it should only be used for calculating the receptor occupancy when the duration of the DA signal is longer than the time needed to reach the equilibrium. As this is typically not the case for phasic DA signals (see main text), we developed a model incorporating slow kinetics.
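For illustration, the instant kinetics occupancy can be sketched as a one-line function of [DA] and KD. The D1 value below is the measured 1.6 μM quoted later in the Methods; the D2 value of 30 nM is an assumption chosen only so that about 40% of D2Rs are occupied at 20 nM, and is not taken from Tab. 1. The function name is hypothetical.

```python
def instant_occupancy_fraction(da_nM, kd_nM):
    """Equilibrium fraction of occupied receptors at a free [DA] (both in nM)."""
    return da_nM / (da_nM + kd_nM)

KD_D1_NM = 1600.0  # measured low affinity value quoted in the text (1.6 uM)
KD_D2_NM = 30.0    # ASSUMPTION: illustrative value giving ~40% occupancy at 20 nM

for da in (20.0, 200.0, 1000.0):
    f_d1 = instant_occupancy_fraction(da, KD_D1_NM)
    f_d2 = instant_occupancy_fraction(da, KD_D2_NM)
    print(f"[DA] = {da:6.0f} nM   fD1 = {f_d1:.3f}   fD2 = {f_d2:.3f}")
```

Under these assumed constants the D1 fraction stays small over the whole range while the D2 fraction approaches saturation, which is the behaviour the affinity-based view relies on.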
When DA and one of its receptors are both present in a solution they constantly bind and unbind. During the binding a receptor ligand complex (here called DA−D1 or DA−D2) is formed. We call the receptor ligand complex an occupied DA receptor. Note that although in the following part we provide the equations for D1 receptors, the same equations apply for D2 receptors (with different kinetic parameters). In a solution binding occurs when receptor and ligand meet due to diffusion, with high enough energy and a suitable orientation, described as:
$$\mathrm{DA} + \mathrm{D1} \xrightarrow{k_{on}} \mathrm{DA{-}D1} \tag{2}$$
Accordingly, unbinding of the complex is denoted as:
$$\mathrm{DA{-}D1} \xrightarrow{k_{off}} \mathrm{DA} + \mathrm{D1} \tag{3}$$
The kinetics of this binding and unbinding, treated here as first-order reactions, are governed by the rate constants kon and koff that are specific for a receptor ligand pair and temperature dependent. Since both processes are happening simultaneously we can write this as:
$$\mathrm{DA} + \mathrm{D1} \underset{k_{off}}{\overset{k_{on}}{\rightleftharpoons}} \mathrm{DA{-}D1} \tag{4}$$
The rate at which the receptor is occupied depends on [DA], the concentration of free receptor [D1] and the binding rate constant kon:
$$\frac{d[\mathrm{DA{-}D1}]}{dt} = k_{on}\,[\mathrm{DA}]\,[\mathrm{D1}] \tag{5}$$
The rate at which the receptor-ligand complex unbinds is given by the concentration of the complex [DA − D1] and the unbinding rate constant koff:
$$\frac{d[\mathrm{DA{-}D1}]}{dt} = -k_{off}\,[\mathrm{DA{-}D1}] \tag{6}$$
The equilibrium is reached when the binding and unbinding rates are equal, so by combining Eq. 5 and Eq. 6 we obtain:
$$k_{on}\,[\mathrm{DA}]\,[\mathrm{D1}] = k_{off}\,[\mathrm{DA{-}D1}] \tag{7}$$
At the equilibrium the dissociation constant KD is defined as:
$$K_D = \frac{k_{off}}{k_{on}} = \frac{[\mathrm{DA}]\,[\mathrm{D1}]}{[\mathrm{DA{-}D1}]} \tag{8}$$
When half of the receptors are occupied, i.e. [DA−D1] = [D1], Eq. 8 simplifies to KD = [DA]. So at equilibrium, KD is the ligand concentration at which half of the receptors are occupied.
Importantly, for fast changes in [DA] (i.e. over seconds) it takes some time until the changed binding (Eq. 5) and unbinding rates (Eq. 6) are balanced, so the new equilibrium will not be reached instantly. The timescale on which equilibrium is reached can be estimated from the half-life time of the bound receptor. The half-life time assumes an exponential decay process as described in Eq. 6 and is the time required for half of the currently bound receptors to unbind. If [DA] = 0, and there is no more binding, the half-life time of the receptors can be calculated from the off-rate by using t1/2 = ln(2)/koff. Signal durations should be of the same order of magnitude as (or longer than) the half-life time in order for the instant kinetics model to be applicable.
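As a small worked example of the half-life relation t1/2 = ln(2)/koff, the sketch below converts the half-life of about 80 s quoted in the main text into an off-rate and then asks how much of the bound receptor pool would unbind during a phasic signal lasting a few seconds. The function names are hypothetical and the calculation assumes pure exponential unbinding with no rebinding ([DA] = 0).

```python
import math

def off_rate_from_half_life(t_half_s):
    """Unbinding rate constant k_off (1/s) for a given half-life (s)."""
    return math.log(2.0) / t_half_s

def fraction_still_bound(t_s, k_off):
    """Fraction of initially bound receptors still bound after t_s seconds,
    assuming exponential unbinding and no rebinding ([DA] = 0)."""
    return math.exp(-k_off * t_s)

k_off = off_rate_from_half_life(80.0)  # half-life of ~80 s from the main text
print(f"k_off = {k_off:.5f} 1/s")
for t in (1.0, 3.0, 80.0):
    print(f"t = {t:5.1f} s   fraction still bound = {fraction_still_bound(t, k_off):.3f}")
```

With this off-rate only a few percent of bound receptors unbind within 3 s, which illustrates why a phasic signal of a few seconds is short compared to the receptor timescale.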
We calculated the time course of occupied receptor after an abrupt change in [DA] by integrating the rate equation, given by the sum of Eq. 5 and Eq. 6:
$$\frac{d[\mathrm{DA{-}D1}]}{dt} = k_{on}\,[\mathrm{DA}]\,[\mathrm{D1}] - k_{off}\,[\mathrm{DA{-}D1}] \tag{9}$$
To integrate Eq. 9 we substitute
$$[\mathrm{D1}] = [\mathrm{D1_{tot}}] - [\mathrm{DA{-}D1}] \tag{10}$$
where [D1tot] is the total amount of D1 receptor (bound and unbound to DA) on the cell membranes available for binding to extracellular DA.
To model the effect of phasic changes in [DA] we choose the initial receptor occupancy [DA − D1](t = 0) = [DA − D1]0 and the receptor occupancy for the new equilibrium at time infinity [DA − D1](t = ∞) = [DA − D1]∞ as the boundary conditions. With these boundary conditions we get an expression for the time evolution of the receptor occupancy under the assumption that binding to the receptor does not significantly change the free [DA]:
$$[\mathrm{DA{-}D1}](t) = \left([\mathrm{DA{-}D1}]_0 - [\mathrm{DA{-}D1}]_\infty\right) e^{-(k_{on}[\mathrm{DA}] + k_{off})\,t} + [\mathrm{DA{-}D1}]_\infty \tag{11}$$
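For a constant [DA] after the step, the rate equation is linear in the occupancy, so the response is an exponential relaxation with rate kon[DA] + koff toward the new equilibrium [D1tot][DA]/([DA] + KD). The sketch below evaluates this closed form. The parameter values are assumptions for illustration only (koff from a half-life of 80 s, a D2-like KD of 30 nM and a total receptor concentration of 80 nM), not the values of Tab. 1, and the function name is hypothetical.

```python
import math

def step_response(t_s, da_nM, occ0_nM, r_tot_nM, k_on, k_off):
    """Occupied receptor concentration (nM) t_s seconds after [DA] steps to da_nM,
    assuming free [DA] is not depleted by receptor binding."""
    rate = k_on * da_nM + k_off                  # relaxation rate (1/s)
    occ_inf = r_tot_nM * k_on * da_nM / rate     # new equilibrium occupancy (nM)
    return occ_inf + (occ0_nM - occ_inf) * math.exp(-rate * t_s)

# ASSUMED illustrative D2-like parameters (not the values of Tab. 1).
K_OFF = math.log(2.0) / 80.0      # 1/s, from a half-life of ~80 s
KD_NM = 30.0                      # nM
K_ON = K_OFF / KD_NM              # 1/(nM s), from KD = k_off / k_on
R_TOT_NM = 80.0                   # nM
OCC0_NM = R_TOT_NM * 20.0 / (20.0 + KD_NM)   # equilibrium at 20 nM baseline

for t in (0.0, 1.0, 3.0, 10.0, 60.0):
    occ = step_response(t, 1000.0, OCC0_NM, R_TOT_NM, K_ON, K_OFF)
    print(f"t = {t:5.1f} s   occupied = {occ:6.2f} nM")
```

Setting da_nM back to the baseline value and occ0_nM to an elevated occupancy gives the corresponding step-down response, whose rate is dominated by koff and is therefore slow.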
For our slow kinetics model we solved Eq. 9 for each receptor type and arbitrary DA timecourses numerically employing a 4th order Runge Kutta solver with a 1 ms time resolution.
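The numerical scheme described above can be sketched as a classical 4th order Runge-Kutta step applied to the rate equation, with the free receptor concentration written as total minus occupied. This is a minimal, self-contained illustration and not the authors' published script; the function names, the square test pulse and the D2-like parameters are assumptions.

```python
import math

def occupancy_rate(occ_nM, da_nM, r_tot_nM, k_on, k_off):
    """Net binding rate (nM/s): k_on*[DA]*[free receptor] - k_off*[occupied]."""
    return k_on * da_nM * (r_tot_nM - occ_nM) - k_off * occ_nM

def integrate_rk4(da_of_t, occ0_nM, r_tot_nM, k_on, k_off, t_end_s, dt_s=1e-3):
    """Integrate the occupancy for an arbitrary [DA] time course da_of_t(t)."""
    occ = occ0_nM
    trace = [occ]
    for i in range(int(round(t_end_s / dt_s))):
        t = i * dt_s
        da_mid = da_of_t(t + 0.5 * dt_s)
        k1 = occupancy_rate(occ, da_of_t(t), r_tot_nM, k_on, k_off)
        k2 = occupancy_rate(occ + 0.5 * dt_s * k1, da_mid, r_tot_nM, k_on, k_off)
        k3 = occupancy_rate(occ + 0.5 * dt_s * k2, da_mid, r_tot_nM, k_on, k_off)
        k4 = occupancy_rate(occ + dt_s * k3, da_of_t(t + dt_s), r_tot_nM, k_on, k_off)
        occ += dt_s * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        trace.append(occ)
    return trace

# ASSUMED illustrative D2-like parameters (not the values of Tab. 1).
K_OFF = math.log(2.0) / 80.0                 # 1/s
K_ON = K_OFF / 30.0                          # 1/(nM s), assuming KD = 30 nM
R_TOT_NM = 80.0                              # nM
OCC0_NM = R_TOT_NM * 20.0 / (20.0 + 30.0)    # equilibrium at 20 nM baseline

def square_pulse(t_s):
    """Hypothetical test signal: 220 nM between 1 s and 3 s, 20 nM otherwise."""
    return 220.0 if 1.0 <= t_s < 3.0 else 20.0

trace = integrate_rk4(square_pulse, OCC0_NM, R_TOT_NM, K_ON, K_OFF, t_end_s=10.0)
peak_index = max(range(len(trace)), key=trace.__getitem__)
print(f"peak occupancy {trace[peak_index]:.2f} nM at t = {peak_index * 1e-3:.3f} s")
```

In this sketch the peak occupancy falls at the end of the pulse rather than at its onset, in line with the observation in the main text that maximum occupancy is reached towards the end of a DA signal.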
We did not take into account the change in [DA] caused by the binding and unbinding to the receptors since the rate at which DA is removed from the system by binding to the receptors is much slower than the rate of DA being removed from the system by uptake through DA transporters. For example the rate at which DA binds to the receptors is:
At a DA concentration of [DA] = 1 μM, with a D1 and D2 occupancy of [DA − D1] ≈ 20 nM and [DA − D2] ≈ 40 nM (the equilibrium values for [DA] = 20 nM), free receptor concentrations of [D1] ≈ 1600 nM and [D2] ≈ 40 nM, and the rate constants listed in Tab. 1, the rate of DA removal through binding to the receptors is:
However, the DA removal rate by Michaelis-Menten uptake through the DA transporters at this concentration would be:
$$\left.\frac{d[\mathrm{DA}]}{dt}\right|_{\mathrm{uptake}} = -\frac{V_{max}\,[\mathrm{DA}]}{K_m + [\mathrm{DA}]}$$
where Vmax is the maximal uptake rate, and Km the Michaelis-Menten constant describing the [DA] at which uptake is at half the maximum rate. As the uptake rate is much larger than the rate of binding to the receptors, the DA dynamics are dominated by the uptake process and not by binding to the receptors. Therefore, we neglected the receptor-ligand binding for the DA dynamics in our model. However, for faster DA receptors this effect would become more important.
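The order-of-magnitude comparison between receptor binding and transporter uptake can be sketched as follows. Because the rate constants of Tab. 1 are not reproduced here, the on-rates are derived from assumed values: koff from a half-life of 80 s, KD = 1.6 μM for D1 and an illustrative KD = 30 nM for D2. The uptake parameters are the nucleus accumbens values quoted in the Methods. The sketch counts gross binding only and ignores unbinding, so it should be read as a rough check rather than a reproduction of the paper's numbers.

```python
import math

# ASSUMED kinetic constants, derived from a ~80 s half-life and assumed KD values.
K_OFF = math.log(2.0) / 80.0          # 1/s
KD_D1_UM, KD_D2_UM = 1.6, 0.03        # uM (D2 value is an illustrative assumption)
K_ON_D1 = K_OFF / KD_D1_UM            # 1/(uM s)
K_ON_D2 = K_OFF / KD_D2_UM            # 1/(uM s)

DA_UM = 1.0                           # [DA] during the large step
FREE_D1_UM, FREE_D2_UM = 1.6, 0.04    # free receptor concentrations from the text

# Gross rate of DA removal by binding to both receptor types (uM/s).
binding_rate = DA_UM * (K_ON_D1 * FREE_D1_UM + K_ON_D2 * FREE_D2_UM)

# Michaelis-Menten uptake with the nucleus accumbens parameters from the Methods.
V_MAX_UM_S, K_M_UM = 1.5, 0.21
uptake_rate = V_MAX_UM_S * DA_UM / (K_M_UM + DA_UM)

print(f"binding: {binding_rate:.4f} uM/s   uptake: {uptake_rate:.4f} uM/s")
print(f"uptake is about {uptake_rate / binding_rate:.0f} times faster")
```

Under these assumptions uptake exceeds binding by well over an order of magnitude, which is the condition under which neglecting receptor binding in the [DA] dynamics is reasonable.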
Receptor parameters
An important model parameter is the total concentration of the D1 and D2 receptors on the membrane ([D1]tot and [D2]tot) that can bind to DA in the extracellular space of the striatum. Our estimate of [D1]tot and [D2]tot is based on radioligand binding studies in the rostral striatum (Richfield et al., 1989, 1987). We use the following equation, in which X is a placeholder for the respective receptor type, to calculate these concentrations.
The experimental measurements provide us with the number of receptors per unit of protein weight, [D1]m and [D2]m. To transform these measurements into molar concentrations for our simulations, we multiply by the protein content of the wet weight of the rat caudate nucleus, ε, which is around 12% (Banay-Schwartz et al., 1992). This leaves us with the amount of receptors per g of wet weight of the rat brain. Next we use the average density of a rat brain, ρbrain = 1.05 g/ml (DiResta et al., 1990), to convert this to the amount of receptors per unit of volume of the rat striatum. Finally, we divide by the volume fraction α, the fraction of the brain volume that is taken up by the extracellular space in the rat brain, to obtain the receptor concentration in the extracellular medium. The procedure ends here for the D1 receptors since there is no evidence that D1 receptors are internalized in the baseline state (Prou et al., 2001). However, a large fraction of the D2 receptors is retained in the endoplasmic reticulum of the neuron (Prou et al., 2001), reducing the amount of receptors that contribute to the concentration of receptors in the extracellular medium by fmembrane, the fraction of receptors protruding into the extracellular medium.
In addition to the receptor concentration, the kinetic constants of the receptors are key parameters in our slow kinetics model. In an equilibrium measurement in the canine caudate nucleus the dissociation constant of low affinity DA binding sites, corresponding to D1 receptors (Maeno, 1982), has been measured as Kd = 1.6 μM (Sano et al., 1979). However, calculating Kd (using Eq. 8) from the measured kinetic constants (Sano et al., 1979) yields a different value. To be more easily comparable to other simulation works (Dreyer et al., 2010) and direct measurements (Richfield et al., 1989; Sano et al., 1979), we chose a Kd in line with these works for our simulations (see Tab. 1). For this purpose we modified both measured rates (Sano et al., 1979) by ≈ 25%, making one slightly faster and the other slightly slower, so that the resulting Kd matched the chosen value. The kinetic constants have been measured at 30 °C and are temperature dependent. In biological reactions a temperature change of 10 °C is usually associated with a change in reaction rate of around a factor of 2-3 (Reyes et al., 2008). However, the conclusions of this paper do not change for an increase in reaction rates by a factor of 2-3 (see Supp. Fig. 1). It should also be noted that the measurements of the commonly referenced Kd (Richfield et al., 1989) were performed at room temperature.
The kinetic constants for the D2 receptors were obtained from measurements at 37 °C of high affinity DA binding sites (Burt et al., 1976), which correspond to the D2 receptor (Maeno, 1982). The measured on- and off-rates (see Tab. 1) yield a dissociation constant in line with the values measured by Richfield et al. (1989). As the off-rates of the D1 and D2 receptors are quite similar, the difference in their dissociation constants is largely due to differences in the on-rate of the receptors. This is important because the absolute rate of receptor occupancy depends linearly not only on the on-rate, but also on the receptor concentration (see Eq. 5), which means that a slower on-rate could be compensated for by a higher number of receptors.
The parameters that we used in the simulations are summarized in Tab. 1.
Dopamine signals
In our model we assumed a baseline [DA] of [DA]tonic = 20 nM (Dreyer et al., 2010; Dreyer, 2014; Venton et al., 2003; Suaud-Chagny et al., 1992; Borland et al., 2005; Justice Jr, 1993; Atcherley et al., 2015). We modelled changes in [DA] to mimic DA signals observed in experimental studies. We use three types of single pulse DA signals: (long-)burst, burst-pause and ramp.
The burst signal mimics the result of a phasic burst in the activity of DA neurons in the SNc, e.g. in response to reward-predicting cues (Pan et al., 2005). The model burst signal consists of a rapid linear [DA] increase (with an amplitude Δ[DA] and rise time trise) and a subsequent return to baseline. The return to baseline is governed by Michaelis-Menten kinetics with appropriate parameters for the dorsal striatum, Vmax = 4.0 μMs−1 and Km = 0.21 μM (Bergstrom and Garris, 2003), and the nucleus accumbens, Vmax = 1.5 μMs−1 (Dreyer and Hounsgaard, 2013). In our model the removal of DA is assumed to happen without further DA influx into the system (baseline firing resumes when [DA] has returned to its baseline value). Unless stated otherwise, the long-burst signals are used with a Δ[DA] = 200 nM and a rise time of trise = 0.2 s at Vmax = 1.5 μMs−1, similar to biologically realistic transient signals (Cheer et al., 2007; Robinson et al., 2001; Day et al., 2007).
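A burst signal of this kind can be sketched as a linear rise followed by Michaelis-Menten decay without further influx, clamped at baseline. The sketch below uses the default long-burst parameters from the text (Δ[DA] = 200 nM, trise = 0.2 s, Vmax = 1.5 μMs−1, Km = 0.21 μM, baseline 20 nM). For brevity it uses a forward Euler step for the decay, which is a simplification and not necessarily the scheme used by the authors; the function name and signature are hypothetical.

```python
def burst_signal(delta_da_nM=200.0, t_rise_s=0.2, baseline_nM=20.0,
                 v_max_nM_s=1500.0, k_m_nM=210.0, t_end_s=2.0, dt_s=1e-3):
    """Return [DA] samples (nM) for a burst: linear rise, then uptake-driven decay."""
    n_steps = int(round(t_end_s / dt_s))
    n_rise = int(round(t_rise_s / dt_s))
    da = baseline_nM
    trace = [da]
    for i in range(1, n_steps + 1):
        if i <= n_rise:
            # Linear increase from baseline to baseline + delta over the rise time.
            da = baseline_nM + delta_da_nM * i / n_rise
        else:
            # Michaelis-Menten uptake without influx (forward Euler step),
            # clamped so that [DA] never drops below baseline.
            da -= dt_s * v_max_nM_s * da / (k_m_nM + da)
            da = max(da, baseline_nM)
        trace.append(da)
    return trace

trace = burst_signal()
peak_index = trace.index(max(trace))
back_index = next(i for i in range(peak_index, len(trace)) if trace[i] == 20.0)
print(f"peak {max(trace):.0f} nM at {peak_index * 1e-3:.2f} s, "
      f"back at baseline after {back_index * 1e-3:.2f} s")
```

A ramp can be produced with the same function by choosing a longer t_rise_s and a smaller delta_da_nM, matching the description of the ramp signal in terms of the same parameters.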
The burst-pause signal has two components. The first is an initial short, small amplitude burst (Δ[DA] = 100 nM, trise = 0.1 s), with the corresponding [DA] return to baseline (as for the long burst above). In the second component, [DA] falls below baseline, simulating a pause in DA neuron firing. The length of this firing pause is characterized by the parameter tpause. This burst-pause [DA] signal reflects the DA cell firing pattern consisting of a brief burst followed by a pause in activity (Pan et al., 2008; Schultz, 2016).
The ramp DA signal is characterized by the same parameters as the burst pattern, but with a longer trise, and a smaller Δ[DA].
Behavioural task simulation
To determine whether DA receptor occupancy can integrate reward signals over minutes, we simulated sequences consisting of 50 trials. Each sequence had a fixed reward probability. The trials contained either a long burst DA signal (mimicking a reward) or a burst-pause DA signal (mimicking no reward) at the beginning of the trial according to the reward probability of the sequence. The inter-trial interval was 15 ± 5 s (Fig. 2 and Supp. Fig. 6). We chose this highly simplistic scenario to reflect DA signals in a behavioural task in which the animal is rewarded for correct performance. However, here the specifics of the task are not relevant as our model addresses the integration of the DA receptor occupancy over time. Although we chose to use the burst-pause type signal as shown in Fig. 1e as a non-rewarding event, the difference from the absence of a signal is minimal after the end of the pause (Supp. Figs. 2 and 5). Each sequence started from a baseline receptor occupancy, assuming a break between sequences long enough for the receptors to return to baseline occupancy (around 5 minutes). For the simulations shown in Supp. Fig. 5 all trials started exactly 15 s apart.
We simulated all reward probabilities from 0% to 100% in 10% steps. For each reward probability we ran 500 sequences, and calculated the mean receptor occupancy over time (single realisations shown in Fig. 2a, b). To investigate whether the receptor occupancy distinguished between different reward probabilities we applied a simple classifier to the receptor occupancy timeline.
The classifier was used to compare two different reward probabilities at a time. At each time point it was applied to a pair of receptor occupancies, e.g. one belonging to a 50% and one to a 30% reward probability sequence. The classifier assigned the current receptor occupancy to the higher or lower reward probability depending on which one was closer to the mean (over 500 sequences) receptor occupancy of that reward probability. As we knew the underlying reward probability of each sequence we were able to calculate the true and false positive rates and accuracy for each time point in our set of 500 sequences for both the D1R and D2R (Supp. Fig. 6). The accuracy was calculated based on all time points between 200 and 800s within a sequence to avoid the effect of the initial “swing-in” and post-sequence DA levels returning to baseline.
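The classification rule described here amounts to a nearest-mean decision between two reward probabilities. A minimal sketch is given below; the function names are hypothetical, and the occupancy values are made-up toy numbers chosen only to exercise the code, not simulation results.

```python
from statistics import mean

def classify(occupancy, mean_low, mean_high):
    """Assign an occupancy value to the reward probability whose mean is closer."""
    if abs(occupancy - mean_high) < abs(occupancy - mean_low):
        return "high"
    return "low"

def accuracy(samples_low, samples_high, mean_low, mean_high):
    """Fraction of samples assigned to the reward probability they came from."""
    correct = sum(classify(x, mean_low, mean_high) == "low" for x in samples_low)
    correct += sum(classify(x, mean_low, mean_high) == "high" for x in samples_high)
    return correct / (len(samples_low) + len(samples_high))

# TOY occupancies (nM) at one time point; hypothetical values for illustration.
samples_low = [30.0, 31.5, 33.0, 35.5]    # e.g. from 30% reward sequences
samples_high = [34.0, 36.0, 37.5, 39.0]   # e.g. from 50% reward sequences

mean_low, mean_high = mean(samples_low), mean(samples_high)
print(accuracy(samples_low, samples_high, mean_low, mean_high))  # 0.75
```

In the full analysis the same comparison would be repeated at every time point and for both receptor types, with the means taken over all 500 sequences of each reward probability.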
Acknowledgments
We thank Joshua Berke, Paul Overton, Alejandro Jimenez, Mohammadreza Mohagheghi Nejad and Amin Mirzaei for helpful discussions. This work was supported by the University of Sheffield and its high performance computing resources, the BrainLinks-BrainTools Cluster of Excellence funded by the German Research Foundation (DFG, grant number EXC 1086), and the state of Baden-Wuerttemberg through bwHPC.