Contextual gating of motivationally-relevant stimuli in the mouse nucleus accumbens

Neural activity in the nucleus accumbens (NAc) is thought to track fundamentally value-centric quantities such as current or future expected reward, reward prediction errors, the value of work, opportunity cost, and approach vigor. However, the NAc also contributes to flexible behavior in ways that are difficult to explain based on value signals alone, raising the question of whether and how non-value signals are encoded in the NAc. We recorded NAc neural ensembles while head-fixed mice performed a biconditional discrimination task, and extracted single-unit and population-level correlates of task features. We found coding for context-setting cues that modulate the stimulus-outcome association of subsequently presented reward-predictive cues. This context signal occupied a subspace orthogonal to classic value representations, suggesting that it does not interfere with value-related NAc output. Finally, we show that the context signal is predictive of subsequent value coding, supporting a circuit-level gating model for how the NAc contributes to behavioral flexibility and providing a novel population-level perspective from which to view NAc computations.


Introduction
Example single-unit responses for units with value (top) and context (bottom) correlates. The top of each panel shows spike rasters for the four trial types (purple: trials with context cue O1; orange: trials with context cue O2; dark colors: rewarded trials; light colors: unrewarded trials). The bottom half of each panel shows trial-averaged firing rates for each trial type aligned to context cue onset. Context cue presentation (0-1 s) is bordered by red lines, and target cue presentation (3-4 s) is bordered by black lines. A: Example unit that shows a general response to cue presentation (blue arrow), as well as a subsequent discrimination between rewarded and unrewarded trials after target cue onset (red arrow). B: Example unit showing a dip in firing after context cue onset (blue arrow), followed by ramping activity leading up to target cue onset, and a subsequent dip in firing after presentation of the rewarded target cue (red arrow). C: Example unit that responds predominantly during presentation of the rewarded target cue (blue arrow). D: Example unit with transient cue responses that discriminate between both the context cues (blue arrow) and the target cues (red arrow). E: Example unit that discriminates between context cues, including throughout the delay period (blue arrow). F: Example unit that discriminates between context cues only during the delay period following context cue offset (blue arrow), and also discriminates between the subsequent target cues (red arrow).

Figure 3:
Characterization of single-unit responses to the task. The top of each panel is a heat plot showing either max-normalized firing rates or firing rate differences for trial-averaged data for all eligible units, with units sorted according to the peak value for the comparison of interest. Red lines border context cue presentation, and black lines border target cue presentation. The bottom of each panel shows the distribution of units with significant tuning for each task parameter, relative to a shuffled distribution. Red dotted lines signify z-scores of +/-2.58. A: Firing rate profiles for units from 1 s before to 1 s after context cue onset, sorted according to maximum value after context cue onset. B: Firing rate differences for units across context cues, sorted according to maximum difference during context cue presentation. C: Firing rate differences for units across context cues during the delay period, sorted according to maximum difference during the 1 s period preceding target cue presentation. D: Firing rate profiles for units from 1 s before to 1 s after target cue onset, sorted according to maximum value after target cue onset. E: Firing rate differences for units across target cues, sorted according to maximum difference during target cue presentation. F: Firing rate differences for units between rewarded and unrewarded trial types during target cue presentation, sorted according to maximum difference during target cue presentation. G: Proportion of significant units for each task component. Each mouse is indicated by a symbol, and the average across mice is indicated by the red line. Note the higher context coding for M040 and M142 relative to M111 and M146, and the stronger value coding in M111. The general cue, gating, and state value categories represent units that were modulated by a combination of different task components; see Methods for classification details. H: Correlation of firing rates across time on trial-averaged data across all units. Note the high correlations during periods when a cue is present, and the anticorrelation of these cue periods with pre-cue periods.
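The "relative to a shuffled distribution" criterion in Figure 3 can be illustrated with a minimal sketch. It assumes three things, none of which should be read as the authors' exact procedure. First, a unit's tuning for one comparison (for example, context O1 vs. O2) is summarized by the difference in mean firing rate between the two trial types in a fixed window. Second, the null distribution is built by permuting trial-type labels. Third, the function names and the number of shuffles are hypothetical. A |z| above 2.58 corresponds to roughly p < 0.01, two-tailed.

```python
import numpy as np


def shuffle_zscore(rates, labels, n_shuffles=1000, seed=0):
    """z-score of a unit's observed firing-rate difference between two trial
    types, relative to a null distribution built by shuffling trial labels.

    rates  : 1-D array, one firing rate per trial (e.g. mean rate in a window)
    labels : 1-D boolean array, True for one trial type (e.g. context O1)
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    def mean_diff(lab):
        # Difference in mean rate between the two groups defined by `lab`.
        return rates[lab].mean() - rates[~lab].mean()

    observed = mean_diff(labels)
    null = np.array([mean_diff(rng.permutation(labels)) for _ in range(n_shuffles)])
    return (observed - null.mean()) / null.std()


# |z| > 2.58 corresponds to roughly p < 0.01 (two-tailed).
Z_CRIT = 2.58


def is_tuned(rates, labels, **kwargs):
    """True if the unit's |z| exceeds the significance threshold."""
    return abs(shuffle_zscore(rates, labels, **kwargs)) > Z_CRIT
```

The null is rebuilt from the unit's own trials, so the same function can be applied to each comparison in Figure 3 (context, delay, target, value) simply by passing different labels.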
Across mice, there appeared to be a qualitative relationship between the time spent in training and the strength of context cue coding: mice that took longer to acquire the task showed stronger context coding than mice with a shorter learning curve (Figure 3-supplement 1). For instance, M040 (28 training sessions; 27% context-coding units) and M142 (18 training sessions; 35% context-coding units) showed more context-sensitive units than M111 (9 training sessions; 5% context-coding units) and M146 (7 training sessions; 21% context-coding units). Additionally, M111, which had the fewest context-coding units, also had the most caudal recording coordinates across mice (Figure 9). Furthermore, this variability across animals was not related to behavioral performance during recording sessions, and no similar relationship was seen for target or value coding. Together, this suggests that variability in context coding across mice might be due to differences in training duration or precise recording location.

that context-dependent neural activity implements a routing mechanism in the NAc. For instance, depending on the context, which is realized as a distinct network activity pattern in the NAc, a given reward-predictive cue would be associated with a different expected value. Thus, each context-associated network state would gate the flow of subsequent reward-predictive stimuli to generate dynamic value representations, through a prior dopamine-dependent learning mechanism that shapes the synaptic weights (Figure 4A).

An alternative possibility for how context may be represented in the NAc is through the framework of rein-

Figure 4: Schematic of hypothetical coding scenarios for context cues. A: Context cues function to gate the routing of subsequent target cues. Target cues in this study are reward-predictive cues whose reward-predictive properties are dynamic rather than fixed. Top: Schematic of network activity for a pool of neurons in response to a series of motivationally-relevant cues, similar to that presented in Figure 1. In this schematic, a behavioral response to cue 1 is rewarded when preceded by context cue 1 but not by context cue 2, and vice versa for cue 2. After presentation of a context cue, the network shifts to a new activity state, characterized by a change in the firing rates of individual neurons, with each context cue triggering a distinct network state. The resulting difference in the excitability of individual units then determines their receptivity to the subsequent target cue, allowing the generation of dynamic value estimates that support the appropriate Go/No-go response. In this case, the network codes a gating response that is modulated by context. Bottom: Hypothetical PETHs for each trial type for a context-coding population-level representation. Note that the representation discriminates between the two context cues and that this difference is sustained until target cue presentation (purple vs. orange lines). Red lines border presentation of the context cue, and black lines border presentation of the target cue (see Figure 1 for further details). B: Context cues are interpreted in the NAc by their proximity to a rewarded state. Top: Presentation of either context cue elicits a similar network state, which is further amplified upon presentation of the rewarded target cue. This ramp-like activity in the network codes a state value signal.
Bottom: Hypothetical PETHs for a population-level state value coding signal. Note the peak activity during presentation of the target cue for rewarded trial types (dark colors) but not for unrewarded trial types (light colors), and the ramp leading up to it from the context cues.

a state closer to reward, regardless of the identity of the context cue (Figure 4B). Finally, a third possibility is that the NAc may generalize across all motivationally-relevant cues (Figure 4C).
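The gating scenario in Figure 4A can be made concrete with a toy model. This is a hypothetical sketch built from the following illustrative choices, none taken from the source:

- Each context cue sets a multiplicative gain on four units, which stands in for a distinct network state.
- Each target cue drives a fixed input pattern.
- A fixed linear readout converts the gated activity into a value estimate.
- The specific gains, inputs and readout weights are arbitrary.
- O1 followed by O3, and O2 followed by O4, are assumed to be the rewarded pairings.

The point is only that the same target cue yields opposite value estimates depending on the preceding context, which is the biconditional structure of the task.

```python
import numpy as np

# Hypothetical network state set by each context cue: a multiplicative gain per unit.
CONTEXT_GAIN = {
    "O1": np.array([1.0, 1.0, 0.2, 0.2]),
    "O2": np.array([0.2, 0.2, 1.0, 1.0]),
}

# Hypothetical feedforward drive of each target cue onto the same four units.
TARGET_INPUT = {
    "O3": np.array([1.0, 0.0, 1.0, 0.0]),
    "O4": np.array([0.0, 1.0, 0.0, 1.0]),
}

# Fixed (context-independent) linear readout from gated activity to value.
READOUT = np.array([1.0, -1.0, -1.0, 1.0])


def gated_value(context, target):
    """Value estimate for a target cue, given the state left by the context cue."""
    activity = CONTEXT_GAIN[context] * TARGET_INPUT[target]  # gating step
    return float(READOUT @ activity)


def go(context, target):
    """Go/No-go decision: lick when the gated value estimate is positive."""
    return gated_value(context, target) > 0


# Prints "lick" for O1+O3 and O2+O4, "no lick" for O1+O4 and O2+O3.
for ctx in ("O1", "O2"):
    for tgt in ("O3", "O4"):
        print(ctx, tgt, round(gated_value(ctx, tgt), 2), "lick" if go(ctx, tgt) else "no lick")
```

The readout weights never change, so all of the context dependence lives in the gain vector set by the context cue. This is the sense in which context "gates" the routing of the target cue in this toy version.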

A possible functional role for this context signal is to appropriately gate the response to cues whose relevance is context-dependent. In this case, context-dependent activity preceding target cue presentation should be able to predict the behavioral response to a given target cue. For example, suppose a unit fires more for context cue O1 than for O2. This discrimination would be linked to subsequent behavior if, on a trial-by-trial basis, it informed whether or not the animal licked in response to O3. In this situation, a licking response to O3 would be predicted on trials where the unit had a higher firing rate preceding target cue presentation. To test this at the single-unit level, we first reran our firing rate comparisons across the whole trial period. We found that 15-26% of units had firing rates that discriminated between the two context cues during the span of the context cue and delay periods (Figure 5A). We then trained a binomial regression to predict the behavioral response (lick or no lick) to a given target cue using the firing rate of a unit at various time points in a trial. We found that 5-13% of units across time points predicted the subsequent response to a target cue above chance. This suggests that individual units carry some information about the context that informs future response behavior (Figure 5B).

Figure 5:
Predicting the behavioral response to a given target cue based on firing rate activity. For a given target cue such as O3, this analysis sought to predict whether a lick or no-lick response occurred based on activity preceding the target cue during the context and delay periods. A: Sliding window showing the proportion of units with discriminatory activity between the two context cues throughout the different periods of the trial, similar to Figure 3B-C. Red lines border context cue presentation, and black lines border target cue presentation.
B: Proportion of all units whose firing rate at a given point in the trial predicts the behavioral response to a given target cue above chance, according to a binomial regression. C: Same as B, but using the firing rates of all units recorded from a mouse to generate a pseudo-ensemble prediction of the behavioral response to a given target cue. Each line denotes the prediction for a different mouse. Note that the variability across mice mirrors the variability in single-unit context coding seen in Figure 3G.
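The binomial regression in Figure 5B-C can be sketched as a cross-validated logistic regression with a shuffled-label chance level. This minimal sketch rests on several assumptions:

- Features are firing rates in one time bin: a single column for one unit, or one column per unit for a pseudo-ensemble.
- The model is fit by plain batch gradient descent rather than any particular library's solver.
- Chance is estimated by refitting on permuted lick labels.
- The fold count, learning rate, iteration count and function names are hypothetical.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Binomial (logistic) regression fit by batch gradient descent.
    X: (trials, features) standardized rates; y: 0/1 lick labels."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add intercept column
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = _sigmoid(X1 @ w)
        w -= lr * X1.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def cv_accuracy(X, y, n_folds=5, seed=0):
    """Cross-validated accuracy of predicting lick (1) vs no lick (0)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    correct = 0
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        sd[sd == 0] = 1.0  # guard against silent units
        w = fit_logistic((X[train] - mu) / sd, y[train])
        X1 = np.column_stack([np.ones(len(test)), (X[test] - mu) / sd])
        correct += np.sum((X1 @ w > 0) == (y[test] == 1))
    return correct / len(y)


def shuffled_chance(X, y, n_shuffles=100, seed=0):
    """Null accuracies obtained by permuting the lick labels."""
    rng = np.random.default_rng(seed)
    return np.array([cv_accuracy(X, rng.permutation(y), seed=s) for s in range(n_shuffles)])
```

To produce a curve like those in Figure 5B-C, slide the time bin across the trial and repeat the fit. Count a bin as above chance when the observed accuracy exceeds, for example, the 95th percentile of the shuffled accuracies.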
While single-unit analyses are informative for gauging what information is present within a neural population, the utility of these responses depends on their position within the broader NAc network and on how downstream structures interpret them. To characterize this population-level activity, we combined across-session data for each mouse to generate 4 pseudo-ensembles, one for each mouse in the dataset. As a first step, we tested whether the population could improve predictions of trial outcome beyond those of the best-performing single units. To do so, we trained a binomial regression to predict the behavioral response to each target cue using the firing rates of these pseudo-ensembles (Figure 5C). This analysis revealed that the ability to accurately predict the subsequent behavioral response to a given target cue was above

To determine which patterns of ensemble activity were driving these behavioral predictions, and to test the hypothetical coding scenarios outlined above (Figure 4), we used the dimensionality reduction technique demixed principal component analysis (dPCA). dPCA extracted the task-related latent factors relating to the context cues, target cues, and trial value (Figure 6). We selected dPCA because it constrains dimensionality reduction to extract the components that explain the most variance in the data for a given task parameter. dPCA differs from PCA in that the latter extracts the components that capture the most variance in the data, agnostic to any aspect of the task. We also chose dPCA over LDA: LDA focuses on separating class identities, whereas dPCA focuses on reconstructing class means and is thus better suited to preserving aspects of the original data. We applied

Figure 6:
A: Investigating population-level representations of task features, and their relationship to one another, requires a dimensionality reduction technique that can extract latent variables representing individual features of the task while preserving the structure of the data. In dPCA, pseudo-ensemble activity for each mouse, shown here by five hypothetical single-unit response profiles (top), can be reduced to a few key behaviorally-relevant components (bottom). This reduction uses a decoder (middle) that seeks to minimize the reconstruction error between the reconstructed data and task-specific trial-averaged data. Note that there are multiple ways to combine individual units to generate population-level representations. In this example, orthogonal context and value representations are extracted, suggesting that these two patterns of activity occupy separate subspaces of the neural activity space. Green numbers represent the weights of a unit for a given component.

B:
The hypothetical context and value components (bottom) can be used to test the feasibility of the context-dependent gating hypothesis (top). The bottom shows the progression of neural activity through a trial for each trial type in a two-dimensional neural subspace. The x-axis is the trial-averaged projected activity in the context component, and the y-axis is the trial-averaged projected activity in the value component. Suppose the context-related component brings the network to a distinct state (note the separation along the context axis from ITI to delay). Suppose further that this state modulates the input-output mapping of the subsequent target cue (note the separate paths a target cue takes through neural space for each context cue). Then a quantifiable relationship should exist between the two components at these timepoints. This can be tested by using linear regression to predict activity (arrow) along the value axis during target cue presentation (red box) from activity along the context axis during the delay period preceding target cue presentation (blue box).
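The test in Figure 6B asks whether delay-period context-axis activity linearly predicts target-period value-axis activity better than chance. The following is a minimal sketch under these assumptions:

- Per-trial projections are available as two 1-D arrays. Here these are `ctx_delay` (context axis, one delay timepoint) and `value_target` (value axis, after target cue onset); both names are hypothetical.
- Chance is estimated by shuffling one array relative to the other.
- 5-fold cross-validated R² is an illustrative choice of score, not necessarily the one used in the study.

```python
import numpy as np


def cv_r2(x, y, n_folds=5, seed=0):
    """Cross-validated R^2 of the linear model y ~ a + b * x.

    x: per-trial projection on the context axis at one delay timepoint
    y: per-trial projection on the value axis after target cue onset
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    pred = np.empty_like(y)
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        A = np.column_stack([np.ones(len(train)), x[train]])
        (a, b), *_ = np.linalg.lstsq(A, y[train], rcond=None)  # ordinary least squares
        pred[test] = a + b * x[test]
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def shuffled_r2(x, y, n_shuffles=200, seed=0):
    """Null R^2 values from breaking the trial-wise pairing of x and y."""
    rng = np.random.default_rng(seed)
    return np.array([cv_r2(x, rng.permutation(y)) for _ in range(n_shuffles)])
```

Repeating `cv_r2` with `x` taken at successive timepoints traces how predictive the context axis is across the trial.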

Figure 7:
Top extracted components demonstrating the presence of co-existing signals during the context and delay periods for each mouse, as outlined in Figure 4. dPCA was applied to the pseudo-ensemble data from each mouse to extract low-dimensional population representations for various task features. Each plot shows the trial-averaged activity projected onto the component for a given task feature (rows) for each mouse (columns) for each trial type (purple: trials with context cue O1; orange: trials with context cue O2; dark colors: rewarded trials; light colors: unrewarded trials). The plot title denotes the overall ranking of the component and the amount of variance it explains. Red lines border context cue presentation, and black lines border target cue presentation. Columns from left to right show components for M040, M111, M142 and M146. A: The top non-specific signal, which responded to all odors and is termed the general cue component (Figure 4C). Note that this signal is present during presentation of both context and target cues. B: The extracted component from each mouse that best represents a state value signal (Figure 4B). It shows ramping-like activity after context cue onset and a separation between rewarded (dark colors) and unrewarded (light colors) trials after target cue onset. Note the variability in this component across mice (the greatest separation between rewarded and unrewarded trials is for M040 and M111). C: The context-related component that best separated context cues during the delay period (Figure 4A). In the mice where this signal is strongest (M040, M142), note the strong separation between context O1 (purple) and context O2 (orange) trials from context cue onset until target cue onset.
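Each trace in Figure 7 is pseudo-ensemble activity projected onto a component axis. The orthogonality claim in the abstract concerns the angle between such axes. The sketch below does two things: it projects trial-averaged data onto an axis, and it measures the angle between two axes. It assumes each axis is given as a weight vector over units; the function names are hypothetical. The absolute cosine is used because the sign of a dPCA component is arbitrary.

```python
import numpy as np


def project(trial_avg, axis):
    """Project trial-averaged activity (units x timepoints) onto a unit-norm axis."""
    axis = np.asarray(axis, dtype=float)
    return axis @ np.asarray(trial_avg, dtype=float) / np.linalg.norm(axis)


def axis_angle_deg(u, v):
    """Angle between two component axes in neuron space, in degrees (0-90).

    The absolute cosine is used because a component's sign is arbitrary."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))
```

In a high-dimensional neuron space, even unrelated random axes tend to be close to orthogonal. An angle near 90° is therefore most informative when compared against a null, such as axes fit after shuffling condition labels.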
In terms of components related to the context cues, there are several noteworthy observations. First, there

Figure. A: Progression of neural activity through a trial for each trial type (purple: trials with context cue O1; orange: trials with context cue O2; dark colors: rewarded trials; light colors: unrewarded trials) in a two-dimensional neural subspace. The x-axis shows the trial-averaged projected activity in the context-related delay component (see Figure 7C), and the y-axis shows the trial-averaged projected activity in the value component (see Figure 7-supplement 1 F). During the context and delay periods, a separation is observed along the context axis between context O1 and context O2 trials. After target cue presentation, a separation is observed along the value axis between rewarded and unrewarded trials. Red circles signal context cue onset, cyan circles signal the delay period 1 s after context cue offset, and black circles signal 1 s after target cue onset. B: A binomial regression predicting the behavioral response (lick or no lick) to a given target cue from projected activity along the context-related delay component at various timepoints, showing high accuracy during the context and delay periods. Red lines border context cue presentation, and black lines border target cue presentation. C: A linear regression predicting projected activity along the value-related axis after target cue onset for a given target cue (black circles from A) from projected activity along the context-related axis at various timepoints, showing above-chance performance during the context and delay periods. D: Control analysis showing that a linear regression cannot predict projected activity along the value-related axis after target cue onset for a given context cue from projected activity along the context-related axis.
E: Iteratively removing the top 10% of contributors to the context-related delay component and repeating the linear regression analysis of predicting value-related activity as in C. Above-chance performance is achieved even after removing the top 20% of single-unit contributors to the context-related delay component.
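The robustness check in panel E can be sketched in four steps: rank units by the magnitude of their weight on the context-related delay axis, drop the top fraction, re-project, and re-score. This sketch makes several assumptions:

- Single-trial delay-period rates (`delay_rates`, trials × units) are available.
- Value-axis projections (`value_proj`) are also available per trial.
- For brevity, prediction is scored with an in-sample squared Pearson correlation. The regression and chance test used in the study may differ.
- All names are hypothetical.

```python
import numpy as np


def drop_top_contributors(weights, fraction):
    """Indices of units kept after removing the top `fraction` of units ranked by |weight|."""
    weights = np.asarray(weights, dtype=float)
    n_drop = int(round(fraction * len(weights)))
    order = np.argsort(-np.abs(weights))  # largest |weight| first
    return np.sort(order[n_drop:])


def removal_curve(ctx_weights, delay_rates, value_proj, fractions=(0.0, 0.1, 0.2, 0.3)):
    """For each fraction: drop top contributors, re-project delay-period activity onto the
    remaining context weights, and score the linear relationship with value-axis activity
    (squared Pearson correlation, in-sample, used here only for brevity)."""
    ctx_weights = np.asarray(ctx_weights, dtype=float)
    scores = []
    for f in fractions:
        keep = drop_top_contributors(ctx_weights, f)
        ctx_proj = delay_rates[:, keep] @ ctx_weights[keep]
        r = np.corrcoef(ctx_proj, value_proj)[0, 1]
        scores.append(r ** 2)
    return scores
```

If the score stays above chance after the strongest contributors are removed, the context-value relationship is distributed across the population rather than carried by a handful of units.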

To our knowledge, this study is the first to show in rodents that NAc units discriminate between context cues that are not directly tied to reward (26% of units across mice), but instead set the expected value of subsequently presented reward-predictive cues. This context coding is unlikely to be due to unequal salience across context cues: we counterbalanced the odor associations across mice and observed similar behavioral performance across cues for each animal. The present study expands upon our previous work, which demonstrated NAc units that distinguished between sets of motivationally-relevant stimuli with equal outcome-predictive properties (Gmaz et al., 2018). Interestingly, a subset of these stimulus set-discriminating units also showed sustained changes in firing during trial periods before cue presentation. This suggests that they encoded an abstract task feature not directly tied to stimulus presentation. However, the outcome-predictive properties of the motivationally-relevant stimuli were fixed, and the sets of stimuli were presented in separate continuous blocks, so we were unable to further characterize this

While we use the term "context" throughout the text to refer to the first cue presented during a trial, we only used one cue per context. We therefore cannot rule out the possibility that these units encoded cue identity as opposed to context. However, even if the NAc encoded context cue identity rather than context, this would not affect our interpretation of the data through the gating model (see below). For that model to be feasible, NAc units need only be in distinct network states during these cues.
Additionally, we believe that the 2 s of active flushing of the odorant during the delay between cues is sufficient for the context cue odor to disperse from the experimental apparatus. Nevertheless, we cannot exclude the possibility that some of the odor remained and combined with the target cue to form a compound cue. We think this is unlikely for two reasons. First, some single-unit and population-level correlates discriminated between context cues only during the delay period following context cue offset. Second, other correlates discriminated between the target cues regardless of which context cue was previously presented. Finally, the value of a trial and the behavior of the animal were yoked in our task: rewarded trials were followed by licking, and unrewarded trials by the absence of licking. We therefore cannot fully separate the contributions of value and behavior to our value-related components. However, this does not interfere with the interpretation of our primary finding of context coding, or with the linking of context-related activity during the delay period to subsequent, behaviorally-relevant activity during the target cue period.

Relationship between context and value coding in the NAc
In addition to showing context coding with a biconditional task specifically designed to control for value, a further innovation of this study is the population-level analysis. This analysis allowed us to show evidence for co-existing activity patterns in the population-level representations, as well as a functional link between context coding and subsequent value coding. To determine whether behavioral predictive power was a generic property of any population-level component, we ran the binomial regression on all major components. We found that only those containing significant context information had predictive utility during the delay pe-

Figure 6B). Indeed, the distinct occupancy of the pseudo-ensemble space for each context cue suggests that the context cues might drive the NAc into separate network states, setting an initial state for the subsequent input-output flow of the target cue. Beyond the presence of a context signal, this routing function also requires two further properties. The context signal must be functionally linked to the subsequent value signal, and it must not interfere with value-related output. We found support for the former from the observation that activity in this space during the delay period could

Data analysis
If mice learned the appropriate associations between context and target cues, then correct behavior on the task would consist of a high licking response rate to rewarded context-target pairings and a low licking response rate to unrewarded context-target pairings (see Figure 1B for hypothetical learned trial structure). To assess whether mice learned to discriminate between rewarded and unrewarded odor pairs, we compared the mean proportion of rewarded and unrewarded trials on which the animal made a lick response for a given odor against a null distribution. This null was obtained by shuffling the trial type labels within each session and recomputing the mean proportion of trials licked. Furthermore, to assess whether mice responded differently to individual target cues, we also compared the mean proportion of trials with a lick for each target cue against a null obtained by shuffling target cue identity within each session.

To determine how context is signaled throughout the period between context cue onset and target cue onset,

to how linear discriminant analysis (LDA) does for a single task parameter (Figure 6A). The dPCA method first took the mean-subtracted, trial-averaged data for all units. It then decomposed the population data matrix into a sum of separate data matrices, each representing the contributions of a different aspect of the task, plus noise. These task features are inputs to the analysis set by the experimenter; in the current experiment, the task inputs were context cue type, target cue type, and the interaction between context and target cue signifying cue value. dPCA then used the ordinary least squares solution of its loss function to find the transformation that minimized the reconstruction error between the reconstructed data and the deconstructed data. Here, the deconstructed data matrix represents the contributions of a given task parameter to the full trial-averaged data.
Dimensionality reduction was then achieved via eigendecomposition of the covariance matrix of the transformed data, and the top components were stored. The explained variance of each component was the fraction of the total variance in the trial-averaged data that could be attributed to the reconstructed data for that component.
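The decomposition, least-squares and eigendecomposition steps described above can be sketched in NumPy. This is a simplified version of dPCA. The full method additionally chooses a regularization strength by cross-validation and handles noise explicitly; both are omitted here. The sketch assumes trial-averaged data arranged as units × context × target × time, and `marginalize`, `dpca_axes` and the tiny ridge term `lam` are hypothetical names and choices. The steps for each marginalization X_phi are:

1. The ordinary least squares map A minimizes ||X_phi - A X||².
2. The top eigenvectors of the covariance of A X give the encoder axes U.
3. The decoder is Uᵀ A.

```python
import numpy as np


def marginalize(X):
    """Split trial-averaged data X (units, context, target, time) into additive parts.

    Returns the mean-subtracted data and a dict of marginalizations whose sum equals it:
    condition-independent time course, context, target, and the context x target
    interaction (the part that carries cue value in a biconditional design)."""
    X = X - X.mean(axis=(1, 2, 3), keepdims=True)  # subtract each unit's mean
    time = X.mean(axis=(1, 2), keepdims=True)
    context = X.mean(axis=2, keepdims=True) - time
    target = X.mean(axis=1, keepdims=True) - time
    value = X - time - context - target
    parts = {"time": time, "context": context, "target": target, "value": value}
    return X, {name: np.broadcast_to(p, X.shape) for name, p in parts.items()}


def dpca_axes(X, n_components=1, lam=1e-6):
    """Simplified dPCA: per marginalization, an OLS map followed by eigendecomposition.

    Returns {name: (decoder D, encoder U, explained variance per component)}."""
    Xc, parts = marginalize(np.asarray(X, dtype=float))
    n_units = Xc.shape[0]
    Xf = Xc.reshape(n_units, -1)  # units x (conditions * time)
    C = Xf @ Xf.T + lam * np.eye(n_units)  # tiny ridge keeps the solve well conditioned
    total = np.sum(Xf ** 2)
    out = {}
    for name, Xm in parts.items():
        Xm = Xm.reshape(n_units, -1)
        # A = Xm Xf^T (Xf Xf^T)^-1 minimizes ||Xm - A Xf||^2 (C is symmetric).
        A = np.linalg.solve(C, Xf @ Xm.T).T
        Xhat = A @ Xf
        _, evecs = np.linalg.eigh(Xhat @ Xhat.T)  # eigenvalues in ascending order
        U = evecs[:, ::-1][:, :n_components]  # encoder axes, largest eigenvalue first
        D = U.T @ A  # decoder axes
        expl = np.array([
            1.0 - np.sum((Xf - np.outer(U[:, i], D[i] @ Xf)) ** 2) / total
            for i in range(n_components)
        ])
        out[name] = (D, U, expl)
    return out
```

Projecting data with a decoder row (`D[i] @ Xf`) gives the kind of component time course plotted in Figure 7. The four marginalizations sum exactly to the mean-subtracted data, which is what makes the decomposition "demixed".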

In our task, we sought to 'demix' the contributions of the context cue, target cue, and cue value across time by projecting the data onto components derived from these task variables. We then visualized how the projected neural trajectories evolved throughout a trial in this reduced dPCA space (see Figure 6B for hypothetical trajectories along context and outcome axes). This analysis requires a sample size of 100 neurons to achieve satisfactory demixing, so sessions from each mouse were pooled into pseudo-ensembles. Furthermore, given that dPCA does not constrain the components extracted for each parameter to be or-

showing either max-normalized firing rates or firing rate differences for trial-averaged data for all eligible units, with units sorted according to the peak value for the comparison of interest. Columns from left to right show data for M040, M111, M142 and M146. Red lines border context cue presentation, and black lines border target cue presentation. A: Firing rate profiles for units from 1 s before to 1 s after context cue onset, sorted according to maximum value after context cue onset. B: Firing rate differences for units across context cues, sorted according to maximum difference during context cue presentation. C: Firing rate differences for units across context cues during the delay period, sorted according to maximum difference during the 1 s period preceding target cue presentation. D: Firing rate differences for units across target cues, sorted according to maximum difference during target cue presentation. E: Firing rate differences for units between rewarded and unrewarded trial types during target cue presentation, sorted according to maximum difference during target cue presentation.
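The pooling of sessions into pseudo-ensembles described in the dPCA methods above can be sketched as follows. The sketch assumes each session provides single-trial rates as trials × units × time, together with per-trial context and target labels coded 0/1. Units from different sessions are stacked along the unit axis as if recorded simultaneously, which is what makes the result a pseudo-ensemble. The function names are illustrative, as is the hard `MIN_UNITS` check, whose value comes from the 100-neuron figure in the text.

```python
import numpy as np

MIN_UNITS = 100  # from the text: dPCA needs this many neurons for satisfactory demixing


def trial_average(rates, context, target):
    """Condition means for one session.

    rates: (trials, units, time); context, target: per-trial labels coded 0 or 1.
    Returns an array of shape (units, 2 contexts, 2 targets, time)."""
    rates = np.asarray(rates, dtype=float)
    context, target = np.asarray(context), np.asarray(target)
    out = np.empty((rates.shape[1], 2, 2, rates.shape[2]))
    for c in (0, 1):
        for t in (0, 1):
            mask = (context == c) & (target == t)
            out[:, c, t, :] = rates[mask].mean(axis=0)
    return out


def build_pseudo_ensemble(sessions, min_units=MIN_UNITS):
    """Stack units from all sessions of one mouse as if they were recorded together."""
    pseudo = np.concatenate([trial_average(r, c, t) for r, c, t in sessions], axis=0)
    if pseudo.shape[0] < min_units:
        raise ValueError(f"only {pseudo.shape[0]} units; pool more sessions")
    return pseudo
```

Trials from different sessions are never simultaneous, so only condition-mean quantities are well defined for the pooled array. This is why the sketch averages within each session before concatenating.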

C:
Prediction accuracy for the context-related delay component.

D:
Prediction accuracy for the context component.

E:
Prediction accuracy for the target component.

F: Prediction accuracy for the value component.

Figure 7C) on the x-axis, and the trial-averaged projected activity in the value component (see Figure 7-supplement 1 F) on the y-axis. Note the relatively weak structure in the context-related delay axis, compared to M040. Red circles signal context cue onset, cyan circles signal the delay period 1 s after context cue offset, and black circles signal 1 s after target cue onset. B: Predicting the behavioral response to a given target cue based on projected activity along the context-related delay component at various timepoints. Red lines border context cue presentation, and black lines border target cue presentation. C: Predicting projected activity along the value-related axis after target cue onset for a given target cue (black circles from A) based on projected activity in the context-related axis at various timepoints. D: Control analysis predicting projected activity along the value-related axis after target cue onset for a given context cue based on projected activity in the context-related axis. E: Iteratively removing the top 10% of contributors to the context-related delay component and attempting to predict value-related activity as in C.

Figure 7C) on the x-axis, and the trial-averaged projected activity in the value component (see Figure 7-supplement 1 F) on the y-axis. Throughout the progression of a trial, a separation is observed along the context axis, which then flows into the value axis after target cue presentation, similar to M040. Red circles signal context cue onset, cyan circles signal the delay period 1 s after context cue offset, and black circles signal 1 s after target cue onset. B: Predicting the behavioral response to a given target cue based on projected activity along the context-related delay component at various timepoints. Red lines border context cue presentation, and black lines border target cue presentation. C: Predicting projected activity along the value-related axis after target cue onset for a given target cue (black circles from A) based on projected activity in the context-related axis at various timepoints. D: Control analysis predicting projected activity along the value-related axis after target cue onset for a given context cue based on projected activity in the context-related axis. E: Iteratively removing the top 10% of contributors to the context-related delay component and attempting to predict value-related activity as in C.

Figure 7C) on the x-axis, and the trial-averaged projected activity in the value component (see Figure 7-supplement 1 F) on the y-axis. Note the relatively weak structure in the context-related delay axis, compared to M040. Red circles signal context cue onset, cyan circles signal the delay period 1 s after context cue offset, and black circles signal 1 s after target cue onset. B: Predicting the behavioral response to a given target cue based on projected activity along the context-related delay component at various timepoints. Red lines border context cue presentation, and black lines border target cue presentation. C: Predicting projected activity along the value-related axis after target cue onset for a given target cue (black circles from A) based on projected activity in the context-related axis at various timepoints. D: Control analysis predicting projected activity along the value-related axis after target cue onset for a given context cue based on projected activity in the context-related axis. E: Iteratively removing the top 10% of contributors to the context-related delay component and attempting to predict value-related activity as in C.