Abstract
An animal’s movements and internal state transitions generate an “internal backdrop” of activity that is dynamically modulated. During behavior, this internal backdrop interacts with signals arising from incoming sensory stimuli and may have a substantial impact on task-related computations, like those underlying decision-making. To understand the joint effects of internal backdrop and task-imposed variables, we measured neural activity across the entire dorsal cortex of task-performing mice. We characterized internal backdrop using multiple measures of self-generated parameters, e.g., pupil diameter, whisking and body motion. Surprisingly, internal backdrop dominated neural activity across the entire cortex, dwarfing task-related variables and even sensory stimuli. Single neurons in frontal cortex were likewise dominated by internal backdrop. A linear model allowed us to account for multiple dimensions of internal backdrop and uncover hidden signatures of task-related activity. The internal backdrop therefore captures a fundamental dimension of complex behavior that must be accounted for when studying decision-making.
Highlights
We imaged cortex-wide neural activity during auditory and visual decisions in mice.
Cortical activity was surprisingly similar during sensory-guided versus random decisions.
Movement and state variables vastly outperformed task variables in predicting neural activity.
A linear model revealed hidden task-related activity in single neurons.
Introduction
Complex behaviors are accompanied by dynamic responses across cortical circuits. During decision-making, cortical activity reflects multiple processes including sensory inputs (Freedman and Assad, 2006), selection and integration of behaviorally-relevant information (Roitman and Shadlen, 2002), estimation and anticipation of reward (Bouret and Sara, 2004; Pratt and Mizumori, 2001), choice confidence (Kepecs et al., 2008) and recent trial history (Abrahamyan et al., 2016; Bichot and Schall, 1999; Manoach et al., 2007; Morcos and Harvey, 2016).
Many decision-making studies have acknowledged the potential impact of decision-related movements on neural activity. Because neural activity in many decision-making structures is known to reflect movements, it is essential to separate the impact of movements from that of decision formation. Movements that are associated with decision reporting, such as head orientation (Erlich et al., 2011), eye movements (Roitman and Shadlen, 2002) or licking (Allen et al., 2017), are therefore often taken into account to ensure that decision-related activity cannot be fully explained by the movements themselves.
Beyond decision-reporting, other movements are known to strongly modulate neural activity. For instance, whisking is critical for texture discrimination and object localization in mice (Chen et al., 2013; O’Connor et al., 2013). Running modulates the gain of visual inputs (Niell and Stryker, 2010) and is critical for integration of visual motion (Ayaz et al., 2013; Saleem et al., 2013) and predictive coding (Keller et al., 2012). Some movements are also known to modulate neural activity in multiple cortical areas (Allen et al., 2017; Ferezou et al., 2007; Shimaoka et al., 2018). A potential explanation for these widespread effects is that certain movements reflect changes in the animal’s internal state, like increased arousal during running (Niell and Stryker, 2010). Indeed, internal state can account for changes in neural activity of different sensory areas that are as strong as responses to sensory stimuli (Crochet and Petersen, 2006; Okun et al., 2015; Pachitariu et al., 2015). Internal state is also reflected in pupil dilation, which is associated with increased excitability and desynchronization of cortical neurons (Reimer et al., 2014). Importantly, movements and pupil dilation have different effects on cortical activity (Vinck et al., 2015), suggesting that internal state is multidimensional and driven by a variety of internal sources (Harris and Thiele, 2011). The combined effects of movements and internal state transitions can therefore be thought of as an ‘internal backdrop’ that is critical to consider when analyzing neural responses.
Broad measures of the internal backdrop are rarely incorporated into analyses of decision-making activity. This is in part because most studies of cortical modulation due to internal state have focused on sensory areas (Niell and Stryker, 2010; Okun et al., 2015; Pachitariu et al., 2015; Reimer et al., 2014; Vinck et al., 2015); the impact of the internal backdrop on decision-making areas is therefore poorly understood. Since most studies also use only narrow measures of internal state, like pupil dilation or running speed, the combined impact of multiple movements on neural activity is also unclear. Broadening this scope has been challenging because it requires measuring many different movements together with cortex-wide neural activity in task-performing animals.
To assess the impact of internal backdrop on decision-making, we used widefield imaging to measure neural activity across the entire dorsal cortex of mice performing auditory or visual decisions, while tracking a wide array of movements and pupil diameter. To evaluate how cortical activity was affected by task-related or self-generated variables, we built a linear encoding model. Surprisingly, animal movements captured the majority of signal variability across the cortex, outpacing other variables such as sensory stimuli, choice and reward. Moreover, task-aligned movements had a significant impact on trial-averaged data and accounted for features commonly attributed to cognitive task demands, like evidence accumulation, urgency, or motor planning. These observations argue that the internal backdrop has a much larger impact on neural activity during decision-making than previously appreciated.
Results
To measure cortex-wide neural dynamics during perceptual decisions, we trained mice to report the spatial position of an auditory or visual stimulus. Animals interacted with handles to initiate trials and lick spouts to report choices. Handles and spouts were controlled by servo motors to limit their accessibility to appropriate epochs in the task (Batista-Brito et al., 2017; Goard et al., 2016) (Fig. 1).
Stimuli were presented 0.875-1.125 s after handle touch and consisted of auditory or visual stimulus sequences. Each sequence consisted of two 0.6-s long presentations separated by a 0.5 s gap. After a 1 s delay, animals could report a decision and received a water reward when licking the spout that corresponded to the stimulus presentation side (Fig. 1B). Two distinct cohorts of animals were trained on either auditory or visual stimuli (but not both) and consequently achieved expert performance in the trained modality (Fig. 1C). Expert mice generalized the task timing, but not contingencies, to the untrained modality. This enabled us to measure cortical activity during either sensory-guided decisions or random guesses in the same animals (e.g., vision experts in blue were ~80% correct in visual trials but remained at novice level in auditory trials).
To study neural activity during decision making, we used a custom-built widefield macroscope (Ratzlaff and Grinvald, 1991) with a large 12.5 x 10.5 mm field of view (Fig. 1D). Mice were transgenic (Ai93; Emx-Cre; LSL-tTA; CaMKII-tTA), expressing the Ca2+-indicator GCaMP6f in excitatory neurons. Fluorescence was measured through the cleared skull (Guo et al., 2014). To avoid contamination from intrinsic signals (e.g., hemodynamic responses), we used excitation light at 473 nm to record Ca2+-dependent fluorescence and excitation light at 405 nm to record Ca2+-independent fluorescence (Lerner et al., 2015) on alternating frames. By rescaling and subtracting Ca2+-independent fluorescence we were then able to isolate a purely Ca2+-dependent signal (Allen et al., 2017; Wekselblatt et al., 2016). Using a combination of four brain landmarks, we aligned all data to the Allen Institute Common Coordinate Framework v3 (CCF, Figure S1). To confirm accurate CCF alignment, we performed retinotopic visual mapping (Marshel et al., 2011) in each animal and found high correspondence between functionally identified visual areas and the CCF (Fig. 1E).
Baseline-corrected fluorescence (ΔF/F) revealed significant modulation of neural activity across dorsal cortex during different episodes of the task (Fig. 1F, average response to visual trials, 22 sessions from 11 mice). While holding the handles, cortical activity was strongest in the somato-motor areas for hind- and forepaw (‘Hold’). The first visual stimulus caused robust activation of visual areas in posterior cortex and weaker responses in secondary motor cortex (M2) (‘Stim 1’). Activity in anterior cortex increased during stimulus presentation (‘Stim 2’) and the delay period (‘Delay’). When animals were allowed to respond, neural activity strongly increased across the entire dorsal cortex (‘Response’). A comparison of neural activity across conditions confirmed that neural activity was modulated by whether the stimulus was auditory vs. visual (Fig. 1G) and whether it was presented on the left vs. right (Fig. 1H). In both cases, differences across conditions were mainly restricted to primary and secondary visual areas. Activity in more anterior structures was nearly identical across conditions. This similarity may be because areas for motor planning are less lateralized (Li et al., 2015) and exhibit mixed tuning for both decision sides and modalities. Surprisingly, a comparison of neural activity in novice vs. expert decisions revealed almost no difference between the two trial categories (Fig. 1I). This similarity across the entire dorsal cortex was evident despite markedly different behavioral performance (Fig. 1C), suggesting that large parts of cortical activity did not distinguish informed decisions vs. guesses.
To better understand how behavior related to neural activity, we built a linear model. The model was designed to account for the fluorescence of each pixel via any time-varying combination of 23 possible behavioral variables, while at the same time preventing overfitting of the dataset. The predictor matrix (i.e., the design matrix) was constructed from sets of regressors, where each set was locked to a different sensory or motor event (Fig 2A, Steps 1-2). The regressors in each set formed a temporal sequence of pulses to allow the linear reconstruction of neural activity over time, relative to event onset. For sensory events, each regressor set contained regressors locked to each frame from stimulus onset until the end of the trial (‘Post-event’, blue). For motor events, regressors spanned a fixed duration of 0.5 s before until 1 s after event onset (‘Peri-event’, green). To account for cognitive task variables with no defined event onset, such as animal success in a given trial, we used regressor sets that spanned the entire trial (‘Whole trial’, black). We also included non-binary regressors, such as data from a piezo sensor underneath the animal to track hindpaw movements (‘Analog’, orange). Each behavioral variable was thus represented by a set of specific regressors. The model was fit to the data using ridge regression. Each regressor was assigned a β-weight, indicating how strongly that single regressor was linearly related to the neural activity in a given pixel (Fig. 2A, Step 3). To reduce computational cost, we used singular value decomposition (SVD) on the imaging data and predicted changes in data dimensions instead of individual pixels. Multiplying the full design matrix with the corresponding β-weights results in a model reconstruction of the imaging data (Fig. 2A, Step 4).
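As a minimal illustration of Steps 1-4, the sketch below builds one peri-event regressor set from time-shifted pulse copies, fits it with closed-form ridge regression, and reconstructs the data. All names and values are hypothetical, and a single fixed penalty stands in for the cross-validated, per-dimension regularization described in the Methods:

```python
import numpy as np

def event_regressor_set(event_frames, n_frames, pre, post):
    """Steps 1-2: time-shifted copies of a binary event vector, one column
    per lag. Peri-event sets use pre > 0; post-event sets use pre = 0."""
    pulse = np.zeros(n_frames)
    pulse[event_frames] = 1.0
    lags = np.arange(-pre, post)
    X = np.zeros((n_frames, lags.size))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = pulse[:n_frames - lag]
        else:
            X[:lag, j] = pulse[-lag:]
    return X

def ridge_fit(X, Y, lam=10.0):
    """Step 3: closed-form ridge regression, one beta column per data dim."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# Toy usage at 30 fps: whisking as a peri-event set (-0.5 s to +1 s).
rng = np.random.default_rng(0)
n_frames = 3000
whisks = np.sort(rng.choice(n_frames - 60, size=40, replace=False))
X = event_regressor_set(whisks, n_frames, pre=15, post=30)
Y = rng.standard_normal((n_frames, 200))   # e.g., 200 widefield SVD dims
beta = ridge_fit(X, Y)
Y_hat = X @ beta                           # Step 4: model reconstruction
```

In the full model, one such regressor set per behavioral variable is concatenated column-wise into the design matrix before fitting.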
In addition to traditional behavioral measurements (such as lick times), we leveraged video data from two cameras, observing the animal’s face and body. These data were used in two ways: first, we used video data to estimate variables known to modulate neural activity, such as whisking and pupil size (Fig. 2B). Second, we used SVD to extract the 200 highest-variance video dimensions and used them as analog regressors to provide additional information on animal movements that we could not track otherwise or had not previously considered (Powell et al., 2015; Stringer et al., 2018). To ensure that video regressors did not overlap with other model regressors, we used a QR decomposition to orthogonalize video regressors from already-described model variables.
Cortical maps of β-weights confirmed expected features of the data, matching known functions of visual and motor cortices. For example, pixel weights located in left V1 were highly positive in response to a rightward visual stimulus (Fig. 2C, left); pixels located in left somatosensory and primary motor forelimb area were highly positive when the right handle was grabbed (Fig. 2C, right). To evaluate how well the model captured neural activity at different cortical locations, we computed the 10-fold cross-validated R2 for the full model at different epochs during the trial (Fig. 2D). While some areas were particularly well predicted in specific trial epochs (e.g. V1 during stimulus presentation), there was high predictive power throughout the cortex during all epochs of the trial. For all data (‘Whole trial’), the model predicted 37.8 ± 1.2% of all variance across cortex.
We next sought to address which particular model variables were most critical for its success. The simplest way to do this is to fit a model consisting of a single variable, and ask how well it predicts the data. We therefore computed cross-validated R2 values, over all data, for each single-variable model separately. As shown in the light green bars in Fig. 2E, many variables could individually predict a large amount of variance in the imaging data. However, model variables that were associated with animal movement or internal state (‘Movement’) contained particularly high predictive power compared to task-related variables (‘Task’). This suggests that these movement and state variables, which reflect the internal backdrop, are particularly important for predicting cortical activity. Interestingly, video was the most predictive model variable, explaining ~20% of all variance. By projecting β-weights of the video dimension regressors back into video pixel space, we found that specific areas in the animal’s face, especially the jaw, were particularly important for predicting multiple dimensions of cortical activity (Figure S2).
While many model variables contained high predictive power, it is critical to quantify the amount of unique, non-redundant information contained in each variable. For instance, while licking had high predictive power, it could also be strongly correlated to other task variables such as choice, since licking occurs at roughly the same time in each trial. It might therefore contain little unique information that is not present in other model variables. If true, then removing lick regressors from the model should not affect the model’s overall predictive power since other variables could predict the cortical data equally well.
To isolate the predictive power that is unique to each variable, we created reduced models in which we temporally shuffled the regressor set of a given variable, and compared these reduced models to the full model. The resulting loss of predictive power (ΔR2) with shuffling provides a conservative estimate of the amount of unique information contained in that variable. Pixel-wise ΔR2 maps showed that unique information was highly spatially localized (Fig. 2F, see Figure S3 for other model variables) and matched the cortical areas where β-weights were highest (Fig. 2C).
This analysis revealed considerable variability in how essential each variable was to the model (Fig. 2E, dark green bars). A good example is the ‘time’ variable, a regressor set designed to capture signal deviations that always occur at the same time in each trial (similar to an average over all trials). Although the time-only model captured considerable variance (light green bar), eliminating it had a negligible effect on the model’s predictive power (dark green bar). This is because other task variables, such as choice or stimulus regressors, could capture time-varying modulation equally well. In contrast, movement variables contained large amounts of unique information. Notably, the video regressors contained a high degree of both overall and unique information, substantially outperforming all task-related model variables (Fig. 2E, both dark and light green bars corresponding to ‘Video’ are large).
To directly compare the impact of movement and internal state vs. task variables, we assigned each variable into either a ‘movement’ or ‘task’ category (Fig. 2G). The resulting movement model contained a very high amount of unique information, more than 5-fold as much as the task model (ΔR2Motor = 19.54 ± 0.8% vs. ΔR2Task = 3.43 ± 0.2%; dark green bars). This stark difference was even more pronounced in cortical maps of unique explained variance. These maps revealed that the movement model was far more predictive than the task model throughout the entire cortex (Fig. 2H). The same result was also clearly visible when comparing the accuracy of single-trial reconstructions in different cortical areas, including V1 (Fig. 2I). These results strongly argue that cortical activity is much better explained by the internal backdrop than by cognitive or sensory task variables.
Importantly, the large fraction of variance that is uniquely explained by the movement model is, by definition, orthogonal to the temporal structure of the task. This activity therefore cannot be captured when averaging over trials. However, there was also a significant amount of explained variance that was shared between the movement and task model (R2Shared = 14.86 ± 0.9%; Fig. 2G, light green bars same for task and movement), indicating that many features that are visible in a trial average may be either due to task variables or to certain movements that are task-aligned (e.g., licking at a specific time in every trial). To assess which movement variables were task-aligned, for each movement variable we computed how much explained variance influenced the trial average (‘task shared’ variance) and how much was trial-by-trial variability that averaged out across trials (‘task independent’ variance). Surprisingly, almost all movement regressors contained a large amount of explanatory power that was shared with task variables (Fig. 3A, light blue bars), indicating that each may have a considerable impact on the trial average.
To better understand how movement and task variables influenced the trial average, we used the full model to reconstruct the imaging data and computed trial averages for different cortical areas (Fig. 3B, top). As expected, the model closely reconstructed the imaging data. We then split the model prediction into two parts, based on movement and task variables, without re-fitting. This provides the best available estimate of the relative contribution of all movement variables (blue traces) and task variables (green traces) on the trial average. In V1 (left), baseline activity was mainly reconstructed with movement variables whereas activity after visual stimulation was well explained by task variables. In M2 (right), baseline activity was also mostly explained by movement whereas later activity was explained by a combination of both groups. Separating trial averages into task and movement components therefore allowed us to assess which features of trial-averaged activity are likely to be truly task-related when taking animal movements and state into account.
When we reconstructed trial-averaged activity across cortex based on task variables alone, we found several areas that were substantially task-modulated. Shortly after stimulus onset, task modulation was highest in the visual areas (Fig. 3C, ‘Stim1’). During subsequent visual stimulation and the delay (‘Stim2’ & ‘Delay’), additional modulation developed along the midline, especially in retrosplenial cortex but also parts of M2 and facial somatosensory cortex. To summarize these effects, we summed absolute task modulation over the whole trial duration (Fig. 3D, left). We then computed a task modulation index (TI) to identify areas that were most strongly affected by task vs. movement variables (Fig. 3E). The TI was defined as the difference between absolute task and movement modulation (Fig. 3D, left minus right) divided by their sum, rescaled between 0 and 1. High TI values indicate stronger trial-average modulation due to task variables, while low values indicate a strong movement contribution. The TI revealed multiple cortical areas with considerable relative task modulation. These areas are potential candidates for involvement in decision-making, and included primary and secondary visual cortex, facial somatosensory cortex and specific subareas within medial and anterior M2.
One of these identified areas was the anterior lateral motor cortex (ALM; circled in Fig. 3E). This area was of particular interest because recent work has identified ALM as causally involved in comparable decision-making tasks (Chen et al., 2017; Li et al., 2015). We therefore used two-photon (2p) imaging to investigate ALM more closely and determine whether activity of individual ALM neurons is strongly task-modulated (Fig. 4A). This was also particularly important because widefield imaging mainly reflects average activity across many neural structures in superficial layers (Allen et al., 2017). It was therefore not clear whether the importance of animal movement and state would be equally strong on a single-cell level.
In agreement with earlier reports (Li et al., 2015), many individual ALM neurons were highly active during licks to the contralateral spout (Fig. 4B, top). Other neurons exhibited modulation that was aligned to other task events, such as grabbing the handles, or showed mixed tuning (middle). Some neurons exhibited no modulation in their trial averages (‘untuned’, bottom).
We then applied the exact same linear model as above to the single-cell 2p data. In the single-cell data, as in the widefield data, individual movement variables strongly outperformed task variables (Fig. 4C, light green bars). Given the known causal role of ALM for licking (Li et al., 2015), one might expect that licking would be a particularly important variable to predict ALM activity. Instead, in agreement with our widefield results, we found that almost all movement variables contained considerable information and video-based regressors were far more powerful than any other model variable.
Many movement variables also contained a large amount of unique information (ΔR2, dark green bars). In contrast, task variables explained much less of the overall variance across neurons and contained very little unique explanatory power. Again, this strong difference between movement and task variables became clearer still when comparing the variables by group (Fig. 4D). The full model’s predicted variance was almost entirely matched by the movement model (R2full = 28.85 ± 0.7%; R2Motor = 28.13 ± 0.7%; both light + dark green bars), whereas the task model accounted for much less variance and contained very little unique information (R2Task = 8.74 ± 0.6%, both bars; ΔR2Task = 0.7 ± 0.003%, dark green bar). These effects were not driven by outliers but found in almost every recorded neuron. Across all neurons, a movement-only model performed almost identically to the full model in predicting single-cell variance (light blue trace overlies red trace). For all cells, a large portion of variance was also uniquely explained by the movement model (dark blue trace). Conversely, the task model predicted less variance in most neurons (light green trace) and accounted for any significant variance at all in only about half of all cells. Very few cells contained variance that was uniquely explained by the task model (dark green trace). These results demonstrate that the internal backdrop is of key importance for predicting activity of individual neurons, just as for widefield population data. Moreover, many neurons that would usually be considered untuned due to their lack of modulation in the trial average could still be explained and rendered interpretable by movement variables.
However, the dominance of the backdrop in single-cell activity is also worrying, as it implies that many neural response features that appear to be task-related might in fact be due to movements or state transitions that are temporally aligned with the task. It is important to note that this concern is limited to variance that is shared between movement and task variables (light green bars). The majority of movement-explained variance is unique to the movement model, and therefore orthogonal to the task. That is, the majority of the internal backdrop accounts for ‘spontaneous’ trial-by-trial variability that is removed when averaging over trials.
To determine whether features in the trial average were best explained by task or movement variables, we repeated the analysis from Fig. 3 and reconstructed trial-averaged data for each neuron based on the full model. We then computed the absolute sum of all deviations in the trial average that were either due to movement or due to task variables. As shown in Fig. 4F, the trial average of many neurons was still appreciably modulated by task variables. Using the TI described above, we could then isolate neurons that were strongly modulated by either movement or task variables. For neurons with a low TI, the trial average was almost exclusively modulated by movement variables, including average features that could easily be confused with stimulus-evoked responses or evidence integration signals (Fig. 4G, blue box). Conversely, neurons with a high TI were strongly modulated by task variables, thus identifying individual neurons whose trial average was strongly affected by the behavioral task instead of animal movement or state (green box).
Importantly, this distinction would not have been visible by examination of the trial average alone. The movement-driven example cell exhibited many average features that might have appeared to be responses to the stimuli, and a late rise in firing is reminiscent of decision formation. The model argues that these explanations are inaccurate. On the other hand, in the task-driven example cell, the rising activity might have appeared closely linked to licking, but was found to be mainly driven by task variables. Our model-driven approach therefore provided much more detailed insight into each neuron’s tuning preference and enabled us to isolate single neurons that were truly task-modulated when taking internal backdrop into account.
Discussion
Our results demonstrate that activity across dorsal cortex is dominated by the internal backdrop. By including a wide array of self-generated movements and pupil dilation into our linear model, we were able to take these variables into account and predict neural activity with high accuracy. The dominance of the internal backdrop was observed in both cortex-wide population activity and single neuron data. By quantifying the modulation of trial-averaged data through movement and task variables, we could also identify cortical areas or individual neurons that were most affected by task variables and thus reveal the spatiotemporal dynamics of truly task-related activity.
Cortical activity is widely invariant to animal expertise
By training animals on either visual or auditory stimuli but testing them with both modalities, we could compare neural activity during sensory-guided decisions (expert) versus random guesses (novice) in the same animal. This allowed us to separate neural activity that was due to stimulus presentation or movement from informed utilization of sensory inputs. Surprisingly, though animals understood one contingency and were at chance for the other, cortical responses were highly similar for expert and novice decisions across the many activated areas in dorsal cortex. This suggests that most trial-averaged activity we observed across cortex does not reflect the transformation of sensory evidence to guide animals’ choices, but instead reflects responses closely related to sensory input, movements and state changes. This might also explain the discrepancy between studies that have shown widespread task-related activity in many different brain areas (Allen et al., 2017; Goard et al., 2016; Merre et al., 2017), and studies in which systematic inactivation of many cortical areas found no behavioral effects outside of primary sensory and secondary motor cortex (Allen et al., 2017; Guo et al., 2014).
More subtle decision-related activity might be overshadowed by such cortex-wide modulations. But when we separated movement- from task-related activity, cortical responses for expert and novice decisions remained similar (Figure S4). There are at least two potential reasons for this. Sensory-guided decisions may be encoded by specific sub-populations of cortical neurons that are intermixed within more diverse local networks (Li et al., 2015); or, they may exhibit extensive mixed selectivity (Park et al., 2014; Raposo et al., 2014). Either scenario would obscure the impact of relevant neurons on the population average that is reflected in widefield signals. While this issue is best addressed by measuring individual neurons locally, cell-type-specific widefield imaging could also be used to measure activity of neuronal subtypes across the cortex (Allen et al., 2017; Chan et al., 2017). By measuring from layer- or projection-specific subpopulations instead of all excitatory neurons, this approach may provide a more detailed view of large-scale cortical information processing. It may also help to alleviate an important caveat of widefield imaging: its bias towards superficial layers (Allen et al., 2017), which may obscure more task-related neural activity in deeper layers. While our 2p imaging results revealed individual neurons with interesting task modulation, recordings in deeper layers might be even more informative to find decision-related activity that was not seen with widefield imaging.
Another explanation for the lack of cortical modulation during informed decisions could be the behavioral task design. Our task allowed for fast training (2-4 weeks), robust behavioral performance and comparison of expert vs. novice decisions. However, some cortical areas may be more important in a different setting, like learning a new behavior (Chen et al., 2013; Kawai et al., 2015; Merre et al., 2017), during tasks that require temporal accumulation of noisy sensory evidence (Erlich et al., 2011; Licata et al., 2017) or during spatial navigation (Harvey et al., 2012; Pinto et al., 2018). If true, the methods and analyses that we describe here might be critical to detect additional cortical involvement in other behavioral paradigms.
One of the non-sensory areas that we identified as task-modulated was ALM, which has been shown to be involved in planning and execution of motor output in tasks comparable to ours (Guo et al., 2014; Li et al., 2015). However, it remains unclear whether ALM is involved in evidence integration, or equally driven by sensory-guided versus random decisions. Our recordings show that many ALM neurons were mostly driven by internal backdrop whereas unique task-modulation was present but sparse. Furthermore, neural activity in about half of all recorded ALM neurons was modulated by spontaneous movements but completely orthogonal to the task. The core decision circuitry in our task may therefore lie mostly in subcortical targets like the dorsal striatum (Wang et al., 2018b), hippocampus (Aronov et al., 2017; Merre et al., 2017) or thalamus (Schmitt et al., 2017) and subsequently be relayed to ALM to create or sustain a motor plan. To address these questions, future studies should therefore combine more complex paradigms or subcortical recordings with close monitoring of animal movements and behavioral controls to disentangle differences between sensory-guided versus random decisions.
Cortical activity is dominated by the internal backdrop
Earlier studies that reported a large impact of the internal backdrop on cortical activity mostly focused on spontaneous behaviors like running on a wheel, where internal states are highly variable (Niell and Stryker, 2010; Vinck et al., 2015). One might assume that the internal state of task-performing animals is more constrained: animals are well-trained to the timing and contingencies of the task and perform the same behavior consistently over long periods of time, which might keep them in a less variable, attentive state (Harris and Thiele, 2011). This view is also supported by a reduction of trial-to-trial variance of cortical responses over the course of learning as behavioral performance increases (Ni et al., 2018). Our task design aimed to promote such a stable internal state by allowing mice to self-initiate trials, thereby ensuring that they were aware of an upcoming trial and were willing to perform the task. Despite this, we found that the large majority of cortical activity was dominated by animal movements and internal state changes instead of the behavioral task.
The profound impact of the internal backdrop has important implications when analyzing neural dynamics during decision-making. Although task variables alone explained a considerable amount of variance in cortical data, only ~3% was uniquely explained by the task. Most neural dynamics that might have been considered task-related were therefore ambiguous and equally well explained by internal dynamics or movements. The prevalence of movement modulation across cortex may explain why task-related activity has been observed in a variety of cortical areas (Allen et al., 2017; Goard et al., 2016; Merre et al., 2017) and highlights the importance of additional controls like neural inactivation to test the relevance of a given area for decision-making.
Even in ALM, which had been identified as causal for behavior (Chen et al., 2017; Li et al., 2015), much of the observed single-cell dynamics may be due to ongoing movements. Many of our ALM neurons were strongly modulated in their trial average and exhibited dynamics that seemed reminiscent of evidence accumulation or urgency signals; nonetheless, their activity was often fully explained by movement variables (Fig. 4G). This argues that even when focusing on areas that have been identified with neural inactivation, much of the observed single-cell dynamics may be due to internal backdrop. To address this issue, our linear model could be leveraged to isolate neurons that are best explained by task variables, when taking movements into account. Careful quantification of animal behavior can therefore be utilized to uncover previously obscured task-related neural dynamics.
The large and widespread impact of movements may appear to be in contrast with earlier decision-making studies that mostly found a weak relation between neural activity and movements (Allen et al., 2017; Erlich et al., 2011). The main difference between these earlier findings and our current study is most likely the number of parameters used to describe animal behavior. Our model included a wide variety of different movements and we found that most of them contributed a substantial amount of unique predictive power (Fig. 2E). This means that each variable had a distinct impact on cortical activity that cannot be inferred from other movements. While individual movement variables were indeed less informative than the task model, combining all variables into a larger model led to a pronounced increase in predictive power (Fig. 2G). This highlights the importance of tracking different sources for the internal backdrop when assessing their cumulative impact on cortical activity. Notably, our results are still a lower bound for how well neural activity can be predicted from observing animal behavior. Using more sophisticated machine vision analysis (Mathis et al., 2018) or additional sensors (Bollu et al., 2018) could result in far more detailed information on animal movement or state changes. Such information may enable dissociating effects of state change from specific motor activity, and a deeper understanding of the physiological mechanisms through which different components of the internal backdrop modulate cortical activity.
Notably, using video data alone captured a significant amount of neural variance. This is in agreement with recent work that used PCA to extract facial features from video data, explaining large amounts of variance in dense recordings of many individual neurons in V1 and multiple other brain regions (Stringer et al., 2018). It is therefore possible to extract a surprisingly large amount of information on the animal’s state by recording video data and using well-established linear analysis. Given the feasibility of this approach, we believe it should become standard practice to acquire video data during behavioral experiments.
Finally, the prominence of the internal backdrop raises the question of its role in cortical information processing. Historically, non-task-related activity has often been described as random internal noise that is reduced when performing a behavioral task. Yet, this view seems largely incompatible with the tight coupling of ‘spontaneous’ activity to the animal movements and internal state that we describe here. Some earlier work in sensory areas has hypothesized that integration of specific motor feedback is advantageous for sensory processing, like the integration of running signals in visual areas for motion perception or predictive coding (Ayaz et al., 2013; Keller et al., 2012; Saleem et al., 2013). However, just as auditory and somatosensory cortices were also found to be modulated by running (Ayaz et al., 2018; Schneider et al., 2014; Shimaoka et al., 2018), our results may indicate that this concept is not specific to sensory processing but holds true on a much larger scale. It is not yet clear what purpose this large and widespread modulation serves. As previously speculated, it may relate to cancelling or tracking self-motion (Sommer and Wurtz, 2008), gating of inputs (Schmitt et al., 2017), biasing circuits toward receptive ‘ON’ states (Engel et al., 2016), or permitting distributed associational learning (Engel et al., 2015; Wang et al., 2018a). Every cortical area, regardless of its specific computation, potentially plays an important role when feedback is unexpected. Global transmission of the internal backdrop might therefore be a key component to broadcast behavioral context and flexibly adapt information processing in local cortical networks.
Methods
Animal Subjects
The Cold Spring Harbor Laboratory Animal Care and Use Committee approved all animal procedures and experiments. Experiments were conducted with male mice from the ages of 6-25 weeks. All mouse strains were of C57BL/6J background and purchased from Jackson Laboratory. Four transgenic strains were crossed to create the transgenic mice used for imaging: Emx-Cre (JAX 005628), LSL-tTA (JAX 008600), CaMK2α-tTA (JAX 003010) and Ai93 (JAX 024103). All trained mice were housed in groups of two or more under an inverted 12:12-h light-dark regime and trained during their active dark cycle.
Surgical procedures
All surgeries were performed under 1-2% isoflurane in oxygen anesthesia. After induction of anesthesia, 1.2 mg/kg of Meloxicam was injected subcutaneously and Lidocaine ointment was topically applied to the skin. After making a medial incision, the skin was pushed to the side and fixed in position with tissue adhesive (Vetbond, 3M). We then created an outer wall using dental cement (Ortho-Jet, Lang Dental) while leaving as much of the skull exposed as possible. A circular headbar was attached to the dental cement. For widefield imaging, after carefully cleaning the exposed skull we applied a layer of cyanoacrylate (Zap-A-Gap CA+, Pacer technology) to clear the bone. After the cyanoacrylate was cured, cortical blood vessels were clearly visible.
For two-photon imaging, instead of clearing the skull, we performed a circular craniotomy using a biopsy punch (diameter: 3 mm), centered 1.5 mm mediolateral and 1.5 mm anterior to bregma. We then positioned a circular window over the cortex and sealed the remaining gap between the bone and glass with tissue glue. The window was then secured to the skull using C&B Metabond (Parkell) and the remaining exposed skull was sealed using dental cement. After surgery, animals were kept on a heating mat for recovery and a daily dose of analgesia (1.2 mg/kg Meloxicam) and antibiotics (2.3 mg/kg Enrofloxacin) was administered subcutaneously for at least 3 days.
Behavior
The behavioral setup was based on an Arduino-controlled finite state machine (Bpod r0.5, Sanworks) and custom Matlab code (2015b, MathWorks) running on a Linux PC. Servo motors and visual stimuli were controlled by microcontrollers (Teensy 3.2, PJRC) running custom code. Eleven mice were trained on a delayed 2-alternative forced choice (2AFC) spatial discrimination task. Mice initiated trials by touching either of two handles with their forepaws. Handles were mounted on servo motors and were moved out of reach between trials. After one second of holding a handle, sensory stimuli were presented. Sensory stimuli consisted of either a sequence of auditory clicks, or repeated presentation of a visual moving bar (3 repetitions, 200 ms each). Auditory stimuli were presented from either a left or right speaker, and visual stimuli were presented on one of two small LED displays on the left or right side. The sensory stimulus was presented for 600 ms, there was a 500 ms pause with no stimulus, and then the stimulus was repeated for another 600 ms. The 500 ms inter-stimulus period was added to allow probing neural dynamics during potential decision formation in the absence of sensory stimuli. After the second stimulus, a 1000 ms delay was imposed, after which servo motors moved two lick spouts into close proximity to the animal’s mouth. If the animal licked the spout on the same side as the stimulus, it was rewarded with a drop of water. After one spout was contacted, the other spout was moved out of reach to force the animal to commit to its initial decision.
Animals were trained over the course of approximately 30 days. After 2-3 days of restricted water access, animals were head-fixed and received water in the setup. Water was given by presenting a sensory stimulus, subsequently moving the correct spout close to the animal and dispensing water automatically. After several habituation sessions, animals had to touch the handles to trigger the stimulus presentation. Once animals reliably reached for the handles, the required touch duration was gradually increased up to 1 second. Lastly, the probability of fully self-performed trials, in which both spouts were moved towards the animal after stimulus presentation, was gradually increased until animals reached stable detection performance levels of 80% or higher.
Each animal was trained exclusively on a single modality (6 visual animals, 5 auditory). Only during imaging sessions were trials of the untrained modality presented as well. This allowed us to compare neural activity on trials where animals performed sensory-guided decision-making versus trials where animal decisions were random. To ensure that detection performance was not overly affected by presentation of the untrained modality, the trained modality was presented in 75% and the untrained modality in 25% of all trials.
Behavioral sensors
We used information from several sensors in the behavioral setup to measure different aspects of animal movement. The handles detected contact with the animal’s forepaws, and the lick spouts detected contact with the tongue. An additional piezo sensor below the animal’s trunk was used to detect hindpaw and whole-body movements.
Video monitoring
Two webcams (C920 and B920, Logitech) were used to monitor animal movements. Cameras were positioned to capture the animal’s face (side view) and the body (bottom view). To target particular behavioral variables of interest, we defined subregions of the video which were then examined in more detail. These included a region surrounding the eye, the whisker pad and the nose. From the eye region we extracted changes in pupil diameter using custom Matlab code. To analyze whisker movements, we computed the absolute temporal derivative averaged over the entire whisker pad. The resulting 1-D trace was then normalized and thresholded at 2 standard deviations to extract whisking events. Based on whisking events we created a binary peri-event design matrix that was included in the linear model (see below). The same approach was used for the nose.
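A rough sketch of the whisking-event extraction follows; the collapsing of threshold crossings into single event onsets is an assumption beyond the stated normalization and 2-standard-deviation threshold, and all names are hypothetical:

```python
import numpy as np

def whisking_events(pad_roi, threshold_sd=2.0):
    """Extract binary whisking events from the whisker-pad video subregion.

    pad_roi: array (n_frames, height, width). The motion trace is the
    absolute temporal derivative averaged over the ROI; frames where the
    normalized trace crosses `threshold_sd` are marked as events.
    """
    motion = np.abs(np.diff(pad_roi.astype(float), axis=0)).mean(axis=(1, 2))
    motion = (motion - motion.mean()) / motion.std()   # z-score normalization
    above = motion > threshold_sd
    # keep threshold crossings only, so each bout yields one event onset
    onsets = np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
    events = np.zeros(pad_roi.shape[0], dtype=bool)
    events[onsets] = True
    return events
```

The resulting binary vector feeds into a peri-event regressor set exactly as for other motor events.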
Widefield imaging
Widefield imaging was done using an inverted tandem-lens macroscope (Ratzlaff and Grinvald, 1991) in combination with an sCMOS camera (Edge 5.5, PCO) running at 60 fps. The top lens had a focal length of 105 mm (DC-Nikkor, Nikon) and the bottom lens 85 mm (85M-S, Rokinon), resulting in a magnification of 1.24x. The total field of view was 12.5 x 10.5 mm and the image resolution was 640 x 540 pixels after 4x spatial binning (spatial resolution: ~20 μm/pixel). To capture GCaMP fluorescence, a 500 nm long-pass filter (ET500lp, Chroma) was placed in front of the camera. Excitation light was projected on the cortical surface using a 495 nm long-pass dichroic mirror (T495lpxr, Chroma) placed between the two macro lenses. The excitation light was generated by a collimated blue LED (470 nm, M470L3, Thorlabs) and a collimated violet LED (405 nm, M405L3, Thorlabs) that were coupled into the same excitation path using a dichroic mirror (#87-063, Edmund Optics). We alternated illumination between the two LEDs from frame to frame, resulting in one set of frames with blue and the other with violet excitation at 30 fps each. Excitation of GCaMP at 405 nm results in non-calcium dependent fluorescence (Lerner et al., 2015), allowing us to isolate the true calcium-dependent signal by rescaling and subtracting frames with violet illumination from the preceding frames with blue illumination (Allen et al., 2017). All subsequent analysis was based on this differential signal at 30 fps.
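The rescale-and-subtract step might look as follows. The exact rescaling procedure is not specified here, so the per-pixel least-squares fit below is an assumption (one common implementation), and the two channels are assumed to be already de-interleaved into paired frame arrays:

```python
import numpy as np

def correct_hemodynamics(blue, violet):
    """Isolate the Ca2+-dependent signal from dual-wavelength frames.

    blue, violet: (n_frames, n_pixels) fluorescence under 473 nm
    (Ca2+-dependent) and 405 nm (Ca2+-independent) excitation, paired
    frame-by-frame. The violet channel is rescaled per pixel and
    subtracted from the blue channel.
    """
    # per-pixel scaling factor: regress blue on violet with no intercept
    scale = (blue * violet).sum(axis=0) / (violet ** 2).sum(axis=0)
    return blue - scale[None, :] * violet
```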
Two-photon imaging
Two-photon imaging was performed in 2 mice (visual experts) with a resonant-scanning two-photon microscope (Sutter Instruments, Movable Objective Microscope, configured with the “Janelia” option for collection optics), a Ti:Sapphire femtosecond pulsed laser (Ultra II, Coherent Inc.), and a 16X 0.8 NA objective (Nikon Instruments). Images were acquired at 30.9 Hz with an excitation wavelength of 930 nm. All focal planes were between 140-150 μm below the pial surface. The objective height was manually adjusted during recording in 1-2 μm increments as often as necessary to maintain the same focal plane.
Images were processed using Suite2P (Pachitariu et al., 2016) with model-based background subtraction. Sessions yielded 63-126 neurons each, for 271-529 behavioral trials.
Preprocessing of neural data
To analyze widefield data, we used SVD to compute the 200 highest-variance dimensions. These dimensions accounted for at least 88% of the total variance in the data. Using 500 dimensions accounted for little additional variance (~0.15%), indicating that additional dimensions were mostly capturing recording noise. SVD returns ‘spatial components’ U (of size pixels x components), ‘temporal components’ VT (of size components x frames) and singular values S (of size components x components) to scale components to match the original data. To reduce computational cost, all subsequent analysis was performed on the product SVT. Results of analyses on SVT were later multiplied with U, to recover results for the original pixel space. All widefield data were rigidly aligned to the Allen Common Coordinate Framework v3, using four anatomical landmarks: the left, center, and right points where anterior cortex meets the olfactory bulbs and the medial point at the base of retrosplenial cortex.
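A minimal sketch of this compression, assuming the movie fits in memory as a pixels x frames array (for real data sizes, a truncated or randomized SVD solver would be preferable):

```python
import numpy as np

def compress_widefield(movie, n_dims=200):
    """Truncated SVD compression of a widefield movie.

    movie: (n_pixels, n_frames) array. Returns the spatial components U
    (pixels x dims) and the product S @ Vt (dims x frames). Analyses run
    on S @ Vt; a result is mapped back to pixel space as U @ result.
    """
    U, s, Vt = np.linalg.svd(movie, full_matrices=False)
    return U[:, :n_dims], s[:n_dims, None] * Vt[:n_dims]
```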
To analyze 2p data, Suite2P was used to perform rigid motion correction on the image stack, identify neurons, extract their fluorescence, and correct for neuropil contamination (Pachitariu et al., 2016). ΔF/F traces were produced using the method of Jia et al. (Jia et al., 2011), skipping the final filtering step. Using these traces, we produced a matrix of size neurons x time, and treated this similarly to SVT above. Finally, we confirmed imaging stability by examining the average firing rate of neurons over trials. If this varied substantially at the beginning or end of a session, the unstable portion was discarded.
To compute trial-averages, imaging data were double-aligned to the time when animals initiated a trial and to the stimulus onset. After alignment, single trials consisted of 1.8 s of baseline, 0.83 s of handle touch and 3.3 s following stimulus onset. The randomized additional interval between initiation and stimulus onset (0–0.25 s) was discarded in each trial and the resulting trials of equal length were averaged together.
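A sketch of this double alignment for a single trial; the frame counts follow from the stated epoch durations at 30 fps, and the exact indexing is an assumption:

```python
import numpy as np

def align_trial(svt, init_frame, stim_frame, fps=30):
    """Double-align one trial: baseline and handle epochs locked to trial
    initiation, post-stimulus epoch locked to stimulus onset. The jittered
    interval between the two events is dropped, so all trials match in length.

    svt: (n_dims, n_frames) compressed imaging data for one session.
    """
    pre = svt[:, init_frame - int(1.8 * fps): init_frame + int(0.83 * fps)]
    post = svt[:, stim_frame: stim_frame + int(3.3 * fps)]
    return np.concatenate([pre, post], axis=1)
```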
Linear model
The linear model was constructed by combining multiple sets of regressors into a design matrix, to capture signal modulation by different task or motor events (Fig. 2A). Each regressor set was based on a single binary vector that contained a pulse at the time of the relevant event. To produce the regressor set, we repeated this vector with each copy being shifted in time by one frame relative to the original. For sensory stimuli, we created post-event regressor sets spanning all frames from stimulus onset until the end of the trial. For motor events like licking or whisking, we created peri-event regressor sets that spanned the frames from 0.5 s before until 1 s after each event. Lastly, we created whole-trial regressors, covering each frame in a given trial. Whole-trial regressors were aligned to stimulus onset and contained information about decision variables, such as animal choice or whether a given trial was rewarded. The model also contained several analog (non-binary) regressors, such as 1-D regressors for pupil diameter. To capture animal movements, we used SVD to compute the 200 highest-variance dimensions of video information from both cameras. SVD was performed either on the raw video data (‘video’) or the absolute temporal derivative (‘motion’). SVD analysis of behavioral video was the same as for the widefield data, and we used the product SVT of temporal components and singular values as analog regressors in the linear model. We did not use lagged versions of the analog regressors, including the video regressors.
To use video data regressors, it was important to ensure that they would not contain explanatory power from other model variables like licking and whisking that can also be inferred from video data. To accomplish this, we first created a reduced design matrix Xr, containing all movement regressors as well as times when spouts or handles were moving. Xr was ordered so that the motion and video columns were at the end. We then performed a QR decomposition of Xr (Mumford et al., 2015). The QR decomposition of a matrix A is A = QR, where Q is an orthonormal matrix and R is upper triangular. Columns 1 to j of Q therefore span the same space as columns 1 to j of A, but all the columns are orthogonal to one another. Finally, we replaced the motion and video columns of the full design matrix X with the corresponding columns of Q. This allowed the model to improve the fit to the data using any unique contributions of the motion and video regressors, while ensuring that the weights given to other regressors were not altered.
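A minimal sketch of this orthogonalization step, assuming the motion and video regressors occupy the trailing columns of the reduced design matrix:

```python
import numpy as np

def orthogonalize_trailing(Xr, first_video_col):
    """Orthogonalize video/motion regressors against all preceding columns.

    Xr: reduced design matrix with the motion and video columns last.
    Columns 1..j of Q span the same space as columns 1..j of Xr, so the
    trailing columns of Q carry only variance not already explained by
    the preceding regressors (cf. Mumford et al., 2015).
    """
    Q, _ = np.linalg.qr(Xr)          # reduced QR: Q is n_frames x n_cols
    X_out = Xr.copy()
    X_out[:, first_video_col:] = Q[:, first_video_col:]
    return X_out
```

In the full pipeline, the orthogonalized motion and video columns then replace the corresponding columns of the full design matrix X before fitting.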
When a design matrix has columns that are close to linearly dependent (multicollinear), model fits are not reliable. To test for this, we devised a novel method we call “cumulative subspace angles.” The idea is that for each column of the design matrix, we wish to know how far it lies from the space spanned by the previous columns (note that pairwise angles do not suffice to determine multicollinearity). Our method works as follows: (1) the columns of the matrix were normalized to unit magnitude, (2) a QR decomposition of X was performed, (3) the absolute value of the elements along the diagonal of R were examined. Each of these values is the absolute dot product of the original vector with the same vector orthogonalized relative to all previous vectors. The values range from zero to one, where zero indicates complete degeneracy and one indicates no multicollinearity at all. Over all experiments, the most collinear regressor received a value of 0.26, indicating that it was 15° from the space of all other regressors. The average value was 0.84, corresponding to a mean angle of 57°.
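The procedure is compact enough to state directly in code; the sketch below implements steps (1)-(3) and converts the diagonal values to angles:

```python
import numpy as np

def cumulative_subspace_angles(X):
    """For each design-matrix column, how far it lies from the span of all
    previous columns: (1) normalize columns to unit magnitude, (2) QR-
    decompose, (3) take |diag(R)|. A value of 0 indicates complete
    degeneracy, 1 indicates no multicollinearity; the angle to the
    preceding subspace is the arcsine of the value (e.g., 0.26 -> ~15 deg).
    """
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    _, R = np.linalg.qr(Xn)
    vals = np.abs(np.diag(R))
    return vals, np.degrees(np.arcsin(np.clip(vals, 0.0, 1.0)))
```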
To avoid overfitting, the model was fit using ridge regression. The regularization penalty was estimated separately for each column of the widefield data using marginal maximum likelihood estimation (Karabatsos, 2017) with minor modifications that reduced numerical instability for large regularization parameters.
Variance analysis
Explained variance (R2) was obtained using 10-fold cross-validation. To compute the total explained variance of each individual model variable, we created reduced models in which all regressors that did not correspond to the given variable were shuffled in time. The explained variance of each reduced model revealed the maximum potential predictive power of the corresponding model variable.
To assess unique explained variance by individual variables, we created reduced models for each variable where only the corresponding regressor set was shuffled in time. The difference in explained variance between the full and the reduced model yielded the unique contribution ΔR2 of that model variable. The same approach was used to compute unique contributions for groups of variables, i.e., ‘movement’ or ‘task’. Here, all variables that corresponded to a given group were shuffled together.
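A simplified sketch of this shuffle-based estimate; function names are hypothetical, and in-sample R2 with a fixed ridge penalty stands in for the 10-fold cross-validated, per-dimension-regularized fits used in the actual analysis:

```python
import numpy as np

def ridge_beta(X, Y, lam=10.0):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def r_squared(X, Y, beta):
    return 1.0 - np.var(Y - X @ beta) / np.var(Y)

def unique_r2(X, Y, cols, lam=10.0, seed=0):
    """Unique contribution (delta R2) of one regressor set: R2 of the full
    model minus R2 of a model in which only those columns were shuffled
    in time (destroying their relation to the data while preserving their
    overall statistics)."""
    rng = np.random.default_rng(seed)
    full = r_squared(X, Y, ridge_beta(X, Y, lam))
    Xs = X.copy()
    for c in np.atleast_1d(cols):
        Xs[:, c] = rng.permutation(Xs[:, c])
    reduced = r_squared(Xs, Y, ridge_beta(Xs, Y, lam))
    return full - reduced
```

Group-level contributions (e.g., ‘movement’ or ‘task’) follow by passing all columns of the group at once.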
To compute the ‘task-shared’ or ‘task-independent’ explained variance for each movement variable, we created reduced models where all movement variables were shuffled in time. This task-only model was then compared to other reduced models where all movement variables but one were shuffled. The difference between the task-only model and this model yielded the task-independent contribution of that movement variable. The task-shared contribution was the difference between the total variance explained by a given variable and its task-independent contribution.
Model-based reconstruction of trial-averages
Reconstructed trial averages (Fig. 3 & 4) were produced by fitting the full model and averaging the reconstructed data over all trials. To split the model into the respective contributions of movement and task variables, we reconstructed the data based on either the movement or task variables alone (using the same weights as in the full model) and averaged over all trials. To evaluate the relative impact of task variables on the trial average, we computed a task modulation index (TI), defined as

TI = ΔTask / (ΔTask + ΔMovement)

where ΔTask and ΔMovement denote the mean absolute deviation of the reconstructed trial average based on either task or movement variables alone. This is equivalent to the difference between task and movement modulation, divided by their sum and rescaled between 0 and 1 (cf. Fig. 3E). The TI ranges from 0 (fully movement-related) to 1 (fully task-related). Intermediate values denote a mixed contribution of task and movement regressors to the trial-average.
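A minimal sketch of the TI computation under these definitions. The split reconstructions are obtained from the full-model weights without re-fitting (e.g., Y_task = X[:, task_cols] @ beta[task_cols]); taking deviations about each average’s own mean is an assumption, since the reference baseline is not stated explicitly:

```python
import numpy as np

def task_modulation_index(avg_task, avg_move):
    """TI from split trial-average reconstructions.

    avg_task, avg_move: trial averages reconstructed from the task or
    movement regressors alone, using the full-model beta weights.
    TI = 0: fully movement-related; TI = 1: fully task-related.
    """
    d_task = np.mean(np.abs(avg_task - avg_task.mean()))
    d_move = np.mean(np.abs(avg_move - avg_move.mean()))
    return d_task / (d_task + d_move)
```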
Model-based video reconstruction
To better understand how the video related to the neural data, we analyzed the portion of the β-weight matrix that corresponded to the video regressors. This portion of the matrix was projected back up into the original video space. The result was of size p x d, where p is the number of video pixels (153,600) and d is the number of dimensions of the widefield data (200). We performed PCA on this matrix, reducing the number of rows. The top few ‘scores’ (projections onto the principal components) are low-dimensional representations of the widefield maps that were most strongly influenced by the video. To choose the dimensionality, we used the number of dimensions required to account for >90% of the variance (Fig. S2A). To obtain the widefield maps showing how the video was related to neural activity (Fig. S2B), we projected the scores back into widefield data pixel space and sparsened them using the varimax rotation. To determine the influence of each video pixel on the widefield (Fig. S2C), we projected the low-dimensional β-weights into video pixel space, took the magnitude of the β-weights for each pixel, and multiplied by the standard deviation for that pixel.
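A sketch of the final step (Fig. S2C), with hypothetical argument names, following the projection-magnitude-scaling recipe described above:

```python
import numpy as np

def video_pixel_influence(beta_video, U_video, pixel_std):
    """Influence of each video pixel on the widefield data.

    beta_video: (n_video_dims, n_widefield_dims) weights of the video
    regressors; U_video: (n_pixels, n_video_dims) spatial components from
    the behavioral-video SVD; pixel_std: (n_pixels,) standard deviation
    of each video pixel over time.
    """
    B = U_video @ beta_video               # weights in video pixel space
    return np.linalg.norm(B, axis=1) * pixel_std
```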
Acknowledgements
We thank Onyekachi Odoemene, Sashank Pisupati and Hien Nguyen for technical assistance and scientific discussions. Financial support was received from the Swiss National Science Foundation (SM), the Pew Charitable Trusts (AKC) and the Simons Collaboration on the Global Brain (AKC, MTK).