Mega-scale movie-fields in the mouse visuo-hippocampal network

Natural visual experience involves a continuous series of related images while the subject is immobile. How does the cortico-hippocampal circuit process a visual episode? The hippocampus is crucial for episodic memory, but most rodent single unit studies require spatial exploration or active engagement. Hence, we investigated neural responses to a silent movie (Allen Brain Observatory) in head-fixed mice without any task or locomotion demands, or rewards. Surprisingly, a third (33%, 3379/10263) of hippocampal –dentate gyrus, CA3, CA1 and subiculum– neurons showed movie-selectivity, with elevated firing in specific movie sub-segments, termed movie-fields, similar to the vast majority of thalamo-cortical (LGN, V1, AM-PM) neurons (97%, 6554/6785). Movie-tuning remained intact in immobile or spontaneously running mice. Visual neurons had >5 movie-fields per cell, but only ~2 in hippocampus. The movie-field durations in all brain regions spanned an unprecedented 1000-fold range: from 0.02s to 20s, termed mega-scale coding. Yet, the total duration of all the movie-fields of a cell was comparable across neurons and brain regions. The hippocampal responses thus showed greater continuous-sequence encoding than visual areas, as evidenced by fewer and broader movie-fields than in visual areas. Consistently, repeated presentation of the movie images in a fixed, but scrambled sequence virtually abolished hippocampal but not visual-cortical selectivity. The preference for continuous, compared to scrambled sequence was eight-fold greater in hippocampal than visual areas, further supporting episodic-sequence encoding. Movies could thus provide a unified way to probe neural mechanisms of episodic information processing and memory, even in immobile subjects, across brain regions, and species.


Introduction
In addition to the position and orientation of simple visual cues, like Gabor patches and drifting gratings 8 , primary visual cortical responses are also direction selective 9 , and show predictive coding 10 , suggesting that the temporal sequence of visual cues influences neural firing. Accordingly, these and higher visual cortical neurons too encode a sequence of visual images, i.e., a movie [11][12][13][14][15][16][17][18] . The hippocampus is farthest downstream from the retina in the visual circuit. The rodent hippocampal place cells encode spatial or temporal sequences 2,19-26 and episode-like

Movie tuning is not an artifact of behavioral or brain state changes
To confirm these findings, we performed several controls. Running alters neural activity in visual areas [45][46][47][48] and hippocampus [49][50][51] . Hence, we used the data from only the stationary epochs (see Methods) and only from sessions with at least 300 seconds of stationary data (17 sessions, 24906 cells). Movie tuning was unchanged in these data (Figure 1-figure supplement 4). This is unlike place cells, where spatial selectivity is greatly reduced during immobility 5,6 . Neurons recorded simultaneously from the same brain region also showed different selectivity patterns (Figure 1-figure supplement 5). Thus, nonspecific effects such as running cannot explain brain-wide movie selectivity. Prolonged immobility could change the brain state, e.g., through the emergence of sharp-wave ripples. Hence, we removed the data around sharp-wave ripples and confirmed that movie tuning was unaffected (Figure 1-figure supplement 6). Strongly movie-tuned cells were seen in sessions with long bouts of running as well as in sessions with predominantly immobile behavior (Figure 1-figure supplement 7), unlike responses to auditory tones, which were lost during running behavior 51 . Place cell selectivity of hippocampal neurons is influenced by the theta rhythm [52][53][54] . We compared movie selectivity during periods of high theta versus periods of low theta; movie selectivity was significant in both cases (Figure 1-figure supplement 7). To further assess the effect of changes in brain state, we similarly analyzed movie tuning in two equal sub-segments of data, corresponding to epochs with high and low pupil dilation, which is a strong correlate of arousal 55-57 . Movie tuning was above chance levels in both sub-segments (Figure 1-figure supplement 7). Hence, locomotion, arousal or changes in brain state cannot explain the hippocampal movie tuning.

Similarities and differences between place fields and movie fields
Hippocampal neurons have one or two place fields in typical mazes, which take a few seconds to traverse 58 . In larger arenas that take tens of seconds to traverse, the number of peaks per cell and the peak duration increase 59-62 . Peak detection for movie tuning is nontrivial because neurons have nonzero background firing rates, and the elevated rates cover a wide range (Figure 1). We developed a novel algorithm to address this (see Methods). On average, V1 neurons had the largest number of movie-fields (Figure 2a, mean±s.e.m.=10.4±0.1; here we use the mean instead of the median to gain better resolution for the small and discrete values of the number of fields per cell), followed by LGN (8.6±0.3) and AM-PM (6.3±0.07). Hippocampal areas had significantly fewer movie-fields per cell: dentate gyrus (2.1±0.1), CA3 (2.8±0.3), CA1 (2.0±0.02) and subiculum (2.1±0.05). Thus, the number of movie-fields per cell was smaller than the number of place-fields per cell in comparably long spatial tracks 59-64 , but a handful of hippocampal cells had more than 5 movie-fields (Figure 2-figure supplement 1).

Mega-scale structure of movie-fields
Typical receptive field size increases as one moves away from the retina in the visual hierarchy 36 . A similar effect was seen for movie-field durations. On average, hippocampal movie-fields were longer than those in visual regions (Figure 2b). But there were many exceptions: movie-fields of LGN (median±s.e.m., here and subsequently unless stated otherwise, 308.5±33.9 ms) were twice as long as in V1 (156.6±9.2 ms). Movie-fields of subiculum (3169.9±169.8 ms) were significantly longer than those of CA1 (2786.1±77.5 ms) and about three-fold longer than those of the upstream CA3 (979.1±241.1 ms). However, the dentate movie-fields (2113.2±172.4 ms) were two-fold longer than those of the downstream CA3. This is similar to the patterns reported for CA3, CA1 and dentate gyrus place cells 64 . But others have claimed that CA3 place fields are slightly bigger than those of CA1 65 , whereas movie-fields showed the opposite pattern.
The movie-field durations spanned a 500-1000-fold range in every brain region investigated (Figure 2e). This mega-scale range is unprecedentedly large, nearly 2 orders of magnitude greater than previous reports in place cells 59,61 . Even individual neurons showed 100-fold mega-scale responses (Figure 2c & d), compared to less than 10-fold scale within single place cells 59,61 . The mega-scale tuning within a neuron was largest in V1 and smallest in subiculum (Figure 2e). This is partly because the short-duration movie-fields in hippocampal regions were typically neither as narrow nor as prominent as in the visual areas (Figure 2-figure supplement 2).
Despite these differences in mega-scale tuning across different brain areas, the total duration of elevated activity, i.e., the cumulative sum of movie-field durations within a single cell, was remarkably conserved across neurons within and across brain regions (Figure 2f). Unlike movie-field durations, which differed by more than ten-fold between hippocampal and visual regions, cumulative durations were quite comparable, ranging from 6.2 s (V1) to 10.2 s (CA3) (Figure 2f, LGN=8.8±0.21 s, V1=6.2±0.09, AM-PM=7.8±0.09, DG=9.4±0.26, CA3=10.2±0.46, CA1=9.1±0.12, SUB=9.5±0.27). Thus, hippocampal movie-fields are longer and less multi-peaked than those in visual areas, such that the total duration of elevated activity was similar across all areas, spanning about a fourth of the movie, comparable to the fraction of large environments in which place cells are active 61,63,64 . To quantify the net activity in the movie-fields, we computed the total firing in the movie-fields (i.e., the area under the curve for the duration of the movie-fields), normalized by the expected discharge from the shuffled response. Unlike the ten-fold variation of movie-field durations, net movie-field discharge was more comparable (<3x variation) across brain areas, but maximal in V1 and least in subiculum (Figure 2g).
Many movie-fields showed elevated activity spanning up to several seconds, suggesting rate-code-like encoding (Figure 2h). However, some cells showed movie-fields with elevated spiking restricted to less than 50 ms, similar to responses to briefly flashed stimuli in anesthetized cats 12,13,66 . This is suggestive of a temporal code, characterized by low spike timing jitter 67 . Such short-duration movie-fields were common not only in the thalamus (LGN), but also in AM-PM, three synapses away from the retina. A small fraction of cells in the hippocampal areas, more than five synapses away from the retina, showed such temporally coded fields as well (Figure 2h).
To determine the stability and temporal continuity of movie tuning across the neural ensembles, we computed the population vector overlap between even and odd trials 68 (see Methods). Population response stability was significantly greater for tuned than for untuned neurons (Figure 3-figure supplement 1). The population vector overlap around the diagonal was broader in hippocampal regions than in visual cortical regions and LGN, indicating longer temporal continuity, reflective of their longer movie-fields. Further, the population vector overlap away from the diagonal was larger around frames 400-800 in all brain areas due to the longer movie-fields in that movie segment (see below).
Are all movie frames represented equally by all brain areas? The duration and density of movie-fields varied as a function of the movie frame and brain region (Figure 3-figure supplement 2). We hypothesized that this variation could correspond to the change in visual content from one frame to the next. Hence, we quantified the similarity between adjacent movie frames as the correlation coefficient between corresponding pixels and termed it frame-to-frame (F2F) image correlation. For comparison, we also quantified the similarity between the neural responses to adjacent frames (F2F neural correlation) as the correlation coefficient between the firing rate responses of neuronal ensembles to adjacent frames. For all brain regions, the neural F2F was correlated with image F2F, but this correlation was weaker in hippocampal output regions (CA1 and SUB) than in visual regions like LGN and V1. The majority of brain regions had a substantially reduced density of movie-fields between movie frames 400 and 800, but the movie-fields were longer in this segment. This effect, too, was greater in the visual regions than in hippocampal regions. Using significantly tuned neurons, we computed the average neural activity in each brain region at each point in the movie (see Methods). Although movie-fields (Figure 3a), or just the strongest movie-field per cell (Figure 3b), covered the entire movie, the peak-normalized ensemble activity level of all brain regions showed significant overrepresentation, i.e., deviation from uniformity, in certain parts of the movie (Figure 3c, see Methods). This was most pronounced in V1 and the higher visual areas AM-PM. The number of movie frames with elevated ensemble activity was higher in visual cortical areas than in hippocampal regions (Figure 3d), and this modulation (see Methods) was smaller in hippocampus and LGN than in the visual cortical regions (Figure 3e).
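The frame-to-frame (F2F) correlations described above can be illustrated with a minimal sketch. It assumes the movie is available as a NumPy array of shape (frames, height, width) and the ensemble response as an array of shape (cells, frames); the function names `adjacent_correlation`, `f2f_image_correlation` and `f2f_neural_correlation` are hypothetical and introduced only for illustration, and the exact preprocessing used in Methods may differ.

```python
import numpy as np

def adjacent_correlation(x):
    """Pearson correlation between each row of x and the next row.

    x has shape (n_steps, n_features); the result has shape (n_steps - 1,).
    A row with zero variance yields NaN (division by zero).
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    dots = np.sum(centered[:-1] * centered[1:], axis=1)
    return dots / (norms[:-1] * norms[1:])

def f2f_image_correlation(frames):
    """F2F image correlation: correlate corresponding pixels of adjacent frames.

    frames has shape (n_frames, height, width).
    """
    frames = np.asarray(frames)
    return adjacent_correlation(frames.reshape(frames.shape[0], -1))

def f2f_neural_correlation(rates):
    """F2F neural correlation: correlate ensemble responses to adjacent frames.

    rates has shape (n_cells, n_frames); each column is one population vector.
    """
    return adjacent_correlation(np.asarray(rates).T)
```

For a 900-frame movie both functions return 899 values, one per adjacent frame pair, so the two series can be compared directly, for example with `np.corrcoef`.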

Relationship between movie image content and neural movie tuning
Using the significantly tuned neurons, we also computed the average neural activity in each brain region corresponding to each frame in the movie, without peak rate normalization (see Methods). The degree of continuity between the movie frames, quantified as above (F2F image correlation), was inversely correlated with the ensemble rate modulation in all areas except DG, CA3 and CA1 (Figure 3f and g). As expected for a continuous movie, this F2F image correlation was close to unity for most frames, but highest in the latter part of the movie where the images changed more slowly. The population-wide elevated firing rates, as well as the smallest movie-fields, occurred during the earlier parts (Figure 3-figure supplement 2). Thus, the movie-code was stronger in the segments with the greatest change across movie frames, in agreement with recent reports of visual cortical encoding of flow stimuli 69 . These results show differential population representation of the movie across brain regions.

Differential neural encoding of sequential versus scrambled movie in visual and hippocampal areas
If these responses were purely visual, a movie made of a scrambled sequence of images would generate equally strong or even stronger selectivity due to the even larger change across movie frames, despite the absence of similarity between adjacent frames. To explore this possibility, we investigated neural selectivity when the same movie frames were presented in a fixed but scrambled sequence (scrambled movie, Figure 4-Video 1). The within-frame and the total visual content was identical between the continuous and scrambled movies, and the same sequence of images was repeated many times in both experiments (see Methods). But there was no correlation between adjacent frames, i.e., visual continuity, in the latter (Figure 4a).
For all brain regions investigated, the continuous movie generated significantly greater modulation of neural activity than the scrambled sequence (Figure 4b). The middle 20 trials of the continuous movie were chosen as the appropriate subset for comparison since they were chronologically closest to the scrambled movie presentation. This choice ensured that other long-term effects, such as behavioral state change, instability of single unit measurement and representational 70 or behavioral 71 drift, could not account for the differences in neural responses to continuous and scrambled movie presentation. This preference for the continuous over the scrambled movie was greatest in hippocampal regions, where the percentage of significantly tuned neurons (4.4%, near the chance level of 2.3%) was reduced more than 4-fold compared to the continuous movie (17.8%, after accounting for the smaller number of trials, see Methods). This was unlike visual areas, where the scrambled (80.4%) and the continuous movie (92.4%) generated similar prevalence levels of selectivity (Figure 4b). The few hippocampal cells which had significant selectivity to the scrambled sequence did not have long-duration responses, but only very short, ~50 ms long responses (Figure 4d), reminiscent of, but even sharper than, human hippocampal responses to flashed images 33 . To estimate the effect of the continuous movie compared to the scrambled sequence on individual cells, we computed the normalized difference between the continuous and scrambled movie selectivity for cells which were selective in either condition (Figure 4c, see Methods). This visual continuity index was more than eight-fold higher in hippocampal areas (median across all 4 hippocampal regions = 87.8%) than in the visual areas (median = 10.6% across visual regions).
The pattern of increasing visual continuity index as we moved up the visual hierarchy largely paralleled the anatomic organization 72 , with the greatest sensitivity to visual continuity in the hippocampal output regions, CA1 and subiculum, but there were notable exceptions. The primary visual cortical neurons showed the least reduction in selectivity due to the loss of temporally contiguous content, whereas LGN neurons, the primary source of input to the visual cortex and closer to the periphery, showed far greater sensitivity (Figure 4c).
Many visual cortical neurons were significantly modulated by the scrambled sequence, but their number of movie-fields per cell was greater and their duration was shorter than during the continuous movie (Figure 4-figure supplement 1&2). This could occur due to the loss of frame-to-frame correlation in the scrambled sequence. The average activity of the neural population in V1 and AM-PM showed significant deviation even with the scrambled movie, comparable to the continuous movie, but this multi-unit ensemble response was uncorrelated with the frame-to-frame correlation in the scrambled sequence.
The population vector of an ensemble of a few hundred place cells is sufficient to decode the rat's position 73 , and the position of a passively moving object 40 . Using similar methods, we decoded the movie frame number (see Methods). Continuous movie decoding was better than chance in all brain regions analyzed (Figure 4f). Upon accounting for the number of tuned neurons from different brain regions, the decoding was most accurate in V1 and least accurate in dentate gyrus. Scrambled movie decoding was significantly weaker, yet above chance level (based on shuffles, see Methods), in visual areas, but not in CA3 and dentate gyrus. But CA1 and subiculum neuronal ensembles could be used to decode the scrambled movie frame number slightly above chance levels (Figure 4g). Similarly, the population overlap between even and odd trials for the scrambled sequence was strong for visual areas, and weaker in hippocampal regions, but significantly greater than for untuned neurons in hippocampal regions (Figure 4-figure supplement 6). Combined with the handful of neurons in hippocampus whose movie selectivity persisted in the scrambled presentation, this suggests that the loss of correlations between adjacent frames in the scrambled sequence abolishes most, but not all, of the hippocampal selectivity to visual sequences.
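As an illustration of the population vector decoding mentioned above, the following is a minimal sketch under stated assumptions: per-frame firing rates are taken to be precomputed for a template set and a held-out set of trials (for instance, even and odd trials), each as an array of shape (cells, frames), and each held-out population vector is assigned to the template frame with which it has the highest Pearson correlation. The function name `decode_frames` and the choice of match score are hypothetical; the exact procedure and the shuffle-based chance level are described in Methods and are not reproduced here.

```python
import numpy as np

def decode_frames(template, test):
    """Decode the frame index of each held-out population vector.

    template, test: arrays of shape (n_cells, n_frames) holding firing rates.
    Returns an integer array of length n_frames (of `test`) with the index of
    the best-matching template frame for every held-out frame.
    """
    # Center and normalize each population vector (column), so that the
    # matrix product below gives Pearson correlations between vectors.
    t = template - template.mean(axis=0, keepdims=True)
    q = test - test.mean(axis=0, keepdims=True)
    t = t / np.linalg.norm(t, axis=0, keepdims=True)
    q = q / np.linalg.norm(q, axis=0, keepdims=True)
    corr = q.T @ t  # shape: (n_test_frames, n_template_frames)
    return corr.argmax(axis=1)

def mean_decoding_error(decoded):
    """Mean absolute error, in frames, relative to the true frame order."""
    true_frames = np.arange(decoded.size)
    return np.mean(np.abs(decoded - true_frames))
```

One way to use the sketch is to call `decode_frames(even_rates, odd_rates)` and compare `mean_decoding_error` against the same quantity obtained after shuffling the data, which gives an empirical chance level.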

Discussion
Movie tuning in the visual areas. To understand how neurons encode a continuously unfolding visual episode, we investigated the neural responses in the head-fixed mouse brain to an isoluminant, black-and-white, silent human movie, without any task demands or rewards. As expected, neural activity showed significant modulation in all thalamo-cortical visual areas, with elevated activity in response to specific parts of the movie, termed movie-fields. Most thalamo-cortical neurons (96.6%, 6554/6785) showed significant movie tuning. This is nearly double that reported for classic stimuli such as Gabor patches in the same dataset 36 , although a direct comparison is difficult due to the differences in experimental and analysis methods. For example, the classic stimuli were presented for 250 ms, preceded by a blank background, whereas the images changed every 33 ms in the movie. On the other hand, significant tuning of the vast majority of visual neurons to movies is consistent with other reports 11-13,15,17,66,69-71 . Thus, movies are a reliable method to probe the function of the visual brain and its role in cognition.

Movie tuning in hippocampal areas.
Remarkably, a third of hippocampal neurons (32.9%, 3379/10263) were also movie-tuned, comparable to the fraction of neurons with significant spatial selectivity in mice 74 and bats 75 , and far greater than the fraction of significant place cells in the primate hippocampus [76][77][78] . While the hippocampus is implicated in episodic memory, rodent hippocampal responses are largely studied in the context of spatial maps or place cells, and more recently in other tasks which require active locomotion or active engagement 7,79 . However, unlike place cells 5,6 , movie-tuning remained intact during immobility in all brain areas studied, which could be because self-motion causes consistent changes in multisensory cues during spatial exploration but not during movie presentation. This dissociation of the effect of mobility on spatial and movie selectivity agrees with the recent reports of dissociated mechanisms of episodic encoding and spatial navigation in human amnesia 80 . Our results are broadly consistent with prior studies that found movie selectivity in human hippocampal single neurons 81 . However, that study relied on famous, very familiar movie clips, similar to the highly familiar image selectivity 33 , to probe episodic memory recall. In contrast, mice in our study had seen this black-and-white human movie clip only in two prior habituation sessions, and it is very unlikely that they understood the episodic content of the movie. Recent studies found human hippocampal activation in response to abrupt changes between different movie clips 34,82,83 , which is broadly consistent with our findings. Future studies can investigate the nature of hippocampal activation in mice in response to familiar movies to probe episodic memory and recall. These observations support the hypothesis that specific visual cues can create reliable representations in all parts of the hippocampus in rodents 5,37,40 , nonhuman primates 76,78 and humans 84,85 , unlike spatial selectivity, which requires consistent information from multisensory cues 28,38,86 .
Mega-scale nature of movie-fields. Across all brain regions, neurons showed mega-scale encoding, with movie-fields varying in duration by up to 1000-fold, similar to, but far greater than, recent reports of 10-fold multi-scale responses in the hippocampus 59-64, 87 . While neural selectivity to movies has been studied in visual areas, such mega-scale coding has not been reported. Remarkably, mega-scale movie coding was found not only across the population: even individual LGN and V1 neurons could show two different movie-fields, one lasting less than 100 ms and the other exceeding 10,000 ms. The speed at which visual content changed across movie frames could explain a part, but not all, of this effect. The mechanisms governing the mega-scale encoding would require additional studies. For example, the average duration of the movie-field increased along the feed-forward hierarchy, consistent with the hierarchy of response lags during language processing 88 . Paradoxically, the mega-scale coding of movie-fields meant that the opposite pattern also existed, with 10 s long movie-fields in some LGN cells and less than 100 ms long movie-fields in subiculum.
Continuous versus scrambled movie responses. The analysis of the scrambled movie sequence allowed us to compute the neural response latency to movie frames. This was highest in AM-PM (91 ms), intermediate in V1 (74 ms) and lowest in LGN (60 ms), thus following the visual hierarchy. The pattern of movie tuning properties was also broadly consistent between V1 and AM-PM (Fig 2). However, several aspects of movie-tuning did not follow the feed-forward anatomical hierarchy. For example, all metrics of movie selectivity (Fig 2) to the continuous movie showed a consistent pattern that was inconsistent with the feed-forward anatomical hierarchy: V1 had stronger movie tuning, a higher number of movie-fields per cell, narrower movie-field widths, larger mega-scale structure, and better decoding than LGN. V1 was also more robust to the scrambled sequence than LGN. One possible explanation is that there are other sources of inputs to V1, beyond LGN, that contribute significantly to movie tuning 89 . Amongst the hippocampal regions, the tuning properties of CA3 neurons (field durations, mega-chronicity index, visual continuity index and several measures of population modulation) were closest to those of visual regions, even though the prevalence of tuning in CA3 was lower than that in other hippocampal as well as visual areas.
Emergence of episode-like movie code in hippocampus. Temporal integration window 90-92 as well as intrinsic timescale of firing 36 increase along the anatomical hierarchy in the cortex, with the hippocampus being farthest removed from the retina 72 . This hierarchical anatomical organization, with visual areas being upstream of hippocampus, could explain the longer movie-fields, the strength of tuning, number of movie peaks, their width and decoding accuracy in hippocampal regions. This could also explain the several-fold greater preference for the continuous movie over the scrambled sequence in the hippocampus compared to the upstream visual areas. But, unlike reports of image-association memory in the inferior temporal cortex for unrelated images 93,94 , only a handful of hippocampal neurons showed selective responses to the scrambled sequence. These results, along with the longer duration of hippocampal movie-fields, could mediate visual chunking or binding of a sequence of events. In fact, evidence for episodic-like chunking of visual information was found in all visual areas as well, where the scrambled sequence not only reduced neural selectivity but caused fragmentation of movie-fields.
No evidence of nonspecific effects. Could the brain-wide mega-scale tuning be an artifact of poor unit isolation, e.g., due to an erroneous mixing of two neurons, one with very short and another with very long movie-fields? This is unlikely since the LGN and visual cortical neural selectivity to classic stimuli (Gabor patches, drifting gratings etc.) in the same dataset was similar to that reported in most studies 36 , whereas poor unit isolation should reduce these selective responses. However, to directly test this possibility, we calculated the correlation between the unit isolation index (or fraction of refractory violations) and the mega-scale index of the cell, while factoring out the contribution of mean firing rate (Figure 1-figure supplement 8). This correlation was not significant (p>0.05) for any brain area.
Movie-fields vs. place-fields. Do the movie-fields arise from the same mechanism as place fields? Studies have shown that when rodents are passively moved along a linear track that they had explored 6 , or when the images of the environment around a linear track were played back to them 5 , some hippocampal neurons generated spatially selective activity. Since the movie clip involved a change of spatial view, one could hypothesize that the movie-fields are just place fields generated by passive viewing. This is unlikely for several reasons. Mega-scale movie-fields were found in the vast majority of neurons in all visual areas including LGN, a far greater fraction than that of spatially modulated neurons in the visual cortex during virtual navigation 95,96 . Further, in prior passive viewing experiments, the rodents were shown the same narrow linear track, like a tunnel, that they had previously explored actively to get food rewards at specific places. In contrast, in the current experiments, these mice had never actively explored the space shown in the movie, nor obtained any rewards. Active exploration of a maze, combined with spatially localized rewards, engages multisensory mechanisms resulting in increased place cell activation 22,28,97 , which are entirely missing in these experiments during passive viewing of a movie, presented monocularly, without any other multisensory stimuli and without any rewards. Compared to spontaneous activity, about half of CA1 and CA3 neurons shut down during spatial exploration, and this shutdown is even greater in the dentate gyrus. Further, compared to the exploration of a real-world maze, exploration of a visually identical virtual world causes a 60% reduction in CA1 place cell activation 86 . In contrast, there was no evidence of neural shutdown during the movie presentation compared to grey screen spontaneous epochs (Figure 1-figure supplement 8). Similarly, the number of place fields (in CA1) per cell on a long track is positively correlated with the mean hippocampal neurons could not be rearranged to obtain the continuous movie response. This shows the importance of continuous, episodic content instead of mere sequential recurrence of unrelated content for rodent hippocampal activation. We hypothesize that, similar to place cells, movie-field responses without task demand would play a role, to be determined, in episodic memory. Further work involving a behavioral report for the episodic content can potentially differentiate between the sequence coding described here and the contribution of episodically meaningful content. However, the nature of movie selectivity tested so far in humans was different (recall of famous, short movie clips 81 , or at event boundaries 34 ) than in rodents here (human movie, selectivity to specific movie segments).

Broader outlook.
Our findings open up the possibility of studying thalamic, cortical, and hippocampal brain regions in a simple, passive, and purely visual experimental paradigm and extend comparable convolutional neural networks 11 to have the hippocampus at the apex 72 . Further, our results here bridge the long-standing gap between the hippocampal rodent and human studies 34,99-101 , where natural movies can be decoded from fMRI signals in immobile humans 102 . This brain-wide mega-scale encoding of a human movie episode and enhanced preference for visual continuity in the hippocampus compared to visual areas support the hypothesis that the rodent hippocampus is involved in non-spatial episodic memories, consistent with classic findings in humans 1 and in agreement with a more generalized, representational framework 103,104 of episodic memory where it encodes temporal patterns. Similar responses are likely across different species, including primates. Thus, movie-coding can provide a unified platform to investigate the neural mechanisms of episodic coding, learning and memory.

Methods:
Experiments: We used the Allen Brain Observatory - Neuropixels Visual Coding dataset (© 2019 Allen Institute, https://portal.brain-map.org/explore/circuits/visual-coding-neuropixels). This website and the related publication 36 contain the detailed experimental protocol, neural recording techniques, spike sorting etc. Data from 24 mice (16 males, n=13 C57BL/6J wild-type, n=2 Pvalb-IRES-Cre×Ai32, n=6 Sst-IRES-Cre×Ai32, and n=3 Vip-IRES-Cre×Ai32) from the "Functional connectivity" dataset were analyzed herein. Prior to implantation with Neuropixels probes, mice passively viewed the entire range of images including drifting gratings, Gabor patches and the movies of interest here. Videos of the body and eye movements were obtained at 30 Hz and synced to the neural data and stimulus presentation using a photodiode. Movies were presented monocularly on an LCD monitor with a refresh rate of 60 Hz, positioned 15 cm away from the mouse's right eye, and spanned 120° × 95°. 30 trials of the continuous movie presentation were followed by 10 trials of the scrambled movie. Next was a presentation of drifting gratings, followed by a quiet period of 30 minutes during which the screen was blank. Then the second block of drifting gratings, scrambled movie and continuous movie was presented. After surgery, all mice were single-housed and maintained on a reverse 12-h light cycle in a shared facility with room temperatures between 20 and 22 °C and humidity between 30 and 70%. All experiments were performed during the dark cycle.
Neural spiking data were sampled at 30 kHz with a 500 Hz high-pass filter. Spike sorting was automated using Kilosort2 105 . The output of Kilosort2 was post-processed to remove noise units, characterized by unphysiological waveforms. Neuropixels probes were registered to a common coordinate framework 106 . Each recorded unit was assigned to a recording channel corresponding to the maximum spike amplitude and then to the corresponding brain region. Broad-spiking units, identified as those with an average spike waveform duration (peak to trough) between 0.45 and 1.5 ms, and with mean firing rates above 0.5 Hz were analyzed throughout, except in Figure 1-figure supplement 8.

Movie tuning quantification
The movie consisted of 900 frames: 30s total, 30Hz refresh rate, 33.3ms per frame. At the first level of analysis, spike data were split into 900 bins, each 33.3ms wide (the bin size was later varied systematically to detect mega-scale tuning, see below). The resulting tuning curves were smoothed with a Gaussian window of σ=66.6ms, or 2 frames. The degree of modulation and its significance were estimated by the sparsity s as below, and as previously described 40,86.
$$s = 1 - \frac{1}{N}\,\frac{\left(\sum_{n=1}^{N} r_n\right)^2}{\sum_{n=1}^{N} r_n^2}$$
where $r_n$ is the firing rate in the n-th frame or bin and N=900 is the total number of bins. This is equivalent to "lifetime sparseness", used previously 11,14, except for the normalization factor of (1-1/N), which is close to unity when N is as large as 900, as in the case of movies. Statistical significance of sparsity was computed using a bootstrapping procedure, which does not assume a normal distribution. Briefly, for each cell, the spike train as a function of the frame number from each trial was circularly shifted by different amounts and the sparsity of the randomized data was computed. This procedure was repeated 100 times with different amounts of random shifts. The mean value and standard deviation of the sparsity of the randomized data were used to compute the z-scored sparsity of the observed data using the function zscore in MATLAB. The observed sparsity was considered statistically significant if the z-scored sparsity of the observed spike train was greater than 2, which corresponds to p<0.023 in a one-tailed t-test. A similar method was used to quantify the significance of the scrambled movie tuning, as well as for the subset of data with only stationary epochs, or its equivalent subsample (see below). The middle 20 trials of the continuous movie were used in comparisons with the scrambled movie in Figure 4, to ensure a fair comparison by using the same number of trials, with similar time delays across measurements.
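As an illustration of the sparsity measure and its shuffle-based z-score, the following is a minimal Python sketch, not the MATLAB code used for the analysis. The function names, the list-of-lists layout of the trials, and the fixed random seed are assumptions made for this example, and the Gaussian smoothing step is omitted for brevity.

```python
import random
import statistics

def sparsity(rates):
    """s = 1 - (sum r)^2 / (N * sum r^2): 0 for a flat curve, 1 - 1/N for a single active bin."""
    n = len(rates)
    total = sum(rates)
    total_sq = sum(r * r for r in rates)
    if total_sq == 0:
        return 0.0  # silent cell: treated as untuned (assumed convention)
    return 1.0 - (total * total) / (n * total_sq)

def circular_shift(trial, shift):
    """Rotate one trial's binned response by `shift` bins, wrapping around the end."""
    shift %= len(trial)
    return trial[-shift:] + trial[:-shift] if shift else list(trial)

def mean_tuning(trials):
    """Average the binned responses across trials, bin by bin."""
    return [sum(column) / len(trials) for column in zip(*trials)]

def zscored_sparsity(trials, n_shuffles=100, seed=0):
    """Z-score the observed sparsity against trial-wise circular shuffles."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    observed = sparsity(mean_tuning(trials))
    null = []
    for _ in range(n_shuffles):
        shifted = [circular_shift(t, rng.randrange(len(t))) for t in trials]
        null.append(sparsity(mean_tuning(shifted)))
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

Under the criterion in the text, a cell would count as movie-tuned when `zscored_sparsity(trials) > 2`.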
In addition to sparsity, we quantified movie tuning using two other measures, depth of modulation and mutual information. The statistical significance of these alternative measures of selectivity was computed similarly to that for sparsity and is detailed in Figure 1-figure supplement 3.

Stationary epoch and sharp wave ripple free epoch identification
To eliminate the confounding effects of changes in behavioral state associated with running, we repeated our analysis in stationary epochs, defined as epochs when the running speed remained below 2cm/sec for the duration of the epoch, as well as for at least 5 seconds before and after it. Analysis was further restricted to sessions with at least 5 total minutes of these epochs during the 60 trials of continuous movie presentation. To account for the smaller amount of data in the stationary epochs, we computed the tuning in a random subsample of equal duration, drawn regardless of running or stopping, and compared the two results for differences in selectivity.
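The stationary-epoch criterion can be sketched as follows. This is a hedged illustration only: the function name `stationary_mask`, the assumption of a uniformly sampled running-speed trace, and the treatment of samples near the start and end of the recording (only the available samples are checked) are choices made for this example.

```python
def stationary_mask(speed, fs, threshold=2.0, pad_s=5.0):
    """Mark samples where speed stays below `threshold` (cm/s) from pad_s before to pad_s after.

    speed: running speed samples in cm/s; fs: sampling rate in Hz.
    """
    pad = int(round(pad_s * fs))
    slow = [v < threshold for v in speed]
    n = len(slow)
    mask = []
    for i in range(n):
        lo, hi = max(0, i - pad), min(n, i + pad + 1)
        mask.append(all(slow[lo:hi]))  # every sample in the padded window must be slow
    return mask
```

The total stationary time of a session is then `sum(mask) / fs` seconds, which can be compared with the 5-minute inclusion threshold.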
Similarly, to remove epochs of sharp wave ripples (SWR), we first computed the band-passed power at the hippocampal (CA1) recording sites in the 150-250Hz range. An SWR occurrence was noted if the power at any of the best 5 sites in CA1 (those with the highest theta (5-12Hz) to delta (1-4Hz) ratio), or the median SWR power across all CA1 sites, exceeded 3 standard deviations of the respective power. To remove SWRs, we removed frames within ±0.5 seconds of the SWR occurrence and recomputed movie tuning in the remaining data. As in the stationary epoch calculation above, we compared tuning to an equivalent random subset to account for the loss of data.

Pupil dilation and theta power comparisons
To assess the contribution of arousal state to movie tuning, we recalculated z-scored sparsity in epochs with high vs. low pupil dilation. The pupil was tracked at a 30Hz sampling rate, and the height and width of the elliptical fit, as provided in the publicly available dataset, were used. For each session, the pupil area thus calculated was used to split the data into two equal halves, above and below the 50th percentile. The resultant z-scored sparsity is reported in Figure 1-figure supplement 7.

Similarly, the theta power, computed from the band-passed local field potential signal in the 5-12Hz range, was used to split the data into 2 equal sub-segments. The CA1 channel with the highest average theta to delta (1-4Hz) power ratio was selected for these calculations. Movie tuning in the data with high and low theta power thus separated is reported in Figure 1-figure supplement 7.

Mega-scale movie-field detection in tuned neurons
For neurons with significant movie-sparsity, i.e., movie-tuned neurons, the movie response was first recalculated at a higher resolution of 3.33ms (10 times finer than the 33.3ms frame duration). The findpeaks function in MATLAB was used to obtain peaks with prominence larger than 110% (1.1x) of the range of firing variation obtained by chance, as determined from a sample shuffled response. This calculation was repeated at different smoothing values (10 logarithmically spaced Gaussian smoothing widths, with σ ranging from 6.7ms to 3430ms), to ensure that long as well as short movie-fields were reliably detected and treated equally. For frames where overlapping peaks were found at different smoothing levels, we employed a comparative algorithm to select only the peak(s) with the higher prominence score. This score was obtained as the ratio of the peak's prominence to the range of fluctuations in the correspondingly smoothed shuffle. This procedure was conducted iteratively, in increasing order of smoothing. If a broad peak overlapped with multiple narrow ones, the sum of the scores of the narrow ones was compared with that of the broad one. To ensure that peaks at the beginning as well as the end of the movie were reliably detected, we circularly wrapped the movie response, for the observed as well as the shuffled data.
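The smoothing ladder and the circular wrapping can be illustrated with a short Python sketch. It covers only these two ingredients; the peak detection and the comparison of prominence scores across smoothing levels are not reproduced. The function names and the kernel truncation at 4σ are assumptions of this example. With 10 logarithmically spaced values between 6.7ms and 3430ms, consecutive widths differ by a factor very close to 2.

```python
import math

def log_spaced_sigmas(sigma_min_ms=6.7, sigma_max_ms=3430.0, n=10):
    """Return n logarithmically spaced smoothing widths between the two limits."""
    ratio = (sigma_max_ms / sigma_min_ms) ** (1.0 / (n - 1))
    return [sigma_min_ms * ratio ** k for k in range(n)]

def circular_gaussian_smooth(values, sigma_bins):
    """Gaussian smoothing with wrap-around, so the first and last bins are neighbours."""
    n = len(values)
    half = int(math.ceil(4 * sigma_bins))  # kernel truncated at 4 sigma (assumed)
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    norm = sum(weights)
    return [
        sum(w * values[(i + k) % n] for w, k in zip(weights, offsets)) / norm
        for i in range(n)
    ]
```

For a response binned at 3.33ms, each width would be converted to bins as `sigma_ms / 3.33` before smoothing.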
Identifying frames with significant deviations in multiple single-unit activity (MSUA)
First, the average response across tuned neurons for each brain region was computed for each movie frame, after normalizing the response of each cell by its peak firing response. This average response was used as the observed "Multiple single unit activity (MSUA)" in Figure 3. To compute the chance level, individual neuron responses were circularly shifted with respect to the movie frames to break the frame to firing rate association but maintain the overall firing rate modulation. 100 such shuffles were used, and for each shuffle, the shuffled MSUA response was computed by averaging across neurons. Across these 100 shuffles, the mean and standard deviation were obtained for all frames, and used to compute the z-score of the observed MSUA. To obtain significance at the p=0.025 level, Bonferroni correction was applied, and the appropriate z-score level (4.04) was chosen. The number of frames in the observed MSUA above (and below) this level was further quantified in Figure 3. The firing deviation for these frames was computed as the ratio between the mean observed MSUA and the mean shuffled MSUA, reported as a percentage, for frames corresponding to a z-score greater than +4 or less than -4. To obtain a total firing rate report, where each spike gets an equal vote, we computed the total firing response by summing the rate across all tuned neurons (and dividing by the number of neurons) in Figure 3 and across all neurons in Figure 3-figure supplement 2.
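A minimal Python sketch of the MSUA computation and its per-frame z-score is given below. The function names are hypothetical. The threshold function assumes that the Bonferroni correction divides p=0.025 by the 900 movie frames and that a one-tailed normal quantile is used; under those assumptions the threshold comes out slightly above 4, in line with the value quoted in the text.

```python
from statistics import NormalDist, mean, stdev

def msua(responses):
    """Average of peak-normalized tuning curves; responses[n][f] is neuron n at frame f."""
    normalised = []
    for cell in responses:
        peak = max(cell)
        if peak > 0:  # silent cells are skipped (assumed convention)
            normalised.append([r / peak for r in cell])
    return [sum(column) / len(normalised) for column in zip(*normalised)]

def frame_zscores(observed, shuffles):
    """Z-score each frame of the observed MSUA against the shuffled MSUAs."""
    scores = []
    for f, value in enumerate(observed):
        null = [shuffle[f] for shuffle in shuffles]
        scores.append((value - mean(null)) / stdev(null))
    return scores

def bonferroni_z(alpha=0.025, n_tests=900):
    """One-tailed normal z threshold after dividing alpha by the number of frames."""
    return NormalDist().inv_cdf(1.0 - alpha / n_tests)
```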

Population Vector Overlap
To evaluate the properties of a population of cells, movie presentations were divided into alternate trials, yielding even and odd blocks 68. Population vector overlap was computed between the movie responses calculated separately for these 2 blocks of trials. The population vector overlap between frame x of the even trials and frame y of the odd trials was defined as the Pearson correlation coefficient between the vectors $(R_{1,x}, R_{2,x}, \ldots, R_{N,x})$ and $(R_{1,y}, R_{2,y}, \ldots, R_{N,y})$, where $R_{n,x}$ is the mean firing rate response of the n-th neuron to the x-th movie frame and N is the total number of neurons used for each brain region. This calculation was done for x and y ranging from 1 to 900, corresponding to the 900 movie frames. The same method was used for tuned and untuned neurons in continuous movie responses in Figure 3-figure supplement 1, and for scrambled sequence responses in Figure 4-figure supplement 6.
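The definition above amounts to a frame-by-frame matrix of Pearson correlations between population vectors. A minimal Python sketch is shown below; the function names and the convention of returning 0.0 for a constant vector are assumptions of this example.

```python
import math

def pearson(u, v):
    """Pearson correlation; returns 0.0 when either vector is constant (assumed convention)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den > 0 else 0.0

def population_vector_overlap(even, odd):
    """overlap[x][y]: correlation of the even-trial vector at frame x with the odd-trial vector at frame y.

    even[n][x] and odd[n][y] hold the mean rate of neuron n at a given frame.
    """
    even_vectors = list(zip(*even))  # one N-neuron vector per frame
    odd_vectors = list(zip(*odd))
    return [[pearson(ev, ov) for ov in odd_vectors] for ev in even_vectors]
```

Stable tuning shows up as values near 1 along the diagonal of the returned matrix.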
Decoding analysis
Methods similar to those previously described were used 40,73. For tuned cells, the 60 trials of the continuous movie were each decoded using all other trials. Mean firing rate responses in the other 59 trials for the 900 frames were used to compute a "look-up" matrix. Each neuron's response was normalized between 0 and 1. At each frame in the "observed" trial, the correlation coefficient was computed between the population vector response in this trial and the look-up matrix. The frame corresponding to the maximal correlation was denoted as the decoded frame. Decoding error was computed as the average of the absolute difference between the actual and decoded frames, across the 900 frames of the movie. For comparison, shuffle data were generated by randomly shuffling the cell-cell pairing of the look-up matrix and the "observed response". To enable a fair comparison of decoding accuracy across brain regions, the tuned cells from each brain region were subsampled, and a random selection of 150 cells was used. A similar procedure was used for the 20 trials of the scrambled sequence, and the corresponding middle 20 trials of the continuous movie were used here for comparison.
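The leave-one-trial-out decoding can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the function names are hypothetical, the normalization between 0 and 1 is taken to be min-max scaling of each neuron's tuning curve, and the shuffle control and the subsampling to 150 cells are omitted.

```python
import math

def pearson(u, v):
    """Pearson correlation; returns 0.0 when either vector is constant (assumed convention)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den > 0 else 0.0

def minmax(curve):
    """Scale one neuron's tuning curve to the range 0..1 (assumed form of the normalization)."""
    lo, hi = min(curve), max(curve)
    return [(r - lo) / (hi - lo) if hi > lo else 0.0 for r in curve]

def lookup_matrix(trials, held_out):
    """Average every trial except `held_out`; trials[t][n][f] is a firing rate."""
    rest = [t for i, t in enumerate(trials) if i != held_out]
    n_neurons, n_frames = len(rest[0]), len(rest[0][0])
    return [
        minmax([sum(t[n][f] for t in rest) / len(rest) for f in range(n_frames)])
        for n in range(n_neurons)
    ]

def decode(trials, held_out):
    """Return the decoded frame for each actual frame of the held-out trial."""
    reference = list(zip(*lookup_matrix(trials, held_out)))  # one population vector per frame
    observed = zip(*[minmax(curve) for curve in trials[held_out]])
    decoded = []
    for vector in observed:
        scores = [pearson(vector, ref) for ref in reference]
        decoded.append(scores.index(max(scores)))  # frame with the maximal correlation
    return decoded

def decoding_error(decoded):
    """Mean absolute difference between actual and decoded frame indices."""
    return sum(abs(d - f) for f, d in enumerate(decoded)) / len(decoded)
```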

Rearranged scrambled movie analysis
To differentiate the effects of visual content versus visual continuity between consecutive frames, we compared the responses of the same neuron to the continuous movie and the scrambled sequence. In the scrambled movie, the same visual frames as the continuous movie were used, but they were shuffled in a pseudo-random fashion. The same scrambled sequence was repeated for 20 trials. The neural response was first computed at each frame of the scrambled sequence, keeping the frames in the chronological order of presentation. Then the scrambled sequence of frames was rearranged to recreate the continuous movie and the corresponding neural responses were computed. To address the latency between movie frame presentation and its evoked neural response, which can differ across brain regions and neurons, this calculation was repeated for rearranged scrambled sequences with variable delays from τ = -500 to +500ms (i.e., -150 to +150 bins of 3.33ms resolution, in steps of 5 bins or 16.6ms). The correlation coefficient was computed between the continuous movie response and this variable delayed response at each delay as $r_{\text{measured}}(\tau) = \mathrm{corrcoef}(R_{\text{continuous}}, R_{\text{scramble-rearranged}}(\tau))$. $R_{\text{continuous}}$ is the continuous movie response, obtained at 3.33ms resolution, and similarly, $R_{\text{scramble-rearranged}}(\tau)$ is the scrambled response after rearrangement at the latency τ. The latency τ yielding the largest correlation between the continuous and rearranged scrambled movie was designated as the putative response latency for that neuron. This was used in Figure 4-figure supplement 4. The value of $r_{\text{measured}}(\tau_{\max})$ was bootstrapped using 100 randomly generated frame reassignments, and this was used to z-score $r_{\text{measured}}(\tau_{\max})$, with z-score > 2 as the criterion for significance. The resultant z-score is reported in Figure 4-figure supplement 4.
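The rearrangement and the latency scan can be sketched as follows. This Python example is a simplification under stated assumptions: it works at one bin per frame rather than at 3.33ms resolution, it wraps the shifted response circularly, and the names `rearrange`, `best_latency` and `order` (the continuous-movie frame index shown at each scrambled position) are hypothetical. The bootstrap of the maximal correlation is omitted.

```python
import math

def pearson(u, v):
    """Pearson correlation; returns 0.0 when either vector is constant (assumed convention)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den > 0 else 0.0

def rearrange(scrambled_response, order, lag=0):
    """Put the response measured `lag` bins after each scrambled position back at its movie frame."""
    n = len(order)
    rearranged = [0.0] * n
    for position, frame in enumerate(order):
        rearranged[frame] = scrambled_response[(position + lag) % n]
    return rearranged

def best_latency(continuous_response, scrambled_response, order, lags):
    """Return the lag with the largest correlation to the continuous response, and that correlation."""
    scored = [
        (pearson(continuous_response, rearrange(scrambled_response, order, lag)), lag)
        for lag in lags
    ]
    r_max, lag_max = max(scored)
    return lag_max, r_max
```

A neuron whose scrambled response is fully explained by frame content would yield a correlation near 1 at its response latency, as in the V1 example of Figure 4e.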

The latency τ was rounded off for use with 33ms bins and used to rearrange the actual as well as the shuffled data to compute the strength of tuning for the scrambled presentation. Z-scored sparsity was computed as described above. This was compared with the z-scored sparsity of the continuous movie as well as of the scrambled movie data without the rearrangement, and is shown in Figure 4-figure supplement 5.

comparable across brain regions. The largest cumulative duration (10.2±0.46s, CA3) was only 1.66x of the smallest (6.2±0.09s, V1). The cumulative duration distributions of visual-hippocampal and visual-visual brain region pairs were significantly different (KS-test p<0.001), but those of hippocampal pairs were not (p>0.07).
(g) Distribution of the firing within fields, normalized by that in the shuffle response. All fields from all tuned neurons in a brain region were used. Firing in movie-fields was significantly different across all brain region pairs (KS-test, p<1.0×10⁻⁷), except DG-CA3. Movie-field firing was largest in V1 (2.9±0.03) and smallest in subiculum (1.14±0.03). (h) Snippets of movie-fields from representative tuned cells, from LGN showing a long movie-field (233 frames, or 7.8s, panel 1), and from AM-PM and from hippocampus showing short fields (2 frames or 66.6ms wide or less).

Figure 3 | Population averaged movie-tuning varies across brain areas. (a) Stack plot of all the movie-fields detected from all tuned neurons of a brain region. Color indicates relative firing rate, normalized by the maximum firing rate in that movie-field. The movie-fields were sorted according to the frame with the maximal response. Note the accumulation of fields in certain parts of the movie, especially in subiculum and AM-PM. (b) Similar to (a), but using only a single, tallest movie-field peak from each neuron, showing a similar pattern, with pronounced overrepresentation of some portions of the movie in most brain areas. Each neuron's response was normalized by its maximum firing rate. The average firing rate of non-peak frames, which was inversely related to the depth of modulation, was smallest (0.35x of the average peak response across all neurons) for V1, followed by AM-PM (0.37x), leading to blue shades.

greater or equal to that below, for all brain regions, with the largest positive deviation in AM-PM (9.3%), largest negative deviation in V1 (6.0%), and least in CA3 (zero each). (f) Total firing rate response of visual regions across tuned neurons. All regions had a significant negative correlation (r<-0.39, p<3.4×10⁻³⁴) between the ensemble response and the frame-to-frame (F2F) image correlation (gray line, y-axis on the left) across movie frames. (g) Similar to (f), for hippocampal regions. CA3 responses were not significantly correlated with the frame-to-frame correlation, dentate gyrus (r=0.26, p=4.0×10⁻¹⁵) and CA1 (r=0.21, p=1.5×10⁻¹⁰) responses were positively correlated, and the subiculum response was negatively correlated (r=-0.44, p=2.2×10⁻⁴³). Note the substantially higher mean firing rates of LGN in (f) and subiculum neurons in (g) (colored lines closer to the top) compared to other brain areas.

increased spiking responses to only one or two scrambled movie frames, lasting about 50ms. Tuned responses to the scrambled movie were found in all brain regions, but these were the least frequent in DG and CA1. (e) One representative cell each from V1 (left) and CA1 (right), where the frame rearrangement of scrambled responses resulted in a response with high correlation to the continuous movie response for V1, but not CA1. Pearson correlation coefficient values of continuous movie and rearranged scrambled responses are indicated on top. (f) Average decoding error for observed data (see Methods), over 60 trials for continuous movie (maroon), was significantly lower than for shuffled data (gray) (KS-test p<1.2×10⁻²²). Solid line: mean error across 60 trials using all tuned cells from a brain region; shaded box: s.e.m.; green dots: mean error across all trials using a random subsample of 150 cells from each brain region. Decoding error was lowest for V1 (30.9 frames) and highest in DG (241.2) and significantly different between all brain region pairs (p<1.9×10⁻⁴), except CA3-CA1, CA3-subiculum and CA1-subiculum (p>0.63). (g) Similar to (f), decoding of the scrambled movie was significantly worse than that for the continuous movie (KS-test p<2.6×10⁻³). Scrambled responses, in their "as is", chronological order were used herein.
LGN decoding error for scrambled presentation was 6.5x greater than that for continuous movie, whereas the difference in errors was least for V1 (1.04x). Scrambled movie decoding error for all visual areas and for CA1 and subiculum was significantly smaller than chance level (KS-test p<2.6×10⁻³), but not for DG and CA3 (p>0.13). Only the middle 20 trials of the continuous movie were used for comparison with the scrambled movie since the scrambled movie was only presented 20 times. Middle trials of the continuous movie were chosen as the appropriate subset since they were chronologically closest to the scrambled movie presentation.

The percentage of movie tuned cells, deemed as z-scored metric > 2, was significantly greater than chance levels (p<4.9×10⁻¹¹), using either sparsity or depth of modulation or mutual information as the metric (see Methods for metric definitions). Sparsity yielded higher movie tuning than depth of modulation across all brain regions (p<1.8×10⁻³), putatively because it captures multi-peaked tuning better than depth of modulation, which only relies on the largest and smallest firing rate responses. Similarly, z-scored mutual information led to greater tuning than chance levels (p<4.9×10⁻¹¹), but lesser than that with the sparsity metric (p<1.3×10⁻⁵).

using only the data when the mouse was immobile, while excluding the data when the mouse was running (stationary data, see Methods). (b) Fraction of selective neurons was significantly above chance in all brain regions, ranging from 94.7% in V1 down to 7.1% in CA3 in the stationary data. (c) To explicitly test the effect of running on movie selectivity, we compared the results in (b) with a random subsample of data, of equal duration as the stationary data, that included running and stationary epochs, to control for the loss of data (see Methods). Prevalence of movie selectivity was not significantly different (KS-test p>0.05) in these 2 subsamples, except in CA1 (p=0.03, 13.1% in stationary data, 15.0% in the equivalent subsample). Only sessions with at least 300 seconds of stationary data were used in this analysis to ensure sufficient statistical power. The reduction in the fraction of tuned neurons in (b) and (c) for 'stationary data', compared to 'all data' here and in Figure 1 and

significant modulation movie tuning using data after removal of SWR events (14371 cells from 20 sessions where SWR information was available, see Methods). (b) Fraction of selective neurons was significantly above chance in all brain regions, ranging from 96.7% in V1 down to 12.3% in CA3 in the SWR removed data. (c) To control for the loss of data by the removal of SWR, we compared the results with movie tuning in an equivalent subsample of data. Prevalence of movie selectivity was not significantly different (KS-test p>0.05) in these 2 subsamples, except in AM-PM (p=0.02, 97.1% in SWR removed data, 94.6% in the equivalent subsample). As before (Figure 1-figure supplement 4), due to a reduction in the amount of data, a smaller number of neurons showed significant movie tuning in both the SWR removed data and the equivalent subsampled data.

data did not have significantly different movie tuning prevalence for LGN, DG and CA3 (p>0.73, which could be because of the smaller number of cells recorded in these brain regions), but dilated pupil corresponded to slightly greater tuning for other brain regions (p<3.4×10⁻⁴). (f) Similar to (e), the movie tuning in high as well as low theta power data was significantly greater than chance levels (p<5.0×10⁻¹⁰). Movie tuning was greater in data with high theta power for DG and CA1 (p<2.1×10⁻⁶), but not significantly different for other brain regions (p>0.07). Both sub-segments had equal amounts of data to ensure a fair comparison.
movie-field durations (0.57±0.13s) were about two-fold longer than V1 (p=2.5×10⁻²¹); though both were smaller than those in the higher order brain areas (0.71±0.05s). (b) Firing in movie-fields, normalized by that in the shuffled response, was used to obtain the median value from all fields of a neuron. This metric of median movie-field activation is significantly different across all brain region pairs (KS-test p<3.4×10⁻⁵), except DG-CA3, CA3-CA1 and DG-CA1 pairs. The largest median movie-field activation was in V1 (2.5±0.05), and the smallest in subiculum (1.13±0.03). (c) Cumulative firing in movie-fields, normalized by that in the shuffle response, obtained by adding the activity within all fields of a neuron, was significantly different across all brain region pairs (KS-test p<3.0×10⁻⁷), except DG-CA3, CA3-CA1 and DG-CA1. V1 response was largest (1.93±0.04), and subiculum was the smallest (1.11±0.02). (d) For each brain region, the movie-field duration ratio was recalculated by randomly reassigning the cell ids to all the movie peaks from that brain region. Using this new assignment of movie peaks to a cell, we obtained the expected mega-scale index (largest/smallest peak duration) based on the ensemble behavior. The observed mega-scale index within a cell was smaller than expected from the ensemble in all the visual areas (KS-test p<3.2×10⁻³; median was 77.5%, 56.2% and 41.7% of chance for LGN, V1 and AM-PM respectively). This was not the case in hippocampal regions (p>0.23). Thus, individual cells in the visual, but not hippocampal, areas sampled a subset of the possible mega-scale coding values of the ensemble. (e) Histogram of movie-fields, binned for their durations (log-scaled) and their prominence (also log-scaled). The most prominent fields tended to be wider in most brain areas, and this effect was stronger in hippocampal than in visual regions. Note that the histogram color is also log-scaled.

highest overlap along the diagonal (i.e. for the same movie frame) for all brain regions. Each neuron's response was normalized by its mean rate and the average response in even as well as odd trials was smoothed by a Gaussian window of 2 frames (66.6ms, see Methods). Dashed black lines indicate -300 and +300 frames away from the diagonal. Notice the large correlations (close to unity, horizontal color bar) indicating stable responses. The correlations decay quickly to smaller values for the visual areas but more slowly for hippocampal areas, due to their broader movie-fields. (b) Same as (a), but for untuned neurons, resulting in a salt and pepper overlap pattern and low values of correlation, indicating lesser stability than the tuned neurons. Since the majority of cells in the visual areas were tuned, the untuned population was smaller, leading to more variable population vector overlap. (c) The average population vector overlap, computed across all frames, as a function of the number of movie frames away from the diagonal in (a). It had a large value in visual regions for the 0th diagonal (colored lines) indicating stable responses, whereas the untuned neuron populations (gray lines) were unstable, with values near zero, or chance level. The highest population vector overlap in hippocampal regions was smaller than in visual areas but persisted for more frames, due to their broader movie-fields (Full width at half maximum of the peak -17.

The adjacent movie frame (frame_n, frame_{n+1}) correlation coefficient, indicating the similarity of 2 consecutive frames, termed F2F image correlation, is shown in gray. Similarly, the correlation coefficient between the population vectors of neural responses to adjacent frames, termed F2F neural correlation and computed separately for each brain region, is shown in color.
The relationship between F2F-image, and F2F-neural correlation

(a) Stack plot of tuned responses to the scrambled movie presentation from each brain region, sorted according to the frame with peak response. Each firing rate profile is normalized by the peak response causing the diagonal to be unity. The average firing rate of non-peak frames (similar to Figure 3b

but rearranged (dark blue). Movie tuning was significantly higher for the continuous presentation (p<3.5×10⁻³) than the scrambled as is condition or scrambled rearranged condition (p<2.6×10⁻⁶), in all brain regions. Movie tuning for the scrambled presentation taken as is, or after rearrangement, was not significantly different for all brain regions (p>0.08), except LGN (p=1.3×10⁻⁵) and V1 (p=0.001), although the prevalence of tuning was comparable (63.7% and 64.3% for LGN, and 90.1% and 90.0% for V1).

86. In contrast, there was no consistent pattern of neural activation or shutdown during the movie presentation in all brain areas. To make a more conservative estimate, this comparison was restricted to units whose firing rates did not differ by more than 20% across the two movie blocks. Further, only the data when the animals were immobile were used to avoid confounding effects of running, and the rate threshold of 0.5Hz was removed for this panel. (b) The amount of movie tuning was positively correlated with the mean firing rates of the neurons for all brain regions (r>0.14, p<4.2×10⁻¹⁰). (c) The number of movie fields was uncorrelated with the mean firing rate of tuned cells in V1, DG, CA1 and SUB (p>0.12), but positively correlated for LGN, AM-PM and CA3 (r>0.04, p<0.01). Note the different y-scales for visual and hippocampal brain regions. Since the number of movie fields is an integer, data along the y-axis were slightly jittered for better visualization. (d) The mega-scale index was only weakly correlated with the mean firing rate of a neuron in V1 (Pearson's correlation coefficient r=0.08, p=7.3×10⁻⁵), CA1 (r=-0.14, p=3.5×10⁻⁸) and subiculum (r=-0.14, p=0.02), and was uncorrelated for other brain regions (p>0.05). (e) The refractory violations index was uncorrelated with the mega-scale index (lower index means better cluster quality 36,108) for all brain regions (p>0.05). To remove the potential confounding effect of mean firing rates, we computed the partial correlation coefficient by factoring out the mean firing rate. (f) Similar to (c), the isolation index (greater isolation index means better cluster quality 36,109) was uncorrelated with the mega-scale index for all brain regions (partial correlation coefficient, by factoring out the mean firing rate, p>0.12). Factoring out the contribution of mean firing rate is necessary since the isolation index was typically positively correlated (the refractory violations index was typically negatively correlated) with the mean rate. The mega-scale index comparisons were restricted to movie active, tuned neurons with at least two movie peaks. Note the log-spaced axes for (a)-(d).