Abstract
Narratives may provide a general context, unrestricted by space and time, which can be used to organize episodic memories into networks of related events. However, it is not clear how narrative contexts are represented in the brain. Here we test the novel hypothesis that the formation of narrative-based contextual representations in humans relies on the same hippocampal mechanisms that enable formation of spatiotemporal contexts in rodents. Participants watched a movie consisting of two interleaved narratives while we monitored their brain activity using fMRI. We used representational similarity analysis, a type of multivariate pattern analysis, which uses across-voxel correlations as a proxy for neural-pattern similarity, to examine whether the patterns of neural activity can be used to differentiate between narratives and recurring narrative elements, such as people and locations. We demonstrate that the neural activity patterns in the hippocampus differentiate between event nodes (people and locations) and narratives (different stories) and that these narrative-context representations diverge gradually over time akin to remapping-induced spatial maps represented by rodent place cells.
SIGNIFICANCE STATEMENT Narratives, especially in movie format, are very engaging and can be used to investigate neural mechanisms underlying cognitive functions in more naturalistic settings than those of traditional paradigms. Narratives also provide a more general context, unrestricted by space and time, that can be used to organize memories into networks of related events. For this reason, narratives are ideally suited to engage neural mechanisms underlying episodic memory formation. In this study, participants watched a movie with two interleaved narratives while their brain activity was monitored using fMRI. We show that the hippocampus, which is involved in the formation of spatiotemporal contexts in episodic memory, also represents gradually diverging narrative contexts as well as narrative elements, such as people and locations.
Introduction
Storytelling serves an important sociocultural role, and neural mechanisms underlying narrative comprehension have received considerable attention in the literature (Ferstl et al., 2008; Martín-Loeches et al., 2008; Lerner et al., 2011; Corballis, 2013; Nijhof and Willems, 2015). We also find narratives very engaging, and exposure to narratives, particularly in the form of movies, can be used to engage neural mechanisms underlying perceptual and cognitive operations (Huth et al., 2012; Hasson et al., 2015) in more realistic settings than those of more traditional paradigms. Movies, where each event consists of multiple elements, such as people, locations, objects, or actions, and where individual events are bound together by the common context of the narrative, are particularly well suited for the study of episodic memory mechanisms. As a parallel, autobiographical memories are often organized into personal narratives (Conway and Pleydell-Pearce, 2000; Schacter et al., 2007; Spreng et al., 2009; Collin et al., 2015; Milivojevic et al., 2015), which may provide a general context for remembering individual events unrestricted by space and time. For example, we might sequentially work on various projects from the confines of our office. Nevertheless, we can recall a series of events that led to a project completion, regardless of whether those events occurred close together in time or in space. But how does the brain enable the formation of such narrative-based contextual representations while keeping track of individual elements, or event nodes, which comprise those events?
Here we propose that neural mechanisms underlying episodic memory formation are also involved in keeping track of spatiotemporally extended narratives. Namely, episodic memories are not stored in isolation but rather form networks of related events (Eichenbaum et al., 1999; Norman and O'Reilly, 2003; Shohamy and Wagner, 2008; Staresina and Davachi, 2009; Staresina et al., 2012; Zeithamova et al., 2012; Shohamy and Turk-Browne, 2013; Horner et al., 2015). Although it remains unclear what organizing principles govern the structure of such mnemonic networks, it seems certain that spatial (O'Keefe and Nadel, 1978; Burgess et al., 2002; Doeller et al., 2008, 2010; Moser et al., 2008; Steemers et al., 2016) and temporal contexts (Howard and Kahana, 2002; Eichenbaum, 2014; Ezzyat and Davachi, 2014; Hsieh et al., 2014; Deuker et al., 2016), in which events took place, are essential components of episodic memories. Nevertheless, despite the fact that in our daily lives we often revisit the same environments, and that temporally proximal events are not necessarily related, we have little trouble integrating information across large spatiotemporal gaps (Staresina and Davachi, 2009).
It remains unclear, however, how separate narrative contexts emerge and how we switch between those separate narrative contexts in episodic memory. Encoding of new events as separate from old ones in episodic memory is thought to require pattern separation (Bakker et al., 2008; Yassa and Stark, 2011; Duncan et al., 2012; Deuker et al., 2014) because everyday events can share multiple features, such as spatial and temporal context, but also people, objects, and actions. Therefore, spatial and temporal contexts, as well as personal narrative contexts, might all serve a role in the organization of human episodic memories. Our previous work suggests that the hippocampus indeed underlies the emergence of narrative contexts through integration of multiple related events (Collin et al., 2015; Milivojevic et al., 2015). Here we predicted that such narrative-based contextual representations would diverge over time through neural mechanisms of pattern separation similar to those that govern the emergence of stable contextual representations of space (Lever et al., 2002).
To examine whether this is indeed the case, we used fMRI to monitor brain activity while participants watched a movie, which consisted of two interleaved narratives diverging from a common beginning. The movie was examined and tagged for characters, locations, and narratives. We tracked the emergence of narrative-specific representations over the course of the movie.
Materials and Methods
Participants.
Twenty-five (five male, 20 female) volunteers with normal or corrected-to-normal vision, no hearing impairments, and no history of neurological or psychiatric disease, participated in this study. None of them had seen the stimulus movie, Sliding Doors, before, and all were comfortable watching movies in English without subtitles (by self-report). Procedures were approved by the local ethical review committee (CMO region Arnhem-Nijmegen, The Netherlands), and participants gave written informed consent to participate. Four participants were excluded from further analysis due to excessive head movement (see Image preprocessing). Further, two participants were excluded due to image reconstruction failure, which resulted in the loss of neuroimaging data for the last 30%–50% of the movie. The final sample consisted of 19 participants (four male, 15 female, age range: 19–27 years, mean age: 23.84 years, SD: 2.61 years).
MRI acquisition.
Imaging data were acquired on a 3T Siemens TIM Trio scanner using a 32-channel head coil. We used a custom 3D EPI pulse sequence (Poser et al., 2010) with the following parameters: volume TR = 1800 ms; TE = 25 ms; flip angle = 15°; volume resolution = 2 mm3; FOV = 224 × 224 × 128 mm; slab orientation = −25° pitch rotation; 3D acceleration factor = 2. Functional (T2*-weighted) image acquisition was subdivided into two runs of 1536 volumes (46:08 min) each, with a 10 min break in between. The structural T1-weighted image was acquired using an MP2RAGE sequence (Marques et al., 2010) with the following parameters: TR = 5000 ms; TE = 2.96 ms; flip angle = 4°; in-plane resolution = 256 × 256 mm; GRAPPA acceleration factor PE = 3; voxel resolution = 1 mm3, duration = 8:87 min. Before functional volume acquisition, a gradient-field map was acquired using a gradient echo sequence with the following parameters: TR = 1020 ms; TE1 = 10 ms; TE2 = 12.46 ms; flip angle = 90°; volume resolution = 3.5 × 3.5 × 2 mm; FOV = 224 × 224 mm; slice orientation = −25° pitch rotation. The field map was applied for distortion-correction of the acquired functional images (see below).
Stimuli.
Participants watched the movie Sliding Doors (by Peter Howitt, 1998, runtime: 99 min, produced by Intermedia Films and Mirage Enterprises, distributed by Miramax and Paramount Pictures), a romantic comedy that begins as one narrative, but ∼5 min after the beginning of the movie, diverges into two alternating narratives (referred to as Narratives 1 and 2 hereafter), featuring the same group of characters, who frequent the same set of locations. Narrative 1 was defined as the one in which Helen (played by Gwyneth Paltrow) missed the train (Helen 1), and Narrative 2 was the one in which Helen caught the train (Helen 2). Helen 1 was also considered to be the original Helen before the narratives split into two. This movie was chosen for two main reasons. First, the two narratives were interleaved, as opposed to sequential, which reduces the potential for temporal confounds in the MRI data (e.g., more within- than between-narrative similarity due to temporal proximity). Second, because the same group of characters appears in both story lines and frequents the same set of locations, the two narrative versions are visually similar to each other. It should be noted that, although all characters and most locations appear in both narratives, the relative contribution of the characters and locations to each narrative differs (for more details, see Temporal co-occurrence control analyses and Table 1). Given the degree of overlap of locations and characters between the narratives, both story lines often contained similar visual features; the main visual difference between the two narratives is that the main character, Helen, changes her hairstyle partway through the movie in Narrative 2 (∼26 min into the film, or approximately halfway through the first scanning run). Additional potential visual confounds were controlled for in the statistical analysis (below).
The movie was shown using Presentation software (Version 16.4; www.neurobs.com), thus ensuring synchronized presentation across participants. Participants used MRI-compatible earbuds, providing equal audio input to both ears, and additional noise insulation was provided by head cushions placed over the ears. The movie was subdivided into eight time bins for the analyses described in Figure 4. Although the two scanning runs were both ∼45 min, the time bins were somewhat shorter for the first scanning run (∼10 min each) than for the second scanning run (∼12 min each) because the movie began as a single narrative and diverged into two story lines ∼5 min into the film.
Stimulus-movie labeling and analysis.
The movie was segmented into 1.8 s time bins, corresponding to the volumes acquired during each TR (see Fig. 1 for details). Each movie segment was tagged for contents by two independent raters based on the three categories of interest: narratives (Narrative 1, Narrative 2, both narratives, or neither narrative, i.e., before the narrative split occurred), people (any of the eight main characters, or random other people), and locations (one from a total of 24 locations, including two regressors for unspecific indoor or outdoor scenes). These tags were used to define trials for each of the 25 event types of interest (presence of either of the two narratives, any of the eight characters, and, because of limited statistical power, only the 15 most commonly visited and specific locations; Table 1) as well as event types of no interest (remaining tag labels). Sequential movie segments containing each event type of interest were considered as individual “trials” for that condition and were defined with a single onset and a variable duration (corresponding to the duration of the segments containing the defining feature). These vectors were then used to create two regressors of interest per event type, corresponding to odd and even “trials,” and were modeled using a GLM (for more details, see Fig. 2). The model also included 12 event types of no interest (volumes in which both narratives occurred, neither narrative, random people, nine of the less frequently visited locations); these were not split into odd and even trials (see Fig. 2A,B; Table 1).
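The trial definition described above can be illustrated with a minimal sketch. It assumes a hypothetical per-volume presence vector for one event type (one boolean per 1.8 s segment) and assumes that odd and even trials are assigned by simple alternation in order of appearance; the function names and the toy tag vector are illustrative and are not taken from the study's own scripts.

```python
from itertools import groupby

TR_SECONDS = 1.8  # duration of one volume, as stated in the text


def tags_to_trials(present):
    """Collapse a per-volume presence vector into (onset_volume, n_volumes) trials.

    `present` is a hypothetical list of booleans, one per 1.8 s movie segment.
    Each uninterrupted run of True values becomes one trial with a single
    onset and a variable duration.
    """
    trials = []
    index = 0
    for flag, run in groupby(present):
        length = len(list(run))
        if flag:
            trials.append((index, length))
        index += length
    return trials


def split_odd_even(trials):
    """Assumption: trials alternate between the odd and the even regressor."""
    return trials[0::2], trials[1::2]


def to_seconds(trial):
    """Convert a trial from volume units to (onset_s, duration_s)."""
    onset_volume, n_volumes = trial
    return onset_volume * TR_SECONDS, n_volumes * TR_SECONDS


# Toy tag vector for a single event type (illustrative only).
tags = [False, True, True, False, True, False, False, True, True, True]
trials = tags_to_trials(tags)
odd, even = split_odd_even(trials)
```

In this toy example the three runs of tagged volumes become three trials; the first and third go to the odd regressor and the second to the even regressor.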
To control for differences in visual and auditory stimulation during movie watching, low-level audio and visual features of the movie were analyzed and used to create nuisance regressors for the fMRI time-series analysis (Bartels and Zeki, 2004). The movie was again segmented into 1.8 s time bins, corresponding to the volumes acquired during each TR. Each of the segments was used to define eight perceptual features of interest: power amplitude of the auditory signal collapsed across frequencies; power amplitude of low, mid, and high spatial-frequency content; LAB space color features (luminance, red-green, and blue-yellow color-opponent channels); and optic flow (using The Computer Vision System Toolbox: http://www.mathworks.nl/products/computer-vision/). To create the eight control regressors, the auditory feature was estimated directly for each segment of the audio track, whereas the visual features were first computed for each frame of the movie within the segment and then averaged across the frames within that segment.
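The frame-to-segment averaging of the visual features can be sketched as follows. This is only an illustration: the per-frame values are made up, and the number of frames per segment depends on the movie's frame rate, which is not reported here, so the toy call uses three frames per segment purely for readability.

```python
def average_per_segment(frame_values, frames_per_segment):
    """Average a per-frame feature within consecutive TR-length segments.

    `frame_values` is a hypothetical list with one feature value per movie
    frame (e.g., mean luminance). A trailing partial segment is averaged
    over the frames it actually contains.
    """
    segment_means = []
    for start in range(0, len(frame_values), frames_per_segment):
        chunk = frame_values[start:start + frames_per_segment]
        segment_means.append(sum(chunk) / len(chunk))
    return segment_means


# Toy per-frame luminance values (illustrative only).
frame_luminance = [10, 20, 30, 40, 50, 60, 70]
segment_luminance = average_per_segment(frame_luminance, 3)
```

Each resulting value would correspond to one volume of the nuisance regressor before convolution with the HRF.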
Postscan narrative-discrimination task.
To determine whether the participants were able to distinguish between the two narratives, following scanning, the participants were shown 68 3-s segments of the movie without sound, and were required to indicate whether the clip was from Narrative 1 or 2. Participants had a maximum of 10 s to respond by pressing buttons 1 or 2 on the keyboard, which were labeled with small pictures of Helen 1 and Helen 2 to prevent response-button confusion. There were 34 clips per story, 12 of which featured Helen before she changed her hair color after the narrative split. These clips were included to ensure that the participants could actually recall the event content, rather than classifying the events based on the salient visual difference between the two versions of the main character. This task was also executed using Presentation software (Version 16.4, www.neurobs.com).
Data analysis: image preprocessing.
MRI data were preprocessed using Automatic Analysis (Cusack et al., 2015), which combines tools from SPM8 (http://www.fil.ion.ucl.ac.uk/spm/software/spm8/), FreeSurfer (http://surfer.nmr.mgh.harvard.edu/), and the FMRIB Software Library v5.0 (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/), complemented by custom scripts. Structural images were bias-corrected and denoised using an adaptive optimized nonlocal means filter (Coupe et al., 2008). We used the SPM8 iterative functional-image realignment and unwarping procedure to estimate movement parameters (three for rotation and three for translation) and corrected the images with respect to gradient-field inhomogeneities caused by motion (Andersson et al., 2001). A robust mean and SD were calculated for each participant's sum of absolute rotation and translation parameters across scans (rotation: mean = 0.03°, SD = 0.16°; translation: mean = 0.03 mm, SD = 0.01 mm). Four participants with parameters exceeding 3 SDs during either scanning run were excluded from further analysis (average rotation score = 0.08°, average translation score = 0.08 mm) (Power et al., 2012). A spike detection algorithm was used to detect signal spike events, which were later modeled as nuisance variables (Power et al., 2012). The structural image and mean EPI were coregistered to the SPM T1 and EPI templates, respectively. The mean functional EPI was then coregistered to the structural image. Coregistration parameters of the mean EPI were subsequently applied to all functional images. The FSL brain extraction toolbox was used to create a skull-stripped structural image. This image was segmented into gray matter, white matter, and CSF as implemented in SPM8 (Ashburner and Friston, 2005). Mean intensity values for the white matter and CSF tissue classes were computed at each time point and included in the first-level model as nuisance regressors (Verhagen et al., 2008), along with the six movement parameters.
For the Representational Similarity Analysis (RSA, see below), the images were not preprocessed further, but output statistics were normalized to the MNI space using normalization parameters estimated through the unified segmentation procedure as implemented by SPM8 and smoothed with a 3D Gaussian kernel of 6 mm3 FWHM. For the univariate control analyses (see below), the functional images were normalized to the MNI template using normalization parameters estimated through the unified segmentation procedures, as implemented in SPM8, and smoothed using a 6 mm3 FWHM 3D Gaussian kernel. In all second-level RSA statistical analyses, we used nonparametric permutation tests, corrected for multiple comparisons using threshold-free cluster enhancement method, and applied a corrected statistical threshold of p < 0.05 on the cluster level (Smith and Nichols, 2009). For second level univariate analyses, we used a FWE-corrected threshold of p < 0.05. Results are reported in MNI space. In the figures, results are presented at the above-mentioned thresholds.
Representational similarity analysis.
We were primarily interested in whether, based on the patterns of voxelwise activity, we could differentiate between neural representations of the two Narratives, and whether the representations within these regions diverged over time. Additionally, we were interested in whether, similarly to previous studies, we could differentiate between individual items within the two categories of interest: Characters and Locations.
To examine these questions, we used RSA to analyze the multivoxel pattern of neural activity (Kriegeskorte et al., 2008) and applied a roving-searchlight approach on our whole-brain data. To this end, we examined the Pearson's correlation coefficients between patterns of activity of gray-matter voxels within spherical regions of interest, or searchlights, throughout the whole-brain volume. Correlation coefficients for odd-trial and even-trial regressors were reduced in the following way: Estimates for within-item similarity were computed as correlations between a regressor modeling odd and a regressor modeling even trials for that item, while eliminating autocorrelations corresponding to the main diagonal. Estimates for across-item correlations were averaged across correlations for odd and even trials for each pair of interest. These correlation coefficients were then normalized using a Fisher Z transform.
The first analysis step involved constructing a GLM for the fMRI time series. We constructed a model that included, for each scanning run separately, two regressors (corresponding to odd and even trials) for each of the 25 event types described above (two narratives, eight main characters, 15 locations, see Table 1; Fig. 2A,B). Additionally, each run included regressors modeling all the remaining tagging categories (both narratives, presplit narrative, nine less frequently visited locations, and scenes with people who were not the main protagonists of the movie), but those regressors were not split into odd and even trials. All these regressors were convolved with the canonical HRF, producing a modeled time course of neural activity. For each run, our model also included the following nuisance regressors: perceptual regressors modeling auditory and visual features (auditory amplitude, FFT amplitude and angle, the three LAB color space dimensions, power at high, medium, and low spatial frequencies) of the movie, six motion regressors, two regressors for the mean signal intensity in the CSF and white matter (Verhagen et al., 2008), and one regressor for each signal spike (see Image preprocessing). A high-pass filter (128 s cutoff) was applied to remove low-frequency signal drifts. Perceptual regressors were also convolved with the canonical HRF, whereas the other nuisance regressors were not. The 25 pairs of event regressors per run, corresponding to odd and even trials, were considered as the main regressors of interest.
For the follow-up question of whether the Narrative representations diverged over time, we modeled the time series data on the first level using a model similar to that of the main analysis; the main difference was that only Narrative regressors were split into odd and even trials, and these were further split into eight time bins (four per scanning run, where each scanning run was modeled using a separate GLM). As a control, we also examined whether Character representations diverged over time, but due to the low number of volumes for the less prominent characters, we only used the four main characters (two Helens, James, and Gerry) in the analysis. We also only used six time bins (three per run) due to the low number of volumes for one of those four characters partway through the second half of the movie.
Here we modeled the time series data on the first level using a similar model as in the Narrative divergence analysis, where the main difference was that instead of Narrative regressors, only four of the Character regressors (Helen 1, Helen 2, Gerry, and James) were split into odd and even trials, and then further split into six time bins.
Voxelwise β estimates resulting from these regressors of interest were used for the subsequent searchlight RSA.
In the second analysis step, we investigated the degree of correlation between across-voxel patterns of activity within searchlights measuring 9 voxels in diameter. The searchlights were restricted to gray matter voxels only. For each searchlight in the main analysis, we computed a 25 × 25 correlation matrix per scanning run. The main diagonal of this correlation matrix corresponded to the correlations between odd and even trials for each of the 25 event types per run (two narratives, eight characters, 15 locations), excluding the true autocorrelated main diagonal (which would always be equal to 1). The off-diagonal cells corresponded to correlations between β estimates collapsed across odd and even trials for the 25 event types per run. For each searchlight for the analyses exploring the divergence over time of Narrative and Character representations, we computed two 8 × 8 (for Narratives) or 12 × 12 (for Characters) correlation matrices, one per scanning run, corresponding to correlations between β estimates for each of the two Narratives or four Characters in each of the time bins, reduced across the odd and even trials as described above.
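As an illustration of how such a reduced correlation matrix could be assembled for a single searchlight, the sketch below uses toy β patterns for three event types and four voxels. It rests on an assumption about the off-diagonal cells: here they are taken as the average of the odd-odd and even-even correlations for each pair, which is only one possible reading of the reduction described above. All names and values are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length voxel patterns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def reduced_similarity_matrix(odd, even):
    """Build an items x items matrix from odd-trial and even-trial patterns.

    Diagonal: correlation between the odd and even pattern of the same item,
    which replaces the trivial autocorrelation of 1.
    Off-diagonal (assumption): mean of the odd-odd and even-even correlations.
    """
    n_items = len(odd)
    matrix = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                matrix[i][j] = pearson(odd[i], even[i])
            else:
                matrix[i][j] = 0.5 * (pearson(odd[i], odd[j])
                                      + pearson(even[i], even[j]))
    return matrix


# Toy beta patterns: 3 event types x 4 voxels (illustrative only).
odd_patterns = [[1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.0, 2.0], [2.0, 4.0, 1.0, 3.0]]
even_patterns = [[1.1, 1.9, 3.2, 3.8], [3.9, 1.2, 2.7, 2.1], [2.2, 3.9, 1.3, 2.8]]
similarity = reduced_similarity_matrix(odd_patterns, even_patterns)
```

In the toy data each item's odd and even patterns resemble each other more than they resemble the other items, so the diagonal entries are larger than the off-diagonal ones.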
In the third, and final, step of the analysis, these correlation coefficients were initially normalized using a Fisher Z transform, and then used as the dependent variable in a GLM where the contrast matrices of interest (for the main analysis of interest, see Fig. 2C; for control analyses, see the contrasts presented in Fig. 2D–G) were used as predictors. The β estimates were then averaged across the two runs and saved in the center voxel of the searchlight sphere, resulting in a single image per contrast of interest, per participant. The main contrasts of interest were within-across (or same > different) contrasts, computed to compare correlations within-item (correlations between odd and even trials) to across-item correlations (collapsed across odd and even trials) for each category separately (Narratives, Characters, and Locations). The contrast matrices can be seen in Figure 2C. Thus, we computed three contrast images of interest per participant, corresponding to within- versus across-item correlations for Narratives, Characters, and Locations. These contrast images were then normalized to MNI space, smoothed using a 6 mm3 FWHM Gaussian 3D kernel, and compared against zero across subjects using one-sample t tests with permutation testing at the second level, as implemented by FSL (Smith and Nichols, 2009). To examine the effects of time on divergence of Narrative and Character representations, we applied the same analysis logic to each of the time bins in each of the scanning runs separately.
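The same > different logic can be sketched for a single searchlight as a one-predictor regression. The example assumes a binary predictor (1 for within-item cells, 0 for across-item cells) rather than the actual contrast matrices of Figure 2C, and a toy reduced correlation matrix whose diagonal already holds odd-even correlations. With such a binary predictor, the least-squares slope equals the mean within-item Fisher z minus the mean across-item Fisher z.

```python
import math


def fisher_z(r):
    """Fisher Z transform of a correlation coefficient (requires |r| < 1)."""
    return math.atanh(r)


def same_vs_different(corr_matrix):
    """Regress Fisher-z similarities on a binary same-item predictor.

    Uses the upper triangle including the diagonal, because the matrix is
    assumed to be symmetric. Returns the least-squares slope (the beta for
    the same > different predictor in a model with an intercept).
    """
    n_items = len(corr_matrix)
    z_values, predictor = [], []
    for i in range(n_items):
        for j in range(i, n_items):
            z_values.append(fisher_z(corr_matrix[i][j]))
            predictor.append(1.0 if i == j else 0.0)
    mean_x = sum(predictor) / len(predictor)
    mean_y = sum(z_values) / len(z_values)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, z_values))
    s_xx = sum((x - mean_x) ** 2 for x in predictor)
    return s_xy / s_xx


# Toy reduced correlation matrix for three items (illustrative only).
toy = [[0.6, 0.1, 0.2],
       [0.1, 0.5, 0.0],
       [0.2, 0.0, 0.7]]
slope = same_vs_different(toy)
```

A positive slope indicates higher similarity within items than across items in that searchlight; in the analysis described above, the corresponding estimate would be stored in the center voxel and carried to the group level.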
Temporal co-occurrence control analyses.
We defined temporal co-occurrence as the proportion of time that one item (Narrative, Character, or Location) appeared, given the appearance of another item (P(A|B) in Bayes' theorem). This can alternatively be viewed as the likelihood of each item (Narrative, Character, or Location) co-occurring in a volume with any of the other items. This matrix is asymmetrical since P(A|B) ≠ P(B|A) (e.g., the likelihood that Helen 2 will appear in a scene with James is high, but the likelihood that James will be in a scene in which Helen 2 is present is lower). Importantly, we took an average of the two to create a symmetrical matrix representing a distance measure between each pair of items (see Fig. 2D).
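The co-occurrence measure and its symmetrization can be illustrated with a short sketch. The presence vectors below are toy data (one 0/1 entry per volume) chosen to mirror the Helen 2 and James example; the function names are illustrative rather than taken from the study's scripts.

```python
def conditional_cooccurrence(presence):
    """Return P(A|B) for every pair of items as a nested dict cond[A][B].

    `presence` maps each item name to a hypothetical list of 0/1 flags,
    one per volume. P(A|B) is the share of B's volumes that also contain A.
    """
    cond = {}
    for a, flags_a in presence.items():
        cond[a] = {}
        for b, flags_b in presence.items():
            n_b = sum(flags_b)
            n_ab = sum(x * y for x, y in zip(flags_a, flags_b))
            cond[a][b] = n_ab / n_b if n_b else 0.0
    return cond


def symmetrize(cond):
    """Average P(A|B) and P(B|A) to obtain a symmetrical matrix."""
    return {a: {b: 0.5 * (cond[a][b] + cond[b][a]) for b in cond} for a in cond}


# Toy presence vectors over six volumes (illustrative only).
presence = {
    "Helen 2": [1, 1, 1, 1, 0, 0],
    "James": [1, 1, 0, 0, 0, 0],
}
cond = conditional_cooccurrence(presence)
sym = symmetrize(cond)
```

In this toy case Helen 2 is present in every James volume, whereas James is present in only half of the Helen 2 volumes, so the two conditional values differ and their average gives a single symmetrical value for the pair.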
We used temporal co-occurrence of Narratives, Characters, and Locations for two main purposes. First, we wished to determine which Characters and Locations were predominantly in Narrative 1 and which were predominantly in Narrative 2. And second, we created co-occurrence contrast matrices used to examine whether effects reported in the main analyses were in part driven by co-occurrence of narrative elements (Characters and Locations).
The Narrative-Character co-occurrence pattern suggested that four of the Characters were deemed to be predominantly featured in Narrative 1, whereas the other four Characters were deemed to be predominantly featured in Narrative 2. Similarly, the Narrative-Location co-occurrence pattern suggested that six locations were featured predominantly in Narrative 1, another six locations were featured predominantly in Narrative 2, two Locations were very similarly represented in both Narratives, and one Location only appeared presplit (for details, see Table 1). We used these Character-Narrative and Location-Narrative groupings to create contrast matrices for two control analyses (described below) designed to disentangle the contribution of Narratives, Characters, and Locations in the main analyses. However, although we were able to define two groupings of characters and locations, all characters appeared in both story lines but the relative probability that they appeared in one or the other narrative was lower (12%–35%) or higher (56%–84%). This was also the case for all but one of the 12 locations (6%–29% for lower and 61%–88% for higher). All volumes that belonged to Narrative 1 or Narrative 2 also contained either Locations or Characters that were unevenly represented in the two narratives, and it would not have been possible to perform a narrative differentiation analysis on only those volumes that contained elements that were equally represented in both Narratives.
The first set of contrast matrices, presented in Figure 2E, were designed to compare each Character (or Location) only to those Characters (or Locations) who (or which) appeared mainly in the same Narrative as the Character (or Location) in question. We refer to these contrasts as “Within-Narrative Character/Location” contrast. The aim of this analysis was to reduce the contribution of across-Narrative effect on the effects related to Character or Location differentiation. The results of these control analyses indicate that Character and Location effects remain the same even if the same-different comparison is defined only within each Narrative, suggesting that the Character and Location differentiation is not driving the Narrative differentiation effect (for summary statistics of these analyses, see Table 2).
The second set of contrast matrices, presented in Figure 2F, were designed to compare the representations between Characters (or Locations) featured in one Narrative to the Characters (or Locations) featured in the other Narrative, excluding the main diagonal corresponding to within-Character (or Location) representations. We refer to these contrasts as “Across-narrative Character/Location” contrasts. The aim of this analysis was to determine whether the effects reported in response to Narrative differentiation could indeed be attributed to differentiation of “Character-networks” (or “Location-networks”) developed in each Narrative. The results of these analyses were restricted to the precuneus, basal ganglia, and thalamus for the Characters and inferior frontal gyrus for Locations, and crucially no hippocampal involvement was observed even at more liberal thresholds (p < 0.001; for details, see Table 2).
We also used Character-Character co-occurrences and Location-Character co-occurrences as predictor contrast matrices to determine whether the degree of co-occurrence resulted in greater similarity of representations for Character-Location pairs or Character pairs (see Fig. 2C). No significant increases in similarity associated with co-occurrence were observed in the hippocampus, even at a liberal threshold of p < 0.001. Character-Location co-occurrence modulated similarity in the occipitotemporal region bilaterally, whereas Character-Character co-occurrence modulated similarity in the right fusiform gyrus (Table 2).
Cross-narrative character differentiation control analysis.
To test whether representations of characters were indeed independent of the narrative, we performed a cross-narrative character differentiation analysis. Here we recoded all instances of Helen (Narrative 1 Brunette and Narrative 2 Blonde) and modeled them by a single regressor, which is comparable to all the other characters. We then split each of the seven characters into Narrative 1, Narrative 2, and “Other” regressors. We therefore had three regressors per character. Character events that belonged to Narrative 1 or Narrative 2 were then split into regressors modeling even and odd trials, whereas other character regressors, which were not part of Narratives 1 or 2, were modeled with a single regressor. We used Character-Narrative regressors to examine whether hippocampal activity patterns can differentiate between Characters across stories, by contrasting the similarity of each character with their other-story equivalent to the similarity between each character to all the other characters in the other story. We were unable to perform an equivalent analysis for Locations because there were too few events (<25 volumes) for almost half of the locations, and thus not enough power to complete this analysis. Results of this analysis are reported below.
Univariate control analyses: Auditory and visual-feature analysis.
Because film stimulus material is largely uncontrolled, the low-level audio and visual features of the movie were analyzed to create regressors that would account for them (compare Bartels and Zeki, 2004). The procedure for obtaining low-level audio and visual features has been described in detail above. In short, the film was divided into TR-length (1.8 s) segments; for each segment, we extracted the audio signal and movie frames, which were then used to compute auditory power amplitude, amplitudes of LAB colorspace features (luminance, red-green, and blue-yellow opponent channels), image power amplitudes at lowpass, midpass, and highpass spatial frequencies, and optic flow. These eight Perceptual regressors were included in the first-level models used for the RSA. The resulting β estimates were collapsed across the two scanning runs and used in second-level analysis using one-sample t tests as implemented in SPM8. Figure 5 displays the results of these analyses for each of the perceptual regressors of interest. Another contrast was used to collapse across the eight regressors and two runs and was also analyzed using a one-sample t test on second level, with the resulting summary statistics presented in Table 3.
Univariate signal analysis.
To examine whether Narratives elicited different BOLD signal amplitudes, we performed a separate univariate analysis. Here we used the same first-level model as for the RSA, but the data were first normalized to MNI space and smoothed using a 6 mm3 isotropic 3D Gaussian kernel. We then computed contrast images corresponding to the difference between the two Narratives (Narrative 1 > Narrative 2 and Narrative 2 > Narrative 1) and corresponding to average activation in response to any of the main Characters (all Characters > implicit baseline). We did not examine overall activation in response to Locations since all movie scenes appeared in a spatial context. The results are reported in Table 3.
Results
Behavioral results
After the participants watched the movie Sliding Doors, which consisted of two interleaved narratives, they performed a narrative-discrimination task. The results of this task showed that participants were equally proficient at identifying which narrative each event belonged to (mean accuracy: 84.8%, SE: 0.32; mean reaction times: 2366 ms, SE: 168 ms) with no significant differences in either accuracy (i.e., percentage correct; t(18) = −0.40, p = 0.691) or reaction times (t(18) = −0.69, p = 0.502) between the two narratives. This was the case regardless of whether the main character “Helen” (who has a different physical appearance in the two narratives) was present in the clips (segments with Helen: accuracy: t(18) = −0.56, p = 0.582; reaction time: t(18) = 0.55, p = 0.589; segments without Helen: accuracy: t(18) = −0.11, p = 0.91; reaction time: t(18) = −0.95, p = 0.36). There was also no significant difference between accuracy for clips with or without Helen (t(18) = −0.26, p = 0.799). We also performed a 2 × 2 repeated-measures ANOVA on response accuracy with Helen presence and Narrative as factors. Neither of the main effects nor their interaction reached significance (all F(1,18) < 1). The results indicate that participants formed reliable narrative representations in memory and that the differentiation between the narratives did not necessarily rely on Helen's character. Next, we investigated how these narratives are represented in the brain and how they change over time.
Neuroimaging results: nodal event representation of characters and locations
In our task-free movie-watching paradigm, we identified the brain regions that differentiated between exemplars within specific stimulus categories: Characters and Locations. We used RSA, which relies on correlations of across-voxel activation patterns as a proxy for neural similarity. We compared the correlations between odd and even trials for (1) each of the eight main Characters to all correlations between different Characters, and (2) each of the 15 main Locations to all correlations between different Locations (see Materials and Methods; Table 1; for details on event tagging, Fig. 1; for contrast matrices for these two analyses, see Fig. 2). We found greater same > different Character correlations (Fig. 3A; Table 4) as well as greater same > different Location correlations (Fig. 3B; Table 4) in the hippocampus, bilaterally. These effects were unlikely to be related to potential temporal confounds, such as closer temporal proximity of trials with the same individual, as temporal co-occurrence between Characters and between Character-Location pairs did not show modulation of neural pattern similarity in these regions (Fig. 2D; see Materials and Methods; no significant effects in hippocampus with a liberal threshold of p < 0.001 uncorrected). In addition, regions along the ventral stream, extending from V1 to fusiform gyrus, as well as thalamus were sensitive to Characters (Fig. 3A) and Locations (Fig. 3B). Additionally, the parahippocampal cortex was sensitive to Locations (Fig. 3B), whereas basal ganglia were sensitive to Characters (Fig. 3A). Although we note differences in the distribution of the effects, it is likely that the regions that code for individual narrative elements are closely overlapping, as no significant differences were observed between the strength of the same > different contrast for Characters compared with Locations in any of these regions, except superior occipital gyrus (x = 18, y = −85, z = 23, p < 0.05 corrected).
Narrative context is represented in the hippocampus
Next, we wanted to determine whether, akin to how spatial contexts are represented by distinct neural patterns of hippocampal activity (Lever et al., 2002; Steemers et al., 2016), narrative contexts could also be distinguished on the basis of hippocampal activity patterns. To test this hypothesis, we compared correlations between odd and even trials for each of the two Narratives to all correlations between the two Narratives. We found that within-Narrative correlations were indeed higher than across-Narrative correlations in the right hippocampus (Fig. 4A; Table 4). The Narrative effect in the hippocampus is unlikely to reflect differences in the univariate signal between the two storylines (see Table 3) as no differences between stories were observed in the hippocampus at a liberal threshold of p < 0.001, uncorrected (for regions that did show differences at a threshold of p < 0.05 corrected FWE, see Table 3). Although all volumes contained either locations or characters that were predominantly featured in one or the other narrative, the narrative differentiation effect is unlikely to be solely driven by either Location or Character differentiation because there were no differences in hippocampal pattern similarity using the Across-Narrative-Character-Networks or Across-Narrative-Location-Networks contrasts, and no differences were observed in this region even when only the two versions of Helen were compared (no significant effects in hippocampus with a liberal threshold of p < 0.001, uncorrected). We did, however, observe significantly higher same > different character similarity in the hippocampus at a liberal threshold of p < 0.001 uncorrected (x = −20, y = −34, z = 0, Z-score = 3.4, p < 0.001 uncorrected), when we compared each character with his or her counterpart from the other narrative, where the “different” characters were also from the other narrative. This effect also held when Helen was not included in the analysis (x = −18, y = −34, z = 0, Z-score = 3.34, p < 0.001 uncorrected). These results at a liberal threshold might suggest that Character representations are independent of narrative representations. We did not have enough power to compute an equivalent analysis for locations as seven of 15 locations had too few volumes (<25) in one or the other story.
Hippocampal narrative-context representations diverge over time
Finally, we asked whether, similar to diverging spatial context representations (Lever et al., 2002; Steemers et al., 2016), distinct narrative-context representations diverged gradually over time in the hippocampus. To answer this question, we segmented the time series into eight time bins, with an equal number of volumes belonging to each time bin, within each run. For each of these segments separately, we computed same > different Narrative contrasts. The resulting parameter estimates were then extracted for the peak searchlight identified in the previous analysis, and analyzed using a one-way repeated-measures ANOVA (Fig. 4B). A significant main effect of time was present (F(7,126) = 3.21, p = 0.012, Greenhouse-Geisser corrected) and was characterized by significant linear (F(1,18) = 8.52, p = 0.009) and quadratic trends (F(1,18) = 4.98, p = 0.039). As a post hoc test, we compared the narrative differentiation effect to zero using a series of one-sample t tests. Only the last time bin showed a significant narrative-differentiation effect (T8: t(18) = 4.03, p = 0.001), whereas the second-to-last time bin approached significance (T7: t(18) = 2.49, p = 0.023, critical p = 0.00625 with Bonferroni correction). These results indicate that narrative-context representations diverge gradually over time, with hippocampal representations becoming significantly distinguishable only after over an hour of movie viewing. To exclude the possibility that the observed effect is a simple time confound, we performed the same analysis for the fusiform gyrus and the middle occipital gyrus (i.e., the other regions distinguishing Narratives, see Table 4; for more information, see Fig. 4C). Both of these control regions showed an early differentiation between contexts with effects decreasing over time (fusiform gyrus: F(7,126) = 3.23, p = 0.010; middle occipital gyrus: F(7,126) = 3.77, p = 0.007, Greenhouse-Geisser corrected). This pattern stands in contrast to the representational contents in the hippocampus where the dissimilarities between the narratives were small early in the film, and increased over time. We also examined whether comparable effects were evident for Characters. Only four Characters (two Helens, Gerry, and James) were featured frequently enough to attempt this analysis. However, due to the time when the Characters appear in the movie, we needed to reduce the number of time bins from eight (used in the Narrative divergence analysis) to six for this Character divergence analysis. We extracted parameter estimates for the same > different Character contrasts for each of the time bins from the left and right hippocampal searchlights (left: x, y, z = −24, −26, −4; right: x, y, z = 30, −38, −6) that showed the greatest Character effect in the main analysis. We collapsed across the left and the right hippocampus and performed a one-way, repeated-measures ANOVA with time as the within-subjects factor. Results indicate that there is no linear increase in Character differentiation over the course of the movie. Instead, there is a decrease in Character differentiation (F(5,90) = 4.16, p = 0.004), which consists of linear (F(1,18) = 8.27, p = 0.01) and cubic (F(1,18) = 4.94, p = 0.039) trends. Post hoc one-sample t tests indicated that the effects approached significance only in the second and third time bins (T2: t(18) = 2.89, p = 0.01; T3: t(18) = 2.31, p = 0.033; critical p = 0.008 with Bonferroni correction). The character differentiation effect in the hippocampus, therefore, resembles the narrative differentiation effect in visual regions. We also controlled for effects driven by visual features (Fig. 5; see Materials and Methods). These results indicate that gradual divergence over time is restricted to hippocampal Narrative representations.
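The time binning and the Bonferroni-corrected critical p values quoted above (0.05/8 = 0.00625 for the Narrative analysis; 0.05/6 ≈ 0.008 for the Character analysis) can be reproduced with a short sketch. The function names and the run length of 803 volumes are hypothetical; dropping leftover volumes at the end of a run is likewise an assumption made for this example. Only the p values for T7 and T8 are taken from the text.

```python
def equal_time_bins(n_volumes, n_bins):
    """Split volume indices into n_bins contiguous bins of equal size.

    Leftover volumes at the end of the run are dropped (an assumption
    made for this sketch).
    """
    size = n_volumes // n_bins
    return [range(i * size, (i + 1) * size) for i in range(n_bins)]


def bonferroni_threshold(alpha, n_tests):
    """Per-test critical p value under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical run length, used only to illustrate the binning.
bins = equal_time_bins(n_volumes=803, n_bins=8)

# Post hoc p values reported in the text for the last two Narrative bins.
reported_p = {"T7": 0.023, "T8": 0.001}
critical_p = bonferroni_threshold(0.05, 8)
significant = {name: p < critical_p for name, p in reported_p.items()}
```

Under this threshold only T8 survives correction, matching the description of T7 as approaching significance.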
Discussion
We hypothesized that a general narrative context may be used to organize our memories into networks of related events. Here, participants watched a movie (as per Hasson et al., 2008), which consisted of two interleaved narratives. Movies are ideally suited to stimulate neural mechanisms underlying episodic memory formation and segregation into narrative-specific contexts because they consist of multiple individual events bound together into a narrative. The events themselves are, in turn, composed of individual elements that are repeated across different events. We hypothesized that watching a movie with two interleaved narratives would give rise to partially overlapping networks of narrative-specific event representations as well as nodal representations of individual elements, such as characters and locations, which are repeated across events. We combined this paradigm with fMRI and across-voxel correlations to investigate the emergence of narrative-specific representations over time.
Our results indicate that hippocampal neural activity patterns can be used to differentiate between specific locations and characters in the movie, which is in accordance with the role of the hippocampus in binding items to context (Davachi, 2006; Eichenbaum et al., 2007; Milivojevic and Doeller, 2013). However, our results on character-specific representations may not appear entirely consistent with studies which suggest that the hippocampus is not sensitive to individual items (e.g., Copara et al., 2014; Libby et al., 2014; Hsieh et al., 2014; but see Schlichting et al., 2015; Aly and Turk-Browne, 2016). What could account for such differences? First, there is a large difference between the stimulus material used in the current study and that used in previous studies. Here, we used dynamic stimuli where each “item” was presented multiple times in different spatiotemporal contexts, and these multiple presentations may have given rise to abstraction of the items from the contexts. In contrast, the studies cited above used static images that were either decontextualized or repeated within the same or different contexts. We propose that our findings may reflect hierarchical organization within memory (McKenzie et al., 2014; Collin et al., 2015), whereby items that appear within individual events may create item-specific “context” in the form of connected networks of events, with the specific items (characters and locations) resembling nodes of such mnemonic networks, which can be spatial as well as nonspatial in nature (Eichenbaum et al., 1999; Eichenbaum and Cohen, 2001; Kumaran and Maguire, 2006). Thus, activity patterns in the hippocampus seem to represent the “essence” of an item in memory, abstracted, in a sense, from individual events, and generalized across different events in which that item occurs (Kumaran and Maguire, 2006), instead of representing that item in a particular context only. The nodal representations may serve as a form of context as well. That may certainly be the case with spatial-like context for locations, but perhaps characters may also serve as a context within a broader mnemonic hierarchy (Milivojevic and Doeller, 2013).
Here we also showed that events that belonged to the same narrative are represented with similar activity patterns in the hippocampus. These results suggest that the hippocampus also codes for those aspects of individual events that are in common with other events belonging to the same narrative, and may well reflect networks of events related through individual characters or locations that are more prevalent in one or the other narrative. Thus, the hippocampus codes for the underlying narrative context. These results dovetail with the findings that the hippocampus is involved in contextual representations in both spatial (O'Keefe and Nadel, 1978; Moser et al., 2008; Libby et al., 2014) and temporal (Howard and Kahana, 2002; Copara et al., 2014; Eichenbaum, 2014; Ezzyat and Davachi, 2014; Hsieh et al., 2014) domains. If narrative context indeed reflects networks of events related through common memory features (such as characters and locations) (Eichenbaum et al., 1999; Eichenbaum and Cohen, 2001), then these contextual representations have more in common with spatial contextual representations or spatial maps (O'Keefe and Nadel, 1978) because both arise as a consequence of multiple individual experiences (i.e., events) with particular elements, such as locations or characters (Buzsáki and Moser, 2013). In contrast, temporal context depends on the timing of a unique event and, as such, does not depend on multiple experiences.
We propose that this type of narrative-based contextual representation may serve to organize episodic memories into networks of related events, unrestricted by space or time, and may be the neural mechanism underlying autobiographical narrative construction (Conway and Pleydell-Pearce, 2000; Schacter et al., 2007; Spreng et al., 2009; Milivojevic et al., 2015). The data suggest that the patterns of hippocampal neural activity can differentiate between temporally proximal events that belong to separate narrative contexts, which is consistent with the observation that hippocampal neural patterns can remap when an animal moves from one spatial context to another (Muller and Kubie, 1987; Wills et al., 2005; Colgin et al., 2008). This similarity between context-dependent remapping in the spatial domain and the observed narrative effect may reflect common coding of spatial and episodic memories in the hippocampus (Eichenbaum and Cohen, 2014) and serve as the neural mechanism underlying narrative divergence in the hippocampus.
With time, these narrative-level contextual representations become increasingly dissimilar in the hippocampus. This pattern stands in contrast to the representational contents in visually responsive regions where the dissimilarities between the narratives were greater early in the film and decreased over time. This pattern is also in contrast to the hippocampus-based differentiation of Characters, further supporting the notion that Character and Narrative effects are independent of each other. One possibility is that the increased pattern similarity reflects participants' need to retrieve mnemonic representations during movie watching, which enables them to “place” the current event into one or the other narrative. Another possibility is that the narrative divergence in the hippocampus resembles gradual divergence of spatial contextual representations in the hippocampus over time (Lever et al., 2002). Namely, with increased exposure to spatial contexts, initially similar place-cell firing patterns for visually dissimilar environments gradually diverge over time (Lever et al., 2002). Such context-dependent remapping mechanisms in rodents are thought to arise as a consequence of pattern separation processes in the hippocampus (McNaughton and Morris, 1987; Muller and Kubie, 1987; Bakker et al., 2008; Yassa and Stark, 2011; Duncan et al., 2012; Deuker et al., 2014), which are also thought to play an important role in separation of unique episodic memories in humans (Chadwick et al., 2010).
These two explanations are not necessarily mutually exclusive. The separation of narrative-contextual representations may also rely on pattern-separation mechanisms (McNaughton and Morris, 1987; Yassa and Stark, 2011) in the hippocampus, to gradually disambiguate initially undifferentiated contextual representations as mnemonic representations of events in the two storylines are encoded into two separate contexts. In contrast, visual dissimilarities, which may be important for narrative differentiation before narrative contexts are established in memory, may cease to be relevant once hippocampal representations of the narratives become stronger.
It should also be noted that we consider the Lever et al. (2002) study as an analogy for the narrative remapping process. There are clearly many differences between their experimental protocol and ours. The contrast between narrative context and spatial context is one, but there are also bound to be differences between contextual differentiation in humans and in rodents. It is conceivable that episodic memory processes are simply faster, but the time course of place cell remapping can also depend on many factors (e.g., differences between the environments, shape only, or shape and odor), which may lead to faster remapping (compare Wills et al., 2005). Furthermore, we used a movie, which represents a much longer temporal entity and which has been edited (the way one would “edit” personally experienced events into a personal narrative). It is therefore possible that the narratives that are “prepared” for us by authors or directors also hyperstimulate the mechanisms that are normally used to establish distinct neural representations of narrative contexts.
In conclusion, we reveal two distinct types of hippocampal representations. We showed that the hippocampus codes for nodal representations (Eichenbaum et al., 1999) where activity patterns represent the “essence” of an item in memory, which is common across different events featuring that item. We also showed that, in addition to item-specific nodal representations within a narrative, the hippocampus codes for the entire narrative, which may reflect networks of events related through the nodal representations, which feature with relatively different frequencies in each of the two narratives. In combination, the evidence of both item-specific and narrative-specific representations in the human hippocampus suggests that human episodic memories may be subject to hierarchical organization (Kumaran et al., 2012; McKenzie et al., 2014) and may answer the outstanding question of how the brain can simultaneously support the seemingly conflicting operations of individuating and flexibly recombining memories (Eichenbaum et al., 1999; Collin et al., 2015; Milivojevic et al., 2015). Conceptually, narrative-level representation is similar to other forms of contextual representations, such as temporal and spatial contexts, and similarly to spatial contextual representations, the representations of different narrative contexts diverge over time. The neural mechanisms that subserve the narrative context formation shown here may be involved in the organization of memories of related autobiographical events into personal narratives (Conway and Pleydell-Pearce, 2000; Schacter et al., 2007; Spreng et al., 2009).
Footnotes
This work was supported by European Research Council ERC-StG RECONTEXT 261177 and The Netherlands Organisation for Scientific Research NWO-Vidi 452-12-009. We thank Sander Bosch for comments on the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to either Dr. Branka Milivojevic or Dr. Christian F. Doeller, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands. b.milivojevic{at}donders.ru.nl or christian.doeller{at}donders.ru.nl