Confronting the elephant in the room – verbal paired associates and the hippocampus

Abstract


Introduction
The field of hippocampal neuroscience is characterized by vigorous debates. But one point on which there is wide agreement is that people with bilateral hippocampal damage and concomitant amnesia (hippocampal amnesia) are significantly impaired on verbal paired associates (VPA) tasks. For example, in tests like the widely-used Wechsler Memory Scale (WMS-IV; Wechsler, 2009) the requirement is to encode pairs of words (e.g., bag-truck), memory for which is then tested. The reliable deficit observed in hippocampal amnesia means the VPA has become emblematic of hippocampal function.
One theory explains the VPA findings by positing that the hippocampus binds arbitrary relations among individual elements (Cohen and Eichenbaum, 1993). However, this associative perspective is at odds with other accounts. The cognitive map theory, for instance, suggests that the hippocampus specifically supports flexible, allocentric representations of spatial relationships (O'Keefe and Nadel, 1978). The scene construction theory (see also the emergent memory account; Graham et al., 2010), meanwhile, proposes that the anterior hippocampus constructs models of the world in the form of spatially coherent scenes (Hassabis and Maguire, 2007; Maguire and Mullally, 2013; Zeidman and Maguire, 2016). These latter two theories do not explain why learning of VPA is invariably compromised following hippocampal damage. Indeed, for decades, this has been the elephant in the room of hippocampal theories with a visuospatial bias.
Resolving the tension among hippocampal theories concerning the VPA could be important for leveraging a fuller understanding of hippocampal function. In taking this issue forwards, it is worthwhile first to step back. Examination of the words used in typical VPA tests shows the vast majority are highly imageable. As it stands, therefore, when using VPA tests, associative processes and imageability are conflated. One way to deal with this is to examine non-imageable abstract word pairs, which would assess binding without imageability, but these rarely feature in VPA tests used with patients or in neuroimaging. In addition, different types of imageable words are not distinguished in VPA tests. However, the scene construction theory links the anterior hippocampus specifically with scene imagery (Zeidman and Maguire, 2016; Dalton and Maguire, 2017), while the processing of single objects is usually associated with perirhinal and lateral occipital cortices (Malach et al., 1995; Murray et al., 2007). It could therefore be that a scene word in a pair engages the hippocampus and not binding per se. It has also been suggested that even where each word in a pair denotes an object, this might elicit imagery of both objects together in a scene and this is what recruits the hippocampus (Maguire and Mullally, 2013; Clark and Maguire, 2016).
To determine why VPA engages the hippocampus, we devised an fMRI encoding task with three types of word pairs: where both words in a pair denoted Scenes, where both words represented single Objects, and where both words were non-imageable Abstract words. This allowed us to separate imageability from binding, and to examine different types of imagery.
Memory tests after scanning meant that we could also consider the effect of encoding success.
Given that people vary in their use of mental imagery (Marks, 1973; Kosslyn et al., 1984; McAvinue and Robertson, 2007), we also tested groups of high, mid and low imagery users to assess whether this affected hippocampal engagement during VPA encoding.
We hypothesised that anterior hippocampal activity elicited during word pair encoding would be apparent for Scene word pairs and Object word pairs compared to Abstract word pairs. This would be best explained by the use of scene imagery, even for Object word pairs, and the effect would be most apparent in high imagery users. Furthermore, we predicted that neither associative processing nor memory performance would account for the patterns of hippocampal activity observed. As such, our main anatomical focus was the hippocampus, and of particular interest were the Object word pairs and their relationship with scene imagery.

Participants
Forty-five participants took part in the fMRI study. All were healthy, right-handed, proficient in English and had normal or corrected-to-normal vision. Each participant gave written informed consent. The study was approved by the University College London Research Ethics Committee. Participants were recruited on the basis of their scores on the Vividness of Visual Imagery Questionnaire (VVIQ; Marks, 1973). The VVIQ is a widely-used self-report questionnaire which asks participants to bring images to mind and rate them on a 5 point scale as to their vividness (anchored at 1: "perfectly clear and as vivid as normal vision", and 5: "No image at all, you only 'know' that you are thinking of the object"). Therefore, a high score on the VVIQ corresponds to low use of visual imagery. We required three groups for our fMRI study (n=15 in each): low imagery users, mid imagery users and high imagery users. Initially, 184 people completed the VVIQ. Fifteen of the highest and 15 of the lowest scorers made up the low and high imagery groups, respectively. A further 15 mid scorers served as the mid imagery group.
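The grouping logic can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: the scores are simulated, the standard 16-item form of the VVIQ is assumed (totals of 16 to 80), the extreme groups are taken as exactly the 15 top and bottom totals, and the mid group is taken from the centre of the ranked list. The function name `split_imagery_groups` is hypothetical.

```python
import random


def split_imagery_groups(scores, group_size=15):
    """Split respondents into low, mid and high imagery groups by VVIQ total.

    `scores` maps a participant id to a VVIQ total. With the anchoring used
    here (1 = vivid, 5 = no image), a HIGH total means LOW imagery use.
    Assumption: the mid group comes from the centre of the ranked list.
    """
    if len(scores) < 3 * group_size:
        raise ValueError("not enough respondents for three groups")
    ranked = sorted(scores, key=scores.get)   # ascending VVIQ total
    high_imagery = ranked[:group_size]        # lowest totals
    low_imagery = ranked[-group_size:]        # highest totals
    start = (len(ranked) - group_size) // 2
    mid_imagery = ranked[start:start + group_size]
    return {"low": low_imagery, "mid": mid_imagery, "high": high_imagery}


# Simulated data only: 184 respondents, 16 items each rated 1 to 5.
rng = random.Random(0)
scores = {f"p{i:03d}": sum(rng.randint(1, 5) for _ in range(16)) for i in range(184)}
groups = split_imagery_groups(scores)
```

Sorting on the total keeps the inverse relation between score and imagery use explicit, which is the point most easily confused when reading VVIQ results.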
The groups did not differ significantly on age, gender, years of education and general intellect.
Table 1 provides details of the three groups.
Table 1 note. Values are means (standard deviations). P-values are two-tailed and come from t-tests (χ² test for the number of males). General intellect was measured using the Matrix Reasoning subtest (scaled scores) of the Wechsler Adult Intelligence Scale-IV (WAIS-IV; Wechsler, 2008) and the Test of Premorbid Function (TOPF; Wechsler, 2011) providing an estimate of Full Scale IQ (FSIQ) and a Verbal Comprehension Index (VCI). VVIQ=Vividness of Visual Imagery Questionnaire.

Stimuli
To ensure that any fMRI differences were due to our imagery manipulation and not other word properties, the word conditions were highly matched. Six hundred and fifty four words were required for the study: 218 Scene words, 218 Object words and 218 Abstract words. Words were initially sourced from databases created by Brysbaert and colleagues, which provided ratings for concreteness, word frequency, age of acquisition, valence and arousal (Kuperman et al., 2012; Warriner et al., 2013; Brysbaert et al., 2014; van Heuven et al., 2014). It was important to control for valence and arousal given reports of higher emotional ratings for abstract words, which could influence fMRI activity (Kousta et al., 2011; Vigliocco et al., 2014). We also used data from the English Lexicon Project (Balota et al., 2007) to provide lexical information about each word: word length, number of phonemes, number of syllables, number of orthographic neighbours and number of phonological and phonographic neighbours with and without homophones.
To verify that each word induced the expected imagery (i.e., scene imagery, object imagery or very little/no imagery for the abstract words), we collected two further ratings for each word. First, a rating of imageability to ensure that Scene and Object words were not only concrete but also highly imageable, and additionally that Abstract words were low on imageability (concreteness and imageability are often used interchangeably, but although they are highly related constructs they are not the same; Paivio et al., 1968). Second, a decision was elicited about the type of imagery the word brought to mind, i.e., was the imagery of a scene or an isolated object. These ratings were collected from 119 participants in total using Amazon Mechanical Turk's crowdsourcing website, following the procedures employed by Brysbaert and colleagues for the databases described above. Words were classified as a Scene or Object word when there was a minimum of 70% agreement on the type of imagery brought to mind, and the mean imageability rating was greater than 3.5 (out of 5). For Abstract words, the mean imageability had to be less than or equal to 2.
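The classification rule can be written out as a short sketch. The input format (counts of raters reporting scene or isolated-object imagery) and the function name `classify_word` are hypothetical, and agreement is assumed to be computed among the raters who made the imagery-type decision.

```python
def classify_word(scene_votes, object_votes, mean_imageability):
    """Return 'Scene', 'Object', 'Abstract' or None (word not usable).

    scene_votes / object_votes: number of raters reporting scene or
    isolated-object imagery for the word (hypothetical input format).
    mean_imageability: mean rating on the 1 to 5 scale.
    """
    # Abstract words only need to satisfy the imageability criterion.
    if mean_imageability <= 2.0:
        return "Abstract"
    total = scene_votes + object_votes
    # Imageable words need a high rating AND at least 70% agreement on type.
    if mean_imageability > 3.5 and total > 0:
        if scene_votes / total >= 0.70:
            return "Scene"
        if object_votes / total >= 0.70:
            return "Object"
    return None
```

Words falling between the two imageability cut-offs, or without clear agreement on imagery type, return `None` and would simply not be used.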
An overview of the word properties is shown in Table 2. This also includes summary comparison statistics. Scene, Object and Abstract words were matched on 13 out of the 16 measures. Scene and Object words were matched on all 16 measures, whereas Abstract words, as expected, were less concrete and less imageable than Scene and Object words and had a higher age of acquisition, as is normal for abstract words (Stadthagen-Gonzalez and Davis, 2006; Kuperman et al., 2012). As well as being matched at the overall word type level as shown in Table 2, within each word type, words were assigned to one of four lists (word pairs, single words, catch trials or post-scan memory test lures), and all lists were matched on all measures.
Table 2 notes. Values are means (standard deviations). P-values are two-tailed and come from t-tests (χ² test for the number of positive words). Note that each comparison was assessed separately in order to provide a greater opportunity for any differences between conditions to be identified.
b From van Heuven et al. (2014). The Zipf scale is a standardised measure of word frequency using a logarithmic scale. Values go from 1 (low frequency words) to 6 (high frequency words).
e Positive words were those that had a valence score greater than or equal to 5.
f Hedonic valence is the distance from neutrality (i.e., from 5), regardless of being positive or negative, as per Vigliocco et al. (2014).
h Collected for the current study as detailed in the Materials and Methods.
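The derived measures in notes b, e and f can be made concrete with a small sketch. The function names are hypothetical. The Zipf conversion assumes the definition in van Heuven et al. (2014), namely log10 of the frequency per million words plus 3, and the neutral point of 5 is taken from notes e and f.

```python
import math

NEUTRAL_VALENCE = 5.0  # neutral point referred to in notes e and f


def zipf_from_per_million(freq_per_million):
    """Zipf value, assuming Zipf = log10(frequency per million words) + 3."""
    return math.log10(freq_per_million) + 3.0


def is_positive(valence):
    """Note e: a word counts as positive if its valence is at least 5."""
    return valence >= NEUTRAL_VALENCE


def hedonic_valence(valence):
    """Note f: distance from neutrality, regardless of sign."""
    return abs(valence - NEUTRAL_VALENCE)
```

Under this definition a word occurring once per million words has a Zipf value of 3, and words with valence 7.2 and 2.8 have the same hedonic valence even though only the first counts as positive.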

Experimental design and task
The fMRI task consisted of two elements, the encoding task and catch trials. The latter were included to provide an active response element and to encourage concentration during the experiment. To match the WMS-IV Verbal Paired Associate Test (Wechsler, 2009), each stimulus was presented for 4 seconds. This was followed by a jittered baseline (a central fixation cross) for between 2 and 5 seconds which aided concentration by reducing the predictability of stimulus presentation (Figure 1D). The scanning session was split into four runs of approximately equal length. Trials were presented randomly for each participant with no restrictions on what could precede or follow each trial.
Unbeknownst to participants, there were six categories of stimuli: imageable Scene words, imageable Object words and non-imageable Abstract words, shown either in pairs of the same word type (Figure 1A) or as single words (Figure 1B). To equalise visual presentation between the word pairs and the single words, the latter were presented with a random letter string that did not follow the rules of the English language and did not resemble real words (Figure 1B).
The average, minimum and maximum length of the letter strings was matched to the real words.
Letter strings could either be presented at the top or the bottom of the screen. There were 45 trials of each condition, with each word shown only once to the participant. Our prime interest was in the word pair conditions, and in particular the Object word pairs, as these related directly to our research question. The single word conditions were included for the purposes of specific analyses, which are detailed in the Results section.
Participants were asked to try and commit the words to memory for later memory tests, and were specifically instructed that they would be asked to recall the word pairs as pairs. No further instructions about how to memorise the stimuli were given (i.e., we did not tell participants to use any particular strategy). Participants were told that occasionally there would be catch trials where they had to indicate using a button press if they saw a real word presented with a 'pseudoword' (Figure 1C). A pseudoword is a combination of letters that resembles a real English word and follows the rules of the English language, but is not an actual real word.
Pseudowords were generated using the English Lexicon Project (Balota et al., 2007) and were paired with Scene, Object or Abstract words. They were presented at either the top or the bottom of the screen to ensure that participants attended to both. The number of letters and orthographic neighbours of the pseudowords were matched to all of the real word conditions and across the three pseudoword groups (all p's > 0.3). Additionally, across the pseudoword groups we matched the accuracy of pseudoword identification (all p's > 0.6) as reported in the English Lexicon Project (Balota et al., 2007). Forty eight catch trials were presented over the course of the experiment, 16 trials with each of the word types, ranging between 10 and 15 in each of the four runs. Catch trials were pseudo-randomly presented to ensure regular presentation but not in a predictable manner. Feedback was provided at the end of each run as to the number of correctly identified pseudowords and incorrectly identified real words.

Post-scan recognition memory tests
Following scanning, participants had two recognition memory tests. The first was an item recognition memory test for all 405 words presented during scanning (45 words for each of three single word types, and 90 words for each of three paired word types) and a further 201 foils (67 of each word type). Each word was presented on its own in the centre of the screen for up to 5 seconds. Words were presented randomly in a different order for each participant.
Participants had to indicate for each word whether they had seen it in the scanner (old) or not (new). Following this, they rated their confidence in their answer on a 3 point scale: high confidence, low confidence or guessing. Any trials where a participant correctly responded "old" and then indicated they were guessing were excluded from subsequent analyses.
After the item memory test, memory for the pairs of words was examined. This associative memory test presented all of the 135 word pairs shown to participants in the scanner and an additional 66 lure pairs (22 of each type), one pair at a time, for up to 5 seconds. The word pairs were presented in a different random order for each participant. The lure pairs were constructed from the single words that were presented to the participants in the scanner.
Therefore, the participants had seen all of the words presented to them in the associative recognition memory test, but not all were previously in pairs, specifically testing whether the participants could remember the correct associations. Participants were asked to indicate whether they saw that exact word pair presented to them in the scanner (old) or not (new). They were explicitly told that some pairs would be constructed from the single words they had seen during scanning and not to make judgements solely on individual words, but to consider the pair itself. Confidence ratings were obtained in the same way as for the item memory test, and trials where a participant correctly responded "old" and then indicated they were guessing were excluded from subsequent analyses.
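The exclusion rule for guessed responses can be sketched as follows. The trial format and the function name `score_recognition` are hypothetical, and the sketch assumes that excluded trials are simply dropped before percent correct is computed, which the text does not spell out.

```python
def score_recognition(trials):
    """Percent correct after dropping guessed hits.

    Each trial is a dict with hypothetical keys:
      'studied'    - True if the word or pair was shown in the scanner
      'response'   - 'old' or 'new'
      'confidence' - 'high', 'low' or 'guess'
    """
    # Drop trials where a correct "old" response was rated as a guess.
    kept = [
        t for t in trials
        if not (t["studied"] and t["response"] == "old" and t["confidence"] == "guess")
    ]
    if not kept:
        return float("nan")
    # A response is correct when "old" is given to studied items only.
    correct = sum((t["response"] == "old") == t["studied"] for t in kept)
    return 100.0 * correct / len(kept)


example_trials = [
    {"studied": True, "response": "old", "confidence": "high"},    # hit, kept
    {"studied": True, "response": "old", "confidence": "guess"},   # guessed hit, dropped
    {"studied": True, "response": "new", "confidence": "low"},     # miss
    {"studied": False, "response": "new", "confidence": "high"},   # correct rejection
    {"studied": False, "response": "old", "confidence": "low"},    # false alarm
]
example_score = score_recognition(example_trials)
```

In this toy example four trials survive the exclusion and two of them are correct, so the score sits exactly at the 50% chance level.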

Debriefing
On completion of the memory tests, participants were asked about their strategies for memorising the words while they were in the scanner. At this point, the participants were told about the three different types of words presented to them: Scenes, Objects and Abstract. For each word type, and separately for single words and word pairs, participants were presented with reminders of the words, and were asked to choose from a list of options as to which strategy best reflected how they attempted to memorise that word type. Options included: "I had a visual image of a scene related to this type of single word" (scene imagery), "I had a visual image of a single entity (e.g. one specific object) for a word with no other background imagery" (object imagery), "I read each word without forming any visual imagery at all" (no imagery).

Statistical analyses of the behavioural data
Stimuli creation and participant group comparisons. Comparisons between word conditions, and between the participant groups, were performed using independent samples t-tests for continuous variables and chi squared tests for categorical variables. Stimuli and groups were considered matched when p > 0.05. Note that each comparison was assessed separately (using t-tests or chi squared tests) in order to provide a greater opportunity for any differences between conditions to be identified.
Main study. Both within and between participants designs were used. The majority of analyses followed a within-participants design, with all participants seeing all word conditions.
Additionally, participants were split into three groups dependent on their VVIQ score allowing for between-participants analyses to be performed.
All data were assessed for outliers, defined as values that were at least 2 standard deviations away from the mean. If an outlier was identified then the participant was removed from the analysis in question (and this is explicitly noted in the Results section). Memory performance for each word condition was compared to chance level (50%) using one sample t-tests. For all within-participants analyses, when comparing across three conditions, repeated measures ANOVAs with follow-up paired t-tests were employed, and for comparison across two conditions paired t-tests were utilised. For between-participants analyses a one-way ANOVA was performed with follow-up independent samples t-tests.
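The outlier criterion can be expressed in a few lines. This is a sketch only: it assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and the function name `flag_outliers` is hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(values, n_sd=2.0):
    """Return indices of values at least n_sd standard deviations from the mean.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    m = mean(values)
    sd = stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - m) >= n_sd * sd]


# Made-up values for ten participants; only the last one is extreme.
example_values = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.10, 0.12, 0.09, 0.45]
example_outliers = flag_outliers(example_values)
```

Because the extreme value itself inflates the mean and the SD, this criterion is fairly conservative in small samples.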
All ANOVAs were subjected to Greenhouse-Geisser adjustment to the degrees of freedom if Mauchly's sphericity test identified that sphericity had been violated. For all statistical tests alpha was set at 0.05. Effect sizes are reported following significant results as Cohen's d for one sample and independent sample t-tests, Eta squared for repeated measures ANOVA and Cohen's d for repeated measures (drm) for paired samples t-tests (Lakens, 2013). All analyses were performed in IBM SPSS statistics v22.
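As an illustration of the paired-samples effect size, the sketch below computes drm following the formula given by Lakens (2013): the mean difference divided by the standard deviation of the differences, multiplied by sqrt(2(1 - r)), where r is the correlation between the two conditions. The one-sample d is included for comparison. Function names are hypothetical, and the analyses themselves were run in SPSS rather than with this code.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cohens_d_one_sample(values, reference):
    """Cohen's d for a one sample t-test against a reference value (e.g. chance)."""
    return (mean(values) - reference) / stdev(values)


def cohens_d_rm(x, y):
    """Cohen's d for repeated measures, following Lakens (2013)."""
    diffs = [a - b for a, b in zip(x, y)]
    sd_x, sd_y = stdev(x), stdev(y)
    r = pearson_r(x, y)
    # SD of the difference scores, written in terms of the two SDs and r.
    sd_diff = sqrt(sd_x ** 2 + sd_y ** 2 - 2 * r * sd_x * sd_y)
    return mean(diffs) / sd_diff * sqrt(2 * (1 - r))
```

The correction factor sqrt(2(1 - r)) removes the dependence of the effect size on the correlation between conditions, which makes drm comparable with d from between-participants designs.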

Scanning parameters and data pre-processing
T2*-weighted echo planar images (EPI) were acquired using a 3T Siemens Trio scanner (Siemens Healthcare, Erlangen, Germany) with a 32-channel head coil. fMRI data were acquired over four scanning runs using scanning parameters optimised for reducing susceptibility-induced signal loss in the medial temporal lobe: 48 transversal slices angled at -30°, TR=3.36 s, TE=30 ms, resolution=3×3×3 mm, matrix size=64×74, z-shim gradient moment of -0.4 mT/m ms (Weiskopf et al., 2006). Fieldmaps were acquired with a standard manufacturer's double echo gradient echo field map sequence (short TE=10 ms, long TE=12.46 ms, 64 axial slices with 2 mm thickness and 1 mm gap yielding whole brain coverage; in-plane resolution 3×3 mm). After the functional scans, a 3D MDEFT structural scan was obtained with 1 mm isotropic resolution (Deichmann et al., 2004).
Preprocessing of data was performed using SPM12 (www.fil.ion.ucl.ac.uk/spm). Functional images were co-registered to the structural image, and then realigned and unwarped using field maps. The participant's structural image was segmented and spatially normalised to a standard EPI template in MNI space with a voxel size of 3×3×3 mm and the normalisation parameters were then applied to the functional data. For the univariate analyses, the functional data were smoothed using an 8 mm full-width-half-maximum Gaussian kernel. The multivariate analyses used unsmoothed data.
The multivariate analysis was performed on a region of interest (ROI) that encompassed the anterior hippocampus bilaterally. This was delineated using an anatomical mask that was defined in the coronal plane and went from the first slice where the hippocampus can be observed in its most anterior extent (see Dalton et al., 2017 for more details) until the final slice of the uncus.

fMRI analysis: univariate
The six experimental word conditions were Scene, Object and Abstract words, presented as either word pairs or single words. As noted above, our prime interest was in the word pair conditions, and in particular the Object word pairs, as these related directly to our research question. We therefore directly contrasted fMRI BOLD responses between these conditions.
The single word conditions were included for the purposes of specific analyses, which are detailed in the Results section. We performed two types of whole brain analysis, one using all of the trials and another using only trials where the items were subsequently remembered.
For both analyses, the GLM consisted of the word condition regressors convolved with the haemodynamic response function, in addition to participant-specific movement regressors and physiological noise regressors. The Artifact Detection Toolbox (http://www.nitrc.org/projects/artifact_detect/) was used to identify spikes in global brain activation and these were entered as a separate regressor. Participant-specific parameter estimates for each regressor of interest were calculated for each voxel. Second level random effects analyses were then performed using one sample t-tests on the parameter estimates. For comparison across VVIQ imagery groups, we performed an ANOVA with follow-up independent sample t-tests. We report results at a peak-level threshold of p < 0.001 whole-brain uncorrected for our a priori region of interest, the hippocampus, and p < 0.05 family-wise error (FWE) corrected elsewhere.

fMRI analysis: multivariate
Multivoxel pattern analysis was used to test whether the neural representations of the Object word pairs were more similar to those of Scene single words or Object single words [...]. Repeated measures ANOVA and paired t-tests were used to compare the similarity between conditions at the group level. This multivariate analysis was first applied to the data from all participants, and then to the three subsets of participants (low, mid and high imagery users). All data were assessed for outliers, defined as values that were at least 2 standard deviations away from the group mean.
If an outlier was identified then the participant was removed from the analysis in question (and this is explicitly noted in the Results section).
Note that the absolute correlation of the similarity value is expected to be low due to inherent neural variability and the fact that a unique set of words was presented for each scanning session. As such, the important measure is the comparison of the similarity value between the conditions, not the absolute similarity value of a single condition. The range of similarity values that we found is entirely consistent with those in other studies employing a similar representational similarity approach in a variety of learning, memory and navigation tasks in a wide range of brain regions (Staresina et al., 2012; Hsieh et al., 2014; Chadwick et al., 2015; Hsieh and Ranganath, 2015; Milivojevic et al., 2015; Bellmund et al., 2016; Deuker et al., 2016; Schapiro et al., 2016; Schuck et al., 2016; Kim et al., 2017).

Behavioural
On average, participants identified 85.56% (SD=11.52) of the pseudowords during catch trials, showing that they maintained concentration during the fMRI experiment. On the post-scan item memory test, Scene, Object and Abstract words were remembered above chance and there were no differences between the conditions (Table 3). Performance on the associative memory test also showed that Scene, Object and Abstract word pairs were remembered above chance (Table 4). Comparison of memory performance across the word types found differences in performance in line with the literature (Paivio, 1969). Both types of imageable word pairs (Scene and Object) were remembered better than Abstract word pairs (Figure 2; Table 4), while Object word pairs were remembered better than Scene word pairs. Overall these behavioural findings show that, despite the challenging nature of the experiment, with so many stimuli to encode, participants engaged with the task and committed a good deal of information to memory.

fMRI
Univariate analysis. We performed two types of whole brain analysis, one using all of the trials and another using only trials where the items were subsequently remembered in the post-scan memory tests (the item memory test for the single word trials, the associative memory test for the word pairs, excluding trials where participants correctly responded "old" and then indicated they were guessing). The two analyses yielded very similar results across the whole brain.
Given that our interest was in the point at which participants were encoding the information and potentially using mental imagery to do so, we present here the results of the analysis using all of the trials. For completeness, we also report the results of the analysis using just the remembered stimuli specifically for our brain region of primary interest, the hippocampus.
We first compared the imageable (Scene and Object) and non-imageable (Abstract) word pairs. All of the conditions involved associative memory, and so we reasoned that any differences we observed, particularly in hippocampal engagement, would be due to the imageability of the Scene and Object pairs. As predicted, Scene word pairs compared to Abstract word pairs elicited greater bilateral anterior hippocampal activity (Figure 3A, full details in Table 5A). Of note, increased activity was also observed in bilateral parahippocampal, bilateral fusiform, retrosplenial and left ventromedial prefrontal cortices (vmPFC). The analysis using only the remembered stimuli showed very similar results, including for the anterior hippocampus (Table 5A). The reverse contrast identified no hippocampal engagement, but rather greater activity in middle temporal cortex (-58, -36, -2, T=6.58), temporal pole (-52, 10, -22, T=6.16) and inferior occipital cortex (40, -68, -14, T=5.59).
Object word pairs compared with the Abstract word pairs also showed greater bilateral anterior hippocampal activity, along with engagement of bilateral parahippocampal cortex, fusiform cortex and vmPFC, with increased anterior hippocampal activity also apparent when just the subsequently remembered stimuli were considered (Figure 3B, Table 5B). The reverse contrast identified no hippocampal engagement, but rather greater activity in middle temporal cortex (-62, -32, -2, T=8) and temporal pole (-54, 10, -18, T=7.12).
Increased anterior hippocampal activity was therefore observed for both Scene and Object word pairs compared to the non-imageable Abstract word pairs. As greater hippocampal engagement was apparent even when using just the remembered stimuli, it is unlikely that this result can be explained by better associative memory or successful encoding for the imageable word pairs. Rather the results suggest that the anterior hippocampal activity for word pair encoding may be related to the use of visual imagery.
Figure 3. The sagittal slice is of the left hemisphere and is taken from the ch2better template brain in MRIcron (Holmes et al., 1998; Rorden and Brett, 2000). The left of the image is the left side of the brain. The coloured bar indicates the t-value associated with each voxel. (A) Scene word pairs > Abstract word pairs. (B) Object word pairs > Abstract word pairs. Images are thresholded at p < 0.001 uncorrected for display purposes.
All of the above contrasts involved word pairs, suggesting that associative binding per se cannot explain the results. However, it could still be the case that binding Abstract word pairs does elicit increased hippocampal activity but at a lower level than Scene and Object word pairs. To address this point, we compared the Abstract word pairs with the Abstract single words, as this should reveal any hippocampal activity related to associative processing of the pairs. No hippocampal engagement was evident for the Abstract word pairs (Table 6), and this was also the case when just the remembered stimuli were considered. This lends support to the idea that the use of visual imagery might be important for inducing hippocampal responses to word pairs.
We also predicted that anterior hippocampal activity would be specifically influenced by the use of scene imagery, as opposed to visual imagery per se. The inclusion of both Scene and Object word pairs offered the opportunity to test this. Scene word pairs would be expected to consistently evoke scene imagery (as both words in a pair represented scenes), while Object word pairs could evoke both object and scene imagery (e.g., object imagery by imagining the two objects without a background context, or scene imagery by creating a scene and placing the two objects into it), thus potentially diluting the hippocampal scene effect. Scene word pairs might therefore activate the anterior hippocampus to a greater extent than Object word pairs. This is indeed what we found, with Scene word pairs evoking greater bilateral anterior hippocampal activity than the Object word pairs (Figure 4, Table 7A). Analysis using just the remembered stimuli gave similar results (Table 7A). Other areas that showed increased activity for the Scene pairs included the retrosplenial and parahippocampal cortices. The reverse contrast examining what was more activated for Object word pairs compared to Scene word pairs found no evidence of hippocampal activity despite better subsequent memory performance for the Object word pairs, and even when just the remembered stimuli were examined (Table 7B). It seems therefore, that the anterior hippocampus may be particularly responsive to scene imagery.

B. Object word pairs > Scene word pairs
Left inferior temporal cortex: peak coordinates (-42, -48, -16), T=7.16
p < 0.001 uncorrected for the hippocampus and p < 0.05 FWE for the rest of the brain.
Figure 4. Brain areas more activated by Scene word pairs than Object word pairs. The sagittal slice is of the left hemisphere and is taken from the ch2better template brain in MRIcron (Holmes et al., 1998; Rorden and Brett, 2000). The left of the image is the left side of the brain. The coloured bar indicates the t-value associated with each voxel. Images are thresholded at p < 0.001 uncorrected for display purposes.
To summarise, our univariate analyses found that Scene word pairs engaged the anterior hippocampus the most, followed by the Object word pairs, with the Abstract word pairs not eliciting any significant increase in activation (Figure 5). This is what we predicted, and may be suggestive of particular responsivity of the anterior hippocampus to scenes.
Multivariate analysis. We next sought further, more direct, evidence that our condition of main interest, Object word pairs, elicited hippocampal activity via scene imagery. Given our univariate findings and the extant literature (e.g., Zeidman and Maguire, 2016), we focused on an anatomically-defined bilateral anterior hippocampal ROI. We then used multivariate representational similarity analysis (RSA; Kriegeskorte et al., 2008) to compare the neural patterns of activity associated with encoding Object word pairs with Scene or Object single words. The single words were chosen as comparators because they consistently elicit either scene or object imagery (see Materials and Methods).
Three similarity correlations were calculated. First, the similarity between Object word pairs and themselves to provide a baseline measure of similarity (i.e., the correlation of Object word pairs over the 4 runs of the scanning experiment). The two similarities of interest were the similarity between Object word pairs and Scene single words, and the similarity between Object word pairs and Object single words. Two participants showed similarity scores greater than 2 standard deviations away from the mean and were removed from further analysis, leaving a sample of 43 participants.
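The three similarity measures can be illustrated with a toy sketch. It is not the analysis pipeline used here: it assumes that each condition has one voxel pattern per run within the ROI, that similarity is the Pearson correlation between patterns from different runs, and that these correlations are averaged over run pairs. The data are artificial and deliberately constructed so that the Object word pair patterns resemble the Scene single word patterns; all names are hypothetical.

```python
from itertools import permutations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length voxel vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cross_run_similarity(patterns_a, patterns_b):
    """Mean correlation between condition A in one run and B in a different run.

    patterns_a, patterns_b: lists with one voxel vector per run (assumed format).
    """
    n_runs = len(patterns_a)
    return mean([
        pearson_r(patterns_a[i], patterns_b[j])
        for i, j in permutations(range(n_runs), 2)
    ])


def toy_pattern(template, run, shift=0):
    """Template plus a small deterministic run-specific perturbation (toy data)."""
    return [v + 0.02 * ((i + run + shift) % 3) for i, v in enumerate(template)]


# Artificial 8-voxel templates that are uncorrelated with each other.
scene_template = [1, 0, 1, 0, 1, 0, 1, 0]
object_template = [1, 1, 0, 0, 1, 1, 0, 0]
runs = range(4)
object_pairs = [toy_pattern(scene_template, r) for r in runs]
scene_singles = [toy_pattern(scene_template, r, shift=1) for r in runs]
object_singles = [toy_pattern(object_template, r, shift=2) for r in runs]

similarity = {
    "pairs_vs_pairs": cross_run_similarity(object_pairs, object_pairs),
    "pairs_vs_scene_singles": cross_run_similarity(object_pairs, scene_singles),
    "pairs_vs_object_singles": cross_run_similarity(object_pairs, object_singles),
}
```

Correlating only across different runs keeps within-run noise from inflating the similarity values, which is why the pairs-with-themselves measure serves as a sensible baseline.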
Repeated measures ANOVA found a significant difference between the three similarities (F(2,84)=3.40, p=0.038, η²=0.075). As predicted, the neural representations in the anterior hippocampus of Object word pairs were more similar to Scene single words (Figure 6, purple bar) than to Object single words (Figure 6, light green bar; t(42)=2.09, p=0.042, drm=0.21). In fact, representations of Object word pairs were as similar to Scene single words as to themselves (Figure 6, orange bar; t(42)=0.38, p=0.71). Object word pairs were significantly less similar to Object single words than to themselves (t(42)=2.54, p=0.015, drm=0.23). Of note, these results cannot be explained by subsequent memory performance because Scene single words and Object single words were remembered equally well (t(42)=0.68, p=0.50).
Overall, these multivariate results show that within the anterior hippocampus, Object word pairs were represented in a similar manner to Scene single words, but not Object single words. This provides further support for our hypothesis that Object word pairs evoke anterior hippocampal activity when scene imagery is involved.
VVIQ and the use of imagery. As well as examining participants in one large group, as above, we also divided them into three groups based on whether they reported high, mid or low use of imagery on the VVIQ. We found no differences in memory performance among the groups on the word pair tasks (F < 0.4 for all contrasts). Similarly, fMRI univariate analyses involving the word pair conditions revealed no differences in hippocampal activity. Voxel-based morphometry (VBM; Ashburner and Friston, 2000; Mechelli et al., 2005; Ashburner, 2009) showed no structural differences between the groups anywhere in the brain, including in the hippocampus.
Interestingly, however, the imagery groups did differ in one specific way: their strategy for memorising the Object word pairs. While strategy use was similar across the imagery groups for all other word conditions, for the Object word pairs, twice as many participants indicated using a scene imagery strategy in the high imagery group (n = 12/15; 80%) than in the mid or low imagery groups (n = 5/15; 33% and n = 6/15; 40%, respectively). Comparing scene strategy use with other strategy use across the imagery groups revealed this to be a significant difference (χ²(2) = 7.65, p = 0.022).
Given this clear difference in scene imagery use specifically for the Object word pairs, we performed the anterior hippocampus RSA analysis separately for the three imagery groups. We hypothesised that in the high imagery group, Object word pairs would be represented in the hippocampus in a similar manner to Scene single words (as with our whole group analyses) whereas this would not be the case in the mid or low imagery groups. Participants with similarity values greater than 2 standard deviations away from the mean were excluded, resulting in one participant being removed from each group. Importantly, the pattern of scene imagery strategy remained the same even after the removal of these few participants (high imagery group, n = 11/14; mid imagery group, n = 5/14; low imagery group, n = 5/14; χ²(2) = 6.86, p = 0.032).
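The two chi-square statistics for strategy use can be recomputed from the reported counts. The sketch below is an illustration under stated assumptions: it assumes a Pearson chi-square test of independence on a 2 (scene strategy vs other strategy) by 3 (high, mid, low imagery) table without continuity correction, with the "other strategy" counts derived as group size minus scene strategy users. The function names are hypothetical. For exactly two degrees of freedom, the upper-tail p-value has the closed form exp(-χ²/2), so no statistics library is needed.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table.

    table is a list of rows of observed counts; expected counts are
    row_total * column_total / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail p-value of a chi-square statistic; valid for 2 dof only."""
    return math.exp(-statistic / 2.0)


# Rows: scene imagery strategy, other strategy.
# Columns: high, mid, low imagery groups.
full_sample = [[12, 5, 6], [3, 10, 9]]       # 15 participants per group
after_exclusion = [[11, 5, 5], [3, 9, 9]]    # 14 participants per group

for label, table in (("full sample", full_sample),
                     ("after exclusion", after_exclusion)):
    statistic, dof = chi_square_independence(table)
    print(f"{label}: chi2({dof}) = {statistic:.2f}, "
          f"p = {p_value_two_dof(statistic):.3f}")
```

Under these assumptions the script gives χ²(2) = 7.65, p = 0.022 for the full sample and χ²(2) = 6.86, p = 0.032 after exclusion, matching the values reported in the text.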
As predicted, for the high imagery group, Object word pairs were more similar to Scene single words than Object single words (Figure 7A; t(13) = 4.63, p < 0.001, d = 0.78). This was not the case for the mid or low imagery groups (t(13) = 0.472, p = 0.65; t(13) = 0.20, p = 0.85, respectively). Of note, the interaction between the imagery groups was significant (Figure 7B; F(2,39) = 3.53, p = 0.039, η² = 0.15). Independent samples t-tests showed that the difference between the similarities was greater in the high imagery group than in the mid and low imagery groups (t(26) = 2.09, p = 0.046, d = 0.79; t(26) = 2.72, p = 0.011, d = 1.03, respectively). As before, these differences cannot be explained by subsequent memory performance because all three groups showed no differences between the Scene single and Object single words (high imagery group: t(13) = 0.35, p = 0.74; mid imagery group: t(13) = 0.40, p = 0.69; low imagery group: t(13) = 1.18, p = 0.26).
In summary, the neural patterns in anterior hippocampus for Object word pairs showed greater similarity with the Scene single words in the high imagery group, whereas for the mid and low imagery groups this was not the case. This provides further evidence linking the anterior hippocampus with the processing of Object word pairs through scene imagery.
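As a reading aid for the ANOVA effect sizes, the reported η² values are numerically consistent with partial eta squared recovered from the F statistic and its degrees of freedom, η² = (F × df1) / (F × df1 + df2). Whether the authors computed their effect sizes this way is an assumption; the sketch below only checks the arithmetic, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dof."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


# (F, df_effect, df_error) as reported for the ANOVAs in this paper.
reported = {
    "three similarities, whole group": (3.40, 2, 84),
    "imagery group interaction": (3.53, 2, 39),
    "word pair conditions vs baseline": (16.06, 1.69, 74.51),
}

for label, (f_value, df_effect, df_error) in reported.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: eta squared = {eta:.3f}")
```

The three results round to 0.075, 0.15 and 0.27, in line with the values given alongside the corresponding F statistics.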

Discussion
The aim of this study was to understand the role of the hippocampus in encoding verbal paired associates (VPA). There were five findings. First, we observed greater anterior hippocampal activity for imageable word pairs than non-imageable word pairs, highlighting the influence of visual imagery. Second, non-imageable word pairs compared to non-imageable single words revealed no differences in hippocampal activity, adding further support for the significance of visual imagery. Third, increased anterior hippocampal engagement was apparent for Scene word pairs more than Object word pairs, implicating specifically scene imagery. Fourth, for Object word pairs, fMRI activity patterns in the anterior hippocampus were more similar to those for scene imagery than object imagery, further underlining the propensity of the hippocampus to respond to scene imagery. Finally, our examination of high, mid and low imagery users found that the only difference between them was the use of scene imagery for encoding Object word pairs by high imagers, which in turn was linked to scene-related activity patterns in the hippocampus. Overall, our results provide evidence that anterior hippocampal engagement during VPA seems to be closely related to the use of scene imagery, even for Object word pairs.
Previous findings have hinted that visual imagery might be relevant in the hippocampal processing of verbal material such as VPA. Work in patients with right temporal lobectomies, which included removal of some hippocampal tissue, suggested that while memory for imageable word pairs was impaired, memory for non-imageable word pairs was preserved (Jones-Gotman and Milner, 1978). Furthermore, instructing these patients to use visual imagery strategies impaired both imageable and non-imageable word pair performance (Jones-Gotman, 1979). We are unaware of any study examining VPA in patients with selective bilateral hippocampal damage that has directly compared imageable and non-imageable word pairs (Clark and Maguire, 2016). More recent fMRI findings also support a possible distinction in hippocampal engagement between imageable and non-imageable word pairs. Caplan and Madan (2016) investigated the role of the hippocampus in boosting memory performance for imageable word pairs, concluding that imageability increased hippocampal activity. However, greater hippocampal activity for high over low imagery word pairs was only observed at a lenient whole brain threshold (p < 0.01 uncorrected, cluster size ≥ 5), possibly because their low imagery words (e.g., muck, fright) retained quite a degree of imageability. Furthermore, they did not examine the influence of different types of visual imagery on hippocampal engagement.
Our different word types were extremely well matched across a wide range of features, with the abstract words being verified as non-imageable, and the scene and object words as reliably eliciting the relevant type of imagery. Using these stimuli we showed that hippocampal involvement in VPA is not linked to visual imagery in general but seems to be specifically related to scene imagery, even when each word in a pair denoted an object. This supports a prediction made by Maguire and Mullally (2013; see also Clark and Maguire, 2016), who noted that a scene allows us to collate a lot of information in a quick, coherent and efficient manner. Consequently, they proposed that people may automatically use scene imagery during encoding and retrieval of imageable verbal material. For instance, we might visualise the scene within which a story is unfolding, or place the objects described in word pairs in a simple scene together.
If verbal tasks can provoke the use of imagery-based strategies, and if these strategies involve scenes, then patients with hippocampal amnesia would be expected to perform poorly on VPA tasks involving concrete imageable words because they are known to have difficulty with constructing scenes in their imagination (e.g., Hassabis et al., 2007a; Andelman et al., 2010; Race et al., 2011; Mullally et al., 2012; Kurczek et al., 2015). This impairment, which was not apparent for single objects, prompted the proposal of the scene construction theory which holds that scene imagery constructed by the hippocampus is a vital component of memory and other functions (Hassabis and Maguire, 2007; Maguire and Mullally, 2013). Findings over the last decade have since linked scenes to the hippocampus in relation to autobiographical memory (Hassabis et al., 2007b; Hassabis and Maguire, 2007) but also widely across cognition, including perception (Graham et al., 2010; Mullally et al., 2012), future-thinking (Hassabis et al., 2007a; Schacter et al., 2012), spatial navigation (Maguire et al., 2006; Clark and Maguire, 2016) and decision-making (Mullally and Maguire, 2014; McCormick et al., 2016).
Our hippocampal findings were located in the anterior portion of the hippocampus. Anterior and posterior functional differentiation is acknowledged as a feature of the hippocampus, although the exact roles played by each portion are not widely agreed (Moser and Moser, 1998; Fanselow and Dong, 2010; Poppenk et al., 2013; Strange et al., 2014; Ritchey et al., 2015). In particular, our anterior results seem to align with anterior medial hippocampus in the region of the presubiculum and parasubiculum. These areas have been highlighted as being consistently implicated in scene processing (reviewed in Zeidman and Maguire, 2016) and were recently proposed to be neuroanatomically determined to process scenes (Dalton and Maguire, 2017).
An important point to consider is whether our results can be explained by the effectiveness of encoding, as measured in a subsequent memory test. It is certainly true that people tend to recall fewer abstract than concrete words in behavioural studies of memory (Paivio, 1969; Jones, 1974; Paivio et al., 1994). We tested memory for both single words and paired words. Memory performance for scene, object and abstract words was comparable when tested singly. Memory for the word pairs was significantly lower for the non-imageable Abstract word pairs compared to the Scene word pairs and Object word pairs. Nevertheless, performance for all conditions was above chance, which was impressive given the large number of stimuli to be encoded with only one exposure. Increased hippocampal activity was apparent for both Scene word pairs and Object word pairs compared to the Abstract word pairs, both when all stimuli were analysed and when only the subsequently remembered stimuli were analysed. This shows that our results cannot be explained by encoding success.
There is a wealth of research linking the hippocampus with associative binding (e.g., Davachi, 2006; Konkel and Cohen, 2009; Eichenbaum and Cohen, 2014; Schwarb et al., 2015; Rangel et al., 2016). We do not deny this is the case, but suggest that our results provoke a reconsideration of the underlying reason for apparent associative effects. We found that the encoding of non-imageable Abstract word pairs did not elicit an increase in hippocampal activity compared to single abstract words, even when only subsequently remembered stimuli were considered. If binding per se was the reason for hippocampal involvement in our study, then this contrast should have revealed it. We suggest that the role of imageability, particularly involving scenes, has been underestimated or ignored in VPA and other associative tasks despite potentially having a significant influence on hippocampal engagement.
Our participants were self-declared low, mid or high imagery users as measured by the VVIQ. They differed in the degree of scene imagery usage in particular during the encoding of Object word pairs, with high imagers showing the greatest amount, and concomitant hippocampal engagement. Given that scene imagery has been implicated in functions across cognition, it might be predicted that those who are able to use scene imagery well might have more successful recall of autobiographical memories and better spatial navigation. Individual differences studies are clearly required to investigate this important issue in depth, as currently there is a dearth of such work. In the present study, increased use of scene imagery by the high imagery group did not convey a memory advantage for the Object word pairs. However, in the real world, with more complex memoranda like autobiographical memories, we predict that scene imagery would promote better memory.
In conclusion, we showed a strong link between the anterior hippocampus and performance on a VPA task mediated through scene imagery. This offers a way to reconcile hippocampal theories with a visuospatial bias and the memory for verbal material. Moreover, we speculate that this could hint at a verbal memory system in humans piggy-backing on top of an evolutionarily older visual (scene) mechanism. We believe it is likely that other common verbal tests, such as story recall and list learning, which are highly imageable, may similarly engage scene imagery and the anterior hippocampus. Greater use of non-imageable abstract verbal material would seem to be prudent in future verbal memory studies. Indeed, an obvious prediction arising from our results is that patients with selective bilateral hippocampal damage would be better at recalling abstract compared to imageable word pairs, provided care is taken to match the stimuli precisely. Our data do not speak to the issue of whether or not scene construction is the primary mechanism at play within the hippocampus, as our main interest was in examining VPA, a task closely aligned with the hippocampus. What our results show, and we believe in a compelling fashion, is that anterior hippocampal engagement during VPA seems to be best explained by the use of scene imagery.

Figure 1. Example stimuli and trial timeline. (A) Examples of stimuli from each of the word types in the order of (from left to right) Scene word pair, Object word pair, Abstract word pair. (B) Examples of single word trials in the order of (from left to right) Scene single word, Object single word, Abstract single word. Single words were shown with random letter strings (which could be presented at either the top or the bottom) in order to be similar to the visual presentation of the word pairs. (C) Examples of catch trials, where a real word was presented with a pseudoword, which could be presented as either the top or bottom word. (D) Example timeline of several trials.
word pairs were more similar to the Scene single words than the Object single words. For each participant, T-statistics for each voxel in the anatomically defined anterior hippocampal ROI were computed for each condition (Object word pair, Object single word, Scene single word) and in each scanning run. The Pearson correlation between each condition was then calculated as a similarity measure (Object word pair/Object word pair, Object word pair/Scene single word, Object word pair/Object single word). The similarity measure was cross-validated across the different scanning runs to guarantee the independence of each data set. Repeated measures

Figure 2. Memory performance on the associative memory test. Error bars are 1 standard error of the mean. ^ indicates a significant difference from chance (dashed line, 50%) at p < 0.001. Stars show the significant differences across the word pair types; **p < 0.01, ***p < 0.001.

Figure 3. Comparison of imageable Scene or Object word pairs with non-imageable Abstract word pairs. The sagittal slice is of the left hemisphere, taken from the ch2better template brain in MRIcron (Holmes et al., 1998; Rorden and Brett, 2000). The left of the image is the left side of the brain. The coloured bar indicates the t-value associated with each voxel. (A) Scene word pairs > Abstract word pairs. (B) Object word pairs > Abstract word pairs. Images are thresholded at p < 0.001 uncorrected for display purposes.

Figure 5. Comparison of each word pair condition with the fixation cross baseline. Mean beta values extracted from a bilateral anatomical mask of the anterior hippocampus for each of the word pair conditions compared to the central fixation cross baseline. Error bars are 1 standard error of the mean. A repeated measures ANOVA showed significant differences between the conditions (F(1.69,74.51) = 16.06, p < 0.001, η² = 0.27). Follow-up paired t-tests revealed significant differences between the word pair conditions: Scene word pairs vs Object word pairs, t(44) = 2.97, p = 0.005, drm = 0.30; Scene word pairs vs Abstract word pairs, t(44) = 6.46, p < 0.001, drm = 0.70; Object word pairs vs Abstract word pairs, t(44) = 2.51, p = 0.016, drm = 0.34. *p < 0.05, **p < 0.01, ***p < 0.001.

Figure 6. The neural similarity of Object word pairs, Scene single words and Object single words. Object Pair Object Pair: the similarity between Object word pairs between runs. Object Pair Scene Single: the similarity between Object word pairs and Scene single words. Object Pair Object Single: the similarity between Object word pairs and Object single words. Error bars represent 1 standard error of the mean adjusted for repeated measures (Morey, 2008). *p < 0.05.

Figure 7. RSA comparisons of the three imagery groups. (A) The neural similarity of Object word pairs, Scene single words and Object single words when split by self-reported imagery use. Object Pair Scene Single: the similarity between Object word pairs and Scene single words. Object Pair Object Single: the similarity between Object word pairs and Object single words. (B) The difference in similarity between Object word pairs and Scene single words compared to Object word pairs and Object single words in the imagery groups. Error bars represent 1 standard error of the mean. **p < 0.01, ***p < 0.001.

Table 1. Characteristics of the participant groups.

Table 2. Properties of each word type.

Table 3. Performance (% correct) on the post-scan item memory test (non-guessing trials).

Table 4. Performance (% correct) on the post-scan associative memory test (non-guessing

Table 5. Imageable word pairs compared with Abstract word pairs. A.

Table 6. Abstract word pairs compared with Abstract single words.

Table 7. Scene word pairs compared with Object word pairs. A.