Reading memory formation from the eyes

At any time, we are processing thousands of stimuli, but only few of them will be remembered hours or days later. Is there any way to predict which ones? Here, we tested whether the pupil response to ongoing stimuli, an indicator of physiological arousal known to be relevant for memory formation, is a reliable predictor of long‐term memory for these stimuli, over at least 1 day. Pupil dilation was tracked while participants performed visual and auditory encoding tasks. Memory was tested immediately after encoding and 24 hr later. Irrespective of the encoding modality, trial‐by‐trial variations in pupil dilation predicted reliably which stimuli were recalled in the immediate and 24 hr‐delayed tests, in particular for emotionally arousing stimuli. These results show that our eyes may provide a window into the formation of long‐term memories. Furthermore, our findings underline the important role of central arousal systems in the rapid formation of memories in the brain, possibly by gating synaptic plasticity mechanisms in the neocortex.

encoding and mainly at the group level, which precluded an item-specific prediction. Whether the pupil response may indeed forecast the long-term retrieval of individual stimuli remained unknown.
We tested here the hypothesis that the pupil response during encoding of stimuli (and, by inference, phasic elevation of central arousal) predicts trial-by-trial long-term memory formation. In two independent tasks, a visual picture encoding task and an auditory word encoding task, participants encoded either pictures or words while their pupillary responses were tracked with an eye tracker. The use of two tasks allowed us to assess the robustness of the hypothesized predictive value of the pupil response during encoding for subsequent memory. Moreover, because visual stimulation per se already leads to a pupil response, the use of an additional auditory task enabled us to examine the association between pupil dilation during encoding and later memory when any visual artifacts could be ruled out. Memory for the stimuli was tested both immediately after encoding and 24 hr later. Because pupil dilation also reflects emotional arousal (Goldinger & Papesh, 2012; Lempert, Glimcher, & Phelps, 2015) and the well-known memory enhancement for emotional relative to neutral events (Christianson, 1992) is crucial for several psychopathologies, including PTSD (Pitman, 1989; de Quervain, Schwabe, & Roozendaal, 2017), we included neutral and emotionally arousing stimuli to further examine whether pupil dilation may have particular predictive value for emotional memory formation.

| Participants
In all, 54 healthy native speakers of German (age: 18-35 years, M = 25.35 years; 27 women, 27 men) without a history of any neurologic or mental disorders participated in this study. All of them reported normal or corrected-to-normal visual acuity and were naïve to the purpose of the experiment. The sample size was based on an a priori sample size calculation using G*Power (Faul, Erdfelder, Lang, & Buchner, 2007), showing that a sample of 54 participants is required to detect a medium-sized effect of f = 0.25 with a power of 0.95, given an α of 0.05. Due to technical failure, pupil data were missing for three participants in the picture encoding task and for six participants in the word encoding task, thus leaving a sample of 51 and 48 participants, respectively, in the corresponding analyses. All participants provided written informed consent and were paid a moderate monetary compensation for participation. The study protocol conforms to the Declaration of Helsinki and was approved by the ethics committee of the Faculty of Psychology and Human Movement at the University of Hamburg (approval no. 2016_79).

| Apparatus
The experiment was programmed and presented in MATLAB (The MathWorks, Natick, MA) using the Psychophysics Toolbox (Brainard, 1997), in combination with the eyetracking software BeGaze 3.0 (SensoMotoric Systems, SMI). Stimuli were presented on a 24-inch Dell monitor with a resolution of 1,920 × 1,200 pixels and a refresh rate of 60 Hz. Participants sat in a dimly lit, sound-attenuated room with their head in a chin rest at a distance of 60 cm from the screen. Pupil size was monitored in both eyes using a RED250mobile (SMI; sampling rate: 250 Hz). The eye tracker was calibrated applying the 9-point calibration and validation procedure before each of the two encoding tasks.

| Experimental tasks, stimuli, and procedure
After their arrival at the laboratory, participants first completed standard questionnaires to assess their depressive mood, chronic stress level, as well as state and trait anxiety, all of which may affect (emotional) memory processes (see Appendix S1). Next, the eye tracker was calibrated and participants performed two encoding tasks: a picture encoding task and a word encoding task. Task order was counterbalanced across participants.

| Picture encoding task
The stimulus set for the picture encoding task consisted of 150 emotionally neutral and 150 negative pictures taken from the International Affective Picture System (IAPS; Lang, Bradley, & Cuthbert, 1997) and other open online sources (all images are available at https://doi.org/10.5281/zenodo.1246100). Pictures were presented in greyscale and modified in MATLAB so that they all had the same average luminance. During encoding, 75 neutral and 75 negative pictures were randomly chosen from the picture pool and presented in randomized order for 3 s at the center of the screen, against a grey background that was equiluminant to the pictures. While encoding the pictures, participants were requested to memorize the pictures (intentional encoding) and to evaluate the emotionality of the shown picture on a 4-point scale from 0 ("neutral") to 3 ("very negative"). Between pictures, we presented a grey fixation cross for 3-6 s.

| Word encoding task
The stimulus set for the word encoding task consisted of 120 emotionally neutral and 120 negative German nouns. Words were taken from standardized German word data sets (Böcker, Gruber, & Gauggel, 2014; Schwibbe, Räder, Schwibbe, & Geiken-Pophanken, 1981). We created audio files for these words with the help of the software Audacity®. All audio files and a list of used words are available at https://doi.org/10.5281/zenodo.1246100. During encoding, 60 neutral and 60 negative words were randomly chosen from the word pool and presented in randomized order via headphones. While listening to the words, participants looked at a fixation cross shown at the center of the screen with the same grey background as in the picture encoding task and were requested to memorize the words (intentional encoding). The inter-trial interval between the presentations of words varied between 3 and 6 s.

| Immediate and delayed memory testing
Immediately after each of the two encoding tasks as well as 24 hr after the encoding session, participants performed a free recall test, in which they were asked to report verbally as many of the presented pictures and words, respectively, as possible. The experimenter noted the recalled items on a checklist while standing behind the participant, thus preventing any direct feedback. If it was not entirely clear which picture a participant was referring to in the free recall test for the pictures, he/she was asked to provide more details until the recalled picture could be clearly identified. There was no time limit for the free recall tests. To assess the predictive value of the pupil response during encoding for long-term memory, participants completed a second free recall test 24 hr after encoding. The procedure of this delayed memory test was exactly the same as in the immediate free recall test.
After the 24 hr-delayed free recall test, participants completed also recognition tests for the pictures and words. In these tests, participants saw all pictures and words, respectively, that were presented on the first day and an equal number of novel neutral and negative items in randomized order on a computer screen. Participants were requested to indicate for each item whether it had been presented on day 1 ("old") or not ("new"). For items that were identified as "old," participants were further asked to rate on a scale from 1 ("not certain") to 4 ("very certain") how confident they were that the item was indeed "old." Because free recall reflects participants' ability to actually retrieve information better than recognition, free recall appears to be more sensitive to arousal effects than recognition (Bradley, Greenwald, Petry, & Lang, 1992) and recall and recognition appear to rely on distinct encoding mechanisms (Staresina & Davachi, 2006), our analyses focused primarily on the free recall tests. Data for the recognition test are presented in the Appendix S1 and Figure S1.

| Pupil data preprocessing and analyses
The pupil data were preprocessed as described in Urai et al. (2017). Missing data and blinks, as detected by the SMI software, were linearly interpolated. We estimated the effect of blinks and saccades on the pupil response through deconvolution, and removed these responses from the data using linear regression. The residual pupil time series were z-scored per run, and resampled to 50 Hz. We segmented the continuous pupil data into epochs corresponding to experimental trials, and baseline corrected the single-trial data by subtracting the average pupil size in the 2 s before stimulus onset. We then defined pupil responses as the average in 1-3 s after stimulus onset; this window was chosen to take into account the delay of the pupil response (De Gee et al., 2014) and encompass the full presentation duration of the pictures. Statistics on the pupil time course were corrected using cluster-based permutation testing.
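The epoching step described above can be illustrated with a minimal sketch. It assumes a run has already been interpolated, z-scored, and resampled to 50 Hz, and it only covers the baseline correction (mean of the 2 s before onset) and the 1-3 s response window. The function name `pupil_response` and its parameters are hypothetical and are not taken from the authors' MATLAB pipeline.

```python
from statistics import mean


def pupil_response(trace, onset_idx, fs=50, baseline_s=2.0, window_s=(1.0, 3.0)):
    """Baseline-corrected mean pupil size in a post-stimulus window.

    trace     : list of z-scored pupil samples for one run, sampled at fs Hz
    onset_idx : sample index of stimulus onset
    """
    n_base = int(round(baseline_s * fs))               # samples in the baseline
    start = onset_idx + int(round(window_s[0] * fs))   # window start (inclusive)
    stop = onset_idx + int(round(window_s[1] * fs))    # window end (exclusive)
    if onset_idx - n_base < 0 or stop > len(trace):
        raise ValueError("trial does not fit inside the recording")
    # Mean pupil size in the 2 s preceding stimulus onset
    baseline = mean(trace[onset_idx - n_base:onset_idx])
    # Mean pupil size 1-3 s after onset, relative to that baseline
    return mean(trace[start:stop]) - baseline
```

Applied to every trial onset in a run, this yields one scalar pupil response per item, which is the quantity entered into the subsequent memory analyses.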

| Behavioral data analysis
Data from the picture and word recall tasks on days 1 and 2 were quantified as the fraction of recalled stimuli relative to the number of stimuli presented during encoding. Recall performance for neutral and negative stimuli was subjected to paired t-tests. To assess the predictive value of the pupil size during encoding for subsequent memory, we first subjected the data to a subsequent memory analysis, in which we asked whether the pupil size during encoding differed for subsequently remembered and forgotten items. To this end, we subjected the pupil data to an ANOVA with the factors subsequent memory (remembered vs. forgotten) and stimulus emotionality (neutral vs. negative).
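As an illustration of the quantities entering the subsequent memory analysis described above, the following sketch computes the recall fraction and the per-participant cell means of the pupil response for each combination of emotionality and subsequent memory. The trial tuple layout and the function names are assumptions made for this example; the ANOVA itself is not shown.

```python
from collections import defaultdict
from statistics import mean


def recall_fraction(recalled_flags):
    """Fraction of encoded items that were later recalled (flags are 0/1)."""
    return sum(recalled_flags) / len(recalled_flags)


def cell_means(trials):
    """Mean pupil response per (emotionality, remembered) cell.

    trials: iterable of (emotionality, remembered, pupil_response) tuples
            for a single participant (hypothetical layout).
    """
    cells = defaultdict(list)
    for emotionality, remembered, pupil in trials:
        cells[(emotionality, bool(remembered))].append(pupil)
    # One mean per cell; these means would be the input to the 2 x 2 ANOVA
    return {key: mean(values) for key, values in cells.items()}
```

A participant who recalled no item (or every item) in one emotionality category would leave a cell empty; how such cases were handled is not stated in the text, so this sketch simply omits the missing cell.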
To analyze on a trial-by-trial basis whether subsequent memory for each individual item can be predicted by pupil dilation during encoding, we employed a logistic regression approach. More specifically, we performed individual logistic regressions for all participants, estimating the predictive value of the pupil response to an individual item for the subsequent recall of this item. The logistic regression was performed both separately for neutral and negative items and for all items together. The beta values from these individual logistic regressions were then subjected to t-tests at the group level to assess whether the beta values were reliably different from zero and different for neutral and negative items. All reported p values are two-tailed.
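The two-stage logic of this analysis (one logistic regression per participant, then a group-level test on the slopes) can be sketched as follows. This is a minimal illustration under stated assumptions: a single predictor plus an intercept, fitted by Newton-Raphson, with a one-sample t statistic on the resulting betas. The function names are hypothetical, and the sketch is not the authors' implementation; whether their model included an intercept or other terms is not stated.

```python
import math
from statistics import mean, stdev


def fit_logistic(x, y, n_iter=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    x : pupil response per trial, y : 1 if the item was recalled, else 0.
    Returns (b0, b1). Assumes the two outcomes overlap in x (no perfect
    separation), otherwise the estimates diverge.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w                # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        # Newton step: beta += (X'WX)^-1 * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def one_sample_t(values):
    """t statistic for H0: mean(values) == 0 (df = len(values) - 1)."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n))
```

In use, `fit_logistic` would be called once per participant (and once per emotionality category), and the collected slopes `b1` would be passed to `one_sample_t`; comparing neutral and negative slopes amounts to the same statistic computed on the within-participant differences.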

| Pupil response predicts emotional memory formation for pictures
Participants' emotionality ratings during picture encoding confirmed the classification into neutral and negative pictures (mean rating (SEM) for neutral pictures: 0.15 (0.02), for negative pictures: 1.98 (0.05); t(53) = 38.19, p < 0.001, d = 7.35). As expected (Christianson, 1992), negative pictures were significantly better remembered than neutral pictures, both in the immediate free recall test (t(53) = 12.89, p < 0.001, d = 2.47; Figure 1a) and in the 24 hr-delayed recall (t(53) = 12.39, p < 0.001, d = 2.38; Figure 1b). This emotional memory enhancement was reflected in participants' pupil responses (Figure 1c). The pupil initially constricted during stimulus presentation, an effect only evident for pictures, not for words (compare with Figure 2c), which is due to the pupil response to the presentation of high-contrast images. This constriction was followed by an evoked dilation, the amplitude of which was modulated by emotional content: Pupil responses (defined as the average baseline-corrected pupil size from 1 to 3 s after stimulus onset, see Methods) were significantly stronger in response to negative as compared to neutral pictures (t(50) = 11.16, p < 0.001, d = 2.15; Figure 1c).
Importantly, pupil responses during encoding were significantly stronger for items that were subsequently remembered in the immediate free recall test (Figure 1d; main effect subsequent memory: F(1,50) = 9.28, p = 0.004, ηp² = 0.10) and the 24 hr-delayed test (Figure 1e; F(1,50) = 6.62, p = 0.013, ηp² = 0.09). This effect was driven by the emotionally negative stimuli: Significant interactions of stimulus emotionality and subsequent memory in the immediate (F(1,50) = 10.58, p = 0.002, ηp² = 0.21) and delayed (F(1,50) = 4.21, p = 0.045, ηp² = 0.08) recall tests revealed that the pupil response during encoding was larger for remembered than forgotten pictures when pictures were negative (immediate recall: t(50) = 4.83, p < 0.001, d = 0.97; 24 hr-delayed recall: t(50) = 3.81, p < 0.001, d = 0.76) but not when pictures were neutral (immediate recall: t(50) = 0.51, p = 0.610, d = 0.10; 24 hr-delayed recall: t(50) = 0.08, p = 0.939, d = 0.02). Adding the factor retention delay (immediate vs. 24 hr-delayed) explicitly to the model confirmed that there was, across recall tests, a significant difference in pupil dilation during encoding between subsequently remembered and forgotten pictures (F(1,50) = 9.00, p = 0.004, ηp² = 0.15), between neutral and negative pictures (F(1,50) = 92.41, p < 0.001, ηp² = 0.65), a significant interaction of stimulus emotionality and subsequent memory (F(1,50) = 7.33, p = 0.009, ηp² = 0.13), but no effect of retention delay (main effect: F(1,50) = 0.64, p = 0.428, ηp² = 0.01; all interaction effects including the factor retention delay: all F < 2.73, all p > 0.106). The absence of any effects of retention delay shows that the pattern of results was indeed comparable for the immediate and delayed tests. These results so far established that pupil responses were larger for negative compared to neutral pictures, and that this pupil response to emotional stimuli was, on average, larger for pictures that were subsequently recalled relative to those that were not.
We then set out to determine the predictive power of pupil response for subsequent memory on a trial-by-trial basis, using logistic regression (see Methods). The pupil response during encoding, across items of different emotionality, was a reliable predictor of trial-by-trial memory in the immediate free recall test (average beta value  Figure 1d and e). These differences in the predictive value of the pupil response for subsequent memory between neutral and negative pictures were statistically significant (immediate recall: t(50) = 4.46, p < 0.001, d = 0.88; 24 hr-delayed recall: t(50) = 3.69, p = 0.001, d = 0.73). Accordingly, an emotionality × retention delay ANOVA on the beta values of the regression analysis showed a main effect of picture emotionality (F(1,50) = 18.62, p < 0.001, ηp² = 0.27) but no influence of the retention delay on the predictive value of pupil dilation during encoding for subsequent memory (main effect: F(1,50) = 0.18, p = 0.893, ηp² < 0.01; emotionality × retention delay: F(1,50) = 1.95, p = 0.169, ηp² = 0.04).

| Pupil response during encoding predicts subsequent memory for words
The findings from the picture encoding task show that the pupil response was a reliable predictor of trial-by-trial long-term memory, in particular for emotionally arousing pictures. We replicated these findings in a different stimulus modality, that is, auditory encoding of words. Recall performance was not reliably different for neutral and negative words, words than for the pictures may be due to the fact that the learned words were more abstract than the pictures, making emotional words less arousing and less salient than emotional pictures, in combination with the well-known inferior memory for words relative to pictures (Paivio & Csapo, 1973); see also Figures 1a and b, 2a and b). However, during the encoding of words, too, the pupil response was significantly stronger for negative compared to neutral words (t(48) = 2.72, p = 0.009, d = 0.56; Figure 2c). Note that this emotion-related pupil dilation could not be explained by any differences in visual stimulation as items were presented auditorily. Again, the pupil response was overall stronger for words that were remembered in the immediate (F(1,48) = 23.59, p < 0.001, ηp² = 0.32) and delayed free recall test (F(1,46) = 10.02, p = 0.003, ηp² = 0.15) compared to those that were not remembered. This subsequent memory effect was not influenced by the emotionality of the stimuli (subsequent memory × stimulus emotionality for the immediate recall: F(1,48) = 0.09, p = 0.763, ηp² = 0.00; for the 24 hr-delayed recall: F(1,46) = 0.04, p = 0.840, ηp² = 0.00), suggesting that memory for both neutral and negative words was predicted equally well by the pupil response during encoding, as displayed in Figure 2d and e (left panels).
Including the factor retention delay (immediate vs. 24 hr-delayed) explicitly in the model confirmed that there was, across recall tests, a significant difference in pupil dilation during encoding between subsequently remembered and forgotten words (F(1,46) = 16.77, p < 0.001, ηp² = 0.27), a trend for a difference between neutral and negative words (F(1,46) = 3.42, p = 0.071, ηp² = 0.07), but neither an interaction of stimulus emotionality and subsequent memory (F(1,46) = 0.01, p = 0.919, ηp² < 0.01), nor an effect of retention delay (main effect: F(1,46) = 0.02, p = 0.968, ηp² < 0.01; all interaction effects including the factor retention delay: all F < 1.05, all p > 0.311, all ηp² < 0.03). The absence of any effects of retention delay shows that the pattern of results was indeed comparable for the immediate and delayed tests.

| DISCUSSION
Our results demonstrate that task-evoked pupil responses predict on a trial-by-trial basis which information will be remembered in the long run, for at least 24 hr and in particular for emotionally arousing information. This effect generalized across visual and auditory encoding tasks, thus allowing us to establish the memory-predictive value of the pupil response across sensory modalities. The task-evoked pupil response is thought to reflect phasic activity of brainstem neuromodulatory nuclei, including the noradrenergic locus coeruleus (Aston-Jones & Cohen, 2005; Joshi et al., 2016). We propose that the predictive value of the pupil response during encoding for subsequent memory is mainly owing to this link between pupil dilation and locus coeruleus noradrenergic activity. The role of noradrenaline in memory processes is very well established (Cahill & McGaugh, 1998; Cahill, Prins, Weber, & McGaugh, 1994; Mather, Clewett, Sakaki, & Harley, 2016; McGaugh, 2000, 2015; Sara, 2009; Schwabe, Nader, Wolf, Beaudry, & Pruessner, 2012; Strange & Dolan, 2004). At the neural level, noradrenaline is known to stimulate activity in the amygdala, which then modulates memory processes in other areas, such as the hippocampus (Cahill, Babinsky, Markowitsch, & McGaugh, 1995; Cahill & McGaugh, 1998; Cahill et al., 1996; McGaugh, 2000). Neurophysiological data further indicate that noradrenaline has a critical impact on long-term potentiation and long-term depression (Harley, 2007; Huang, Huganir, & Kirkwood, 2013), key processes of synaptic plasticity underlying memory formation (Bliss & Collingridge, 1993; Ito, 1989). Moreover, noradrenaline facilitates protein synthesis processes promoting long-lasting memory (Cirelli & Tononi, 2000; Gelinas & Nguyen, 2005). Together, these data show the crucial role of noradrenaline in memory formation and we assume that the task-related pupil response may provide a window into the initiation of the noradrenaline-related memory machinery.
Some previous studies have examined the link between pupil dilation and memory (Clewett et al., 2018; Goldinger & Papesh, 2012; Papesh, Goldinger, & Hout, 2011). Our present findings extend these previous studies in several important ways. First, previous studies have shown that pupil responses, averaged across many trials, differ between memorized and forgotten items. By contrast, we tested here the predictive value of the pupil response at the single-trial level. Doing so is critical for the prediction of specific memories, as well as for evaluating the utility of pupil responses as an easily measurable physiological marker of memory formation.
Second, the current study is, to the best of our knowledge, one of the first to show that the pupil response predicts whether stimuli will be retained in the long run (for at least 24 hr). The delays between encoding phase and memory test were confined to less than 30 min in previous work, when memory consolidation, known to take hours (McGaugh, 2000), had (at best) just started. Assessing the stability of pupil-linked memory effects over several delays is important to determine whether the pupil predicts, beyond encoding, also consolidation processes and actual long-term memory and thus to evaluate their real-life behavioral significance. Doing so for two delays a day apart in the current study revealed that pupil responses predicted the immediate and delayed recall equally well. This, in turn, showed that pupil responses are reliable predictors of long-term memories and indicates that pupil-linked arousal mechanisms appear to specifically facilitate the encoding of new memories rather than the memory consolidation processes. This is in line with the idea that the phasic release of modulatory neurotransmitters reflected in pupil dilations (De Gee et al., 2017; Joshi et al., 2016) helps memorize information by gating synaptic plasticity mechanisms in the cerebral cortex (Cooke, Komorowski, Kaplan, Gavornik, & Bear, 2015; Roelfsema, van Ooyen, & Watanabe, 2010). It should be noted, however, that the immediate free recall test may well have affected performance in the 24 hr-delayed test. In particular, there is considerable evidence that the retrieval of an item may foster its subsequent memory (Karpicke & Roediger, 2008; Roediger & Karpicke, 2006). Thus, future studies on the predictive value of the pupil response during encoding for subsequent memory should include both a group that recalls the learned material immediately after encoding and at a later time point and a control group that omits the immediate recall.
Third, while previous studies used mainly recognition tests to assess memory, we here assessed both free recall and recognition performance. Free recall provides more insight into the search in, or retrieval from, memory than recognition, while the latter requires merely the comparison of present information to representations in memory. In fact, in our data, pupil dilation significantly predicted only free recall performance, but not recognition performance (see Appendix S1). This pattern of results might suggest that the arousal reflected in the pupil response aids particularly the search process in memory and less the comparison of information to the internal representation, in line with previous evidence suggesting that free recall is more sensitive to arousal effects than recognition (Bradley et al., 1992). Alternatively, the absence of an effect on recognition may also be owing to the excellent (near-ceiling) performance in the recognition test.
Beyond spontaneous or cognitive task-evoked variation in pupil size, the pupil dilates in response to emotionally arousing events (Goldinger & Papesh, 2012; Lempert et al., 2015). Indeed, we found here a modulation of the task-evoked pupil dilations by emotional content, in particular in the picture encoding task. In the picture encoding task, pupil dilation predicted memory formation only for stimuli with emotional value, which is generally in line with the view that locus coeruleus noradrenergic activity, thought to be reflected in the pupil response, facilitates in particular high-priority information (Mather et al., 2016). For neutral pictures, however, there was no differential pupil dilation for subsequently remembered vs. forgotten items. This latter finding may point to the role of factors other than arousal in memory formation, such as the level of processing, which may be less well captured by the pupil response. In the word encoding task, however, the pupil response predicted subsequent memory for both neutral and negative items. This difference between tasks might be due to the fact that words were presented auditorily, which already triggered a pupil response, corroborating findings showing that tones may lead to pupil dilation and hence promote subsequent memory (Hoffing & Seitz, 2015). Moreover, pictures are thought to be more salient than words and to elicit emotional arousal more easily than words (Carr, McCauley, Sperber, & Parmlee, 1982). In line with this view, it has been suggested that for pictures emotion may be evoked more rapidly and that regions such as the amygdala respond faster to emotional pictures than they do to emotional words (Gianotti et al., 2008; Kensinger & Schacter, 2006; Kim, Yoon, & Park, 2004).
The different potency of emotional pictures and words to elicit emotional arousal and saliency may well have contributed to the reduced (to absent) emotional modulation of memory for words, whereas there was a strong emotional memory enhancement for pictures, as well as to the slightly different findings with respect to the emotionality-specific predictive value of the pupil response for memory.
In conclusion, our findings demonstrate that our eyes may indeed provide a window into the making of (emotional) long-term memories. So far, subsequent memory paradigms have been used in combination with electroencephalography (EEG) or functional magnetic resonance imaging (fMRI) to identify neural predictors of later memory (Paller & Wagner, 2002). Compared to these complex neuroimaging techniques, pupillometry provides an easily accessible and much cheaper index of memory formation, in particular in the face of recently developed mobile eye-tracking devices. Our data suggest that such devices may be used, for instance in therapeutic or educational settings, to achieve a key goal of memory research, to predict which information will be remembered in the future.