Viewing one’s body during encoding boosts episodic memory

Episodic autobiographical memories (EAMs) are recollections of contextually rich and personally relevant past events. EAM has been linked to the sense of self, allowing one to mentally travel back in subjective time and re-experience past events. However, the sense of self has recently been linked to online multisensory processing and bodily self-consciousness (BSC). It is currently unknown whether EAM depends on BSC mechanisms. Here, we used a new immersive virtual reality (VR) system that maintained the perceptual richness of life episodes and fully controlled the experimental stimuli during encoding and retrieval, including the participant’s body. We report that the present VR setup makes it possible to measure recognition memory for complex and embodied 3D scenes during encoding and retrieval, that recognition memory depends on delay and number of changed elements, and that viewing one’s body as part of the virtual scene (as found in BSC studies) enhances delayed retrieval. This body effect was not observed when no virtual body was shown or when a moving control object was shown. These data show that embodied views improve recognition memory for 3D life-like scenes, thereby linking the sense of self, and BSC in particular, to episodic memory and the re-experiencing of specific past events in EAM.


INTRODUCTION
A defining feature of episodic autobiographical memory (EAM) is the capacity to provide information about the content of our conscious personal experiences of "when" and "where" events occurred as well as "what" happened 1,2 . Previous studies defined EAM as the recall of contextually rich and personally relevant past events that are associated with specific sensory-perceptual and cognitive-emotional details [3][4][5][6][7][8][9][10] . EAM has been distinguished from semantic autobiographical memory, the latter being associated with general self-knowledge and the recall of personal facts that are independent of re-experiencing specific past events [11][12][13][14][15][16][17] .
In a series of seminal papers, Endel Tulving highlighted the subjective dimension of EAM associated with the re-experiencing of specific past events by pointing out the importance of the sense of self and introducing his influential notion of autonoetic consciousness. He argued that autonoetic consciousness is of fundamental relevance to EAM, allowing one to mentally travel back in subjective time and recollect one's previous experiences 2, [18][19][20] . Tulving distinguished autonoetic consciousness from noetic consciousness, linking the latter to semantic memory and semantic autobiographical memory and to knowing about (rather than re-experiencing) specific past events. Others extended Tulving's notion of EAM and proposed that it is contributing to the sense of self across time 10,12,[21][22][23][24][25] and developed behavioral tasks such as mental time travel [26][27][28][29][30][31] .
Although several other cognitive domains have been proposed to contribute to the sense of self (e.g. language, mental imagery, facial self-recognition [32][33][34][35] ), recent research has highlighted the importance of non-cognitive multisensory and sensorimotor contributions to the sense of self. This novel theoretical and experimental approach is based on behavioral 36,37 , neuroimaging [38][39][40] and clinical data 39,41 and involves the processing and integration of different bodily stimuli to the sense of self: bodily self-consciousness (BSC); for review see 42,43 . BSC includes conscious experiences such as self-identification and self-location 37,36,44,45 , as well as the first-person perspective 39,46,47 . This work was based on clinical observations in neurological patients with so-called out-of-body experiences characterized by changes in the sense of self, in particular of the experienced self-location and perspective from an embodied first-person perspective to a third-person perspective 39,41 and has been able to induce milder, but comparable, states in healthy subjects using virtual reality (VR) technology to provide multisensory stimulation 36,39,47 .
Given the link of BSC with subjective experience and previous claims that subjective re-experiencing of specific past events is a fundamental component of EAM 2,18 , we argue that multisensory bodily processing may not only be of relevance for BSC, but also for consciousness concerning past events. Recent findings have shown that BSC impacts several perceptual and cognitive functions such as tactile perception 48,49 , pain perception 50,51 , visual perception [52][53][54] , as well as egocentric cognitive processes 55 . Concerning episodic memory, Bergouignan et al. 56 reported that recall of EAM items and hippocampal activity during the encoding of episodic events are modulated by the visual perspective from which the event was viewed during encoding, and St. Jacques et al. 57 showed that first- versus third-person perspective during retrieval modulated recall of autobiographical events and associated this with medial and lateral parietal activations. We here predicted that bodily multisensory processing that has been described to modulate BSC would interfere with EAM processes.
Traditionally, behavioral and neuroimaging EAM studies have relied on questionnaires, verbal reports, interviews, or mental imagery and have predominantly investigated memory retrieval by using a variety of stimuli and procedures such as cue words and pictures [57][58][59][60][61][62] . For example, important research relied on interviews with the participants 60,63 , on personalized lists of significant life events of participants 9,30,64-66 , and employed different procedures asking participants to re-experience particular life episodes 62,61,58,67,68 . This differs from research investigating verbal memory through encoding and recall of word lists [69][70][71][72] or testing spatial memory with figures, spatial paths, or other visuospatial materials [73][74][75] . All these procedures, however, either lack the richness of real-life events or do not control the information during encoding. To overcome some of the limitations with respect to episodic memory, several EAM research groups have relied on advances in video technology and VR during encoding and retrieval of information (e.g. spatial navigation 76,77 ; social interactions 78,79 ). Participants were seated in front of a computer screen showing a virtual environment and asked to navigate in such environments using a joystick (encoding) and later asked to recall selected objects from the environment (retrieval). These computer-based VR studies suggest that interactions with the environment during encoding or retrieval influence memory performance. Compared to passive participation, several VR studies showed better learning performances across free recall trials and recognition tasks 76,[80][81][82] . Plancher et al. 83 suggested that interactions with the naturalistic environment created with VR enhanced spatial memory.
However, despite these important achievements, these virtual environments were mostly using non-immersive VR systems, did not employ real-life-like virtual scenes, and did not use VR technology that allows integrating the participant's body (and hence multisensory bodily stimulation) for the tested virtual life episodes. In the present experiments, we took advantage of a recently developed immersive VR system, which allows us to preserve the perceptual richness of life episodes, to fully control the experimental stimuli during encoding and retrieval, and to integrate and manipulate multisensory information of our participant's body in an online fashion. The present experiments had one major technological and one major scientific goal: (1) develop and test EA-like memory in the laboratory with virtual episodes using immersive VR and (2) investigate whether multisensory bodily stimulations that have been shown to impact BSC, perception, and egocentric cognition modulate EAM.
In the first experiment, we tested our immersive VR system and sought to address some of the experimental limitations of earlier EAM studies, which either had limited control of actual autobiographical stimuli and events during encoding and only examined the stage of EAM retrieval 5,59,66,84 or controlled EAM encoding, but without the immersion into the original scenes during EAM retrieval 9,56,64 . We further tested EAM performance and confidence for immersive three-dimensional (3D) VR scenes at two different time points and for different numbers of objects (that changed between both sessions). We predicted memory decreases depending on delay and on the number of objects changed. In the second experiment, we investigated the main hypothesis of the present experiments and tested the potential link between multisensory own body signals (that are fundamental for BSC) and EAM. We thus examined whether the presence of online and congruent multisensory cues from the subject's body (i.e. the presence of one's own physical body from the first-person viewpoint) impacts memory performance and confidence in the present VR paradigm, compared to an experimental condition where such online first-person bodily cues are absent. Finally, we performed a third (control) experiment in order to test whether the effect of multisensory bodily stimulation that we observed in the second experiment is specific to multisensory bodily cues.

Subjects
A total of 79 subjects with normal or corrected to normal vision were recruited to participate.

Virtual Reality Technology
Our VR technology uses a spherical capturing and recording system and an immersive setup for first-person perspective (1PP) replay of the recorded real environments. For recording a scene, 14 cameras (GoPro Hero4) are assembled on a spherical rig (360hero 3DH3PRO14H) and linked to 4 pairs of binaural microphones (3DIO Omni Binaural Microphone) to cover the entire sphere of perception around a viewpoint (360° horizontally and vertically, stereoscopic vision, binaural panoramic audio). Custom software (Reality Substitution Machine, RealiSM, http://lnco.epfl.ch/realism) then aggregates all data into a single high-resolution panoramic audiovisual computer format (equivalent to more than 4 stereoscopic full HD movies). A head-mounted display (HMD, Oculus Rift DK2; 900x1080 per eye, FOV ~105º Vertical, 95º Horizontal) was used to immerse subjects into the recording and sound was administered with noise-cancelling headphones (BOSE QC15). Furthermore, the HMD was coupled with a stereoscopic depth camera (Duo3D MLX, 752x480 at 56Hz) mounted on its front face to capture subjects' bodies from 1PP. The RealiSM software then augments the fully immersive environment with a realistic view from which subjects could see their hands, trunk and legs from 1PP. As a result, subjects experienced the pre-recorded scenes as if they were physically present in them, seeing their own body (not a 3D avatar). The software also allows integrating 3D virtual objects seamlessly in the scene (experiment 3).

Stimuli
Subjects were immersed in three (experiment 1) or two (experiments 2 and 3) pre-recorded rooms via the HMD (see below). For the encoding session, 10 everyday-life objects (e.g., coffee machine, pen, trash bin) were placed in each room. These real-life objects created the natural context of episodic memory at both encoding and retrieval. During retrieval, rooms remained either exactly the same as during encoding (i.e. the same 10 real-life objects were again presented at the same places in the previously visited rooms) or some of the objects (i.e. 1, 2 or 3 objects) were replaced by new objects that were not previously seen in any of the scenes.

Paradigm
Each of the three experiments consisted of two sessions, an incidental encoding period (session 1) followed by an immediate (group 1) or one-hour delayed (group 2) surprise recognition task (session 2). In all three experiments, subjects were not informed that we would later test their memory for the stimuli encountered during the encoding session. Before the two experimental sessions, subjects were seated on a chair and asked to put on the HMD and headphones. They could then familiarize themselves with the VR technology for several minutes. Paradigm and testing sequence are depicted in Figure 1a.

Encoding Session
During the encoding session, to assure that subjects explored the entire room and to monitor their attention within the different 3D scenes (i.e. the different rooms), participants were instructed to freely explore each virtual room. Moreover, we programmed a virtual ball that appeared in each of the three rooms and was moving within the rooms for a duration of 30 seconds and covered all sections of the virtual room. Participants were asked to fixate the virtual ball and to follow its movements through the virtual room. In total, the target object appeared at 6 different positions in each room. After the ball stopped moving, participants freely explored each room for another 30 seconds.
The procedure in experiment 2 was identical. However, in order to test the effect of viewing one's own body during encoding we asked subjects to follow the trajectory of the ball and to point at the moving ball with their hand and finger. The main manipulation consisted in showing the participant's physical body (body condition) or not (no-body condition). This was accomplished with the use of the stereoscopic depth cameras to capture the participant's body and by turning them on in the body and off in the no-body condition. The participant's body was inserted in real-time in the virtual room and shown from the habitual visual first-person viewpoint. In the body condition, the subjects saw their physical hand, the trunk, and their legs (i.e. the stereoscopic depth cameras were turned on) in the HMD and as part of the virtual 3D scene (Figure 1b). In the no-body condition, the virtual 3D scene was identical except that the participant's body was missing (i.e. the stereoscopic depth cameras were turned off) (Figure 1c). The order of presentation of the body and no-body condition was counterbalanced between subjects. In experiment 2 each participant explored two rooms (i.e. with 3 rooms as in experiment 1 the experiment would have been too long).
In experiment 3, participants were also asked to follow the movement of the ball appearing in each room (by physically pointing at it with their hand and finger). Yet in the object condition they were shown a non-bodily control object, instead of their own physical body (Figure 1d; see Supplemental video). The no-body condition was the same as in experiment 2. The presentation of the no-body and object condition was counterbalanced between subjects. No explicit instructions to memorize the objects of visited rooms were provided. In experiment 3, each participant explored two rooms (i.e. to keep conditions comparable with respect to experiment 2).

Retrieval Session
During the retrieval session, which was the same for all three experiments (i.e. no body or control object was shown), subjects were informed that they would be immersed in the same rooms again. They performed a total of three blocks of 40 trials (each lasting 10 seconds).
Within the three blocks of 40 trials, we presented 10 trials, which were exactly the same as during the original encoding session (i.e. including the same 10 objects). The remaining 30 trials were different and had either 1, 2 or 3 new objects replacing the respective number of objects shown during the encoding session. The blocks and individual trials in each block were presented in a randomized order. Participants were free to re-explore the virtual scenes for 10 seconds, after which they were asked two questions that were shown on the HMD. First, participants performed a two-alternative forced choice task (yes/no) in which they indicated whether the virtual scene shown during the retrieval session corresponded to the virtual scene during encoding (recognition task) ("Is the scene exactly the same as when you first saw it?"). Participants indicated their response with a wireless computer mouse. Second, participants were asked how confident they were about their answer (via a rating scale projected in the HMD; range from 0 (low) to 9 (high confidence)).
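The trial structure and the two recognition measures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names `build_block` and `score` are hypothetical, the equal split of the 30 changed trials into 10 per change level is an assumption (the text only states that 1, 2 or 3 objects were replaced), and a false alarm is taken to mean a "yes, same" response to a changed scene, as implied by the analysis of false alarms by number of changed objects.

```python
import random

def build_block(rng, n_same=10, n_per_change=10):
    """Return one shuffled block; each trial is the number of replaced objects (0 = unchanged)."""
    trials = [0] * n_same
    for n_changed in (1, 2, 3):
        # Assumption: changed trials are split equally across the three change levels.
        trials += [n_changed] * n_per_change
    rng.shuffle(trials)
    return trials

def score(trials, said_same):
    """Compute hit rate, overall false alarm rate, and false alarm rate per change level."""
    unchanged = [r for t, r in zip(trials, said_same) if t == 0]
    changed = [r for t, r in zip(trials, said_same) if t > 0]
    hit_rate = sum(unchanged) / len(unchanged)   # "same" responses on unchanged scenes
    fa_rate = sum(changed) / len(changed)        # "same" responses on changed scenes
    fa_by_change = {}
    for n_changed in (1, 2, 3):
        resp = [r for t, r in zip(trials, said_same) if t == n_changed]
        fa_by_change[n_changed] = sum(resp) / len(resp)
    return hit_rate, fa_rate, fa_by_change

rng = random.Random(0)
block = build_block(rng)
# Hypothetical observer who only fails to notice single-object changes.
responses = [t <= 1 for t in block]
hit_rate, fa_rate, fa_by_change = score(block, responses)
```

Keeping the number of changed objects as the trial label makes it straightforward to break false alarms down by change level, which is the quantity analyzed in the results.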

Statistical analysis
In experiment 1, an independent-samples t-test for hit rate and false alarm rate was applied to test whether ABM performance differed depending on delay (i.e. immediate x one-hour delayed condition). Independent-samples t-tests were further used to analyze whether the hit rate and false alarm for ABM confidence ratings differed depending on delay. A mixed analysis of variance (ANOVA) with the number of objects changed (i.e. 1 object, 2 objects or 3 objects) and delay (i.e. immediate x one-hour delayed group) was performed. Further, a 2 x 3 mixed ANOVA was run to understand the effects of delay (i.e. immediate x one-hour delayed groups) and number of objects changed in a room (i.e. 1 object, 2 objects, 3 objects) for the ABM confidence for the false alarm rates. Where appropriate, Greenhouse-Geisser corrections of degrees of freedom were used. Significant ANOVA effects were explored by post-hoc tests using Bonferroni correction. The significance level was set to alpha 0.05.
In experiment 2, we performed a mixed analysis of variance (ANOVA) with delay (i.e. immediate x one-hour delayed groups) and body (i.e. body x no-body condition) on ABM performance for hit rate and false alarm rate. Further, another 2 x 2 mixed ANOVA was performed for ABM confidence (for false alarm rates) with the factors retrieval time (i.e. immediate x one-hour delayed groups) and body (i.e. body present x body absent). Further, a 2 x 2 x 3 mixed analysis of variance (ANOVA) was performed in order to test the effects of delay (i.e. immediate x one-hour delayed groups), body (i.e. body x no-body condition) and the number of objects changed (i.e. 1 object, 2 objects or 3 objects). Similarly, a three-way mixed ANOVA was run to understand the effects of delay (i.e. immediate x one-hour delayed groups), the body condition (i.e. body x no-body condition) and the number of objects changed in a room (i.e. 1 object, 2 objects, 3 objects) on the ABM confidence (for the false alarm rates).
Where appropriate, Greenhouse-Geisser corrections of degrees of freedom were used.
Significant ANOVA effects were explored by post-hoc tests using Bonferroni correction. The significance level was set to alpha 0.05.
In experiment 3, an independent-samples t-test was applied to test whether ABM performance differed in the no-body versus object condition. This was done for hit rate and for false alarm rate. An independent-samples t-test was also used to examine whether ABM confidence false alarm differed in the no-body versus object condition. A mixed analysis of variance (ANOVA) with the number of objects changed (i.e. 1 object, 2 objects or 3 objects) and body (i.e. no-body x object) was performed. Similarly, a 2 x 3 mixed ANOVA was run to understand the effects of body (i.e. no-body x object) and number of objects changed in a room (i.e. 1 object, 2 objects, 3 objects) for the ABM confidence for the false alarm rates. Where appropriate, Greenhouse-Geisser corrections of degrees of freedom were used. Significant ANOVA effects were explored by post-hoc tests using Bonferroni correction. The significance level was set to alpha 0.05.
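For readers unfamiliar with these tests, the following Python sketch shows how an independent-samples t statistic and a Bonferroni-adjusted significance threshold are computed. It is an illustration only: the function names are hypothetical, the hit rates are made-up numbers rather than study data, and the pooled-variance (Student) form is an assumption, since the text does not state whether equal variances were assumed.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Student's t statistic and degrees of freedom for two independent groups.

    Assumption: equal variances (pooled estimate); the paper does not specify this.
    """
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons

# Hypothetical hit rates for an immediate and a delayed group (not study data).
immediate = [0.9, 0.8, 0.85, 0.95]
delayed = [0.7, 0.6, 0.75, 0.65]
t, df = independent_t(immediate, delayed)

# Three pairwise comparisons among the 1-, 2- and 3-object conditions.
alpha_per_test = bonferroni_alpha(0.05, 3)
```

The p-value would then be read from a t distribution with `df` degrees of freedom, for example with a statistics package; that step is omitted here to keep the sketch dependency-free.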

Experiment 1 (Immediate versus one-hour delayed condition)
Participants in the delay group showed a significant decline in performance compared to the immediate memory recognition group. Mean hit rate was significantly lower in the delay group. We next examined whether performance in the present task depended on the number of objects changed within each immersive 3D scene. This analysis was conducted on the false alarm rate (as no objects changed for hits, by definition). As predicted, analysis revealed a significant main effect for the number of objects changed (F(2, 58) = 52.85, p < 0.0005, partial η² = 0.64) (Figure 3a). Pairwise comparisons were performed for statistically significant main effects and revealed that subjects made progressively fewer false alarms with increasing number of objects (all p-values < 0.0005).
For confidence ratings, there was also a statistically significant main effect for the number of objects changed (F(2, 58) = 4.163, p = 0.02, partial η² = 0.12) (Figure 3b), revealing that subjects were progressively more confident in their performance with increasing number of objects that were changed between both sessions. These data show that subjects made more recognition errors and were less confident in conditions in which fewer objects were changed between encoding and retrieval.
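The effect sizes reported above can be related to the F statistics through the standard identity partial η² = (F · df_effect) / (F · df_effect + df_error). The short Python sketch below applies it to the two reported F values; the function name is hypothetical and the calculation is offered as a plausibility check, not as the authors' procedure.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values and degrees of freedom as reported for experiment 1.
eta_false_alarms = partial_eta_squared(52.85, 2, 58)
eta_confidence = partial_eta_squared(4.163, 2, 58)
```

Both results fall within 0.01 of the reported values (0.64 and 0.12); the small remaining differences may reflect rounding or truncation in the reported statistics.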

Experiment 2 (Body versus no-body condition)
Data for hit rates showed a significant two-way interaction between the time of retrieval and the body condition (Figure 4a). The same analysis for false alarm rates did not reveal any differences, F(1, 30) = 0.002, p = 0.96, partial η² = 0.00 (Figure 4b). These data show that recognition of immersive 3D scenes that also include the first-person view of the subject's body, mimicking real-life experience, is modulated and enhanced with respect to the same scenes without such a bodily view.
Confidence ratings for hits did not reveal any differences between the time of retrieval and during encoding or not, our subjects' confidence was equal across conditions. These data from experiment 2 show that subjects made more recognition errors and were less confident in conditions in which fewer objects were changed between encoding and retrieval, as in experiment 1.

Experiment 3 (object vs no-body condition)
There was no significant difference in hit rates for subjects in the object condition ( There was no significant difference between the no-body and object conditions.
We also tested whether the confidence in the performance accuracy depended on the number of

DISCUSSION
The present study allows us to draw three major conclusions. First, the present VR setup makes it possible to measure recognition memory for 3D scenes that are immersive, rich in contextual detail, and that further integrate the moving body of the participant in online fashion. Our VR setup, thus, approaches real-life experiences in controlled laboratory conditions. Moreover, the present VR setup allowed us to project the same 3D virtual scenes during the encoding and retrieval sessions, arguably providing us with a level of experimental control that is comparable to investigations in non-episodic memory. Second, applying this new setup we report that recognition memory for the tested VR scenes depends on the delay and on the number of changed elements between encoding and retrieval, comparable to findings for verbal and visual-spatial memory. Third, we show that viewing one's body as part of the virtual scene during encoding enhances delayed retrieval. This body effect was not observed when no virtual body was shown or when a moving control object (instead of the virtual body) was shown, suggesting that embodied views lead to body-specific performance changes, as reported in studies investigating BSC.

An experimental VR setup that controls real-life like episodes during encoding and retrieval
Most prior laboratory-based EAM studies used cue words or images to trigger memory retrieval and mental time travel to the past in a controlled fashion 5,6,12,31,59,76,90,91,83,92 . However, these studies controlled only for memory retrieval but not for memory encoding 4 . Contrary to these previous studies, we exposed our participants to rich and immersive real-life scenes without the need for explicit mental time travel. Unlike earlier computer-based scenarios, we also did not present participants with artificial scenarios (simulated events in 3D), but immersed them into 360° video recordings of everyday real-life scenes that we digitalized for the encoding and retrieval sessions. Using the present naturalistic and controlled VR setup, we ensured that our participants experienced virtual 3D scenes with congruent multisensory bodily information (visual, motor, vestibular); these approach real-life experience as compared to classical virtual computer game tasks that have been used for episodic memory investigations in the past 93,94 . Thus, the present VR technology and future improvements of it will open new possibilities for conducting episodic memory research under ecologically valid experimentation in the laboratory by providing not only the ability to precisely design all stimulus aspects, but also to replay fully controlled sequences of real-life events.

Delay and number of changed objects modulate recognition memory performance
Our data reveal two classical episodic memory findings. Recognition memory for real-life like scenes decays with delay and improves depending on the number of items that were changed between encoding and retrieval. Previous EAM research is compatible with these findings, but has not been able to test or quantify this directly. Specifically, while associative recognition memory for words or pictures [95][96][97] and EAM 98 we only tested short delays (i.e. one hour), our data show that subjects remembered 3D scenes better when tested immediately after encoding as compared to delayed retrieval. Our second predicted finding that recognition memory was better when more items were changed between the encoding and retrieval is also compatible with classical findings concerning the recognition of visual changes when testing long-term memory for spatial scenes, complex figures (including faces), or short texts 108,109 , further revealing the experimental validity of the present setup for research in episodic memory.

Embodiment and episodic memory of life-like events
Besides reproducing classic memory effects, the present study also reveals a new finding, i.e. Moreover, this effect has been shown to be body-specific by demonstrating that different noncorporeal objects shown from the same position and viewpoint do not alter BSC 43 . Here, we extend this BSC principle to memory research by showing in experiment 2 that the recognition of 3D scenes that included within the first-person view also the subject's body (as is characteristic of normal everyday perception) was modulated and significantly enhanced with respect to the same scenes without such a bodily view. This is compatible with previously reported effects for multisensory bodily perception 48,49 and BSC 37,39,44 . These BSC studies showed that visuo-tactile perception, as well as self-identification and self-location towards a seen human body or body part are enhanced when the body is shown in congruent position with respect to the subject's body. Accordingly, we argue that the present body effect on the recognition memory of 3D scenes is comparable to similar effects in multisensory perception and BSC (i.e. for review see 43 ) as well as a number of cognitive processes, where self-related bodily information is critical. For instance, viewing the body increases tactile perception 110 , modulates interpersonal tactile responses 111,112 , affects social cognition 113,114 , and concept processing 55 .
It could be argued that the enhanced EAM performance of experiment 2 could relate to differences in the amount of visual information provided in both conditions (higher in the body versus the no-body condition) or higher salience or attention due to the additional inclusion of the tracked body in the body condition. First, we note that addition of the tracked body actually covers or hides parts of the virtual scene and may thus have incidentally hidden some of the changed items, and should thus rather decrease recognition memory. Yet, the opposite was observed in experiment 2. However, in order to formally investigate the potential role of differences due to vision or attention between conditions we compared, in experiment 3, the no-body condition with a condition in which subjects viewed a non-bodily control object that was moving congruently with the participant's body in real-time. Data from this experiment revealed no memory improvement in the object condition, arguing against a visual or attentional account and further corroborating our proposal that the present recognition enhancement is due to multisensory-motor bodily stimulation that has been shown to be crucial for BSC 36,42,49,115 and characteristic of normal everyday experience. These data also argue against the possibility that the present body effect on recognition memory can be generalized to an embodied object as the object condition did not induce any performance changes. By revealing bodily effects in the present EAM paradigm, we thus link BSC to EAM, extending earlier memory work 56 that has focused on contributions of the first-person perspective in autobiographical memory or of vestibular processing on EAM 116 . Finally, based on these data we argue that the brain mechanisms of BSC are linked to those of autonoetic consciousness that are of fundamental relevance to EAM.
Autonoetic consciousness is the ability to mentally travel back in subjective time and recollect one's previous experiences 2, [18][19][20] and the present data suggest that multisensory bodily processing during encoding and remembering are not only of relevance for the conscious bodily experiences of self-identification, self-location, and first-person perspective 37,36,39,[44][45][46][47] , but also autonoetic consciousness.

Confidence and episodic memory
Does confidence mimic these changes in episodic memory performance? We report, as predicted, that confidence increased jointly with memory recognition improvements for conditions in which more objects were changed. This finding is in line with several studies showing that confidence in everyday, non-arousing EAM, measured by remember/know paradigms and recollection questionnaires, declines together with the objective memory performance 95,98,99 . However, our data also show that confidence levels dissociate from memory performance, as delay dependency and the view of one's body (experiment 2) during encoding modulated recognition memory, but not confidence levels. Further research needs to target objective memory performance and subjective confidence using real-life scenes as tested with the present VR setup. The differential delay- and body-effects in the present study suggest that memory performance and confidence rely on distinct functional mechanisms 117 .

Figure 1. After a period of familiarization with the immersive VR setup, participants performed the encoding session (10 minutes) during which they were exposed to different life-like 3D scenes (Figure 1A). Scenes were characterized by a room that contained different objects (table, photocopy machine, pen, etc.). In experiment 1, one group of participants performed the retrieval session (30 minutes) immediately after the encoding session or after a one hour delay (see main text for further detail). Figure 1B-D shows the different conditions during the encoding session that we used in experiments 1-3 (the retrieval session was the same across all experiments). Thus, participants always saw the same 3D scenes on the head-mounted display, but the body of the participant was either not seen at all (Figure 1B; no-body condition), seen as part of the 3D scene (Figure 1C; body condition), or instead of the body a control object was seen (Figure 1D; control condition).