In current educational environments, dynamic visualizations such as animation and video are increasingly used for displaying dynamic systems or processes. For example, animations are used to show the formation of lightning (Mayer and Chandler 2001), blood pumping through the heart (De Koning et al. 2010a), or first-aid procedures (Arguel and Jamet 2009). One reason why dynamic visualizations are so widely used is the common assumption that it is easier for learners to form an internal representation of the dynamics of a system when they can perceive these dynamics directly than when they have to imagine or mentally infer the movements from static visualizations (e.g., Hegarty et al. 2003).

However, research has shown that animations often do not result in more effective learning than static visualizations (Tversky et al. 2002). According to Ayres and Paas (2007), one reason animations are often not so effective is that they are transient by nature: learners are required to select and process new information while simultaneously remembering and integrating previously presented information. Because learners have a working memory with limited capacity and duration (Cowan 2001; Miller 1956), animations are likely to create high (ineffective) working memory load that hinders learning.

Nevertheless, in a meta-analysis, Höffler and Leutner (2007) showed that in some cases, animations were more effective than static graphics, especially when they involved procedural motor knowledge. Also, a number of recent studies have demonstrated the advantage of dynamic over static visualizations for tasks involving human movements, like first-aid procedures (Arguel and Jamet 2009), folding origami figures (Wong et al. 2009), and knot-tying and puzzle construction (Ayres et al. 2009). In line with these findings, Van Gog et al. (2009) argued that dynamic visualizations are most effective for tasks that involve human movements, such as surgical procedures or sports, but much less so for tasks that involve non-human movements, such as mechanical or chemical processes. According to Van Gog et al., watching another human perform a task is a form of observational learning, for which humans rely on the mirror neuron system. Observing someone else performing an action activates the same cortical circuits in the brain (i.e., the mirror neuron system) that are involved in executing that action oneself (Rizzolatti and Craighero 2004). An animation showing human movement will thus automatically activate the mirror neuron system. According to Van Gog et al., this will help learners deal with the movement information by priming the execution of similar actions. The difficulties related to processing transient information in working memory may thereby be reduced and people’s understanding of the observed action will be increased. In contrast, an animation showing non-human movements will not activate the learner’s action system in the brain. Van Gog et al. conclude that this might explain why learning from animations involving non-human movements is often not so effective.

Although the mirror neuron argument seems a plausible explanation, it remains rather speculative because the exact mechanisms by which the activation of mirror neurons translates into learning benefits have yet to be identified. Nevertheless, it is clear that the involvement of human movement is key to understanding when dynamic visualizations are effective. But does this imply that dynamic visualizations will only be effective for learning if they depict human movements? In practice, many dynamic visualizations are not about human movements but about subjects such as the process of lightning formation, the functioning of a tire pump, or even abstract subjects like probability calculation (see Schnotz and Lowe 2008). In the present article, we argue that learning from all kinds of visualizations, and not just from those depicting motor-related tasks, might be enhanced by involving the learner’s own motor system. Our central claim is that applying an embodied perspective to the design of animations will facilitate understanding of dynamic systems, irrespective of whether the movements depicted in the system are human or not. To support our argument, we first briefly describe how mental representations of dynamic systems are formed, using embodied theories of cognition as a theoretical framework. Next, recent findings from the fields of gesturing and learning, cognitive science, and neuroscience are discussed, which allows us to derive some concrete guidelines for how human movements and physical action could be used to foster learning from animations. Finally, we present recommendations for directing research on applying embodiment to animations and end with a discussion of the promises of this line of research.

Cognition Is Grounded in Action

There is a long tradition of considering the role of human motor actions for learning in educational and developmental theories. Piaget, for example, assumed that people’s actions form the basis for all learning (Piaget and Inhelder 1969). He proposed that children initially understand and act upon their environment only with sensorimotor actions, then come to understand symbolic representations concretely, and finally are able to perform formal operations on abstract information. According to Piaget, imitating a concept with one’s own body is fundamental to forming a mental representation of the concept. Likewise, Bruner (1966) emphasized the role of action in learning, claiming that learning occurs through a learner’s engagement with object manipulation so that an accurate mental representation of the object can be formed. In addition, well-established and widely used teaching methods including activity-based learning and hands-on activities have historically played an important role in teaching scientific concepts (Rapp and Kurby 2008).

The importance of motor information in learning has recently received renewed attention as researchers have become interested in theories of embodied cognition. Embodied theories of cognition propose that cognition, or psychological processes, are influenced and shaped by the body, including its morphology, sensory systems, and motor systems, as well as by the body’s interaction with the surrounding world (Barsalou 2010; Glenberg 1997; Zwaan 1999). That is, perceptual and action-related processes are tightly linked to each other as well as to more abstract and higher-order cognitive processes such as language and mathematics (Barsalou 1999). The notion that cognition is grounded in perception and action is based on widespread findings that bodily states can cause cognitive states as well as be an effect of these states (e.g., Barsalou et al. 2003; Lakoff and Johnson 1980).

Most accounts of embodied cognition focus on the role of simulations in cognition (e.g. Barsalou 1999; Decety and Grèzes 2006). Simulations are defined as the reenactment of perceptual and motor states that were acquired during experiences with the physical world. During these experiences (e.g., throwing a ball), patterns of brain activation are formed across multiple modalities, which are then integrated into a multimodal representation in memory (e.g., how a ball feels, looks, the action of throwing). Later on, when retrieving the experience from memory, the multimodal representation captured during the experience is reactivated to simulate how the brain represented perception and action. Even mental representations of abstract concepts are formed by simulations of perceptual experiences and interaction of the body with the environment (e.g., Barsalou 1999). Abstract concepts are understood by mapping sets of correspondences of concrete concepts to those of abstract concepts, and abstract concepts are therefore supposed to be situated in physical experiences associated with concrete concepts (Barsalou and Wiemer-Hastings 2005). According to this account, all cognitive activities are supported by simulation mechanisms that share a common representational system with neural systems ordinarily used for perception and action. This is in sharp contrast with mainstream cognitive psychology from the 1960s, 1970s, and 1980s that considered cognition in terms of manipulation of abstract or amodal symbolic representations within a network of other symbols, such as semantic networks. It is now relatively well-accepted that this view of cognition is at least incomplete because it cannot explain how symbolic representations are grounded when interacting with the world (e.g., de Vega et al. 2008).

An embodied approach with a specific focus on the contribution of action to comprehension is the Indexical Hypothesis (Glenberg and Kaschak 2002; Glenberg and Robertson 1999). This theory supposes that (language) comprehension requires objects (whether perceived directly or through abstract symbols like words) to be mapped onto action experiences. These experiences may be provided by actual interactions with the objects or by reactivation of patterns of brain activity that were formed during prior interactions (i.e., simulation). Hence, the perception of relevant objects triggers affordances for action (see Gibson 1979) that are stored in memory. Moreover, reasoning about (future) actions also relies on remembering affordances while suppressing perception of the environment (Glenberg et al. 1998).

In sum, embodied approaches to cognition suggest that mental simulations not only influence action observation but also how people acquire, interpret, and understand action information in their environment (e.g., Borghi et al. 2004; Dijkstra et al. 2007; Gallese and Lakoff 2005; Glenberg and Kaschak 2002).

Evidence for a tight link between action and comprehension

An increasing amount of empirical research has provided strong evidence for embodied accounts of action comprehension (for a review, see Glenberg 2007). Using both brain imaging techniques and behavioral tasks, researchers have demonstrated that the neural substrates used to perform an action are recruited when observing someone else perform an action and that this neural activation enables action understanding (e.g., Calvo-Merino et al. 2005, 2006; Flanagan and Johansson 2003). For example, the observation of actions done with different effectors (i.e., hand, foot, mouth) activates the same motor representations that are active during the actual execution of those same actions (Buccino et al. 2001).

Another line of evidence suggests that lower-order sensorimotor representations arising from mental simulation also play an important role in a variety of higher-order cognitive tasks such as language comprehension (e.g., Dijkstra et al. 2007; Gallese and Lakoff 2005; Glenberg and Kaschak 2002; Zwaan and Taylor 2006). For example, Zwaan et al. (2004) demonstrated that readers mentally simulate a motion described in language. Participants listened to a sentence suggesting a motion toward or away from the participant (e.g., “The pitcher hurled the softball to you” or “You hurled the softball at the shortstop”), followed by two consecutively presented pictures of the described object (e.g., the softball). The second picture was either smaller or larger than the first picture, thus suggesting movement toward or away from the participant. Judgments about the two pictures were made faster when the movement implied by the change in picture size matched the movement described in the sentence. These results, together with other studies, suggest that the mental representation responsible for our comprehension of motion sentences likely involves perceptual simulation of the described events (Stanfield and Zwaan 2001; Zwaan et al. 2002).

Moreover, several behavioral studies focusing on action effects have shown that perceptual simulation of a described action can affect motor performance and vice versa. For example, in a study by Glenberg and Kaschak (2002), participants listened to a sentence describing an action such as “He opened the drawer” and had to decide whether or not the sentence made sense by making an arm movement toward the body (i.e., compatible action) or a movement away from the body (i.e., incompatible action). This produced an action–sentence compatibility effect, meaning that responses were faster when the physical movements were compatible with the implied movements described in the sentences. Similarly, Zwaan and Taylor (2006) had participants listen to sentences that implied rotation in one direction (e.g., “Turn up the volume”), while judging the sensibility of the sentence by turning a knob. When the direction of the motor response corresponded to the direction of the motion implied by the sentence, sensibility judgments were faster. Sensibility judgments were slower when the turning direction of the motor response was opposite to the direction implied by the sentence (see also Dale et al. 2007; Solomon and Barsalou 2001). Finally, Klatzky et al. (1989) showed that sensibility judgments of sentences such as “Throwing a dart” were made faster when participants held their hands in a shape appropriate for the action of throwing a dart.

These behavioral studies demonstrating motor resonance in cognitive tasks like language comprehension are consistent with results from neural imaging. For example, Hauk et al. (2004) demonstrated greater activation of motor cortex controlling the hand while listening to verbs such as “pick” and greater activation of motor cortex controlling the leg while listening to “kick.” Such activations of motor areas in the brain have also been found with complete sentences (Tettamanti et al. 2005). Similarly, lexical decisions to words indicating leg or arm movements are facilitated when arm or leg areas in the brain are stimulated using transcranial magnetic stimulation (Pulvermüller et al. 2005).

Thus, it seems evident from the presented studies that understanding an action through direct observation or through linguistic description entails a mental simulation of that action which is based on the reactivation or reenactment of experiential traces associated with the described action.

Grounding Learning

Recently, researchers have started to investigate the possibility that we do not only use our bodies to understand actions but that directed actions can also guide learning. Several studies have shown that manipulating learners’ actions resulted in better text comprehension (e.g., Glenberg et al. 2008), or better problem solving (e.g., Thomas and Lleras 2009). Even learning about concepts or actions that do not spontaneously evoke motor resonance is facilitated when the content can in some ways be linked to people’s own body or action repertoire. For example, in the study by Thomas and Lleras (2009), participants worked on Maier’s classic string problem. This problem required participants to tie together two strings hanging from opposite sides of the room, with the aid of some seemingly irrelevant objects like a wrench and a paperback book. The strings were too short to just pick up one string and walk to the other string, so solving the problem required some solution strategy. The most efficient strategy was to attach an object to one of the strings and make it swing, then walk to the other string, catch the swinging string, and tie both strings together. During the problem-solving exercise, participants had several short exercise breaks. As an exercise, half of the participants made swinging movements with both arms, congruent with the movement of strings in the optimal solution strategy, whereas the other half stretched both arms straight out, incongruent with the solution strategy. After 16 min, 85% of the participants in the congruent movement condition had solved the problem, compared to 62% of participants in the incongruent movement condition. As participants were not aware of the link between the exercises and the problem solution, this suggests a direct link between the arm movements and problem-solving success.

In line with embodied theories of cognition, Thomas and Lleras’ (2009) study is based on the assumption that the construction of a mental representation is affected by physical activity during learning. The directed actions in this study thus exerted their influence at the level of sensorimotor experiences during learning. This suggests that if we wish to influence learners’ mental representations of dynamic systems, we should focus on the perceptual and motor experiences during learning from dynamic visualizations.

Rapp and Kurby (2008) have suggested that enriching or improving people’s mental representations should be directly guided by the external representations that are presented to learners. As animations provide a direct depiction of the movements in a dynamic system, the main challenge is to support the construction of a mental representation of these movements in the learners’ minds. From an embodied perspective, this means that the perceptual and motor experiences during learning should be related to the animation’s movements. This sensorimotor information then becomes part of the mental simulation and together with information stored in other modalities can be retrieved when necessary to form a multimodal representation of the depicted movement. In this way, learners’ mental representation of movements (whether they are human or not) can be enriched and their understanding of animations be improved.

In the following, four strategies are presented that could make learning from animations more effective by involving human action:

  • Let the learner follow the movements using gestures.

  • Make the learner manipulate the movements through interaction with the animation.

  • “Embody” the movements in the animation using a body metaphor.

  • Stimulate learners to reconstruct the perceptual processing of the movements at the test.

For each strategy, research investigating influences of action on cognitive performance is discussed that serves as support for the strategy. In addition, for each strategy, some concrete design implications are proposed.

Gestures

Gestures are a common, integral part of communication, and it is generally agreed that gestures can serve a wide variety of functions in communication (Goldin-Meadow 2003). Recent research within the embodied cognition framework has explicitly focused on gesturing and shown that gestures can play an important role in the learning process. One way gesturing might influence learning is by instructing people to “gesture along” during the learning task. For example, Cook et al. (2008) found that children who were encouraged to make gestures while learning a new arithmetical strategy were more likely to retain the knowledge they gained during instruction. In contrast, asking children to speak but not gesture during learning did not result in retention of the strategy. Cook et al. suggested that gesturing facilitated learning because it provides an alternative way of representing information, embodied in people’s actions. Similar findings have been obtained when children were encouraged to gesture before instruction (Broaders et al. 2007). Moreover, only meaningful gestures and gestures congruent with the learning task improved learning of mathematical concepts (Cook and Goldin-Meadow 2006).

In addition, Goldin-Meadow et al. (2001) argued that gesturing supports learning because it externalizes cognitive processes and thus reduces the task processing demands. In their study, Goldin-Meadow et al. asked children and adults to explain how they solved a mathematical problem while simultaneously remembering a list of letters or words. Both age groups remembered more items when they gestured during their explanations than when they did not gesture. It was concluded that gesturing freed up cognitive resources needed for the explanation task, allowing the speaker to allocate more cognitive resources to the memory task. In a related study, Ping and Goldin-Meadow (2010) showed that gestures reduced cognitive load even when speakers talked about objects that were not present and therefore could not be directly indexed by their gestures. Especially when gestures add information to a message rather than reproduce the same information, working memory load appears to be reduced.

There is also evidence from learning with multimedia that people do use gestures when they need to learn something demanding and that this facilitates learning (e.g., Schwartz and Black 1996). For example, results from a study by Hegarty et al. (2005) suggested that gestures were very helpful in mental animation of movements during a spatial reasoning task. Participants were asked to determine how one part of a mechanical system of rotating gears would move, if another part moved in a particular way. Results revealed that gestures contributed to quickly and accurately determining the direction of rotation. According to Hegarty et al., gesturing allowed learners to offload some cognitive processes during the demanding mental animation task onto the gears, thereby freeing up cognitive resources that could be used for trying to understand the task. In addition, gesturing also created an opportunity to map internal cognitive processes to physical objects in the environment.

Besides gestures produced by the hands, other body movements such as arm movements or even eye movements have been shown to facilitate learning. For example, Dijkstra and Kaschak (2006) showed participants a series of cards, each with an action verb printed on it, and asked them to either say the word out loud, act out the action using any part of their body, or retrieve memories associated with the action on the card. Results showed that memory performance for both the enactment group and the associating-memory group was better than that of the saying-out-loud group. This study suggests that consciously enacting a described action can facilitate its retrieval from memory. Other studies, like the study by Thomas and Lleras (2009) discussed earlier, showed that arm movements can facilitate cognitive performance even if participants are unaware of the relationship with the actual learning task.

In sum, gestures, whether made with the hands or with other body parts, facilitate learners’ understanding of the to-be-learned information or increase their problem-solving success by grounding comprehension processes in physical action. These findings suggest that making people gesture during study could be one way to support learning from animations.

Observing gestures during learning

Another way gestures might facilitate the learning process is by incorporating these gestures in the instruction. In general, it is found that learners understand an instruction better if it is provided by a teacher talking and gesturing than if it is provided in speech alone (Kelly 2001). This has been observed in various learning tasks including mathematical equivalence tasks (Church et al. 2004; Perry et al. 1995) and tasks involving symmetry (Valenzeno et al. 2003). Valenzeno et al. (2003), for example, investigated the influence of observing gestures during a lesson about the concept of symmetry. Children watching a videotaped lesson of a teacher who gestured during his or her explanation had a better understanding of the concepts that were taught than children watching a teacher not making gestures. In addition, it has been shown that learners understand spoken instructions better when the words in the instruction are accompanied by pointing hands linking the words to objects visible in the environment (i.e., “indexing”) than when they are not (Glenberg and Robertson 1999). Even gestures referring to concrete objects that are not present can facilitate learning. Ping and Goldin-Meadow (2008) gave learners Piagetian conservation tasks with or without gesture and with or without the actual objects described in the tasks. They found that whether or not these objects were present, instruction with speech and gesture resulted in better learning about conservation than instruction with speech alone. This suggests that even gestures referring to non-present objects during instruction can promote learning.

Furthermore, Singer and Goldin-Meadow (2005) instructed learners in mathematical equivalence problems by teaching them one or two correct problem-solving strategies in speech. In addition, their instruction contained either no gestures, gestures conveying the same strategy as in speech, or gestures conveying a different strategy from the one in speech. It was found that learners profited from instruction with gestures, but only when the gestures conveyed a different strategy than the one expressed in speech. These results suggest that if a problem-solving strategy needs to be learned, providing a second strategy can be helpful for learning, but only if this additional strategy is presented in gestures. Presumably, the use of multiple modalities, with its larger motor involvement, enables processing of both strategies by facilitating information processing and reducing cognitive load (Ping and Goldin-Meadow 2010).

In sum, these findings suggest that observing gestures performed by another agent aids understanding and facilitates learning. This implies that letting people watch gestures embedded in the animation could be a fruitful way to support learning from animations.

Implications for animations

The findings discussed above suggest that the effectiveness of learning from animations can be increased by making gestures or observing gestures during learning. These gestures should be related to the movements depicted in the animation, in order to aid retention and comprehension of the dynamic system displayed.

A straightforward suggestion would be to instruct learners to follow the movements in an animation with their hand or index finger. For example, if we want learners to understand the movements in a tire pump via animations, we could simply instruct them to manually follow the to-be learned movements in order to ground the observed movement in their own body movements. In turn, comprehension of the depicted system is likely to be facilitated as learners can directly link the animation’s movements to their own actions. Asking learners to reenact the movements they have just seen in an animation would be another way to foster learning from animations.

An alternative approach would be to include the gestures within the animation. An obvious option would be to include a so-called animated pedagogical agent in an animation. These lifelike characters serve as artificial tutors in multimedia instruction, and they support the learner using verbal and non-verbal modes of communication, including gestures (Atkinson 2000). The literature on the effectiveness of these gesturing agents reveals mixed effects. Better learning performances were obtained in experiments conducted by Atkinson (2002), Lusk and Atkinson (2007), and Baylor and Kim (2009), whereas no effects or even negative effects on learning were found in some other experiments (Choi and Clark 2006; Craig et al. 2002). However, the gestures in these studies were only used for directing learners’ attention and not for embodying the movements displayed in the animation. According to the embodied cognition view, in order to be effective, gestures should be related to the movements in an animation. For example, in teaching the dynamics of rotating gears, an on-screen pedagogical agent could imitate the rotations in gesture by following the movements in the animation with his or her hands rather than just pointing to the gears when providing an explanation. An even simpler solution would be to include just a pointing hand or finger in the animation that follows the movements.

In sum, learning from animations could be improved by letting learners “follow the movement,” either by gesturing themselves or by watching someone else’s gestures.

Manipulation

It is becoming increasingly evident that enactment of actions leads to better retrieval than just verbal description of these actions (e.g., Koriat and Pearlman-Avnion 2003). One line of evidence for the benefits of active manipulation of instructional materials comes from studies on language comprehension. Glenberg et al. (2004) describe an experiment in which young children read a text about activities in a particular scenario (e.g., a farm scenario) and manipulated real toys (e.g., animals, farmer, barn) that were in front of them so they could portray the actions in the passage (e.g., the farmer walked in the barn). Compared to reading alone (i.e., no toy manipulation), children’s manipulation of toy objects as directed by a narrative improved their story recall and their understanding of the spatial relationships, and made them better at drawing inferences from the story. In addition, after a brief training, similar findings were obtained when children were instructed to imagine manipulating the toys. Furthermore, manipulating images of toys on a computer screen benefits children’s learning to a similar extent as physical manipulation of toys (Glenberg et al. 2011). Glenberg et al. (2011) also showed that, compared to re-reading, manipulating toys on a computer results in better comprehension after an interval of 1 week. These findings encourage the use of active manipulation of objects during instruction and suggest that this is an effective way to enhance comprehension. Extending the findings of Glenberg et al. (2004), Marley et al. (2007) have demonstrated that even when learners observe another person manipulating toys to represent text content, comprehension is improved compared to only reading the text.

Furthermore, manipulation of instructional content presented as virtual 3-D visualizations or real-life tasks also seems to benefit learning. Allowing learners to explore the instructional content from different perspectives by (manually) changing viewpoint is an effective way to foster learning. For example, in a recent study by Keehner et al. (2009), participants watched a 3-D visualization; some could change the view of the stimulus object by rotating a real plastic egg they held in their hands, whereas others could not interact with the visualization. Results showed that active manipulation of the object, and hence its appearance on the screen, resulted in better task performance than just looking at the visualization. Similarly, it has been shown that performance on a spatial inference task was improved when participants had access to visualizations of an object that allowed them to rotate the object with the computer mouse (Cohen and Hegarty 2007). Furthermore, Bivall et al. (2011) showed that in manipulation tasks, providing sensory information to learners can improve their understanding of visualizations. Two groups of participants interacted with (i.e., moved and rotated) a 3-D visual protein model, allowing them to look at the moving virtual molecules, but differed in whether they could feel the repulsive and attractive forces between molecules through tactile feedback with a haptic device. Results revealed that learners receiving tactile feedback learned more about the protein model and included more force-based statements in their explanations than those learning without the haptic interface. Similarly, it has been shown that medical procedures can be learned more effectively if learners do not only observe the to-be-learned skills but also receive tactile feedback during learning (Dang et al. 2001).

These studies suggest that interacting with instructional materials, either by manipulating real objects that produce changes in a referent on the screen or by direct manipulation of the object itself, facilitates learning. Additionally, involving sensory information in the form of tactile feedback in a manipulation task might also improve learners’ understanding.

Research investigating the value of learning through physical or virtual manipulation of materials also shows that physical manipulation of actual objects is not always necessary for better learning and comprehension. In a study by Triona and Klahr (2003), for example, children learned to design unconfounded experiments using a computer interface that closely mimicked the physical materials; the same instructional script was used in both the physical and the virtual condition. Triona and Klahr showed that children learned this control-of-variables strategy, and transferred the skill to a new domain, equally well whether they constructed experiments by manipulating a physical apparatus or by manipulating virtual objects on a computer screen. Studies by Klahr et al. (2007) and Zacharia and Olympiou (2010) also provide evidence that manipulation of real objects does not seem to be a prerequisite for learning specific skills or causal relations in a task; the same learning performance can be obtained with virtual objects. Similarly, in the domain of multimedia learning, Ferguson and Hegarty (1995) showed that it makes little difference for mechanical reasoning and problem solving whether the functioning of a dynamic system (i.e., a pulley system) is learned from a real machine or from line diagrams. With both types of media, learners improved equally on the learning task and in their understanding of it. Differences were only found in learners’ ability to transfer their knowledge, with people with hands-on experience performing better than those studying line drawings. The opportunity to manipulate a real object, as well as the object’s realism, was responsible for this difference.
This suggests that, at least when equal opportunities to interact are provided, it does not make much of a difference for people’s memory and understanding whether real or virtual objects are manipulated, but that learners’ ability to transfer the acquired skills or knowledge is improved through manipulation of real objects. Nevertheless, differences in learner performance might emerge as a result of several aspects of the learning situation, such as the affordances an object provides to a learner. For example, Manches et al. (2010) investigated the effect of physical and virtual representations (i.e., blocks) on primary school children’s strategies in a numerical partitioning task. Results showed that the use of the mouse constrained learners who manipulated virtual objects to moving only one piece at a time, whereas in the physical condition learners could use both hands to move multiple pieces simultaneously. Hence, less effective strategies were used in the virtual manipulative group, resulting in worse performance. These findings suggest that the affordances of interactive tools may affect subsequent learning strategies and learning outcomes. Subtle differences in study design such as these should therefore be carefully considered when using interactive features during learning.

The above findings are in line with research on movement understanding, which has shown that imagining object rotations occurs more quickly when people can physically rotate an object with their hands, even if they are not directly touching the object (Wexler and Klam 2001). For example, participants in a study by Schwartz and Holton (2000) had to study an object, mentally rotate it according to the instructions, and then pull a string that turned a table on which the object rested so that the object became congruent with their mental image of it. Pulling the string facilitated participants’ ability to imagine the rotation of the object compared to mental rotation alone. It thus seems sufficient for learners to use a rough approximation of reality (e.g., manipulating a computerized object via the mouse), or simply to make available ways in which objects can be manipulated, in order to facilitate comprehension by linking the instructional content to their own action repertoire.

Implications for animations

One well-known design principle in learning from animations is the interactivity principle, which states that animations are better understood when learners can control the pace of an animation (e.g., rewind, replay, start, stop), or can influence what will happen next in the animation by changing parameters, than when they receive a continuous presentation (Bétrancourt 2005). Results of previous studies investigating interactivity in animations are, however, mixed. Some studies have found significant advantages of interactivity (e.g., Schwan and Riempp 2004), whereas others have found no difference between interactive and non-interactive conditions (e.g., Lowe 2004). Control over pace is a simple surface feature at the interface level that enables learners to manipulate just the timing and order of the movements. The research on object manipulation during learning discussed above, however, suggests that learning from animations can be improved when learners manipulate the movements themselves, and not just the pace of the animation. Active manipulation thereby provides a promising alternative way for learners to interact with animations in order to improve their understanding. Though this form of interactivity is not very difficult to implement in animations, it is hardly used. The research on object manipulation suggests several possibilities for using manipulation in animations. Learners could, for instance, manipulate real objects that translate to a referent on the computer screen, such as rotating a joystick or an actual object in a certain direction to turn a gear of a pulley system displayed on the screen, which shows the results of the user’s actions. In addition, learners could manipulate a replica model of the system depicted in an animation.
For example, when learning via animation about the rotation of the earth around the sun, learners could (re)enact the movements shown in the animation during the presentation using the replica. The findings by Glenberg et al. (2008) suggest that even imagining manipulating an animation might facilitate learning, although this might require some practice. Besides involving the learner directly, depicting another person or a pedagogical agent manipulating a system in an animation might also facilitate learning (Marley et al. 2007). In each of these strategies, the possibility to link the movements to people’s own bodily experiences is fundamental to successful learning.

Body Metaphors

In contrast to concrete objects, abstract objects are not physical entities in the world. Even though our bodies do not have direct physical experiences with abstract objects, embodied theories of cognition suggest that the mental representations of abstract concepts are grounded in sensorimotor experiences because they are understood in terms of concrete concepts which are acquired through experiences with the environment (Barsalou 1999; Lakoff and Johnson 1980). Metaphors provide a good example of how abstract concepts are understood in terms of concrete concepts. For example, we talk about the abstract concept “relationships” as if they were journeys with beginnings, middles, ends, rocky parts, and smooth parts. In this respect, abstract cognition relies on bodily experiences with concrete situations.

Several recent studies have shown that (body) metaphors can strengthen learning, which might also have important implications for facilitating learning from animations. Wilson and Gibbs (2007) showed that comprehension of a metaphor that contains a reference to a bodily movement is enhanced if participants actually make such a movement or imagine doing so. Engaging in or imagining a body action, such as chewing, before reading a metaphorical phrase, such as “chew on the idea,” facilitates the construction of the abstract concept as a physical entity, which speeds up comprehension of metaphorical action phrases. This process can also be reversed, by using a bodily metaphor for a movement to ground the mental representation of that movement. An interesting example of this process is given by Amorim et al. (2006). They embodied a simple mental rotation task by adding a “head” to the abstract stimulus that had to be rotated, or by replacing the stimulus with a picture of a human body in a comparable posture. Both body metaphors led to faster reaction times and fewer errors on the mental rotation task. Amorim et al. (2006) argued that task performance improved because the metaphors facilitated mapping of the cognitive coordinate system of one’s body onto the abstract shape. In turn, this spatial embodiment improves object shape matching. These studies suggest that understanding movements and actions in animations is influenced by the extent to which learners can map their own body movements onto the displayed movements.

Implications for animations

The above-mentioned findings have important implications for learning from animations. Based on the Amorim et al. (2006) study, one suggestion for how this could be done in animations is to add human characteristics to moving elements in the animation. For example, when asking learners to study how the rotation of one gear influences the rotation of adjacent gears, the gears could be replaced with human heads. This way, learners could make a bodily projection (Lakoff and Johnson 1999) to map their own body axis onto the referent body (i.e., the head) in the animation. As another example, mechanical or biological animations often visualize processes that cannot be observed in real life, such as increasing pressure in the heart chambers. Understanding of these processes could be facilitated by relating them to concrete physical experiences with similar processes, like holding a balloon in one’s hands and feeling how it expands as air is blown into it and shrinks when air is released. For example, while studying an animation about increasing and decreasing pressure, learners could be instructed to place their hands on a small plastic ball that increases and decreases in size in congruence with the pressure changes shown in the animation.

Reconstructive Eye Movements

From memory research, it has become increasingly evident that reenacting the behavior and specific “body” movements involved in the encoding phase can help during the retrieval phase. Recent examples include the findings that a congruent body posture (Dijkstra et al. 2007) and reinstating effortful encoding procedures during the test (Dewhurst and Brandt 2007) aid memory retrieval. These studies suggest that a multi-sensory memory trace of an event is constructed that can be retrieved more easily when the testing situation is sensorially congruent with the original event than when it is not.

Eye movement studies provide compelling evidence for the facilitating role of reenactment in memory retrieval of visual scenes by showing that eye movements can have a functional role in retrieval (Ferreira et al. 2008). Eye movements can especially support the processing of visuospatial aspects of mental representations. Participants listening to stories about skyscrapers while looking at a blank screen in front of them often make eye movements in the direction (upward or downward) of the verbally described motion (Spivey and Geng 2001; see also Altmann and Kamide 2004). In addition, memory performance seems related to the extent to which eye movements during study overlap with those made during the test phase. Spivey and Geng (2001) demonstrated that people systematically looked at the blank region of the screen when attempting to recall properties of an object (e.g., that a skyscraper has levels) that previously occupied that location. Similarly, Laeng and Teodorescu (2002) reported two experiments in which participants first inspected a visual scene and were then asked to imagine it while looking at a blank screen. Eye movements were recorded during both tasks, and participants could move their eyes however they liked. Results showed that participants fixated similar locations, in a similar order, in the perceptual study task and the imagery task. Moreover, the eye movements during the imagery and perception tasks predicted memory accuracy. Stated differently, participants’ eye movements in the imagery task reenacted the eye movements from the initial study task.

In another series of experiments, Richardson and Spivey (2000), Richardson and Kirkham (2004), and Hoover and Richardson (2008) extended these results to combinations of visual and auditory information. Participants watched a video clip on a computer screen of an actor (i.e., a talking head or a rabbit) verbally providing a piece of factual information. The talking heads appeared in turn in each of the four quadrants of a 2 × 2 grid, providing a different statement in each quadrant. Afterward, participants heard a statement and judged whether it was true or false while looking at the blank grid. The results indicated that during the true/false decisions, twice as many fixations were made on the quadrant in which the relevant information had been provided as on the other quadrants.

Although these studies demonstrate a tight relationship between eye movements during encoding and retrieval, the findings do not provide conclusive evidence on the causality of this relationship. That is, these studies cannot distinguish whether eye movements follow memory processes or vice versa. In fact, much controversy surrounds this issue, and the topic is actively debated in the visual search and eye movement literature (Nairne 2002). In a recent review, Ferreira et al. (2008) concluded that eye movements and memory processes most likely interact: people “look at nothing” during memory retrieval because reactivation of memory representations drives the eyes to previously viewed locations, and those viewings enhance subsequent memory retrieval (p. 1). Thus, irrespective of whether eye movements initially precede memory processes or vice versa, once eye movements are directed at previously meaningful locations, they can serve a facilitating role in memory retrieval.

In sum, the general finding is that people tend to show the same eye movement pattern during both encoding and retrieval of visual information, even if this information is absent during retrieval. A tentative suggestion from the above findings is that people’s memory performance could be facilitated by engaging in reconstructive eye movements during retrieval.

Implications for animations

Animation processing typically involves the perceptual identification of relevant movements in the depicted scene. According to an embodied account, the resulting memory trace includes all perceptual processes taking place during the learning experience. To reactivate the resulting mental model of the dynamic system, it might thus be very effective to support learners in a test situation in reconstructing the perceptual processes that took place while watching the animation. For example, learners could be provided with a static picture of the animation as a kind of spatial marker for memory retrieval of the dynamic information (Spivey et al. 2009). Instructing participants to look at specific locations in the static picture might further support the reconstruction of the movements. Asking participants to watch a blank screen while thinking about the answer to a test question might also automatically stimulate reconstruction of the eye movements and thus aid memory performance.

Discussion

In this article, we have focused on the underlying nature of mental representations and its implications for improving learning from instructional animations about human and non-human movements. From the research reported above, it should be abundantly evident that human action plays a fundamental role in a wide variety of cognitive tasks, even those that usually have no direct relation to people’s actions or bodies. Action and cognition are no longer seen as separate modules in the brain working independently. Rather, there are tight links between action and cognition, and the mental representations that are formed during cognitive tasks are grounded in the perceptual and motor systems (Barsalou 2008). Action, either executed or simulated, fulfills a functional role in many cognitive processes, including visual object recognition, motion perception, problem solving, understanding relations, and memory retrieval. In line with embodied theories of cognition, understanding movements involves, at least in part, a perceptual-motor simulation of the movement. Moreover, and perhaps most importantly from an educational or instructional perspective, guided actions related to movements, such as gestures or object manipulation, can influence cognitive performance. Several examples of how this could lead to better understanding and comprehension have been given in the research reported above. However, research evidence on the utility of such linkage for educational practice is only just emerging.

Given the causal link between action and cognition, there seems to be good reason to expect that learning from animations could be facilitated by relating the depicted movements and events to people’s own actions. The mental representations that we hope to engender in learners are directly influenced by the external experiences they have with the environment. Therefore, the external representations offered to learners should guide their construction of a (simulated) mental representation of the content (Rapp and Kurby 2008). By designing animations depicting human and non-human movements in ways that align with how mental representations are formed (i.e., through perceptual and motor experience), it may be possible to help people understand those movements better. Instructional animations should then be created that exemplify and encourage sensorimotor activities that help learners understand the movement depicted in the animation. Some of these activities might rely on perceptual features, while others might focus more on motor activities.

In our article, we have suggested four different strategies to enhance learning from dynamic visualizations. These “embodied” strategies have of course not been put to the test yet, at least not within the domain of instructional animations, so we are in dire need of research to establish their potential for educational practice. Moreover, it is yet unclear whether each strategy is equally suitable and/or effective for each kind of dynamic process, or each kind of learner, or for combinations of the two. To guide these research efforts, we will discuss several critical issues that could be taken into account when investigating each strategy.

Let the learner follow the movements using gestures

The general idea behind this guideline is that learners either follow a depicted movement using their own body parts (e.g., hands) or observe someone else’s movements depicted within the animation. A potential limitation of this guideline is that learners can only follow one or two movements at a time, so animations depicting several movements happening simultaneously may pose a real problem. Sequencing the movements might be a good option, for example by sequentially highlighting or cueing different movements (as in De Koning et al. 2010a). Another limitation might be that using the hand or another limb to follow a movement on-screen can interfere with perceiving the animation, especially as the hand will automatically cover parts of the screen. This could be solved by having participants follow the movement not on the screen but on another surface, for example by moving a mouse or other pointing device in front of them. An alternative would be to show the hand or other limb following the movements on-screen, by including it within the animation. Of course, watching a moving hand somewhat reduces the amount of motor activation that learners would have generated had they made the movements themselves, but this may be outweighed by the absence of interference caused by one’s own hands moving in front of the screen.

Finally, just “following” the movements, whether done by the learners or shown on-screen, is a rather passive strategy that does not automatically induce constructive mental processes. Especially when learners have some prior knowledge and have no difficulty processing the dynamics depicted in the animation, a more constructive strategy might lead to superior learning results. An interesting option would be to let learners reconstruct the movement they just saw rather than follow it. This way, learners actively process the information, which contributes to the construction of a mental model of the dynamic system. Moreover, reconstructing the movement requires considerable retrieval effort, which can help prevent memory decay, as abundantly demonstrated in the testing effect literature (Roediger and Karpicke 2006).

In sum, using gestures to embody a movement depicted in a dynamic visualization seems a very promising strategy. It is relatively easy to implement and requires hardly any changes to the animation (except for the addition of a “moving hand” when including the gesture within the animation). How gestures can be used optimally will depend on many factors, such as the complexity of the movements in the animation and/or the learners’ expertise.

Make the learner manipulate the movements through interaction with the animation

Instead of making learners “follow” or “reconstruct” a movement, our second suggestion is to let learners manipulate the movements themselves through interaction with the animation. This way, the learner’s processing of a movement is automatically coupled to action, as each movement requires motor input from the learner. Again, a number of issues should be considered.

First, an important issue is the design of the interaction. To “embody” the movement, it is as yet unclear whether the interaction needs to map one-to-one onto the depicted motion, or whether it is sufficient for the interaction to set off the motion without following along. To give an example, is it necessary to rotate a wheel by “dragging it around,” or is it sufficient to “push a button” that makes the wheel spin? On the one hand, a “make the movement” interaction closely resembles our “follow the movement” strategy, with the only difference that the learner rather than the system is causing the motion. A “start the motion” interaction requires much less motor activity, and the actual movement does not necessarily resemble the movement depicted on-screen. It thus seems that a “start the motion” interaction will lead to less motor resonance, and hence less learning improvement, than a “make the movement” interaction. On the other hand, research on learner interaction in animations has shown that providing relatively minimal interaction options, like being able to stop and play an animation, can already increase learning performance (e.g., Mayer and Chandler 2001; Hasler et al. 2007). So, aligning these interactive options with the actual movement depicted might further improve learning, without learners having to “follow along” with every movement.

A potential drawback of the manipulation strategy is that learners can make all kinds of suboptimal choices, like clicking randomly or leaving several interactive options unused. Research on learner control shows mixed results on its benefits, which are at least partly due to differences in how learners use the provided interactive options (Williams 1996). For example, in the study by Keehner et al. (2009), many learners watching a 3-D visualization used the interactive control rather ineffectively. Moreover, the best interactive learners did not outperform learners who saw an optimal information presentation without any interactive options. However, in the Keehner study, the learning task was not about understanding dynamics but about understanding a spatial structure. The manipulation was thus aimed at changing viewpoints, which required a rather complex interaction strategy from the learner. Making objects move to understand a system’s dynamics is a simpler kind of interaction that learners can perform rather intuitively. Nevertheless, it might be a good idea to couple the manipulation strategy with a comprehension strategy, such as predicting or imagining the movements, to fully realize the potential of the interaction. Prior research has shown that making learners predict a motion trajectory from a static picture can increase their learning of the dynamic system (e.g., Hegarty et al. 2003).

In sum, manipulation of movement is a potentially strong strategy for embodying the understanding of dynamic systems. A number of open questions remain, however, about what kinds of movement manipulation support learning best and how learners can be supported in making optimal choices when interacting. The strategy also requires that animations be designed to allow for different forms of interaction. Interestingly, new technologies like touch screens and controllers that react to body movements enable new ways of interacting that may resemble the depicted movement more closely and thus strengthen the embodiment of the movement.

“Embody” the movements in the animation using a body metaphor

Our third strategy is based on the idea that learners will understand a movement much better if it is presented using one of their own body movements as a metaphor. For example, an animation about the movements of a hoisting crane could use the movements of the arm as a metaphor. This could be accomplished by superimposing a picture of a human arm on the crane, or by giving the crane human-like features, such as an elbow or fingers. A strength of this strategy is that movements of machines are often described in a rather anthropomorphic way, as in “the crane gently put down the container” or “the cranes moved like dancers.” Many body metaphors are thus already available in language and might aid the designer of an embodied animation. At the same time, a mismatch between the body metaphor and the actual movement can induce misconceptions. For example, a hoisting crane can rotate through 360°, whereas this is impossible for a human arm. Nevertheless, picking the right metaphor may help learners “embody” a movement “as if they were moving themselves.” Moreover, modern 3-D virtual reality displays might make the experience even more “immersive” by suggesting that the learner in fact “is” the moving object.

Stimulate learners to reconstruct the perceptual processing of the movements at the test

Our fourth suggestion is that reproducing the eye movements made during inspection of the animation will stimulate recall and help learners “run” their mental model of the dynamic system. A limitation of this guideline is that the underlying evidence for the relation between reproduced eye movements and memory performance is correlational, so the justification for the guideline is rather speculative. Nevertheless, the strategy is easy to implement and does not require adaptations to the animation. The strategy also points to a potential negative effect of using text-based questions in a test after studying dynamic visualizations: questions that require a lot of eye movements for reading might interfere with reconstructive eye movements and thus with test performance.

Of course, the strategy is aimed not at the learning phase but at the test phase. Its effectiveness is thus limited to improving retrieval and reconstruction processes; it will not improve the mental model of the dynamic system itself. However, it can easily be combined with one of the other strategies. Especially if body movements were made during the learning phase, reconstructing both eye movements and body movements at the test is likely to boost retrieval most effectively. Another interesting avenue is the use of eye tracking devices that enable the “replay” of eye movements. Van Gog et al. (2005) have used replays of eye movements as a “retrospective cueing” technique to aid retrieval of thought processes. This technique was also applied to animation research by De Koning et al. (2010b) and could easily be adapted to investigate the effect of reactivating eye movements on test performance, for example by replaying the eye movements on a blank screen.

In sum, encouraging reconstructive eye movements at the test is an easy-to-apply strategy that can be combined with any of the other strategies we have suggested.

Conclusion

In our article, we have suggested different ways of “embodying” learning from animations, which pertain to the animation, the learner, and the test situation. First, the animation itself can be embodied. Introducing a human-like character manipulating the depicted system, or expressing an animation’s movements in gesture, is likely to foster learning. In addition, using a bodily metaphor in an animation to represent the movements in the dynamic system provides an opportunity to improve learning. A second way to facilitate learning from animations is to have the learners embody the dynamic system. For example, understanding might be enhanced by asking learners to reenact or follow movements through gesturing. In situations where interaction with the materials is possible, learners might improve their understanding of the content by actively manipulating movements physically (i.e., with real objects) or virtually (i.e., with objects or movements on the computer screen). Alternatively, learners could be instructed to imagine manipulating the presented movements or objects. Third, the test situation might be embodied as well. Learning performance might be enhanced by stimulating learners to reconstruct their eye movements, which serve as retrieval cues.

In sum, these suggestions might open a fruitful avenue for further research on instructional animations that is firmly grounded in current theories of embodied cognition. Pursuing this challenge can be very useful for developing new and innovative ways to increase the effectiveness of animations, as well as for explaining the nature and underlying mechanisms of any observed learning gains.