Original Articles
Learning and transfer of working memory gating policies
Introduction
Humans display remarkable cognitive flexibility in novel task environments (McClelland, 2009). Given only verbal instruction, we rapidly adapt to new tasks, often achieving asymptotic levels of performance within just a few trials (Ackerman, 1988, Bhandari and Duncan, 2014, Ruge and Wolfensteller, 2010, Wolfensteller and Ruge, 2011). Such rapid adaptation relies, in part, on abstract task knowledge transferred from prior experience with other tasks. Abstract task knowledge captures regularities in the space of task environments, and can thus speed up learning in the new environment by reducing the size of the learning problem (Botvinick et al., 2009, Cole et al., 2011, Collins and Frank, 2013, Gershman and Niv, 2010).
What form does such abstract task knowledge take? The vast majority of prior studies seeking to address this question have focused on rules, or stimulus-response (S-R) mappings, as the basis of task knowledge. In these frameworks, abstract rules generalize prior knowledge and thus constrain the (usually) very large space of stimulus-response-outcome contingencies afforded by a novel task environment (Badre, Kayser, & D'Esposito, 2010). Such rules can be either instructed (Cohen-Kdoshay and Meiran, 2007, Cohen-Kdoshay and Meiran, 2009, Cole et al., 2010, Meiran et al., 2015, Ruge and Wolfensteller, 2010) or transferred from prior experience (Cole et al., 2011, Collins and Frank, 2013) to rapidly enable successful behavior in novel environments.
The implementation of a task, however, requires more than just the knowledge of stimulus-response contingencies. Even the simplest everyday task environments have dynamical structure, with events unfolding in a specific order, and with specific timing (Radvansky & Zacks, 2014). To achieve task goals in a dynamic task environment, then, one must also learn an internal control policy or task model aligned to the task’s dynamic structure for the moment-by-moment control of internal cognitive processing (Bhandari and Duncan, 2014, Duncan et al., 2008). Such implementational control policies are not typically communicated via instruction and must be discovered and implemented “on the fly”, through task experience. In other words, a ‘task-set’ must incorporate knowledge about implementational control contingencies beyond those specified in stimulus-response mappings (Rogers & Monsell, 1995).
In this paper, we ask whether control policies are themselves a form of abstract task knowledge that, like rules, can be transferred to novel task contexts. Just as different real-world tasks often share stimulus-response-outcome contingencies, they also share other forms of dynamic structure (Botvinick et al., 2015, Schank and Abelson, 1977). Such shared structure affords an opportunity for generalization of internal control policies. Instead of learning new control policies from scratch, humans may build repertoires of internal control policies that are re-used in novel tasks.
We operationalize this question within the domain of working memory (WM) control, i.e., the selective use of working memory. WM control has been extensively analyzed within the gating framework (see Fig. 1), in which access to WM is controlled by a set of input and output gates (Chatham and Badre, 2015, O'Reilly and Frank, 2006, Todd et al., 2009). The contents of WM can be selectively updated by operating an input gate that determines whether stimulus information can enter WM. Similarly, operating a selective output gate allows WM to selectively influence downstream processing. Learning to perform a WM task, therefore, involves learning a gating policy for operating input and output gates in a moment-by-moment, task-appropriate manner (Frank & Badre, 2012). In the context of WM, a gating policy is an example of a control policy that must be aligned to the dynamic structure of the task. By learning such WM gating policies and transferring them across task contexts, humans may be able to exploit regularities in the dynamic structure of tasks.
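The gating framework can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than the model used in the cited work: `GatedWM`, `input_gate`, and `output_gate` are hypothetical names, and the gates are fixed boolean predicates instead of learned quantities.

```python
from typing import Callable, Dict, List

Gate = Callable[[str, str], bool]  # (slot, item) -> open or closed


class GatedWM:
    """Toy working-memory store with an input gate and an output gate (hypothetical)."""

    def __init__(self, input_gate: Gate, output_gate: Gate) -> None:
        self.input_gate = input_gate    # decides whether a stimulus may enter WM
        self.output_gate = output_gate  # decides whether a stored item may drive behavior
        self.store: Dict[str, str] = {}

    def present(self, slot: str, item: str) -> bool:
        """Offer a stimulus to WM; it is stored only if the input gate opens."""
        if self.input_gate(slot, item):
            self.store[slot] = item
            return True
        return False

    def read_out(self) -> List[str]:
        """Return only the stored items that the output gate lets through."""
        return [item for slot, item in self.store.items()
                if self.output_gate(slot, item)]


# Non-selective input gate, selective output gate: both items are stored,
# but only the item in slot "item_2" influences downstream processing.
wm = GatedWM(input_gate=lambda slot, item: True,
             output_gate=lambda slot, item: slot == "item_2")
wm.present("item_1", "A")
wm.present("item_2", "B")
print(wm.read_out())  # ['B']
```

In this sketch a gating policy is simply the pair of predicates handed to `GatedWM`; learning a policy would amount to learning which predicates to use at each moment of the task.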
To test this possibility, we adopt the second-order WM control task employed by Chatham, Frank, and Badre (2014). In their task, participants saw a sequence of three items on every trial, one of which specified a context. The context signaled which of the other two items in the sequence was the target item. Critically, there were two kinds of task structures: ‘context first’ (CF) trials, on which the first item in the sequence was the context item, and ‘context last’ (CL) trials, on which the last item in the sequence was the context item. CF and CL trials afford the use of different WM gating policies. On a CL trial, subjects had to employ a ‘selective output-gating policy’ that allowed the storage of both lower-level items in WM (a non-selective input-gating operation), and the retrieval of the target item for guiding response selection (a selective output-gating operation). On a CF trial, while a similar selective output-gating policy could be employed, a more efficient ‘selective input-gating policy’ was possible. Such a policy would enable proactive coding of the contextual cue in WM, followed by selective input-gating of only the relevant lower-level item contingent on context. This allows a reduction in both WM load and interference from the competing non-target during response selection. Indeed, Chatham et al. (2014) presented evidence that CF and CL trials are treated differently and that well-trained subjects employ selective input-gating policies on CF trials to improve performance relative to CL trials, on which the selective output-gating policy is required.
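The difference between the two policies can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the task or analysis code from the study: the function names are hypothetical, the context is assumed to code the target's position (0 or 1) among the two lower-level items, and WM load is counted as the number of representations held at once, context included.

```python
from typing import List, NamedTuple, Tuple, Union

Event = Tuple[str, Union[str, int]]  # ("item", name) or ("context", target position)


class Result(NamedTuple):
    response: str   # item retrieved to guide response selection
    peak_load: int  # maximum number of representations held in WM


def make_trial(structure: str, items: Tuple[str, str], target: int) -> List[Event]:
    """Build a three-event trial with the context first (CF) or last (CL)."""
    item_events: List[Event] = [("item", items[0]), ("item", items[1])]
    context: Event = ("context", target)
    return [context] + item_events if structure == "CF" else item_events + [context]


def selective_output_gating(trial: List[Event]) -> Result:
    """Store everything (non-selective input), then read out only the target."""
    stored: List[str] = []
    context = None
    peak = 0
    for kind, value in trial:
        if kind == "context":
            context = value
        else:
            stored.append(value)
        peak = max(peak, len(stored) + (context is not None))
    return Result(stored[context], peak)


def selective_input_gating(trial: List[Event]) -> Result:
    """Hold the context first, then admit only the item it marks as the target."""
    if trial[0][0] != "context":
        raise ValueError("selective input gating needs the context before the items")
    context = trial[0][1]
    stored = [value for position, (_, value) in enumerate(trial[1:])
              if position == context]
    return Result(stored[0], 1 + len(stored))


cf = make_trial("CF", ("A", "B"), target=1)
cl = make_trial("CL", ("A", "B"), target=0)
print(selective_input_gating(cf))   # Result(response='B', peak_load=2)
print(selective_output_gating(cf))  # Result(response='B', peak_load=3)
print(selective_output_gating(cl))  # Result(response='A', peak_load=3)
```

Under these assumptions both policies retrieve the correct target on a CF trial, but selective input gating holds one fewer representation, while on a CL trial only the selective output-gating policy can be applied because the context arrives after the items.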
In the context of this WM control task, we ask whether selective gating policies learned in one task setting are transferred to a novel task setting. For instance, subjects exposed to an environment with only CL trials would learn a selective output-gating policy. Would this policy transfer to a new block with CF trials? In Experiment 1, we find a pattern of transfer effects that supports the hypothesis that a previously learned gating policy influences initial behavior in a novel setting. We replicate these findings in Experiment 2. In addition, we provide evidence that transferred gating policies are dissociable from S-R mappings and have a much larger influence on subsequent behavior. We interpret these findings as evidence that internal control policies comprise an important form of structural task knowledge that supports behavior in novel situations.
Participants
85 adult, right-handed participants (34 males, 51 females; age-range: 18–30, M = 21.4, SD = 2.7) from the Providence, RI area were recruited to take part in a computer-based behavioral experiment. We endeavored to collect between 18 and 20 participants in each of four groups based on approximate effect sizes suggested by pilot data. 1 participant was excluded for prior neurological injuries, and 3 were excluded because they were on psychoactive medication. 5 participants were excluded because of low
Experiment 2
The results of Experiment 1 provide evidence for the transfer of working memory gating policies. We observed an asymmetric negative transfer across changes in trial structure, even as S-R rule structure was held constant. We also observed target-position effects that provided specific evidence for the transfer of a selective output-gating policy learnt in the CL task context to a CF task context. In addition, we found an improvement in performance when the same trial structure was held constant
Discussion
Psychologists have focused almost exclusively on the relations between stimuli, contexts, responses and outcomes as a framework for understanding cognitive control and our ability to adapt and generalize to novel task environments. In this paper, we developed the hypothesis that internal control policies, required for coordinating cognitive processing during a task, form an essential component of task knowledge independently of the stimulus-response (S-R) rule structure of the task. Thus, we
Author contributions
A. Bhandari and D. Badre designed the study. A. Bhandari conducted the experiments and analyzed the data. A. Bhandari and D. Badre wrote the paper.
Data citation
The stimulus materials and the data from the experiments reported in this paper are publicly available on the Open Science Framework website (Bhandari, 2017).
Acknowledgements
We thank Ryan K. Fugate, Aja Evans, Celia Ford, and Adriane Spiro for assistance with data collection. This work was supported by grants from NINDS (NS065046) and NIMH (MH099078, MH111737) at the NIH, and a MURI award from the Office of Naval Research, United States (N00014-16-2832).
References (45)
- et al. Frontal cortex and the discovery of abstract action rules. Neuron (2010)
- et al. Goal neglect and knowledge chunking in the construction of novel behaviour. Cognition (2014)
- et al. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition (2009)
- et al. Reinforcement learning, efficient coding, and the statistics of natural tasks. Current Opinion in Behavioral Sciences (2015)
- et al. Motor task variation induces structural learning. Current Biology (2009)
- et al. Structure learning in action. Behavioural Brain Research (2010)
- The variable nature of cognitive control: A dual mechanisms framework. Trends in Cognitive Sciences (2012)
- et al. Multiple gates on working memory. Current Opinion in Behavioral Sciences (2015)
- et al. Corticostriatal output gating during selection from working memory. Neuron (2014)
- et al. Learning latent structure: Carving nature at its joints. Current Opinion in Neurobiology (2010)
- What’s magic about magic numbers? Chunking and data compression in short-term memory. Cognition
- The structure of ill structured problems. Artificial Intelligence
- Determinants of individual-differences during skill acquisition - cognitive-abilities and information-processing. Journal of Experimental Psychology-General
- Task switching, stimulus-response bindings, and negative priming. Control of cognitive processes: Attention and performance
- Learning and transfer of working memory gating policies
- Compression in visual working memory: using statistical regularities to form more efficient memory representations. Journal of Experimental Psychology: General
- The representation of instructions in working memory leads to autonomous response activation: Evidence from the first trials in the flanker paradigm. The Quarterly Journal of Experimental Psychology (Hove)
- The representation of instructions operates like a prepared reflex: Flanker compatibility effects found in first trial following S-R instructions. Experimental Psychology
- Prefrontal dynamics underlying rapid instructed task learning reverse with practice. Journal of Neuroscience
- Rapid transfer of abstract rules to novel contexts in human lateral prefrontal cortex. Frontiers in Human Neuroscience
- Cognitive control over learning: Creating, clustering, and generalizing task-set structure. Psychological Review
- Reasoning, learning, and creativity: Frontal lobe function and human decision-making. PLoS Biology