Learning and transfer of working memory gating policies

doi:10.1016/j.cognition.2017.12.001

Cognition

Volume 172, March 2018, Pages 89-100

https://doi.org/10.1016/j.cognition.2017.12.001

Abstract

Abstract knowledge about the tasks we encounter enables us to rapidly and flexibly adapt to novel task contexts. Previous research has focused primarily on abstract rules that leverage shared structure in stimulus-response (S-R) mappings as the basis of such task knowledge. Here we provide evidence that working memory (WM) gating policies – a type of control policy required for internal control of WM during a task – constitute a form of abstract task knowledge that can be transferred across contexts. In two experiments, we report specific evidence for the transfer of selective WM gating policies across changes of task context. We show that this transfer is tied not to shared structure in S-R mappings, but instead to the dynamic structure of the task. Collectively, our results highlight the importance of WM gating policies in particular, and control policies in general, as a key component of the task knowledge that supports flexible behavior and task generalization.

Introduction

Humans display remarkable cognitive flexibility in novel task environments (McClelland, 2009). Given only verbal instruction, we rapidly adapt to new tasks, often achieving asymptotic levels of performance within just a few trials (Ackerman, 1988; Bhandari and Duncan, 2014; Ruge and Wolfensteller, 2010; Wolfensteller and Ruge, 2011). Such rapid adaptation relies, in part, on abstract task knowledge transferred from prior experience with other tasks. Abstract task knowledge captures regularities in the space of task environments, and can thus speed up learning in the new environment by reducing the size of the learning problem (Botvinick et al., 2009; Cole et al., 2011; Collins and Frank, 2013; Gershman and Niv, 2010).

What form does such abstract task knowledge take? The vast majority of prior studies seeking to address this question have focused on rules, or stimulus-response (S-R) mappings, as the basis of task knowledge. In these frameworks, abstract rules generalize prior knowledge and thus constrain the (usually) very large space of stimulus-response-outcome contingencies afforded by a novel task environment (Badre, Kayser, & D'Esposito, 2010). Such rules can be either instructed (Cohen-Kdoshay and Meiran, 2007; Cohen-Kdoshay and Meiran, 2009; Cole et al., 2010; Meiran et al., 2015; Ruge and Wolfensteller, 2010) or transferred from prior experiences (Cole et al., 2011; Collins and Frank, 2013) to rapidly enable successful behavior in novel environments.

The implementation of a task, however, requires more than just the knowledge of stimulus-response contingencies. Even the simplest everyday task environments have dynamical structure, with events unfolding in a specific order, and with specific timing (Radvansky & Zacks, 2014). To achieve task goals in a dynamic task environment, then, one must also learn an internal control policy or task model aligned to the task’s dynamic structure for the moment-by-moment control of internal cognitive processing (Bhandari and Duncan, 2014; Duncan et al., 2008). Such implementational control policies are not typically communicated via instruction and must be discovered and implemented “on the fly”, through task experience. In other words, a ‘task-set’ must incorporate knowledge about implementational control contingencies beyond those specified in stimulus-response mappings (Rogers & Monsell, 1995).

In this paper, we ask whether control policies are themselves a form of abstract task knowledge that, like rules, can be transferred to novel task contexts. Just as different real-world tasks often share stimulus-response-outcome contingencies, they also share other forms of dynamic structure (Botvinick et al., 2015; Schank and Abelson, 1977). Such shared structure affords an opportunity for generalization of internal control policies. Instead of learning new control policies from scratch, humans may build repertoires of internal control policies that are re-used in novel tasks.

We operationalize this question within the domain of working memory (WM) control, i.e., the selective use of working memory. WM control has been extensively analyzed within the gating framework (see Fig. 1), in which access to WM is controlled by a set of input and output gates (Chatham and Badre, 2015; O'Reilly and Frank, 2006; Todd et al., 2009). The contents of WM can be selectively updated by operating an input gate that determines whether stimulus information can enter WM. Similarly, operating a selective output gate allows WM contents to selectively influence downstream processing. Learning to perform a WM task, therefore, involves learning a gating policy for operating input and output gates in a moment-by-moment, task-appropriate manner (Frank & Badre, 2012). In the context of WM, a gating policy is an example of a control policy that must be aligned to the dynamic structure of the task. By learning such WM gating policies and transferring them across task contexts, humans may be able to exploit regularities in the dynamic structure of tasks.
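The input-gate and output-gate distinction can be made concrete with a toy sketch. This is only an illustration of the idea described above, not the computational models in the cited work; the class name `GatedWorkingMemory`, the predicate-style gates, and the stimulus labels are all hypothetical choices made for this example.

```python
from typing import Callable, Dict, List


class GatedWorkingMemory:
    """Toy WM store whose access is controlled by two gates (illustrative only)."""

    def __init__(self,
                 input_gate: Callable[[str], bool],
                 output_gate: Callable[[str], bool]) -> None:
        self.input_gate = input_gate    # decides whether a stimulus may enter WM
        self.output_gate = output_gate  # decides whether a stored item may drive behavior
        self.store: Dict[str, str] = {}

    def present(self, label: str, value: str) -> None:
        """Offer a stimulus to WM; it is stored only if the input gate opens."""
        if self.input_gate(label):
            self.store[label] = value

    def read_out(self) -> List[str]:
        """Return only the stored values that the output gate lets through."""
        return [v for k, v in self.store.items() if self.output_gate(k)]


# Hypothetical gating policy: block distractors at input, release only "shape" at output.
wm = GatedWorkingMemory(
    input_gate=lambda label: label != "distractor",
    output_gate=lambda label: label == "shape",
)
wm.present("colour", "red")
wm.present("shape", "square")
wm.present("distractor", "noise")
print(sorted(wm.store))  # what was allowed into WM
print(wm.read_out())     # what is allowed to influence the response
```

In this sketch a gating policy is simply the pair of predicates passed to the constructor, so changing the policy means swapping the predicates while leaving the store itself untouched.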

To test this possibility, we adopt the second-order WM control task employed by Chatham, Frank, and Badre (2014). In their task, participants saw a sequence of three items on every trial, one of which specified a context. The context signaled which of the other two items in the sequence was the target item. Critically, there were two kinds of task structures: ‘context first’ (CF) trials, on which the first item in the sequence was the context item, and ‘context last’ (CL) trials, on which the last item in the sequence was the context item. CF and CL trials afford the use of different WM gating policies. On a CL trial, subjects had to employ a ‘selective output-gating policy’ that allowed the storage of both lower-level items in WM (a non-selective input-gating operation), and the retrieval of the target item for guiding response selection (a selective output-gating operation). On a CF trial, while a similar selective output-gating policy could be employed, a more efficient ‘selective input-gating policy’ was possible. Such a policy would enable proactive coding of the contextual cue in WM, followed by selective input-gating of only the relevant lower-level item contingent on context. This allows a reduction in both WM load and interference from the competing non-target during response selection. Indeed, Chatham et al. (2014) presented evidence that CF and CL trials are treated differently and that well-trained subjects employ selective input-gating policies on CF trials to improve performance relative to CL trials, on which the selective output-gating policy is required.
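The difference between the two policies on CF and CL trials can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the task code or model used by Chatham et al. (2014): the names `Item`, `make_trial`, and `run_trial`, the stimulus labels "A" and "B", and the use of peak lower-level item count as a stand-in for WM load are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    kind: str   # "context" or "lower"
    slot: int   # lower item: its own slot (0 or 1); context item: the slot it marks as target
    value: str  # hypothetical stimulus label


def make_trial(structure: str, target_slot: int) -> List[Item]:
    """Build a three-item sequence with the context first (CF) or last (CL)."""
    context = Item("context", target_slot, "cue")
    lowers = [Item("lower", 0, "A"), Item("lower", 1, "B")]
    if structure == "CF":
        return [context] + lowers
    if structure == "CL":
        return lowers + [context]
    raise ValueError(f"unknown structure: {structure}")


def run_trial(sequence: List[Item], policy: str) -> Tuple[str, int]:
    """Return (selected target value, peak number of lower-level items held in WM)."""
    if policy not in ("selective_input", "selective_output"):
        raise ValueError(f"unknown policy: {policy}")
    wm: List[Item] = []
    target_slot: Optional[int] = None
    peak_load = 0
    for item in sequence:
        if item.kind == "context":
            target_slot = item.slot  # the context is always encoded
            continue
        context_known = target_slot is not None
        if policy == "selective_input" and context_known:
            # Selective input gating: admit only the item the context marks as target.
            if item.slot == target_slot:
                wm.append(item)
        else:
            # Non-selective input gating: store every lower-level item.
            wm.append(item)
        peak_load = max(peak_load, len(wm))
    # Output gating: release only the target item to guide the response.
    target = next(it for it in wm if it.slot == target_slot)
    return target.value, peak_load


if __name__ == "__main__":
    for structure in ("CF", "CL"):
        for policy in ("selective_input", "selective_output"):
            print(structure, policy, run_trial(make_trial(structure, 1), policy))
```

In this sketch, every combination selects the correct target, but only the selective input-gating policy on a CF trial keeps a single lower-level item in WM. On a CL trial the context is not yet known when the items arrive, so the sketch falls back to storing both items and relies on selective output gating, mirroring the asymmetry described above.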

In the context of this WM control task, we ask whether selective gating policies learned in one task setting are transferred to a novel task setting. For instance, subjects exposed to an environment with only CL trials would learn a selective output-gating policy. Would this policy transfer to a new block with CF trials? In Experiment 1 we find a pattern of transfer effects that supports the hypothesis that a previously learned gating policy influences initial behavior in a novel setting. We replicate these findings in Experiment 2. In addition, we provide evidence that transferred gating policies are dissociable from S-R mappings and have a much larger influence on subsequent behavior. We interpret these findings as evidence that internal control policies comprise an important form of structural task knowledge that supports behavior in novel situations.

Section snippets

Participants

Eighty-five adult, right-handed participants (34 males, 51 females; age range: 18–30, M = 21.4, SD = 2.7) from the Providence, RI area were recruited to take part in a computer-based behavioral experiment. We aimed to recruit between 18 and 20 participants in each of four groups based on approximate effect sizes suggested by pilot data. One participant was excluded for prior neurological injuries, and three were excluded because they were on psychoactive medication. Five participants were excluded because of low

Experiment 2

The results of Experiment 1 provide evidence for the transfer of working memory gating policies. We observed an asymmetric negative transfer across changes in trial structure, even as S-R rule structure was held constant. We also observed target-position effects that provided specific evidence for the transfer of a selective output-gating policy learnt in the CL task context to a CF task context. In addition, we found an improvement in performance when the same trial structure was held constant

Discussion

Psychologists have focused almost exclusively on the relations between stimuli, contexts, responses and outcomes as a framework for understanding cognitive control and our ability to adapt and generalize to novel task environments. In this paper, we developed the hypothesis that internal control policies, required for coordinating cognitive processing during a task, form an essential component of task knowledge independently of the stimulus-response (S-R) rule structure of the task. Thus, we

Author contributions

A. Bhandari and D. Badre designed the study. A. Bhandari conducted the experiments and analyzed the data. A. Bhandari and D. Badre wrote the paper.

Data citation

The stimulus materials and the data from the experiments reported in this paper are publicly available on the Open Science Framework website (Bhandari, 2017).

Acknowledgements

We thank Ryan K. Fugate, Aja Evans, Celia Ford, and Adriane Spiro for assistance with data collection. This work was supported by grants from NINDS (NS065046) and NIMH (MH099078, MH111737) at the NIH, and a MURI award from the Office of Naval Research, United States (N00014-16-2832).

References (45)

D. Badre et al. Frontal cortex and the discovery of abstract action rules. Neuron (2010)

A. Bhandari et al. Goal neglect and knowledge chunking in the construction of novel behaviour. Cognition (2014)

M.M. Botvinick et al. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition (2009)

M.M. Botvinick et al. Reinforcement learning, efficient coding, and the statistics of natural tasks. Current Opinion in Behavioral Sciences (2015)

D.A. Braun et al. Motor task variation induces structural learning. Current Biology (2009)

D.A. Braun et al. Structure learning in action. Behavioural Brain Research (2010)

T.S. Braver. The variable nature of cognitive control: A dual mechanisms framework. Trends in Cognitive Sciences (2012)

C.H. Chatham et al. Multiple gates on working memory. Current Opinion in Behavioral Sciences (2015)

C.H. Chatham et al. Corticostriatal output gating during selection from working memory. Neuron (2014)

S.J. Gershman et al. Learning latent structure: Carving nature at its joints. Current Opinion in Neurobiology (2010)

F. Mathy et al. What’s magic about magic numbers? Chunking and data compression in short-term memory. Cognition (2012)

H.A. Simon. The structure of ill structured problems. Artificial Intelligence (1974)

P.L. Ackerman. Determinants of individual differences during skill acquisition: Cognitive abilities and information processing. Journal of Experimental Psychology: General (1988)

A. Allport et al. Task switching, stimulus-response bindings, and negative priming. Control of cognitive processes: Attention and performance (2000)

A. Bhandari