RT Journal Article
SR Electronic
T1 Compositional clustering in task structure learning
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 196923
DO 10.1101/196923
A1 Nicholas Franklin
A1 Michael J. Frank
YR 2017
UL http://biorxiv.org/content/early/2017/10/02/196923.abstract
AB Humans are remarkably adept at generalizing knowledge between experiences in a way that can be difficult for computers. Often, this entails generalizing constituent pieces of experiences that do not fully overlap with, but nonetheless share useful similarities with, previously acquired knowledge. However, it is often unclear how knowledge gained in one context should generalize to another. Previous computational models and data suggest that rather than learning about each individual context, humans build latent abstract structures and learn to link these structures to arbitrary contexts, facilitating generalization. In these models, task structures that are more popular across contexts are more likely to be revisited in new contexts. However, these models predict that structures are either reused as a whole or created from scratch, prohibiting the ability to generalize constituent parts of learned structures. This contrasts with ecological settings, where some aspects of task structure, such as the transition function, will be shared between contexts separately from other aspects, such as the reward function. Here, we develop a novel non-parametric Bayesian agent that forms independent latent clusters for transition and reward functions, which may have different popularity across contexts. We compare this agent to an agent that jointly clusters both across a range of task domains. We show that the relative performance of the two agents depends on the statistics of the task domain, including the mutual information between transition and reward functions in the environment and the stochasticity of the observations. We formalize our analysis through an information-theoretic account of the priors and develop a meta-learning agent that can dynamically arbitrate between strategies across task domains. We argue that this provides a first step in allowing for compositional structures in reinforcement learners, which should provide a better model of human learning and additional flexibility for artificial agents.
Author summary: A musician may learn to generalize behaviors across instruments for different purposes, for example, reusing hand motions used when playing classical on the flute to play jazz on the saxophone. Conversely, she may learn to play a single song across many instruments that require completely distinct physical motions, but nonetheless transfer knowledge between them. This degree of compositionality is often absent from computational frameworks of learning, forcing agents either to generalize entire learned policies or to learn new policies from scratch. Here, we propose a solution to this problem that allows an agent to generalize components of a policy independently and compare it to an agent that generalizes components as a whole. We show that the degree to which one form of generalization is favored over the other depends on the features of the task domain, with independent generalization of task components favored in environments with weak relationships between components or high degrees of noise, and joint generalization of task components favored when there is a clear, discoverable relationship between task components.
Furthermore, we show that the overall meta structure of the environment can be learned and leveraged by an agent that dynamically arbitrates between these forms of structure learning.
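Illustrative note: the record above only summarizes the modeling approach. As a rough sketch of the distinction between joint and independent clustering of contexts, the Python snippet below uses a Chinese Restaurant Process (CRP) prior over context-to-cluster assignments, which is one standard non-parametric choice consistent with the abstract's statement that more popular structures are more likely to be reused. The paper's exact generative model, likelihoods, and hyperparameters are not given in this record; the function name crp_prior, the concentration parameter alpha, and the toy counts are assumptions for illustration only.

    # Illustrative sketch (not the authors' code): a CRP prior over
    # context-to-cluster assignments, contrasting "joint" clustering (one
    # assignment governing both transition and reward structure) with
    # "independent" clustering (separate assignments for each).
    import numpy as np

    def crp_prior(counts, alpha=1.0):
        """Prior probability of assigning a new context to each existing
        cluster or to a new one.

        counts: number of contexts already assigned to each cluster.
        Returns an array of length len(counts) + 1; the last entry is the
        probability of opening a new cluster.
        """
        counts = np.asarray(counts, dtype=float)
        weights = np.append(counts, alpha)  # popular clusters are more likely to be reused
        return weights / weights.sum()

    # Joint clustering: a single CRP over (transition, reward) structure pairs.
    joint_counts = [3, 1]                   # e.g., 3 contexts use structure A, 1 uses B
    print("joint prior:", crp_prior(joint_counts))

    # Independent clustering: separate CRPs for transition and reward structure,
    # so a new context can reuse a popular transition cluster while creating a
    # brand-new reward cluster (or vice versa).
    transition_counts = [4]                 # all 4 contexts share one transition function
    reward_counts = [2, 2]                  # but split evenly across two reward functions
    p_T = crp_prior(transition_counts)
    p_R = crp_prior(reward_counts)
    print("independent prior over (transition, reward) pairs:")
    print(np.outer(p_T, p_R))               # factorized prior over combinations

Under the factorized (independent) prior, transition and reward assignments carry no mutual information about each other, which offers one way to see why the abstract ties the relative advantage of each agent to the mutual information between transition and reward functions in the environment.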