PT - JOURNAL ARTICLE
AU - Wang, Jane X.
AU - Kurth-Nelson, Zeb
AU - Kumaran, Dharshan
AU - Tirumala, Dhruva
AU - Soyer, Hubert
AU - Leibo, Joel Z.
AU - Hassabis, Demis
AU - Botvinick, Matthew
TI - Prefrontal Cortex as a Meta-Reinforcement Learning System
AID - 10.1101/295964
DP - 2018 Jan 01
TA - bioRxiv
PG - 295964
4099 - http://biorxiv.org/content/early/2018/04/06/295964.short
4100 - http://biorxiv.org/content/early/2018/04/06/295964.full
AB - Over the past twenty years, neuroscience research on reward-based learning has converged on a canonical model, under which the neurotransmitter dopamine ‘stamps in’ associations between situations, actions and rewards by modulating the strength of synaptic connections between neurons. However, a growing number of recent findings have placed this standard model under strain. In the present work, we draw on recent advances in artificial intelligence to introduce a new theory of reward-based learning. Here, the dopamine system trains another part of the brain, the prefrontal cortex, to operate as its own free-standing learning system. This new perspective accommodates the findings that motivated the standard model, but also deals gracefully with a wider range of observations, providing a fresh foundation for future research.