Abstract
The exploration–exploitation trade-off is a fundamental problem in reinforcement learning. A target search task in which exploration and exploitation phases alternate is useful for studying the neural mechanisms involved in this problem. Monkeys well trained in this task clearly recognize when they have entered the exploratory phase and quickly acquire new experiences by resetting their previous ones. In this study, we used a simple model to show that resetting experience in the exploratory phase improves performance more than simply decreasing the greediness of action selection does, and we then present a neural-network-type model that enables such experience resetting.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
sakamoto{at}tohoku-mpu.ac.jp