Abstract
The in vivo responses of dorsal raphe nucleus (DRN) serotonin neurons to emotionally salient stimuli are a puzzle. Existing theories centred on reward, surprise, or uncertainty individually account for some aspects of serotonergic activity but not others. Here we find a unifying perspective in a biologically constrained predictive code for cumulative future reward, a quantity called state value in reinforcement learning. Through simulations of trace conditioning experiments common in the serotonin literature, we show that our theory, called value prediction, intuitively explains phasic activation by both rewards and punishments, a preference for surprising rewards but the absence of a corresponding preference for punishments, and contextual modulation of tonic firing: observations that currently form the basis of many and varied serotonergic theories. Next, we re-analysed data from a recent experiment and found serotonin neurons whose activity patterns are a surprisingly close match: our theory predicts the marginal effect of reward history on population activity to a precision of ≪0.1 Hz neuron⁻¹. Finally, we directly compared our theory against quantitative formulations of existing ideas and found that it best explains both within-trial activity dynamics and trial-to-trial modulations, typically performing several times better than the closest alternative. Overall, our results show that previous models are not wrong, but incomplete, and that reward, surprise, salience, and uncertainty are simply different faces of a predictively encoded value signal. By unifying previous theories, our work represents an important step towards understanding the potentially heterogeneous computational roles of serotonin in learning, behaviour, and beyond.
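For readers unfamiliar with reinforcement learning, state value has a standard textbook definition as the expected discounted sum of future rewards; the sketch below uses that standard notation and is not the paper's specific biologically constrained formulation:

$$V(s) \;=\; \mathbb{E}\left[\,\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s\,\right], \qquad 0 \le \gamma < 1,$$

where $r_t$ denotes the reward received at time $t$ and $\gamma$ is a discount factor that downweights temporally distant rewards.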
Competing Interest Statement
The authors have declared no competing interest.