Abstract
We use efficient coding principles borrowed from sensory neuroscience to derive the optimal population of neurons for encoding rewards drawn from a probability distribution. We find that the response properties of dopaminergic reward prediction error neurons in a rodent and a primate data set resemble those of the efficient code in many ways: the neurons have a broad distribution of midpoints covering the reward distribution; neurons with higher thresholds have higher gains, more convex tuning functions, and lower slopes; and their slopes are higher when the reward distribution is narrower. Furthermore, we derive learning rules that converge to this efficient code; the learning rule for the position of a neuron on the reward axis closely resembles the learning rule of distributional reinforcement learning. Thus, reward prediction error neuron responses may be optimized to broadcast an efficient reward signal, forming a connection between efficient coding and reinforcement learning, two of the most successful theories in computational neuroscience.
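For readers unfamiliar with the distributional reinforcement learning rule referenced above, a minimal sketch of its standard quantile-regression form from that literature (the symbols \(\theta_i\), \(\tau_i\), \(\alpha\), and \(r\) are illustrative notation, not taken from this paper) is
\[
\theta_i \;\leftarrow\; \theta_i + \alpha\,\bigl(\tau_i - \mathbf{1}[\,r < \theta_i\,]\bigr),
\]
where \(\theta_i\) is the position (threshold) of neuron \(i\) on the reward axis, \(\tau_i \in (0,1)\) its asymmetry level, \(\alpha\) a learning rate, and \(r\) the received reward; under this update \(\theta_i\) converges in expectation to the \(\tau_i\)-quantile of the reward distribution. This is a generic illustration from the distributional reinforcement learning literature, not necessarily the exact rule derived in the paper.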
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
This is a general overhaul of the paper for the next review step. A new task and additional predictions have been added, all figures and results have been revised, and a suggestion for how the efficient code may be learned has been included.
1 Ganguli and Simoncelli [18] used the interval [0, N], where N is the number of neurons, which is entirely equivalent.