The statistical structures of reinforcement learning with asymmetric value updates

https://doi.org/10.1016/j.jmp.2018.09.002
Under a Creative Commons license
open access

Abstract

Reinforcement learning (RL) models have been broadly used in modeling the choice behavior of humans and other animals. In standard RL models, the action values are assumed to be updated according to the reward prediction error (RPE), i.e., the difference between the obtained reward and the expected reward. Numerous studies have noted that the magnitude of the update is biased depending on the sign of the RPE. The bias is represented in RL models by differential learning rates for positive and negative RPEs. However, it is not well understood which aspects of behavioral data the estimated differential learning rates reflect. In this study, we investigate how the differential learning rates influence the statistical properties of choice behavior (i.e., the relation between past experiences and the current choice) based on theoretical considerations and numerical simulations. We clarify that when the learning rates differ, the impact of a past outcome depends on the subsequent outcomes, in contrast to standard RL models with symmetric value updates. Based on the results, we propose a model-neutral statistical test to validate the hypothesis that value updates are asymmetric. The asymmetry in the value updates induces autocorrelation of choice (i.e., the tendency to repeat the same choice or to switch the choice irrespective of past rewards). Conversely, if an RL model without an intrinsic autocorrelation factor is fitted to data that possess an intrinsic autocorrelation, the fit is statistically biased toward overestimating the difference in learning rates. We demonstrate that this bias can cause a statistical artifact in RL-model fitting leading to a “pseudo-positivity bias” and a “pseudo-confirmation bias.”
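
The claim that the impact of a past outcome depends on the subsequent outcomes can be illustrated with a minimal sketch. It is not the authors' code: the function names, the binary rewards in {0, 1}, the initial value of 0.5, and the learning-rate values are all assumptions chosen for illustration. The sketch applies an asymmetric update to a single action value and measures how much flipping the first outcome changes the value after a second outcome.

```python
def update_value(q, reward, alpha_pos, alpha_neg):
    """One value update with a learning rate chosen by the sign of the RPE.

    Hypothetical helper: alpha_pos is used for non-negative RPEs,
    alpha_neg for negative RPEs.
    """
    rpe = reward - q  # reward prediction error
    alpha = alpha_pos if rpe >= 0 else alpha_neg
    return q + alpha * rpe


def final_value(rewards, alpha_pos, alpha_neg, q0=0.5):
    """Apply the update to a sequence of rewards, starting from q0 (assumed)."""
    q = q0
    for r in rewards:
        q = update_value(q, r, alpha_pos, alpha_neg)
    return q


def impact_of_first_outcome(later_reward, alpha_pos, alpha_neg):
    """Change in the final value when the first outcome flips from 0 to 1,
    holding the second outcome fixed at later_reward."""
    rewarded = final_value([1.0, later_reward], alpha_pos, alpha_neg)
    unrewarded = final_value([0.0, later_reward], alpha_pos, alpha_neg)
    return rewarded - unrewarded


# Illustrative parameter values (assumptions, not estimates from data).
for label, a_pos, a_neg in [("asymmetric", 0.6, 0.2), ("symmetric", 0.4, 0.4)]:
    after_reward = impact_of_first_outcome(1.0, a_pos, a_neg)
    after_no_reward = impact_of_first_outcome(0.0, a_pos, a_neg)
    print(f"{label}: impact {after_reward:.2f} if a reward follows, "
          f"{after_no_reward:.2f} if no reward follows")
```

Under these assumed values, the first outcome's impact is 0.16 when a reward follows and 0.32 when no reward follows in the asymmetric case, whereas it is 0.24 in both cases when the two learning rates are equal. The later update scales the earlier difference by one minus whichever learning rate that later outcome selects, which is why the dependence vanishes under symmetric updates.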

Keywords

Reinforcement learning
Asymmetric value update
Learning rate
Choice perseverance
Model fitting
Logistic regression
