Abstract
This work studies how people make predictions, within a reinforcement learning framework, in an environment that fluctuates from trial to trial and is corrupted by Gaussian noise. We developed a computer-based experiment in which participants predicted the future location of a spaceship orbiting planet Earth. The spaceship's position was sampled from a Gaussian distribution whose mean changed at a variable velocity, with four values of variance defining our signal-to-noise conditions. Three reinforcement learning models, implemented with hierarchical Bayesian modeling, were proposed as candidates to describe our data. The first two are the standard delta-rule and its Bayesian counterpart, the Kalman filter. The third is a delta-rule that incorporates a velocity component updated from prediction errors. The main advantage of the latter over the first two is that it assumes participants estimate the trial-by-trial changes in the mean of the distribution generating the observations. We used leave-one-out cross-validation and the Widely Applicable Information Criterion (WAIC) to compare the predictive accuracy of the models. Overall, our results provided evidence in favor of the model with the velocity term and showed that the learning rate of velocity and the decision noise change depending on the value of the signal-to-noise ratio. Finally, we modeled these changes using an extension of the model's hierarchical structure that allows us to make prior predictions for untested signal-to-noise conditions.
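To make the third model concrete, the following is a minimal sketch of a delta-rule with a velocity term driven by prediction errors. The exact parameterization used in the paper is not given in the abstract; the function name and the two learning rates (`alpha` for position, `beta` for velocity) are illustrative assumptions.

```python
import numpy as np

def velocity_delta_rule(observations, alpha=0.3, beta=0.1):
    """Delta-rule with a velocity component updated from prediction errors.

    Hypothetical parameterization: `alpha` is the learning rate on the
    position estimate, `beta` the learning rate on the velocity estimate.
    Returns the trial-by-trial predictions.
    """
    predictions = np.zeros(len(observations))
    pred = observations[0]  # initialize at the first observation
    vel = 0.0               # estimated trial-to-trial drift of the mean
    for t, obs in enumerate(observations):
        predictions[t] = pred
        error = obs - pred          # prediction error on this trial
        vel += beta * error         # update the drift (velocity) estimate
        pred += alpha * error + vel # delta-rule step plus estimated drift
    return predictions
```

Because the velocity estimate accumulates prediction errors, this learner can track a mean that moves at a constant rate, which a plain delta-rule systematically lags behind.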