Reinforcement learning with modulated spike timing dependent synaptic plasticity

J Neurophysiol. 2007 Dec;98(6):3648-65. doi: 10.1152/jn.00364.2007. Epub 2007 Oct 10.

Abstract

Spike timing-dependent synaptic plasticity (STDP) has emerged as the preferred framework linking patterns of pre- and postsynaptic activity to changes in synaptic strength. Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general-purpose learning. On the other hand, algorithms for reinforcement learning work on a wide variety of problems, but lack an experimentally established neural implementation. Here, we combine these paradigms in a novel model in which a modified version of STDP achieves reinforcement learning. We build this model in stages, identifying a minimal set of conditions needed to make it work. Using a performance-modulated modification of STDP in a two-layer feedforward network, we can train output neurons to generate arbitrarily selected spike trains or population responses. Furthermore, a given network can learn distinct responses to several different input patterns. We also describe in detail how this model might be implemented biologically. Thus, our model offers a novel and biologically plausible implementation of reinforcement learning that is capable of training a neural population to produce a very wide range of possible mappings between synaptic input and spiking output.
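
The abstract does not give the update rule itself, so the following is only a minimal sketch of what a performance-modulated STDP update could look like in general, not the authors' model. It assumes a standard exponential STDP window and a scalar modulation term equal to performance minus a baseline. All names (`stdp_window`, `eligibility`, `modulated_update`) and all parameter values are hypothetical and chosen for illustration.

```python
import math


def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0):
    """Exponential STDP window (assumed form); dt = t_post - t_pre in ms.

    Pre-before-post (dt > 0) gives a positive value, post-before-pre
    (dt < 0) gives a negative value.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def eligibility(pre_spikes, post_spikes):
    """Sum the STDP window over all pre/post spike pairs of one synapse."""
    return sum(
        stdp_window(t_post - t_pre)
        for t_pre in pre_spikes
        for t_post in post_spikes
    )


def modulated_update(weight, pre_spikes, post_spikes,
                     performance, baseline, learning_rate=0.01):
    """Scale the summed STDP change by a performance signal (assumption).

    When performance exceeds the baseline the ordinary STDP change is
    applied; when it falls below, the sign of the change is reversed.
    """
    modulation = performance - baseline
    return weight + learning_rate * modulation * eligibility(pre_spikes, post_spikes)


if __name__ == "__main__":
    pre, post = [10.0], [15.0]  # presynaptic spike 5 ms before postsynaptic
    print(modulated_update(0.5, pre, post, performance=1.0, baseline=0.0))
    print(modulated_update(0.5, pre, post, performance=0.0, baseline=1.0))
```

In this sketch the same spike pairing strengthens the synapse after a better-than-baseline trial and weakens it after a worse one, which is the basic idea behind letting a performance signal gate STDP.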

MeSH terms

  • Algorithms
  • Electrophysiology
  • Learning / physiology*
  • Models, Neurological
  • Neural Networks, Computer
  • Neuronal Plasticity / physiology*
  • Patch-Clamp Techniques
  • Reinforcement, Psychology*
  • Synapses / physiology*