Functional Requirements for Reward-Modulated Spike-Timing-Dependent Plasticity
Nicolas Frémaux, Henning Sprekeler, W. Gerstner
Published 2010 in Journal of Neuroscience
ABSTRACT
Recent experiments have shown that spike-timing-dependent plasticity is influenced by neuromodulation. We derive theoretical conditions for successful learning of reward-related behavior for a large class of learning rules where Hebbian synaptic plasticity is conditioned on a global modulatory factor signaling reward. We show that all learning rules in this class can be separated into a term that captures the covariance of neuronal firing and reward and a second term that represents the influence of unsupervised learning. The unsupervised term, which is, in general, detrimental for reward-based learning, can be suppressed if the neuromodulatory signal encodes the difference between the reward and the expected reward, but only if the expected reward is calculated for each task and stimulus separately. If several tasks are to be learned simultaneously, the nervous system needs an internal critic that is able to predict the expected reward for arbitrary stimuli. We show that, with a critic, reward-modulated spike-timing-dependent plasticity is capable of learning motor trajectories with a temporal resolution of tens of milliseconds. The relation to temporal difference learning, the relevance of block-based learning paradigms, and the limitations of learning with a critic are discussed.
PUBLICATION RECORD
- Publication year
2010
- Venue
Journal of Neuroscience
- Publication date
2010-10-06
- Fields of study
Biology, Medicine, Psychology
- Source metadata
Semantic Scholar, PubMed
CONCEPTS
- covariance of neuronal firing and reward
The component of the learning-rule decomposition that reflects how neuronal activity covaries with reward.
- expected reward
The predicted reward level used as a reference for modulatory signaling.
- internal critic
An internal predictor that estimates expected reward for arbitrary stimuli.
Aliases: critic
- motor trajectories
Time-varying movement patterns used as the target behavior in the learning example.
- neuromodulatory signal
The global reward-related modulatory factor that gates synaptic plasticity.
- reward-modulated spike-timing-dependent plasticity
A spike-timing-dependent plasticity rule whose synaptic updates are gated by a global reward-related modulatory factor.
- simultaneous task learning
A setting in which several tasks are learned at the same time.
- task- and stimulus-specific expected reward
The expected reward estimated separately for each task and stimulus condition.
- temporal difference learning
A reinforcement-learning framework that updates reward predictions using the difference between successive predictions over time.
- unsupervised learning term
The component of the learning-rule decomposition associated with unsupervised plasticity effects independent of reward.
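The split into a covariance term and an unsupervised term, and the role of a stimulus-specific expected reward, can be illustrated with a small numerical sketch. It is a toy model under stated assumptions, not the paper's simulation: the two stimuli, their mean values, the `COUPLING` constant, and all function names are hypothetical. The Hebbian term and the reward share a noise source, so within each stimulus their covariance equals `COUPLING` by construction. The sketch compares the average weight change for three modulatory signals: raw reward, reward minus a single global average, and reward minus a per-stimulus average.

```python
import random
from collections import defaultdict

# Hypothetical toy statistics (not from the paper): for each stimulus,
# (mean Hebbian term, mean reward).
STIMULI = {"A": (1.0, 0.8), "B": (3.0, 0.2)}
COUPLING = 0.5  # assumed strength with which reward follows the shared noise


def make_trials(num_trials=20000, seed=0):
    """Draw (stimulus, hebbian_term, reward) triples from the toy model."""
    rng = random.Random(seed)
    names = sorted(STIMULI)
    trials = []
    for _ in range(num_trials):
        stim = rng.choice(names)
        mean_h, mean_r = STIMULI[stim]
        shared = rng.gauss(0.0, 1.0)  # common noise makes H and R covary
        hebb = mean_h + shared
        reward = mean_r + COUPLING * shared + rng.gauss(0.0, 0.1)
        trials.append((stim, hebb, reward))
    return trials


def mean_update(trials, baseline):
    """Average weight change when plasticity is gated by reward - baseline(stim)."""
    return sum((r - baseline(s)) * h for s, h, r in trials) / len(trials)


def compare_baselines(trials):
    """Return mean updates for no baseline, a global one, and a per-stimulus one."""
    global_mean = sum(r for _, _, r in trials) / len(trials)
    sums, counts = defaultdict(float), defaultdict(int)
    for s, _, r in trials:
        sums[s] += r
        counts[s] += 1
    stim_mean = {s: sums[s] / counts[s] for s in sums}
    return {
        "none": mean_update(trials, lambda s: 0.0),
        "global": mean_update(trials, lambda s: global_mean),
        "per_stimulus": mean_update(trials, lambda s: stim_mean[s]),
    }


if __name__ == "__main__":
    for name, value in compare_baselines(make_trials()).items():
        print(f"{name:>12}: {value:.3f}")
```

In this toy setting only the per-stimulus baseline should leave an average update close to the within-stimulus covariance. The raw reward adds a term proportional to the product of mean reward and mean Hebbian activity, and the global baseline leaves a residual bias because mean reward and mean Hebbian activity differ across stimuli.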