A reinforcement learning example (Sutton and Barto 1998)


Barto AG, Sutton RS. (2002). Reinforcement learning: An introduction (2nd ed).


References and models that cite this paper

Grüning A. (2007). Elman backpropagation as reinforcement for simple recurrent networks. Neural computation. 19 [PubMed]

Porr B, Wörgötter F. (2006). Strongly improved stability and faster convergence of temporal sequence learning by using input correlations only. Neural computation. 18 [PubMed]

Barto AG, Sutton RS. (1998). Reinforcement learning: an introduction.


References and models that cite this paper

Anastasio TJ, Gad YP. (2007). Sparse cerebellar innervation can morph the dynamics of a model oculomotor neural integrator. Journal of computational neuroscience. 22 [PubMed]

Baras D, Meir R. (2007). Reinforcement learning, spike-time-dependent plasticity, and the BCM rule. Neural computation. 19 [PubMed]

Bogacz R, Gurney K. (2007). The basal ganglia and cortex implement optimal decision making between alternative actions. Neural computation. 19 [PubMed]

Brzosko Z, Zannone S, Schultz W, Clopath C, Paulsen O. (2017). Sequential neuromodulation of Hebbian plasticity offers mechanism for effective reward-based navigation. eLife. 6 [PubMed]

Chadderdon GL, Neymotin SA, Kerr CC, Lytton WW. (2012). Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex. PloS one. 7 [PubMed]

Clopath C, Ziegler L, Vasilaki E, Büsing L, Gerstner W. (2008). Tag-trigger-consolidation: a model of early and late long-term-potentiation and depression. PLoS computational biology. 4 [PubMed]

Daw ND, Courville AC, Touretzky DS. (2006). Representation and timing in theories of the dopamine system. Neural computation. 18 [PubMed]

Florian RV. (2007). Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural computation. 19 [PubMed]

Fujita H, Ishii S. (2007). Model-based reinforcement learning for partially observable games with sampling-based state estimation. Neural computation. 19 [PubMed]

Gutkin BS, Dehaene S, Changeux JP. (2006). A neurocomputational hypothesis for nicotine addiction. Proceedings of the National Academy of Sciences of the United States of America. 103 [PubMed]

Hasselmo ME. (2005). A model of prefrontal cortical mechanisms for goal-directed behavior. Journal of cognitive neuroscience. 17 [PubMed]

Hasselmo ME, Eichenbaum H. (2005). Hippocampal mechanisms for the context-dependent retrieval of episodes. Neural networks : the official journal of the International Neural Network Society. 18 [PubMed]

Hazy TE, Frank MJ, O'Reilly RC. (2007). Towards an executive without a homunculus: computational models of the prefrontal cortex/basal ganglia system. Philosophical transactions of the Royal Society of London. Series B, Biological sciences. 362 [PubMed]

Izhikevich EM. (2007). Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral cortex (New York, N.Y. : 1991). 17 [PubMed]

Kulvicius T, Tamosiunaite M, Ainge J, Dudchenko P, Wörgötter F. (2008). Odor supported place cell model and goal navigation in rodents. Journal of computational neuroscience. 25 [PubMed]

Low KH, Leow WK, Ang MH Jr. (2005). An ensemble of cooperative extended Kohonen maps for complex robot motion tasks. Neural computation. 17

Morimoto J, Doya K. (2007). Reinforcement learning state estimator. Neural computation. 19 [PubMed]

Morita K, Kato A. (2014). Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Frontiers in neural circuits. 8 [PubMed]

Moustafa AA, Cohen MX, Sherman SJ, Frank MJ. (2008). A role for dopamine in temporal decision making and reward maximization in parkinsonism. The Journal of neuroscience : the official journal of the Society for Neuroscience. 28 [PubMed]

Nakano T, Otsuka M, Yoshimoto J, Doya K. (2015). A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity. PloS one. 10 [PubMed]

O'Reilly RC, Frank MJ. (2006). Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural computation. 18 [PubMed]

O'Reilly RC, Frank MJ. (2005). Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural computation. 18

Richmond P, Buesing L, Giugliano M, Vasilaki E. (2011). Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations. PloS one. 6 [PubMed]

Rivest F, Kalaska JF, Bengio Y. (2010). Alternative time representation in dopamine models. Journal of computational neuroscience. 28 [PubMed]

Roelfsema PR, van Ooyen A. (2005). Attention-gated reinforcement learning of internal representations for classification. Neural computation. 17 [PubMed]

Sakai Y, Fukai T. (2008). The actor-critic learning is behind the matching law: matching versus optimal behaviors. Neural computation. 20 [PubMed]

Smith AJ, Becker S, Kapur S. (2005). A computational model of the functional role of the ventral-striatal D2 receptor in the expression of previously acquired behaviors. Neural computation. 17 [PubMed]

Soltani A, Wang XJ. (2006). A biophysically based neural model of matching law behavior: melioration by stochastic synapses. The Journal of neuroscience : the official journal of the Society for Neuroscience. 26 [PubMed]

Todorov E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural computation. 17 [PubMed]

Toussaint M. (2006). A sensorimotor map: modulating lateral interactions for anticipation and planning. Neural computation. 18 [PubMed]

Triesch J. (2007). Synergies between intrinsic and synaptic plasticity mechanisms. Neural computation. 19 [PubMed]

Troyer TW, Doupe AJ. (2000). An associational model of birdsong sensorimotor learning I. Efference copy and the learning of song syllables. Journal of neurophysiology. 84 [PubMed]

Troyer TW, Doupe AJ. (2000). An associational model of birdsong sensorimotor learning II. Temporal hierarchies and the learning of song sequence. Journal of neurophysiology. 84 [PubMed]

Wörgötter F, Porr B. (2005). Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms. Neural computation. 17 [PubMed]
