A spiking neural network model of model-free reinforcement learning (Nakano et al 2015)

Nakano T, Otsuka M, Yoshimoto J, Doya K. (2015). A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity. PloS one. 10 [PubMed]

References and models cited by this paper

Bakker B. (2002). Reinforcement learning with long short-term memory. Neural information processing systems.

Barto AG, Sutton RS. (1998). Reinforcement learning: an introduction.

A reinforcement learning example (Sutton and Barto 1998) [Model]

Belavkin RV, Huyck CR. (2011). Conflict resolution and learning probability matching in a neural cell-assembly architecture. Cognitive Systems Research. 12

Boerlin M, Denève S. (2011). Spike-based population coding and working memory. PLoS computational biology. 7 [PubMed]

Doya K. (2002). Metalearning and neuromodulation. Neural networks : the official journal of the International Neural Network Society. 15 [PubMed]

Doya K, Otsuka M, Elfwing S, Uchibe E. (2010). Free-energy based reinforcement learning for vision-based navigation with high-dimensional sensory inputs. Neural Information Processing. Theory and Algorithms.

Doya K, Yoshimoto J, Otsuka M. (2008). Robust population coding in free-energy-based reinforcement learning. Proceedings of the International Conference on Artificial Neural Networks (ICANN).

Doya K, Yoshimoto J, Otsuka M. (2010). Free-energy-based reinforcement learning in a partially observable environment. European Symposium on Artificial Neural Networks (ESANN).

Florian RV. (2007). Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural computation. 19 [PubMed]

Freedman DJ, Assad JA. (2006). Experience-dependent representation of visual categories in parietal cortex. Nature. 443 [PubMed]

Freedman DJ, Riesenhuber M, Poggio T, Miller EK. (2001). Categorical representation of visual stimuli in the primate prefrontal cortex. Science (New York, N.Y.). 291 [PubMed]

Hinton GE. (2002). Training products of experts by minimizing contrastive divergence. Neural computation. 14 [PubMed]

Hinton GE, Sallans B. (2004). Reinforcement learning with factored states and actions. Journal of Machine Learning Research. 5

Izhikevich EM. (2007). Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral cortex (New York, N.Y. : 1991). 17 [PubMed]

Linking STDP and Dopamine action to solve the distal reward problem (Izhikevich 2007) [Model]

Jimenez Rezende D, Gerstner W. (2014). Stochastic variational learning in recurrent spiking networks. Frontiers in computational neuroscience. 8 [PubMed]

Kistler WM, Gerstner W. (2002). Spiking neuron models.

Kwee I, Hutter M. (2001). Market-based reinforcement learning in partially observable worlds. Proceedings of the International Conference on Artificial Neural Networks (ICANN).

Matsuda W et al. (2009). Single nigrostriatal dopaminergic neurons form widely spread and highly dense axonal arborizations in the neostriatum. The Journal of neuroscience : the official journal of the Society for Neuroscience. 29 [PubMed]

Miller EK, Freedman DJ, Wallis JD. (2002). The prefrontal cortex: categories, concepts and cognition. Philosophical transactions of the Royal Society of London. Series B, Biological sciences. 357 [PubMed]

Potjans W, Morrison A, Diesmann M. (2009). A spiking neural network model of an actor-critic learning agent. Neural computation. 21 [PubMed]

Reynolds JN, Hyland BI, Wickens JR. (2001). A cellular mechanism of reward-related learning. Nature. 413 [PubMed]

Roberts PD, Santiago RA, Lafferriere G. (2008). An implementation of reinforcement learning based on spike timing dependent plasticity. Biological cybernetics. 99 [PubMed]

Saeb S, Weber C, Triesch J. (2009). Goal-directed learning of features and forward models. Neural networks : the official journal of the International Neural Network Society. 22 [PubMed]

Samejima K, Ueda Y, Doya K, Kimura M. (2005). Representation of action-specific reward values in the striatum. Science (New York, N.Y.). 310 [PubMed]

Schmidhuber J. (2014). Deep learning in neural networks: An overview. CoRR abs/1404.7828.

Schultz W, Dayan P, Montague PR. (1997). A neural substrate of prediction and reward. Science (New York, N.Y.). 275 [PubMed]

Szatmary B, Izhikevich EM. (2010). Spike-timing theory of working memory. PLoS Computational Biology.

Trappenberg T, Hollensen P, Hartono P. (2011). Topographic RBM as robot controller. The 21st Annual Conference of the Japanese Neural Network Society.

Whitehead SD, Lin LJ. (1995). Reinforcement learning of non-Markov decision processes. Artificial Intelligence. 73

References and models that cite this paper

Rössert C, Dean P, Porrill J. (2015). At the Edge of Chaos: How Cerebellar Granular Layer Network Dynamics Can Provide the Basis for Temporal Filters. PLoS computational biology. 11 [PubMed]

Basis for temporal filters in the cerebellar granular layer (Roessert et al. 2015) [Model]

Wilson CJ, Beverlin B, Netoff T. (2011). Chaotic desynchronization as the therapeutic mechanism of deep brain stimulation. Frontiers in systems neuroscience. 5 [PubMed]