Baras D. (2006). Direct policy search in reinforcement learning and synaptic plasticity in biological neural networks. Unpublished master's thesis, Technion. Available online at http://www.ee.technion.ac.il/rmeir/BarasThesis06.pdf.
Bartlett P, Baxter J. (1999). Hebbian synaptic modifications in spiking neurons that learn. Technical report.
Baxter J, Bartlett PL. (2001). Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15.
Sutton RS, Barto AG. (1998). Reinforcement learning: an introduction.
Bertsekas D, Tsitsiklis J. (1996). Neuro-dynamic programming.
Bienenstock EL, Cooper LN, Munro PW. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2.
Haykin S. (1999). Neural Networks: A Comprehensive Foundation (2nd Ed).
Izhikevich EM, Desai NS. (2003). Relating STDP to BCM. Neural Computation, 15.
Gerstner W, Kistler WM. (2002). Spiking neuron models.
Koch C. (1999). Biophysics of Computation: Information Processing in Single Neurons.
Rao RP, Sejnowski TJ. (2001). Spike-timing-dependent Hebbian plasticity as temporal difference learning. Neural Computation, 13.
Richardson MJ, Melamed O, Silberberg G, Gerstner W, Markram H. (2005). Short-term synaptic plasticity orchestrates the response of pyramidal cells and interneurons to population bursts. Journal of Computational Neuroscience, 18.
Schultz W. (2002). Getting formal with dopamine and reward. Neuron, 36.
Seung HS. (2003). Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron, 40.
Cooper LN, Intrator N, Blais BS, Shouval HZ. (2004). Theory of cortical plasticity.
Toyoizumi T, Pfister JP, Aihara K, Gerstner W. (2005). Generalized Bienenstock-Cooper-Munro rule for spiking neurons that maximizes information transmission. Proceedings of the National Academy of Sciences of the United States of America, 102.
Konda VR, Tsitsiklis JN. (2003). On actor-critic algorithms. SIAM Journal on Control and Optimization, 42.
Wörgötter F, Porr B. (2005). Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms. Neural Computation, 17.
Xie X, Seung HS. (2004). Learning in neural networks by reinforcement of irregular spiking. Physical Review E, 69.
Legenstein R, Pecevski D, Maass W. (2008). A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Computational Biology, 4.
Richmond P, Buesing L, Giugliano M, Vasilaki E. (2011). Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations. PLoS ONE, 6.