Watkins CJCH. (1989). Learning from delayed rewards. Unpublished doctoral dissertation.

References and models that cite this paper

Kato A, Morita K. (2016). Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation. PLoS computational biology. 12

Morita K, Kato A. (2014). Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Frontiers in neural circuits. 8

Porr B, Wörgötter F. (2006). Strongly improved stability and faster convergence of temporal sequence learning by using input correlations only. Neural computation. 18

Richmond P, Buesing L, Giugliano M, Vasilaki E. (2011). Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations. PLoS one. 6

Wörgötter F, Porr B. (2005). Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms. Neural computation. 17