Barto AG, Sutton RS. (1990). Time-derivative models of Pavlovian reinforcement. Learning and computational neuroscience: Foundations of adaptive networks.
Barto AG, Sutton RS. (1998). Reinforcement learning: an introduction.
Baum LE, Petrie T, Soules G, Weiss N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat. 41
Bayer HM, Glimcher PW. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 47 [PubMed]
Bouton ME, Nelson JB. (1998). Mechanisms of feature-positive and feature-negative discrimination learning in an appetitive conditioning paradigm. Occasion setting: Associative learning and cognition in animals.
Brown J, Bullock D, Grossberg S. (1999). How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. The Journal of neuroscience : the official journal of the Society for Neuroscience. 19 [PubMed]
Chrisman L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proceedings of the Tenth National Conference on Artificial Intelligence.
Das T, Gosavi A, Mahadevan S, Marchalleck N. (1997). Self-improving factory simulation using continuous-time average-reward reinforcement learning. Proceedings of the 14th International Conference on Machine Learning.
Das T, Gosavi A, Mahadevan S, Marchalleck N. (1999). Solving semi-Markov decision problems using average reward reinforcement learning. Management Science. 45
Daw N, Touretzky D, Skaggs W. (2004). Contrasting neuronal correlates between dorsal and ventral striatum in the rat. Cosyne04 Comput Sys Neurosci Abstr. 1
Daw ND. (2003). Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation.
Daw ND, Kakade S, Dayan P. (2002). Opponent interactions between serotonin and dopamine. Neural networks : the official journal of the International Neural Network Society. 15 [PubMed]
Daw ND, Niv Y, Dayan P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature neuroscience. 8 [PubMed]
Daw ND, Touretzky DS. (2002). Long-term reward prediction in TD models of the dopamine system. Neural computation. 14 [PubMed]
Dayan P. (2002). Motivated reinforcement learning. Advances in neural information processing systems. 14
Dayan P, Balleine BW. (2002). Reward, motivation, and reinforcement learning. Neuron. 36 [PubMed]
Dayan P, Daw ND, Niv Y. (2005). How fast to work: Response vigor, motivation, and tonic dopamine. Advances in neural information processing systems. 17
Dayan P, Daw ND, Niv Y. (2006). Actions, values, policies, and the basal ganglia. Recent breakthroughs in basal ganglia research.
Dayan P, Kakade S. (2000). Acquisition in autoshaping. Advances in neural information processing systems. 12
Dayan P, Niv Y, Duff MO. (2004). The effects of uncertainty on TD learning. Cosyne04 Comput Sys Neurosci Abstr. 1
Dayan P, Zemel R, Huys Q, Natarajan R. (2004). Probabilistic computation in spiking neurons. Advances in neural information processing systems. 17
Dempster AP, Laird NM, Rubin DB. (1977). Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B. 39
Deneve S. (2004). Bayesian inference in spiking neurons. Advances in neural information processing systems. 17
Dickinson A, Balleine B. (2002). The role of learning in motivation. Stevens' handbook of experimental psychology (3rd ed). 3
Dickinson A, Hall G, Mackintosh NJ. (1976). Surprise and the attenuation of blocking. J Exp Psychol: Animal Behav Process. 2
Dickinson A, Mackintosh NJ. (1979). Reinforcer specificity in the enhancement of conditioning by posttrial surprise. J Exp Psychol: Animal Behav Process. 5
Dickinson A, Smith J, Mirenowicz J. (2000). Dissociation of Pavlovian and instrumental incentive learning under dopamine antagonists. Behavioral neuroscience. 114 [PubMed]
Doya K. (1999). What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural networks : the official journal of the International Neural Network Society. 12 [PubMed]
Doya K. (2000). Complementary roles of basal ganglia and cerebellum in learning and motor control. Current opinion in neurobiology. 10 [PubMed]
Duff MO, Bradtke SJ. (1995). Reinforcement learning methods for continuous-time Markov decision problems. Advances in neural information processing systems. 7
Faure A, Haberland U, Condé F, El Massioui N. (2005). Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. The Journal of neuroscience : the official journal of the Society for Neuroscience. 25 [PubMed]
Fiorillo CD, Tobler PN, Schultz W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science (New York, N.Y.). 299 [PubMed]
Gallistel CR, Gibbon J. (2000). Time, rate, and conditioning. Psychological review. 107 [PubMed]
Gallistel CR, King A, McDonald R. (2004). Sources of variability and systematic error in mouse timing behavior. Journal of experimental psychology. Animal behavior processes. 30 [PubMed]
Gibbon J. (1977). Scalar expectancy theory and Weber's law in animal timing. Psychol Rev. 84
Gibbon J, Mellon RC, Leak TM, Fairhurst S. (1995). Timing processes in the reinforcement-omission effect. Animal Learn Behav. 23
Gold JI, Shadlen MN. (2002). Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron. 36 [PubMed]
Guedon Y, Cocozza-Thivent C. (1990). Explicit state occupancy modeling by hidden semi-Markov models: Application of Derin's scheme. Computer Speech And Language. 4
Holland PC. (1988). Excitation and inhibition in unblocking. Journal of experimental psychology. Animal behavior processes. 14 [PubMed]
Holland PC, Kenmuir C. (2005). Variations in unconditioned stimulus processing in unblocking. Journal of experimental psychology. Animal behavior processes. 31 [PubMed]
Holland PC, Lamoureux JA, Han JS, Gallagher M. (1999). Hippocampal lesions interfere with Pavlovian negative occasion setting. Hippocampus. 9 [PubMed]
Hollerman JR, Schultz W. (1998). Dopamine neurons report an error in the temporal prediction of reward during learning. Nature neuroscience. 1 [PubMed]
Houk JC, Adams JL, Barto AG. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. Models Of Information Processing In The Basal Ganglia.
Joel D, Niv Y, Ruppin E. (2002). Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural networks : the official journal of the International Neural Network Society. 15 [PubMed]
Kaelbling LP, Littman ML, Cassandra AR. (1998). Planning and acting in partially observable stochastic domains. Art Intell. 101
Kakade S, Dayan P. (2002). Acquisition and extinction in autoshaping. Psychological review. 109 [PubMed]
Killeen PR, Fetterman JG. (1988). A behavioral theory of timing. Psychological review. 95 [PubMed]
Kurth-Nelson Z, Redish A. (2004). µagents: Action-selection in temporally dependent phenomena using temporal difference learning over a collective belief structure. Soc Neurosci Abstr. 30
Levinson SE. (1986). Continuously variable duration hidden Markov models for automatic speech recognition. Computer Speech And Language. 1
Lewicki MS. (2002). Efficient coding of natural sounds. Nature neuroscience. 5 [PubMed]
Lewicki MS, Olshausen BA. (1999). A probabilistic framework for the adaptation and comparison of image codes. J Opt Soc Am A: Optics, Image, Science And Vision. 16
Lewicki MS, Olshausen BA, Rao RPN. (2002). Probabilistic models of the brain: Perception and neural function.
Ljungberg T, Apicella P, Schultz W. (1992). Responses of monkey dopamine neurons during learning of behavioral reactions. Journal of neurophysiology. 67 [PubMed]
Machado A. (1997). Learning the temporal dynamics of behavior. Psychological review. 104 [PubMed]
Matell MS, Meck WH. (1999). Reinforcement-induced within-trial resetting of an internal clock. Behavioural processes. 45 [PubMed]
McClure SM, Daw ND, Montague PR. (2003). A computational substrate for incentive salience. Trends in neurosciences. 26 [PubMed]
Mirenowicz J, Schultz W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature. 379 [PubMed]
Montague PR, Dayan P, Sejnowski TJ. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. The Journal of neuroscience : the official journal of the Society for Neuroscience. 16 [PubMed]
Moore AW, Atkeson CG. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Mach Learn. 13
Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron. 43 [PubMed]
Niv Y, Duff MO, Dayan P. (2005). Dopamine, uncertainty and TD learning. Behavioral and brain functions : BBF. 1 [PubMed]
O'Doherty J et al. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science (New York, N.Y.). 304 [PubMed]
Owen AM. (1997). Cognitive planning in humans: neuropsychological, neuroanatomical and neuropharmacological perspectives. Progress in neurobiology. 53 [PubMed]
Pan WX, Schmidt R, Wickens JR, Hyland BI. (2005). Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. The Journal of neuroscience : the official journal of the Society for Neuroscience. 25 [PubMed]
Parkinson JA et al. (2002). Nucleus accumbens dopamine depletion impairs both acquisition and performance of appetitive Pavlovian approach behaviour: implications for mesoaccumbens dopamine function. Behavioural brain research. 137 [PubMed]
Rao RPN. (2004). Hierarchical Bayesian inference in networks of spiking neurons. Advances in neural information processing systems. 17
Satoh T, Nakai S, Sato T, Kimura M. (2003). Correlated coding of motivation and outcome of decision by dopamine neurons. The Journal of neuroscience : the official journal of the Society for Neuroscience. 23 [PubMed]
Schultz W. (1998). Predictive reward signal of dopamine neurons. Journal of neurophysiology. 80 [PubMed]
Schultz W, Apicella P, Ljungberg T. (1993). Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. The Journal of neuroscience : the official journal of the Society for Neuroscience. 13 [PubMed]
Schultz W, Dayan P, Montague PR. (1997). A neural substrate of prediction and reward. Science (New York, N.Y.). 275 [PubMed]
Schultz W, Fiorillo CD. (2001). The reward responses of dopamine neurons persist when prediction of reward is probabilistic with respect to time or occurrence. Soc Neurosci Abstr. 27
Schultz W, Romo R. (1990). Dopamine neurons of the monkey midbrain: contingencies of responses to stimuli eliciting immediate behavioral reactions. Journal of neurophysiology. 63 [PubMed]
Smith AJ, Becker S, Kapur S. (2005). A computational model of the functional role of the ventral-striatal D2 receptor in the expression of previously acquired behaviors. Neural computation. 17 [PubMed]
Staddon JE, Cerutti DT. (2003). Operant conditioning. Annual review of psychology. 54 [PubMed]
Staddon JE, Higa JJ. (1999). Time and memory: towards a pacemaker-free theory of interval timing. Journal of the experimental analysis of behavior. 71 [PubMed]
Staddon JE, Innis NK. (1969). Reinforcement omission on fixed-interval schedules. Journal of the experimental analysis of behavior. 12 [PubMed]
Suri RE. (2001). Anticipatory responses of dopamine neurons and cortical neurons reproduced by internal model. Experimental brain research. 140 [PubMed]
Suri RE, Schultz W. (1998). Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Experimental brain research. 121 [PubMed]
Suri RE, Schultz W. (1999). A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience. 91 [PubMed]
Sutton RS. (1984). Temporal credit assignment in reinforcement learning. Unpublished doctoral dissertation.
Sutton RS. (1988). Learning to predict by the method of temporal differences. Machine Learning. 3
Sutton RS. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning.
Szita I, Lorincz A. (2004). Kalman filter control embedded into the reinforcement learning framework. Neural computation. 16 [PubMed]
Tobler PN, Dickinson A, Schultz W. (2003). Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. The Journal of neuroscience : the official journal of the Society for Neuroscience. 23 [PubMed]
Touretzky DS, Courville AC. (2001). Modeling temporal structure in classical conditioning. Advances in neural information processing systems. 14
Touretzky DS, Daw ND, Courville AC. (2004). Similarity and discrimination in classical conditioning: A latent variable account. Advances in neural information processing systems. 17
Touretzky DS, Daw ND, Courville AC, Gordon GJ. (2003). Model uncertainty in classical conditioning. Advances in neural information processing systems. 16
Tsitsiklis JN, Van Roy B. (2002). On average versus discounted reward temporal-difference learning. Mach Learn. 49
Ungless MA, Magill PJ, Bolam JP. (2004). Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science (New York, N.Y.). 303 [PubMed]
Voorn P, Vanderschuren LJ, Groenewegen HJ, Robbins TW, Pennartz CM. (2004). Putting a spin on the dorsal-ventral divide of the striatum. Trends in neurosciences. 27 [PubMed]
Waelti P, Dickinson A, Schultz W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature. 412 [PubMed]
Yin H, Barnet RC, Miller RR. (1994). Second-order conditioning and Pavlovian conditioned inhibition: operational similarities and differences. Journal of experimental psychology. Animal behavior processes. 20 [PubMed]
Fuhs MC, Touretzky DS. (2007). Context learning in the rodent hippocampus. Neural computation. 19 [PubMed]
Rivest F, Kalaska JF, Bengio Y. (2010). Alternative time representation in dopamine models. Journal of computational neuroscience. 28 [PubMed]