Bradtke SJ, Barto AG. (1996). Linear least-squares algorithms for temporal difference learning Mach Learn. 22
Crites RH, Barto AG. (1998). Elevator group control using multiple reinforcement learning agents Mach Learn. 33
Barto AG, Mahadevan S. (2003). Recent advances in hierarchical reinforcement learning Discrete Event Dynamic Systems. 13
Sutton RS, Barto AG. (1998). Reinforcement learning: An introduction.
Barto AG, Sutton RS, Anderson CW. (1983). Neuronlike adaptive elements that can solve difficult learning control problems IEEE Trans Systems Man Cybern. 13
Boutilier C, Poole D. (1996). Computing optimal policies for partially observable decision processes using compact representations Proc 13th Natl Conf Art Intel.
Bowling M, Veloso M. (2000). An analysis of stochastic game theory for multiagent reinforcement learning Tech Rep CMU-CS-00-165, Carnegie Mellon University.
Brafman RI. (1997). A heuristic variable grid solution method for POMDPs Proc 14th Natl Conf Art Intel.
Chrisman L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach Proc 10th Natl Conf Art Intel.
Claus C, Boutilier C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems Proc 15th Natl Conf Art Intel.
Dahl FA. (2002). The lagging anchor algorithm: Reinforcement learning in two-player zero-sum games with imperfect information Mach Learn. 49
Watkins C, Dayan P. (1992). Q-learning Mach Learn. 8
Morimoto J, Doya K. (2001). Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning Robotics and Autonomous Systems. 36
Doya K, Samejima K, Katagiri K, Kawato M. (2002). Multiple model-based reinforcement learning Neural Comput. 14
Emery-Montemerlo R, Gordon G, Schneider J, Thrun S. (2004). Approximate solutions for partially observable stochastic games with common payoffs Proc 3rd Intl Joint Conf Autonomous Agents and Multi-Agent Systems.
Freeverse Software. (2004). 3D Hearts Deluxe.
Fudenberg D, Tirole J. (1991). Game Theory.
Hansen EA, Bernstein DS, Zilberstein S. (2004). Dynamic programming for partially observable stochastic games Proc 19th Natl Conf Art Intel.
Hauskrecht M. (2000). Value-function approximations for partially observable Markov decision processes J Artif Intell Res. 13
Suematsu N, Hayashi A. (2002). A multiagent reinforcement learning algorithm using extended optimal response Proc 1st Intl Joint Conf Autonomous Agents and Multi-Agent Systems.
Hu J, Wellman MP. (2003). Nash Q-learning for general-sum stochastic games J Mach Learn Res. 4
Ishii S et al. (2005). A reinforcement learning scheme for a partially-observable multi-agent game Mach Learn. 59
Chang YH, Ho T, Kaelbling LP. (2003). All learning is local: Multi-agent learning in global reward games Advances in Neural Information Processing Systems. 16
Littman ML, Cassandra AR, Kaelbling LP. (1995). Learning policies for partially observable environments: Scaling up Proc 12th Intl Conf Mach Learn.
Kaelbling LP, Littman ML, Cassandra AR. (1998). Planning and acting in partially observable stochastic domains Artif Intell. 101
Kaelbling LP, Littman ML, Moore AW. (1996). Reinforcement learning: A survey J Artif Intell Res. 4
Peshkin L, Meuleau N, Kaelbling LP. (1999). Learning policies with external memory Proc 16th Intl Conf Mach Learn.
Meuleau N, Peshkin L, Kim KE, Kaelbling LP. (1999). Learning finite-state controllers for partially observable environments Proc 15th Ann Conf Uncertainty in Artificial Intelligence.
Littman ML. (1994). Markov games as a framework for multi-agent reinforcement learning Proc 11th Intl Conf Mach Learn.
Littman ML, Majercik SM. (1997). Large-scale planning under uncertainty: A survey Paper presented at the NASA Workshop on Planning and Scheduling for Space.
Theocharous G, Mahadevan S. (2002). Approximate planning with hierarchical partially observable Markov decision process models for robot navigation Proc IEEE Intl Conf Robotics and Automation.
McCallum A. (1993). Overcoming incomplete perception with utile distinction memory Proc 10th Intl Conf Mach Learn.
Moore AW, Atkeson CG. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time Mach Learn. 13
Nair R, Tambe M, Yokoo M, Pynadath D, Marsella S. (2003). Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings Proc 18th Intl Joint Conf Art Intel.
Mori T, Nakamura Y, Sato M, Ishii S. (2004). Reinforcement learning for a CPG-driven biped robot Proc 19th Natl Conf Art Intel.
Nikovski D, Nourbakhsh I. (2000). Learning probabilistic models for decision-theoretic navigation of mobile robots Proc 17th Intl Conf Mach Learn.
Perkins T. (1998). Two search techniques for imperfect information games and application to hearts (Tech. Rep.).
Pfahringer B, Kaindl H, Kramer S, Fürnkranz J. (1999). Learning to make good use of operational advice Paper presented at the ICML Workshop on Machine Learning in Game Playing.
Gilks WR, Richardson S, Spiegelhalter DJ. (1996). Markov chain Monte Carlo in practice.
Sato M, Ishii S. (2000). On-line EM algorithm for the normalized Gaussian network Neural Comput. 12
Yoshimoto J, Ishii S, Sato M. (2003). System identification based on on-line variational Bayes method and its application to reinforcement learning Proc Intl Conf Art Neural Netw Neural Inform Process. 2714
Schraudolph NN, Dayan P, Sejnowski TJ. (2001). Learning to evaluate Go positions via temporal difference methods (Tech. Rep.).
Shani G. (2004). A survey of model-based and model-free methods for resolving perceptual aliasing Tech Rep, Department of Computer Science, Ben-Gurion University.
Shoham Y, Powers R, Grenager T. (2004). Multi-agent reinforcement learning: A critical survey Paper presented at the AAAI Fall Symposium on Artificial Multi-Agent Learning.
Singh SP, Bertsekas D. (1996). Reinforcement learning for dynamic channel allocation in cellular telephone systems Advances in Neural Information Processing Systems. 9
Singh SP, Jaakkola T, Jordan MI. (1994). Learning without state-estimation in partially observable Markovian decision processes Proc 11th Intl Conf Mach Learn.
Loch J, Singh SP. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes Proc 15th Intl Conf Mach Learn.
Smallwood RD, Sondik EJ. (1973). The optimal control of partially observable processes over a finite horizon Operations Res. 21
Sturtevant NR. (2003). Multi-player games: Algorithms and approaches Unpublished doctoral dissertation, University of California, Los Angeles.
Sutton RS. (1988). Learning to predict by the methods of temporal differences Mach Learn. 3
Sutton RS. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming Proc 7th Intl Conf Mach Learn.
Tesauro G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play Neural Comput. 6
Thrun S. (2000). Monte Carlo POMDPs Advances in Neural Information Processing Systems. 12
Pineau J, Gordon G, Thrun S. (2003). Point-based value iteration: An anytime algorithm for POMDPs Proc 18th Intl Joint Conf Art Intel.
Stone P, Veloso MM. (2000). Multiagent systems: A survey from a machine learning perspective Auto Rob. 8
Wang X, Sandholm T. (2003). Reinforcement learning to play an optimal Nash equilibrium in team Markov games Advances in Neural Information Processing Systems. 15
Sturtevant NR, White AM. (2006). Feature construction for reinforcement learning in hearts Proc 5th Intl Conf Computers and Games.
Whitehead SD, Lin LJ. (1995). Reinforcement learning of non-Markov decision processes Artif Intell. 73
Williams RJ. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning Mach Learn. 8