Bertsekas D, Tsitsiklis J. (1996). Neuro-dynamic programming.
Böhm N, Kókai G, Mandl S. (2004). Evolving a heuristic function for the game of Tetris. Proc Lernen, Wissensentdeckung und Adaptivität.
Demaine ED, Hohenberger S, Liben-Nowell D. (2003). Tetris is hard, even to approximate. Proc 9th Intl Computing and Combinatorics Conf.
Fahey CP. (2003). Tetris AI. Available online at http://www.colinfahey.com.
Kakade S. (2001). A natural policy gradient. Advances in Neural Information Processing Systems, 14.
Littman ML, Lagoudakis MG, Parr R. (2002). Least-squares methods in reinforcement learning for control. SETN: Proc 2nd Hellenic Conf AI.
Mannor S, Menache I, Shimkin N. (2005). Basis function adaptation in temporal difference reinforcement learning. Ann Oper Res, 134.
Mannor S, Rubinstein RY, Gat Y. (2003). The cross-entropy method for fast policy search. Proc Intl Conf Mach Learn.
Mannor S, de Boer P, Kroese D, Rubinstein R. (2004). A tutorial on the cross-entropy method. Ann Oper Res, 134.
Ramon J, Driessens K. (2004). On the numeric stability of Gaussian processes regression for relational reinforcement learning. Workshop on Relational Reinforcement Learning.
van Roy B, Farias VF. (2006). Tetris: A study of randomized constraint sampling. Probabilistic and Randomized Methods for Design under Uncertainty.