Atiya, A.F., Parlos, A.G. and Ingber, L. (2003). A reinforcement learning method based on adaptive simulated annealing, Proceedings of the 46th International Midwest Symposium on Circuits and Systems, Cairo, Egypt, pp. 121-124.
Barto, A., Sutton, R. and Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13(5): 834-847, DOI: 10.1109/TSMC.1983.6313077.
Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning, Journal of Artificial Intelligence Research 2: 287-318, DOI: 10.1613/jair.135.
Crook, P. and Hayes, G. (2003). Learning in a state of confusion: Perceptual aliasing in grid world navigation, Technical Report EDI-INF-RR-0176, University of Edinburgh, Edinburgh.
Ernst, D., Geurts, P. and Wehenkel, L. (2005). Tree-based batch mode reinforcement learning, Journal of Machine Learning Research 6: 503-556.
Forbes, J.R.N. (2002). Reinforcement Learning for Autonomous Vehicles, Ph.D. thesis, University of California, Berkeley, CA.
Gelly, S. and Silver, D. (2007). Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, pp. 273-280.
Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996). Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4: 237-285, DOI: 10.1613/jair.301.
Krawiec, K., Jaśkowski, W.G. and Szubert, M.G. (2011). Evolving small-board Go players using coevolutionary temporal difference learning with archives, International Journal of Applied Mathematics and Computer Science 21(4): 717-731, DOI: 10.2478/v10006-011-0057-3.
Lagoudakis, M. and Parr, R. (2003). Least-squares policy iteration, Journal of Machine Learning Research 4: 1107-1149.
Lanzi, P. (2000). Adaptive agents with reinforcement learning and internal memory, From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, USA, pp. 333-342.
Lin, L.-J. (1993). Reinforcement Learning for Robots Using Neural Networks, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
Markowska-Kaczmar, U. and Kwaśnicka, H. (2005). Neural Networks Applications, Wrocław University of Technology Press, Wrocław, (in Polish).
Moore, A. and Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning 13(1): 103-130, DOI: 10.1007/BF00993104.
Moriarty, D., Schultz, A. and Grefenstette, J. (1999). Evolutionary algorithms for reinforcement learning, Journal of Artificial Intelligence Research 11: 241-276, DOI: 10.1613/jair.613.
Peng, J. and Williams, R. (1993). Efficient learning and planning within the Dyna framework, Adaptive Behavior 1(4): 437-454, DOI: 10.1177/105971239300100403.
Reynolds, S. (2002). Experience stack reinforcement learning for off-policy control, Technical Report CSRP-02-1, University of Birmingham, Birmingham, ftp://ftp.cs.bham.ac.uk/pub/tech-reports/2002/CSRP-02-01.ps.gz.
Riedmiller, M. (2005). Neural reinforcement learning to swing-up and balance a real pole, Proceedings of the IEEE 2005 International Conference on Systems, Man and Cybernetics, Big Island, HI, USA, pp. 3191-3196.
Rummery, G. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge.
Smart, W. and Kaelbling, L. (2002). Effective reinforcement learning for mobile robots, Proceedings of the International Conference on Robotics and Automation, Washington, DC, USA, pp. 3404-3410.
Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, USA, pp. 216-224.
Sutton, R. (1991). Planning by incremental dynamic programming, Proceedings of the 8th International Workshop on Machine Learning, Evanston, IL, USA, pp. 353-357.
Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
Vanhulsel, M., Janssens, D. and Vanhoof, K. (2009). Simulation of sequential data: An enhanced reinforcement learning approach, Expert Systems with Applications 36(4): 8032-8039, DOI: 10.1016/j.eswa.2008.10.056.
Watkins, C. (1989). Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge.
Whiteson, S. (2012). Evolutionary computation for reinforcement learning, in M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer, Berlin, pp. 325-358, DOI: 10.1007/978-3-642-27645-3_10.
Whiteson, S. and Stone, P. (2006). Evolutionary function approximation for reinforcement learning, Journal of Machine Learning Research 7: 877-917.
Ye, C., Young, N.H.C. and Wang, D. (2003). A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 33(1): 17-27, DOI: 10.1109/TSMCB.2003.808179.
Zajdel, R. (2012). Fuzzy epoch-incremental reinforcement learning algorithm, in L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh and J.M. Zurada (Eds.), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7267, Springer-Verlag, Berlin/Heidelberg, pp. 359-366, DOI: 10.1007/978-3-642-29347-4_42.