Ashby, W. R. 1960. Design for a Brain. Springer Science & Business Media. doi:10.1007/978-94-015-1320-3
Barto, A. G.; Singh, S.; and Chentanez, N. 2004. Intrinsically motivated learning of hierarchical collections of skills. In Proceedings of the 3rd International Conference on Development and Learning, 112-119.
Cañamero, D. 1997. Modeling motivations and emotions as a basis for intelligent behavior. In Proceedings of the First International Conference on Autonomous Agents, 148-155. ACM. doi:10.1145/267658.267688
Dawkins, R. 1976. The Selfish Gene. Oxford University Press, Oxford, UK.
Dayan, P., and Hinton, G. E. 1996. Varieties of Helmholtz machine. Neural Networks 9(8):1385-1403. doi:10.1016/S0893-6080(96)00009-3
Doya, K., and Uchibe, E. 2005. The cyber rodent project: Exploration of adaptive mechanisms for self-preservation and self-reproduction. Adaptive Behavior 13(2):149-160. doi:10.1177/105971230501300206
Elfwing, S.; Uchibe, E.; Doya, K.; and Christensen, H. I. 2005. Biologically inspired embodied evolution of survival. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, volume 3, 2210-2216. IEEE.
Hester, T., and Stone, P. 2012. Learning and using models. In Reinforcement Learning. Springer. 111-141. doi:10.1007/978-3-642-27645-3_4
Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; and Saul, L. K. 1999. An introduction to variational methods for graphical models. Machine Learning 37(2):183-233. doi:10.1023/A:1007665907178
Kaelbling, L. P.; Littman, M. L.; and Moore, A. W. 1996. Reinforcement learning: A survey. arXiv preprint cs/9605103. doi:10.1613/jair.301
Kappen, H. J.; Gómez, V.; and Opper, M. 2012. Optimal control as a graphical model inference problem. Machine Learning 87(2):159-182. doi:10.1007/s10994-012-5278-7
Keramati, M., and Gutkin, B. S. 2011. A reinforcement learning theory for homeostatic regulation. In Advances in Neural Information Processing Systems, 82-90.
Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kingma, D. P., and Welling, M. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
Konidaris, G., and Barto, A. 2006. An adaptive robot motivational system. In From Animals to Animats 9. Springer. 346-356. doi:10.1007/11840541_29
Lange, S.; Riedmiller, M.; and Voigtländer, A. 2012. Autonomous reinforcement learning on raw visual input data in a real world application. In Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), 1-8. IEEE. doi:10.1109/IJCNN.2012.6252823
Lin, L.-J. 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning 8(3-4):293-321. doi:10.1007/BF00992699
McFarland, D., and Bösser, T. 1993. Intelligent Behavior in Animals and Robots. MIT Press.
McFarland, D., and Houston, A. 1981. Quantitative Ethology. Pitman Advanced Publishing Program.
McFarland, D., and Spier, E. 1997. Basic cycles, utility and opportunism in self-sufficient robots. Robotics and Autonomous Systems 20(2):179-190. doi:10.1016/S0921-8890(96)00069-3
Meyer, J.-A., and Guillot, A. 1991. Simulation of adaptive behavior in animats: Review and prospect. In Meyer, J.-A., and Wilson, S. W., eds., From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, 2-14.
Mnih, A., and Gregor, K. 2014. Neural variational inference and learning in belief networks. arXiv preprint arXiv:1402.0030.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529-533. doi:10.1038/nature14236
Nakamura, M., and Yamakawa, H. 2016. A game-engine-based learning environment framework for artificial general intelligence. In International Conference on Neural Information Processing, 351-356. Springer. doi:10.1007/978-3-319-46687-3_39
Ng, A. Y.; Harada, D.; and Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, 278-287.
Ogata, T., and Sugano, S. 1997. Emergence of robot behavior based on self-preservation: Research methodology and embodiment of mechanical system. Journal of the Robotics Society of Japan 15(5):710-721. doi:10.7210/jrsj.15.710
Omohundro, S. M. 2008. The basic AI drives. In Artificial General Intelligence 2008: Proceedings of the First AGI Conference, volume 171, 483. IOS Press.
Pfeifer, R., and Scheier, C. 1999. Understanding Intelligence. MIT Press.
Ranganath, R.; Gerrish, S.; and Blei, D. M. 2013. Black box variational inference. arXiv preprint arXiv:1401.0118.
Rawlik, K.; Toussaint, M.; and Vijayakumar, S. 2013. On stochastic optimal control and reinforcement learning by approximate inference. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 3052-3056. AAAI Press. doi:10.15607/RSS.2012.VIII.045
Rummery, G. A., and Niranjan, M. 1994. On-line Q-learning using connectionist systems. Technical report, University of Cambridge, Department of Engineering.
Rusu, A. A.; Vecerik, M.; Rothörl, T.; Heess, N.; Pascanu, R.; and Hadsell, R. 2016. Sim-to-real robot learning from pixels with progressive nets. arXiv preprint arXiv:1610.04286.
Sibly, R., and McFarland, D. 1976. On the fitness of behavior sequences. American Naturalist 601-617. doi:10.1086/283093
Spier, E. 1997. From reactive behaviour to adaptive behaviour: Motivational models for behaviour in animals and robots. Ph.D. Dissertation, University of Oxford.
Toda, M. 1962. The design of a fungus-eater: A model of human behavior in an unsophisticated environment. Behavioral Science 7(2):164-183. doi:10.1002/bs.3830070203
Toda, M. 1982. Man, Robot, and Society: Models and Speculations. M. Nijhoff Publishing. doi:10.1007/978-94-017-5358-6
Todorov, E. 2008. General duality between optimal control and estimation. In Proceedings of the 47th IEEE Conference on Decision and Control (CDC 2008), 4286-4292. IEEE. doi:10.1109/CDC.2008.4739438
Toussaint, M.; Harmeling, S.; and Storkey, A. 2006. Probabilistic inference for solving (PO)MDPs. Informatics Research Report 0934, University of Edinburgh.
Toussaint, M. 2009. Robot trajectory optimization using approximate inference. In Proceedings of the 26th Annual International Conference on Machine Learning, 1049-1056. ACM. doi:10.1145/1553374.1553508
Vlassis, N., and Toussaint, M. 2009. Model-free reinforcement learning as mixture learning. In Proceedings of the 26th Annual International Conference on Machine Learning, 1081-1088. ACM. doi:10.1145/1553374.1553512
Walter, W. 1953. The Living Brain. Norton.
Young, J. Z. 1966. The Memory System of the Brain. Oxford University Press. doi:10.1525/9780520346468