What’s Next if Reward is Enough? Insights for AGI from Animal Reinforcement Learning

Adam, S.; Busoniu, L.; and Babuska, R. 2012. Experience Replay for Real-Time Reinforcement Learning Control. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(2):201–212.

Adler, A.; Katabi, S.; Finkes, I.; Israel, Z.; Prut, Y.; and Bergman, H. 2012. Temporal Convergence of Dynamic Cell Assemblies in the Striato-Pallidal Network. Journal of Neuroscience 32(7):2473–2484.

Alayrac, J.-B.; Donahue, J.; Luc, P.; Miech, A.; Barr, I.; Hasson, Y.; Lenc, K.; Mensch, A.; Millican, K.; Reynolds, M.; Ring, R.; Rutherford, E.; Cabi, S.; Han, T.; Gong, Z.; Samangooei, S.; Monteiro, M.; Menick, J.; Borgeaud, S.; Brock, A.; Nematzadeh, A.; Sharifzadeh, S.; Binkowski, M.; Barreira, R.; Vinyals, O.; Zisserman, A.; and Simonyan, K. 2022. Flamingo: a Visual Language Model for Few-Shot Learning. arXiv:2204.14198 [cs].

Alonso, E., and Schmajuk, N. 2012. Special issue on computational models of classical conditioning guest editors’ introduction. Learning & Behavior 40(3):231–240.

Balleine, B. W., and O’Doherty, J. P. 2010. Human and Rodent Homologies in Action Control: Corticostriatal Determinants of Goal-Directed and Habitual Action. Neuropsychopharmacology 35(1):48–69.

Barto, A. G. Adaptive Critics and the Basal Ganglia. 20.

Barto, A. G. 2013. Intrinsic Motivation and Reinforcement Learning. In Baldassarre, G., and Mirolli, M., eds., Intrinsically Motivated Learning in Natural and Artificial Systems. Berlin, Heidelberg: Springer. 17–47.

Boyd, R.; Richerson, P. J.; and Henrich, J. 2011. The cultural niche: Why social learning is essential for human adaptation. Proceedings of the National Academy of Sciences 108(supplement 2):10918–10925.

Bramlage, L., and Cortese, A. 2022. Generalized attention-weighted reinforcement learning. Neural Networks 145:10–21.

Buetti-Dinh, A.; Galli, V.; Bellenberg, S.; Ilie, O.; Herold, M.; Christel, S.; Boretska, M.; Pivkin, I. V.; Wilmes, P.; Sand, W.; Vera, M.; and Dopson, M. 2019. Deep neural networks outperform human expert’s capacity in characterizing bioleaching bacterial biofilm composition. Biotechnology Reports 22:e00321.

Byrnes, S. 2021. Reward Is Not Enough. LessWrong.

Chang, S. W. C.; Winecoff, A. A.; and Platt, M. L. 2011. Vicarious reinforcement in rhesus macaques (Macaca mulatta). Frontiers in Neuroscience 5:27.

Cheng, C.-A.; Kolobov, A.; and Agarwal, A. 2020. Policy Improvement via Imitation of Multiple Oracles. arXiv:2007.00795 [cs, stat].

Chentanez, N.; Barto, A.; and Singh, S. 2004. Intrinsically Motivated Reinforcement Learning. In Advances in Neural Information Processing Systems, volume 17. MIT Press.

Cook, M.; Mineka, S.; Wolkenstein, B.; and Laitsch, K. 1985. Observational conditioning of snake fear in unrelated rhesus monkeys. Journal of Abnormal Psychology 94(4):591–610.

Danner, F. W., and Lonky, E. 1981. A Cognitive-Developmental Approach to the Effects of Rewards on Intrinsic Motivation. Child Development 52(3):1043–1052.

Daw, N. D.; Courville, A. C.; and Touretzky, D. S. 2006. Representation and Timing in Theories of the Dopamine System. Neural Computation 18(7):1637–1677.

Daw, N. D.; Niv, Y.; and Dayan, P. 2005. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience 8(12):1704–1711.

Dawson, E. H.; Avarguès-Weber, A.; Chittka, L.; and Leadbeater, E. 2013. Learning by Observation Emerges from Simple Associations in an Insect Model. Current Biology 23(8):727–730.

de Bruin, T.; Tuyls, K.; Kober, J.; and Babuška, R. 2015. The importance of experience replay database composition in deep reinforcement learning. 9.

Deci, E. L., and Ryan, R. M. 1985. Conceptualizations of Intrinsic Motivation and Self-Determination. In Deci, E. L., and Ryan, R. M., eds., Intrinsic Motivation and Self-Determination in Human Behavior, Perspectives in Social Psychology. Boston, MA: Springer US. 11–40.

DeYoung, C. G. 2013. The neuromodulator of exploration: A unifying theory of the role of dopamine in personality. Frontiers in Human Neuroscience 7.

Di Domenico, S. I., and Ryan, R. M. 2017. The Emerging Neuroscience of Intrinsic Motivation: A New Frontier in Self-Determination Research. Frontiers in Human Neuroscience 11:145.

Doll, B. B.; Simon, D. A.; and Daw, N. D. 2012. The ubiquity of model-based reinforcement learning. Current Opinion in Neurobiology 22(6):1075–1081.

Fiorito, G., and Scotto, P. 1992. Observational Learning in Octopus vulgaris. Science 256(5056):545–547.

Fjelland, R. 2020. Why general artificial intelligence will not be realized. Humanities and Social Sciences Communications 7(1):1–9.

Forestier, S.; Portelas, R.; Mollard, Y.; and Oudeyer, P.-Y. 2022. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. arXiv:1708.02190 [cs].

Foster, D. J., and Wilson, M. A. 2006. Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature 440(7084):680–683.

Gershman, S. J., and Niv, Y. 2012. Exploring a latent cause theory of classical conditioning. Learning & Behavior 40(3):255–268.

Gershman, S. J.; Markman, A. B.; and Otto, A. R. 2014. Retrospective revaluation in sequential decision making: a tale of two systems. Journal of Experimental Psychology: General 143(1):182–194.

Gershman, S. J.; Moustafa, A. A.; and Ludvig, E. A. 2014. Time representation in reinforcement learning models of the basal ganglia. Frontiers in Computational Neuroscience 7:194.

Gershman, S. J.; Norman, K. A.; and Niv, Y. 2015. Discovering latent causes in reinforcement learning. Current Opinion in Behavioral Sciences 5:43–50.

Glimcher, P. W. 2011. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences of the United States of America 108 Suppl 3:15647–15654.

Gupta, A.; Mendonca, R.; Liu, Y.; Abbeel, P.; and Levine, S. 2018. Meta-Reinforcement Learning of Structured Exploration Strategies. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.

Gurland, S. T., and Grolnick, W. S. 2003. Children’s Expectancies and Perceptions of Adults: Effects on Rapport. Child Development 74:1212–1224.

Harlow, H. F. 1950. Learning and satiation of response in intrinsically motivated complex puzzle performance by monkeys. Journal of Comparative and Physiological Psychology 43:289–294.

Heyes, C. 2012. What’s social about social learning? Journal of Comparative Psychology 126(2):193–202.

Ho-Phuoc, T. 2019. CIFAR10 to Compare Visual Recognition Performance between Deep Neural Networks and Humans. arXiv:1811.07270 [cs].

Holland, P. C. 2004. Relations between Pavlovian-instrumental transfer and reinforcer devaluation. Journal of Experimental Psychology: Animal Behavior Processes 30(2):104–117.

Houthooft, R.; Chen, X.; Duan, Y.; Schulman, J.; De Turck, F.; and Abbeel, P. 2017. VIME: Variational Information Maximizing Exploration. arXiv:1605.09674 [cs, stat].

Jones, S. H.; Gray, J. A.; and Hemsley, D. R. 1990. The Kamin blocking effect, incidental learning and psychoticism. British Journal of Psychology (London, England: 1953) 81(Pt 1):95–109.

Kahneman, D. 2011. Thinking, fast and slow. New York, NY, US: Farrar, Straus and Giroux.

Leadbeater, E., and Dawson, E. H. 2017. A social insect perspective on the evolution of social learning mechanisms. Proceedings of the National Academy of Sciences 114(30):7838–7845.

Leong, Y. C.; Radulescu, A.; Daniel, R.; DeWoskin, V.; and Niv, Y. 2017. Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments. Neuron 93(2):451–463.

Lind, J.; Ghirlanda, S.; and Enquist, M. 2019. Social learning through associative processes: a computational theory. Royal Society Open Science 6(3):181777.

Ludvig, E. A.; Sutton, R. S.; and Kehoe, E. J. 2008. Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System. Neural Computation 20(12):3034–3054.

Ludvig, E. A.; Sutton, R. S.; and Kehoe, E. J. 2012. Evaluating the TD model of classical conditioning. Learning & Behavior 40(3):305–319.

Maia, T. V. 2009. Reinforcement learning, conditioning, and the brain: Successes and challenges. Cognitive, Affective, & Behavioral Neuroscience 9(4):343–364.

Mohamed, S., and Rezende, D. J. 2015. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. arXiv:1509.08731 [cs, stat].

Momennejad, I.; Russek, E. M.; Cheong, J. H.; Botvinick, M. M.; Daw, N. D.; and Gershman, S. J. 2017. The successor representation in human reinforcement learning. Nature Human Behaviour 1(9):680–692.

Montague, P. R.; Dayan, P.; Person, C.; and Sejnowski, T. J. 1995. Bee foraging in uncertain environments using predictive hebbian learning. Nature 377(6551):725–728.

Nagabandi, A.; Clavera, I.; Liu, S.; Fearing, R. S.; Abbeel, P.; Levine, S.; and Finn, C. 2019. Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning. arXiv:1803.11347 [cs, stat].

Ndousse, K.; Eck, D.; Levine, S.; and Jaques, N. 2021. Emergent Social Learning via Multi-agent Reinforcement Learning. arXiv:2010.00581 [cs, stat].

Niemiec, C. P., and Ryan, R. M. 2009. Autonomy, competence, and relatedness in the classroom: Applying self-determination theory to educational practice. Theory and Research in Education 7(2):133–144.

Niv, Y. 2009. Reinforcement learning in the brain. Journal of Mathematical Psychology 53(3):139–154.

Niv, Y. 2019. Learning task-state representations. Nature Neuroscience 22(10):1544–1553.

Olsson, A.; Knapska, E.; and Lindström, B. 2020. The neural and computational systems of social learning. Nature Reviews Neuroscience 21(4):197–212.

OpenAI. 2021. DALL·E: Creating images from text. https://openai.com/research/dall-e.

OpenAI. 2022. Introducing ChatGPT. https://openai.com/blog/chatgpt.

OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs].

Pathak, D.; Agrawal, P.; Efros, A. A.; and Darrell, T. 2017. Curiosity-Driven Exploration by Self-Supervised Prediction. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 488–489. Honolulu, HI, USA: IEEE.

Pellis, S. M., and Burghardt, G. M. 2017. Play and exploration. In APA handbook of comparative psychology: Basic concepts, methods, neural substrate, and behavior, Vol. 1, APA handbooks in psychology®. Washington, DC, US: American Psychological Association. 699–722.

Rakelly, K.; Zhou, A.; Finn, C.; Levine, S.; and Quillen, D. 2019. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. In Proceedings of the 36th International Conference on Machine Learning, 5331–5340. PMLR.

Reddy, S.; Dragan, A. D.; and Levine, S. 2019. SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards. arXiv:1905.11108 [cs, stat].

Rescorla, R., and Wagner, A. 1972. A theory of Pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement. Classical Conditioning: Current Research and Theory.

Rohani, S. R. R.; Hedayatian, S.; and Baghshah, M. S. 2022. BIMRL: Brain Inspired Meta Reinforcement Learning. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 9048–9053. arXiv:2210.16530 [cs].

Roitblat, H. 2021. Building artificial intelligence: Reward is not enough. https://bdtechtalks.com/2021/07/07/ai-reward-is-not-enough-herbert-roitblat/.

Ross, S.; Gordon, G. J.; and Bagnell, J. A. 2011. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. arXiv:1011.0686 [cs, stat].

Russek, E. M.; Momennejad, I.; Botvinick, M. M.; Gershman, S. J.; and Daw, N. D. 2017. Predictive representations can link model-based reinforcement learning to model-free mechanisms. Technical report, bioRxiv.

Samborska, V.; Butler, J. L.; Walton, M. E.; Behrens, T. E. J.; and Akam, T. 2022. Complementary task representations in hippocampus and prefrontal cortex for generalizing the structure of problems. Nature Neuroscience 25(10):1314–1326.

Schultz, W.; Apicella, P.; and Ljungberg, T. 1993. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience 13(3):900–913.

Shapira, Z. 1976. Expectancy determinants of intrinsically motivated behavior. Journal of Personality and Social Psychology 34:1235–1244.

Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.

Silver, D.; Singh, S.; Precup, D.; and Sutton, R. S. 2021. Reward is enough. Artificial Intelligence 299:103535.

Singh, S.; Lewis, R. L.; Barto, A. G.; and Sorg, J. 2010. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective. IEEE Transactions on Autonomous Mental Development 2(2):70–82.

Singh, S.; Lewis, R.; and Barto, A. 2009. Where Do Rewards Come From? Proceedings of the Annual Conference of the Cognitive Science Society 2601–2606.

Stadie, B. C.; Abbeel, P.; and Sutskever, I. 2019. Third-Person Imitation Learning. arXiv:1703.01703 [cs].

Sutton, R. S. 1991. Dyna, an Integrated Architecture for Learning, Planning, and Reacting.

Tricomi, E., and DePasque, S. 2016. The role of feedback in learning and motivation. Advances in Motivation and Achievement 19:175–202.

Tschandl, P.; Rosendahl, C.; Akay, B. N.; Argenziano, G.; Blum, A.; Braun, R. P.; Cabo, H.; Gourhant, J.-Y.; Kreusch, J.; Lallas, A.; Lapins, J.; Marghoob, A.; Menzies, S.; Neuber, N. M.; Paoli, J.; Rabinovitz, H. S.; Rinner, C.; Scope, A.; Soyer, H. P.; Sinz, C.; Thomas, L.; Zalaudek, I.; and Kittler, H. 2019. Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks. JAMA Dermatology 155(1):58–65.

Vamplew, P.; Smith, B. J.; Kallstrom, J.; Ramos, G.; Radulescu, R.; Roijers, D. M.; Hayes, C. F.; Heintz, F.; Mannion, P.; Libin, P. J. K.; Dazeley, R.; and Foale, C. 2021. Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021). arXiv:2112.15422 [cs].

Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. 2023. Attention Is All You Need. arXiv:1706.03762 [cs].

Waltz, D. L. 1988. The Prospects for Building Truly Intelligent Machines. Daedalus 117(1):191–212.

Wang, J. X.; Kurth-Nelson, Z.; Kumaran, D.; Tirumala, D.; Soyer, H.; Leibo, J. Z.; Hassabis, D.; and Botvinick, M. 2018. Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience 21(6):860–868.

Yin, H. H.; Ostlund, S. B.; Knowlton, B. J.; and Balleine, B. W. 2005. The role of the dorsomedial striatum in instrumental conditioning. European Journal of Neuroscience 22(2):513–523.

Yin, H. H.; Knowlton, B. J.; and Balleine, B. W. 2004. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. European Journal of Neuroscience 19(1):181–189.

Zhang, S., and Sutton, R. S. 2018. A Deeper Look at Experience Replay. arXiv:1712.01275 [cs].

Zhou, J.; Jia, C.; Montesinos-Cartagena, M.; Gardner, M. P. H.; Zong, W.; and Schoenbaum, G. 2021. Evolving schema representations in orbitofrontal ensembles during learning. Nature 590(7847):606–611.

Language: English

Frequency: 2 issues per year
Journal subjects: Computer Science, Artificial Intelligence

Shreya Rajagopal

Published online: 15 Dec 2023

Pages: 15 - 40

Received: 17 Aug 2022

Accepted: 15 Nov 2023

DOI: https://doi.org/10.2478/jagi-2023-0002

Keywords: Animal Learning, Artificial General Intelligence, Reinforcement Learning, Social Learning, Meta-Learning

© 2023 Shreya Rajagopal, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
