Adam, S.; Buşoniu, L.; and Babuška, R. 2012. Experience Replay for Real-Time Reinforcement Learning Control. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(2):201–212.
Adler, A.; Katabi, S.; Finkes, I.; Israel, Z.; Prut, Y.; and Bergman, H. 2012. Temporal Convergence of Dynamic Cell Assemblies in the Striato-Pallidal Network. Journal of Neuroscience 32(7):2473–2484.
Alayrac, J.-B.; Donahue, J.; Luc, P.; Miech, A.; Barr, I.; Hasson, Y.; Lenc, K.; Mensch, A.; Millican, K.; Reynolds, M.; Ring, R.; Rutherford, E.; Cabi, S.; Han, T.; Gong, Z.; Samangooei, S.; Monteiro, M.; Menick, J.; Borgeaud, S.; Brock, A.; Nematzadeh, A.; Sharifzadeh, S.; Binkowski, M.; Barreira, R.; Vinyals, O.; Zisserman, A.; and Simonyan, K. 2022. Flamingo: a Visual Language Model for Few-Shot Learning. arXiv:2204.14198 [cs].
Alonso, E., and Schmajuk, N. 2012. Special issue on computational models of classical conditioning: Guest editors’ introduction. Learning & Behavior 40(3):231–240.
Balleine, B. W., and O’Doherty, J. P. 2010. Human and Rodent Homologies in Action Control: Corticostriatal Determinants of Goal-Directed and Habitual Action. Neuropsychopharmacology 35(1):48–69.
Barto, A. G. Adaptive Critics and the Basal Ganglia.
Barto, A. G. 2013. Intrinsic Motivation and Reinforcement Learning. In Baldassarre, G., and Mirolli, M., eds., Intrinsically Motivated Learning in Natural and Artificial Systems. Berlin, Heidelberg: Springer. 17–47.
Boyd, R.; Richerson, P. J.; and Henrich, J. 2011. The cultural niche: Why social learning is essential for human adaptation. Proceedings of the National Academy of Sciences 108(Supplement 2):10918–10925.
Bramlage, L., and Cortese, A. 2022. Generalized attention-weighted reinforcement learning. Neural Networks 145:10–21.
Buetti-Dinh, A.; Galli, V.; Bellenberg, S.; Ilie, O.; Herold, M.; Christel, S.; Boretska, M.; Pivkin, I. V.; Wilmes, P.; Sand, W.; Vera, M.; and Dopson, M. 2019. Deep neural networks outperform human expert’s capacity in characterizing bioleaching bacterial biofilm composition. Biotechnology Reports 22:e00321.
Byrnes, S. 2021. Reward Is Not Enough. LessWrong.
Chang, S. W. C.; Winecoff, A. A.; and Platt, M. L. 2011. Vicarious reinforcement in rhesus macaques (Macaca mulatta). Frontiers in Neuroscience 5:27.
Cheng, C.-A.; Kolobov, A.; and Agarwal, A. 2020. Policy Improvement via Imitation of Multiple Oracles. arXiv:2007.00795 [cs, stat].
Chentanez, N.; Barto, A.; and Singh, S. 2004. Intrinsically Motivated Reinforcement Learning. In Advances in Neural Information Processing Systems, volume 17. MIT Press.
Cook, M.; Mineka, S.; Wolkenstein, B.; and Laitsch, K. 1985. Observational conditioning of snake fear in unrelated rhesus monkeys. Journal of Abnormal Psychology 94(4):591–610.
Danner, F. W., and Lonky, E. 1981. A Cognitive-Developmental Approach to the Effects of Rewards on Intrinsic Motivation. Child Development 52(3):1043–1052.
Daw, N. D.; Courville, A. C.; and Touretzky, D. S. 2006. Representation and Timing in Theories of the Dopamine System. Neural Computation 18(7):1637–1677.
Daw, N. D.; Niv, Y.; and Dayan, P. 2005. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience 8(12):1704–1711.
Dawson, E. H.; Avarguès-Weber, A.; Chittka, L.; and Leadbeater, E. 2013. Learning by Observation Emerges from Simple Associations in an Insect Model. Current Biology 23(8):727–730.
de Bruin, T.; Tuyls, K.; Kober, J.; and Babuška, R. 2015. The importance of experience replay database composition in deep reinforcement learning.
Deci, E. L., and Ryan, R. M. 1985. Conceptualizations of Intrinsic Motivation and Self-Determination. In Deci, E. L., and Ryan, R. M., eds., Intrinsic Motivation and Self-Determination in Human Behavior, Perspectives in Social Psychology. Boston, MA: Springer US. 11–40.
DeYoung, C. G. 2013. The neuromodulator of exploration: A unifying theory of the role of dopamine in personality. Frontiers in Human Neuroscience 7.
Di Domenico, S. I., and Ryan, R. M. 2017. The Emerging Neuroscience of Intrinsic Motivation: A New Frontier in Self-Determination Research. Frontiers in Human Neuroscience 11:145.
Doll, B. B.; Simon, D. A.; and Daw, N. D. 2012. The ubiquity of model-based reinforcement learning. Current Opinion in Neurobiology 22(6):1075–1081.
Fiorito, G., and Scotto, P. 1992. Observational Learning in Octopus vulgaris. Science 256(5056):545–547.
Fjelland, R. 2020. Why general artificial intelligence will not be realized. Humanities and Social Sciences Communications 7(1):1–9.
Forestier, S.; Portelas, R.; Mollard, Y.; and Oudeyer, P.-Y. 2022. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. arXiv:1708.02190 [cs].
Foster, D. J., and Wilson, M. A. 2006. Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature 440(7084):680–683.
Gershman, S. J., and Niv, Y. 2012. Exploring a latent cause theory of classical conditioning. Learning & Behavior 40(3):255–268.
Gershman, S. J.; Markman, A. B.; and Otto, A. R. 2014. Retrospective revaluation in sequential decision making: a tale of two systems. Journal of Experimental Psychology: General 143(1):182–194.
Gershman, S. J.; Moustafa, A. A.; and Ludvig, E. A. 2014. Time representation in reinforcement learning models of the basal ganglia. Frontiers in Computational Neuroscience 7:194.
Gershman, S. J.; Norman, K. A.; and Niv, Y. 2015. Discovering latent causes in reinforcement learning. Current Opinion in Behavioral Sciences 5:43–50.
Glimcher, P. W. 2011. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences 108(Supplement 3):15647–15654.
Gupta, A.; Mendonca, R.; Liu, Y.; Abbeel, P.; and Levine, S. 2018. Meta-Reinforcement Learning of Structured Exploration Strategies. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.
Gurland, S. T., and Grolnick, W. S. 2003. Children’s Expectancies and Perceptions of Adults: Effects on Rapport. Child Development 74:1212–1224.
Harlow, H. F. 1950. Learning and satiation of response in intrinsically motivated complex puzzle performance by monkeys. Journal of Comparative and Physiological Psychology 43:289–294.
Heyes, C. 2012. What’s social about social learning? Journal of Comparative Psychology 126(2):193–202.
Ho-Phuoc, T. 2019. CIFAR10 to Compare Visual Recognition Performance between Deep Neural Networks and Humans. arXiv:1811.07270 [cs].
Holland, P. C. 2004. Relations between Pavlovian-instrumental transfer and reinforcer devaluation. Journal of Experimental Psychology: Animal Behavior Processes 30(2):104–117.
Houthooft, R.; Chen, X.; Duan, Y.; Schulman, J.; De Turck, F.; and Abbeel, P. 2017. VIME: Variational Information Maximizing Exploration. arXiv:1605.09674 [cs, stat].
Jones, S. H.; Gray, J. A.; and Hemsley, D. R. 1990. The Kamin blocking effect, incidental learning and psychoticism. British Journal of Psychology 81(1):95–109.
Kahneman, D. 2011. Thinking, Fast and Slow. New York, NY, US: Farrar, Straus and Giroux.
Leadbeater, E., and Dawson, E. H. 2017. A social insect perspective on the evolution of social learning mechanisms. Proceedings of the National Academy of Sciences 114(30):7838–7845.
Leong, Y. C.; Radulescu, A.; Daniel, R.; DeWoskin, V.; and Niv, Y. 2017. Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments. Neuron 93(2):451–463.
Lind, J.; Ghirlanda, S.; and Enquist, M. 2019. Social learning through associative processes: a computational theory. Royal Society Open Science 6(3):181777.
Ludvig, E. A.; Sutton, R. S.; and Kehoe, E. J. 2008. Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System. Neural Computation 20(12):3034–3054.
Ludvig, E. A.; Sutton, R. S.; and Kehoe, E. J. 2012. Evaluating the TD model of classical conditioning. Learning & Behavior 40(3):305–319.
Maia, T. V. 2009. Reinforcement learning, conditioning, and the brain: Successes and challenges. Cognitive, Affective, & Behavioral Neuroscience 9(4):343–364.
Mohamed, S., and Rezende, D. J. 2015. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. arXiv:1509.08731 [cs, stat].
Momennejad, I.; Russek, E. M.; Cheong, J. H.; Botvinick, M. M.; Daw, N. D.; and Gershman, S. J. 2017. The successor representation in human reinforcement learning. Nature Human Behaviour 1(9):680–692.
Montague, P. R.; Dayan, P.; Person, C.; and Sejnowski, T. J. 1995. Bee foraging in uncertain environments using predictive hebbian learning. Nature 377(6551):725–728.
Nagabandi, A.; Clavera, I.; Liu, S.; Fearing, R. S.; Abbeel, P.; Levine, S.; and Finn, C. 2019. Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning. arXiv:1803.11347 [cs, stat].
Ndousse, K.; Eck, D.; Levine, S.; and Jaques, N. 2021. Emergent Social Learning via Multi-agent Reinforcement Learning. arXiv:2010.00581 [cs, stat].
Niemiec, C. P., and Ryan, R. M. 2009. Autonomy, competence, and relatedness in the classroom: Applying self-determination theory to educational practice. Theory and Research in Education 7(2):133–144.
Niv, Y. 2009. Reinforcement learning in the brain. Journal of Mathematical Psychology 53(3):139–154.
Niv, Y. 2019. Learning task-state representations. Nature Neuroscience 22(10):1544–1553.
Olsson, A.; Knapska, E.; and Lindström, B. 2020. The neural and computational systems of social learning. Nature Reviews Neuroscience 21(4):197–212.
OpenAI. 2021. DALL·E: Creating images from text. https://openai.com/research/dall-e.
OpenAI. 2022. Introducing ChatGPT. https://openai.com/blog/chatgpt.
OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs].
Pathak, D.; Agrawal, P.; Efros, A. A.; and Darrell, T. 2017. Curiosity-Driven Exploration by Self-Supervised Prediction. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 488–489. Honolulu, HI, USA: IEEE.
Pellis, S. M., and Burghardt, G. M. 2017. Play and exploration. In APA handbook of comparative psychology: Basic concepts, methods, neural substrate, and behavior, Vol. 1, APA handbooks in psychology®. Washington, DC, US: American Psychological Association. 699–722.
Rakelly, K.; Zhou, A.; Finn, C.; Levine, S.; and Quillen, D. 2019. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. In Proceedings of the 36th International Conference on Machine Learning, 5331–5340. PMLR.
Reddy, S.; Dragan, A. D.; and Levine, S. 2019. SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards. arXiv:1905.11108 [cs, stat].
Rescorla, R., and Wagner, A. 1972. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II: Current Research and Theory.
Rohani, S. R. R.; Hedayatian, S.; and Baghshah, M. S. 2022. BIMRL: Brain Inspired Meta Reinforcement Learning. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 9048–9053. arXiv:2210.16530 [cs].
Roitblat, H. 2021. Building artificial intelligence: Reward is not enough. https://bdtechtalks.com/2021/07/07/ai-reward-is-not-enough-herbert-roitblat/.
Ross, S.; Gordon, G. J.; and Bagnell, J. A. 2011. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. arXiv:1011.0686 [cs, stat].
Russek, E. M.; Momennejad, I.; Botvinick, M. M.; Gershman, S. J.; and Daw, N. D. 2017. Predictive representations can link model-based reinforcement learning to model-free mechanisms. Technical report, bioRxiv.
Samborska, V.; Butler, J. L.; Walton, M. E.; Behrens, T. E. J.; and Akam, T. 2022. Complementary task representations in hippocampus and prefrontal cortex for generalizing the structure of problems. Nature Neuroscience 25(10):1314–1326.
Schultz, W.; Apicella, P.; and Ljungberg, T. 1993. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. Journal of Neuroscience 13(3):900–913.
Shapira, Z. 1976. Expectancy determinants of intrinsically motivated behavior. Journal of Personality and Social Psychology 34:1235–1244.
Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
Silver, D.; Singh, S.; Precup, D.; and Sutton, R. S. 2021. Reward is enough. Artificial Intelligence 299:103535.
Singh, S.; Lewis, R. L.; Barto, A. G.; and Sorg, J. 2010. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective. IEEE Transactions on Autonomous Mental Development 2(2):70–82.
Singh, S.; Lewis, R.; and Barto, A. 2009. Where Do Rewards Come From? In Proceedings of the Annual Conference of the Cognitive Science Society, 2601–2606.
Stadie, B. C.; Abbeel, P.; and Sutskever, I. 2019. Third-Person Imitation Learning. arXiv:1703.01703 [cs].
Sutton, R. S. 1991. Dyna, an Integrated Architecture for Learning, Planning, and Reacting.
Tricomi, E., and DePasque, S. 2016. The role of feedback in learning and motivation. Advances in Motivation and Achievement 19:175–202.
Tschandl, P.; Rosendahl, C.; Akay, B. N.; Argenziano, G.; Blum, A.; Braun, R. P.; Cabo, H.; Gourhant, J.-Y.; Kreusch, J.; Lallas, A.; Lapins, J.; Marghoob, A.; Menzies, S.; Neuber, N. M.; Paoli, J.; Rabinovitz, H. S.; Rinner, C.; Scope, A.; Soyer, H. P.; Sinz, C.; Thomas, L.; Zalaudek, I.; and Kittler, H. 2019. Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks. JAMA Dermatology 155(1):58–65.
Vamplew, P.; Smith, B. J.; Kallstrom, J.; Ramos, G.; Radulescu, R.; Roijers, D. M.; Hayes, C. F.; Heintz, F.; Mannion, P.; Libin, P. J. K.; Dazeley, R.; and Foale, C. 2021. Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021). arXiv:2112.15422 [cs].
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. 2023. Attention Is All You Need. arXiv:1706.03762 [cs].
Waltz, D. L. 1988. The Prospects for Building Truly Intelligent Machines. Daedalus 117(1):191–212.
Wang, J. X.; Kurth-Nelson, Z.; Kumaran, D.; Tirumala, D.; Soyer, H.; Leibo, J. Z.; Hassabis, D.; and Botvinick, M. 2018. Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience 21(6):860–868.
Yin, H. H.; Ostlund, S. B.; Knowlton, B. J.; and Balleine, B. W. 2005. The role of the dorsomedial striatum in instrumental conditioning. European Journal of Neuroscience 22(2):513–523.
Yin, H. H.; Knowlton, B. J.; and Balleine, B. W. 2004. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. European Journal of Neuroscience 19(1):181–189.
Zhang, S., and Sutton, R. S. 2018. A Deeper Look at Experience Replay. arXiv:1712.01275 [cs].
Zhou, J.; Jia, C.; Montesinos-Cartagena, M.; Gardner, M. P. H.; Zong, W.; and Schoenbaum, G. 2021. Evolving schema representations in orbitofrontal ensembles during learning. Nature 590(7847):606–611.