Metrics for Assessing Generalization of Deep Reinforcement Learning in Parameterized Environments


Robert Kirk, Amy Zhang, Edward Grefenstette, and Tim Rocktäschel. A Survey of Zero-shot Generalisation in Deep Reinforcement Learning. Journal of Artificial Intelligence Research, 76:201–264, January 2023. ISSN 1076-9757. doi:10.1613/jair.1.14174.

Katsuhiko Ogata. Modern Control Engineering. Prentice Hall, 2010. ISBN 978-0-13-615673-4.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Second edition. The MIT Press, 2018. ISBN 978-0-262-03924-6.

Dimitri P. Bertsekas. Reinforcement Learning and Optimal Control. Athena Scientific, 2019. ISBN 978-1-886529-39-7.

Hiroki Furuta et al. Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning. In Proceedings of the 38th International Conference on Machine Learning, pages 3541–3552. PMLR, July 2021.

Richard S. Sutton, Michael H. Bowling, and Patrick M. Pilarski. The Alberta Plan for AI Research, August 2022.

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal Policy Optimization Algorithms. arXiv:1707.06347 [cs], August 2017.

John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust Region Policy Optimization. arXiv:1502.05477 [cs], April 2017.

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv:1801.01290 [cs, stat], August 2018.

Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv:1509.02971 [cs, stat], July 2019.

Assaf Hallak, Dotan Di Castro, and Shie Mannor. Contextual Markov Decision Processes, February 2015.

Dibya Ghosh, Jad Rahme, Aviral Kumar, Amy Zhang, Ryan P. Adams, and Sergey Levine. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability, July 2021.

Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv:1606.01540 [cs], June 2016.

Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Siqi Liu, Steven Bohez, Josh Merel, Tom Erez, Timothy Lillicrap, Nicolas Heess, and Yuval Tassa. dm_control: Software and tasks for continuous control. Software Impacts, 6:100022, November 2020. ISSN 2665-9638. doi:10.1016/j.simpa.2020.100022.

Karl Cobbe, Chris Hesse, Jacob Hilton, and John Schulman. Leveraging Procedural Generation to Benchmark Reinforcement Learning. In Proceedings of the 37th International Conference on Machine Learning, pages 2048–2056. PMLR, November 2020.

Kevin Frans and Phillip Isola. Powderworld: A Platform for Understanding Generalization via Rich Task Distributions, November 2022.

Farama Foundation. Gymnasium, 2023. URL https://gymnasium.farama.org/.

Sumukh Aithal K, Dhruva Kashyap, and Natarajan Subramanyam. Robustness to Augmentations as a Generalization Metric. arXiv:2101.06459 [cs], January 2021.

OpenAI: Ilge Akkaya et al. Solving Rubik’s Cube with a Robot Hand, October 2019.

Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. Soft Actor-Critic Algorithms and Applications. arXiv:1812.05905 [cs, stat], January 2019.

Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, and Shimon Whiteson. A Survey of Meta-Reinforcement Learning, January 2023.

Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, and Dawn Song. Assessing Generalization in Deep Reinforcement Learning. March 2019.

Jianda Chen and Sinno Pan. Learning representations via a robust behavioral metric for deep reinforcement learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 36654–36666. Curran Associates, Inc., 2022.

Sam Witty, Jun K. Lee, Emma Tosch, Akanksha Atrey, Kaleigh Clary, Michael L. Littman, and David Jensen. Measuring and characterizing generalization in deep reinforcement learning. Applied AI Letters, 2(4), December 2021. ISSN 2689-5595. doi:10.1002/ail2.45.

Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, and John Schulman. Quantifying Generalization in Reinforcement Learning. In Proceedings of the 36th International Conference on Machine Learning, pages 1282–1289. PMLR, May 2019.

Qucheng Peng, Zhengming Ding, Lingjuan Lyu, Lichao Sun, and Chen Chen. RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 4118–4126. International Joint Conferences on Artificial Intelligence Organization, 2023. ISBN 978-1-956792-03-4. doi:10.24963/ijcai.2023/458. URL https://www.ijcai.org/proceedings/2023/458.

Qucheng Peng, Ce Zheng, and Chen Chen. Source-free Domain Adaptive Human Pose Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4826–4836, 2023. URL https://openaccess.thecvf.com/content/ICCV2023/html/Peng_Source-free_Domain_Adaptive_Human_Pose_Estimation_ICCV_2023_paper.html.

Xingyou Song, Yilun Du, and Jacob Jackson. An Empirical Study on Hyperparameters and their Interdependence for RL Generalization. June 2019.

Aravind Rajeswaran, Kendall Lowrey, Emanuel V. Todorov, and Sham M Kakade. Towards Generalization and Simplicity in Continuous Control. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.

Philipp Moritz et al. Ray: A distributed framework for emerging AI applications. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI’18, pages 561–577, USA, October 2018. USENIX Association. ISBN 978-1-931971-47-8.

Stephanie C. Y. Chan, Samuel Fishman, John Canny, Anoop Korattikara, and Sergio Guadarrama. Measuring the Reliability of Reinforcement Learning Algorithms, February 2020.

Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning. In Deep RL Workshop NeurIPS 2021, 2021.

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering Diverse Domains through World Models, January 2023.
