Ahmed, N.A. and Gokhale, D. (1989). Entropy expressions and their estimators for multivariate distributions, IEEE Transactions on Information Theory 35(3): 688–692, DOI: 10.1109/18.30996.
Bagnell, J.A. and Schneider, J.G. (2001). Autonomous helicopter control using reinforcement learning policy search methods, IEEE International Conference on Robotics and Automation, Seoul, South Korea, Vol. 2, pp. 1615–1620.
Chebotar, Y., Hausman, K., Zhang, M., Sukhatme, G., Schaal, S. and Levine, S. (2017). Combining model-based and model-free updates for trajectory-centric reinforcement learning, arXiv:1703.03078.
Deisenroth, M.P., Fox, D. and Rasmussen, C.E. (2015). Gaussian processes for data-efficient learning in robotics and control, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(2): 408–423, DOI: 10.1109/TPAMI.2013.218.
Deisenroth, M. and Rasmussen, C.E. (2011). PILCO: A model-based and data-efficient approach to policy search, Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, pp. 465–472.
Ebert, F., Finn, C., Lee, A.X. and Levine, S. (2017). Self-supervised visual planning with temporal skip connections, arXiv:1710.05268.
Fabisch, A. and Metzen, J.H. (2014). Active contextual policy search, Journal of Machine Learning Research 15(1): 3371–3399.
Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion, arXiv:1610.00696, DOI: 10.1109/ICRA.2017.7989324.
Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S. and Abbeel, P. (2015). Deep spatial autoencoders for visuomotor learning, arXiv:1509.06113, DOI: 10.1109/ICRA.2016.7487173.
Gruslys, A., Azar, M.G., Bellemare, M.G. and Munos, R. (2017). The reactor: A sample-efficient actor-critic architecture, arXiv:1704.04651.
Hayes, G. and Demiris, J. (1994). A robot controller using learning by imitation, International Symposium on Intelligent Robotic Systems 676(5): 1257–1274.
Levine, S., Finn, C., Darrell, T. and Abbeel, P. (2016). End-to-end training of deep visuomotor policies, Journal of Machine Learning Research 17(1): 1334–1373.
Nagabandi, A., Kahn, G., Fearing, R.S. and Levine, S. (2017). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning, arXiv:1708.02596, DOI: 10.1109/ICRA.2018.8463189.
Ng, A., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E. and Liang, E. (2006). Autonomous inverted helicopter flight via reinforcement learning, in M.H. Ang Jr. and O. Khatib (Eds.), Experimental Robotics IX, Springer, Berlin/Heidelberg, pp. 363–372, DOI: 10.1007/11552246_35.
Pan, Y. and Theodorou, E.A. (2014). Probabilistic differential dynamic programming, Advances in Neural Information Processing Systems 3: 1907–1915.
Pan, Y., Theodorou, E.A. and Kontitsis, M. (2015). Sample efficient path integral control under uncertainty, Advances in Neural Information Processing Systems 2015: 2314–2322.
Price, B. and Boutilier, C. (2003). Accelerating reinforcement learning through implicit imitation, Journal of Artificial Intelligence Research 19: 569–629, DOI: 10.1613/jair.898.
Silver, D., Sutton, R.S. and Müller, M. (2008). Sample-based learning and search with permanent and transient memories, International Conference on Machine Learning, Helsinki, Finland, pp. 968–975, DOI: 10.1145/1390156.1390278.
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences, Machine Learning 3(1): 9–44, DOI: 10.1007/BF00115009.
Sutton, R.S. (1991). Dyna, an integrated architecture for learning, planning, and reacting, ACM SIGART Bulletin 2(4): 160–163, DOI: 10.1145/122344.122377.