Open Access

An Active Exploration Method for Data Efficient Reinforcement Learning

International Journal of Applied Mathematics and Computer Science
Advances in Complex Cloud and Service Oriented Computing (special section, pp. 213-274), Anna Kobusińska, Ching-Hsien Hsu, Kwei-Jay Lin (Eds.)
ABOUT THIS ARTICLE

Ahmed, N.A. and Gokhale, D. (1989). Entropy expressions and their estimators for multivariate distributions, IEEE Transactions on Information Theory 35(3): 688–692. doi:10.1109/18.30996

Bagnell, J.A. and Schneider, J.G. (2001). Autonomous helicopter control using reinforcement learning policy search methods, IEEE International Conference on Robotics and Automation, Seoul, South Korea, Vol. 2, pp. 1615–1620.

Chebotar, Y., Hausman, K., Zhang, M., Sukhatme, G., Schaal, S. and Levine, S. (2017). Combining model-based and model-free updates for trajectory-centric reinforcement learning, arXiv:1703.03078.

Deisenroth, M.P., Fox, D. and Rasmussen, C.E. (2015). Gaussian processes for data-efficient learning in robotics and control, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(2): 408–423. doi:10.1109/TPAMI.2013.218, PMID: 26353251

Deisenroth, M. and Rasmussen, C.E. (2011). PILCO: A model-based and data-efficient approach to policy search, Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, pp. 465–472.

Ebert, F., Finn, C., Lee, A.X. and Levine, S. (2017). Self-supervised visual planning with temporal skip connections, arXiv:1710.05268.

Fabisch, A. and Metzen, J.H. (2014). Active contextual policy search, Journal of Machine Learning Research 15(1): 3371–3399.

Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion, arXiv:1610.00696. doi:10.1109/ICRA.2017.7989324

Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S. and Abbeel, P. (2015). Deep spatial autoencoders for visuomotor learning, arXiv:1509.06113. doi:10.1109/ICRA.2016.7487173

Gruslys, A., Azar, M.G., Bellemare, M.G. and Munos, R. (2017). The reactor: A sample-efficient actor-critic architecture, arXiv:1704.04651.

Hayes, G. and Demiris, J. (1994). A robot controller using learning by imitation, International Symposium on Intelligent Robotic Systems 676(5): 1257–1274.

Levine, S., Finn, C., Darrell, T. and Abbeel, P. (2016). End-to-end training of deep visuomotor policies, Journal of Machine Learning Research 17(1): 1334–1373.

Nagabandi, A., Kahn, G., Fearing, R.S. and Levine, S. (2017). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning, arXiv:1708.02596. doi:10.1109/ICRA.2018.8463189

Ng, A., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E. and Liang, E. (2006). Autonomous inverted helicopter flight via reinforcement learning, in M.H. Ang Jr. and O. Khatib (Eds.), Experimental Robotics IX, Springer, Berlin/Heidelberg, pp. 363–372. doi:10.1007/11552246_35

Pan, Y. and Theodorou, E.A. (2014). Probabilistic differential dynamic programming, Advances in Neural Information Processing Systems 3: 1907–1915.

Pan, Y., Theodorou, E.A. and Kontitsis, M. (2015). Sample efficient path integral control under uncertainty, Advances in Neural Information Processing Systems 2015: 2314–2322.

Price, B. and Boutilier, C. (2003). Accelerating reinforcement learning through implicit imitation, Journal of Artificial Intelligence Research 19: 569–629. doi:10.1613/jair.898

Silver, D., Sutton, R.S. and Müller, M. (2008). Sample-based learning and search with permanent and transient memories, International Conference on Machine Learning, Helsinki, Finland, pp. 968–975. doi:10.1145/1390156.1390278

Sutton, R.S. (1988). Learning to predict by the methods of temporal differences, Machine Learning 3(1): 9–44. doi:10.1007/BF00115009

Sutton, R.S. (1991). Dyna, an integrated architecture for learning, planning, and reacting, ACM Sigart Bulletin 2(4): 160–163. doi:10.1145/122344.122377

eISSN: 2083-8492
Language: English
Publication frequency: 4 times per year
Journal subjects: Mathematics, Applied Mathematics