Ahmed, N.A. and Gokhale, D. (1989). Entropy expressions and their estimators for multivariate distributions, IEEE Transactions on Information Theory 35(3): 688–692, DOI: 10.1109/18.30996.
Bagnell, J.A. and Schneider, J.G. (2001). Autonomous helicopter control using reinforcement learning policy search methods, IEEE International Conference on Robotics and Automation, Seoul, South Korea, Vol. 2, pp. 1615–1620.
Chebotar, Y., Hausman, K., Zhang, M., Sukhatme, G., Schaal, S. and Levine, S. (2017). Combining model-based and model-free updates for trajectory-centric reinforcement learning, arXiv:1703.03078.
Deisenroth, M.P., Fox, D. and Rasmussen, C.E. (2015). Gaussian processes for data-efficient learning in robotics and control, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(2): 408–423, DOI: 10.1109/TPAMI.2013.218.
Deisenroth, M. and Rasmussen, C.E. (2011). PILCO: A model-based and data-efficient approach to policy search, Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, pp. 465–472.
Ebert, F., Finn, C., Lee, A.X. and Levine, S. (2017). Self-supervised visual planning with temporal skip connections, arXiv:1710.05268.
Fabisch, A. and Metzen, J.H. (2014). Active contextual policy search, Journal of Machine Learning Research 15(1): 3371–3399.
Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion, arXiv:1610.00696, DOI: 10.1109/ICRA.2017.7989324.
Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S. and Abbeel, P. (2015). Deep spatial autoencoders for visuomotor learning, arXiv:1509.06113, DOI: 10.1109/ICRA.2016.7487173.
Gruslys, A., Azar, M.G., Bellemare, M.G. and Munos, R. (2017). The reactor: A sample-efficient actor-critic architecture, arXiv:1704.04651.
Hayes, G. and Demiris, J. (1994). A robot controller using learning by imitation, International Symposium on Intelligent Robotic Systems 676(5): 1257–1274.
Levine, S., Finn, C., Darrell, T. and Abbeel, P. (2016). End-to-end training of deep visuomotor policies, Journal of Machine Learning Research 17(1): 1334–1373.
Nagabandi, A., Kahn, G., Fearing, R.S. and Levine, S. (2017). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning, arXiv:1708.02596, DOI: 10.1109/ICRA.2018.8463189.
Ng, A., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E. and Liang, E. (2006). Autonomous inverted helicopter flight via reinforcement learning, in M.H. Ang Jr. and O. Khatib (Eds.), Experimental Robotics IX, Springer, Berlin/Heidelberg, pp. 363–372, DOI: 10.1007/11552246_35.
Pan, Y. and Theodorou, E.A. (2014). Probabilistic differential dynamic programming, Advances in Neural Information Processing Systems 3: 1907–1915.
Pan, Y., Theodorou, E.A. and Kontitsis, M. (2015). Sample efficient path integral control under uncertainty, Advances in Neural Information Processing Systems 2015: 2314–2322.
Price, B. and Boutilier, C. (2003). Accelerating reinforcement learning through implicit imitation, Journal of Artificial Intelligence Research 19: 569–629, DOI: 10.1613/jair.898.
Silver, D., Sutton, R.S. and Müller, M. (2008). Sample-based learning and search with permanent and transient memories, International Conference on Machine Learning, Helsinki, Finland, pp. 968–975, DOI: 10.1145/1390156.1390278.
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences, Machine Learning 3(1): 9–44, DOI: 10.1007/BF00115009.
Sutton, R.S. (1991). Dyna, an integrated architecture for learning, planning, and reacting, ACM SIGART Bulletin 2(4): 160–163, DOI: 10.1145/122344.122377.