[1. Barash, D.. A genetic search in policy space for solving Markov decision processes, - In: AAAI Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information, 1999.]Search in Google Scholar
[2. Baxter, J., P. Bartlett. Direct gradient-based reinforcement learning, - In: Proceedings of IEEE International Symposium on Circuits and Systems, Vol. 3, IEEE, 2000, 271-274.]Search in Google Scholar
[3. Brachetti, P., M. De Felice Ciccoli, G. Di Pillo, S. Lucidi. A new version of the Price’s algorithm for global optimization, Journal of Global Optimization, Vol. 10, No 2, 1997, 165-184.10.1023/A:1008250020656]Search in Google Scholar
[4. Carpin, S., M. Lewis, J. Wang, S. Balakirsky, C. Scrapper. Bridging the gap between simulation and reality in urban search and rescue, Robocup 2006: Robot Soccer World Cup X, 1-12.10.1007/978-3-540-74024-7_1]Search in Google Scholar
[5. Conn, A., K. Scheinberg, L. Vicente. Introduction to derivative-free optimization, Vol. 8, Society for Industrial Mathematics, 2009.10.1137/1.9780898718768]Search in Google Scholar
[6. Dorigo, M., M. Birattari, T. Stutzle. Ant colony optimization, IEEE Computational Intelligence Magazine, Vol. 1, No 4, 2006, 28-39.10.1109/CI-M.2006.248054]Search in Google Scholar
[7. Glover, F., M. Laguna. Tabu search, Vol. 1, Springer, 1998.10.1007/978-1-4615-6089-0_1]Search in Google Scholar
[8. Goldberg, D.. Genetic algorithms in search, optimization, and machine learning, Addisonwesley, 1989.]Search in Google Scholar
[9. Gomez, F., J. Schmidhuber, R. Miikkulainen. Efficient Non-Linear Control through Neuroevolution, - In: Proceedings of the European Conference on Machine Learning, Springer, Berlin, 2006, 654-662.10.1007/11871842_64]Search in Google Scholar
[10. Horst, R., P. Pardalos, N. Thoai. Introduction to global optimization, Springer, 2000.10.1007/978-1-4615-0015-5]Search in Google Scholar
[11. Jakobi, N., P. Husbands, I. Harvey. Noise and the reality gap: The use of simulation in evolutionary robotics, Advances in artificial life, 704-720.10.1007/3-540-59496-5_337]Search in Google Scholar
[12. Kakade, S.. A natural policy gradient, Advances in neural information processing systems, Vol. 14, 2001, 1531-1538.]Search in Google Scholar
[13. Kirkpatrick, S., C. Gelatt Jr, M. Vecchi. Optimization by simulated annealing, Science, Vol. 220, No 4598, 1983, 671-680.10.1126/science.220.4598.67117813860]Search in Google Scholar
[14. Kober, J., J. Peters. Policy search for motor primitives in robotics, Machine learning, Vol. 84, No 1, 2011, 171-203.10.1007/s10994-010-5223-6]Search in Google Scholar
[15. Kormushev, P., D. G. Caldwell. Simultaneous Discovery of Multiple Alternative Optimal Policies by Reinforcement Learning, - In: IEEE International Conference on Intelligent Systems (IS 2012), 2012.10.1109/IS.2012.6335136]Search in Google Scholar
[16. Lucidi, S., M. Sciandrone. On the global convergence of derivative-free methods for unconstrained optimization, SIAM Journal of Optimization, Vol. 13, No 1, 2002, 97-116.10.1137/S1052623497330392]Search in Google Scholar
[17. Peters, J., S. Schaal. Policy gradient methods for robotics, - In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2006, 2219-2225.10.1109/IROS.2006.282564]Search in Google Scholar
[18. Peters, J., S. Schaal. Reinforcement learning by reward-weighted regression for operational space control, - In: Proceedings of the 24th international conference on Machine learning, ACM, 2007, 745-750.10.1145/1273496.1273590]Search in Google Scholar
[19. Peters, J., S. Vijayakumar, S. Schaal. Natural Actor-Critic, - In: Proceedings of the 16th European Conference on Machine Learning (ECML), 2005, 280-291.10.1007/11564096_29]Search in Google Scholar
[20. Price, W.. Global optimization by controlled random search, Journal of Optimization Theory and Applications, Vol. 40, No 3, 1983, 333-348.10.1007/BF00933504]Search in Google Scholar
[21. Ribas, D., N. Palomeras, P. Ridao, M. Carreras, A. Mallios. Girona 500 AUV, from survey to intervention, IEEE/ASME Transactions on Mechatronics, Vol. 17, No 1, 2012, 46-53.10.1109/TMECH.2011.2174065]Search in Google Scholar
[22. Sutton, R., D. McAllester, S. Singh, Y. Mansour. Policy gradient methods for reinforcement learning with function approximation, Advances in neural information processing systems, Vol. 12, No 22.]Search in Google Scholar
[23. Theodorou, E., J. Buchli, S. Schaal. A generalized path integral control approach to reinforcement learning, The Journal of Machine Learning Research, Vol. 9999, 2010, 3137-3181.]Search in Google Scholar
[24. Torczon, V., et al .. On the convergence of pattern search algorithms, SIAM Journal on optimization, Vol. 7, No 1, 1997, 1-25.10.1137/S1052623493250780]Search in Google Scholar
[25. Torn, A., A. Zilinska s. Global Optimization, Springer, 1989.]Search in Google Scholar
[26. Williams, R.. Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine learning, Vol. 8, No 3, 1992, 229-256.10.1007/BF00992696]Search in Google Scholar