Allamigeon, X., Boyet, M., Gaubert, S. (2021). Piecewise Affine Dynamical Models of Petri Nets–Application to Emergency Call Centers. Fundamenta Informaticae, 183(3–4), 169–201. DOI: 10.3233/FI-2021-2086.
Asadi, A., Pinkley, S.N., Mes, M. (2022). A Markov decision process approach for managing medical drone deliveries. Expert Systems With Applications, 204, 117490. DOI: 10.1016/j.eswa.2022.117490.
Bellman, R. (1958). Dynamic programming and stochastic control processes. Information and Control, 1(3), 228–239. DOI: 10.1016/S0019-9958(58)80003-0.
Bertsekas, D. (2012). Dynamic programming and optimal control: Volume I. Athena Scientific.
Bertsimas, D., Mišić, V.V. (2016). Decomposable Markov decision processes: A fluid optimization approach. Operations Research, 64(6), 1537–1555. DOI: 10.1287/opre.2016.1531.
Dulac-Arnold, G., Levine, N., Mankowitz, D.J., Li, J., Paduraru, C., Gowal, S., Hester, T. (2021). Challenges of real-world reinforcement learning: Definitions, benchmarks and analysis. Machine Learning, 110(9), 2419–2468. DOI: 10.1007/s10994-021-05961-4.
El Akraoui, B., Daoui, C., Larach, A. (2022). Decomposition Methods for Solving Finite-Horizon Large MDPs. Journal of Mathematics, 2022, Article ID 8404716. DOI: 10.1155/2022/8404716.
Emadi, H., Atkins, E., Rastgoftar, H. (2022). A Finite-State Fixed-Corridor Model for UAS Traffic Management. arXiv preprint arXiv:2204.05517.
Feinberg, E.A. (2016). Optimality conditions for inventory control. In Optimization Challenges in Complex, Networked and Risky Systems (pp. 14–45). INFORMS. DOI: 10.1287/educ.2016.0145.
Hordijk, A., Kallenberg, L.C.M. (1984). Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints. Mathematical Programming, 30(1), 46–70. DOI: 10.1007/BF02591798.
Howard, R.A. (1960). Dynamic programming and Markov processes. MIT Press, Cambridge, MA. https://books.google.co.ma/books?id=fXJEAAAAIAAJ.
Kallenberg, L.C.M. (1983). Linear programming and finite Markovian control problems, Math. Centre Tracts, 148, 1–245.
Larach, A., Chafik, S., Daoui, C. (2017). Accelerated decomposition techniques for large discounted Markov decision processes. Journal of Industrial Engineering International, 13(4), 417–426. DOI: 10.1007/s40092-017-0197-7.
Mao, W., Zheng, Z., Wu, F., Chen, G. (2018). Online Pricing for Revenue Maximization with Unknown Time Discounting Valuations. IJCAI, 440–446. DOI: 10.24963/ijcai.2018/61.
Pavitsos, A., Kyriakidis, E.G. (2009). Markov decision models for the optimal maintenance of a production unit with an upstream buffer. Computers & Operations Research, 36(6), 1993–2006. DOI: 10.1016/j.cor.2008.06.014.
Peng, H., Cheng, Y., Li, X. (2023). Real-Time Pricing Method for Spot Cloud Services with Non-Stationary Excess Capacity. Sustainability, 15(4), 3363.
Puterman, M.L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons Inc. DOI: 10.1002/9780470316887.
Rimélé, A., Grangier, P., Gamache, M., Gendreau, M., Rousseau, L.-M. (2021). E-commerce warehousing: Learning a storage policy. arXiv preprint arXiv:2101.08828. DOI: 10.48550/arXiv.2101.08828.
Spieksma, F., Nunez-Queija, R. (2015). Markov Decision Processes. Adaptation of the text by R. Nunez-Queija, 55.
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44. DOI: 10.1007/BF00115009.
White III, C.C., White, D.J. (1989). Markov decision processes. European Journal of Operational Research, 39(1), 1–16. DOI: 10.1016/0377-2217(89)90348-2.
Wu, Y., Zhang, J., Ravey, A., Chrenko, D., Miraoui, A. (2020). Real-time energy management of photovoltaic-assisted electric vehicle charging station by Markov decision process. Journal of Power Sources, 476, 228504.
Ye, G., Lin, Q., Juang, T.-H., Liu, H. (2020). Collision-free Navigation of Human-centered Robots via Markov Games. 2020 IEEE International Conference on Robotics and Automation (ICRA), 11338–11344. DOI: 10.1109/ICRA40945.2020.9196810.
Ye, Y. (2011). The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4), 593–603. DOI: 10.1287/moor.1110.0516.
Zhang, Y., Kim, C.-W., Tee, K.F. (2017). Maintenance management of offshore structures using Markov process model with random transition probabilities. Structure and Infrastructure Engineering, 13(8), 1068–1080. DOI: 10.1080/15732479.2016.1236393.