Open Access

Solving Finite-Horizon Discounted Non-Stationary MDPs


Research background

Markov Decision Processes (MDPs) are a powerful framework for modeling many real-world finite-horizon problems in which the goal is to maximize the reward obtained from a sequence of actions. However, many problems, such as investment and financial market problems, in which the value of a reward decreases exponentially with time, require the introduction of interest rates.

Purpose

This study investigates non-stationary finite-horizon MDPs with a discount factor to account for fluctuations in rewards over time.
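The objective described here can be written as a finite-horizon discounted Bellman recursion with time-indexed rewards and transitions. The notation below (horizon $T$, non-stationary reward $r_t$ and transition kernel $p_t$, discount factor $\gamma$) is an illustrative assumption, not the paper's own symbols:

```latex
V_T(s) = 0, \qquad
V_t(s) = \max_{a \in A}\Big[\, r_t(s,a) + \gamma \sum_{s'} p_t(s' \mid s, a)\, V_{t+1}(s') \Big],
\quad t = T-1, \dots, 0.
```

An optimal policy then selects, at each stage $t$, an action attaining the maximum on the right-hand side.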

Research methodology

To capture the fluctuation of rewards over time, the authors define new non-stationary finite-horizon MDPs with a discount factor. First, the existence of an optimal policy for the proposed finite-horizon discounted MDPs is proven. Next, a new Discounted Backward Induction (DBI) algorithm is presented to find it. To demonstrate the practical value of the proposal, a financial model is used as an example of a finite-horizon discounted MDP, and an adaptive DBI algorithm is used to solve it.
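The backward-induction step with a discount factor can be sketched as follows. This is a minimal illustration of the general technique, not the authors' DBI implementation; the data layout (per-stage reward arrays `r_t[s, a]` and transition arrays `p_t[s, a, s']`) is an assumption made for the example:

```python
import numpy as np

def discounted_backward_induction(rewards, transitions, gamma):
    """Sketch of backward induction with discount factor gamma.

    rewards:     list of arrays r_t[s, a] for t = 0..T-1 (non-stationary)
    transitions: list of arrays p_t[s, a, s'] for t = 0..T-1
    Returns per-stage value functions V_t[s] and greedy policies pi_t[s].
    """
    T = len(rewards)
    n_states = rewards[0].shape[0]
    V = np.zeros(n_states)  # terminal values V_T(s) = 0
    values, policies = [], []
    for t in reversed(range(T)):
        # Q_t(s, a) = r_t(s, a) + gamma * sum_{s'} p_t(s'|s, a) * V_{t+1}(s')
        Q = rewards[t] + gamma * transitions[t] @ V
        V = Q.max(axis=1)                  # V_t(s) = max_a Q_t(s, a)
        values.append(V)
        policies.append(Q.argmax(axis=1))  # greedy action per state
    values.reverse()
    policies.reverse()
    return values, policies
```

Because rewards and transitions are indexed by `t`, the same loop handles stationary and non-stationary models alike; the discount `gamma` weights future stage values exactly as an interest rate discounts future returns.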

Results

The proposed method computes the optimal investment values that maximize the expected total return while accounting for the time value of money.

Novelty

No existing studies have previously examined dynamic finite-horizon problems that account for temporal fluctuations in rewards.

eISSN:
1898-0198
Language:
English
Publication frequency:
Twice a year
Journal subjects:
Business and Economics, Political Economics, other