Feature Reinforcement Learning: Part II. Structured MDPs

The Feature Markov Decision Process (ΦMDP) model developed in Part I (Hutter, 2009b) is well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs such as Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is the derivation of a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm, and compare the novel ΦDBN model to the prevalent POMDP approach.

eISSN: 1946-0163
Language: English

Publication frequency: 2 issues per year
Journal subjects: Computer Science, Artificial Intelligence

Published online: 14 June 2021

Page range: 71-86

Received: 21 Oct 2020

Accepted: 06 Apr 2021

DOI: https://doi.org/10.2478/jagi-2021-0003

Keywords: Reinforcement learning, dynamic Bayesian network, structure learning, feature selection, global vs. local reward, explore-exploit, information & complexity, rational agents, partial observability

© 2021 Marcus Hutter, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
