Volume 12 (2021): Issue 1 (January 2021)

Journal Details
Format: Journal
eISSN: 1946-0163
First Published: 23 Nov 2011
Publication timeframe: 2 times per year
Languages: English
Access type: Open Access

Feature Reinforcement Learning: Part II. Structured MDPs

Published Online: 14 Jun 2021
Page range: 71–86
Received: 21 Oct 2020
Accepted: 06 Apr 2021
Abstract

The Feature Markov Decision Processes (ΦMDPs) model developed in Part I (Hutter, 2009b) is well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is to derive a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm, and compare the novel ΦDBN model to the prevalent POMDP approach.
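To make the cost criterion concrete, here is a minimal illustrative sketch (in Python) of an MDL-style cost for a candidate feature map Φ, in the spirit of the ΦMDP cost from Part I: the cost charges the code length of the state-transition and reward sequences induced by Φ, so richer maps must pay for their extra parameters. The helper names (code_length, phi_cost) and the particular two-part code-length approximation are illustrative assumptions, not the exact criterion derived in the paper.

# Illustrative MDL-style cost for a candidate feature map Phi.
# A sketch in the spirit of the PhiMDP cost; not the paper's exact criterion.
import math
from collections import Counter, defaultdict

def code_length(counts):
    """Approximate two-part code length (in bits) for a sequence with the
    given symbol counts: n*H(empirical) data bits plus (K-1)/2 * log2(n)
    bits for the K-1 free parameters of the estimated distribution."""
    n = sum(counts)
    if n == 0:
        return 0.0
    data_bits = sum(-c * math.log2(c / n) for c in counts if c > 0)
    param_bits = (len(counts) - 1) / 2 * math.log2(n) if n > 1 else 0.0
    return data_bits + param_bits

def phi_cost(phi, history):
    """Cost(Phi | history): code length of the state and reward sequences
    induced by Phi. `history` is a list of (observation, action, reward)
    triples; `phi` maps a raw observation to an abstract state."""
    states = [phi(obs) for obs, _, _ in history]
    trans = defaultdict(Counter)   # (state, action) -> next-state counts
    rew = defaultdict(Counter)     # (state, action) -> reward counts
    for t in range(len(history) - 1):
        _, a, _ = history[t]
        _, _, r_next = history[t + 1]
        trans[(states[t], a)][states[t + 1]] += 1
        rew[(states[t], a)][r_next] += 1
    cost = sum(code_length(list(c.values())) for c in trans.values())
    cost += sum(code_length(list(c.values())) for c in rew.values())
    return cost

# Among a set of candidate maps, the "best" representation under this
# sketch is the one of minimum cost:
#   best_phi = min(candidate_phis, key=lambda p: phi_cost(p, history))

A map that is too coarse pays through a long data part (poorly predicted transitions and rewards), while a map that is too fine pays through the parameter penalty; minimizing the total trades the two off automatically.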

References

Bertsekas, D. P., and Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.

Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer.

Boutilier, C.; Dean, T.; and Hanks, S. 1999. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage. Journal of Artificial Intelligence Research 11:1–94. doi:10.1613/jair.575

Chow, C. K., and Liu, C. N. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory IT-14(3):462–467. doi:10.1109/TIT.1968.1054142

Dean, T., and Kanazawa, K. 1989. A Model for Reasoning about Persistence and Causation. Computational Intelligence 5(3):142–150. doi:10.1111/j.1467-8640.1989.tb00324.x

Friedman, N.; Geiger, D.; and Goldszmidt, M. 1997. Bayesian Network Classifiers. Machine Learning 29(2):131–163. doi:10.1023/A:1007465528199

Gagliolo, M. 2007. Universal Search. Scholarpedia 2(11):2575. doi:10.4249/scholarpedia.2575

Goertzel, B., and Pennachin, C., eds. 2007. Artificial General Intelligence. Springer. doi:10.1007/978-3-540-68677-4

Grünwald, P. D. 2007. The Minimum Description Length Principle. Cambridge, MA: The MIT Press. doi:10.7551/mitpress/4643.001.0001

Guestrin, C.; Koller, D.; Parr, R.; and Venkataraman, S. 2003. Efficient Solution Algorithms for Factored MDPs. Journal of Artificial Intelligence Research 19:399–468. doi:10.1613/jair.1000

Hutter, M. 2003. Optimality of Universal Bayesian Prediction for General Loss and Alphabet. Journal of Machine Learning Research 4:971–1000.

Hutter, M. 2005. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Berlin: Springer.

Hutter, M. 2009a. Feature Dynamic Bayesian Networks. In Proc. 2nd Conf. on Artificial General Intelligence (AGI’09), volume 8, 67–73. Atlantis Press. doi:10.2991/agi.2009.6

Hutter, M. 2009b. Feature Reinforcement Learning: Part I: Unstructured MDPs. Journal of Artificial General Intelligence 1:3–24. doi:10.2478/v10229-011-0002-8

Kaelbling, L. P.; Littman, M. L.; and Cassandra, A. R. 1998. Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence 101:99–134. doi:10.1016/S0004-3702(98)00023-X

Kearns, M., and Koller, D. 1999. Efficient Reinforcement Learning in Factored MDPs. In Proc. 16th International Joint Conference on Artificial Intelligence (IJCAI-99), 740–747. San Francisco: Morgan Kaufmann.

Koller, D., and Parr, R. 1999. Computing Factored Value Functions for Policies in Structured MDPs. In Proc. 16th International Joint Conf. on Artificial Intelligence (IJCAI’99), 1332–1339.

Koller, D., and Parr, R. 2000. Policy Iteration for Factored MDPs. In Proc. 16th Conference on Uncertainty in Artificial Intelligence (UAI-00), 326–334. San Francisco, CA: Morgan Kaufmann.

Legg, S., and Hutter, M. 2007. Universal Intelligence: A Definition of Machine Intelligence. Minds & Machines 17(4):391–444. doi:10.1007/s11023-007-9079-x

Lewis, D. D. 1998. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. In Proc. 10th European Conference on Machine Learning (ECML’98), 4–15. Chemnitz, DE: Springer. doi:10.1007/BFb0026666

Littman, M. L.; Sutton, R. S.; and Singh, S. P. 2001. Predictive Representations of State. In Advances in Neural Information Processing Systems, volume 14, 1555–1561. MIT Press.

McCallum, A. K. 1996. Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Dissertation, Department of Computer Science, University of Rochester.

Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY: Wiley. doi:10.1002/9780470316887

Ross, S.; Pineau, J.; Paquet, S.; and Chaib-draa, B. 2008. Online Planning Algorithms for POMDPs. Journal of Artificial Intelligence Research 32:663–704. doi:10.1613/jair.2567

Russell, S. J., and Norvig, P. 2003. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall, 2nd edition.

Singh, S.; Littman, M.; Jong, N.; Pardoe, D.; and Stone, P. 2003. Learning Predictive State Representations. In Proc. 20th International Conference on Machine Learning (ICML’03), 712–719.

Singh, S. P.; James, M. R.; and Rudary, M. R. 2004. Predictive State Representations: A New Theory for Modeling Dynamical Systems. In Proc. 20th Conference in Uncertainty in Artificial Intelligence (UAI’04), 512–518. Banff, Canada: AUAI Press.

Strehl, A. L.; Diuk, C.; and Littman, M. L. 2007. Efficient Structure Learning in Factored-State MDPs. In Proc. 22nd AAAI Conference on Artificial Intelligence (AAAI’07), 645–650. Vancouver, BC: AAAI Press.

Sutton, R. S., and Barto, A. G. 2018. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 2nd edition.

Szita, I., and Lőrincz, A. 2008. The Many Faces of Optimism: A Unifying Approach. In Proc. 25th International Conference on Machine Learning (ICML 2008), volume 307, 1048–1055.
