Feature Reinforcement Learning: Part II. Structured MDPs

References

Bertsekas, D. P., and Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.

Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer.

Boutilier, C.; Dean, T.; and Hanks, S. 1999. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage. Journal of Artificial Intelligence Research 11:1–94. doi:10.1613/jair.575.

Chow, C. K., and Liu, C. N. 1968. Approximating Discrete Probability Distributions with Dependence Trees. IEEE Transactions on Information Theory IT-14(3):462–467. doi:10.1109/TIT.1968.1054142.

Dean, T., and Kanazawa, K. 1989. A Model for Reasoning about Persistence and Causation. Computational Intelligence 5(3):142–150. doi:10.1111/j.1467-8640.1989.tb00324.x.

Friedman, N.; Geiger, D.; and Goldszmidt, M. 1997. Bayesian Network Classifiers. Machine Learning 29(2):131–163. doi:10.1023/A:1007465528199.

Gagliolo, M. 2007. Universal Search. Scholarpedia 2(11):2575. doi:10.4249/scholarpedia.2575.

Goertzel, B., and Pennachin, C., eds. 2007. Artificial General Intelligence. Springer. doi:10.1007/978-3-540-68677-4.

Grünwald, P. D. 2007. The Minimum Description Length Principle. Cambridge, MA: The MIT Press. doi:10.7551/mitpress/4643.001.0001.

Guestrin, C.; Koller, D.; Parr, R.; and Venkataraman, S. 2003. Efficient Solution Algorithms for Factored MDPs. Journal of Artificial Intelligence Research 19:399–468. doi:10.1613/jair.1000.

Hutter, M. 2003. Optimality of Universal Bayesian Prediction for General Loss and Alphabet. Journal of Machine Learning Research 4:971–1000.

Hutter, M. 2005. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Berlin: Springer.

Hutter, M. 2009a. Feature Dynamic Bayesian Networks. In Proc. 2nd Conf. on Artificial General Intelligence (AGI’09), volume 8, 67–73. Atlantis Press. doi:10.2991/agi.2009.6.

Hutter, M. 2009b. Feature Reinforcement Learning: Part I. Unstructured MDPs. Journal of Artificial General Intelligence 1:3–24. doi:10.2478/v10229-011-0002-8.

Kaelbling, L. P.; Littman, M. L.; and Cassandra, A. R. 1998. Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence 101:99–134. doi:10.1016/S0004-3702(98)00023-X.

Kearns, M., and Koller, D. 1999. Efficient Reinforcement Learning in Factored MDPs. In Proc. 16th International Joint Conference on Artificial Intelligence (IJCAI-99), 740–747. San Francisco: Morgan Kaufmann.

Koller, D., and Parr, R. 1999. Computing Factored Value Functions for Policies in Structured MDPs. In Proc. 16th International Joint Conference on Artificial Intelligence (IJCAI-99), 1332–1339.

Koller, D., and Parr, R. 2000. Policy Iteration for Factored MDPs. In Proc. 16th Conference on Uncertainty in Artificial Intelligence (UAI-00), 326–334. San Francisco, CA: Morgan Kaufmann.

Legg, S., and Hutter, M. 2007. Universal Intelligence: A Definition of Machine Intelligence. Minds and Machines 17(4):391–444. doi:10.1007/s11023-007-9079-x.

Lewis, D. D. 1998. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. In Proc. 10th European Conference on Machine Learning (ECML’98), 4–15. Chemnitz, Germany: Springer. doi:10.1007/BFb0026666.

Littman, M. L.; Sutton, R. S.; and Singh, S. P. 2001. Predictive Representations of State. In Advances in Neural Information Processing Systems, volume 14, 1555–1561. MIT Press.

McCallum, A. K. 1996. Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Dissertation, Department of Computer Science, University of Rochester.

Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY: Wiley. doi:10.1002/9780470316887.

Ross, S.; Pineau, J.; Paquet, S.; and Chaib-draa, B. 2008. Online Planning Algorithms for POMDPs. Journal of Artificial Intelligence Research 32:663–704. doi:10.1613/jair.2567.

Russell, S. J., and Norvig, P. 2003. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall, 2nd edition.

Singh, S.; Littman, M.; Jong, N.; Pardoe, D.; and Stone, P. 2003. Learning Predictive State Representations. In Proc. 20th International Conference on Machine Learning (ICML’03), 712–719.

Singh, S. P.; James, M. R.; and Rudary, M. R. 2004. Predictive State Representations: A New Theory for Modeling Dynamical Systems. In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI’04), 512–518. Banff, Canada: AUAI Press.

Strehl, A. L.; Diuk, C.; and Littman, M. L. 2007. Efficient Structure Learning in Factored-State MDPs. In Proc. 22nd AAAI Conference on Artificial Intelligence (AAAI’07), 645–650. Vancouver, BC: AAAI Press.

Sutton, R. S., and Barto, A. G. 2018. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 2nd edition.

Szita, I., and Lőrincz, A. 2008. The Many Faces of Optimism: A Unifying Approach. In Proc. 25th International Conference on Machine Learning (ICML 2008), volume 307, 1048–1055.
