Aarts, E. H. L., and Lenstra, J. K., eds. 1997. Local Search in Combinatorial Optimization. Discrete Mathematics and Optimization. Chichester, England: Wiley-Interscience.
Banzhaf, W.; Nordin, P.; Keller, R. E.; and Francone, F. D. 1998. Genetic Programming. San Francisco, CA: Morgan Kaufmann.
Barron, A. R. 1985. Logically Smooth Density Estimation. Ph.D. Dissertation, Stanford University.
Berry, D. A., and Fristedt, B. 1985. Bandit Problems: Sequential Allocation of Experiments. London: Chapman and Hall. doi:10.1007/978-94-015-3711-7
Brafman, R. I., and Tennenholtz, M. 2002. R-max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning. Journal of Machine Learning Research 3:213-231.
Cover, T. M., and Thomas, J. A. 2006. Elements of Information Theory. Wiley-Interscience, 2nd edition.
Dearden, R.; Friedman, N.; and Andre, D. 1999. Model-Based Bayesian Exploration. In Proc. 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), 150-159.
Duff, M. 2002. Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes. Ph.D. Dissertation, Department of Computer Science, University of Massachusetts Amherst.
Dzeroski, S.; de Raedt, L.; and Driessens, K. 2001. Relational Reinforcement Learning. Machine Learning 43:7-52. doi:10.1023/A:1007694015589
Fishman, G. 2003. Monte Carlo. Springer.
Givan, R.; Dean, T.; and Greig, M. 2003. Equivalence Notions and Model Minimization in Markov Decision Processes. Artificial Intelligence 147(1-2):163-223. doi:10.1016/S0004-3702(02)00376-4
Goertzel, B., and Pennachin, C., eds. 2007. Artificial General Intelligence. Springer. doi:10.1007/978-3-540-68677-4
Gordon, G. 1999. Approximate Solutions to Markov Decision Processes. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
Grünwald, P. D. 2007. The Minimum Description Length Principle. Cambridge, MA: The MIT Press. doi:10.7551/mitpress/4643.001.0001
Guyon, I., and Elisseeff, A., eds. 2003. Variable and Feature Selection. JMLR Special Issue. MIT Press.
Hastie, T.; Tibshirani, R.; and Friedman, J. H. 2001. The Elements of Statistical Learning. Springer. doi:10.1007/978-0-387-21606-5
Hutter, M. 2005. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Berlin: Springer. 300 pages. http://www.hutter1.net/ai/uaibook.htm
Hutter, M. 2007. Universal Algorithmic Intelligence: A Mathematical Top-Down Approach. In Artificial General Intelligence. Berlin: Springer. 227-290. doi:10.1007/978-3-540-68677-4_8
Hutter, M. 2009a. Feature Dynamic Bayesian Networks. In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, 67-73. Atlantis Press. doi:10.2991/agi.2009.6
Hutter, M. 2009b. Feature Markov Decision Processes. In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, 61-66. Atlantis Press. doi:10.2991/agi.2009.30
Hutter, M. 2009c. Feature Reinforcement Learning: Part II: Structured MDPs. In progress. Will extend Hutter (2009a). doi:10.2478/v10229-011-0002-8
Kaelbling, L. P.; Littman, M. L.; and Cassandra, A. R. 1998. Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence 101:99-134. doi:10.1016/S0004-3702(98)00023-X
Kearns, M. J., and Singh, S. 1998. Near-Optimal Reinforcement Learning in Polynomial Time. In Proc. 15th International Conf. on Machine Learning, 260-268. San Francisco, CA: Morgan Kaufmann.
Koza, J. R. 1992. Genetic Programming. The MIT Press.
Kumar, P. R., and Varaiya, P. P. 1986. Stochastic Systems: Estimation, Identification, and Adaptive Control. Englewood Cliffs, NJ: Prentice Hall.
Legg, S., and Hutter, M. 2007. Universal Intelligence: A Definition of Machine Intelligence. Minds & Machines 17(4):391-444. doi:10.1007/s11023-007-9079-x
Legg, S. 2008. Machine Super Intelligence. Ph.D. Dissertation, IDSIA, Lugano.
Li, M., and Vitányi, P. M. B. 2008. An Introduction to Kolmogorov Complexity and its Applications. Berlin: Springer, 3rd edition. doi:10.1007/978-0-387-49820-1
Liang, P., and Jordan, M. 2008. An Asymptotic Analysis of Generative, Discriminative, and Pseudolikelihood Estimators. In Proc. 25th International Conf. on Machine Learning (ICML'08), volume 307, 584-591. ACM. doi:10.1145/1390156.1390230
Liu, J. S. 2002. Monte Carlo Strategies in Scientific Computing. Springer.
Lusena, C.; Goldsmith, J.; and Mundhenk, M. 2001. Nonapproximability Results for Partially Observable Markov Decision Processes. Journal of Artificial Intelligence Research 14:83-103. doi:10.1613/jair.714
MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press.
Madani, O.; Hanks, S.; and Condon, A. 2003. On the Undecidability of Probabilistic Planning and Related Stochastic Optimization Problems. Artificial Intelligence 147:5-34. doi:10.1016/S0004-3702(02)00378-8
McCallum, A. K. 1996. Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Dissertation, Department of Computer Science, University of Rochester.
Ng, A. Y.; Coates, A.; Diel, M.; Ganapathi, V.; Schulte, J.; Tse, B.; Berger, E.; and Liang, E. 2004. Autonomous Inverted Helicopter Flight via Reinforcement Learning. In ISER, volume 21 of Springer Tracts in Advanced Robotics, 363-372. Springer. doi:10.1007/11552246_35
Pankov, S. 2008. A Computational Approximation to the AIXI Model. In Proc. 1st Conference on Artificial General Intelligence, volume 171, 256-267.
Pearlmutter, B. A. 1989. Learning State Space Trajectories in Recurrent Neural Networks. Neural Computation 1(2):263-269. doi:10.1162/neco.1989.1.2.263
Poland, J., and Hutter, M. 2006. Universal Learning of Repeated Matrix Games. In Proc. 15th Annual Machine Learning Conf. of Belgium and The Netherlands (Benelearn'06), 7-14.
Poupart, P.; Vlassis, N. A.; Hoey, J.; and Regan, K. 2006. An Analytic Solution to Discrete Bayesian Reinforcement Learning. In Proc. 23rd International Conf. on Machine Learning (ICML'06), volume 148, 697-704. Pittsburgh, PA: ACM. doi:10.1145/1143844.1143932
Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY: Wiley. doi:10.1002/9780470316887
Raedt, L. D.; Hammer, B.; Hitzler, P.; and Maass, W., eds. 2008. Recurrent Neural Networks - Models, Capacities, and Applications, volume 08041 of Dagstuhl Seminar Proceedings. IBFI, Schloss Dagstuhl, Germany.
Ring, M. 1994. Continual Learning in Reinforcement Environments. Ph.D. Dissertation, University of Texas, Austin.
Ross, S., and Pineau, J. 2008. Model-Based Bayesian Reinforcement Learning in Large Structured Domains. In Proc. 24th Conference in Uncertainty in Artificial Intelligence (UAI'08), 476-483. Helsinki: AUAI Press.
Ross, S.; Pineau, J.; Paquet, S.; and Chaib-draa, B. 2008. Online Planning Algorithms for POMDPs. Journal of Artificial Intelligence Research 32:663-704. doi:10.1613/jair.2567
Russell, S. J., and Norvig, P. 2003. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall, 2nd edition.
Sanner, S., and Boutilier, C. 2009. Practical Solution Techniques for First-Order MDPs. Artificial Intelligence 173(5-6):748-788. doi:10.1016/j.artint.2008.11.003
Schmidhuber, J. 2004. Optimal Ordered Problem Solver. Machine Learning 54(3):211-254. doi:10.1023/B:MACH.0000015880.99707.b2
Schwarz, G. 1978. Estimating the Dimension of a Model. Annals of Statistics 6(2):461-464. doi:10.1214/aos/1176344136
Singh, S.; Littman, M.; Jong, N.; Pardoe, D.; and Stone, P. 2003. Learning Predictive State Representations. In Proc. 20th International Conference on Machine Learning (ICML'03), 712-719.
Strehl, A. L.; Diuk, C.; and Littman, M. L. 2007. Efficient Structure Learning in Factored-State MDPs. In Proc. 22nd AAAI Conference on Artificial Intelligence, 645-650. Vancouver, BC: AAAI Press.
Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press. doi:10.1109/TNN.1998.712192
Szita, I., and Lőrincz, A. 2008. The Many Faces of Optimism: A Unifying Approach. In Proc. 25th International Conf. on Machine Learning (ICML'08), volume 307.
Wallace, C. S. 2005. Statistical and Inductive Inference by Minimum Message Length. Berlin: Springer.
Willems, F. M. J.; Shtarkov, Y. M.; and Tjalkens, T. J. 1997. Reflections on the Prize Paper: The Context-Tree Weighting Method: Basic Properties. IEEE Information Theory Society Newsletter, 20-27.
Wolpert, D. H., and Macready, W. G. 1997. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation 1(1):67-82. doi:10.1109/4235.585893