Markov Decision Processes in Artificial Intelligence

Book Description

Markov Decision Processes (MDPs) are a mathematical framework for modeling sequential decision problems under uncertainty, and they also underpin Reinforcement Learning. Written by experts in the field, this book provides a global view of current research on MDPs in Artificial Intelligence. It opens with an introductory presentation of the fundamental aspects of MDPs (planning in MDPs, Reinforcement Learning, Partially Observable MDPs, Markov games and the use of non-classical criteria), then presents more advanced research trends in the domain and illustrates them with concrete applications.
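To make the framework concrete, the sketch below runs value iteration (one of the optimization algorithms covered in Chapter 1) on a toy two-state MDP. The states, actions, rewards and discount factor are invented for this illustration and are not taken from the book.

```python
# Minimal sketch: value iteration on a hypothetical two-state MDP.
# transitions[s][a] = list of (probability, next_state, reward) outcomes.
transitions = {
    0: {  # state 0
        "stay": [(1.0, 0, 0.0)],
        "go":   [(0.8, 1, 5.0), (0.2, 0, 0.0)],
    },
    1: {  # state 1 (absorbing, yields reward 1 forever)
        "stay": [(1.0, 1, 1.0)],
        "go":   [(1.0, 1, 1.0)],
    },
}
gamma = 0.9  # discount factor

def value_iteration(transitions, gamma, eps=1e-6):
    """Iterate the Bellman optimality operator until the update is < eps."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman backup: best expected return over the available actions.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

V = value_iteration(transitions, gamma)
# In this toy MDP, V[1] = 1 / (1 - gamma) = 10 and V[0] ≈ 13.66.
```

The dictionary-of-outcomes encoding is just one convenient representation of the transition and reward functions; Chapters 1 to 3 develop the underlying theory (value functions, optimal policies, and approximate variants for large state spaces).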

Table of Contents

  1. Cover
  2. Title Page
  3. Copyright
  4. Preface
  5. List of Authors
  6. Part 1: MDPs: Models and Methods
    1. Chapter 1: Markov Decision Processes
      1. 1.1. Introduction
      2. 1.2. Markov decision problems
      3. 1.3. Value functions
      4. 1.4. Markov policies
      5. 1.5. Characterization of optimal policies
      6. 1.6. Optimization algorithms for MDPs
      7. 1.7. Conclusion and outlook
      8. 1.8. Bibliography
    2. Chapter 2: Reinforcement Learning
      1. 2.1. Introduction
      2. 2.2. Reinforcement learning: a global view
      3. 2.3. Monte Carlo methods
      4. 2.4. From Monte Carlo to temporal difference methods
      5. 2.5. Temporal difference methods
      6. 2.6. Model-based methods: learning a model
      7. 2.7. Conclusion
      8. 2.8. Bibliography
    3. Chapter 3: Approximate Dynamic Programming
      1. 3.1. Introduction
      2. 3.2. Approximate value iteration (AVI)
      3. 3.3. Approximate policy iteration (API)
      4. 3.4. Direct minimization of the Bellman residual
      5. 3.5. Towards an analysis of dynamic programming in Lp-norm
      6. 3.6. Conclusions
      7. 3.7. Bibliography
    4. Chapter 4: Factored Markov Decision Processes
      1. 4.1. Introduction
      2. 4.2. Modeling a problem with an FMDP
      3. 4.3. Planning with FMDPs
      4. 4.4. Perspectives and conclusion
      5. 4.5. Bibliography
    5. Chapter 5: Policy-Gradient Algorithms
      1. 5.1. Reminder about the notion of gradient
      2. 5.2. Optimizing a parameterized policy with a gradient algorithm
      3. 5.3. Actor-critic methods
      4. 5.4. Complements
      5. 5.5. Conclusion
      6. 5.6. Bibliography
    6. Chapter 6: Online Resolution Techniques
      1. 6.1. Introduction
      2. 6.2. Online algorithms for solving an MDP
      3. 6.3. Controlling the search
      4. 6.4. Conclusion
      5. 6.5. Bibliography
  7. Part 2: Beyond MDPs
    1. Chapter 7: Partially Observable Markov Decision Processes
      1. 7.1. Formal definitions for POMDPs
      2. 7.2. Non-Markovian problems: incomplete information
      3. 7.3. Computation of an exact policy on information states
      4. 7.4. Exact value iteration algorithms
      5. 7.5. Policy iteration algorithms
      6. 7.6. Conclusion and perspectives
      7. 7.7. Bibliography
    2. Chapter 8: Stochastic Games
      1. 8.1. Introduction
      2. 8.2. Background on game theory
      3. 8.3. Stochastic games
      4. 8.4. Conclusion and outlook
      5. 8.5. Bibliography
    3. Chapter 9: DEC-MDP/POMDP
      1. 9.1. Introduction
      2. 9.2. Preliminaries
      3. 9.3. Multiagent Markov decision processes
      4. 9.4. Decentralized control and local observability
      5. 9.5. Sub-classes of DEC-POMDPs
      6. 9.6. Algorithms for solving DEC-POMDPs
      7. 9.7. Applicative scenario: multirobot exploration
      8. 9.8. Conclusion and outlook
      9. 9.9. Bibliography
    4. Chapter 10: Non-Standard Criteria
      1. 10.1. Introduction
      2. 10.2. Multicriteria approaches
      3. 10.3. Robustness in MDPs
      4. 10.4. Possibilistic MDPs
      5. 10.5. Algebraic MDPs
      6. 10.6. Conclusion
      7. 10.7. Bibliography
  8. Part 3: Applications
    1. Chapter 11: Online Learning for Micro-Object Manipulation
      1. 11.1. Introduction
      2. 11.2. Manipulation device
      3. 11.3. Choice of the reinforcement learning algorithm
      4. 11.4. Experimental results
      5. 11.5. Conclusion
      6. 11.6. Bibliography
    2. Chapter 12: Conservation of Biodiversity
      1. 12.1. Introduction
      2. 12.2. When to protect, survey or surrender cryptic endangered species
      3. 12.3. Can sea otters and abalone co-exist?
      4. 12.4. Other applications in conservation biology and discussions
      5. 12.5. Bibliography
    3. Chapter 13: Autonomous Helicopter Searching for a Landing Area in an Uncertain Environment
      1. 13.1. Introduction
      2. 13.2. Exploration scenario
      3. 13.3. Embedded control and decision architecture
      4. 13.4. Incremental stochastic dynamic programming
      5. 13.5. Flight tests and return on experience
      6. 13.6. Conclusion
      7. 13.7. Bibliography
    4. Chapter 14: Resource Consumption Control for an Autonomous Robot
      1. 14.1. The rover’s mission
      2. 14.2. Progressive processing formalism
      3. 14.3. MDP/PRU model
      4. 14.4. Policy calculation
      5. 14.5. How to model a real mission
      6. 14.6. Extensions
      7. 14.7. Conclusion
      8. 14.8. Bibliography
    5. Chapter 15: Operations Planning
      1. 15.1. Operations planning
      2. 15.2. MDP value function approaches
      3. 15.3. Reinforcement learning: FPG
      4. 15.4. Experiments
      5. 15.5. Conclusion and outlook
      6. 15.6. Bibliography
  9. Index