
15-381: AI

Markov Systems and Markov Decision Processes

November 11, 2009

Manuela Veloso

Carnegie Mellon

States, Actions, Observations

                     Passive                      Controlled
Fully Observable     Markov Model                 Markov Decision Process (MDP)
Hidden State         Hidden Markov Model (HMM)    Partially Observable MDP (POMDP)

(A Hidden Markov Model is a Markov model with observations.)

Markov Systems with Rewards

• States, rewards, transition probabilities
• Finite set of n states, s_i
• Probabilistic state transition matrix P, with entries p_ij
  – Probability of a transition from s_i to s_j
• "Goal achievement" - reward for each state, r_i
• Discount factor γ

"Solving" Markov Systems with Rewards

• Find the "value" of each state.
• Process:
  – Start at state s_i
  – Receive immediate reward r_i
  – Move randomly to a new state according to the probability transition matrix
  – Future rewards (of the next state) are discounted by γ

Solving a Markov System with Rewards

• V*(s_i) - expected discounted sum of future rewards starting in state s_i
• V*(s_i) = r_i + γ[p_i1 V*(s_1) + p_i2 V*(s_2) + ... + p_in V*(s_n)]
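Because this definition is linear in the unknown values, V* can also be computed directly by solving the n x n linear system (I - γP)V* = r. A minimal NumPy sketch with made-up numbers (not the lecture's 3-state example):

```python
import numpy as np

# Hypothetical 3-state Markov system with rewards (illustrative numbers only).
P = np.array([[0.5, 0.5, 0.0],    # p_ij: probability of moving from s_i to s_j
              [0.2, 0.3, 0.5],
              [0.0, 0.1, 0.9]])
r = np.array([0.0, 1.0, 10.0])    # r_i: reward received in state s_i
gamma = 0.5                       # discount factor

# V* = r + gamma * P V*   =>   (I - gamma * P) V* = r
V_star = np.linalg.solve(np.eye(len(r)) - gamma * P, r)
print(V_star)   # expected discounted sum of future rewards for each state
```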

Value Iteration to Solve a Markov System with Rewards

• V1(s_i) - expected discounted sum of future rewards starting in state s_i for one step.
• V2(s_i) - expected discounted sum of future rewards starting in state s_i for two steps.
• ...
• Vk(s_i) - expected discounted sum of future rewards starting in state s_i for k steps.
• As k → ∞, Vk(s_i) → V*(s_i).
• Stop when the difference between the step-(k+1) and step-k values is smaller than some ε.
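A sketch of this iteration in NumPy, reusing the hypothetical P, r, and gamma from the previous block; epsilon plays the role of ε in the stopping test:

```python
import numpy as np

def value_iteration_markov_system(P, r, gamma, epsilon=1e-6):
    """Iterate V_{k+1} = r + gamma * P V_k until successive values differ by less than epsilon."""
    V = np.zeros(len(r))              # V_0: value before any reward is received
    while True:
        V_next = r + gamma * (P @ V)  # one more step of discounted look-ahead
        if np.max(np.abs(V_next - V)) < epsilon:
            return V_next             # approximately V*
        V = V_next
```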

3-State Example

(Figures not reproduced here: the 3-state example and its state values for γ = 0.5, γ = 0.9, and γ = 0.2.)

Markov Decision Processes

• Finite set of states, s_1, ..., s_n
• Finite set of actions, a_1, ..., a_m
• Probabilistic state-action transitions:
  p_ij^k = prob(next = s_j | current = s_i and take action a_k)
• Reward for each state, r_1, ..., r_n
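One convenient in-code representation of such an MDP (a sketch, not taken from the lecture) stores the transitions as an array indexed by action, current state, and next state, alongside a reward vector:

```python
import numpy as np

n_states, n_actions = 3, 2

# P[k, i, j] = prob(next = s_j | current = s_i and take action a_k); illustrative numbers only.
P = np.zeros((n_actions, n_states, n_states))
P[0] = [[1.0, 0.0, 0.0],
        [0.7, 0.3, 0.0],
        [0.0, 0.8, 0.2]]
P[1] = [[0.5, 0.5, 0.0],
        [0.0, 0.5, 0.5],
        [0.0, 0.0, 1.0]]

r = np.array([0.0, 1.0, 10.0])    # r_i: reward for being in state s_i
gamma = 0.9                       # discount factor
```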

Solving an MDP

• A policy is a mapping from states to actions.
• Optimal policy - for every state, there is no other action that gets a higher sum of discounted future rewards.
• For every MDP there exists an optimal policy.
• Solving an MDP is finding an optimal policy.
• A specific policy converts an MDP into a plain Markov system with rewards.
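To make the last point concrete: fixing a policy selects one row of transition probabilities per state, giving a plain Markov system with rewards that can be evaluated with the linear solve shown earlier. A sketch using the hypothetical P, r, and gamma arrays above:

```python
import numpy as np

def evaluate_policy(P, r, gamma, policy):
    """Value of each state under a fixed policy (the MDP reduced to a Markov system with rewards)."""
    n = len(r)
    # P_pi[i, j] = p_ij^{policy[i]}: transition matrix induced by always following the policy
    P_pi = np.array([P[policy[i], i, :] for i in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r)
```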

Deterministic MDP Example

(Figure not reproduced here. Reward on unlabelled transitions is zero.)

Nondeterministic Example

(Figure not reproduced here.)

Solving MDPs

• Find the value of each state and the action to take from each state.
• Process:
  – Start in state s_i
  – Receive immediate reward r_i
  – Choose action a_k ∈ A
  – Change to state s_j with probability p_ij^k
  – Discount future rewards

Policy Iteration

• Start with some policy π_0(s_i).
• Such a policy transforms the MDP into a plain Markov system with rewards.
• Compute the values of the states according to the current policy.
• Update the policy:
  π_1(s_i) = argmax_a { r_i + γ Σ_j p_ij^a V^{π_0}(s_j) }
• Keep computing π_1, π_2, ...
• Stop when π_{k+1} = π_k.
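A sketch of the whole loop, building on the hypothetical evaluate_policy helper above; it alternates policy evaluation with the greedy update until the policy stops changing:

```python
import numpy as np

def policy_iteration(P, r, gamma):
    """Alternate policy evaluation and greedy improvement until pi_{k+1} = pi_k."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)            # arbitrary initial policy pi_0
    while True:
        V = evaluate_policy(P, r, gamma, policy)      # values of the induced Markov system
        # Greedy update: pi_{k+1}(s_i) = argmax_a { r_i + gamma * sum_j p_ij^a V(s_j) }
        q = r[None, :] + gamma * (P @ V)              # q[a, i] for every action/state pair
        new_policy = np.argmax(q, axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V                          # policy is stable, hence optimal
        policy = new_policy
```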

Value Iteration

• V*(s_i) - expected discounted future rewards if we start from state s_i and follow the optimal policy.
• Compute V* with value iteration:
  – Vk(s_i) = maximum possible sum of future rewards starting from state s_i for k steps.
• Bellman's Equation:
  V^{n+1}(s_i) = max_k { r_i + γ Σ_{j=1}^{N} p_ij^k V^n(s_j) }
• Dynamic programming
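A sketch of this dynamic program over the same hypothetical arrays; each sweep applies the Bellman update to every state at once:

```python
import numpy as np

def value_iteration_mdp(P, r, gamma, epsilon=1e-6):
    """Iterate V_{n+1}(s_i) = max_a { r_i + gamma * sum_j p_ij^a V_n(s_j) } until convergence."""
    V = np.zeros(len(r))
    while True:
        q = r[None, :] + gamma * (P @ V)   # q[a, i]: value of taking action a in state s_i
        V_next = np.max(q, axis=0)         # Bellman backup: best action in each state
        if np.max(np.abs(V_next - V)) < epsilon:
            policy = np.argmax(q, axis=0)  # greedy policy extracted from the final values
            return V_next, policy
        V = V_next
```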

Summary

• Markov model for state/action transitions
• Markov systems with rewards - goal achievement
• Markov decision processes - added actions
• Value iteration, policy iteration
