Page 1

Representing Systems with Hidden State

Dorna KASHEF HAGHIGHI, Chris HUNDT*, Prakash PANANGADEN, Joelle PINEAU and Doina PRECUP

School of Computer Science, McGill University (*now at UC Berkeley)

AAAI Fall Symposium Series
November 9, 2007

Page 2

FSS 2007: Representing Systems with Hidden State

How should we represent systems with hidden state?

Partially Observable Markov Decision Processes (POMDPs)
• System is in some “true” latent state.
• Perceive observations that depend probabilistically on the state.
• Very expressive model, good for state inference and planning, but:
  – Very hard to learn from data.
  – Hidden state may be artificial (e.g. dialogue management).

Predictive representations (e.g. PSRs, OOMs, TD-nets, diversity)
• State is defined as a sufficient statistic of the past, which allows predicting the future.
• Good for learning, because state depends only on observable quantities.

Our goal: Understand and unify different predictive representations.

Page 3

Partially Observable Markov Decision Processes

• A set of states, S
• A set of actions, A
• A set of observations, O
• A transition function:

  τ_a(s, s′) = P(s_{t+1} = s′ | s_t = s, a_t = a), ∀a ∈ A

• An observation emission function:

  γ_a(s, o) = P(o_{t+1} = o | a_t = a, s_{t+1} = s), ∀a ∈ A

• For this discussion, we omit rewards (they may be considered part of the observation vector).
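The definition above can be written down directly as plain data structures. The following is a minimal sketch; the two-state domain, the action names, and the `check_pomdp` helper are illustrative assumptions, not part of the talk.

```python
# Minimal POMDP container matching the slide's definition (rewards omitted).
# The two-state example domain here is hypothetical, not the talk's grid.

# tau[a][s][s2] = P(s_{t+1} = s2 | s_t = s, a_t = a)
tau = {
    "go": {"s1": {"s1": 0.0, "s2": 1.0},
           "s2": {"s1": 1.0, "s2": 0.0}},
    "stay": {"s1": {"s1": 1.0, "s2": 0.0},
             "s2": {"s1": 0.0, "s2": 1.0}},
}

# gamma[a][s2][o] = P(o_{t+1} = o | a_t = a, s_{t+1} = s2)
gamma = {
    "go": {"s1": {"Red": 1.0, "Blue": 0.0},
           "s2": {"Red": 0.5, "Blue": 0.5}},
    "stay": {"s1": {"Red": 1.0, "Blue": 0.0},
             "s2": {"Red": 0.5, "Blue": 0.5}},
}

def check_pomdp(tau, gamma):
    """Verify that each transition row and emission row is a probability distribution."""
    for a in tau:
        for s in tau[a]:
            assert abs(sum(tau[a][s].values()) - 1.0) < 1e-9
    for a in gamma:
        for s in gamma[a]:
            assert abs(sum(gamma[a][s].values()) - 1.0) < 1e-9
    return True

print(check_pomdp(tau, gamma))  # True
```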

Page 4

A simple example

• Consider the following domain: S = {s1, s2, s3, s4}, A = {N, S, E, W}
• For simplicity, assume the transitions are deterministic.
• In each square, the agent observes the color of one of the adjacent walls, O = {Red, Blue}, with equal probability.

Question: What kinds of predictions can we make about the system?

Page 5

A simple example: Future predictions

Consider the following predictions:

– If I am in state s1 and go North, I will certainly see Blue.

– If I go West then North, I will certainly see Blue.

– If I go East, I will see Red with probability 0.5.

– If I go East then North, I will see Red twice with probability 0.25.

The action sequences are experiments that we can perform on the system. For each experiment, we can verify the predicted observations from data.

Page 6

Tests and Experiments

• A test is a sequence of actions followed by an observation:

  t = a1 … an o, n ≥ 1

• An experiment is a non-empty sequence of tests:

  e = t1 … tm, m ≥ 1

  – Note that special cases of experiments are s-tests (Littman et al., 2002) and e-tests (Rudary & Singh, 2004).

• A prediction for an experiment e starting in s ∈ S, denoted 〈s|e〉, is the conditional probability that by doing the actions of e, we will get the predicted observations.
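The prediction 〈s|e〉 can be computed recursively from the transition and emission functions, marginalizing over intermediate states (and, on this sketch's assumption, over unconstrained intermediate observations). The two-state "flip" domain below is hypothetical, not the talk's grid domain.

```python
# <s | e>: probability that the required observations occur when the actions
# of experiment e are executed from state s. An experiment is a list of
# (action, obs) steps; obs is None for actions whose observation is
# unconstrained (i.e. not at a test boundary).

def predict(s, experiment, tau, gamma):
    if not experiment:
        return 1.0
    (a, obs), rest = experiment[0], experiment[1:]
    total = 0.0
    for s2, p in tau[a][s].items():
        if p == 0.0:
            continue
        emit = gamma[a][s2][obs] if obs is not None else 1.0
        total += p * emit * predict(s2, rest, tau, gamma)
    return total

# Hypothetical two-state domain: action "go" flips the state; after "go",
# arriving in s1 always shows Red, arriving in s2 shows Red half the time.
tau = {"go": {"s1": {"s2": 1.0}, "s2": {"s1": 1.0}}}
gamma = {"go": {"s1": {"Red": 1.0, "Blue": 0.0},
                "s2": {"Red": 0.5, "Blue": 0.5}}}

print(predict("s2", [("go", "Red")], tau, gamma))                # 1.0
print(predict("s1", [("go", "Red")], tau, gamma))                # 0.5
print(predict("s1", [("go", None), ("go", "Red")], tau, gamma))  # 1.0
```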

Page 7

A simple example: Looking at predictions

Consider our predictions again:

– If I am in state s1 and go North, I will certainly see Blue.
  〈s1 | NB〉 = 1

– If I go West then North, I will certainly see Blue.
  〈s | WNB〉 = 1, ∀ s ∈ S

Note that for any sequence of actions preceding the West action, the above prediction would still be the same.

Page 8

Equivalence relations

• Two experiments are equivalent if their predictions are the same for every state:

  e1 ~ e2 ⇔ 〈s | e1〉 = 〈s | e2〉, ∀s

  Note: If two experiments always give the same results, they are redundant, and only one is necessary.

• Two states are equivalent if they cannot be distinguished by any experiment:

  s1 ~ s2 ⇔ 〈s1 | e〉 = 〈s2 | e〉, ∀e

  Note: Equivalent states produce the same probability distribution over future trajectories, so they are redundant.

Page 9

A simple example: Equivalent predictions

• Consider the following experiment: NRNR
  – This is equivalent to: SRSR, NRSR, NNRSSSR, …
  – This is an infinite equivalence class, which we denote by a chosen exemplar, e.g. [NRNR]
  – The predictions for this class: 〈s1 | [NRNR]〉 = 0, 〈s2 | [NRNR]〉 = 0.25
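Experiment equivalence can be tested mechanically by comparing prediction vectors over all states. A self-contained sketch on the hypothetical two-state "flip" domain (not the talk's grid); the `predict` helper is repeated here so the block stands alone.

```python
# Checking e1 ~ e2  <=>  <s|e1> = <s|e2> for every state s, by comparing
# prediction vectors. The flip domain is hypothetical.

def predict(s, experiment, tau, gamma):
    # <s | e>: probability the required observations occur when the
    # actions of e are executed from state s.
    if not experiment:
        return 1.0
    (a, obs), rest = experiment[0], experiment[1:]
    return sum(p * (gamma[a][s2][obs] if obs is not None else 1.0)
               * predict(s2, rest, tau, gamma)
               for s2, p in tau[a][s].items())

def equivalent(e1, e2, states, tau, gamma, tol=1e-9):
    """Two experiments are equivalent iff their predictions agree in every state."""
    return all(abs(predict(s, e1, tau, gamma) - predict(s, e2, tau, gamma)) < tol
               for s in states)

tau = {"go": {"s1": {"s2": 1.0}, "s2": {"s1": 1.0}}}
gamma = {"go": {"s1": {"Red": 1.0, "Blue": 0.0},
                "s2": {"Red": 0.5, "Blue": 0.5}}}
states = ["s1", "s2"]

# One flip and three flips end in the same state, so the experiments agree
# in every state: they belong to the same equivalence class.
e1 = [("go", "Red")]
e2 = [("go", None), ("go", None), ("go", "Red")]
print(equivalent(e1, e2, states, tau, gamma))  # True
```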

Page 10

Dual perspectives

• Forward view: Given a certain state, what predictions can we make about the future?
  – In classical AI, this view enables forward planning.
  – It is centered around the notion of state.

• Backward view: Suppose that we want a certain experiment to succeed; in what state should the system initially be?
  – This view enables backward planning.
  – It is centered around the experiments.

Page 11

A simple example: Dual perspectives

• Forward view:
  Q: If we know that the system is in s1, what predictions can we make about the future?

Page 12

A simple example: Dual perspectives

• Backward view:
  Q: Suppose we want the experiment NR to succeed; in what state should the system be?
  A: If the system starts in either state s2 or s4, the test will succeed with probability 0.5.

• We can associate with the experiment NR a vector of predictions of how likely it is to succeed from every state: [0 0.5 0 0.5]^T
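Combined with an initial state distribution, such a prediction vector gives the experiment's overall success probability as a simple dot product. The vector below is the slide's example for NR; the uniform distribution and the function name are illustrative assumptions.

```python
# Backward-view sketch: per-state success probabilities for experiment NR
# (the slide's [0 0.5 0 0.5]^T vector), combined with an initial state
# distribution by a dot product.

pred_NR = {"s1": 0.0, "s2": 0.5, "s3": 0.0, "s4": 0.5}

def success_probability(pred, init_dist):
    """Overall probability that the experiment succeeds, given the start distribution."""
    return sum(init_dist[s] * pred[s] for s in pred)

uniform = {s: 0.25 for s in pred_NR}
print(success_probability(pred_NR, uniform))  # 0.25
```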

Page 13

The dual machine

• The backward view can be implemented in a dual machine.

• States of the dual machine are equivalence classes of experiments [e].

• Observations of the dual machine are states from the original machine.

• The emission fn represents the prediction probability 〈s | [e]〉, ∀ s ∈ S.

• The transition fn is deterministic: [e] →a [ae]
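One way to sketch this construction concretely: enumerate experiments with a single final observation up to a bounded length, and group them by their prediction vectors; experiments with equal vectors form one dual state (the deterministic transition [e] →a [ae] then acts on these classes, omitted here for brevity). The flip domain and all names are hypothetical assumptions, not the talk's example.

```python
# Building a fragment of the dual machine by grouping one-observation
# experiments into equivalence classes keyed by their prediction vectors.
# Hypothetical two-state flip domain; not the talk's grid.

from itertools import product

def predict(s, experiment, tau, gamma):
    if not experiment:
        return 1.0
    (a, obs), rest = experiment[0], experiment[1:]
    return sum(p * (gamma[a][s2][obs] if obs is not None else 1.0)
               * predict(s2, rest, tau, gamma)
               for s2, p in tau[a][s].items())

def dual_fragment(states, actions, observations, tau, gamma, max_len=3):
    """Map prediction-vector keys to the experiments sharing that vector.

    Each key is one dual state; each experiment is intermediate actions
    followed by a final (action, observation) pair."""
    classes = {}
    for n in range(1, max_len + 1):
        for acts in product(actions, repeat=n):
            for o in observations:
                e = [(a, None) for a in acts[:-1]] + [(acts[-1], o)]
                key = tuple(round(predict(s, e, tau, gamma), 9) for s in states)
                classes.setdefault(key, []).append(e)
    return classes

tau = {"go": {"s1": {"s2": 1.0}, "s2": {"s1": 1.0}}}
gamma = {"go": {"s1": {"Red": 1.0, "Blue": 0.0},
                "s2": {"Red": 0.5, "Blue": 0.5}}}
classes = dual_fragment(["s1", "s2"], ["go"], ["Red", "Blue"], tau, gamma)
print(len(classes))  # 4
```

In this tiny domain, odd and even numbers of flips collapse into the same classes, so lengths 1 to 3 yield only four distinct dual states.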

Page 14

A simple example: A fragment of the dual machine

[Diagram: a fragment of the dual machine, shown alongside the original machine. Dual states [NR], [NB], [WR], [ER], [WB] are connected by arcs labeled with actions N, S, E, W. Emissions: for [NR], γ(s1) = γ(s3) = 0 and γ(s2) = γ(s4) = 0.5; for [NB], γ(s1) = γ(s3) = 1 and γ(s2) = γ(s4) = 0.5.]

• This fragment of the dual machine captures experiments with 1 observation. E.g. [NR] →W [WR] because 〈s | WNR〉 = 〈s | WR〉, ∀ s.

• There are separate fragments for experiments with 2 observations, 3 observations, etc.

Page 15

Notes on the dual machine

• The dual provides, for each experiment, the set of states from which the experiment succeeds.
  – Note that the emission function is not normalized.
  – Given an initial state distribution, we can get proper probabilities Pr(s|[e]).

• Experiments with different numbers of observations usually end up in disconnected components.

• Arcs represent temporal-difference relations, similar to those in TD-nets (Sutton & Tanner, 2005).
  – This is consistent with previous observations (Rudary & Singh, 2004) that e-tests yield TD-relationships and s-tests don’t.

Page 16

Can we do this again?

• In the dual, we get a proper machine, with states, actions, transitions, emissions.

• Can we think about experiments on the dual machine?
  – Repeat previous transformations on the dual machine.
  – Consider classes of equivalent experiments.
  – Reverse the roles of experiments and states.

• What do we obtain?

Page 17

The double dual machine

• States of the double dual machine are bundles of predictions for all possible experiments, e.g. [s]M′ and [sα]M′
  – Equivalence classes of the type [sα]M′ can be viewed as homing sequences (Even-Dar et al., 2005).

• The double dual assigns the same probability to any experiment as the original machine. So they are equivalent machines.

• The double dual is always a deterministic system! (But it can be much larger than the original machine.)

Page 18

A simple example: The double dual machine

[Diagram: the original machine, its dual, and the double dual, side by side. The dual fragment is the one from the previous slide. The double dual has two states, S1 and S2, connected by arcs labeled N, S, E, W, with prediction bundles γ(NR) = 0, γ(NB) = 1, γ(ER) = 0.5, γ(WB) = 1, γ(WR) = 0, … and γ(NR) = 0.5, γ(NB) = 0.5, γ(ER) = 0.5, γ(WB) = 1, γ(WR) = 0, ….]

• Equivalent states are eliminated.

• Two simple homing sequences:
  – Action W forces the system into s1.
  – Action E forces the system into s2.

Page 19

Conjecture: Different representations are useful for different tasks

• Learn the double dual
  – Advantage: it’s deterministic.
  – Problem: in general, the double dual is an infinite representation.
    (In our example, it’s compact due to deterministic transitions in the original.)
  – Focus on predicting accurately only the result of some experiments.

• Plan with the dual
  – For a given experiment, the dual tells us its probability of success from every state.
  – Given an initial state distribution: search over experiments to find one with high prediction probability with respect to goal criteria.
  – Start with dual fragments with short experiments, then move to longer ones.

Page 20

A simple learning algorithm

Consider the following non-deterministic automaton:
• A set of states, S
• A set of actions, A
• A set of observations, O
• A joint transition-emission relation:

  δ ⊆ S × A × O × S :  s′ ∈ δ(s, a, o) if s —a,o→ s′

Can we learn this automaton (or an equivalent one) directly from data?

Page 21

Merge-split algorithm

• Define:
  – Histories: h = a1 o1 a2 o2 … am om
  – The empty history: ε

• Construct a “history” automaton, H.

• Algorithm:
  – Start with one state, corresponding to the empty history: H = {ε}
  – Consider all possible next states, h′ = hao
  – The merge operation checks for an equivalent existing state:
    h′ ~ h″ ⇔ h′↑ = h″↑, where h↑ is the set of all possible future trajectories.
    If found, we set the transition function accordingly: δ(h, ao) = h″
  – Otherwise the split operation is applied: H = H ∪ {h′}, δ(h, ao) = h′
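The loop above can be sketched as follows, under the assumption that a history's future set h↑ can be summarized by the set of automaton states consistent with h (two histories with equal consistent-state sets allow exactly the same future trajectories). The tiny two-state automaton, and all names, are hypothetical, not Holmes & Isbell's flip automaton.

```python
# Merge-split sketch: history classes are represented by the set of states
# consistent with the history; equal sets are merged, new sets are split off.

from collections import deque

# delta[(s, a, o)] = set of possible next states s'
delta = {
    ("s1", "x", "0"): {"s2"},
    ("s2", "x", "1"): {"s1"},
}
states = {"s1", "s2"}
actions = ["x"]
observations = ["0", "1"]

def step(belief, a, o, delta):
    """States reachable by taking action a and seeing o from any state in belief."""
    return frozenset(s2 for s in belief for s2 in delta.get((s, a, o), ()))

def merge_split(states, actions, observations, delta):
    """Build a deterministic history automaton by breadth-first merge/split."""
    start = frozenset(states)          # empty history: any state is possible
    H = {start}                        # automaton states = history classes
    trans = {}                         # trans[(cls, (a, o))] = next class
    queue = deque([start])
    while queue:
        h = queue.popleft()
        for a in actions:
            for o in observations:
                h2 = step(h, a, o, delta)
                if not h2:
                    continue           # (a, o) is impossible after h
                if h2 not in H:        # split: a genuinely new class
                    H.add(h2)
                    queue.append(h2)
                trans[(h, (a, o))] = h2  # merge: point at the (re)used class
    return H, trans

H, trans = merge_split(states, actions, observations, delta)
print(len(H))      # 3 history classes
print(len(trans))  # 4 defined transitions
```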

Page 22

Example

[Diagrams: the flip automaton (Holmes & Isbell ’06) and the learned automaton.]

Page 23

Comments

• Merge-split constructs a deterministic history automaton.

• There is a finite number of equivalence classes of histories.
  – Worst case: the size is exponential in the number of states of the original machine.

• The automaton is well defined (i.e. it makes the same predictions as the original model).

• This is the minimal such automaton.

• Extending this to probabilistic machines is somewhat messy… but we are working on it.

Page 24

Final discussion

• Interesting to consider the same dynamical system from different perspectives.
  – There is a notion of duality between state and experiment.
  – Such a notion of duality is not new.
    E.g. observability vs. controllability in systems theory.

• Large body of existing work on learning automata, which I did not comment on [Rivest & Schapire ’94; James & Singh ’05; Holmes & Isbell ’06; …].

• Many interesting questions remain:
  – Can we develop a sound approximation theory for our duality?
  – Can we extend this to continuous systems?
  – Can we extend the learning algorithm to probabilistic systems?