Exact Mode Estimation for POMDPs based on Constraint Decomposition and Symbolic Encoding


Martin Sachenbacher, July 1, 2003

Exact vs. Approximate ME

Problems of ME with incomplete belief state
– Dead ends (no solutions)
– Incorrect leading solutions
– Incorrect probabilities of solutions

Usefulness of ME with complete belief state
– As accuracy reference
– As performance reference
– As a starting point for approximations

Key: Compact representation of belief state
– Map to semiring-based CSP
– Decompose hypergraph into hypertree
– Encode tree nodes symbolically as ADDs

Outline

– SCSPs (Semiring-based CSPs)
– Mapping State Constraints to SCSPs
– Mapping Transition Constraints to SCSPs
– ADDs (Algebraic Decision Diagrams)
– Hypertree Decompositions of SCSPs
– Solving Tree-structured SCSPs
– Exact Mode Estimation for POMDPs as Decomposition/ADD-based SCSP Solving
– Demonstration: Two Switches Example

SCSPs (Semiring-based CSPs)

– Generalization of CSPs [Bistarelli et al. 97]
– Domain D, Variables V, Set S, Type T ⊆ V
– Constraints are mappings D^k → S
– Operations ⊗ (for join) and ⊕ (for projection) on S
– (S, ⊕, ⊗, 0, 1) must form a c-semiring
– Dynamic programming applicable to all SCSPs
– Examples:
  – ({0,1}, ∨, ∧, 0, 1): Classical CSPs
  – (R+, min, +, +∞, 0): Weighted CSPs
  – ([0,1], max, *, 0, 1): Probabilistic CSPs
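The three example semirings can be written down directly as data. The following is a minimal sketch, not the author's implementation; the `Semiring` class and the `check_identities` helper are hypothetical names introduced only for illustration. It checks the identities that the unit elements 0 and 1 have to satisfy in a c-semiring.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for a c-semiring (S, plus, times, zero, one)."""
    plus: Callable[[Any, Any], Any]    # used for projection
    times: Callable[[Any, Any], Any]   # used for join
    zero: Any
    one: Any


CLASSICAL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
WEIGHTED = Semiring(min, lambda a, b: a + b, math.inf, 0.0)
PROBABILISTIC = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def check_identities(sr: Semiring, samples) -> bool:
    """Check unit, absorption and idempotency laws on a few sample values."""
    for a in samples:
        assert sr.plus(a, sr.zero) == a          # zero is the unit of plus
        assert sr.times(a, sr.one) == a          # one is the unit of times
        assert sr.times(a, sr.zero) == sr.zero   # zero absorbs under times
        assert sr.plus(a, sr.one) == sr.one      # one absorbs under plus
        assert sr.plus(a, a) == a                # plus is idempotent
    return True


check_identities(CLASSICAL, [False, True])
check_identities(WEIGHTED, [0.0, 2.5, math.inf])
check_identities(PROBABILISTIC, [0.0, 0.3, 1.0])
```

Swapping the semiring changes the problem being solved (satisfiability, minimum cost, most probable assignment) while the join and projection machinery stays the same.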

Encoding States as SCSPs

Example: Or-Gate
P(Or=ok) = 99%, P(Or=fty) = 1%

xt    in1   in2   out   value
ok    lo    lo    lo    0.99
ok    lo    hi    hi    0.99
ok    hi    lo    hi    0.99
ok    hi    hi    hi    0.99
fty   *     *     *     0.01
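As a rough illustration, the Or-gate state constraint can be expressed as a function from tuples (xt, in1, in2, out) to values in [0,1]. This is a sketch under the assumption that tuples not listed for mode ok map to the semiring zero; the names `or_gate` and `table` are hypothetical.

```python
from itertools import product

P_MODE = {"ok": 0.99, "fty": 0.01}
LEVELS = ("lo", "hi")


def or_gate(mode, in1, in2, out):
    """Value of the Or-gate state constraint for one tuple."""
    if mode == "ok":
        expected = "hi" if "hi" in (in1, in2) else "lo"
        # Assumption: tuples violating the nominal behavior get value 0.
        return P_MODE["ok"] if out == expected else 0.0
    # A faulty gate places no restriction on its wires ("fty * * *").
    return P_MODE["fty"]


# Explicit table of all tuples with a non-zero value.
table = {
    t: or_gate(*t)
    for t in product(P_MODE, LEVELS, LEVELS, LEVELS)
    if or_gate(*t) > 0.0
}
```

The explicit table has four tuples for mode ok and eight for mode fty, which is the expansion of the wildcard row.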

Encoding Observations as SCSPs

Example: (Probabilistic) Observation

[Figure: distribution over values for xi, with values 0, 1, 2, 3 on the axis. Labels recovered from the slide: 0.6, 0.3 and 0.6, 0.9, 0.3, 0.0.]

Encoding Transitions as SCSPs

Example: (Probabilistic) CCA

xt   cmd   xt+1   f
0    off   0      0.9
0    on    0      0.1
0    off   1      0.1
0    on    1      0.9
1    off   0      0.9
1    on    0      0.1
1    off   1      0.1
1    on    1      0.9

[Figure: two-state automaton with transitions labeled cmd=on and cmd=off, each taken with probability 0.9; caption "Transition Function".]
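The transition table can be read as a function f(xt, cmd, xt+1). The sketch below encodes it and performs one prediction step by joining a belief over xt with f and projecting onto xt+1. Summation is used for the projection here, which is an assumption; a max-projection would fit the probabilistic semiring ([0,1], max, *, 0, 1) instead. The names `transition` and `predict` are hypothetical.

```python
def transition(x, cmd, x_next):
    """f(xt, cmd, xt+1) from the table: the commanded state is reached with 0.9."""
    target = 1 if cmd == "on" else 0
    return 0.9 if x_next == target else 0.1


def predict(belief, cmd):
    """Join the belief with the transition constraint, then project onto xt+1."""
    return {
        x_next: sum(p * transition(x, cmd, x_next) for x, p in belief.items())
        for x_next in (0, 1)
    }


after_on = predict({0: 1.0, 1: 0.0}, "on")
```

Starting from certainty about state 0, commanding on moves 0.9 of the mass to state 1 and leaves 0.1 in state 0.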

Algebraic Decision Diagrams

– ADDs: Symbolic (graph-based) representation of functions {0,1}^n → R
– Generalization of BDDs (functions {0,1}^n → {0,1})
– Canonicity of representation (as for BDDs)
– Efficient package: CUDD

[Figure: example ADD with leaves 0, 1, 2, 3]
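To make the idea of a canonical, shared graph concrete, here is a toy construction of a reduced decision diagram in plain Python. It is only a sketch: it enumerates all 2^n assignments, which a real package such as CUDD avoids, and `build_add` is a hypothetical helper, not a CUDD function. Nodes are represented structurally, so equal subgraphs are shared automatically.

```python
def build_add(func, n):
    """Build a reduced diagram for func over n Boolean variables (fixed order).

    Returns (root, number_of_internal_nodes, set_of_leaf_values).
    """
    unique = {}    # unique table: (level, low, high) -> node
    leaves = set()

    def build(level, prefix):
        if level == n:
            value = func(*prefix)
            leaves.add(value)
            return ("leaf", value)
        low = build(level + 1, prefix + (0,))
        high = build(level + 1, prefix + (1,))
        if low == high:            # the test is redundant, skip the node
            return low
        key = (level, low, high)
        return unique.setdefault(key, key)   # share structurally equal nodes

    root = build(0, ())
    return root, len(unique), leaves


# Example function: the number of inputs that are 1 (values 0..3).
root, n_internal, leaf_values = build_add(lambda a, b, c: a + b + c, 3)
```

For this example function, eight table rows collapse into six internal nodes and four leaves.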

ADD Join Operations

– Multiplication, addition, maximum, …
– Generalization of BDD operations

A B C   f   g   f+g   f*g   5*f   f>1   max(f,g)
0 0 0   0   3   3     0     0     0     3
0 0 1   1   2   3     2     5     0     2
0 1 0   1   0   1     0     5     0     1
0 1 1   2   1   3     2     10    1     2
1 0 0   1   0   1     0     5     0     1
1 0 1   2   0   2     0     10    1     2
1 1 0   2   0   2     0     10    1     2
1 1 1   3   1   4     3     15    1     3
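The join operations are pointwise: the result at each assignment of A, B, C depends only on the operand values at that assignment. A small sketch that recomputes such columns from f and g on explicit tables (an ADD package would do the same on the shared graph instead); `apply_op` is a hypothetical helper name.

```python
from itertools import product

ROWS = list(product((0, 1), repeat=3))           # assignments to A, B, C
f = dict(zip(ROWS, (0, 1, 1, 2, 1, 2, 2, 3)))
g = dict(zip(ROWS, (3, 2, 0, 1, 0, 0, 0, 1)))


def apply_op(op, *funcs):
    """Apply op pointwise to tabulated functions over the same variables."""
    return {row: op(*(h[row] for h in funcs)) for row in ROWS}


f_plus_g = apply_op(lambda a, b: a + b, f, g)
f_times_g = apply_op(lambda a, b: a * b, f, g)
f_max_g = apply_op(max, f, g)
f_gt_1 = apply_op(lambda a: int(a > 1), f)
```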

Example

Summation of ADD f, ADD g

[Figure: the ADDs for f and g (each with leaves 0, 1, 2, 3) and the ADD for their sum (leaves 1, 2, 3, 4)]

ADD Projection Operations

Σ(f,X) (and Π(f,X)) obtained by summing (multiplying) values of tuples that differ only w.r.t. X

A B C   f
0 0 0   0
0 0 1   1
0 1 0   1
0 1 1   2
1 0 0   1
1 0 1   2
1 1 0   2
1 1 1   3

A B   Σ(f,{C})   Π(f,{C})
0 0   1          0
0 1   3          2
1 0   3          2
1 1   5          6

ADD Projection Operations

For optimization, we require an operation max(f,X) that yields the maximum value of tuples differing only w.r.t. X

A B C   f
0 0 0   0
0 0 1   1
0 1 0   1
0 1 1   2
1 0 0   1
1 0 1   2
1 1 0   2
1 1 1   3

A B   Σ(f,{C})   Π(f,{C})   max(f,{C})
0 0   1          0          1
0 1   3          2          2
1 0   3          2          2
1 1   5          6          3

Not part of CUDD, but easy to implement as a variant of Σ/Π(f,X).
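All three projections follow the same pattern: group the tuples that agree on the remaining variables and fold their values with the chosen operator. A minimal sketch on an explicit table for f over A, B, C; `project` is a hypothetical helper and does not correspond to a CUDD call.

```python
import operator
from functools import reduce
from itertools import product

f = dict(zip(product((0, 1), repeat=3), (0, 1, 1, 2, 1, 2, 2, 3)))


def project(func, op, drop_index):
    """Eliminate the variable at drop_index by folding values with op."""
    groups = {}
    for row, value in func.items():
        key = row[:drop_index] + row[drop_index + 1:]
        groups.setdefault(key, []).append(value)
    return {key: reduce(op, values) for key, values in groups.items()}


sum_c = project(f, operator.add, 2)    # sum over C
prod_c = project(f, operator.mul, 2)   # product over C
max_c = project(f, max, 2)             # max over C
```

Only the folding operator differs between the three calls, which is why a max-projection is a small variation of the sum and product versions.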

Solving SCSPs using Decomposition

– Transform SCSP into hypertree H = (T, χ, λ)
– Compute constraint φ(v) for each node v
– Bottom-up phase for computing values
– Top-down phase for extracting solutions

Pseudocode for Bottom-Up Phase

Function solve(v)
  For Each child ∈ children(v)
    φ(v) ← φ(v) ⊗ max(φ(child), χ(child) \ χ(v))
  Next child
  Return φ(v)

Generalization of (Semi-)Join Operation
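The bottom-up phase can be sketched on explicit tables for the probabilistic semiring (join is multiplication, projection is max). This is an illustration under stated assumptions, not the ADD-based implementation: constraints are pairs (variables, complete table) over Boolean variables, the recursion into each child is made explicit, and the names `join`, `max_out` and the two-node example tree are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1)


def join(c1, c2):
    """Multiply two constraints over the union of their variables."""
    (v1, t1), (v2, t2) = c1, c2
    variables = v1 + tuple(x for x in v2 if x not in v1)
    table = {}
    for row in product(DOMAIN, repeat=len(variables)):
        env = dict(zip(variables, row))
        table[row] = t1[tuple(env[x] for x in v1)] * t2[tuple(env[x] for x in v2)]
    return variables, table


def max_out(constraint, drop):
    """Eliminate the variables in drop by maximization."""
    variables, table = constraint
    keep = tuple(x for x in variables if x not in drop)
    out = {}
    for row, value in table.items():
        key = tuple(val for x, val in zip(variables, row) if x in keep)
        out[key] = max(out.get(key, 0.0), value)
    return keep, out


def solve(v, phi, children):
    """Absorb, for each child, its constraint projected onto the shared variables."""
    for child in children.get(v, ()):
        message = solve(child, phi, children)
        private = set(message[0]) - set(phi[v][0])
        phi[v] = join(phi[v], max_out(message, private))
    return phi[v]


# Hypothetical two-node tree: root over (A, B), child over (B, C).
phi = {
    "root": (("A", "B"), {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}),
    "child": (("B", "C"), {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.0}),
}
root = solve("root", phi, {"root": ["child"]})
best_value = max(root[1].values())
```

After the call, the maximum entry of the root table equals the value of the best complete assignment, although no table over all three variables was ever built.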

Example

[Figure: Boolean Polycell]

Example

Hypertree Decomposition of Boolean Polycell

Node   Variables                  Shared with v0
v0     O3, A1, C, E, F, X, Y, Z   (root)
v1     A2, G, Y, Z                Y, Z
v2     O2, B, D, Y                Y
v3     O1, A, C, X                C, X

[Figure: constraint tables attached to the tree nodes. Sample tuples at v0: (ok, ok, 1, 0, 0, 0, 0, 1), (ok, ok, 1, 0, 0, 0, 1, 1), (ok, ok, 1, 0, 0, 1, 0, 1). Utilities shown: U=.98505 at v0; U=.99, U=.99, U=.995 and U=.005, U=.01, U=.01 at v1, v2, v3.]

Example

Initial φ(v0), over (O3, A1, C, E, F, X, Y, Z): ADD with 20 nodes, 5 leaves

[Figure: ADD for φ(v0). Sample tuples: (ok, ok, 1, 0, 0, 0, 0, 1), (ok, ok, 1, 0, 0, 0, 1, 1), (ok, ok, 1, 0, 0, 1, 0, 1), (fty, ok, 1, 0, 0, 0, 1, 1), (fty, ok, 1, 0, 0, 0, 1, 0), (fty, ok, 1, 0, 0, 1, 0, 0), (fty, ok, 1, 0, 0, 1, 0, 1), … Utilities: U=.98505, U=.00995, U=.00495, U=.00005.]
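The four utilities at the root are products of mode probabilities. The short check below assumes, reading off the utility labels in the decomposition figure, that the Or-gate O3 has P(ok) = .99 and the And-gate A1 has P(ok) = .995; these two priors are an interpretation of the slide, not something it states explicitly.

```python
# Assumed mode priors (interpretation of the U labels on the slides).
P = {
    "O3": {"ok": 0.99, "fty": 0.01},
    "A1": {"ok": 0.995, "fty": 0.005},
}

# Utility of each mode assignment to (O3, A1): product of the priors.
utilities = {
    (o3, a1): P["O3"][o3] * P["A1"][a1]
    for o3 in ("ok", "fty")
    for a1 in ("ok", "fty")
}

for modes, u in sorted(utilities.items(), key=lambda kv: -kv[1]):
    print(modes, round(u, 5))
```

Under this assumption the products are .98505, .00995, .00495 and .00005, which are the utility values shown for φ(v0).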

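Example

After multiplication with max(φ(v1), {A2,G})

[Figure: ADD for φ(v0) over (O3, A1, C, E, F, X, Y, Z). Sample tuples: (ok, ok, 1, 0, 0, 0, 1, 1), (fty, ok, 1, 0, 0, 0, 1, 1), (ok, ok, 1, 0, 0, 0, 0, 1), (ok, ok, 1, 0, 0, 1, 0, 1), … Utilities: U=.98012, U=.00990, U=.00492, U=4.9E-5, U=2.4E-5, U=2.5E-7.]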

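Example

After multiplication with max(φ(v2), {O2,B,D})

[Figure: ADD for φ(v0) over (O3, A1, C, E, F, X, Y, Z). Sample tuples: (ok, ok, 1, 0, 0, 0, 1, 1), (fty, ok, 1, 0, 0, 0, 1, 1), (ok, fty, 1, 0, 0, 0, 1, …), … Utilities: U=.97032, U=.00980, U=.00487, U=4.9E-5, U=4.9E-7, U=2.4E-7, U=2.5E-9.]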

Example

After multiplication with max(φ(v3), {O1,A})

[Figure: ADD for φ(v0) over (O3, A1, C, E, F, X, Y, Z). Sample tuples: (ok, ok, 1, 0, 0, 0, 1, 1), (ok, fty, 1, 0, 0, 1, 1, 1), (fty, ok, 1, 0, 0, 0, 1, …), … Utilities: U=.00970, U=.00482, U=9.8E-5, U=4.8E-5, U=4.9E-7, U=2.4E-7, U=4.9E-9, U=2.4E-9, U=2.5E-11.]

Best Solution: Umax = .0097

Pseudocode for Top-Down Phase

Function extractSolutions(vroot)
  E ← edges(vroot)
  σ ← φ(vroot)
  σ ← max(σ, vars(σ) \ (decvars(σ) ∪ vars(E)))     (restrict to decision and shared variables)
  While E ≠ ∅ Do
    e ← choose(E)
    v ← son-node(e)
    E ← (E \ e) ∪ edges(v)
    σ0-1 ← (σ ≠ 0)
    div ← max(σ0-1 ⊗ φ(v), vars(σ))                ("Divisor")
    σ ← (σ ⊗ φ(v)) ⊗ div^-1
    σ ← max(σ, vars(σ) \ (decvars(σ) ∪ vars(E)))   (restrict to decision and shared variables)
  End While

No search queue necessary
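The role of the divisor can be illustrated on a two-node tree. After the bottom-up phase the root already contains the child's best value for each shared assignment, so extending a root solution with the child's own constraint would count the child twice unless that message is divided out again. The sketch below is one possible reading of this step on explicit tables, under the assumption of max-product messages; all names and numbers are hypothetical.

```python
from itertools import product

B = (0, 1)

# Root constraint over (A, B); child constraint over (B, C).
phi_root = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}
phi_child = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.0, (1, 1): 0.0}  # B=1 is inconsistent

# Bottom-up: the child sends its best value per shared assignment of B.
message = {b: max(phi_child[(b, c)] for c in B) for b in B}
sigma = {(a, b): phi_root[(a, b)] * message[b] for a, b in product(B, repeat=2)}


def extend(sigma, phi_child, message):
    """Top-down step: multiply in the child, divide by its earlier message."""
    solutions = {}
    for (a, b), value in sigma.items():
        if value == 0.0:
            # 0-1 mask: inconsistent tuples are dropped, so the divisor
            # below is never zero.
            continue
        for c in B:
            u = value * phi_child[(b, c)] / message[b]
            if u > 0.0:
                solutions[(a, b, c)] = u
    return solutions


solutions = extend(sigma, phi_child, message)
best = max(solutions, key=solutions.get)
```

Every extended tuple ends up with exactly the product of the two original constraints, and tuples can be listed in order of utility directly from the resulting table.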

Example

Initial σ = max(φ(vroot), {E,F}), over (O3, A1, C, X, Y, Z): ADD with 21 tuples, 33 nodes, 10 leaves

[Figure: ADD for σ. Sample tuples: (ok, ok, 1, 0, 1, 1), (ok, fty, 1, 1, 1, 1), (fty, ok, 1, 0, 1, 1), … Utilities: U=.00970, U=.00482, U=9.8E-5, U=4.8E-5, U=4.9E-7, U=2.4E-7, U=4.9E-9, U=2.4E-9, U=2.5E-11.]

Example

After processing edge(v0,v3), σ over (O1, O3, A1, Y, Z)

[Figure: ADD for σ. Sample tuples: (fty, ok, ok, 1, 1), (ok, ok, fty, 1, 1), (fty, fty, ok, 1, 1), … Utilities: U=.00970, U=.00482, U=9.8E-5, U=4.8E-5, U=4.9E-7, U=2.4E-7, U=4.9E-9, U=2.4E-9, U=2.5E-11.]

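Example

σ over (O1, O2, O3, A1, Y, Z)

[Figure: ADD for σ. Sample tuples: (fty, ok, ok, ok, 1, 1), (ok, ok, ok, fty, 1, 1), (fty, fty, ok, ok, 1, 1), (fty, ok, fty, ok, 1, 1), … Utilities: U=.00970, U=.00482, U=9.8E-5, U=4.8E-5, U=9.9E-7, U=4.9E-7, U=2.5E-11.]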

Example

σ over (O1, O2, O3, A1, A2): ADD with 26 tuples, 35 nodes, 12 leaves

[Figure: ADD for σ. Sample tuples: (fty, ok, ok, ok, ok), (ok, ok, ok, fty, ok), (fty, fty, ok, ok, ok), (fty, ok, fty, ok, ok), … Utilities: U=.00970, U=.00482, U=9.8E-5, U=4.8E-5, U=2.4E-5, U=9.9E-7, U=2.5E-11.]

#Solutions = 26

Easy to focus on leading solutions.

Application: Exact ME for POMDPs

– Given: POMDP (Feasible States, Observables, Control Actions, Transitions), Observations
– Approach: Complete representation of belief state (through decomposition and symbolic encoding)
– Benefit: Allows for exploiting Markov property

[Figure: states S1 … Sn at time t and states S1 … Sn at time t+1]

Algorithm: Exact ME for POMDPs

– Construct hypertree (offline)
– Construct State-ADDs for each node (offline)
– Construct Transition-ADDs for each node (offline)
– Repeat for each time step:
  – Multiply nodes with Obs-ADDs ("Condition on Observations")
  – Establish consistency in the tree (bottom-up)
  – Extract leading solution(s) from the tree (top-down)
  – Multiply nodes with Transition-ADDs, project on xt+1, set xt = xt+1, multiply with State-ADDs ("Transition Expansion")
– Complexity: Polynomial for bounded hypertree width

Example

Adapted from Jim Kurien's thesis

– t0: Sw1.cmd = on
– t1: Or.out = lo, Sw1.cmd = idl, Sw2.cmd = on
– t2: Or.out = lo

Switches more likely to fail than Or-Gate

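Example

Switch Model

[Figure: switch state diagram. Transitions between the modes are labeled cmd=on and cmd=off; the self-loops are labeled cmd=off,idl and cmd=on,idl. The modes are annotated with the permitted (t1, t2) pairs: all four lo/hi combinations for one mode, only (lo, lo) and (hi, hi) for the other.]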

Example

Switch Model

State constraint:

xt    t1   t2   value
on    lo   lo   1.0
on    hi   hi   1.0
off   *    *    1.0
fty   *    *    1.0

Transition constraint:

xt    cmd   xt+1   f
on    on    on     0.95
on    off   off    0.95
on    idl   on     0.95
on    *     fty    0.05
off   on    on     0.95
off   off   off    0.95
off   idl   off    0.95
off   *     fty    0.05
fty   *     fty    1.0
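A quick sanity check on the switch transition table: for every current mode and command, the values over the successor modes should sum to 1. The sketch below encodes the nine rows as a function; the name `switch_transition` is hypothetical.

```python
SW_MODES = ("on", "off", "fty")
COMMANDS = ("on", "off", "idl")


def switch_transition(x, cmd, x_next):
    """f(xt, cmd, xt+1) for the switch, following the transition table."""
    if x == "fty":                       # a faulty switch stays faulty
        return 1.0 if x_next == "fty" else 0.0
    if x_next == "fty":                  # any command may end in a fault
        return 0.05
    target = x if cmd == "idl" else cmd  # idl keeps the current mode
    return 0.95 if x_next == target else 0.0


row_sums = {
    (x, cmd): sum(switch_transition(x, cmd, y) for y in SW_MODES)
    for x in SW_MODES
    for cmd in COMMANDS
}
```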

Example

Or-Gate Model

Nominal behavior:

in1   in2   out
lo    lo    lo
lo    hi    hi
hi    lo    hi
hi    hi    hi

State constraint:

xt    in1   in2   out   value
ok    lo    lo    lo    1.0
ok    lo    hi    hi    1.0
ok    hi    lo    hi    1.0
ok    hi    hi    hi    1.0
fty   *     *     *     1.0

Transition constraint:

xt    xt+1   f
ok    ok     0.99
ok    fty    0.01
fty   fty    1.0

Example

Initial belief state (chosen):
– p(Sw=on) = p(Sw=off) = 0.475, p(Sw=fty) = 0.05
– p(Or=ok) = 0.99, p(Or=fty) = 0.01

Observations/Commands:
– t0: Sw1.cmd=on
– t1: Or.out=lo, Sw1.cmd=idl, Sw2.cmd=on
– t2: Or.out=lo

Leading Solutions:
– t0: Sw1=on/off, Sw2=on/off, Or=ok
– t1: Sw1=fty, Sw2=off, Or=ok
– t2: Sw1=on, Sw2=on, Or=fty
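The estimation loop can be tried out on this example by brute force, keeping the complete belief state as a table over all 18 joint modes instead of a hypertree of ADDs. The sketch rests on several assumptions that the slides do not state: both switch inputs are tied to hi, Sw2 receives idl at t0, projection onto xt+1 is done by summation, and an observation is consistent with a mode assignment if some wire assignment satisfies all state constraints. All function names are hypothetical.

```python
from itertools import product

SW_MODES, OR_MODES, LEVELS = ("on", "off", "fty"), ("ok", "fty"), ("lo", "hi")


def sw_trans(x, cmd, y):
    """Switch transition constraint f(xt, cmd, xt+1)."""
    if x == "fty":
        return 1.0 if y == "fty" else 0.0
    if y == "fty":
        return 0.05
    return 0.95 if y == (x if cmd == "idl" else cmd) else 0.0


def or_trans(x, y):
    """Or-gate transition constraint f(xt, xt+1)."""
    return {("ok", "ok"): 0.99, ("ok", "fty"): 0.01, ("fty", "fty"): 1.0}.get((x, y), 0.0)


def consistent(state, out):
    """1.0 if some assignment to the switch outputs satisfies all constraints."""
    s1, s2, o = state
    for in1, in2 in product(LEVELS, repeat=2):
        # ASSUMPTION: switch inputs are hi, so a switch in mode on outputs hi.
        switches_ok = (s1 != "on" or in1 == "hi") and (s2 != "on" or in2 == "hi")
        or_ok = o == "fty" or out == ("hi" if "hi" in (in1, in2) else "lo")
        if switches_ok and or_ok:
            return 1.0
    return 0.0


def estimate(steps):
    """Return the leading joint modes at each time step."""
    prior_sw = {"on": 0.475, "off": 0.475, "fty": 0.05}
    prior_or = {"ok": 0.99, "fty": 0.01}
    states = list(product(SW_MODES, SW_MODES, OR_MODES))
    belief = {s: prior_sw[s[0]] * prior_sw[s[1]] * prior_or[s[2]] for s in states}
    leaders = []
    for out, cmds in steps:
        if out is not None:                       # condition on observations
            belief = {s: p * consistent(s, out) for s, p in belief.items()}
        top = max(belief.values())                # extract leading solution(s)
        leaders.append(sorted(s for s, p in belief.items() if abs(p - top) <= 1e-12))
        if cmds is not None:                      # transition expansion
            c1, c2 = cmds
            belief = {
                n: sum(
                    p * sw_trans(s[0], c1, n[0]) * sw_trans(s[1], c2, n[1]) * or_trans(s[2], n[2])
                    for s, p in belief.items()
                )
                for n in states
            }
    return leaders


# (observation of Or.out, (Sw1.cmd, Sw2.cmd)); Sw2.cmd at t0 is an assumption.
STEPS = [(None, ("on", "idl")), ("lo", ("idl", "on")), ("lo", None)]
leaders = estimate(STEPS)
```

Under these assumptions the sketch yields the same leading solutions as listed above. The t2 result depends on probability mass that the belief state still holds at t1 for Or=fty with Sw1=on, which is not the leading solution at t1.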

Conclusion

– SCSPs: elegant and general representation
– ADD encoding of SCSPs: efficient in average case, exponential in the number of variables in worst case
– Decomposition factors problem into set of ADDs, each confined to small numbers of variables
– The two methods complement each other well
– How far can we get with this combination?
