A polynomial time algorithm for constructing k-maintainable policies Chitta Baral Arizona State University and Thomas Eiter Vienna University of Technology


Motivation: What is ‘maintain’ f?

Always f, also written as □ f: too strong for many kinds of maintainability (e.g., maintain the room clean).

Always Eventually f, also written as □ ◊ f:
- Weak in the sense that it does not give an estimate of when f will be made true.
- May not be achievable in the presence of continuous interference by belligerent agents.

□ f ------------------ □ ◊_k f -------------------------- □ ◊ f

□ ◊_3 f is shorthand for □ ( f ∨ O f ∨ OO f ∨ OOO f ).

But if an external agent keeps interfering, how is one supposed to guarantee □ ◊_3 f?

k-maintain f: If there is a break from the environment for k steps, then during that break the agent will reach a state where f is true.

Motivation: a controller-agent transcript

Controller (to the agent/robot): Your goal is to maintain the room clean.

Robot/Agent: Can you be precise about what you mean by ‘maintain’? Also can I clean anytime or are there restrictions?

Controller: You can only clean when the room is unoccupied.

Controller: By ‘maintain’ I mean ALWAYS clean.

Robot/Agent: I won’t be able to guarantee that. What if someone makes the room dirty while it is occupied?

Controller: OK, I understand. How about ALWAYS EVENTUALLY clean?

Controller’s Boss: ‘Eventually’ is too lenient. We can’t have the room unclean for too long. We should put some bound on it.

Controller-agent transcript (cont)

Controller: Sorry, Sir. I should have made it more precise: ALWAYS EVENTUALLY_3 clean.

Robot/Agent: Sorry. I can guarantee neither ALWAYS EVENTUALLY clean nor ALWAYS EVENTUALLY_3 clean. What if the room is continuously being used, and you told me I cannot clean while it is being used?

Controller: You have a good point. Let me clarify again. If you are given an opportunity of 3 units of time without the room being occupied (i.e., without any interference from external agents), then you should have the room clean during that time.

Robot/Agent: I think I understand you. But as you know, I am a robot and not that good at understanding English. Can you please input it in a precise language?

Formulating k-maintainability: a system

A system is a quadruple A = (S, A, Φ, poss), where

– S is the set of system states;

– A is the set of actions, which is the union of the set of the agent's actions, Aag, and the set of environmental actions, Aenv;

– Φ : S × A → 2^S is a non-deterministic transition function that specifies how the state of the world changes in response to actions;

– poss : S → 2^A is a function that describes which actions are possible (by the agent or the environment) in which states.

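[Figure: example transition graph over states b, c, d, f, g, h, with edges labeled by agent actions a and a’ and environment action e]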

S = {b,c,d,f,g,h}

A = {a, a’, e}

Aag = {a, a’}

Aenv = {e}

Φ : as shown in the picture

poss(b) = {a} when our policy dictates a to be executed at b.

Controls and super-controls

Given a system A = (S, A, Φ, poss) and a set Aag ⊆ A of agent actions,

– a control policy for A w.r.t. Aag is a partial function K : S → Aag such that K(s) ∈ poss(s) whenever K(s) is defined;

– a super-control policy for A w.r.t. Aag is a partial function K : S → 2^Aag such that K(s) ⊆ poss(s) and K(s) ≠ { } whenever K(s) is defined.
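As a rough illustration, the system and the two kinds of policy can be written down as plain Python data. The edge list below is an assumption reconstructed from the worked results on these slides (the picture itself is not reproduced here and may contain further edges), and the names `TRANS`, `AGENT_ACTIONS`, `ENV_ACTIONS`, `poss`, `is_control`, and `is_super_control` are hypothetical. Here `poss` is simply derived from the transitions that exist.

```python
# Assumed edges of the example graph: (state, action) -> set of successor states.
TRANS = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
STATES = {"b", "c", "d", "f", "g", "h"}
AGENT_ACTIONS = {"a", "a'"}   # Aag
ENV_ACTIONS = {"e"}           # Aenv


def poss(s):
    """Actions possible in state s (assumed: those with an outgoing edge)."""
    return {a for (q, a) in TRANS if q == s}


def is_control(K):
    """K maps some states to one agent action each, with K(s) in poss(s)."""
    return all(a in AGENT_ACTIONS and a in poss(s) for s, a in K.items())


def is_super_control(K):
    """K maps some states to a non-empty set of agent actions within poss(s)."""
    return all(
        acts and acts <= AGENT_ACTIONS and acts <= poss(s)
        for s, acts in K.items()
    )


print(is_control({"b": "a", "c": "a", "d": "a"}))
print(is_super_control({"b": {"a", "a'"}}))
```

Partial functions are modeled as dictionaries, so a state without an entry is one where the policy is undefined.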

Reachable states and closure

Reachable states R(A, s) from an individual state s:

Given a system A = (S, A, Φ, poss) and a state s, R(A, s) is the smallest set of states that satisfies the following conditions: (i) s is in R(A, s); and (ii) if s’ is in R(A, s) and a is in poss(s’), then Φ(s’, a) is a subset of R(A, s).

Closure(S, A) of a set of states S:

Let A be a system and let S be a subset of its states. Then the closure of A w.r.t. S, denoted by Closure(S, A), is defined by Closure(S, A) = ⋃_{s ∈ S} R(A, s).

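[Figure: example transition graph over states b, c, d, f, g, h, with edges labeled by agent actions a and a’ and environment action e]

A = (S, A, Φ, poss)

R(A, d) = {d, h}

R(A, f) = {f, g, h}

Closure({d, f}, A) = {d, f, g, h}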
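The two definitions translate directly into a graph search. The following is a minimal sketch, assuming an edge list for the example graph that was reconstructed from the results stated on the slides (the picture is not available here); `TRANS`, `poss`, `reachable`, and `closure` are hypothetical names.

```python
from collections import deque

# Assumed edges of the example graph: (state, action) -> set of successor states.
TRANS = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}


def poss(s):
    """Actions possible in state s (assumed: those with an outgoing edge)."""
    return {a for (q, a) in TRANS if q == s}


def reachable(s, poss_fn=poss):
    """R(A, s): smallest set containing s and closed under possible actions."""
    seen, queue = {s}, deque([s])
    while queue:
        cur = queue.popleft()
        for a in poss_fn(cur):
            for nxt in TRANS.get((cur, a), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def closure(states, poss_fn=poss):
    """Closure(S, A): union of R(A, s) over all s in S."""
    result = set()
    for s in states:
        result |= reachable(s, poss_fn)
    return result


print(sorted(reachable("d")))
print(sorted(reachable("f")))
print(sorted(closure({"d", "f"})))
```

Under the assumed edges, the three calls agree with the values of R(A, d), R(A, f), and Closure({d, f}, A) given on the slide. Passing a different `poss_fn` restricts which actions are followed.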

Unfold_k(s, A, K)

An element of Unfold_k(s, A, K) is a sequence of states of length at most k + 1 that the system may go through if it follows the control K starting from the state s.

[Figure: example transition graph over states b, c, d, f, g, h, with edges labeled by agent actions a and a’ and environment action e]

Consider policy K: Do action a in states b, c, and d.

Unfold_3(b, A, K) = { <b,c,d,h>, <b,g> }

Unfold_3(c, A, K) = { <c,d,h> }

Definition of k-maintainability: the parameters

1. a system A = (S, A, Φ, poss),

2. a set Aag ⊆ A of agent actions,

3. a set S of initial states (a subset of the system's states),

4. a set of desired states E that we want to maintain,

5. a maintainability parameter k,

6. a function exo : S → 2^Aenv detailing exogenous actions, such that exo(s) is a subset of poss(s), and

7. a control K (mapping a relevant part of S to Aag) such that K(s) belongs to poss(s).

Basic Idea

Ignoring interference: From any state under consideration, by following the control policy one should visit E within k steps.

Accounting for interference: Broaden the states under consideration from the initial states to all states reachable due to the control and the environment. (Use ‘Closure’.)

When using Closure:
- Account for the control policy.
- Ignore other agent actions.
- Only consider exogenous actions in exo(s).

Definition of k-maintainability

poss_K,exo(s) is the set {K(s)} ∪ exo(s).

A_K,exo = (S, A, Φ, poss_K,exo)

Given a system A = (S, A, Φ, poss), a set of agent actions Aag ⊆ A, and a specification of exogenous action occurrence exo, we say that a control K for A w.r.t. Aag k-maintains a set of states S with respect to a set of states E, where k ≥ 0, if for each state s in Closure(S, A_K,exo) and each sequence σ = s0, s1, . . . , sr in Unfold_k(s, A, K) with s0 = s, it holds that {s0, s1, . . . , sr} ∩ E ≠ { }.

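[Figure: example transition graph over states b, c, d, f, g, h, with edges labeled by agent actions a and a’ and environment action e]

Consider policy K: Do action a in states b, c, and d.

poss(b) = {a, a’}

poss_K,exo(b) = {a}

Closure({b,c}, A) = {b,c,d,f,g,h}

Closure({b,c}, A_K,exo) = {b,c,d,h}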

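[Figure: example transition graph over states b, c, d, f, g, h, with edges labeled by agent actions a and a’ and environment action e]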

Goal: 3-maintainable policy for S={b} w.r.t. E={h}

Such a policy: Do a in b, c, and d
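The definition can be checked mechanically for a given control. The sketch below assumes an edge list reconstructed from the results stated on the slides (it may omit edges of the original picture) and assumes exo(f) = {e}; `TRANS`, `EXO`, `unfold`, `closure_under`, and `k_maintains` are hypothetical names. A control is a dictionary from states to agent actions, each assumed to be possible in its state.

```python
# Assumed edges of the example graph: (state, action) -> set of successor states.
TRANS = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
EXO = {"f": {"e"}}  # assumed exogenous actions per state


def unfold(s, k, K):
    """State sequences of length <= k + 1 obtained by following K from s.

    A sequence stops early only where K is undefined.
    """
    seqs = []

    def extend(seq):
        cur = seq[-1]
        if len(seq) == k + 1 or cur not in K:
            seqs.append(tuple(seq))
            return
        for nxt in sorted(TRANS[(cur, K[cur])]):
            extend(seq + [nxt])

    extend([s])
    return seqs


def closure_under(S, K):
    """Closure(S, A_K,exo): follow only K(s) and the actions in exo(s)."""
    seen, stack = set(S), list(S)
    while stack:
        cur = stack.pop()
        actions = set(EXO.get(cur, set()))
        if cur in K:
            actions.add(K[cur])
        for a in actions:
            for nxt in TRANS.get((cur, a), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def k_maintains(K, S, E, k):
    """True if every unfolding from every state in the closure meets E."""
    return all(
        set(seq) & E
        for s in closure_under(S, K)
        for seq in unfold(s, k, K)
    )


K = {"b": "a", "c": "a", "d": "a"}
print(k_maintains(K, {"b"}, {"h"}, 3))
print(k_maintains(K, {"b"}, {"h"}, 2))
```

Under these assumptions the policy is accepted for k = 3 and rejected for k = 2, because from b the goal h is three steps away.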

[Figure: the example transition graph over states b, c, d, f, g, h, modified with an additional edge labeled e]

Goal: Find 3-maintainable policy for S={b} w.r.t. E={h}

No such policy!

Constructing k-maintainable control policies: pre-formulation attempts

Handwritten policies: subsumption architecture, RAPs, situation control rules, protocols.

Our initial motivation for formulating maintainability arose when we tried to formalize what a control module was doing.

Kaelbling and Rosenschein 1991: In the control rule “if condition c is satisfied then do action a”, the action a is the action that leads to the goal from any state where the condition c is satisfied.

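[Figure: example transition graph over states b, c, d, f, g, h, with edges labeled by agent actions a and a’ and environment action e]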

Forward Search: If we use minimal paths or minimal cost paths we might pick a’; then we would have to backtrack.

Backward Search: Should we include both d and f?

Propositional Encoding of solutions

Input: An input I consists of a system A = (S, A, Φ, poss), a set E of goal states, a set S of initial states (both subsets of the system's states), a set Aag ⊆ A, a function exo, and an integer k ≥ 0.

Output: A control K such that S is k-maintainable with respect to E (using the control K), if such a control exists. Otherwise the output is ‘NO’.

AIM: Given input I, construct sat(I) in PTIME such that
- sat(I) is satisfiable if and only if the input I allows for a k-maintainable control,
- satisfying assignments for sat(I) encode possible such controls, and
- sat(I) is polynomially solvable.

Propositional encoding: notation

s_i denotes that there is a path from state s to some state in E using only agent actions and at most i of them (to which we refer as “there is an a-path from s to E of length at most i”).

The encoding sat(I)

(0) For all states s and for all j, 0 ≤ j < k: s_j ⇒ s_{j+1}

(1) For all initial states s in E: s_0

(2) For all states s, t such that Φ(s,a) = t for some action a ∈ exo(s): s_k ⇒ t_k

(3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ⋁_{t ∈ PS(s)} t_{i-1},

where PS(s) = {t ∈ S | ∃a ∈ Aag ∩ poss(s): t = Φ(s,a)}

(4) For all initial states s not in E: s_k

(5) For all states s not in E: ¬s_0

Constructing policies from the models of sat(I)

Let M be a model of sat(I).

C_M = {s ∈ S | M ⊨ s_k}

L_M(s): the smallest index j such that M ⊨ s_j (i.e., s_0, s_1, …, s_{j-1} are false and s_j is true)

K(s) is defined iff s ∈ C_M \ E, and

K(s) ∈ {a ∈ Aag | Φ(s,a) = t, t ∈ C_M, L_M(t) < L_M(s)}

Proposition: Let I consist of a system A = (S, A, Φ, poss), where Φ is deterministic, a set Aag ⊆ A, a set E of states and a set S of initial states, an exogenous function exo, and an integer k. Then,

(i) S is k-maintainable w.r.t. E iff sat(I) is satisfiable.

(ii) Given any model M of sat(I), any control K constructed by the procedure above k-maintains S w.r.t. E.

Reverse Encoding

Writing x’ for ¬x:

a ⇒ b
is equivalent to ¬a ∨ b
is equivalent to ¬(¬b) ∨ ¬a
is equivalent to ¬b ⇒ ¬a
is equivalent to b’ ⇒ a’
is equivalent to a’ ⇐ b’

Rearranging sat(I) to Horn

(0) For all states s and for all j, 0 ≤ j < k: s_j ⇒ s_{j+1} becomes s’_j ⇐ s’_{j+1}

(1) For all initial states s in E: s_0 becomes ¬s’_0

(2) For all states s, t such that Φ(s,a) = t for some action a ∈ exo(s): s_k ⇒ t_k becomes s’_k ⇐ t’_k

(3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ⋁_{t ∈ PS(s)} t_{i-1} becomes s’_i ⇐ ⋀_{t ∈ PS(s)} t’_{i-1},

where PS(s) = {t ∈ S | ∃a ∈ Aag ∩ poss(s): t = Φ(s,a)}

(4) For all initial states s not in E: s_k becomes ¬s’_k

(5) For all states s not in E: ¬s_0 becomes s’_0

[Figure: example transition graph over states b, c, d, f, g, h, with edges labeled by agent actions a and a’ and environment action e]

(6) b’_0, c’_0, d’_0, f’_0, g’_0 (From 5)
(7) g’_1, g’_2, g’_3 (From 3)
(8) b’_1, c’_1 (From 6 and 3)
(9) f’_3 (From 7 and 2)
(10) f’_2 (From 9 and 0)
(11) f’_1 (From 10 and 0)
(12) b’_2 (From 8, 11, and 3)

Thus M = {f’_3, f’_2, f’_1, f’_0, g’_3, g’_2, g’_1, g’_0, b’_2, b’_1, b’_0, c’_1, c’_0, d’_0}

L_M(b) = 3, L_M(c) = 2, L_M(d) = 1
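The derivation can be replayed with a naive fixpoint over the Horn clauses (0), (2), (3), and (5), followed by the consistency checks (1) and (4). This is only a sketch: it is not the linear-time procedure, the edge lists are an assumption reconstructed from the derivation itself, the instance S = {b}, E = {h}, k = 3 is taken from the earlier goal slide, and all function and variable names are hypothetical. The pair `(s, i)` stands for the atom s’_i.

```python
STATES = ["b", "c", "d", "f", "g", "h"]
# Assumed deterministic edges: (state, action) -> successor.
AGENT = {("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
         ("d", "a"): "h", ("f", "a"): "h"}
EXO = {("f", "e"): "g"}


def least_model(states, agent, exo, init, goal, k):
    """Least model of the Horn theory, or None if (1) or (4) is violated."""
    true = {(s, 0) for s in states if s not in goal}            # (5)
    while True:
        new = set()
        for s in states:
            for j in range(k):                                  # (0)
                if (s, j + 1) in true:
                    new.add((s, j))
            if s in goal:
                continue
            succ = [t for (q, _), t in agent.items() if q == s]
            for i in range(1, k + 1):                           # (3)
                if all((t, i - 1) in true for t in succ):
                    new.add((s, i))
        for (s, _), t in exo.items():                           # (2)
            if (t, k) in true:
                new.add((s, k))
        if new <= true:
            break
        true |= new
    bad = any((s, 0) in true for s in init & goal)              # (1)
    bad = bad or any((s, k) in true for s in init - goal)       # (4)
    return None if bad else true


def level(M, s, k):
    """L_M(s): smallest j such that s'_j is not in M. Requires s in C_M."""
    return next(j for j in range(k + 1) if (s, j) not in M)


def control(M, states, agent, goal, k):
    """Pick, for each s in C_M minus E, an action that lowers the level."""
    C = {s for s in states if (s, k) not in M}
    K = {}
    for s in sorted(C - goal):
        for (q, a), t in sorted(agent.items()):
            if q == s and t in C and level(M, t, k) < level(M, s, k):
                K[s] = a
                break
    return K


M = least_model(STATES, AGENT, EXO, {"b"}, {"h"}, 3)
print(sorted(M))
print(control(M, STATES, AGENT, {"h"}, 3))
```

Under the assumed edges, the computed set contains the same 14 atoms as M above, and the extracted control does a in b, c, and d.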

Big picture of the algorithm: summary

Initialization: handle states not in E using (5), and states with no agent transitions using (3), to compute the s’_i.

Backward reasoning from there using (2) and (3), and downward propagation using (0).

Use (1) and (4) for inconsistency detection.

Computation of LM (s).

Use LM (s) to compute the control K(s).

Polynomial time generation of control policy and maximal control policy

Horn satisfiability is a well-known polynomial-time solvable problem.

Theorem: Under deterministic state transitions, problem k-MAINTAIN is solvable in polynomial time.

‘Maximal Control’:
- Each satisfiable Horn theory T has a least model, M_T, which is given by the intersection of all its models.
- M_T is computable in time linear in the size of the encoding.
- M_T leads to a maximal control, in the sense that it works on a greatest set S’ of states w.r.t. E such that S is a subset of S’; i.e., it is robust with respect to increasing S.

Dealing with non-deterministic transition functions

Notation: s_a_i, i > 0, will denote that there is an a-path from s to E of length at most i starting with action a.

The encoding sat'(I) again has groups (0)-(5) of clauses, as follows:

(0), (1), (4) and (5) are the same as in sat(I).

(2) For any states s and t such that t ∈ Φ(s,a) for some action a ∈ exo(s): s_k ⇒ t_k

Dealing with non-deterministic transition functions (cont.)

(3) For every state s not in E and for all i, 1 ≤ i ≤ k:

(3.1) s_i ⇒ ⋁_{a ∈ Aag ∩ poss(s)} s_a_i;

(3.2) for every a ∈ Aag ∩ poss(s) and t ∈ Φ(s,a): s_a_i ⇒ t_{i-1};

(3.3) for every a ∈ Aag ∩ poss(s), if i < k: s_a_i ⇒ s_a_{i+1}.

Leading to a Horn theory!
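To see why these clauses give a Horn theory, each implication can be reversed by contraposition (x’ standing for ¬x), which leaves one positive atom per clause. The sketch below computes the least model of the reversed clauses by naive iteration. The four-state system is a made-up illustration, not taken from the slides, and every name in the code is hypothetical. Atoms `("s", s, i)` stand for s’_i and `("sa", s, a, i)` for s_a’_i.

```python
def least_model_nd(states, phi, agent, exo, init, goal, k):
    """Least model of the reversed sat'(I), or None if (1) or (4) fails.

    phi: (state, action) -> set of successors; exo: state -> set of actions.
    """
    def agent_acts(s):
        return sorted(a for (q, a) in phi if q == s and a in agent)

    true = {("s", s, 0) for s in states if s not in goal}        # (5)
    while True:
        new = set()
        for s in states:
            for j in range(k):                                   # (0)
                if ("s", s, j + 1) in true:
                    new.add(("s", s, j))
            for a in exo.get(s, set()):                          # (2)
                if any(("s", t, k) in true for t in phi.get((s, a), set())):
                    new.add(("s", s, k))
            if s in goal:
                continue
            for i in range(1, k + 1):
                for a in agent_acts(s):
                    # (3.2) reversed: t'_{i-1} implies s_a'_i
                    if any(("s", t, i - 1) in true for t in phi[(s, a)]):
                        new.add(("sa", s, a, i))
                    # (3.3) reversed: s_a'_{i+1} implies s_a'_i
                    if i < k and ("sa", s, a, i + 1) in true:
                        new.add(("sa", s, a, i))
                # (3.1) reversed: all s_a'_i together imply s'_i
                if all(("sa", s, a, i) in true for a in agent_acts(s)):
                    new.add(("s", s, i))
        if new <= true:
            break
        true |= new
    bad = any(("s", s, 0) in true for s in init & goal)          # (1)
    bad = bad or any(("s", s, k) in true for s in init - goal)   # (4)
    return None if bad else true


# Hypothetical system: action a from x may land in y or z; action r may land
# in z or in the dead end w. The goal is z.
PHI = {("x", "a"): {"y", "z"}, ("x", "r"): {"z", "w"}, ("y", "a"): {"z"}}
M = least_model_nd(["x", "y", "z", "w"], PHI, {"a", "r"}, {}, {"x"}, {"z"}, 2)
print(M is not None)
print(("sa", "x", "a", 2) in M, ("sa", "x", "r", 2) in M)
```

In this made-up instance the atom for action r at x is derived for i = 2 while the one for action a is not, reflecting that r may end in the dead end w whereas every outcome of a reaches z within two steps.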

Direct algorithm using counters

Idea: c[s] = i means that s’_0 … s’_i are true, and c[s_a] = i means that s_a’_0 … s_a’_i are true.

Initialization
- For all states s not in E, make s’_0 true: c[s] := 0.
- For all states s not in E without any outgoing edges labeled with agent actions, make s’_0 … s’_k true: c[s] := k.
- For all states s, if agent action a is not executable in s, make s_a’_0 … s_a’_k true: c[s_a] := k.

The other steps are similar.

The idea can then be extended to actions with durations (or costs).

Computational Complexity

k-maintainability is PTIME-complete (under log-space reductions).

PTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.

k-maintainability is EXPTIME-complete when we have a compact representation (e.g., STRIPS-like).

EXPTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.

Conclusion

k-maintainability is an important notion.

Most specifications over infinite trajectories would be better off with k-maintainability-like notions as part of the specification.

Role 1 of k: length of the window of opportunity.
Role 2 of k: bound within which maintenance is guaranteed.

k-maintainability is related to Dijkstra's notion of self-stabilization.
- There is a big research community working on self-stabilization in distributed control and fault tolerance.
- But they have not focused much on automatic generation of control (protocol, in their parlance).
- They have focused more on proving the correctness of handwritten protocols.

SAT encoding to Horn logic program encoding: an interesting and fruitful approach to designing a polynomial algorithm.
- One does not often think in terms of negative propositions.

We have a prototype implementation using DLV.

THANK YOU!
