
Page 1:

Decision Making in Robots and Autonomous Agents

Preferences and Elicitation
(Based on material from C. Boutilier et al.)

Subramanian Ramamoorthy
School of Informatics

22 March, 2013

Page 2:

Preference Elicitation in AI

Shopping for a Car:
Luggage Capacity? Two Door? Cost? Engine Size? Color? Options?

Page 3:

[Figure (Source: R. Brafman): a decision-support loop. Inputs such as intelligence, readiness, objective conditions, political situation, budget limitations, campaign-level goals, experience, and beliefs feed the user's strategic and tactical intentions and preferences; the system evaluates outcomes of alternative decisions, proposes preferable courses of action, refines beliefs and preferences through incremental questioning, monitors and filters information, and adapts its presentation, leading to a better decision.]

Page 4:

The Preference Bottleneck

• Preference elicitation: the process of determining a user’s preferences/utilities to the extent necessary to make a decision on her behalf

• Why a bottleneck?
  – preferences vary widely
  – large (multiattribute) outcome spaces
  – quantitative utilities (the “numbers”) difficult to assess

Page 5:

[Figure: a multiattribute outcome space, enumerating combinations of Jacket, Shirt, and Trousers colors (black, white, red) to illustrate how outcomes explode combinatorially.]

Page 6:

[Figure: the same outcomes ranked on a scale from worst to best.]

Page 7:

Automated Preference Elicitation

• The interesting questions:
  – decomposition of preferences
  – what preference info is relevant to the task at hand?
  – when is the elicitation effort worth the improvement it offers in terms of decision quality?
  – what decision criterion to use given partial utility info?

Page 8:

Overview

• Preference Elicitation in A.I.

– Constraint-based Optimization

– Factored Utility Models

– Types of Uncertainty

– Types of Queries

• Single User Elicitation

Page 9:

Constraint-based Decision Problems

• Constraint-based optimization (CBO):
  – outcomes over variables X = {X1 … Xn}
  – constraints C over X spell out feasible decisions
  – generally compact structure, e.g., X1 ∧ X2 → ¬X3
  – add a utility function u: Dom(X) → R (preferences over configurations)

Page 10:

Constraint-based Decision Problems

• Must express u compactly, like C
  – generalized additive independence (GAI)
    • model proposed by Fishburn (1967)
    • nice generalization of additive linear models
  – given by a graphical model capturing independence

Page 11:

Factored Utilities: GAI Models

• Set of K factors fk over subsets of variables X[k]
  – a “local” utility for each local configuration

• [Fishburn67] u of this form exists iff
  – lotteries p and q are equally preferred whenever p and q have the same marginals over each X[k]

  u(x) = Σk∈K fk(x[k])

Example: f1(A): a: 3, ā: 1;  f2(B): b: 3, b̄: 1;  f3(BC): bc: 12, b̄c: 2, …

u(abc) = f1(a) + f2(b) + f3(bc)
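To make the factored form concrete, here is a minimal Python sketch (my own illustration, not from the slides; the values for f3's unlisted configurations are invented):

    factors = [
        (("A",), {("a",): 3.0, ("~a",): 1.0}),              # f1(A)
        (("B",), {("b",): 3.0, ("~b",): 1.0}),              # f2(B)
        (("B", "C"), {("b", "c"): 12.0, ("~b", "c"): 2.0,
                      ("b", "~c"): 5.0, ("~b", "~c"): 0.0}),  # f3(BC)
    ]

    def gai_utility(outcome, factors):
        # u(x) = sum of each factor's local value on its projection x[k]
        return sum(table[tuple(outcome[v] for v in scope)]
                   for scope, table in factors)

    print(gai_utility({"A": "a", "B": "b", "C": "c"}, factors))  # 3 + 3 + 12 = 18.0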

Page 12:

Optimization with GAI Models

• Optimize using a simple integer program (IP)
  – number of variables linear in size of GAI model

  max Σk Σx[k]∈Dom(X[k]) Ix[k] · fk(x[k])   subject to   I ∈ CAI ∪ C

where each indicator Ix[k] ∈ {0,1} selects one local configuration per factor, CAI are consistency constraints forcing the selected local configurations to agree on shared variables, and C are the feasibility constraints.
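A hedged sketch of one standard 0/1 encoding (my own illustration using the open-source PuLP solver; the exact MIP in the underlying papers may differ):

    import pulp

    domains = {"A": ["a", "~a"], "B": ["b", "~b"], "C": ["c", "~c"]}
    factors = [
        (("A",), {("a",): 3.0, ("~a",): 1.0}),
        (("B",), {("b",): 3.0, ("~b",): 1.0}),
        (("B", "C"), {("b", "c"): 12.0, ("~b", "c"): 2.0,
                      ("b", "~c"): 5.0, ("~b", "~c"): 0.0}),
    ]

    prob = pulp.LpProblem("gai_opt", pulp.LpMaximize)
    # J: one binary per variable-value pair; I: one per local configuration.
    J = {(v, val): pulp.LpVariable(f"J_{v}_{val.replace('~', 'n')}", cat="Binary")
         for v, vals in domains.items() for val in vals}
    I = {}
    for k, (scope, table) in enumerate(factors):
        for i, cfg in enumerate(table):
            I[(k, cfg)] = pulp.LpVariable(f"I_{k}_{i}", cat="Binary")

    # Objective: sum of the selected local utilities f_k(x[k]).
    prob += pulp.lpSum(table[cfg] * I[(k, cfg)]
                       for k, (scope, table) in enumerate(factors)
                       for cfg in table)

    # Each variable takes one value; each factor selects one configuration.
    for v, vals in domains.items():
        prob += pulp.lpSum(J[(v, val)] for val in vals) == 1
    for k, (scope, table) in enumerate(factors):
        prob += pulp.lpSum(I[(k, cfg)] for cfg in table) == 1
        for cfg in table:                       # consistency constraints C_AI
            for v, val in zip(scope, cfg):
                prob += I[(k, cfg)] <= J[(v, val)]

    # Feasibility constraints C would be added here in the same way.
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print({v: val for (v, val), var in J.items() if var.value() == 1},
          pulp.value(prob.objective))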

Page 13:

Difficulties in CBO

• Utility elicitation: how do we assess individual user preferences?
  – need to elicit GAI model structure (independence)
  – need to elicit (constraints on) GAI parameters
  – need to make decisions with imprecise parameters

Page 14:

Strict Utility Function Uncertainty

• User’s actual utility u unknown
• Assume feasible set F ⊆ U = [0,1]^n
  – allows for unquantified or “strict” uncertainty
  – e.g., F a set of linear constraints on GAI terms
• How should one make a decision? elicit information?

Example constraints:
u(red, 2door, 280hp) > 0.4
u(red, 2door, 280hp) > u(blue, 2door, 280hp)

Page 15:

Strict Uncertainty Representation

[Figure: a GAI model over variables L, N, P with interval (“strict”) bounds on the local factors, e.g., f1(L): l: [7,11], l̄: [2,5]; f2(L,N): l,n: [2,4], l̄,n̄: [1,2].]

Page 16:

Bayesian Utility Function Uncertainty

• User’s actual utility u unknown
• Assume density P over U = [0,1]^n

• Given belief state P, the EU of decision x is:

  EU(x, P) = ∫U (px · u) P(u) du

• Decision making is easy, but elicitation harder?
  – must assess expected value of information of a query
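A small illustrative sketch (my own toy example, with a single utility parameter drawn from a uniform prior) of estimating EU(x, P) by Monte Carlo:

    import random

    def expected_utility(x, sample_prior, n_samples=10000):
        # Average u(x) over utility functions u drawn from the prior P.
        return sum(sample_prior()(x) for _ in range(n_samples)) / n_samples

    def sample_prior():
        theta = random.random()                  # theta ~ Uniform[0, 1]
        return lambda x: theta * x["quality"]    # a made-up one-parameter utility

    print(expected_utility({"quality": 0.8}, sample_prior))  # ~ 0.4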

Page 17:

Bayesian Representation

[Figure: the same GAI model over L, N, P, now with a probability density over each local factor’s value instead of an interval.]

Page 18:

Query Types

• Comparison queries (is x preferred to x′ ?)
  – impose linear constraints on parameters:
    Σk fk(x[k]) > Σk fk(x′[k])
  – interpretation is straightforward

Page 19:

Query Types

• Bound queries (is fk(x[k]) > v ?)
  – response tightens the bound on a specific utility parameter
  – can be phrased as a local standard gamble query
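A hedged sketch of how bound-query responses tighten the feasible set F (parameter names are illustrative; comparison queries add linear constraints across parameters, which simple intervals cannot capture):

    bounds = {"f_color_red": [0.0, 1.0], "f_color_blue": [0.0, 1.0]}  # [lo, hi]

    def answer_bound_query(param, v, response_is_yes):
        # "Is f_k(x[k]) > v?": yes raises the lower bound, no lowers the upper.
        lo, hi = bounds[param]
        bounds[param] = [max(lo, v), hi] if response_is_yes else [lo, min(hi, v)]

    answer_bound_query("f_color_red", 0.4, True)
    answer_bound_query("f_color_red", 0.7, False)
    print(bounds["f_color_red"])  # [0.4, 0.7]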

Page 20:

Overview

• Preference Elicitation in A.I.

• Single User Elicitation

– Foundations of Local queries

– Bayesian Elicitation

– Regret-based Elicitation

Page 21:

Difficulties with Bound Queries

• Bound queries focus on local factors
  – but values cannot be fixed without reference to others!
  – seemingly “different” local prefs can correspond to the same u

u(Color,Doors,Power) = u1(Color,Doors) + u2(Doors,Power)

u(red,2door,280hp) = u1(red,2door) + u2(2door,280hp)

u(red,4door,280hp) = u1(red,4door) + u2(4door,280hp)

[Table: two different assignments of the local values u1 and u2 that induce exactly the same global utilities.]

Page 22:

Local Queries

• We wish to avoid queries on whole outcomes
  – can’t ask about purely local outcomes
  – but can condition on a subset of default values

• Conditioning set C(fi) for factor fi(Xi):
  – variables that share factors with Xi
  – setting default outcomes on C(fi) renders Xi independent of the remaining variables
  – enables local calibration of factor values

Page 23:

Local Standard Gamble Queries

• Local std. gamble queries
  – use “best” and “worst” (anchor) local outcomes, conditioned on default values of the conditioning set
  – bound queries on other parameters relative to these
  – gives a local value function v(x[i]) (e.g., v(ABC))

• Hence we can legitimately ask local queries

• But local value functions are not enough:
  – must calibrate: requires global scaling

Page 24:

Global Scaling

• Elicit utilities of anchor outcomes w.r.t. the global best and worst outcomes
  – the 2·m anchor outcomes: a “best” and a “worst” for each of the m factors
  – these require global std gamble queries (note: the same is true for purely additive models)
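How the calibration fits together (a sketch of the standard local-value construction as I read it from Braziunas and Boutilier; the anchor-utility notation ui⊤, ui⊥ is assumed, not from the slides): local queries pin down a normalized value function vi with vi(best local outcome) = 1 and vi(worst) = 0; global standard gambles then give the true utilities ui⊤ and ui⊥ of the two anchors, and each factor is rescaled as

  ui(x[i]) = ui⊥ + vi(x[i]) · (ui⊤ − ui⊥)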

Page 25:

Bound Query Strategies

• Identify conditioning sets Ci for each factor fi

• Decide on a “default” outcome

• For each fi identify top and bottom anchors
  – e.g., the most and least preferred values of factor i, given default values of Ci

• Queries available:
  – local std gambles: use anchors for each factor, C-sets
  – global std gambles: give bounds on anchor utilities

Page 26:

Overview

• Preference Elicitation in A.I.

• Single User Elicitation

– Foundations of Local queries

– Bayesian Elicitation

– Regret-based Elicitation

Page 27:

Partial Preference Information: Bayesian Uncertainty

• Probability distribution p over utility functions
• Maximize expected (expected) utility:
  MEU decision x* = arg maxx Ep[u(x)]
• Consider:
  – elicitation costs
  – values of possible decisions
  – optimal tradeoffs between elicitation effort and improvement in decision quality

Page 28:

Query Selection

• At each step of the elicitation process, we can
  – obtain more preference information
  – make or recommend a terminal decision

Page 29:

Bayesian Approach: Myopic EVOI

[Figure: one-step lookahead tree. From the current belief p with value MEU(p), each candidate query q1, q2, … branches on its possible responses r1,1, r1,2, r2,1, r2,2, …, leading to posterior beliefs with values MEU(p1,1), MEU(p1,2), MEU(p2,1), MEU(p2,2).]

Page 30:

Expected value of information

• MEU(p) = Ep[u(x*)]

• Expected posterior utility: EPU(q, p) = Er|q,p[MEU(pr)]

• Expected value of information of query q:
  EVOI(q) = EPU(q, p) − MEU(p)
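A toy sketch of the myopic computation (my own discretized example: two utility hypotheses and a query that fully distinguishes them; a belief is a list of (probability, utility function) pairs):

    def meu(belief, decisions):
        # Value of the best single decision under the current belief.
        return max(sum(w * u(x) for w, u in belief) for x in decisions)

    def evoi(query, belief, decisions):
        # query: response -> (P(response), posterior belief)
        epu = sum(p_r * meu(post, decisions) for p_r, post in query.values())
        return epu - meu(belief, decisions)

    decisions = ["x1", "x2"]
    u_hi, u_lo = {"x1": 1.0, "x2": 0.3}.get, {"x1": 0.0, "x2": 0.3}.get
    belief = [(0.5, u_hi), (0.5, u_lo)]
    query = {"yes": (0.5, [(1.0, u_hi)]), "no": (0.5, [(1.0, u_lo)])}
    print(evoi(query, belief, decisions))  # 0.65 - 0.5 = 0.15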


Page 31:

Bayesian Approach: Myopic EVOI

• Ask the query with highest EVOI − cost

• [Chajewska et al. ’00]
  – global standard gamble queries (SGQ): “Is u(oi) > l?”
  – multivariate Gaussian distributions over utilities

• [Braziunas and Boutilier ’05]
  – local SGQ over utility factors
  – mixture-of-uniforms distributions over utilities

Page 32:

Local Elicitation in GAI Models [Braziunas and Boutilier ’05]

• Local elicitation procedure
  – Bayesian uncertainty over local factors
  – myopic EVOI query selection

• Local comparison query: “Is the local value of factor setting xi greater than l?”
  – binary comparison query, requiring a yes/no response
  – the query point l can be optimized analytically

Page 33:

Experiments

• Car rental domain: 378 parameters [Boutilier et al. ’03]
  – 26 variables, 2–9 values each, 13 factors

• 2 strategies
  – Semi-random query
    • query factor and local configuration chosen at random
    • query point set to the mean of the local value function
  – EVOI query
    • search through factors and local configurations
    • query point optimized analytically

Page 34:

Experiments

[Figure: percentage utility error (w.r.t. true max utility, y-axis 0–16) vs. number of queries (x-axis 0–100), comparing mixture-of-uniforms, uniform, and Gaussian priors.]

Page 35:

Overview

• Single User Elicitation

– Foundations of Local queries

– Bayesian Elicitation

– Regret-based Elicitation

• why MiniMax Regret (MMR) ?

• Decision making with MMR

• Elicitation with MMR

Page 36:

Minimax Regret: Utility Uncertainty

• Regret of x w.r.t. u:

  R(x, u) = EU(x*u, u) − EU(x, u),  where x*u = arg maxx′ EU(x′, u)

• Max regret of x w.r.t. F:

  MR(x, F) = maxu∈F R(x, u)

• Decision with minimax regret w.r.t. F:

  x*F = arg minx∈Feas(X) MR(x, F);  MMR(F) = MR(x*F, F)
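For intuition, a brute-force sketch over small finite sets (my own toy example; the IP and constraint-generation machinery on the following slides is what makes this scale):

    def minimax_regret(feasible_x, utility_set):
        # Return (x*, MMR): the decision minimizing worst-case regret over F.
        def max_regret(x):
            return max(max(u(xp) for xp in feasible_x) - u(x)
                       for u in utility_set)
        return min(((x, max_regret(x)) for x in feasible_x), key=lambda t: t[1])

    # Two decisions, two candidate utility functions standing in for F.
    us = [{"a": 10, "b": 8}.get, {"a": 0, "b": 8}.get]
    print(minimax_regret(["a", "b"], us))  # ('b', 2): b guarantees regret <= 2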

Page 37:

Why Minimax Regret?

• Appealing decision criterion for strict uncertainty
  – contrast maximin, etc.
  – not often used for utility uncertainty

[Figure: decisions x and x′ plotted on a “better” axis under candidate utility functions u1–u6; neither decision dominates the other across all of them.]

Page 38:

Why Minimax Regret?

• Minimizes regret in the presence of an adversary
  – provides a bound on worst-case loss
  – robustness in the face of utility function uncertainty

• In contrast to Bayesian methods:
  – useful when priors are not readily available
  – can be more tractable
  – effective elicitation even if priors are available

Page 39:

Overview

• Single User Elicitation

– Foundations of Local queries

– Bayesian Elicitation

– Regret-based Elicitation

• why MiniMax Regret (MMR) ?

• Decision making with MMR

• Elicitation with MMR

Page 40:

Computing Max Regret

• Max regret MR(x, F) computed as an IP
  – number of variables linear in GAI model size
  – number of (precomputed) constants (i.e., local regret terms) quadratic in GAI model size
  – r(x[k], x′[k]) = u⊤(x′[k]) − u⊥(x[k])

  max Σk Σx′[k]∈Dom(X[k]) Ix′[k] · r(x[k], x′[k])   subject to   I ∈ CAI ∪ C

where the indicators Ix′[k] select the adversary’s configuration x′, as in the earlier GAI optimization IP.

Page 41:

Minimax Regret in Graphical Models

• We convert minimax to min (standard trick)
  – obtain a MIP with one constraint per feasible configuration
  – linearly many variables (in utility model size)

• Key question: can we avoid enumerating all x′ ?

Page 42:

Constraint Generation

• Very few constraints will be active in the solution

• Iterative approach (see the sketch below):
  – solve a relaxed IP (using a subset of constraints)
  – solve for the maximally violated constraint
  – if any, add it and repeat; else terminate
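A schematic sketch of that loop (solve_master and find_most_violated are stand-ins for the two MIP calls, not real library functions):

    def constraint_generation(solve_master, find_most_violated, tol=1e-6):
        active = []                                   # current constraint subset
        while True:
            x_star, delta = solve_master(active)      # relaxed minimax MIP
            witness, violation = find_most_violated(x_star, delta)
            if violation <= tol:                      # nothing violated: optimal
                return x_star, delta                  # delta = minimax regret
            active.append(witness)                    # add adversary's response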

Page 43:

Constraint Generation Performance

• Key properties:
  – aim: graphical structure permits practical solution
  – convergence (usually very fast, few constraints)
  – very nice anytime properties
  – considerable scope for approximation
  – produces solution x* as well as witness xw

Page 44:

Overview

• Single User Elicitation

– Foundations of Local queries

– Bayesian Elicitation

– Regret-based Elicitation

• why MiniMax Regret (MMR) ?

• Decision making with MMR

• Elicitation with MMR

Page 45:

Regret-based Elicitation [Boutilier, Patrascu, Poupart, Schuurmans, IJCAI-05; AIJ-06]

• Minimax optimal solution may not be satisfactory
• Improve quality by asking queries
  – new bounds on utility model parameters
• Which queries to ask?
  – what will reduce regret most quickly?
  – myopically? sequentially?
• Closed-form solution seems infeasible
  – to date, people have looked only at heuristic elicitation

Page 46:

Elicitation Strategies I

• Halve Largest Gap (HLG)
  – ask if the parameter with the largest gap is > its midpoint
  – MMR(U) ≤ maxgap(U), hence n·log(maxgap(U)/ε) queries are needed to reduce regret to ε
  – the bound is tight

[Figure: upper and lower bounds on each parameter of f1(a,b) and f2(b,c); HLG bisects the widest interval.]

Page 47:

Elicitation Strategies II

• Current Solution (CS)
  – only ask about parameters of the optimal solution x* or the regret-maximizing witness xw
  – intuition: focus on parameters that contribute to regret
    • reducing the upper bound on xw or increasing the lower bound on x* helps
  – use early stopping to get regret bounds (CS-5sec)

[Figure: the same parameter bounds as above; CS queries only parameters involved in x* or xw.]

Page 48:

Elicitation Strategies III

• Optimistic-Pessimistic (OP)
  – query the largest-gap parameter in one of:
    • the optimistic solution xo
    • the pessimistic solution xp

• Computation (see the sketch below):
  – CS needs minimax optimization
  – OP needs standard optimization
  – HLG needs no optimization

• Termination:
  – CS easy
  – others?
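A hedged sketch of the three selection rules over interval bounds (the parameter names and the params_of bookkeeping are illustrative, not from the paper):

    def hlg_query(bounds):
        # Halve Largest Gap: bisect the widest parameter interval.
        p = max(bounds, key=lambda k: bounds[k][1] - bounds[k][0])
        return p, sum(bounds[p]) / 2

    def cs_query(bounds, params_of):
        # Current Solution: apply HLG only to parameters of x* or witness xw.
        return hlg_query({p: bounds[p] for p in params_of["x*"] | params_of["xw"]})

    def op_query(bounds, params_of):
        # Optimistic-Pessimistic: restrict to parameters of xo or xp.
        return hlg_query({p: bounds[p] for p in params_of["xo"] | params_of["xp"]})

    bounds = {"f1_ab": (0.1, 0.9), "f1_a~b": (0.2, 0.4), "f2_bc": (0.0, 0.5)}
    print(hlg_query(bounds))  # ('f1_ab', 0.5)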

Page 49:

Results (Small Random)

10 vars; < 5 vals

10 factors, at most 3 vars

Avg 45 trials

Page 50:

Results (Car Rental, Unif)

26 vars; 61 billion configs

36 factors, at most 5 vars; 150 parameters

Avg 45 trials

Page 51:

Results (Real Estate, Unif)

20 vars; 47 million configs

29 factors, at most 5 vars; 100 parameters

Avg 45 trials

Page 52:

Results (Large Rand, Unif)

25 vars; < 5 vals

20 factors, at most 3 vars

Avg 45 trials

Page 53:

Summary of Results

• CS works best on test problems
  – time bounds (CS-5): little impact on query quality
  – always know max regret (or a bound) on the solution
  – time bound adjustable (use bounds, not time)

• OP competitive on most problems
  – computationally faster (e.g., 0.1s vs 14s on RealEst)
  – no regret computed, so termination decisions harder

• HLG much less promising

Page 54:

Interpretation

• HLG:
  – provable regret bound reduced very quickly

• But CS and OP:
  – reduce true regret faster (often to optimality)
  – are restricted to feasible decisions
  – CS focuses on the relevant parameters

Page 55:

Concluding Remarks

• Local parameter elicitation
  – theoretically sound
  – computationally practical
  – queries are easier to answer

• Bayesian EVOI / regret-based elicitation
  – good guides for elicitation
  – integrated in computationally tractable algorithms

• Questions for future work:
  – sequential reasoning
