Making Simple Decisions
Chapter 16
Topics
• Decision making under uncertainty
– Expected utility
– Utility theory and rationality
– Utility functions
– Decision networks
– Value of information
Uncertain Outcome of Actions
• Some actions may have uncertain outcomes
– Action: spend $10 to buy a lottery ticket that pays $10,000 to the winner
– Outcome: {win, not-win}
• Each outcome is associated with some merit (utility)
– Win: gain $9,990
– Not-win: lose $10
• There is a probability distribution over the outcomes of this action: (0.0001, 0.9999) for (win, not-win).
• Should I take this action?
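One way to approach the question is to compute the probability-weighted average gain. The sketch below assumes utility is simply the monetary gain or loss stated on the slide; the variable names are illustrative.

```python
# Net monetary outcomes and their probabilities, as given on the slide.
# Assumption: utility equals money gained or lost.
outcomes = {"win": 9990.0, "not-win": -10.0}
probabilities = {"win": 0.0001, "not-win": 0.9999}

# Probability-weighted average gain of buying the ticket.
expected_gain = sum(probabilities[o] * outcomes[o] for o in outcomes)
print(round(expected_gain, 2))  # -9.0
```

Under that assumption the ticket loses $9 on average, so an agent that values money linearly would not buy it.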
Expected Utility
• Random variable X with n values x_1,…,x_n and distribution (p_1,…,p_n)
– X is the outcome of performing action A (i.e., the state reached after A is taken)
• Function U of X
– U is a mapping from states to numerical utilities (values)
• The expected utility of performing action A is
EU(A) = ∑_{i=1,…,n} p(x_i | A) U(x_i)
– p(x_i | A): probability of each outcome
– U(x_i): utility of each outcome
One State/One Action Example
[Figure: from state s0, action A1 leads to one of the states s1, s2, s3]
• Outcome probabilities: 0.2, 0.7, 0.1
• Corresponding utilities: 100, 50, 70
• EU(A1) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62
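The computation of EU(A1) can be sketched in a few lines. The function name `expected_utility` and the (probability, utility) pair representation are illustrative choices, not part of the slides.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Action A1: probabilities 0.2, 0.7, 0.1 with utilities 100, 50, 70.
a1 = [(0.2, 100), (0.7, 50), (0.1, 70)]
print(round(expected_utility(a1), 6))  # 62.0
```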
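One State/Two Actions Example
[Figure: state s0 with two actions. A1 is as before (probabilities 0.2, 0.7, 0.1; utilities 100, 50, 70). A2 has outcome probabilities 0.2 and 0.8, and the new state s4 has utility 80.]
• EU(A1) = 62
• EU(A2) = 74
– The value 74 is consistent with A2 reaching the utility-50 state with probability 0.2 and s4 with probability 0.8: 50 x 0.2 + 80 x 0.8 = 10 + 64 = 74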
MEU Principle
• Decision theory: A rational agent should choose the action that maximizes the agent’s expected utility
• Maximizing expected utility (MEU) is a normative criterion for rational choices of actions
• Must have complete model of:
– Actions
– Utilities
– States
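As a minimal sketch of the MEU principle, the snippet below picks the better of the two actions from the earlier example. The outcome list for A2 is an assumption inferred from EU(A2) = 74 (probability 0.2 of the utility-50 state, probability 0.8 of the utility-80 state); all names are illustrative.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

actions = {
    "A1": [(0.2, 100), (0.7, 50), (0.1, 70)],
    # Assumed split for A2, chosen to match EU(A2) = 74 from the slide.
    "A2": [(0.2, 50), (0.8, 80)],
}

# MEU: evaluate every action, then choose the one with the largest value.
eu = {name: expected_utility(outs) for name, outs in actions.items()}
best = max(eu, key=eu.get)
print(best)  # A2
```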
Decision networks
• Extend Bayesian nets to handle actions and utilities
– a.k.a. influence diagrams
• Make use of Bayesian net inference
• Useful application: Value of Information
Decision network representation
• Chance nodes: random variables, as in Bayesian nets
• Decision nodes: actions that decision maker can take
• Utility/value nodes: the utility of the outcome state.
Airport example
Airport example II
Evaluating decision networks
• Set the evidence variables for the current state.
• For each possible value of the decision node (assume there is just one decision node):
– Set the decision node to that value.
– Calculate the posterior probabilities for the parent nodes of the utility node, using BN inference.
– Calculate the resulting expected utility for the action.
• Return the action with the highest expected utility.
Exercise: Umbrella network
[Figure: decision network with nodes Weather, Forecast, Umbrella (take/don’t take), Lug umbrella, and Happiness]
• Weather: P(rain) = 0.4
• Forecast, p(f|w):
f      w        p(f|w)
sunny  rain     0.3
rainy  rain     0.7
sunny  no rain  0.8
rainy  no rain  0.2
• Lug umbrella: P(lug|take) = 1.0, P(~lug|~take) = 1.0
• Happiness (utility):
U(lug, rain) = -25
U(lug, ~rain) = 0
U(~lug, rain) = -100
U(~lug, ~rain) = 100
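A possible way to work the exercise when no forecast has been observed is to apply the evaluation procedure directly: set the decision, compute the distribution over the utility node's parents, and take the expected utility. The sketch below assumes the deterministic Lug umbrella table lets us treat lug as equal to take; the names `UTILITY` and `expected_utility` are illustrative.

```python
P_RAIN = 0.4  # prior from the slide

# U(lug, rain) from the slide, keyed by (lug, rain).
UTILITY = {
    (True, True): -25, (True, False): 0,
    (False, True): -100, (False, False): 100,
}

def expected_utility(take, p_rain=P_RAIN):
    """Expected utility of a decision given the probability of rain."""
    # P(lug | take) = 1 and P(~lug | ~take) = 1, so lug equals take.
    lug = take
    return p_rain * UTILITY[(lug, True)] + (1 - p_rain) * UTILITY[(lug, False)]

# Try each value of the decision node and keep the best one.
eu = {take: expected_utility(take) for take in (True, False)}
best_take = max(eu, key=eu.get)
print(best_take)  # False
```

With these numbers EU(take) = -10 and EU(don't take) = 20, so without a forecast the umbrella stays at home.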
Value of Perfect Information (VPI)
• How much is it worth to observe (with certainty) a random variable X?
• Suppose the agent’s current knowledge is E. The value of the current best action α is:
EU(α | E) = max_A ∑_i U(Result_i(A)) p(Result_i(A) | E, Do(A))
• The value of the new best action α’ after observing the value of X is:
EU(α’ | E, X) = max_A ∑_i U(Result_i(A)) p(Result_i(A) | E, X, Do(A))
• …But we don’t know the value of X yet, so we have to sum over its possible values
• The value of perfect information for X is therefore:
VPI(X) = ( ∑_k p(x_k | E) EU(α_{x_k} | x_k, E) ) - EU(α | E)
– p(x_k | E): probability of each value of X
– EU(α_{x_k} | x_k, E): expected utility of the best action given that value of X
– EU(α | E): expected utility of the best action if we don’t know X (i.e., currently)
VPI exercise: Umbrella network
What’s the value of knowing the weather forecast before leaving home?
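One way to answer this is to apply the VPI definition to the Forecast variable, using the umbrella network's tables. The sketch below is an illustration under the assumption that lug equals take (the Lug umbrella table is deterministic); the function and variable names are not from the slides.

```python
P_RAIN = 0.4
# p(f | w) from the slide, keyed by (forecast, rain).
P_FORECAST = {
    ("sunny", True): 0.3, ("rainy", True): 0.7,
    ("sunny", False): 0.8, ("rainy", False): 0.2,
}
# U(lug, rain) from the slide, keyed by (lug, rain).
UTILITY = {
    (True, True): -25, (True, False): 0,
    (False, True): -100, (False, False): 100,
}

def best_eu(p_rain):
    """Expected utility of the best decision when P(rain) = p_rain."""
    return max(
        p_rain * UTILITY[(lug, True)] + (1 - p_rain) * UTILITY[(lug, False)]
        for lug in (True, False)
    )

def vpi_forecast():
    """VPI(Forecast) = sum_f P(f) * best EU given f, minus best EU now."""
    value_with_forecast = 0.0
    for f in ("sunny", "rainy"):
        joint_rain = P_FORECAST[(f, True)] * P_RAIN
        joint_dry = P_FORECAST[(f, False)] * (1 - P_RAIN)
        p_f = joint_rain + joint_dry              # marginal P(f)
        p_rain_given_f = joint_rain / p_f         # Bayes' rule
        value_with_forecast += p_f * best_eu(p_rain_given_f)
    return value_with_forecast - best_eu(P_RAIN)

print(round(vpi_forecast(), 6))  # 9.0
```

Step by step: P(rainy forecast) = 0.4 with P(rain | rainy) = 0.7, where taking the umbrella is best (EU = -17.5); P(sunny forecast) = 0.6 with P(rain | sunny) = 0.2, where leaving it is best (EU = 60). That gives 0.4 x (-17.5) + 0.6 x 60 = 29, against 20 without the forecast, so VPI(Forecast) = 9.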
Information gathering agent
• Using VPI we can design an agent that gathers information (greedily)
function INFORMATION-GATHERING-AGENT(percept) returns an action
  persistent: D, a decision network

  integrate percept into D
  j = the value that maximizes VPI(E_j) / Cost(E_j)   // or VPI(E_j) - Cost(E_j)
  if VPI(E_j) > Cost(E_j)
    return REQUEST(E_j)
  else
    return the best action from D
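The decision step of this agent can be sketched in Python as follows. This is a simplified illustration: it assumes the VPI and cost of each observable variable have already been computed from the decision network and are passed in as dictionaries, and that every cost is positive. The function name, the tuple return format, and the example numbers are all hypothetical.

```python
def information_gathering_step(vpi, cost, best_action):
    """One greedy step: request an observation if it is worth its cost.

    vpi, cost: dicts mapping each observable evidence variable to its
    value of perfect information and its (positive) observation cost.
    best_action: the action with the highest expected utility right now.
    """
    if vpi:
        # Pick the variable with the best value-to-cost ratio.
        j = max(vpi, key=lambda e: vpi[e] / cost[e])
        if vpi[j] > cost[j]:
            return ("REQUEST", j)
    # No observation is worth paying for, so act on current knowledge.
    return ("ACT", best_action)

# Hypothetical numbers for illustration only.
print(information_gathering_step({"Forecast": 9.0}, {"Forecast": 1.0}, "leave umbrella"))
# ('REQUEST', 'Forecast')
```

The agent is greedy (myopic) in that it considers one observation at a time rather than planning a sequence of observations.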