CS 343: Artificial Intelligence
Decision Networks and Value of Perfect Information
Prof. Scott Niekum, The University of Texas at Austin
[These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Decision Networks
[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]
▪ MEU: choose the action which maximizes the expected utility given the evidence
▪ Can directly operationalize this with decision networks
  ▪ Bayes nets with nodes for utility and actions
  ▪ Lets us calculate the expected utility for each action
▪ New node types:
  ▪ Chance nodes (just like BNs)
  ▪ Actions (rectangles, cannot have parents, act as observed evidence)
  ▪ Utility node (diamond, depends on action and chance nodes)
Decision Networks
▪ Action selection
  ▪ Instantiate all evidence
  ▪ Set action node(s) each possible way
  ▪ Calculate posterior for all parents of utility node, given the evidence
  ▪ Calculate expected utility for each action
  ▪ Choose maximizing action
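Decision Networks
[Figure: decision network with chance node Weather, action node Umbrella, and utility node U]

W      P(W)
sun    0.7
rain   0.3

A      W      U(A,W)
leave  sun    100
leave  rain   0
take   sun    20
take   rain   70

Umbrella = leave: $EU(\text{leave}) = \sum_w P(w)\, U(\text{leave}, w) = 0.7 \cdot 100 + 0.3 \cdot 0 = 70$
Umbrella = take: $EU(\text{take}) = \sum_w P(w)\, U(\text{take}, w) = 0.7 \cdot 20 + 0.3 \cdot 70 = 35$
Optimal decision = leave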
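The action-selection steps (set the action each possible way, take the expectation over the utility node's parents, pick the maximizer) can be sketched in Python using the two tables above. This is a minimal sketch, not the course's reference code: the function names `expected_utilities` and `best_action` and the dictionary encoding are assumptions made for illustration.

```python
# Tables copied from the slide; the dict layout is an illustrative assumption.
P_W = {"sun": 0.7, "rain": 0.3}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}
ACTIONS = ("leave", "take")

def expected_utilities(posterior, utility, actions):
    """Return {action: sum_w P(w) * U(action, w)} for each action."""
    return {a: sum(p * utility[(a, w)] for w, p in posterior.items())
            for a in actions}

def best_action(posterior, utility, actions):
    """Return the maximizing action together with its expected utility (the MEU)."""
    eu = expected_utilities(posterior, utility, actions)
    action = max(eu, key=eu.get)
    return action, eu[action]

eu = expected_utilities(P_W, U, ACTIONS)
action, meu = best_action(P_W, U, ACTIONS)
print(eu, action, meu)
```

Passing a posterior such as P(W | F = bad) in place of `P_W` gives the decision under that evidence with no other change.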
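Decisions as Outcome Trees
▪ Almost exactly like expectimax / MDPs
▪ What's changed?
[Figure: outcome tree rooted at the empty evidence set {}. The action (take or leave) leads to a chance node Weather | {} with outcomes sun and rain and leaf utilities U(t,s), U(t,r), U(l,s), U(l,r). Shown beside the decision network with Weather, Umbrella, and U.]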
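To make the expectimax analogy concrete, the sketch below evaluates the umbrella outcome tree bottom-up: a max node over actions, a chance node over Weather | {} under each action, and utilities at the leaves. The tuple-based tree encoding and the names `evaluate` and `weather_node` are assumptions for illustration; the numbers are the P(W) and U(A,W) tables from the umbrella example.

```python
# Numbers from the umbrella example; the tree encoding is an illustrative assumption.
P_W = {"sun": 0.7, "rain": 0.3}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

def evaluate(node):
    """Evaluate a ("leaf" | "chance" | "max", payload) node recursively."""
    kind, payload = node
    if kind == "leaf":
        return payload                      # utility value
    if kind == "chance":
        # payload: list of (probability, child) pairs
        return sum(p * evaluate(child) for p, child in payload)
    if kind == "max":
        # payload: dict mapping action -> child
        return max(evaluate(child) for child in payload.values())
    raise ValueError(f"unknown node kind: {kind}")

def weather_node(action):
    """Chance node over Weather given no evidence, below a chosen action."""
    return ("chance", [(p, ("leaf", U[(action, w)])) for w, p in P_W.items()])

tree = ("max", {a: weather_node(a) for a in ("take", "leave")})
root_value = evaluate(tree)
print(root_value)
```

With evidence, only the probabilities inside `weather_node` change: they become the posterior given the evidence set at the root.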
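Example: Decision Networks
[Figure: decision network with Weather, Forecast = bad (observed), Umbrella, and U]

A      W      U(A,W)
leave  sun    100
leave  rain   0
take   sun    20
take   rain   70

W      P(W|F=bad)
sun    0.34
rain   0.66

Umbrella = leave: $EU(\text{leave} \mid \text{bad}) = \sum_w P(w \mid \text{bad})\, U(\text{leave}, w) = 0.34 \cdot 100 + 0.66 \cdot 0 = 34$
Umbrella = take: $EU(\text{take} \mid \text{bad}) = \sum_w P(w \mid \text{bad})\, U(\text{take}, w) = 0.34 \cdot 20 + 0.66 \cdot 70 = 53$
Optimal decision = take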
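Decisions as Outcome Trees
[Figure: outcome tree rooted at the evidence set {b}. The action (take or leave) leads to a chance node W | {b} with outcomes sun and rain and leaf utilities U(t,s), U(t,r), U(l,s), U(l,r). Shown beside the decision network with Weather, Forecast = bad, Umbrella, and U.]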
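Ghostbusters Decision Network
[Figure: decision network with chance node Ghost Location, sensor nodes Sensor(1,1), Sensor(1,2), Sensor(1,3), …, Sensor(1,n), Sensor(2,1), …, Sensor(m,1), …, Sensor(m,n), action node Bust, and utility node U]
Ghostbusters: Where to measure?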
Value of Information
▪ Idea: compute value of acquiring evidence
  ▪ Can be done directly from decision network
▪ Example: buying oil drilling rights
  ▪ Two blocks A and B, exactly one has oil, worth k
  ▪ You can drill in one location
  ▪ Prior probabilities 0.5 each, and mutually exclusive
  ▪ Drilling in either A or B has EU = k/2, MEU = k/2
▪ Question: what's the value of information of O?
  ▪ Value of knowing which of A or B has oil
  ▪ Value is expected gain in MEU from new info
  ▪ Survey may say "oil in a" or "oil in b," prob 0.5 each
  ▪ If we know OilLoc, MEU is k (either way)
  ▪ Gain in MEU from knowing OilLoc?
  ▪ VPI(OilLoc) = k/2
  ▪ Fair price of information: k/2
[Figure: decision network with chance node OilLoc, action node DrillLoc, and utility node U]

D   O   U
a   a   k
a   b   0
b   a   0
b   b   k

O   P
a   1/2
b   1/2
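The value k/2 for the oil example can be checked directly from the two tables. The short derivation below uses D for the drill location and O for the oil location, as in the tables; writing the empty evidence set as ∅ is a notational assumption.

```latex
\begin{align*}
\mathrm{MEU}(\varnothing)
  &= \max_{d \in \{a, b\}} \sum_{o \in \{a, b\}} P(o)\, U(d, o)
   = \tfrac{1}{2} k + \tfrac{1}{2} \cdot 0 = \tfrac{k}{2} \\
\mathrm{MEU}(O = o)
  &= \max_{d \in \{a, b\}} U(d, o) = k
   \quad \text{for either value of } o \\
\mathrm{VPI}(\mathit{OilLoc})
  &= \Big[ \tfrac{1}{2} k + \tfrac{1}{2} k \Big] - \tfrac{k}{2} = \tfrac{k}{2}
\end{align*}
```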
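VPI Example: Weather
[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]

A      W      U
leave  sun    100
leave  rain   0
take   sun    20
take   rain   70

W      P(W)
sun    0.7
rain   0.3

W      P(W|F=bad)
sun    0.34
rain   0.66

W      P(W|F=good)
sun    0.95
rain   0.05

Forecast distribution
F      P(F)
good   0.59
bad    0.41

MEU with no evidence: $\mathrm{MEU}(\varnothing) = \max_a EU(a) = 70$
MEU if forecast is bad: $\mathrm{MEU}(F = \text{bad}) = \max_a EU(a \mid \text{bad}) = 53$
MEU if forecast is good: $\mathrm{MEU}(F = \text{good}) = \max_a EU(a \mid \text{good}) = 95$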
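Combining these tables gives the value of observing the forecast: weight the MEU under each forecast by P(F) and subtract the MEU with no evidence. The sketch below does that arithmetic; the helper name `meu` and the dictionary layout are assumptions, while every number comes from the tables above.

```python
# All probabilities and utilities are taken from the slide's tables.
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}
ACTIONS = ("leave", "take")
P_W = {"sun": 0.7, "rain": 0.3}
P_F = {"good": 0.59, "bad": 0.41}
P_W_GIVEN_F = {"good": {"sun": 0.95, "rain": 0.05},
               "bad": {"sun": 0.34, "rain": 0.66}}

def meu(posterior):
    """Maximum over actions of the expected utility under `posterior`."""
    return max(sum(p * U[(a, w)] for w, p in posterior.items())
               for a in ACTIONS)

meu_now = meu(P_W)                                             # act without the forecast
meu_after = sum(P_F[f] * meu(P_W_GIVEN_F[f]) for f in P_F)     # see the forecast, then act
vpi_forecast = meu_after - meu_now
print(round(meu_now, 2), round(meu_after, 2), round(vpi_forecast, 2))
```

By hand: 0.59 · 95 + 0.41 · 53 = 77.78, so the forecast is worth about 7.78 units of utility.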
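Value of Information
▪ Assume we have evidence E = e. Value if we act now:
  $\mathrm{MEU}(e) = \max_a \sum_s P(s \mid e)\, U(s, a)$
▪ Assume we see that E' = e'. Value if we act then:
  $\mathrm{MEU}(e, e') = \max_a \sum_s P(s \mid e, e')\, U(s, a)$
▪ BUT E' is a random variable whose value is unknown, so we don't know what e' will be
▪ Expected value if E' is revealed and then we act:
  $\mathrm{MEU}(e, E') = \sum_{e'} P(e' \mid e)\, \mathrm{MEU}(e, e')$
▪ Value of information: how much MEU goes up by revealing E' first then acting, over acting now:
  $\mathrm{VPI}(E' \mid e) = \mathrm{MEU}(e, E') - \mathrm{MEU}(e)$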
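VPI Properties
▪ Nonnegative: $\forall E', e:\ \mathrm{VPI}(E' \mid e) \ge 0$
▪ Nonadditive. Typically (but not always): $\mathrm{VPI}(E_j, E_k \mid e) \ne \mathrm{VPI}(E_j \mid e) + \mathrm{VPI}(E_k \mid e)$
▪ Order-independent: $\mathrm{VPI}(E_j, E_k \mid e) = \mathrm{VPI}(E_j \mid e) + \mathrm{VPI}(E_k \mid e, E_j) = \mathrm{VPI}(E_k \mid e) + \mathrm{VPI}(E_j \mid e, E_k)$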
Quick VPI Questions
▪ The soup of the day is either clam chowder or split pea, but you wouldn't order either one. What's the value of knowing which it is?
▪ There are two kinds of plastic forks at a picnic. One kind is slightly sturdier. What's the value of knowing which?
▪ You're playing the lottery. The prize will be $0 or $100. You can play any number between 1 and 100 (chance of winning is 1%). What is the value of knowing the winning number?
Value of Imperfect Information?
▪ No such thing
▪ Information corresponds to the observation of a node in the decision network
▪ If data is "noisy" that just means we don't observe the original variable, but another variable which is a noisy version of the original one
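One way to see this is to compute the VPI of a noisy report about the oil location exactly as for any other node: form the posterior over OilLoc for each possible report via Bayes' rule, then apply the usual VPI definition. In the sketch below the report variable, its outcome names `says_a` / `says_b`, the accuracy values, and the choice k = 1 are all hypothetical; only the prior and the utility table come from the oil example.

```python
def vpi_of_observation(prior, likelihood, utility, actions):
    """VPI of observing E', where likelihood[s][e] = P(E' = e | state s)."""
    def meu(belief):
        return max(sum(belief[s] * utility[(a, s)] for s in belief)
                   for a in actions)

    observations = {e for s in likelihood for e in likelihood[s]}
    expected_meu = 0.0
    for e in observations:
        # P(e) and the posterior P(s | e) by Bayes' rule.
        p_e = sum(prior[s] * likelihood[s].get(e, 0.0) for s in prior)
        if p_e == 0.0:
            continue
        posterior = {s: prior[s] * likelihood[s].get(e, 0.0) / p_e for s in prior}
        expected_meu += p_e * meu(posterior)
    return expected_meu - meu(prior)

k = 1.0                                   # hypothetical value of the oil
prior = {"a": 0.5, "b": 0.5}              # P(OilLoc) from the slide
utility = {("a", "a"): k, ("a", "b"): 0.0,   # keys are (DrillLoc, OilLoc)
           ("b", "a"): 0.0, ("b", "b"): k}

def report_model(accuracy):
    """Hypothetical noisy report that names the right block with prob. `accuracy`."""
    return {"a": {"says_a": accuracy, "says_b": 1.0 - accuracy},
            "b": {"says_a": 1.0 - accuracy, "says_b": accuracy}}

vpi_perfect = vpi_of_observation(prior, report_model(1.0), utility, ("a", "b"))
vpi_noisy = vpi_of_observation(prior, report_model(0.8), utility, ("a", "b"))
vpi_useless = vpi_of_observation(prior, report_model(0.5), utility, ("a", "b"))
print(vpi_perfect, vpi_noisy, vpi_useless)
```

Under these assumed accuracies, a perfectly reliable report recovers VPI(OilLoc) = k/2, and the value shrinks to zero as the report becomes uninformative.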
VPI Question
▪ VPI(OilLoc) = k/2
▪ VPI(ScoutingReport)?
▪ VPI(Scout)?
▪ VPI(Scout | ScoutingReport)?
[Figure: decision network with nodes OilLoc, DrillLoc, U, ScoutingReport, and Scout]
POMDPs
▪ MDPs have:
  ▪ States S
  ▪ Actions A
  ▪ Transition function P(s'|s,a) (or T(s,a,s'))
  ▪ Rewards R(s,a,s')
▪ POMDPs add:
  ▪ Observations O
  ▪ Observation function P(o|s) (or O(s,o))
▪ POMDPs are MDPs over belief states b (distributions over S)
[Figure: two expectimax-style trees. MDP: s → a → (s, a) → (s, a, s') → s'. POMDP: b → a → (b, a) → o → b'.]
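Treating a POMDP as an MDP over belief states requires a rule for moving from b to b' after taking action a and observing o. The standard Bayes-filter update is b'(s') ∝ P(o | s') Σ_s P(s' | s, a) b(s); the slides do not spell it out, so the sketch below is an assumed formulation, and the two-state "listen" model used to exercise it is entirely hypothetical.

```python
def belief_update(belief, action, observation, T, O):
    """Return b' given b, a, o.

    T[(s, a)] maps s' -> P(s' | s, a); O[s'] maps o -> P(o | s').
    """
    unnormalized = {}
    for s_next in O:
        # Prediction step: probability of landing in s_next under the current belief.
        predicted = sum(belief[s] * T[(s, action)].get(s_next, 0.0) for s in belief)
        # Correction step: weight by how well s_next explains the observation.
        unnormalized[s_next] = O[s_next].get(observation, 0.0) * predicted
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / z for s, v in unnormalized.items()}

# Hypothetical toy model: the state never changes, the sensor is right 85% of the time.
STATES = ("left", "right")
T = {(s, "listen"): {s: 1.0} for s in STATES}
O = {"left": {"hear_left": 0.85, "hear_right": 0.15},
     "right": {"hear_left": 0.15, "hear_right": 0.85}}

b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "listen", "hear_left", T, O)
b2 = belief_update(b1, "listen", "hear_left", T, O)
print(b1, b2)
```

Each observation sharpens the belief, which is why the tree over belief states branches on o rather than on the hidden state.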
Example: Ghostbusters
▪ In (static) Ghostbusters:
  ▪ Belief state determined by evidence to date {e}
  ▪ Tree really over evidence sets
  ▪ Probabilistic reasoning needed to predict what new evidence will be gained, given past evidence and the action taken
▪ Solving POMDPs
  ▪ One way: use truncated expectimax to compute approximate value of actions
  ▪ What if you only considered busting or one sense followed by a bust?
  ▪ You get a VPI-based agent!
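[Figure: tree over evidence sets, {e} → a → (e, a) → e' → {e, e'}, shown beside the belief-state tree b → a → (b, a) → b'. A second, truncated tree shows the two options from {e}: a_bust with value U(a_bust, {e}), or a_sense leading to ({e}, a_sense), then e', then {e, e'}, then a_bust with value U(a_bust, {e, e'}).]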
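The truncated tree (bust now, or sense once and then bust) can be sketched as a small agent that compares the two options under the current belief. Everything numeric here is hypothetical: a three-cell board, a reward of 100 for busting the right cell, a sensing cost of 1, and a sensor that reads "hot" with probability 0.9 at the ghost's cell and 0.1 elsewhere. The function names are likewise illustrative, not the course's Ghostbusters code.

```python
REWARD = 100.0            # hypothetical payoff for busting the correct cell
SENSE_COST = 1.0          # hypothetical cost of one sensor reading
P_HOT_IF_HERE = 0.9       # hypothetical P(hot | ghost in sensed cell)
P_HOT_IF_ELSEWHERE = 0.1  # hypothetical P(hot | ghost in another cell)

def bust_value(belief):
    """U(a_bust, {e}): bust the most likely cell."""
    return REWARD * max(belief.values())

def reading_likelihood(reading, sensed_cell, ghost_cell):
    p_hot = P_HOT_IF_HERE if sensed_cell == ghost_cell else P_HOT_IF_ELSEWHERE
    return p_hot if reading == "hot" else 1.0 - p_hot

def sense_then_bust_value(belief, sensed_cell):
    """Expected U(a_bust, {e, e'}) over readings e', minus the sensing cost."""
    total = 0.0
    for reading in ("hot", "cold"):
        joint = {g: belief[g] * reading_likelihood(reading, sensed_cell, g)
                 for g in belief}
        p_reading = sum(joint.values())
        if p_reading > 0.0:
            posterior = {g: p / p_reading for g, p in joint.items()}
            total += p_reading * bust_value(posterior)
    return total - SENSE_COST

def choose_action(belief):
    """Pick the better of busting now or one sense followed by a bust."""
    options = {("bust", max(belief, key=belief.get)): bust_value(belief)}
    for cell in belief:
        options[("sense", cell)] = sense_then_bust_value(belief, cell)
    best = max(options, key=options.get)
    return best, options[best]

uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
peaked = {0: 0.98, 1: 0.01, 2: 0.01}
print(choose_action(uniform))
print(choose_action(peaked))
```

Under these assumed numbers the agent senses when the belief is uniform, because the expected gain in MEU exceeds the sensing cost, and busts immediately when the belief is already sharply peaked.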
More Generally
▪ General solutions map belief functions to actions
  ▪ Can divide regions of belief space (set of belief functions) into policy regions (gets complex quickly)
  ▪ Can build approximate policies using discretization methods
  ▪ Can factor belief functions in various ways
▪ Overall, POMDPs are very (actually PSPACE-) hard
▪ Most real problems are POMDPs, but we can rarely solve them in general!