CS 343: Artificial Intelligencesniekum/classes/343-F18/lectures/lectur… · CS 343: Artificial Intelligence Decision Networks and Value of Perfect Information Prof. Scott Niekum

CS 343: Artificial Intelligence
Decision Networks and Value of Perfect Information

Prof. Scott Niekum, The University of Texas at Austin
[These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Decision Networks

[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]

Decision Networks

▪ MEU: choose the action which maximizes the expected utility given the evidence

▪ Can directly operationalize this with decision networks
  ▪ Bayes nets with nodes for utility and actions
  ▪ Lets us calculate the expected utility for each action

▪ New node types:
  ▪ Chance nodes (just like BNs)
  ▪ Actions (rectangles, cannot have parents, act as observed evidence)
  ▪ Utility node (diamond, depends on action and chance nodes)

[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]

Decision Networks

▪ Action selection
  ▪ Instantiate all evidence
  ▪ Set action node(s) each possible way
  ▪ Calculate posterior for all parents of utility node, given the evidence
  ▪ Calculate expected utility for each action
  ▪ Choose maximizing action

[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]
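The action-selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names `expected_utilities` and `choose_action` are hypothetical, and the sketch assumes the posterior over the utility node's chance parent has already been computed by some inference routine. The numbers are the weather prior and umbrella utilities from the example tables in these slides.

```python
def expected_utilities(posterior, utility, actions):
    """Return {action: EU(action)}, where EU(a) = sum_w P(w | e) * U(a, w)."""
    return {
        a: sum(p * utility[(a, w)] for w, p in posterior.items())
        for a in actions
    }


def choose_action(posterior, utility, actions):
    """Return (best_action, MEU) for the given posterior and utility table."""
    eu = expected_utilities(posterior, utility, actions)
    best = max(eu, key=eu.get)  # action with the largest expected utility
    return best, eu[best]


# Weather prior and umbrella utilities from the example tables in these slides.
P_WEATHER = {"sun": 0.7, "rain": 0.3}
UTILITY = {
    ("leave", "sun"): 100, ("leave", "rain"): 0,
    ("take", "sun"): 20, ("take", "rain"): 70,
}

best_action, meu = choose_action(P_WEATHER, UTILITY, ["leave", "take"])
print(best_action, round(meu, 2))
```

With evidence, the same functions apply unchanged: pass the posterior P(W | e) in place of the prior.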

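Decision Networks

[Figure: decision network with nodes Weather, Umbrella, and U]

W     P(W)
sun   0.7
rain  0.3

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

Umbrella = leave: $\mathrm{EU}(\text{leave}) = \sum_w P(w)\, U(\text{leave}, w) = 0.7 \cdot 100 + 0.3 \cdot 0 = 70$

Umbrella = take: $\mathrm{EU}(\text{take}) = \sum_w P(w)\, U(\text{take}, w) = 0.7 \cdot 20 + 0.3 \cdot 70 = 35$

Optimal decision = leave: $\mathrm{MEU}(\varnothing) = \max_a \mathrm{EU}(a) = 70$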

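Decisions as Outcome Trees

▪ Almost exactly like expectimax / MDPs

▪ What’s changed?

[Figure: outcome tree for the Weather/Umbrella network. The root {} branches on the action (take, leave); each action leads to a chance node Weather | {} with branches sun and rain, ending in the leaves U(t,s), U(t,r), U(l,s), U(l,r).]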

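Example: Decision Networks

[Figure: decision network with nodes Weather, Forecast = bad (observed), Umbrella, and U]

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

W     P(W|F=bad)
sun   0.34
rain  0.66

Umbrella = leave: $\mathrm{EU}(\text{leave} \mid \text{bad}) = \sum_w P(w \mid \text{bad})\, U(\text{leave}, w) = 0.34 \cdot 100 + 0.66 \cdot 0 = 34$

Umbrella = take: $\mathrm{EU}(\text{take} \mid \text{bad}) = \sum_w P(w \mid \text{bad})\, U(\text{take}, w) = 0.34 \cdot 20 + 0.66 \cdot 70 = 53$

Optimal decision = take: $\mathrm{MEU}(F = \text{bad}) = \max_a \mathrm{EU}(a \mid \text{bad}) = 53$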

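Decisions as Outcome Trees

[Figure: outcome tree for the Weather/Umbrella network with Forecast = bad observed. The root {b} branches on the action (take, leave); each action leads to a chance node W | {b} with branches sun and rain, ending in the leaves U(t,s), U(t,r), U(l,s), U(l,r).]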

Ghostbusters Decision Network

[Figure: decision network with nodes Ghost Location, Sensor (1,1), Sensor (1,2), Sensor (1,3), …, Sensor (1,n), Sensor (2,1), …, Sensor (m,1), …, Sensor (m,n), Bust, and U]

Ghostbusters: Where to measure?

Value of Information

▪ Idea: compute value of acquiring evidence
  ▪ Can be done directly from decision network

▪ Example: buying oil drilling rights
  ▪ Two blocks A and B, exactly one has oil, worth k
  ▪ You can drill in one location
  ▪ Prior probabilities 0.5 each, & mutually exclusive
  ▪ Drilling in either A or B has EU = k/2, MEU = k/2

▪ Question: what’s the value of information of O?
  ▪ Value of knowing which of A or B has oil
  ▪ Value is expected gain in MEU from new info
  ▪ Survey may say “oil in a” or “oil in b,” prob 0.5 each
  ▪ If we know OilLoc, MEU is k (either way)
  ▪ Gain in MEU from knowing OilLoc?
  ▪ VPI(OilLoc) = k/2
  ▪ Fair price of information: k/2

[Figure: decision network with nodes OilLoc, DrillLoc, and U]

D  O  U
a  a  k
a  b  0
b  a  0
b  b  k

O  P
a  1/2
b  1/2
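The oil-drilling numbers can be checked with a short Python sketch. It is only an illustration under stated assumptions: the slides leave k symbolic, so `K = 100.0` is a hypothetical value chosen to make the arithmetic concrete, and the helper names are made up for this example.

```python
K = 100.0  # hypothetical worth of the oil; the slides leave k symbolic

P_OIL = {"a": 0.5, "b": 0.5}  # prior over OilLoc from the table above


def utility(drill, oil):
    """U(D, O): k if we drill where the oil is, 0 otherwise."""
    return K if drill == oil else 0.0


def meu(belief):
    """Maximum over drill locations of the expected utility under `belief`."""
    return max(
        sum(p * utility(d, o) for o, p in belief.items())
        for d in ("a", "b")
    )


meu_now = meu(P_OIL)  # act on the prior alone: k/2

# Perfect information: once OilLoc = o is revealed, the belief is a point mass
# on o. Average the resulting MEU over what the survey might say.
meu_informed = sum(
    p * meu({loc: 1.0 if loc == o else 0.0 for loc in P_OIL})
    for o, p in P_OIL.items()
)

vpi_oil_loc = meu_informed - meu_now  # k - k/2 = k/2
print(meu_now, meu_informed, vpi_oil_loc)
```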

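VPI Example: Weather

[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]

A      W     U
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

W     P(W)
sun   0.7
rain  0.3

W     P(W|F=bad)
sun   0.34
rain  0.66

W     P(W|F=good)
sun   0.95
rain  0.05

Forecast distribution

F     P(F)
good  0.59
bad   0.41

MEU with no evidence: $\mathrm{MEU}(\varnothing) = \max(0.7 \cdot 100 + 0.3 \cdot 0,\ 0.7 \cdot 20 + 0.3 \cdot 70) = \max(70, 35) = 70$ (leave)

MEU if forecast is bad: $\mathrm{MEU}(F = \text{bad}) = \max(0.34 \cdot 100 + 0.66 \cdot 0,\ 0.34 \cdot 20 + 0.66 \cdot 70) = \max(34, 53) = 53$ (take)

MEU if forecast is good: $\mathrm{MEU}(F = \text{good}) = \max(0.95 \cdot 100 + 0.05 \cdot 0,\ 0.95 \cdot 20 + 0.05 \cdot 70) = \max(95, 22.5) = 95$ (leave)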
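As a sketch of how the tables on this slide combine into a value of information for the forecast, the following Python computes the MEU with no evidence, the MEU under each forecast, and their difference. The variable and function names are illustrative assumptions; the numbers are copied from the tables above.

```python
UTILITY = {
    ("leave", "sun"): 100, ("leave", "rain"): 0,
    ("take", "sun"): 20, ("take", "rain"): 70,
}
P_W = {"sun": 0.7, "rain": 0.3}
P_F = {"good": 0.59, "bad": 0.41}
P_W_GIVEN_F = {
    "good": {"sun": 0.95, "rain": 0.05},
    "bad": {"sun": 0.34, "rain": 0.66},
}


def meu(p_weather):
    """Maximum expected utility over the two umbrella actions."""
    return max(
        sum(p * UTILITY[(a, w)] for w, p in p_weather.items())
        for a in ("leave", "take")
    )


meu_no_evidence = meu(P_W)
meu_given_forecast = {f: meu(P_W_GIVEN_F[f]) for f in P_F}

# Expected MEU if the forecast is revealed first, weighted by P(F).
expected_meu_with_forecast = sum(P_F[f] * meu_given_forecast[f] for f in P_F)

vpi_forecast = expected_meu_with_forecast - meu_no_evidence
print(round(vpi_forecast, 2))
```

Under these tables the forecast is worth about 7.78 utility units, so paying more than that for it would not be rational.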

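Value of Information

▪ Assume we have evidence E=e. Value if we act now:
  $\mathrm{MEU}(e) = \max_a \sum_s P(s \mid e)\, U(a, s)$

▪ Assume we see that E’=e’. Value if we act then:
  $\mathrm{MEU}(e, e') = \max_a \sum_s P(s \mid e, e')\, U(a, s)$

▪ BUT E’ is a random variable whose value is unknown, so we don’t know what e’ will be

▪ Expected value if E’ is revealed and then we act:
  $\mathrm{MEU}(e, E') = \sum_{e'} P(e' \mid e)\, \mathrm{MEU}(e, e')$

▪ Value of information: how much MEU goes up by revealing E’ first then acting, over acting now:
  $\mathrm{VPI}(E' \mid e) = \mathrm{MEU}(e, E') - \mathrm{MEU}(e)$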

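VPI Properties

▪ Nonnegative
  $\forall E', e:\ \mathrm{VPI}(E' \mid e) \ge 0$

▪ Nonadditive
  Typically (but not always): $\mathrm{VPI}(E_j, E_k \mid e) \ne \mathrm{VPI}(E_j \mid e) + \mathrm{VPI}(E_k \mid e)$

▪ Order-independent
  $\mathrm{VPI}(E_j, E_k \mid e) = \mathrm{VPI}(E_j \mid e) + \mathrm{VPI}(E_k \mid e, E_j) = \mathrm{VPI}(E_k \mid e) + \mathrm{VPI}(E_j \mid e, E_k)$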

Quick VPI Questions

▪ The soup of the day is either clam chowder or split pea, but you wouldn’t order either one. What’s the value of knowing which it is?

▪ There are two kinds of plastic forks at a picnic. One kind is slightly sturdier. What’s the value of knowing which?

▪ You’re playing the lottery. The prize will be $0 or $100. You can play any number between 1 and 100 (chance of winning is 1%). What is the value of knowing the winning number?

Value of Imperfect Information?

▪ No such thing

▪ Information corresponds to the observation of a node in the decision network

▪ If data is “noisy” that just means we don’t observe the original variable, but another variable which is a noisy version of the original one
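To make the last point concrete, here is a hedged Python sketch that treats a noisy survey of the oil example as its own observed variable and computes the value of perfectly observing it. Everything specific is an assumption for illustration: the report variable, its accuracy `ACCURACY = 0.8`, and the value `K = 100.0` do not appear in the slides.

```python
K = 100.0       # hypothetical worth of the oil (the slides leave k symbolic)
ACCURACY = 0.8  # hypothetical P(report = o | OilLoc = o)

P_OIL = {"a": 0.5, "b": 0.5}


def p_report_given_oil(report, oil):
    """Noisy sensor model: the report names the true block with prob. ACCURACY."""
    return ACCURACY if report == oil else 1.0 - ACCURACY


def meu(belief):
    """EU of drilling at d is P(oil at d) * k; take the better block."""
    return max(belief[d] * K for d in ("a", "b"))


def vpi_report():
    """VPI of observing the (noisy) report variable perfectly."""
    expected_meu = 0.0
    for report in ("a", "b"):
        joint = {o: P_OIL[o] * p_report_given_oil(report, o) for o in P_OIL}
        p_report = sum(joint.values())
        posterior = {o: joint[o] / p_report for o in joint}
        expected_meu += p_report * meu(posterior)
    return expected_meu - meu(P_OIL)


print(round(vpi_report(), 2))
```

With these assumed numbers the report is worth 30, less than the k/2 = 50 that observing OilLoc itself would be worth, yet it is still the value of perfect information about a node: the report node.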

VPI Question

▪ VPI(OilLoc) = k/2

▪ VPI(ScoutingReport)?

▪ VPI(Scout)?

▪ VPI(Scout | ScoutingReport)?

[Figure: decision network with nodes OilLoc, DrillLoc, U, ScoutingReport, and Scout]

POMDPs

▪ MDPs have:
  ▪ States S
  ▪ Actions A
  ▪ Transition function P(s’|s,a) (or T(s,a,s’))
  ▪ Rewards R(s,a,s’)

▪ POMDPs add:
  ▪ Observations O
  ▪ Observation function P(o|s) (or O(s,o))

▪ POMDPs are MDPs over belief states b (distributions over S)

[Figure: MDP search tree (s, a, (s,a), (s,a,s’), s’) beside the corresponding POMDP tree over belief states (b, a, (b,a), o, b’)]
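The step from b to b’ in the POMDP tree is a Bayesian filtering update. The slides do not spell it out, so the sketch below uses the standard form b’(s’) ∝ P(o|s’) Σ_s P(s’|s,a) b(s). The dictionary layout, the function name `belief_update`, and the two-state "listen" example are all hypothetical choices made for illustration.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Return b' with b'(s') proportional to P(o | s') * sum_s P(s' | s, a) * b(s)."""
    states = list(belief)
    # Prediction step: push the belief through the transition model.
    predicted = {
        s2: sum(transition[(s, action)][s2] * belief[s] for s in states)
        for s2 in states
    }
    # Correction step: weight by the likelihood of the observation.
    unnormalized = {
        s2: observation_model[s2][observation] * predicted[s2] for s2 in states
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: v / total for s2, v in unnormalized.items()}


# Hypothetical two-state problem: "listen" leaves the hidden state unchanged,
# and the observation names the true state with probability 0.85.
TRANSITION = {
    ("left", "listen"): {"left": 1.0, "right": 0.0},
    ("right", "listen"): {"left": 0.0, "right": 1.0},
}
OBSERVATION_MODEL = {
    "left": {"hear_left": 0.85, "hear_right": 0.15},
    "right": {"hear_left": 0.15, "hear_right": 0.85},
}

b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "listen", "hear_left", TRANSITION, OBSERVATION_MODEL)
print({s: round(p, 3) for s, p in b1.items()})
```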

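Example: Ghostbusters

▪ In (static) Ghostbusters:
  ▪ Belief state determined by evidence to date {e}
  ▪ Tree really over evidence sets
  ▪ Probabilistic reasoning needed to predict what new evidence will be gained, given past evidence and the action taken

▪ Solving POMDPs
  ▪ One way: use truncated expectimax to compute approximate value of actions
  ▪ What if you only considered busting or one sense followed by a bust?
  ▪ You get a VPI-based agent!

[Figure: POMDP tree over beliefs (b, a, (b,a), b’) beside the same tree over evidence sets ({e}, a, (e,a), e’, {e,e’}); and a truncated tree from {e} in which a_bust leads directly to U(a_bust, {e}), while a_sense leads to ({e}, a_sense), an observation e’, the evidence set {e,e’}, and then a_bust with U(a_bust, {e,e’})]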
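A minimal sketch of such a VPI-based agent, under loudly stated assumptions: the three-cell belief, the sensor model (a reading of "hot" or "cold"), the bust reward, and the sensing cost are all hypothetical and are not taken from the Ghostbusters project. The agent compares busting now against sensing one cell and then busting, which is the depth-limited tree described above.

```python
BUST_REWARD = 1.0     # hypothetical utility of busting the correct cell
SENSE_COST = 0.05     # hypothetical cost of taking one sensor reading
P_HOT_IF_GHOST = 0.9  # hypothetical P(reading = hot | ghost in sensed cell)
P_HOT_IF_EMPTY = 0.2  # hypothetical P(reading = hot | ghost elsewhere)


def p_reading(reading, sensed_cell, ghost_cell):
    """Sensor model P(e' | ghost location) for sensing one cell."""
    p_hot = P_HOT_IF_GHOST if sensed_cell == ghost_cell else P_HOT_IF_EMPTY
    return p_hot if reading == "hot" else 1.0 - p_hot


def value_of_busting(belief):
    """U(a_bust, {e}): bust the most likely cell."""
    return BUST_REWARD * max(belief)


def value_of_sensing(belief, sensed_cell):
    """Expected U(a_bust, {e, e'}) after one reading, minus the sensing cost."""
    total = 0.0
    for reading in ("hot", "cold"):
        joint = [b * p_reading(reading, sensed_cell, g) for g, b in enumerate(belief)]
        # sum over e' of P(e') * max_g P(g | e') equals sum over e' of max_g P(g, e').
        total += BUST_REWARD * max(joint)
    return total - SENSE_COST


def choose(belief):
    """Pick the best of: bust now, or sense some cell and bust afterwards."""
    options = {("bust", belief.index(max(belief))): value_of_busting(belief)}
    for cell in range(len(belief)):
        options[("sense", cell)] = value_of_sensing(belief, cell)
    best = max(options, key=options.get)
    return best, options[best]


belief = [0.5, 0.3, 0.2]  # hypothetical current belief over three cells
action, value = choose(belief)
print(action, round(value, 2))
```

With these assumed numbers, sensing cell 0 first is worth about 0.64 against 0.5 for busting immediately, so the agent senses; raising `SENSE_COST` above the VPI of every sensor would make it bust right away.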

More Generally

▪ General solutions map belief functions to actions
  ▪ Can divide regions of belief space (set of belief functions) into policy regions (gets complex quickly)
  ▪ Can build approximate policies using discretization methods
  ▪ Can factor belief functions in various ways

▪ Overall, POMDPs are very (actually PSPACE) hard

▪ Most real problems are POMDPs, but we can rarely solve them in general!
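As a rough illustration of policy regions and discretization, the sketch below grids the one-dimensional belief space of the umbrella example (the belief is just P(rain)) and stores the best action at each grid point. This is a one-shot decision rather than a full POMDP solver, and the grid resolution and function names are assumptions made for the example; only the utility table comes from these slides.

```python
UTILITY = {
    ("leave", "sun"): 100, ("leave", "rain"): 0,
    ("take", "sun"): 20, ("take", "rain"): 70,
}


def best_action(p_rain):
    """Best one-shot action for the belief (P(sun), P(rain)) = (1 - p, p)."""
    eu = {
        a: (1 - p_rain) * UTILITY[(a, "sun")] + p_rain * UTILITY[(a, "rain")]
        for a in ("leave", "take")
    }
    return max(eu, key=eu.get)


GRID_SIZE = 21  # hypothetical resolution: beliefs 0.00, 0.05, ..., 1.00
grid = [i / (GRID_SIZE - 1) for i in range(GRID_SIZE)]

# Discretized policy: one stored action per grid point in belief space.
policy = {p: best_action(p) for p in grid}


def act(p_rain):
    """Approximate policy: look up the action at the nearest grid belief."""
    nearest = min(grid, key=lambda g: abs(g - p_rain))
    return policy[nearest]


print(act(0.3), act(0.66))
```

Here belief space splits into two policy regions: leave the umbrella while P(rain) is below 8/15 and take it above. With more states and multi-step lookahead the regions multiply, which is the complexity the bullets above refer to.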
