
ASSIGNMENT UNIT 7

UOPEOPLE.EDU

S100946


1 Directed Questions

• The STRIPS representation for an action consists of what?

Answer: Preconditions: a set of assignments of values to variables that must be true for the action to occur. Effects: a set of resulting assignments of values to those variables that change as the result of the action. (A small sketch of this representation follows this list.)

• What is the STRIPS assumption?

Answer: All of the variables not mentioned in the description of an action stay unchanged

when the action is carried out.

• What is the frame problem in planning? How does it relate to the STRIPS assumption? Answer: The frame problem is the problem of representing all things that stay unchanged.

This is important because most actions affect only a small fraction of variables, e.g. filling a cup with coffee changes the state of the cup and of the pot but not the location of the

robot, the layout of the building, etc. The STRIPS assumption just says that all variables

not mentioned in the description of an action remain unchanged.

• What are some key limitations of STRIPS?

Answer: States are represented simply as a conjunction of positive literals (e.g. poor ∧ unknown), goals are conjunctions (no disjunction allowed), and there is no support for equality.
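The STRIPS representation can be made concrete with a minimal Python sketch (illustrative only; the dictionary encoding is an assumption of this sketch, not notation from the course):

    # A STRIPS action: preconditions and effects are partial assignments
    # of values to state variables.
    def applicable(state, action):
        # Every precondition must hold in the current state.
        return all(state.get(v) == val
                   for v, val in action["preconditions"].items())

    def apply_action(state, action):
        # STRIPS assumption: copy the state, then overwrite only the
        # variables mentioned in the effects; all others stay unchanged.
        new_state = dict(state)
        new_state.update(action["effects"])
        return new_state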

2 STRIPS planning

Consider a scenario where you want to get from home (off campus) to UBC during a bus strike.

You can either drive (if you have a car) or bike (if you have a bike). How would you represent

this in STRIPS?

(a) What are the actions, preconditions and effects? What are the relevant variables?

Answer: The actions could be something like goByBike and goByCar. In a very simple representation, there are variables loc, haveBike, and haveCar, indicating location,

whether or not you have a bike (t/f), and whether or not you have a car (t/f). The precon-

dition for goByBike is that haveBike = true, and likewise the precondition for goByCar is

that haveCar = true. The effect of each action is that loc = UBC. Figure 1 shows this representation.

(b) If we select the action goByBike, what is the value of haveBike after the action has been carried out?

Answer: It will equal true, as it had to be true for the action to take place, and since it is

not mentioned in the action effects its value will be unchanged.


Figure 1: Simple STRIPS commuting problem

(c) If we are at UBC and select the action goByCar, what will the value of loc be after the action has been carried out?

Answer: After the action, loc = UBC, as this is a specified effect. Notice that there is no loc precondition for either action, so if you begin at UBC or at home and select either action, you will wind up at UBC.
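Using the sketch from the previous section, this representation might look as follows (the action and variable names follow the answer to (a)):

    goByBike = {"preconditions": {"haveBike": True}, "effects": {"loc": "UBC"}}
    goByCar  = {"preconditions": {"haveCar": True},  "effects": {"loc": "UBC"}}

    state = {"loc": "home", "haveBike": True, "haveCar": False}
    if applicable(state, goByBike):
        state = apply_action(state, goByBike)
    # state == {"loc": "UBC", "haveBike": True, "haveCar": False}
    # haveBike is unchanged, as in (b); loc = UBC regardless of the
    # starting location, as in (c).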

1 Directed Questions

• What is meant by the horizon in a planning problem? Answer: The number of time steps for which the problem is “rolled out.”

• What are initial state constraints in a CSP problem? Answer: They constrain the state variables at time 0, i.e. before any action has occurred.

• What are goal constraints? Answer: They constrain the state variables at some time k, where k is the horizon.

• What are precondition constraints?

Answer: They are constraints between state variables at time t and actions at time t. In

other words, they specify what must hold for an action to take place.

• What are effect constraints?

Answer: They are constraints between state variables at time t, actions at time t and

state variables at time t + 1. In other words, the state variable at time t + 1 is affected by

the actions at time t and its own previous value at time t.
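These four constraint types can be sketched concretely in Python (an illustrative sketch only, using state variables from the example in the next section; state[t] is the assignment to the state variables at time t and act[t] is the action taken at time t):

    def initial_state_constraint(state):
        # Constrains the state variables at time 0.
        return not state[0]["haveMoney"] and state[0]["blackout"]

    def goal_constraint(state, k):
        # Constrains the state variables at the horizon k.
        return state[k]["sawGame"]

    def precondition_constraint(state, act, t):
        # Relates the action at time t to state variables at time t.
        if act[t] == "watchAtPark":
            return state[t]["haveMoney"]
        return True

    def effect_constraint(state, act, t):
        # Relates state variables at time t+1 to the action at time t
        # and the state variables at time t.
        if act[t] == "sellTV":
            return state[t + 1]["haveMoney"]
        return state[t + 1]["haveMoney"] == state[t]["haveMoney"]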

2 CSP planning

There’s a big football game tonight, and you can’t miss it. You’re trying to decide whether to

watch it in person or on TV. Watching it in person requires having some money for a ticket.

Watching it on TV is only possible if you have a TV and there isn’t a local television blackout

on the game. If you need money for a ticket, you can always sell your TV.


Figure 1 shows a CSP representation for this planning problem where the goal is to watch the game.
Figure 1: CSP representation for viewing the game

• What are the actions? Answer: watchAtPark, watchAtHome, sellTV

• What are the state variables? Answer: haveMoney, haveTV, blackout, sawGame

• What is the horizon shown in Figure 1? Answer: The horizon is 1.

• Give the truth tables for the precondition constraint for action watchAtPark (labelled p1 s0

in the figure) and the effect constraint between blackout at step 0 and blackout at step 1

(labelled e3 s1).

Answer:

For p1 s0:

haveMoney s0   watchAtPark s0   p1 s0
true           true             true
true           false            true
false          true             false
false          false            true

For e3 s1:

blackout s0   blackout s1   e3 s1
true          true          true
true          false         false
false         true          false
false         false         true

• What is the minimum horizon needed to achieve the goal, if the start constraints specify that you have no money and that there is a TV blackout?

Answer: A horizon of 2. At step 1 you sell the TV and at step 2 you watch the game in person.
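The horizon answer can be checked with a small brute-force sketch (illustrative only, not the CSP encoding itself; the noOp action and the exact effects, e.g. whether buying a ticket consumes the money, are simplifying assumptions of this sketch):

    from itertools import product

    ACTIONS = ["watchAtPark", "watchAtHome", "sellTV", "noOp"]

    def step(s, a):
        # Apply action a to state s; return None if a precondition fails.
        s = dict(s)
        if a == "watchAtPark":
            if not s["haveMoney"]:
                return None
            s["sawGame"] = True
        elif a == "watchAtHome":
            if not s["haveTV"] or s["blackout"]:
                return None
            s["sawGame"] = True
        elif a == "sellTV":
            if not s["haveTV"]:
                return None
            s["haveTV"], s["haveMoney"] = False, True
        return s  # variables not mentioned stay unchanged

    def achieves_goal(state, plan):
        for a in plan:
            state = step(state, a)
            if state is None:
                return False
        return state["sawGame"]

    start = {"haveMoney": False, "haveTV": True, "blackout": True, "sawGame": False}
    for horizon in range(1, 4):
        if any(achieves_goal(start, plan) for plan in product(ACTIONS, repeat=horizon)):
            print("minimum horizon:", horizon)  # prints 2 (sellTV, then watchAtPark)
            break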

1 Directed Questions

• What is meant by a one-off decision? How can this be applied in the delivery robot example? Answer: The agent knows which actions are available, has preferences expressed

by utilities of outcomes, and makes all the decisions before any action is carried out. In

the delivery robot example, the decisions on wearing pads and taking the long or short

route are made before the robot goes anywhere. Multiple decisions can be considered as a

single macro decision.

• Define utility in a decision problem. Answer: The utility is a measure of desirability of

possible worlds to an agent, i.e. indicates the agent’s preferences. Let U be a real-valued

function such that U(w) represents an agent’s degree of preference for world w. The value

of a utility is typically between 0 and 100.

• How do we calculate the expected utility of a decision? Answer: The expected utility is computed by summing, over the possible worlds that select that decision, the product P(w)U(w) for each such world w (see the formula after this list).

• How do we compute an optimal one-off decision? Answer: If we calculate the expected utility for each decision as per the last question, we choose the decision that maximizes

the expected utility.

• What are the three types of nodes in a single-stage decision network? Answer: Decision

nodes, random variables (chance nodes), and utility nodes

• What is a policy for a single-stage decision network? What is an optimal policy? Answer:

A policy for a single-stage decision network is an assignment of a value to each decision

variable. The optimal policy is the policy whose expected utility is maximal.

• Describe the variable elimination steps for finding an optimal policy for a single-stage

decision network. Answer: Prune all the nodes that are not ancestors of the utility node.

Sum out all the chance nodes. There will be a single factor F remaining that represents the expected utility for each combination of decision variables. If v is the maximum value

in F, return the assignment d that gives that maximum value v.
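The two expected-utility bullets above can be summarized in one formula (standard decision-theoretic notation, where w ⊨ d means that possible world w selects decision d; this notation is an assumption of this note, not taken from the assignment):

    EU(d) = \sum_{w \models d} P(w)\,U(w), \qquad d^{*} = \arg\max_{d} EU(d)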


2 A One-Off Decision

You are preparing to go for a bike ride and are trying to decide whether to use your thin road

tires or your thicker, knobbier tires. You know from previous experience that your road tires are

more likely to go flat during a ride. There’s a 40% chance your road tires will go flat but only a

10% chance that the thicker tires will go flat.

Because of the risk of a flat, you also have to decide whether or not to bring your tools along on

the ride (a pump, tire levers and a puncture kit). These tools will weigh you down.

The advantage of the thin road tires is that you can ride much faster. The table below gives the

utilities for these variables:

bringTools   flatTire   bringRoadTires   Satisfaction
T            T          T                50.0
T            T          F                40.0
T            F          T                75.0
T            F          F                65.0
F            T          T                0.0
F            T          F                0.0
F            F          T                100.0
F            F          F                75.0

• Create the decision network representing this problem, using AISpace. Answer: An example is given in the XML file bikeride tires flat tools.xml, shown in Figure 1.

Figure 1: A decision problem.

• Use variable elimination to find the optimal policy.

– What are the initial factors? Answer: There are two factors to begin with: one representing P(flatTire | bringRoadTires) and one representing the utilities.

– Specify your elimination ordering and give each step of the VE algorithm. Answer: We sum out the chance node flatTire first. This results in a new factor on the two decisions. We eliminate bringRoadTires by maximizing over that decision variable for each value of bringTools. This leaves one factor on bringTools. We maximize bringTools in that final factor to get our answer.

• What is the optimal policy? What is the expected value of the optimal policy? Answer:

Page 6: Assignment Unit 7 Ia

The optimal policy is to take the thicker tires and leave the tools at home. The expected utility of this policy is 67.5 (verified in the sketch after this list).

• Try changing the utilities and the probabilities in this problem, and identify which changes

result in a different optimal policy. Answer: There are many possibilities here, e.g.

changing the probability of a flat tire given the tire type, or decreasing the utilities for the

two possible worlds FFT (currently 100) and FFF (currently 75).
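Since the decision is one-off, the VE result can be double-checked by brute force: enumerate the four (bringTools, bringRoadTires) combinations and compute each expected utility directly. This is an illustrative sketch, not AISpace output; editing P_flat or U below is an easy way to explore the last bullet.

    # P(flatTire = True | bringRoadTires)
    P_flat = {True: 0.40, False: 0.10}

    # (bringTools, flatTire, bringRoadTires) -> Satisfaction
    U = {
        (True, True, True): 50.0,    (True, True, False): 40.0,
        (True, False, True): 75.0,   (True, False, False): 65.0,
        (False, True, True): 0.0,    (False, True, False): 0.0,
        (False, False, True): 100.0, (False, False, False): 75.0,
    }

    def expected_utility(tools, road):
        # Sum over the chance variable flatTire.
        return (P_flat[road] * U[tools, True, road]
                + (1 - P_flat[road]) * U[tools, False, road])

    best = max(((tools, road, expected_utility(tools, road))
                for tools in (True, False) for road in (True, False)),
               key=lambda x: x[2])
    print(best)  # (False, False, 67.5): no tools, thick tires, EU 67.5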



1 Directed Questions

• How is a sequential decision problem different from a one-off decision problem? Answer:

In a one-off decision problem, even if there are multiple decisions to make they can be

treated as a single macro decision. That macro decision is made before any action is

carried out. With a sequential decision problem, the agent makes observations, decides on

an action, carries out that action, makes some more observations in the resulting world,

then makes more decisions conditioned on the new observations, etc.

• What types of variables are contained in a decision network? Answer: Chance nodes

(random variables), decision nodes, and a utility node.

• What can arcs represent in a decision network? Relate this to the types of variables in the

previous question. Answer: Arcs coming into decision nodes represent the information

that will be available when the decision is made. Arcs coming into chance nodes represent

probabilistic dependence. Arcs coming into the utility node represent what the utility

depends on.

• What is a no-forgetting decision network? Answer: It is a decision network where the

decision nodes are totally ordered and, if decision node Di is before Dj in the total ordering,

then Di is a parent of Dj, and any parent of Di is also a parent of Dj. This means that all

the information available for the earlier decision is available for the later decision, and the

earlier decision is part of the information available for the later decision.

• Define decision function and policy. Answer: A decision function for a decision variable

is a function that specifies a value for the decision variable for each assignment of values

to its parents. A policy consists of a decision function for each decision variable.

• A possible world specifies a value for every random variable and decision variable. Given

a policy and a possible world, how do we know if the possible world satisfies the policy?

Answer: The possible world satisfies the policy if the value for each decision variable in

that possible world is the value selected in the decision function for that decision variable

in the policy.

• To find an optimal policy, do we need to enumerate all of the policies? Why or why not?

Answer: No, we can use variable elimination instead.
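A decision function and a policy can be written down concretely; below is a minimal Python sketch, using the Wii rental example from the next section for concreteness (the None outcome for "did not rent" is an assumption of this sketch):

    # Parents of each decision variable (a no-forgetting ordering).
    parents = {"rentGame": (), "buyGame": ("rentGame", "outcome")}

    # A decision function maps each assignment of the parents to a value;
    # a policy is one decision function per decision variable.
    policy = {
        "rentGame": {(): False},                 # don't rent
        "buyGame": {(False, None): True,         # didn't rent -> buy
                    (True, "like"): True,        # rented and liked it -> buy
                    (True, "dislike"): False},   # rented and disliked it -> don't
    }

    def satisfies(world, policy, parents):
        # A world satisfies a policy if every decision variable takes the value
        # its decision function selects for that world's parent values.
        return all(world[d] == fn[tuple(world[p] for p in parents[d])]
                   for d, fn in policy.items())

    w = {"goodQuality": True, "rentGame": False, "outcome": None, "buyGame": True}
    print(satisfies(w, policy, parents))  # True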

2 Sequential Decisions and Variable Elimination

Miranda is an enthusiastic gamer, spending quite a bit of time playing Wii video games and a

fair amount of money buying them. She notices that her neighbourhood video store rents Wii

games for much less than the cost of buying one. She realizes that renting the games might be a good way to test them out before she decides whether or not to buy them. Figure 1 represents

her decision problem.


Figure 1: A decision problem.

Based on prior experience, Miranda expects that about 80% of video games will be good quality

and the other 20% she won’t care for. Based on her previous experiences renting video games,

she also knows the following information:

P(Outcome = likesGame|goodQuality = True) = 0.85

P(Outcome = likesGame|goodQuality = False) = 0.10

The rental period is so short that it’s not always possible to get a reliable estimate of whether

the game is of good quality.

Below are the utilities for various outcomes of the decision process. You can think of the utilities

as representing a combination of gaming enjoyment and money saved (Satisfaction).

rentGame   buyGame   goodQuality   Satisfaction
T          T         True          80.0
T          T         False         -100.0
T          F         True          30.0
T          F         False         -30.0
F          T         True          100.0
F          T         False         -80.0
F          F         True          0.0
F          F         False         0.0

• If we carry out the variable elimination algorithm, what are the initial factors? Answer: There are 3 factors to begin with. f0(rentGame, buyGame, goodQuality) represents the utilities. f1(rentGame, outcome, goodQuality) represents the probability of outcome given goodQuality and rentGame. f2(goodQuality) represents the prior for goodQuality.

• Which decision variable is eliminated first, and why?

Answer: The decision variable buyGame is eliminated first because it is the last decision

in the ordering.

• How is that decision eliminated? Answer: It is eliminated by choosing the values that maximize the utility. For example, if rentGame = T and outcome = like, then buying the game results in a utility of 52.4, whereas not buying the game results in a utility of only 19.8. So we add a new decision function to our set of decision functions, specifying that when the parents have those assigned values, the decision is to buy the game. This is done for each combination of parent values.

• After that decision is eliminated, which variable is eliminated next, and why? Answer: The

random variable outcome is eliminated next, because it is no longer a parent of any decision

variable (since we removed buyGame).

• What is the optimal policy for this decision problem? Answer: The optimal decision for rental is not to rent.

The optimal decision for buying is to buy the game in every case except where the game

was rented and she disliked it.

• What is the expected utility of following the optimal policy? Answer: The expected

utility of following that optimal policy is 64.0.
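The numbers above can be reproduced by brute force over the three factors. This is an illustrative sketch, not the course's VE code; note that 52.4 and 19.8 are unnormalized factor values, not conditional expected utilities.

    P_good = 0.8                          # P(goodQuality = True)
    P_like = {True: 0.85, False: 0.10}    # P(outcome = like | goodQuality)

    # (rentGame, buyGame, goodQuality) -> Satisfaction
    U = {
        (True, True, True): 80.0,    (True, True, False): -100.0,
        (True, False, True): 30.0,   (True, False, False): -30.0,
        (False, True, True): 100.0,  (False, True, False): -80.0,
        (False, False, True): 0.0,   (False, False, False): 0.0,
    }

    def P_q(q):
        return P_good if q else 1 - P_good

    # Eliminating buyGame for rentGame = T, outcome = like: compare
    # sum over goodQuality of P(q) * P(like | q) * U(T, buy, q).
    for buy in (True, False):
        v = sum(P_q(q) * P_like[q] * U[True, buy, q] for q in (True, False))
        print("buy =", buy, "->", round(v, 1))   # True -> 52.4, False -> 19.8

    # Expected utility of the optimal policy (don't rent, buy the game):
    eu = sum(P_q(q) * U[False, True, q] for q in (True, False))
    print("EU =", round(eu, 1))                  # 64.0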

Use the AISpace decision applet to represent and solve this decision problem, and to check your

answers. The representation we have used is in file wii.xml.

Reference: Poole, D. L., & Mackworth, A. K. (2010). Artificial Intelligence: Foundations of Computational Agents (Chapter 7: Reasoning Under Uncertainty). Cambridge University Press. Available online at http://artint.info/