Chapter 18: Deterministic Dynamic Programming, to accompany Operations Research: Applications and Algorithms, 4th edition, by Wayne L. Winston. Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.


Page 1: Chapter 18 Deterministic Dynamic Programming

Chapter 18

Deterministic Dynamic Programming

to accompany Operations Research: Applications and Algorithms, 4th edition, by Wayne L. Winston

Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.

Page 2: Chapter 18 Deterministic Dynamic Programming

Description: Dynamic programming is a technique that can be used to solve many optimization problems. In most applications, dynamic programming obtains solutions by working backward from the end of the problem toward the beginning, thus breaking up a large, unwieldy problem into a series of smaller, more tractable problems.

Page 3: Chapter 18 Deterministic Dynamic Programming

18.1 Two Puzzles

Example: We show how working backward can make a seemingly difficult problem almost trivial to solve.

Suppose there are 30 matches on a table. I begin by picking up 1, 2, or 3 matches. Then my opponent must pick up 1, 2, or 3 matches. We continue in this fashion until the last match is picked up. The player who picks up the last match is the loser. How can I (the first player) be sure of winning the game?

Page 4: Chapter 18 Deterministic Dynamic Programming

If I can ensure that it will be my opponent's turn when 1 match remains, I will certainly win.

Working backward one step, if I can ensure that it will be my opponent's turn when 5 matches remain, I will win: whatever number of matches (1, 2, or 3) my opponent picks up, I can pick up enough to leave him with exactly 1 match.

Similarly, if I can force my opponent to play when 5, 9, 13, 17, 21, 25, or 29 matches remain, I am sure of victory.

Thus I cannot lose if I pick up 1 match on my first turn.
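The backward-induction argument above can be checked mechanically. The sketch below is not from the book (the function name and memoization approach are ours); it classifies every match count as a win or a loss for the player about to move, working backward from the base case:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def is_losing(n):
    """True if the player about to move with n matches left loses
    under optimal play (the player taking the last match loses)."""
    if n == 1:
        return True  # forced to pick up the last match
    # n is a loss only if every allowed move leaves the opponent a win
    return all(not is_losing(n - k) for k in (1, 2, 3) if n - k >= 1)

losing = [n for n in range(1, 31) if is_losing(n)]
print(losing)  # [1, 5, 9, 13, 17, 21, 25, 29]
```

The losing positions are exactly the counts congruent to 1 modulo 4, matching the list on the slide, and 30 itself is a winning position for the first player.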

Page 5: Chapter 18 Deterministic Dynamic Programming

18.2 A Network Problem

Many applications of dynamic programming reduce to finding the shortest (or longest) path that joins two points in a given network.

For larger networks, dynamic programming is much more efficient for determining a shortest path than the explicit enumeration of all paths.
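To make the idea concrete, here is a minimal sketch that finds a shortest path in a staged network by a single backward pass. The network, node names, and arc lengths are invented for illustration; they are not from the book:

```python
# arcs[u] maps node u to {successor: arc length}; F is the terminal node.
arcs = {
    "A": {"B": 2, "C": 4},
    "B": {"D": 6, "E": 3},
    "C": {"D": 1, "E": 5},
    "D": {"F": 2},
    "E": {"F": 4},
    "F": {},
}

# Process nodes in reverse topological order (last stage first), so every
# successor's shortest distance is known when a node is examined.
order = ["F", "D", "E", "B", "C", "A"]
dist = {"F": 0}      # shortest distance from each node to F
best_next = {}       # optimal decision (next node) at each state
for u in order[1:]:
    nxt, d = min(((v, w + dist[v]) for v, w in arcs[u].items()),
                 key=lambda pair: pair[1])
    dist[u], best_next[u] = d, nxt

print(dist["A"])  # 7, via A -> C -> D -> F
```

Each node's value is computed exactly once, so the work grows with the number of arcs rather than with the number of paths.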

Page 6: Chapter 18 Deterministic Dynamic Programming

Characteristics of Dynamic Programming Applications

Characteristic 1: The problem can be divided into stages, with a decision required at each stage.

Characteristic 2: Each stage has a number of states associated with it. By a state, we mean the information that is needed at any stage to make an optimal decision.

Characteristic 3: The decision chosen at any stage describes how the state at the current stage is transformed into the state at the next stage.

Page 7: Chapter 18 Deterministic Dynamic Programming

Characteristic 4: Given the current state, the optimal decision for each of the remaining stages must not depend on previously reached states or previously chosen decisions. This idea is known as the principle of optimality.

Characteristic 5: If the states for the problem have been classified into one of T stages, there must be a recursion that relates the cost or reward earned during stages t, t+1, …, T to the cost or reward earned from stages t+1, t+2, …, T.

Page 8: Chapter 18 Deterministic Dynamic Programming

18.3 An Inventory Problem

Dynamic programming can be used to solve an inventory problem with the following characteristics:

1. Time is broken up into periods, the present period being period 1, the next being period 2, and the final period being period T. At the beginning of period 1, the demand during each period is known.

2. At the beginning of each period, the firm must determine how many units should be produced. Production capacity during each period is limited.

Page 9: Chapter 18 Deterministic Dynamic Programming

3. Each period's demand must be met on time from inventory or current production. During any period in which production takes place, a fixed cost of production as well as a variable per-unit cost is incurred.

4. The firm has limited storage capacity. This is reflected by a limit on end-of-period inventory. A per-unit holding cost is incurred on each period's ending inventory.

5. The firm's goal is to minimize the total cost of meeting on time the demands for periods 1, 2, …, T.

Page 10: Chapter 18 Deterministic Dynamic Programming

In this model, the firm's inventory position is reviewed at the end of each period, and then the production decision is made. Such a model is called a periodic review model.

This model is in contrast to the continuous review model, in which the firm knows its inventory position at all times and may place an order or begin production at any time.
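A backward recursion for a problem with characteristics 1-5 above can be sketched as follows. All numbers (demands, costs, capacities) are illustrative, not from the book; the state is the pair (period, entering inventory):

```python
from functools import lru_cache

demand = [1, 3, 2, 4]       # known demand for periods 1..T
K, c, h = 3.0, 1.0, 0.5     # setup cost, unit cost, unit holding cost
prod_cap, store_cap = 5, 4  # per-period production and storage limits

@lru_cache(maxsize=None)
def f(t, inv):
    """Minimum cost of meeting demand in periods t..T with inv on hand."""
    if t == len(demand):
        return 0.0  # no periods left, no cost
    best = float("inf")
    for x in range(prod_cap + 1):       # candidate production this period
        end = inv + x - demand[t]
        if 0 <= end <= store_cap:       # meet demand on time, respect storage
            cost = (K + c * x if x > 0 else 0.0) + h * end
            best = min(best, cost + f(t + 1, end))
    return best

print(f(0, 0))  # minimum total cost starting with zero inventory
```

The memoized recursion evaluates each (period, inventory) state once, which is exactly the saving over enumerating every feasible production plan.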

Page 11: Chapter 18 Deterministic Dynamic Programming

18.4 Resource-Allocation Problems

Resource-allocation problems, in which limited resources must be allocated among several activities, are often solved by dynamic programming.

To use linear programming to do resource allocation, three assumptions must be made:

Assumption 1: The amount of a resource assigned to an activity may be any nonnegative number.

Assumption 2: The benefit obtained from each activity is proportional to the amount of the resource assigned to the activity.

Page 12: Chapter 18 Deterministic Dynamic Programming

Assumption 3: The benefit obtained from more than one activity is the sum of the benefits obtained from the individual activities.

Even if assumptions 1 and 2 do not hold, dynamic programming can be used to solve resource-allocation problems efficiently when assumption 3 is valid and when the amount of the resource allocated to each activity is a member of a finite set.

Page 13: Chapter 18 Deterministic Dynamic Programming

Generalized Resource Allocation Problem

The problem of determining the allocation of resources that maximizes total benefit subject to the limited resource availability may be written as

max Σ(t = 1 to T) rt(xt)

s.t. Σ(t = 1 to T) gt(xt) ≤ w

where each xt must be a member of {0, 1, 2, …}.

Page 14: Chapter 18 Deterministic Dynamic Programming

To solve this by dynamic programming, define ft(d) to be the maximum benefit that can be obtained from activities t, t+1, …, T if d units of the resource may be allocated to activities t, t+1, …, T.

We may generalize the recursions to this situation by writing

fT+1(d) = 0 for all d

ft(d) = max over xt of {rt(xt) + ft+1[d - gt(xt)]}

where xt must be a non-negative integer satisfying gt(xt) ≤ d.
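A direct implementation of this recursion is short. The reward functions rt, the usage functions gt, and the resource amount w below are made-up illustrative data, not the book's:

```python
from functools import lru_cache

w = 6                                    # resource available
# rt(x): benefit from allocating x units to activity t (illustrative)
r = [lambda x: 7 * x,
     lambda x: 3 + x if x > 0 else 0,
     lambda x: 5 * x]
# gt(x): resource consumed by allocating x units to activity t
g = [lambda x: 2 * x, lambda x: x, lambda x: 3 * x]
T = len(r)

@lru_cache(maxsize=None)
def f(t, d):
    """Max benefit from activities t..T with d units still available."""
    if t == T:
        return 0                          # f_{T+1}(d) = 0 for all d
    best = float("-inf")
    x = 0
    while g[t](x) <= d:                   # x_t nonnegative integer, g_t(x_t) <= d
        best = max(best, r[t](x) + f(t + 1, d - g[t](x)))
        x += 1
    return best

print(f(0, w))  # maximum total benefit
```

Note that the benefit functions here are neither linear nor continuous, which is exactly the situation where assumption 3 holds but assumptions 1 and 2 fail.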

Page 15: Chapter 18 Deterministic Dynamic Programming

A Turnpike Theorem

Turnpike results abound in the dynamic programming literature. Why are such results referred to as turnpike theorems?

Think about taking an automobile trip in which our goal is to minimize the time needed to complete the trip. For a long trip, it may be advantageous to go slightly out of our way so that most of the trip will be spent on a turnpike, on which we can travel at the greatest speed.

Page 16: Chapter 18 Deterministic Dynamic Programming

18.5 Equipment Replacement Problems

Many companies and customers face the problem of determining how long a machine should be utilized before it should be traded in for a new one. Problems of this type are called equipment-replacement problems and can be solved by dynamic programming.

An equipment replacement model was actually used by Phillips Petroleum to reduce costs associated with maintaining the company's stock of trucks.

Page 17: Chapter 18 Deterministic Dynamic Programming

18.6 Formulating Dynamic Programming Recursions

In many dynamic programming problems, a given stage simply consists of all the possible states that the system can occupy at that stage. If this is the case, then the dynamic programming recursion can often be written in the following form:

ft(i) = min{(cost during stage t) + ft+1(new state at stage t+1)}

where the minimum in the above equation is over all decisions that are allowable, or feasible, when the state at stage t is i.

Page 18: Chapter 18 Deterministic Dynamic Programming

Correct formulation of a recursion of this form requires that we identify three important aspects of the problem:

Aspect 1: The set of decisions that is allowable, or feasible, for the given state and stage.

Aspect 2: We must specify how the cost during the current time period (stage t) depends on the value of t, the current state, and the decision chosen at stage t.

Aspect 3: We must specify how the state at stage t+1 depends on the value of t, the state at stage t, and the decision chosen at stage t.

Not all recursions are of the form shown above.

Page 19: Chapter 18 Deterministic Dynamic Programming

A Fishery Example

The owner of a lake must decide how many bass to catch and sell each year. If she sells x bass during year t, then a revenue r(x) is earned.

The cost of catching x bass during a year is a function c(x, b) of the number of bass caught during the year and of b, the number of bass in the lake at the beginning of the year. Of course, bass do reproduce.

Page 20: Chapter 18 Deterministic Dynamic Programming

A Fishery Example

To model this, we assume that the number of bass in the lake at the beginning of a year is 20% more than the number of bass left in the lake at the end of the previous year. Assume that there are 10,000 bass in the lake at the beginning of the first year.

Develop a dynamic programming recursion that can be used to maximize the owner's net profit over a T-year horizon.

Page 21: Chapter 18 Deterministic Dynamic Programming

A Fishery Example

In problems where decisions must be made at several points in time, there is often a trade-off of current benefits against future benefits.

At the beginning of year T, the owner of the lake need not worry about the effect that the capture of bass will have on the future population of the lake, so the year T problem is relatively easy to solve. For this reason, we let time be the stage.

At each stage, the owner of the lake must decide how many bass to catch.

Page 22: Chapter 18 Deterministic Dynamic Programming

A Fishery Example

We define xt to be the number of bass caught during year t. To determine an optimal value of xt, the owner of the lake need only know the number of bass (call it bt) in the lake at the beginning of year t. Therefore, the state at the beginning of year t is bt.

We define ft(bt) to be the maximum net profit that can be earned from bass caught during years t, t+1, …, T given that bt bass are in the lake at the beginning of year t.

Page 23: Chapter 18 Deterministic Dynamic Programming

A Fishery Example

We may now dispose of aspects 1-3 of the recursion.

Aspect 1: What are the allowable decisions? During any year we can't catch more bass than there are in the lake. Thus, in each state and for all t, 0 ≤ xt ≤ bt must hold.

Aspect 2: What is the net profit earned during year t? If xt bass are caught during a year that begins with bt bass in the lake, then the net profit is r(xt) - c(xt, bt).

Aspect 3: What will be the state during year t+1? The year t+1 state will be 1.2(bt - xt).

Page 24: Chapter 18 Deterministic Dynamic Programming

A Fishery Example

After year T, there are no future profits to consider, so fT+1(b) = 0 for all b. Working backward, we obtain

ft(bt) = max over xt of {r(xt) - c(xt, bt) + ft+1[1.2(bt - xt)]}

where 0 ≤ xt ≤ bt.

We use this equation to work backward until f1(10,000) has been computed. Then, to determine the optimal fishing policy, we begin by choosing x1 to be any value attaining the maximum in the equation for f1(10,000). Then year 2 will begin with 1.2(10,000 - x1) bass in the lake.

Page 25: Chapter 18 Deterministic Dynamic Programming

A Fishery Example

This means that x2 should be chosen to be any value attaining the maximum in the equation for f2(1.2(10,000 - x1)). Continue in this fashion until optimal values of x3, x4, …, xT have been determined.
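Putting aspects 1-3, the backward recursion, and the forward policy recovery together gives the sketch below. The book leaves r and c general, so the revenue and cost functions here, the 1,000-bass catch grid, and the rounding of the 20% growth step are all our own illustrative assumptions:

```python
from functools import lru_cache

T = 3                       # planning horizon in years
STEP = 1000                 # allow catches in multiples of 1,000 bass (assumption)

def r(x):                   # illustrative revenue from selling x bass
    return 3.0 * x

def c(x, b):                # illustrative cost: harder to catch from a small stock
    return 2.0 * x * x / (b + 1)

@lru_cache(maxsize=None)
def f(t, b):
    """(max net profit from years t..T, best catch) with b bass in the lake."""
    if t > T:
        return 0.0, 0       # f_{T+1}(b) = 0 for all b
    best, best_x = float("-inf"), 0
    for x in range(0, b + 1, STEP):          # aspect 1: 0 <= x_t <= b_t
        profit = (r(x) - c(x, b)             # aspect 2: net profit this year
                  + f(t + 1, round(1.2 * (b - x)))[0])  # aspect 3: next state
        if profit > best:
            best, best_x = profit, x
    return best, best_x

# The backward pass is implicit in the memoized recursion; recover the
# optimal policy by moving forward from b_1 = 10,000.
b = 10_000
for t in range(1, T + 1):
    value, x = f(t, b)
    print(f"year {t}: catch {x} of {b}")
    b = round(1.2 * (b - x))
```

The forward loop mirrors the slide's procedure exactly: pick x1 attaining the maximum for f1(10,000), update the stock, and repeat.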

Page 26: Chapter 18 Deterministic Dynamic Programming

Incorporating the Time Value of Money into Dynamic Programming Formulations

A weakness of the current formulation is that profits received during later years are weighted the same as profits received during earlier years.

Suppose that for some β < 1, $1 received at the beginning of year t+1 is equivalent to β dollars received at the beginning of year t.

Page 27: Chapter 18 Deterministic Dynamic Programming

We can incorporate this idea into the dynamic programming recursion by replacing the previous equation with

ft(bt) = max over xt of {r(xt) - c(xt, bt) + βft+1[1.2(bt - xt)]}

where 0 ≤ xt ≤ bt.

Then we redefine ft(bt) to be the maximum net profit (measured in year t dollars) that can be earned during years t, t+1, …, T.

This approach can be used to account for the time value of money in any dynamic programming formulation.

Page 28: Chapter 18 Deterministic Dynamic Programming

Computational Difficulties in Using Dynamic Programming

There is a problem that limits the practical application of dynamic programming: in many problems, the state space becomes so large that excessive computational time is required to solve the problem by dynamic programming.

Page 29: Chapter 18 Deterministic Dynamic Programming

18.7 The Wagner-Whitin Algorithm and the Silver-Meal Heuristic

The inventory example in this chapter is a special case of the dynamic lot-size model.

Description of the Dynamic Lot-Size Model

1. Demand dt during period t (t = 1, 2, …, T) is known at the beginning of period 1.

2. Demand for period t must be met on time from inventory or from period t production. The cost c(x) of producing x units during any period is given by c(0) = 0 and, for x > 0, c(x) = K + cx, where K is a fixed cost for setting up production during a period and c is a variable per-unit cost of production.

Page 30: Chapter 18 Deterministic Dynamic Programming

3. At the end of period t, the inventory level it is observed, and a holding cost h·it is incurred, where h is the per-unit holding cost. We let i0 denote the inventory level before period 1 production occurs.

4. The goal is to determine a production level xt for each period t that minimizes the total cost of meeting (on time) the demands for periods 1, 2, …, T.

5. There is a limit ct placed on period t's ending inventory.

6. There is a limit rt placed on period t's production.

Wagner and Whitin have developed a method that greatly simplifies the computation of optimal production schedules for dynamic lot-size models.

Wagner and Whitin have developed a method that greatly simplifies the computation of optimal production schedules for dynamic lot-size models.

Page 31: Chapter 18 Deterministic Dynamic Programming

Lemmas 1 and 2 are necessary for the development of the Wagner-Whitin algorithm.

Lemma 1: Suppose it is optimal to produce a positive quantity during period t. Then for some j = 0, 1, …, T - t, the amount produced during period t must be such that after period t's production, a quantity dt + dt+1 + … + dt+j will be in stock. In other words, if production occurs during period t, we must (for some j) produce an amount that exactly suffices to meet the demands for periods t, t+1, …, t+j.

Page 32: Chapter 18 Deterministic Dynamic Programming

Lemma 2: If it is optimal to produce anything during period t, then it-1 < dt. In other words, production cannot occur during period t unless there is insufficient stock to meet period t demand.

With the possible exception of the first period, production will occur only during periods in which beginning inventory is zero, and during each period in which beginning inventory is zero (and dt ≠ 0), production must occur.

Page 33: Chapter 18 Deterministic Dynamic Programming

Using this insight, Wagner and Whitin developed a recursion that can be used to determine an optimal production policy:

ft = min over j = 0, 1, 2, …, T - t of (ctj + ft+j+1)

where ctj is the total cost incurred during periods t, t+1, …, t+j if production during period t exactly meets the demands for periods t, t+1, …, t+j, ft is the minimum cost of meeting the demands for periods t, t+1, …, T when period t begins with zero inventory, and fT+1 = 0.
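The recursion is easy to implement once ctj is spelled out. In this sketch (the demands and costs are illustrative, not the book's example) a unit produced in period t to cover period t+i is held for i periods, and periods are 0-indexed so that f[T] plays the role of fT+1 = 0:

```python
demand = [2, 4, 3, 5]
K, c, h = 5.0, 1.0, 0.8          # setup, unit, and holding cost
T = len(demand)

f = [0.0] * (T + 1)              # f[T] = 0: no periods left to cover
decision = [0] * T               # periods covered by the setup made in period t
for t in range(T - 1, -1, -1):
    best, best_cover = float("inf"), 0
    for j in range(T - t):       # produce d_t + ... + d_{t+j} in period t
        lot = sum(demand[t:t + j + 1])
        # the units for period t+i sit in inventory for i periods
        hold = h * sum(i * demand[t + i] for i in range(j + 1))
        cost = K + c * lot + hold + f[t + j + 1]     # c_{tj} + f_{t+j+1}
        if cost < best:
            best, best_cover = cost, j + 1
    f[t], decision[t] = best, best_cover

print(f[0])  # minimum total cost
```

Only states with zero entering inventory are ever evaluated, which is exactly the reduction Lemmas 1 and 2 justify. For this data the optimal plan sets up in periods 1 and 3, each setup covering two periods.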

Page 34: Chapter 18 Deterministic Dynamic Programming

The Silver-Meal Heuristic

The Silver-Meal (S-M) heuristic involves less work than the Wagner-Whitin algorithm and can be used to find a near-optimal production schedule.

The S-M heuristic is based on the fact that our goal is to minimize the average cost per period.

In extensive testing, the S-M heuristic usually yielded a production schedule costing less than 1% above the optimal policy obtained by the Wagner-Whitin algorithm.
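A sketch of the rule on illustrative data (ours, not the book's): starting each lot in a period with zero inventory, keep appending the next period's demand to the lot while the average setup-plus-holding cost per period covered keeps falling; when it rises, close the lot and start a new one.

```python
demand = [2, 4, 3, 5]
K, h = 5.0, 0.8                  # setup cost, per-unit holding cost
T = len(demand)

lots = []                        # (start period, number of periods covered)
t = 0
while t < T:
    j, hold = 0, 0.0
    avg = K                      # covering 1 period costs K, no holding
    while t + j + 1 < T:
        # demand for period t+j+1 would be held j+1 periods
        trial_hold = hold + h * (j + 1) * demand[t + j + 1]
        trial_avg = (K + trial_hold) / (j + 2)
        if trial_avg >= avg:     # average cost per period went up: stop
            break
        j, hold, avg = j + 1, trial_hold, trial_avg
    lots.append((t, j + 1))
    t += j + 1

print(lots)  # list of (start period, periods covered) for each setup
```

The heuristic looks only one period ahead at each step, which is why it is so much cheaper than the full Wagner-Whitin recursion; on this particular data it happens to reproduce the optimal two-setup schedule.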

Page 35: Chapter 18 Deterministic Dynamic Programming

18.8 Using Excel to Solve Dynamic Programming Problems

Excel can often be used to solve DP problems. The examples in the book demonstrate how spreadsheets can be used to solve such problems.

Refer to these Excel files for more information: Dpknap.xls, Dpresour.xls, Dpinv.xls.