
Chapter 1

Stochastic Linear and Nonlinear Programming

1.1 Optimal land usage under stochastic uncertainties

1.1.1 Extensive form of the stochastic decision program

We consider a farmer who has a total of 500 acres of land available for growing wheat, corn and sugar beets. We denote by x1, x2, x3 the amount of acres of land devoted to wheat, corn and sugar beets, respectively.

The planting costs per acre are $ 150, $ 230, and $ 260 for wheat, corn and sugar beets.

The farmer needs at least 200 tons (T) of wheat and 240 T of corn for cattle feed, which can be grown on the farm or bought from a wholesaler. We refer to y1, y2 as the amount of wheat resp. corn (in tons) purchased from a wholesaler. The purchase prices per ton are $ 238 for wheat and $ 210 for corn.

The amount of wheat and corn produced in excess will be sold at prices of $ 170 per ton for wheat and $ 150 per ton for corn. For sugar beets there is a quota on production, which is 6000 T for the farmer. Any amount of sugar beets up to the quota can be sold at $ 36 per ton; any amount in excess of the quota can only be sold at the reduced price of $ 10 per ton. We denote by w1 and w2 the amount in tons of wheat resp. corn sold and by w3, w4 the amount of sugar beets sold at the favorable price and the reduced price, respectively.

The farmer knows that the average yield on his land is 2.5 T, 3.0 T and 20.0 T per acre for wheat, corn and sugar beets.

The data are shown in Table 1.

Table 1. Data for optimal land usage

                         Wheat   Corn   Sugar Beets
Yield (T/acre)           2.5     3.0    20.0
Planting cost ($/acre)   150     230    260
Purchase price ($/T)     238     210    –
Selling price ($/T)      170     150    36 (under 6000 T)
                                        10 (above 6000 T)
Minimum requirement (T)  200     240    –
Total available land: 500 acres


The farmer wants to maximize his profit. Based on the above data, this amounts to the solution of the following linear program (costs minus revenues are minimized):

(1.1)  minimize  150x1 + 230x2 + 260x3 + 238y1 − 170w1
                + 210y2 − 150w2 − 36w3 − 10w4

subject to  x1 + x2 + x3 ≤ 500 ,
            2.5x1 + y1 − w1 ≥ 200 ,
            3.0x2 + y2 − w2 ≥ 240 ,
            w3 + w4 ≤ 20x3 ,
            w3 ≤ 6000 ,
            x1, x2, x3, y1, y2, w1, w2, w3, w4 ≥ 0 .

The solution of (1.1) is shown in Table 2.

Table 2. Solution of the linear program (’average yields’)

Culture          Wheat   Corn   Sugar Beets
Surface (acres)  120     80     300
Yield (T)        300     240    6000
Purchases (T)    –       –      –
Sales (T)        100     –      6000
Maximum profit: $ 118,600
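For reproducibility, the LP (1.1) can be solved in a few lines of code. The following is a minimal sketch using SciPy's linprog; the variable ordering and sign conventions are ours:

```python
import numpy as np
from scipy.optimize import linprog

# Variables: x1, x2, x3, y1, y2, w1, w2, w3, w4
c = np.array([150, 230, 260, 238, 210, -170, -150, -36, -10])

# All constraints in <=-form (the >=-constraints of (1.1) are multiplied by -1)
A_ub = np.array([
    [1.0,  1.0,   1.0,  0,  0, 0, 0, 0, 0],   # x1 + x2 + x3 <= 500
    [-2.5, 0.0,   0.0, -1,  0, 1, 0, 0, 0],   # 2.5 x1 + y1 - w1 >= 200
    [0.0, -3.0,   0.0,  0, -1, 0, 1, 0, 0],   # 3.0 x2 + y2 - w2 >= 240
    [0.0,  0.0, -20.0,  0,  0, 0, 0, 1, 1],   # w3 + w4 <= 20 x3
    [0.0,  0.0,   0.0,  0,  0, 0, 0, 1, 0],   # w3 <= 6000
])
b_ub = np.array([500, -200, -240, 0, 6000])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 9)
print(res.x[:3], -res.fun)   # acreage [120. 80. 300.], profit 118600.0
```

The sign of the optimal value is flipped in the last line, since (1.1) minimizes costs minus revenues.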

The yield is sensitive to, e.g., weather conditions. We refer to the previously determined optimal solution as the one based on 'average yields' and consider two more scenarios, namely 'above average yields' and 'below average yields' by a margin of ± 20 %. The associated optimal solutions are depicted in Table 3 and Table 4.

Table 3. Solution of the linear program (’above average yields’)

Culture          Wheat    Corn    Sugar Beets
Surface (acres)  183.33   66.67   250
Yield (T)        550      240     6000
Purchases (T)    –        –       –
Sales (T)        350      –       6000
Maximum profit: $ 167,667


Table 4. Solution of the linear program (’below average yields’)

Culture          Wheat   Corn   Sugar Beets
Surface (acres)  100     25     375
Yield (T)        200     60     6000
Purchases (T)    –       180    –
Sales (T)        –       –      6000
Maximum profit: $ 59,950

The mean profit is the average of the maximum profits of the three scenarios, which is $ 115,406.

The problem for the farmer is that he has to decide on the land assignment, i.e., to determine x1, x2, x3, without knowing which of the three scenarios is going to happen with regard to the purchases y1, y2 and sales w1, w2, w3, w4, which depend on the yield. The variables x1, x2, x3 are called the first stage decision variables.

Hence, the decisions depend on the scenarios, which are indexed by j = 1, 2, 3, with j = 1 referring to 'above average yields', j = 2 to 'average yields' and j = 3 to 'below average yields'. We introduce corresponding new variables yij, 1 ≤ i ≤ 2, 1 ≤ j ≤ 3, and wij, 1 ≤ i ≤ 4, 1 ≤ j ≤ 3. For instance, w31 represents the amount of sugar beets sold at the favorable price in the case of 'above average yields'. The decision variables yij, wij are referred to as the second stage decision variables.

We assume that the three scenarios occur with the same probability of 1/3. If the objective is to maximize the long-run profit, we are led to the following problem:

(1.2)  minimize  150x1 + 230x2 + 260x3
                − (1/3)(170w11 − 238y11 + 150w21 − 210y21 + 36w31 + 10w41)
                − (1/3)(170w12 − 238y12 + 150w22 − 210y22 + 36w32 + 10w42)
                − (1/3)(170w13 − 238y13 + 150w23 − 210y23 + 36w33 + 10w43)

subject to x1 + x2 + x3 ≤ 500 ,

3.0x1 + y11 − w11 ≥ 200 ,

3.6x2 + y21 − w21 ≥ 240 ,


w31 + w41 ≤ 24x3 ,

w31 ≤ 6000 ,

2.5x1 + y12 − w12 ≥ 200 ,

3.0x2 + y22 − w22 ≥ 240 ,

w32 + w42 ≤ 20x3 ,

w32 ≤ 6000 ,

2.0x1 + y13 − w13 ≥ 200 ,

2.4x2 + y23 − w23 ≥ 240 ,

w33 + w43 ≤ 16x3 ,

w33 ≤ 6000 ,

x, y, w ≥ 0 .

The optimization problem (1.2) is called a stochastic decision problem. In particular, (1.2) is said to be the extensive form of the stochastic program. The reason for this terminology is that it explicitly describes the second stage variables for all possible scenarios. Its optimal solution is shown in Table 5.

Table 5. Solution of the stochastic decision problem (1.2)

                                   Wheat   Corn   Sugar Beets
First stage       Surface (acres)  170     80     250
s = 1 (above      Yield (T)        510     288    6000
average)          Purchases (T)    –       –      –
                  Sales (T)        310     48     6000
s = 2 (average)   Yield (T)        425     240    5000
                  Purchases (T)    –       –      –
                  Sales (T)        225     –      5000
s = 3 (below      Yield (T)        340     192    4000
average)          Purchases (T)    –       48     –
                  Sales (T)        140     –      4000
Maximum profit: $ 108,390

We see that the solution differs from those obtained in the case of perfect a priori information. The distinctive feature is that in a stochastic setting the decisions have to be hedged against the various possible scenarios (cf. Tables 2, 3 and 4). We also see that the expected maximum profit ($ 108,390) differs from the mean value ($ 115,406) of the maximum profits of the three scenarios in case of perfect a priori information. The difference of $ 7,016 is called the Expected Value of Perfect Information (EVPI).

A variant of the above stochastic decision problem arises if the farmer makes the first stage decision (allocation of land) on the basis of 'average yields' according to Table 2. If the yields are again random with 20 % above resp. below average, it has to be observed that the planting costs are deterministic, but the purchases and sales depend on the yield. This leads to a reduced stochastic decision program whose maximum profit turns out to be $ 107,240, which in this case is less than the maximum profit $ 108,390 of the stochastic decision program (1.2). The difference of $ 1,150 is called the Value of the Stochastic Solution (VSS), reflecting the possible gain by solving the full stochastic model.

1.1.2 Two-stage stochastic program with recourse

For a stochastic decision program, we denote by x ∈ Rn1, x ≥ 0, the vector of first stage decision variables. It is subject to constraints

(1.3) Ax ≤ b ,

where A ∈ Rm1×n1, b ∈ Rm1 are a fixed matrix and vector, respectively. In the optimal land usage problem, x = (x1, x2, x3)T represents the amount of acres devoted to the three different crops. Here, m1 = 1 and A = (1 1 1), b = 500.

We further denote by ξ a random vector whose realizations provide information on the second stage decisions y, which is a random vector with realizations in Rn2+. In the optimal land usage problem, ξ = (t1, t2, t3)T with ti = ti(s), 1 ≤ i ≤ 3, where s ∈ {1, 2, 3} stands for the possible scenarios ('above average', 'average', and 'below average'). In other words, ti(s) represents the yield of crop i under scenario s. Moreover, y = (y1, y2, y3 = w1, y4 = w2, y5 = w3, y6 = w4)T is the random vector whose realizations y(s), s ∈ {1, 2, 3}, are the second stage decisions on the amount of crop to be purchased or sold under scenario s. The relationship between x and y can be expressed according to

(1.4) Wy = h − Tx ,

where h ∈ Rm2 is a fixed vector, W ∈ Rm2×n2 is a fixed matrix, and T is a random matrix with realizations T(s) ∈ Rm2×n1. W is called the recourse matrix and T is referred to as the technology matrix.


The second stage decision problems can be stated as

(1.5)  minimize qT y

subject to Wy + Tx ≥ h,

y ≥ 0

for given q ∈ Rn2. We set

(1.6)  Q(x, ξ) := min { qT y | Wy + Tx ≥ h , y ≥ 0 } .

In the optimal land usage problem, a second stage decision problem for scenario s can be written as

(1.7)  minimize 238y1 − 170w1 + 210y2 − 150w2 − 36w3 − 10w4

subject to t1(s)x1 + y1 − w1 ≥ 200 ,

t2(s)x2 + y2 − w2 ≥ 240 ,

t3(s)x3 − w3 − w4 ≥ 0 ,

w3 ≤ 6000 ,

y, w ≥ 0 .

Altogether, the stochastic program can be formulated according to

(1.8)  minimize cT x + EξQ(x, ξ)

subject to Ax ≤ b ,

x ≥ 0 ,

where c ∈ Rn1 is given and Eξ stands for the expectation with respect to ξ.

The problem (1.8) is called a two-stage stochastic program with recourse. It is an implicit representation of the original stochastic decision problem (1.2). Finally, we refer to the function

(1.9) Q(x) := EξQ(x, ξ)

as the value function or recourse function. Then, in even more compact form, (1.8) can be written as

(1.10)  minimize cT x + Q(x)

subject to Ax ≤ b ,

x ≥ 0 .
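To make (1.8)-(1.10) concrete, the recourse function of the land usage problem can be evaluated by solving the three scenario LPs (1.7) and averaging. A small sketch (ours); evaluating it at the first stage decision x = (170, 80, 250) of Table 5 reproduces the optimal value −108,390:

```python
import numpy as np
from scipy.optimize import linprog

def Q_scenario(x, t):
    """Optimal value of the second stage LP (1.7) for yields t = (t1, t2, t3)."""
    # Variables: y1, y2, w1, w2, w3, w4; constraints written in <=-form
    q = np.array([238, 210, -170, -150, -36, -10])
    A_ub = np.array([[-1,  0, 1, 0, 0, 0],   # t1 x1 + y1 - w1 >= 200
                     [ 0, -1, 0, 1, 0, 0],   # t2 x2 + y2 - w2 >= 240
                     [ 0,  0, 0, 0, 1, 1],   # w3 + w4 <= t3 x3
                     [ 0,  0, 0, 0, 1, 0]])  # w3 <= 6000
    b_ub = np.array([t[0]*x[0] - 200, t[1]*x[1] - 240, t[2]*x[2], 6000.0])
    return linprog(q, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)]*6).fun

x = np.array([170.0, 80.0, 250.0])         # first stage decision from Table 5
Q = np.mean([Q_scenario(x, f * np.array([2.5, 3.0, 20.0]))
             for f in (1.2, 1.0, 0.8)])    # EQ(x, xi), scenarios equally likely
print(150*x[0] + 230*x[1] + 260*x[2] + Q)  # -108390.0, i.e. profit $ 108,390
```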


1.1.3 Continuous random variables: The news vendor problem

As an example of a stochastic problem with continuous random variables we consider the so-called news vendor problem. The setting is as follows: Every morning, a news vendor goes to the publisher and buys x newspapers at a price of c per paper. This number is bounded from above by xmax. The vendor tries to sell as many newspapers as possible at a selling price q. Any unsold newspapers can be returned to the publisher at a return price of r < c. The demand for newspapers varies over the days and is described by a continuous random variable ξ with probability distribution F = F(ξ), i.e., P(a ≤ ξ ≤ b) = ∫_a^b dF(ξ) and ∫_{−∞}^{+∞} dF(ξ) = 1.

The objective is to maximize the vendor's profit. To this end, we define y as the effective sales and w as the number of returns (remittents). Then, the problem can be stated as

(1.11)  minimize J(x) := cx + Q(x)

subject to 0 ≤ x ≤ xmax ,

where

(1.12)  Q(x) := EξQ(x, ξ) ,
        Q(x, ξ) := min −qy(ξ) − rw(ξ) ,

subject to y(ξ) ≤ ξ ,

y(ξ) + w(ξ) ≤ x ,

y(ξ), w(ξ) ≥ 0 .

Note that −Q(x) is the expected profit on sales and returns and −Q(x, ξ) stands for the profit on sales and returns in case the demand is given by ξ. We see that, like the optimal land usage problem, (1.11) represents another two-stage stochastic linear program with fixed recourse.
The optimal solution of (1.11) can be easily computed: When the demand ξ is known in the second stage, the optimal solution is given according to

y∗(ξ) = min(ξ, x) , w∗(ξ) = max(x− ξ, 0) ,

and hence, the second stage expected value function turns out to be

Q(x) = Eξ [−q min(ξ, x) − r max(x− ξ, 0)] .


The second stage expected value function can be computed by means of the probability distribution F(ξ):

Q(x) = ∫_{−∞}^{x} (−qξ − r(x − ξ)) dF(ξ) + ∫_{x}^{+∞} (−qx) dF(ξ)
     = −(q − r) ∫_{−∞}^{x} ξ dF(ξ) − rxF(x) − qx(1 − F(x)) .

Integration by parts yields

∫_{−∞}^{x} ξ dF(ξ) = xF(x) − ∫_{−∞}^{x} F(ξ) dξ ,

whence

Q(x) = −qx + (q − r) ∫_{−∞}^{x} F(ξ) dξ .

It follows that Q is differentiable in x with

Q′(x) = −q + (q − r)F (x) .

From Optimization I we know that the optimal solution x∗ of (1.11) satisfies the variational inequality

(v − x∗) J′(x∗) ≥ 0 , v ∈ K := {x | 0 ≤ x ≤ xmax} ,

whose solution is given by

x∗ = 0 , if J ′(0) > 0 ,

x∗ = xmax , if J ′(xmax) < 0 ,

J ′(x∗) = 0 , otherwise .

Since J ′(x) = c + Q′(x), we find

x∗ = 0 , if (q − c)/(q − r) < F(0) ,
x∗ = xmax , if (q − c)/(q − r) > F(xmax) ,
x∗ = F−1((q − c)/(q − r)) , otherwise .
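As a numerical illustration of the critical-ratio formula, consider the following sketch; the price figures and the exponential demand distribution are assumptions made for this example, not data from the notes:

```python
import numpy as np
from scipy.stats import expon

c, q, r = 1.0, 2.0, 0.5           # purchase, selling and return price (r < c < q)
x_max = 500.0
demand = expon(scale=100.0)       # assumed demand distribution, mean 100

ratio = (q - c) / (q - r)         # critical ratio (q - c)/(q - r) = 2/3
if ratio < demand.cdf(0.0):
    x_star = 0.0
elif ratio > demand.cdf(x_max):
    x_star = x_max
else:
    x_star = demand.ppf(ratio)    # F^{-1}((q - c)/(q - r))
print(x_star)                     # about 109.86

# Monte Carlo check: J(x) = c*x - E[q*min(xi, x) + r*max(x - xi, 0)]
xi = demand.rvs(size=200_000, random_state=0)
J = lambda x: c*x - np.mean(q*np.minimum(xi, x) + r*np.maximum(x - xi, 0))
print(min(np.linspace(0, x_max, 2001), key=J))  # close to x_star
```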


1.2 Two-stage stochastic linear programs with fixed recourse

1.2.1 Formulation of the problem and basic properties

Recalling the example concerning optimal land usage in case of stochastic uncertainties, we give the general formulation of a two-stage stochastic linear program with fixed recourse.

Definition 1.1 (Two-stage stochastic linear program with fixed recourse)

Let I ⊂ N be an index set, A ∈ Rm1×n1 and W ∈ Rm2×n2 fixed (deterministic) matrices, T(ω) ∈ Rm2×n1, ω ∈ I, a random matrix, c ∈ Rn1, b ∈ Rm1 fixed (deterministic) vectors, and q(ω) ∈ Rn2, h(ω) ∈ Rm2, ω ∈ I, random vectors w.r.t. a probability space (Ω, A, P). Let further ξ(ω)T = (q(ω)T, h(ω)T, T1(ω), · · · , Tm2(ω)) ∈ RN, N := n2 + m2 + (m2 × n1), ω ∈ I, where Ti(ω), 1 ≤ i ≤ m2, are the rows of T(ω). Then, the problem

(1.13)  minimize cT x + Eξ(min q(ω)T y(ω))

Ax = b ,

T (ω)x + Wy(ω) = h(ω) a.s. ,

x ≥ 0 , y(ω) ≥ 0 a.s. ,

is called a two-stage stochastic linear program with fixed recourse. The matrix W is said to be the recourse matrix and the matrix T(ω) is referred to as the technology matrix.

Definition 1.2 (Deterministic Equivalent Program (DEP))

The linear program

(1.14)  minimize cT x + Q(x)

Ax = b ,

x ≥ 0 ,

where

(1.15)  Q(x) := Eξ(Q(x, ξ(ω))) ,
        Q(x, ξ(ω)) := min_y { q(ω)T y | Wy = h(ω) − T(ω)x , y ≥ 0 } ,

is called the Deterministic Equivalent Program (DEP) associated with (1.13). The function Q is said to be the recourse function or expected second-stage value function.


Definition 1.3 (Feasible sets)

The sets

(1.16)  K1 := {x ∈ Rn1 | Ax = b} ,
        K2 := {x ∈ Rn1 | Q(x) < ∞}

are called the first stage feasible set and the second stage feasible set, respectively.
Let Σ ⊂ RN be the support of ξ in the sense that P(ξ ∈ Σ) = 1. If Σ is finite, Q(x) is the weighted sum of finitely many Q(x, ξ) values. We use the convention that if the values ±∞ occur, then +∞ + (−∞) = +∞. The sets

(1.17)  K2(ξ) := {x ∈ Rn1 | Q(x, ξ) < ∞} ,
        K2^P := {x ∈ Rn1 | for all ξ ∈ Σ there exists y ≥ 0 s.th. Wy = h − Tx} = ⋂_{ξ∈Σ} K2(ξ)

are called the elementary second stage feasible set and the possibility interpretation of the second stage feasible set, respectively.

Definition 1.4 (Relatively complete, complete, and simple recourse)

The stochastic program (1.13) is said to have

• relatively complete recourse, if K1 ⊂ K2,
• complete recourse, if for all z ∈ Rm2 there exists y ≥ 0 such that Wy = z,
• simple recourse, if the recourse matrix W has the structure W = [I  −I].

In case of simple recourse, we partition y and q according to y = (y+, y−) and q = (q+, q−). Then, the optimal values (yi+(ω), yi−(ω)) only depend on the sign of hi(ω) − Ti(ω)x, provided qi = qi+ + qi− ≥ 0 with probability one. Moreover, if hi has an associated distribution function Fi and mean value h̄i, there holds

(1.18)  Qi(x) = qi+ h̄i − (qi+ − qi Fi(Tix)) Tix − qi ∫_{hi ≤ Tix} hi dFi(hi) .
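The formula (1.18) is easy to verify numerically. The following sketch (ours) compares it with a Monte Carlo estimate of E[qi+ (hi − χ)+ + qi− (χ − hi)+], the expected shortage/surplus cost with χ = Tix, for a normally distributed hi; all parameter values are illustrative:

```python
import numpy as np
from scipy.stats import norm

qp, qm = 4.0, 1.0                 # q_i^+, q_i^- (shortage / surplus penalties)
q = qp + qm                       # q_i = q_i^+ + q_i^-
mu, sd = 10.0, 2.0                # h_i ~ N(mu, sd^2), so the mean value is mu
chi = 11.0                        # chi = T_i x

# Formula (1.18); for the normal law, int_{h <= chi} h dF = mu*Phi(z) - sd*phi(z)
z = (chi - mu) / sd
partial_mean = mu * norm.cdf(z) - sd * norm.pdf(z)
Q_formula = qp * mu - (qp - q * norm.cdf(z)) * chi - q * partial_mean

# Monte Carlo estimate of E[qp*(h - chi)^+ + qm*(chi - h)^+]
h = norm.rvs(mu, sd, size=1_000_000, random_state=0)
Q_mc = np.mean(qp * np.maximum(h - chi, 0) + qm * np.maximum(chi - h, 0))
print(Q_formula, Q_mc)            # the two values agree to a few decimals
```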


Theorem 1.1 (Characterization of second stage feasible sets)

(i) For each ξ, the elementary second stage feasible set K2(ξ) is a closed convex polyhedron, which implies that K2^P is a closed convex set.
(ii) Moreover, if Σ is finite, K2^P = K2.

Proof: The proof of (i) is obvious. For the proof of (ii) assume x ∈ K2. Then, Q(x) is bounded from above. Hence, Q(x, ξ) is bounded from above for each ξ, which shows x ∈ K2(ξ) for all ξ, whence x ∈ K2^P. Conversely, assume x ∈ K2^P. Then, Q(x, ξ) is bounded from above for all ξ. We deduce that Q(x) is bounded from above and hence, x ∈ K2. ¤

We note that in case ξ is a continuous random variable, similar results hold true if ξ has finite second moments. For details we refer to [6] and [9].

Theorem 1.2 (Properties of the second stage value function)

Assume Q(x, ξ) > −∞. Then, there holds

(i) Q(x, ξ) is piecewise linear and convex in (h, T ).

(ii) Q(x, ξ) is piecewise linear and concave in q.

(iii) Q(x, ξ) is piecewise linear and convex in x for all x ∈ K1 ∩K2.

Proof: The piecewise linearity in (i)-(iii) follows from the existence of finitely many optimal bases for the second stage program. For details we refer to [7].
For the proof of the convexity in (h, T) resp. in x, it suffices to prove that the function

g(z) := min { qT y | Wy = z , y ≥ 0 }

is convex in z. For λ ∈ [0, 1] and z1, z2 with z1 ≠ z2, we consider z(λ) := λz1 + (1 − λ)z2 and denote by yi∗, 1 ≤ i ≤ 2, optimal solutions of the minimization problem for z = z1 and z = z2, respectively. Then, y∗(λ) := λy1∗ + (1 − λ)y2∗ is a feasible solution for z = z(λ). If yλ∗ is the corresponding optimal solution, we obtain

g(z(λ)) = qT yλ∗ ≤ qT y∗(λ) = λ qT y1∗ + (1 − λ) qT y2∗ = λ g(z1) + (1 − λ) g(z2) .

The proof of the concavity in q is left as an exercise. ¤
For similar results in case ξ is a continuous random variable with finite second moments, we again refer to [8].


1.2.2 Optimality conditions

For the derivation of the optimality conditions (KKT conditions), we assume that (1.14) has a finite optimal value. We refer to [8] for conditions that guarantee finiteness of the optimal value.

Theorem 1.3 (Optimality conditions)

Assume that (1.14) has a finite optimal value. A solution x∗ ∈ K1 of (1.14) is optimal if and only if there exist λ∗ ∈ Rm1 and µ∗ ∈ Rn1 with µ∗ ≥ 0 and (µ∗)T x∗ = 0, such that

(1.19) −c + AT λ∗ + µ∗ ∈ ∂Q(x∗) ,

where ∂Q(x∗) denotes the subdifferential of the recourse function Q.

Proof: As we know from Optimization I, the minimization problem

minimize J(x) := cT x + Q(x) ,

subject to Ax = b , x ≥ 0

is a convex optimization problem with a closed convex constraint set which can be equivalently written as

(1.20)  inf_{x∈Rn1} sup_{λ∈Rm1, µ∈Rn1, µ≥0} L(x, λ, µ) ,

where the Lagrangian is given by

L(x, λ, µ) := J(x) − λT (Ax− b) − µT x .

The optimality condition for (1.20) reads

0 ∈ ∂L(x∗, λ∗, µ∗) .

The subdifferential of the Lagrangian turns out to be

∂L(x∗, λ∗, µ∗) = c + ∂Q(x∗) − AT λ∗ − µ∗ ,

which results in (1.19). ¤

Obviously, the difficult task is to evaluate the subdifferential of the recourse function. The following result shows that it can be decomposed into subgradients of the recourse for each realization of ξ.

Theorem 1.4 (Decomposition of the subgradient of the recourse function)

For x ∈ K there holds

(1.21)  ∂Q(x) = Eω ∂Q(x, ξ(ω)) + N(K2, x) ,
        N(K2, x) = {v ∈ Rn1 | vT y ≤ 0 for all y s.th. x + y ∈ K2} ,


where N(K2, x) is the normal cone of the second stage feasible set K2.

Proof: The subdifferential calculus of random convex functions with finite expectations [10] yields

∂Q(x) = Eω∂Q(x, ξ(ω)) + rec(∂Q(x)) ,

where rec(∂Q(x)) is the recession cone of the subdifferential according to

rec(∂Q(x)) = {v ∈ Rn1 | u + λv ∈ ∂Q(x) , λ ≥ 0 , u ∈ ∂Q(x)} .

The recession cone can be equivalently written as

rec(∂Q(x)) = {v ∈ Rn1 | yT(u + λv) + Q(x) ≤ Q(x + y) , λ ≥ 0 , y ∈ Rn1} .

Consequently, we have

v ∈ rec(∂Q(x)) ⇐⇒ yT v ≤ 0 for all y s.th. Q(x + y) < ∞ .

Recalling the definition of K2, we conclude. ¤

Corollary 1.5 (Optimality conditions in case of relatively complete recourse)

Assume that (1.14) has relatively complete recourse. Then, a solution x∗ ∈ K1 of (1.14) is optimal if and only if there exist λ∗ ∈ Rm1 and µ∗ ∈ Rn1 with µ∗ ≥ 0 and (µ∗)T x∗ = 0, such that

(1.22) −c + AT λ∗ + µ∗ ∈ Eω∂Q(x∗, ξ(ω)) .

Proof: Taking into account that under the assumption of relatively complete recourse there holds

N(K2, x) ⊂ N(K1, x) = {v ∈ Rn1 | v = AT λ + µ , µ ≥ 0 , µT x = 0} ,

the result follows from Theorems 1.3 and 1.4. ¤

Corollary 1.6 (Optimality conditions in case of simple recourse)

Assume that (1.14) has simple recourse. Then, a solution x∗ ∈ K1 of (1.14) is optimal if and only if there exist λ∗ ∈ Rm1, µ∗ ∈ Rn1 with µ∗ ≥ 0, (µ∗)T x∗ = 0, and π∗ ∈ Rm2 with

−(qi+ − qi Fi(Tix∗)) ≤ π∗i ≤ −(qi+ − qi Fi+(Tix∗)) ,

where Fi+(h) := lim_{t→h+} Fi(t), such that

(1.23) −c + AT λ∗ + µ∗ − (π∗)T T = 0 .

Proof: We deduce from (1.18) that

∂Qi(x) = { πi (Ti)T | −(qi+ − qi Fi(Tix)) ≤ πi ≤ −(qi+ − qi Fi+(Tix)) } .

Then, (1.23) follows readily from Theorem 1.3. ¤


1.2.3 The value of information

Definition 1.5 (Expected value of perfect information)

For a particular realization ξ = ξ(ω), ω ∈ I, we consider the objective functional

J(x, ξ) := cT x + min { qT y | Wy = h − Tx , y ≥ 0 }

and the associated minimization problem

min_{x∈K1} EξJ(x, ξ) , K1 := {x ∈ Rn1+ | Ax = b} .

The optimal solution is sometimes referred to as the here-and-now solution (cf. (1.13)). We denote the optimal value of this recourse problem by

(1.24)  RP := min_{x∈K1} EξJ(x, ξ) .

Another related minimization problem is to find the optimal solution for each possible scenario and to consider the expected value of the associated optimal values

(1.25)  WS := Eξ min_{x∈K1} J(x, ξ) .

The optimal solution of (1.25) is called the wait-and-see solution. The difference between the optimal values of the here-and-now solution and the wait-and-see solution

(1.26) EV PI := RP − WS

is referred to as the expected value of perfect information.

Example: In the optimal land usage problem, we found

WS = −$115,406 , RP = −$108,390 ,

so that EV PI = $7,016. This amount represents the value of perfect information w.r.t. the weather conditions for the next season.

The computation of the wait-and-see solution requires a considerable amount of computational work. Replacing all random variables by their expectations leads us to a quantity which is called the value of the stochastic solution.


Definition 1.6 (Value of the stochastic solution)

We denote by ξ̄ := E(ξ) the expectation of ξ and consider the minimization problem

(1.27)  EV := min_{x∈K1} J(x, ξ̄) ,

which is dubbed the expected value problem or mean value problem. An optimal solution of (1.27) is called the expected value solution. Denoting an optimal solution by x(ξ̄), the expected value

(1.28)  EEV := EξJ(x(ξ̄), ξ)

is referred to as the expected result of using the expected value solution. The difference

(1.29) V SS := EEV − RP

is called the value of the stochastic solution. It measures the performance of x(ξ̄) w.r.t. second stage decisions optimally chosen as functions of x(ξ̄) and ξ.

Example: In the optimal land usage problem, we have

EEV = −$107,240 , RP = −$108,390 ,

so that V SS = $1,150. This amount represents the cost of ignoring uncertainty in the choice of a decision.

Theorem 1.7 (Fundamental inequalities, Part I)

Let RP, WS, and EEV be given by (1.24), (1.25) and (1.28), respectively. Then, there holds

(1.30) WS ≤ RP ≤ EEV .

Proof: If x∗ denotes the optimal solution of the recourse problem (1.24) and x(ξ) is the wait-and-see solution, we have

J(x(ξ), ξ) ≤ J(x∗, ξ) .

Taking the expectation on both sides results in the left inequality in (1.30). Since x∗ is the optimal solution of (1.24), whereas x(ξ̄) is just one feasible point for the recourse problem, we arrive at the second inequality in (1.30). ¤


Theorem 1.8 (Fundamental inequalities, Part II)

Let WS and EV be given by (1.25) and (1.27). Then, in case of fixed (deterministic) objective coefficients and a fixed (deterministic) technology matrix T, there holds

(1.31) EV ≤ WS .

Proof: We define

f(ξ) := min_{x∈K1} J(x, ξ) .

Recalling (1.25) and (1.27), we see that (1.31) is equivalent to

(1.32) f(E(ξ)) ≤ E(f(ξ)) .

Since (1.32) holds true for convex functions according to Jensen's inequality, the only thing we have to prove is the convexity of f. In order to do that, by duality we have

min_{x∈K1} J(x, ξ) = max_{σ,π} { σT b + πT h | σT A + πT T ≤ cT , πT W ≤ qT } .

We observe that the constraints of the dual problem remain unchanged for all ξ = h. Hence, f is the pointwise maximum of the linear functions σT b + πT h over all feasible σ, π, and epi f is the intersection of the epigraphs of these linear functions. The latter are obviously convex, and so is then epi f. We know from Optimization I that a function is convex if and only if its epigraph is convex. ¤
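Example: In the optimal land usage problem, the expected value problem (1.27) is precisely the 'average yields' program (1.1), so that EV = −$118,600 and indeed EV ≤ WS = −$115,406, in accordance with (1.31).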

Theorem 1.9 (Fundamental inequalities, Part III)

Let RP and EEV be given by (1.24) and (1.28). Assume further that x∗ is an optimal solution of (1.24) and that x(ξ̄) is a solution of the expected value problem (1.27). Then, there holds

(1.33) RP ≥ EEV + (x∗ − x(ξ̄))T η , η ∈ ∂EξJ(x(ξ̄), ξ) .

Proof: The proof is left as an exercise. ¤

We finally derive an upper bound for the optimal value RP of the recourse problem (1.24) which is based on the observation

(1.34)  RP = min_{x∈K1} EξJ(x, ξ) ,
(1.35)  J(x, ξ) := cT x + min { qT y | Wy ≥ h(ξ) − Tx , y ≥ 0 } .

Note that in (1.35) the second stage constraints are inequalities.


Theorem 1.10 (Fundamental inequalities, Part IV)

Assume that h(ξ) is bounded from above, i.e., there exists hmax such that h(ξ) ≤ hmax for all possible realizations of ξ. Let xmax be an optimal solution of

min_{x∈K1} J(x, hmax) .

Then, there holds

(1.36) RP ≤ J(xmax, hmax) .

Proof: We see that for any ξ ∈ Σ and x ∈ K1, a feasible solution of Wy ≥ hmax − Tx, y ≥ 0, is also a feasible solution of Wy ≥ h(ξ) − Tx, y ≥ 0. Consequently, we have

J(x, hmax) ≥ J(x, h(ξ)) =⇒ J(x, hmax) ≥ EξJ(x, h(ξ)) ,

whence

J(x, hmax) ≥ min_{x∈K1} EξJ(x, h(ξ)) = RP . ¤

There is no universal relationship between EV PI and V SS. For a discussion of this issue we refer to [1].


1.3 Numerical solution of two-stage stochastic linear programs with fixed recourse

1.3.1 The L-shaped method

We consider the deterministic equivalent program of a two-stage stochastic linear program with fixed recourse (cf. (1.13) and (1.14))

(1.37)  minimize cT x + Q(x)

subject to Ax = b , x ≥ 0 ,

where

(1.38)  Q(x) := Eξ(Q(x, ξ(ω))) ,
        Q(x, ξ(ω)) := min_y { q(ω)T y | Wy = h(ω) − T(ω)x , y ≥ 0 } .

The computational burden w.r.t. the DEP (1.37),(1.38) is the solution of all second stage recourse linear programs. If the random vector ξ only has a finite number of, say, K realizations with probabilities pk, 1 ≤ k ≤ K, the computational work can be significantly reduced by associating one set of second stage decisions yk with each realization of ξ, i.e., with each realization of qk, hk, and Tk, 1 ≤ k ≤ K. In other words, we consider the following extensive form

(1.39)  minimize cT x + Σ_{k=1}^{K} pk (qk)T yk

subject to Ax = b ,

Tkx + Wyk = hk , 1 ≤ k ≤ K ,

x ≥ 0 , yk ≥ 0 , 1 ≤ k ≤ K .

⎡ A              ⎤
⎢ T1  W          ⎥
⎢ T2      W      ⎥
⎢ ⋮          ⋱   ⎥
⎣ TK           W ⎦

Fig. 1.1. Block structure of the L-shaped method


The L-shaped method is an iterative process with feasibility cuts and optimality cuts according to the block structure of the extensive program illustrated in Fig. 1.1, which gives the method its name.

L-shaped algorithm

Step 0 (Initialization): Set r = s = ν = 0 and Dr = 0, dr = 0, Es = 0, es = 0.

Step 1 (Iteration loop): Set ν = ν + 1 and solve the linear program

(1.40a)  minimize J(x, θ) := cT x + θ
(1.40b)  subject to Ax = b ,
(1.40c)             Dℓ x ≥ dℓ , ℓ = 1, · · · , r ,
(1.40d)             Eℓ x + θ ≥ eℓ , ℓ = 1, · · · , s ,
(1.40e)             x ≥ 0 , θ ∈ R .

Denote an optimal solution by (xν, θν). If no constraints (1.40d) are present, θ is not considered in the minimization and we set θν = −∞.

Step 2 (Feasibility cuts): For k = 1, · · · , K until J(y, v+, v−) > 0 solve the linear program

(1.41a)  minimize J(y, v+, v−) := eT v+ + eT v−
(1.41b)  subject to Wy + v+ − v− = hk − Tkxν ,
(1.41c)             y ≥ 0 , v+ ≥ 0 , v− ≥ 0 ,

where e := (1, · · · , 1)T. For the first 1 ≤ k ≤ K with J(y, v+, v−) > 0, let σν be the associated Lagrange multiplier and define the feasibility cut

(1.42a)  Dr+1 := (σν)T Tk ,
(1.42b)  dr+1 := (σν)T hk .

Set r = r + 1 and go back to Step 1. If J(y, v+, v−) = 0 for all 1 ≤ k ≤ K, go to Step 3.

Step 3 (Optimality cuts): For k = 1, · · · , K solve the linear program

(1.43a)  minimize J(y) := (qk)T y
(1.43b)  subject to Wy = hk − Tkxν ,
(1.43c)             y ≥ 0 .


Let πνk be the Lagrange multiplier associated with an optimal solution and define the optimality cut

(1.44a)  Es+1 := Σ_{k=1}^{K} pk (πνk)T Tk ,
(1.44b)  es+1 := Σ_{k=1}^{K} pk (πνk)T hk .

Set Jν := es+1 − Es+1 xν. If θν ≥ Jν, stop the iteration: xν is an optimal solution. Otherwise, set s = s + 1 and go back to Step 1.

1.3.2 Illustration of feasibility and optimality cuts

Example (Optimality cuts): We consider the minimization problem

(1.45)  minimize Q(x)

subject to 0 ≤ x ≤ 10 ,

where Q(x) := EξQ(x, ξ) with

(1.46)  Q(x, ξ) := ξ − x ,  if x ≤ ξ ,
        Q(x, ξ) := x − ξ ,  if x ≥ ξ ,

where ξ1 = 1, ξ2 = 2, ξ3 = 4 are the possible realizations of ξ with probabilities pk = 1/3, 1 ≤ k ≤ 3. Note that

W = 1 , qk = 1 , and

Tk = 1 ,  hk = ξk ,   if x ≤ ξk ,
Tk = −1 , hk = −ξk ,  if x > ξk .

Set r = s = 0, Dr = 0, dr = 0, Es = 0, es = 0, and ν = 1, x1 = 0, θ1 = −∞, and begin the iteration with Step 2:

Iteration 1: In Step 2, we find J(y, v+, v−) = 0, 1 ≤ k ≤ 3, since x1 = 0 is feasible. In Step 3, the solution of (1.43) yields y = (1, 2, 4)T with π1k = 1, 1 ≤ k ≤ 3. Hence, (1.44a),(1.44b) give rise to

E1 = 1 , e1 = 7/3 , J1 = 7/3 .

Set s = 1 and begin Iteration 2.

Iteration 2: In Step 1, the solution of the minimization problem

minimize θ
subject to θ ≥ 7/3 − x ,
           0 ≤ x ≤ 10 , θ ∈ R

is x2 = 10, θ2 = −23/3.

Step 2 does not result in a feasibility cut, since x2 is feasible. In Step 3, the solution of (1.43) yields y = (9, 8, 6)T with π2k = 1, 1 ≤ k ≤ 3.

Hence, from (1.44a),(1.44b) we obtain

E2 = −1 , e2 = −7/3 , J2 = 23/3 .

Set s = 2 and begin Iteration 3.

Iteration 3: In Step 1, the solution of the minimization problem

minimize θ
subject to θ ≥ 7/3 − x ,
           θ ≥ x − 7/3 ,
           0 ≤ x ≤ 10 , θ ∈ R

is x3 = 7/3, θ3 = 0.

Step 2 does not result in a feasibility cut, since x3 is feasible. In Step 3, the solution of (1.43) gives y = (4/3, 1/3, 5/3)T with π3k = 1, 1 ≤ k ≤ 3. The equations (1.44a),(1.44b) imply

E3 = −1/3 , e3 = 1/3 , J3 = 10/9 .

Set s = 3 and begin Iteration 4.

Iteration 4: In Step 1, the solution of the minimization problem

minimize θ
subject to θ ≥ 7/3 − x ,
           θ ≥ x − 7/3 ,
           θ ≥ x/3 + 1/3 ,
           0 ≤ x ≤ 10 , θ ∈ R

is x4 = 3/2, θ4 = 5/6.

Step 2 does not result in a feasibility cut, since x4 is feasible. In Step 3, the solution of (1.43) results in y = (1/2, 1/2, 5/2)T with π4k = 1, 1 ≤ k ≤ 3. The equations (1.44a),(1.44b) imply

E4 = 1/3 , e4 = 5/3 , J4 = 7/6 .

Set s = 4 and begin Iteration 5.

Page 22: Chapter 1 Stochastic Linear and Nonlinear Programming …rohop/Spring_15/Chapter1.pdf · Chapter 1 Stochastic Linear and Nonlinear Programming 1.1 Optimal land usage under stochastic

22 Ronald H.W. Hoppe

Iteration 5: In Step 1, the solution of the minimization problem

minimize θ
subject to θ ≥ 7/3 − x ,
           θ ≥ x − 7/3 ,
           θ ≥ x/3 + 1/3 ,
           θ ≥ 5/3 − x/3 ,
           0 ≤ x ≤ 10 , θ ∈ R

is x5 = 2, θ5 = 1.
Step 2 does not result in a feasibility cut, since x5 is feasible. In Step 3, the solution of (1.43) is y = (1, 0, 2)T with π5k = 1, 1 ≤ k ≤ 3. The equations (1.44a),(1.44b) yield

E5 = 1/3 , e5 = 5/3 , J5 = 1 .

Since J5 = θ5, we have found the optimal solution x5 = 2.
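The whole optimality-cut loop above fits in a few lines of code. A sketch (ours) for this example; since Q(x, ξ) = |x − ξ|, the aggregated cut (1.44) is simply θ ≥ Q(xν) + g(x − xν) with the averaged subgradient g = Σk pk sign(xν − ξk):

```python
import numpy as np
from scipy.optimize import linprog

xi = np.array([1.0, 2.0, 4.0])            # realizations of xi, probability 1/3 each
Q = lambda x: np.mean(np.abs(x - xi))     # recourse function Q(x) = E|x - xi|

cuts = []                                  # cut l: theta >= gl * x + cl
x, theta = 0.0, -np.inf                    # initialization: x^1 = 0, theta^1 = -inf
while theta < Q(x) - 1e-12:                # stop as soon as theta^nu >= J^nu
    g = np.mean(np.sign(x - xi))           # averaged subgradient of Q at x^nu
    cuts.append((g, Q(x) - g * x))         # new optimality cut
    # master program (1.40): minimize theta subject to all cuts, 0 <= x <= 10
    A_ub = np.array([[gl, -1.0] for gl, cl in cuts])   # gl*x - theta <= -cl
    b_ub = np.array([-cl for gl, cl in cuts])
    res = linprog([0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 10), (None, None)])      # variables (x, theta)
    x, theta = res.x
print(x, theta)                            # 2.0 1.0, i.e. x* = 2 with Q(x*) = 1
```

The iterates x = 10, 7/3, 3/2, 2 and the generated cuts coincide with those computed by hand above.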

Example (Feasibility cuts): We consider the minimization problem

(1.47)  minimize 3x1 + 2x2 + Eξ(15y1 + 12y2)

subject to 3y1 + 2y2 ≤ x1 ,

2y1 + 5y2 ≤ x2 ,

0.8ξ1 ≤ y1 ≤ ξ1 ,

0.8ξ2 ≤ y2 ≤ ξ2 ,

x ≥ 0 , y ≥ 0 ,

where ξ = (ξ1, ξ2)T with ξ1 ∈ {4, 6} and ξ2 ∈ {4, 8} independently with probability 1/2 each.
This example represents an investment decision in two resources x1 and x2 which are needed in the second stage to cover at least 80 % of the demand.
Note that

c = (3, 2)T , pk = 1/2 , 1 ≤ k ≤ 2 , q = (15, 12)T ,

W = ( 3  2 )        T = ( −1   0 )
    ( 2  5 ) ,          (  0  −1 ) .

Consider the realization ξ = (6, 8)T. Set r = s = 0, Dr = 0, dr = 0, Es = 0, es = 0, and ν = 1, x1 = (0, 0)T, θ1 = −∞, and begin the iteration with Step 2, which results in a first feasibility cut

3x1 + x2 ≥ 123.2 .

The associated first-stage solution provided by Step 1 is

x1 = (41.067, 0)T .

The following Step 2 gives rise to the feasibility cut

x2 ≥ 22.4 .

Going back to Step 1 and computing the associated first-stage solution gives

x2 = (33.6, 22.4)T .

Step 2 results in a third feasibility cut

x2 ≥ 41.6

with the associated first-stage solution

x3 = (27.2, 41.6)T

which guarantees feasible second-stage decisions.

Remark: This example illustrates that the formal application of the feasibility cuts can lead to an inefficient procedure. A closer look at the problem at hand shows that in case ξ1 = 6 and ξ2 = 8, feasibility requires

x1 ≥ 27.2 , x2 ≥ 41.6 .

In other words, a reasonable initial program is given by

minimize 3x1 + 2x2 + Q(x) ,

subject to x1 ≥ 27.2 ,

x2 ≥ 41.6 ,

which guarantees second-stage feasibility.
Another particular case where second-stage feasibility is guaranteed (and thus Step 2 of the L-shaped method can be skipped) is a two-stage stochastic linear program with complete recourse, i.e., there exists y ≥ 0 such that Wy = t for all t ∈ Rm2.
For further specific cases where the structure of the program simplifies second-stage feasibility, we refer to [1].
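The bounds x1 ≥ 27.2 and x2 ≥ 41.6 from the remark can be confirmed by a direct computation: for the worst-case realization ξ = (6, 8), one minimizes the first stage cost subject to second-stage feasibility (a sketch, ours):

```python
from scipy.optimize import linprog

# Worst case xi = (6, 8): the second stage needs 4.8 <= y1 <= 6, 6.4 <= y2 <= 8.
# Minimize 3 x1 + 2 x2 subject to 3 y1 + 2 y2 <= x1 and 2 y1 + 5 y2 <= x2.
# Variables: x1, x2, y1, y2
c = [3, 2, 0, 0]
A_ub = [[-1, 0, 3, 2],    # 3 y1 + 2 y2 - x1 <= 0
        [0, -1, 2, 5]]    # 2 y1 + 5 y2 - x2 <= 0
bounds = [(0, None), (0, None), (4.8, 6), (6.4, 8)]
res = linprog(c, A_ub=A_ub, b_ub=[0, 0], bounds=bounds)
print(res.x[:2])          # [27.2 41.6]
```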


Example: We illustrate the implementation of the L-shaped method for the following example:

n1 = 1 , n2 = 6 , m1 = 0 , m2 = 3 ,

c = 0 , W = ( 1 −1 −1 −1  0  0 )
            ( 0  1  0  0  1  0 )
            ( 0  0  1  0  0  1 ) .

The random variable ξ has K = 2 realizations with probability 1/2 each, which are given by

ξ1 = (q1, h1, T1)T , ξ2 = (q2, h2, T2)T ,

where

q1 = (1, 0, 0, 0, 0, 0)T , q2 = (3/2, 0, 2/7, 1, 0, 0)T ,

h1 = (−1, 2, 7)T , h2 = (0, 2, 7)T ,

T1 = (1, 0, 0)T , T2 = T1 .

We note that for ξ1, the recourse function is given by

Q1(x) = −x − 1 ,  if x ≤ −1 ,
Q1(x) = 0 ,       if x ≥ −1 ,

whereas for ξ2 we obtain

Q2(x) = −1.5x ,         if x ≤ 0 ,
Q2(x) = 0 ,             if 0 ≤ x ≤ 2 ,
Q2(x) = (2/7)(x − 2) ,  if 2 ≤ x ≤ 9 ,
Q2(x) = x − 7 ,         if x ≥ 9 .

We further impose the constraints

−20 ≤ x ≤ +20 .

A closer look at the problem reveals that x = 0 is an optimal solution.

Step 0 (Initialization): We choose x0 ≤ −1.

Iteration 1: We obtain

x1 = −2 , θ1 is omitted , New cut: θ ≥ −0.5 − 1.25x .

Iteration 2: The second iteration yields

x2 = 20 , θ2 = −25.5 , New cut: θ ≥ −3.5 + 0.5x .

Iteration 3: The computations result in

x3 = 12/7 , θ3 = −37/14 , New cut: θ ≥ 0 .


Iteration 4: We get

x4 ∈ [−2/5, 7] , θ4 = 0 .

If we choose x4 ∈ [0, 2], iteration 4 terminates. Otherwise, more iterations are needed.

1.3.3 Finite termination property

In this section, we prove that the L-shaped method terminates after a finite number of steps, provided ξ is a finite random variable. In particular, we show that

• a finite number of feasibility cuts (1.40c) is required either to provide a feasible vector within the second stage feasible set K2 = {x ∈ Rn1 | Q(x) < ∞} (cf. (1.16)) or to detect infeasibility of the problem,
• a finite number of optimality cuts (1.40d) is needed to end up with an optimal solution of (1.37),(1.38), provided there exist feasible points x ∈ K2.

We first recall the definition of Q(x) in (1.38):

Q(x) = EωQ(x, ξ(ω)) ,

Q(x, ξ(ω)) = min_y { q(ω)T y | Wy = h(ω) − T(ω)x , y ≥ 0 } .

Lemma 1.11 (Representation of subgradients)

Let πνk, k ∈ {1, · · · , K}, be an optimal multiplier associated with an optimal solution of the minimization problem (1.43) for given xν. Then, there holds

(1.48) −(πνk)T Tk ∈ ∂Q(xν , ξk) .

Proof: The proof is left as an exercise. ¤

Proposition 1.12 (Finite termination of Step 2)

After a finite number of sub-steps, Step 2 (feasibility cuts) of the L-shaped method either terminates with feasible points x ∈ K2 or detects infeasibility of (1.37),(1.38).

Proof: We introduce the set

(1.49) pos W := {t ∈ Rm2 | t = Wy , y ≥ 0}

and note that

(1.50) x ∈ K2 ⇐⇒ hk − Tkx ∈ pos W , 1 ≤ k ≤ K .


Given a first stage solution xν, in Step 2 the minimization problem (1.41) tests whether hk − Tkxν ∈ pos W for all 1 ≤ k ≤ K, or if hk − Tkxν ∉ pos W for some k ∈ {1, · · · , K}. In the latter case, there exists a hyperplane separating hk − Tkxν and pos W, i.e., there exists σ ∈ Rm2 such that

σT t ≤ 0 , t ∈ pos W , and σT(hk − Tkxν) > 0 .

We remark that σ can be chosen as a multiplier σνk associated with (1.41), since

(σνk)T W ≤ 0 and (σνk)T(hk − Tkxν) > 0 .

On the other hand, we have

(1.51) x ∈ K2 =⇒ (σνk)T (hk − Tkx) ≤ 0 , 1 ≤ k ≤ K .

Due to the finiteness of ξ, there is only a finite number of optimal bases of problem (1.41) and hence, there are only finitely many constraints (σνk)T(hk − Tkx) ≤ 0. Consequently, after a finite number of sub-steps we either find feasible points x ∈ K2 or detect infeasibility. ¤

Proposition 1.13 (Finite termination of Step 3)

Assume feasibility of (1.37),(1.38). Then, after a finite number of sub-steps, Step 3 (optimality cuts) of the L-shaped method terminates with an optimal solution.

Proof: We first observe that (1.37),(1.38) can be equivalently stated as

(1.52)  minimize cT x + Θ
        subject to Q(x) ≤ Θ , x ∈ K1 ∩ K2 ,

where K1 is the first stage feasible set K1 := {x ∈ Rn1 | Ax = b} (cf. (1.16)).
We further note that in Step 3 of the L-shaped method we compute the solution of (1.43) along with an associated multiplier πνk. We know from the duality theory of linear programming (cf., e.g., Theorem 1.3(ii) in [4]) that

(1.53) Q(xν, ξk) = (πνk)T(hk − Tkxν) , 1 ≤ k ≤ K .

Moreover, taking

v ∈ ∂Q(xν , ξk) ⇐⇒ vT (x− xν) + Q(xν , ξk) ≤ Q(x, ξk) , x ∈ Rn1

into account, it follows from (1.48) in Lemma 1.11 that

(1.54) (πνk)T Tk (xν − x) + Q(xν, ξk) ≤ Q(x, ξk) .


Using (1.53) in (1.54), we find

(1.55) Q(x, ξk) ≥ (πνk)T (hk − Tkx) .

Denoting by T, h, and πν the random variables with realizations Tk, hk, and πνk, 1 ≤ k ≤ K, respectively, and taking the expectations in (1.53) and (1.55), we get

Q(xν) = E[(πν)T(h − Txν)] = Σ_{k=1}^{K} pk (πνk)T(hk − Tkxν)

and

Q(x) ≥ E[(πν)T(h − Tx)] = Σ_{k=1}^{K} pk (πνk)T(hk − Tkx) .

It follows that a pair (x, Θ) is feasible for (1.52) if and only if

Θ ≥ Q(x) ≥ E(πν)T (h−Tx) ,

which corresponds to (1.40d). On the other hand, if a pair (xν, Θν) is optimal for (1.52), then

Q(xν) = Θν = E(πν)T (h−Txν) .

Consequently, at each sub-step of Step 3 we either find Θν ≥ Q(xν), which means that we have found an optimal solution, or we have Θν < Q(xν), which means that we have to continue with a new first stage solution xν+1 and associated multipliers πν+1k, 1 ≤ k ≤ K, for (1.43). Since there is only a finite number of optimal bases associated with (1.43), there can be only a finite number of different combinations of the multipliers and hence, Step 3 must terminate after a finite number of sub-steps with an optimal solution of (1.37),(1.38). ¤

Combining the results of Proposition 1.12 and Proposition 1.13, we arrive at the following finite convergence result:

Theorem 1.14 (Finite convergence of the L-shaped method)

Assume that ξ is a finite random variable. Then, after a finite number of steps the L-shaped method either terminates with an optimal solution or proves infeasibility of (1.37),(1.38).

1.3.4 The multicut version of the L-shaped method

An alternative to Step 3 of the L-shaped method, where optimality cuts are computed with respect to the K realizations of the second-stage program and then aggregated to one cut (cf. (1.44a),(1.44b)), is to impose multiple cuts, which leads to the following multicut L-shaped algorithm:

Multicut L-shaped algorithm

Step 0: Set r = ν = 0 and s(k) = 0, 1 ≤ k ≤ K.

Step 1: Set ν = ν + 1 and solve the linear program

(1.56a)  minimize z := cT x + Σ_{k=1}^{K} θk ,
(1.56b)  subject to Ax = b ,
(1.56c)             Dℓ x ≥ dℓ , ℓ = 1, · · · , r ,
(1.56d)             Eℓ(k) x + θk ≥ eℓ(k) , ℓ(k) = 1, · · · , s(k) , 1 ≤ k ≤ K ,
(1.56e)             x ≥ 0 .

Let (xν, θν1, · · · , θνK) be an optimal solution of (1.56a)-(1.56e). In case there are no constraints (1.56d) for some k ∈ {1, · · · , K}, we set θνk = −∞.

Step 2 (Feasibility cuts): Step 2 is performed as in Step 2 of the L-shaped method.

Step 3 (Optimality cuts): For 1 ≤ k ≤ K solve the linear programs (1.43) and denote by πνk the optimal multiplier associated with the k-th problem. Check whether

(1.57) θνk < pk (πνk)T(hk − Tkxν) .

If (1.57) is satisfied, define

(1.58a)  Es(k)+1 := pk (πνk)T Tk ,
(1.58b)  es(k)+1 := pk (πνk)T hk ,

set s(k) = s(k) + 1, and return to Step 1.
If (1.57) does not hold true for any 1 ≤ k ≤ K, stop the algorithm: xν is an optimal solution.

Example: We consider the same example as in Section 1.3.2:

Step 0 (Initialization): We choose x0 ≤ −1.


Iteration 1: We compute

x1 = −2 , θ11 and θ12 are omitted ,
New cuts: θ1 ≥ −0.5 − 0.5x ,
          θ2 ≥ −(3/4)x .

Iteration 2: We obtain

x2 = 20 , θ21 = −10.5 , θ22 = −15 ,
New cuts: θ1 ≥ 0 ,
          θ2 ≥ −3.5 + 0.5x .

Iteration 3: The computations yield

x3 = 2.8 , θ31 = 0 , θ32 = −2.1 ,
New cut: θ2 ≥ (1/7)(x − 2) .

Iteration 4: We get

x4 = 0.32 , θ41 = 0 , θ42 = −0.24 ,
New cut: θ2 ≥ 0 .

Iteration 5: This iteration reveals

x5 = 0 , θ51 = 0 , θ52 = 0 .

The algorithm terminates with x5 = 0 as an optimal solution.

1.3.5 Inner linearization methods

We consider the dual linear program with respect to (1.40a)-(1.40e) in Steps 1-3 of the L-shaped method: Find (ρ, σ, π) such that

(1.59a)  maximize ζ = ρT b + Σ_{ℓ=1}^{r} σℓ dℓ + Σ_{ℓ=1}^{s} πℓ eℓ ,
(1.59b)  subj. to ρT A + Σ_{ℓ=1}^{r} σℓ Dℓ + Σ_{ℓ=1}^{s} πℓ Eℓ ≤ cT ,
(1.59c)           Σ_{ℓ=1}^{s} πℓ = 1 , σℓ ≥ 0 , 1 ≤ ℓ ≤ r , πℓ ≥ 0 , 1 ≤ ℓ ≤ s .

The dual program (1.59a)-(1.59c) involves


• multipliers σℓ, 1 ≤ ℓ ≤ r, on extreme rays (directions of recession) of the duals of the subproblems,
• multipliers πℓ, 1 ≤ ℓ ≤ s, on the expectations of extreme points of the duals of the subproblems.

Indeed, let us consider the following dual linear program with respect to (1.43a)-(1.43c) in Step 3 (optimality cuts) of the L-shaped method:

(1.60a)  maximize w = πT(hk − Tkxν) ,
(1.60b)  subject to πT W ≤ (qk)T .

From the duality theory of linear programming we know (cf. Theorem 1.3 and Theorem 1.4 in [4]):

• If (1.60a)-(1.60b) is unbounded for some k, then there exists a multiplier σν such that

(σν)T W ≤ 0 , (σν)T(hk − Tkxν) > 0 ,

and the primal problem (1.43a)-(1.43c) does not have a feasible solution.
• If (1.60a)-(1.60b) is bounded for some k, then (1.60a)-(1.60b) is feasible and (1.43a)-(1.43c) has an optimal (primal) solution.

In other words, Step 2 of the L-shaped method is equivalent to checking whether (1.60a)-(1.60b) is unbounded for any k. If so, Dr+1 and dr+1 are computed according to (1.42a) and (1.42b) of the L-shaped method and added to the constraints (feasibility cuts).

Next, consider the case when (1.60a)-(1.60b) has a finite optimal value for all k, i.e., (1.43a)-(1.43c) is solvable for all k. In Step 3 of the L-shaped method, we then compute Es+1 and es+1 according to (1.44a) and (1.44b) and add them to the constraints (optimality cuts). In the dual approach (1.59a)-(1.59c) we proceed in the same way.

Conclusion: Steps 1-3 of the L-shaped method are equivalent to solving (1.59a)-(1.59c) as a master program and the maximization problems (1.60a)-(1.60b) as subproblems.

This leads to the following so-called inner linearization algorithm:

Step 0: Set r = s = ν = 0.

Step 1: Set ν = ν + 1. Compute (ρν, σν, πν) as the solution of (1.59a)-(1.59c) and (xν, θν) as the associated dual solution.

Step 2: For 1 ≤ k ≤ K solve the subproblems (1.60a)-(1.60b).
If all subproblems are solvable, go to Step 3.
If an infeasible subproblem (1.60a)-(1.60b) is found, stop the algorithm (the stochastic program is ill-posed).
If an unbounded solution with extreme ray σν is found for some k, compute

(1.61a)  Dr+1 := (σν)T Tk ,
(1.61b)  dr+1 := (σν)T hk ,

set r = r + 1 and return to Step 1.

Step 3: Compute Es+1 and es+1 according to

(1.62a)  Es+1 := Σ_{k=1}^{K} pk (πνk)T Tk ,
(1.62b)  es+1 := Σ_{k=1}^{K} pk (πνk)T hk .

If

(1.63) es+1 − Es+1xν − θν ≤ 0 ,

then stop: (ρν, σν, πν) and (xν, θν) are optimal solutions. On the other hand, if

(1.64) es+1 − Es+1xν − θν > 0 ,

set s = s + 1 and return to Step 1.

Remark: The name inner linearization algorithm stems from the fact that (1.59a)-(1.59c) can be interpreted as an inner linearization of the dual program of the original L-shaped method in the sense of the Dantzig-Wolfe decomposition of large-scale linear programs [3]. Since we solve dual problems instead of primal ones, finite convergence follows directly from the corresponding property of the original L-shaped method.

Remark: With regard to the dimensionality of the problems, in many applied cases we have n1 ≫ m1. Then, the primal L-shaped method has basis matrices of order at most m1 + m2, compared to basis matrices of order n1 + n1 for the dual version. Therefore, the original (primal) L-shaped method is usually preferred.

The inner linearization method can be applied directly to the primal problem (1.14), if the technology matrix T is deterministic. In this case, (1.14) can be replaced by

(1.65)  minimize z = cT x + Ψ(χ)
        subject to Ax = b ,
                   Tx − χ = 0 ,
                   x ≥ 0 ,

where

(1.66)  Ψ(χ) := Eξ ψ(χ, ξ(ω)) ,
        ψ(χ, ξ(ω)) := min_{y≥0} { q(ω)T y | Wy = h(ω) − χ } .

The idea is to construct an inner linearization of the substitute Ψ(χ) of the recourse function using the generalized programming approach from [2] by replacing Ψ(χ) with the convex hull of points Ψ(χℓ) computed within the iterations of the algorithm. In particular, each iteration generates an extreme point of a region of linearity of Ψ. We define Ψ+0(ζ) as follows:

(1.67)  Ψ+0(ζ) := lim_{α→∞} (Ψ(χ + αζ) − Ψ(χ)) / α .

Generalized programming algorithm for two-stage stochastic linear programs:

Step 0: Set s = r = ν = 0.

Step 1: Set ν = ν + 1 and solve the master linear program

(1.68a)  minimize zν = cT x + Σ_{i=1}^{r} µi Ψ+0(ζi) + Σ_{i=1}^{s} λi Ψ(χi) ,
(1.68b)  subj. to Ax = b ,
(1.68c)           Tx − Σ_{i=1}^{r} µi ζi − Σ_{i=1}^{s} λi χi = 0 ,
(1.68d)           Σ_{i=1}^{s} λi = 1 , λi ≥ 0 , 1 ≤ i ≤ s ,
(1.68e)           x ≥ 0 , µi ≥ 0 , 1 ≤ i ≤ r .

If (1.68a)-(1.68e) is infeasible or unbounded, stop the algorithm. Otherwise, compute the solution (xν, µν, λν) and the dual solution (σν, πν, ρν).


Step 2: Solve the subproblem

(1.69) minimize Ψ(χ) + (πν)T χ − ρν over χ .

If (1.69) has a solution χs+1, go to Step 3.
On the other hand, if (1.69) is unbounded, there exists a recession direction ζr+1 such that for some χ

Ψ(χ + αζr+1) + (πν)T (χ + αζr+1) → −∞ as α → +∞ .

In this case, we define

(1.70)  Ψ+0(ζr+1) := lim_{α→+∞} (Ψ(χ + αζr+1) − Ψ(χ)) / α .

We set r = r + 1 and return to Step 1.

Step 3: Check whether

(1.71) Ψ(χs+1) + (πν)T χs+1 − ρν ≥ 0 .

If (1.71) holds true, stop the algorithm: (xν, µν, λν) is an optimal solution of (1.65). Otherwise, set s = s + 1 and return to Step 1.

Remark: In case of a two-stage stochastic linear program, the subproblem (1.69) can be reformulated according to

(1.72a)  minimize Σ_{k=1}^{K} pk (qk)T yk + (πν)T χ − ρν ,
(1.72b)  subject to Wyk + χ = hk , 1 ≤ k ≤ K ,
(1.72c)             yk ≥ 0 , 1 ≤ k ≤ K .

In general, for k ∈ {1, · · · , K} the subproblem (1.72) cannot be further separated into different subproblems, so that the original L-shaped method should be preferred. However, for problems with simple recourse, the function Ψ(χ) is separable into its components, and (1.72) can be split into K independent subproblems.

Finite termination of the generalized programming algorithm will be established by means of the following result:

Proposition 1.15 (Characterization of extreme points)

Every optimal extreme point (y∗1, · · · , y∗K, χ∗) of the feasible region of (1.72) corresponds to an extreme point χ∗ of

(1.73) {χ | Ψ(χ) = (π∗)T χ + θ} ,


where π∗ = Σ_{k=1}^{K} π∗k and each π∗k, 1 ≤ k ≤ K, is an extreme point of

(1.74) {πk | (πk)T W ≤ (qk)T} .

Proof. Let (y∗1, · · · , y∗K, χ∗) be an optimal extreme point in (1.72). Then, we have

(1.75) (qk)T y∗k ≤ (qk)T yk for all yk ≥ 0 with Wyk = hk − χ∗ .

We claim that

(1.76) y∗k is also an extreme point of {yk | Wyk = hk − χ∗ , yk ≥ 0} .

Indeed, if (1.76) were not true, y∗k could be written as the arithmetic mean of two distinct feasible points y1k and y2k, contradicting the extremality of (y∗1, · · · , y∗K, χ∗).

It follows from (1.75) and (1.76) that y∗k has a complementary dual solution π∗k, i.e.,

(1.77) π∗k is an extreme point of {πk | (πk)T W ≤ (qk)T} and ((qk)T − (π∗k)T W) y∗k = 0 .

The proof of the assertion will now be provided by a contradiction argument: We assume that (y∗1, · · · , y∗K, χ∗) is not an extreme point of the linearity region

(1.78) Ψ(χ) = (π∗)T χ + θ , θ = Ψ(χ∗) − (π∗)T χ∗ , π∗ = Σ_{k=1}^{K} π∗k .

Then, χ∗ must be a convex combination of two points χi, 1 ≤ i ≤ 2, i.e., χ∗ = λχ1 + (1 − λ)χ2, 0 < λ < 1, where

Ψ(χi) = (π∗)T χi + θ , 1 ≤ i ≤ 2 .

We also claim that

(1.79) Ψ(χi) = Σ_{k=1}^{K} (qk)T yik , where (qk)T yik = (π∗k)T(hk − χi) , 1 ≤ i ≤ 2 .

Indeed, if (1.79) did not hold true, due to the feasibility of π∗k we would have

(qk)T yik > (π∗k)T(hk − χi) , 1 ≤ i ≤ 2 ,

which would imply

Ψ(χi) > (π∗)T χi + θ .

We remark that (1.79) also implies

(1.80) ((π∗k)T W − (qk)T)(λ y1k + (1 − λ) y2k) = 0 ,


and hence, due to the fact that $y_k^*$ is an extreme point of the feasible set of the $k$-th recourse problem,

(1.81)  $y_k^* = \lambda y_k^1 + (1-\lambda) y_k^2$ .

It follows from (1.81) that

$(y_1^*, \cdots, y_K^*, \chi^*) = \lambda (y_1^1, \cdots, y_K^1, \chi^1) + (1-\lambda)(y_1^2, \cdots, y_K^2, \chi^2)$ ,

which contradicts that $(y_1^*, \cdots, y_K^*, \chi^*)$ is an extreme point. □

Proposition 1.16 (Characterization of extreme rays)
Any extreme ray associated with subproblem (1.72) is an extreme ray of a region of linearity of $\Psi(\chi)$.

Proof. The proof is left as an exercise. □

Theorem 1.17 (Finite convergence of the generalized programming algorithm)
The application of the generalized programming algorithm to problem (1.65) with subproblem (1.72) converges after a finite number of steps.

Proof. Each solution of subproblem (1.72) generates a new extreme point or extreme ray of a region of linearity of $\Psi$. For a new extreme ray $\zeta^{r+1}$ we have

(1.82)  $\Psi_0^+(\zeta^{r+1}) + (\pi^\nu)^T \zeta^{r+1} < 0$ ,

whereas

(1.83)  $\Psi_0^+(\zeta^i) + (\pi^\nu)^T \zeta^i \ge 0$ , $1 \le i \le r$ .

As far as a new extreme point $\chi^{s+1}$ is concerned, such a point is added to the constraints only if

(1.84)  $\Psi(\chi^{s+1}) + (\pi^\nu)^T \chi^{s+1} - \rho^\nu < 0$ ,

whereas

(1.85)  $\Psi(\chi^i) + (\pi^\nu)^T \chi^i - \rho^\nu \ge 0$ , $1 \le i \le s$ .

Since the number of regions of linearity is finite, each region has a finite number of extreme rays and extreme points, and by (1.82)-(1.85) no extreme ray or extreme point is generated twice, the algorithm must terminate after a finite number of steps. □


1.4 Two-stage stochastic nonlinear programs with recourse

In this section, we consider a generalization of stochastic two-stage linear programs with recourse to problems involving nonlinear functions.

Definition 1.7 (Two-stage stochastic nonlinear program with recourse)

Let $I \subset \mathbb{N}$ be an index set, $f^1 : \mathbb{R}^{n_1} \to \mathbb{R}$, $g_i^1 : \mathbb{R}^{n_1} \to \mathbb{R}$, $1 \le i \le m_1$, and $f^2(\cdot,\omega) : \mathbb{R}^{n_2} \to \mathbb{R}$, $g_i^2(\cdot,\omega) : \mathbb{R}^{n_2} \to \mathbb{R}$, $1 \le i \le m_2$, $t_i^2(\cdot,\omega) : \mathbb{R}^{n_1} \to \mathbb{R}$, $1 \le i \le m_2$, functions that are continuous for any fixed $\omega \in I$ and measurable in $\omega$ for any fixed first argument. Then, a minimization problem of the form

(1.86)  minimize $z = f^1(x) + Q(x)$ ,
  subject to $g_i^1(x) \le 0$ , $1 \le i \le \bar{m}_1$ ,
  $g_i^1(x) = 0$ , $\bar{m}_1 + 1 \le i \le m_1$ ,

where $\bar{m}_1 \le m_1$ (resp. $\bar{m}_2 \le m_2$ below) separates the inequality from the equality constraints, $Q(x) = E_\omega[Q(x,\omega)]$, and

(1.87)  $Q(x,\omega) = \inf f^2(y(\omega),\omega)$ ,
  subject to $t_i^2(x,\omega) + g_i^2(y(\omega),\omega) \le 0$ , $1 \le i \le \bar{m}_2$ ,
  $t_i^2(x,\omega) + g_i^2(y(\omega),\omega) = 0$ , $\bar{m}_2 + 1 \le i \le m_2$ ,

is called a two-stage stochastic nonlinear program with recourse function $Q(x)$.

Remark: We note that the assumptions in Definition 1.7 imply that $Q(x,\omega)$ is measurable in $\omega$ for all $x \in \mathbb{R}^{n_1}$ and hence, the recourse $Q(x)$ is well defined.
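When $\omega$ ranges over a discrete set with probabilities $p_k$, the expectation is a finite sum, and $Q(x)$ can be evaluated by solving one small convex program per scenario. A toy sketch follows; the quadratic $f^2$ and the constraint data below are made-up stand-ins consistent with (A1), not taken from the text:

```python
import numpy as np
from scipy.optimize import minimize

def Q_scenario(x, t):
    """Q(x, omega): minimize f2(y) = ||y||^2 subject to
    t^T x - y_1 + y_2 <= 0 and y >= 0 (a made-up second stage)."""
    cons = [{"type": "ineq", "fun": lambda y: -(t @ x - y[0] + y[1])}]
    res = minimize(lambda y: y @ y, x0=np.ones(2),
                   constraints=cons, bounds=[(0, None)] * 2)
    return res.fun if res.success else np.inf

def Q_expected(x, scenarios):
    """Q(x) = sum_k p_k Q(x, omega_k) for a discrete distribution."""
    return sum(p * Q_scenario(x, t) for p, t in scenarios)

scenarios = [(0.5, np.array([1.0, 0.0])), (0.5, np.array([0.0, 2.0]))]
print(Q_expected(np.array([1.0, 1.0]), scenarios))  # 0.5*1 + 0.5*4 = 2.5
```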

Definition 1.8 (First and second stage feasible sets)

The set

(1.88)  $K_1 := \{x \in \mathbb{R}^{n_1} \mid g_i^1(x) \le 0,\ 1 \le i \le \bar{m}_1,\ g_i^1(x) = 0,\ \bar{m}_1+1 \le i \le m_1\}$

is called the first-stage feasible set, whereas the sets

(1.89)  $K_2(\omega) := \{x \in \mathbb{R}^{n_1} \mid$ there exists $y(\omega)$ such that
  $t_i^2(x,\omega) + g_i^2(y(\omega),\omega) \le 0$ , $1 \le i \le \bar{m}_2$ ,
  $t_i^2(x,\omega) + g_i^2(y(\omega),\omega) = 0$ , $\bar{m}_2+1 \le i \le m_2\}$ ,

$K_2 := \{x \in \mathbb{R}^{n_1} \mid Q(x) < \infty\}$

are referred to as the second-stage feasible sets.


Remark: We note that, unlike the situation in Chapter 1.2, we do not consider fixed recourse here, in order to retain as much generality as possible. We also remark that in the case of fixed recourse the optimality conditions depend on the form of the objective and constraint functions anyway.

In order to ensure necessary and sufficient optimality conditions for (1.86),(1.87) we impose the following assumptions on the objective and constraint functions:

A1 (Convexity):

• The functions $f^1, g_i^1 : \mathbb{R}^{n_1} \to \mathbb{R}$, $1 \le i \le \bar{m}_1$, are convex,
• The functions $g_i^1 : \mathbb{R}^{n_1} \to \mathbb{R}$, $\bar{m}_1+1 \le i \le m_1$, are affine,
• The functions $f^2(\cdot,\omega), g_i^2(\cdot,\omega) : \mathbb{R}^{n_2} \to \mathbb{R}$, $1 \le i \le \bar{m}_2$, are convex for all $\omega \in I$,
• The functions $g_i^2(\cdot,\omega) : \mathbb{R}^{n_2} \to \mathbb{R}$, $\bar{m}_2+1 \le i \le m_2$, are affine for all $\omega \in I$,
• The functions $t_i^2(\cdot,\omega) : \mathbb{R}^{n_1} \to \mathbb{R}$, $1 \le i \le \bar{m}_2$, are convex for all $\omega \in I$,
• The functions $t_i^2(\cdot,\omega) : \mathbb{R}^{n_1} \to \mathbb{R}$, $\bar{m}_2+1 \le i \le m_2$, are affine for all $\omega \in I$.

A2 (Slater Condition):

• If $Q(x) < \infty$, then for almost all $\omega \in I$ there exists $y(\omega)$ such that

$t_i^2(x,\omega) + g_i^2(y(\omega),\omega) < 0$ , $1 \le i \le \bar{m}_2$ ,

and

$t_i^2(x,\omega) + g_i^2(y(\omega),\omega) = 0$ , $\bar{m}_2+1 \le i \le m_2$ .

Theorem 1.18 (Convexity of the recourse function)

Assume that assumptions (A1) and (A2) hold true. Then, the recourse function $Q(x,\omega)$ is a convex function of $x$ for all $\omega \in I$.

Proof. Suppose that $y^i$, $1 \le i \le 2$, are solutions of (1.87) with respect to $x^i$, $1 \le i \le 2$, respectively. We have to show that for $\lambda \in [0,1]$

(1.90)  $y = \lambda y^1 + (1-\lambda) y^2$ is feasible for (1.87) with $x := \lambda x^1 + (1-\lambda) x^2$ ,

which yields $Q(x,\omega) \le \lambda Q(x^1,\omega) + (1-\lambda) Q(x^2,\omega)$. This is an easy consequence of the assumptions and left as an exercise. □
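For completeness, a sketch of the argument behind the exercise (our filling-in, not verbatim from the text): with $\bar{x} := \lambda x^1 + (1-\lambda)x^2$ and $\bar{y} := \lambda y^1 + (1-\lambda)y^2$, convexity of $t_i^2(\cdot,\omega)$ and $g_i^2(\cdot,\omega)$ from (A1) gives, for $1 \le i \le \bar{m}_2$,

$t_i^2(\bar{x},\omega) + g_i^2(\bar{y},\omega) \le \lambda\,\big(t_i^2(x^1,\omega) + g_i^2(y^1,\omega)\big) + (1-\lambda)\,\big(t_i^2(x^2,\omega) + g_i^2(y^2,\omega)\big) \le 0$ ,

while the affine equality constraints are preserved exactly, so $\bar{y}$ is feasible for $\bar{x}$. Convexity of $f^2(\cdot,\omega)$ then yields

$Q(\bar{x},\omega) \le f^2(\bar{y},\omega) \le \lambda f^2(y^1,\omega) + (1-\lambda) f^2(y^2,\omega) = \lambda Q(x^1,\omega) + (1-\lambda) Q(x^2,\omega)$ .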

Theorem 1.19 (Lower semicontinuity of the recourse function)

If the second-stage feasible set $K_2(\omega)$ is bounded for all $\omega \in I$, then the recourse function $Q(\cdot,\omega)$ is lower semicontinuous for all $\omega \in I$.


Proof. We have to show that for any $\bar{x} \in \mathbb{R}^{n_1}$ and $\omega \in I$ we have

(1.91)  $Q(\bar{x},\omega) \le \liminf_{x \to \bar{x}} Q(x,\omega)$ .

Suppose that $\{x^\nu\}_{\nu\in\mathbb{N}}$ is a sequence in $\mathbb{R}^{n_1}$ such that $x^\nu \to \bar{x}$ as $\nu \to \infty$. Without restriction of generality, we may assume that $Q(x^\nu,\omega) < \infty$, $\nu \in \mathbb{N}$, since otherwise we find a subsequence $\mathbb{N}'$ with that property. By our assumptions, we find $y^\nu(\omega)$, $\nu \in \mathbb{N}$, solving (1.87) for $x^\nu$; in particular,

$t_i^2(x^\nu,\omega) + g_i^2(y^\nu(\omega),\omega) \le 0$ , $1 \le i \le \bar{m}_2$ ,
$t_i^2(x^\nu,\omega) + g_i^2(y^\nu(\omega),\omega) = 0$ , $\bar{m}_2+1 \le i \le m_2$ .

The boundedness assumption and the continuity of the functions imply that the sequence $\{y^\nu(\omega)\}_{\nu\in\mathbb{N}}$ is bounded, and hence, there exist $\bar{y}(\omega)$ and a subsequence $\mathbb{N}' \subset \mathbb{N}$, such that $y^\nu(\omega) \to \bar{y}(\omega)$ as $\mathbb{N}' \ni \nu \to \infty$ and

$t_i^2(\bar{x},\omega) + g_i^2(\bar{y}(\omega),\omega) \le 0$ , $1 \le i \le \bar{m}_2$ ,
$t_i^2(\bar{x},\omega) + g_i^2(\bar{y}(\omega),\omega) = 0$ , $\bar{m}_2+1 \le i \le m_2$ .

Consequently, $\bar{y}(\omega)$ is feasible for $\bar{x}$ and

$Q(\bar{x},\omega) \le f^2(\bar{y}(\omega),\omega) = \lim_{\mathbb{N}' \ni \nu \to \infty} f^2(y^\nu(\omega),\omega) = \lim_{\mathbb{N}' \ni \nu \to \infty} Q(x^\nu,\omega)$ ,

which gives the assertion. □

Corollary 1.20 (Further properties of the feasible set and the recourse function)

The feasible set $K_2$ is a closed, convex set, and the expected recourse function $Q$ is a lower semicontinuous convex function in $x$.

Proof. The proof is an immediate consequence of the assumptions and the previous results. □

Remark: In general, it is difficult to decompose the feasible set $K_2$ according to

(1.92)  $K_2 = \bigcap_{\omega \in I} K_2(\omega)$ .

A particular case where such a decomposition can be realized is that of quadratic objective functionals $f^2$.

Theorem 1.21 (Optimality conditions)

Suppose that there exists

(1.93)  $x \in \mathrm{ri}(\mathrm{dom}(f^1)) \cap \mathrm{ri}(\mathrm{dom}(Q))$ ,


where ri stands for the relative interior, and further assume that

(1.94)  $g_i^1(x) < 0$ , $1 \le i \le \bar{m}_1$ ,
(1.95)  $g_i^1(x) = 0$ , $\bar{m}_1+1 \le i \le m_1$ .

Then, $x^* \in \mathbb{R}^{n_1}$ is optimal in (1.86) if and only if $x^* \in K_1$ and there exist multipliers $\mu_i^* \ge 0$, $1 \le i \le \bar{m}_1$, and $\lambda_i^*$, $\bar{m}_1+1 \le i \le m_1$, such that

(1.96)  $0 \in \partial f^1(x^*) + \partial Q(x^*) + \sum_{i=1}^{\bar{m}_1} \mu_i^* \,\partial g_i^1(x^*) + \sum_{i=\bar{m}_1+1}^{m_1} \lambda_i^* \,\partial g_i^1(x^*)$ ,

(1.97)  $\mu_i^* g_i^1(x^*) = 0$ , $1 \le i \le \bar{m}_1$ .

Proof. The assertions can be deduced readily by applying the general theory of nonlinear programming (cf., e.g., Chapter 2 in [4]). □

Remark: As far as decompositions of the subgradient $\partial Q(x)$ into subgradients of $Q(x,\omega)$ are concerned, in much the same way as in Theorem 1.4 of Chapter 1.2 one can show

(1.98)  $\partial Q(x) = E_\omega[\partial Q(x,\omega)] + N(K_2, x)$ ,

where $N(K_2, x)$ stands for the normal cone (cf. Chapter 2 in [4]). Note that (1.98) reduces to

$\partial Q(x) = E_\omega[\partial Q(x,\omega)]$

in the case of relatively complete recourse, i.e., if $K_1 \subset K_2$. For the derivation of optimality conditions for problems with explicit constraints on non-anticipativity we refer to Theorem 39 in [1].


1.5 Piecewise quadratic form of the L-shaped method

Although a systematic approach comparable to SQP (Sequential Quadratic Programming) for deterministic nonlinear problems does not exist in a stochastic environment, one might be tempted to reduce a general two-stage stochastic nonlinear program to the successive solution of two-stage stochastic quadratic programs. In this section, we consider a piecewise quadratic form of the L-shaped method for such two-stage stochastic quadratic programs, which are of the form:

(1.99)  minimize $z(x) = c^T x + \frac{1}{2} x^T C x + E_\xi\big[\min\big(q^T(\omega) y(\omega) + \frac{1}{2} y^T(\omega) D(\omega) y(\omega)\big)\big]$ ,
  subject to $Ax = b$ ,
  $T(\omega)x + W y(\omega) = h(\omega)$ ,
  $x \ge 0$ , $y(\omega) \ge 0$ .

Here, $A \in \mathbb{R}^{m_1 \times n_1}$, $C \in \mathbb{R}^{n_1 \times n_1}$, $W \in \mathbb{R}^{m_2 \times n_2}$ are fixed matrices, and $c \in \mathbb{R}^{n_1}$ is a fixed vector. Moreover, $D \in \mathbb{R}^{n_2 \times n_2}$, $T \in \mathbb{R}^{m_2 \times n_1}$ are random matrices and $q \in \mathbb{R}^{n_2}$, $h \in \mathbb{R}^{m_2}$ are random vectors.

For a given realization $\xi(\omega)$, $\omega \in I$, the associated recourse function can be defined according to

(1.100)  $Q(x,\xi(\omega)) := \min\{q^T(\omega) y(\omega) + \frac{1}{2} y^T(\omega) D(\omega) y(\omega) \mid T(\omega)x + W y(\omega) = h(\omega),\ y(\omega) \ge 0\}$ ,

which may attain the value $-\infty$ or $+\infty$ if the problem is unbounded or infeasible, respectively. The expected recourse function is given by

(1.101)  $Q(x) := E_\xi[Q(x,\xi)]$ .

We use the convention $+\infty + (-\infty) = +\infty$. The first-stage and second-stage feasible sets $K_1$ and $K_2$ are defined as in the previous section.
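For a discrete distribution (assumption (A3) below), the expected recourse (1.101) is again a finite sum, one quadratic program per realization. A minimal evaluation sketch (ours; the scenario data are assumed given as numpy arrays, with $+\infty$ returned for infeasible scenarios):

```python
import numpy as np
from scipy.optimize import minimize

def qp_recourse(x, q, D, T, W, h):
    """Q(x, xi): minimize q^T y + 0.5 y^T D y subject to
    W y = h - T x and y >= 0, as in (1.100)."""
    n2 = W.shape[1]
    cons = [{"type": "eq", "fun": lambda y: W @ y - (h - T @ x)}]
    res = minimize(lambda y: q @ y + 0.5 * y @ D @ y, x0=np.zeros(n2),
                   constraints=cons, bounds=[(0, None)] * n2)
    return res.fun if res.success else np.inf

def expected_recourse(x, scenarios):
    """Q(x) = sum_k p_k Q(x, xi_k); each scenario is (p, q, D, T, W, h)."""
    return sum(p * qp_recourse(x, q, D, T, W, h)
               for p, q, D, T, W, h in scenarios)
```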

We impose the following assumptions on the data of the two-stage stochastic quadratic program:

A3:

• The random vector ξ has a discrete distribution.


A4:

• The matrix $C$ is positive semi-definite and the matrices $D(\omega)$ are positive semi-definite for all $\omega \in I$,

• The matrix W has full row rank.

Remark: We note that (A3) implies the decomposability of the second-stage feasible set $K_2$, whereas (A4) ensures convexity of the recourse functions.

An important feature of the problem is that the recourse function $Q(x)$ is piecewise quadratic, i.e., the second-stage feasible set $K_2$ can be decomposed into polyhedral sets, called cells, such that $Q(x)$ is quadratic on each cell.

Example: We consider the following two-stage stochastic quadratic program:

(1.102)  minimize $z(x) = 2x_1 + 3x_2 + E_\xi\big[\min\big(-6.5 y_1 - 7 y_2 + \frac{1}{2} y_1^2 + y_1 y_2 + \frac{1}{2} y_2^2\big)\big]$ ,
  subject to $3x_1 + 2x_2 \le 15$ ,
  $x_1 + 2x_2 \le 8$ ,
  $y_1 \le x_1$ , $y_2 \le x_2$ ,
  $y_1 \le \xi_1$ , $y_2 \le \xi_2$ ,
  $x_1 + x_2 \ge 0$ , $x \ge 0$ , $y \ge 0$ .

We assume that $\xi_1 \in \{2, 4, 6\}$ and $\xi_2 \in \{1, 3, 5\}$ are independent random variables with probability $1/3$ for each realization.

The problem can be interpreted as a portfolio problem where the issue is to minimize quadratic penalties on deviations from a mean value.

In the second stage of the problem, for small values of the assets $x_i$, $1 \le i \le 2$, it is optimal to sell, i.e., $y_i = x_i$, $1 \le i \le 2$. Indeed, for

$(x_1, x_2) \in C_1 := \{(x_1, x_2) \mid 0 \le x_1 \le 2,\ 0 \le x_2 \le 1\}$

the optimal solution of the second stage is $y_i = x_i$, $1 \le i \le 2$, for all possible values of $\xi$, whence

$Q(x) = Q(x,\xi) = -6.5 x_1 - 7 x_2 + \frac{1}{2} x_1^2 + x_1 x_2 + \frac{1}{2} x_2^2$ , $(x_1, x_2) \in C_1$ .
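This closed form is easy to cross-check numerically: enumerate the nine equally likely realizations of $(\xi_1,\xi_2)$ and solve the box-constrained second-stage QP for each. The following verification script is ours, not part of the text:

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def Q_expected(x1, x2):
    """Expected recourse of (1.102): average the second-stage optimum
    over the nine equally likely scenarios (xi1, xi2)."""
    inner = lambda y: (-6.5*y[0] - 7*y[1]
                       + 0.5*y[0]**2 + y[0]*y[1] + 0.5*y[1]**2)
    total = 0.0
    for xi1, xi2 in itertools.product([2, 4, 6], [1, 3, 5]):
        res = minimize(inner, x0=[0.0, 0.0],
                       bounds=[(0, min(x1, xi1)), (0, min(x2, xi2))])
        total += res.fun / 9.0
    return total

x1, x2 = 1.5, 0.5   # a point inside the cell C_1
closed_form = -6.5*x1 - 7*x2 + 0.5*x1**2 + x1*x2 + 0.5*x2**2
print(Q_expected(x1, x2), closed_form)   # both approx. -11.25
```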


Definition 1.9 (Finite closed convex complex)

A finite closed convex complex $\mathcal{K}$ is a finite collection of closed convex sets $C_\nu$, $1 \le \nu \le M$, called the cells of $\mathcal{K}$, such that

$\mathrm{int}(C_{\nu_1} \cap C_{\nu_2}) = \emptyset$ , $\nu_1 \ne \nu_2$ .

Definition 1.10 (Piecewise convex program)

A piecewise convex program is a convex program of the form

(1.103)  $\inf\{z(x) \mid x \in S\}$ ,

where $z : \mathbb{R}^n \to \mathbb{R}$ is convex and $S$ is a closed convex subset of $\mathrm{dom}(z)$ with $\mathrm{int}(S) \ne \emptyset$.

Definition 1.11 (Piecewise quadratic function)

Consider a piecewise convex program and assume that $\mathcal{K}$ is a finite closed convex complex with cells $C_\nu$, $1 \le \nu \le M$, such that

(1.104a)  $S \subseteq \bigcup_{\nu=1}^{M} C_\nu$ ,

(1.104b)  either $z \equiv -\infty$, or for each cell $C_\nu$, $1 \le \nu \le M$, there exists a convex function $z_\nu : S \to \mathbb{R}$ which is continuously differentiable on an open set containing $C_\nu$ such that

  $z(x) = z_\nu(x)$ , $x \in C_\nu$ , $1 \le \nu \le M$ ,
  $\nabla z_\nu(x) \in \partial z(x)$ , $x \in C_\nu$ , $1 \le \nu \le M$ .

A piecewise quadratic function $z : S \to \mathbb{R}$ is a piecewise convex function where on each cell $C_\nu$, $1 \le \nu \le M$, the function $z_\nu$ is a quadratic form.

Example: In the above example, both $Q(x)$ and $z(x)$ are piecewise quadratic. In particular, we have

$Q(x) = -6.5 x_1 - 7 x_2 + \frac{1}{2} x_1^2 + x_1 x_2 + \frac{1}{2} x_2^2$ , $(x_1, x_2) \in C_1$ ,

$z(x) = -4.5 x_1 - 4 x_2 + \frac{1}{2} x_1^2 + x_1 x_2 + \frac{1}{2} x_2^2$ , $(x_1, x_2) \in C_1$ .


The numerical solution of two-stage stochastic piecewise quadratic programs is accomplished by the following PQP algorithm:

Step 0 (Initialization): Compute a decomposition of the state space $S$ according to (1.104a) into cells $C_\nu$, $1 \le \nu \le M$, set $S^1 = S$ and choose $x^0 \in S^1$.

Step 1 (Iteration loop): For µ ≥ 1:

Step 1.1 (Determination of current cell): Determine $C_\mu$ such that $x^{\mu-1} \in C_\mu$ and specify the quadratic form $z_\mu(\cdot)$ on $C_\mu$ according to (1.104b).

Step 1.2 (Solution of minimization subproblems): Compute

(1.105a)  $x^\mu = \arg\min_{x \in S^\mu} z_\mu(x)$ ,
(1.105b)  $w^\mu = \arg\min_{x \in C_\mu} z_\mu(x)$ .

If $w^\mu$ is the limiting point of a ray on which $z_\mu(\cdot)$ is decreasing to $-\infty$, stop the algorithm: The original PQP is unbounded. Otherwise, continue with Step 1.3.

Step 1.3 (Optimality check): Check the optimality condition

(1.106)  $(\nabla z_\mu(w^\mu))^T (x^\mu - w^\mu) = 0$ .

If (1.106) is satisfied, stop the algorithm: $w^\mu$ is the optimal solution of the PQP. Otherwise, continue with Step 1.4.

Step 1.4 (Update of state space): Compute

(1.107)  $S^{\mu+1} := S^\mu \cap \{x \mid (\nabla z_\mu(w^\mu))^T x \le (\nabla z_\mu(w^\mu))^T w^\mu\}$ ,

set $\mu := \mu + 1$, and go to Step 1.1.
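The loop structure of the PQP algorithm can be summarized as follows; `find_cell` and `solve_over` are hypothetical callables (our names, operating on numpy arrays), the former returning the cell containing a point together with its quadratic $z_\mu$ and gradient, the latter the minimizer of $z_\mu$ over a polyhedron, e.g. via an off-the-shelf QP solver, or `None` if $z_\mu$ decreases to $-\infty$ along a ray:

```python
def pqp(S, x0, find_cell, solve_over, tol=1e-9, max_iter=100):
    x, cuts = x0, []                           # cuts accumulate (1.107)
    for _ in range(max_iter):
        cell, z_mu, grad = find_cell(x)        # Step 1.1
        x_mu = solve_over(z_mu, S, cuts)       # Step 1.2, (1.105a)
        w_mu = solve_over(z_mu, cell, [])      # Step 1.2, (1.105b)
        if w_mu is None:
            raise ValueError("PQP unbounded")  # z_mu decreases along a ray
        g = grad(w_mu)
        if abs(g @ (x_mu - w_mu)) <= tol:      # Step 1.3, (1.106)
            return w_mu                        # w^mu solves the PQP
        cuts.append((g, g @ w_mu))             # Step 1.4: g^T x <= g^T w^mu
        x = x_mu                               # mu := mu + 1
    raise RuntimeError("iteration limit reached")
```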

Theorem 1.22 (Finite termination of the PQP algorithm)

Under assumptions (A3) and (A4), the PQP algorithm terminates after a finite number of steps with the solution of the two-stage stochastic piecewise quadratic program.

Proof. We refer to [5]. □

Remark: Details concerning the appropriate construction of finite closed convex complexes $\mathcal{K}$ satisfying (1.104a),(1.104b) can be found in [5].


Example: We illustrate the implementation of the PQP algorithm for the piecewise quadratic program (1.102).

Step 0 (Initialization): We choose the cells $C_\nu$, $1 \le \nu \le 8$, as shown in Fig. 1.2. We further define

$S^1 = S = \{x \in \mathbb{R}^2 \mid 3x_1 + 2x_2 \le 15,\ x_1 + 2x_2 \le 8,\ x_1, x_2 \ge 0\}$ ,

choose $x^0 = (0, 0)^T$, and set $\mu = 1$.

[Fig. 1.2, not reproduced here: it shows the state space $S$ with the cells of the finite closed convex complex, the iterates $x^0, \ldots, x^5$, the cell minimizers $w^1, \ldots, w^5$ (with $w^1 = w^3$ and $w^4 = x^5 = w^5$), and the four PQP cuts.]

Fig. 1.2. Finite closed convex complex and PQP cuts


Iteration 1: The cell containing $x^0$ is

$C_1 = \{x \in \mathbb{R}^2 \mid 0 \le x_1 \le 2,\ 0 \le x_2 \le 1\}$ ,

and the quadratic function $z_1$ on $C_1$ is

$z_1(x) = -4.5 x_1 - 4 x_2 + \frac{1}{2} x_1^2 + x_1 x_2 + \frac{1}{2} x_2^2$ .

Solving (1.105a),(1.105b) by means of the KKT conditions results in

$x^1 = (4.5, 0)^T$ , $w^1 = (2, 1)^T \in C_1$ ,

whence

$\nabla z_1(w^1) = (-1.5, -1)^T$ , $(\nabla z_1(w^1))^T (x^1 - w^1) = -2.75 \ne 0$ ,

and

$S^2 = S^1 \cap \{x \in \mathbb{R}^2 \mid -1.5 x_1 - x_2 \le -4\}$ .
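Iteration 1 is easy to reproduce numerically; the following cross-check (ours) uses scipy's SLSQP solver in place of a hand-written KKT solve:

```python
import numpy as np
from scipy.optimize import minimize

z1 = lambda x: -4.5*x[0] - 4*x[1] + 0.5*x[0]**2 + x[0]*x[1] + 0.5*x[1]**2

# (1.105a) over S^1 (scipy's 'ineq' constraints require fun(x) >= 0)
cons = [{"type": "ineq", "fun": lambda x: 15 - 3*x[0] - 2*x[1]},
        {"type": "ineq", "fun": lambda x: 8 - x[0] - 2*x[1]}]
x1 = minimize(z1, x0=[1.0, 1.0], constraints=cons,
              bounds=[(0, None)] * 2).x

# (1.105b) over the cell C_1 = [0, 2] x [0, 1]
w1 = minimize(z1, x0=[1.0, 0.5], bounds=[(0, 2), (0, 1)]).x

print(np.round(x1, 3), np.round(w1, 3))   # approx. [4.5 0.] and [2. 1.]
```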

Iteration 2: The cell containing $x^1$ is

$C_2 = \{x \in \mathbb{R}^2 \mid 4 \le x_1 \le 6,\ 0 \le x_2 \le 1,\ x_1 + x_2 \le 6.5\}$ .

The quadratic function $z_2$ on $C_2$ is

$z_2(x) = -\frac{29}{3} - \frac{1}{6} x_1 - 2 x_2 + \frac{1}{6} x_1^2 + \frac{1}{3} x_1 x_2 + \frac{1}{2} x_2^2$ .

The solution of (1.105a),(1.105b) by means of the KKT conditions gives

$x^2 = \left(\frac{22}{19}, \frac{43}{19}\right)^T$ , $w^2 = \left(4, \frac{2}{3}\right)^T \in C_2$ ,

whence

$\nabla z_2(w^2) = \left(\frac{25}{18}, 0\right)^T$ , $(\nabla z_2(w^2))^T (x^2 - w^2) \ne 0$ ,

and

$S^3 = S^2 \cap \{x \in \mathbb{R}^2 \mid \frac{25}{18} x_1 \le \frac{100}{18}\}$ .

Iteration 3: The cell containing $x^2$ is

$C_3 = \{x \in \mathbb{R}^2 \mid 0 \le x_1 \le 2,\ 1 \le x_2 \le 3\}$ .

The quadratic function $z_3$ on $C_3$ is

$z_3(x) = -\frac{13}{6} - \frac{25}{6} x_1 - \frac{5}{3} x_2 + \frac{1}{2} x_1^2 + \frac{2}{3} x_1 x_2 + \frac{1}{3} x_2^2$ .

Via the solution of (1.105a),(1.105b) we obtain

$x^3 = (4, 0)^T$ , $w^3 = w^1 = (2, 1)^T$ ,


whence

$S^4 = S^3 \cap \{x \in \mathbb{R}^2 \mid -\frac{3}{2} x_1 + \frac{1}{3} x_2 \le -\frac{8}{3}\}$ .

Iteration 4: The cell containing $x^3$ is

$C_4 = \{x \in \mathbb{R}^2 \mid 2 \le x_1 \le 4,\ 0 \le x_2 \le 1\}$ .

The quadratic function $z_4$ on $C_4$ is

$z_4(x) = -\frac{11}{3} - \frac{7}{3} x_1 - \frac{10}{3} x_2 + \frac{1}{3} x_1^2 + \frac{2}{3} x_1 x_2 + \frac{1}{2} x_2^2$ .

The solution of (1.105a),(1.105b) yields

$x^4 \approx (2.18, 1.81)^T$ , $w^4 = (2.5, 1)^T$ ,

whence

$S^5 = S^4 \cap \{x \in \mathbb{R}^2 \mid -\frac{2}{3} x_2 \le -\frac{2}{3}\}$ .

Iteration 5: The cell containing $x^4$ is

$C_5 = \{x \in \mathbb{R}^2 \mid 2 \le x_1 \le 4,\ 1 \le x_2 \le 3\} \cap S$ .

The quadratic function $z_5$ on $C_5$ is

$z_5(x) = -\frac{101}{18} - \frac{19}{9} x_1 - \frac{11}{9} x_2 + \frac{1}{3} x_1^2 + \frac{4}{9} x_1 x_2 + \frac{1}{3} x_2^2$ .

The solution of (1.105a),(1.105b) yields

$x^5 = w^5 = (2.5, 1)^T$ ,

which is an optimal solution of the problem, since $x^5 = w^5$ renders the optimality check (1.106) trivially satisfied.

References

[1] J.R. Birge and F. Louveaux; Introduction to Stochastic Programming. Springer, Berlin-Heidelberg-New York, 1997

[2] G.B. Dantzig; Linear Programming and Extensions. Princeton University Press, Princeton, NJ, 1963

[3] G.B. Dantzig and P. Wolfe; The decomposition principle for linear programs. Operations Research, 8, 101–111, 1960

[4] R.H.W. Hoppe; Optimization I. Handout of the course held in Fall 2006. See http://www.math.uh.edu

[5] F.V. Louveaux; Piecewise convex programs. Math. Programming, 15, 53–62, 1978

[6] D. Walkup and R.J.-B. Wets; Stochastic programs with recourse. SIAM J. Appl. Math., 15, 1299–1314, 1967

[7] D. Walkup and R.J.-B. Wets; Stochastic programs with recourse II: on the continuity of the objective. SIAM J. Appl. Math., 17, 98–103, 1969


[8] R.J.-B. Wets; Characterization theorems for stochastic programs. Math. Programming, 2, 166–175, 1972

[9] R.J.-B. Wets; Stochastic programs with fixed recourse: the equivalent deterministic problem. SIAM Rev., 16, 309–339, 1974

[10] R.J.-B. Wets; Stochastic programming. In: Optimization (G.L. Nemhauser et al., eds.), Handbooks in Operations Research and Management Science, Vol. I, North-Holland, Amsterdam, 1990