Ellipsoid Method

• ellipsoid method

• convergence proof

• inequality constraints

• feasibility problems

Prof. S. Boyd, EE364b, Stanford University

Ellipsoid method

• developed by Shor, Nemirovsky, Yudin in 1970s

• used in 1979 by Khachian to show polynomial solvability of LPs

• each step requires cutting-plane or subgradient evaluation

• modest storage ($O(n^2)$)

• modest computation per step ($O(n^2)$), via analytical formula

• efficient in theory; slow but steady in practice


Motivation

in cutting-plane methods

• serious computation is needed to find next query point (typically $O(n^2 m)$, with a constant that is not small)

• localization polyhedron grows in complexity as algorithm progresses (we can, however, prune constraints to keep $m$ proportional to $n$, e.g., $m = 4n$)

ellipsoid method addresses both issues, but retains theoretical efficiency


Ellipsoid algorithm for minimizing convex function

idea: localize x⋆ in an ellipsoid instead of a polyhedron

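1. at iteration $k$ we know $x^\star \in \mathcal{E}^{(k)}$

2. set $x^{(k+1)} := \mathrm{center}(\mathcal{E}^{(k)})$; evaluate $g^{(k+1)} \in \partial f(x^{(k+1)})$ ($g^{(k+1)} = \nabla f(x^{(k+1)})$ if $f$ is differentiable)

3. hence we know

$$x^\star \in \mathcal{E}^{(k)} \cap \{z \mid g^{(k+1)T}(z - x^{(k+1)}) \le 0\}$$

(a half-ellipsoid)

4. set $\mathcal{E}^{(k+1)}$ := minimum volume ellipsoid covering $\mathcal{E}^{(k)} \cap \{z \mid g^{(k+1)T}(z - x^{(k+1)}) \le 0\}$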

[Figure: ellipsoid $\mathcal{E}^{(k)}$, its center $x^{(k+1)}$, the subgradient $g^{(k+1)}$, and the covering ellipsoid $\mathcal{E}^{(k+1)}$]

compared to cutting-plane methods:

• localization set doesn’t grow more complicated

• easy to compute query point

• but, we add unnecessary points in step 4


Properties of ellipsoid method

• reduces to bisection for n = 1

• simple formula for E(k+1) given E(k), g(k+1)

• $\mathcal{E}^{(k+1)}$ can be larger than $\mathcal{E}^{(k)}$ in diameter (max semi-axis length), but is always smaller in volume

• $\mathrm{vol}(\mathcal{E}^{(k+1)}) < e^{-1/(2n)}\, \mathrm{vol}(\mathcal{E}^{(k)})$

(volume reduction factor degrades rapidly with $n$, compared to CG or MVE cutting-plane methods)
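The volume bound can be checked numerically. The sketch below assumes NumPy and the central-cut update formula for $n > 1$ given on the "Updating the ellipsoid" slide; the function name `volume_ratio` and the random test ellipsoid are illustrative choices, not part of the slides.

```python
import numpy as np

def volume_ratio(n, seed=0):
    """vol(E+)/vol(E) for one central cut of a random ellipsoid in R^n (n > 1)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    P = A @ A.T + np.eye(n)            # random symmetric positive definite P
    g = rng.standard_normal(n)         # arbitrary nonzero cut direction
    g_tilde = g / np.sqrt(g @ P @ g)   # normalized so that g~' P g~ = 1
    Pg = P @ g_tilde
    # P g~ g~' P equals outer(Pg, Pg) because P is symmetric
    P_plus = n**2 / (n**2 - 1.0) * (P - 2.0 / (n + 1) * np.outer(Pg, Pg))
    # volume is proportional to sqrt(det P); slogdet avoids overflow
    _, logdet_plus = np.linalg.slogdet(P_plus)
    _, logdet = np.linalg.slogdet(P)
    return np.exp(0.5 * (logdet_plus - logdet))

for n in (2, 5, 20):
    # the first number should be below the second, and both approach 1 as n grows
    print(n, volume_ratio(n), np.exp(-1.0 / (2 * n)))
```

The ratio does not depend on the particular $P$ or $g$, so the random instance only serves to exercise the formula.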


Example

[Figure: ellipsoid method iterates $x^{(0)}$ through $x^{(5)}$]

Updating the ellipsoid

$$\mathcal{E}(x, P) = \{z \mid (z - x)^T P^{-1} (z - x) \le 1\}$$

[Figure: ellipsoid $\mathcal{E}$ with center $x$ and cut direction $g$, and updated ellipsoid $\mathcal{E}^+$ with center $x^+$]

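(for $n > 1$) minimum volume ellipsoid containing half-ellipsoid

$$\mathcal{E} \cap \{z \mid g^T (z - x) \le 0\}$$

is given by

$$x^+ = x - \frac{1}{n+1} P \tilde{g}$$

$$P^+ = \frac{n^2}{n^2 - 1} \left( P - \frac{2}{n+1} P \tilde{g} \tilde{g}^T P \right)$$

where $\tilde{g} = (1/\sqrt{g^T P g})\, g$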
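A minimal NumPy sketch of this update, assuming $P$ is symmetric positive definite and $g \ne 0$; the function name `ellipsoid_update` is ours, not from the slides.

```python
import numpy as np

def ellipsoid_update(x, P, g):
    """Minimum volume ellipsoid covering E(x, P) intersected with {z | g'(z - x) <= 0}."""
    n = x.size
    if n < 2:
        raise ValueError("the formula requires n > 1")
    g_tilde = g / np.sqrt(g @ P @ g)    # normalized subgradient
    Pg = P @ g_tilde                    # P g~, reused in both updates
    x_plus = x - Pg / (n + 1)
    # P g~ g~' P equals outer(Pg, Pg) because P is symmetric
    P_plus = n**2 / (n**2 - 1.0) * (P - 2.0 / (n + 1) * np.outer(Pg, Pg))
    return x_plus, P_plus

# Unit disk in R^2, cut away the half with z_1 > 0:
x_plus, P_plus = ellipsoid_update(np.zeros(2), np.eye(2), np.array([1.0, 0.0]))
print(x_plus)   # center moves to (-1/3, 0)
print(P_plus)   # diag(4/9, 4/3)
```

In this small case the new ellipse passes through $(-1, 0)$ and $(0, \pm 1)$, the extreme points of the half-disk, which is a quick sanity check on the formula. Each call costs $O(n^2)$ operations.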


Simple stopping criterion

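$$\begin{aligned}
f(x^\star) &\ge f(x^{(k)}) + g^{(k)T}(x^\star - x^{(k)}) \\
&\ge f(x^{(k)}) + \inf_{z \in \mathcal{E}^{(k)}} g^{(k)T}(z - x^{(k)}) \\
&= f(x^{(k)}) - \sqrt{g^{(k)T} P^{(k)} g^{(k)}}
\end{aligned}$$

second inequality holds since $x^\star \in \mathcal{E}^{(k)}$

simple stopping criterion:

$$\sqrt{g^{(k)T} P^{(k)} g^{(k)}} \le \epsilon \implies f(x^{(k)}) - f(x^\star) \le \epsilon$$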
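The value of the infimum over the ellipsoid follows from a change of variables. The short derivation below assumes $P \succ 0$ and $g \ne 0$; the auxiliary variable $u$ is introduced here and does not appear in the slides.

```latex
% Assumes P \succ 0 and g \neq 0; u is an auxiliary variable.
\begin{aligned}
\inf_{z \in \mathcal{E}(x,P)} g^T (z - x)
  &= \inf_{\|u\|_2 \le 1} g^T P^{1/2} u
     && \text{(substitute } z = x + P^{1/2} u\text{)} \\
  &= -\|P^{1/2} g\|_2
     && \text{(attained at } u = -P^{1/2} g / \|P^{1/2} g\|_2\text{)} \\
  &= -\sqrt{g^T P g}.
\end{aligned}
```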


Basic ellipsoid algorithm

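ellipsoid described as $\mathcal{E}(x, P) = \{z \mid (z - x)^T P^{-1} (z - x) \le 1\}$

given ellipsoid $\mathcal{E}(x, P)$ containing $x^\star$, accuracy $\epsilon > 0$

repeat

1. evaluate $g \in \partial f(x)$

2. if $\sqrt{g^T P g} \le \epsilon$, return($x$)

3. update ellipsoid

3a. $\tilde{g} := \frac{1}{\sqrt{g^T P g}}\, g$

3b. $x := x - \frac{1}{n+1} P \tilde{g}$

3c. $P := \frac{n^2}{n^2 - 1} \left( P - \frac{2}{n+1} P \tilde{g} \tilde{g}^T P \right)$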
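The loop above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than a reference implementation: the initial ellipsoid is taken to be a ball of radius `R` around `x0`, the iteration cap `max_iter` and the returned `converged` flag are additions of ours, and the test problem $f(x) = \|x - c\|_1$ is a made-up example.

```python
import numpy as np

def ellipsoid_method(oracle, x0, R, eps, max_iter=1000):
    """Basic ellipsoid method sketch (n > 1).

    oracle(x) returns a subgradient of f at x; the ball of radius R
    around x0 is assumed to contain a minimizer x*.
    Returns (x, converged).
    """
    x = np.asarray(x0, dtype=float)
    n = x.size
    P = R**2 * np.eye(n)                  # E(x, P) is the initial ball
    for _ in range(max_iter):
        g = oracle(x)                     # step 1: subgradient at the center
        if not np.any(g):                 # zero subgradient: x is a minimizer
            return x, True
        s = np.sqrt(g @ P @ g)
        if s <= eps:                      # step 2: f(x) - f* <= eps
            return x, True
        Pg = P @ (g / s)                  # P g~, with g~ from step 3a
        x = x - Pg / (n + 1)              # step 3b
        # step 3c; P g~ g~' P equals outer(Pg, Pg) because P is symmetric
        P = n**2 / (n**2 - 1.0) * (P - 2.0 / (n + 1) * np.outer(Pg, Pg))
    return x, False                       # iteration cap reached

# Illustrative problem (ours): f(x) = ||x - c||_1, minimized at c with f* = 0.
c = np.array([1.0, -2.0, 0.5])
f = lambda x: np.abs(x - c).sum()
subgrad = lambda x: np.sign(x - c)

x_hat, converged = ellipsoid_method(subgrad, np.zeros(3), R=5.0, eps=1e-3)
print(converged, f(x_hat))
```

When `converged` is true, the stopping criterion guarantees $f(\hat{x}) - f^\star \le \epsilon$ up to roundoff; when the cap is hit, the last center need not be the best point seen.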


Interpretation

• change coordinates so uncertainty is isotropic (same in all directions), i.e., $\mathcal{E}$ is unit ball

• take subgradient step with fixed length $1/(n + 1)$

• Shor calls ellipsoid method ‘gradient method with space dilation in direction of gradient’ (which, strangely enough, didn’t catch on)


Example

PWL function $f(x) = \max_{i=1,\ldots,m} (a_i^T x + b_i)$, with $n = 20$, $m = 100$

[Figure: $f(x^{(k)})$ and the lower bound $f(x^{(k)}) - \sqrt{g^{(k)T} P^{(k)} g^{(k)}}$ versus iteration $k$, for $k$ from 0 to 200, with $f^\star$ marked]

[Figure: $f_\mathrm{best}^{(k)} - f^\star$ versus $k$ on a log scale, for $k$ from 0 to 2000]

Improvements

• keep track of best upper and lower bounds:

$$u_k = \min_{i=1,\ldots,k} f(x^{(i)}), \qquad l_k = \max_{i=1,\ldots,k} \left( f(x^{(i)}) - \sqrt{g^{(i)T} P^{(i)} g^{(i)}} \right)$$

stop when $u_k - l_k \le \epsilon$

• can propagate Cholesky factor of $P$ (avoids problem of $P \not\succ 0$ due to numerical roundoff)
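The bound tracking in the first bullet amounts to a running minimum and a running maximum. A small sketch, assuming NumPy; the helper name `update_bounds` and its argument order are hypothetical.

```python
import numpy as np

def update_bounds(u, l, fx, g, P):
    """Tighten the best upper bound u and lower bound l on f* using the current iterate.

    fx is f at the current center, g a subgradient there, P the current ellipsoid matrix.
    """
    u_new = min(u, fx)                          # best function value seen so far
    l_new = max(l, fx - np.sqrt(g @ P @ g))     # best lower bound seen so far
    return u_new, l_new

# Start from trivial bounds and feed in two (made-up) iterates.
u, l = np.inf, -np.inf
u, l = update_bounds(u, l, 3.0, np.array([1.0, 0.0]), 4.0 * np.eye(2))
u, l = update_bounds(u, l, 3.5, np.array([0.0, 1.0]), np.eye(2))
print(u, l, u - l)   # stop once u - l <= eps
```

The gap $u_k - l_k$ is nonincreasing in $k$, unlike the per-iteration quantity $\sqrt{g^T P g}$, which is why it makes a more reliable stopping test.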

[Figure: lower bound $l_k$ and upper bound $u_k$ versus $k$, for $k$ from 0 to 2000, with $f^\star$ marked]

Proof of convergence

assumptions:

• $f$ is Lipschitz: $|f(y) - f(x)| \le G \|y - x\|$

• $\mathcal{E}^{(0)}$ is ball with radius $R$

suppose $f(x^{(i)}) > f^\star + \epsilon$, $i = 0, \ldots, k$

then $f(x) \le f^\star + \epsilon \implies x \in \mathcal{E}^{(k)}$

since at iteration $i$ we only discard points with $f \ge f(x^{(i)})$


from Lipschitz condition,

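$$\|x - x^\star\| \le \epsilon/G \implies f(x) \le f^\star + \epsilon \implies x \in \mathcal{E}^{(k)}$$

so $B = \{x \mid \|x - x^\star\| \le \epsilon/G\} \subseteq \mathcal{E}^{(k)}$

hence $\mathrm{vol}(B) \le \mathrm{vol}(\mathcal{E}^{(k)})$, so

$$\alpha_n (\epsilon/G)^n \le e^{-k/(2n)}\, \mathrm{vol}(\mathcal{E}^{(0)}) = e^{-k/(2n)} \alpha_n R^n$$

($\alpha_n$ is volume of unit ball in $\mathbf{R}^n$)

therefore $k \le 2n^2 \log(RG/\epsilon)$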
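The last step is a matter of cancelling $\alpha_n$ and taking logarithms; spelled out (with $\log$ the natural logarithm):

```latex
\begin{aligned}
\alpha_n (\epsilon/G)^n \le e^{-k/(2n)} \alpha_n R^n
  &\iff \left(\frac{\epsilon}{GR}\right)^{n} \le e^{-k/(2n)} \\
  &\iff n \log\frac{\epsilon}{GR} \le -\frac{k}{2n} \\
  &\iff k \le 2n^2 \log\frac{RG}{\epsilon}.
\end{aligned}
```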

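[Figure: initial ball $\mathcal{E}^{(0)}$, ellipsoid $\mathcal{E}^{(k)}$ with center $x^{(k)}$, the set $\{x \mid f(x) \le f^\star + \epsilon\}$, and the ball $B = \{x \mid \|x - x^\star\| \le \epsilon/G\}$ around $x^\star$]

conclusion: for $k > 2n^2 \log(RG/\epsilon)$,

$$\min_{i=0,\ldots,k} f(x^{(i)}) \le f^\star + \epsilon$$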


Interpretation of complexity

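since $x^\star \in \mathcal{E}^{(0)} = \{x \mid \|x - x^{(0)}\| \le R\}$, our prior knowledge of $f^\star$ is

$$f^\star \in [f(x^{(0)}) - GR,\ f(x^{(0)})]$$

our prior uncertainty in $f^\star$ is $GR$

after $k$ iterations our knowledge of $f^\star$ is

$$f^\star \in \left[ \min_{i=0,\ldots,k} f(x^{(i)}) - \epsilon,\ \min_{i=0,\ldots,k} f(x^{(i)}) \right]$$

posterior uncertainty in $f^\star$ is $\le \epsilon$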

iterations required:

$$2n^2 \log \frac{RG}{\epsilon} = 2n^2 \log \frac{\text{prior uncertainty}}{\text{posterior uncertainty}}$$

efficiency: $0.72/n^2$ bits per gradient evaluation
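The constant 0.72 comes from converting the natural logarithm in the iteration count to bits. A short check, reading "bits gained" as $\log_2$ of the uncertainty ratio (our interpretation of the slide):

```latex
\frac{\text{bits gained}}{\text{iterations}}
  = \frac{\log_2 (RG/\epsilon)}{2n^2 \log (RG/\epsilon)}
  = \frac{1}{2 n^2 \log 2}
  \approx \frac{0.72}{n^2}.
```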


Deep cut ellipsoid method

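minimum volume ellipsoid containing ellipsoid intersected with halfspace

$$\mathcal{E} \cap \{z \mid g^T (z - x) + h \le 0\}$$

with $h \ge 0$, is given by

$$x^+ = x - \frac{1 + \alpha n}{n + 1} P \tilde{g}$$

$$P^+ = \frac{n^2 (1 - \alpha^2)}{n^2 - 1} \left( P - \frac{2(1 + \alpha n)}{(n + 1)(1 + \alpha)} P \tilde{g} \tilde{g}^T P \right)$$

where

$$\tilde{g} = \frac{g}{\sqrt{g^T P g}}, \qquad \alpha = \frac{h}{\sqrt{g^T P g}}$$

(if $\alpha > 1$, intersection is empty)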
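A NumPy sketch of the deep-cut update, under the same assumptions as the central-cut case ($n > 1$, $P$ symmetric positive definite, $g \ne 0$, $h \ge 0$); the function name `deep_cut_update` and the choice to raise an exception on an empty intersection are ours.

```python
import numpy as np

def deep_cut_update(x, P, g, h):
    """Minimum volume ellipsoid covering E(x, P) intersected with {z | g'(z - x) + h <= 0}."""
    n = x.size
    s = np.sqrt(g @ P @ g)
    alpha = h / s                       # normalized cut depth
    if alpha > 1:
        raise ValueError("alpha > 1: the intersection is empty")
    Pg = P @ (g / s)                    # P g~
    x_plus = x - (1 + alpha * n) / (n + 1) * Pg
    scale = n**2 * (1 - alpha**2) / (n**2 - 1.0)
    coef = 2 * (1 + alpha * n) / ((n + 1) * (1 + alpha))
    # P g~ g~' P equals outer(Pg, Pg) because P is symmetric
    P_plus = scale * (P - coef * np.outer(Pg, Pg))
    return x_plus, P_plus

# Unit disk in R^2, keep only the part with z_1 <= -1/2:
x_plus, P_plus = deep_cut_update(np.zeros(2), np.eye(2), np.array([1.0, 0.0]), 0.5)
print(x_plus)   # center moves to (-2/3, 0)
print(P_plus)   # diag(1/9, 1)
```

With $h = 0$ the formulas reduce to the central-cut update, and a deeper cut ($\alpha$ closer to 1) gives a smaller ellipsoid for the same oracle call.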


Ellipsoid method with deep objective cuts

[Figure: $f_\mathrm{best}^{(k)} - f^\star$ versus $k$ on a log scale, for $k$ from 0 to 2000, comparing deep cuts and shallow cuts]

Inequality constrained problems

minimize $f_0(x)$
subject to $f_i(x) \le 0$, $i = 1, \ldots, m$

• if $x^{(k)}$ feasible, update ellipsoid with objective cut

$$g_0^T (z - x^{(k)}) + f_0(x^{(k)}) - f_\mathrm{best}^{(k)} \le 0, \qquad g_0 \in \partial f_0(x^{(k)})$$

$f_\mathrm{best}^{(k)}$ is best objective value of feasible iterates so far

• if $x^{(k)}$ infeasible, update ellipsoid with feasibility cut

$$g_j^T (z - x^{(k)}) + f_j(x^{(k)}) \le 0, \qquad g_j \in \partial f_j(x^{(k)})$$

assuming $f_j(x^{(k)}) > 0$
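The case split above can be sketched as a function that returns the pair $(g, h)$ for a deep cut $g^T(z - x) + h \le 0$. The oracle interface (callables returning a value and a subgradient), the name `choose_cut`, and the rule of picking the first violated constraint are assumptions made for illustration.

```python
import numpy as np

def choose_cut(x, f0, constraints, f_best):
    """Pick a deep cut g'(z - x) + h <= 0 at the point x.

    f0 and each element of constraints map x to (value, subgradient).
    Returns (g, h, f_best), where f_best is the updated best feasible objective value.
    """
    for fj in constraints:
        val, g = fj(x)
        if val > 0:                      # x infeasible: feasibility cut
            return g, val, f_best
    val, g = f0(x)                       # x feasible: objective cut
    f_best = min(f_best, val)
    return g, val - f_best, f_best       # h >= 0 by construction

# Made-up problem: minimize x_1 + x_2 subject to x_1^2 + x_2^2 - 1 <= 0.
f0 = lambda x: (x[0] + x[1], np.array([1.0, 1.0]))
f1 = lambda x: (x @ x - 1.0, 2.0 * x)

print(choose_cut(np.array([2.0, 0.0]), f0, [f1], np.inf))   # feasibility cut
print(choose_cut(np.array([0.5, 0.0]), f0, [f1], 0.2))      # objective cut
```

The returned `g` and `h` are in the form expected by a deep-cut ellipsoid update.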


Stopping criterion

if $x^{(k)}$ is feasible, we have lower bound on $p^\star$ as before:

$$p^\star \ge f_0(x^{(k)}) - \sqrt{g_0^{(k)T} P^{(k)} g_0^{(k)}}$$

if $x^{(k)}$ is infeasible, we have for all $x \in \mathcal{E}^{(k)}$

$$\begin{aligned}
f_j(x) &\ge f_j(x^{(k)}) + g_j^{(k)T} (x - x^{(k)}) \\
&\ge f_j(x^{(k)}) + \inf_{z \in \mathcal{E}^{(k)}} g_j^{(k)T} (z - x^{(k)}) \\
&= f_j(x^{(k)}) - \sqrt{g_j^{(k)T} P^{(k)} g_j^{(k)}}
\end{aligned}$$

hence, problem is infeasible if for some $j$,

$$f_j(x^{(k)}) - \sqrt{g_j^{(k)T} P^{(k)} g_j^{(k)}} > 0$$

stopping criteria:

• if $x^{(k)}$ is feasible and $\sqrt{g_0^{(k)T} P^{(k)} g_0^{(k)}} \le \epsilon$ ($x^{(k)}$ is $\epsilon$-suboptimal)

• if $f_j(x^{(k)}) - \sqrt{g_j^{(k)T} P^{(k)} g_j^{(k)}} > 0$ (problem is infeasible)

Epigraph ellipsoid method

use deep cut ellipsoid method to solve problem

minimize $t$
subject to $f_0(x) \le t$, $f_i(x) \le 0$, $i = 1, \ldots, m$

with variables $(x, t)$

• when $(x^{(k)}, t^{(k)})$ infeasible for epigraph problem, use standard deep feasibility cut

– if $f_0(x^{(k)}) > t^{(k)}$, use cut $t \ge g_0^T (x - x^{(k)}) + f_0(x^{(k)})$

– if $f_j(x^{(k)}) > 0$, use cut $g_j^T (x - x^{(k)}) + f_j(x^{(k)}) \le 0$

• when $(x^{(k)}, t^{(k)})$ feasible for epigraph problem, use cut $t \le f_0(x^{(k)})$
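Written in the stacked variable $y = (x, t)$, each of these cuts has the form $g^T (y - y^{(k)}) + h \le 0$ with $h \ge 0$, so it can be passed to a deep-cut update in dimension $n + 1$. The sketch below shows one way to assemble $(g, h)$; the oracle interface and the name `epigraph_cut` are hypothetical, and checking the objective constraint before the others simply follows the order on the slide.

```python
import numpy as np

def epigraph_cut(x, t, f0, constraints):
    """Return (g, h) for a deep cut g'(y - (x, t)) + h <= 0 in the (x, t) space.

    f0 and each element of constraints map x to (value, subgradient).
    """
    f0_val, g0 = f0(x)
    if f0_val > t:
        # t' >= g0'(x' - x) + f0(x)  <=>  g0'(x' - x) - (t' - t) + (f0(x) - t) <= 0
        return np.append(g0, -1.0), f0_val - t
    for fj in constraints:
        val, gj = fj(x)
        if val > 0:
            # gj'(x' - x) + fj(x) <= 0, with no dependence on t
            return np.append(gj, 0.0), val
    # (x, t) feasible: t' <= f0(x)  <=>  (t' - t) + (t - f0(x)) <= 0
    e_t = np.zeros(x.size + 1)
    e_t[-1] = 1.0
    return e_t, t - f0_val

# Made-up problem: minimize x_1 + x_2 subject to x_1^2 + x_2^2 - 1 <= 0.
f0 = lambda x: (x[0] + x[1], np.array([1.0, 1.0]))
f1 = lambda x: (x @ x - 1.0, 2.0 * x)

print(epigraph_cut(np.array([1.0, 0.0]), 0.5, f0, [f1]))   # objective constraint violated
print(epigraph_cut(np.array([0.5, 0.0]), 1.0, f0, [f1]))   # feasible point
```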


Epigraph ellipsoid example

[Figure: $f_\mathrm{best}^{(k)} - f^\star$ versus $k$ on a log scale, for $k$ from 0 to 2000, comparing the epigraph method and non-epigraph deep cuts]