
Sparse Optimization

Lecture 1: Review of Convex Optimization

Instructor: Wotao Yin

July 2013

online discussions on piazza.com

Those who complete this lecture will know

• convex optimization background

• various standard concepts and terminology

• reformulating ℓ1 optimization and its optimality conditions


Resources for convex optimization

• Book: Convex Analysis by R. T. Rockafellar

• Book: Convex Optimization by S. Boyd and L. Vandenberghe, along with online videos and slides

• Book: Introductory Lectures on Convex Optimization: A Basic Course by Y. Nesterov

• A large number of lecture slides, notes, and videos available online


Review: mathematical optimization

Formulation

minimize_x   f0(x)
subject to   fi(x) ≤ 0,  i = 1, . . . , m,
             hj(x) = 0,  j = 1, . . . , p.

• decision variables: x = (x1, . . . , xn)

• objective function: f0 : Rn → R

• functions defining inequality constraints: fi : Rn → R, i = 1, . . . ,m

• functions defining equality constraints: hj : Rn → R, j = 1, . . . , p
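As a concrete illustration of the standard form (my addition, not from the slides), here is a minimal sketch using scipy.optimize.minimize on a made-up instance; note that SciPy's 'ineq' convention is g(x) ≥ 0, so each fi(x) ≤ 0 is passed as −fi:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance: minimize f0(x) = ||x||^2
# subject to f1(x) = 1 - x1 - x2 <= 0 and h1(x) = x1 - 2*x2 = 0.
f0 = lambda x: np.sum(x**2)

constraints = [
    # SciPy expects g(x) >= 0 for 'ineq', so pass -f1.
    {"type": "ineq", "fun": lambda x: -(1 - x[0] - x[1])},
    {"type": "eq",   "fun": lambda x: x[0] - 2 * x[1]},
]

res = minimize(f0, x0=np.zeros(2), constraints=constraints)
print(res.x)  # approximately (2/3, 1/3)
```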


Terminology

• feasible solutions: all points x satisfying the constraints fi(x) ≤ 0 (i = 1, . . . , m) and hj(x) = 0 (j = 1, . . . , p).

• feasible set: the set of all feasible solutions, often denoted by X.

• (global) (optimal) solution: a feasible solution x∗ that achieves the minimum objective value among all feasible solutions.

• local (optimal) solution: a feasible solution x∗ that achieves the minimum objective value over a neighborhood of x∗, say, the set {x : ‖x − x∗‖ ≤ δ} ∩ X for some δ > 0.


Some examples

• Find two nonnegative numbers whose sum adds up to 9 and so that the product of the two numbers is a maximum.

• Find the largest area of a rectangular region given that its perimeter is no greater than 100.

• Given a sequence of nonnegative numbers, find a start point and an end point so that the partial sum of the sequence between the two points is a maximum (a linear-time algorithm is sketched below).
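The third example is the classic maximum partial-sum (subarray) problem. As a sketch (my addition), Kadane's linear-time scan solves it even when entries may be negative; for nonnegative sequences the whole sequence is trivially optimal:

```python
def max_partial_sum(seq):
    """Kadane's algorithm: return (best_sum, start, end) with end exclusive."""
    best_sum, best_start, best_end = float("-inf"), 0, 0
    cur_sum, cur_start = 0, 0
    for i, v in enumerate(seq):
        if cur_sum <= 0:          # a nonpositive prefix never helps; restart here
            cur_sum, cur_start = v, i
        else:
            cur_sum += v
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_sum, best_start, best_end

print(max_partial_sum([1, -3, 4, -1, 2, -5, 3]))  # (5, 2, 5) -> subsequence [4, -1, 2]
```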


Solving optimization problems

In general, everything is optimization, but optimization problems are generally not solvable, even by the most powerful computers.

Some classes of problems can be solved efficiently and reliably, for example:

• least-squares problems

• linear programming problems

• quadratic programming problems

• convex optimization problems

• a subclass of network-flow problems

• submodular function minimization

(.... more, but not much more...)

• some sparse optimization problems


Least squares

minimize_x   ‖Ax − b‖₂²

• analytic solution x∗ = (AᵀA)⁻¹Aᵀb if A has independent columns (checked numerically below)

• reliable and efficient algorithms and software packages

• computation time proportional to n²k (A ∈ R^{k×n}), less if structured

• a mature technology (unless A is huge and/or distributed)
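A quick numerical sketch (my addition, assuming NumPy): the normal-equations formula above matches numpy.linalg.lstsq, which is the numerically preferred route since it avoids forming AᵀA:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))   # k x n with independent columns
b = rng.standard_normal(50)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # x* = (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # QR/SVD based, better conditioned

print(np.allclose(x_normal, x_lstsq))  # True
```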


Linear programming (LP)

minimize_x   cᵀx
subject to   aᵢᵀx ≤ bᵢ,  i = 1, . . . , m

• no analytic formula for solutions

• reliable and efficient algorithms and software packages

• computation time proportional to n²m if m ≥ n, less with structured data

• a mature technology

• a few standard tricks used to convert problems (with ℓ1 or ℓ∞ norms, or piecewise linear functions) into linear programs
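A minimal sketch (my addition) of the slide's LP form with scipy.optimize.linprog; note that linprog defaults to x ≥ 0, so free variables need explicit bounds:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: minimize c^T x subject to A x <= b.
c = np.array([1.0, 2.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)
print(res.x)  # [0., 0.] here, since c >= 0 and x = 0 is feasible
```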


Convex optimization

minimize_x   f0(x)
subject to   fi(x) ≤ 0,  i = 1, . . . , m,
             Ax = b,

where the objective and constraint functions are convex, i.e.,

fi(θx1 + (1 − θ)x2) ≤ θfi(x1) + (1 − θ)fi(x2)

for all i = 0, 1, . . . , m, θ ∈ (0, 1), and x1, x2 ∈ dom fi.

• no analytic solution

• relatively reliable and efficient algorithms and software packages

• computation time (roughly) proportional to max{n³, n²m, F}, where F is the cost of evaluating the fi's and their first and second derivatives

• almost a technology

Least-squares problems and linear programs are special cases of convex programs.
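As an illustration (my addition, assuming the CVXPY modeling package), a convex problem in this standard form can be specified and solved directly:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = A @ np.ones(5)                          # makes Ax = b feasible

x = cp.Variable(5)
objective = cp.Minimize(cp.sum_squares(x))  # convex f0
constraints = [cp.norm(x, 1) - 10 <= 0,     # convex f1(x) <= 0
               A @ x == b]                  # affine equality
prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value, x.value)
```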


Non-convex optimization problems

General optimization problems are non-convex

minimize_x   f0(x)
subject to   fi(x) ≤ 0,  i = 1, . . . , m

Local optimization methods

• find a solution which minimizes f0 among feasible solutions near it

• fast; can handle large problems

• require initial guess

• provide no information about the distance to global optima

Global optimization methods

• find the global solution

• worst-case complexity grows exponentially with problem size.

These methods are often based on solving convex subproblems.


Brief history of convex optimization

theory (convex analysis): 1900–1970s

algorithms

• 1947: simplex algorithm for linear programming (Dantzig)

• 1960s: early interior-point methods (Fiacco & McCormick, Dikin, . . . )

• 1970s: ellipsoid method and other subgradient methods

• 1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)

• late 1980s–2000s: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)

• recently: revived interest in first-order (gradient-based) algorithms for solving big-data problems

applications

• before 1990: mostly in operations research; few in engineering

• since 1990: many new applications in engineering (control, signal processing, communications, circuit design, . . . ); new problem classes (semidefinite and second-order cone programming, robust optimization, sparse optimization)

Convex set

A set C is called convex if the segment between any two points in C lies entirely in C. Formally, C is convex if for any x1, x2 ∈ C and θ ∈ (0, 1), we have

θx1 + (1 − θ)x2 ∈ C.

Examples:

• Euclidean balls: B(xc, r) = {x : ‖x − xc‖₂ ≤ r}

• ellipsoid: {x : (x − xc)ᵀP⁻¹(x − xc) ≤ 1} with P symmetric positive definite

• polyhedra: {x : Ax ≤ b, Cx = d} with A ∈ R^{m×n}, C ∈ R^{p×n}

• several operations preserve convexity: intersection; affine functions; perspective function; linear-fractional functions

Most of the time, recognizing a convex set is not difficult; a quick numerical sanity check follows.
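A small numerical sanity check (my addition, assuming NumPy): convex combinations of points in a Euclidean ball remain in the ball:

```python
import numpy as np

rng = np.random.default_rng(0)
xc, r = np.zeros(3), 1.0                          # ball B(xc, r)
in_ball = lambda x: np.linalg.norm(x - xc) <= r + 1e-12

def random_point_in_ball():
    d = rng.standard_normal(3)
    return xc + (rng.random() * r) * d / np.linalg.norm(d)

for _ in range(1000):
    x1, x2 = random_point_in_ball(), random_point_in_ball()
    theta = rng.random()
    assert in_ball(theta * x1 + (1 - theta) * x2)  # convex combination stays in C
print("ok")
```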


Convex functions

A function f : Rⁿ → R is convex if dom f is convex and for any x1, x2 ∈ dom f and θ ∈ (0, 1), we have

f(θx1 + (1 − θ)x2) ≤ θf(x1) + (1 − θ)f(x2).

f is concave if (−f) is convex.

f is strictly convex if dom f is convex and, for any x1 ≠ x2 in dom f and θ ∈ (0, 1),

f(θx1 + (1 − θ)x2) < θf(x1) + (1 − θ)f(x2).


Examples of convex functions

Examples in Rⁿ

• affine function f(x) = aᵀx + b

• norms: ‖x‖p = (∑_{i=1}^n |xi|^p)^{1/p} for p ≥ 1; ‖x‖∞ = maxᵢ |xi|

Examples in R^{m×n}

• affine function f(X) = tr(AᵀX) + b = ∑_{i=1}^m ∑_{j=1}^n Aij Xij + b

• spectral norm (maximum singular value) f(X) = ‖X‖₂ = σmax(X) = (λmax(XᵀX))^{1/2}

• nuclear norm f(X) = ‖X‖∗ = ∑_{i=1}^{min{m,n}} σi(X) (both matrix norms are checked numerically below)
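A numerical check (my addition, assuming NumPy): both matrix norms come straight out of the SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))
sigma = np.linalg.svd(X, compute_uv=False)   # singular values, descending

spectral = sigma[0]                          # ||X||_2 = sigma_max(X)
nuclear = sigma.sum()                        # ||X||_* = sum of singular values

print(np.isclose(spectral, np.linalg.norm(X, 2)))      # True
print(np.isclose(nuclear, np.linalg.norm(X, 'nuc')))   # True
```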


Terminology

• extended value: f may take on the value +∞, reducing the need to track dom f

• proper: there exists x such that f(x) is finite

• lower semi-continuous (LSC): lim inf_{x→x0} f(x) ≥ f(x0)

• closed: f has a closed epigraph epi f = {(x, µ) : µ ∈ R, µ ≥ f(x)}

• Lemma: a proper convex function is closed if and only if it is LSC

• subdifferential (a numerical check follows): ∂f(x) = {p : f(y) ≥ f(x) + 〈p, y − x〉 ∀y}

  - each p ∈ ∂f(x) is called a subgradient

  - if f ∈ C¹ near x, then ∂f(x) = {∇f(x)}
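To make the definition concrete (my addition): for f(x) = ‖x‖1, any p with pi = sign(xi) where xi ≠ 0 and pi ∈ [−1, 1] where xi = 0 is a subgradient. A random test of the subgradient inequality:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.abs(x).sum()                 # f(x) = ||x||_1

x = np.array([1.5, 0.0, -2.0])
p = np.where(x != 0, np.sign(x), rng.uniform(-1, 1, x.size))

# subgradient inequality: f(y) >= f(x) + <p, y - x> for all y
for _ in range(1000):
    y = rng.standard_normal(x.size) * 5
    assert f(y) >= f(x) + p @ (y - x) - 1e-12
print("p is a subgradient at x")
```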


First-order condition

f is differentiable if the gradient

∇f(x) = [∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn]ᵀ

exists at every x ∈ dom f.

first-order condition: a differentiable f with convex domain is convex iff

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ dom f

first-order condition: a subdifferentiable f with convex domain is convex iff

f(y) ≥ f(x) + pᵀ(y − x) for all x, y ∈ dom f, p ∈ ∂f(x)

first-order optimality condition: x∗ ∈ argmin f(x) ⟺ 0 ∈ ∂f(x∗) (applied below to derive soft-thresholding)
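As an illustration of 0 ∈ ∂f(x∗) (my addition): for the scalar problem minimize_x λ|x| + ½(x − b)², the condition 0 ∈ λ∂|x∗| + (x∗ − b) yields the soft-thresholding formula central to ℓ1 optimization:

```python
import numpy as np

def soft_threshold(b, lam):
    """Solve min_x lam*|x| + 0.5*(x - b)^2 via 0 in lam*d|x| + (x - b)."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# brute-force check on a fine grid
b, lam = 1.3, 0.5
grid = np.linspace(-3, 3, 200001)
obj = lam * np.abs(grid) + 0.5 * (grid - b) ** 2
print(soft_threshold(b, lam), grid[np.argmin(obj)])  # both ~0.8
```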


Second-order condition

f is twice differentiable if the Hessian ∇²f(x) ∈ Sⁿ, defined by

∇²f(x)ij = ∂²f(x)/∂xi∂xj,  i, j = 1, . . . , n,

exists at every x ∈ dom f.

second-order condition: a twice differentiable f with convex domain is convex iff

∇²f(x) ⪰ 0 for all x ∈ dom f.

Furthermore, if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex.

Very useful in general convex optimization, but less so in sparse optimization.
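A quick numerical illustration (my addition): for a quadratic f(x) = ½xᵀQx + cᵀx the Hessian is Q everywhere, so convexity reduces to checking that Q's eigenvalues are nonnegative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
Q = A.T @ A                      # Gram matrices are positive semidefinite

# f(x) = 0.5 x^T Q x + c^T x has Hessian Q for all x
eigvals = np.linalg.eigvalsh(Q)  # eigenvalues of a symmetric matrix
print(eigvals)                   # all >= 0, hence f is convex
```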


Convex optimization formulation

Standard-form convex optimization problem

minimize_x   f0(x)
subject to   fi(x) ≤ 0,  i = 1, . . . , m,
             Ax = b.

- f0, f1, . . . , fm are convex; equality constraints are affine.

- the feasible set of a convex optimization problem is convex.


Local and global solutions

Theorem

Any local solution of a convex problem is a global solution.

Proof.

Suppose that x is a local solution, y is a global solution, and f0(y) < f0(x).

Consider z = θy + (1 − θ)x. By convexity,

f0(z) ≤ θf0(y) + (1 − θ)f0(x) < f0(x)

for any θ ∈ (0, 1). Since ‖x − z‖ can be made arbitrarily small by taking θ small, x cannot be a local solution.


Optimality criterion for differentiable f0

Since the feasible set is convex and

f0(y) ≥ f0(x) + ∇f0(x)ᵀ(y − x),

x is optimal iff it is feasible and

∇f0(x)ᵀ(y − x) ≥ 0 for all feasible y.


• unconstrained problem: x is optimal if and only if

  x ∈ dom f0,  ∇f0(x) = 0

• equality constrained problem:

  minimize_x  f0(x)  subject to  Ax = b

  x is optimal if and only if there exists a vector ν such that

  x ∈ dom f0,  Ax = b,  ∇f0(x) + Aᵀν = 0

• minimization over the nonnegative orthant (verified numerically below):

  minimize_x  f0(x)  subject to  x ≥ 0

  x is optimal if and only if

  x ∈ dom f0,  x ≥ 0,  and for each i:  ∇f0(x)i ≥ 0 if xi = 0;  ∇f0(x)i = 0 if xi > 0
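A numerical sanity check (my addition): for f0(x) = ½‖x − c‖₂² over x ≥ 0 the solution is the projection x∗ = max(c, 0), and the componentwise conditions above hold:

```python
import numpy as np

c = np.array([1.0, -2.0, 0.5, -0.1])
x_star = np.maximum(c, 0.0)        # projection of c onto the orthant
grad = x_star - c                  # gradient of 0.5*||x - c||^2 at x*

# where x_i > 0 the gradient vanishes; where x_i = 0 it is nonnegative
assert np.allclose(grad[x_star > 0], 0.0)
assert np.all(grad[x_star == 0] >= 0.0)
print(x_star, grad)
```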


Unconstrained problem with nondifferentiable f0

g is a subgradient of a convex function f at x ∈ dom f if

f(y) ≥ f(x) + gᵀ(y − x)  ∀y ∈ dom f.

The subdifferential ∂f(x) of f at x is the set of all subgradients:

∂f(x) = {g : gᵀ(y − x) ≤ f(y) − f(x)  ∀y ∈ dom f}

x∗ minimizes f0(x) if and only if

0 ∈ ∂f0(x∗)


Optimality criteria in the general case

Standard form problem (not necessarily convex)

minimize_x   f0(x)
s.t.         fi(x) ≤ 0,  i = 1, . . . , m
             hj(x) = 0,  j = 1, . . . , p

domain D, optimal value p∗

Lagrangian: L : Rⁿ × Rᵐ × Rᵖ → R with dom L = D × Rᵐ × Rᵖ,

L(x, λ, ν) = f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{j=1}^p νj hj(x)

• λi is Lagrange multiplier associated with fi(x) ≤ 0

• νj is Lagrange multiplier associated with hj(x) = 0


Lagrange dual function

Lagrange dual function: g : Rᵐ × Rᵖ → R,

g(λ, ν) = inf_{x∈D} L(x, λ, ν)
        = inf_{x∈D} ( f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{j=1}^p νj hj(x) )

g is concave and can be −∞ for some (λ, ν).

Lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p∗. A worked example follows.
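A worked example (my addition): for the LP minimize cᵀx subject to Ax ≤ b, the dual function has a closed form, since a nonzero linear function of x is unbounded below:

```latex
% Dual function of the LP: minimize c^T x subject to Ax <= b.
\begin{aligned}
L(x,\lambda) &= c^\top x + \lambda^\top (Ax - b)
              = (c + A^\top \lambda)^\top x - b^\top \lambda,\\
g(\lambda) &= \inf_x L(x,\lambda)
 = \begin{cases}
     -b^\top \lambda & \text{if } A^\top \lambda + c = 0,\\
     -\infty & \text{otherwise.}
   \end{cases}
\end{aligned}
```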


Dual problem

Lagrange dual problem

maximize_{λ,ν}   g(λ, ν)
subject to       λ ⪰ 0

• finds the best lower bound on p∗

• a convex optimization problem; optimal value denoted d∗

• (λ, ν) is dual feasible if λ ⪰ 0 and (λ, ν) ∈ dom g

Strong duality: d∗ = p∗

• does not hold in general

• (usually) holds for convex problems

• conditions that guarantee strong duality in convex problems are called constraint qualifications


Slater’s constraint qualification

Strong duality holds for a convex problem

minimize_x   f0(x)
subject to   fi(x) ≤ 0,  i = 1, . . . , m,
             Ax = b

if it is strictly feasible, i.e.,

∃x ∈ int D :  fi(x) < 0, i = 1, . . . , m,  Ax = b

• also guarantees that the dual optimum is attained (if p∗ > −∞)

• linear inequalities do not need to hold with strict inequality

• there are many other types of constraint qualifications

• some non-convex optimization problems may have strong duality


Complementary slackness

Assume strong duality holds, x∗ is primal optimal, and (λ∗, ν∗) is dual optimal. Then

f0(x∗) = g(λ∗, ν∗) = inf_x ( f0(x) + ∑_{i=1}^m λ∗i fi(x) + ∑_{j=1}^p ν∗j hj(x) )

       ≤ f0(x∗) + ∑_{i=1}^m λ∗i fi(x∗) + ∑_{j=1}^p ν∗j hj(x∗)

       ≤ f0(x∗)

Hence both inequalities hold with equality, which gives:

• x∗ minimizes L(x, λ∗, ν∗)

• λ∗i fi(x∗) = 0 for i = 1, . . . , m (complementary slackness)


Karush-Kuhn-Tucker (KKT) conditions

KKT conditions for a problem with differentiable fi, hj :

• primal constraints: fi(x) ≤ 0, i = 1, . . . ,m, hj(x) = 0, j = 1, . . . , p

• dual constraints: λ ⪰ 0

• complementary slackness: λi fi(x) = 0, i = 1, . . . , m

• gradient of the Lagrangian with respect to x vanishes:

∇f0(x) + ∑_{i=1}^m λi ∇fi(x) + ∑_{j=1}^p νj ∇hj(x) = 0

If x̃, λ̃, ν̃ satisfy the KKT conditions for a convex problem, then they are optimal.
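A numerical illustration (my addition, assuming CVXPY): solvers expose dual variables, so complementary slackness can be checked directly on a small problem:

```python
import cvxpy as cp
import numpy as np

x = cp.Variable(2)
f1 = cp.sum_squares(x - np.array([2.0, 0.0])) - 1.0   # f1(x) <= 0: unit ball at (2, 0)
prob = cp.Problem(cp.Minimize(cp.sum_squares(x)), [f1 <= 0])
prob.solve()

lam = prob.constraints[0].dual_value
print(lam, f1.value)             # lam > 0 and f1(x*) ~= 0: constraint is active
print(lam * f1.value)            # ~0: complementary slackness
```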


Exercise: constrained/unconstrained ℓ1 problem

Consider two ℓ1 problems

minimize_x   ‖x‖1
subject to   Ax = b,
             l ≤ x ≤ u

and

minimize_x   ‖x‖1 + (λ/2)‖Ax − b‖₂²
subject to   l ≤ x ≤ u

Exercises: derive their

• LP or QP formulations

• Lagrange dual problems

• KKT conditions
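The derivations can be sanity-checked numerically (my addition, assuming CVXPY) by solving both problems directly and comparing against your LP/QP reformulations:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))
b = A @ rng.uniform(-1, 1, 10)   # guarantees a feasible point within the bounds
l, u, lam = -2.0, 2.0, 10.0

x = cp.Variable(10)
constrained = cp.Problem(cp.Minimize(cp.norm1(x)),
                         [A @ x == b, l <= x, x <= u])
penalized = cp.Problem(cp.Minimize(cp.norm1(x) + lam / 2 * cp.sum_squares(A @ x - b)),
                       [l <= x, x <= u])
print(constrained.solve(), penalized.solve())
```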


Exercise: total variation problem∗

The discrete total variation of a vector x ∈ Rⁿ is

TV(x) = ∑_{i=1}^{n−1} |x_{i+1} − x_i|.

Consider the problem

minimize_x   TV(x) + (λ/2)‖Ax − b‖₂²
subject to   l ≤ x ≤ u

Exercises: derive its

• SOCP formulation (refer to Sec. 4.2.2 of Boyd & Vandenberghe)

• Lagrange dual problem

• KKT conditions
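As a quick check of the definition and a direct solve (my addition, assuming NumPy/CVXPY with hypothetical data):

```python
import cvxpy as cp
import numpy as np

x0 = np.array([0.0, 0.0, 1.0, 1.0, 0.5])
tv = np.abs(np.diff(x0)).sum()      # TV(x0) = |0-0| + |1-0| + |1-1| + |0.5-1| = 1.5
print(tv)

# solve the TV-regularized problem directly
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))
b = A @ x0
lam, l, u = 10.0, -1.0, 2.0

x = cp.Variable(5)
TV = cp.sum(cp.abs(x[1:] - x[:-1]))
prob = cp.Problem(cp.Minimize(TV + lam / 2 * cp.sum_squares(A @ x - b)),
                  [l <= x, x <= u])
prob.solve()
print(x.value)
```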
