
A Secant Method for Nonsmooth Optimization

Asef Nazari

CSIRO Melbourne

CARMA Workshop on Optimization, Nonlinear Analysis, Randomness and Risk, Newcastle, Australia

12 July, 2014

Outline

1. Background: The Problem; Optimization Algorithms and Their Components; Subgradient and Subdifferential
2. Secant Method: Definitions; Optimality Condition and Descent Direction; The Secant Algorithm; Numerical Results

The Problem

Unconstrained optimization problem:

\[ \min_{x \in \mathbb{R}^n} f(x), \qquad f : \mathbb{R}^n \to \mathbb{R}, \]

where f is locally Lipschitz.

Why just unconstrained?

Transforming Constrained into Unconstrained

Constrained optimization problem:

\[ \min_{x \in Y} f(x), \qquad Y \subset \mathbb{R}^n. \]

Distance function:

\[ \operatorname{dist}(x, Y) = \min_{y \in Y} \|y - x\|. \]

By the theory of penalty functions (under some conditions), the problem can be replaced by

\[ \min_{x \in \mathbb{R}^n} f(x) + \sigma \operatorname{dist}(x, Y), \]

and f(x) + σ dist(x, Y) is a nonsmooth function even when f is smooth.
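A minimal sketch of this construction (not from the slides): here Y is assumed to be the Euclidean unit ball, so the distance has a closed form, and the max inside it is exactly where the penalized objective loses smoothness.

```python
import numpy as np

def dist_to_unit_ball(x):
    """dist(x, Y) for the illustrative choice Y = {y : ||y|| <= 1}."""
    return max(np.linalg.norm(x) - 1.0, 0.0)

def penalized(f, x, sigma=10.0):
    """Penalty objective f(x) + sigma * dist(x, Y); nonsmooth on the
    boundary of Y even when f itself is smooth."""
    return f(x) + sigma * dist_to_unit_ball(x)

# Smooth objective; its penalized version has a kink at ||x|| = 1.
f = lambda x: float(np.sum((x - 2.0) ** 2))
print(penalized(f, np.array([3.0, 0.0])))  # 5.0 + 10 * 2.0 = 25.0
```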

Sources of Nonsmooth Problems

Minimax problems:

\[ \min_{x \in \mathbb{R}^n} f(x), \qquad f(x) = \max_{1 \le i \le m} f_i(x). \]

Systems of nonlinear equations f_i(x) = 0, i = 1, …, m: we often solve

\[ \min_{x \in \mathbb{R}^n} \|f(x)\|, \qquad f(x) = (f_1(x), \ldots, f_m(x)). \]
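An illustrative sketch (not from the slides) of why the minimax objective is nonsmooth: where two pieces are active, the pointwise max has a kink, and the gradient of any active piece is a subgradient.

```python
import numpy as np

# f(x) = max(x^2, (x - 2)^2) is nonsmooth where both pieces are active.
fs    = [lambda x: x[0] ** 2,             lambda x: (x[0] - 2.0) ** 2]
grads = [lambda x: np.array([2 * x[0]]),  lambda x: np.array([2 * (x[0] - 2.0)])]

def f(x):
    return max(fi(x) for fi in fs)

def a_subgradient(x):
    """Gradient of one active piece: a valid subgradient of the max."""
    i = int(np.argmax([fi(x) for fi in fs]))
    return grads[i](x)

x = np.array([1.0])            # both pieces active: f1(x) = f2(x) = 1
print(f(x), a_subgradient(x))  # at the kink, ∂f(x) = conv{2, -2}
```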

Structure of Optimization Algorithms

Step 1: Choose an initial point x_0 ∈ ℝⁿ.
Step 2: Check the termination criteria.
Step 3: Find a descent direction d_k at x_k.
Step 4: Find a step size α_k with f(x_k + α_k d_k) < f(x_k).
Step 5: Set x_{k+1} = x_k + α_k d_k and go to Step 2.
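A skeleton of this five-step loop (illustrative only; `direction` and `step_size` are hypothetical callables supplied by whichever concrete method is in use):

```python
import numpy as np

def descent_loop(f, direction, step_size, x0, eps=1e-6, max_iter=1000):
    """Generic descent framework following Steps 1-5 above.
    direction(f, x) and step_size(f, x, d) are plugged in by the
    particular method (steepest descent, Newton, secant, ...)."""
    x = np.asarray(x0, dtype=float)          # Step 1: initial point
    for _ in range(max_iter):
        d = direction(f, x)                  # Step 3: descent direction
        if np.linalg.norm(d) <= eps:         # Step 2: termination test
            break
        alpha = step_size(f, x, d)           # Step 4: step size
        x = x + alpha * d                    # Step 5: update and loop
    return x
```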

Classification of Algorithms Based on d_k and α_k

Directions:
- d_k = −∇f(x_k): steepest descent method
- d_k = −H^{-1}(x_k) ∇f(x_k): Newton method
- d_k = −B^{-1} ∇f(x_k): quasi-Newton method

Step sizes, with h(α) = f(x_k + α d_k):
1. Solve h′(α) = 0 exactly: exact line search.
2. Solve it loosely: inexact line search (see the sketch below).

Trust region methods (as an alternative to line searches).
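One standard realization of an inexact line search (the classical Armijo backtracking rule; a generic sketch, not something specific to these slides):

```python
import numpy as np

def armijo_step(f, grad_f, x, d, alpha0=1.0, beta=0.5, c=1e-4):
    """Backtracking (Armijo) inexact line search on h(a) = f(x + a*d):
    shrink alpha until the sufficient-decrease condition holds."""
    alpha = alpha0
    slope = c * np.dot(grad_f(x), d)  # negative for a descent direction d
    while f(x + alpha * d) > f(x) + alpha * slope:
        alpha *= beta
    return alpha

# Example: f(x) = ||x||^2 with d = -grad f(x).
f = lambda x: float(x @ x)
g = lambda x: 2 * x
x = np.array([1.0, 1.0])
print(armijo_step(f, g, x, -g(x)))  # 0.5: one halving suffices here
```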

When the Objective Function Is Smooth

- d is a descent direction if f′(x_k, d) = ⟨∇f(x_k), d⟩ < 0.
- −∇f(x_k) is the steepest descent direction.
- ‖∇f(x_k)‖ ≤ ε is a good stopping criterion (first-order necessary condition).

Subgradient

Basic inequality for a convex differentiable function:

\[ f(x) \ge f(y) + \nabla f(y)^T (x - y) \quad \forall x \in \operatorname{dom}(f). \]

Subgradient

A vector g ∈ ℝⁿ is a subgradient of a convex function f at y if

\[ f(x) \ge f(y) + g^T (x - y) \quad \forall x \in \operatorname{dom}(f). \]

Example: for f(x) = |x|,
- if x < 0, then g = −1;
- if x > 0, then g = 1;
- if x = 0, then any g ∈ [−1, 1] is a subgradient.
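A quick numerical check of the defining inequality for f(x) = |x| (illustrative only):

```python
import numpy as np

def subgrad_abs(y):
    """A subgradient of f(x) = |x| at y (any value in [-1, 1] works at 0)."""
    return np.sign(y) if y != 0 else 0.0

# Verify f(x) >= f(y) + g*(x - y) on a grid, for several base points y.
f = abs
for y in (-2.0, 0.0, 1.5):
    g = subgrad_abs(y)
    assert all(f(x) >= f(y) + g * (x - y) - 1e-12
               for x in np.linspace(-5, 5, 101))
print("subgradient inequality holds")
```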

Subdifferential

The set of all subgradients of f at x, denoted ∂f(x), is called the subdifferential of f at x.

- For f(x) = |x|: ∂f(0) = [−1, 1], ∂f(x) = {1} for x > 0, and ∂f(x) = {−1} for x < 0.
- f is differentiable at x ⟺ ∂f(x) = {∇f(x)}.
- For locally Lipschitz functions (Clarke):

\[ \partial f(x_0) = \operatorname{conv}\{\lim_{i \to \infty} \nabla f(x_i) : x_i \to x_0,\ \nabla f(x_i) \text{ exists}\}. \]

Optimality Condition

For a smooth convex function:

\[ f(x^*) = \inf_{x \in \operatorname{dom}(f)} f(x) \iff 0 = \nabla f(x^*). \]

For a nonsmooth convex function:

\[ f(x^*) = \inf_{x \in \operatorname{dom}(f)} f(x) \iff 0 \in \partial f(x^*). \]

Directional Derivative

In the smooth case:

\[ f'(x_k, d) = \lim_{\lambda \to 0^+} \frac{f(x_k + \lambda d) - f(x_k)}{\lambda} = \langle \nabla f(x_k), d \rangle. \]

In the nonsmooth case:

\[ f'(x_k, d) = \lim_{\lambda \to 0^+} \frac{f(x_k + \lambda d) - f(x_k)}{\lambda} = \sup_{g \in \partial f(x_k)} \langle g, d \rangle. \]

Descent Direction

- In the smooth case, d is a descent direction if f′(x_k, d) = ⟨∇f(x_k), d⟩ < 0, and −∇f(x_k) is the steepest descent direction.
- In the nonsmooth case, d is a descent direction if f′(x_k, d) < 0. It can be proved that the steepest descent direction is

\[ d = -\operatorname*{argmin}_{g \in \partial f(x_k)} \|g\|. \]

In Summary

Optimality condition:

\[ f(x^*) = \inf_{x \in \operatorname{dom}(f)} f(x) \iff 0 \in \partial f(x^*). \]

Directional derivative:

\[ f'(x_k, d) = \lim_{\lambda \to 0^+} \frac{f(x_k + \lambda d) - f(x_k)}{\lambda} = \sup_{g \in \partial f(x_k)} \langle g, d \rangle. \]

Steepest descent direction:

\[ d = -\operatorname*{argmin}_{g \in \partial f(x_k)} \|g\|. \]
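To make the summary concrete in one dimension (an illustrative sketch, not the method of the talk): for f(x) = |x| + a·x, the subdifferential at 0 is the interval [a − 1, a + 1], and the min-norm subgradient is the projection of 0 onto that interval.

```python
def min_norm_subgradient(lo, hi):
    """Projection of 0 onto the interval [lo, hi] = argmin_{g in ∂f(0)} |g|."""
    return min(max(0.0, lo), hi)

# f(x) = |x| + a*x  =>  ∂f(0) = [a - 1, a + 1]
for a in (0.5, 3.0):
    g = min_norm_subgradient(a - 1.0, a + 1.0)
    d = -g                       # steepest descent direction (0 => stationary)
    print(f"a={a}: min-norm subgradient {g}, direction {d}")
# a=0.5: 0 in ∂f(0), so d = 0 and x = 0 is a minimizer
# a=3.0: g = 2, so d = -2 and moving left decreases f
```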

Methods for Nonsmooth Problems

The way we treat ∂f(x) leads to different types of algorithms in nonsmooth optimization:

- Subgradient methods (one subgradient at each iteration)
- Bundle methods (a bundle of subgradients at each iteration)
- Gradient sampling methods
- Smoothing techniques

Definition of an r-Secant

Let S₁ = {g ∈ ℝⁿ : ‖g‖ = 1} be the unit sphere in ℝⁿ. For g ∈ S₁ define

\[ g_{\max} = \max\{|g_i| : i = 1, \ldots, n\}. \]

Let g ∈ S₁, let j be an index with |g_j| = g_max, and let v ∈ ∂f(x + rg). The vector s = s(x, g, r) ∈ ℝⁿ with

\[ s = (s_1, \ldots, s_n), \qquad s_i = v_i \quad (i = 1, \ldots, n,\ i \ne j), \]

and

\[ s_j = \frac{f(x + rg) - f(x) - r \sum_{i=1, i \ne j}^{n} s_i g_i}{r g_j} \]

is called an r-secant of the function f at the point x in the direction g.
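A numerical sketch of this definition (assuming, for simplicity, that f is differentiable at x + rg, so v = ∇f(x + rg), approximated here by finite differences). The assertion checks the identity f(x + rg) − f(x) = r⟨s, g⟩, which holds by construction of s_j and is the "mean value theorem" of the next slide.

```python
import numpy as np

def num_grad(f, x, h=1e-7):
    """Forward-difference gradient, standing in for v ∈ ∂f(x + rg)."""
    f0, grad = f(x), np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        grad[i] = (f(x + e) - f0) / h
    return grad

def r_secant(f, x, g, r):
    """r-secant s(x, g, r): copy v except in the coordinate j where |g_j|
    is maximal, which is corrected so the secant identity holds exactly."""
    j = int(np.argmax(np.abs(g)))
    s = num_grad(f, x + r * g)
    rest = sum(s[i] * g[i] for i in range(len(g)) if i != j)
    s[j] = (f(x + r * g) - f(x) - r * rest) / (r * g[j])
    return s

f = lambda x: float(np.max(np.abs(x)))   # a nonsmooth test function
x = np.array([1.0, 0.5])
g = np.array([1.0, 0.0])                 # a unit direction
s = r_secant(f, x, g, r=0.1)
assert abs(f(x + 0.1 * g) - f(x) - 0.1 * np.dot(s, g)) < 1e-9
print("r-secant:", s)
```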

Facts about r-Secants

If n = 1, then

\[ s = \frac{f(x + rg) - f(x)}{rg}. \]

Mean value theorem for r-secants:

\[ f(x + rg) - f(x) = r \langle s(x, g, r), g \rangle. \]

The set of all possible r-secants of the function f at the point x:

\[ S_r f(x) = \{s \in \mathbb{R}^n : \exists\, g \in S_1 \text{ with } s = s(x, g, r)\}. \]

1. S_r f(x) is compact for r > 0.
2. The mapping (x, r) ↦ S_r f(x), r > 0, is closed and upper semicontinuous.

A New Set

Define

\[ S_0^c f(x) = \operatorname{conv}\{v \in \mathbb{R}^n : \exists\, g \in S_1,\ r_k \to +0 \text{ as } k \to +\infty,\ \text{with } v = \lim_{k \to +\infty} s(x, g, r_k)\}. \]

For a function f that is regular and semismooth at a point x ∈ ℝⁿ:

\[ \partial f(x) = S_0^c f(x). \]

Optimality Condition

Let x ∈ ℝⁿ be a local minimizer of the function f, and let f be directionally differentiable at x. Then

\[ 0 \in S_0^c f(x). \]

- x ∈ ℝⁿ is an r-stationary point of f on ℝⁿ if 0 ∈ S_r^c f(x) (with S_r^c f(x) the convex hull of S_r f(x)).
- x ∈ ℝⁿ is an (r, δ)-stationary point of f on ℝⁿ if 0 ∈ S_r^c f(x) + B_δ, where B_δ = {v ∈ ℝⁿ : ‖v‖ ≤ δ}.

Descent Direction

If x ∈ ℝⁿ is not an r-stationary point of the function f on ℝⁿ, then

\[ 0 \notin S_r^c f(x), \]

and we can compute a descent direction using the set S_r^c f(x). Let x ∈ ℝⁿ and, for given r > 0,

\[ \min\{\|v\| : v \in S_r^c f(x)\} = \|v^0\| > 0. \]

Then for g^0 = −‖v^0‖^{-1} v^0:

\[ f(x + r g^0) - f(x) \le -r \|v^0\|. \]

So the key subproblem is:

\[ \text{minimize } \|v\|^2 \quad \text{subject to } v \in S_r^c f(x). \]

An Algorithm for the Descent Direction (Alg 1)

Step 1: Compute an r-secant s¹ = s(x, g¹, r). Set W¹(x) = {s¹} and k = 1.
Step 2: Compute ‖w^k‖² = min{‖w‖² : w ∈ conv W^k(x)}. If ‖w^k‖ ≤ δ, stop. Otherwise go to Step 3.
Step 3: Set g^{k+1} = −‖w^k‖^{-1} w^k.
Step 4: If f(x + r g^{k+1}) − f(x) ≤ −c r ‖w^k‖, stop. Otherwise go to Step 5.
Step 5: Compute the r-secant s^{k+1} = s(x, g^{k+1}, r) in the direction g^{k+1}, construct W^{k+1}(x) = conv(W^k(x) ∪ {s^{k+1}}), set k = k + 1, and go to Step 2.
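A sketch of Alg 1 under simplifying assumptions (illustrative, not the authors' implementation): `r_secant` from the earlier sketch supplies secants, the initial direction g¹ is chosen randomly, and the min-norm element of the convex hull in Step 2 is found by solving the quadratic program in the convex-combination weights with scipy's SLSQP solver.

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_in_hull(W):
    """Step 2: argmin{ ||w||^2 : w in conv W }, as a QP over the
    weights lambda with lambda >= 0 and sum(lambda) = 1."""
    W = np.asarray(W); m = len(W)
    res = minimize(lambda lam: float(lam @ W @ W.T @ lam),
                   np.full(m, 1.0 / m),
                   bounds=[(0.0, 1.0)] * m,
                   constraints=[{"type": "eq",
                                 "fun": lambda lam: lam.sum() - 1.0}])
    return res.x @ W

def descent_direction(f, x, r, delta=1e-6, c=0.2, max_iter=50):
    """Alg 1: accumulate r-secants until either ||w^k|| <= delta
    (near-stationary) or the direction gives sufficient decrease."""
    g = np.random.randn(len(x)); g /= np.linalg.norm(g)   # Step 1
    W = [r_secant(f, x, g, r)]                # r_secant: earlier sketch
    for _ in range(max_iter):
        w = min_norm_in_hull(W)                           # Step 2
        if np.linalg.norm(w) <= delta:
            return None, w                    # stop: ||w^k|| <= delta
        g = -w / np.linalg.norm(w)                        # Step 3
        if f(x + r * g) - f(x) <= -c * r * np.linalg.norm(w):
            return g, w                       # Step 4: descent found
        W.append(r_secant(f, x, g, r))                    # Step 5
    return g, w
```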

The Secant Method for an (r, δ)-Stationary Point (Alg 2)

Step 1: Choose x⁰ ∈ ℝⁿ and set k = 0.
Step 2: Apply Alg 1 at x = x^k. It returns either ‖v^k‖ ≤ δ or the search direction g^k = −‖v^k‖^{-1} v^k.
Step 3: If ‖v^k‖ ≤ δ, stop. Otherwise go to Step 4.
Step 4: Set x^{k+1} = x^k + σ_k g^k, where σ_k is defined by

\[ \sigma_k = \arg\max\{\sigma \ge 0 : f(x^k + \sigma g^k) - f(x^k) \le -c_2 \sigma \|v^k\|\}. \]

Set k = k + 1 and go to Step 2.
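A sketch of Alg 2 building on `descent_direction` from the Alg 1 sketch above. The exact argmax of Step 4 is replaced by a simple halve-then-double heuristic (an assumption; the slides do not specify how σ_k is computed).

```python
import numpy as np

def expand_step(f, x, g, v_norm, c2=0.2, sigma0=1.0):
    """Step 4: approximate the largest sigma with
    f(x + sigma*g) - f(x) <= -c2*sigma*||v||: halve until the condition
    holds, then double while it still holds (heuristic argmax)."""
    sigma = sigma0
    while f(x + sigma * g) - f(x) > -c2 * sigma * v_norm and sigma > 1e-12:
        sigma *= 0.5
    while f(x + 2 * sigma * g) - f(x) <= -c2 * 2 * sigma * v_norm:
        sigma *= 2.0
    return sigma

def secant_method_rd(f, x0, r, delta, max_iter=500):
    """Alg 2: iterate Alg 1 directions until an (r, delta)-stationary
    point is reached. Uses descent_direction from the Alg 1 sketch."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, v = descent_direction(f, x, r, delta)      # Steps 2-3
        if g is None:                                 # ||v^k|| <= delta
            return x
        x = x + expand_step(f, x, g, np.linalg.norm(v)) * g   # Step 4
    return x
```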

The Secant Method (Alg 3)

Step 1: Choose x⁰ ∈ ℝⁿ and set k = 0.
Step 2: Apply Alg 2 to x^k with r = r_k and δ = δ_k. That algorithm terminates after a finite number p > 0 of iterations and finds an (r_k, δ_k)-stationary point x^{k+1}.
Step 3: Set k = k + 1 and go to Step 2.
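The outer method drives r_k and δ_k toward zero. The slides leave the update rule unspecified, so geometric shrinking is assumed in this sketch, which reuses `secant_method_rd` from the Alg 2 sketch above.

```python
def secant_method(f, x0, r0=1.0, delta0=0.1, shrink=0.5, n_outer=20):
    """Alg 3: each outer pass finds an (r_k, delta_k)-stationary point
    with Alg 2, then tightens both parameters (geometric decay assumed;
    the slides do not specify the schedule)."""
    x, r, delta = x0, r0, delta0
    for _ in range(n_outer):
        x = secant_method_rd(f, x, r, delta)   # Alg 2 sketch above
        r, delta = shrink * r, shrink * delta
    return x
```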

Convergence Theorem

Theorem. Assume that the function f is locally Lipschitz and that the level set L(x⁰) = {x ∈ ℝⁿ : f(x) ≤ f(x⁰)} is bounded for the starting point x⁰ ∈ ℝⁿ. Then every accumulation point of the sequence {x^k} belongs to the set X⁰ = {x ∈ ℝⁿ : 0 ∈ ∂f(x)}.

Results

(The numerical results were presented as figures in the original slides and are not recoverable from this extraction.)