
Nonsmooth Optimization:

Bundle Methods

Kaisa Joki

kjjoki@utu.fi

University of Turku

5.11.2013


Outline

1 Introduction

2 Nonsmooth Optimization
   Convex Nonsmooth Analysis
   Optimality Condition

3 Standard Bundle Method
   Theoretical Background
   Algorithm

4 The Goal of Research

Nonsmooth Optimization and Application Areas

In nonsmooth optimization (NSO), functions do not need to be differentiable

The general problem is that we are minimizing functions that are typically not differentiable at their minimizers

Problems of this type arise in many fields of application

- Economics
- Mechanics
- Engineering
- Computational chemistry and biology
- Optimal control
- Data mining

Causes of Nonsmoothness

Inherent: The original phenomenon contains various discontinuities and irregularities.

Technological: Caused by some extra technological constraints which may cause a nonsmooth dependence between variables and functions.

Methodological: Some algorithms for constrained optimization may lead to a nonsmooth problem (for example, the exact penalty function method).

Numerical: So-called "stiff problems", which are analytically smooth but numerically unstable and behave like nonsmooth problems.

Difficulties Caused by Nonsmoothness

The gradient does not exist at every point, so we

cannot utilize the classical theory of optimization because it requires certain differentiability and strong regularity assumptions

cannot use smooth (gradient-based) methods because they may fail in convergence, in the optimality test, or in gradient approximation

have difficulties defining a descent direction

Nonsmooth Optimization Problem

General problem

Let us consider a nonsmooth optimization problem of the form

$$
\begin{cases}
\min f(x) \\
\text{s. t. } x \in X
\end{cases}
$$

The set $X \subseteq \mathbb{R}^n$ is the set of feasible solutions

The objective function $f : \mathbb{R}^n \to \mathbb{R}$ is

- not required to have continuous derivatives
- supposed to be locally Lipschitz continuous on the set $X$

In the following, the objective function $f$ is assumed to be convex.

Convex Analysis

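Definition 1

Let the function $f : \mathbb{R}^n \to \mathbb{R}$ be convex. The subdifferential of $f$ at $x \in \mathbb{R}^n$ is the set

$$
\partial f(x) = \{\, \xi \in \mathbb{R}^n \mid f(y) \ge f(x) + \xi^T (y - x) \text{ for all } y \in \mathbb{R}^n \,\}.
$$

Each vector $\xi \in \partial f(x)$ is called a subgradient of $f$ at the point $x$.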

The subdifferential $\partial f(x)$ is

a nonempty, convex and compact set

a generalization of the classical derivative, because if $f : \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable at $x \in \mathbb{R}^n$, then $\partial f(x) = \{\nabla f(x)\}$
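The subgradient inequality of Definition 1 can be checked numerically. The following is a minimal sketch, assuming the test function $f(x) = |x|$ in one dimension; the helper name `is_subgradient` and the finite grid of test points are illustrative choices, and a grid check can only reject candidates, not prove membership in $\partial f(x)$.

```python
def f(x):
    # Hypothetical test function: convex, nondifferentiable at x = 0.
    return abs(x)


def is_subgradient(xi, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + xi * (y - x) for every y in a finite grid."""
    return all(f(y) >= f(x) + xi * (y - x) - tol for y in test_points)


# Grid of test points y in [-5, 5].
grid = [i / 10.0 for i in range(-50, 51)]

# At the kink x = 0, every slope in [-1, 1] passes the check.
at_kink = [xi for xi in (-1.5, -1.0, 0.0, 0.5, 1.0, 1.5)
           if is_subgradient(xi, 0.0, grid)]

# At x = 2 the function is differentiable, so only the gradient 1.0 passes.
at_two = [xi for xi in (-1.0, 0.5, 1.0, 1.5)
          if is_subgradient(xi, 2.0, grid)]

print(at_kink)
print(at_two)
```

At the kink the surviving candidates fill the interval $[-1, 1]$, while at a point of differentiability only the gradient survives, which matches the two properties listed above.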

[Figure: Subdifferential]

Theorem 2

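If $f : \mathbb{R}^n \to \mathbb{R}$ is convex, then for all $y \in \mathbb{R}^n$

$$
f(y) = \max \{\, f(x) + \xi^T (y - x) \mid x \in \mathbb{R}^n,\ \xi \in \partial f(x) \,\}.
$$

Theorem 3

The direction $d \in \mathbb{R}^n$ is a descent direction for a convex function $f : \mathbb{R}^n \to \mathbb{R}$ at $x \in \mathbb{R}^n$ if

$$
\xi^T d < 0 \quad \text{for all } \xi \in \partial f(x).
$$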

Optimality Condition

For convex functions we have the following necessary and sufficient optimality condition:

Theorem 4

A convex function $f : \mathbb{R}^n \to \mathbb{R}$ attains its global minimum at the point $x$ if and only if

$$
0 \in \partial f(x).
$$
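As a small worked illustration of the descent and optimality conditions, consider the one-dimensional convex function $f(x) = \max\{-x, 2x\}$. This function is chosen here for illustration and is not part of the original slides.

```latex
\[
f(x) = \max\{-x,\ 2x\}, \qquad
\partial f(x) =
\begin{cases}
\{-1\},   & x < 0, \\
[-1,\ 2], & x = 0, \\
\{2\},    & x > 0.
\end{cases}
\]
% At x = 1 the only subgradient is 2, so d = -1 gives 2 * (-1) < 0:
% d is a descent direction.
\[
x = 1:\quad \xi^T d = 2 \cdot (-1) = -2 < 0
\quad \text{for all } \xi \in \partial f(1) = \{2\}.
\]
% At x = 0 the subdifferential contains 0, so x = 0 is a global minimizer.
\[
x = 0:\quad 0 \in [-1,\ 2] = \partial f(0)
\quad \Longrightarrow \quad f(0) = 0 = \min_{x \in \mathbb{R}} f(x).
\]
```

At $x = 0$ no direction satisfies $\xi^T d < 0$ for every $\xi \in [-1, 2]$: any $d > 0$ fails for $\xi = 2$, and any $d < 0$ fails for $\xi = -1$.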

Methods for Nonsmooth Optimization

The main problem: We usually do not know the whole subdifferential of the function but only one arbitrary subgradient at each point.

Different methods for solving a nonsmooth optimization problem

Bundle Methods

Derivative Free Methods

Subgradient Methods

Gradient Sampling Methods

Hybrid Methods

Special Methods

About Standard Bundle Method

We consider an unconstrained convex nonsmooth problem

$$
\begin{cases}
\min f(x) \\
\text{s. t. } x \in \mathbb{R}^n
\end{cases}
$$

Assumption: At every point $x \in \mathbb{R}^n$ we can evaluate the value $f(x)$ and one arbitrary $\xi \in \partial f(x)$

The method converges to the global minimum of $f$ (if it exists)

The main idea:

Approximate the subdifferential of the objective function with a bundle

The bundle consists of subgradients from previous iterations

Subgradient information is used to construct a piecewise linear approximation to the objective function

This approximation is used to determine a descent direction

If the approximation is not adequate, then we add more information to the bundle

Bundle

At iteration $k$, at the current iteration point $x_k$, our bundle is

$$
B_k = \{\, (y_j, f(y_j), \xi_j) \mid j \in J_k \,\}
$$

- $y_j \in \mathbb{R}^n$ is a trial point
- $\xi_j \in \partial f(y_j)$ is a subgradient
- $J_k$ is a nonempty subset of $\{1, 2, \ldots, k\}$

By using the bundle we can construct a cutting plane model, which is a piecewise linear approximation of the function $f$

Cutting Plane Model

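The cutting plane model is

$$
\hat{f}_k(x) = \max_{j \in J_k} \{\, f(y_j) + \xi_j^T (x - y_j) \,\} \quad \text{for all } x \in \mathbb{R}^n.
$$

It is a convex function and $\hat{f}_k(x) \le f(x)$.

This approximation can be written in the equivalent form

$$
\hat{f}_k(x) = \max_{j \in J_k} \{\, f(x_k) + \xi_j^T (x - x_k) - \alpha_j^k \,\}
$$

with the linearization error

$$
\alpha_j^k = f(x_k) - f(y_j) - \xi_j^T (x_k - y_j) \ge 0 \quad \text{for all } j \in J_k.
$$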
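The two forms of the cutting plane model can be compared numerically. The sketch below assumes a one-dimensional test function $f(x) = x^2$ (smooth, so its only subgradient is $2x$) and three hypothetical trial points; the function names `model`, `linearization_errors` and `model_from_errors` are illustrative.

```python
def f(x):
    # Hypothetical convex test function (smooth, chosen for simplicity).
    return x * x


def subgrad(x):
    # For f(x) = x^2 the subdifferential is the singleton {2x}.
    return 2.0 * x


# Bundle of triples (y_j, f(y_j), xi_j) at hypothetical trial points.
bundle = [(y, f(y), subgrad(y)) for y in (-2.0, 0.5, 3.0)]


def model(x, bundle):
    """Cutting plane model: max_j { f(y_j) + xi_j * (x - y_j) }."""
    return max(fy + xi * (x - y) for y, fy, xi in bundle)


def linearization_errors(xk, bundle):
    """alpha_j^k = f(x_k) - f(y_j) - xi_j * (x_k - y_j)."""
    return [f(xk) - fy - xi * (xk - y) for y, fy, xi in bundle]


def model_from_errors(x, xk, bundle):
    """Equivalent form: max_j { f(x_k) + xi_j * (x - x_k) - alpha_j^k }."""
    alphas = linearization_errors(xk, bundle)
    return max(f(xk) + xi * (x - xk) - a
               for (_, _, xi), a in zip(bundle, alphas))


xk = 1.0
print(linearization_errors(xk, bundle))
print(model(0.0, bundle), model_from_errors(0.0, xk, bundle), f(0.0))
```

For this particular $f$ the linearization error reduces to $(x_k - y_j)^2$, so it is nonnegative by inspection, and the model value never exceeds the function value.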

[Figure: linearization error; cutting plane model]

Algorithm: Direction Finding Problem

To determine the search direction $d_k$ we need to solve

$$
\min_{d \in \mathbb{R}^n} \left\{ \hat{f}_k(x_k + d) + \tfrac{1}{2} d^T M_k d \right\}
$$

where

$M_k$ is a positive definite and symmetric $n \times n$ matrix.

$\tfrac{1}{2} d^T M_k d$ is a stabilizing term which

- guarantees the existence of a unique solution $d_k$
- keeps the approximation local enough

The search direction $d_k$ is also a descent direction for the original objective

Algorithm: Quadratic Direction Finding Problem

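$$
\min_{d \in \mathbb{R}^n} \left\{ \max_{j \in J_k} \{\, f(x_k) + \xi_j^T d - \alpha_j^k \,\} + \tfrac{1}{2} d^T M_k d \right\} \tag{1}
$$

The problem (1) can be rewritten as a smooth quadratic direction finding problem

$$
\begin{aligned}
\min\ & v + \tfrac{1}{2} d^T M_k d \\
\text{s. t. } & \xi_j^T d - \alpha_j^k \le v \quad \forall j \in J_k, \\
& v \in \mathbb{R},\ d \in \mathbb{R}^n
\end{aligned} \tag{2}
$$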
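In one dimension the direction finding problem can be solved exactly without a QP solver, which makes the structure easy to see. The sketch below assumes $n = 1$ and $M_k = u > 0$ (a scalar); the function name `solve_direction_1d` is hypothetical, and a general implementation would instead pass the smooth quadratic problem or its dual to a QP solver.

```python
from itertools import combinations


def solve_direction_1d(xis, alphas, u):
    """Minimise max_j(xi_j * d - alpha_j) + 0.5 * u * d * d over d in R.

    Returns (v, d), where v = max_j(xi_j * d - alpha_j) is the optimal v of
    the smooth reformulation. The objective is strongly convex and piecewise
    quadratic, so its minimiser is either a stationary point of one piece
    (d = -xi_j / u) or a kink where two linear pieces cross. Comparing the
    objective at these finitely many candidates therefore gives the exact
    solution.
    """
    def v_of(d):
        return max(xi * d - a for xi, a in zip(xis, alphas))

    def objective(d):
        return v_of(d) + 0.5 * u * d * d

    # Stationary points of the individual pieces.
    candidates = [-xi / u for xi in xis]
    # Kinks: xi_i * d - alpha_i = xi_j * d - alpha_j.
    for (xi_i, a_i), (xi_j, a_j) in combinations(zip(xis, alphas), 2):
        if xi_i != xi_j:
            candidates.append((a_i - a_j) / (xi_i - xi_j))

    d = min(candidates, key=objective)
    return v_of(d), d


# Example: f(x) = |x| at x_k = 1 with bundle points y = 1 and y = -1.
# Subgradients are 1 and -1; linearization errors are 0 and 2.
v, d = solve_direction_1d([1.0, -1.0], [0.0, 2.0], u=1.0)
print(v, d)
```

In this example the step $d$ moves $x_k = 1$ to the candidate $y_{k+1} = 0$, and $v$ is negative, as expected for a predicted descent.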

Algorithm: Search Direction and Stopping Condition

The solution $(v_k, d_k)$ of (2) can also be calculated from the dual problem

The next iteration candidate is $y_{k+1} = x_k + d_k$

The value

$$
v_k = \hat{f}_k(y_{k+1}) - f(x_k)
$$

is the predicted descent of $f$ at $y_{k+1}$

If $v_k = 0$, then the current point $x_k$ is the global minimum

It is convenient to stop the algorithm when $-v_k \le \varepsilon$, where $\varepsilon > 0$ is a final accuracy tolerance

Algorithm: Serious and Null Step

Now we perform either a serious step or a null step

A serious step

$$
x_{k+1} = y_{k+1}
$$

is performed if

$$
f(y_{k+1}) - f(x_k) \le m v_k
$$

where $m \in (0, 1/2)$ is a line search parameter.

Otherwise we make a null step

$$
x_{k+1} = x_k
$$

Algorithm: Updating the Bundle

In both steps we improve the approximation by adding

$$
(y_{k+1}, f(y_{k+1}), \xi_{k+1})
$$

into the bundle, where $\xi_{k+1} \in \partial f(y_{k+1})$

The easiest way to do this is to set $J_{k+1} = J_k \cup \{k + 1\}$

- stores all subgradients
- causes difficulties with storage and computation

By using the subgradient aggregation strategy we can keep the size of the bundle bounded
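The steps of the algorithm (direction finding, stopping test, serious or null step, bundle update) can be put together in a short loop. The sketch below is a one-dimensional illustration under several assumptions not taken from the slides: the test function is $f(x) = |x|$, $M_k$ is a fixed scalar $u$, the direction finding problem is solved exactly by candidate enumeration instead of a QP solver, and the bundle keeps every element ($J_{k+1} = J_k \cup \{k+1\}$, no aggregation). All function names are hypothetical.

```python
from itertools import combinations


def f(x):
    return abs(x)  # hypothetical convex nonsmooth test function


def subgradient(x):
    # One arbitrary element of the subdifferential (1.0 is chosen at x = 0).
    return 1.0 if x >= 0 else -1.0


def direction_1d(xis, alphas, u):
    """Exact 1-D minimiser of max_j(xi_j * d - alpha_j) + 0.5 * u * d * d."""
    def v_of(d):
        return max(xi * d - a for xi, a in zip(xis, alphas))

    # Candidates: stationary points of single pieces and pairwise kinks.
    cands = [-xi / u for xi in xis]
    cands += [(ai - aj) / (xi - xj)
              for (xi, ai), (xj, aj) in combinations(zip(xis, alphas), 2)
              if xi != xj]
    d = min(cands, key=lambda t: v_of(t) + 0.5 * u * t * t)
    return v_of(d), d


def bundle_method_1d(x0, u=1.0, m=0.1, eps=1e-8, max_iter=100):
    x = x0
    bundle = [(x0, f(x0), subgradient(x0))]  # triples (y_j, f(y_j), xi_j)
    history = []
    for _ in range(max_iter):
        fx = f(x)
        xis = [xi for _, _, xi in bundle]
        # Linearization errors with respect to the current point x.
        alphas = [fx - fy - xi * (x - y) for y, fy, xi in bundle]
        v, d = direction_1d(xis, alphas, u)
        if -v <= eps:                 # stopping condition
            break
        y_new = x + d                 # next iteration candidate
        f_new = f(y_new)
        if f_new - fx <= m * v:       # sufficient descent: serious step
            x = y_new
            history.append("serious")
        else:                         # null step: keep x, enrich the model
            history.append("null")
        bundle.append((y_new, f_new, subgradient(y_new)))
    return x, history


x_star, steps = bundle_method_1d(5.0)
print(x_star, steps)
```

In this run the null step occurs once the iterate reaches the kink: the single stored slope predicts further descent that does not materialise, and the new subgradient added at the trial point corrects the model so that the stopping test is met.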

Nonconvex Bundle Methods

The objective function is only supposed to be locally Lipschitz continuous

We cannot guarantee even local optimality of a solution without some convexity assumption

Not as efficient as convex bundle methods

The best nonconvex bundle methods are generalizations of convex bundle methods with suitable modifications

Not developed from the nonconvex perspective

Difference of Two Convex Functions

Definition 5

A function $f : \mathbb{R}^n \to \mathbb{R}$ is called a DC function if it can be written in the form

$$
f(x) = f_1(x) - f_2(x)
$$

where $f_1$ and $f_2$ are convex functions on $\mathbb{R}^n$.

If a DC function $f$ is nonsmooth, then at least one of the functions $f_1$ and $f_2$ is nonsmooth

For DC functions it is still possible to utilize convex analysis and convex optimization theory to some extent

Many problems of nonconvex optimization can be described by using DC functions
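As a small illustration of a DC decomposition, the pointwise minimum of two convex functions is generally nonconvex but always DC. The example below is chosen here for illustration and does not come from the original slides.

```latex
% For convex g and h, both g + h and max{g, h} are convex, and
\[
\min\{g(x),\ h(x)\} = \underbrace{\bigl(g(x) + h(x)\bigr)}_{f_1(x)}
 - \underbrace{\max\{g(x),\ h(x)\}}_{f_2(x)}.
\]
% Concrete instance on the real line with g(x) = |x - 1|, h(x) = |x + 1|:
\[
f(x) = \min\{|x - 1|,\ |x + 1|\}
     = \bigl(|x - 1| + |x + 1|\bigr) - \bigl(|x| + 1\bigr).
\]
% f is nonconvex: f(-1) = f(1) = 0, but f(0) = 1 > 0.
```

Here both $f_1$ and $f_2$ are nonsmooth convex functions, and $f$ has two global minimizers at $x = \pm 1$ separated by a local maximum at $x = 0$.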

Future Work

Only a few efficient nonconvex nonsmooth optimization methods exist, so my goals are to

Develop a local bundle method for unconstrained nonconvex nonsmooth problems where the objective is a DC function

Add good features of gradient sampling methods to possibly get a better method

Extend the method to the constrained case

Modify the local method so that we get a global solution in both the unconstrained and the constrained case


Thank you for your attention!
