An introduction to optimization on manifolds
Aug 30, 2021
Geometric methods in optimization and sampling, boot camp at the Simons Institute
Nicolas Boumal, OPTIM, Institute of Mathematics, EPFL
nicolasboumal.net/book
Step 0 in optimization
It all starts with a set $S$ and a function $f : S \to \mathbf{R}$:
$$\min_{x \in S} f(x)$$
These bare objects fully specify the problem.
Any additional structure on ๐๐ and ๐๐ may (and should) be exploited for algorithmic purposes but is not part of the problem.
Classical unconstrained optimization
The search space is a linear space, e.g., $S = \mathbf{R}^n$:
$$\min_{x \in \mathbf{R}^n} f(x)$$
We can choose to turn $\mathbf{R}^n$ into a Euclidean space: $\langle u, v \rangle = u^\top v$.
If $f$ is differentiable, this provides gradients $\nabla f$ and Hessians $\nabla^2 f$. These objects underpin algorithms: gradient descent, Newton's method...
$$\langle \nabla f(x), v \rangle = \mathrm{D}f(x)[v] = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}$$
$$\nabla^2 f(x)[v] = \mathrm{D}(\nabla f)(x)[v] = \lim_{t \to 0} \frac{\nabla f(x + tv) - \nabla f(x)}{t}$$
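To make this concrete, here is a minimal Matlab sketch of gradient descent on $\mathbf{R}^n$ for a quadratic cost; the matrix, vector, step size, and iteration count are illustrative assumptions, not part of the slides.

% Minimal gradient descent on R^n for f(x) = 0.5*x'*A*x - b'*x.
% A, b, the step size t, and the iteration count are illustrative choices.
n = 5;
A = diag(1:n);              % positive definite, so f has a unique minimizer
b = ones(n, 1);
x = zeros(n, 1);
t = 1/n;                    % constant step <= 1/L with L = norm(A) = n
for k = 1:500
    g = A*x - b;            % Euclidean gradient of f at x
    x = x - t*g;            % gradient descent step
end
disp(norm(A*x - b))         % ~ 0: x approaches the minimizer A\b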
Extend to optimization on manifolds
The search space is a smooth manifold, $S = \mathcal{M}$:
$$\min_{x \in \mathcal{M}} f(x)$$
We can choose to turn $\mathcal{M}$ into a Riemannian manifold.
If $f$ is differentiable, this provides Riemannian gradients and Hessians. These objects underpin algorithms: gradient descent, Newton's method...
Around since the 70s; practical since the 90s.
What is a manifold? Take one:
[Figure: three example sets, labeled Yes, Yes, No, indicating whether each is a smooth manifold.]
What is a manifold? Take two:
A few manifolds that come up in the wild
Pictures: https://www.popsci.com/new-roomba-knows-location and http://emanual.robotis.com/docs/en/platform/turtlebot3/slam
Orthonormal frames and rotations
Stiefel manifold: $\mathcal{M} = \{X \in \mathbf{R}^{n \times p} : X^\top X = I_p\}$
Rotation group: $\mathcal{M} = \{R \in \mathbf{R}^{3 \times 3} : R^\top R = I_3 \text{ and } \det(R) = +1\}$
Applications in sparse PCA, Structure-from-Motion, SLAM (robotics)...
The singularities of Euler angles (gimbal lock) are artificial: the rotation group is smooth.
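As an aside (my sketch, not from the slides): the smooth structure is easy to exploit numerically. For instance, an arbitrary $3 \times 3$ matrix can be mapped to the nearest rotation with an SVD, a standard orthogonal Procrustes construction:

% Project a 3x3 matrix M to the nearest rotation in Frobenius norm.
% Standard SVD-based construction; the random input is an illustrative choice.
M = randn(3, 3);
[U, ~, V] = svd(M);
R = U * diag([1, 1, det(U*V')]) * V';   % flip one sign if needed so det(R) = +1
disp(norm(R'*R - eye(3)))               % ~ 0: R is orthogonal
disp(det(R))                            % +1: R is a rotation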
Subspaces and fixed-rank matrices
Grassmann manifold: $\mathcal{M} = \{\text{subspaces of dimension } p \text{ in } \mathbf{R}^n\}$
Fixed-rank matrices: $\mathcal{M} = \{X \in \mathbf{R}^{m \times n} : \operatorname{rank}(X) = r\}$
Applications to linear dimensionality reduction, data completion and denoising, large-scale matrix equations, ...
Optimization allows us to go beyond PCA (least-squares loss $\equiv$ truncated SVD): can handle outlier-robust loss functions and missing data.
Picture: https://365datascience.com/tutorials/python-tutorials/principal-components-analysis/
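For reference, here is the least-squares baseline that this generalizes, as a hedged Matlab sketch (my addition): the best rank-$r$ approximation in the Frobenius norm is given by a truncated SVD (Eckart-Young).

% Best rank-r approximation of X in Frobenius norm via truncated SVD.
% Matrix sizes, the rank r, and the noise level are illustrative choices.
m = 50; n = 40; r = 3;
X = randn(m, r) * randn(r, n) + 0.01 * randn(m, n);   % approximately rank-r data
[U, S, V] = svd(X, 'econ');
Xr = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';            % rank-r approximation
disp(norm(X - Xr, 'fro') / norm(X, 'fro'))            % small relative residual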
Positive matrices and hyperbolic space
Positive definite matrices: $\mathcal{M} = \{X \in \mathbf{R}^{n \times n} : X = X^\top \text{ and } X \succ 0\}$
Hyperbolic space: $\mathcal{M} = \{x \in \mathbf{R}^{n+1} : x_0^2 = 1 + x_1^2 + \cdots + x_n^2\}$
Used in metric learning, Gaussian mixture models, tree-like embeddings...
With appropriate metrics, these are Cartan-Hadamard manifolds: complete, simply connected, with non-positive (intrinsic) curvature. A great playground for geodesic convexity.
Picture: https://bjlkeng.github.io/posts/hyperbolic-geometry-and-poincare-embeddings
A tour of technical tools
(Restricted to embedded submanifolds.)
• What is a manifold?
• Tangent spaces
• Smooth maps
• Differentials
• Retractions
• Riemannian manifolds
• Gradients
• Hessians
What is a manifold? Take three:
A subset $\mathcal{M}$ of a linear space $\mathcal{E}$ of dimension $d$ is a smooth embedded submanifold of dimension $n$ if:
For all $x \in \mathcal{M}$, there exists a neighborhood $U$ of $x$ in $\mathcal{E}$, an open set $V \subseteq \mathbf{R}^d$ and a diffeomorphism $\psi : U \to V$ such that $\psi(U \cap \mathcal{M}) = V \cap E$, where $E$ is a linear subspace of dimension $n$.
We call $\mathcal{E}$ the embedding space.
[Figure: a chart $\psi$ maps a neighborhood $U$ of $x \in \mathcal{M} \subseteq \mathcal{E} \equiv \mathbf{R}^d$ to $V \subseteq \mathbf{R}^d$, with $\psi(U \cap \mathcal{M}) = V \cap E$.]
What is a manifold? Quick facts:
Matrix sets in our list are manifolds: orthonormal, fixed-rank, positive definite...
Linear subspaces are manifolds.
Open subsets of manifolds are manifolds.
Products of manifolds are manifolds.
Tangent vectors of $\mathcal{M}$ embedded in $\mathcal{E}$
A tangent vector at $x$ is the velocity $c'(0) = \lim_{t \to 0} \frac{c(t) - c(0)}{t}$ of a curve $c : I \to \mathcal{M}$ with $c(0) = x$.
The tangent space $T_x\mathcal{M}$ is the set of all tangent vectors of $\mathcal{M}$ at $x$. It is a linear subspace of $\mathcal{E}$ of the same dimension as $\mathcal{M}$.
If $\mathcal{M} = \{x : h(x) = 0\}$ with $h : \mathcal{E} \to \mathbf{R}^k$ smooth and $\operatorname{rank} \mathrm{D}h(x) = k$, then $T_x\mathcal{M} = \ker \mathrm{D}h(x)$.
Example (the sphere): $h(x) = x^\top x - 1 = 0$ gives $\ker \mathrm{D}h(x) = \{v : x^\top v = 0\}$.
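A quick numeric sanity check of the sphere example, as a hedged Matlab sketch (my addition): differentiate a curve through $x$ on the sphere and confirm that its velocity lies in $\ker \mathrm{D}h(x)$, i.e., is orthogonal to $x$.

% Velocities of curves on the unit sphere are orthogonal to the foot point x.
% The curve c(t) = (x + t*v)/norm(x + t*v) stays on the sphere; the dimension
% and finite-difference step are illustrative choices.
n = 4;
x = randn(n, 1); x = x / norm(x);
v = randn(n, 1); v = v - (x'*v)*x;      % make v tangent: x'*v = 0
c = @(t) (x + t*v) / norm(x + t*v);
t = 1e-6;
cdot = (c(t) - c(-t)) / (2*t);          % central difference for c'(0)
disp(norm(cdot - v))                    % ~ 0: c'(0) = v
disp(x' * cdot)                         % ~ 0: the velocity is in ker Dh(x)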
Smooth maps on/to manifolds
Let $\mathcal{M}, \mathcal{M}'$ be (smooth, embedded) submanifolds of linear spaces $\mathcal{E}, \mathcal{E}'$.
A map $F : \mathcal{M} \to \mathcal{M}'$ is smooth if it has a smooth extension, i.e., if there exists a neighborhood $U$ of $\mathcal{M}$ in $\mathcal{E}$ and a smooth map $\bar{F} : U \to \mathcal{E}'$ such that $F = \bar{F}|_{\mathcal{M}}$.
Example: a cost function $f : \mathcal{M} \to \mathbf{R}$ is smooth if it is the restriction of a smooth $\bar{f} : U \to \mathbf{R}$.
Composition preserves smoothness.
Differential of a smooth map $F : \mathcal{M} \to \mathcal{M}'$
The differential of $F$ at $x$ is the map $\mathrm{D}F(x) : T_x\mathcal{M} \to T_{F(x)}\mathcal{M}'$ defined by:
$$\mathrm{D}F(x)[v] = (F \circ c)'(0) = \lim_{t \to 0} \frac{F(c(t)) - F(x)}{t}$$
where $c : I \to \mathcal{M}$ satisfies $c(0) = x$ and $c'(0) = v$.
Claim: $\mathrm{D}F(x)$ is well defined and linear, and we have a chain rule. If $\bar{F}$ is a smooth extension of $F$, then $\mathrm{D}F(x) = \mathrm{D}\bar{F}(x)|_{T_x\mathcal{M}}$.
Retractions: moving around on $\mathcal{M}$
The tangent bundle is the set $T\mathcal{M} = \{(x, v) : x \in \mathcal{M} \text{ and } v \in T_x\mathcal{M}\}$.
Claim: $T\mathcal{M}$ is a smooth manifold embedded in $\mathcal{E} \times \mathcal{E}$.
A retraction is a smooth map $R : T\mathcal{M} \to \mathcal{M} : (x, v) \mapsto R_x(v)$ such that each curve $c(t) = R_x(tv)$ satisfies $c(0) = x$ and $c'(0) = v$.
E.g., metric projection: $R_x(v)$ is the projection of $x + v$ to $\mathcal{M}$.
$\mathcal{M} = \mathbf{R}^n$: $R_x(v) = x + v$; $\mathcal{M} = \{x : \|x\| = 1\}$: $R_x(v) = \frac{x + v}{\|x + v\|}$; $\mathcal{M} = \{X : \operatorname{rank}(X) = r\}$: $R_X(V) = \mathrm{SVD}_r(X + V)$.
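The fixed-rank case makes a nice worked example; here is a hedged Matlab sketch (my addition) of the metric-projection retraction $R_X(V) = \mathrm{SVD}_r(X + V)$:

% Metric-projection retraction on the fixed-rank manifold: project X + V
% back to rank r with a truncated SVD. Sizes, r, and the displacement V are
% illustrative choices; in an algorithm, V would be a tangent vector at X.
m = 30; n = 20; r = 2;
X = randn(m, r) * randn(r, n);                % a point with rank(X) = r
V = 0.1 * randn(m, n);                        % a small displacement
[U, S, W] = svd(X + V, 'econ');
RX = U(:, 1:r) * S(1:r, 1:r) * W(:, 1:r)';    % back on the manifold
disp(rank(RX))                                % = r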
Riemannian manifolds
Each tangent space $T_x\mathcal{M}$ is a linear space. Endow each one with an inner product: $\langle u, v \rangle_x$ for $u, v \in T_x\mathcal{M}$.
A vector field is a map $V : \mathcal{M} \to T\mathcal{M}$ such that $V(x)$ is tangent at $x$ for all $x$. We say the inner products $\langle \cdot, \cdot \rangle_x$ vary smoothly with $x$ if $x \mapsto \langle V(x), W(x) \rangle_x$ is smooth for all smooth vector fields $V, W$.
If the inner products vary smoothly with $x$, they form a Riemannian metric.
A Riemannian manifold is a smooth manifold with a Riemannian metric.
Riemannian structure and optimization
A Riemannian manifold is a smooth manifold with a smoothly varying choice of inner product on each tangent space.
A manifold can be endowed with many different Riemannian structures.
A problem $\min_{x \in \mathcal{M}} f(x)$ is defined independently of any Riemannian structure.
We choose a metric for algorithmic purposes. Akin to preconditioning.
Riemannian submanifolds
Let the embedding space of $\mathcal{M}$ be a Euclidean space $\mathcal{E}$ with metric $\langle \cdot, \cdot \rangle$. For example: $\mathcal{E} = \mathbf{R}^d$ and $\langle u, v \rangle = u^\top v$ for all $u, v \in \mathbf{R}^d$.
A convenient choice of Riemannian structure for $\mathcal{M}$ is to let:
$$\langle u, v \rangle_x = \langle u, v \rangle.$$
This is well defined because $u, v \in T_x\mathcal{M}$ are, in particular, elements of $\mathcal{E}$.
This is a Riemannian metric. With it, $\mathcal{M}$ is a Riemannian submanifold of $\mathcal{E}$.
!! A Riemannian submanifold is not just a submanifold that is Riemannian !!
Riemannian gradients
The Riemannian gradient of a smooth $f : \mathcal{M} \to \mathbf{R}$ is the vector field $\mathrm{grad} f$ defined by:
$$\forall (x, v) \in T\mathcal{M}, \quad \langle \mathrm{grad} f(x), v \rangle_x = \mathrm{D}f(x)[v].$$
Claim: $\mathrm{grad} f$ is a well-defined smooth vector field.
If $\mathcal{M}$ is a Riemannian submanifold of a Euclidean space $\mathcal{E}$, then
$$\mathrm{grad} f(x) = \mathrm{Proj}_x(\nabla \bar{f}(x)),$$
where $\mathrm{Proj}_x$ is the orthogonal projector from $\mathcal{E}$ to $T_x\mathcal{M}$ and $\bar{f}$ is a smooth extension of $f$.
Recall: $\langle \nabla \bar{f}(x), v \rangle = \mathrm{D}\bar{f}(x)[v] = \lim_{t \to 0} \frac{\bar{f}(x + tv) - \bar{f}(x)}{t}$ and $\nabla^2 \bar{f}(x)[v] = \mathrm{D}(\nabla \bar{f})(x)[v] = \lim_{t \to 0} \frac{\nabla \bar{f}(x + tv) - \nabla \bar{f}(x)}{t}$.
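A quick numeric check of the projection formula on the sphere, as a hedged Matlab sketch (my addition): the projected Euclidean gradient reproduces directional derivatives of $f$ along curves on the manifold.

% Check <grad f(x), v> = Df(x)[v] on the sphere for f(x) = 0.5*x'*A*x.
% The dimension and the finite-difference step are illustrative choices.
n = 5;
A = randn(n); A = (A + A')/2;            % symmetric matrix
f = @(y) 0.5*(y'*A*y);
x = randn(n, 1); x = x / norm(x);
v = randn(n, 1); v = v - (x'*v)*x;       % tangent at x
g = A*x - (x'*A*x)*x;                    % grad f(x) = Proj_x of the Euclidean gradient
c = @(t) (x + t*v) / norm(x + t*v);      % curve with c(0) = x, c'(0) = v
t = 1e-6;
Dfv = (f(c(t)) - f(c(-t))) / (2*t);      % finite-difference Df(x)[v]
disp([g'*v, Dfv])                        % the two numbers should match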
Riemannian Hessians
The Riemannian Hessian of $f$ at $x$ should be a symmetric linear map $\mathrm{Hess} f(x) : T_x\mathcal{M} \to T_x\mathcal{M}$ describing gradient change.
Since $\mathrm{grad} f : \mathcal{M} \to T\mathcal{M}$ is a smooth map from one manifold to another, a natural first attempt is:
$$\mathrm{Hess} f(x)[v] \overset{?}{=} \mathrm{D}\,\mathrm{grad} f(x)[v].$$
However, this does not produce tangent vectors in general. To overcome this issue, we need a new derivative for vector fields: a Riemannian connection.
If $\mathcal{M}$ is a Riemannian submanifold of a Euclidean space, then:
$$\mathrm{Hess} f(x)[v] = \mathrm{Proj}_x(\mathrm{D}\,\mathrm{grad} f(x)[v]) = \mathrm{Proj}_x(\nabla^2 \bar{f}(x)[v]) + W\!\left(v, \mathrm{Proj}_x^\perp(\nabla \bar{f}(x))\right),$$
where $W$ is the Weingarten map of $\mathcal{M}$.
Example: Rayleigh quotient optimization
Compute the smallest eigenvalue of a symmetric matrix $A \in \mathbf{R}^{n \times n}$:
$$\min_{x \in \mathcal{M}} \tfrac{1}{2} x^\top A x \quad \text{with} \quad \mathcal{M} = \{x \in \mathbf{R}^n : x^\top x = 1\}$$
The cost function $f : \mathcal{M} \to \mathbf{R}$ is the restriction of the smooth function $\bar{f}(x) = \tfrac{1}{2} x^\top A x$ from $\mathbf{R}^n$ to $\mathcal{M}$.
Tangent spaces: $T_x\mathcal{M} = \{v \in \mathbf{R}^n : x^\top v = 0\}$.
Make $\mathcal{M}$ into a Riemannian submanifold of $\mathbf{R}^n$ with $\langle u, v \rangle = u^\top v$.
Projection to $T_x\mathcal{M}$: $\mathrm{Proj}_x(z) = z - (x^\top z)x$.
Gradient of $\bar{f}$: $\nabla \bar{f}(x) = Ax$.
Gradient of $f$: $\mathrm{grad} f(x) = \mathrm{Proj}_x(\nabla \bar{f}(x)) = Ax - (x^\top A x)x$.
Differential of $\mathrm{grad} f$: $\mathrm{D}\,\mathrm{grad} f(x)[v] = Av - (v^\top A x + x^\top A v)x - (x^\top A x)v$.
Hessian of $f$: $\mathrm{Hess} f(x)[v] = \mathrm{Proj}_x(\mathrm{D}\,\mathrm{grad} f(x)[v]) = \mathrm{Proj}_x(Av - (x^\top A x)v)$.
The following are equivalent for $x \in \mathcal{M}$: $x$ is a global minimizer; $x$ is a unit-norm eigenvector of $A$ for the least eigenvalue; $\mathrm{grad} f(x) = 0$ and $\mathrm{Hess} f(x) \succeq 0$.
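Putting these pieces together, here is a hedged Matlab sketch (my addition, not from the slides) of Riemannian gradient descent with the metric-projection retraction for this problem; the step size and iteration count are simple illustrative choices.

% Riemannian gradient descent on the sphere for f(x) = 0.5*x'*A*x.
% The constant step 1/(2*norm(A)) and the iteration count are heuristics.
n = 100;
A = randn(n); A = (A + A')/2;            % symmetric matrix
x = randn(n, 1); x = x / norm(x);        % random point on the sphere
t = 1 / (2*norm(A));
for k = 1:1000
    g = A*x - (x'*A*x)*x;                % Riemannian gradient
    y = x - t*g;                         % step in the tangent direction
    x = y / norm(y);                     % metric-projection retraction
end
disp([x'*A*x, min(eig(A))])              % Rayleigh quotient ~ least eigenvalue

With a random initialization, this typically converges to (close to) a unit-norm eigenvector for the least eigenvalue.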
Basic optimization algorithms
Algorithms hop around the manifold using a retraction:
$$x_{k+1} = R_{x_k}(s_k)$$
with some algorithm-specific tangent vector $s_k \in T_{x_k}\mathcal{M}$.
E.g., gradient descent: $s_k = -t_k \, \mathrm{grad} f(x_k)$; Newton's method: $\mathrm{Hess} f(x_k)[s_k] = -\mathrm{grad} f(x_k)$.
Convergence analyses rely on Taylor expansions of $f$ along retractions. For second-order retractions (e.g., metric projection on a Riemannian submanifold):
$$f(R_x(s)) = f(x) + \langle \mathrm{grad} f(x), s \rangle_x + \tfrac{1}{2} \langle \mathrm{Hess} f(x)[s], s \rangle_x + O(\|s\|_x^3)$$
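This expansion can be checked numerically; here is a hedged Matlab sketch (my addition) for the Rayleigh cost on the sphere, where the metric-projection retraction is second order, so the model error should shrink like $t^3$:

% Check the second-order Taylor expansion of f along the projection
% retraction on the sphere, for f(x) = 0.5*x'*A*x. The dimension and
% the step sizes below are illustrative choices.
n = 6;
A = randn(n); A = (A + A')/2;
f = @(y) 0.5*(y'*A*y);
x = randn(n, 1); x = x / norm(x);
s = randn(n, 1); s = s - (x'*s)*x;         % tangent direction at x
g = A*x - (x'*A*x)*x;                      % Riemannian gradient
H = @(v) A*v - (x'*(A*v))*x - (x'*A*x)*v;  % Riemannian Hessian applied to v
for t = [1e-1, 1e-2, 1e-3]
    y = (x + t*s) / norm(x + t*s);         % retraction R_x(t*s)
    model = f(x) + t*(g'*s) + 0.5*t^2*(s'*H(s));
    disp(abs(f(y) - model))                % decreases roughly like t^3
end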
Gradient descent on $\mathcal{M}$
A1: $f(x) \geq f_{\mathrm{low}}$ for all $x \in \mathcal{M}$.
A2: $f(R_x(s)) \leq f(x) + \langle s, \mathrm{grad} f(x) \rangle_x + \tfrac{L}{2} \|s\|_x^2$.
Algorithm: $x_{k+1} = R_{x_k}\left(-\tfrac{1}{L} \mathrm{grad} f(x_k)\right)$.
Complexity: $\min_{k < K} \|\mathrm{grad} f(x_k)\|_{x_k} \leq \sqrt{2L(f(x_0) - f_{\mathrm{low}})/K}$ (same as the Euclidean case).
Proof sketch:
A2 $\Rightarrow$ $f(x_{k+1}) \leq f(x_k) - \tfrac{1}{L}\|\mathrm{grad} f(x_k)\|_{x_k}^2 + \tfrac{1}{2L}\|\mathrm{grad} f(x_k)\|_{x_k}^2$, hence $f(x_k) - f(x_{k+1}) \geq \tfrac{1}{2L}\|\mathrm{grad} f(x_k)\|_{x_k}^2$.
A1 $\Rightarrow$ $f(x_0) - f_{\mathrm{low}} \geq f(x_0) - f(x_K) = \sum_{k=0}^{K-1} \left(f(x_k) - f(x_{k+1})\right) \geq \tfrac{K}{2L} \min_{k < K} \|\mathrm{grad} f(x_k)\|_{x_k}^2$.
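An empirical check of this guarantee for the Rayleigh example, as a hedged Matlab sketch (my addition); the smoothness constant $L = 2\|A\|$ used for A2 is a heuristic choice, not derived in the slides.

% Constant-step Riemannian gradient descent on the sphere, comparing the
% observed smallest gradient norm with the guaranteed bound.
n = 50;
A = randn(n); A = (A + A')/2;
x = randn(n, 1); x = x / norm(x);
L = 2*norm(A);                            % assumed smoothness constant for A2
K = 100;
gnorms = zeros(K, 1);
f0 = 0.5*(x'*A*x);
for k = 1:K
    g = A*x - (x'*A*x)*x;
    gnorms(k) = norm(g);
    y = x - g/L;                          % step s = -(1/L) grad f(x)
    x = y / norm(y);                      % metric-projection retraction
end
flow = 0.5*min(eig(A));                   % lower bound for f on the sphere
disp([min(gnorms), sqrt(2*L*(f0 - flow)/K)])   % observed vs guaranteed bound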
Manopt: user-friendly software
Manopt is a family of toolboxes for Riemannian optimization. Go to www.manopt.org for code, a tutorial, a forum, and a list of other software.
Matlab example for $\min_{\|x\| = 1} x^\top A x$:
problem.M = spherefactory(n);   % search space: the unit sphere in R^n
problem.cost  = @(x) x'*A*x;    % cost function
problem.egrad = @(x) 2*A*x;     % Euclidean gradient; Manopt converts it to the Riemannian gradient
x = trustregions(problem);      % solve with the Riemannian trust-region method
Manopt (Matlab): with Bamdev Mishra, P.-A. Absil & R. Sepulchre
PyManopt (Python): led by J. Townsend, N. Koep & S. Weichwald
Manopt.jl (Julia): led by Ronny Bergmann
Active research directions
• More algorithms: nonsmooth, stochastic, parallel, quasi-Newton, ...
• Constrained optimization on manifolds
• Applications, old and new (electronic structure, deep learning)
• Complexity (upper and lower bounds)
• Role of curvature
• Geodesic convexity
• Randomized algorithms
• Broader generalizations: manifolds with a boundary, algebraic varieties
• Benign non-convexity
"... in fact, the great watershed in optimization isn't between linearity and nonlinearity, but convexity and non-convexity."
R. T. Rockafellar, in SIAM Review, 1993
Non-convex just means not convex.
Non-convexity can be benign
This can mean various things. Theorem templates are on a spectrum:
"If {conditions}, necessary optimality conditions are sufficient."
...
"If {conditions}, we can initialize a specific algorithm well."
The conditions (often on data) may be generous (e.g., genericity) or less so (e.g., a high-probability event for a non-adversarial distribution).
Geometry and symmetry seem to play an outsized role. See for example Zhang, Qu & Wright, arXiv:2007.06753, for a review.
nicolasboumal.net/book
press.princeton.edu/absil
manopt.org
Recommended