Announce Multivariable Problems Gradient Descent Newton’s Method Quasi-Newton Missing Details AutoDiff
Optimization II: Unconstrained Multivariable
CS 205A:Mathematical Methods for Robotics, Vision, and Graphics
Doug James (and Justin Solomon)
CS 205A: Mathematical Methods Optimization II: Unconstrained Multivariable 1 / 24
Announcements
• Today’s class: unconstrained optimization
  • Newton’s method (uses Hessians)
  • BFGS method (no Hessians)
Unconstrained Multivariable Problems

minimize f(~x) over ~x ∈ R^n, where f : R^n → R
Recall
∇f (~x)
“Direction of steepest ascent”
Recall
−∇f (~x)
“Direction of steepest descent”
Observation
If ∇f(~x) ≠ ~0, then for sufficiently small α > 0,

f(~x − α∇f(~x)) ≤ f(~x)
Gradient Descent Algorithm
Iterate until convergence:

1. gk(t) ≡ f(~xk − t∇f(~xk))
2. Find t* ≥ 0 minimizing (or decreasing) gk
3. ~xk+1 ≡ ~xk − t*∇f(~xk)
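As a concrete illustration (not from the lecture), the three steps above can be sketched in Python; a simple backtracking loop stands in for minimizing gk, accepting t only when it produces a genuine decrease:

```python
import numpy as np

def gradient_descent(f, grad_f, x0, tol=1e-8, max_iter=1000):
    """Iterate x_{k+1} = x_k - t* grad f(x_k) until grad f(x_k) ~ 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # stopping condition: gradient near zero
            break
        # Step 2: decrease g_k(t) = f(x - t g) by backtracking; the small
        # margin term ensures f actually goes down before t is accepted.
        t = 1.0
        while f(x - t * g) > f(x) - 1e-4 * t * (g @ g):
            t *= 0.5
        x = x - t * g                  # step 3
    return x

# Example: f(x) = ||x||^2 has its minimum at the origin
x_star = gradient_descent(lambda v: v @ v, lambda v: 2 * v, [3.0, -4.0])
```

The function and gradient here are placeholders; any smooth f with its gradient works.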
Stopping Condition
∇f(~xk) ≈ ~0

Don’t forget: check optimality!
Line Search
gk(t) ≡ f(~xk − t∇f(~xk))

• One-dimensional optimization
• Don’t have to minimize completely: Wolfe conditions
• Constant t: “learning rate”

Worth reading about: Nesterov’s Accelerated Gradient Descent
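A minimal sketch (my own, not the lecture’s) of a line search that stops once the weak Wolfe conditions hold, bisecting between a step that is too long (sufficient decrease fails) and one that is too short (curvature fails):

```python
import numpy as np

def line_search_wolfe(f, grad_f, x, d, c1=1e-4, c2=0.9, t=1.0):
    """Return step t along descent direction d satisfying the weak Wolfe
    conditions:
      sufficient decrease: f(x + t d) <= f(x) + c1 t <grad f(x), d>
      curvature:           <grad f(x + t d), d> >= c2 <grad f(x), d>
    """
    fx, slope = f(x), grad_f(x) @ d       # slope < 0 for a descent direction
    lo, hi = 0.0, np.inf
    for _ in range(50):
        if f(x + t * d) > fx + c1 * t * slope:
            hi = t                         # step too long
        elif grad_f(x + t * d) @ d < c2 * slope:
            lo = t                         # step too short
        else:
            return t
        t = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * t
    return t
```

On f(x) = ‖x‖² starting from x = (1, 0) with d = −∇f(x), this returns t = 0.5, the exact minimizer of gk.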
Gradient Descent (book)
Newton’s Method (again!)
f(~x) ≈ f(~xk) + ∇f(~xk)ᵀ(~x − ~xk) + ½(~x − ~xk)ᵀ Hf(~xk)(~x − ~xk)

=⇒ ~xk+1 = ~xk − [Hf(~xk)]⁻¹ ∇f(~xk)

Consideration: what if Hf is not positive (semi-)definite?
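The Newton iteration above can be sketched as follows (my example function, assuming Hf stays positive definite along the way; a linear solve replaces the explicit inverse):

```python
import numpy as np

def newton_minimize(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
    """Newton's method: jump to the minimizer of the local quadratic model."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve Hf(x) p = grad f(x) rather than forming [Hf(x)]^{-1}
        x = x - np.linalg.solve(hess_f(x), g)
    return x

# f(x, y) = (x - 1)^4 + y^2, minimized at (1, 0)
grad = lambda v: np.array([4 * (v[0] - 1) ** 3, 2 * v[1]])
hess = lambda v: np.array([[12 * (v[0] - 1) ** 2, 0.0], [0.0, 2.0]])
x_star = newton_minimize(grad, hess, [3.0, 1.0])
```

Note the y-coordinate (a pure quadratic) is solved in a single step, while the quartic x-coordinate takes several; when Hf is indefinite, the solve can point uphill, which is the consideration raised above.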
Motivation
• ∇f might be hard to compute, but Hf is harder
• Hf might be dense: n² entries
Quasi-Newton Methods
Approximate derivatives to avoid expensive calculations
e.g. secant, Broyden, ...
Common Optimization Assumption
• ∇f known
• Hf unknown or hard to compute
Quasi-Newton Optimization
~xk+1 = ~xk − αk Bk⁻¹ ∇f(~xk), where Bk ≈ Hf(~xk)
Warning
<advanced material>
See Nocedal & Wright
Broyden-Style Update
Bk+1(~xk+1 − ~xk) = ∇f(~xk+1) − ∇f(~xk)
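A rank-one update in this style (the classical Broyden update, shown here as a sketch with my own variable names) makes the smallest Frobenius-norm change to Bk that enforces the secant equation above:

```python
import numpy as np

def broyden_update(B, s, y):
    """Rank-one Broyden update: the smallest change to B (in Frobenius
    norm) such that the result satisfies the secant equation B_new s = y,
    where s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k)."""
    return B + np.outer(y - B @ s, s) / (s @ s)
```

The result satisfies Bk+1 s = y exactly, but it is generally not symmetric, which is why the additional considerations on the next slide lead to DFP and BFGS.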
Additional Considerations
• Bk should be symmetric
• Bk should be positive (semi-)definite
Davidon-Fletcher-Powell (DFP)
min over Bk+1 of ‖Bk+1 − Bk‖
s.t. Bk+1ᵀ = Bk+1
     Bk+1(~xk+1 − ~xk) = ∇f(~xk+1) − ∇f(~xk)
Observation
‖Bk+1 − Bk‖ small does not mean ‖Bk+1⁻¹ − Bk⁻¹‖ is small

Idea: try to approximate Bk⁻¹ directly
BFGS Update
min over Hk+1 of ‖Hk+1 − Hk‖
s.t. Hk+1ᵀ = Hk+1
     ~xk+1 − ~xk = Hk+1(∇f(~xk+1) − ∇f(~xk))
State of the art!
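Putting the pieces together, a minimal BFGS loop might look as follows (a sketch, not the book’s code: it uses the standard closed-form BFGS update of the inverse approximation Hk, with a backtracking line search for αk):

```python
import numpy as np

def bfgs_minimize(f, grad_f, x0, tol=1e-8, max_iter=200):
    """BFGS: maintain H ~ [Hf(x_k)]^{-1} directly, updated so that the
    secant equation H_{k+1} y_k = s_k holds, where s_k = x_{k+1} - x_k
    and y_k = grad f(x_{k+1}) - grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                      # initial inverse-Hessian guess
    g = grad_f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                     # quasi-Newton search direction
        t = 1.0                        # backtracking for sufficient decrease
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5
        s = t * d
        g_new = grad_f(x + s)
        y = g_new - g
        rho = 1.0 / (y @ s)            # y's > 0 keeps H positive definite
        I = np.eye(n)
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x + s, g_new
    return x

x_star = bfgs_minimize(lambda v: (v[0] - 2) ** 2 + 10 * (v[1] + 1) ** 2,
                       lambda v: np.array([2 * (v[0] - 2), 20 * (v[1] + 1)]),
                       [0.0, 0.0])
```

Only gradients are evaluated; the Hessian is never formed, matching the assumption that ∇f is known but Hf is not.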
BFGS (book - typo)
Lots of Missing Details
• Choice of ‖ · ‖
• Limited-memory alternative
Automatic Differentiation
• Techniques to numerically evaluate the derivative of a function specified by a computer program.
• https://en.wikipedia.org/wiki/Automatic_differentiation
• Different from finite differences (approximation) and symbolic differentiation.
• In Julia: http://www.juliadiff.org
• Example: ForwardDiff.jl (uses dual numbers) https://github.com/JuliaDiff/ForwardDiff.jl
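The dual-number idea behind ForwardDiff.jl can be illustrated in a few lines of Python (my own toy class supporting only +, *, and sin; a real implementation covers all primitives):

```python
import math

class Dual:
    """Dual number a + b*eps with eps^2 = 0: propagating b alongside a
    through arithmetic computes exact derivatives (forward mode)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.val * o.dot + self.dot * o.val)  # product rule
    __rmul__ = __mul__
    def sin(self):
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def derivative(f, x):
    """Evaluate f at x + eps; the eps coefficient is f'(x) -- exact,
    with no finite-difference truncation error."""
    return f(Dual(x, 1.0)).dot

# d/dx [x sin(x)] = sin(x) + x cos(x)
df = derivative(lambda x: x * x.sin(), 2.0)
```

Unlike a finite-difference quotient, no step size is chosen, so there is no truncation/round-off trade-off; unlike symbolic differentiation, no expression is ever built.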