Taking Control by Convex Optimization

Elad Hazan
Google Brain

Based on: [Hazan, Singh, Zhang, NIPS '17] [Hazan, Lee, Singh, C. Zhang, Y. Zhang, NIPS '18] [Arora, Hazan, Lee, Singh, C. Zhang, Y. Zhang, ICLR '18]

Karan Singh    Cyril Zhang
Dynamical Systems

    y_t = f(h_t),   h_{t+1} = g(h_t, x_t)

Input time series x_t → hidden state h_t → output time series y_t

Prediction: given x_1, ..., x_t and y_1, ..., y_{t-1}, predict y_t
Control: compute x_1, ..., x_t to achieve some reward(y_1, ..., y_t)
Linear Dynamical Systems

• Linear dynamical system (LDS) with parameters (A, B, C, D, h_0):

    h_{t+1} = A h_t + B x_t + η_t
    y_t     = C h_t + D x_t + ξ_t

Input time series x_t → hidden state h_t → output time series y_t
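The recurrence above is easy to simulate. A minimal NumPy sketch, with a hypothetical 2-dimensional system (the matrices A, B, C, D below are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small system: 2-dim hidden state, scalar input/output.
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition
B = np.array([[1.0], [0.5]])             # input -> state
C = np.array([[1.0, -1.0]])              # state -> output
D = np.array([[0.0]])                    # input -> output (feedthrough)

def simulate_lds(x, h0=np.zeros(2), noise=0.0):
    """Roll out h_{t+1} = A h_t + B x_t + eta_t, y_t = C h_t + D x_t + xi_t."""
    h, ys = h0.copy(), []
    for xt in x:
        y = C @ h + D @ np.atleast_1d(xt) + noise * rng.standard_normal(1)
        ys.append(y.item())
        h = A @ h + (B @ np.atleast_1d(xt)).ravel() + noise * rng.standard_normal(2)
    return np.array(ys)

xs = rng.standard_normal(100)
ys = simulate_lds(xs)   # noiseless rollout of the hypothetical system
```

With h_0 = 0 and D = 0, the first output is 0 and each later output depends only on past inputs, matching the unrolled form used later in the talk.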
LDS in the world

• Rocket science (tracking satellites etc.)
• Control for robots / autonomous drones / vehicles
• Special cases:
  • Hidden Markov Models (language translation)
  • ARMA (weather / financial forecasting)
LDS: state of the art

    h_{t+1} = A h_t + B x_t + η_t
    y_t     = C h_t + D x_t + ξ_t

• Given the system (A, B, C, D, h_0), the Kalman filter recovers h_t and predicts optimally.
• Can we recover the system? The EM algorithm is non-convex and can take exponential time.
  [Hardt, Ma, Recht '17]: SISO system recovery under generative assumptions
• No hidden state: online learning for time-series prediction [Anava, Hazan, Mannor, Shamir, Zeevi '13, '15], [Kuznetsov, Mohri '15, '17]
  No hidden state: convex LDS [Simchowitz, Mania, Tu, Jordan, Recht '18]
• Our result: we can predict as well as the best system, in linear time! [with Karan Singh and Cyril Zhang, NIPS '17], based on a new wave filter
Online Learning of LDS

• Online sequence prediction, t = 1, ..., T:
  • Observe input x_t
  • Predict output ŷ_t ∈ R^d
  • Observe y_t, incurring loss ||y_t − ŷ_t||²
• Use information about y_t to help predict x_{t+1} ↦ y_{t+1}, ...
• Guarantee (finding the best LDS, i.e. system identification, is non-convex):

    Σ_t ||y_t − ŷ_t||²  ≤  min_{A,B,C,D,h_0} Σ_t ||y_t − M_{A,B,C,D,h_0}(t)||²  +  O(log T)

• where M_{A,B,C,D,h_0}(t) is the prediction of a Kalman filter with the best (A, B, C, D, h_0) in hindsight
Improper Learning by Convex Relaxation

• The actual signal:  y_t = D x_t + CB x_{t-1} + CAB x_{t-2} + CA²B x_{t-3} + ...
• An "easy" improper machine:  ŷ_t = M_1 x_t + M_2 x_{t-1} + M_3 x_{t-2} + ...
• If the spectrum of A is bounded, ||A|| < 1 − δ, then O((1/δ) log(1/ε)) parameters suffice
• Solve by convex optimization (linear regression)!
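Fitting the truncated improper machine is plain least squares over a sliding window of m past inputs, where m plays the role of O((1/δ) log(1/ε)). A minimal scalar-case sketch (an illustration of the regression step, not the paper's algorithm):

```python
import numpy as np

def fit_improper_lds(x, y, m):
    """Least-squares fit of y_t ≈ sum_{i=0}^{m-1} M_{i+1} x_{t-i} (scalar case)."""
    T = len(x)
    # Design matrix: row for time t holds the window (x_t, x_{t-1}, ..., x_{t-m+1}).
    X = np.zeros((T - m + 1, m))
    for i in range(m):
        X[:, i] = x[m - 1 - i : T - i]
    coeffs, *_ = np.linalg.lstsq(X, y[m - 1:], rcond=None)
    return coeffs  # estimates of (M_1, ..., M_m)
```

On data generated with coefficients (1, 0.5, 0.25), the regression recovers them exactly, since the problem is an ordinary convex least-squares fit.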
Improper Learning by Convex Relaxation

• The actual signal:  y_t = D x_t + CB x_{t-1} + CAB x_{t-2} + ...
• Our machine:  ŷ_t = M(x_t, x_{t-1}, ...) = M^T Φ(x_t, x_{t-1}, ...) is linear (does not try to recover A, B, C, D or the hidden states)
• Has only O(log(1/ε)) parameters, independent of the condition number
Intuition (scalar case)

• y_t = 1·x_t + α x_{t-1} + α² x_{t-2} + α³ x_{t-3} + ...
• Approximate all vectors μ(α) = (1, α, α², α³, ..., α^{T-1}) ∈ R^T using some linear basis φ_1, ..., φ_k ∈ R^T
• "Continuous PCA":

    Z = ∫_0^1 μ(α) ⊗ μ(α) dα

• Set φ_1, ..., φ_k to be the top eigenvectors of Z
The Magic of Hankel Matrices

    Z = ∫_0^1 [ 1    α    α²   ...
                α    α²   α³   ...
                α²   α³   α⁴   ...
                ...             α^{2T-2} ] dα
The Magic of Hankel Matrices

    Z = [ 1     1/2   1/3   ...
          1/2   1/3   1/4   ...
          1/3   1/4   1/5   ...
          ...               1/(2T-1) ]
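The matrix on this slide is the Hilbert matrix, Z_ij = 1/(i + j − 1), and its rapid spectral decay is easy to confirm numerically (n = 32 below is an arbitrary illustrative size):

```python
import numpy as np

n = 32
i, j = np.indices((n, n))
Z = 1.0 / (i + j + 1)                  # 0-indexed: Z[i, j] = 1/(i + j + 1)
eigvals = np.linalg.eigvalsh(Z)[::-1]  # eigvalsh returns ascending; reverse to descending
ratios = eigvals[:6] / eigvals[0]      # top eigenvalues relative to the largest
```

The top few eigenvalues already dominate the spectrum by orders of magnitude, which is exactly why a small filter bank suffices.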
The Magic of Hankel Matrices

• Eigenvalues decay rapidly! [Beckermann, Townsend '16]
• The span of the top k = Õ(log T · log(1/ε)) eigenvectors ε-approximates μ(α):

    ∫_0^1 ||μ(α) − Proj_{φ_1...φ_k} μ(α)||² dα  ~  e^{−k}

  (this holds for every α as well)
The Magic of Hankel Matrices

• Our machine:  ŷ_t = M^T Φ(x_t, x_{t-1}, ...)
• So: restrict M to span(φ_1, ..., φ_k)
  • The problem is convex!
  • Only polylog(T) times more parameters than (A, B, C, D, h_0)
A Filtering Reinterpretation

• Interpret the basis φ_1, ..., φ_k as a small filter bank for the inputs x_t
Online Algorithm

    Σ_t ||y_t − ŷ_t||²  ≤  min_{A,B,C,D,h_0} Σ_t ||y_t − M_{A,B,C,D,h_0}(t)||²  +  O(log T)
A Filtering Reinterpretation

• Concise wave-filtering algorithm:
  • Convolve the time series x_t with each of k ≪ T filters φ_j
  • Use an OCO algorithm to predict M_t ∈ R^{d×k} that maps filter outputs to y_t
  • Regret ≤ Õ(√T · polylog T)
• Handles adversarial inputs
• 10-20x more efficient than pyKalman (and more accurate)
• Keeps improving when EM is happy with a local optimum
• Learns high-dimensional systems provably
• Works in practice for some more general systems...
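A batch sketch of the wave-filtering idea, assuming scalar inputs and outputs: compute the top eigenvectors of the Hilbert matrix, convolve the input history with each, and regress the outputs on the resulting features. Plain least squares stands in for the paper's online OCO update, and the filter count k is illustrative:

```python
import numpy as np

def wave_filters(T, k):
    """Top-k eigenvectors of the T x T Hilbert matrix Z_ij = 1/(i+j+1), 0-indexed."""
    i, j = np.indices((T, T))
    Z = 1.0 / (i + j + 1)
    w, V = np.linalg.eigh(Z)
    return V[:, ::-1][:, :k]          # columns = filters, descending eigenvalue

def wave_features(x, phis):
    """Feature (t, j) = <phi_j, (x_t, x_{t-1}, ...)>, with the past zero-padded."""
    T, k = phis.shape
    xpad = np.concatenate([np.zeros(T - 1), x])
    feats = np.zeros((len(x), k))
    for t in range(len(x)):
        window = xpad[t : t + T][::-1]   # (x_t, x_{t-1}, ..., x_{t-T+1})
        feats[t] = window @ phis
    return feats

def wave_filter_predict(x, y, k=15):
    """Fit y_t ≈ M^T Phi(x_t, x_{t-1}, ...) by least squares; return predictions."""
    phis = wave_filters(len(x), k)
    F = wave_features(x, phis)
    M, *_ = np.linalg.lstsq(F, y, rcond=None)
    return F @ M
```

On outputs generated by a scalar LDS with transition coefficient 0.9, a handful of filters already explains almost all of the signal.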
Useful in practice...

Experiments

Sometimes it even works for non-linear systems!
Beyond Symmetric Transition Matrices
[Hazan, Lee, Singh, C. Zhang, Y. Zhang, NIPS '18]

• General (non-symmetric) A: not real-diagonalizable; needs a complex diagonalization
• Weaker result: same regret, but an O(1/ε) factor in the running time
• The symmetric predictor:

    f_t(M) = ||M^T Φ x_{1:t} − y_t||²
Beyond Symmetric Transition Matrices

The symmetric predictor:  f_t(M) = ||M^T Φ x_{1:t} − y_t||²

1st try: discretize the complex phase up to precision ε:

    f_t(M, p) = ||M^T p Φ X^ε_{1:t} − y_t||²

A non-convex quadratic!

2nd attempt: convex relaxation. Let N ∈ R^{k×(1/ε)} be a matrix with bounded ||·||_{2,1} norm, relaxing M p^T to high rank:

    f_t(N) = ||N Φ X^ε_{1:t} − y_t||²

Convex! Repeat x_t (1/ε) times, once for each discretized phase.
II. Control

Setting: Linear-Quadratic Control

• Black-box access to an unknown LDS (unknown dynamics)
• Planning: choose an input sequence x_{1:T} to stabilize the system
• Objective: minimize the quadratic cost  Σ_{t=1}^T y_t^T Q y_t + Σ_{t=1}^T x_t^T R x_t
• A minimally-stylized model of reinforcement learning
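For a known system the objective is straightforward to evaluate along a rollout. A hypothetical helper (noiseless, D = 0 for brevity):

```python
import numpy as np

def lqr_cost(A, B, C, Q, R, x_seq, h0):
    """Accumulate sum_t y_t' Q y_t + x_t' R x_t along a noiseless rollout."""
    h, cost = np.array(h0, dtype=float), 0.0
    for x in x_seq:
        x = np.atleast_1d(x)
        y = C @ h
        cost += float(y @ Q @ y + x @ R @ x)
        h = A @ h + B @ x
    return cost
```

With zero inputs the cost is just the decaying free response, e.g. 1 + 0.25 + 0.0625 = 1.3125 for a scalar system with A = 0.5, h_0 = 1, Q = R = 1 over three steps.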
Previous Work

• Identify the system (A, B, C, D, h_0) (non-convex); no known efficient algorithm for sys-id
• Use a Kalman filter to maintain a hidden-state estimate
• Find the optimal state-feedback controller by LQR
• Gradient methods for proper learning [Fazel, Ge, Kakade, Mesbahi '18]
Towards Provable Control in Unknown LDS
[Arora, Hazan, Lee, Singh, C. Zhang, Y. Zhang, ICLR '18]

• Hankel eigenvectors can be used for exploration of the system
• Need a stronger guarantee than for prediction: learn every entry of M
• Learn the LDS efficiently in poly(n, m, d, log T, 1/ε) episodes
• Planning the optimal input sequence x_{1:T} becomes convex
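Once the dynamics are fixed (known or learned), each output is linear in the stacked inputs, so the quadratic cost is convex in them and can be minimized in closed form. A noiseless, D = 0 sketch with illustrative dimensions, not the paper's procedure:

```python
import numpy as np

def plan_inputs(A, B, C, Q, R, h0, T):
    """Minimize sum_t y_t' Q y_t + x_t' R x_t over x_{0:T-1} (noiseless, D = 0).

    Since y_t = C A^t h0 + sum_{s<t} C A^{t-1-s} B x_s, the cost is a convex
    quadratic in the stacked inputs; solve its normal equations directly.
    """
    n, m, p = A.shape[0], B.shape[1], C.shape[0]
    G = np.zeros((T * p, T * m))   # stacked input -> output map
    f = np.zeros(T * p)            # free response C A^t h0
    Apow = np.eye(n)
    for t in range(T):
        f[t * p:(t + 1) * p] = C @ Apow @ h0
        for s in range(t):
            G[t * p:(t + 1) * p, s * m:(s + 1) * m] = \
                C @ np.linalg.matrix_power(A, t - 1 - s) @ B
        Apow = A @ Apow
    Qb, Rb = np.kron(np.eye(T), Q), np.kron(np.eye(T), R)
    H = G.T @ Qb @ G + Rb          # Hessian of the quadratic (PSD + positive R)
    g = G.T @ Qb @ f
    return np.linalg.solve(H, -g).reshape(T, m)
```

For the scalar system A = 0.5, B = C = Q = R = 1, h_0 = 1, T = 3, the normal equations give x = (-9/34, -1/17, 0); the last input is zero since it cannot affect any penalized output.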
Summary

1. Improper convex relaxation: prediction can be easy even if estimation is computationally hard!
2. Regression formulation over wave-filtered inputs: the first efficient LDS prediction without recovery
3. Extensions to control, etc.