Taking Control by Convex Optimization

Elad Hazan
Google Brain

Based on: [Hazan, Singh, Zhang, NIPS '17] [Hazan, Lee, Singh, C. Zhang, Y. Zhang, NIPS '18] [Arora, Hazan, Lee, Singh, C. Zhang, Y. Zhang, ICLR '18]

Karan Singh    Cyril Zhang
Dynamical Systems

    y_t = f(h_t),   h_{t+1} = g(h_t, x_t)

Input time series x_t → hidden state h_t → output time series y_t

Prediction: given x_1, ..., x_t and y_1, ..., y_{t-1}, predict y_t
Control: compute x_1, ..., x_t to achieve some reward(y_1, ..., y_t)
Linear Dynamical Systems

• Linear dynamical system (LDS) with parameters (A, B, C, D, h_0):

    h_{t+1} = A h_t + B x_t + η_t
    y_t     = C h_t + D x_t + ξ_t

Input time series x_t → hidden state h_t → output time series y_t
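The recurrence above is easy to simulate. A minimal NumPy sketch, with a hypothetical 2-dimensional system (the matrices A, B, C, D below are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small system: 2-dim hidden state, scalar input/output.
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition
B = np.array([[1.0], [0.5]])             # input -> state
C = np.array([[1.0, -1.0]])              # state -> output
D = np.array([[0.0]])                    # input -> output (feedthrough)

def simulate_lds(x, h0=np.zeros(2), noise=0.0):
    """Roll out h_{t+1} = A h_t + B x_t + eta_t, y_t = C h_t + D x_t + xi_t."""
    h, ys = h0.copy(), []
    for xt in x:
        y = C @ h + D @ np.atleast_1d(xt) + noise * rng.standard_normal(1)
        ys.append(y.item())
        h = A @ h + (B @ np.atleast_1d(xt)).ravel() + noise * rng.standard_normal(2)
    return np.array(ys)

xs = rng.standard_normal(100)
ys = simulate_lds(xs)   # noiseless rollout of the hypothetical system
```

With h_0 = 0 and D = 0, the first output is 0 and each later output depends only on past inputs, matching the unrolled form used later in the talk.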
LDS in the world

• Rocket science (tracking satellites etc.)
• Control for robots / autonomous drones / vehicles
• Special cases:
  • Hidden Markov Models (language translation)
  • ARMA (weather / financial forecasting)
LDS: state of the art

    h_{t+1} = A h_t + B x_t + η_t
    y_t     = C h_t + D x_t + ξ_t

• Given the system (A, B, C, D, h_0), the Kalman filter recovers h_t and predicts optimally.
• Can we recover the system? The EM algorithm is non-convex and can take exponential time.
  [Hardt, Ma, Recht '17]: SISO system recovery under generative assumptions
• No hidden state: online learning for time-series prediction [Anava, Hazan, Mannor, Shamir, Zeevi '13, '15], [Kuznetsov, Mohri '15, '17]
  No hidden state: convex LDS [Simchowitz, Mania, Tu, Jordan, Recht '18]
• Our result: we can predict as well as the best system, in linear time! [with Karan Singh and Cyril Zhang, NIPS '17], based on a new wave filter
Online Learning of LDS

• Online sequence prediction, t = 1, ..., T:
  • Observe input x_t
  • Predict output ŷ_t ∈ R^d
  • Observe y_t, incurring loss ||y_t − ŷ_t||²
• Use information about y_t to help predict x_{t+1} ↦ y_{t+1}, ...
• Guarantee (finding the best LDS, i.e. system identification, is non-convex):

    Σ_t ||y_t − ŷ_t||²  ≤  min_{A,B,C,D,h_0} Σ_t ||y_t − M_{A,B,C,D,h_0}(t)||²  +  O(log T)

• where M_{A,B,C,D,h_0}(t) is the prediction of a Kalman filter with the best (A, B, C, D, h_0) in hindsight
Improper Learning by Convex Relaxation

• The actual signal:  y_t = D x_t + CB x_{t-1} + CAB x_{t-2} + CA²B x_{t-3} + ...
• An "easy" improper machine:  ŷ_t = M_1 x_t + M_2 x_{t-1} + M_3 x_{t-2} + ...
• If the spectrum of A is bounded, ||A|| < 1 − δ, then O((1/δ) log(1/ε)) parameters suffice
• Solve by convex optimization (linear regression)!
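Fitting the truncated improper machine is plain least squares over a sliding window of m past inputs, where m plays the role of O((1/δ) log(1/ε)). A minimal scalar-case sketch (an illustration of the regression step, not the paper's algorithm):

```python
import numpy as np

def fit_improper_lds(x, y, m):
    """Least-squares fit of y_t ≈ sum_{i=0}^{m-1} M_{i+1} x_{t-i} (scalar case)."""
    T = len(x)
    # Design matrix: row for time t holds the window (x_t, x_{t-1}, ..., x_{t-m+1}).
    X = np.zeros((T - m + 1, m))
    for i in range(m):
        X[:, i] = x[m - 1 - i : T - i]
    coeffs, *_ = np.linalg.lstsq(X, y[m - 1:], rcond=None)
    return coeffs  # estimates of (M_1, ..., M_m)
```

On data generated with coefficients (1, 0.5, 0.25), the regression recovers them exactly, since the problem is an ordinary convex least-squares fit.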
Improper Learning by Convex Relaxation

• The actual signal:  y_t = D x_t + CB x_{t-1} + CAB x_{t-2} + ...
• Our machine:  ŷ_t = M(x_t, x_{t-1}, ...) = M^T Φ(x_t, x_{t-1}, ...) is linear (does not try to recover A, B, C, D or the hidden states)
• Has only O(log(1/ε)) parameters, independent of the condition number
Intuition (scalar case)

• y_t = 1·x_t + α x_{t-1} + α² x_{t-2} + α³ x_{t-3} + ...
• Approximate all vectors μ(α) = (1, α, α², α³, ..., α^{T-1}) ∈ R^T using some linear basis φ_1, ..., φ_k ∈ R^T
• "Continuous PCA":

    Z = ∫_0^1 μ(α) ⊗ μ(α) dα

• Set φ_1, ..., φ_k to be the top eigenvectors of Z
The Magic of Hankel Matrices

    Z = ∫_0^1 [ 1    α    α²   ...
                α    α²   α³   ...
                α²   α³   α⁴   ...
                ...             α^{2T-2} ] dα
The Magic of Hankel Matrices

    Z = [ 1     1/2   1/3   ...
          1/2   1/3   1/4   ...
          1/3   1/4   1/5   ...
          ...               1/(2T-1) ]
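The matrix on this slide is the Hilbert matrix, Z_ij = 1/(i + j − 1), and its rapid spectral decay is easy to confirm numerically (n = 32 below is an arbitrary illustrative size):

```python
import numpy as np

n = 32
i, j = np.indices((n, n))
Z = 1.0 / (i + j + 1)                  # 0-indexed: Z[i, j] = 1/(i + j + 1)
eigvals = np.linalg.eigvalsh(Z)[::-1]  # eigvalsh returns ascending; reverse to descending
ratios = eigvals[:6] / eigvals[0]      # top eigenvalues relative to the largest
```

The top few eigenvalues already dominate the spectrum by orders of magnitude, which is exactly why a small filter bank suffices.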
The Magic of Hankel Matrices

• Eigenvalues decay rapidly! [Beckermann, Townsend '16]
• The span of the top k = Õ(log T · log(1/ε)) eigenvectors ε-approximates μ(α):

    ∫_0^1 ||μ(α) − Proj_{φ_1...φ_k} μ(α)||² dα  ~  e^{−k}

  (this holds for every α as well)
The Magic of Hankel Matrices

• Our machine:  ŷ_t = M^T Φ(x_t, x_{t-1}, ...)
• So: restrict M to span(φ_1, ..., φ_k)
  • The problem is convex!
  • Only polylog(T) times more parameters than (A, B, C, D, h_0)
A Filtering Reinterpretation

• Interpret the basis φ_1, ..., φ_k as a small filter bank for the inputs x_t
Online Algorithm

    Σ_t ||y_t − ŷ_t||²  ≤  min_{A,B,C,D,h_0} Σ_t ||y_t − M_{A,B,C,D,h_0}(t)||²  +  O(log T)
A Filtering Reinterpretation

• Concise wave-filtering algorithm:
  • Convolve the time series x_t with each of k ≪ T filters φ_j
  • Use an OCO algorithm to predict M_t ∈ R^{d×k} that maps filter outputs to y_t
  • Regret ≤ Õ(√T · polylog T)
• Handles adversarial inputs
• 10-20x more efficient than pyKalman (and more accurate)
• Keeps improving when EM is happy with a local optimum
• Learns high-dimensional systems provably
• Works in practice for some more general systems...
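A batch sketch of the wave-filtering idea, assuming scalar inputs and outputs: compute the top eigenvectors of the Hilbert matrix, convolve the input history with each, and regress the outputs on the resulting features. Plain least squares stands in for the paper's online OCO update, and the filter count k is illustrative:

```python
import numpy as np

def wave_filters(T, k):
    """Top-k eigenvectors of the T x T Hilbert matrix Z_ij = 1/(i+j+1), 0-indexed."""
    i, j = np.indices((T, T))
    Z = 1.0 / (i + j + 1)
    w, V = np.linalg.eigh(Z)
    return V[:, ::-1][:, :k]          # columns = filters, descending eigenvalue

def wave_features(x, phis):
    """Feature (t, j) = <phi_j, (x_t, x_{t-1}, ...)>, with the past zero-padded."""
    T, k = phis.shape
    xpad = np.concatenate([np.zeros(T - 1), x])
    feats = np.zeros((len(x), k))
    for t in range(len(x)):
        window = xpad[t : t + T][::-1]   # (x_t, x_{t-1}, ..., x_{t-T+1})
        feats[t] = window @ phis
    return feats

def wave_filter_predict(x, y, k=15):
    """Fit y_t ≈ M^T Phi(x_t, x_{t-1}, ...) by least squares; return predictions."""
    phis = wave_filters(len(x), k)
    F = wave_features(x, phis)
    M, *_ = np.linalg.lstsq(F, y, rcond=None)
    return F @ M
```

On outputs generated by a scalar LDS with transition coefficient 0.9, a handful of filters already explains almost all of the signal.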
Useful in practice...

Experiments

Sometimes it even works for non-linear systems!
Beyond Symmetric Transition Matrices
[Hazan, Lee, Singh, C. Zhang, Y. Zhang, NIPS '18]

• General (non-symmetric) A: not real-diagonalizable; needs a complex diagonalization
• Weaker result: same regret, but an O(1/ε) factor in the running time
• The symmetric predictor:

    f_t(M) = ||M^T Φ x_{1:t} − y_t||²
Beyond Symmetric Transition Matrices

The symmetric predictor:  f_t(M) = ||M^T Φ x_{1:t} − y_t||²

1st try: discretize the complex phase up to precision ε:

    f_t(M, p) = ||M^T p Φ X^ε_{1:t} − y_t||²

A non-convex quadratic!

2nd attempt: convex relaxation. Let N ∈ R^{k×(1/ε)} be a matrix with bounded ||·||_{2,1} norm, relaxing M p^T to high rank:

    f_t(N) = ||N Φ X^ε_{1:t} − y_t||²

Convex! Repeat x_t (1/ε) times, once for each discretized phase.
II. Control

Setting: Linear-Quadratic Control

• Black-box access to an unknown LDS (unknown dynamics)
• Planning: choose an input sequence x_{1:T} to stabilize the system
• Objective: minimize the quadratic cost  Σ_{t=1}^T y_t^T Q y_t + Σ_{t=1}^T x_t^T R x_t
• A minimally-stylized model of reinforcement learning
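For a known system the objective is straightforward to evaluate along a rollout. A hypothetical helper (noiseless, D = 0 for brevity):

```python
import numpy as np

def lqr_cost(A, B, C, Q, R, x_seq, h0):
    """Accumulate sum_t y_t' Q y_t + x_t' R x_t along a noiseless rollout."""
    h, cost = np.array(h0, dtype=float), 0.0
    for x in x_seq:
        x = np.atleast_1d(x)
        y = C @ h
        cost += float(y @ Q @ y + x @ R @ x)
        h = A @ h + B @ x
    return cost
```

With zero inputs the cost is just the decaying free response, e.g. 1 + 0.25 + 0.0625 = 1.3125 for a scalar system with A = 0.5, h_0 = 1, Q = R = 1 over three steps.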
Previous Work

• Identify the system (A, B, C, D, h_0) (non-convex); no known efficient algorithm for sys-id
• Use a Kalman filter to maintain a hidden-state estimate
• Find the optimal state-feedback controller by LQR
• Gradient methods for proper learning [Fazel, Ge, Kakade, Mesbahi '18]
Towards Provable Control in Unknown LDS
[Arora, Hazan, Lee, Singh, C. Zhang, Y. Zhang, ICLR '18]

• Hankel eigenvectors can be used for exploration of the system
• Need a stronger guarantee than for prediction: learn every entry of M
• Learn the LDS efficiently in poly(n, m, d, log T, 1/ε) episodes
• Planning the optimal input sequence x_{1:T} becomes convex
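Once the dynamics are fixed (known or learned), each output is linear in the stacked inputs, so the quadratic cost is convex in them and can be minimized in closed form. A noiseless, D = 0 sketch with illustrative dimensions, not the paper's procedure:

```python
import numpy as np

def plan_inputs(A, B, C, Q, R, h0, T):
    """Minimize sum_t y_t' Q y_t + x_t' R x_t over x_{0:T-1} (noiseless, D = 0).

    Since y_t = C A^t h0 + sum_{s<t} C A^{t-1-s} B x_s, the cost is a convex
    quadratic in the stacked inputs; solve its normal equations directly.
    """
    n, m, p = A.shape[0], B.shape[1], C.shape[0]
    G = np.zeros((T * p, T * m))   # stacked input -> output map
    f = np.zeros(T * p)            # free response C A^t h0
    Apow = np.eye(n)
    for t in range(T):
        f[t * p:(t + 1) * p] = C @ Apow @ h0
        for s in range(t):
            G[t * p:(t + 1) * p, s * m:(s + 1) * m] = \
                C @ np.linalg.matrix_power(A, t - 1 - s) @ B
        Apow = A @ Apow
    Qb, Rb = np.kron(np.eye(T), Q), np.kron(np.eye(T), R)
    H = G.T @ Qb @ G + Rb          # Hessian of the quadratic (PSD + positive R)
    g = G.T @ Qb @ f
    return np.linalg.solve(H, -g).reshape(T, m)
```

For the scalar system A = 0.5, B = C = Q = R = 1, h_0 = 1, T = 3, the normal equations give x = (-9/34, -1/17, 0); the last input is zero since it cannot affect any penalized output.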
Summary

1. Improper convex relaxation: prediction can be easy even if estimation is computationally hard!
2. Regression formulation over wave-filtered inputs: the first efficient LDS prediction without recovery
3. Extensions to control, etc.