Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
CSLCOORDINATED SCIENCE LABORATORY
Synchronization of coupled oscillators is a game
Prashant G. Mehta1
1Coordinated Science LaboratoryDepartment of Mechanical Science and Engineering
University of Illinois at Urbana-Champaign
GERAD seminar, April 20, 2010
Acknowledgment: AFOSR, NSF
Huibing Yin Sean P. Meyn Uday V. Shanbhag
H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 2 / 75
Acknowledgement
Thanks to Minyi Huang, Roland P. Malhame, and Peter Caines
Minyi Huang Roland P. Malhame Peter Caines
M. Huang, P. Caines, and R. Malhame, “Large-population cost-coupled LQG problems with nonuniform agents:Individual-mass behavior and decentralized ε-nash equilibria ,” IEEE TAC, 2007
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 3 / 75
Millennium bridge
Video of London Millennium bridge from youtube
[11] S. H. Strogatz et al., Nature, 2005
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 4 / 75
Classical Kuramoto model
dθi(t) =
(ωi +
κ
N
N
∑j=1
sin(θj(t)−θi(t))
)dt +σ dξi(t), i = 1, . . . ,N
ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population
κ — measures the strength of coupling
[6] Y. Kuramoto (1975)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 5 / 75
Classical Kuramoto model
dθi(t) =
(ωi +
κ
N
N
∑j=1
sin(θj(t)−θi(t))
)dt +σ dξi(t), i = 1, . . . ,N
ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population
κ — measures the strength of coupling 1- 1+1
[6] Y. Kuramoto (1975)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 5 / 75
Classical Kuramoto model
dθi(t) =
(ωi +
κ
N
N
∑j=1
sin(θj(t)−θi(t))
)dt +σ dξi(t), i = 1, . . . ,N
ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population
κ — measures the strength of coupling
[6] Y. Kuramoto (1975)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 5 / 75
Classical Kuramoto model
dθi(t) =
(ωi +
κ
N
N
∑j=1
sin(θj(t)−θi(t))
)dt +σ dξi(t), i = 1, . . . ,N
ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population
κ — measures the strength of coupling
0 0.1 0.20.1
0.15
0.2
0.25
0.3 Locking
Incoherence
κ
κ < κc(γ)
R
γ
Synchrony
Incoherence
[6] Y. Kuramoto (1975)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 5 / 75
Movies of incoherence and synchrony solution
−1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
1
−1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
1
Incoherence Synchrony
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 6 / 75
Problem statement
Dynamics of ith oscillator
dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0
ui(t) — control 1- 1+1
ith oscillator seeks to minimize
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[ c(θi;θ−i)︸ ︷︷ ︸
cost of anarchy
+ 12 Ru2
i︸ ︷︷ ︸cost of control
]ds
θ−i = (θj)j6=iR — control penalty
c(·) — cost function
c(θi;θ−i) =1N ∑
j 6=ic•(θi,θj), c• ≥ 0
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 7 / 75
Problem statement
Dynamics of ith oscillator
dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0
ui(t) — control 1- 1+1
ith oscillator seeks to minimize
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[ c(θi;θ−i)︸ ︷︷ ︸
cost of anarchy
+ 12 Ru2
i︸ ︷︷ ︸cost of control
]ds
θ−i = (θj)j6=iR — control penalty
c(·) — cost function
c(θi;θ−i) =1N ∑
j 6=ic•(θi,θj), c• ≥ 0
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 7 / 75
Problem statement
Dynamics of ith oscillator
dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0
ui(t) — control 1- 1+1
ith oscillator seeks to minimize
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[ c(θi;θ−i)︸ ︷︷ ︸
cost of anarchy
+ 12 Ru2
i︸ ︷︷ ︸cost of control
]ds
θ−i = (θj)j6=iR — control penalty
c(·) — cost function
c(θi;θ−i) =1N ∑
j 6=ic•(θi,θj), c• ≥ 0
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 7 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Motivation Why a game?
Quiz
In the video you just watched, why were theindividuals walking strangely?
A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an
Egyptian."D. The individuals were trying to maintain their balance.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 9 / 75
Motivation Why a game?
Quiz
In the video you just watched, why were theindividuals walking strangely?
A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an
Egyptian."D. The individuals were trying to maintain their balance.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 9 / 75
Motivation Why a game?
Quiz
In the video you just watched, why were theindividuals walking strangely?
A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an
Egyptian."D. The individuals were trying to maintain their balance.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 9 / 75
Motivation Why a game?
Quiz
In the video you just watched, why were theindividuals walking strangely?
A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an
Egyptian."D. The individuals were trying to maintain their balance.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 9 / 75
Motivation Why a game?
Quiz
In the video you just watched, why were theindividuals walking strangely?
A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an
Egyptian."D. The individuals were trying to maintain their balance.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 9 / 75
Motivation Why a game?
“Rational irrationality”
“—behavior that, on the individual level, is perfectly reasonable butthat, when aggregated in the marketplace, produces calamity.”
ExamplesMillennium bridgeFinancial market
John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 10 / 75
Motivation Why a game?
“Rational irrationality”
“—behavior that, on the individual level, is perfectly reasonable butthat, when aggregated in the marketplace, produces calamity.”
ExamplesMillennium bridgeFinancial market
John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 10 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Motivation Why Oscillators?
Hodgkin-Huxley type Neuron model
CdVdt
=−gT ·m2∞(V) ·h · (V−ET)
−gh · r · (V−Eh)− . . . . . .
dhdt
=h∞(V)−h
τh(V)drdt
=r∞(V)− r
τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000
−150
−100
−50
0
50
100
Voltage
time
Neural spike train
[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 12 / 75
Motivation Why Oscillators?
Hodgkin-Huxley type Neuron model
CdVdt
=−gT ·m2∞(V) ·h · (V−ET)
−gh · r · (V−Eh)− . . . . . .
dhdt
=h∞(V)−h
τh(V)drdt
=r∞(V)− r
τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000
−150
−100
−50
0
50
100
Voltage
time
Neural spike train
[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 12 / 75
Motivation Why Oscillators?
Hodgkin-Huxley type Neuron model
CdVdt
=−gT ·m2∞(V) ·h · (V−ET)
−gh · r · (V−Eh)− . . . . . .
dhdt
=h∞(V)−h
τh(V)drdt
=r∞(V)− r
τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000
−150
−100
−50
0
50
100
Voltage
time
Neural spike train
−100
−50
0
50
100
0
0.2
0.4
0.6
0.8
10
0.1
0.2
0.3
0.4
Vh
r
Limit cyle
r
h v
[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 12 / 75
Motivation Why Oscillators?
Hodgkin-Huxley type Neuron model
CdVdt
=−gT ·m2∞(V) ·h · (V−ET)
−gh · r · (V−Eh)− . . . . . .
dhdt
=h∞(V)−h
τh(V)drdt
=r∞(V)− r
τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000
−150
−100
−50
0
50
100
Voltage
time
Neural spike train
−100
−50
0
50
100
0
0.2
0.4
0.6
0.8
10
0.1
0.2
0.3
0.4
Vh
r
Limit cyle
r
h v
Normal form reduction−−−−−−−−−−−−−→
θi = ωi +ui ·Φ(θi)
[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 12 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Problems and results Problem statement
Finite oscillator model
Dynamics of ith oscillator
dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0
ui(t) — control 1- 1+1
ith oscillator seeks to minimize
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[ c(θi;θ−i)︸ ︷︷ ︸
cost of anarchy
+ 12 Ru2
i︸ ︷︷ ︸cost of control
]ds
θ−i = (θj)j6=iR — control penaltyc(·) — cost function
c(θi;θ−i) =1N ∑
j 6=ic•(θi,θj), c• ≥ 0
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 14 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Problems and results Main results
1. Synchronization is a solution of game
Locking
0 0.1 0.2
0.15
0.2
0.25
R−1/ 2
γγIncoherence
R > Rc(γ)
Synchrony
Incoherence
dθi = (ωi +ui)dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i ]ds
1- 1+1
Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 16 / 75
Problems and results Main results
1. Synchronization is a solution of game
Locking
0 0.1 0.2
0.15
0.2
0.25
R−1/ 2
γγIncoherence
R > Rc(γ)
Synchrony
Incoherence
dθi = (ωi +ui)dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i ]ds
0 0.1 0.20.1
0.15
0.2
0.25
0.3 Locking
Incoherence
κ
κ < κc(γ)
R
γ
Synchrony
Incoherence
dθi =
(ωi +
κ
N
N
∑j=1
sin(θj−θi)
)dt +σ dξi
Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 16 / 75
Problems and results Main results
2. Kuramoto control is approximately optimal
−0.2
0
0.2
0.4
0.6
ω = 1
Kuramoto
PopulationDensity
Control laws
0 π 2π θ
ui =−A∗iR
1N ∑
j 6=isin(θ −θj(t))
0 50 100 150 200 250 3002
2.5
3
3.5
4
4.5
5
5.5
6
t
k = 0.01; R = 1000
Ai
A*
Learning algorithm:dAi
dt=−ε . . .
Yin et.al. CDC 2010
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 17 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Derivation of model Overview
Overview of model derivation
dθi = (ωi +ui(t))dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi, t)+ 1
2 Ru2i ]ds
Influence
Influence
Mass
1 Mean-field approximationAssumption:
c(θi;θ−i(t)) =1N ∑
j6=ic•(θi,θj)
N→∞−−−−−−→ c(θ , t)
2 Optimal control of single oscillatorDecentralized control structure
[5] M. Huang, P. Caines, and R. Malhame, IEEE TAC, 2007 [HCM]
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 19 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Derivation of model Derivation steps
Single oscillator with given cost
Dynamics of the oscillator
dθi = (ωi +ui(t))dt +σ dξi, t ≥ 0
The cost function is assumed known
ηi(ui; c) = limT→∞
1T
∫ T
0E[ c(θi;θ−i) + 1
2 Ru2i (s)]ds
⇑c(θi(s),s)
HJB equation:
∂thi +ωi∂θ hi =1
2R(∂θ hi)2− c(θ , t)+η
∗i −
σ2
2∂
2θθ hi
Optimal control law: u∗i (t) = ϕi(θ , t) =− 1R
∂θ hi(θ , t)
[1] D. P. Bertsekas (1995); [9] S. P. Meyn, IEEE TAC, 1997
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 21 / 75
Derivation of model Derivation steps
Single oscillator with optimal control
Dynamics of the oscillator
dθi(t) =(
ωi−1R
∂θ hi(θi, t))
dt +σ dξi(t)
Fokker-Planck equation for pdf p(θ , t,ωi)
FPK: ∂tp+ωi∂θ p =1R
∂θ [p(∂θ h)]+σ2
2∂
2θθ p
[7] A. Lasota and M. C. Mackey, “Chaos, Fractals and Noise,” Springer 1994
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 22 / 75
Derivation of model Derivation steps
Mean-field Approximation
HJB equation for population
∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η(ω)− σ2
2∂
2θθ h h(θ , t,ω)
Population density
∂tp+ω∂θ p =1R
∂θ [p(∂θ h)]+σ2
2∂
2θθ p p(θ , t,ω)
Enforce cost consistency
c(θ , t) =∫
Ω
∫ 2π
0c•(θ ,ϑ)p(ϑ , t,ω)g(ω)dϑ dω
≈ 1N ∑
j 6=ic•(θ ,ϑ)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 23 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Derivation of model PDE model
Summary
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
Mean-field approx.: c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 25 / 75
Derivation of model PDE model
Summary
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
Mean-field approx.: c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 25 / 75
Derivation of model PDE model
Summary
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
Mean-field approx.: c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 25 / 75
Derivation of model PDE model
Summary
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
Mean-field approx.: c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 25 / 75
Derivation of model PDE model
Summary
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
Mean-field approx.: c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 25 / 75
Derivation of model PDE model
1. Solution of PDE gives ε-Nash equilibrium
Optimal control law
uoi =− 1
R∂θ h(θ(t), t,ω)
∣∣ω=ωi
ε-Nash property (as N→ ∞)
ηi(uoi ;uo−i)≤ ηi(ui;uo
−i)+O(1√N
), i = 1, . . . ,N.
So, we look for solutions of PDEs.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 26 / 75
Derivation of model PDE model
1. Solution of PDE gives ε-Nash equilibrium
Optimal control law
uoi =− 1
R∂θ h(θ(t), t,ω)
∣∣ω=ωi
ε-Nash property (as N→ ∞)
ηi(uoi ;uo−i)≤ ηi(ui;uo
−i)+O(1√N
), i = 1, . . . ,N.
So, we look for solutions of PDEs.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 26 / 75
Derivation of model PDE model
1. Solution of PDE gives ε-Nash equilibrium
Optimal control law
uoi =− 1
R∂θ h(θ(t), t,ω)
∣∣ω=ωi
ε-Nash property (as N→ ∞)
ηi(uoi ;uo−i)≤ ηi(ui;uo
−i)+O(1√N
), i = 1, . . . ,N.
So, we look for solutions of PDEs.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 26 / 75
Derivation of model PDE model
2. Incoherence solution (PDE)
Incoherence solution
h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1
2π
incoherence
h(θ , t,ω) = 0 ⇒ ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η
∗− σ2
2∂
2θθ h
∂tp+ω∂θ p =1R
∂θ [p(∂θ h)]+σ2
2∂
2θθ p
c(θ , t) =∫
Ω
∫ 2π
0c•(θ ,ϑ)p(ϑ , t,ω)g(ω)dϑ dω
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 27 / 75
Derivation of model PDE model
2. Incoherence solution (PDE)
Incoherence solution
h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1
2π
incoherence
h(θ , t,ω) = 0⇒ ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η
∗− σ2
2∂
2θθ h
p(θ , t,ω) = 12π⇒ ∂tp+ω∂θ p =
1R
∂θ [p(∂θ h)]+σ2
2∂
2θθ p
c(θ , t) =∫
Ω
∫ 2π
0c•(θ ,ϑ)p(ϑ , t,ω)g(ω)dϑ dω
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 27 / 75
Derivation of model PDE model
2. Incoherence solution (PDE)
Assume c•(ϑ ,θ) = c•(ϑ −θ) = 12 sin2
(ϑ −θ
2
)Incoherence solution
h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1
2π
Optimal control u =− 1R
∂θ h = 0
Average cost
c(θ , t) =∫
Ω
∫ 2π
0
12 sin2
(θ −ϑ
2
)1
2πg(ω)dϑ dω
η∗(ω) = c(θ , t) =
14
=: η0 for all ω ∈Ω
incoherence soln.
No cost of control
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 28 / 75
Derivation of model PDE model
2. Incoherence solution (Finite population)
Closed-loop dynamics dθi = (ωi + ui︸︷︷︸=0
)dt +σ dξi(t)
Average cost
ηi = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i︸ ︷︷ ︸
=0
]dt
= limT→∞
1N ∑
j 6=i
1T
∫ T
0E[ 1
2 sin2(
θi(t)−θj(t)2
)]dt
=1N ∑
j6=i
∫ 2π
0E[ 1
2 sin2(
θi(t)−ϑ
2
)]
12π
dϑ =N−1
Nη0
−1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
1
incoherence
ε-Nash property
ηi(uoi ;uo−i)≤ ηi(ui;uo
−i)+O(1√N
), i = 1, . . . ,N.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 29 / 75
Derivation of model PDE model
2. Incoherence solution (Finite population)
Closed-loop dynamics dθi = (ωi + ui︸︷︷︸=0
)dt +σ dξi(t)
Average cost
ηi = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i︸ ︷︷ ︸
=0
]dt
= limT→∞
1N ∑
j 6=i
1T
∫ T
0E[ 1
2 sin2(
θi(t)−θj(t)2
)]dt
=1N ∑
j6=i
∫ 2π
0E[ 1
2 sin2(
θi(t)−ϑ
2
)]
12π
dϑ =N−1
Nη0
−1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
1
incoherence
ε-Nash property
ηi(uoi ;uo−i)≤ ηi(ui;uo
−i)+O(1√N
), i = 1, . . . ,N.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 29 / 75
Derivation of model PDE model
3. Synchronization is a solution of game
Locking
0 0.1 0.2
0.15
0.2
0.25
R−1/ 2
γγIncoherence
R > Rc(γ)
Synchrony
IncoherenceR−1/ 2
η(ω)
0. 1 0.15 0. 2 0.25 0. 3 0.350. 1
0.15
0. 2
0.25
ω= 0.95
ω= 1
ω= 1.05
R > Rc
η(ω) = η0
R < R
c
η(ω) < η0
c
dθi = (ωi +ui)dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i ]ds η(ω) = min
uiηi(ui;uo
−i)
0 1 2 3 4 5 60
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1t = 38.24
Synchrony solution of
Yin et al., “Synchronization of oscillators is a game,” ACC2010P. G. Mehta (UIUC) GERAD Apr. 20, 2010 30 / 75
Derivation of model PDE model
3. Synchronization is a solution of game
Locking
0 0.1 0.2
0.15
0.2
0.25
R−1/ 2
γγIncoherence
R > Rc(γ)
Synchrony
IncoherenceR−1/ 2
η(ω)
0. 1 0.15 0. 2 0.25 0. 3 0.350. 1
0.15
0. 2
0.25
ω= 0.95
ω= 1
ω= 1.05
R > Rc
η(ω) = η0
R < R
c
η(ω) < η0
c
incoherence soln.
dθi = (ωi +ui)dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i ]ds η(ω) = min
uiηi(ui;uo
−i)
0 1 2 3 4 5 60
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1t = 38.24
Synchrony solution of
Yin et al., “Synchronization of oscillators is a game,” ACC2010P. G. Mehta (UIUC) GERAD Apr. 20, 2010 30 / 75
Derivation of model PDE model
3. Synchronization is a solution of game
Locking
0 0.1 0.2
0.15
0.2
0.25
R−1/ 2
γγIncoherence
R > Rc(γ)
Synchrony
IncoherenceR−1/ 2
η(ω)
0. 1 0.15 0. 2 0.25 0. 3 0.350. 1
0.15
0. 2
0.25
ω= 0.95
ω= 1
ω= 1.05
R > Rc
η(ω) = η0
R < R
c
η(ω) < η0
c
synchrony soln.
dθi = (ωi +ui)dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i ]ds η(ω) = min
uiηi(ui;uo
−i)
0 1 2 3 4 5 60
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1t = 38.24
Synchrony solution of
Yin et al., “Synchronization of oscillators is a game,” ACC2010P. G. Mehta (UIUC) GERAD Apr. 20, 2010 30 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Analysis of phase transition Incoherence solution
Overview of the steps
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
Assume c•(ϑ ,θ) = c•(ϑ −θ) = 12 sin2
(ϑ −θ
2
)Incoherence solution
h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1
2π
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 32 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Analysis of phase transition Bifurcation analysis
Linearization and spectra
Linearized PDE (about incoherence solution)
∂
∂ tz(θ , t,ω) =
(−ω∂θ h− c− σ2
2 ∂ 2θθ
h−ω∂θ p+ 1
2πR ∂ 2θθ
h+ σ2
2 ∂ 2θθ
p
)=: LRz(θ , t,ω)
Spectrum of the linear operator1 Continuous spectrum S(k)+∞
k=−∞
S(k) :=λ ∈ C
∣∣λ =±σ2
2k2− kωi for all ω ∈Ω
2 Discrete spectrum
Characteristic eqn:1
8R
∫Ω
g(ω)
(λ − σ2
2 +ωi)(λ + σ2
2 +ωi)dω +1 = 0.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 34 / 75
Analysis of phase transition Bifurcation analysis
Linearization and spectra
Linearized PDE (about incoherence solution)
∂
∂ tz(θ , t,ω) =
(−ω∂θ h− c− σ2
2 ∂ 2θθ
h−ω∂θ p+ 1
2πR ∂ 2θθ
h+ σ2
2 ∂ 2θθ
p
)=: LRz(θ , t,ω)
Spectrum of the linear operator1 Continuous spectrum S(k)+∞
k=−∞
S(k) :=λ ∈ C
∣∣λ =±σ2
2k2− kωi for all ω ∈Ω
2 Discrete spectrum
Characteristic eqn:1
8R
∫Ω
g(ω)
(λ − σ2
2 +ωi)(λ + σ2
2 +ωi)dω +1 = 0.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 34 / 75
Analysis of phase transition Bifurcation analysis
Linearization and spectra
Linearized PDE (about incoherence solution)
∂
∂ tz(θ , t,ω) =
(−ω∂θ h− c− σ2
2 ∂ 2θθ
h−ω∂θ p+ 1
2πR ∂ 2θθ
h+ σ2
2 ∂ 2θθ
p
)=: LRz(θ , t,ω)
Spectrum of the linear operator1 Continuous spectrum S(k)+∞
k=−∞
S(k) :=λ ∈ C
∣∣λ =±σ2
2k2− kωi for all ω ∈Ω
−0.2 −0.1 0 0.1 0.2 0.3
−3
−2
−1
0
1
2
3
real
ima
g
γ = 0.1
R decreases
k=2 k=2
k=1 k=1
2 Discrete spectrum
Characteristic eqn:1
8R
∫Ω
g(ω)
(λ − σ2
2 +ωi)(λ + σ2
2 +ωi)dω +1 = 0.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 34 / 75
Analysis of phase transition Bifurcation analysis
Linearization and spectra
Linearized PDE (about incoherence solution)
∂
∂ tz(θ , t,ω) =
(−ω∂θ h− c− σ2
2 ∂ 2θθ
h−ω∂θ p+ 1
2πR ∂ 2θθ
h+ σ2
2 ∂ 2θθ
p
)=: LRz(θ , t,ω)
Spectrum of the linear operator1 Continuous spectrum S(k)+∞
k=−∞
S(k) :=λ ∈ C
∣∣λ =±σ2
2k2− kωi for all ω ∈Ω
−0.2 −0.1 0 0.1 0.2 0.3
−3
−2
−1
0
1
2
3
real
ima
g
γ = 0.1
R decreases
k=2 k=2
k=1 k=1
2 Discrete spectrum
Characteristic eqn:1
8R
∫Ω
g(ω)
(λ − σ2
2 +ωi)(λ + σ2
2 +ωi)dω +1 = 0.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 34 / 75
Analysis of phase transition Bifurcation analysis
Bifurcation diagram (Hamiltonian Hopf)
Characteristic eqn:1
8R
∫Ω
g(ω)
(λ − σ2
2 +ωi)(λ + σ2
2 +ωi)dω +1 = 0.
Stability proof
[3] Dellnitz et al., Int. Series Num. Math., 1992
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 35 / 75
Analysis of phase transition Bifurcation analysis
Bifurcation diagram (Hamiltonian Hopf)
Characteristic eqn:1
8R
∫Ω
g(ω)
(λ − σ2
2 +ωi)(λ + σ2
2 +ωi)dω +1 = 0.
Stability proof
−0.2 −0.1 0 0.1 0.2
-0.6
-0.8
-1
-1.2
-1.4
real
imag
(a)
Cont. spectrum; ind. of RDisc. spectrum; fn. of R
Bifurcation point
[3] Dellnitz et al., Int. Series Num. Math., 1992
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 35 / 75
Analysis of phase transition Bifurcation analysis
Bifurcation diagram (Hamiltonian Hopf)
Characteristic eqn:1
8R
∫Ω
g(ω)
(λ − σ2
2 +ωi)(λ + σ2
2 +ωi)dω +1 = 0.
Stability proof
−0.2 −0.1 0 0.1 0.2
-0.6
-0.8
-1
-1.2
-1.4
real
imag
(a)
Cont. spectrum; ind. of RDisc. spectrum; fn. of R
Bifurcation point
0 0.05 0.1 0.15 0.215
20
25
30
35
40
45
50
Incoherence
R > RR
c(γ
γ
) (c))
Synchrony
0.05
[3] Dellnitz et al., Int. Series Num. Math., 1992
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 35 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Analysis of phase transition Numerics
Numerical solution of PDEs
Incoherence; R = 60incoherence
incoherence
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 37 / 75
Analysis of phase transition Numerics
Numerical solution of PDEs
Incoherence; R = 60incoherence
incoherence
Synchrony; R = 10
synchrony
synchrony
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 37 / 75
Analysis of phase transition Numerics
Bifurcation diagram
Locking
0 0.1 0.2
0.15
0.2
0.25
R−1/ 2
γγIncoherence
R > Rc(γ)
Synchrony
Incoherence R−1/2
η(ω)
0. 1 0.15 0. 2 0.25 0. 3 0.35
0. 1
0.15
0. 2
0.25
ω = 0.95
ω = 1
ω = 1.05
R > Rc
η(ω) = η0
R < Rc
η(ω) < η0
dθi = (ωi +ui)dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i ]ds
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 38 / 75
Analysis of phase transition Numerics
Bifurcation diagram
Locking
0 0.1 0.2
0.15
0.2
0.25
R−1/ 2
γγIncoherence
R > Rc(γ)
Synchrony
Incoherence R−1/2
η(ω)
0. 1 0.15 0. 2 0.25 0. 3 0.35
0. 1
0.15
0. 2
0.25
ω = 0.95
ω = 1
ω = 1.05
R > Rc
η(ω) = η0
R < Rc
η(ω) < η0
incoherence soln.
dθi = (ωi +ui)dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i ]ds
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 38 / 75
Analysis of phase transition Numerics
Bifurcation diagram
Locking
0 0.1 0.2
0.15
0.2
0.25
R−1/ 2
γγIncoherence
R > Rc(γ)
Synchrony
Incoherence R−1/2
η(ω)
0. 1 0.15 0. 2 0.25 0. 3 0.35
0. 1
0.15
0. 2
0.25
ω = 0.95
ω = 1
ω = 1.05
R > Rc
η(ω) = η0
R < Rc
η(ω) < η0
synchrony soln.
dθi = (ωi +ui)dt +σ dξi
ηi(ui;u−i) = limT→∞
1T
∫ T
0E[c(θi;θ−i)+ 1
2 Ru2i ]ds
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 38 / 75
Analysis of phase transition Numerics
Bifurcation diagram with another cost
c•(θ ,ϑ) = 0.25(1− cos(θ −ϑ))
0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.450.08
0.1
0.12
0.14
0.16
0.18
0.2
0.22
0.24
0.26
0.28
R−1/2
0 1 2 3 4 5 6 70
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
θ
p
R = 23.8359
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 39 / 75
Analysis of phase transition Numerics
Bifurcation diagram with another cost
c•(θ ,ϑ) = 0.25(1− cos(θ −ϑ))
0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.450.08
0.1
0.12
0.14
0.16
0.18
0.2
0.22
0.24
0.26
0.28
R−1/2
0 1 2 3 4 5 6 70
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
θ
p
R = 23.8359
0 1 2 3 4 5 6 70
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
θ
p
R = 23.8359
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 39 / 75
Analysis of phase transition Numerics
Bifurcation diagram with another cost
c•(θ ,ϑ) = 0.25(1− cos(θ −ϑ))
0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.450.08
0.1
0.12
0.14
0.16
0.18
0.2
0.22
0.24
0.26
0.28
R−1/2
0 1 2 3 4 5 6 70
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
θ
p
R = 23.8359
c•(θ ,ϑ) = 1− cos(θ −ϑ)− cos3(θ −ϑ)
0 0.5 1 1.50
0.2
0.4
0.6
0.8
1
R−1/2
0 1 2 3 4 5 60
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0 1 2 3 4 5 60
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 40 / 75
Analysis of phase transition Numerics
Bifurcation diagram with another cost
c•(θ ,ϑ) = 0.25(1− cos(θ −ϑ))
0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.450.08
0.1
0.12
0.14
0.16
0.18
0.2
0.22
0.24
0.26
0.28
R−1/2
0 1 2 3 4 5 6 70
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
θ
p
R = 23.8359
c•(θ ,ϑ) = 1− cos(θ −ϑ)− cos3(θ −ϑ)
0 1 2 3 4 5 6
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
θ
p
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 40 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Learning Q-function approximation
Comparison to Kuramoto law
Control law u = ϕ(θ , t,ω) =− 1R
∂θ h(θ , t,ω)
−0.2
0
0.2
0.4
0.6
ω = 0.95
ω = 1
ω = 1.05
PopulationDensity
Control laws
0 π 2π θ
Equivalent control law in Kuramoto oscillator
u(Kur)i =
κ
N
N
∑j=1
sin(θj(t)−θi)N→∞≈ κ0 sin(ϑ0 + t−θi)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 42 / 75
Learning Q-function approximation
Comparison to Kuramoto law
Control law u = ϕ(θ , t,ω) =− 1R
∂θ h(θ , t,ω)
−0.2
0
0.2
0.4
0.6
ω = 0.95
ω = 1
ω = 1.05
Kuramoto
Population
Density
Control laws
0 π 2π θ
Equivalent control law in Kuramoto oscillator
u(Kur)i =
κ
N
N
∑j=1
sin(θj(t)−θi)N→∞≈ κ0 sin(ϑ0 + t−θi)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 42 / 75
Learning Q-function approximation
Optimality equation minuic(θ ;θ−i(t))+ 1
2 Ru2i +Duihi(θ , t)︸ ︷︷ ︸
=: Hi(θ ,ui;θ−i(t))
= η∗i
Optimal control law Kuramoto law
u∗i =− 1R
∂θ hi(θ , t) u(Kur)i =−κ
N ∑j 6=i
sin(θi−θj(t))
Parameterization:
H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1
2 Ru2i +(ωi−1+ui)AiS(φi)+
σ2
2AiC(φi)
where
S(φ)(θ ,θ−i) =1N ∑
j 6=isin(θ −θj−φ), C(φ)(θ ,θ−i) =
1N ∑
j 6=icos(θ −θj−φ)
Approx. optimal control:
u(Ai,φi)i = argmin
ui
H(Ai,φi)i (θ ,ui;θ−i(t))=−Ai
RS(φi)(θ ,θ−i)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 43 / 75
Learning Q-function approximation
Optimality equation minuic(θ ;θ−i(t))+ 1
2 Ru2i +Duihi(θ , t)︸ ︷︷ ︸
=: Hi(θ ,ui;θ−i(t))
= η∗i
Optimal control law Kuramoto law
u∗i =− 1R
∂θ hi(θ , t) u(Kur)i =−κ
N ∑j 6=i
sin(θi−θj(t))
Parameterization:
H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1
2 Ru2i +(ωi−1+ui)AiS(φi)+
σ2
2AiC(φi)
where
S(φ)(θ ,θ−i) =1N ∑
j 6=isin(θ −θj−φ), C(φ)(θ ,θ−i) =
1N ∑
j 6=icos(θ −θj−φ)
Approx. optimal control:
u(Ai,φi)i = argmin
ui
H(Ai,φi)i (θ ,ui;θ−i(t))=−Ai
RS(φi)(θ ,θ−i)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 43 / 75
Learning Q-function approximation
Optimality equation minuic(θ ;θ−i(t))+ 1
2 Ru2i +Duihi(θ , t)︸ ︷︷ ︸
=: Hi(θ ,ui;θ−i(t))
= η∗i
Optimal control law Kuramoto law
u∗i =− 1R
∂θ hi(θ , t) u(Kur)i =−κ
N ∑j 6=i
sin(θi−θj(t))
Parameterization:
H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1
2 Ru2i +(ωi−1+ui)AiS(φi)+
σ2
2AiC(φi)
where
S(φ)(θ ,θ−i) =1N ∑
j 6=isin(θ −θj−φ), C(φ)(θ ,θ−i) =
1N ∑
j 6=icos(θ −θj−φ)
Approx. optimal control:
u(Ai,φi)i = argmin
ui
H(Ai,φi)i (θ ,ui;θ−i(t))=−Ai
RS(φi)(θ ,θ−i)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 43 / 75
Learning Q-function approximation
Optimality equation minuic(θ ;θ−i(t))+ 1
2 Ru2i +Duihi(θ , t)︸ ︷︷ ︸
=: Hi(θ ,ui;θ−i(t))
= η∗i
Optimal control law Kuramoto law
u∗i =− 1R
∂θ hi(θ , t) u(Kur)i =−κ
N ∑j 6=i
sin(θi−θj(t))
Parameterization:
H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1
2 Ru2i +(ωi−1+ui)AiS(φi)+
σ2
2AiC(φi)
where
S(φ)(θ ,θ−i) =1N ∑
j 6=isin(θ −θj−φ), C(φ)(θ ,θ−i) =
1N ∑
j 6=icos(θ −θj−φ)
Approx. optimal control:
u(Ai,φi)i = argmin
ui
H(Ai,φi)i (θ ,ui;θ−i(t))=−Ai
RS(φi)(θ ,θ−i)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 43 / 75
1 MotivationWhy a game?Why Oscillators?
2 Problems and resultsProblem statementMain results
3 Derivation of modelOverviewDerivation stepsPDE model
4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics
5 LearningQ-function approximationSteepest descent algorithm
Learning Steepest descent algorithm
Bellman error:
Pointwise: L (Ai,φi)(θ , t) = minuiH(Ai,φi)
i −η(A∗i ,φ
∗i )
i
Simple gradient descent algorithm
e(Ai,φi) =2
∑k=1|〈L (Ai,φi), ϕk(θ)〉|2
dAi
dt=−ε
de(Ai,φi)dAi
,dφi
dt=−ε
de(Ai,φi)dφi
(∗)
Theorem (Convergence)
Assume population is in synchrony. The ith oscillator updatesaccording to (∗). Then
Ai(t)→ A∗ =1
2σ2
The pointwise Bellman error L (Ai,0)(θ , t) = ε(R)cos2(θ − t)
where ε(R) =1
16Rσ4
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 45 / 75
Learning Steepest descent algorithm
Phase transition
Suppose all oscillators use approx. optimal control law:
ui =−A∗
R1N ∑
j 6=isin(θi−θj(t))
then the phase transition boundary is
Rc(γ) =
1
2σ4 if γ = 01
4σ2γtan−1
(2γ
σ2
)if γ > 0
0 50 100 150 200 250 3002
2.5
3
3.5
4
4.5
5
5.5
6
t
k = 0.01; R = 1000
Ai
A*
0 0.05 0.1 0.15 0.215
20
25
30
35
40
45
50
γ
R
PDE
Learning
Incoherence
Synchrony
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 46 / 75
Learning Steepest descent algorithm
Comparison of control laws
From PDE: ui = ϕ(θ , t,ω)∣∣∣∣ω=ωi
=− 1R
∂θ h(θ , t,ω)∣∣∣∣ω=ωi
From learning: ui =−A∗
R1N ∑
j6=isin(θi−θj(t)−φi)
θθ
Population
Density
Population
Density
0 π 2π 0 π 2π
ω = 0.95
ω = 0.1
Control laws
PDE
Learning
ω = 1.05
ω = 0.95
ω = 0.1
ω = 1.05
0.20.1 0.2
0.6
0.4
0.2
0
-0.4
-0.2
0.1
0
-0.1
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 47 / 75
Learning Steepest descent algorithm
Comparison of average cost
0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
PDE Learning
ω = 0.95
ω = 0.1
ω = 1.05
Average Cost
0.08
0.1
0.12
0.14
0.16
0.18
0.2
0.22
0.24
0.26
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 48 / 75
Thank you!
Website: http://www.mechse.illinois.edu/research/mehtapg
Huibing Yin Sean P. Meyn Uday V. Shanbhag
H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010
Part I
Apendix
6 AppendixGame-theoretic results (ε-Nash)StabilityNumericsLinearization/SpectraModelingMultiple oscillatorsLearningThe New Yorker excerpt
7 Bibliography
Appendix Game-theoretic results (ε-Nash)
Infinit population limit
Recall the PDE model
∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η
∗− σ2
2∂
2θθ h
∂tp+ω∂θ p =1R
∂θ [p(∂θ h)]+σ2
2∂
2θθ p
c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ)p(θ , t,ω)g(ω)dθ dω
Solution h(θ , t,ω)
Control uoi (t) :=− 1
R∂θ h(θ , t,ω)
∣∣ω=ωi
By construction:ηi(uo
i ; c)≤ ηi(ui; c)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 52 / 75
Appendix Game-theoretic results (ε-Nash)
Finite population
Finite population cost
c(N)i (ϑ , t) := E
[1N ∑
j 6=ic•(θi(t);θ
o−i(t)
∣∣θi(t) = ϑ)]
θo−i = (θ o
1 , . . . ,θ oi−1,θ
oi+1, . . . ,θ
oN) with θ
oi (t) obtained from dynamics
with control ui = uoi (t)
Lemma 1: for each i = 1, . . . ,N
maxϑ∈[0,2π]
[limsup
T→∞
1T
∫ T
0|c(ϑ ,s)− c(N)
i (ϑ ,s)|ds]
= O(1√N
)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 53 / 75
Appendix Game-theoretic results (ε-Nash)
ε-Nash equilibrium
Theorem
The control uoi (t) =− 1
R∂θ h(θ(t), t,ω)
∣∣ω=ωiN
i=1 is an ε-Nash
equilibrium with respect to the finite population cost
η(POP)i (ui;uo
−i) = limT→∞
1T
∫ T
0c(N)
i + 12 Ru2
i ds;
i.e., for any adapted control ui,
η(POP)i (uo
i ;uo−i)≤ η
(POP)i (ui;uo
−i)+O(1√N
), i = 1, . . . ,N.
c(N)i (ϑ , t) := E[c(θi(t);θ
o−i(t))|θi(t) = ϑ ]
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 54 / 75
6 AppendixGame-theoretic results (ε-Nash)StabilityNumericsLinearization/SpectraModelingMultiple oscillatorsLearningThe New Yorker excerpt
7 Bibliography
Appendix Stability
Stability proofdxdt
=−ax+by, a > 0 ⇐ FPK eqn
dydt
= cx+ay, bc < 0 ⇐ HJB eqn
with boundary conditions x(0) = x0 and y(∞) = 0
x(t) = eatx0 +b∫ t
0e−a(t−τ)y(τ)dτ,
y(t) =−c∫
∞
tea(t−τ)x(τ)dτ.
For x0 = 0, equilibrium solution x(t)≡ 0, y(t)≡ 0Characteristic equation σ
2 = a2 +bc.Stable in the sense that
The equilibrium solution is asymptotically stable if for any initialperturbation x0, the solution x(t)→ 0 as t→ ∞. Return to spectrum
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 56 / 75
6 AppendixGame-theoretic results (ε-Nash)StabilityNumericsLinearization/SpectraModelingMultiple oscillatorsLearningThe New Yorker excerpt
7 Bibliography
Appendix Numerics
Algorithm for solving PDEs
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
V(θ , t,ω) = h(θ ,T− s,ω) Iterative algorithm
s
tForward time
Backward time
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 58 / 75
Appendix Numerics
Algorithm for solving PDEs
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
V(θ , t,ω) = h(θ ,T− s,ω) Iterative algorithm
s
tForward time
Backward time
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 58 / 75
Appendix Numerics
Algorithm for solving PDEs
HJB: ∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t) +η
∗− σ2
2∂
2θθ h ⇒ h(θ , t,ω)
FPK: ∂tp+ω∂θ p =1R
∂θ [p( ∂θ h )]+σ2
2∂
2θθ p ⇒ p(θ , t,ω)
c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω
V(θ , t,ω) = h(θ ,T− s,ω) Iterative algorithm
s
tForward time
Backward time
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 58 / 75
Appendix Numerics
Numerical solution of PDEs
Incoherence; R = 60incoherence
0 0.05 0.1 0.15 0.215
20
25
30
35
40
45
50
Incoherence
R > RR
c(γ
γ
) (c))
Synchrony
0.05
p(θ , t,ω) at given position θ
0 20 40 60 80 100 120 140 160 180 2000.1
0.12
0.14
0.16
0.18
0.2
t
p(θ
,t,ω
)
γ = 0.01; Iter = 1
100 105 110 115 120 125 130 135 140 145 1500.1634
0.1636
0.1638
0.164
0.1642
t
p(θ
,t,ω
)
0 20 40 60 80 100 120 140 160 180 2000.1
0.12
0.14
0.16
0.18
0.2
t
p(θ
,t,ω
)
γ = 0.01; Iter = 4
100 105 110 115 120 125 130 135 140 145 1500.162
0.164
0.166
0.168
t
p(θ
,t,ω
)
0 20 40 60 80 100 120 140 160 180 2000.1
0.12
0.14
0.16
0.18
0.2
t
p(θ
,t,ω
)
γ = 0.01; Iter = 19
100 105 110 115 120 125 130 135 140 145 1500.155
0.16
0.165
0.17
t
p(θ
,t,ω
)
Iter = 1 Iter=4 Iter=19
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 59 / 75
Appendix Numerics
Numerical solution of PDEs
Synchrony; R = 10
synchrony
synchrony
p(θ , t,ω) at given position θ
0 20 40 60 80 100 120 140 160 180 2000.1
0.15
0.2
0.25
t
p(θ
,t,ω
)
γ = 0.01; Iter = 1
100 105 110 115 120 125 130 135 140 145 1500.154
0.1542
0.1544
0.1546
t
p(θ
,t,ω
)
0 20 40 60 80 100 120 140 160 180 2000
0.2
0.4
0.6
0.8
t
p(θ
,t,ω
)
γ = 0.01; Iter = 4
100 105 110 115 120 125 130 135 140 145 1500.05
0.1
0.15
0.2
0.25
t
p(θ
,t,ω
)
0 20 40 60 80 100 120 140 160 180 200−0.5
0
0.5
1
t
p(θ
,t,ω
)
γ = 0.01; Iter = 19
100 105 110 115 120 125 130 135 140 145 150−0.5
0
0.5
1
t
p(θ
,t,ω
)Iter = 1 Iter=4 Iter=19
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 59 / 75
6 AppendixGame-theoretic results (ε-Nash)StabilityNumericsLinearization/SpectraModelingMultiple oscillatorsLearningThe New Yorker excerpt
7 Bibliography
Appendix Linearization/Spectra
Linearization
∂
∂ tz(θ , t,ω) =
(−ω∂θ h− c− σ2
2 ∂ 2θθ
h−ω∂θ p+ 1
2πR ∂ 2θθ
h+ σ2
2 ∂ 2θθ
p
)
Eigenvector problem λZ = LRZ
Fourier series expansion with respect to θ
H =+∞
∑k=−∞
Hk(ω)eikθ , P =+∞
∑k=−∞
Pk(ω)eikθ
Decomposition of the linear operator LR =⊕kL(k)
R
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 61 / 75
Appendix Linearization/Spectra
Linearization (cont.)
Individual operator
L(1)
R :=
(σ2
2 −ωi π
4∫
Ω·g(ω)dω
− 12πR −σ2
2 −ωi
)
L(k)
R :=
(σ2
2 k2− kωi 0− k2
2πR −σ2
2 k2− kωi
), k ≥ 2
L(−k)
R = L(k)
RContinuous spectrum S(k)+∞
k=−∞
S(k) :=
λ ∈ C∣∣λ =±σ2
2k2− kωi for all ω ∈Ω
Discrete spectrum coincides with the discrete spectrum of L
(±1)R
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 62 / 75
6 AppendixGame-theoretic results (ε-Nash)StabilityNumericsLinearization/SpectraModelingMultiple oscillatorsLearningThe New Yorker excerpt
7 Bibliography
Appendix Modeling
Neuron
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 64 / 75
Appendix Modeling
Neuron
Equivalent circuits
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 64 / 75
Appendix Modeling
Finite oscillator model
Dynamics of ith oscillator [10]:
dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0 (1)
θi— phase of the oscillatorξi— mutually independent Wiener processesωi — constant and chosen independently according to a fixeddistribution with density gui(t) — control
1- 1+1
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 65 / 75
Appendix Modeling
Finite oscillator model (cont.)
ith oscillator seeks to minimize
η(POP)i (ui;u−i) = lim
T→∞
1T
∫ T
0E[ c(θi;θ−i)︸ ︷︷ ︸
cost of anarchy
+ 12 Ru2
i︸ ︷︷ ︸cost of control
]ds
θ−i = (θj)j6=iR — control penaltyc(·) — cost function
c(θi;θ−i) =1N ∑
j 6=ic•(θi,θj), c• ≥ 0
u∗i Ni=1 Nash equilibrium if
u∗i minimizes η(POP)i (ui;u∗−i), for i = 1, . . . ,N.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 66 / 75
Appendix Modeling
Intuition
ith oscillator seeks to minimize
η(POP)i (ui;u−i) = lim
T→∞
1T
∫ T
0E[ c(θi;θ−i)︸ ︷︷ ︸
cost of anarchy
+ 12 Ru2
i︸ ︷︷ ︸cost of control
]ds
Trade-off between reducing the two costs
Cost associated with θi 6= θj: c(θi;θ−i) =1N ∑
j6=ic•(θi−θj)
Cost associated with control: 12 Ru2
i
Qualitative macro-behavior change when R varies
0 0.05 0.1 0.15 0.215
20
25
30
35
40
45
50
Incoherence
R > RR
c(γ
γ
) (c))
Synchrony
0.05
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 67 / 75
6 AppendixGame-theoretic results (ε-Nash)StabilityNumericsLinearization/SpectraModelingMultiple oscillatorsLearningThe New Yorker excerpt
7 Bibliography
Appendix Multiple oscillators
PDE model (as N→ ∞)
HJB equation⇒ obtain optimal control
∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η
∗− σ2
2∂
2θθ h (2)
h(θ , t,ω)— relative value functionη∗(ω)— average optimal cost
Optimal feed-back control law ϕ(θ , t,ω) :=− 1R
∂θ h(θ , t,ω)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 69 / 75
Appendix Multiple oscillators
PDE model (as N→ ∞)
HJB equation⇒ obtain optimal control
∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η
∗− σ2
2∂
2θθ h (2)
h(θ , t,ω)— relative value functionη∗(ω)— average optimal cost
Optimal feed-back control law ϕ(θ , t,ω) :=− 1R
∂θ h(θ , t,ω)
Fokker-Planck-Kolmogorov (FPK) equation
dθi(t) = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N
replaced by: ∂tp+∂θ
((ω + ϕ )p
)=
σ2
2∂
2θθ p
p(θ , t,ω) — probability density function
[10] Strogatz et al., Journal of Statistical Physics, 1991
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 69 / 75
Appendix Multiple oscillators
PDE model (as N→ ∞)
HJB equation⇒ obtain optimal control
∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η
∗− σ2
2∂
2θθ h (2)
h(θ , t,ω)— relative value functionη∗(ω)— average optimal cost
Optimal feed-back control law ϕ(θ , t,ω) :=− 1R
∂θ h(θ , t,ω)
Fokker-Planck-Kolmogorov (FPK) equation
dθi(t) = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N
replaced by: ∂tp+∂θ
((ω + ϕ )p
)=
σ2
2∂
2θθ p
p(θ , t,ω) — probability density function
[10] Strogatz et al., Journal of Statistical Physics, 1991
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 69 / 75
Appendix Multiple oscillators
PDE model (as N→ ∞)
HJB equation⇒ obtain optimal control
∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η
∗− σ2
2∂
2θθ h (2)
h(θ , t,ω)— relative value functionη∗(ω)— average optimal cost
Optimal feed-back control law ϕ(θ , t,ω) :=− 1R
∂θ h(θ , t,ω)
FPK equation⇒ evolution of population phase angles
∂tp+ω∂θ p =1R
∂θ [p(∂θ h)]+σ2
2∂
2θθ p (3)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 69 / 75
Appendix Multiple oscillators
PDE model (as N→ ∞)
HJB equation⇒ obtain optimal control
∂th+ω∂θ h =1
2R(∂θ h)2− c(θ , t)+η
∗− σ2
2∂
2θθ h (2)
h(θ , t,ω)— relative value functionη∗(ω)— average optimal cost
Optimal feed-back control law ϕ(θ , t,ω) :=− 1R
∂θ h(θ , t,ω)
FPK equation⇒ evolution of population phase angles
∂tp+ω∂θ p =1R
∂θ [p(∂θ h)]+σ2
2∂
2θθ p ⇒ (3)
Generate cost:∫
Ω
∫ 2π
0c•(ϑ ,θ)p(θ , t,ω)g(ω)dθ dω
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 69 / 75
Appendix Multiple oscillators
Consistency
Consistency requirement on deterministic mass influence
c(ϑ , t) =∫
Ω
∫ 2π
0c•(ϑ ,θ)p(θ , t,ω)g(ω)dθ dω
Recall: c(θi;θ−i) =1N ∑
j 6=ic•(θi,θj)
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 70 / 75
6 AppendixGame-theoretic results (ε-Nash)StabilityNumericsLinearization/SpectraModelingMultiple oscillatorsLearningThe New Yorker excerpt
7 Bibliography
Appendix Learning
Q-learning
Dynamics of ith agentddt
xi = aixi +biui
ith agent seeks to minimize
ηi = limT→∞
∫ T
0e−γs ((xi(s)− z(s))2 +u2
i (s))
ds
Control ui =−kixxi− ki
zz
0 1 2 3 4 5 6 7 8 9 10
−0.0
0.0
0
1
-1
(indi
vidu
al st
ate)
(ens
embl
e st
ate)
Agent 4
[8] P. Mehta and S. Meyn, CDC, 2009
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 72 / 75
Appendix Learning
Control comparison
0.1 0.12 0.14 0.16 0.18 0.2
-0.5
0
-1
0.5
1
0.1 0.12 0.14 0.16 0.18 0.20
0.1
0.2
0.3
0.4
0.5
PDE Learning
ω = 0.95
ω = 0.1
ω = 1.05
Control Amplitude Control Phase
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 73 / 75
6 AppendixGame-theoretic results (ε-Nash)StabilityNumericsLinearization/SpectraModelingMultiple oscillatorsLearningThe New Yorker excerpt
7 Bibliography
Appendix The New Yorker excerpt
Rational irrationality
The Prisoner’s Dilemma is the obverse of Adam Smith’s theory of theinvisible hand, in which the free market coördinates the behavior ofself-seeking individuals to the benefit of all. Each businessman“intends only his own gain,” Smith wrote in “The Wealth of Nations,”“and he is in this, as in many other cases, led by an invisible hand topromote an end which was no part of his intention.” But in a marketenvironment the individual pursuit of self-interest, however rational,can give way to collective disaster. The invisible hand becomes a fist.
John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 75 / 75
Bibliography
Dimitri P. Bertsekas.Dynamic Programming and Optimal Control, volume 1.Athena Scientific, Belmont, Massachusetts, 1995.
Eric Brown, Jeff Moehlis, and Philip Holmes.On the phase reduction and response dynamics of neuraloscillator populations.Neural Computation, 16(4):673–715, 2004.
M. Dellnitz, J.E. Marsden, I. Melbourne, and J. Scheurle.Generic bifurcations of pendula.Int. Series Num. Math., 104:111–122, 1992.
J. Guckenheimer.Isochrons and phaseless sets.J. Math. Biol., 1:259–273, 1975.
Minyi Huang, Peter E. Caines, and Roland P. Malhame.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 75 / 75
Bibliography
Large-population cost-coupled LQG problems with nonuniformagents: Individual-mass behavior and decentralized ε-nashequilibria.IEEE transactions on automatic control, 52(9):1560–1571, 2007.
Y. Kuramoto.International Symposium on Mathematical Problems in TheoreticalPhysics, volume 39 of Lecture Notes in Physics.Springer-Verlag, 1975.
Andrzej Lasota and Michael C. Mackey.Chaos, Fractals and Noise.Springer, 1994.
P. Mehta and S. Meyn.Q-learning and Pontryagin’s Minimum Principle.To appear, 48th IEEE Conference on Decision and Control,December 16-18 2009.
Sean P. Meyn.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 75 / 75
Bibliography
The policy iteration algorithm for average reward markov decisionprocesses with general state space.IEEE Transactions on Automatic Control, 42(12):1663–1680,December 1997.
S. H. Strogatz and R. E. Mirollo.Stability of incoherence in a population of coupled oscillators.Journal of Statistical Physics, 63:613–635, May 1991.
Steven H. Strogatz, Daniel M. Abrams, Bruno Eckhardt, andEdward Ott.Theoretical mechanics: Crowd synchrony on the millenniumbridge.Nature, 438:43–44, 2005.
P. G. Mehta (UIUC) GERAD Apr. 20, 2010 75 / 75