83
CSL COORDINATED SCIENCE LABORATORY Synchronization of coupled oscillators is a game Prashant G. Mehta 1 1 Coordinated Science Laboratory Department of Mechanical Science and Engineering University of Illinois at Urbana-Champaign University of Maryland, March 4, 2010 Acknowledgment: AFOSR, NSF

Maryland 2010

Embed Size (px)

Citation preview

Page 1: Maryland 2010

CSLCOORDINATED SCIENCE LABORATORY

Synchronization of coupled oscillators is a game

Prashant G. Mehta1

1Coordinated Science LaboratoryDepartment of Mechanical Science and Engineering

University of Illinois at Urbana-Champaign

University of Maryland, March 4, 2010

Acknowledgment: AFOSR, NSF

Page 2: Maryland 2010

Huibing Yin Sean P. Meyn Uday V. Shanbhag

H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 2 / 69

Page 3: Maryland 2010

Millennium bridge

Video of London Millennium bridge from youtube

[11] S. H. Strogatz et al., Nature, 2005

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 3 / 69

Page 4: Maryland 2010

Classical Kuramoto model

dθi(t) =

�ωi +

κN

N

∑j=1

sin(θj(t)−θi(t))

�dt +σ dξi(t), i = 1, . . . ,N

ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population

κ — measures the strength of coupling

[6] Y. Kuramoto (1975)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 4 / 69

Page 5: Maryland 2010

Classical Kuramoto model

dθi(t) =

�ωi +

κN

N

∑j=1

sin(θj(t)−θi(t))

�dt +σ dξi(t), i = 1, . . . ,N

ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population

κ — measures the strength of coupling 1- 1+1

[6] Y. Kuramoto (1975)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 4 / 69

Page 6: Maryland 2010

Classical Kuramoto model

dθi(t) =

�ωi +

κN

N

∑j=1

sin(θj(t)−θi(t))

�dt +σ dξi(t), i = 1, . . . ,N

ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population

κ — measures the strength of coupling

[6] Y. Kuramoto (1975)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 4 / 69

Page 7: Maryland 2010

Classical Kuramoto model

dθi(t) =

�ωi +

κN

N

∑j=1

sin(θj(t)−θi(t))

�dt +σ dξi(t), i = 1, . . . ,N

ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population

κ — measures the strength of coupling

0 0.1 0.20.1

0.15

0.2

0.25

0.3 Locking

Incoherence

κ

κ < κc(γ)

γ

Synchrony

Incoherence

[6] Y. Kuramoto (1975)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 4 / 69

Page 8: Maryland 2010

Movies of incoherence and synchrony solution

Incoherence Synchrony

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 5 / 69

Page 9: Maryland 2010

Problem statement

Dynamics of ith oscillator

dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0

ui(t) — control 1- 1+1

ith oscillator seeks to minimize

ηi(ui;u−i) = limT→∞

1T

� T

0E[ c(θi;θ−i)� �� �

cost of anarchy

+ 12 Ru2

i� �� �cost of control

]ds

θ−i = (θj)j�=iR — control penalty

c(·) — cost function

c(θi;θ−i) =1N ∑

j�=ic•(θi,θj), c• ≥ 0

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 6 / 69

Page 10: Maryland 2010

Problem statement

Dynamics of ith oscillator

dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0

ui(t) — control 1- 1+1

ith oscillator seeks to minimize

ηi(ui;u−i) = limT→∞

1T

� T

0E[ c(θi;θ−i)� �� �

cost of anarchy

+ 12 Ru2

i� �� �cost of control

]ds

θ−i = (θj)j�=iR — control penalty

c(·) — cost function

c(θi;θ−i) =1N ∑

j�=ic•(θi,θj), c• ≥ 0

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 6 / 69

Page 11: Maryland 2010

Problem statement

Dynamics of ith oscillator

dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0

ui(t) — control 1- 1+1

ith oscillator seeks to minimize

ηi(ui;u−i) = limT→∞

1T

� T

0E[ c(θi;θ−i)� �� �

cost of anarchy

+ 12 Ru2

i� �� �cost of control

]ds

θ−i = (θj)j�=iR — control penalty

c(·) — cost function

c(θi;θ−i) =1N ∑

j�=ic•(θi,θj), c• ≥ 0

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 6 / 69

Page 12: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 13: Maryland 2010

Motivation Why a game?

Quiz

In the video you just watched, why were theindividuals walking strangely?

A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an

Egyptian."D. The individuals were trying to maintain their balance.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69

Page 14: Maryland 2010

Motivation Why a game?

Quiz

In the video you just watched, why were theindividuals walking strangely?

A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an

Egyptian."D. The individuals were trying to maintain their balance.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69

Page 15: Maryland 2010

Motivation Why a game?

Quiz

In the video you just watched, why were theindividuals walking strangely?

A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an

Egyptian."D. The individuals were trying to maintain their balance.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69

Page 16: Maryland 2010

Motivation Why a game?

Quiz

In the video you just watched, why were theindividuals walking strangely?

A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an

Egyptian."D. The individuals were trying to maintain their balance.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69

Page 17: Maryland 2010

Motivation Why a game?

Quiz

In the video you just watched, why were theindividuals walking strangely?

A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an

Egyptian."D. The individuals were trying to maintain their balance.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69

Page 18: Maryland 2010

Motivation Why a game?

“Rational irrationality”

“—behavior that, on the individual level, is perfectly reasonable butthat, when aggregated in the marketplace, produces calamity.”

ExamplesMillennium bridgeFinancial market

John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 9 / 69

Page 19: Maryland 2010

Motivation Why a game?

“Rational irrationality”

“—behavior that, on the individual level, is perfectly reasonable butthat, when aggregated in the marketplace, produces calamity.”

ExamplesMillennium bridgeFinancial market

John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 9 / 69

Page 20: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 21: Maryland 2010

Motivation Why Oscillators?

Hodgkin-Huxley type Neuron model

CdVdt

=−gT ·m2∞(V) ·h · (V−ET)

−gh · r · (V−Eh)− . . . . . .

dhdt

=h∞(V)−h

τh(V)drdt

=r∞(V)− r

τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000

−150

−100

−50

0

50

100

Voltage

time

Neural spike train

[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 11 / 69

Page 22: Maryland 2010

Motivation Why Oscillators?

Hodgkin-Huxley type Neuron model

CdVdt

=−gT ·m2∞(V) ·h · (V−ET)

−gh · r · (V−Eh)− . . . . . .

dhdt

=h∞(V)−h

τh(V)drdt

=r∞(V)− r

τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000

−150

−100

−50

0

50

100

Voltage

time

Neural spike train

[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 11 / 69

Page 23: Maryland 2010

Motivation Why Oscillators?

Hodgkin-Huxley type Neuron model

CdVdt

=−gT ·m2∞(V) ·h · (V−ET)

−gh · r · (V−Eh)− . . . . . .

dhdt

=h∞(V)−h

τh(V)drdt

=r∞(V)− r

τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000

−150

−100

−50

0

50

100

Voltage

time

Neural spike train

−100−50

050

100

00.2

0.40.6

0.810

0.1

0.2

0.3

0.4

Vh

r

Limit cyle

r

h v

[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 11 / 69

Page 24: Maryland 2010

Motivation Why Oscillators?

Hodgkin-Huxley type Neuron model

CdVdt

=−gT ·m2∞(V) ·h · (V−ET)

−gh · r · (V−Eh)− . . . . . .

dhdt

=h∞(V)−h

τh(V)drdt

=r∞(V)− r

τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000

−150

−100

−50

0

50

100

Voltage

time

Neural spike train

−100−50

050

100

00.2

0.40.6

0.810

0.1

0.2

0.3

0.4

Vh

r

Limit cyle

r

h v

Normal form reduction−−−−−−−−−−−−−→

θi = ωi +ui ·Φ(θi)

[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 11 / 69

Page 25: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 26: Maryland 2010

Problems and results Problem statement

Finite oscillator model

Dynamics of ith oscillator

dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0

ui(t) — control 1- 1+1

ith oscillator seeks to minimize

ηi(ui;u−i) = limT→∞

1T

� T

0E[ c(θi;θ−i)� �� �

cost of anarchy

+ 12 Ru2

i� �� �cost of control

]ds

θ−i = (θj)j�=iR — control penaltyc(·) — cost function

c(θi;θ−i) =1N ∑

j�=ic•(θi,θj), c• ≥ 0

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 13 / 69

Page 27: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 28: Maryland 2010

Problems and results Main results

1. Synchronization is a solution of game

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γIncoherence

R > Rc(γ)

Synchrony

Incoherence

dθi = (ωi +ui)dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds

1- 1+1

Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 15 / 69

Page 29: Maryland 2010

Problems and results Main results

1. Synchronization is a solution of game

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γIncoherence

R > Rc(γ)

Synchrony

Incoherence

dθi = (ωi +ui)dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds

0 0.1 0.20.1

0.15

0.2

0.25

0.3 Locking

Incoherence

κ

κ < κc(γ)

γ

Synchrony

Incoherence

dθi =

�ωi +

κN

N

∑j=1

sin(θj−θi)

�dt +σ dξi

Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 15 / 69

Page 30: Maryland 2010

Problems and results Main results

2. Kuramoto control is approximately optimal

−0.2

0

0.2

0.4

0.6

ω = 1

Kuramoto

PopulationDensity

Control laws

0 π 2π θ

ui =−A∗iR

1N ∑

j�=isin(θ −θj(t))

0 50 100 150 200 250 3002

2.5

3

3.5

4

4.5

5

5.5

6

t

k = 0.01; R = 1000

Ai

A*

Learning algorithm:dAi

dt=−ε . . .

Yin et.al. CDC 2010

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 16 / 69

Page 31: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 32: Maryland 2010

Derivation of model Overview

Overview of model derivation

dθi = (ωi +ui(t))dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi, t)+ 1

2 Ru2i ]ds

Influence

Influence

Mass

1 Mean-field approximationAssumption:

c(θi;θ−i(t)) =1N ∑

j�=ic•(θi,θj)

N→∞−−−−−−→ c(θ , t)

2 Optimal control of single oscillatorDecentralized control structure

[5] M. Huang, P. Caines, and R. Malhame, IEEE TAC, 2007 [HCM]

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 18 / 69

Page 33: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 34: Maryland 2010

Derivation of model Derivation steps

Single oscillator with given cost

Dynamics of the oscillator

dθi = (ωi +ui(t))dt +σ dξi, t ≥ 0

The cost function is assumed known

ηi(ui; c) = limT→∞

1T

� T

0E[ c(θi;θ−i) + 1

2 Ru2i (s)]ds

⇑c(θi(s),s)

HJB equation:

∂thi +ωi∂θ hi =1

2R(∂θ hi)2− c(θ , t)+η∗i −

σ2

2∂ 2

θθ hi

Optimal control law: u∗i (t) = ϕi(θ , t) =− 1R

∂θ hi(θ , t)

[1] D. P. Bertsekas (1995); [9] S. P. Meyn, IEEE TAC, 1997

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 20 / 69

Page 35: Maryland 2010

Derivation of model Derivation steps

Single oscillator with optimal control

Dynamics of the oscillator

dθi(t) =�

ωi−1R

∂θ hi(θi, t)�

dt +σ dξi(t)

Fokker-Planck equation for pdf p(θ , t,ωi)

FPK: ∂tp+ωi∂θ p =1R

∂θ [p(∂θ h)]+σ2

2∂ 2

θθ p

[7] A. Lasota and M. C. Mackey, “Chaos, Fractals and Noise,” Springer 1994

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 21 / 69

Page 36: Maryland 2010

Derivation of model Derivation steps

Mean-field Approximation

HJB equation for population

∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t)+η(ω)− σ2

2∂ 2

θθ h h(θ , t,ω)

Population density

∂tp+ω∂θ p =1R

∂θ [p(∂θ h)]+σ2

2∂ 2

θθ p p(θ , t,ω)

Enforce cost consistency

c(θ , t) =�

Ω

� 2π

0c•(θ ,ϑ)p(ϑ , t,ω)g(ω)dϑ dω

≈ 1N ∑

j�=ic•(θ ,ϑ)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 22 / 69

Page 37: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 38: Maryland 2010

Derivation of model PDE model

Summary

HJB: ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t) +η∗ − σ2

2∂ 2

θθ h ⇒ h(θ , t,ω)

FPK: ∂tp+ω∂θ p =1R

∂θ [p( ∂θ h )]+σ2

2∂ 2

θθ p ⇒ p(θ , t,ω)

Mean-field approx.: c(ϑ , t) =�

Ω

� 2π

0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω

1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 24 / 69

Page 39: Maryland 2010

Derivation of model PDE model

Summary

HJB: ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t) +η∗ − σ2

2∂ 2

θθ h ⇒ h(θ , t,ω)

FPK: ∂tp+ω∂θ p =1R

∂θ [p( ∂θ h )]+σ2

2∂ 2

θθ p ⇒ p(θ , t,ω)

Mean-field approx.: c(ϑ , t) =�

Ω

� 2π

0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω

1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 24 / 69

Page 40: Maryland 2010

Derivation of model PDE model

Summary

HJB: ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t) +η∗ − σ2

2∂ 2

θθ h ⇒ h(θ , t,ω)

FPK: ∂tp+ω∂θ p =1R

∂θ [p( ∂θ h )]+σ2

2∂ 2

θθ p ⇒ p(θ , t,ω)

Mean-field approx.: c(ϑ , t) =�

Ω

� 2π

0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω

1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 24 / 69

Page 41: Maryland 2010

Derivation of model PDE model

Summary

HJB: ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t) +η∗ − σ2

2∂ 2

θθ h ⇒ h(θ , t,ω)

FPK: ∂tp+ω∂θ p =1R

∂θ [p( ∂θ h )]+σ2

2∂ 2

θθ p ⇒ p(θ , t,ω)

Mean-field approx.: c(ϑ , t) =�

Ω

� 2π

0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω

1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 24 / 69

Page 42: Maryland 2010

Derivation of model PDE model

Summary

HJB: ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t) +η∗ − σ2

2∂ 2

θθ h ⇒ h(θ , t,ω)

FPK: ∂tp+ω∂θ p =1R

∂θ [p( ∂θ h )]+σ2

2∂ 2

θθ p ⇒ p(θ , t,ω)

Mean-field approx.: c(ϑ , t) =�

Ω

� 2π

0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω

1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 24 / 69

Page 43: Maryland 2010

Derivation of model PDE model

1. Solution of PDE gives ε-Nash equilibrium

Optimal control law

uoi =− 1

R∂θ h(θ(t), t,ω)

��ω=ωi

ε-Nash property (as N → ∞)

ηi(uoi ;uo

−i)≤ ηi(ui;uo−i)+O(

1√N

), i = 1, . . . ,N.

So, we look for solutions of PDEs.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 25 / 69

Page 44: Maryland 2010

Derivation of model PDE model

1. Solution of PDE gives ε-Nash equilibrium

Optimal control law

uoi =− 1

R∂θ h(θ(t), t,ω)

��ω=ωi

ε-Nash property (as N → ∞)

ηi(uoi ;uo

−i)≤ ηi(ui;uo−i)+O(

1√N

), i = 1, . . . ,N.

So, we look for solutions of PDEs.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 25 / 69

Page 45: Maryland 2010

Derivation of model PDE model

1. Solution of PDE gives ε-Nash equilibrium

Optimal control law

uoi =− 1

R∂θ h(θ(t), t,ω)

��ω=ωi

ε-Nash property (as N → ∞)

ηi(uoi ;uo

−i)≤ ηi(ui;uo−i)+O(

1√N

), i = 1, . . . ,N.

So, we look for solutions of PDEs.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 25 / 69

Page 46: Maryland 2010

Derivation of model PDE model

2. Incoherence solution (PDE)

Incoherence solution

h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1

2πincoherence

h(θ , t,ω) = 0 ⇒ ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t)+η∗ − σ2

2∂ 2

θθ h

∂tp+ω∂θ p =1R

∂θ [p(∂θ h)]+σ2

2∂ 2

θθ p

c(θ , t) =�

Ω

� 2π

0c•(θ ,ϑ)p(ϑ , t,ω)g(ω)dϑ dω

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 26 / 69

Page 47: Maryland 2010

Derivation of model PDE model

2. Incoherence solution (PDE)

Incoherence solution

h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1

2πincoherence

h(θ , t,ω) = 0⇒ ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t)+η∗ − σ2

2∂ 2

θθ h

p(θ , t,ω) = 12π ⇒ ∂tp+ω∂θ p =

1R

∂θ [p(∂θ h)]+σ2

2∂ 2

θθ p

c(θ , t) =�

Ω

� 2π

0c•(θ ,ϑ)p(ϑ , t,ω)g(ω)dϑ dω

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 26 / 69

Page 48: Maryland 2010

Derivation of model PDE model

2. Incoherence solution (PDE)

Assume c•(ϑ ,θ) = c•(ϑ −θ) = 12 sin2

�ϑ −θ

2

Incoherence solution

h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1

Optimal control u =− 1R

∂θ h = 0

Average cost

c(θ , t) =�

Ω

� 2π

0

12 sin2

�θ −ϑ

2

�1

2πg(ω)dϑ dω

η∗(ω) = c(θ , t) =14

=: η0 for all ω ∈Ω

incoherence soln.

No cost of control

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 27 / 69

Page 49: Maryland 2010

Derivation of model PDE model

2. Incoherence solution (Finite population)

Closed-loop dynamics dθi = (ωi + ui����=0

)dt +σ dξi(t)

Average cost

ηi = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i� �� �

=0

]dt

= limT→∞

1N ∑

j�=i

1T

� T

0E[ 1

2 sin2�

θi(t)−θj(t)2

�]dt

=1N ∑

j�=i

� 2π

0E[ 1

2 sin2�

θi(t)−ϑ2

�]

12π

dϑ =N−1

Nη0

incoherence

ε-Nash property

ηi(uoi ;uo

−i)≤ ηi(ui;uo−i)+O(

1√N

), i = 1, . . . ,N.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 28 / 69

Page 50: Maryland 2010

Derivation of model PDE model

2. Incoherence solution (Finite population)

Closed-loop dynamics dθi = (ωi + ui����=0

)dt +σ dξi(t)

Average cost

ηi = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i� �� �

=0

]dt

= limT→∞

1N ∑

j�=i

1T

� T

0E[ 1

2 sin2�

θi(t)−θj(t)2

�]dt

=1N ∑

j�=i

� 2π

0E[ 1

2 sin2�

θi(t)−ϑ2

�]

12π

dϑ =N−1

Nη0

incoherence

ε-Nash property

ηi(uoi ;uo

−i)≤ ηi(ui;uo−i)+O(

1√N

), i = 1, . . . ,N.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 28 / 69

Page 51: Maryland 2010

Derivation of model PDE model

3. Synchronization is a solution of game

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γIncoherence

R > Rc(γ)

Synchrony

IncoherenceR−1/ 2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.350. 1

0.15

0. 2

0.25

ω= 0.95ω= 1ω= 1.05

R > Rc

η(ω) = η0

R < R

c

η(ω) < η0

c

dθi = (ωi +ui)dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds η(ω) = min

uiηi(ui;uo

−i)

0 1 2 3 4 5 60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1t = 38.24

Synchrony solution of

Yin et al., “Synchronization of oscillators is a game,” ACC2010P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 29 / 69

Page 52: Maryland 2010

Derivation of model PDE model

3. Synchronization is a solution of game

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γIncoherence

R > Rc(γ)

Synchrony

IncoherenceR−1/ 2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.350. 1

0.15

0. 2

0.25

ω= 0.95ω= 1ω= 1.05

R > Rc

η(ω) = η0

R < R

c

η(ω) < η0

c

incoherence soln.

dθi = (ωi +ui)dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds η(ω) = min

uiηi(ui;uo

−i)

0 1 2 3 4 5 60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1t = 38.24

Synchrony solution of

Yin et al., “Synchronization of oscillators is a game,” ACC2010P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 29 / 69

Page 53: Maryland 2010

Derivation of model PDE model

3. Synchronization is a solution of game

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γIncoherence

R > Rc(γ)

Synchrony

IncoherenceR−1/ 2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.350. 1

0.15

0. 2

0.25

ω= 0.95ω= 1ω= 1.05

R > Rc

η(ω) = η0

R < R

c

η(ω) < η0

c

synchrony soln.

dθi = (ωi +ui)dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds η(ω) = min

uiηi(ui;uo

−i)

0 1 2 3 4 5 60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1t = 38.24

Synchrony solution of

Yin et al., “Synchronization of oscillators is a game,” ACC2010P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 29 / 69

Page 54: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 55: Maryland 2010

Analysis of phase transition Incoherence solution

Overview of the steps

HJB: ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t) +η∗ − σ2

2∂ 2

θθ h ⇒ h(θ , t,ω)

FPK: ∂tp+ω∂θ p =1R

∂θ [p( ∂θ h )]+σ2

2∂ 2

θθ p ⇒ p(θ , t,ω)

c(ϑ , t) =�

Ω

� 2π

0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω

Assume c•(ϑ ,θ) = c•(ϑ −θ) = 12 sin2

�ϑ −θ

2

Incoherence solution

h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 31 / 69

Page 56: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 57: Maryland 2010

Analysis of phase transition Bifurcation analysis

Linearization and spectra

Linearized PDE (about incoherence solution)

∂∂ t

z(θ , t,ω) =

�−ω∂θ h− c− σ2

2 ∂ 2θθ h

−ω∂θ p+ 12πR ∂ 2

θθ h+ σ2

2 ∂ 2θθ p

�=: LRz(θ , t,ω)

Spectrum of the linear operator1 Continuous spectrum {S(k)}+∞

k=−∞S(k) :=�

λ ∈ C��λ = ±σ2

2k2− kωi for all ω ∈Ω

2 Discrete spectrum

Characteristic eqn:1

8R

Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 33 / 69

Page 58: Maryland 2010

Analysis of phase transition Bifurcation analysis

Linearization and spectra

Linearized PDE (about incoherence solution)

∂∂ t

z(θ , t,ω) =

�−ω∂θ h− c− σ2

2 ∂ 2θθ h

−ω∂θ p+ 12πR ∂ 2

θθ h+ σ2

2 ∂ 2θθ p

�=: LRz(θ , t,ω)

Spectrum of the linear operator1 Continuous spectrum {S(k)}+∞

k=−∞S(k) :=�

λ ∈ C��λ = ±σ2

2k2− kωi for all ω ∈Ω

2 Discrete spectrum

Characteristic eqn:1

8R

Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 33 / 69

Page 59: Maryland 2010

Analysis of phase transition Bifurcation analysis

Linearization and spectra

Linearized PDE (about incoherence solution)

∂∂ t

z(θ , t,ω) =

�−ω∂θ h− c− σ2

2 ∂ 2θθ h

−ω∂θ p+ 12πR ∂ 2

θθ h+ σ2

2 ∂ 2θθ p

�=: LRz(θ , t,ω)

Spectrum of the linear operator1 Continuous spectrum {S(k)}+∞

k=−∞S(k) :=�

λ ∈ C��λ = ±σ2

2k2− kωi for all ω ∈Ω

�−0.2 −0.1 0 0.1 0.2 0.3

−3

−2

−1

0

1

2

3

real

imag

γ = 0.1

R decreases

k=2 k=2

k=1 k=1

2 Discrete spectrum

Characteristic eqn:1

8R

Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 33 / 69

Page 60: Maryland 2010

Analysis of phase transition Bifurcation analysis

Linearization and spectra

Linearized PDE (about incoherence solution)

∂∂ t

z(θ , t,ω) =

�−ω∂θ h− c− σ2

2 ∂ 2θθ h

−ω∂θ p+ 12πR ∂ 2

θθ h+ σ2

2 ∂ 2θθ p

�=: LRz(θ , t,ω)

Spectrum of the linear operator1 Continuous spectrum {S(k)}+∞

k=−∞S(k) :=�

λ ∈ C��λ = ±σ2

2k2− kωi for all ω ∈Ω

�−0.2 −0.1 0 0.1 0.2 0.3

−3

−2

−1

0

1

2

3

real

imag

γ = 0.1

R decreases

k=2 k=2

k=1 k=1

2 Discrete spectrum

Characteristic eqn:1

8R

Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 33 / 69

Page 61: Maryland 2010

Analysis of phase transition Bifurcation analysis

Bifurcation diagram (Hamiltonian Hopf)

Characteristic eqn:1

8R

Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

Stability proof

[3] Dellnitz et al., Int. Series Num. Math., 1992

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 34 / 69

Page 62: Maryland 2010

Analysis of phase transition Bifurcation analysis

Bifurcation diagram (Hamiltonian Hopf)

Characteristic eqn:1

8R

Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

Stability proof

−0.2 −0.1 0 0.1 0.2

-0.6

-0.8

-1

-1.2

-1.4

real

imag

(a)

Cont. spectrum; ind. of RDisc. spectrum; fn. of R

Bifurcation point

[3] Dellnitz et al., Int. Series Num. Math., 1992

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 34 / 69

Page 63: Maryland 2010

Analysis of phase transition Bifurcation analysis

Bifurcation diagram (Hamiltonian Hopf)

Characteristic eqn:1

8R

Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

Stability proof

−0.2 −0.1 0 0.1 0.2

-0.6

-0.8

-1

-1.2

-1.4

real

imag

(a)

Cont. spectrum; ind. of RDisc. spectrum; fn. of R

Bifurcation point

0 0.05 0.1 0.15 0.215

20

25

30

35

40

45

50

Incoherence

R > RR

c(γ

γ

) (c)

Synchrony

0.05

[3] Dellnitz et al., Int. Series Num. Math., 1992

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 34 / 69

Page 64: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 65: Maryland 2010

Analysis of phase transition Numerics

Numerical solution of PDEs

Incoherence; R = 60incoherence

incoherence

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 36 / 69

Page 66: Maryland 2010

Analysis of phase transition Numerics

Numerical solution of PDEs

Incoherence; R = 60incoherence

incoherence

Synchrony; R = 10

synchrony

synchrony

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 36 / 69

Page 67: Maryland 2010

Analysis of phase transition Numerics

Bifurcation diagram

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γIncoherence

R > Rc(γ)

Synchrony

Incoherence R−1/2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.35

0. 1

0.15

0. 2

0.25

ω = 0.95

ω = 1

ω = 1.05

R > Rc

η(ω) = η0

R < Rc

η(ω) < η0

dθi = (ωi +ui)dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 37 / 69

Page 68: Maryland 2010

Analysis of phase transition Numerics

Bifurcation diagram

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γIncoherence

R > Rc(γ)

Synchrony

Incoherence R−1/2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.35

0. 1

0.15

0. 2

0.25

ω = 0.95

ω = 1

ω = 1.05

R > Rc

η(ω) = η0

R < Rc

η(ω) < η0

incoherence soln.

dθi = (ωi +ui)dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 37 / 69

Page 69: Maryland 2010

Analysis of phase transition Numerics

Bifurcation diagram

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γIncoherence

R > Rc(γ)

Synchrony

Incoherence R−1/2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.35

0. 1

0.15

0. 2

0.25

ω = 0.95

ω = 1

ω = 1.05

R > Rc

η(ω) = η0

R < Rc

η(ω) < η0

synchrony soln.

dθi = (ωi +ui)dt +σ dξi

ηi(ui;u−i) = limT→∞

1T

� T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 37 / 69

Page 70: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 71: Maryland 2010

Learning Q-function approximation

Comparison to Kuramoto law

Control law u = ϕ(θ , t,ω)

−0.2

0

0.2

0.4

0.6

ω = 0.95ω = 1ω = 1.05

PopulationDensity

Control laws

0 π 2π θ

Equivalent control law in Kuramoto oscillator

u(Kur)i =

κN

N

∑j=1

sin(θj(t)−θi)N→∞≈ κ0 sin(ϑ0 + t−θi)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 39 / 69

Page 72: Maryland 2010

Learning Q-function approximation

Comparison to Kuramoto law

Control law u = ϕ(θ , t,ω)

−0.2

0

0.2

0.4

0.6

ω = 0.95

ω = 1

ω = 1.05

Kuramoto

Population

Density

Control laws

0 π 2π θ

Equivalent control law in Kuramoto oscillator

u(Kur)i =

κN

N

∑j=1

sin(θj(t)−θi)N→∞≈ κ0 sin(ϑ0 + t−θi)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 39 / 69

Page 73: Maryland 2010

Learning Q-function approximation

Optimality equation minui

{c(θ ;θ−i(t))+ 12 Ru2

i +Duihi(θ , t)� �� �=: Hi(θ ,ui;θ−i(t))

} = η∗i

Optimal control law Kuramoto law

u∗i =− 1R

∂θ hi(θ , t) u(Kur)i =−κ

N ∑j�=i

sin(θi−θj(t))

Parameterization:

H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1

2 Ru2i +(ωi−1+ui)AiS(φi)+

σ2

2AiC(φi)

where

S(φ)(θ ,θ−i) =1N ∑

j�=isin(θ −θj−φ), C(φ)(θ ,θ−i) =

1N ∑

j�=icos(θ −θj−φ)

Approx. optimal control:

u(Ai,φi)i = argmin

ui

{H(Ai,φi)i (θ ,ui;θ−i(t))} =−Ai

RS(φi)(θ ,θ−i)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 40 / 69

Page 74: Maryland 2010

Learning Q-function approximation

Optimality equation minui

{c(θ ;θ−i(t))+ 12 Ru2

i +Duihi(θ , t)� �� �=: Hi(θ ,ui;θ−i(t))

} = η∗i

Optimal control law Kuramoto law

u∗i =− 1R

∂θ hi(θ , t) u(Kur)i =−κ

N ∑j�=i

sin(θi−θj(t))

Parameterization:

H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1

2 Ru2i +(ωi−1+ui)AiS(φi)+

σ2

2AiC(φi)

where

S(φ)(θ ,θ−i) =1N ∑

j�=isin(θ −θj−φ), C(φ)(θ ,θ−i) =

1N ∑

j�=icos(θ −θj−φ)

Approx. optimal control:

u(Ai,φi)i = argmin

ui

{H(Ai,φi)i (θ ,ui;θ−i(t))} =−Ai

RS(φi)(θ ,θ−i)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 40 / 69

Page 75: Maryland 2010

Learning Q-function approximation

Optimality equation minui

{c(θ ;θ−i(t))+ 12 Ru2

i +Duihi(θ , t)� �� �=: Hi(θ ,ui;θ−i(t))

} = η∗i

Optimal control law Kuramoto law

u∗i =− 1R

∂θ hi(θ , t) u(Kur)i =−κ

N ∑j�=i

sin(θi−θj(t))

Parameterization:

H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1

2 Ru2i +(ωi−1+ui)AiS(φi)+

σ2

2AiC(φi)

where

S(φ)(θ ,θ−i) =1N ∑

j�=isin(θ −θj−φ), C(φ)(θ ,θ−i) =

1N ∑

j�=icos(θ −θj−φ)

Approx. optimal control:

u(Ai,φi)i = argmin

ui

{H(Ai,φi)i (θ ,ui;θ−i(t))} =−Ai

RS(φi)(θ ,θ−i)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 40 / 69

Page 76: Maryland 2010

Learning Q-function approximation

Optimality equation minui

{c(θ ;θ−i(t))+ 12 Ru2

i +Duihi(θ , t)� �� �=: Hi(θ ,ui;θ−i(t))

} = η∗i

Optimal control law Kuramoto law

u∗i =− 1R

∂θ hi(θ , t) u(Kur)i =−κ

N ∑j�=i

sin(θi−θj(t))

Parameterization:

H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1

2 Ru2i +(ωi−1+ui)AiS(φi)+

σ2

2AiC(φi)

where

S(φ)(θ ,θ−i) =1N ∑

j�=isin(θ −θj−φ), C(φ)(θ ,θ−i) =

1N ∑

j�=icos(θ −θj−φ)

Approx. optimal control:

u(Ai,φi)i = argmin

ui

{H(Ai,φi)i (θ ,ui;θ−i(t))} =−Ai

RS(φi)(θ ,θ−i)

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 40 / 69

Page 77: Maryland 2010

1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Page 78: Maryland 2010

Learning Steepest descent algorithm

Bellman error:

Pointwise: L (Ai,φi)(θ , t) = minui

{H(Ai,φi)i }−η (A∗i ,φ∗i )

i

Simple gradient descent algorithm

e(Ai,φi) =2

∑k=1

|�L (Ai,φi), ϕk(θ)�|2

dAi

dt=−ε

de(Ai,φi)dAi

,dφi

dt=−ε

de(Ai,φi)dφi

(∗)

Theorem (Convergence)

Assume population is in synchrony. The ith oscillator updatesaccording to (∗). Then

Ai(t)→ A∗ =1

2σ2

The pointwise Bellman error L (Ai,0)(θ , t) = ε(R)cos2(θ − t)

where ε(R) =1

16Rσ4

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 42 / 69

Page 79: Maryland 2010

Learning Steepest descent algorithm

Phase transition

Suppose all oscillators use approx. optimal control law:

ui =−A∗

R1N ∑

j�=isin(θi−θj(t))

then the phase transition boundary is

Rc(γ) =

�1

2σ4 if γ = 01

4σ2γ tan−1�

2γσ2

�if γ > 0

0 50 100 150 200 250 3002

2.5

3

3.5

4

4.5

5

5.5

6

t

k = 0.01; R = 1000

Ai

A*

0 0.05 0.1 0.15 0.215

20

25

30

35

40

45

50

γ

R

PDELearning

Incoherence

Synchrony

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 43 / 69

Page 80: Maryland 2010

Thank you!

Website: http://www.mechse.illinois.edu/research/mehtapg

Huibing Yin Sean P. Meyn Uday V. Shanbhag

H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010

Page 81: Maryland 2010

Bibliography

Dimitri P. Bertsekas.Dynamic Programming and Optimal Control, volume 1.Athena Scientific, Belmont, Massachusetts, 1995.

Eric Brown, Jeff Moehlis, and Philip Holmes.On the phase reduction and response dynamics of neuraloscillator populations.Neural Computation, 16(4):673–715, 2004.

M. Dellnitz, J.E. Marsden, I. Melbourne, and J. Scheurle.Generic bifurcations of pendula.Int. Series Num. Math., 104:111–122, 1992.

J. Guckenheimer.Isochrons and phaseless sets.J. Math. Biol., 1:259–273, 1975.

Minyi Huang, Peter E. Caines, and Roland P. Malhame.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 69 / 69

Page 82: Maryland 2010

Bibliography

Large-population cost-coupled LQG problems with nonuniformagents: Individual-mass behavior and decentralized ε-nashequilibria.IEEE transactions on automatic control, 52(9):1560–1571, 2007.

Y. Kuramoto.International Symposium on Mathematical Problems in TheoreticalPhysics, volume 39 of Lecture Notes in Physics.Springer-Verlag, 1975.

Andrzej Lasota and Michael C. Mackey.Chaos, Fractals and Noise.Springer, 1994.

P. Mehta and S. Meyn.Q-learning and Pontryagin’s Minimum Principle.To appear, 48th IEEE Conference on Decision and Control,December 16-18 2009.

Sean P. Meyn.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 69 / 69

Page 83: Maryland 2010

Bibliography

The policy iteration algorithm for average reward markov decisionprocesses with general state space.IEEE Transactions on Automatic Control, 42(12):1663–1680,December 1997.

S. H. Strogatz and R. E. Mirollo.Stability of incoherence in a population of coupled oscillators.Journal of Statistical Physics, 63:613–635, May 1991.

Steven H. Strogatz, Daniel M. Abrams, Bruno Eckhardt, andEdward Ott.Theoretical mechanics: Crowd synchrony on the millenniumbridge.Nature, 438:43–44, 2005.

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 69 / 69