
Control Techniques for Complex Systems


DESCRIPTION

The systems & control research community has developed a range of tools for understanding and controlling complex systems. Some of these techniques are model-based: using a simple model, we obtain insight regarding the structure of effective policies for control. The talk will survey how this point of view can be applied to resource allocation problems, such as those that will arise in the next-generation energy grid. We also show how insight from this kind of analysis can be used to construct architectures for reinforcement learning algorithms used in a broad range of applications. Much of the talk is a survey from a recent book by the author with a similar title, Control Techniques for Complex Networks, Cambridge University Press, 2007. https://netfiles.uiuc.edu/meyn/www/spm_files/CTCN/CTCN.html


Page 1: Control Techniques for Complex Systems

Control Techniques for Complex Systems
Department of Electrical & Computer Engineering, University of Florida

Sean P. Meyn
Coordinated Science Laboratory and the Department of Electrical and Computer Engineering,
University of Illinois at Urbana-Champaign, USA

April 21, 2011

Page 2: Control Techniques for Complex Systems

Outline

[Book covers: Control Techniques for Complex Networks, Sean Meyn; and Markov Chains and Stochastic Stability, S. P. Meyn and R. L. Tweedie, the latter displaying the drift criteria
$\pi(f) < \infty$, $\ \Delta V(x) \le -f(x) + b\,\mathbb{I}_C(x)$, $\ \|P^n(x,\,\cdot\,) - \pi\|_f \to 0$, $\ \sup_{x\in C} \mathsf{E}_x[S_{\tau_C}(f)] < \infty$]

1 Control Techniques
2 Complex Networks
3 Architectures for Adaptation & Learning
4 Next Steps

Page 3: Control Techniques for Complex Systems

Control Techniques

System model:  ???

$$\frac{d}{dt}\alpha = \mu\,\sigma - C\alpha + \cdots \qquad
\frac{d}{dt}q = \tfrac{1}{2}\,\mu\, I^{-1}\,(C - \cdots) \qquad
\frac{d}{dt}\theta = q$$

Control Techniques?

Page 4: Control Techniques for Complex Systems

Control Techniques

Typical steps to control design

. Obtain a simple model that captures essential structure
  – An equilibrium model if the goal is regulation

System model:  ???

$$\frac{d}{dt}\alpha = \mu\,\sigma - C\alpha + \cdots \qquad
\frac{d}{dt}q = \tfrac{1}{2}\,\mu\, I^{-1}\,(C - \cdots) \qquad
\frac{d}{dt}\theta = q$$

. Obtain a feedback design, using dynamic programming, LQG, loop shaping, ...

. Design for performance and reliability: test via simulations and experiments, and refine the design

If these steps fail, we may have to re-engineer the system (e.g., introduce new sensors), and start over.

This point of view is unique to control.

Page 8: Control Techniques for Complex Systems

Control Techniques

Typical steps to scheduling

A simplified model of a semiconductor manufacturing facility.
Similar demand-driven models can be used to model the allocation of locational reserves in a power grid.

[Figure: network with exogenous demand 1 and demand 2]
Inventory model: controlled work-release, controlled routing, uncertain demand

. Obtain a simple model – frequently based on exponential statistics, to obtain a Markov model
. Obtain a feedback design based on heuristics, or dynamic programming
. Performance evaluation via computation (e.g., Neuts' matrix-geometric methods)

Difficulty: a Markov model is not simple enough!

With the 16 buffers truncated to 0 ≤ x ≤ 10, policy synthesis reduces to a linear program of dimension $11^{16}$!

Page 12: Control Techniques for Complex Systems

Control Techniques

Control-theoretic approach to scheduling: $\dfrac{d}{dt} q = Bu + \alpha$

[Figure: inventory model with demand 1 and demand 2]
Inventory model: controlled work-release, controlled routing, uncertain demand

q: queue lengths, evolving on $\mathbb{R}^{16}_+$
u: scheduling/routing decisions — a convex relaxation
α: mean exogenous arrivals of work
B: captures the network topology

Control-theoretic approach to scheduling:
dimension reduced, from a linear program of dimension $11^{16}$ ... to an HJB equation of dimension 16.

Does this solve the problem?
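To make the fluid model concrete, here is a minimal simulation sketch (added for illustration, not from the talk): a hypothetical two-buffer tandem instance of $\frac{d}{dt}q = Bu + \alpha$, with assumed rates and a simple work-conserving policy in place of the optimal one.

```python
import numpy as np

# Minimal fluid-model sketch (illustrative, not the talk's 16-buffer network):
# a two-buffer tandem instance of d/dt q = B u + alpha. Station 1 drains
# buffer 1 into buffer 2; station 2 drains buffer 2. Rates are assumptions.
B = np.array([[-1.0,  0.0],
              [ 1.0, -1.0]])
alpha = np.array([0.9, 0.0])      # mean exogenous arrival rates
mu = np.array([1.2, 1.1])         # maximal service rates

def policy(q):
    """Work-conserving allocation: full rate wherever work remains."""
    return np.where(q > 1e-9, mu, 0.0)

def simulate(q0, dt=1e-3, T=40.0):
    q = np.array(q0, float)
    for _ in range(int(T / dt)):
        q = np.maximum(q + dt * (B @ policy(q) + alpha), 0.0)  # stay in R^2_+
    return q

print(simulate([20.0, 5.0]))      # both buffers drain, since mu exceeds alpha
```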

Page 15: Control Techniques for Complex Systems

Complex Networks

[Figure: a network shown in three regimes — uncongested, congested, and highly congested]

First, a review of some control theory...

Page 17: Control Techniques for Complex Systems

Complex Networks

Dynamic Programming Equations
Deterministic model: $\dot{x} = f(x,u)$

Controlled generator:
$$\mathcal{D}_u h\,(x) = \frac{d}{dt}\, h(x(t))\Big|_{t=0,\ x(0)=x,\ u(0)=u} = f(x,u)\cdot\nabla h\,(x)$$

Minimal total cost:
$$J^*(x) = \inf_{U} \int_0^\infty c(x(t),u(t))\,dt, \qquad x(0) = x$$

HJB equation:
$$\min_u \bigl\{\, c(x,u) + \mathcal{D}_u J^*\,(x) \,\bigr\} = 0$$
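A worked scalar example may help fix ideas (added here for illustration): a single fluid queue $\dot{x} = -u + \alpha$ with $0 \le u \le \mu$, $\mu > \alpha$, and cost $c(x,u) = x$.

```latex
% Draining at the maximal rate u = mu empties the queue at T = x/(mu - alpha):
J^*(x) = \int_0^{T} \bigl(x - (\mu-\alpha)t\bigr)\,dt = \frac{x^2}{2(\mu-\alpha)} .
% The HJB equation is then verified: with \nabla J^*(x) = x/(\mu-\alpha) \ge 0,
\min_{0\le u\le \mu}\Bigl\{\, x + (\alpha-u)\,\frac{x}{\mu-\alpha} \,\Bigr\}
   = x + (\alpha-\mu)\,\frac{x}{\mu-\alpha} = 0, \qquad \text{attained at } u=\mu .
```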

Page 21: Control Techniques for Complex Systems

Complex Networks

Dynamic Programming Equations
Diffusion model: $dX = f(X,U)\,dt + \sigma(X)\,dN$

Controlled generator:
$$\mathcal{D}_u h\,(x) = \frac{d}{dt}\,\mathsf{E}[h(X(t))]\Big|_{t=0,\ X(0)=x,\ U(0)=u} = f(x,u)\cdot\nabla h\,(x) + \tfrac{1}{2}\operatorname{trace}\bigl(\sigma(x)^{\mathsf{T}}\,\nabla^2 h\,(x)\,\sigma(x)\bigr)$$

Minimal average cost:
$$\eta^* = \inf_U\ \lim_{T\to\infty} \frac{1}{T}\int_0^T c(X(t),U(t))\,dt$$

ACOE (Average Cost Optimality Equation):
$$\min_u \bigl\{\, c(x,u) + \mathcal{D}_u h^*\,(x) \,\bigr\} = \eta^*$$

$h^*$ is the relative value function
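A scalar sanity check (added for illustration): applying the generator to $h(x) = x^2$ in one dimension gives

```latex
% For h(x) = x^2, \nabla h(x) = 2x and \nabla^2 h(x) = 2, so
\mathcal{D}_u h\,(x) \;=\; f(x,u)\cdot 2x \;+\; \tfrac{1}{2}\,\sigma(x)\cdot 2 \cdot \sigma(x)
              \;=\; 2x\,f(x,u) \;+\; \sigma^2(x) .
```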

Page 24: Control Techniques for Complex Systems

Complex Networks

Dynamic Programming Equations
MDP model: $X(t+1) - X(t) = f(X(t),U(t),N(t+1))$

Controlled generator:
$$\mathcal{D}_u h\,(x) = \mathsf{E}\bigl[h(X(1)) - h(X(0))\bigr] = \mathsf{E}\bigl[h\bigl(x + f(x,u,N)\bigr)\bigr] - h(x)$$

Minimal average cost:
$$\eta^* = \inf_U\ \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} c(X(t),U(t))$$

ACOE (Average Cost Optimality Equation):
$$\min_u \bigl\{\, c(x,u) + \mathcal{D}_u h^*\,(x) \,\bigr\} = \eta^*$$

$h^*$ is the relative value function
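To see the ACOE solved numerically, here is a minimal sketch (illustrative, not from the talk) of relative value iteration on a hypothetical single-queue MDP; the rates, cost, and truncation level are all assumptions.

```python
import numpy as np

# Relative value iteration (RVI) sketch for the ACOE, on a hypothetical
# single-queue MDP: states {0,...,N}, action u in {0,1} (idle or serve),
# arrival prob a, service prob m, cost c(x,u) = x + r*u.
N, a, m, r = 30, 0.4, 0.6, 0.5

def step_probs(x, u):
    """(next_state, prob) pairs under action u; arrival and departure are
    collapsed into a birth-death step for simplicity."""
    dep = m if (u == 1 and x > 0) else 0.0
    up, down = a * (1 - dep), dep * (1 - a)
    return [(min(x + 1, N), up), (max(x - 1, 0), down), (x, 1 - up - down)]

def bellman(h):
    return np.array([
        min(x + r * u + sum(p * h[y] for y, p in step_probs(x, u))
            for u in (0, 1))
        for x in range(N + 1)
    ])

h = np.zeros(N + 1)
for _ in range(2000):
    Th = bellman(h)
    eta, h = Th[0], Th - Th[0]     # normalize at reference state 0
print("eta* ≈", eta)               # h now approximates the relative value fn
```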

Page 26: Control Techniques for Complex Systems

Complex Networks

Approximate Dynamic Programming
ODE model from the MDP model, $X(t+1) - X(t) = f(X(t),U(t),N(t+1))$

Mean drift: $f(x,u) = \mathsf{E}\bigl[X(t+1) - X(t) \mid X(t) = x,\ U(t) = u\bigr]$

Fluid model: $\dot{x}(t) = f(x(t),u(t))$

First-order Taylor series approximation:
$$\mathcal{D}_u h\,(x) = \mathsf{E}\bigl[h\bigl(x + f(x,u,N)\bigr)\bigr] - h(x) \approx f(x,u)\cdot\nabla h\,(x)$$

A second-order Taylor series expansion leads to a diffusion model.
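A minimal sketch of this construction (illustrative scalar model; the increment function and noise law are assumptions): estimate the mean drift by Monte Carlo and compare $\mathcal{D}_u h\,(x)$ with its first-order approximation for $h(x) = x^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step increment f(x, u, N), with noise N uniform on {-1,0,1}
def f(x, u, n):
    return u * (n - 0.2) - 0.1 * x

def mean_drift(x, u, samples=100_000):
    n = rng.integers(-1, 2, size=samples)    # Monte Carlo over the noise
    return np.mean(f(x, u, n))

x, u = 2.0, 1.0
h = lambda z: z ** 2
n = rng.integers(-1, 2, size=100_000)
exact = np.mean(h(x + f(x, u, n))) - h(x)    # D_u h(x), the MDP generator
approx = mean_drift(x, u) * 2 * x            # f(x,u) . grad h(x) for h = x^2
print(exact, approx)   # ≈ -0.77 vs -1.6: the gap is the second-order term
```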

Page 30: Control Techniques for Complex Systems

Complex Networks

ADP for Stochastic Networks
Conclusions as of April 21, 2011

Stochastic model: $Q(t+1) - Q(t) = B(t+1)\,U(t) + A(t+1)$
Fluid model: $\frac{d}{dt} q(t) = B u(t) + \alpha$, with cost $c(x,u) = |x|$
Relative value function $h^*$; total cost value function $J^*$

[Figure: inventory model with demand 1 and demand 2]
Inventory model: controlled work-release, controlled routing, uncertain demand

q: queue lengths, evolving on $\mathbb{R}^{16}_+$
u: scheduling/routing decisions — a convex relaxation
α: mean exogenous arrivals of work
B: captures the network topology
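For concreteness, a minimal sketch of the stochastic recursion (illustrative two-buffer tandem instance; the Bernoulli service and arrival probabilities are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the recursion Q(t+1) - Q(t) = B(t+1)U(t) + A(t+1)
# (illustrative two-buffer tandem, not the talk's 16-buffer network).
def B_sample():
    s = rng.random(2) < [0.6, 0.55]        # service completions this slot
    return np.array([[-s[0], 0.0],
                     [ s[0], -s[1]]])

Q, total, T = np.zeros(2), 0.0, 10_000
for t in range(T):
    U = (Q > 0).astype(float)              # non-idling policy
    A = (rng.random(2) < [0.4, 0.0]).astype(float)   # exogenous arrivals
    Q = np.maximum(Q + B_sample() @ U + A, 0.0)
    total += np.abs(Q).sum()               # accumulate cost c(x) = |x|
print("average cost estimate:", total / T)
```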

Page 32: Control Techniques for Complex Systems

Complex Networks

ADP for Stochastic Networks
Conclusions as of April 21, 2011

Stochastic model: $Q(t+1) - Q(t) = B(t+1)\,U(t) + A(t+1)$
Fluid model: $\frac{d}{dt} q(t) = B u(t) + \alpha$, with cost $c(x,u) = |x|$
Relative value function $h^*$; total cost value function $J^*$

Key conclusions – analytical

. Stability of q implies stochastic stability of Q — Dai 1995; Dai & M. 1995
. $h^*(x) \approx J^*(x)$ for large $|x|$ — M. 1996–2011
. In many cases, the translation of the optimal policy for q is approximately optimal, with logarithmic regret — M. 2005 & 2009

Page 33: Control Techniques for Complex Systems

Complex Networks

ADP for Stochastic Networks
Conclusions as of April 21, 2011

Stochastic model: $Q(t+1) - Q(t) = B(t+1)\,U(t) + A(t+1)$
Fluid model: $\frac{d}{dt} q(t) = B u(t) + \alpha$, with cost $c(x,u) = |x|$
Relative value function $h^*$; total cost value function $J^*$

Key conclusions – engineering

. Stability of q implies stochastic stability of Q
. Simple decentralized policies based on q — Tassiulas, 1995– (see the MaxWeight sketch below)
. Workload relaxation for model reduction — M. 2003–, following "heavy traffic" theory: Laws, Kelly, Harrison, Dai, ...
. Intuition regarding the structure of good policies
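MaxWeight/backpressure is the canonical instance of such a simple decentralized policy; a minimal sketch (illustrative rates and routing, not from the talk):

```python
import numpy as np

# MaxWeight / backpressure sketch: one station chooses between two activities;
# activity i drains buffer i at rate mu[i], routing its output to buffer
# down[i] (-1 means the work exits the network). Parameters are assumptions.
mu = np.array([1.0, 0.8])
down = [1, -1]                      # buffer 0 feeds buffer 1; buffer 1 exits

def maxweight(q):
    """Serve the activity with the largest backpressure weight
    w_i = mu_i * (q_i - q_downstream); idle if no weight is positive."""
    w = [mu[i] * (q[i] - (q[down[i]] if down[i] >= 0 else 0.0))
         for i in range(2)]
    i = int(np.argmax(w))
    return i if w[i] > 0 and q[i] > 0 else None

print(maxweight(np.array([5.0, 3.0])))  # -> 1: 0.8*(3-0) = 2.4 beats 1.0*(5-3)
```

Note the policy needs only local queue lengths, not the arrival statistics — the source of its appeal as a decentralized design.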

Page 34: Control Techniques for Complex Systems

Complex Networks

ADP for Stochastic Networks
Workload Relaxations

[Figures: inventory model with demand 1 and demand 2; workload plane with regions R* and R_STO, axes w1, w2 ranging over −20 to 50]
Inventory model: controlled work-release, controlled routing, uncertain demand

Workload process: $W$ evolves on $\mathbb{R}^2$
Relaxation: only lower bounds on rates are preserved
Effective cost: $\bar{c}(w)$ is the minimum of $c(x)$ over all $x$ consistent with $w$

Optimal policy for the fluid relaxation: non-idling on the region R*
Optimal policy for the stochastic relaxation: introduce hedging
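The effective cost is itself the value of a small linear program; a minimal sketch using scipy, with a hypothetical 2×4 workload matrix Ξ standing in for the model's actual workload matrix:

```python
import numpy as np
from scipy.optimize import linprog

# Effective cost sketch: cbar(w) = min{ c(x) : Xi x = w, x >= 0 }, with
# c(x) = |x| = sum(x). The 2x4 workload matrix Xi below is hypothetical.
Xi = np.array([[1.0, 1.0, 0.0, 0.5],
               [0.0, 0.5, 1.0, 1.0]])

def effective_cost(w):
    res = linprog(c=np.ones(4), A_eq=Xi, b_eq=w, bounds=[(0, None)] * 4)
    return res.fun if res.success else np.inf

print(effective_cost(np.array([3.0, 2.0])))  # -> 10/3 for this Xi
```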

Page 36: Control Techniques for Complex Systems

Complex Networks

ADP for Stochastic Networks
Policy translation

[Figures: inventory model with demand 1 and demand 2; workload plane with regions R* and R_STO]
Inventory model: controlled work-release, controlled routing, uncertain demand

Complete policy synthesis:

1. Optimal control of the relaxation
2. Translation to the physical system:
   2a. Achieve the approximation $c(Q(t)) \approx \bar{c}(W(t))$
   2b. Address the boundary constraints ignored in fluid approximations — achieved using safety stocks
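Step 2b can be as simple as a threshold rule; a sketch (thresholds and topology are assumptions, added for illustration):

```python
import numpy as np

# Safety-stock sketch for step 2b: whatever allocation the relaxation asks
# for is overridden whenever the downstream buffer falls below its safety
# stock, so the downstream station is never starved of work.
S1 = 5.0                                   # assumed safety stock for buffer 1

def with_safety_stock(q, u_relaxed):
    u = np.array(u_relaxed, float)
    if q[1] < S1 and q[0] > 0:             # buffer 1 short of its safety stock
        u[0] = 1.0                         # run the activity feeding buffer 1
    return u

print(with_safety_stock([8.0, 2.0], [0.0, 1.0]))  # -> [1., 1.]
```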

Page 38: Control Techniques for Complex Systems

Architectures for Adaptation & Learning

Singular Perturbations / Workload Relaxations

[Figures: the fluid model and its optimal policy; a value-iteration convergence plot (average cost vs. iteration n) comparing standard VIA initialized with a quadratic and initialized with the optimal fluid value function; the diffusion model; a mean-field game (individual state vs. ensemble state; Agent 4, and Agent 5 barely controllable); the workload plane with regions R* and R_STO; the 16-buffer, five-station network with demands d1, d2 and rates μ10a, μ10b]

Adaptation & Learning
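The convergence experiment behind the plot can be reproduced in miniature (illustrative single-queue stand-in for the network; rates and truncation are assumptions, and a zero initialization stands in for the slide's generic quadratic):

```python
import numpy as np

# Relative value iteration started from zero vs. from the fluid value function.
N, a, m = 50, 0.4, 0.6

def bellman(h):
    Th = np.empty(N + 1)
    for x in range(N + 1):
        best = np.inf
        for u in (0, 1):
            dep = m if (u and x > 0) else 0.0
            up, down = a * (1 - dep), dep * (1 - a)
            ev = (up * h[min(x + 1, N)] + down * h[max(x - 1, 0)]
                  + (1 - up - down) * h[x])
            best = min(best, x + ev)
        Th[x] = best
    return Th

def eta_trace(h, iters=200):
    trace = []
    for _ in range(iters):
        Th = bellman(h)
        trace.append(Th[0])          # running estimate of the average cost
        h = Th - Th[0]
    return trace

x = np.arange(N + 1.0)
fluid_J = x**2 / (2 * (m - a))       # optimal fluid value function
for name, h0 in [("zero init", np.zeros(N + 1)), ("fluid init", fluid_J)]:
    print(name, eta_trace(h0)[-1])   # same limit; fluid init typically settles sooner
```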

Page 39: Control Techniques for Complex Systems

Architectures for Adaptation & Learning

Reinforcement Learning
Approximating a value function: Q-learning

ACOE: $\min_u \bigl\{\, c(x,u) + \mathcal{D}_u h^*\,(x) \,\bigr\} = \eta^*$
$h^*$: relative value function
$\eta^*$: minimal average cost

"Q-function": $Q^*(x,u) = c(x,u) + \mathcal{D}_u h^*\,(x)$ — Watkins 1989 ... "Machine Intelligence Lab" @ece.ufl.edu

Q-learning: given a parameterized family $\{Q^\theta : \theta \in \mathbb{R}^d\}$, $Q^\theta$ is an approximation of the Q-function, or Hamiltonian — Mehta & M. 2009

Compute $\theta^*$ based on observations — without using a system model.
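For orientation, a sketch of Watkins' tabular Q-learning in its classical discounted-cost form (the talk's average-cost setting replaces the discount factor with a relative-value normalization); the queue model and parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Tabular Q-learning on the hypothetical single queue used above.
N, a, m, gamma, eps = 20, 0.4, 0.6, 0.95, 0.1
Q = np.zeros((N + 1, 2))                     # Q[x, u], cost c(x,u) = x + u/2

def step(x, u):
    dep = u == 1 and x > 0 and rng.random() < m   # service completion
    arr = rng.random() < a                        # new arrival
    return min(max(x - dep + arr, 0), N)

x = 0
for t in range(1, 200_000):
    u = int(rng.integers(2)) if rng.random() < eps else int(Q[x].argmin())
    x2 = step(x, u)
    target = x + 0.5 * u + gamma * Q[x2].min()    # observed cost + lookahead
    Q[x, u] += (target - Q[x, u]) / (1 + 1e-3 * t)  # diminishing step size
    x = x2

print(Q.argmin(axis=1))   # typically: idle at x = 0, serve whenever x > 0
```

Note that no transition probabilities appear anywhere in the update — only observed transitions, which is the "without using a system model" point of the slide.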

Page 43: Control Techniques for Complex Systems

Architectures for Adaptation & Learning

Reinforcement Learning
Approximating a value function: TD-learning

Value functions: for a given policy $U(t) = \phi(X(t))$,
$$\eta = \lim_{T\to\infty} \frac{1}{T}\int_0^T c(X(t),U(t))\,dt$$

Poisson's equation: $h$ is again called a relative value function,
$$\bigl\{\, c(x,u) + \mathcal{D}_u h\,(x) \,\bigr\}\Big|_{u=\phi(x)} = \eta$$

TD-learning: given a parameterized family $\{h^\theta : \theta \in \mathbb{R}^d\}$, solve
$$\min\bigl\{\, \|h - h^\theta\| : \theta \in \mathbb{R}^d \,\bigr\}$$
— Sutton 1988; Tsitsiklis & Van Roy 1997

Compute $\theta^*$ based on observations — without using a system model.
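A minimal sketch of average-cost TD(0) with a linear parameterization $h^\theta = \sum_i \theta_i \psi_i$ (illustrative single queue under the fixed policy "always serve"; the basis, rates, and step sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# The temporal difference uses the centered cost c - eta, with eta itself
# tracked by a running average along the same trajectory.
N, a, m = 20, 0.4, 0.6
psi = lambda x: np.array([x / N, (x / N) ** 2])   # normalized basis functions
theta, eta, x = np.zeros(2), 0.0, 0

for t in range(1, 500_000):
    dep = x > 0 and rng.random() < m              # policy: always serve
    x2 = min(max(x - dep + (rng.random() < a), 0), N)
    gain = 1.0 / (100 + 0.01 * t)
    eta += gain * (x - eta)                       # running average cost
    d = (x - eta) + psi(x2) @ theta - psi(x) @ theta   # temporal difference
    theta += gain * d * psi(x)                    # stochastic-approx. update
    x = x2

print(theta, eta)   # theta approximates h along the basis; eta ≈ average cost
```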

Page 46: Control Techniques for Complex Systems

Architectures for Adaptation & Learning

Reinforcement Learning
Approximating a value function: How do we choose a basis?

Basis selection: $h^\theta(x) = \sum_i \theta_i\,\psi_i(x)$

ψ1: Linearize
ψ2: Fluid model with relaxation
ψ3: Diffusion model with relaxation
ψ4: Mean-field game

Examples: decentralized control, nonlinear control, processor speed-scaling

[Figures: the optimal policy; mean-field game trajectories (Agent 4); a comparison of the mean-field game, linearization, and fluid-model approximations; the fluid value function $J^*$, an approximate relative value function $h$, and the relative value function $h^*$]
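A sketch of the basis idea (everything here is illustrative): fit θ by least squares to a target value function on sampled states. In the talk, θ would instead come from TD- or Q-learning, and each ψ_i would be the value function of one of the approximate models above.

```python
import numpy as np

# Basis-selection sketch: h_theta(x) = sum_i theta_i psi_i(x), with theta fit
# by least squares to a synthetic target. The three psi_i below are stand-ins
# for value functions from the linearization, fluid, and diffusion models.
rng = np.random.default_rng(4)
xs = rng.uniform(0.0, 10.0, size=200)                    # sampled states

Psi = np.stack([xs, xs**2, np.sqrt(1 + xs**2)], axis=1)  # stand-in basis
h_target = 0.8 * xs**2 + 3.0 * xs                        # hypothetical target
theta, *_ = np.linalg.lstsq(Psi, h_target, rcond=None)
print(theta)   # weights combining the three candidate architectures
```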

Page 49: Control Techniques for Complex Systems

Next Steps

[Figure: nodal power prices in NZ ($/MWh) at Stratford and Otahuhu, 4am–7pm on March 25 and March 26; prices on the order of 0–100 on the first day and 10,000–20,000 on the second]
http://www.electricityinfo.co.nz/

Page 50: Control Techniques for Complex Systems

Next Steps

Complex Systems
Mainly energy

Entropic Grid: advances in systems theory...

. Complex systems: model reduction specialized to tomorrow's grid — short-term operations and long-term planning
. Resource allocation: controlling supply, storage, and demand — resource allocation with shared constraints
. Statistics and learning: for planning and forecasting — both rare and common events
. Economics for an Entropic Grid: incorporate dynamics and uncertainty in a strategic setting

How to create policies to protect participants on both sides of the market, while creating incentives for R&D on renewable energy?

Page 53: Control Techniques for Complex Systems

Next Steps

Complex Systems
Mainly energy

How to create policies to protect participants on both sides of the market, while creating incentives for R&D on renewable energy?

Our community must consider long-term planning and policy, along with traditional systems operations.

Planning and policy includes markets & competition.

Evolution? Too slow! What we need is Intelligent Design.

Page 58: Control Techniques for Complex Systems

Next Steps

Conclusions

The control community has created many techniques for understanding complex systems, and a valuable philosophy for thinking about control design.

In particular, stylized models can have great value:

. Insight into the formulation of control policies
. Analysis of closed-loop behavior, such as stability via ODE methods
. Architectures for learning algorithms
. Building bridges between the OR, CS, and control disciplines

The ideas surveyed here arose from partnerships with researchers in mathematics, economics, computer science, and operations research.

Besides the many technical open questions, my hope is to extend the application of these ideas to long-range planning, especially in applications to sustainable energy.

Page 61: Control Techniques for Complex Systems

Next Steps

References

S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007.

S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Second edition, Cambridge University Press – Cambridge Mathematical Library, 2009.

S. Meyn. Stability and asymptotic optimality of generalized MaxWeight policies. SIAM J. Control Optim., 47(6):3259–3294, 2009.

V. S. Borkar and S. P. Meyn. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447–469, 2000.

S. P. Meyn. Sequencing and routing in multiclass queueing networks. Part II: Workload relaxations. SIAM J. Control Optim., 42(1):178–217, 2003.

P. G. Mehta and S. P. Meyn. Q-learning and Pontryagin's minimum principle. In Proc. of the 48th IEEE Conf. on Dec. and Control, pp. 3598–3605, Dec. 2009.

W. Chen, D. Huang, A. A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman. Approximate dynamic programming using fluid and diffusion approximations with applications to power management. In Proc. of the 48th IEEE Conf. on Dec. and Control, pp. 3575–3580, Dec. 2009.
26 / 26