Optimal decentralized stochastic control: A common ...amahaj1/talks/gerad-2012.pdf · Optimal decentralized stochastic control: A common information approach Aditya Mahajan McGillUniversity

Optimal decentralized stochastic control:A common information approach

Aditya MahajanMcGill University

Joint work: Ashutosh Nayyar (UIUC) and Demosthenis Teneketzis (Univ of Michigan)

GERAD Seminar, April 23, 2012

Common theme:

multi-stage multi-agent decision

making under uncertainty

Interconnected Power Systems

Topics

Aditya Mahajan Optimal decentralized stochastic control


Topics

Aditya Mahajan Optimal decentralized stochastic control1

Region 1 Region 2


Topics


Region 1 Region 2

Controller 1 Controller 2


Topics


Region 1 Region 2


Interconnect


Topics


Region 1 Region 2


Interconnect

Communication


Topics


Region 1 Region 2


Interconnect

Communication

Challenges

How to coordinate?

When, what, and how to communicate?

Sensor and Surveillance Networks

Topics



Topics


Limited resources


Topics


Limited resources Noisy observations


Topics


Fusion Center


Communication


Topics


Fusion Center


Communication

Challenges

Real-time communication

Scheduling measurements and communication

Detect node failures

Networked Control Systems

Topics



Topics



Topics



Topics


Challenges

Control and communication over networks

(internet ⇒ delay, wireless ⟹ losses)


Topics


Challenges



Distributed estimation


Topics


Challenges



Distributed estimation

Distribued learning

Salient features indecentralized decision making

Topics


Multiple decision makersDecisions made by multiple controllers in a stochastic environment


Topics



Coordination issuesAll controllers must coordinate to achieve a system-wide objective


Topics




Communication issuesControllers can communicate either directly or indirectly


Topics




Communication issuesControllers can communicate either directly or indirectly

RobustnessSystem model may not be completely known

Outline of this talk

Topics


Decentralized stochastic controlClassification and examples

Solution approachesA common information based approach

Delayed sharing information structureStructure of optimal strategies and dynamic programming decomposition

Concluding remarksGeneralizations and Connection to other results


Topics






Classification of decentralized systems

Topics


Controllers/agents are coupled in two ways:1. Coupling due to cost/utility

2. Coupling due to dynamics


Topics




Decentralized systems may be classified according to:


Topics




Decentralized systems may be classified according to:1. Objective

Team vs Games


Topics





Team vs Games

2. DynamicsStatic vs Dynamic


Topics





Team vs Games


This talk will focus on Dynamic Teams


Topics





Team vs Games


This talk will focus on Dynamic TeamsStudied in economics and systems and control since the mid 50s.

Unlike games, agents have no incentive to cheat.

Instead of equilibrium, we seek globally optimal strategies.

Why is decentralized

stochastic control difficult?

An example of centralized static optimization

Topics


� = [ • • • • ]


Topics


� = [ • • • • ]= 1 1 2 2


Topics


� = [ • • • • ]= 1 1 2 2

= ∈ { , , }


Topics


� = [ • • • • ]= 1 1 2 2

= ∈ { , , }

, = • • • •= • • • •= • • • •= ��[ , ]


Topics


� = [ • • • • ]= 1 1 2 2

= ∈ { , , }

, = • • • •= • • • •= • • • •= ��[ , ]Brute force search min� , | | = | || | = 9 possibilities.


Topics


� = [ • • • • ]= 1 1 2 2

= ∈ { , , }


Systematic search + = 6 possibilities= =min �[ , | = ] min �[ , | = ]


Topics


� = [ • • • • ]= 1 1 2 2

= ∈ { , , }


(functional opt.)

Systematic search + = 6 possibilities (parametric opt.)= =min �[ , | = ] min �[ , | = ]

An example of decentralizedstatic optimization

Topics


� = [ • • • • ]


Topics


� = [ • • • • ]= 1 1 2 2= 2 1 1 2


Topics


� = [ • • • • ]= 1 1 2 2= 2 1 1 2= ∈ { , , } = ℎ ∈ { , }


Topics


� = [ • • • • ]= 1 1 2 2= 2 1 1 2= ∈ { , , } = ℎ ∈ { , }

, , = • • • • • • • •= • • • • • • • •= • • • • • • • •=, ℎ = ��,ℎ[ , , ]


Topics


� = [ • • • • ]= 1 1 2 2= 2 1 1 2= ∈ { , , } = ℎ ∈ { , }

, , = • • • • • • • •= • • • • • • • •= • • • • • • • •=, ℎ = ��,ℎ[ , , ]

Brute force search min�,ℎ , ℎ , | | = | || |, |ℎ| = | || |,9 × = 6 possibilities.


Topics


� = [ • • • • ]= 1 1 2 2= 2 1 1 2= ∈ { , , } = ℎ ∈ { , }

, , = • • • • • • • •= • • • • • • • •= • • • • • • • •=, ℎ = ��,ℎ[ , , ]

Brute force search min�,ℎ , | | = | || |, |ℎ| = | || |,9 × = 6 possibilities.

For one controller/agent to choose an optimal action, it must

second guess the other controller’s/agent’s policy


Topics


� = [ • • • • ]= 1 1 2 2= 2 1 1 2= ∈ { , , } = ℎ ∈ { , }

, , = • • • • • • • •= • • • • • • • •= • • • • • • • •=, ℎ = ��,ℎ[ , , ]

Orthogonal search1. Suppose ℎ is fixed: min�ℎ[ , , | = ], = , , .

2. Suppose is fixed: min��[ , , | = ], = , .


Topics


� = [ • • • • ]= 1 1 2 2= 2 1 1 2= ∈ { , , } = ℎ ∈ { , }

, , = • • • • • • • •= • • • • • • • •= • • • • • • • •=, ℎ = ��,ℎ[ , , ]

Orthogonal search yields person-by-person opt strategy

1. Suppose ℎ is fixed: min�ℎ[ , , | = ], = , , .

2. Suppose is fixed: min��[ , , | = ], = , .

To find globally optimal strategies,

in general, we cannot do

better than brute force search

An example of centralizedmulti-stage optimization

Topics



Topics


==


Topics


= = ∈ { , }=


Topics


= = ∈ { , }== ⟹ = 1 1 2 2= ⟹ = 1 1 2 2= ⟹ = 1 2 2 1= ⟹ = 1 2 2 1


Topics


= = ∈ { , }== ⟹ = 1 1 2 2 = , , ∈ { , }= ⟹ = 1 1 2 2= ⟹ = 1 2 2 1= ⟹ = 1 2 2 1


Topics


= = ∈ { , }== ⟹ = 1 1 2 2 = , , ∈ { , }= ⟹ = 1 1 2 2= ⟹ = 1 2 2 1= ⟹ = 1 2 2 1 , + ,, = �� ,� [ , + , ]


Topics


= = ∈ { , }= = { }= ⟹ = 1 1 2 2 = , , ∈ { , }= ⟹ = 1 1 2 2 = { , , }= ⟹ = 1 2 2 1= ⟹ = 1 2 2 1 , + ,, = �� ,� [ , + , ]


Topics


= = ∈ { , }= = { }= ⟹ = 1 1 2 2 = , , ∈ { , }= ⟹ = 1 1 2 2 = { , , }= ⟹ = 1 2 2 1= ⟹ = 1 2 2 1 , + ,, = �� ,� [ , + , ]Critical Assumption: Centralized information ⊆

Solution approach for centralizedmulti-stage optimization

Topics


Brute force search min� ,� , .| | = | || |, | | = | || |×| |×|� |. × = possiblities.


Topics


Brute force search min� ,� , .| | = | || |, | | = | || |×| |×|� |. × = possiblities.

Dynamic programming decomposition= min �[ , | , ]= min �[ , + | , ]


Topics


Brute force search min� ,� , . (functional opt.)| | = | || |, | | = | || |×| |×|� |. × = possiblities.

Dynamic programming decomposition (parametric opt.)= min �[ , | , ]= min �[ , + | , ]


Topics



Dynamic programming decomposition (parametric opt.)= min �[ , | , ]= min �[ , + | , ]Step 1 works because ℙ | does not depend on .

Step 2 works because ℙ | , does not depend on .


Topics



Dynamic programming decomposition (parametric opt.)= min �[ , | , ]= min �[ , + | , ]Step 1 works because ℙ | does not depend on .

Step 2 works because ℙ | , does not depend on .

Both steps work because ⊆

An example of decentralizedmulti-stage optimization

Topics


= = ∈ { , }= = { }= ⟹ = 1 1 2 2 = ∈ { , }= ⟹ = 1 1 2 2 = { }= ⟹ = 1 2 2 1= ⟹ = 1 2 2 1 , + ,, = �� ,� [ , + , ]

An example of decentralizedmulti-stage optimization

Topics


= = ∈ { , }= = { }= ⟹ = 1 1 2 2 = ∈ { , }= ⟹ = 1 1 2 2 = { }= ⟹ = 1 2 2 1= ⟹ = 1 2 2 1 , + ,, = �� ,� [ , + , ]Critical Assumption: Decentralized information ⊈

Can we do better than brute force search?

Usual Dynamic programming does not work?

Topics


≟ min�� [ , | , ]≟ min�� [ , + | , ]

Usual Dynamic programming does not work?

Topics


≟ min�� [ , | , ]≟ min�� [ , + | , ]A sequential decomposition is possible (Witsenhausen, 1973)

Define � = ℙ | : − .� = min�� [ , + + � + | � ]But, the worst case complexity remains the same.

Can we obtain a systematic

approach to find optimal

strategies that does better

than brute force search?


Topics






The intrinsic model forcontrolled dynamical systems

Topics


Ω,ℱ, �

Dynamical Model


Topics


Ω,ℱ, �Dynamical

system

Controller

Dynamical Model


Topics



system

Controller

Dynamical Model


Topics



system

Controller

Dynamical Model


Topics



system

Controller

Ω,ℱ, �

Dynamical Model Intrinsic Model


Topics



system

Controller

Ω,ℱ, � −

+ +Dynamical Model Intrinsic Model


Topics



system

Controller

Ω,ℱ, � −

+ +all obs data

: −



Topics



system

Controller

Ω,ℱ, � −

+ +

: −


Information state and a general solutionapproach for centralized stochastic systems

Topics


In a centralized system, i.e., ⊆ + , a

function � = � is an information

state if it satisfies:

1. The controller Markov property��[� + | , ] = �[� + | � , ]2. The expected cost property��[ | , ] = �[ | � , ]


Topics






Info-state in MDPs: current state

Info-state in POMDPs:

posterior belief on current state


Topics









Structure of optimal strategyRestricting attention to control strategies

of the form = �is without any loss.


Topics









Structure of optimal strategyRestricting attention to control strategies

of the form = �is without any loss.

Search of optimal strategyAn optimal strategy of the form

above is given by the solution of the

following dynamic program:� = min� �[ + + � + | � , ]

How do we define an information

state for a decentralized system?

Common Knowledge (Aumann, 1976)

Topics


Ω,ℱ, �


Topics


Ω,ℱ, � � ∩ �


Topics


Ω,ℱ, � � ∩ �


Topics


Ω,ℱ, � � ∩ �

Exploiting common knowledge tosimplify decentralized static optimization

Topics


= , = ℎ, ℎ = ��,ℎ[ , , ]


Topics


= , = ℎ, ℎ = ��,ℎ[ , , ]Let denote the common knowledge

between and . Write:≡ , , ≡ , ,= ˜ , . = ℎ̃ , .


Topics



between and . Write:≡ , , ≡ , ,= ˜ , . = ℎ̃ , .˜ : , ↦ , ˜ : ↦ ↦⏝⎵⏟⎵⏝�


Topics



between and . Write:≡ , , ≡ , ,= ˜ , . = ℎ̃ , .˜ : , ↦ , ˜ : ↦ ↦⏝⎵⏟⎵⏝�Let � ⋅ = ˜ , ⋅ and � ⋅ = ℎ̃ , ⋅


Topics



between and . Write:≡ , , ≡ , ,= ˜ , . = ℎ̃ , .˜ : , ↦ , ˜ : ↦ ↦⏝⎵⏟⎵⏝�Let � ⋅ = ˜ , ⋅ and � ⋅ = ℎ̃ , ⋅A common knowledge based solutionmin�,� ��,�[ , , | ]


Topics



between and . Write:≡ , , ≡ , ,= ˜ , . = ℎ̃ , .˜ : , ↦ , ˜ : ↦ ↦⏝⎵⏟⎵⏝�Let � ⋅ = ˜ , ⋅ and � ⋅ = ℎ̃ , ⋅A common knowledge based solution (functional opt. over smaller space)min�,� ��,�[ , , | ]


Topics



between and . Write:≡ , , ≡ , ,= ˜ , . = ℎ̃ , .˜ : , ↦ , ˜ : ↦ ↦⏝⎵⏟⎵⏝�Let � ⋅ = ˜ , ⋅ and � ⋅ = ℎ̃ , ⋅A common knowledge based solution (functional opt. over smaller space)min�,� ��,�[ , , | ]Brute force: × possiblities. CK-based soln: ⋅ × possibilities.

Main idea: Extend CK-based

approach to decentralized

multi-stage systems.

Main idea: Extend CK-based

approach to decentralized

multi-stage systems.

A common information based approachfor decentralized multi-stage systems

Topics


(Nayyar, 2010; Nayyar, Mahajan, Teneketzis, 2011)

Split data at each controller/agent into two parts:

Common information: = ⋂≥Private information: = ∖


Topics




Common information: = ⋂≥Private information: = ∖

Objective Choose = , to minimize

:� = �� :�[ , :� ]


Topics




Common information: = ⋂≥ ⊆ +Private information: = ∖


:� = �� :�[ , :� ]


Topics




Common information: = ⋂≥ ⊆ +Private information: = ∖


:� = �� :�[ , :� ]Solution approach

1. Construct a coordinated system (that has classical info-struct.)

2. Show that coordinated system ≡ original system.

3. Find a solution to coordinated system using centralized stoc. control.

4. Translate the result back to original system


Topics


⋯ ⋯ ��


Topics


⋯ ⋯ ��

Coordinator{…, , …}{…, � , …}

Prescription: � : ↦ ,

chosen according to� = , � : −= �


Topics


⋯ ⋯ ��

Coordinator{…, , …}{…, � , …}


chosen according to� = , � : −= �The two systems are equivalent, = �⏟�� ,� :�−


Topics


⋯ ⋯ ��

Coordinator{…, , …}{…, � , …}


chosen according to� = , � : −= �The two systems are equivalent, = �⏟�� ,� :�−Coordinated system is centralized

Find information state � .

Without loss of optimality, choose � = �Write DP in terms of � : � = min�� [ ⋅ + + � + | � , � ]


Topics


⋯ ⋯ ��

Coordinator{…, , …}{…, � , …}


chosen according to� = , � : −= �The two systems are equivalent, = �⏟�� ,� :�−Coordinated system is centralized

Find information state � .

Without loss of optimality, choose � = � ≡ = � ,Write DP in terms of � : � = min�� [ ⋅ + + � + | � , � ]


Topics






Delayed sharing information structure

Topics


Sys

Obs channel

Obs channel

Controller 1

Controller 2


Topics


Sys

Obs channel

Obs channel

Controller 1

Controller 2

+ = , : , = ℎ , �


Topics


Sys

Obs channel

Obs channel

Controller 1

Controller 2

+ = , : , = ℎ , ��-step delayed info sharing Perfect recall at controller


Topics


Sys

Obs channel

Obs channel

Controller 1

Controller 2

+ = , : , = ℎ , ��-step delayed info sharing Perfect recall at controller,:� = �� .:�[ , , ]

Literature Overview

Topics


(Witsenhausen, 1971):

Proposed delayed-sharing information structure.

Asserted a structure of optimal control law (without proof).

Literature Overview

Topics





(Varaiya and Walrand, 1978):

Proved Witsenhausen’s assertion for � = .

Counter-example to disproved the assertion for delay � > .

Literature Overview

Topics





(Varaiya and Walrand, 1978):

Proved Witsenhausen’s assertion for � = .

Counter-example to disproved the assertion for delay � > .

The result of one-step delayed sharing used in various applications:

Queueing theory: Kuri and Kumar, 1995

Communication networks: Altman et. al, 2009, Grizzle et. al, 1982

Stochastic games: Papavassilopoulos, 1982; Chang and Cruz, 1983

Economics: Li and Wu, 1991

Solution based on commoninformation approach

Topics


Common information = ,: −�, ,: −� .

Private information � = −�+ : , −�+ : −Control actions = , � , = , �


Topics



Private information � = −�+ : , −�+ : −Control actions = , � , = , �Coordinated System

Data observerd (increasing with time)

Control actions � , � , where � : � ↦


Topics



Private information � = −�+ : , −�+ : −Control actions = , � , = , �Coordinated System

Data observerd (increasing with time)

Control actions � , � , where � : � ↦Find a solution to the coordinated system and translate it back to

the original system.

The coordinated system:state for I/O mapping

Topics


� �� , �

� �,

The coordinated system:state for I/O mapping

Topics


� �� , �

� �,

State for I/O mapping: , � , �

Information state for coordinated system

Topics


The coordinated system is a centralized partially observed system.

Info state = ℙ state for I/O mapping | data at controller


Topics



Info state = ℙ state for I/O mapping | data at controller� = ℙ , � , � | , � , �Structural Result There is no loss of optimality in restricting

prescriptions of the form� = � and hence, = � , �


Topics




prescriptions of the form� = � and hence, = � , �Dynamic Programming decomposition An optimal coordination strategy

is given by the solution to the following dynamic program� = min�� ,�� [ , � � , � � + + � + | � , � , � ]


Topics




prescriptions of the form� = � and hence, = � , �Dynamic Programming decomposition An optimal coordination strategy

is given by the solution to the following dynamic program� = min�� ,�� [ , � � , � � + + � + | � , � , � ]Setting � , � = � � gives optimal control strategy.

An easy solution to long

standing open problem


Topics






Connections

Topics


Many existing results on decentralized control are special casesDelayed state sharing (Aicardi et al, 1987)

Periodic sharing information structures (Ooi et al, 1997)

Control sharing (Bismut, 1972; Sandell and Athans, 1974; Mahajan 2011)

Finite sate memory controllers (Sandell, 1974, Mahajan, 2008)

Connections

Topics


Many existing results on decentralized control are special casesDelayed state sharing (Aicardi et al, 1987)

Periodic sharing information structures (Ooi et al, 1997)

Control sharing (Bismut, 1972; Sandell and Athans, 1974; Mahajan 2011)

Finite sate memory controllers (Sandell, 1974, Mahajan, 2008)

Generalization to other modelsInfinite horizon (discounted and average cost) models using

standard results for POMDPs

Computation algorithms based on algorithms for POMDPs

Extend results to systems with unknown models based on

Q-learning and adaptive control algorithms

Conclusion

Topics


Summary of the main ideaFind common information at the controllers

Look from the point of view of a coordinator that observes common

information and chooses prescriptions to the controllers

Find information state for the coordinated system and use it to set

up a dynamic program

Conclusion

Topics


Summary of the main ideaFind common information at the controllers

Look from the point of view of a coordinator that observes common

information and chooses prescriptions to the controllers

Find information state for the coordinated system and use it to set

up a dynamic program

Future DirectionsComputational algorithms

Connections with sequential games

Connections with large scale systems/mean field theory

Thank you

References

Topics


1. A. Nayyar, A. Mahajan, D. Teneketzis,

Optimal control strategies for delayed sharing information structures,

IEEE Trans. on Automatic Control, vol. 56, no. 7, pp. 1606-1620, July 2011.

2. A. Nayyar, A. Mahajan, D. Teneketzis,

Dynamic programming for decentralized stochastic control with partial

information sharing: a common information approach,

submitted to IEEE Trans. on Automatic Control, Dec 2011.

Documents

Optimal decentralized stochastic control: A common ...amahaj1/talks/gerad-2012.pdf · Optimal decentralized stochastic control: A common information approach Aditya Mahajan McGillUniversity