Discrete-Time Optimal Control Problem Hybrid …home.deib.polimi.it/prandini/file/2015_06_17 hybrid systems_2.pdf · Hybrid Systems Course: Switched Linear Quadratic Regulation 1/117

Hybrid Systems Course: Switched Linear Quadratic Regulation

Discrete-Time Optimal Control Problem

A discrete-time controlled dynamical system

\[ x_{k+1} = f(x_k, u_k), \quad k = 0, 1, \ldots, \quad \text{initial condition } x_0 \]

Problem: Given a time horizon $[0, N]$, find the optimal input sequence $u = (u_0, \ldots, u_{N-1})$ that minimizes

\[ J(u) = \sum_{k=0}^{N-1} \ell(x_k, u_k) + \phi(x_N) \]

• Running cost $\ell(x_k, u_k) \ge 0$
• Terminal cost $\phi(x_N) \ge 0$

Extension to a discrete-time hybrid system:

\[ x_{k+1} = f(x_k, u_k, \sigma_k), \quad k = 0, 1, \ldots \]

Linear Quadratic Regulation (LQR) Problem

A discrete-time linear system with given initial condition $x_0$:

\[ x_{k+1} = A x_k + B u_k \]

Problem: find the optimal input sequence $u = (u_0, \ldots, u_{N-1})$ that minimizes

\[ J(u) = \underbrace{\sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right)}_{\text{running cost}} + \underbrace{x_N^T Q_f x_N}_{\text{terminal cost}} \]

• State weight matrix $Q = Q^T \succeq 0$
• Control weight matrix $R = R^T \succ 0$ (no free control)
• Final state weight matrix $Q_f = Q_f^T \succeq 0$

LQR Problem: Motivation

A discrete-time linear system with given initial condition $x_0$:

\[ x_{k+1} = A x_k + B u_k \]

Problem: find the optimal input sequence $u = (u_0, \ldots, u_{N-1})$ that minimizes

\[ J(u) = \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T Q_f x_N \]

The cost is a compromise between two conflicting goals:
• minimize overall control effort
• minimize overall state deviation from 0

A larger control input can drive the state to zero faster.

LQR Problem: Special Cases

Energy efficient stabilization: $Q = Q_f = \alpha I$, $R = \beta I$

\[ J(u) = \alpha \sum_{k=0}^{N} \|x_k\|^2 + \beta \sum_{k=0}^{N-1} \|u_k\|^2 \]

• Weights $\alpha, \beta > 0$ determine the emphasis between two objectives: (i) the state stays close to 0; (ii) use less control energy

LQR Problem: Special Cases

Problem: find the control sequence $u = (u_0, \ldots, u_{N-1})$ with the least energy that can steer the system state from $x_0$ to $x_N = 0$

• Set $Q = 0$, since we do not care about the deviation from 0 of the states at times $0, 1, \ldots, N-1$
• Choose a very large $\alpha$, since the final state $x_N$ needs to be 0 in the optimal solution

Minimum energy steering to 0: $Q = 0$, $Q_f = \alpha I$, $R = I$

\[ J(u) = \alpha \|x_N\|^2 + \sum_{k=0}^{N-1} \|u_k\|^2 \]

LQR Problem: Special Cases

A discrete-time linear system with output and given initial condition $x_0$:

\[ x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k \]

Problem: find the optimal input sequence $u = (u_0, \ldots, u_{N-1})$ that minimizes

\[ J(u) = \alpha \sum_{k=0}^{N} \|y_k\|^2 + \beta \sum_{k=0}^{N-1} \|u_k\|^2 \qquad (\alpha > 0,\ \beta > 0) \]

As an LQR problem:
• State weight matrix $Q = \alpha C^T C \succeq 0$
• Control weight matrix $R = \beta I \succ 0$
• Final state weight matrix $Q_f = Q = \alpha C^T C \succeq 0$

LQR Problem: Special Cases

A discrete-time linear system with given initial condition $x_0$:

\[ x_{k+1} = A x_k + B u_k \]

Problem: track a reference state trajectory $x_0^r, x_1^r, \ldots, x_N^r$ with efficient control:

\[ J(u) = \underbrace{\alpha \sum_{k=0}^{N} \|x_k - x_k^r\|^2}_{\text{tracking error penalty}} + \underbrace{\beta \sum_{k=0}^{N-1} \|u_k\|^2}_{\text{control energy}} \]

This can be formulated as a (time-varying) LQR problem:

• Augment the state $x$ to $\tilde{x} = \begin{bmatrix} x \\ z \end{bmatrix}$ with $z \in \mathbb{R}$; let $\tilde{x}_0 = \begin{bmatrix} x_0 \\ 1 \end{bmatrix}$
• Augment the state dynamics to
\[ \tilde{x}_{k+1} = \begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix} \tilde{x}_k + \begin{bmatrix} B \\ 0 \end{bmatrix} u_k \]
• Choose
\[ Q_k = \alpha \begin{bmatrix} I \\ -(x_k^r)^T \end{bmatrix} \begin{bmatrix} I & -x_k^r \end{bmatrix}, \qquad R = \beta I, \qquad Q_f = Q_N \]
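The tracking reformulation relies on the identity $\tilde{x}^T Q_k \tilde{x} = \alpha \|x - x_k^r\|^2$ whenever $z = 1$. The following is a minimal numerical check of that identity; the function name `tracking_weight` and all numbers are assumptions made for illustration.

```python
import numpy as np

def tracking_weight(x_ref, alpha):
    """Q_k = alpha * [I, -x_ref]^T [I, -x_ref] for the augmented state [x; z]."""
    n = x_ref.shape[0]
    E = np.hstack([np.eye(n), -x_ref.reshape(n, 1)])  # E @ [x; 1] = x - x_ref
    return alpha * E.T @ E

# Hypothetical numbers, chosen only to exercise the identity.
x = np.array([1.0, -2.0])
x_ref = np.array([0.5, 0.5])
alpha = 3.0

x_aug = np.append(x, 1.0)                 # augmented state with z = 1
Qk = tracking_weight(x_ref, alpha)
lhs = x_aug @ Qk @ x_aug                  # quadratic form in the augmented state
rhs = alpha * np.sum((x - x_ref) ** 2)    # tracking error penalty
```

Because $z_{k+1} = z_k$ and $z_0 = 1$, the extra state stays at 1 along the whole trajectory, so the identity holds at every step.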

Switched LQR Problem

A discrete-time switched linear system

\[ x_{k+1} = A_{\sigma_k} x_k + B_{\sigma_k} u_k \]

• continuous state: $x_k \in \mathbb{R}^n$
• discrete state (mode): $\sigma_k \in \Sigma = \{1, 2, \ldots, M\}$

Problem: Find the optimal input sequence $(u_0, \ldots, u_{N-1})$ and mode sequence $(\sigma_0, \ldots, \sigma_{N-1})$ that minimize the cost function

\[ \sum_{k=0}^{N-1} \left( x_k^T Q_{\sigma_k} x_k + u_k^T R_{\sigma_k} u_k \right) + x_N^T Q_f x_N \]

• State weight and control weight matrices are mode dependent

Observations:
• In different modes, both dynamics and running costs are different
• If the mode sequence is given, the problem becomes a time-varying LQR problem
• The main challenge is determining the mode sequence

Example

Building cooling system:
• Multiple building zones
• Air Handling Units (AHUs)

State variables:
• Zone temperatures, humidity

Controls:
• AHU damper open/close
• Fan powers

Objectives:
• Maintain comfort
• Reduce energy usage

Courtesy of Jianghai Hu, Purdue University

Outline

• Solve the LQR problem using the dynamic programming method
• Extend the method to solve the SLQR problem
• Complexity reduction techniques

We first look at the LQR problem:

\[ \text{Minimize } \left[ \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T Q_f x_N \right] \]
\[ \text{subject to } x_{k+1} = A x_k + B u_k, \quad k = 0, \ldots, N-1, \quad x_0 \text{ fixed} \]

Direct Approach: LQR via Least-squares

The state along the time horizon $[0, N]$ is a linear function of $u$ and $x_0$:

\[ \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} B & 0 & \cdots & 0 \\ AB & B & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A^{N-1}B & A^{N-2}B & \cdots & B \end{bmatrix}}_{G} \underbrace{\begin{bmatrix} u_0 \\ u_1 \\ \vdots \\ u_{N-1} \end{bmatrix}}_{\mathbf{u}} + \underbrace{\begin{bmatrix} A \\ A^2 \\ \vdots \\ A^N \end{bmatrix}}_{H} x_0 \]

that is, $\mathbf{x} = G\mathbf{u} + H x_0$.

Minimize the function (the term $x_0^T Q x_0$ is dropped because it does not depend on $\mathbf{u}$):

\[ J(\mathbf{u}) = \mathbf{x}^T \bar{Q} \mathbf{x} + \mathbf{u}^T \bar{R} \mathbf{u}, \qquad \bar{Q} = \mathrm{diag}(Q, \ldots, Q, Q_f), \quad \bar{R} = \mathrm{diag}(R, \ldots, R) \]
\[ J(\mathbf{u}) = (G\mathbf{u} + H x_0)^T \bar{Q} (G\mathbf{u} + H x_0) + \mathbf{u}^T \bar{R} \mathbf{u} = \left\| \bar{Q}^{1/2} (G\mathbf{u} + H x_0) \right\|^2 + \left\| \bar{R}^{1/2} \mathbf{u} \right\|^2 \]

This is a least-squares problem. The optimal control is

\[ \mathbf{u}^* = -\left( \bar{R} + G^T \bar{Q} G \right)^{-1} G^T \bar{Q} H x_0 \]
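The stacked formulation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `lqr_least_squares` is hypothetical, and inputs are assumed to be arrays of compatible shapes.

```python
import numpy as np

def lqr_least_squares(A, B, Q, R, Qf, x0, N):
    """Solve finite-horizon LQR by stacking x = G u + H x0 and solving
    the normal equations u* = -(Rbar + G^T Qbar G)^{-1} G^T Qbar H x0."""
    n, m = B.shape
    Apow = [np.eye(n)]
    for _ in range(N):
        Apow.append(A @ Apow[-1])            # Apow[i] = A^i
    G = np.zeros((N * n, N * m))
    H = np.zeros((N * n, n))
    for i in range(N):                        # block row i corresponds to x_{i+1}
        H[i * n:(i + 1) * n, :] = Apow[i + 1]
        for j in range(i + 1):                # x_{i+1} depends on u_0, ..., u_i
            G[i * n:(i + 1) * n, j * m:(j + 1) * m] = Apow[i - j] @ B
    Qbar = np.kron(np.eye(N), Q)
    Qbar[-n:, -n:] = Qf                       # last block weights x_N
    Rbar = np.kron(np.eye(N), R)
    u = -np.linalg.solve(Rbar + G.T @ Qbar @ G, G.T @ Qbar @ H @ x0)
    return u.reshape(N, m), G, H
```

The linear system solved here has dimension $Nm$, which makes the growth with the horizon $N$ concrete.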

Direct Approach: LQR via Least-squares

Limitations of the direct approach:
• Matrix inversion is needed to find the optimal control
• The problem (matrices) dimension increases with the time horizon $N$
• Impractical for large $N$, let alone the infinite horizon case
• Sensitivity of the solutions to numerical errors

Observations:
• The problem is easier to solve for a shorter time horizon $N$
• The $(N+1)$-horizon solution is related to the $N$-horizon solution
• Exploit this relation to design an iterative solution procedure

Dynamic programming: an iterative approach that can
• Re-use results for smaller $N$ to solve the larger $N$ case
• In each iteration, only deal with a problem of fixed size

Dynamic Programming Approach

Idea: Solve a sequence of optimal control problems over time horizons $[t, N]$, for decreasing $t = N, N-1, \ldots, 0$

• The value function at time $t$ is the optimal cost over $[t, N]$:
\[ V_t(x) = \min_{u_t, u_{t+1}, \ldots, u_{N-1}} \sum_{k=t}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T Q_f x_N \]
with the initial condition $x_t = x$
• Value function backward iteration: $V_{t-1}(\cdot)$ can be computed based on $V_t(\cdot)$
• The optimal cost of the original problem is $V_0(x_0)$
• The optimal input sequence can be recovered from the value functions

Motivating Example

• Start from point A
• Try to reach point B
• Each step can only move right (hence $N = 6$)
• The cost is labeled on each edge

Problem: Which path from A to B has the least cost?

• For an $\ell$-by-$\ell$ grid, the total number of legal paths is $\frac{(2\ell)!}{(\ell!)^2}$, which grows fast with $\ell$. In our case $\ell = 3$, hence the total number of legal paths is 20.

Value Functions

The value function at $z$ is the least possible cost to reach B from $z$.

Principle of Optimality: If a least-cost path from A to B is

\[ x_0^* = A \to x_1^* \to x_2^* \to \cdots \to x_6^* = B, \]

then any truncation of it,

\[ x_t^* \to x_{t+1}^* \to \cdots \to x_6^* = B, \]

is also a least-cost path from $x_t^*$ to B.

The value function at any point $z$ reached at time $t$ satisfies

\[ V_t(z) = \min \{ w_u + V_{t+1}(z'_u),\ w_d + V_{t+1}(z'_d) \} \]

The optimal action when $x_t = z$ is the one providing the minimum argument, and it can be recovered from $V_{t+1}(\cdot)$.
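The recursion above can be sketched on a small lattice. The edge costs of the slide's figure are not recoverable, so the costs below are random placeholders; the node indexing (number of up-moves and down-moves taken so far) and the function names are assumptions made for illustration. The brute-force routine enumerates every path only to confirm that the backward recursion returns the same least cost.

```python
import itertools
import math
import numpy as np

def grid_value_function(w_up, w_dn):
    """Backward value iteration. Node (i, j) means i up-moves and j down-moves
    so far; B is node (l, l). w_up[i, j] / w_dn[i, j] are the costs of the
    up / down edges leaving node (i, j)."""
    l = w_up.shape[0] - 1
    V = np.full((l + 1, l + 1), np.inf)
    V[l, l] = 0.0                                   # zero cost-to-go at B
    for i in range(l, -1, -1):
        for j in range(l, -1, -1):
            if i < l:
                V[i, j] = min(V[i, j], w_up[i, j] + V[i + 1, j])
            if j < l:
                V[i, j] = min(V[i, j], w_dn[i, j] + V[i, j + 1])
    return V

def brute_force_cost(w_up, w_dn):
    """Enumerate all (2l)!/(l!)^2 paths and return the least total cost."""
    l = w_up.shape[0] - 1
    best = np.inf
    for ups in itertools.combinations(range(2 * l), l):
        i = j = 0
        cost = 0.0
        for step in range(2 * l):
            if step in ups:
                cost += w_up[i, j]
                i += 1
            else:
                cost += w_dn[i, j]
                j += 1
        best = min(best, cost)
    return best

l = 3
rng = np.random.default_rng(0)                      # hypothetical edge costs
w_up = rng.uniform(1.0, 5.0, size=(l + 1, l + 1))
w_dn = rng.uniform(1.0, 5.0, size=(l + 1, l + 1))
V = grid_value_function(w_up, w_dn)
n_paths = math.comb(2 * l, l)                       # number of legal paths
```

Here `V[0, 0]` plays the role of the value function at A, and the table `V` also gives the least cost from every other node.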

Value Function Iteration: Results

[Figure: value function iteration results on the grid example]

Value Function Iteration: Some Observations

Reduced computational complexity: for an $\ell$-by-$\ell$ grid
• Only need to compute $\ell^2$ value functions
• No need to enumerate $\frac{(2\ell)!}{(\ell!)^2}$ paths
• Solve an optimization problem of fixed size in each iteration

Provides solutions to a family of optimal control problems
• Even if starting from a different initial position, there is no need for re-computation
• The input is determined as a function of the current state (static state feedback policy)

Particularly suitable for multi-stage decision problems when the number of control choices is small at each stage

Value Functions of LQR Problem

The value function at time $t \in \{0, 1, \ldots, N\}$ and state $x \in \mathbb{R}^n$ is

\[ V_t(x) = \min_{u_t, u_{t+1}, \ldots, u_{N-1}} \sum_{k=t}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T Q_f x_N \]

• $V_t(x)$ is the optimal cost of the LQR problem within a shorter time horizon (from time $t$ to $N$), starting from the initial condition $x_t = x$
• $V_0(x_0)$ is the optimal cost of the original LQR problem

LQR Problem: Dynamic Programming Solution

Bellman equation:

\[ \underbrace{V_t(x)}_{\text{cost-to-go for the current state}} = \min_{u_t = v} \Big[ \underbrace{x^T Q x + v^T R v}_{\text{current running cost}} + \underbrace{V_{t+1}(Ax + Bv)}_{\text{cost-to-go for the next state}} \Big] = x^T Q x + \min_{u_t = v} \left[ v^T R v + V_{t+1}(Ax + Bv) \right] \]

Optimal control: can be recovered from the value functions as follows

\[ u_t^*(x) = \arg\min_v \left[ v^T R v + V_{t+1}(Ax + Bv) \right] \]

(optimal control at time $t$ when $x_t = x$)

LQR Problem: Dynamic Programming Solution

• The value function at time $N$ is quadratic: $V_N(x) = x^T Q_f x$
• Suppose $V_{t+1}(x) = x^T P_{t+1} x$ is quadratic. Setting this in the expression of $V_t(x)$, we get

\[ V_t(x) = \min_v \left[ x^T Q x + v^T R v + V_{t+1}(Ax + Bv) \right] \]
\[ = \min_v \left[ x^T Q x + v^T R v + (Ax + Bv)^T P_{t+1} (Ax + Bv) \right] \]
\[ = \min_v \left[ x^T Q x + v^T (R + B^T P_{t+1} B) v + 2 v^T B^T P_{t+1} A x + x^T A^T P_{t+1} A x \right] \]

The minimizer is given by

\[ u_t^* = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A x \]

If we plug it back into the expression of $V_t(x)$, we obtain

\[ V_t(x) = x^T \underbrace{\left( Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A \right)}_{P_t} x \]

• Hence $V_t(x) = x^T P_t x$ is quadratic, with $P_t$ obtained from $P_{t+1}$ by the Riccati mapping:

\[ P_t := Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A \]

which is achieved by the linear state feedback control

\[ u_t^* = -\underbrace{(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A}_{\text{Kalman gain}}\, x = -K_t x \]

LQR Problem: Dynamic Programming Solution

• The value function at any time is quadratic (easy numeric representation)
• The optimal control is of linear state feedback form with time-varying gains
• Yields the optimal solutions for all initial conditions $x_0$ and all initial times $t_0 \in \{0, 1, \ldots, N\}$ simultaneously
• Easily extended to the cases of time-varying dynamics and costs

LQR Solution Algorithm

Set $P_N = Q_f$
for $t = N-1, N-2, \ldots, 0$ do
    Compute the value functions backward in time:
    \[ P_t := Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A \]
end for
Return $V_0(x_0)$ as the optimal cost

Set $x_0^* = x_0$
for $t = 0, 1, \ldots, N-1$ do
    Recover the optimal control and state trajectory forward in time:
    \[ u_t^* = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A x_t^* \]
    \[ x_{t+1}^* = A x_t^* + B u_t^* \]
end for
Return $u_t^*$ and $x_t^*$ as the optimal control and state sequences
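The two passes of the algorithm can be sketched as follows. This is a minimal NumPy illustration; the function names `riccati_map` and `lqr_dp` are assumptions, and the returned cost is $x_0^T P_0 x_0$.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    """One backward Riccati step; returns (P_t, K_t) given P_{t+1}."""
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # gain K_t
    return Q + A.T @ P @ A - A.T @ P @ B @ K, K

def lqr_dp(A, B, Q, R, Qf, x0, N):
    """Backward pass for P_t and K_t, then forward rollout with u_t = -K_t x_t."""
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = Qf
    for t in range(N - 1, -1, -1):
        P[t], K[t] = riccati_map(P[t + 1], A, B, Q, R)
    xs, us = [x0], []
    for t in range(N):
        us.append(-K[t] @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return P, K, np.array(xs), np.array(us), x0 @ P[0] @ x0
```

Each backward step only manipulates $n \times n$ and $m \times m$ matrices, regardless of $N$, which is the fixed-size property contrasted with the direct approach.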

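Example

\[ x_{k+1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u_k, \qquad y_k = \begin{bmatrix} 1 & 0 \end{bmatrix} x_k = C x_k, \qquad x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]

Cost function:

\[ J(U) = \sum_{k=0}^{N-1} \|u_k\|^2 + \rho \sum_{k=0}^{N} \|y_k\|^2 \qquad (N = 20) \]

The optimal control is of the form $u_t^* = \begin{bmatrix} a_t & b_t \end{bmatrix} x_t^*$, $t = 0, 1, \ldots, 19$

The Kalman gains $a_t$ and $b_t$ rapidly converge to some constant values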

Convergence of Riccati Recursion

Theorem [a]. If $(A, B)$ is stabilizable, then the Riccati recursion will converge to a solution $P_{ss}$ of the Algebraic Riccati Equation (ARE):

\[ P_{ss} = Q + A^T P_{ss} A - A^T P_{ss} B (R + B^T P_{ss} B)^{-1} B^T P_{ss} A \]

If further $Q = C^T C$ for some $C$ such that $(C, A)$ is detectable, then $P_{ss}$ is unique, and under the steady-state optimal control gain

\[ K_{ss} = (R + B^T P_{ss} B)^{-1} B^T P_{ss} A, \]

the closed-loop system $A_{cl} = A - B K_{ss}$ is stable.

[a] P.E. Caines and D.Q. Mayne, "On the discrete time matrix Riccati equation of optimal control," Int. J. Control, vol. 12, no. 5, pp. 785-794, 1970.
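The convergence statement can be observed numerically by iterating the Riccati mapping until it stops changing. The sketch below uses a double-integrator system with $Q = C^T C$ and $R = 1$; the helper names, the tolerance, and the choice $P_0 = Q$ are assumptions made for illustration.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    """Riccati mapping P -> Q + A'PA - A'PB (R + B'PB)^{-1} B'PA."""
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def riccati_steady_state(A, B, Q, R, P0, tol=1e-12, max_iter=10_000):
    """Iterate the Riccati mapping until successive iterates agree within tol."""
    P = P0
    for _ in range(max_iter):
        P_next = riccati_map(P, A, B, Q, R)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati recursion did not converge")

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = C.T @ C, np.eye(1)

Pss = riccati_steady_state(A, B, Q, R, P0=Q)
Kss = np.linalg.solve(R + B.T @ Pss @ B, B.T @ Pss @ A)   # steady-state gain
Acl = A - B @ Kss                                          # closed-loop matrix
are_residual = np.max(np.abs(riccati_map(Pss, A, B, Q, R) - Pss))
spectral_radius = np.max(np.abs(np.linalg.eigvals(Acl)))
```

For this pair, $(A, B)$ is controllable and $(C, A)$ is observable, so the hypotheses of the theorem hold and the spectral radius of `Acl` should come out below one.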

Important Properties of Riccati Mapping

The Riccati mapping $P_t = \rho(P_{t+1})$ defined by

\[ P_t = Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A \]

is a mapping $\rho : \mathcal{S}_+ \to \mathcal{S}_+$ on the set $\mathcal{S}_+$ of positive semidefinite matrices.

Proposition (Monotonicity). For $P, P' \in \mathcal{S}_+$ with $P \preceq P'$, we have $\rho(P) \preceq \rho(P')$.

Proposition (Concavity). For $P, P' \in \mathcal{S}_+$ and $\theta \in [0, 1]$,
\[ \rho(\theta P + (1 - \theta) P') \succeq \theta \rho(P) + (1 - \theta) \rho(P') \]

B. Sinopoli et al., "Kalman filtering with intermittent observations," IEEE Trans. Automatic Control, vol. 49, no. 9, pp. 1453-1464, 2004.
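Both propositions can be spot-checked numerically. The sketch below draws random matrices (all data hypothetical) and reports the smallest eigenvalue of each matrix difference, which should be nonnegative up to rounding; a check on one sample illustrates the statements but does not prove them.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    """Riccati mapping rho(P) for fixed (A, B, Q, R)."""
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def min_eig(M):
    """Smallest eigenvalue of the symmetric part of M."""
    return np.linalg.eigvalsh((M + M.T) / 2).min()

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)

F = rng.standard_normal((n, n))
D = rng.standard_normal((n, n))
P = F @ F.T                    # positive semidefinite
P2 = P + D @ D.T               # P <= P2 in the semidefinite order
theta = 0.3

rho = lambda X: riccati_map(X, A, B, Q, R)
mono_gap = min_eig(rho(P2) - rho(P))                                   # monotonicity
conc_gap = min_eig(rho(theta * P + (1 - theta) * P2)
                   - theta * rho(P) - (1 - theta) * rho(P2))           # concavity
```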

Back to Switched LQR Problem

A discrete-time switched linear system

\[ x_{k+1} = A_{\sigma_k} x_k + B_{\sigma_k} u_k \]

• continuous state: $x_k \in \mathbb{R}^n$
• discrete state (mode): $\sigma_k \in \Sigma = \{1, 2, \ldots, M\}$

Find the control sequence $u_0, \ldots, u_{N-1}$ and mode sequence $\sigma_0, \ldots, \sigma_{N-1}$ to minimize

\[ \sum_{k=0}^{N-1} \left( x_k^T Q_{\sigma_k} x_k + u_k^T R_{\sigma_k} u_k \right) + x_N^T Q_f x_N \]
\[ \text{subject to } x_{k+1} = A_{\sigma_k} x_k + B_{\sigma_k} u_k, \quad k = 0, \ldots, N-1, \quad x_0 \text{ fixed} \]

The value function at each $t = 0, 1, \ldots, N$ and $x$ is the optimal cost over the horizon $[t, N]$ assuming $x_t = x$:

\[ V_t(x) = \min_{\substack{\sigma_t, \ldots, \sigma_{N-1} \\ u_t, \ldots, u_{N-1}}} \sum_{k=t}^{N-1} \left( x_k^T Q_{\sigma_k} x_k + u_k^T R_{\sigma_k} u_k \right) + x_N^T Q_f x_N \]

Back to Switched LQR Problem

Observations:
• Solution strategy: dynamic programming
• In each step, need to determine both the optimal $u$ and $\sigma$
• The value function is no longer quadratic
• $V_0(x_0)$ is the optimal cost of the original problem
• The value function $V_t(x)$ does not depend on the mode $\sigma_{t-1}$
• Hint: no switching cost

Robust optimal control: assume $\sigma$ is not controllable. Then

\[ \inf_u \sup_\sigma J(u, \sigma) \]

is a convex problem!

Bellman Equation of SLQR Problem

Value functions at different times are related by

\[ V_t(x) = \min_{\substack{\sigma_t = \sigma \\ u_t = v}} \left[ x^T Q_\sigma x + v^T R_\sigma v + V_{t+1}(A_\sigma x + B_\sigma v) \right] \]

The optimal control and mode are the ones achieving the minimum above:
• Optimal state-dependent switching policy $\sigma_t^*(x)$
• Optimal state feedback controller $u_t^*(x)$

Bad news: value functions are in general not quadratic
• $V_N(x) = x^T Q_f x$ is quadratic
• However, for $t = N-1, N-2, \ldots$

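$t = N-1$ Case

\[ V_{N-1}(x) = \min_\sigma \underbrace{\min_v \left[ x^T Q_\sigma x + v^T R_\sigma v + V_N(A_\sigma x + B_\sigma v) \right]}_{\text{Bellman equation for the LQR problem of the } \sigma\text{-th subsystem}} = \min_\sigma \left[ x^T \rho_\sigma(Q_f) x \right] \]

$\rho_\sigma$ is the Riccati mapping of subsystem $(A_\sigma, B_\sigma)$ with weights $Q_\sigma, R_\sigma$.

$V_{N-1}(x)$ is the pointwise minimum of a number of quadratic functions, hence piecewise quadratic:

\[ V_{N-1}(x) = \min_{P \in \mathcal{P}_{N-1}} x^T P x, \qquad \mathcal{P}_{N-1} = \{ \rho_1(Q_f), \ldots, \rho_M(Q_f) \} := \rho_\Sigma(Q_f) \]

• The state space is partitioned into cones (radially invariant minimizer)
• One optimal mode for each cone
• One optimal linear state feedback controller for each cone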

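$t = N-2$ Case

\[ V_{N-2}(x) = \min_\sigma \min_v \left[ x^T Q_\sigma x + v^T R_\sigma v + V_{N-1}(A_\sigma x + B_\sigma v) \right] \]
\[ = \min_\sigma \min_v \min_{P \in \mathcal{P}_{N-1}} \left[ x^T Q_\sigma x + v^T R_\sigma v + (A_\sigma x + B_\sigma v)^T P (A_\sigma x + B_\sigma v) \right] \]
\[ = \min_\sigma \min_{P \in \mathcal{P}_{N-1}} \underbrace{\min_v \left[ x^T Q_\sigma x + v^T R_\sigma v + (A_\sigma x + B_\sigma v)^T P (A_\sigma x + B_\sigma v) \right]}_{\text{Bellman equation for the LQR problem of the } \sigma\text{-th subsystem}} \]
\[ = \min_\sigma \min_{P \in \mathcal{P}_{N-1}} x^T \rho_\sigma(P) x \]

Conclusion: the value function $V_{N-2}(x)$ is the pointwise minimum of $M^2$ quadratic functions:

\[ V_{N-2}(x) = \min_{P \in \mathcal{P}_{N-2}} x^T P x, \qquad \mathcal{P}_{N-2} = \rho_\Sigma(\mathcal{P}_{N-1}) = \{ \rho_\sigma(P) : P \in \mathcal{P}_{N-1},\ \sigma \in \Sigma \} \]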

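General $t$ Case

If at $t+1$, $V_{t+1}(x) = \min_{P \in \mathcal{P}_{t+1}} x^T P x$ for a set $\mathcal{P}_{t+1}$ of p.s.d. matrices, then at time $t$ the value function is given by

\[ V_t(x) = \min_{P \in \mathcal{P}_t} x^T P x \]

where $\mathcal{P}_t$ is obtained from $\mathcal{P}_{t+1}$ by the switched Riccati recursion:

\[ \mathcal{P}_t = \rho_\Sigma(\mathcal{P}_{t+1}) := \bigcup_{\sigma \in \Sigma} \rho_\sigma(\mathcal{P}_{t+1}) \]

The size of $\mathcal{P}_t$ is bigger than that of $\mathcal{P}_{t+1}$: $|\mathcal{P}_t| = M \cdot |\mathcal{P}_{t+1}|$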

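SLQR Solution Algorithm

Set $\mathcal{P}_N = \{Q_f\}$
for $t = N-1, N-2, \ldots, 0$ do
    Compute the set of p.s.d. matrices:
    \[ \mathcal{P}_t = \rho_\Sigma(\mathcal{P}_{t+1}) \]
end for
Return $V_0(x_0) = \min_{P \in \mathcal{P}_0} x_0^T P x_0$ as the optimal cost

Set $x_0^* = x_0$
for $t = 0, 1, \ldots, N-1$ do
    Recover the optimal mode $\sigma_t^*$ and the optimal control $u_t^*$ from
    \[ (\sigma_t^*, u_t^*) = \arg\min_{\sigma, v} \left[ (x_t^*)^T Q_\sigma x_t^* + v^T R_\sigma v + V_{t+1}(A_\sigma x_t^* + B_\sigma v) \right] \]
    Let $x_{t+1}^* = A_{\sigma_t^*} x_t^* + B_{\sigma_t^*} u_t^*$
end for
Return $\sigma_t^*$ and $u_t^*$ as the optimal mode and control sequences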
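The switched Riccati recursion and the forward recovery step can be sketched without any pruning, as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the two-mode matrices use $Q_\sigma = I$, $R_\sigma = 1$, $Q_f = I$, and the short horizon $N = 6$ and initial state are chosen only to keep the $2^N$ matrices manageable. In the forward pass, the minimizing pair $(\sigma, P)$ with $P \in \mathcal{P}_{t+1}$ determines the mode and the feedback gain.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    """One Riccati step for a single mode; returns (rho_sigma(P), gain K)."""
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ K, K

def switched_riccati_sets(modes, Qf, N):
    """Backward pass: sets[t] lists the matrices defining V_t (no pruning)."""
    sets = [None] * (N + 1)
    sets[N] = [Qf]
    for t in range(N - 1, -1, -1):
        sets[t] = [riccati_map(P, *mode)[0] for mode in modes for P in sets[t + 1]]
    return sets

def value(x, matrices):
    """Pointwise minimum of the quadratic forms x' P x."""
    return min(x @ P @ x for P in matrices)

def slqr_rollout(modes, sets, x0):
    """Forward pass: pick the minimizing (mode, P) pair and apply u = -K x."""
    N = len(sets) - 1
    x, cost, seq = x0, 0.0, []
    for t in range(N):
        candidates = [(s, *riccati_map(P, *modes[s]))
                      for s in range(len(modes)) for P in sets[t + 1]]
        s, _, K = min(candidates, key=lambda c: x @ c[1] @ x)
        A, B, Q, R = modes[s]
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
        seq.append(s)
    return seq, cost + x @ sets[N][0] @ x        # add terminal cost x_N' Qf x_N

# Hypothetical two-mode data, each mode given as (A, B, Q, R).
modes = [
    (np.array([[2.0, 1.0], [0.0, 1.0]]), np.array([[1.0], [1.0]]), np.eye(2), np.eye(1)),
    (np.array([[2.0, 1.0], [0.0, 0.5]]), np.array([[1.0], [2.0]]), np.eye(2), np.eye(1)),
]
sets = switched_riccati_sets(modes, np.eye(2), N=6)
x0 = np.array([1.0, -1.0])
mode_seq, rollout_cost = slqr_rollout(modes, sets, x0)
```

The list sizes `len(sets[t])` double at every backward step, which is the exponential growth $|\mathcal{P}_t| = M \cdot |\mathcal{P}_{t+1}|$ with $M = 2$.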

Complexity Reduction

Issue: the number of matrices in $\mathcal{P}_t$ grows exponentially

In the set $\mathcal{P}_t$ defining the value function $V_t(x) = \min_{P \in \mathcal{P}_t} x^T P x$:
• A matrix $P \in \mathcal{P}_t$ is called effective if, for at least one $x \neq 0$,
\[ x^T P x < x^T P' x, \quad \forall P' \in \mathcal{P}_t \setminus \{P\} \]
• Otherwise $P$ is called redundant

Redundant matrices can be discarded without affecting the optimal solution, because of the monotonicity of the Riccati mapping.

Sufficient condition for $P \in \mathcal{P}_t$ to be redundant:

\[ P \succeq \text{a convex combination of } P' \in \mathcal{P}_t \setminus \{P\} \qquad (1) \]

Proof:
\[ x^T P x \ge \sum_{P_i \in \mathcal{P}_t \setminus \{P\}} \alpha_i\, x^T P_i x \ge x^T P_j x, \quad \text{for some } P_j \in \mathcal{P}_t \setminus \{P\} \]

This is an LMI feasibility condition to test.
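Testing condition (1) in general requires an LMI solver. As a simplified illustration, the sketch below only checks the special case where the convex combination consists of a single matrix, that is, $P \succeq P'$ for some other $P'$ in the set. It is therefore weaker than condition (1) and may keep matrices that a full LMI test would remove. The function names and the sample matrices are assumptions.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """True if the symmetric part of M is positive semidefinite (within tol)."""
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

def prune_dominated(matrices):
    """Drop every P that dominates another kept matrix (P - K is p.s.d.).
    The pointwise minimum of x' P x over the kept set is unchanged."""
    kept = []
    for P in matrices:
        if any(is_psd(P - K) for K in kept):
            continue                                  # P is redundant
        kept = [K for K in kept if not is_psd(K - P)]  # P makes these redundant
        kept.append(P)
    return kept

# Hypothetical candidate set: 2I and diag(1, 3) both dominate I.
candidates = [np.eye(2), 2 * np.eye(2), np.diag([1.0, 3.0]), np.diag([3.0, 0.5])]
kept = prune_dominated(candidates)
```

Processing the matrices one at a time, rather than removing all dominated matrices at once, avoids discarding both copies when two matrices in the set are identical.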

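Example of Ineffective Matrices

A switched LQR problem specified by

\[ A_1 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1.5 & 1 \\ 0 & 1.5 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
\[ Q_\sigma = I, \quad R_\sigma = 1, \quad N = 20 \]

[Figure: the 16 matrices in $\mathcal{P}_{16}$ evaluated along half of the unit circle, which suffices because of the radial invariance]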

Decision Tree Pruning

[Figure: decision tree pruning]

Further Reduction by Relaxation

Remove more matrices by relaxing Condition (1) to

\[ P \succeq \sum_{i \in I} \alpha_i P_i - \varepsilon I \qquad (2) \]

• $\varepsilon > 0$ is a small constant specifying the approximation quality
• Even a small $\varepsilon$ could result in a significant reduction in complexity

W. Zhang, J. Hu and A. Abate, "Infinite horizon switched LQR problems in discrete time: A suboptimal algorithm with performance analysis," IEEE Trans. Automatic Control, vol. 57, no. 7, pp. 1815-1821, Jul. 2012.

W. Zhang, J. Hu, and A. Abate, "A study of the discrete-time switched LQR problem," Purdue Technical Report TR-ECE 09-03.

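Example

\[ A_1 = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 2 & 1 \\ 0 & 0.5 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]
\[ Q_1 = Q_2 = I, \quad R_1 = R_2 = 1, \quad Q_f = I, \quad N = 100 \]

k                       1   2   3   4   5   6
$|\mathcal{P}_{N-k}|$   2   4   5   5   5   5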

Example (cont.)

[Figure: optimal switching policy. Gray region: mode 1 optimal; black region: mode 2 optimal]

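Another Example

\[ A_1 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1.5 & 1 \\ 0 & 1.5 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
\[ Q_\sigma = I, \quad R_\sigma = 1, \quad N = 20 \]

[Figure: complexity (number of matrices, log scale) versus number of steps left, with and without reduction and with relaxation]

• Without any reduction, complexity grows exponentially
• With reduction, complexity saturates at 360 matrices
• With relaxation ($\varepsilon = 10^{-3}$), complexity saturates at 14 matrices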

SLQR Problem with Switching Cost

Cost function to be minimized:

\[ \sum_{k=0}^{N-1} \Big[ x_k^T Q_{\sigma_k} x_k + u_k^T R_{\sigma_k} u_k + \underbrace{w(\sigma_k, \sigma_{k+1})(x_k)}_{\text{switching cost}} \Big] + x_N^T Q_f x_N \]

The value function $V_t(\sigma, x)$ is the optimal cost-to-go starting from $x_t = x$ with the previous mode being $\sigma_{t-1} = \sigma$.

Bellman equation:

\[ V_t(\sigma, x) = \min_{\substack{\sigma_t = \sigma' \\ u_t = v}} \left[ x^T Q_{\sigma'} x + v^T R_{\sigma'} v + w(\sigma, \sigma')(x) + V_{t+1}(\sigma', A_{\sigma'} x + B_{\sigma'} v) \right] \]

• Optimal switching policy $\sigma_t^*(\sigma, x)$ and optimal control $u_t^*(\sigma, x)$
• Both depend on the previous mode $\sigma$ and the current state $x$
• The same technique with piecewise quadratic value functions applies if $w(\sigma, \sigma')(\cdot)$ is quadratic, linear, or constant

Continuous-Time LQR Problem

A continuous-time linear system

\[ \dot{x} = A x + B u \]

Problem: find the optimal control input $u(t)$ over the time horizon $[0, t_f]$ that minimizes

\[ J = \int_0^{t_f} \left( x^T Q x + u^T R u \right) dt + x(t_f)^T Q_f x(t_f) \]

• State running weight $Q = Q^T \succeq 0$
• Control running weight $R = R^T \succ 0$
• Final state weight $Q_f = Q_f^T \succeq 0$

Continuous-Time LQR Problem

Value function at time $t \in [0, t_f]$:

\[ V_t(x) = \min_{u(s),\ s \in [t, t_f]} \int_t^{t_f} \left[ x(s)^T Q x(s) + u(s)^T R u(s) \right] ds + x(t_f)^T Q_f x(t_f) \]

• $V_t(x)$ is the optimal cost-to-go at time $t$ from state $x$
• The optimal cost of the original LQR problem is given by $V_0(x_0)$
• At time $t_f$, the value function is quadratic: $V_{t_f}(x) = x^T Q_f x$

As in the discrete-time case, the value function can be shown to be quadratic at any time:

\[ V_t(x) = x^T P(t) x, \quad t \in [0, t_f] \]

A Heuristic Derivation of Value Functions

• Assume that the system starts from $x$ at time $t$:
\[ x(t) = x, \quad t \in [0, t_f), \quad x \in \mathbb{R}^n \]
• Assume that the control input is kept constant for a brief $\delta$-length time horizon:
\[ u(s) = w, \quad s \in [t, t + \delta] \]
• Assume that the value function is quadratic at any time:
\[ V_t(x) = x^T P(t) x, \quad t \in [0, t_f] \]

A Heuristic Derivation of Value Functions

Bellman equation:

\[ \underbrace{V_t(x)}_{\text{cost-to-go at time } t} \simeq \min_{u(\cdot) \equiv w} \Big[ \underbrace{\delta (x^T Q x + w^T R w)}_{\text{cost during } [t, t+\delta]} + \underbrace{V_{t+\delta}(x + \delta(Ax + Bw))}_{\text{cost-to-go from time } t+\delta} \Big] \]

The second term expands as

\[ V_{t+\delta}(x + \delta(Ax + Bw)) = [x + \delta(Ax + Bw)]^T P(t + \delta) [x + \delta(Ax + Bw)] \]
\[ \simeq [x + \delta(Ax + Bw)]^T [P(t) + \delta \dot{P}(t)] [x + \delta(Ax + Bw)] \]
\[ \simeq x^T P(t) x + \delta \left[ x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right] \]
\[ = V_t(x) + \delta \left[ x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right] \]

As $\delta \to 0$, the Bellman equation becomes asymptotically:

\[ 0 = \min_{u(t) = w} \left\{ x^T Q x + w^T R w + x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right\} \]

A Heuristic Derivation of Value Functions

The optimal control law at time $t$ is then:

\[ u^*(t) = \arg\min_w \left\{ x^T Q x + w^T R w + x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right\} = -R^{-1} B^T P(t)\, x = K(t)\, x \]

where $K(t) = -R^{-1} B^T P(t)$ is the Kalman gain.

Plug this back into the asymptotic version of the Bellman equation:

\[ 0 = \left\{ x^T Q x + w^T R w + x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right\} \Big|_{w = u^*(t)} \]
\[ \Rightarrow\ 0 = x^T \left\{ Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t) + \dot{P}(t) \right\} x, \quad \forall x \]
\[ \Rightarrow\ -\dot{P}(t) = Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t) \]

Boundary condition: $P(t_f) = Q_f$; the equation is integrated backward in time until time 0.

LQR Problem: Dynamic Programming Solution

The value functions are quadratic:

\[ V_t(x) = x^T P(t) x \]

with $P(t)$ satisfying the Riccati (matrix) differential equation:

\[ -\dot{P}(t) = Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t), \qquad P(t_f) = Q_f \]

The optimal control is a linear state feedback controller:

\[ u^*(t) = -R^{-1} B^T P(t) x \]

LQR Solution Algorithm

Set $P(t_f) = Q_f$

Solve the matrix Riccati equation backward in time:

\[
-\dot{P}(t) = Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t)
\]

Return $V_0(x_0) = x_0^T P(0) x_0$ as the optimal cost

Set $x^*(0) = x_0$

Recover the optimal control and trajectory forward in time:

\[
\begin{cases}
\dot{x}^*(t) = A x^*(t) + B u^*(t) \\
u^*(t) = K(t)\, x^*(t)
\end{cases}
\qquad t \in [0, t_f],
\]

where $K(t)$ is the Kalman gain computed by

\[
K(t) = -R^{-1} B^T P(t)
\]

101/117
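The algorithm above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the course's implementation: the function names `riccati_rhs` and `solve_lqr`, the fixed-step RK4 scheme for the backward pass, and the forward Euler scheme for the closed-loop simulation are all choices made here for brevity. The example uses the scalar system $\dot{x} = u$ with $Q = R = 1$, $Q_f = 0$, for which the Riccati equation $-\dot{P} = 1 - P^2$ has the known solution $P(t) = \tanh(t_f - t)$.

```python
import numpy as np


def riccati_rhs(P, A, B, Q, R):
    """Right-hand side of -dP/dt = Q + P A + A^T P - P B R^{-1} B^T P."""
    return Q + P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T @ P)


def solve_lqr(A, B, Q, R, Qf, x0, tf, steps=1000):
    h = tf / steps
    # Backward pass: in reversed time tau = tf - t, dP/dtau = riccati_rhs(P).
    P = [None] * (steps + 1)
    P[steps] = np.array(Qf, dtype=float)
    for k in range(steps, 0, -1):
        Pk = P[k]
        k1 = riccati_rhs(Pk, A, B, Q, R)
        k2 = riccati_rhs(Pk + 0.5 * h * k1, A, B, Q, R)
        k3 = riccati_rhs(Pk + 0.5 * h * k2, A, B, Q, R)
        k4 = riccati_rhs(Pk + h * k3, A, B, Q, R)
        P[k - 1] = Pk + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    cost = float(x0 @ P[0] @ x0)          # V_0(x0) = x0' P(0) x0
    # Forward pass: closed-loop simulation with u = K(t) x (Euler steps).
    xs, us = [np.asarray(x0, dtype=float)], []
    for k in range(steps):
        K = -np.linalg.solve(R, B.T @ P[k])
        u = K @ xs[-1]
        us.append(u)
        xs.append(xs[-1] + h * (A @ xs[-1] + B @ u))
    return P, cost, np.array(xs), np.array(us)


# Scalar example: dx/dt = u, Q = R = 1, Qf = 0, so P(t) = tanh(tf - t).
A = np.array([[0.0]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
Qf = np.array([[0.0]])
x0 = np.array([1.0])
tf = 1.0

P, cost, xs, us = solve_lqr(A, B, Q, R, Qf, x0, tf)

# Cost actually accumulated along the simulated trajectory (Riemann sum).
h = tf / (len(xs) - 1)
simulated_cost = sum(h * float(x @ Q @ x + u @ R @ u)
                     for x, u in zip(xs[:-1], us))
simulated_cost += float(xs[-1] @ Qf @ xs[-1])
```

In this example `cost` matches `tanh(1)` closely, and `simulated_cost` agrees with it up to the discretization error of the forward pass.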

Switched LQR Problem

A continuous-time switched linear system

\[
\dot{x} = A_\sigma x + B_\sigma u
\]

Problem: Find the optimal mode $\sigma(t) \in \Sigma$ and input $u(t)$ over the time horizon $[0, t_f]$ that minimize the cost function

\[
\int_0^{t_f} \big( x^T Q_\sigma x + u^T R_\sigma u \big)\, dt + x(t_f)^T Q_f\, x(t_f)
\]

• State running weight $Q_\sigma = Q_\sigma^T \succeq 0$, $\sigma \in \Sigma$

• Control running weight $R_\sigma = R_\sigma^T \succ 0$, $\sigma \in \Sigma$

• Final state weight $Q_f = Q_f^T \succeq 0$

• No switching cost

Observations:

• In different modes, both dynamics and running costs are different

• If the mode sequence is given, the problem becomes a time-varying LQR problem

• Main challenge is determining the mode sequence

104/117
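The observations above can be illustrated with a small sketch: once a mode sequence is fixed, the optimal cost follows from a backward Riccati integration with mode-dependent matrices, and the hard part is the search over sequences. Everything specific here is an assumption made for the example: the two scalar modes, the restriction of switching instants to four equal intervals, the Euler integration, and the helper name `cost_for_sequence`. Brute-force enumeration is used only to show how quickly the number of candidates grows ($M^N$ sequences for $N$ intervals).

```python
import itertools

import numpy as np


def riccati_rhs(P, A, B, Q, R):
    """Right-hand side of -dP/dt = Q + P A + A^T P - P B R^{-1} B^T P."""
    return Q + P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T @ P)


def cost_for_sequence(seq, modes, Qf, x0, tf, substeps=200):
    """Optimal cost x0' P(0) x0 when mode seq[i] is active on the i-th
    of len(seq) equal sub-intervals of [0, tf] (time-varying LQR)."""
    h = tf / (len(seq) * substeps)
    P = np.array(Qf, dtype=float)
    for sigma in reversed(seq):            # integrate backward from tf
        A, B, Q, R = modes[sigma]
        for _ in range(substeps):
            P = P + h * riccati_rhs(P, A, B, Q, R)   # Euler step
    return float(x0 @ P @ x0)


# Two illustrative scalar modes (A, B, Q, R); assumptions for the example.
modes = {
    0: (np.array([[0.0]]), np.array([[1.0]]),
        np.array([[1.0]]), np.array([[1.0]])),
    1: (np.array([[-1.0]]), np.array([[0.2]]),
        np.array([[1.0]]), np.array([[1.0]])),
}
Qf = np.zeros((1, 1))
x0 = np.array([1.0])
tf = 1.0

# Enumerate all 2^4 mode sequences on a grid of 4 intervals.
costs = {seq: cost_for_sequence(seq, modes, Qf, x0, tf)
         for seq in itertools.product(modes, repeat=4)}
best = min(costs, key=costs.get)
```

Restricting switches to a grid only gives an upper bound on the true optimal cost, since the optimal switching instants need not lie on that grid.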

Continuous-Time SLQR Problem

Value function at time $t \in [0, t_f]$:

\[
V_t(x) = \min_{u(s),\, \sigma(s),\ s \in [t, t_f]} \Big\{ \int_t^{t_f} \big[ x(s)^T Q_{\sigma(s)} x(s) + u(s)^T R_{\sigma(s)} u(s) \big]\, ds + x(t_f)^T Q_f\, x(t_f) \Big\}
\]

• $V_t(x)$ is the optimal cost-to-go at time $t$ from state $x$

• value function independent of $\sigma$ due to the absence of switching cost

• optimal cost of the original SLQR problem is given by $V_0(x_0)$

• at the final time $t_f$, the value function is quadratic: $V_{t_f}(x) = x^T Q_f x$

As in the discrete-time case, the value function at any time is the minimum of a (time-varying) set of quadratic functions:

\[
V_t(x) = \inf_{P \in \mathcal{P}(t)} x^T P x
\]

106/117
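A short sketch of what "minimum of a set of quadratic functions" means in practice. The two matrices in `P_set` are arbitrary assumptions chosen so that a different matrix attains the minimum in different regions of the state space; the helper `value` is hypothetical. The last lines check that the resulting value function violates the parallelogram identity that every single quadratic form satisfies, so it is not itself quadratic.

```python
import numpy as np


def value(x, P_set):
    """Return min over P in P_set of x' P x and the index attaining it."""
    vals = [float(x @ P @ x) for P in P_set]
    i = int(np.argmin(vals))
    return vals[i], i


# Illustrative finite set standing in for the set of matrices at time t.
P_set = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

v1, i1 = value(e1, P_set)   # the first matrix is active along e1
v2, i2 = value(e2, P_set)   # the second matrix is active along e2

# A quadratic form q satisfies q(x+y) + q(x-y) = 2 q(x) + 2 q(y).
lhs = value(e1 + e2, P_set)[0] + value(e1 - e2, P_set)[0]
rhs = 2 * v1 + 2 * v2
```

Here `lhs` and `rhs` differ, which illustrates why the switched problem cannot be summarized by a single Riccati matrix.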


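To obtain a more tractable optimal control problem:

• embed the switched system in the larger family

\[
\dot{x} = A_\lambda x + B_\lambda u, \qquad x(0) = x_0
\]

where $A_\lambda = \sum_{i=1}^{M} \lambda_i A_i$ and $B_\lambda = \sum_{i=1}^{M} \lambda_i B_i$ are parameterized by $\lambda = (\lambda_1, \dots, \lambda_M)$ with

\[
\lambda_i \ge 0,\ i = 1, \dots, M, \qquad \sum_{i=1}^{M} \lambda_i = 1
\quad \rightarrow \quad \lambda \in S,\ S \text{ being a simplex.}
\]

When $\lambda$ takes value in a vertex of $S$, we get one of the dynamical systems among which switching occurs. For instance, if $\lambda_i = 1$, then

\[
\dot{x} = A_i x + B_i u
\]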

107/117
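The embedding above amounts to taking convex combinations of the mode matrices. A minimal sketch follows; the helper name `embed` and the two example modes are assumptions made for illustration.

```python
import numpy as np


def embed(lam, mats):
    """Convex combination sum_i lam[i] * mats[i], with lam in the simplex S."""
    lam = np.asarray(lam, dtype=float)
    if lam.shape != (len(mats),) or np.any(lam < 0) \
            or not np.isclose(lam.sum(), 1.0):
        raise ValueError("lam must lie in the simplex S")
    return np.tensordot(lam, np.stack(mats), axes=1)


# Two illustrative modes (assumptions, not taken from the slides).
A_modes = [np.array([[0.0, 1.0], [0.0, 0.0]]),
           np.array([[-1.0, 0.0], [0.0, -2.0]])]
B_modes = [np.array([[0.0], [1.0]]),
           np.array([[1.0], [0.0]])]

A_vertex = embed([1.0, 0.0], A_modes)   # vertex of S: recovers mode 1
A_mid = embed([0.5, 0.5], A_modes)      # interior point: a "blended" system
B_mid = embed([0.5, 0.5], B_modes)
```

At a vertex the function returns one of the original mode matrices unchanged; interior points of $S$ produce systems that are not among the original modes.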



To obtain a more tractable optimal control problem:

• reformulate the optimal control problem as follows:

Find $u(t)$ and $\lambda(t)$, $t \in [0, t_f]$, to minimize

\[
\int_0^{t_f} \big( x^T Q_\lambda x + u^T R_\lambda u \big)\, dt + x(t_f)^T Q_f\, x(t_f)
\]

subject to

\[
\dot{x} = A_\lambda x + B_\lambda u, \quad t \in [0, t_f], \qquad x_0 \text{ fixed}
\]

where $Q_\lambda = \sum_{i=1}^{M} \lambda_i Q_i$ and $R_\lambda = \sum_{i=1}^{M} \lambda_i R_i$

If the optimal $\lambda(t)$, $t \in [0, t_f]$, takes values in the vertices of the simplex $S$, then the solution is also optimal for the original switched problem; otherwise only a suboptimal solution can be determined.

109/117


• Assume that the system starts from $x$ at time $t$:

\[
x(t) = x, \quad t \in [0, t_f), \quad x \in \mathbb{R}^n
\]

• Assume that the control input is kept constant for a brief $\delta$-length time horizon:

\[
u(s) = w, \quad s \in [t, t+\delta]
\]

• Assume that the value function is the minimum of a (time-varying) set of quadratic functions:

\[
V_t(x) = \inf_{P \in \mathcal{P}(t)} x^T P x
\]

110/117



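Bellman equation:

\[
\underbrace{V_t(x)}_{\text{cost-to-go at } t}
\simeq \min_{w,\, \lambda} \Big[
\underbrace{\delta\,(x^T Q_\lambda x + w^T R_\lambda w)}_{\text{cost during } [t,\,t+\delta]}
+ \underbrace{V_{t+\delta}\big(x + \delta(A_\lambda x + B_\lambda w)\big)}_{\text{cost-to-go from time } t+\delta}
\Big]
\]

Note that since

\[
V_{t+\delta}(x) = \inf_{P(t+\delta) \in \mathcal{P}(t+\delta)} x^T P(t+\delta)\, x
\]

we get

\[
V_t(x) \simeq \min_{w,\, \lambda,\, P(t+\delta) \in \mathcal{P}(t+\delta)} \Big[ \delta\,(x^T Q_\lambda x + w^T R_\lambda w) + \big(x + \delta(A_\lambda x + B_\lambda w)\big)^T P(t+\delta) \big(x + \delta(A_\lambda x + B_\lambda w)\big) \Big]
\]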

112/117



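Expand $P(t+\delta) \in \mathcal{P}(t+\delta)$ as $P(t+\delta) \simeq P(t) + \delta \dot{P}(t)$ for some $P(t) \in \mathcal{P}(t)$

Let $\delta \to 0$. The Bellman equation becomes asymptotically:

\[
0 = \min_{w,\, \lambda,\, P(t) \in \mathcal{P}(t)} \big\{ x^T Q_\lambda x + w^T R_\lambda w + x^T P(t)(A_\lambda x + B_\lambda w) + (A_\lambda x + B_\lambda w)^T P(t) x + x^T \dot{P}(t) x \big\}
\]

Using the optimal control, the value function is of the form

\[
V_t(x) = \inf_{P \in \mathcal{P}(t)} x^T P x
\]

where the set $\mathcal{P}(t)$ satisfies

\[
-\dot{P}(t) \in \big\{ Q_\lambda + P(t) A_\lambda + A_\lambda^T P(t) - P(t) B_\lambda R_\lambda^{-1} B_\lambda^T P(t) : \lambda \in S \big\}, \quad \forall P(t) \in \mathcal{P}(t).
\]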

115/117
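To make the Riccati differential inclusion more concrete, the sketch below propagates a finite set of matrices one Euler step at a time, branching over a few sampled values of λ. This is a crude illustration under explicit assumptions: the mode data, the three sampled points of $S$, the step size, and the helper names `combine`, `inclusion_rhs`, and `step_back` are all invented for the example, and sampling finitely many λ only generates a subset of the true set of matrices. The point is mainly that the number of candidates multiplies at every step.

```python
import numpy as np


def combine(lam, mats):
    """Convex combination sum_i lam[i] * mats[i]."""
    return np.tensordot(np.asarray(lam, dtype=float), np.stack(mats), axes=1)


def inclusion_rhs(P, lam, As, Bs, Qs, Rs):
    """One element of the set on the right-hand side of the inclusion."""
    A, B, Q, R = (combine(lam, m) for m in (As, Bs, Qs, Rs))
    return Q + P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T @ P)


def step_back(P_set, lambdas, h, As, Bs, Qs, Rs):
    """Euler step backward in time: every P branches once per sampled lam."""
    return [P + h * inclusion_rhs(P, lam, As, Bs, Qs, Rs)
            for P in P_set for lam in lambdas]


# Illustrative two-mode data (assumptions, not taken from the slides).
As = [np.array([[0.0, 1.0], [0.0, 0.0]]), np.array([[-1.0, 0.0], [0.0, -2.0]])]
Bs = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
Qs = [np.eye(2), 2.0 * np.eye(2)]
Rs = [np.array([[1.0]]), np.array([[0.5]])]
Qf = np.eye(2)

lambdas = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]   # two vertices and a midpoint
P_set = [Qf]
for _ in range(3):                               # three steps back from t_f
    P_set = step_back(P_set, lambdas, 0.01, As, Bs, Qs, Rs)

x = np.array([1.0, 0.5])
V_approx = min(float(x @ P @ x) for P in P_set)  # min over the sampled set
```

After only three steps with three sampled values of λ the set already holds 27 matrices, which hints at why the set is hard to compute without pruning or a coarser discretization.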

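Value Functions of C.-T. SLQR Problem

The value function $V_t(x)$ is still of the form

\[
V_t(x) = \inf_{P \in \mathcal{P}(t)} x^T P x
\]

$\mathcal{P}(t)$ can be computed from the Riccati differential inclusion

\[
-\dot{P}(t) \in \big\{ Q_\lambda + P(t) A_\lambda + A_\lambda^T P(t) - P(t) B_\lambda R_\lambda^{-1} B_\lambda^T P(t) : \lambda \in S \big\}
\]

where $A_\lambda, B_\lambda, Q_\lambda, R_\lambda$ are any convex combination of $A_i, B_i, Q_i, R_i$ for $i \in \Sigma$

In general, $\mathcal{P}(t)$ is very difficult to compute analytically and numerically

• Discretize the C.-T. SLS into D.-T. SLS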

117/117
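One simple way to carry out the discretization step is a forward Euler scheme, which matches the approximation $x + \delta(Ax + Bw)$ used in the heuristic derivation. The slides do not say which discretization is intended, so the scheme, the helper names `euler_discretize` and `simulate`, and the example modes below are assumptions; an exact zero-order-hold discretization based on the matrix exponential would be a natural alternative.

```python
import numpy as np


def euler_discretize(ct_modes, delta):
    """Map each continuous-time mode (A, B, Q, R) to a discrete-time one:
    A_d = I + delta*A, B_d = delta*B, Q_d = delta*Q, R_d = delta*R."""
    n = ct_modes[0][0].shape[0]
    return [(np.eye(n) + delta * A, delta * B, delta * Q, delta * R)
            for A, B, Q, R in ct_modes]


def simulate(dt_modes, sigma_seq, u_seq, x0):
    """Run x_{k+1} = A_d[sigma_k] x_k + B_d[sigma_k] u_k."""
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for sigma, u in zip(sigma_seq, u_seq):
        Ad, Bd, _, _ = dt_modes[sigma]
        x = Ad @ x + Bd @ u
        traj.append(x)
    return np.array(traj)


# Illustrative two-mode system (assumptions, not taken from the slides).
ct_modes = [
    (np.array([[0.0, 1.0], [0.0, 0.0]]), np.array([[0.0], [1.0]]),
     np.eye(2), np.array([[1.0]])),
    (np.array([[-1.0, 0.0], [0.0, -2.0]]), np.array([[1.0], [0.0]]),
     np.eye(2), np.array([[0.5]])),
]

dt_modes = euler_discretize(ct_modes, delta=0.1)

# Ten steps in mode 0 with zero input, starting with unit velocity.
traj = simulate(dt_modes, [0] * 10, [np.zeros(1)] * 10, [0.0, 1.0])
```

The resulting list `dt_modes` has the form expected by a discrete-time switched LQR formulation, with the running weights scaled by the step size so that the summed cost approximates the integral.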
