Hybrid Systems Course: Switched Linear Quadratic Regulation

Discrete-Time Optimal Control Problem

A discrete-time controlled dynamical system with initial condition $x_0$:
$$x_{k+1} = f(x_k, u_k), \quad k = 0, 1, \ldots$$

Problem: Given a time horizon $[0, N]$, find the optimal input sequence $u = (u_0, \ldots, u_{N-1})$ that minimizes
$$J(u) = \sum_{k=0}^{N-1} \ell(x_k, u_k) + \varphi(x_N)$$
• Running cost $\ell(x_k, u_k) \ge 0$
• Terminal cost $\varphi(x_N) \ge 0$

Extension to a discrete-time hybrid system:
$$x_{k+1} = f(x_k, u_k, \sigma_k), \quad k = 0, 1, \ldots$$

Linear Quadratic Regulation (LQR) Problem

A discrete-time linear system with given initial condition $x_0$:
$$x_{k+1} = A x_k + B u_k$$

Problem: find the optimal input sequence $u = (u_0, \ldots, u_{N-1})$ that minimizes
$$J(u) = \underbrace{\sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right)}_{\text{running cost}} + \underbrace{x_N^T Q_f x_N}_{\text{terminal cost}}$$
• State weight matrix $Q = Q^T \succeq 0$
• Control weight matrix $R = R^T \succ 0$ (no free control)
• Final state weight matrix $Q_f = Q_f^T \succeq 0$

LQR Problem: Motivation

For the discrete-time linear system $x_{k+1} = A x_k + B u_k$ with given initial condition $x_0$, the cost
$$J(u) = \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T Q_f x_N$$
compromises between two conflicting goals:
• minimize overall control effort
• minimize overall state deviation from 0

A larger control input can drive the state to zero faster, at the price of more control effort.

LQR Problem: Special Cases

Energy efficient stabilization: $Q = Q_f = \alpha I$, $R = \beta I$
$$J(u) = \alpha \sum_{k=0}^{N} \|x_k\|^2 + \beta \sum_{k=0}^{N-1} \|u_k\|^2$$
• Weights $\alpha, \beta > 0$ determine the emphasis between two objectives: (i) the state stays close to 0; (ii) less control energy is used

LQR Problem: Special Cases

Problem: find the control sequence $u = (u_0, \ldots, u_{N-1})$ with the least energy that can steer the system state from $x_0$ to $x_N = 0$
• Set $Q = 0$ since we do not care about the deviation from 0 of the states at times $0, 1, \ldots, N-1$
• Choose a very large $\alpha$ since the final state $x_N$ needs to be 0 in the optimal solution

Minimum energy steering to 0: $Q = 0$, $Q_f = \alpha I$, $R = I$
$$J(u) = \alpha \|x_N\|^2 + \sum_{k=0}^{N-1} \|u_k\|^2$$

LQR Problem: Special Cases

A discrete-time linear system with output and given initial condition $x_0$:
$$x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k$$

Problem: find the optimal input sequence $u = (u_0, \ldots, u_{N-1})$ that minimizes
$$J(u) = \alpha \sum_{k=0}^{N} \|y_k\|^2 + \beta \sum_{k=0}^{N-1} \|u_k\|^2 \qquad (\alpha > 0,\ \beta > 0)$$

As an LQR problem:
• State weight matrix $Q = \alpha C^T C \succeq 0$
• Control weight matrix $R = \beta I \succ 0$
• Final state weight matrix $Q_f = Q = \alpha C^T C \succeq 0$

LQR Problem: Special Cases

A discrete-time linear system with given initial condition $x_0$:
$$x_{k+1} = A x_k + B u_k$$

Problem: track a reference state trajectory $x^r_0, x^r_1, \ldots, x^r_N$ with efficient control:
$$J(u) = \underbrace{\alpha \sum_{k=0}^{N} \|x_k - x^r_k\|^2}_{\text{tracking error penalty}} + \underbrace{\beta \sum_{k=0}^{N-1} \|u_k\|^2}_{\text{control energy}}$$

This can be formulated as a (time-varying) LQR problem:
• Augment the state $x$ to $\tilde{x} = \begin{bmatrix} x \\ z \end{bmatrix}$ with $z \in \mathbb{R}$; let $\tilde{x}_0 = \begin{bmatrix} x_0 \\ 1 \end{bmatrix}$
• Augmented state dynamics: $\tilde{x}_{k+1} = \begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix} \tilde{x}_k + \begin{bmatrix} B \\ 0 \end{bmatrix} u_k$
• Choose $Q_k = \alpha \begin{bmatrix} I \\ -(x^r_k)^T \end{bmatrix} \begin{bmatrix} I & -x^r_k \end{bmatrix}$, $R = \beta I$, $Q_f = Q_N$

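The choice of $Q_k$ in the tracking reformulation can be checked numerically: with the augmented state $[x; 1]$, the quadratic form should reproduce $\alpha \|x - x^r_k\|^2$. The following is a minimal sketch; the function name `tracking_weight` and the numeric values are illustrative assumptions, not part of the slides.

```python
import numpy as np

def tracking_weight(x_ref, alpha):
    """Q_k = alpha * [I, -x_ref]^T [I, -x_ref], acting on the augmented state [x; 1]."""
    n = x_ref.shape[0]
    E = np.hstack([np.eye(n), -x_ref.reshape(n, 1)])  # E @ [x; 1] = x - x_ref
    return alpha * (E.T @ E)

# Hypothetical numbers, chosen only to exercise the identity.
alpha = 2.0
x = np.array([1.0, -3.0])
x_ref = np.array([0.5, 1.0])
x_aug = np.append(x, 1.0)            # augmented state with z = 1

Qk = tracking_weight(x_ref, alpha)
quadratic_form = x_aug @ Qk @ x_aug
tracking_penalty = alpha * np.sum((x - x_ref) ** 2)
```

Because $z$ starts at 1 and its dynamics are $z_{k+1} = z_k$, the last component of the augmented state stays equal to 1 along the whole horizon, which is what makes the identity hold at every step.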
Switched LQR Problem

A discrete-time switched linear system with given initial condition $x_0$:
$$x_{k+1} = A_{\sigma_k} x_k + B_{\sigma_k} u_k$$
• continuous state: $x_k \in \mathbb{R}^n$
• discrete state (mode): $\sigma_k \in \Sigma = \{1, 2, \ldots, M\}$

Problem: Find the optimal input sequence $(u_0, \ldots, u_{N-1})$ and mode sequence $(\sigma_0, \ldots, \sigma_{N-1})$ that minimize the cost function
$$\sum_{k=0}^{N-1} \left( x_k^T Q_{\sigma_k} x_k + u_k^T R_{\sigma_k} u_k \right) + x_N^T Q_f x_N$$
• State weight and control weight matrices are mode dependent

Observations:
• In different modes, both the dynamics and the running costs are different
• If the mode sequence is given, this becomes a time-varying LQR problem
• The main challenge is determining the mode sequence

Example

Building cooling system:
• Multiple building zones
• Air Handling Units (AHUs)

State variables:
• Zone temperatures, humidity

Controls:
• AHU damper open/close
• Fan powers

Objectives:
• Maintain comfort
• Reduce energy usage

Courtesy of Jianghai Hu, Purdue University

Outline

• Solve the LQR problem using the dynamic programming method
• Extend the method to solve the SLQR problem
• Complexity reduction techniques

We first look at the LQR problem:
$$\text{Minimize } \left[ \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T Q_f x_N \right]$$
$$\text{subject to } x_{k+1} = A x_k + B u_k, \quad k = 0, \ldots, N-1, \quad x_0 \text{ fixed}$$

Direct Approach: LQR via Least-squares

The state along the time horizon $[0, N]$ is a linear function of $u$ and $x_0$:
$$\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}}_{\bar{x}} = \underbrace{\begin{bmatrix} B & 0 & \cdots & 0 \\ AB & B & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A^{N-1}B & A^{N-2}B & \cdots & B \end{bmatrix}}_{G} \underbrace{\begin{bmatrix} u_0 \\ u_1 \\ \vdots \\ u_{N-1} \end{bmatrix}}_{\bar{u}} + \underbrace{\begin{bmatrix} A \\ A^2 \\ \vdots \\ A^N \end{bmatrix}}_{H} x_0$$
that is, $\bar{x} = G\bar{u} + Hx_0$.

Minimize the function (equal to the LQR cost up to the constant term $x_0^T Q x_0$, which does not depend on $\bar{u}$):
$$J(\bar{u}) = \bar{x}^T \bar{Q} \bar{x} + \bar{u}^T \bar{R} \bar{u}, \qquad \bar{Q} = \mathrm{diag}(Q, \ldots, Q, Q_f), \quad \bar{R} = \mathrm{diag}(R, \ldots, R)$$
$$J(\bar{u}) = (G\bar{u} + Hx_0)^T \bar{Q} (G\bar{u} + Hx_0) + \bar{u}^T \bar{R} \bar{u} = \|\bar{Q}^{1/2}(G\bar{u} + Hx_0)\|^2 + \|\bar{R}^{1/2}\bar{u}\|^2$$

This is a least-squares problem. The optimal control is
$$\bar{u}^* = -(\bar{R} + G^T \bar{Q} G)^{-1} G^T \bar{Q} H x_0$$

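The stacked least-squares solution can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `prediction_matrices` and `lqr_direct`, the system matrices, and the horizon $N = 5$ are hypothetical choices, not taken from the slides.

```python
import numpy as np

def prediction_matrices(A, B, N):
    """Return G, H such that [x_1; ...; x_N] = G [u_0; ...; u_{N-1}] + H x_0."""
    n, m = B.shape
    powers = [np.eye(n)]                      # powers[i] = A^i
    for _ in range(N):
        powers.append(A @ powers[-1])
    G = np.zeros((N * n, N * m))
    H = np.zeros((N * n, n))
    for i in range(N):                        # block row i describes x_{i+1}
        H[i * n:(i + 1) * n, :] = powers[i + 1]
        for j in range(i + 1):                # x_{i+1} depends on u_0, ..., u_i
            G[i * n:(i + 1) * n, j * m:(j + 1) * m] = powers[i - j] @ B
    return G, H

def lqr_direct(A, B, Q, R, Qf, x0, N):
    """Solve the finite-horizon LQR problem as one stacked least-squares problem."""
    n = A.shape[0]
    G, H = prediction_matrices(A, B, N)
    Qbar = np.kron(np.eye(N), Q)              # diag(Q, ..., Q, Qf)
    Qbar[-n:, -n:] = Qf
    Rbar = np.kron(np.eye(N), R)              # diag(R, ..., R)
    u = -np.linalg.solve(Rbar + G.T @ Qbar @ G, G.T @ Qbar @ H @ x0)
    return u, G, H, Qbar, Rbar

# Hypothetical data for illustration.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.eye(1), np.eye(2)
x0 = np.array([1.0, 0.0])

u_opt, G, H, Qbar, Rbar = lqr_direct(A, B, Q, R, Qf, x0, N=5)
# Gradient of J at u_opt; it should vanish at the minimizer.
gradient = 2 * ((Rbar + G.T @ Qbar @ G) @ u_opt + G.T @ Qbar @ H @ x0)
```

The linear system solved here has dimension $N m$, which grows with the horizon; that growth is the practical drawback of the direct approach.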
Direct Approach: LQR via Least-squares

Limitations of the Direct Approach:
• Matrix inversion is needed to find the optimal control
• The problem (matrices) dimension increases with the time horizon $N$
• Impractical for large $N$, let alone the infinite horizon case
• Sensitivity of solutions to numerical errors

Observations:
• The problem is easier to solve for a shorter time horizon $N$
• The $(N+1)$-horizon solution is related to the $N$-horizon solution
• Exploit this relation to design an iterative solution procedure

Dynamic programming: an iterative approach that can
• Re-use results for smaller $N$ to solve for the larger $N$ case
• In each iteration only deal with a problem of fixed size

Dynamic Programming Approach

Idea: Solve a sequence of optimal control problems over time horizons $[t, N]$, for decreasing $t = N, N-1, \ldots, 0$
• The value function at time $t$ is the optimal cost over $[t, N]$:
$$V_t(x) = \min_{u_t, u_{t+1}, \ldots, u_{N-1}} \sum_{k=t}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T Q_f x_N$$
with the initial condition $x_t = x$
• Value function backward iteration: $V_{t-1}(\cdot)$ can be computed based on $V_t(\cdot)$
• The optimal cost of the original problem is $V_0(x_0)$
• The optimal input sequence can be recovered from the value functions

Motivating Example

• Start from point A
• Try to reach point B
• Each step can only move right ($\to N = 6$)
• Cost labeled on each edge

Problem: Path from A to B with the least cost?
• For an $\ell$-by-$\ell$ grid, the total number of legal paths is $\frac{(2\ell)!}{(\ell!)^2}$, which grows fast with $\ell$. In our case $\ell = 3$, hence the total number of legal paths is 20.

Value Functions

The value function at $z$ is the least possible cost to reach B from $z$.

Principle of Optimality: If a least-cost path from A to B is
$$x^*_0 = A \to x^*_1 \to x^*_2 \to \cdots \to x^*_6 = B,$$
then any truncation of it,
$$x^*_t \to x^*_{t+1} \to \cdots \to x^*_6 = B,$$
is also a least-cost path from $x^*_t$ to B.

The value function at any point $z$ reached at time $t$ satisfies
$$V_t(z) = \min\{ w_u + V_{t+1}(z'_u),\ w_d + V_{t+1}(z'_d) \}$$
where $w_u$, $w_d$ are the costs of the two edges leaving $z$ and $z'_u$, $z'_d$ are the points they lead to.

The optimal action when $x_t = z$ is the one providing the minimum argument and can be recovered from $V_{t+1}(\cdot)$.

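The backward recursion on the grid can be sketched as follows. The edge costs in the slide's figure are not recoverable from the text, so the costs below are random placeholders; the indexing convention (a node $(i, j)$ records $i$ up-moves and $j$ down-moves so far) and all function names are assumptions made for this illustration. A brute-force enumeration of all paths is included only to cross-check the result.

```python
import itertools
import math
import numpy as np

def grid_value_function(w_up, w_down):
    """V[i, j] = least cost to reach B = (ell, ell) from node (i, j).

    w_up[i, j]   : cost of the edge (i, j) -> (i + 1, j), shape (ell, ell + 1)
    w_down[i, j] : cost of the edge (i, j) -> (i, j + 1), shape (ell + 1, ell)
    """
    ell = w_up.shape[0]
    V = np.full((ell + 1, ell + 1), np.inf)
    V[ell, ell] = 0.0
    for i in range(ell, -1, -1):
        for j in range(ell, -1, -1):
            if i < ell:
                V[i, j] = min(V[i, j], w_up[i, j] + V[i + 1, j])
            if j < ell:
                V[i, j] = min(V[i, j], w_down[i, j] + V[i, j + 1])
    return V

def brute_force_cost(w_up, w_down):
    """Enumerate every legal path from A = (0, 0) to B and return the least cost."""
    ell = w_up.shape[0]
    best = np.inf
    for ups in itertools.combinations(range(2 * ell), ell):
        i = j = 0
        cost = 0.0
        for t in range(2 * ell):
            if t in ups:
                cost += w_up[i, j]
                i += 1
            else:
                cost += w_down[i, j]
                j += 1
        best = min(best, cost)
    return best

ell = 3
rng = np.random.default_rng(0)                       # placeholder edge costs
w_up = rng.integers(1, 10, size=(ell, ell + 1)).astype(float)
w_down = rng.integers(1, 10, size=(ell + 1, ell)).astype(float)

V = grid_value_function(w_up, w_down)
n_paths = math.factorial(2 * ell) // math.factorial(ell) ** 2
```

The recursion fills in one value per node, while the brute-force check visits every one of the `n_paths` paths; both must agree on the cost from A.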
Value Function Iteration: Results

[Figures only: the value functions computed on the grid.]

Value Function Iteration: Some Observations

Reduced computational complexity: for an $\ell$-by-$\ell$ grid
• Only need to compute $\ell^2$ value functions
• No need to enumerate $\frac{(2\ell)!}{(\ell!)^2}$ paths
• Solve an optimization problem of fixed size in each iteration

Provides solutions to a family of optimal control problems
• Even if starting from a different initial position, there is no need for re-computation
• The input is determined as a function of the current state (state feedback static policy)

Particularly suitable for multi-stage decision problems when the number of control choices is small at each stage

Value Functions of LQR Problem

The value function at time $t \in \{0, 1, \ldots, N\}$ and state $x \in \mathbb{R}^n$ is
$$V_t(x) = \min_{u_t, u_{t+1}, \ldots, u_{N-1}} \sum_{k=t}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T Q_f x_N$$
• $V_t(x)$ is the optimal cost of the LQR problem within a shorter time horizon (from time $t$ to $N$), starting from the initial condition $x_t = x$
• $V_0(x_0)$ is the optimal cost of the original LQR problem

LQR problem: Dynamic Programming Solution

Bellman equation:
$$\underbrace{V_t(x)}_{\text{cost-to-go for the current state}} = \min_{u_t = v} \Big[ \underbrace{x^T Q x + v^T R v}_{\text{current running cost}} + \underbrace{V_{t+1}(Ax + Bv)}_{\text{cost-to-go for the next state}} \Big] = x^T Q x + \min_{u_t = v} \left[ v^T R v + V_{t+1}(Ax + Bv) \right]$$

Optimal control: can be recovered from the value functions as follows
$$u^*_t(x) = \arg\min_v \left[ v^T R v + V_{t+1}(Ax + Bv) \right]$$
(optimal control at time $t$ when $x_t = x$)

LQR problem: Dynamic Programming Solution

• The value function at time $N$ is quadratic: $V_N(x) = x^T Q_f x$
• Suppose $V_{t+1}(x) = x^T P_{t+1} x$ is quadratic; then
$$V_t(x) = \min_v \left[ x^T Q x + v^T R v + V_{t+1}(Ax + Bv) \right] = x^T P_t x$$
is quadratic as well.

By setting $V_{t+1}(x) = x^T P_{t+1} x$ in the expression of $V_t(x)$, we get
$$V_t(x) = \min_v \left[ x^T Q x + v^T R v + V_{t+1}(Ax + Bv) \right]$$
$$= \min_v \left[ x^T Q x + v^T R v + (Ax + Bv)^T P_{t+1} (Ax + Bv) \right]$$
$$= \min_v \left[ x^T Q x + v^T (R + B^T P_{t+1} B) v + 2 v^T B^T P_{t+1} A x + x^T A^T P_{t+1} A x \right]$$

The minimizer is given by
$$u^*_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A x$$

If we plug it back into the expression of $V_t(x)$, we obtain
$$V_t(x) = x^T \underbrace{\left( Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A \right)}_{P_t} x$$

Hence $P_t$ is obtained from $P_{t+1}$ by the Riccati mapping
$$P_t := Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A$$
and the minimum is achieved by the linear state feedback control
$$u^*_t = -\underbrace{(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A}_{\text{Kalman gain } K_t}\, x = -K_t x$$

LQR problem: Dynamic Programming Solution

• The value function at any time is quadratic (easy numeric representation)
• The optimal control is of linear state feedback form with time-varying gains
• Yields the optimal solutions for all initial conditions $x_0$ and all initial times $t_0 \in \{0, 1, \ldots, N\}$ simultaneously
• Easily extended to the cases of time-varying dynamics and costs

LQR Solution Algorithm

Set $P_N = Q_f$
for $t = N-1, N-2, \ldots, 0$ do
    Compute the value functions backward in time:
    $P_t := Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A$
end for
Return $V_0(x_0) = x_0^T P_0 x_0$ as the optimal cost

Set $x^*_0 = x_0$
for $t = 0, 1, \ldots, N-1$ do
    Recover the optimal control and state trajectory forward in time:
    $u^*_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A x^*_t$
    $x^*_{t+1} = A x^*_t + B u^*_t$
end for
Return $u^*_t$ and $x^*_t$ as the optimal control and state sequences

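Example

$$x_{k+1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u_k, \qquad y_k = \begin{bmatrix} 1 & 0 \end{bmatrix} x_k = C x_k, \qquad x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Cost function: $J(U) = \sum_{k=0}^{N-1} \|u_k\|^2 + \rho \sum_{k=0}^{N} \|y_k\|^2$ $(N = 20)$

The optimal control is of the form $u^*_t = \begin{bmatrix} a_t & b_t \end{bmatrix} x^*_t$, $t = 0, 1, \ldots, 19$

The Kalman gains $a_t$ and $b_t$ rapidly converge to some constant values
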
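The backward Riccati pass and the forward rollout of the LQR solution algorithm can be sketched on this example. The slide does not state a value for $\rho$, so $\rho = 1$ below is an assumption; the function names are likewise illustrative. The sketch uses the sign convention $u^*_t = -K_t x^*_t$.

```python
import numpy as np

def riccati_step(P_next, A, B, Q, R):
    """One backward step: returns (P_t, K_t) with u_t = -K_t x_t."""
    K = np.linalg.solve(R + B.T @ P_next @ B, B.T @ P_next @ A)
    P = Q + A.T @ P_next @ A - A.T @ P_next @ B @ K
    return P, K

def solve_lqr(A, B, Q, R, Qf, x0, N):
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = Qf
    for t in range(N - 1, -1, -1):            # backward pass
        P[t], K[t] = riccati_step(P[t + 1], A, B, Q, R)
    xs, us = [x0], []
    for t in range(N):                        # forward rollout
        us.append(-K[t] @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return P, K, np.array(xs), np.array(us)

# Example data from the slide; rho = 1 is an assumed value.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
rho, N = 1.0, 20
Q = rho * C.T @ C
R = np.eye(1)
Qf = Q
x0 = np.array([1.0, 0.0])

P, K, xs, us = solve_lqr(A, B, Q, R, Qf, x0, N)
optimal_cost = x0 @ P[0] @ x0
rollout_cost = sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us)) + xs[-1] @ Qf @ xs[-1]
```

Comparing `K[0]` with `K[1]` shows the gains far from the terminal time agreeing to many digits, which is the convergence behaviour the slide describes.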
Convergence of Riccati Recursion

Theorem [a]
If $(A, B)$ is stabilizable, then the Riccati recursion will converge to a solution $P_{ss}$ of the Algebraic Riccati Equation (ARE):
$$P_{ss} = Q + A^T P_{ss} A - A^T P_{ss} B (R + B^T P_{ss} B)^{-1} B^T P_{ss} A$$
If further $Q = C^T C$ for some $C$ such that $(C, A)$ is detectable, then $P_{ss}$ is unique, and under the steady-state optimal control gain
$$K_{ss} = (R + B^T P_{ss} B)^{-1} B^T P_{ss} A,$$
the closed-loop system $A_{cl} = A - B K_{ss}$ is stable.

[a] P. E. Caines and D. Q. Mayne, "On the discrete time matrix Riccati equation of optimal control," Int. J. Control, vol. 12, no. 5, pp. 785-794, 1970.

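The statement of the theorem can be illustrated numerically by iterating the Riccati recursion until it stops changing and then checking the ARE residual and the closed-loop spectral radius. This is a sketch only: the double-integrator data, the zero initial matrix, and the stopping tolerance are assumptions chosen for the illustration.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ K

def riccati_fixed_point(A, B, Q, R, P0, tol=1e-12, max_iter=10_000):
    """Iterate P <- rho(P) until successive iterates differ by less than tol."""
    P = P0
    for _ in range(max_iter):
        P_next = riccati_map(P, A, B, Q, R)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati recursion did not converge")

# Assumed data: (A, B) controllable and (C, A) observable, so the theorem applies.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = C.T @ C
R = np.eye(1)

P_ss = riccati_fixed_point(A, B, Q, R, P0=np.zeros((2, 2)))
are_residual = np.max(np.abs(riccati_map(P_ss, A, B, Q, R) - P_ss))
K_ss = np.linalg.solve(R + B.T @ P_ss @ B, B.T @ P_ss @ A)
A_cl = A - B @ K_ss
spectral_radius = np.max(np.abs(np.linalg.eigvals(A_cl)))
```

A spectral radius below 1 for `A_cl` is the discrete-time stability condition referred to in the theorem.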
Important Properties of Riccati Mapping

The Riccati mapping $P_t = \rho(P_{t+1})$ defined by
$$P_t = Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A$$
is a mapping $\rho : \mathcal{S}_+ \to \mathcal{S}_+$ on the set of positive semidefinite matrices.

Proposition (Monotonicity)
For $P, P' \in \mathcal{S}_+$ with $P \preceq P'$, we have $\rho(P) \preceq \rho(P')$

Proposition (Concavity)
For $P, P' \in \mathcal{S}_+$ and $\theta \in [0, 1]$, $\rho(\theta P + (1-\theta) P') \succeq \theta \rho(P) + (1-\theta) \rho(P')$

B. Sinopoli et al., "Kalman filtering with intermittent observations," IEEE Trans. Automatic Control, vol. 49, no. 9, pp. 1453-1464, 2004.

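Both propositions are easy to spot-check numerically on a random instance. The sketch below is not a proof; the random system, the seed, and the helper names are assumptions made for the illustration, and positive semidefiniteness is tested through the smallest eigenvalue with a small tolerance.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ K

def is_psd(M, tol=1e-9):
    """True if the symmetric part of M has no eigenvalue below -tol."""
    return np.min(np.linalg.eigvalsh((M + M.T) / 2)) >= -tol

rng = np.random.default_rng(1)

def random_psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T

# Hypothetical random instance.
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 2))
Q, R = np.eye(3), np.eye(2)

P1 = random_psd(3)
P2 = P1 + random_psd(3)                       # guarantees P1 <= P2 in the PSD order
rho1 = riccati_map(P1, A, B, Q, R)
rho2 = riccati_map(P2, A, B, Q, R)

monotone = is_psd(rho2 - rho1)                # rho(P1) <= rho(P2)

theta = 0.3
mix = riccati_map(theta * P1 + (1 - theta) * P2, A, B, Q, R)
concave = is_psd(mix - theta * rho1 - (1 - theta) * rho2)
```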
Back to Switched LQR Problem

A discrete-time switched linear system with given initial condition $x_0$:
$$x_{k+1} = A_{\sigma_k} x_k + B_{\sigma_k} u_k$$
• continuous state: $x_k \in \mathbb{R}^n$
• discrete state (mode): $\sigma_k \in \Sigma = \{1, 2, \ldots, M\}$

Problem: Find the control sequence $u_0, \ldots, u_{N-1}$ and mode sequence $\sigma_0, \ldots, \sigma_{N-1}$ to minimize
$$\sum_{k=0}^{N-1} \left( x_k^T Q_{\sigma_k} x_k + u_k^T R_{\sigma_k} u_k \right) + x_N^T Q_f x_N$$
$$\text{subject to } x_{k+1} = A_{\sigma_k} x_k + B_{\sigma_k} u_k, \quad k = 0, \ldots, N-1, \quad x_0 \text{ fixed}$$

The value function at each $t = 0, 1, \ldots, N$ and $x$ is the optimal cost over the horizon $[t, N]$ assuming $x_t = x$:
$$V_t(x) = \min_{\substack{\sigma_t, \ldots, \sigma_{N-1} \\ u_t, \ldots, u_{N-1}}} \sum_{k=t}^{N-1} \left( x_k^T Q_{\sigma_k} x_k + u_k^T R_{\sigma_k} u_k \right) + x_N^T Q_f x_N$$

Back to Switched LQR Problem

Observations:
• Solution strategy: dynamic programming
• In each step, need to determine both the optimal $u$ and $\sigma$
• The value function is no longer quadratic
• $V_0(x_0)$ is the optimal cost of the original problem
• The value function $V_t(x)$ does not depend on the mode $\sigma_{t-1}$
• Hint: no switching cost

Robust optimal control: assume $\sigma$ is not controllable. Then
$$\inf_u \sup_\sigma J(u, \sigma)$$
is a convex problem!

Bellman Equation of SLQR Problem

Value functions at different times are related by
$$V_t(x) = \min_{\substack{\sigma_t = \sigma \\ u_t = v}} \left[ x^T Q_\sigma x + v^T R_\sigma v + V_{t+1}(A_\sigma x + B_\sigma v) \right]$$

The optimal control and mode are the ones achieving the minimum above:
• Optimal state-dependent switching policy $\sigma^*_t(x)$
• Optimal state feedback controller $u^*_t(x)$

Bad news: value functions are in general not quadratic
• $V_N(x) = x^T Q_f x$ is quadratic
• However, for $t = N-1, N-2, \ldots$

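$t = N-1$ Case

$$V_{N-1}(x) = \min_\sigma \underbrace{\min_v \left[ x^T Q_\sigma x + v^T R_\sigma v + V_N(A_\sigma x + B_\sigma v) \right]}_{\text{Bellman equation for the LQR problem of the } \sigma\text{-th subsystem}} = \min_\sigma \left[ x^T \rho_\sigma(Q_f) x \right]$$
where $\rho_\sigma$ is the Riccati mapping of subsystem $(A_\sigma, B_\sigma)$ with weights $Q_\sigma$, $R_\sigma$.

$V_{N-1}(x)$ is the pointwise minimum of a number of quadratic functions $\to$ piecewise quadratic:
$$V_{N-1}(x) = \min_{P \in \mathcal{P}_{N-1}} x^T P x$$
where $\mathcal{P}_{N-1} = \{\rho_1(Q_f), \ldots, \rho_M(Q_f)\} =: \rho_\Sigma(\{Q_f\})$
• The state space is partitioned into cones (radially invariant minimizer)
• One optimal mode for each cone
• One optimal linear state feedback controller for each cone
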
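$t = N-2$ Case

$$V_{N-2}(x) = \min_\sigma \min_v \left[ x^T Q_\sigma x + v^T R_\sigma v + V_{N-1}(A_\sigma x + B_\sigma v) \right]$$
$$= \min_\sigma \min_v \min_{P \in \mathcal{P}_{N-1}} \left[ x^T Q_\sigma x + v^T R_\sigma v + (A_\sigma x + B_\sigma v)^T P (A_\sigma x + B_\sigma v) \right]$$
$$= \min_\sigma \min_{P \in \mathcal{P}_{N-1}} \underbrace{\min_v \left[ x^T Q_\sigma x + v^T R_\sigma v + (A_\sigma x + B_\sigma v)^T P (A_\sigma x + B_\sigma v) \right]}_{\text{Bellman equation for the LQR problem of the } \sigma\text{-th subsystem}}$$
$$= \min_\sigma \min_{P \in \mathcal{P}_{N-1}} x^T \rho_\sigma(P) x$$

Conclusion: the value function $V_{N-2}(x)$ is the pointwise minimum of $M^2$ quadratic functions:
$$V_{N-2}(x) = \min_{P \in \mathcal{P}_{N-2}} x^T P x$$
where $\mathcal{P}_{N-2} = \rho_\Sigma(\mathcal{P}_{N-1}) = \{\rho_\sigma(P) : P \in \mathcal{P}_{N-1},\ \sigma \in \Sigma\}$.
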
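General $t$ Case

If at $t+1$, $V_{t+1}(x) = \min_{P \in \mathcal{P}_{t+1}} x^T P x$ for a set $\mathcal{P}_{t+1}$ of p.s.d. matrices, then at time $t$ the value function is given by
$$V_t(x) = \min_{P \in \mathcal{P}_t} x^T P x$$
where $\mathcal{P}_t$ is obtained from $\mathcal{P}_{t+1}$ by the switched Riccati recursion:
$$\mathcal{P}_t = \rho_\Sigma(\mathcal{P}_{t+1}) := \bigcup_{\sigma \in \Sigma} \rho_\sigma(\mathcal{P}_{t+1})$$

The size of $\mathcal{P}_t$ is bigger than that of $\mathcal{P}_{t+1}$: $|\mathcal{P}_t| = M \cdot |\mathcal{P}_{t+1}|$
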
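SLQR Solution Algorithm

Set $\mathcal{P}_N = \{Q_f\}$
for $t = N-1, N-2, \ldots, 0$ do
    Compute the set of p.s.d. matrices:
    $\mathcal{P}_t = \rho_\Sigma(\mathcal{P}_{t+1})$
end for
Return $V_0(x_0) = \min_{P \in \mathcal{P}_0} x_0^T P x_0$ as the optimal cost

Set $x^*_0 = x_0$
for $t = 0, 1, \ldots, N-1$ do
    Recover the optimal mode $\sigma^*_t$ and the optimal control $u^*_t$ from
    $(\sigma^*_t, u^*_t) = \arg\min_{\sigma, v} \left[ (x^*_t)^T Q_\sigma x^*_t + v^T R_\sigma v + V_{t+1}(A_\sigma x^*_t + B_\sigma v) \right]$
    Let $x^*_{t+1} = A_{\sigma^*_t} x^*_t + B_{\sigma^*_t} u^*_t$
end for
Return $\sigma^*_t$ and $u^*_t$ as the optimal mode and control sequences
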
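A brute-force version of this algorithm, without any pruning, can be sketched for a short horizon. Each matrix in $\mathcal{P}_t$ is stored together with the mode and the parent matrix in $\mathcal{P}_{t+1}$ that produced it, so that the forward pass can recover the mode and the feedback gain. The two-mode data below are copied from the numerical example later in the deck, while the horizon $N = 6$, the initial state, and all function names are assumptions made for this sketch.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ K

def switched_riccati_sets(modes, Qf, N):
    """sets[t] is a list of (P, mode index, index of parent matrix in sets[t + 1])."""
    sets = [None] * (N + 1)
    sets[N] = [(Qf, None, None)]
    for t in range(N - 1, -1, -1):
        sets[t] = [
            (riccati_map(P_next, *modes[s]), s, j)
            for s in range(len(modes))
            for j, (P_next, _, _) in enumerate(sets[t + 1])
        ]
    return sets

def rollout(modes, sets, x0):
    """Forward pass: pick the minimizing quadratic at each step and apply its gain."""
    N = len(sets) - 1
    x, cost, sigmas = x0, 0.0, []
    for t in range(N):
        P, s, j = min(sets[t], key=lambda entry: x @ entry[0] @ x)
        A, B, Q, R = modes[s]
        P_next = sets[t + 1][j][0]
        u = -np.linalg.solve(R + B.T @ P_next @ B, B.T @ P_next @ A @ x)
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
        sigmas.append(s)
    return sigmas, cost + x @ sets[N][0][0] @ x

I2, R1 = np.eye(2), np.eye(1)
modes = [
    (np.array([[2.0, 1.0], [0.0, 1.0]]), np.array([[1.0], [1.0]]), I2, R1),
    (np.array([[2.0, 1.0], [0.0, 0.5]]), np.array([[1.0], [2.0]]), I2, R1),
]
N = 6                                         # short horizon: |P_0| = 2^6 matrices
x0 = np.array([1.0, -1.0])                    # hypothetical initial state

sets = switched_riccati_sets(modes, I2, N)
V0 = min(x0 @ P @ x0 for P, _, _ in sets[0])
sigmas, achieved_cost = rollout(modes, sets, x0)
```

The list `sets[0]` holds $M^N$ matrices, which is exactly the exponential growth that motivates the complexity reduction techniques.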
Complexity Reduction

Issue: The number of matrices in $\mathcal{P}_t$ grows exponentially

In the set $\mathcal{P}_t$ defining the value function $V_t(x) = \min_{P \in \mathcal{P}_t} x^T P x$:
• A matrix $P \in \mathcal{P}_t$ is called effective if for at least one $x \neq 0$
$$x^T P x < x^T P' x, \quad \forall P' \in \mathcal{P}_t \setminus \{P\}$$
• Otherwise $P$ is called redundant

Redundant matrices can be discarded without affecting the optimal solution because of the monotonicity of the Riccati mapping.

Sufficient condition for $P \in \mathcal{P}_t$ to be redundant:
$$P \succeq \text{a convex combination of } P' \in \mathcal{P}_t \setminus \{P\} \qquad (1)$$

Proof:
$$x^T P x \ge \sum_{P_i \in \mathcal{P}_t \setminus \{P\}} \alpha_i x^T P_i x \ge x^T P_j x, \quad \text{for some } P_j \in \mathcal{P}_t \setminus \{P\}$$

Condition (1) is an LMI feasibility condition to test.

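Testing condition (1) in full generality is an LMI feasibility problem and needs a semidefinite programming solver. The sketch below implements only a weaker special case that needs nothing beyond eigenvalues: a matrix is discarded when it dominates one single other matrix (a convex combination with one weight equal to 1). It therefore removes fewer matrices than the full test would; the function names and the test matrices are hypothetical.

```python
import numpy as np

def is_psd(M, tol=1e-9):
    return np.min(np.linalg.eigvalsh((M + M.T) / 2)) >= -tol

def prune_dominated(mats, tol=1e-9):
    """Drop every P for which some other kept matrix K satisfies P - K >= 0."""
    kept = []
    for P in mats:
        if any(is_psd(P - K, tol) for K in kept):
            continue                          # x'Px >= x'Kx for all x: P is redundant
        # P is not dominated; remove kept matrices that P makes redundant.
        kept = [K for K in kept if not is_psd(K - P, tol)]
        kept.append(P)
    return kept

def value(mats, x):
    return min(x @ P @ x for P in mats)

# Hypothetical set of p.s.d. matrices.
mats = [
    np.eye(2),
    2.0 * np.eye(2),                          # dominates the identity
    np.diag([1.0, 3.0]),                      # dominates the identity
    np.diag([3.0, 0.5]),                      # not comparable with the identity
]
kept = prune_dominated(mats)

rng = np.random.default_rng(2)
samples = rng.standard_normal((50, 2))
same_value = all(np.isclose(value(mats, x), value(kept, x)) for x in samples)
```

Processing the matrices one at a time also handles exact duplicates: the first copy is kept and later copies are dropped.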
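Example of Ineffective Matrices

A switched LQR problem specified by
$$A_1 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1.5 & 1 \\ 0 & 1.5 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
$$Q_\sigma = I, \quad R_\sigma = 1, \quad N = 20$$

[Figure: the 16 matrices in $\mathcal{P}_{16}$ evaluated along half of the unit circle, which suffices because of the radial invariance.]
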
Decision Tree Pruning

[Figure only.]

Further Reduction by Relaxation

Remove more matrices by relaxing Condition (1) to
$$P \succeq \sum_{i \in I} \alpha_i P_i - \varepsilon I \qquad (2)$$
• $\varepsilon > 0$ is a small constant specifying the approximation quality
• Even a small $\varepsilon$ could result in a significant reduction in complexity

W. Zhang, J. Hu and A. Abate, "Infinite horizon switched LQR problems in discrete time: A suboptimal algorithm with performance analysis," IEEE Trans. Automatic Control, vol. 57, no. 7, pp. 1815-1821, Jul. 2012.
W. Zhang, J. Hu, and A. Abate, "A study of the discrete-time switched LQR problem," Purdue Technical Report TR-ECE 09-03.

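Example

$$A_1 = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 2 & 1 \\ 0 & 0.5 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
$$Q_1 = Q_2 = I, \quad R_1 = R_2 = 1, \quad Q_f = I, \quad N = 100$$

$k$                     | 1 | 2 | 3 | 4 | 5 | 6
$|\mathcal{P}_{N-k}|$   | 2 | 4 | 5 | 5 | 5 | 5
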
Example (cont.)

[Figure: optimal switching policy. Gray region: mode 1 optimal; black region: mode 2 optimal.]

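Another Example

$$A_1 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1.5 & 1 \\ 0 & 1.5 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
$$Q_\sigma = I, \quad R_\sigma = 1, \quad N = 20$$

[Figure: complexity (number of matrices, logarithmic axis) versus number of steps left, without reduction, with reduction, and with relaxed reduction.]

• Without any reduction, complexity grows exponentially
• With reduction, complexity saturates at 360 matrices
• With relaxation ($\varepsilon = 10^{-3}$), complexity saturates at 14 matrices
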
SLQR Problem with Switching Cost

Cost function to be minimized:
$$\sum_{k=0}^{N-1} \Big[ x_k^T Q_{\sigma_k} x_k + u_k^T R_{\sigma_k} u_k + \underbrace{w_{(\sigma_k, \sigma_{k+1})}(x_k)}_{\text{switching cost}} \Big] + x_N^T Q_f x_N$$

The value function $V_t(\sigma, x)$ is the optimal cost-to-go starting from $x_t = x$ with the previous mode being $\sigma_{t-1} = \sigma$.

Bellman equation:
$$V_t(\sigma, x) = \min_{\substack{\sigma_t = \sigma' \\ u_t = v}} \left[ x^T Q_{\sigma'} x + v^T R_{\sigma'} v + w_{(\sigma, \sigma')}(x) + V_{t+1}(\sigma', A_{\sigma'} x + B_{\sigma'} v) \right]$$
• Optimal switching policy $\sigma^*_t(\sigma, x)$ and optimal control $u^*_t(\sigma, x)$
• Both depend on the previous mode $\sigma$ and the current state $x$
• The same technique with piecewise quadratic value functions applies if $w_{(\sigma, \sigma')}(\cdot)$ is quadratic, linear, or constant.

Continuous-Time LQR Problem

A continuous-time linear system with given initial condition $x_0$:
$$\dot{x} = Ax + Bu$$

Problem: find the optimal control input $u(t)$ over the time horizon $[0, t_f]$ that minimizes
$$J = \int_0^{t_f} \left( x^T Q x + u^T R u \right) dt + x(t_f)^T Q_f x(t_f)$$
• State running weight $Q = Q^T \succeq 0$
• Control running weight $R = R^T \succ 0$
• Final state weight $Q_f = Q_f^T \succeq 0$

Continuous-Time LQR Problem

Value function at time $t \in [0, t_f]$:
$$V_t(x) = \min_{u(s),\ s \in [t, t_f]} \int_t^{t_f} \left[ x(s)^T Q x(s) + u(s)^T R u(s) \right] ds + x(t_f)^T Q_f x(t_f)$$
• $V_t(x)$ is the optimal cost-to-go at time $t$ from state $x$
• The optimal cost of the original LQR problem is given by $V_0(x_0)$
• At time $t_f$, the value function is quadratic: $V_{t_f}(x) = x^T Q_f x$

As in the discrete-time case, the value function can be shown to be quadratic at any time:
$$V_t(x) = x^T P(t) x, \quad t \in [0, t_f]$$

A Heuristic Derivation of Value Functions

• Assume that the system starts from $x$ at time $t$:
$$x(t) = x, \quad t \in [0, t_f), \quad x \in \mathbb{R}^n$$
• Assume that the control input is kept constant for a brief $\delta$-length time horizon:
$$u(s) = w, \quad s \in [t, t+\delta]$$
• Assume that the value function is quadratic at any time:
$$V_t(x) = x^T P(t) x, \quad t \in [0, t_f]$$

A Heuristic Derivation of Value Functions

Bellman equation:
$$\underbrace{V_t(x)}_{\text{cost-to-go at time } t} \simeq \min_{u(\cdot) \equiv w} \Big[ \underbrace{\delta (x^T Q x + w^T R w)}_{\text{cost during } [t, t+\delta]} + \underbrace{V_{t+\delta}(x + \delta(Ax + Bw))}_{\text{cost-to-go from time } t+\delta} \Big]$$

Expanding the last term:
$$V_{t+\delta}(x + \delta(Ax + Bw)) = [x + \delta(Ax + Bw)]^T P(t+\delta) [x + \delta(Ax + Bw)]$$
$$\simeq [x + \delta(Ax + Bw)]^T [P(t) + \delta \dot{P}(t)] [x + \delta(Ax + Bw)]$$
$$\simeq x^T P(t) x + \delta \left[ x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right]$$
$$= V_t(x) + \delta \left[ x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right]$$

As $\delta \to 0$, the Bellman equation becomes asymptotically:
$$0 = \min_{u(t) = w} \left\{ x^T Q x + w^T R w + x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right\}$$

A Heuristic Derivation of Value Functions

The optimal control law at time $t$ is then:
$$u^*(t) = \arg\min_w \left\{ x^T Q x + w^T R w + x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right\} = \underbrace{-R^{-1} B^T P(t)}_{K(t):\ \text{Kalman gain}}\, x$$

Plug this back into the asymptotic version of the Bellman equation:
$$0 = \left\{ x^T Q x + w^T R w + x^T P(t)(Ax + Bw) + (Ax + Bw)^T P(t) x + x^T \dot{P}(t) x \right\}\Big|_{w = u^*_t}$$
$$\Rightarrow\ 0 = x^T \left\{ Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t) + \dot{P}(t) \right\} x, \quad \forall x$$
$$\Rightarrow\ -\dot{P}(t) = Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t)$$

Initial condition: $P(t_f) = Q_f$, integrated backward in time until time 0.

LQR problem: Dynamic Programming Solution

The value functions are quadratic
$$V_t(x) = x^T P(t) x$$
with $P(t)$ satisfying the Riccati (matrix) differential equation:
$$-\dot{P}(t) = Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t), \quad P(t_f) = Q_f$$

The optimal control is a linear state feedback controller:
$$u^*(t) = -R^{-1} B^T P(t) x$$

LQR Solution Algorithm

Set $P(t_f) = Q_f$
Solve the matrix Riccati equation backward in time:
    $-\dot{P}(t) = Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t)$
Return $V_0(x_0) = x_0^T P(0) x_0$ as the optimal cost

Set $x^*(0) = x_0$
Recover the optimal control and trajectory forward in time:
    $\dot{x}^*(t) = A x^*(t) + B u^*(t)$
    $u^*(t) = K(t) x^*(t)$, $\quad t \in [0, t_f]$,
where $K(t)$ is the Kalman gain computed by
    $K(t) = -R^{-1} B^T P(t)$

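The backward integration of the Riccati differential equation can be sketched with a fixed-step fourth-order Runge-Kutta scheme in the reversed time variable $\tau = t_f - t$. The integrator, the step count, and the scalar test system are assumptions chosen for this illustration; the scalar case $A = 0$, $B = Q = R = 1$, $Q_f = 0$ is convenient because its solution is known in closed form, $P(t) = \tanh(t_f - t)$.

```python
import numpy as np

def riccati_rhs(P, A, B, Q, R):
    """Right-hand side of -dP/dt = Q + P A + A' P - P B R^{-1} B' P."""
    return Q + P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T @ P)

def solve_riccati_ode(A, B, Q, R, Qf, tf, steps=1000):
    """Integrate backward from P(tf) = Qf; returns a list with Ps[i] ~ P(i * tf / steps)."""
    h = tf / steps
    P = Qf.astype(float)
    Ps = [P]
    for _ in range(steps):                    # RK4 in tau = tf - t, where dP/dtau = rhs(P)
        k1 = riccati_rhs(P, A, B, Q, R)
        k2 = riccati_rhs(P + 0.5 * h * k1, A, B, Q, R)
        k3 = riccati_rhs(P + 0.5 * h * k2, A, B, Q, R)
        k4 = riccati_rhs(P + h * k3, A, B, Q, R)
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        Ps.append(P)
    return Ps[::-1]                           # reorder so that index 0 is t = 0

# Scalar test system with a known closed-form solution.
A = np.zeros((1, 1))
B = np.ones((1, 1))
Q = np.ones((1, 1))
R = np.ones((1, 1))
Qf = np.zeros((1, 1))
tf = 2.0

Ps = solve_riccati_ode(A, B, Q, R, Qf, tf)
P0 = Ps[0]
K0 = -np.linalg.solve(R, B.T @ P0)            # Kalman gain K(0) = -R^{-1} B' P(0)
```

With the gains `K(t)` stored along the grid, the forward pass of the algorithm amounts to simulating $\dot{x}^* = (A + B K(t)) x^*$ from $x_0$.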
Switched LQR Problem

A continuous-time switched linear system with given initial condition $x_0$:
$$\dot{x} = A_\sigma x + B_\sigma u$$
• continuous state: $x(t) \in \mathbb{R}^n$
• discrete state (mode): $\sigma(t) \in \Sigma = \{1, 2, \ldots, M\}$

Problem: Find the optimal mode $\sigma(t) \in \Sigma$ and input $u(t)$ over the time horizon $[0, t_f]$ that minimize the cost function
$$\int_0^{t_f} \left( x^T Q_\sigma x + u^T R_\sigma u \right) dt + x(t_f)^T Q_f x(t_f)$$
• State running weight $Q_\sigma = Q_\sigma^T \succeq 0$, $\sigma \in \Sigma$
• Control running weight $R_\sigma = R_\sigma^T \succ 0$, $\sigma \in \Sigma$
• Final state weight $Q_f = Q_f^T \succeq 0$
• No switching cost

Observations:
• In different modes, both the dynamics and the running costs are different
• If the mode sequence is given, this becomes a time-varying LQR problem
• The main challenge is determining the mode sequence

Continuous-Time SLQR Problem

Value function at time $t \in [0, t_f]$:
$$V_t(x) = \min_{u(s),\ \sigma(s),\ s \in [t, t_f]} \left\{ \int_t^{t_f} \left[ x(s)^T Q_{\sigma(s)} x(s) + u(s)^T R_{\sigma(s)} u(s) \right] ds + x(t_f)^T Q_f x(t_f) \right\}$$
• $V_t(x)$ is the optimal cost-to-go at time $t$ from state $x$
• The value function is independent of $\sigma$ due to the absence of switching cost
• The optimal cost of the original SLQR problem is given by $V_0(x_0)$
• At time $t_f$, the value function is quadratic: $V_{t_f}(x) = x^T Q_f x$

As in the discrete-time case, the value function at any time is the minimum of a (time-varying) set of quadratic functions:
$$V_t(x) = \inf_{P \in \mathcal{P}(t)} x^T P x$$

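Derivation of Value Functions

To obtain a more tractable optimal control problem:
• embed the switched system in the larger family
$$\dot{x} = A_\lambda x + B_\lambda u, \quad x(0) = x_0$$
where $A_\lambda = \sum_{i=1}^{M} \lambda_i A_i$ and $B_\lambda = \sum_{i=1}^{M} \lambda_i B_i$ are parameterized by
$$\lambda = (\lambda_1, \ldots, \lambda_M) \text{ with } \lambda_i \ge 0,\ i = 1, \ldots, M, \quad \sum_{i=1}^{M} \lambda_i = 1$$
$\to \lambda \in S$, $S$ being a simplex.

When $\lambda$ takes its value in a vertex of $S$, we get one of the dynamical systems among which switching occurs. For instance, if $\lambda_i = 1$, then
$$\dot{x} = A_i x + B_i u$$
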
Derivation of Value Functions

To obtain a more tractable optimal control problem:
• reformulate the optimal control problem as follows:

Find $u(t)$ and $\lambda(t)$, $t \in [0, t_f]$, to minimize
$$\int_0^{t_f} \left( x^T Q_\lambda x + u^T R_\lambda u \right) dt + x(t_f)^T Q_f x(t_f)$$
$$\text{subject to } \dot{x} = A_\lambda x + B_\lambda u, \quad t \in [0, t_f], \quad x_0 \text{ fixed}$$
where $Q_\lambda = \sum_{i=1}^{M} \lambda_i Q_i$ and $R_\lambda = \sum_{i=1}^{M} \lambda_i R_i$

If the optimal $\lambda(t)$, $t \in [0, t_f]$, takes values in the vertices of the simplex $S$, then the solution is also optimal for the original switched problem; otherwise only a suboptimal solution can be determined.

Derivation of Value Functions

• Assume that the system starts from $x$ at time $t$:
$$x(t) = x, \quad t \in [0, t_f), \quad x \in \mathbb{R}^n$$
• Assume that the control input is kept constant for a brief $\delta$-length time horizon:
$$u(s) = w, \quad s \in [t, t+\delta]$$
• Assume that the value function is the minimum of a (time-varying) set of quadratic functions:
$$V_t(x) = \inf_{P \in \mathcal{P}(t)} x^T P x$$

Derivation of Value Functions

Bellman equation:
$$\underbrace{V_t(x)}_{\text{cost-to-go at } t} \simeq \min_{w, \lambda} \Big[ \underbrace{\delta (x^T Q_\lambda x + w^T R_\lambda w)}_{\text{cost during } [t, t+\delta]} + \underbrace{V_{t+\delta}(x + \delta(A_\lambda x + B_\lambda w))}_{\text{cost-to-go from time } t+\delta} \Big]$$

Note that since
$$V_{t+\delta}(x) = \inf_{P(t+\delta) \in \mathcal{P}(t+\delta)} x^T P(t+\delta) x$$
we get
$$V_t(x) \simeq \min_{w,\ \lambda,\ P(t+\delta) \in \mathcal{P}(t+\delta)} \left[ \delta (x^T Q_\lambda x + w^T R_\lambda w) + (x + \delta(A_\lambda x + B_\lambda w))^T P(t+\delta) (x + \delta(A_\lambda x + B_\lambda w)) \right]$$

Derivation of Value Functions

Expand $P(t+\delta) \in \mathcal{P}(t+\delta)$ as
$$P(t+\delta) \simeq P(t) + \delta \dot{P}(t) \quad \text{for some } P(t) \in \mathcal{P}(t)$$

Let $\delta \to 0$. The Bellman equation becomes asymptotically:
$$0 = \min_{w,\ \lambda,\ P(t) \in \mathcal{P}(t)} \left\{ x^T Q_\lambda x + w^T R_\lambda w + x^T P(t)(A_\lambda x + B_\lambda w) + (A_\lambda x + B_\lambda w)^T P(t) x + x^T \dot{P}(t) x \right\}$$

Using the optimal control, the value function is of the form
$$V_t(x) = \inf_{P \in \mathcal{P}(t)} x^T P x$$
where the set $\mathcal{P}(t)$ satisfies
$$-\dot{P}(t) \in \left\{ Q_\lambda + P(t) A_\lambda + A_\lambda^T P(t) - P(t) B_\lambda R_\lambda^{-1} B_\lambda^T P(t) : \lambda \in S \right\}, \quad \forall P(t) \in \mathcal{P}(t).$$

Value Functions of C.-T. SLQR Problem

The value function $V_t(x)$ is still of the form
$$V_t(x) = \inf_{P \in \mathcal{P}(t)} x^T P x$$
$\mathcal{P}(t)$ can be computed from the Riccati differential inclusion
$$-\dot{P}(t) \in \left\{ Q_\lambda + P(t) A_\lambda + A_\lambda^T P(t) - P(t) B_\lambda R_\lambda^{-1} B_\lambda^T P(t) : \lambda \in S \right\}$$
where $A_\lambda, B_\lambda, Q_\lambda, R_\lambda$ is any convex combination of $A_i, B_i, Q_i, R_i$ for $i \in \Sigma$.

In general, $\mathcal{P}(t)$ is very difficult to compute both analytically and numerically
• Discretize the C.-T. SLS into a D.-T. SLS

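The slides do not say which discretization is meant. As one simple possibility, the sketch below applies a forward-Euler step of length $h$ to every mode, mirroring the $\delta$-step approximation used in the heuristic derivation: $x_{k+1} \approx (I + h A_\sigma) x_k + h B_\sigma u_k$, with running weights scaled by $h$. The function name, the step size, and the sample mode are assumptions; an exact zero-order-hold discretization would be an alternative.

```python
import numpy as np

def euler_discretize(modes, h):
    """Forward-Euler discretization of each continuous-time mode (A, B, Q, R)."""
    discrete = []
    for A, B, Q, R in modes:
        n = A.shape[0]
        discrete.append((np.eye(n) + h * A, h * B, h * Q, h * R))
    return discrete

# Hypothetical continuous-time mode: a double integrator.
A1 = np.array([[0.0, 1.0], [0.0, 0.0]])
B1 = np.array([[0.0], [1.0]])
modes_ct = [(A1, B1, np.eye(2), np.eye(1))]

modes_dt = euler_discretize(modes_ct, h=0.1)
Ad, Bd, Qd, Rd = modes_dt[0]
```

The resulting discrete-time modes can then be passed to a discrete-time switched Riccati recursion, with the horizon set to $N = t_f / h$ steps.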